
Kubernetes Migration

27 Dec 2022

By Alex / in PaaS / tags K8S

Migrating a Kubernetes cluster from one cloud provider to another usually breaks into three separate problems: moving Kubernetes resources, moving the data attached to workloads, and moving the container images those workloads depend on.

  1. Kubernetes resource migration
  2. Persistent volume migration
  3. Container image migration

Kubernetes resources and persistent volumes can be handled with Velero. Image registry migration is simpler in most cases. Common open source options include Alibaba Cloud's image-syncer and Tencent Cloud's image-transfer.
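To give a sense of what the registry side involves: image-syncer, for example, drives the copy from a small rules file describing credentials and source-to-target image mappings. A hedged sketch follows; the registry hostnames, namespaces, and exact key layout are illustrative assumptions, not verified against a specific image-syncer version:

```yaml
# Credentials keyed by registry hostname (assumed layout)
auth:
  registry.cn-hangzhou.aliyuncs.com:
    username: SOURCE_USER
    password: SOURCE_PASSWORD
  ccr.ccs.tencentyun.com:
    username: TARGET_USER
    password: TARGET_PASSWORD

# Source image (key) to target repository (value)
images:
  registry.cn-hangzhou.aliyuncs.com/demo/nginx:1.7.9: ccr.ccs.tencentyun.com/demo/nginx
```

The important point is that registry migration is a pure copy job: no cluster state is involved, so it can run independently of the Velero workflow.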

Velero
Overview

Velero is an open source backup and restore system built for Kubernetes. A common cross-cloud migration pattern is to back up the source cluster and restore that backup into the target cluster.

Velero consists of two parts:

  1. Server-side components running inside the Kubernetes clusters being backed up or restored
  2. A CLI client

The server side is a collection of controllers that watch Velero custom resources for backup and restore operations. The CLI mostly saves you from writing those custom resources by hand.
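Concretely, a command like `velero backup create my-backup --include-namespaces demo` boils down to submitting a custom resource that the BackupController then acts on. A minimal sketch of that resource (names and values are illustrative):

```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: my-backup
  namespace: velero          # the namespace the Velero server runs in
spec:
  includedNamespaces:
  - demo
  snapshotVolumes: true      # default: snapshot all persistent volumes
  ttl: 720h0m0s              # retention window, same as --ttl
  storageLocation: default   # a BackupStorageLocation name
```

Anything the CLI can do can therefore also be done by applying resources like this directly, which is useful for GitOps-style setups.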

Notable newer capabilities

Compared with the version we reviewed in the earlier note on Kubernetes failure detection and self-healing, Velero has added several capabilities that matter in real migrations:

  1. ReadWriteMany volumes are no longer backed up repeatedly.
  2. Cloud provider plugins have been split out from the core Velero repository.
  3. Restic-based persistent volume backups are always incremental, even when Pods move.
  4. Namespace cloning can automatically clone the related persistent volumes.
  5. CSI-backed persistent volumes are supported, including the mainstream AWS, Azure, and GCP cases.
  6. Backup and restore progress reporting is supported.
  7. Velero can back up all API versions of a resource.
  8. Volume backup through Restic can be enabled by default with --default-volumes-to-restic.
  9. restoreStatus can be used to control which resource status fields are restored.
  10. --existing-resource-policy can change restore behavior when a resource already exists. The default is to skip existing resources, except for ServiceAccounts. Setting it to update makes Velero update existing resources instead.
  11. Since 1.10, Velero supports Kopia as an alternative to Restic. Kopia often performs better on large backup sets or very large file counts.

Backup flow

Velero supports both on-demand and scheduled backups. In both cases it collects Kubernetes resources, applies filters if requested, packages the result, and uploads it to an object storage backend.
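Scheduled backups are expressed as a Schedule resource: a cron expression wrapping an ordinary backup spec. A sketch, with illustrative name and schedule:

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: nightly-demo
  namespace: velero
spec:
  schedule: "0 2 * * *"    # standard cron syntax
  template:                # an ordinary Backup spec
    includedNamespaces:
    - demo
    ttl: 720h0m0s
```

Each firing of the schedule produces a regular Backup object, so the flow below applies to both cases.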

A typical backup flow looks like this:

  1. The user runs velero backup create, which creates a Backup resource.
  2. BackupController sees the new Backup resource and validates it.
  3. If validation succeeds, the controller runs the backup. By default, Velero creates snapshots for all persistent volumes. Use --snapshot-volumes=false to change that behavior.
  4. The controller uploads the backup data to object storage.

When Velero backs up resources, it stores them using the preferred API version. If the source API server exposes two versions of a group, for example teleport/v1alpha1 and teleport/v1, and v1 is the preferred version, the backup stores the resource in v1 form. The target cluster does not have to prefer that version, but it must support it. That is one reason restore can fail across clusters with different Kubernetes or CRD versions.

Backups can have a retention period through --ttl. When that retention window expires, Velero deletes the Kubernetes backup records, the backup files, the snapshots, and the related Restore objects. If garbage collection fails, Velero adds a velero.io/gc-failure=REASON label to the Backup object.

There is one important caveat for cross-cloud migration: snapshot-based volume backup is not enough. A snapshot created on cloud A is not something you can usually restore directly on cloud B.

Restore flow

Restore takes a previous backup, including Kubernetes resources and volume data, and replays it into the target cluster. The target cluster can be the source cluster itself, and the restore can be filtered so only part of the backup is restored.

Restored Kubernetes resources receive the label velero.io/restore-name=RESTORE_NAME. By default, the restore name is BACKUP_NAME-TIMESTAMP, where the timestamp format is YYYYMMDDhhmmss.

A typical restore flow looks like this:

  1. The user runs velero restore create, which creates a Restore resource.
  2. RestoreController sees the Restore object and validates it.
  3. If validation succeeds, the controller reads the backup metadata from object storage and performs prechecks, including API version checks, to see whether the resources can run on the new cluster.
  4. The controller restores resources one by one.

By default, Velero does not delete or overwrite existing objects in the target cluster. If a resource already exists, Velero skips it. Setting --existing-resource-policy=update tells Velero to try to update matching existing resources instead.
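As with backups, the CLI is generating a custom resource under the hood. A minimal Restore sketch showing how `existingResourcePolicy` maps to the flag (names are illustrative):

```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: my-backup-20221227000000   # BACKUP_NAME-TIMESTAMP by default
  namespace: velero
spec:
  backupName: my-backup
  includedNamespaces:
  - demo
  existingResourcePolicy: update   # default behavior skips existing resources
  restorePVs: true                 # restore persistent volumes from the backup
```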

Object storage as source of truth

The object storage backend is Velero's single source of truth. That has two practical consequences:

  1. If object storage contains backup data but the Kubernetes API does not contain the matching Backup resource, Velero recreates the Backup object.
  2. If Kubernetes contains a Backup resource but object storage does not contain the matching backup data, Velero deletes the Backup object.

This is also why cross-cloud migration works at all. The source and target clusters do not need to talk directly to each other. Object storage becomes the only shared medium.

The CRD that defines where backup metadata is stored is BackupStorageLocation. It points to a bucket or a prefix inside a bucket. Velero stores backup metadata there, and file-system-based volume backups through Restic or Kopia also live there. Snapshot-based volume backups do not live in that bucket, because the snapshot implementation is controlled by the cloud provider.

Each Backup can use one BackupStorageLocation.
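A BackupStorageLocation for the AWS provider looks roughly like this (bucket, prefix, and region are illustrative; the `config` keys depend on the provider plugin):

```yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: velero-backups
    prefix: prod-cluster       # optional prefix inside the bucket
  config:
    region: us-east-2
    # S3-compatible backends (e.g. MinIO) typically also need an endpoint
    # URL and path-style addressing; exact keys come from the plugin docs
```

For cross-cloud migration, both clusters point their BackupStorageLocation at the same bucket, which is what makes object storage the shared medium.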

Snapshot locations

Snapshot-related information is stored in VolumeSnapshotLocation. The actual fields depend on the cloud plugin, because snapshot implementation is provider-specific.

Each Backup can use one VolumeSnapshotLocation per volume snapshot provider.
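A VolumeSnapshotLocation is correspondingly thin; almost everything interesting lives in the provider-specific `config` map:

```yaml
apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: aws-default
  namespace: velero
spec:
  provider: aws
  config:
    region: us-east-2   # keys and values are defined by the provider plugin
```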

Providers and plugins

Velero uses a plugin model that keeps storage and cloud provider integrations outside the core project.

Hooks

Velero also exposes hooks around the standard backup and restore flow.

Backup hooks run during backup. One standard use is telling a database to flush in-memory buffers before a snapshot or file backup starts.

Restore hooks run during restore. They are often used for initialization steps that need to happen before the application starts normally.
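Backup hooks are commonly attached through Pod annotations; the command runs inside the named container before (pre) or after (post) the Pod's volumes are backed up. A sketch of the database-flush case, with illustrative paths, image, and claim name:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: db
  annotations:
    pre.hook.backup.velero.io/container: db
    pre.hook.backup.velero.io/command: '["/sbin/fsfreeze", "--freeze", "/var/lib/data"]'
    post.hook.backup.velero.io/container: db
    post.hook.backup.velero.io/command: '["/sbin/fsfreeze", "--unfreeze", "/var/lib/data"]'
spec:
  containers:
  - name: db
    image: mysql:8.0
    volumeMounts:
    - name: data
      mountPath: /var/lib/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: db-data
```

The freeze/unfreeze pairing matters: a pre hook without its post counterpart can leave the filesystem frozen if the backup aborts, so hook timeouts deserve attention.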

Installation
Installing the CLI

Download the Velero CLI archive for your platform, extract it, and place the velero binary on $PATH. To enable shell completion:

Shell
echo 'source <(velero completion bash)' >> ~/.bashrc

Client-side configuration can be adjusted like this:

Shell
# Enable client features
velero client config set features=EnableCSI

# Disable color output
velero client config set colorized=false

Installing the server components

The CLI can also install the server components:

Shell
# Resource request/limit values below are examples; tune them for your workloads
velero install \
    --namespace=teleport-system \
    --use-node-agent \
    --default-volumes-to-fs-backup \
    --features=EnableCSI,EnableAPIGroupVersions \
    --velero-pod-cpu-request=500m \
    --velero-pod-mem-request=512Mi \
    --velero-pod-cpu-limit=1000m \
    --velero-pod-mem-limit=1Gi \
    --node-agent-pod-cpu-request=500m \
    --node-agent-pod-mem-request=512Mi \
    --node-agent-pod-cpu-limit=1000m \
    --node-agent-pod-mem-limit=1Gi \
    --provider aws \
    --bucket backups \
    --secret-file ./aws-iam-creds \
    --backup-location-config region=us-east-2 \
    --snapshot-location-config region=us-east-2 \
    --no-default-backup-location \
    --dry-run -o yaml

Several flags in that example matter in migration scenarios:

  • --use-node-agent enables file-system-based backup support.
  • --default-volumes-to-fs-backup makes file-system backup the default for Pod volumes. Without it, volumes normally have to be selected through annotations.
  • --features=EnableCSI,EnableAPIGroupVersions turns on feature gates that matter in newer storage and API-version scenarios.
  • Resource request and limit flags often need adjustment when file-system backup is used heavily.

After installation, you can configure default backup and snapshot locations:

Shell
velero backup-location create backups-primary \
    --provider aws \
    --bucket velero-backups \
    --config region=us-east-1 \
    --default

velero server --default-volume-snapshot-locations="PROVIDER-NAME:LOCATION-NAME,PROVIDER2-NAME:LOCATION2-NAME"

You can also add extra snapshot providers after the initial install:

Shell
velero plugin add registry/image:version

velero snapshot-location create NAME \
    --provider PROVIDER-NAME \
    [--config PROVIDER-CONFIG]

A cross-cloud migration test

One practical test setup is to create one Kubernetes cluster on Alibaba Cloud as the source cluster and another on Tencent Cloud as the target cluster, then use Velero to move workloads across them.

Creating the clusters

Create the clusters through the two cloud consoles. The exact steps depend on the providers and are not the point here.

Migrating stateless workloads

The most common Kubernetes use case is still stateless workloads. Stateful infrastructure such as databases is often delegated to cloud PaaS products instead of being hosted inside the cluster. That reality makes Kubernetes migration much easier, because volume migration often drops out of scope.

A simple test case is an Nginx Deployment plus a Service. Start by creating the resources on the source cluster:

nginx.yaml
YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:1.7.9
        name: nginx
        ports:
        - containerPort: 80
 
---
 
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: nginx
  type: LoadBalancer

For this kind of workload, the migration path is fairly direct: back up the namespace or selected resources from the source cluster, restore them into the target cluster, and then verify that the restored Deployment, Service, and related objects match expectations. The harder cases show up later, when CRDs, API version skew, storage classes, and cloud-specific integrations enter the picture.

