Kubernetes etcd Backup and Restore Operations Guide

✅ Environment

Item                      Value
Namespace                 apisix
etcd Pod names            apisix-etcd-0, apisix-etcd-1, apisix-etcd-2
Deployment method         Bitnami Helm Chart (built-in StatefulSet)
Protocol                  HTTP (no TLS)
Client endpoint           http://apisix-etcd.apisix.svc.cluster.local:2379
Data directory            /bitnami/etcd/data
ETCDCTL_API               3
Authentication            TLS/certificate authentication not enabled


✅ Final customized version: automated backup

Save the following YAML to a file:

vim etcd-backup-cronjob.yaml

Then run kubectl apply -f etcd-backup-cronjob.yaml to put it into use.


apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: etcd-backup-pvc
  namespace: apisix
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: apisix
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          securityContext:
            fsGroup: 0           # set at the Pod level
          imagePullSecrets:       # credentials for the private registry
            - name: huaweicloud-registry-secret
          containers:
          - name: etcd-backup
            image: swr.cn-north-4.myhuaweicloud.com/cfhy-common/etcd:3.5
            imagePullPolicy: IfNotPresent
            securityContext:
              runAsUser: 0       # run as root inside the container
              runAsGroup: 0
            command:
            - /bin/bash
            - -c
            - |
              set -e
              BACKUP_DIR=/backup
              TIMESTAMP=$(date +%Y%m%d-%H%M%S)
              SNAPSHOT_FILE=${BACKUP_DIR}/etcd-snapshot-${TIMESTAMP}.db

              echo "=== Starting etcd snapshot backup ==="
              etcdctl --endpoints="http://apisix-etcd.apisix.svc.cluster.local:2379" \
                      snapshot save "${SNAPSHOT_FILE}"

              echo "=== Verifying snapshot ==="
              etcdctl snapshot status "${SNAPSHOT_FILE}"

              echo "=== Removing snapshots older than 7 days ==="
              find "${BACKUP_DIR}" -type f -name "etcd-snapshot-*.db" -mtime +7 -delete || true

              echo "=== Listing current snapshots ==="
              ls -lh "${BACKUP_DIR}"
            env:
            - name: ETCDCTL_API
              value: "3"
            - name: TZ
              value: Asia/Shanghai
            volumeMounts:
            - name: backup
              mountPath: /backup
              readOnly: false
          restartPolicy: OnFailure
          volumes:
          - name: backup
            persistentVolumeClaim:
              claimName: etcd-backup-pvc
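
The retention step in the CronJob depends on find -mtime +7, which counts file age in whole 24-hour periods and matches only ages greater than 7, so in practice a snapshot is removed once it is at least 8 days old. The following is a minimal sketch for trying that rule on a scratch directory before trusting it on real backups. The function name prune_snapshots and the temporary directory are illustrative and not part of the original setup, and touch -d assumes GNU coreutils.

```shell
#!/usr/bin/env bash
set -euo pipefail

# prune_snapshots DIR [DAYS]: delete etcd-snapshot-*.db files older than DAYS (default 7).
# Hypothetical helper that mirrors the find command used in the CronJob.
prune_snapshots() {
  local dir=$1 days=${2:-7}
  find "$dir" -type f -name 'etcd-snapshot-*.db' -mtime +"$days" -delete
}

# Demo on a scratch directory (GNU touch -d assumed).
demo_dir=$(mktemp -d)
touch -d '10 days ago' "$demo_dir/etcd-snapshot-old.db"   # should be pruned
touch "$demo_dir/etcd-snapshot-new.db"                    # should be kept
prune_snapshots "$demo_dir" 7
ls "$demo_dir"
```

Only the file dated 10 days back should disappear; the fresh one stays.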

🔍 Verify the result

For the first run, you can trigger the job manually:

kubectl create job --from=cronjob/etcd-backup -n apisix etcd-backup-manual

Check the job status:

kubectl get pods -n apisix | grep etcd-backup
kubectl logs -n apisix <pod-name>

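Normal output includes:

=== Starting etcd snapshot backup ===
Snapshot saved at /backup/etcd-snapshot-20251111-020000.db
=== Verifying snapshot ===
=== Removing snapshots older than 7 days ===
=== Listing current snapshots ===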
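
The log check can also be scripted rather than read by eye. The sketch below looks only for the "Snapshot saved at" line that etcdctl prints in the sample output above. The function name backup_log_ok is hypothetical, and the usage line assumes the manual job name used earlier.

```shell
#!/usr/bin/env bash
set -euo pipefail

# backup_log_ok: read job logs on stdin and succeed only if etcdctl
# reported a saved snapshot under /backup.
backup_log_ok() {
  # Redirect instead of grep -q so the whole stream is consumed,
  # which avoids a SIGPIPE upstream when pipefail is on.
  grep 'Snapshot saved at /backup/etcd-snapshot-.*\.db' >/dev/null
}

# Usage (requires cluster access):
#   kubectl logs -n apisix job/etcd-backup-manual | backup_log_ok && echo "backup OK"
```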

📂 View the backup files

# Stress-test environment
ls -al /mnt/test-nfs/cfhy-pet-cce-01/pvc-243394b9-3f15-4094-9d24-7c6c2fcf64e8/
# Green production cluster
ls -al /mnt/green-prod-nfs/cfhy-prod-green-cce/pvc-12b4f3cf-9c20-4357-abfd-1796e06f9d08
# Blue production cluster
ls -al /mnt/prod-nfs/cfhy-prod-cce/pvc-6ea800d4-9584-4aa4-9919-7b35f610d33a

Sample output:


-rw-rw----  1 root root 2363424 Dec 15 15:40 etcd-snapshot-20251215-074032.db
-rw-rw----  1 root root 2367520 Dec 15 15:44 etcd-snapshot-20251215-154421.db
-rw-------  1 root root 2363424 Dec 16 02:00 etcd-snapshot-20251216-020011.db
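
Because the file names embed a YYYYmmdd-HHMMSS stamp, sorting by name also sorts by time, which makes it easy to pick the newest snapshot for a restore. The sketch below assumes that all snapshots were stamped in the same timezone. The function name latest_snapshot is illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# latest_snapshot DIR: print the newest non-empty etcd-snapshot-*.db, judged by file name.
latest_snapshot() {
  local dir=$1 newest="" f
  for f in "$dir"/etcd-snapshot-*.db; do
    [ -s "$f" ] || continue   # skip an unmatched glob or a zero-byte file
    newest=$f                 # glob expansion is sorted, so the last match wins
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Usage, run on the NFS host against one of the PVC directories listed above:
#   latest_snapshot /mnt/prod-nfs/cfhy-prod-cce/pvc-6ea800d4-9584-4aa4-9919-7b35f610d33a
```

Skipping zero-byte files is a cheap guard against picking up a backup that was interrupted before any data was written.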

♻️ Restore procedure (disaster recovery / migration)

1. Make sure etcd is completely stopped

kubectl scale statefulset apisix-etcd -n apisix --replicas=0

Confirm that all Pods have stopped:

kubectl get pod -n apisix | grep etcd
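
Pods can take a while to terminate, so the check above may need repeating. A small polling loop is one way to automate it. This is a sketch under the naming assumptions of this guide (Pods called apisix-etcd-<ordinal> in the apisix namespace). The function name wait_etcd_stopped and its defaults are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# wait_etcd_stopped [TRIES] [SLEEP_SECONDS]: poll until no apisix-etcd-<N> Pod is listed.
# Returns 0 when none remain, 1 on timeout, 2 if kubectl itself fails.
wait_etcd_stopped() {
  local tries=${1:-30} pause=${2:-2} i pods left
  for ((i = 0; i < tries; i++)); do
    pods=$(kubectl get pod -n apisix --no-headers) || return 2
    # grep -c prints 0 but exits 1 when nothing matches, hence "|| true".
    left=$(grep -c '^apisix-etcd-[0-9]' <<<"$pods" || true)
    if [ "$left" -eq 0 ]; then
      return 0
    fi
    sleep "$pause"
  done
  return 1
}

# Usage (requires cluster access):
#   wait_etcd_stopped 60 2 && echo "all etcd Pods are gone"
```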

2. Create the restore Pod

Use a "restore Pod" that mounts the same PVC.

⚠️ Note:

  • This Pod only performs the restore.
  • It does not start etcd.

cat etcd-restore.yaml
apiVersion: v1
kind: Pod
metadata:
  name: etcd-restore
  namespace: apisix
spec:
  restartPolicy: Never

  securityContext:
    fsGroup: 0

  containers:
  - name: restore
    image: swr.cn-north-4.myhuaweicloud.com/cfhy-common/etcd:3.5
    securityContext:
      runAsUser: 0
      runAsGroup: 0

    command: ["/bin/bash","-c"]
    args:
      - |
        set -e
        echo "== Clearing old data-dir =="
        rm -rf /bitnami/etcd/data/*
        echo "== Starting restore =="
        etcdctl snapshot restore /backup/etcd-snapshot-20251222-020012.db \
          --name apisix-etcd-0 \
          --initial-cluster apisix-etcd-0=http://apisix-etcd-0.apisix-etcd-headless.apisix.svc.cluster.local:2380 \
          --initial-cluster-token apisix-etcd-cluster \
          --initial-advertise-peer-urls http://apisix-etcd-0.apisix-etcd-headless.apisix.svc.cluster.local:2380 \
          --data-dir /bitnami/etcd/data
        echo "restore done"

    env:
    - name: ETCDCTL_API
      value: "3"

    volumeMounts:
    - name: data
      mountPath: /bitnami/etcd
    - name: backup
      mountPath: /backup

  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-apisix-etcd-0
  - name: backup
    persistentVolumeClaim:
      claimName: etcd-backup-pvc

Note: replace the snapshot file name in the manifest with the one you want to restore.
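
Editing the file name by hand is easy to get wrong under pressure, so the substitution can be scripted. The sketch below rewrites the /backup/etcd-snapshot-....db path in the manifest and prints the result. The function name set_restore_snapshot is hypothetical, and it assumes the manifest uses the same naming pattern as the backup job.

```shell
#!/usr/bin/env bash
set -euo pipefail

# set_restore_snapshot MANIFEST SNAPSHOT_NAME: print MANIFEST with the snapshot path swapped.
set_restore_snapshot() {
  local manifest=$1 snapshot=$2
  # Accept only names in the backup job's format; this also keeps the sed input safe.
  [[ $snapshot =~ ^etcd-snapshot-[0-9]{8}-[0-9]{6}\.db$ ]] || {
    echo "unexpected snapshot name: $snapshot" >&2
    return 1
  }
  sed -E "s#/backup/etcd-snapshot-[0-9]{8}-[0-9]{6}\.db#/backup/${snapshot}#" "$manifest"
}

# Usage (requires cluster access):
#   set_restore_snapshot etcd-restore.yaml etcd-snapshot-20251216-020011.db | kubectl apply -f -
```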

kubectl apply -f etcd-restore.yaml
kubectl logs -n apisix etcd-restore
If the log shows restore done, the restore succeeded.

3. Delete the restore Pod (optional)

kubectl delete pod etcd-restore -n apisix

 

4. Start etcd

kubectl scale statefulset apisix-etcd -n apisix --replicas=1

If the Pod reports the following error on startup:

{"level":"warn","ts":"2025-12-22T04:02:21.307808Z","caller":"rafthttp/stream.go:653","msg":"request sent was ignored by remote peer due to cluster ID mismatch","remote-peer-id":"90126cc714381e07","remote-peer-cluster-id":"bfbce2358fdf94e6","local-member-id":"2c16fb63879f0d98","local-member-cluster-id":"b0d7015fda1525c8","error":"cluster ID mismatch"}

modify the following environment variables and restart. Current values:

- name: ETCD_INITIAL_CLUSTER_STATE
  value: new
- name: ETCD_INITIAL_CLUSTER
  value: apisix-etcd-0=http://apisix-etcd-0.apisix-etcd-headless.apisix.svc.cluster.local:2380,apisix-etcd-1=http://apisix-etcd-1.apisix-etcd-headless.apisix.svc.cluster.local:2380,apisix-etcd-2=http://apisix-etcd-2.apisix-etcd-headless.apisix.svc.cluster.local:2380


Change the variables above to:
- name: ETCD_INITIAL_CLUSTER_STATE
  value: "new"
- name: ETCD_INITIAL_CLUSTER
  value: "apisix-etcd-0=http://apisix-etcd-0.apisix-etcd-headless.apisix.svc.cluster.local:2380"
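
The ETCD_INITIAL_CLUSTER value is long and easy to mistype, so it can be generated from the member count. The sketch below builds the string for members 0 to N-1 using the headless-service host names shown in this guide. The function name initial_cluster is hypothetical. The usage line relies on kubectl set env. If the StatefulSet is managed by Helm, a later upgrade would presumably overwrite such a manual change, so treat it as a temporary measure.

```shell
#!/usr/bin/env bash
set -euo pipefail

# initial_cluster N: print the ETCD_INITIAL_CLUSTER value for members 0..N-1.
initial_cluster() {
  local n=$1 i out="" sep=""
  local domain=apisix-etcd-headless.apisix.svc.cluster.local
  for ((i = 0; i < n; i++)); do
    out+="${sep}apisix-etcd-${i}=http://apisix-etcd-${i}.${domain}:2380"
    sep=","
  done
  printf '%s\n' "$out"
}

# Usage (requires cluster access):
#   kubectl set env statefulset/apisix-etcd -n apisix \
#     ETCD_INITIAL_CLUSTER_STATE=new ETCD_INITIAL_CLUSTER="$(initial_cluster 1)"
```

Calling initial_cluster 3 produces the three-member form needed when scaling back up.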

 

5. Mandatory post-restore verification (do not skip)

kubectl exec -n apisix apisix-etcd-0 -- \
  etcdctl get /apisix --prefix --keys-only | head

If you see output like the following, the restore succeeded:

/apisix/consumer_groups/
/apisix/consumers/
/apisix/global_rules/
/apisix/global_rules/1
/apisix/plugin_configs/

 

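‼️ If errors occur after the restore, check the following configuration and steps

When these errors appear, one node should already be healthy and the additional replicas are failing to start after scaling up. In that case, do the following. Current values: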

- name: ETCD_INITIAL_CLUSTER_STATE
  value: "new"
- name: ETCD_INITIAL_CLUSTER
  value: "apisix-etcd-0=http://apisix-etcd-0.apisix-etcd-headless.apisix.svc.cluster.local:2380"


Change the variables above to:
- name: ETCD_INITIAL_CLUSTER_STATE
  value: new
- name: ETCD_INITIAL_CLUSTER
  value: apisix-etcd-0=http://apisix-etcd-0.apisix-etcd-headless.apisix.svc.cluster.local:2380,apisix-etcd-1=http://apisix-etcd-1.apisix-etcd-headless.apisix.svc.cluster.local:2380,apisix-etcd-2=http://apisix-etcd-2.apisix-etcd-headless.apisix.svc.cluster.local:2380

 

Wait for the Pods to start. If they start normally, the procedure is complete.

If errors persist, do the following:

# Check the cluster status
kubectl exec -n apisix apisix-etcd-1 --   etcdctl member list
2c16fb63879f0d98, started, apisix-etcd-1, http://apisix-etcd-1.apisix-etcd-headless.apisix.svc.cluster.local:2380, http://apisix-etcd-1.apisix-etcd-headless.apisix.svc.cluster.local:2379,http://apisix-etcd.apisix.svc.cluster.local:2379, false
3ff1b5cd453a87df, started, apisix-etcd-2, http://apisix-etcd-2.apisix-etcd-headless.apisix.svc.cluster.local:2380, http://apisix-etcd-2.apisix-etcd-headless.apisix.svc.cluster.local:2379,http://apisix-etcd.apisix.svc.cluster.local:2379, false
90126cc714381e07, started, apisix-etcd-0, http://apisix-etcd-0.apisix-etcd-headless.apisix.svc.cluster.local:2380, , false

In this output, apisix-etcd-1 and apisix-etcd-2 are healthy.

apisix-etcd-0 is the abnormal node: its client URL column is empty.
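
Spotting the empty column by eye is error-prone on long lines. The sketch below filters etcdctl member list output (the default comma-separated format shown above) and prints the name of any member whose client URL field is empty. The function name members_without_client_urls is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# members_without_client_urls: read `etcdctl member list` output on stdin and
# print the name (field 3) of every member whose client-URL column (field 5) is empty.
members_without_client_urls() {
  awk -F', ' 'NF >= 6 && $5 == "" { print $3 }'
}

# Usage (requires cluster access):
#   kubectl exec -n apisix apisix-etcd-1 -- etcdctl member list | members_without_client_urls
```

An empty result means every listed member has published its client URLs.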

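Run the following to repair it:

# Delete the affected Pod and its PVC; they are recreated automatically and the member rejoins the cluster (if deleting the PVC hangs, manually delete the corresponding Pod)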
kubectl delete pvc data-apisix-etcd-0 -n apisix 

 

Check the cluster status again. The output below shows a healthy cluster:

kubectl exec -n apisix apisix-etcd-1 --   etcdctl member list
2c16fb63879f0d98, started, apisix-etcd-1, http://apisix-etcd-1.apisix-etcd-headless.apisix.svc.cluster.local:2380, http://apisix-etcd-1.apisix-etcd-headless.apisix.svc.cluster.local:2379,http://apisix-etcd.apisix.svc.cluster.local:2379, false
3ff1b5cd453a87df, started, apisix-etcd-2, http://apisix-etcd-2.apisix-etcd-headless.apisix.svc.cluster.local:2380, http://apisix-etcd-2.apisix-etcd-headless.apisix.svc.cluster.local:2379,http://apisix-etcd.apisix.svc.cluster.local:2379, false
90126cc714381e07, started, apisix-etcd-0, http://apisix-etcd-0.apisix-etcd-headless.apisix.svc.cluster.local:2380, http://apisix-etcd-0.apisix-etcd-headless.apisix.svc.cluster.local:2379,http://apisix-etcd.apisix.svc.cluster.local:2379, false

 

 

 


Posted 2025-12-24 15:06 by 木易-故事里的人