Kubernetes etcd: Introduction and High-Availability Backup

https://etcd.io/
https://github.com/etcd-io/etcd
https://etcd.io/docs/v3.5/op-guide/maintenance/


etcd has the following properties:
Fully replicated: every node in the cluster has the complete data store available
Highly available: etcd is designed to avoid single points of failure caused by hardware or network problems
Consistent: every read returns the latest write across multiple hosts
Simple: exposes a well-defined, user-facing API (gRPC)
Secure: implements automatic TLS with optional client-certificate authentication
Fast: benchmarked at 10,000 writes per second
Reliable: uses the Raft algorithm to distribute the store across members in a consistent way

etcd configuration file

[root@k8s-etcd01 ~]# cat /etc/systemd/system/etcd.service
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos

[Service]
Type=notify
WorkingDirectory=/var/lib/etcd # data directory
ExecStart=/opt/kube/bin/etcd \ # path to the etcd binary
  --name=etcd-192.168.40.106 \ # name of this member
  --cert-file=/etc/kubernetes/ssl/etcd.pem \
  --key-file=/etc/kubernetes/ssl/etcd-key.pem \
  --peer-cert-file=/etc/kubernetes/ssl/etcd.pem \
  --peer-key-file=/etc/kubernetes/ssl/etcd-key.pem \
  --trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --peer-trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --initial-advertise-peer-urls=https://192.168.40.106:2380 \ # peer URL advertised to the rest of the cluster
  --listen-peer-urls=https://192.168.40.106:2380 \ # URL used for peer-to-peer (cluster) traffic
  --listen-client-urls=https://192.168.40.106:2379,http://127.0.0.1:2379 \ # URLs clients connect to
  --advertise-client-urls=https://192.168.40.106:2379 \ # client URL advertised to clients
  --initial-cluster-token=etcd-cluster-0 \ # token used when bootstrapping the cluster; must match on every member
  --initial-cluster=etcd-192.168.40.106=https://192.168.40.106:2380,etcd-192.168.40.107=https://192.168.40.107:2380,etcd-192.168.40.108=https://192.168.40.108:2380 \ # all members of the cluster
  --initial-cluster-state=new \ # 'new' when bootstrapping a cluster, 'existing' when joining one that already exists
  --data-dir=/var/lib/etcd \ # data directory path
  --wal-dir= \
  --snapshot-count=50000 \
  --auto-compaction-retention=1 \ # keep 1 hour of history; per the etcd docs, compaction runs at 1/10 of the retention period (e.g. a 10h retention compacts every 10h*10% = 1h)
  --auto-compaction-mode=periodic \ # time-based (periodic) compaction
  --max-request-bytes=10485760 \ # request size limit (maximum bytes per request; the default limit is 1.5 MiB per key, and 10 MiB is the recommended maximum)
  --quota-backend-bytes=8589934592 # storage size limit (backend quota; defaults to 2 GB, and etcd warns at startup when this is set above 8 GB)
Restart=always
RestartSec=15
LimitNOFILE=65536
OOMScoreAdjust=-999

[Install]
WantedBy=multi-user.target
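
After editing the unit file, reload systemd and (re)start etcd. A minimal sketch of the usual steps, assuming the unit is installed as /etc/systemd/system/etcd.service:

systemctl daemon-reload              # pick up changes to the unit file
systemctl enable --now etcd          # enable at boot and start immediately
systemctl status etcd --no-pager     # confirm the service is active
journalctl -u etcd -f                # follow the etcd logs if startup fails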

Cluster defragmentation

[root@k8s-etcd01 bin]# ETCDCTL_API=3 /opt/kube/bin/etcdctl defrag \
> --cluster \
> --endpoints=https://192.168.40.106:2379 \
> --cacert=/etc/kubernetes/ssl/ca.pem \
> --cert=/etc/kubernetes/ssl/etcd.pem \
> --key=/etc/kubernetes/ssl/etcd-key.pem
Finished defragmenting etcd member[https://192.168.40.108:2379]
Finished defragmenting etcd member[https://192.168.40.107:2379]
Finished defragmenting etcd member[https://192.168.40.106:2379]
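
Defragmentation blocks reads and writes on the member while it is running, so on a busy cluster it is often safer to defragment members one at a time instead of using --cluster. A sketch for a single member, reusing the same certificates:

ETCDCTL_API=3 /opt/kube/bin/etcdctl defrag \
  --endpoints=https://192.168.40.107:2379 \
  --cacert=/etc/kubernetes/ssl/ca.pem \
  --cert=/etc/kubernetes/ssl/etcd.pem \
  --key=/etc/kubernetes/ssl/etcd-key.pem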

Viewing member information

etcd has had multiple API versions, and the v2 API is deprecated. etcd v2 and v3 are essentially two independent applications that share the same Raft implementation: their interfaces differ, their storage differs, and their data is isolated from each other. In other words, if you upgrade from etcd v2 to etcd v3, data written through the v2 API can still only be accessed through the v2 API, and data created through the v3 API can only be accessed through the v3 API.

[root@k8s-etcd01 bin]# ETCDCTL_API=3 /opt/kube/bin/etcdctl member list
47e90ade6dd71220, started, etcd-192.168.40.108, https://192.168.40.108:2380, https://192.168.40.108:2379, false
8b89979739961514, started, etcd-192.168.40.107, https://192.168.40.107:2380, https://192.168.40.107:2379, false
dcc47085fa6566ff, started, etcd-192.168.40.106, https://192.168.40.106:2380, https://192.168.40.106:2379, false
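
The member list call above goes through the plaintext listener on 127.0.0.1:2379 configured earlier. To query a remote member over TLS (and get table output), the endpoint and certificates can be passed explicitly; a sketch reusing the certificate paths from this document:

ETCDCTL_API=3 /opt/kube/bin/etcdctl --write-out=table \
  --endpoints=https://192.168.40.106:2379 \
  --cacert=/etc/kubernetes/ssl/ca.pem \
  --cert=/etc/kubernetes/ssl/etcd.pem \
  --key=/etc/kubernetes/ssl/etcd-key.pem \
  member list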

Checking the health of all etcd members

[root@k8s-etcd01 bin]# export NODE_IPS="192.168.40.106 192.168.40.107 192.168.40.108"
[root@k8s-etcd01 bin]# 
for ip in ${NODE_IPS}; do \
     ETCDCTL_API=3 \
     /opt/kube/bin/etcdctl --endpoints=https://${ip}:2379 \
     --cacert=/etc/kubernetes/ssl/ca.pem \
     --cert=/etc/kubernetes/ssl/etcd.pem \
     --key=/etc/kubernetes/ssl/etcd-key.pem \
     endpoint health; \
 done
https://192.168.40.106:2379 is healthy: successfully committed proposal: took = 5.208784ms
https://192.168.40.107:2379 is healthy: successfully committed proposal: took = 6.464308ms
https://192.168.40.108:2379 is healthy: successfully committed proposal: took = 5.851526ms

# show the status of each cluster member
[root@k8s-etcd01 bin]# export NODE_IPS="192.168.40.106 192.168.40.107 192.168.40.108"
[root@k8s-etcd01 bin]# 
for ip in ${NODE_IPS}; do \
    ETCDCTL_API=3 \
    /opt/kube/bin/etcdctl --write-out=table --endpoints=https://${ip}:2379 \
    --cacert=/etc/kubernetes/ssl/ca.pem \
    --cert=/etc/kubernetes/ssl/etcd.pem \
    --key=/etc/kubernetes/ssl/etcd-key.pem \
    endpoint status;
done
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.40.106:2379 | dcc47085fa6566ff |   3.5.4 |  1.6 MB |      true |      false |         6 |     342761 |             342761 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.40.107:2379 | 8b89979739961514 |   3.5.4 |  1.6 MB |     false |      false |         6 |     342761 |             342761 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.40.108:2379 | 47e90ade6dd71220 |   3.5.4 |  1.6 MB |     false |      false |         6 |     342761 |             342761 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

Inspecting the data stored in etcd

[root@k8s-etcd02 ssl]# ETCDCTL_API=3 /opt/kube/bin/etcdctl get / --prefix --keys-only

# view pod-related keys

[root@k8s-etcd02 ssl]# ETCDCTL_API=3 /opt/kube/bin/etcdctl get / --prefix --keys-only | grep pod
/calico/ipam/v2/handle/k8s-pod-network.3713d7f47bb5fc877beabfe04a3cefa488fd04544dec12023000417871b78982
/calico/ipam/v2/handle/k8s-pod-network.9fc4701ea265ecfdaaabeb24335e5f7dee34b721a38818d7147bec0c0e4a649d
/calico/resources/v3/projectcalico.org/profiles/ksa.kube-system.horizontal-pod-autoscaler
/calico/resources/v3/projectcalico.org/profiles/ksa.kube-system.pod-garbage-collector
/registry/clusterrolebindings/system:controller:horizontal-pod-autoscaler
/registry/clusterrolebindings/system:controller:pod-garbage-collector
/registry/clusterroles/system:controller:horizontal-pod-autoscaler
/registry/clusterroles/system:controller:pod-garbage-collector
/registry/pods/default/net-tesing-2
/registry/pods/default/net-testing
/registry/pods/kube-system/calico-kube-controllers-5c8bb696bb-4hv6k
/registry/pods/kube-system/calico-node-cr4pc
/registry/pods/kube-system/calico-node-d8l9x
/registry/pods/kube-system/calico-node-wcg99
/registry/pods/kube-system/calico-node-z6sqs
/registry/serviceaccounts/kube-system/horizontal-pod-autoscaler
/registry/serviceaccounts/kube-system/pod-garbage-collector

# view namespace keys

[root@k8s-etcd02 ssl]# ETCDCTL_API=3 /opt/kube/bin/etcdctl get / --prefix --keys-only | grep namespaces
/registry/namespaces/default
/registry/namespaces/kube-node-lease
/registry/namespaces/kube-public
/registry/namespaces/kube-system

# view deployment-controller-related keys
[root@k8s-etcd02 ssl]# ETCDCTL_API=3 /opt/kube/bin/etcdctl get / --prefix --keys-only | grep deployment
/calico/resources/v3/projectcalico.org/profiles/ksa.kube-system.deployment-controller
/registry/clusterrolebindings/system:controller:deployment-controller
/registry/clusterroles/system:controller:deployment-controller

# view calico keys
[root@k8s-etcd02 ssl]# ETCDCTL_API=3 /opt/kube/bin/etcdctl get / --prefix --keys-only | grep calico
/calico/ipam/v2/assignment/ipv4/block/10.200.122.128-26
/calico/ipam/v2/assignment/ipv4/block/10.200.32.128-26
/calico/ipam/v2/assignment/ipv4/block/10.200.58.192-26
/calico/ipam/v2/assignment/ipv4/block/10.200.85.192-26
/registry/deployments/kube-system/calico-kube-controllers
/registry/serviceaccounts/kube-system/deployment-controller

Creating, reading, updating and deleting data in etcd

# create data
[root@k8s-etcd02 ssl]# ETCDCTL_API=3 /opt/kube/bin/etcdctl put /nikname "BIRKHOFF"
OK
# query data
[root@k8s-etcd02 ssl]# ETCDCTL_API=3 /opt/kube/bin/etcdctl get /nikname
/nikname
BIRKHOFF
# update data: writing the same key again overwrites the old value
[root@k8s-etcd02 ssl]# ETCDCTL_API=3 /opt/kube/bin/etcdctl put /nikname "YEYE"
OK
# verify the update
[root@k8s-etcd02 ssl]# ETCDCTL_API=3 /opt/kube/bin/etcdctl get /nikname
/nikname
YEYE
# delete data
[root@k8s-etcd02 ssl]# ETCDCTL_API=3 /opt/kube/bin/etcdctl del /nikname
1
[root@k8s-etcd02 ssl]# ETCDCTL_API=3 /opt/kube/bin/etcdctl get /nikname
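
Beyond the plain get shown above, etcdctl can also expose key metadata (create/mod revision, version) or print just the value; for example, for a key that still exists:

# JSON output includes create_revision, mod_revision and version (the value is base64-encoded)
ETCDCTL_API=3 /opt/kube/bin/etcdctl get /nikname -w json
# print only the value, without echoing the key name
ETCDCTL_API=3 /opt/kube/bin/etcdctl get /nikname --print-value-only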

The etcd watch mechanism

etcd continuously monitors watched data and proactively notifies clients when it changes. The etcd v3 watch mechanism supports watching a single fixed key as well as watching a range (prefix).
# watch a key on etcd node1; the key does not need to exist yet and can be created later:
[root@k8s-etcd01 bin]# ETCDCTL_API=3 /opt/kube/bin/etcdctl watch /watchdata
# write the key from etcd node2 and verify that node1 sees the change
[root@k8s-etcd02 ssl]# ETCDCTL_API=3 /opt/kube/bin/etcdctl put /watchdata "DATA v2"
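
A watch can also cover a whole key range via a prefix, and can replay history from an earlier revision as long as it has not been compacted; roughly, this prefix-watch pattern is how the kube-apiserver tracks changes under /registry. For example:

# watch every key under a prefix
ETCDCTL_API=3 /opt/kube/bin/etcdctl watch --prefix /registry/pods/
# watch a single key starting from an older revision (replays uncompacted history first)
ETCDCTL_API=3 /opt/kube/bin/etcdctl watch /watchdata --rev=1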

Backing up and restoring data with the etcd v3 API

# WAL stands for write-ahead log: before the actual write is performed, a log entry is written first.
# The wal directory stores the write-ahead log, whose main purpose is to record the complete history of data changes. In etcd, every data modification must be written to the WAL before it is committed.
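
With --data-dir=/var/lib/etcd as configured above, the on-disk layout typically looks roughly like this:

/var/lib/etcd/
└── member/
    ├── snap/   # raft snapshots plus the bbolt backend database (db)
    └── wal/    # write-ahead log segments (*.wal)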

# back up data (v3 API)
[root@k8s-etcd02 ssl]# ETCDCTL_API=3 /opt/kube/bin/etcdctl snapshot save snapshottest.db
{"level":"info","ts":"2023-11-23T20:38:36.559+0800","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"snapshottest.db.part"}
{"level":"info","ts":"2023-11-23T20:38:36.562+0800","logger":"client","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2023-11-23T20:38:36.562+0800","caller":"snapshot/v3_snapshot.go:73","msg":"fetching snapshot","endpoint":"127.0.0.1:2379"}
{"level":"info","ts":"2023-11-23T20:38:36.589+0800","logger":"client","caller":"v3/maintenance.go:219","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2023-11-23T20:38:36.591+0800","caller":"snapshot/v3_snapshot.go:88","msg":"fetched snapshot","endpoint":"127.0.0.1:2379","size":"1.7 MB","took":"now"}
{"level":"info","ts":"2023-11-23T20:38:36.591+0800","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"snapshottest.db"}
Snapshot saved at snapshottest.db
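
A saved snapshot can be sanity-checked with snapshot status, which reports its hash, revision, total key count and size (deprecated in etcd 3.5 in favor of etcdutl, but still available):

ETCDCTL_API=3 /opt/kube/bin/etcdctl snapshot status snapshottest.db --write-out=table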

# restore data (v3 API): restore into a new directory that does not exist yet
[root@k8s-etcd02 ~]# mkdir /opt/restore-etcd-dir
[root@k8s-etcd02 ~]# ETCDCTL_API=3 /opt/kube/bin/etcdctl snapshot restore snapshottest.db --data-dir=/opt/restore-etcd-dir
Deprecated: Use `etcdutl snapshot restore` instead.

2023-11-23T21:03:48+08:00       info    snapshot/v3_snapshot.go:248     restoring snapshot      {"path": "snapshottest.db", "wal-dir": "/opt/restore-etcd-dir/member/wal", "data-dir": "/opt/restore-etcd-dir", "snap-dir": "/opt/restore-etcd-dir/member/snap", "stack": "go.etcd.io/etcd/etcdutl/v3/snapshot.(*v3Manager).Restore\n\t/go/src/go.etcd.io/etcd/release/etcd/etcdutl/snapshot/v3_snapshot.go:254\ngo.etcd.io/etcd/etcdutl/v3/etcdutl.SnapshotRestoreCommandFunc\n\t/go/src/go.etcd.io/etcd/release/etcd/etcdutl/etcdutl/snapshot_command.go:147\ngo.etcd.io/etcd/etcdctl/v3/ctlv3/command.snapshotRestoreCommandFunc\n\t/go/src/go.etcd.io/etcd/release/etcd/etcdctl/ctlv3/command/snapshot_command.go:129\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:960\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:897\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.Start\n\t/go/src/go.etcd.io/etcd/release/etcd/etcdctl/ctlv3/ctl.go:107\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.MustStart\n\t/go/src/go.etcd.io/etcd/release/etcd/etcdctl/ctlv3/ctl.go:111\nmain.main\n\t/go/src/go.etcd.io/etcd/release/etcd/etcdctl/main.go:59\nruntime.main\n\t/go/gos/go1.16.15/src/runtime/proc.go:225"}
2023-11-23T21:03:48+08:00       info    membership/store.go:141 Trimming membership information from the backend...
2023-11-23T21:03:48+08:00       info    membership/cluster.go:421       added member    {"cluster-id": "cdf818194e3a8c32", "local-member-id": "0", "added-peer-id": "8e9e05c52164694d", "added-peer-peer-urls": ["http://localhost:2380"]}
2023-11-23T21:03:48+08:00       info    snapshot/v3_snapshot.go:269     restored snapshot       {"path": "snapshottest.db", "wal-dir": "/opt/restore-etcd-dir/member/wal", "data-dir": "/opt/restore-etcd-dir", "snap-dir": "/opt/restore-etcd-dir/member/snap"}

# automated backups with a script
[root@k8s-etcd02 snap]# mkdir /data/etcd-backup-dir/ -p
[root@k8s-etcd02 snap]# vim etcd-backup.sh
[root@k8s-etcd02 snap]# chmod a+x etcd-backup.sh
[root@k8s-etcd02 snap]# bash etcd-backup.sh
{"level":"info","ts":"2023-11-23T21:11:17.909+0800","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"/data/etcd-backup-dir/etcd-snapshot-2023-11-23_21-11-17.db.part"}
{"level":"info","ts":"2023-11-23T21:11:17.910+0800","logger":"client","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2023-11-23T21:11:17.910+0800","caller":"snapshot/v3_snapshot.go:73","msg":"fetching snapshot","endpoint":"127.0.0.1:2379"}
{"level":"info","ts":"2023-11-23T21:11:17.922+0800","logger":"client","caller":"v3/maintenance.go:219","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2023-11-23T21:11:17.924+0800","caller":"snapshot/v3_snapshot.go:88","msg":"fetched snapshot","endpoint":"127.0.0.1:2379","size":"1.7 MB","took":"now"}
{"level":"info","ts":"2023-11-23T21:11:17.924+0800","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"/data/etcd-backup-dir/etcd-snapshot-2023-11-23_21-11-17.db"}
Snapshot saved at /data/etcd-backup-dir/etcd-snapshot-2023-11-23_21-11-17.db
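
The content of etcd-backup.sh is not shown above; a minimal sketch of what it might look like, matching the binary path and backup directory used here, plus an optional cron entry (the script path below is a placeholder):

#!/bin/bash
# etcd-backup.sh: take a timestamped v3 snapshot into /data/etcd-backup-dir/
BACKUP_DIR=/data/etcd-backup-dir
DATE=$(date +%Y-%m-%d_%H-%M-%S)
mkdir -p "${BACKUP_DIR}"
ETCDCTL_API=3 /opt/kube/bin/etcdctl snapshot save "${BACKUP_DIR}/etcd-snapshot-${DATE}.db"
# optionally prune snapshots older than 7 days
find "${BACKUP_DIR}" -name 'etcd-snapshot-*.db' -mtime +7 -delete

# crontab entry (crontab -e) to run the backup every day at 02:00:
# 0 2 * * * /bin/bash /path/to/etcd-backup.sh >> /var/log/etcd-backup.log 2>&1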

Backing up and restoring data with the etcd v2 API

# v2 help output
[root@k8s-etcd02 snap]# ETCDCTL_API=2 /opt/kube/bin/etcdctl backup --help
NAME:
   etcdctl backup - --data-dir=... --backup-dir={output}

USAGE:
   [deprecated] offline backup an etcd directory.

DESCRIPTION:
   Performs an offline backup of etcd directory.

Moved to `./etcdutl backup` and going to be decomissioned in v3.5

The recommended (online) backup command is: `./etcdctl snapshot save ...`.

OPTIONS:
   --data-dir value        Path to the etcd data dir # source data directory
   --wal-dir value         Path to the etcd wal dir
   --backup-dir value      Path to the backup dir # backup directory
   --backup-wal-dir value  Path to the backup wal dir
   --with-v3               Backup v3 backend data

# back up data (v2 API)
[root@k8s-etcd02 snap]# ETCDCTL_API=2 /opt/kube/bin/etcdctl backup --data-dir /var/lib/etcd/ --backup-dir /opt/etcd_backup
2023-11-23T21:17:13+08:00       info    etcdutl/backup_command.go:252   ignoring v3 raft entry

Automated backup and restore for an etcd cluster (v3 API, kubeasz)

# the kubeasz backup ansible playbook
[root@k8s-deploy playbooks]# pwd
/etc/kubeasz/playbooks
[root@k8s-deploy playbooks]# cat 94.backup.yml
# cluster-backup playbook
# read the guide: 'op/cluster_restore.md'

- hosts:
  - localhost
  tasks:
  # step1: find a healthy member in the etcd cluster
  - name: set NODE_IPS of the etcd cluster
    set_fact: NODE_IPS="{% for host in groups['etcd'] %}{{ host }} {% endfor %}"

  - name: get etcd cluster status
    shell: 'for ip in {{ NODE_IPS }};do \
              ETCDCTL_API=3 {{ base_dir }}/bin/etcdctl \
              --endpoints=https://"$ip":2379 \
              --cacert={{ cluster_dir }}/ssl/ca.pem \
              --cert={{ cluster_dir }}/ssl/etcd.pem \
              --key={{ cluster_dir }}/ssl/etcd-key.pem \
              endpoint health; \
            done'
    register: ETCD_CLUSTER_STATUS
    ignore_errors: true

  - debug: var="ETCD_CLUSTER_STATUS"

  - name: get a running ectd node
    shell: 'echo -e "{{ ETCD_CLUSTER_STATUS.stdout }}" \
             "{{ ETCD_CLUSTER_STATUS.stderr }}" \
             |grep "is healthy"|sed -n "1p"|cut -d: -f2|cut -d/ -f3'
    register: RUNNING_NODE

  - debug: var="RUNNING_NODE.stdout"

  - name: get current time
    shell: "date +'%Y%m%d%H%M'"
    register: timestamp

  # step2: backup data on the healthy member
  - name: make a backup on the etcd node
    shell: "mkdir -p /etcd_backup && cd /etcd_backup && \
        ETCDCTL_API=3 {{ bin_dir }}/etcdctl snapshot save snapshot_{{ timestamp.stdout }}.db"
    args:
      warn: false
    delegate_to: "{{ RUNNING_NODE.stdout }}"

  - name: fetch the backup data
    fetch:
      src: /etcd_backup/snapshot_{{ timestamp.stdout }}.db
      dest: "{{ cluster_dir }}/backup/"
      flat: yes
    delegate_to: "{{ RUNNING_NODE.stdout }}"

  - name: update the latest backup
    shell: 'cd {{ cluster_dir }}/backup/ && /bin/cp -f snapshot_{{ timestamp.stdout }}.db snapshot.db'
    
# 1. back up with the kubeasz playbook
[root@k8s-deploy kubeasz]# ./ezctl backup k8s-cluster-kubeasz
ansible-playbook -i clusters/k8s-cluster-kubeasz/hosts -e @clusters/k8s-cluster-kubeasz/config.yml playbooks/94.backup.yml
2023-11-23 21:44:12 INFO cluster:k8s-cluster-kubeasz backup begins in 5s, press any key to abort:


PLAY [localhost] **********************************************************************************************************************************************************************

TASK [Gathering Facts] ****************************************************************************************************************************************************************
ok: [localhost]

TASK [set NODE_IPS of the etcd cluster] ***********************************************************************************************************************************************
ok: [localhost]

TASK [get etcd cluster status] ********************************************************************************************************************************************************
changed: [localhost]

TASK [debug] **************************************************************************************************************************************************************************
ok: [localhost] => {
    "ETCD_CLUSTER_STATUS": {
        "changed": true,
        "cmd": "for ip in 192.168.40.107 192.168.40.106 192.168.40.108 ;do ETCDCTL_API=3 /etc/kubeasz/bin/etcdctl --endpoints=https://\"$ip\":2379 --cacert=/etc/kubeasz/clusters/k8s-cluster-kubeasz/ssl/ca.pem --cert=/etc/kubeasz/clusters/k8s-cluster-kubeasz/ssl/etcd.pem --key=/etc/kubeasz/clusters/k8s-cluster-kubeasz/ssl/etcd-key.pem endpoint health; done",
        "delta": "0:00:00.140713",
        "end": "2023-11-23 21:44:29.099833",
        "failed": false,
        "rc": 0,
        "start": "2023-11-23 21:44:28.959120",
        "stderr": "",
        "stderr_lines": [],
        "stdout": "https://192.168.40.107:2379 is healthy: successfully committed proposal: took = 6.228375ms\nhttps://192.168.40.106:2379 is healthy: successfully committed proposal: took = 6.026178ms\nhttps://192.168.40.108:2379 is healthy: successfully committed proposal: took = 6.11315ms",
        "stdout_lines": [
            "https://192.168.40.107:2379 is healthy: successfully committed proposal: took = 6.228375ms",
            "https://192.168.40.106:2379 is healthy: successfully committed proposal: took = 6.026178ms",
            "https://192.168.40.108:2379 is healthy: successfully committed proposal: took = 6.11315ms"
        ]
    }
}

TASK [get a running ectd node] ********************************************************************************************************************************************************
changed: [localhost]

TASK [debug] **************************************************************************************************************************************************************************
ok: [localhost] => {
    "RUNNING_NODE.stdout": "192.168.40.107"
}

TASK [get current time] ***************************************************************************************************************************************************************
changed: [localhost]

TASK [make a backup on the etcd node] *************************************************************************************************************************************************
changed: [localhost -> 192.168.40.107]

TASK [fetch the backup data] **********************************************************************************************************************************************************
changed: [localhost -> 192.168.40.107]

TASK [update the latest backup] *******************************************************************************************************************************************************
changed: [localhost]

PLAY RECAP ****************************************************************************************************************************************************************************
localhost                  : ok=10   changed=6    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

# backup location
[root@k8s-deploy backup]# pwd
/etc/kubeasz/clusters/k8s-cluster-kubeasz/backup
[root@k8s-deploy backup]# ll
total 3272
-rw-------. 1 root root 1671200 Nov 23 21:44 snapshot_202311232144.db
-rw-------. 1 root root 1671200 Nov 23 21:44 snapshot.db

# 2. test: delete a pod, then run the restore playbook
[root@k8s-deploy backup]# kubectl delete pod net-tesing-2
pod "net-tesing-2" deleted
[root@k8s-deploy backup]# ll
total 3272
-rw-------. 1 root root 1671200 Nov 23 21:44 snapshot_202311232144.db
-rw-------. 1 root root 1671200 Nov 23 21:44 snapshot.db
[root@k8s-deploy kubeasz]# ./ezctl restore k8s-cluster-kubeasz
ansible-playbook -i clusters/k8s-cluster-kubeasz/hosts -e @clusters/k8s-cluster-kubeasz/config.yml playbooks/95.restore.yml
2023-11-23 21:49:17 INFO cluster:k8s-cluster-kubeasz restore begins in 5s, press any key to abort:

# verify that net-tesing-2 has been restored
[root@k8s-deploy kubeasz]# kubectl get pods
NAME           READY   STATUS    RESTARTS      AGE
net-tesing-2   1/1     Running   0             8d
net-testing    1/1     Running   1 (27h ago)   8d

etcd data recovery procedure

When more than half of the etcd members are down (for example, two out of three), the whole cluster becomes unavailable. To recover the data afterwards, the procedure is as follows (a sketch of the per-node restore for step 5 follows the list):

1. Restore the server operating systems
2. Redeploy the etcd cluster
3. Stop kube-apiserver / controller-manager / scheduler / kubelet / kube-proxy
4. Stop the etcd cluster
5. Restore the same backup data on every etcd node
6. Start every node and verify the etcd cluster
7. Start kube-apiserver / controller-manager / scheduler / kubelet / kube-proxy
8. Verify the Kubernetes control-plane status and pod data
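
For step 5, every member restores the same snapshot into a fresh data directory while supplying its own member name and the full cluster layout, so the restored members agree on membership. A rough sketch for the first node, reusing the names and URLs from the configuration above (repeat on the other nodes with their own --name and --initial-advertise-peer-urls, then start etcd everywhere and check endpoint health):

systemctl stop etcd                       # step 4: stop etcd on every node first
rm -rf /var/lib/etcd                      # the restore target directory must not exist yet

ETCDCTL_API=3 /opt/kube/bin/etcdctl snapshot restore /path/to/snapshot.db \
  --name=etcd-192.168.40.106 \
  --initial-advertise-peer-urls=https://192.168.40.106:2380 \
  --initial-cluster-token=etcd-cluster-0 \
  --initial-cluster=etcd-192.168.40.106=https://192.168.40.106:2380,etcd-192.168.40.107=https://192.168.40.107:2380,etcd-192.168.40.108=https://192.168.40.108:2380 \
  --data-dir=/var/lib/etcd

systemctl start etcd                      # step 6: start each node, then verify with endpoint health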
