Kubernetes etcd: Introduction and High-Availability Backup
https://etcd.io/
https://github.com/etcd-io/etcd
https://etcd.io/docs/v3.5/op-guide/maintenance/
etcd has the following properties:
Fully replicated: every node in the cluster has access to the full data store
Highly available: etcd is designed to avoid single points of failure caused by hardware or network problems
Consistent: every read returns the latest write across all hosts
Simple: a well-defined, user-facing API (gRPC)
Secure: automatic TLS with optional client certificate authentication
Fast: benchmarked at 10,000 writes per second
Reliable: storage is properly distributed using the Raft consensus algorithm
etcd configuration file
[root@k8s-etcd01 ~]# cat /etc/systemd/system/etcd.service
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos
[Service]
Type=notify
WorkingDirectory=/var/lib/etcd #data directory
ExecStart=/opt/kube/bin/etcd \ #path to the etcd binary
--name=etcd-192.168.40.106 \ #name of this member
--cert-file=/etc/kubernetes/ssl/etcd.pem \
--key-file=/etc/kubernetes/ssl/etcd-key.pem \
--peer-cert-file=/etc/kubernetes/ssl/etcd.pem \
--peer-key-file=/etc/kubernetes/ssl/etcd-key.pem \
--trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
--peer-trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
--initial-advertise-peer-urls=https://192.168.40.106:2380 \ #peer URL advertised to the rest of the cluster
--listen-peer-urls=https://192.168.40.106:2380 \ #listen address for member-to-member (peer) traffic
--listen-client-urls=https://192.168.40.106:2379,http://127.0.0.1:2379 \ #listen addresses for client access
--advertise-client-urls=https://192.168.40.106:2379 \ #client URL advertised to clients
--initial-cluster-token=etcd-cluster-0 \ #token used when bootstrapping the cluster; must be identical on all members
--initial-cluster=etcd-192.168.40.106=https://192.168.40.106:2380,etcd-192.168.40.107=https://192.168.40.107:2380,etcd-192.168.40.108=https://192.168.40.108:2380 \ #all members of the cluster
--initial-cluster-state=new \ #"new" when bootstrapping a new cluster, "existing" when joining an existing one
--data-dir=/var/lib/etcd \ #data directory path
--wal-dir= \
--snapshot-count=50000 \
--auto-compaction-retention=1 \ #retention window for auto compaction; with periodic mode a value of 1 keeps roughly one hour of history
--auto-compaction-mode=periodic \ #periodic compaction
--max-request-bytes=10485760 \ #request size limit (a single key defaults to at most 1.5 MiB; the official recommendation is no more than 10 MiB)
--quota-backend-bytes=8589934592 #storage size limit (backend disk quota; defaults to 2 GB, and values above 8 GB produce a warning at startup)
Restart=always
RestartSec=15
LimitNOFILE=65536
OOMScoreAdjust=-999
[Install]
WantedBy=multi-user.target
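After editing the unit file, systemd has to reload it and etcd must be restarted; a typical sequence (a sketch, restarting one member at a time so the cluster keeps quorum) is:
systemctl daemon-reload
systemctl restart etcd
systemctl status etcd --no-pager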
Cluster defragmentation
[root@k8s-etcd01 bin]# ETCDCTL_API=3 /opt/kube/bin/etcdctl defrag \
> --cluster \
> --endpoints=https://192.168.40.106:2379 \
> --cacert=/etc/kubernetes/ssl/ca.pem \
> --cert=/etc/kubernetes/ssl/etcd.pem \
> --key=/etc/kubernetes/ssl/etcd-key.pem
Finished defragmenting etcd member[https://192.168.40.108:2379]
Finished defragmenting etcd member[https://192.168.40.107:2379]
Finished defragmenting etcd member[https://192.168.40.106:2379]
View member information
etcd exposes several API versions; the v2 API is deprecated. etcd v2 and v3 are essentially two independent applications that share the same Raft protocol code: their interfaces differ, their storage differs, and the data is isolated. In other words, after upgrading from etcd v2 to etcd v3, the old v2 data can still only be accessed through the v2 API, and data created through the v3 API can only be accessed through the v3 API.
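The API version used by etcdctl is selected with the ETCDCTL_API environment variable; a minimal illustration (the v2 subcommand only works if the server still serves the v2 API):
#v2 API: hierarchical keys, listed with ls
ETCDCTL_API=2 /opt/kube/bin/etcdctl ls /
#v3 API: flat key space, queried by prefix
ETCDCTL_API=3 /opt/kube/bin/etcdctl get / --prefix --keys-only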
[root@k8s-etcd01 bin]# ETCDCTL_API=3 /opt/kube/bin/etcdctl member list
47e90ade6dd71220, started, etcd-192.168.40.108, https://192.168.40.108:2380, https://192.168.40.108:2379, false
8b89979739961514, started, etcd-192.168.40.107, https://192.168.40.107:2380, https://192.168.40.107:2379, false
dcc47085fa6566ff, started, etcd-192.168.40.106, https://192.168.40.106:2380, https://192.168.40.106:2379, false
Verify the health of all current etcd members
[root@k8s-etcd01 bin]# export NODE_IPS="192.168.40.106 192.168.40.107 192.168.40.108"
[root@k8s-etcd01 bin]#
for ip in ${NODE_IPS}; do \
ETCDCTL_API=3 \
/opt/kube/bin/etcdctl --endpoints=https://${ip}:2379 \
--cacert=/etc/kubernetes/ssl/ca.pem \
--cert=/etc/kubernetes/ssl/etcd.pem \
--key=/etc/kubernetes/ssl/etcd-key.pem \
endpoint health; \
done
https://192.168.40.106:2379 is healthy: successfully committed proposal: took = 5.208784ms
https://192.168.40.107:2379 is healthy: successfully committed proposal: took = 6.464308ms
https://192.168.40.108:2379 is healthy: successfully committed proposal: took = 5.851526ms
#Show detailed status for each cluster member
[root@k8s-etcd01 bin]# export NODE_IPS="192.168.40.106 192.168.40.107 192.168.40.108"
[root@k8s-etcd01 bin]#
for ip in ${NODE_IPS}; do \
ETCDCTL_API=3 \
/opt/kube/bin/etcdctl --write-out=table --endpoints=https://${ip}:2379 \
--cacert=/etc/kubernetes/ssl/ca.pem \
--cert=/etc/kubernetes/ssl/etcd.pem \
--key=/etc/kubernetes/ssl/etcd-key.pem \
endpoint status;
done
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.40.106:2379 | dcc47085fa6566ff | 3.5.4 | 1.6 MB | true | false | 6 | 342761 | 342761 | |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.40.107:2379 | 8b89979739961514 | 3.5.4 | 1.6 MB | false | false | 6 | 342761 | 342761 | |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.40.108:2379 | 47e90ade6dd71220 | 3.5.4 | 1.6 MB | false | false | 6 | 342761 | 342761 | |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
View the data stored in etcd
[root@k8s-etcd02 ssl]# ETCDCTL_API=3 /opt/kube/bin/etcdctl get / --prefix --keys-only
#View pod information
[root@k8s-etcd02 ssl]# ETCDCTL_API=3 /opt/kube/bin/etcdctl get / --prefix --keys-only | grep pod
/calico/ipam/v2/handle/k8s-pod-network.3713d7f47bb5fc877beabfe04a3cefa488fd04544dec12023000417871b78982
/calico/ipam/v2/handle/k8s-pod-network.9fc4701ea265ecfdaaabeb24335e5f7dee34b721a38818d7147bec0c0e4a649d
/calico/resources/v3/projectcalico.org/profiles/ksa.kube-system.horizontal-pod-autoscaler
/calico/resources/v3/projectcalico.org/profiles/ksa.kube-system.pod-garbage-collector
/registry/clusterrolebindings/system:controller:horizontal-pod-autoscaler
/registry/clusterrolebindings/system:controller:pod-garbage-collector
/registry/clusterroles/system:controller:horizontal-pod-autoscaler
/registry/clusterroles/system:controller:pod-garbage-collector
/registry/pods/default/net-tesing-2
/registry/pods/default/net-testing
/registry/pods/kube-system/calico-kube-controllers-5c8bb696bb-4hv6k
/registry/pods/kube-system/calico-node-cr4pc
/registry/pods/kube-system/calico-node-d8l9x
/registry/pods/kube-system/calico-node-wcg99
/registry/pods/kube-system/calico-node-z6sqs
/registry/serviceaccounts/kube-system/horizontal-pod-autoscaler
/registry/serviceaccounts/kube-system/pod-garbage-collector
#View namespace information
[root@k8s-etcd02 ssl]# ETCDCTL_API=3 /opt/kube/bin/etcdctl get / --prefix --keys-only | grep namespaces
/registry/namespaces/default
/registry/namespaces/kube-node-lease
/registry/namespaces/kube-public
/registry/namespaces/kube-system
#View deployment controller information
[root@k8s-etcd02 ssl]# ETCDCTL_API=3 /opt/kube/bin/etcdctl get / --prefix --keys-only | grep deployment
/calico/resources/v3/projectcalico.org/profiles/ksa.kube-system.deployment-controller
/registry/clusterrolebindings/system:controller:deployment-controller
/registry/clusterroles/system:controller:deployment-controller
#View calico component information
[root@k8s-etcd02 ssl]# ETCDCTL_API=3 /opt/kube/bin/etcdctl get / --prefix --keys-only | grep calico
/calico/ipam/v2/assignment/ipv4/block/10.200.122.128-26
/calico/ipam/v2/assignment/ipv4/block/10.200.32.128-26
/calico/ipam/v2/assignment/ipv4/block/10.200.58.192-26
/calico/ipam/v2/assignment/ipv4/block/10.200.85.192-26
/registry/deployments/kube-system/calico-kube-controllers
/registry/serviceaccounts/kube-system/deployment-controller
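Reading the full value of a key is also possible, although Kubernetes objects are stored protobuf-encoded by the API server and are largely not human-readable; a quick check might look like this (sketch):
ETCDCTL_API=3 /opt/kube/bin/etcdctl get /registry/namespaces/default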
etcd data CRUD operations (create, read, update, delete)
#Add data
[root@k8s-etcd02 ssl]# ETCDCTL_API=3 /opt/kube/bin/etcdctl put /nikname "BIRKHOFF"
OK
#Query data
[root@k8s-etcd02 ssl]# ETCDCTL_API=3 /opt/kube/bin/etcdctl get /nikname
/nikname
BIRKHOFF
#Update data: writing to an existing key simply overwrites (updates) its value
[root@k8s-etcd02 ssl]# ETCDCTL_API=3 /opt/kube/bin/etcdctl put /nikname "YEYE"
OK
#Verify the update
[root@k8s-etcd02 ssl]# ETCDCTL_API=3 /opt/kube/bin/etcdctl get /nikname
/nikname
YEYE
#Delete data
[root@k8s-etcd02 ssl]# ETCDCTL_API=3 /opt/kube/bin/etcdctl del /nikname
1
[root@k8s-etcd02 ssl]# ETCDCTL_API=3 /opt/kube/bin/etcdctl get /nikname
etcd data watch mechanism
etcd continuously watches data and proactively notifies clients when it changes. The etcd v3 watch mechanism supports watching a single fixed key as well as watching a range of keys.
#Watch a key on etcd node1; the key does not have to exist yet and can be created later:
[root@k8s-etcd01 bin]# ETCDCTL_API=3 /opt/kube/bin/etcdctl watch /watchdata
#Modify the data on etcd node2 and verify that etcd node1 sees the change
[root@k8s-etcd02 ssl]# ETCDCTL_API=3 /opt/kube/bin/etcdctl put /watchdata "DATA v2"
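If the watch on node1 is active, it should print an event similar to the following (event type, key, then value, as emitted by etcdctl watch):
PUT
/watchdata
DATA v2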
etcd v3 API data backup and restore
#WAL is short for write-ahead log: a log entry is written before the actual write operation is performed.
#wal: stores the write-ahead log; its most important role is recording the complete history of every data change. In etcd, every modification must be written to the WAL before it is committed.
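The WAL files live under the member/wal subdirectory of the data directory and can be listed directly (a sketch; file names will differ per node):
ls -lh /var/lib/etcd/member/wal/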
#v3 API: back up data
[root@k8s-etcd02 ssl]# ETCDCTL_API=3 /opt/kube/bin/etcdctl snapshot save snapshottest.db
{"level":"info","ts":"2023-11-23T20:38:36.559+0800","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"snapshottest.db.part"}
{"level":"info","ts":"2023-11-23T20:38:36.562+0800","logger":"client","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2023-11-23T20:38:36.562+0800","caller":"snapshot/v3_snapshot.go:73","msg":"fetching snapshot","endpoint":"127.0.0.1:2379"}
{"level":"info","ts":"2023-11-23T20:38:36.589+0800","logger":"client","caller":"v3/maintenance.go:219","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2023-11-23T20:38:36.591+0800","caller":"snapshot/v3_snapshot.go:88","msg":"fetched snapshot","endpoint":"127.0.0.1:2379","size":"1.7 MB","took":"now"}
{"level":"info","ts":"2023-11-23T20:38:36.591+0800","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"snapshottest.db"}
Snapshot saved at snapshottest.db
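A saved snapshot can be sanity-checked before it is relied on; in etcd 3.5 the recommended tool is etcdutl, but etcdctl still accepts the deprecated subcommand (sketch):
ETCDCTL_API=3 /opt/kube/bin/etcdctl snapshot status snapshottest.db --write-out=table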
#v3 API: restore data into a new, non-existent directory
[root@k8s-etcd02 ~]# mkdir /opt/restore-etcd-dir
[root@k8s-etcd02 ~]# ETCDCTL_API=3 /opt/kube/bin/etcdctl snapshot restore snapshottest.db --data-dir=/opt/restore-etcd-dir
Deprecated: Use `etcdutl snapshot restore` instead.
2023-11-23T21:03:48+08:00 info snapshot/v3_snapshot.go:248 restoring snapshot {"path": "snapshottest.db", "wal-dir": "/opt/restore-etcd-dir/member/wal", "data-dir": "/opt/restore-etcd-dir", "snap-dir": "/opt/restore-etcd-dir/member/snap", "stack": "go.etcd.io/etcd/etcdutl/v3/snapshot.(*v3Manager).Restore\n\t/go/src/go.etcd.io/etcd/release/etcd/etcdutl/snapshot/v3_snapshot.go:254\ngo.etcd.io/etcd/etcdutl/v3/etcdutl.SnapshotRestoreCommandFunc\n\t/go/src/go.etcd.io/etcd/release/etcd/etcdutl/etcdutl/snapshot_command.go:147\ngo.etcd.io/etcd/etcdctl/v3/ctlv3/command.snapshotRestoreCommandFunc\n\t/go/src/go.etcd.io/etcd/release/etcd/etcdctl/ctlv3/command/snapshot_command.go:129\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:960\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:897\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.Start\n\t/go/src/go.etcd.io/etcd/release/etcd/etcdctl/ctlv3/ctl.go:107\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.MustStart\n\t/go/src/go.etcd.io/etcd/release/etcd/etcdctl/ctlv3/ctl.go:111\nmain.main\n\t/go/src/go.etcd.io/etcd/release/etcd/etcdctl/main.go:59\nruntime.main\n\t/go/gos/go1.16.15/src/runtime/proc.go:225"}
2023-11-23T21:03:48+08:00 info membership/store.go:141 Trimming membership information from the backend...
2023-11-23T21:03:48+08:00 info membership/cluster.go:421 added member {"cluster-id": "cdf818194e3a8c32", "local-member-id": "0", "added-peer-id": "8e9e05c52164694d", "added-peer-peer-urls": ["http://localhost:2380"]}
2023-11-23T21:03:48+08:00 info snapshot/v3_snapshot.go:269 restored snapshot {"path": "snapshottest.db", "wal-dir": "/opt/restore-etcd-dir/member/wal", "data-dir": "/opt/restore-etcd-dir", "snap-dir": "/opt/restore-etcd-dir/member/snap"}
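To actually serve the restored data, etcd has to be pointed at the new directory; on a single node this is roughly (a sketch, assuming the unit file shown earlier; stop the service first):
systemctl stop etcd
#either change --data-dir in the unit file to /opt/restore-etcd-dir, or swap the directories:
mv /var/lib/etcd /var/lib/etcd.bak
mv /opt/restore-etcd-dir /var/lib/etcd
systemctl start etcd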
#Automated backup script
[root@k8s-etcd02 snap]# mkdir /data/etcd-backup-dir/ -p
[root@k8s-etcd02 snap]# vim etcd-backup.sh
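The script body is not shown here; a minimal sketch consistent with the backup path and timestamp format in the output below (both are assumptions) would be:
#!/bin/bash
#back up etcd (v3 API) into /data/etcd-backup-dir with a timestamped file name
DATE=$(date +%Y-%m-%d_%H-%M-%S)
ETCDCTL_API=3 /opt/kube/bin/etcdctl snapshot save /data/etcd-backup-dir/etcd-snapshot-${DATE}.db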
[root@k8s-etcd02 snap]# chmod a+x etcd-backup.sh
[root@k8s-etcd02 snap]# bash etcd-backup.sh
{"level":"info","ts":"2023-11-23T21:11:17.909+0800","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"/data/etcd-backup-dir/etcd-snapshot-2023-11-23_21-11-17.db.part"}
{"level":"info","ts":"2023-11-23T21:11:17.910+0800","logger":"client","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2023-11-23T21:11:17.910+0800","caller":"snapshot/v3_snapshot.go:73","msg":"fetching snapshot","endpoint":"127.0.0.1:2379"}
{"level":"info","ts":"2023-11-23T21:11:17.922+0800","logger":"client","caller":"v3/maintenance.go:219","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2023-11-23T21:11:17.924+0800","caller":"snapshot/v3_snapshot.go:88","msg":"fetched snapshot","endpoint":"127.0.0.1:2379","size":"1.7 MB","took":"now"}
{"level":"info","ts":"2023-11-23T21:11:17.924+0800","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"/data/etcd-backup-dir/etcd-snapshot-2023-11-23_21-11-17.db"}
Snapshot saved at /data/etcd-backup-dir/etcd-snapshot-2023-11-23_21-11-17.db
etcd v2 API data backup and restore
#v2 API: help information
[root@k8s-etcd02 snap]# ETCDCTL_API=2 /opt/kube/bin/etcdctl backup --help
NAME:
etcdctl backup - --data-dir=... --backup-dir={output}
USAGE:
[deprecated] offline backup an etcd directory.
DESCRIPTION:
Performs an offline backup of etcd directory.
Moved to `./etcdutl backup` and going to be decomissioned in v3.5
The recommended (online) backup command is: `./etcdctl snapshot save ...`.
OPTIONS:
--data-dir value Path to the etcd data dir #source data directory
--wal-dir value Path to the etcd wal dir
--backup-dir value Path to the backup dir #backup directory
--backup-wal-dir value Path to the backup wal dir
--with-v3 Backup v3 backend data
#v2 API: back up data
[root@k8s-etcd02 snap]# ETCDCTL_API=2 /opt/kube/bin/etcdctl backup --data-dir /var/lib/etcd/ --backup-dir /opt/etcd_backup
2023-11-23T21:17:13+08:00 info etcdutl/backup_command.go:252 ignoring v3 raft entry
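The resulting backup directory mirrors the layout of an etcd data directory and can be inspected afterwards (a sketch; exact file names will vary):
ls -R /opt/etcd_backup
#expected subdirectories: member/snap and member/wal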
Automated v3 backup and restore for the etcd cluster (kubeasz)
#Backup ansible playbook
[root@k8s-deploy playbooks]# pwd
/etc/kubeasz/playbooks
[root@k8s-deploy playbooks]# cat 94.backup.yml
# cluster-backup playbook
# read the guide: 'op/cluster_restore.md'
- hosts:
  - localhost
  tasks:
  # step1: find a healthy member in the etcd cluster
  - name: set NODE_IPS of the etcd cluster
    set_fact: NODE_IPS="{% for host in groups['etcd'] %}{{ host }} {% endfor %}"
  - name: get etcd cluster status
    shell: 'for ip in {{ NODE_IPS }};do \
              ETCDCTL_API=3 {{ base_dir }}/bin/etcdctl \
              --endpoints=https://"$ip":2379 \
              --cacert={{ cluster_dir }}/ssl/ca.pem \
              --cert={{ cluster_dir }}/ssl/etcd.pem \
              --key={{ cluster_dir }}/ssl/etcd-key.pem \
              endpoint health; \
            done'
    register: ETCD_CLUSTER_STATUS
    ignore_errors: true
  - debug: var="ETCD_CLUSTER_STATUS"
  - name: get a running ectd node
    shell: 'echo -e "{{ ETCD_CLUSTER_STATUS.stdout }}" \
             "{{ ETCD_CLUSTER_STATUS.stderr }}" \
             |grep "is healthy"|sed -n "1p"|cut -d: -f2|cut -d/ -f3'
    register: RUNNING_NODE
  - debug: var="RUNNING_NODE.stdout"
  - name: get current time
    shell: "date +'%Y%m%d%H%M'"
    register: timestamp
  # step2: backup data on the healthy member
  - name: make a backup on the etcd node
    shell: "mkdir -p /etcd_backup && cd /etcd_backup && \
            ETCDCTL_API=3 {{ bin_dir }}/etcdctl snapshot save snapshot_{{ timestamp.stdout }}.db"
    args:
      warn: false
    delegate_to: "{{ RUNNING_NODE.stdout }}"
  - name: fetch the backup data
    fetch:
      src: /etcd_backup/snapshot_{{ timestamp.stdout }}.db
      dest: "{{ cluster_dir }}/backup/"
      flat: yes
    delegate_to: "{{ RUNNING_NODE.stdout }}"
  - name: update the latest backup
    shell: 'cd {{ cluster_dir }}/backup/ && /bin/cp -f snapshot_{{ timestamp.stdout }}.db snapshot.db'
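To run this backup on a schedule, a cron entry on the deploy node could be added (a sketch; the schedule, log path and cluster name are assumptions):
#/etc/crontab: run a full cluster backup every day at 02:00
0 2 * * * root cd /etc/kubeasz && ./ezctl backup k8s-cluster-kubeasz >> /var/log/etcd-backup.log 2>&1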
#1. Run the kubeasz backup playbook
[root@k8s-deploy kubeasz]# ./ezctl backup k8s-cluster-kubeasz
ansible-playbook -i clusters/k8s-cluster-kubeasz/hosts -e @clusters/k8s-cluster-kubeasz/config.yml playbooks/94.backup.yml
2023-11-23 21:44:12 INFO cluster:k8s-cluster-kubeasz backup begins in 5s, press any key to abort:
PLAY [localhost] **********************************************************************************************************************************************************************
TASK [Gathering Facts] ****************************************************************************************************************************************************************
ok: [localhost]
TASK [set NODE_IPS of the etcd cluster] ***********************************************************************************************************************************************
ok: [localhost]
TASK [get etcd cluster status] ********************************************************************************************************************************************************
changed: [localhost]
TASK [debug] **************************************************************************************************************************************************************************
ok: [localhost] => {
"ETCD_CLUSTER_STATUS": {
"changed": true,
"cmd": "for ip in 192.168.40.107 192.168.40.106 192.168.40.108 ;do ETCDCTL_API=3 /etc/kubeasz/bin/etcdctl --endpoints=https://\"$ip\":2379 --cacert=/etc/kubeasz/clusters/k8s-cluster-kubeasz/ssl/ca.pem --cert=/etc/kubeasz/clusters/k8s-cluster-kubeasz/ssl/etcd.pem --key=/etc/kubeasz/clusters/k8s-cluster-kubeasz/ssl/etcd-key.pem endpoint health; done",
"delta": "0:00:00.140713",
"end": "2023-11-23 21:44:29.099833",
"failed": false,
"rc": 0,
"start": "2023-11-23 21:44:28.959120",
"stderr": "",
"stderr_lines": [],
"stdout": "https://192.168.40.107:2379 is healthy: successfully committed proposal: took = 6.228375ms\nhttps://192.168.40.106:2379 is healthy: successfully committed proposal: took = 6.026178ms\nhttps://192.168.40.108:2379 is healthy: successfully committed proposal: took = 6.11315ms",
"stdout_lines": [
"https://192.168.40.107:2379 is healthy: successfully committed proposal: took = 6.228375ms",
"https://192.168.40.106:2379 is healthy: successfully committed proposal: took = 6.026178ms",
"https://192.168.40.108:2379 is healthy: successfully committed proposal: took = 6.11315ms"
]
}
}
TASK [get a running ectd node] ********************************************************************************************************************************************************
changed: [localhost]
TASK [debug] **************************************************************************************************************************************************************************
ok: [localhost] => {
"RUNNING_NODE.stdout": "192.168.40.107"
}
TASK [get current time] ***************************************************************************************************************************************************************
changed: [localhost]
TASK [make a backup on the etcd node] *************************************************************************************************************************************************
changed: [localhost -> 192.168.40.107]
TASK [fetch the backup data] **********************************************************************************************************************************************************
changed: [localhost -> 192.168.40.107]
TASK [update the latest backup] *******************************************************************************************************************************************************
changed: [localhost]
PLAY RECAP ****************************************************************************************************************************************************************************
localhost : ok=10 changed=6 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
#Backup save path
[root@k8s-deploy backup]# pwd
/etc/kubeasz/clusters/k8s-cluster-kubeasz/backup
[root@k8s-deploy backup]# ll
total 3272
-rw-------. 1 root root 1671200 Nov 23 21:44 snapshot_202311232144.db
-rw-------. 1 root root 1671200 Nov 23 21:44 snapshot.db
#2. Test: delete a pod
[root@k8s-deploy backup]# kubectl delete pod net-tesing-2
pod "net-tesing-2" deleted
[root@k8s-deploy backup]# ll
total 3272
-rw-------. 1 root root 1671200 Nov 23 21:44 snapshot_202311232144.db
-rw-------. 1 root root 1671200 Nov 23 21:44 snapshot.db
#3. Run the kubeasz restore playbook
[root@k8s-deploy kubeasz]# ./ezctl restore k8s-cluster-kubeasz
ansible-playbook -i clusters/k8s-cluster-kubeasz/hosts -e @clusters/k8s-cluster-kubeasz/config.yml playbooks/95.restore.yml
2023-11-23 21:49:17 INFO cluster:k8s-cluster-kubeasz restore begins in 5s, press any key to abort:
#Verify that net-tesing-2 has been restored
[root@k8s-deploy kubeasz]# kubectl get pods
NAME READY STATUS RESTARTS AGE
net-tesing-2 1/1 Running 0 8d
net-testing 1/1 Running 1 (27h ago) 8d
etcd data restore workflow
When the number of failed etcd members exceeds half of the cluster (for example, two nodes down out of three), the whole cluster becomes unavailable. To recover the data afterwards, the restore workflow is:
1. Restore the server operating systems
2. Redeploy the etcd cluster
3. Stop kube-apiserver / controller-manager / scheduler / kubelet / kube-proxy
4. Stop the etcd cluster
5. Restore the same backup data on every etcd node (see the sketch after this list)
6. Start each node and verify the etcd cluster
7. Start kube-apiserver / controller-manager / scheduler / kubelet / kube-proxy
8. Verify the Kubernetes master status and pod data
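A minimal sketch of step 5, assuming the member names, peer URLs and paths from the unit file earlier (the snapshot path is a placeholder); the member-specific flags must be adjusted on each node:
#stop etcd first and make sure the target data directory does not yet exist
ETCDCTL_API=3 /opt/kube/bin/etcdctl snapshot restore /path/to/snapshot.db \
  --name etcd-192.168.40.106 \
  --initial-cluster etcd-192.168.40.106=https://192.168.40.106:2380,etcd-192.168.40.107=https://192.168.40.107:2380,etcd-192.168.40.108=https://192.168.40.108:2380 \
  --initial-cluster-token etcd-cluster-0 \
  --initial-advertise-peer-urls https://192.168.40.106:2380 \
  --data-dir /var/lib/etcd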