Kubernetes etcd Incident Handling

Fault Description

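The server lost power and shut down abnormally (blue screen).
After the reboot the k8s services would not come up. Investigation showed that etcd could not start, so kube-apiserver could not connect to etcd.
Checking the etcd server turned up the following error.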

Error Log

Jan 05 09:07:57 k8s-etcd02 systemd[1]: Starting Etcd Server...
-- Subject: Unit etcd.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit etcd.service has begun starting up.
Jan 05 09:07:57 k8s-etcd02 etcd[1999]: {"level":"info","ts":"2024-01-05T09:07:57.267+0800","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["/opt/kube/bin/etcd","--name=etcd-192.168.40.107","--cert-file=/etc/kube
Jan 05 09:07:57 k8s-etcd02 etcd[1999]: {"level":"info","ts":"2024-01-05T09:07:57.267+0800","caller":"etcdmain/etcd.go:116","msg":"server has been already initialized","data-dir":"/var/lib/etcd","dir-type":"member"}
Jan 05 09:07:57 k8s-etcd02 etcd[1999]: {"level":"info","ts":"2024-01-05T09:07:57.267+0800","caller":"embed/etcd.go:131","msg":"configuring peer listeners","listen-peer-urls":["https://192.168.40.107:2380"]}
Jan 05 09:07:57 k8s-etcd02 etcd[1999]: {"level":"info","ts":"2024-01-05T09:07:57.267+0800","caller":"embed/etcd.go:479","msg":"starting with peer TLS","tls-info":"cert = /etc/kubernetes/ssl/etcd.pem, key = /etc/kubernete
Jan 05 09:07:57 k8s-etcd02 etcd[1999]: {"level":"info","ts":"2024-01-05T09:07:57.267+0800","caller":"embed/etcd.go:139","msg":"configuring client listeners","listen-client-urls":["http://127.0.0.1:2379","https://192.168.
Jan 05 09:07:57 k8s-etcd02 etcd[1999]: {"level":"warn","ts":"2024-01-05T09:07:57.267+0800","caller":"embed/etcd.go:607","msg":"scheme is HTTP while key and cert files are present; ignoring key and cert files","client-url
Jan 05 09:07:57 k8s-etcd02 etcd[1999]: {"level":"info","ts":"2024-01-05T09:07:57.267+0800","caller":"embed/etcd.go:308","msg":"starting an etcd server","etcd-version":"3.5.4","git-sha":"08407ff76","go-version":"go1.16.15
Jan 05 09:07:57 k8s-etcd02 etcd[1999]: panic: freepages: failed to get all reachable pages (page 2487: multiple references)
Jan 05 09:07:57 k8s-etcd02 etcd[1999]: goroutine 124 [running]:
Jan 05 09:07:57 k8s-etcd02 etcd[1999]: go.etcd.io/bbolt.(*DB).freepages.func2(0xc00007a600)
Jan 05 09:07:57 k8s-etcd02 etcd[1999]: /go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:1056 +0xe9
Jan 05 09:07:57 k8s-etcd02 etcd[1999]: created by go.etcd.io/bbolt.(*DB).freepages
Jan 05 09:07:57 k8s-etcd02 etcd[1999]: /go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:1054 +0x1cd
Jan 05 09:07:57 k8s-etcd02 systemd[1]: etcd.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Jan 05 09:07:57 k8s-etcd02 systemd[1]: Failed to start Etcd Server.
-- Subject: Unit etcd.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit etcd.service has failed.
--
-- The result is failed.
Jan 05 09:07:57 k8s-etcd02 systemd[1]: Unit etcd.service entered failed state.
Jan 05 09:07:57 k8s-etcd02 systemd[1]: etcd.service failed.
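
The decisive line in the log above is the panic raised inside go.etcd.io/bbolt, the storage backend that etcd uses for its data file. As a rough aid for spotting this pattern in a longer journal, here is a minimal Python sketch. The function names are hypothetical, and it assumes the lines come from something like `journalctl -u etcd --no-pager`; it only matches text and does not inspect the data file itself.

```python
import re

# Matches journal lines such as:
#   "... etcd[1999]: panic: freepages: failed to get all reachable pages (...)"
PANIC_RE = re.compile(r"etcd\[\d+\]: panic: (?P<reason>.*)$")


def find_etcd_panics(lines):
    """Return the panic messages found in etcd journal lines (hypothetical helper)."""
    reasons = []
    for line in lines:
        match = PANIC_RE.search(line.rstrip("\n"))
        if match:
            reasons.append(match.group("reason"))
    return reasons


def looks_like_bbolt_corruption(lines):
    """Heuristic: a panic is present and the stack trace mentions bbolt."""
    lines = list(lines)
    has_panic = bool(find_etcd_panics(lines))
    mentions_bbolt = any("go.etcd.io/bbolt" in line for line in lines)
    return has_panic and mentions_bbolt


SAMPLE = [
    "Jan 05 09:07:57 k8s-etcd02 etcd[1999]: panic: freepages: failed to get all reachable pages (page 2487: multiple references)",
    "Jan 05 09:07:57 k8s-etcd02 etcd[1999]: goroutine 124 [running]:",
    "Jan 05 09:07:57 k8s-etcd02 etcd[1999]: go.etcd.io/bbolt.(*DB).freepages.func2(0xc00007a600)",
]

if __name__ == "__main__":
    print(find_etcd_panics(SAMPLE))
    print(looks_like_bbolt_corruption(SAMPLE))
```

A match is only a hint that the data file is damaged; a panic for a different reason would need its own diagnosis.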

Resolution

The power loss corrupted the etcd data file (this is the bbolt panic in the log), so etcd cannot start.

Back up etcd regularly, so that when this happens etcd can be restored from a backup.
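
The backup and restore steps can be sketched with `etcdctl snapshot save` and `etcdctl snapshot restore`, both of which exist in etcd 3.5. The Python below only builds and prints the command lines, so it can be reviewed before anything is run. The member name, peer URL, client endpoint and data directory are taken from the log; the snapshot path, the restore directory and the single-member `--initial-cluster` value are assumptions for illustration, and a real multi-node cluster would list every member there.

```python
import shlex


def snapshot_save_cmd(snapshot_path, endpoint="http://127.0.0.1:2379"):
    """Build the argv for taking a snapshot from a healthy, running member.

    The default endpoint is the local HTTP client URL shown in the log.
    """
    return ["etcdctl", f"--endpoints={endpoint}", "snapshot", "save", snapshot_path]


def snapshot_restore_cmd(snapshot_path, name, peer_url, initial_cluster, data_dir):
    """Build the argv for restoring a snapshot into a new data directory.

    etcdctl refuses to restore into a directory that already exists, so
    data_dir should be a fresh path that is swapped in afterwards.
    """
    return [
        "etcdctl", "snapshot", "restore", snapshot_path,
        f"--name={name}",
        f"--initial-cluster={initial_cluster}",
        f"--initial-advertise-peer-urls={peer_url}",
        f"--data-dir={data_dir}",
    ]


if __name__ == "__main__":
    snapshot = "/backup/etcd-snapshot.db"      # hypothetical backup location
    name = "etcd-192.168.40.107"               # from the log
    peer_url = "https://192.168.40.107:2380"   # from the log

    save = snapshot_save_cmd(snapshot)
    restore = snapshot_restore_cmd(
        snapshot,
        name=name,
        peer_url=peer_url,
        initial_cluster=f"{name}={peer_url}",  # assumption: single member shown
        data_dir="/var/lib/etcd-restore",      # hypothetical fresh directory
    )
    print(shlex.join(save))
    print(shlex.join(restore))
```

The save command would typically run on a schedule (for example from cron) against a healthy member. After a restore, the etcd service has to be stopped, pointed at the restored directory (or the directory moved over /var/lib/etcd), and started again.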

 

posted @ 2024-01-05 09:12  しみずよしだ