Troubleshooting

1. Logging in to KubeSphere fails with: request to http://ks-apiserver/oauth/token failed, reason: getaddrinfo EAI_AGAIN ks-apiserver

kubectl -n kube-system edit cm coredns -o yaml         # edit the CoreDNS ConfigMap

Comment out the forward block in the Corefile:

# forward . /etc/resolv.conf {
# max_concurrent 1000
# }

kubectl delete pod -n kube-system $(kubectl get pod -n kube-system | grep coredns | awk '{print $1}')   # delete the CoreDNS pods so they are recreated with the new config

vim /etc/resolv.conf

nameserver 8.8.8.8
nameserver 1.1.1.1
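
To confirm that the forward block really is disabled after the edit, the Corefile text can be checked for an uncommented forward directive. This is a minimal sketch, not part of the original fix: the helper name has_active_forward and the inline sample are assumptions made for illustration.

```shell
# has_active_forward: reads a Corefile on stdin and succeeds (exit 0) only if
# it contains a forward directive that is not commented out.
has_active_forward() {
  grep -Eq '^[[:space:]]*forward[[:space:]]'
}

# Inline sample mirroring the edit above, so the check runs without a cluster.
sample='.:53 {
    errors
    # forward . /etc/resolv.conf {
    # max_concurrent 1000
    # }
    cache 30
}'

if printf '%s\n' "$sample" | has_active_forward; then
  echo "forward is still active"
else
  echo "forward is commented out or absent"
fi

# Against a live cluster (not run here):
#   kubectl -n kube-system get cm coredns -o jsonpath='{.data.Corefile}' | has_active_forward
```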

2. main.go:39: failed to initialize: dial tcp: lookup nightingale-database on 169.254.25.10:53: no such host

This is also fixed by editing the CoreDNS configuration.

kubectl -n kube-system edit cm coredns -o yaml

apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        prometheus :9153
        loop
        cache 30
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"Corefile":".:53 {\n errors\n health {\n lameduck 5s\n }\n ready\n kubernetes cluster.local in-addr.arpa ip6.arpa {\n pods insecure\n fallthrough in-addr.arpa ip6.arpa\n ttl 30\n }\n prometheus :9153\n forward . /etc/resolv.conf {\n prefer_udp\n max_concurrent 1000\n }\n cache 30\n loop\n reload\n loadbalance\n}\n"},"kind":"ConfigMap","metadata":{"annotations":{},"labels":{"addonmanager.kubernetes.io/mode":"EnsureExists"},"name":"coredns","namespace":"kube-system"}}
  creationTimestamp: "2025-04-23T10:12:47Z"
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
  name: coredns
  namespace: kube-system
  resourceVersion: "313641"
  uid: d905e013-1661-49b2-b2d9-1abfecec6294

Most name-resolution problems are related to CoreDNS.
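
Because these failures come down to name resolution, it can help to look up the failing Service by its full in-cluster name. A minimal sketch follows. The helper svc_fqdn and the namespace "default" are assumptions for illustration (the error message does not say which namespace nightingale-database lives in); cluster.local is the domain configured in the Corefile.

```shell
# svc_fqdn: hypothetical helper that prints the in-cluster DNS name of a Service.
# Usage: svc_fqdn <service> <namespace> [cluster-domain]
svc_fqdn() {
  printf '%s.%s.svc.%s\n' "$1" "$2" "${3:-cluster.local}"
}

# "default" is an assumed namespace; replace it with the real one.
svc_fqdn nightingale-database default

# Looking the name up from a throwaway pod (not run here; the image is an assumption):
#   kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- \
#     nslookup "$(svc_fqdn nightingale-database default)"
```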

 

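3. Access through every local NodePort fails

Check the Services (svc).

Check kube-proxy:

Failed to get properties: Connection timed out

Check the server's resources. This step is important: if pods cannot be scheduled, the failure will persist.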
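
The kube-proxy check can be narrowed down by listing only the pods that are not in the Running state. This is a rough sketch under stated assumptions: the helper name not_running, the sample pod names, and the label k8s-app=kube-proxy are not from the original and may differ between installers.

```shell
# not_running: reads `kubectl get pods` table output for a single namespace on
# stdin (columns: NAME READY STATUS RESTARTS AGE) and prints the names of pods
# whose STATUS is not "Running". Completed pods are listed too.
not_running() {
  awk 'NR > 1 && $3 != "Running" { print $1 }'
}

# Made-up sample output so the filter can be tried without a cluster.
sample='NAME               READY   STATUS             RESTARTS   AGE
kube-proxy-abcde   1/1     Running            0          6d
kube-proxy-fghij   0/1     CrashLoopBackOff   12         6d'

printf '%s\n' "$sample" | not_running

# Against a live cluster (not run here; the label is an assumption):
#   kubectl get svc -A
#   kubectl -n kube-system get pods -l k8s-app=kube-proxy | not_running
#   kubectl describe nodes    # look at Conditions and Allocated resources
```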

 

4. CoreDNS readiness probe fails

CoreDNS reports: Readiness probe failed: Get "http://10.233.106.12:8181/ready": dial tcp 10.233.106.12:8181: connect: connection refused

Check the readiness probe configuration:

kubectl get deployment -n kube-system coredns -o yaml
readinessProbe:
  httpGet:
    path: /ready
    port: 8181
  initialDelaySeconds: 3
  periodSeconds: 5

CoreDNS configuration:

Corefile: |
  .:53 {
      ready :8181  # make sure this line exists (a bare "ready" also listens on port 8181)
      # other plugins (e.g. errors, health, kubernetes)
  }

Check the Events section for OOMKilled or resource-contention messages; if any appear, increase the resources.
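
The probe only succeeds when the port in readinessProbe matches the port the ready plugin listens on. The sketch below extracts that port from Corefile text so the two can be compared. The helper name ready_port is an assumption for illustration; 8181 is the plugin's default when no address is given.

```shell
# ready_port: reads a Corefile on stdin and prints the port used by the
# ready plugin; prints nothing if the plugin is absent or commented out.
ready_port() {
  awk '$1 == "ready" {
         port = 8181                          # default when no address is given
         if (NF >= 2 && $2 !~ /^#/) {         # an address such as :8181 or host:8181
           n = split($2, parts, ":")
           port = parts[n]
         }
         print port
         exit
       }'
}

# Inline sample so the helper can be tried without a cluster.
printf '.:53 {\n    errors\n    ready\n}\n' | ready_port

# Against a live cluster (not run here):
#   kubectl -n kube-system get cm coredns -o jsonpath='{.data.Corefile}' | ready_port
#   kubectl -n kube-system get deployment coredns \
#     -o jsonpath='{.spec.template.spec.containers[0].readinessProbe.httpGet.port}'
#   kubectl -n kube-system describe pod <coredns-pod> | grep -i -A5 'OOMKilled'
```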

5. Sudden etcd failure

[root@master01 ~]# ETCDCTL_API=3 etcdctl endpoint status

{"level":"warn","ts":"2025-04-29T22:28:39.75038+0800","logger":"etcd-client","caller":"v3@v3.5.13/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000474000/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"error reading server preface: EOF\""}
Failed to get the status of endpoint 127.0.0.1:2379 (context deadline exceeded)
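
The original notes do not record how this was resolved. One possible cause of "error reading server preface: EOF" is etcdctl speaking plaintext to an endpoint that requires TLS, so a reasonable first check is to repeat the command with explicit TLS flags. The sketch below only prints such a command for review. The helper name etcd_status_cmd and all certificate paths are placeholders, not values from the original.

```shell
# etcd_status_cmd: prints (does not run) an etcdctl status command with TLS flags.
# Usage: etcd_status_cmd <endpoint> <ca-file> <cert-file> <key-file>
etcd_status_cmd() {
  printf 'ETCDCTL_API=3 etcdctl --endpoints=%s --cacert=%s --cert=%s --key=%s endpoint status -w table\n' \
    "$1" "$2" "$3" "$4"
}

# Placeholder paths; substitute the certificate files used by this cluster's etcd.
etcd_status_cmd https://127.0.0.1:2379 /path/to/ca.pem /path/to/client.pem /path/to/client-key.pem
```

If the command still times out with TLS configured, the etcd service logs on the node are the next place to look.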

 

posted @ 2025-04-29 17:23  技术根据地