8. Common Kubernetes cluster errors (part 2)
26. ConfigMap name not specified
(1). Error message
[root@master231 replicationcontrollers]# kubectl apply -f 08-rc-configmaps-env.yaml
The ReplicationController "oldboyedu-rc-cm-env" is invalid:
* spec.template.spec.containers[0].env[0].valueFrom.configMapKeyRef.name: Invalid value: "": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
* spec.template.spec.containers[0].env[1].valueFrom.configMapKeyRef.name: Invalid value: "": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
(2). Cause
The environment variables reference a ConfigMap through configMapKeyRef, but the name field is empty (Invalid value: ""), so no ConfigMap is identified.
(3). Solution
Set configMapKeyRef.name to the name of an existing ConfigMap for every environment variable listed in the error.
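A minimal sketch of an env entry that carries both name and key. The function name print_env_fragment, the variable name SCHOOL and the key school are hypothetical; the ConfigMap name oldboyedu-linux94 is borrowed from the output of item 27 and may differ in your manifest.

```shell
# Hypothetical helper: print a container "env" fragment in which
# configMapKeyRef has a non-empty name and a key.
print_env_fragment() {
  cat <<'EOF'
env:
- name: SCHOOL                  # hypothetical variable name
  valueFrom:
    configMapKeyRef:
      name: oldboyedu-linux94   # must not be empty
      key: school               # hypothetical key, must exist in the ConfigMap
EOF
}
```

Calling print_env_fragment prints the fragment so it can be compared with spec.template.spec.containers[0].env in 08-rc-configmaps-env.yaml.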
27. Key not defined in the ConfigMap
(1). Error message
[root@master231 replicationcontrollers]# kubectl describe pod oldboyedu-rc-cm-env-ml8pk
Name:         oldboyedu-rc-cm-env-ml8pk
Namespace:    default
...
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  31s               default-scheduler  Successfully assigned default/oldboyedu-rc-cm-env-ml8pk to worker233
  Normal   Pulled     3s (x4 over 30s)  kubelet            Container image "registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v1" already present on machine
  Warning  Failed     3s (x4 over 30s)  kubelet            Error: couldn't find key SchooL in ConfigMap default/oldboyedu-linux94
(2). Cause
The environment variable references the key SchooL, but the ConfigMap default/oldboyedu-linux94 has no key with exactly that name. Key names are case-sensitive.
(3). Solution
Compare the keys defined in the ConfigMap with the key referenced by the Pod and make them identical.
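One way to make that comparison is to print the data section of the ConfigMap. This is a small sketch; the function name show_cm_data is hypothetical, and the namespace defaults to default as in the error above.

```shell
# Hypothetical helper: print the data of a ConfigMap so its keys can be
# compared with the key the Pod references.
# $1 = ConfigMap name, $2 = namespace (optional, defaults to "default")
show_cm_data() {
  kubectl get configmap "$1" -n "${2:-default}" -o jsonpath='{.data}'
}
```

For the case above, show_cm_data oldboyedu-linux94 would list the existing keys, which can then be checked against SchooL.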
28. Image pull failure
(1). Error message
[root@master231 case-demo]# kubectl get pods -o wide
NAME                        READY   STATUS         RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
oldboyedu-rc-harbor-2kt2t   0/1     ErrImagePull   0          2s    10.100.2.116   worker233   <none>           <none>
oldboyedu-rc-harbor-crhgh   0/1     ErrImagePull   0          2s    10.100.1.61    worker232   <none>           <none>
oldboyedu-rc-harbor-q57ff   0/1     ErrImagePull   0          2s    10.100.2.117   worker233   <none>           <none>
[root@master231 case-demo]# kubectl describe pod oldboyedu-rc-harbor-q57ff
Name:         oldboyedu-rc-harbor-q57ff
Namespace:    default
...
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  14s                default-scheduler  Successfully assigned default/oldboyedu-rc-harbor-q57ff to worker233
  Normal   Pulling    13s                kubelet            Pulling image "harbor.oldboyedu.com/oldboyedu-linux/alpine:latest"
  Warning  Failed     13s                kubelet            Failed to pull image "harbor.oldboyedu.com/oldboyedu-linux/alpine:latest": rpc error: code = Unknown desc = Error response from daemon: unauthorized: unauthorized to access repository: oldboyedu-linux/alpine, action: pull: unauthorized to access repository: oldboyedu-linux/alpine, action: pull
  Warning  Failed     13s                kubelet            Error: ErrImagePull
  Normal   BackOff    12s (x2 over 13s)  kubelet            Back-off pulling image "harbor.oldboyedu.com/oldboyedu-linux/alpine:latest"
  Warning  Failed     12s (x2 over 13s)  kubelet            Error: ImagePullBackOff
(2). Cause
The image pull failed with "unauthorized", which suggests the pull from the Harbor repository was made without valid login credentials.
(3). Solution
- 1. Create a Secret that holds the registry credentials and have the Pod use it (imagePullSecrets) so the pull is authenticated.
- 2. Confirm that the image actually exists in the repository.
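A sketch of the first step, assuming a Harbor account is available. The function name create_harbor_pull_secret and the secret name passed to it are hypothetical; the registry address is the one from the error above.

```shell
# Hypothetical helper: create a docker-registry Secret for the Harbor registry.
# $1 = Secret name, $2 = registry user, $3 = registry password
create_harbor_pull_secret() {
  kubectl create secret docker-registry "$1" \
    --docker-server=harbor.oldboyedu.com \
    --docker-username="$2" \
    --docker-password="$3"
}
```

The Pod template then needs to list the Secret by name under spec.imagePullSecrets, otherwise the kubelet still pulls anonymously.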
29. Cannot create a namespace with the same name before the old one is fully deleted
(1). Error message
[root@master231 case-demo]# kubectl apply -f 18-ns-rc-svc-jenkins.yaml
Warning: Detected changes to resource devops which is currently being deleted.
namespace/devops unchanged
Error from server (Forbidden): error when creating "18-ns-rc-svc-jenkins.yaml": replicationcontrollers "oldboyedu-jenkins" is forbidden: unable to create new content in namespace devops because it is being terminated
Error from server (Forbidden): error when creating "18-ns-rc-svc-jenkins.yaml": services "svc-jenkins" is forbidden: unable to create new content in namespace devops because it is being terminated
[root@master231 case-demo]# kubectl get ns
NAME              STATUS        AGE
default           Active        5d
devops            Terminating   99s
kube-flannel      Active        4d8h
kube-node-lease   Active        5d
kube-public       Active        5d
kube-system       Active        5d
(2). Cause
The namespace has not finished deleting and is still in the Terminating state, so creating new resources in it is rejected.
(3). Solution
Wait for the deletion to complete. If the namespace stays in Terminating for a long time, consider deleting the corresponding data from etcd.
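Before touching etcd, it may help to see why the namespace is still terminating. The sketch below prints the namespace finalizers and status conditions; the function name ns_deletion_state is hypothetical.

```shell
# Hypothetical helper: show the finalizers and status conditions of a
# namespace that is stuck in Terminating.
# $1 = namespace name
ns_deletion_state() {
  kubectl get namespace "$1" \
    -o jsonpath='{.spec.finalizers}{"\n"}{range .status.conditions[*]}{.type}={.status}: {.message}{"\n"}{end}'
}
```

Running ns_deletion_state devops usually points at the resources or finalizers that still block the deletion.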
30. CNI base component not working properly
(1). Error message
[root@master231:huidu]# kubectl apply -f metallb-ip-pool.yaml
Error from server (InternalError): error when creating "metallb-ip-pool.yaml": Internal error occurred: failed calling webhook "ipaddresspoolvalidationwebhook.metallb.io": failed to call webhook: Post "https://metallb-webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s": dial tcp 10.200.75.222:443: i/o timeout
Error from server (InternalError): error when creating "metallb-ip-pool.yaml": Internal error occurred: failed calling webhook "l2advertisementvalidationwebhook.metallb.io": failed to call webhook: Post "https://metallb-webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-l2advertisement?timeout=10s": context deadline exceeded
(2). Cause
This is an internal server error, and the message indicates a connection timeout.
The name "metallb-webhook-service.metallb-system.svc" was resolved to "10.200.75.222", so CoreDNS appears to be working and can tentatively be ruled out.
That leaves the underlying CNI component, which should be checked next.
(3). Solution
Inspect whether the CNI component is healthy, for example by following: "https://www.cnblogs.com/yinzhengjie/p/18353027#八k8s主机巡检流程"
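A quick first check could look like the sketch below. It assumes the CNI plugin is Flannel running in the kube-flannel namespace, as the namespace list in item 29 suggests; the function name check_cni is hypothetical.

```shell
# Hypothetical helper: list node status and the CNI Pods.
# $1 = namespace of the CNI plugin (optional, defaults to "kube-flannel")
check_cni() {
  kubectl get nodes -o wide
  kubectl get pods -n "${1:-kube-flannel}" -o wide
}
```

Nodes that are not Ready, or CNI Pods that are not Running on some node, are the first things to look at.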
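31. Multi-port mappings require port names
(1). Error message
[root@master231 endpoints]# kubectl apply -f 01-ep-harbor.yaml
endpoints/oldboyedu-harbor created
The Service "oldboyedu-harbor" is invalid:
* spec.ports[0].name: Required value
* spec.ports[1].name: Required value
(2). Cause
When a Service maps more than one port, each port must have a name so the ports can be told apart. The name is free-form as long as it is unique within the Service.
(3). Solution
Add a name to every port entry in the Service definition.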
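A sketch of a Service with two named ports. The port numbers 80 and 443 and the names http and https are assumptions for illustration, as is the function name print_multiport_service; only the Service name comes from the error above.

```shell
# Hypothetical helper: print a Service manifest in which every port is named.
print_multiport_service() {
  cat <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: oldboyedu-harbor
spec:
  ports:
  - name: http      # assumed name and port
    port: 80
  - name: https     # assumed name and port
    port: 443
EOF
}
```

The output can be compared with the Service section of 01-ep-harbor.yaml, or piped to kubectl apply -f - once the ports match the real backend.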
32. Host name cannot be resolved
(1). Error message
[root@harbor250 ~]# docker pull harbor.oldboyedu.com/oldboyedu-db/mysql:8.0.36-oracle
Error response from daemon: Get "https://harbor.oldboyedu.com/v2/": dial tcp: lookup harbor.oldboyedu.com on 127.0.0.53:53: no such host
(2). Cause
The domain name harbor.oldboyedu.com cannot be resolved by the configured resolver (127.0.0.53).
(3). Solution
Add an entry for the domain to the hosts file.
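A sketch of adding such an entry without duplicating it. The function name add_hosts_entry is hypothetical, and the address 10.0.0.250 used in the usage note is only a guess based on the host name harbor250; substitute the real Harbor address.

```shell
# Hypothetical helper: append "IP name" to a hosts file unless the name
# is already present.
# $1 = IP address, $2 = host name, $3 = hosts file (optional, defaults to /etc/hosts)
add_hosts_entry() {
  hosts_file="${3:-/etc/hosts}"
  if ! grep -qwF -- "$2" "$hosts_file"; then
    echo "$1 $2" >> "$hosts_file"
  fi
}
```

For example, add_hosts_entry 10.0.0.250 harbor.oldboyedu.com (run as root) adds the line once and does nothing on later runs.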
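33. DaemonSet (ds) resources cannot be exposed with kubectl expose
(1). Error message
[root@master231 kubernetes]# kubectl expose ds ds-xiuxian --port=80 --target-port=80 --type=ClusterIP
error: cannot expose a DaemonSet.apps
(2). Cause
kubectl expose does not support DaemonSet resources.
(3). Solution
Expose a different resource type instead, or write the Service manifest by hand.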
34. The ClusterIP field cannot be updated in place
(1). Error message
[root@master231 services]# kubectl apply -f 06-sessionAffinity.yaml
The Service "oldboyedu-xiuxian" is invalid: spec.clusterIPs[0]: Invalid value: []string{"10.200.0.88"}: may not change once set
(2). Cause
The ClusterIP of a Service cannot be modified once it has been set.
(3). Solution
If the Service address really has to change, delete the existing Service and create it again.
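The delete-and-recreate step can be sketched as follows; the function name recreate_service is hypothetical.

```shell
# Hypothetical helper: delete a Service and re-apply its manifest.
# The manifest is only applied when the delete succeeded.
# $1 = Service name, $2 = manifest file
recreate_service() {
  kubectl delete service "$1" && kubectl apply -f "$2"
}
```

For the case above this would be recreate_service oldboyedu-xiuxian 06-sessionAffinity.yaml. Clients cannot reach the Service between the two commands, so it is best done in a quiet period.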
35. Restart policy value not allowed
(1). Error message
[root@master231 replicationcontrollers]# kubectl apply -f 12-rc-xiuxian-restartPolicy.yaml
The ReplicationController "oldboyedu-rc-restartpolicy" is invalid: spec.template.spec.restartPolicy: Unsupported value: "Never": supported values: "Always"
(2). Cause
The "Unsupported value" error shows that the ReplicationController (rc) resource rejects the restart policy Never.
(3). Solution
Although the official documentation lists three restart policies, controllers such as rc, rs, and deploy appear to support only Always; OnFailure and Never are rejected. Set restartPolicy to Always, or omit it, since Always is the default.
[root@master231 replicationcontrollers]# kubectl explain rc.spec.template.spec.restartPolicy
KIND:     ReplicationController
VERSION:  v1
FIELD:    restartPolicy <string>
DESCRIPTION:
     Restart policy for all containers within the pod. One of Always, OnFailure, Never. Default to Always. More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy
36. No free ports, scheduling cannot complete
(1). Error message
[root@master231 scheduler-pods]# kubectl get pods -o wide
NAME                                            READY   STATUS    RESTARTS   AGE   IP           NODE        NOMINATED NODE   READINESS GATES
deploy-scheduler-hostnetwork-78f5cfb654-6wwdj   1/1     Running   0          30s   10.0.0.233   worker233   <none>           <none>
deploy-scheduler-hostnetwork-78f5cfb654-j7qnr   1/1     Running   0          30s   10.0.0.232   worker232   <none>           <none>
deploy-scheduler-hostnetwork-78f5cfb654-l2624   0/1     Pending   0          30s   <none>       <none>      <none>           <none>
[root@master231 scheduler-pods]# kubectl describe pod deploy-scheduler-hostnetwork-78f5cfb654-l2624
Name:         deploy-scheduler-hostnetwork-78f5cfb654-l2624
...
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  39s                default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't have free ports for the requested pod ports.
  Warning  FailedScheduling  19s (x1 over 37s)  default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't have free ports for the requested pod ports.
(2). Cause
No node in the cluster has the requested port free. The two worker nodes each already run a replica that occupies the port, and the master node is excluded by its taint.
(3). Solution
- Reduce the number of replicas.
- Add nodes.
37. Insufficient CPU resources
(1). Error message
[root@master231 scheduler-pods]# kubectl get pods -o wide
NAME                                          READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
deploy-scheduler-resources-68586785c4-dp5vn   0/1     Pending   0          4s    <none>   <none>   <none>           <none>
deploy-scheduler-resources-68586785c4-rkcdp   0/1     Pending   0          4s    <none>   <none>   <none>           <none>
deploy-scheduler-resources-68586785c4-zbkq9   0/1     Pending   0          4s    <none>   <none>   <none>           <none>
[root@master231 scheduler-pods]# kubectl describe pod deploy-scheduler-resources-68586785c4-dp5vn
Name:         deploy-scheduler-resources-68586785c4-dp5vn
Namespace:    default
...
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  12s   default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 Insufficient cpu.
(2). Cause
The schedulable nodes do not have enough CPU available to satisfy the Pod's request.
(3). Solution
- Lower the CPU requested by the Pod.
- Increase the CPU capacity of the cluster.
38. Insufficient memory resources
(1). Error message
[root@master231 scheduler-pods]# kubectl get pods -o wide
NAME                                          READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
deploy-scheduler-resources-79d77c6758-4xzlf   0/1     Pending   0          3s    <none>   <none>   <none>           <none>
deploy-scheduler-resources-79d77c6758-9pghn   0/1     Pending   0          3s    <none>   <none>   <none>           <none>
deploy-scheduler-resources-79d77c6758-r6sdg   0/1     Pending   0          3s    <none>   <none>   <none>           <none>
[root@master231 scheduler-pods]# kubectl describe pod deploy-scheduler-resources-79d77c6758-4xzlf
Name:         deploy-scheduler-resources-79d77c6758-4xzlf
...
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  10s   default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 Insufficient memory.
(2). Cause
The schedulable nodes do not have enough memory available to satisfy the Pod's request.
(3). Solution
- Lower the memory requested by the Pod.
- Increase the memory capacity of the cluster.
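For both the CPU and the memory case, it can help to look at what a node is able to offer before changing the requests. A small sketch, with a hypothetical function name node_allocatable:

```shell
# Hypothetical helper: print the allocatable CPU and memory of one node.
# $1 = node name
node_allocatable() {
  kubectl get node "$1" \
    -o jsonpath='cpu={.status.allocatable.cpu} memory={.status.allocatable.memory}{"\n"}'
}
```

Note that allocatable is the node total, not what is still free; the "Allocated resources" section of kubectl describe node shows how much is already requested by running Pods.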
39. Kubernetes does not support gpu resource limits by default
(1). Error message
[root@master231 scheduler-pods]# kubectl apply -f 04-scheduler-resources.yaml
The Deployment "deploy-scheduler-resources" is invalid:
* spec.template.spec.containers[0].resources.requests[gpu]: Invalid value: "gpu": must be a standard resource type or fully qualified
* spec.template.spec.containers[0].resources.requests[gpu]: Invalid value: "gpu": must be a standard resource for containers
(2). Cause
A Kubernetes cluster does not support a gpu resource out of the box; gpu is not a standard resource type.
(3). Solution
- Install a third-party plugin separately.
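The error asks for a "fully qualified" resource name, which is what such plugins register. The sketch below assumes the NVIDIA device plugin, which is not mentioned in the original and is used here only as an example; it advertises the resource nvidia.com/gpu. The function name print_gpu_limit is hypothetical.

```shell
# Hypothetical helper: print a resources fragment that uses a fully
# qualified extended resource name instead of the bare "gpu".
print_gpu_limit() {
  cat <<'EOF'
resources:
  limits:
    nvidia.com/gpu: 1   # assumes the NVIDIA device plugin is installed
EOF
}
```

Without the plugin running on a node that has a GPU, a Pod with this limit passes validation but stays Pending.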
40. Resource requests exceed limits
(1). Error message
[root@master231 scheduler-pods]# kubectl apply -f 04-scheduler-resources.yaml
The Deployment "deploy-scheduler-resources" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: "2Gi": must be less than or equal to memory limit
(2). Cause
The amount in requests must not exceed the amount in limits. Here the memory request of 2Gi is larger than the memory limit.
(3). Solution
Change requests so that it is less than or equal to limits.
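41. A taint already exists and must be overwritten to change it
(1). Error message
[root@master231 ~]# kubectl taint node worker233 school=laonanhai:NoSchedule
error: node worker233 already has school taint(s) with same effect(s) and --overwrite is false
(2). Cause
When a taint is added and the node already has a taint with the same key and effect, the two conflict, so "--overwrite" is needed to replace the existing one.
(3). Solution
Use "--overwrite" to overwrite the taint.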
42. Pods of a DaemonSet (ds) controller cannot be evicted
(1). Error message
[root@master231 scheduler-pods]# kubectl drain worker232
node/worker232 cordoned
error: unable to drain node "worker232" due to error:cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-flannel/kube-flannel-ds-8x2nc, kube-system/kube-proxy-766pl, metallb-system/speaker-87q22, continuing command...
There are pending nodes to be drained:
 worker232
cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-flannel/kube-flannel-ds-8x2nc, kube-system/kube-proxy-766pl, metallb-system/speaker-87q22
(2). Cause
Pods created by a DaemonSet cannot be evicted, so the drain has to skip them with the --ignore-daemonsets option and evict only the remaining Pods.
(3). Solution
Drain the node with "--ignore-daemonsets".
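A sketch of the drain and the later restore step; both function names are hypothetical.

```shell
# Hypothetical helpers for node maintenance.
# drain_node: evict ordinary Pods and skip DaemonSet-managed Pods.
# $1 = node name
drain_node() {
  kubectl drain "$1" --ignore-daemonsets
}

# restore_node: make the node schedulable again after maintenance.
# $1 = node name
restore_node() {
  kubectl uncordon "$1"
}
```

As the output above shows, the failed drain already cordoned worker232, so the node stays unschedulable until restore_node worker232 (kubectl uncordon) is run.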
43. Pod anti-affinity rules not matched
(1). Error message
[root@master231 scheduler-pods]# kubectl get pods -o wide
NAME                                                READY   STATUS    RESTARTS   AGE   IP            NODE        NOMINATED NODE   READINESS GATES
deploy-scheduler-podantiaffinity-77b58fc685-qz2fz   1/1     Running   0          29s   10.100.2.13   worker233   <none>           <none>
deploy-scheduler-podantiaffinity-77b58fc685-r6brl   0/1     Pending   0          8s    <none>        <none>      <none>           <none>
deploy-scheduler-podantiaffinity-77b58fc685-smlbj   1/1     Running   0          29s   10.100.1.21   worker232   <none>           <none>
deploy-scheduler-podantiaffinity-77b58fc685-trbhb   0/1     Pending   0          8s    <none>        <none>      <none>           <none>
deploy-scheduler-podantiaffinity-77b58fc685-w7sd4   1/1     Running   0          29s   10.100.0.24   master231   <none>           <none>
[root@master231 scheduler-pods]# kubectl describe pod deploy-scheduler-podantiaffinity-77b58fc685-r6brl
Name:         deploy-scheduler-podantiaffinity-77b58fc685-r6brl
Namespace:    default
...
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  17s   default-scheduler  0/3 nodes are available: 3 node(s) didn't match pod anti-affinity rules.
(2). Cause
No node in the cluster currently satisfies the Pod anti-affinity rules. All three nodes already run one replica each.
(3). Solution
Check whether any node in the environment can satisfy the rules, or change the Pod scheduling policy.
44. The fontconfig package required by Jenkins is not installed
(1). Error message
java.lang.NullPointerException: Cannot load from short array because "sun.awt.FontConfiguration.head" is null
(2). Cause
The fontconfig package is not installed. It provides font tooling that Jenkins depends on.
(3). Solution
apt-get install fontconfig
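45. Gitee authentication failure
(1). Error message
Failed to connect to repository: Command "git ls-remote -h -- https://gitee.com/jasonyin2020/oldboyedu-linux94-yiliao.git HEAD" returned status code 128:
stdout:
stderr: remote: [session-3a49c78b] Username for 'https: Incorrect username or password (access token)
fatal: Authentication failed for 'https://gitee.com/jasonyin2020/oldboyedu-linux94-yiliao.git/'
(2). Cause
Jenkins cannot pull the code from Gitee because authentication against the repository fails.
(3). Solution
- 1. Configure credentials, or set up passwordless login.
- 2. Make the project public.
- 3. Check that the password is correct; if it has been forgotten, reset it.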
46. Jenkins has no Docker runtime environment
(1). Error message
[oldboyedu-linux94-yiliao] $ /bin/sh -xe /tmp/jenkins16273452721317027695.sh
+ docker build -t harbor.oldboyedu.com/oldboyedu-jenkins/yiliao:v1 .
/tmp/jenkins16273452721317027695.sh: 2: docker: not found
Build step 'Execute shell' marked build as failure
Finished: FAILURE
(2). Cause
The docker command is not available on the Jenkins node.
(3). Solution
Install Docker on the Jenkins node.
47. Jenkins has no kubectl runtime environment
(1). Error message
+ kubectl set image deploy deploy-yiliao c1=harbor.oldboyedu.com/oldboyedu-jenkins/yiliao:v2
/tmp/jenkins9730372368520249844.sh: 5: kubectl: not found
Build step 'Execute shell' marked build as failure
(2). Cause
kubectl is the command-line client for managing a Kubernetes cluster, and it is not installed on the Jenkins node.
(3). Solution
Install kubectl on the Jenkins node.
48. Jenkins has no Kubernetes authentication file
(1). Error message
+ kubectl set image deploy deploy-yiliao c1=harbor.oldboyedu.com/oldboyedu-jenkins/yiliao:v2
error: the server doesn't have a resource type "deploy"
Build step 'Execute shell' marked build as failure
(2). Cause
The Jenkins node lacks the authentication (kubeconfig) file for the Kubernetes cluster.
(3). Solution
Copy the cluster's authentication file to the Jenkins node.
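A sketch for checking whether kubectl would find such a file on the Jenkins node. It assumes the usual lookup of the KUBECONFIG variable first and then $HOME/.kube/config, and it treats KUBECONFIG as a single path although kubectl also accepts a colon-separated list. Both function names are hypothetical.

```shell
# Hypothetical helpers: locate the kubeconfig kubectl would use and
# check that it exists and is not empty.
kubeconfig_path() {
  printf '%s\n' "${KUBECONFIG:-$HOME/.kube/config}"
}

has_kubeconfig() {
  [ -s "$(kubeconfig_path)" ]
}
```

In a build step, has_kubeconfig || echo "missing $(kubeconfig_path)" run as the Jenkins user makes the problem visible before kubectl is called.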
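49. Unknown resource type
(1). Error message
[root@master231 deployments]# kubectl apply -f 11-deploy-readinessProbe-tcpSocket.yaml
deployment.apps/deploy-livenessprobe-readinessprobe-tcpsocket created
service/svc-xiuxain created
error: unable to recognize "11-deploy-readinessProbe-tcpSocket.yaml": no matches for kind "configMap" in version "v1"
(2). Cause
The resource type in the manifest is misspelled. The kind field is case-sensitive, so "configMap" has to be written as "ConfigMap".
(3). Solution
Check that the kind field of the resource matches a type known to the cluster; "kubectl api-resources" lists the available resource types.
Alternatively, "kubectl explain cm" also shows the kind that the cm resource belongs to.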
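Looking up the exact spelling can be sketched as a case-insensitive search over the resource list; the function name find_kind is hypothetical.

```shell
# Hypothetical helper: search the cluster's resource types for a name,
# ignoring case, to find the exact KIND spelling.
# $1 = search term, e.g. configmap
find_kind() {
  kubectl api-resources | grep -i -- "$1"
}
```

The last column of the matching line is the value to use for kind in the manifest.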
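50. Metrics Server component not working
(1). Error message
[root@master231 deployments]# kubectl top node
error: Metrics API not available
(2). Cause
According to the error, the Metrics Server component is not working properly.
(3). Solution
- 1. Check whether the Metrics Server component is installed.
- 2. Check the cluster's network plugin, since a network problem can keep Metrics Server from working.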
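A sketch of the first check. It assumes Metrics Server was installed from the upstream manifest, which places it in kube-system with the label k8s-app=metrics-server; other installation methods may use a different namespace or label. The function name check_metrics_server is hypothetical.

```shell
# Hypothetical helper: check the metrics APIService registration and
# the Metrics Server Pods.
check_metrics_server() {
  kubectl get apiservice v1beta1.metrics.k8s.io
  kubectl get pods -n kube-system -l k8s-app=metrics-server -o wide
}
```

An APIService that is missing or not Available, or a Pod that is not Running, narrows down which of the two causes applies.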
———————————————————————————————————————————————————————————————————————————
无敌小马爱学习