8. Common K8s Cluster Errors, Part 2

26. ConfigMap name not specified in the reference
(1). Error message
[root@master231 replicationcontrollers]# kubectl apply -f 08-rc-configmaps-env.yaml
The ReplicationController "oldboyedu-rc-cm-env" is invalid: 
* spec.template.spec.containers[0].env[0].valueFrom.configMapKeyRef.name: Invalid value: "": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
* spec.template.spec.containers[0].env[1].valueFrom.configMapKeyRef.name: Invalid value: "": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
(2). Cause
    The environment variable reference does not specify the name of the ConfigMap, so the API server validates an empty string against the RFC 1123 subdomain rule.
(3). Solution
    Set configMapKeyRef.name to the name of the ConfigMap being referenced.
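A minimal corrected sketch of the env section (the ConfigMap name `oldboyedu-linux94` and the key/variable names are assumptions taken from the neighboring examples):

```yaml
env:
- name: SCHOOL
  valueFrom:
    configMapKeyRef:
      name: oldboyedu-linux94   # this name was missing, which triggered the RFC 1123 error
      key: school
```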
27. Key not defined in the ConfigMap
(1). Error message
[root@master231 replicationcontrollers]# kubectl describe pod oldboyedu-rc-cm-env-ml8pk 
Name:         oldboyedu-rc-cm-env-ml8pk
Namespace:    default
...
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  31s               default-scheduler  Successfully assigned default/oldboyedu-rc-cm-env-ml8pk to worker233
  Normal   Pulled     3s (x4 over 30s)  kubelet            Container image "registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v1" already present on machine
  Warning  Failed     3s (x4 over 30s)  kubelet            Error: couldn't find key SchooL in ConfigMap default/oldboyedu-linux94
(2). Cause
    The environment variable references a key that does not exist in the ConfigMap (here the Pod asked for `SchooL`).
(3). Solution
    Make sure the key referenced by the Pod exactly matches a key defined in the ConfigMap; keys are case-sensitive.
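For the error above, the ConfigMap would need to define the exact key the Pod asks for. A sketch (the data values are placeholders):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: oldboyedu-linux94
data:
  school: oldboyedu   # key is "school"; a configMapKeyRef asking for "SchooL" will not match
```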
28. Image pull error
(1). Error message
[root@master231 case-demo]# kubectl get pods -o wide
NAME                        READY   STATUS         RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
oldboyedu-rc-harbor-2kt2t   0/1     ErrImagePull   0          2s    10.100.2.116   worker233   <none>           <none>
oldboyedu-rc-harbor-crhgh   0/1     ErrImagePull   0          2s    10.100.1.61    worker232   <none>           <none>
oldboyedu-rc-harbor-q57ff   0/1     ErrImagePull   0          2s    10.100.2.117   worker233   <none>           <none>
[root@master231 case-demo]# kubectl describe pod oldboyedu-rc-harbor-q57ff 
Name:         oldboyedu-rc-harbor-q57ff
Namespace:    default
...
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  14s                default-scheduler  Successfully assigned default/oldboyedu-rc-harbor-q57ff to worker233
  Normal   Pulling    13s                kubelet            Pulling image "harbor.oldboyedu.com/oldboyedu-linux/alpine:latest"
  Warning  Failed     13s                kubelet            Failed to pull image "harbor.oldboyedu.com/oldboyedu-linux/alpine:latest": rpc error: code = Unknown desc = Error response from daemon: unauthorized: unauthorized to access repository: oldboyedu-linux/alpine, action: pull: unauthorized to access repository: oldboyedu-linux/alpine, action: pull
  Warning  Failed     13s                kubelet            Error: ErrImagePull
  Normal   BackOff    12s (x2 over 13s)  kubelet            Back-off pulling image "harbor.oldboyedu.com/oldboyedu-linux/alpine:latest"
  Warning  Failed     12s (x2 over 13s)  kubelet            Error: ImagePullBackOff
(2). Cause
    The image pull failed; here the registry rejected the request as unauthorized, which usually means no login credentials were provided.
(3). Solution
    - 1. Create a Secret holding the registry credentials and reference it from the Pod so the pull is authenticated.
    - 2. Verify that the image actually exists in the repository.
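Assuming placeholder Harbor credentials (admin / Harbor12345), the Secret can be created with `kubectl create secret docker-registry harbor-auth --docker-server=harbor.oldboyedu.com --docker-username=admin --docker-password=Harbor12345`, then referenced in the Pod template:

```yaml
spec:
  imagePullSecrets:
  - name: harbor-auth   # the docker-registry Secret created above
  containers:
  - name: c1
    image: harbor.oldboyedu.com/oldboyedu-linux/alpine:latest
```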
29. A namespace still being deleted blocks creating resources in a same-named one
(1). Error message
[root@master231 case-demo]# kubectl apply -f 18-ns-rc-svc-jenkins.yaml
Warning: Detected changes to resource devops which is currently being deleted.
namespace/devops unchanged
Error from server (Forbidden): error when creating "18-ns-rc-svc-jenkins.yaml": replicationcontrollers "oldboyedu-jenkins" is forbidden: unable to create new content in namespace devops because it is being terminated
Error from server (Forbidden): error when creating "18-ns-rc-svc-jenkins.yaml": services "svc-jenkins" is forbidden: unable to create new content in namespace devops because it is being terminated
[root@master231 case-demo]# kubectl get ns
NAME              STATUS        AGE
default           Active        5d
devops            Terminating   99s
kube-flannel      Active        4d8h
kube-node-lease   Active        5d
kube-public       Active        5d
kube-system       Active        5d
(2). Cause
    The namespace is still in the Terminating state; recreating resources in it before the deletion finishes is rejected.
(3). Solution
    Wait for the deletion to complete. If the namespace stays in Terminating for a long time, consider removing whatever is blocking it, or as a last resort delete the corresponding data in etcd.
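A common way to clear a namespace stuck in Terminating, before resorting to etcd, is to strip its finalizers through the raw finalize API (a sketch; requires `jq` and assumes the stuck namespace is `devops`):

```shell
kubectl get ns devops -o json \
  | jq '.spec.finalizers=[]' \
  | kubectl replace --raw /api/v1/namespaces/devops/finalize -f -
```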
30. CNI base components not working
(1). Error message
[root@master231:huidu]#  kubectl apply -f metallb-ip-pool.yaml
Error from server (InternalError): error when creating "metallb-ip-pool.yaml": Internal error occurred: failed calling webhook "ipaddresspoolvalidationwebhook.metallb.io": failed to call webhook: Post "https://metallb-webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s": dial tcp 10.200.75.222:443: i/o timeout
Error from server (InternalError): error when creating "metallb-ip-pool.yaml": Internal error occurred: failed calling webhook "l2advertisementvalidationwebhook.metallb.io": failed to call webhook: Post "https://metallb-webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-l2advertisement?timeout=10s": context deadline exceeded
(2). Cause
    Internal server error; the message indicates the call to the webhook timed out. Since "metallb-webhook-service.metallb-system.svc" was resolved to "10.200.75.222", CoreDNS appears to be working, so it can be preliminarily ruled out.
    Check whether the CNI base components are healthy.
(3). Solution
    Inspect the CNI components, e.g. following a host inspection checklist such as: "https://www.cnblogs.com/yinzhengjie/p/18353027#八k8s主机巡检流程"
31. Multi-port mappings require port names
(1). Error message
[root@master231 endpoints]# kubectl apply -f 01-ep-harbor.yaml 
endpoints/oldboyedu-harbor created
The Service "oldboyedu-harbor" is invalid: 
* spec.ports[0].name: Required value
* spec.ports[1].name: Required value
(2). Cause
    When a Service maps multiple ports, each port must be given a name to distinguish its purpose. The name is arbitrary as long as it uniquely identifies the port within the Service.
(3). Solution
    Add a name to each port in the Service definition.
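A sketch of a multi-port Service spec with names added (port numbers are placeholders):

```yaml
spec:
  ports:
  - name: http    # names are arbitrary but must be unique within the Service
    port: 80
    targetPort: 80
  - name: https
    port: 443
    targetPort: 443
```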
32. Host name not resolvable
(1). Error message
[root@harbor250 ~]# docker pull harbor.oldboyedu.com/oldboyedu-db/mysql:8.0.36-oracle
Error response from daemon: Get "https://harbor.oldboyedu.com/v2/": dial tcp: lookup harbor.oldboyedu.com on 127.0.0.53:53: no such host
(2). Cause
    The domain name cannot be resolved; the local resolver (127.0.0.53) has no record for it. A hosts entry fixes this.
(3). Solution
    Add an entry to the /etc/hosts file.
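For example, assuming the Harbor server's address is 10.0.0.250 (a placeholder; substitute the real IP):

```shell
echo "10.0.0.250 harbor.oldboyedu.com" >> /etc/hosts
```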
33. DaemonSet resources cannot be exposed with kubectl expose
(1). Error message
[root@master231 kubernetes]# kubectl expose ds ds-xiuxian --port=80 --target-port=80 --type=ClusterIP
error: cannot expose a DaemonSet.apps
(2). Cause
    `kubectl expose` does not support DaemonSet resources.
(3). Solution
    Expose a different resource type, or write a Service manifest by hand whose selector matches the DaemonSet's Pod labels.
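A hand-written Service works because it only needs the Pod labels, not the controller type. A sketch (the label `app: ds-xiuxian` is an assumption about the DaemonSet's Pod template):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: svc-ds-xiuxian
spec:
  type: ClusterIP
  selector:
    app: ds-xiuxian   # must match the labels on the DaemonSet's Pod template
  ports:
  - port: 80
    targetPort: 80
```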
34. The ClusterIP field does not support in-place updates
(1). Error message
[root@master231 services]# kubectl apply -f 06-sessionAffinity.yaml 
The Service "oldboyedu-xiuxian" is invalid: spec.clusterIPs[0]: Invalid value: []string{"10.200.0.88"}: may not change once set
(2). Cause
    A Service's ClusterIP cannot be changed once set.
(3). Solution
    If you really need to change the Service's address, delete the existing Service and recreate it.
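Recreating the Service from the same manifest (note that the ClusterIP changes, so any client caching the old IP must re-resolve):

```shell
kubectl delete -f 06-sessionAffinity.yaml
kubectl apply -f 06-sessionAffinity.yaml
```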
35. Unsupported restart policy value
(1). Error message
[root@master231 replicationcontrollers]# kubectl apply -f 12-rc-xiuxian-restartPolicy.yaml
The ReplicationController "oldboyedu-rc-restartpolicy" is invalid: spec.template.spec.restartPolicy: Unsupported value: "Never": supported values: "Always"
(2). Cause
    "Unsupported value" shows that the Never restart policy is rejected; this is a ReplicationController error.
(3). Solution
    Although the field documentation lists three restart policies, Pod templates in rc, rs, deploy and similar controllers only accept Always; OnFailure and Never are rejected. Use Always, or use a controller such as a Job for Pods that should not restart.
[root@master231 replicationcontrollers]# kubectl explain rc.spec.template.spec.restartPolicy
KIND:     ReplicationController
VERSION:  v1

FIELD:    restartPolicy <string>

DESCRIPTION:
     Restart policy for all containers within the pod. One of Always, OnFailure,
     Never. Default to Always. More info:
     https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy
36. No free ports, scheduling fails
(1). Error message
[root@master231 scheduler-pods]# kubectl get pods -o wide
NAME                                            READY   STATUS    RESTARTS   AGE   IP           NODE        NOMINATED NODE   READINESS GATES
deploy-scheduler-hostnetwork-78f5cfb654-6wwdj   1/1     Running   0          30s   10.0.0.233   worker233   <none>           <none>
deploy-scheduler-hostnetwork-78f5cfb654-j7qnr   1/1     Running   0          30s   10.0.0.232   worker232   <none>           <none>
deploy-scheduler-hostnetwork-78f5cfb654-l2624   0/1     Pending   0          30s   <none>       <none>      <none>           <none>

[root@master231 scheduler-pods]# kubectl describe pod deploy-scheduler-hostnetwork-78f5cfb654-l2624
Name:           deploy-scheduler-hostnetwork-78f5cfb654-l2624
...
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  39s                default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't have free ports for the requested pod ports.
  Warning  FailedScheduling  19s (x1 over 37s)  default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't have free ports for the requested pod ports.
(2). Cause
    The Pods use hostNetwork, and no node in the cluster has the requested host port free.
(3). Solution
    - Reduce the replica count
    - Add nodes
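Reducing replicas to match the schedulable nodes, e.g. (the Deployment name is inferred from the Pod names above):

```shell
# Two schedulable workers + one tainted master => at most 2 hostNetwork
# replicas can bind the same host port at once
kubectl scale deployment deploy-scheduler-hostnetwork --replicas=2
```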
37. Insufficient CPU
(1). Error message
[root@master231 scheduler-pods]# kubectl get pods -o wide
NAME                                          READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
deploy-scheduler-resources-68586785c4-dp5vn   0/1     Pending   0          4s    <none>   <none>   <none>           <none>
deploy-scheduler-resources-68586785c4-rkcdp   0/1     Pending   0          4s    <none>   <none>   <none>           <none>
deploy-scheduler-resources-68586785c4-zbkq9   0/1     Pending   0          4s    <none>   <none>   <none>           <none>

[root@master231 scheduler-pods]# kubectl describe pod deploy-scheduler-resources-68586785c4-dp5vn 
Name:           deploy-scheduler-resources-68586785c4-dp5vn
Namespace:      default
...
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  12s   default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 Insufficient cpu.
(2). Cause
    The cluster nodes do not have enough allocatable CPU.
(3). Solution
    - Lower the Pods' CPU requests.
    - Add CPU capacity to the cluster.
38. Insufficient memory
(1). Error message
[root@master231 scheduler-pods]# kubectl get pods -o wide
NAME                                          READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
deploy-scheduler-resources-79d77c6758-4xzlf   0/1     Pending   0          3s    <none>   <none>   <none>           <none>
deploy-scheduler-resources-79d77c6758-9pghn   0/1     Pending   0          3s    <none>   <none>   <none>           <none>
deploy-scheduler-resources-79d77c6758-r6sdg   0/1     Pending   0          3s    <none>   <none>   <none>           <none>
[root@master231 scheduler-pods]# kubectl describe pod deploy-scheduler-resources-79d77c6758-4xzlf 
Name:           deploy-scheduler-resources-79d77c6758-4xzlf
...
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  10s   default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 Insufficient memory.
(2). Cause
    The cluster nodes do not have enough allocatable memory.
(3). Solution
    - Lower the Pods' memory requests.
    - Add memory capacity to the cluster.
39. K8s does not support GPU resource limits by default
(1). Error message
[root@master231 scheduler-pods]# kubectl apply -f 04-scheduler-resources.yaml
The Deployment "deploy-scheduler-resources" is invalid: 
* spec.template.spec.containers[0].resources.requests[gpu]: Invalid value: "gpu": must be a standard resource type or fully qualified
* spec.template.spec.containers[0].resources.requests[gpu]: Invalid value: "gpu": must be a standard resource for containers
(2). Cause
    A bare "gpu" is not a standard resource type; a K8S cluster does not support GPU scheduling out of the box.
(3). Solution
    - Install a third-party device plugin (e.g. the NVIDIA device plugin, which exposes the fully qualified resource name nvidia.com/gpu).
40. requests must not exceed limits
(1). Error message
[root@master231 scheduler-pods]# kubectl apply -f 04-scheduler-resources.yaml
The Deployment "deploy-scheduler-resources" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: "2Gi": must be less than or equal to memory limit
(2). Cause
    A container's requests must not exceed its limits.
(3). Solution
    Lower requests so they are less than or equal to limits.
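A valid resources sketch for the container (the sizes are placeholders):

```yaml
resources:
  requests:
    memory: 1Gi   # must be <= limits.memory
  limits:
    memory: 2Gi
```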
41. A taint already exists; modifying it requires --overwrite
(1). Error message
[root@master231 ~]# kubectl taint node worker233 school=laonanhai:NoSchedule
error: node worker233 already has school taint(s) with same effect(s) and --overwrite is false
(2). Cause
    Adding a taint whose key and effect match an existing one conflicts with it, so "--overwrite" is required to replace it.
(3). Solution
    Use "--overwrite" to overwrite the taint.
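Rerunning the command from the error above with the flag added:

```shell
kubectl taint node worker233 school=laonanhai:NoSchedule --overwrite
```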
42. DaemonSet-managed Pods cannot be evicted by drain
(1). Error message
[root@master231 scheduler-pods]# kubectl drain worker232 
node/worker232 cordoned
error: unable to drain node "worker232" due to error:cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-flannel/kube-flannel-ds-8x2nc, kube-system/kube-proxy-766pl, metallb-system/speaker-87q22, continuing command...
There are pending nodes to be drained:
 worker232
cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-flannel/kube-flannel-ds-8x2nc, kube-system/kube-proxy-766pl, metallb-system/speaker-87q22
(2). Cause
    Pods created by a DaemonSet cannot be evicted, so drain must be told to skip them with the --ignore-daemonsets option.
(3). Solution
    Drain with "--ignore-daemonsets".
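Rerunning the drain from the error above with the flag added:

```shell
kubectl drain worker232 --ignore-daemonsets
```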
43. Pod anti-affinity rules not matched
(1). Error message
[root@master231 scheduler-pods]# kubectl get pods -o wide
NAME                                                READY   STATUS    RESTARTS   AGE   IP            NODE        NOMINATED NODE   READINESS GATES
deploy-scheduler-podantiaffinity-77b58fc685-qz2fz   1/1     Running   0          29s   10.100.2.13   worker233   <none>           <none>
deploy-scheduler-podantiaffinity-77b58fc685-r6brl   0/1     Pending   0          8s    <none>        <none>      <none>           <none>
deploy-scheduler-podantiaffinity-77b58fc685-smlbj   1/1     Running   0          29s   10.100.1.21   worker232   <none>           <none>
deploy-scheduler-podantiaffinity-77b58fc685-trbhb   0/1     Pending   0          8s    <none>        <none>      <none>           <none>
deploy-scheduler-podantiaffinity-77b58fc685-w7sd4   1/1     Running   0          29s   10.100.0.24   master231   <none>           <none>
[root@master231 scheduler-pods]# kubectl describe pod deploy-scheduler-podantiaffinity-77b58fc685-r6brl
Name:           deploy-scheduler-podantiaffinity-77b58fc685-r6brl
Namespace:      default
...
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  17s   default-scheduler  0/3 nodes are available: 3 node(s) didn't match pod anti-affinity rules.
(2). Cause
    No node in the cluster satisfies the Pod anti-affinity rules (here, each of the three nodes already runs one replica).
(3). Solution
    Check whether any node can satisfy the rules, or relax the Pod's scheduling policy (e.g. use a preferred rather than required rule, or reduce the replica count).
44. Jenkins dependency fontconfig not installed
(1). Error message
 java.lang.NullPointerException: Cannot load from short array because "sun.awt.FontConfiguration.head" is null
(2). Cause
    The fontconfig package, which provides the font support Jenkins needs, is not installed.
(3). Solution
    apt-get install fontconfig
45. Gitee authentication failure
(1). Error message
Failed to connect to repository: Command "git ls-remote -h -- https://gitee.com/jasonyin2020/oldboyedu-linux94-yiliao.git HEAD" returned status code 128:
stdout:
stderr: remote: [session-3a49c78b] Username for 'https: Incorrect username or password (access token)
fatal: Authentication failed for 'https://gitee.com/jasonyin2020/oldboyedu-linux94-yiliao.git/'
(2). Cause
    Jenkins cannot pull the code from gitee because authentication failed.
(3). Solution
    - 1. Configure credentials in Jenkins, or set up password-less (SSH key) access.
    - 2. Make the project public.
    - 3. Verify that the username/password (access token) is correct; if the password was forgotten, reset it.
46. Jenkins has no docker runtime environment
(1). Error message
[oldboyedu-linux94-yiliao] $ /bin/sh -xe /tmp/jenkins16273452721317027695.sh
+ docker build -t harbor.oldboyedu.com/oldboyedu-jenkins/yiliao:v1 .
/tmp/jenkins16273452721317027695.sh: 2: docker: not found
Build step 'Execute shell' marked build as failure
Finished: FAILURE
(2). Cause
    The docker command is not available on the Jenkins node.
(3). Solution
    Install docker on the Jenkins node.
47. Jenkins has no kubectl runtime environment
(1). Error message
+ kubectl set image deploy deploy-yiliao c1=harbor.oldboyedu.com/oldboyedu-jenkins/yiliao:v2
/tmp/jenkins9730372368520249844.sh: 5: kubectl: not found
Build step 'Execute shell' marked build as failure
(2). Cause
    kubectl, the command-line client for managing a K8S cluster, is not installed on the Jenkins node.
(3). Solution
    Install the kubectl tool on the Jenkins node.
48. Jenkins has no K8S credentials file
(1). Error message
+ kubectl set image deploy deploy-yiliao c1=harbor.oldboyedu.com/oldboyedu-jenkins/yiliao:v2
error: the server doesn't have a resource type "deploy"
Build step 'Execute shell' marked build as failure
(2). Cause
    The Jenkins node is missing the K8S cluster's kubeconfig credentials file, so kubectl cannot reach the API server to discover resource types.
(3). Solution
    Copy the K8S cluster's kubeconfig file to the Jenkins node.
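For example, copying the admin kubeconfig from the master (the Jenkins node's hostname `jenkins211` is a placeholder; substitute the real host):

```shell
ssh jenkins211 "mkdir -p /root/.kube"
scp /root/.kube/config jenkins211:/root/.kube/config
```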
49. Unknown resource type
(1). Error message
[root@master231 deployments]# kubectl apply -f 11-deploy-readinessProbe-tcpSocket.yaml
deployment.apps/deploy-livenessprobe-readinessprobe-tcpsocket created
service/svc-xiuxain created
error: unable to recognize "11-deploy-readinessProbe-tcpSocket.yaml": no matches for kind "configMap" in version "v1"
(2). Cause
    The resource kind is misspelled: kind values are case-sensitive, and "configMap" should be "ConfigMap".
(3). Solution
    Check the resource's kind field against the types the cluster knows with "kubectl api-resources".
    Alternatively, "kubectl explain cm" also shows the canonical KIND of the cm resource.
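Corrected header for the ConfigMap in the manifest (kind is case-sensitive):

```yaml
apiVersion: v1
kind: ConfigMap   # "configMap" is rejected; the kind must be spelled ConfigMap
```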
50. Metrics Server component not working
(1). Error message
[root@master231 deployments]# kubectl top node 
error: Metrics API not available
(2). Cause
    The error suggests the Metrics Server component is not working properly.
(3). Solution
    - 1. Check whether the Metrics Server component is installed;
    - 2. Check the cluster's network plugin, which can prevent Metrics Server from working properly;

———————————————————————————————————————————————————————————————————————————

posted on 2025-03-31 09:49  马俊南