8. Common K8s Cluster Errors, Part 2

26. ConfigMap name not specified in the reference
(1). Error message
[root@master231 replicationcontrollers]# kubectl apply -f 08-rc-configmaps-env.yaml
The ReplicationController "oldboyedu-rc-cm-env" is invalid: 
* spec.template.spec.containers[0].env[0].valueFrom.configMapKeyRef.name: Invalid value: "": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
* spec.template.spec.containers[0].env[1].valueFrom.configMapKeyRef.name: Invalid value: "": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
(2). Cause
    The environment variable reference does not specify the name of the ConfigMap, so the API server validates an empty string against the RFC 1123 subdomain rule.
(3). Solution
    Set configMapKeyRef.name to the name of the ConfigMap being referenced.
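A minimal corrected sketch of the env section (the ConfigMap name `oldboyedu-linux94` and the key/variable names are assumptions taken from the neighboring examples):

```yaml
env:
- name: SCHOOL
  valueFrom:
    configMapKeyRef:
      name: oldboyedu-linux94   # this name was missing, which triggered the RFC 1123 error
      key: school
```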
27. Key not defined in the ConfigMap
(1). Error message
[root@master231 replicationcontrollers]# kubectl describe pod oldboyedu-rc-cm-env-ml8pk 
Name:         oldboyedu-rc-cm-env-ml8pk
Namespace:    default
...
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  31s               default-scheduler  Successfully assigned default/oldboyedu-rc-cm-env-ml8pk to worker233
  Normal   Pulled     3s (x4 over 30s)  kubelet            Container image "registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v1" already present on machine
  Warning  Failed     3s (x4 over 30s)  kubelet            Error: couldn't find key SchooL in ConfigMap default/oldboyedu-linux94
(2). Cause
    The environment variable references a key that does not exist in the ConfigMap (here the Pod asked for `SchooL`).
(3). Solution
    Make sure the key referenced by the Pod exactly matches a key defined in the ConfigMap; keys are case-sensitive.
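For the error above, the ConfigMap would need to define the exact key the Pod asks for. A sketch (the data values are placeholders):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: oldboyedu-linux94
data:
  school: oldboyedu   # key is "school"; a configMapKeyRef asking for "SchooL" will not match
```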
28. Image pull error
(1). Error message
[root@master231 case-demo]# kubectl get pods -o wide
NAME                        READY   STATUS         RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
oldboyedu-rc-harbor-2kt2t   0/1     ErrImagePull   0          2s    10.100.2.116   worker233   <none>           <none>
oldboyedu-rc-harbor-crhgh   0/1     ErrImagePull   0          2s    10.100.1.61    worker232   <none>           <none>
oldboyedu-rc-harbor-q57ff   0/1     ErrImagePull   0          2s    10.100.2.117   worker233   <none>           <none>
[root@master231 case-demo]# kubectl describe pod oldboyedu-rc-harbor-q57ff 
Name:         oldboyedu-rc-harbor-q57ff
Namespace:    default
...
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  14s                default-scheduler  Successfully assigned default/oldboyedu-rc-harbor-q57ff to worker233
  Normal   Pulling    13s                kubelet            Pulling image "harbor.oldboyedu.com/oldboyedu-linux/alpine:latest"
  Warning  Failed     13s                kubelet            Failed to pull image "harbor.oldboyedu.com/oldboyedu-linux/alpine:latest": rpc error: code = Unknown desc = Error response from daemon: unauthorized: unauthorized to access repository: oldboyedu-linux/alpine, action: pull: unauthorized to access repository: oldboyedu-linux/alpine, action: pull
  Warning  Failed     13s                kubelet            Error: ErrImagePull
  Normal   BackOff    12s (x2 over 13s)  kubelet            Back-off pulling image "harbor.oldboyedu.com/oldboyedu-linux/alpine:latest"
  Warning  Failed     12s (x2 over 13s)  kubelet            Error: ImagePullBackOff
(2). Cause
    The image pull failed; here the registry rejected the request as unauthorized, which usually means no login credentials were provided.
(3). Solution
    - 1. Create a Secret holding the registry credentials and reference it from the Pod so the pull is authenticated.
    - 2. Verify that the image actually exists in the repository.
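Assuming placeholder Harbor credentials (admin / Harbor12345), the Secret can be created with `kubectl create secret docker-registry harbor-auth --docker-server=harbor.oldboyedu.com --docker-username=admin --docker-password=Harbor12345`, then referenced in the Pod template:

```yaml
spec:
  imagePullSecrets:
  - name: harbor-auth   # the docker-registry Secret created above
  containers:
  - name: c1
    image: harbor.oldboyedu.com/oldboyedu-linux/alpine:latest
```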
29. A namespace still being deleted blocks creating resources in a same-named one
(1). Error message
[root@master231 case-demo]# kubectl apply -f 18-ns-rc-svc-jenkins.yaml
Warning: Detected changes to resource devops which is currently being deleted.
namespace/devops unchanged
Error from server (Forbidden): error when creating "18-ns-rc-svc-jenkins.yaml": replicationcontrollers "oldboyedu-jenkins" is forbidden: unable to create new content in namespace devops because it is being terminated
Error from server (Forbidden): error when creating "18-ns-rc-svc-jenkins.yaml": services "svc-jenkins" is forbidden: unable to create new content in namespace devops because it is being terminated
[root@master231 case-demo]# kubectl get ns
NAME              STATUS        AGE
default           Active        5d
devops            Terminating   99s
kube-flannel      Active        4d8h
kube-node-lease   Active        5d
kube-public       Active        5d
kube-system       Active        5d
(2). Cause
    The namespace is still in the Terminating state; recreating resources in it before the deletion finishes is rejected.
(3). Solution
    Wait for the deletion to complete. If the namespace stays in Terminating for a long time, consider removing whatever is blocking it, or as a last resort delete the corresponding data in etcd.
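A common way to clear a namespace stuck in Terminating, before resorting to etcd, is to strip its finalizers through the raw finalize API (a sketch; requires `jq` and assumes the stuck namespace is `devops`):

```shell
kubectl get ns devops -o json \
  | jq '.spec.finalizers=[]' \
  | kubectl replace --raw /api/v1/namespaces/devops/finalize -f -
```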
30. CNI base components not working
(1). Error message
[root@master231:huidu]#  kubectl apply -f metallb-ip-pool.yaml
Error from server (InternalError): error when creating "metallb-ip-pool.yaml": Internal error occurred: failed calling webhook "ipaddresspoolvalidationwebhook.metallb.io": failed to call webhook: Post "https://metallb-webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s": dial tcp 10.200.75.222:443: i/o timeout
Error from server (InternalError): error when creating "metallb-ip-pool.yaml": Internal error occurred: failed calling webhook "l2advertisementvalidationwebhook.metallb.io": failed to call webhook: Post "https://metallb-webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-l2advertisement?timeout=10s": context deadline exceeded
(2). Cause
    Internal server error; the message indicates the call to the webhook timed out. Since "metallb-webhook-service.metallb-system.svc" was resolved to "10.200.75.222", CoreDNS appears to be working, so it can be preliminarily ruled out.
    Check whether the CNI base components are healthy.
(3). Solution
    Inspect the CNI components, e.g. following a host inspection checklist such as: "https://www.cnblogs.com/yinzhengjie/p/18353027#八k8s主机巡检流程"
31. Multi-port mappings require port names
(1). Error message
[root@master231 endpoints]# kubectl apply -f 01-ep-harbor.yaml 
endpoints/oldboyedu-harbor created
The Service "oldboyedu-harbor" is invalid: 
* spec.ports[0].name: Required value
* spec.ports[1].name: Required value
(2). Cause
    When a Service maps multiple ports, each port must be given a name to distinguish its purpose. The name is arbitrary as long as it uniquely identifies the port within the Service.
(3). Solution
    Add a name to each port in the Service definition.
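A sketch of a multi-port Service spec with names added (port numbers are placeholders):

```yaml
spec:
  ports:
  - name: http    # names are arbitrary but must be unique within the Service
    port: 80
    targetPort: 80
  - name: https
    port: 443
    targetPort: 443
```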
32. Host name not resolvable
(1). Error message
[root@harbor250 ~]# docker pull harbor.oldboyedu.com/oldboyedu-db/mysql:8.0.36-oracle
Error response from daemon: Get "https://harbor.oldboyedu.com/v2/": dial tcp: lookup harbor.oldboyedu.com on 127.0.0.53:53: no such host
(2). Cause
    The domain name cannot be resolved; the local resolver (127.0.0.53) has no record for it. A hosts entry fixes this.
(3). Solution
    Add an entry to the /etc/hosts file.
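For example, assuming the Harbor server's address is 10.0.0.250 (a placeholder; substitute the real IP):

```shell
echo "10.0.0.250 harbor.oldboyedu.com" >> /etc/hosts
```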
33. DaemonSet resources cannot be exposed with kubectl expose
(1). Error message
[root@master231 kubernetes]# kubectl expose ds ds-xiuxian --port=80 --target-port=80 --type=ClusterIP
error: cannot expose a DaemonSet.apps
(2). Cause
    `kubectl expose` does not support DaemonSet resources.
(3). Solution
    Expose a different resource type, or write a Service manifest by hand whose selector matches the DaemonSet's Pod labels.
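A hand-written Service works because it only needs the Pod labels, not the controller type. A sketch (the label `app: ds-xiuxian` is an assumption about the DaemonSet's Pod template):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: svc-ds-xiuxian
spec:
  type: ClusterIP
  selector:
    app: ds-xiuxian   # must match the labels on the DaemonSet's Pod template
  ports:
  - port: 80
    targetPort: 80
```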
34. The ClusterIP field does not support in-place updates
(1). Error message
[root@master231 services]# kubectl apply -f 06-sessionAffinity.yaml 
The Service "oldboyedu-xiuxian" is invalid: spec.clusterIPs[0]: Invalid value: []string{"10.200.0.88"}: may not change once set
(2). Cause
    A Service's ClusterIP cannot be changed once set.
(3). Solution
    If you really need to change the Service's address, delete the existing Service and recreate it.
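Recreating the Service from the same manifest (note that the ClusterIP changes, so any client caching the old IP must re-resolve):

```shell
kubectl delete -f 06-sessionAffinity.yaml
kubectl apply -f 06-sessionAffinity.yaml
```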
35. Unsupported restart policy value
(1). Error message
[root@master231 replicationcontrollers]# kubectl apply -f 12-rc-xiuxian-restartPolicy.yaml
The ReplicationController "oldboyedu-rc-restartpolicy" is invalid: spec.template.spec.restartPolicy: Unsupported value: "Never": supported values: "Always"
(2). Cause
    "Unsupported value" shows that the Never restart policy is rejected; this is a ReplicationController error.
(3). Solution
    Although the field documentation lists three restart policies, Pod templates in rc, rs, deploy and similar controllers only accept Always; OnFailure and Never are rejected. Use Always, or use a controller such as a Job for Pods that should not restart.
[root@master231 replicationcontrollers]# kubectl explain rc.spec.template.spec.restartPolicy
KIND:     ReplicationController
VERSION:  v1

FIELD:    restartPolicy <string>

DESCRIPTION:
     Restart policy for all containers within the pod. One of Always, OnFailure,
     Never. Default to Always. More info:
     https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy
36. No free ports, scheduling fails
(1). Error message
[root@master231 scheduler-pods]# kubectl get pods -o wide
NAME                                            READY   STATUS    RESTARTS   AGE   IP           NODE        NOMINATED NODE   READINESS GATES
deploy-scheduler-hostnetwork-78f5cfb654-6wwdj   1/1     Running   0          30s   10.0.0.233   worker233   <none>           <none>
deploy-scheduler-hostnetwork-78f5cfb654-j7qnr   1/1     Running   0          30s   10.0.0.232   worker232   <none>           <none>
deploy-scheduler-hostnetwork-78f5cfb654-l2624   0/1     Pending   0          30s   <none>       <none>      <none>           <none>

[root@master231 scheduler-pods]# kubectl describe pod deploy-scheduler-hostnetwork-78f5cfb654-l2624
Name:           deploy-scheduler-hostnetwork-78f5cfb654-l2624
...
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  39s                default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't have free ports for the requested pod ports.
  Warning  FailedScheduling  19s (x1 over 37s)  default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't have free ports for the requested pod ports.
(2). Cause
    The Pods use hostNetwork, and no node in the cluster has the requested host port free.
(3). Solution
    - Reduce the replica count
    - Add nodes
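Reducing replicas to match the schedulable nodes, e.g. (the Deployment name is inferred from the Pod names above):

```shell
# Two schedulable workers + one tainted master => at most 2 hostNetwork
# replicas can bind the same host port at once
kubectl scale deployment deploy-scheduler-hostnetwork --replicas=2
```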
37. Insufficient CPU
(1). Error message
[root@master231 scheduler-pods]# kubectl get pods -o wide
NAME                                          READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
deploy-scheduler-resources-68586785c4-dp5vn   0/1     Pending   0          4s    <none>   <none>   <none>           <none>
deploy-scheduler-resources-68586785c4-rkcdp   0/1     Pending   0          4s    <none>   <none>   <none>           <none>
deploy-scheduler-resources-68586785c4-zbkq9   0/1     Pending   0          4s    <none>   <none>   <none>           <none>

[root@master231 scheduler-pods]# kubectl describe pod deploy-scheduler-resources-68586785c4-dp5vn 
Name:           deploy-scheduler-resources-68586785c4-dp5vn
Namespace:      default
...
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  12s   default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 Insufficient cpu.
(2). Cause
    The cluster nodes do not have enough allocatable CPU.
(3). Solution
    - Lower the Pods' CPU requests.
    - Add CPU capacity to the cluster.
38. Insufficient memory
(1). Error message
[root@master231 scheduler-pods]# kubectl get pods -o wide
NAME                                          READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
deploy-scheduler-resources-79d77c6758-4xzlf   0/1     Pending   0          3s    <none>   <none>   <none>           <none>
deploy-scheduler-resources-79d77c6758-9pghn   0/1     Pending   0          3s    <none>   <none>   <none>           <none>
deploy-scheduler-resources-79d77c6758-r6sdg   0/1     Pending   0          3s    <none>   <none>   <none>           <none>
[root@master231 scheduler-pods]# kubectl describe pod deploy-scheduler-resources-79d77c6758-4xzlf 
Name:           deploy-scheduler-resources-79d77c6758-4xzlf
...
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  10s   default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 Insufficient memory.
(2). Cause
    The cluster nodes do not have enough allocatable memory.
(3). Solution
    - Lower the Pods' memory requests.
    - Add memory capacity to the cluster.
39. K8s does not support GPU resource limits by default
(1). Error message
[root@master231 scheduler-pods]# kubectl apply -f 04-scheduler-resources.yaml
The Deployment "deploy-scheduler-resources" is invalid: 
* spec.template.spec.containers[0].resources.requests[gpu]: Invalid value: "gpu": must be a standard resource type or fully qualified
* spec.template.spec.containers[0].resources.requests[gpu]: Invalid value: "gpu": must be a standard resource for containers
(2). Cause
    A bare "gpu" is not a standard resource type; a K8S cluster does not support GPU scheduling out of the box.
(3). Solution
    - Install a third-party device plugin (e.g. the NVIDIA device plugin, which exposes the fully qualified resource name nvidia.com/gpu).
40. requests must not exceed limits
(1). Error message
[root@master231 scheduler-pods]# kubectl apply -f 04-scheduler-resources.yaml
The Deployment "deploy-scheduler-resources" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: "2Gi": must be less than or equal to memory limit
(2). Cause
    A container's requests must not exceed its limits.
(3). Solution
    Lower requests so they are less than or equal to limits.
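A valid resources sketch for the container (the sizes are placeholders):

```yaml
resources:
  requests:
    memory: 1Gi   # must be <= limits.memory
  limits:
    memory: 2Gi
```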
41. A taint already exists; modifying it requires --overwrite
(1). Error message
[root@master231 ~]# kubectl taint node worker233 school=laonanhai:NoSchedule
error: node worker233 already has school taint(s) with same effect(s) and --overwrite is false
(2). Cause
    Adding a taint whose key and effect match an existing one conflicts with it, so "--overwrite" is required to replace it.
(3). Solution
    Use "--overwrite" to overwrite the taint.
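Rerunning the command from the error above with the flag added:

```shell
kubectl taint node worker233 school=laonanhai:NoSchedule --overwrite
```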
42. DaemonSet-managed Pods cannot be evicted by drain
(1). Error message
[root@master231 scheduler-pods]# kubectl drain worker232 
node/worker232 cordoned
error: unable to drain node "worker232" due to error:cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-flannel/kube-flannel-ds-8x2nc, kube-system/kube-proxy-766pl, metallb-system/speaker-87q22, continuing command...
There are pending nodes to be drained:
 worker232
cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-flannel/kube-flannel-ds-8x2nc, kube-system/kube-proxy-766pl, metallb-system/speaker-87q22
(2). Cause
    Pods created by a DaemonSet cannot be evicted, so drain must be told to skip them with the --ignore-daemonsets option.
(3). Solution
    Drain with "--ignore-daemonsets".
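Rerunning the drain from the error above with the flag added:

```shell
kubectl drain worker232 --ignore-daemonsets
```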
43. Pod anti-affinity rules not matched
(1). Error message
[root@master231 scheduler-pods]# kubectl get pods -o wide
NAME                                                READY   STATUS    RESTARTS   AGE   IP            NODE        NOMINATED NODE   READINESS GATES
deploy-scheduler-podantiaffinity-77b58fc685-qz2fz   1/1     Running   0          29s   10.100.2.13   worker233   <none>           <none>
deploy-scheduler-podantiaffinity-77b58fc685-r6brl   0/1     Pending   0          8s    <none>        <none>      <none>           <none>
deploy-scheduler-podantiaffinity-77b58fc685-smlbj   1/1     Running   0          29s   10.100.1.21   worker232   <none>           <none>
deploy-scheduler-podantiaffinity-77b58fc685-trbhb   0/1     Pending   0          8s    <none>        <none>      <none>           <none>
deploy-scheduler-podantiaffinity-77b58fc685-w7sd4   1/1     Running   0          29s   10.100.0.24   master231   <none>           <none>
[root@master231 scheduler-pods]# kubectl describe pod deploy-scheduler-podantiaffinity-77b58fc685-r6brl
Name:           deploy-scheduler-podantiaffinity-77b58fc685-r6brl
Namespace:      default
...
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  17s   default-scheduler  0/3 nodes are available: 3 node(s) didn't match pod anti-affinity rules.
(2). Cause
    No node in the cluster satisfies the Pod anti-affinity rules (here, each of the three nodes already runs one replica).
(3). Solution
    Check whether any node can satisfy the rules, or relax the Pod's scheduling policy (e.g. use a preferred rather than required rule, or reduce the replica count).
44. Jenkins dependency fontconfig not installed
(1). Error message
 java.lang.NullPointerException: Cannot load from short array because "sun.awt.FontConfiguration.head" is null
(2). Cause
    The fontconfig package, which provides the font support Jenkins needs, is not installed.
(3). Solution
    apt-get install fontconfig
45. Gitee authentication failure
(1). Error message
Failed to connect to repository: Command "git ls-remote -h -- https://gitee.com/jasonyin2020/oldboyedu-linux94-yiliao.git HEAD" returned status code 128:
stdout:
stderr: remote: [session-3a49c78b] Username for 'https: Incorrect username or password (access token)
fatal: Authentication failed for 'https://gitee.com/jasonyin2020/oldboyedu-linux94-yiliao.git/'
(2). Cause
    Jenkins cannot pull the code from gitee because authentication failed.
(3). Solution
    - 1. Configure credentials in Jenkins, or set up password-less (SSH key) access.
    - 2. Make the project public.
    - 3. Verify that the username/password (access token) is correct; if the password was forgotten, reset it.
46. Jenkins has no docker runtime environment
(1). Error message
[oldboyedu-linux94-yiliao] $ /bin/sh -xe /tmp/jenkins16273452721317027695.sh
+ docker build -t harbor.oldboyedu.com/oldboyedu-jenkins/yiliao:v1 .
/tmp/jenkins16273452721317027695.sh: 2: docker: not found
Build step 'Execute shell' marked build as failure
Finished: FAILURE
(2). Cause
    The docker command is not available on the Jenkins node.
(3). Solution
    Install docker on the Jenkins node.
47. Jenkins has no kubectl runtime environment
(1). Error message
+ kubectl set image deploy deploy-yiliao c1=harbor.oldboyedu.com/oldboyedu-jenkins/yiliao:v2
/tmp/jenkins9730372368520249844.sh: 5: kubectl: not found
Build step 'Execute shell' marked build as failure
(2). Cause
    kubectl, the command-line client for managing a K8S cluster, is not installed on the Jenkins node.
(3). Solution
    Install the kubectl tool on the Jenkins node.
48. Jenkins has no K8S credentials file
(1). Error message
+ kubectl set image deploy deploy-yiliao c1=harbor.oldboyedu.com/oldboyedu-jenkins/yiliao:v2
error: the server doesn't have a resource type "deploy"
Build step 'Execute shell' marked build as failure
(2). Cause
    The Jenkins node is missing the K8S cluster's kubeconfig credentials file, so kubectl cannot reach the API server to discover resource types.
(3). Solution
    Copy the K8S cluster's kubeconfig file to the Jenkins node.
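For example, copying the admin kubeconfig from the master (the Jenkins node's hostname `jenkins211` is a placeholder; substitute the real host):

```shell
ssh jenkins211 "mkdir -p /root/.kube"
scp /root/.kube/config jenkins211:/root/.kube/config
```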
49. Unknown resource type
(1). Error message
[root@master231 deployments]# kubectl apply -f 11-deploy-readinessProbe-tcpSocket.yaml
deployment.apps/deploy-livenessprobe-readinessprobe-tcpsocket created
service/svc-xiuxain created
error: unable to recognize "11-deploy-readinessProbe-tcpSocket.yaml": no matches for kind "configMap" in version "v1"
(2). Cause
    The resource kind is misspelled: kind values are case-sensitive, and "configMap" should be "ConfigMap".
(3). Solution
    Check the resource's kind field against the types the cluster knows with "kubectl api-resources".
    Alternatively, "kubectl explain cm" also shows the canonical KIND of the cm resource.
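Corrected header for the ConfigMap in the manifest (kind is case-sensitive):

```yaml
apiVersion: v1
kind: ConfigMap   # "configMap" is rejected; the kind must be spelled ConfigMap
```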
50. Metrics Server component not working
(1). Error message
[root@master231 deployments]# kubectl top node 
error: Metrics API not available
(2). Cause
    The error suggests the Metrics Server component is not working properly.
(3). Solution
    - 1. Check whether the Metrics Server component is installed;
    - 2. Check the cluster's network plugin, which can prevent Metrics Server from working properly;

———————————————————————————————————————————————————————————————————————————

posted on 2025-03-31 09:49  马俊南