7. Common K8s cluster errors (1)
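1. Resource creation conflict
(1). Error message
[root@master231 pods]# kubectl create -f 01-pods-xiuxian.yaml
Error from server (AlreadyExists): error when creating "01-pods-xiuxian.yaml": pods "oldboyedu-xiuxian-v1" already exists
(2). Cause
Within one namespace, two resources of the same type cannot share a name.
(3). Solution
- 1. Rename the resource;
- 2. Delete the existing resource, then create it again;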
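2. Resource name does not follow the naming rules
(1). Error message
[root@master231 pods]# kubectl create -f 02-pods-xiuxian-hostNetwork.yaml
The Pod "oldboyedu-xiuxian-hostNetwork" is invalid: metadata.name: Invalid value: "oldboyedu-xiuxian-hostNetwork": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
(2). Cause
A resource name may not contain Chinese characters, uppercase letters, or other special symbols. The only valid characters are lowercase letters, digits, hyphens (-), and dots (.), and the name must start and end with a letter or digit. Here the uppercase "N" in "hostNetwork" is the offending character.
(3). Solution
- 1. Change the name so that it only uses valid characters, i.e. it matches: [a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*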
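The naming rule can be checked locally before running kubectl. The following is a minimal sketch that reuses the regex printed in the error message; the function name is_valid_resource_name is made up for illustration, and the check covers only the character rule, not any length limit.

```python
import re

# Regex copied from the kubectl error message above.
RFC1123_SUBDOMAIN = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)


def is_valid_resource_name(name: str) -> bool:
    """Return True if the whole name matches the RFC 1123 subdomain pattern."""
    # fullmatch requires the entire string to match, not just a prefix.
    return RFC1123_SUBDOMAIN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("oldboyedu-xiuxian-hostNetwork", "oldboyedu-xiuxian-hostnetwork"):
        print(candidate, is_valid_resource_name(candidate))
```

Lowercasing the name from the failing manifest is enough to make it pass this check.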
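3. Misspelled field in the manifest
(1). Error message
[root@master231 pods]# kubectl create -f 02-pods-xiuxian-hostNetwork.yaml
error: error validating "02-pods-xiuxian-hostNetwork.yaml": error validating data: ValidationError(Pod.spec): unknown field "hostnetwork" in io.k8s.api.core.v1.PodSpec; if you choose to ignore these errors, turn validation off with --validate=false
(2). Cause
A field name in the manifest is misspelled. The part 'unknown field "hostnetwork"' says that "hostnetwork" is not a field of PodSpec. Field names are case-sensitive; the correct spelling is hostNetwork.
(3). Solution
Find the field named in the error message and correct its spelling in the manifest.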
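Because many of these mistakes are only a difference in letter case, a case-insensitive lookup is often enough to find the intended field. The sketch below assumes a small hand-written subset of PodSpec field names (the real schema has many more), and suggest_field is a hypothetical helper, not part of kubectl.

```python
# Illustrative subset of PodSpec field names; the real schema is much larger.
KNOWN_POD_SPEC_FIELDS = [
    "containers",
    "hostNetwork",
    "nodeName",
    "restartPolicy",
    "volumes",
]


def suggest_field(unknown, known=KNOWN_POD_SPEC_FIELDS):
    """Return the known field that differs from `unknown` only by case, else None."""
    for field in known:
        if field.lower() == unknown.lower():
            return field
    return None


if __name__ == "__main__":
    print(suggest_field("hostnetwork"))
```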
4. Invalid image name
(1). Error message
[root@master231 pods]# kubectl get pods -o wide
NAME                    READY   STATUS             RESTARTS   AGE   IP           NODE        NOMINATED NODE   READINESS GATES
oldboyedu-xiuxian-env   0/1     InvalidImageName   0          11s   10.0.0.233   worker233   <none>           <none>
[root@master231 pods]# kubectl describe pod oldboyedu-xiuxian-env
Name:         oldboyedu-xiuxian-env
...
Events:
  Type     Reason         Age               From               Message
  ----     ------         ----              ----               -------
  Normal   Scheduled      31s               default-scheduler  Successfully assigned default/oldboyedu-xiuxian-env to worker233
  Warning  InspectFailed  3s (x4 over 31s)  kubelet            Failed to apply default image tag "registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/APPS:v1": couldn't parse image reference "registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/APPS:v1": invalid reference format: repository name must be lowercase
  Warning  Failed         3s (x4 over 31s)  kubelet            Error: InvalidImageName
(2). Cause
The image name is malformed. In this case the repository part "APPS" contains uppercase letters, and a repository name must be lowercase.
(3). Solution
Check the image name and correct it. The quickest test is to pull the image by hand with docker pull.
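The lowercase rule from the event message can be checked before the manifest is applied. This is a rough sketch only: it strips an optional digest and tag and compares the rest with its lowercase form, and it does not implement the full image reference grammar. The function name repository_is_lowercase is an assumption made for this example.

```python
def repository_is_lowercase(image: str) -> bool:
    """Rough check that the repository part of an image reference is lowercase."""
    # Drop an optional digest such as "@sha256:...".
    name = image.split("@", 1)[0]
    # A colon after the last slash separates the tag; a colon before it
    # belongs to a registry port such as "localhost:5000".
    last_slash = name.rfind("/")
    colon = name.rfind(":")
    if colon > last_slash:
        name = name[:colon]
    return name == name.lower()


if __name__ == "__main__":
    bad = "registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/APPS:v1"
    good = "registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v1"
    print(repository_is_lowercase(bad), repository_is_lowercase(good))
```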
5. Image pull fails
(1). Error message
[root@master231 pods]# kubectl get pods -o wide
NAME                    READY   STATUS         RESTARTS   AGE   IP           NODE        NOMINATED NODE   READINESS GATES
oldboyedu-xiuxian-env   0/1     ErrImagePull   0          9s    10.0.0.233   worker233   <none>           <none>
[root@master231 pods]# kubectl describe -f 03-pods-xiuxian-env.yaml
Name:         oldboyedu-xiuxian-env
...
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  37s                default-scheduler  Successfully assigned default/oldboyedu-xiuxian-env to worker233
  Normal   Pulling    22s (x2 over 37s)  kubelet            Pulling image "registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/app:v11"
  Warning  Failed     21s (x2 over 37s)  kubelet            Failed to pull image "registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/app:v11": rpc error: code = Unknown desc = Error response from daemon: pull access denied for registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/app, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
  Warning  Failed     21s (x2 over 37s)  kubelet            Error: ErrImagePull
  Normal   BackOff    9s (x2 over 36s)   kubelet            Back-off pulling image "registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/app:v11"
  Warning  Failed     9s (x2 over 36s)   kubelet            Error: ImagePullBackOff
(2). Cause
The image name is well-formed, but the image cannot be pulled.
(3). Solution
- 1. The image name or tag may be wrong, so the remote registry has no such image: check and correct it;
- 2. The project may be private, so it cannot be accessed without logging in. Wrong login credentials produce the same error: check the credentials;
6. Container fails to start and keeps restarting
(1). Error message
[root@master231 pods]# kubectl get pods -o wide
NAME                            READY   STATUS             RESTARTS      AGE    IP           NODE        NOMINATED NODE   READINESS GATES
oldboyedu-xiuxian-env           0/1     CrashLoopBackOff   2 (28s ago)   51s    10.0.0.233   worker233   <none>           <none>
oldboyedu-xiuxian-hostnetwork   1/1     Running            0             102s   10.0.0.233   worker233   <none>           <none>
[root@master231 pods]# kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS      AGE    IP           NODE        NOMINATED NODE   READINESS GATES
oldboyedu-xiuxian-env           0/1     Error     3 (34s ago)   57s    10.0.0.233   worker233   <none>           <none>
oldboyedu-xiuxian-hostnetwork   1/1     Running   0             108s   10.0.0.233   worker233   <none>           <none>
[root@master231 pods]# kubectl describe pod oldboyedu-xiuxian-env
Name:         oldboyedu-xiuxian-env
...
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  70s                default-scheduler  Successfully assigned default/oldboyedu-xiuxian-env to worker233
  Normal   Pulled     19s (x4 over 70s)  kubelet            Container image "registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v1" already present on machine
  Normal   Created    19s (x4 over 70s)  kubelet            Created container c1
  Normal   Started    19s (x4 over 70s)  kubelet            Started container c1
  Warning  BackOff    15s (x4 over 64s)  kubelet            Back-off restarting failed container
(2). Cause
The container starts and then exits. The startup command itself may be failing, or there may be another problem such as a port conflict.
(3). Solution
- 1. Start the image by hand with docker and see whether it comes up normally. If it does not, the image's own startup is the problem;
- 2. If the container starts normally under docker, check whether it conflicts with a port already used by an existing Pod;
- 3. If the container has to run anyway, try overriding its startup command with command to work around the port conflict;
7. Pod stays Pending because nodeName points to a node that does not exist
(1). Error message
[root@master231 pods]# kubectl get pods -o wide
NAME                         READY   STATUS    RESTARTS   AGE   IP       NODE         NOMINATED NODE   READINESS GATES
oldboyedu-xiuxian-nodename   0/1     Pending   0          5s    <none>   10.0.0.232   <none>           <none>
[root@master231 pods]# kubectl describe pod oldboyedu-xiuxian-nodename
Name:         oldboyedu-xiuxian-nodename
Namespace:    default
Priority:     0
Node:         10.0.0.232/
Labels:       <none>
Annotations:  <none>
Status:       Pending
...
Events:       <none>
[root@master231 pods]# kubectl get nodes
NAME        STATUS   ROLES                  AGE   VERSION
master231   Ready    control-plane,master   21h   v1.23.17
worker232   Ready    <none>                 21h   v1.23.17
worker233   Ready    <none>                 21h   v1.23.17
(2). Cause
The K8s cluster accepted the Pod, but the Pod can never run on the node it names: nodeName is set to "10.0.0.232", and the cluster has no node with that name. The nodes are registered as master231, worker232 and worker233.
(3). Solution
List the current nodes and set the nodeName field to the name of an existing worker node.
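8. Environment variable value has the wrong data type
(1). Error message
[root@master231 case-demo]# kubectl apply -f 01-pods-mysql.yaml
Error from server (BadRequest): error when creating "01-pods-mysql.yaml": Pod in version "v1" cannot be handled as a Pod: json: cannot unmarshal bool into Go struct field EnvVar.spec.containers.env.value of type string
(2). Cause
The value of an environment variable must be a string. Unquoted values such as numbers, yes, true, false and no have a special meaning in YAML and may be parsed as numbers or booleans instead of strings, so they should be wrapped in double quotes ("").
(3). Solution
Find the values under "spec.containers.env.value" that could be read as a boolean (bool) and wrap them in double quotes.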
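To locate the offending entries in a larger manifest, the parsed manifest can be scanned for env values that are not strings. The sketch below works on a plain Python dict, as it would look after YAML parsing; the variable name MYSQL_ALLOW_EMPTY_PASSWORD and the container name c1 are hypothetical sample data, and non_string_env_values is a made-up helper.

```python
def non_string_env_values(pod: dict) -> list:
    """Return (container name, env name, value) for every env value that is not a str."""
    problems = []
    for container in pod.get("spec", {}).get("containers", []):
        for env in container.get("env", []):
            value = env.get("value")
            # Entries without "value" (for example valueFrom) are skipped.
            if value is not None and not isinstance(value, str):
                problems.append((container.get("name"), env.get("name"), value))
    return problems


if __name__ == "__main__":
    # Hypothetical manifest after parsing: an unquoted "yes" has become True.
    pod = {
        "spec": {
            "containers": [
                {
                    "name": "c1",
                    "env": [
                        {"name": "MYSQL_ALLOW_EMPTY_PASSWORD", "value": True},
                        {"name": "MYSQL_DATABASE", "value": "wordpress"},
                    ],
                }
            ]
        }
    }
    print(non_string_env_values(pod))
```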
9. Container stays in ContainerCreating
(1). Error message
[root@master231 case-demo]# kubectl get pods -o wide
NAME              READY   STATUS              RESTARTS   AGE   IP           NODE        NOMINATED NODE   READINESS GATES
oldboyedu-mysql   1/1     Running             0          6s    10.0.0.233   worker233   <none>           <none>
oldboyedu-wp      0/1     ContainerCreating   0          6s    10.0.0.232   worker232   <none>           <none>
[root@master231 case-demo]# kubectl describe pod oldboyedu-wp
Name:         oldboyedu-wp
...
Events:
  Type    Reason   Age   From     Message
  ----    ------   ----  ----     -------
  Normal  Pulling  21s   kubelet  Pulling image "wordpress:latest"
(2). Cause
The image is still being pulled, which takes a while on a slow network.
(3). Solution
Wait for the pull to finish, or load the image onto the node in advance. If the node already has the image, set the image pull policy so that the local copy is used.
10. Label already has a value and cannot be changed without --overwrite
(1). Error message
[root@master231 pods]# kubectl get pods --show-labels
NAME                   READY   STATUS    RESTARTS   AGE     LABELS
oldboyedu-xiuxian-v1   1/1     Running   0          2m59s   address=shahe,class=linux94,run=oldboyedu-xiuxian-v1,school=oldboyedu
[root@master231 pods]# kubectl label pods oldboyedu-xiuxian-v1 school=laonanhai
error: 'school' already has a value (oldboyedu), and --overwrite is false
(2). Cause
The label key being set already has a value, and kubectl label does not change an existing value by default. To change it, the overwrite option has to be added.
(3). Solution
Append "--overwrite" to the command.
11. Harbor client does not trust the Harbor server certificate
(1). Error message
[root@worker232 ~]# docker login -u admin -p 1 harbor.oldboyedu.com
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
Error response from daemon: Get "https://harbor.oldboyedu.com/v2/": x509: certificate signed by unknown authority
(2). Cause
The Harbor client cannot verify the Harbor server certificate. The server's self-signed certificate files have to be copied to the matching path on the docker client.
(3). Solution
1 On the docker client, create the directory for the self-signed certificate (the directory name must match the domain name)
[root@worker232 ~]# mkdir -pv /etc/docker/certs.d/harbor.oldboyedu.com/
mkdir: created directory '/etc/docker/certs.d'
mkdir: created directory '/etc/docker/certs.d/harbor.oldboyedu.com/'
2 Copy the client certificate files
[root@worker232 ~]# scp 10.0.0.250:/oldboyedu/softwares/harbor/certs/docker-client/* /etc/docker/certs.d/harbor.oldboyedu.com/
The authenticity of host '10.0.0.250 (10.0.0.250)' can't be established.
ED25519 key fingerprint is SHA256:PSNwZBh2LYir/ha4+/eaLbzGbvh+XoiNr2zRcbL49AA.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '10.0.0.250' (ED25519) to the list of known hosts.
root@10.0.0.250's password:
ca.crt                       100% 2049     4.3MB/s   00:00
harbor.oldboyedu.com.cert    100% 2147     3.1MB/s   00:00
harbor.oldboyedu.com.key     100% 3272     7.2MB/s   00:00
3 Verify from the docker client
[root@worker232 ~]# echo 10.0.0.250 harbor.oldboyedu.com >> /etc/hosts
[root@worker232 ~]# docker login -u admin -p 1 harbor.oldboyedu.com
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded
12. Image pull policy prevents the image from being pulled
(1). Error message
[root@master231 pods]# kubectl get pods -owide
NAME                                READY   STATUS              RESTARTS   AGE   IP           NODE        NOMINATED NODE   READINESS GATES
oldboyedu-xiuxian-imagepullpolicy   0/1     ErrImageNeverPull   0          5s    10.100.2.9   worker233   <none>           <none>
oldboyedu-xiuxian-labeles           1/1     Running             0          34m   10.100.2.8   worker233   <none>           <none>
[root@master231 pods]# kubectl describe pod oldboyedu-xiuxian-imagepullpolicy
Name:         oldboyedu-xiuxian-imagepullpolicy
...
Events:
  Type     Reason             Age               From     Message
  ----     ------             ----              ----     -------
  Warning  ErrImageNeverPull  2s (x3 over 15s)  kubelet  Container image "harbor.oldboyedu.com/oldboyedu-linux/alpine:latest" is not present with pull policy of Never
  Warning  Failed             2s (x3 over 15s)  kubelet  Error: ErrImageNeverPull
(2). Cause
The pull policy Never means the image is never pulled. If the image exists locally the container is started from it; if it does not, the kubelet still does not go to the registry.
(3). Solution
- 1. Load the image onto the node in question;
- 2. Change the image pull policy. The valid values are Always (always pull), IfNotPresent (pull only when the image is not present locally) and Never (never pull, local images only).
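The behaviour of the three policies can be summarised as a small decision function. This is a simplified model for illustration, not the kubelet's actual code; the function name and the "pull" / "use-local" return strings are invented for this sketch, while ErrImageNeverPull is the status shown in the output above.

```python
def kubelet_image_action(policy: str, present_locally: bool) -> str:
    """Simplified model of what happens to the image for each pull policy."""
    if policy == "Always":
        # The registry is contacted every time the container starts.
        return "pull"
    if policy == "IfNotPresent":
        return "use-local" if present_locally else "pull"
    if policy == "Never":
        # No pull is attempted; a missing image surfaces as ErrImageNeverPull.
        return "use-local" if present_locally else "ErrImageNeverPull"
    raise ValueError(f"unsupported imagePullPolicy: {policy!r}")


if __name__ == "__main__":
    for policy in ("Always", "IfNotPresent", "Never"):
        for present in (True, False):
            print(policy, present, kubelet_image_action(policy, present))
```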
13. Changing the image pull policy fails
(1). Error message
[root@master231 pods]# kubectl apply -f 09-pods-xiuxian-imagePullPolicy.yaml
The Pod "oldboyedu-xiuxian-imagepullpolicy" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds`, `spec.tolerations` (only additions to existing tolerations) or `spec.terminationGracePeriodSeconds` (allow it to be set to 1 if it was previously negative)
  core.PodSpec{
    Volumes: {{Name: "kube-api-access-2hpgk", VolumeSource: {Projected: &{Sources: {{ServiceAccountToken: &{ExpirationSeconds: 3607, Path: "token"}}, {ConfigMap: &{LocalObjectReference: {Name: "kube-root-ca.crt"}, Items: {{Key: "ca.crt", Path: "ca.crt"}}}}, {DownwardAPI: &{Items: {{Path: "namespace", FieldRef: &{APIVersion: "v1", FieldPath: "metadata.namespace"}}}}}}, DefaultMode: &420}}}},
    InitContainers: nil,
    Containers: []core.Container{
      {
        ... // 15 identical fields
        TerminationMessagePath: "/dev/termination-log",
        TerminationMessagePolicy: "File",
-       ImagePullPolicy: "IfNotPresent",
+       ImagePullPolicy: "Never",
        SecurityContext: nil,
        Stdin: false,
        ... // 2 identical fields
      },
    },
    EphemeralContainers: nil,
    RestartPolicy: "Always",
    ... // 26 identical fields
  }
(2). Cause
The image pull policy field of an existing Pod cannot be updated.
[root@master231 pods]# kubectl explain po.spec.containers.imagePullPolicy
KIND:     Pod
VERSION:  v1
FIELD:    imagePullPolicy <string>
DESCRIPTION:
     Image pull policy. One of Always, Never, IfNotPresent. Defaults to Always
     if :latest tag is specified, or IfNotPresent otherwise. Cannot be updated.
     More info:
     https://kubernetes.io/docs/concepts/containers/images#updating-images
(3). Solution
Delete the existing Pod and create a new one.
14. YAML manifest is incomplete
(1). Error message
[root@master231 replicationcontrollers]# kubectl apply -f 01-rc-xiuxian.yaml
The ReplicationController "oldboyedu-rc-xiuxian" is invalid: spec.template.spec.containers: Required value
(2). Cause
The manifest does not define the required "spec.template.spec.containers" field.
(3). Solution
Check the manifest for required fields that have not been defined.
15. Selector does not match the template labels
(1). Error message
[root@master231 replicationcontrollers]# kubectl apply -f 01-rc-xiuxian.yaml
The ReplicationController "oldboyedu-rc-xiuxian" is invalid: spec.template.metadata.labels: Invalid value: map[string]string{"apps":"v1", "class":"linux94"}: `selector` does not match template `labels`
(2). Cause
The label selector (selector) does not match the labels of the Pod template (template).
(3). Solution
Check that every label in the selector is also present, with the same value, in the template's metadata.labels.
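The matching rule amounts to a subset test: every key/value pair of the selector must appear in the template labels, while the template may carry extra labels. A minimal sketch follows; the selector value "v2" is a made-up example of a mismatch, since the failing selector is not shown in the error output.

```python
def selector_matches_template(selector: dict, template_labels: dict) -> bool:
    """True if every selector key/value pair is present in the template labels."""
    return all(template_labels.get(key) == value for key, value in selector.items())


if __name__ == "__main__":
    # Template labels taken from the error message above.
    template_labels = {"apps": "v1", "class": "linux94"}
    print(selector_matches_template({"apps": "v1"}, template_labels))
    # Hypothetical mismatching selector.
    print(selector_matches_template({"apps": "v2"}, template_labels))
```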
16. Data type mismatch
(1). Error message
[root@master231 services]# kubectl apply -f 01-svc-xiuxian.yaml
error: error validating "01-svc-xiuxian.yaml": error validating data: ValidationError(Service.spec.ports): invalid type for io.k8s.api.core.v1.ServiceSpec.ports: got "map", expected "array"; if you choose to ignore these errors, turn validation off with --validate=false
(2). Cause
The data type does not match: an array was expected for spec.ports, but a map was given. In YAML, each array item has to start with "-".
(3). Solution
Edit the manifest so that spec.ports is written as a list of port entries.
17. nodePort is outside the allowed port range
(1). Error message
[root@master231 services]# kubectl apply -f 03-svc-xiuxian-NodePort.yaml
The Service "svc-xiuxain-nodeport" is invalid: spec.ports[0].nodePort: Invalid value: 8080: provided port is not in the valid range. The range of valid ports is 30000-32767
(2). Cause
The nodePort is outside the valid range. The valid port range is 30000-32767.
(3). Solution
- 1. Choose a port inside the default valid range 30000-32767;
- 2. Or change the default port range through the kube-apiserver option --service-node-port-range; see https://kubernetes.io/zh-cn/docs/reference/command-line-tools-reference/kube-apiserver/
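The range check is easy to reproduce before applying a Service. The sketch below hard-codes the default range from the error message; a cluster whose range has been changed would need different bounds, which is why the range is a parameter.

```python
# Default range quoted in the error message: 30000-32767 inclusive.
DEFAULT_NODE_PORT_RANGE = range(30000, 32768)


def in_node_port_range(port: int, port_range=DEFAULT_NODE_PORT_RANGE) -> bool:
    """True if the port lies inside the given nodePort range."""
    return port in port_range


if __name__ == "__main__":
    print(in_node_port_range(8080), in_node_port_range(30080))
```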
18. Port is already allocated
(1). Error message
[root@master231 case-demo]# kubectl apply -f 03-rc-svc.yaml
replicationcontroller/oldboyedu-rc-xiuxian-v2 created
replicationcontroller/oldboyedu-rc-xiuxian-v3 created
Error from server (Invalid): error when creating "03-rc-svc.yaml": Service "svc-xiuxain-v2" is invalid: spec.ports[0].nodePort: Invalid value: 30082: provided port is already allocated
Error from server (Invalid): error when creating "03-rc-svc.yaml": Service "svc-xiuxain-v3" is invalid: spec.ports[0].nodePort: Invalid value: 30083: provided port is already allocated
(2). Cause
The nodePort is rejected because another Service has already been allocated that port.
(3). Solution
- 1. Delete the existing svc that holds the port;
- 2. Or change the nodePort to a port that is not yet allocated;
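Picking a replacement port comes down to finding a port in the valid range that no Service holds yet. The following sketch assumes the set of allocated ports has already been collected by some other means (for example from the output of kubectl get svc); first_free_node_port is a hypothetical helper, and the bounds are the default range.

```python
def first_free_node_port(allocated, low=30000, high=32767):
    """Return the lowest port in [low, high] not in `allocated`, or None if all are taken."""
    taken = set(allocated)
    for port in range(low, high + 1):
        if port not in taken:
            return port
    return None


if __name__ == "__main__":
    # Ports reported as already allocated in the error message above.
    print(first_free_node_port({30082, 30083}, low=30082))
```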
19. Specified namespace does not exist
(1). Error message
[root@master231 case-demo]# kubectl apply -f 05-rc-svc-wordpress.yaml
Error from server (NotFound): error when creating "05-rc-svc-wordpress.yaml": namespaces "oldboyedu" not found
Error from server (NotFound): error when creating "05-rc-svc-wordpress.yaml": namespaces "oldboyedu" not found
(2). Cause
The namespace named in the manifest does not exist.
(3). Solution
- 1. Create the namespace by hand;
- 2. Or specify a namespace that already exists;
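20. Field is not allowed for this Service type
(1). Error message
[root@master231 case-demo]# kubectl apply -f 05-rc-svc-wordpress.yaml
namespace/oldboyedu created
replicationcontroller/oldboyedu-db created
The Service "svc-db" is invalid: spec.ports[0].nodePort: Forbidden: may not be used when `type` is 'ClusterIP'
(2). Cause
A Service of type ClusterIP cannot specify a nodePort. "Forbidden" here means the field is not allowed in this combination; it is not a permissions problem.
(3). Solution
- 1. Delete the nodePort field;
- 2. If the node port is really needed, change the Service type to NodePort.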
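The rule can be expressed as a check on the Service spec: when the type is ClusterIP (which is also the type used when none is given), no port entry may carry nodePort. The sketch below operates on a plain dict, and the port numbers in the sample are hypothetical because the failing manifest is not shown.

```python
def forbidden_node_ports(spec: dict) -> list:
    """Return the indexes of ports that set nodePort on a ClusterIP Service."""
    # A Service without an explicit type is a ClusterIP Service.
    if spec.get("type", "ClusterIP") != "ClusterIP":
        return []
    return [
        index
        for index, port in enumerate(spec.get("ports", []))
        if "nodePort" in port
    ]


if __name__ == "__main__":
    # Hypothetical spec reproducing the situation in the error message.
    spec = {"type": "ClusterIP", "ports": [{"port": 3306, "nodePort": 30306}]}
    print(forbidden_node_ports(spec))
```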
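21. Parsing a line of the YAML file fails
(1). Error message
[root@master231 replicationcontrollers]# kubectl apply -f 04-rc-volumes-emptyDir-multiple.yaml
error: error parsing 04-rc-volumes-emptyDir-multiple.yaml: error converting YAML to JSON: yaml: line 19: did not find expected '-' indicator
(2). Cause
The manifest is converted from YAML to JSON, and that conversion failed. An indentation problem can also cause this.
(3). Solution
Following the error message, check around line 19 for a missing "-" character. If nothing appears to be missing, the error is most likely caused by wrong indentation.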
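22. Mounted volume name cannot be found
(1). Error message
[root@master231 replicationcontrollers]# kubectl apply -f 06-rc-volumes-hostPath-localtime.yaml
The ReplicationController "oldboyedu-rc-hostpath-localtime" is invalid: spec.template.spec.containers[0].volumeMounts[0].name: Not found: "data"
(2). Cause
A container mounts a volume by a name that is not defined in the Pod's volumes.
(3). Solution
- Check that the volume name used in volumeMounts is identical to a volume name defined under volumes.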
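The consistency check between volumeMounts and volumes can be written as a small function over the Pod spec. This is a sketch on a plain dict; the volume name "localtime" and container name "c1" in the sample are assumptions, and only the mount name "data" comes from the error message.

```python
def undefined_volume_mounts(pod_spec: dict) -> list:
    """Return (container name, mount name) pairs whose volume is not defined."""
    defined = {volume["name"] for volume in pod_spec.get("volumes", [])}
    missing = []
    for container in pod_spec.get("containers", []):
        for mount in container.get("volumeMounts", []):
            if mount["name"] not in defined:
                missing.append((container.get("name"), mount["name"]))
    return missing


if __name__ == "__main__":
    # Hypothetical spec: the volume is called "localtime", the mount asks for "data".
    pod_spec = {
        "volumes": [{"name": "localtime"}],
        "containers": [
            {"name": "c1", "volumeMounts": [{"name": "data", "mountPath": "/etc/localtime"}]}
        ],
    }
    print(undefined_volume_mounts(pod_spec))
```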
23. mount fails when mounting NFS
(1). Error message
[root@harbor250 ~]# mount -t nfs 10.0.0.231:/oldboyedu/data/nfs-server /mnt/
mount: /mnt: bad option; for several filesystems (e.g. nfs, cifs) you might need a /sbin/mount.<type> helper program.
(2). Cause
The host cannot mount NFS file systems because the NFS mount helper program is not installed.
(3). Solution
Install the packages NFS depends on. On Ubuntu, for example: apt -y install nfs-kernel-server
24. Startup script inside the container has no execute permission
(1). Error message
[root@master231 case-demo]# kubectl get pods -n devops
NAME                        READY   STATUS              RESTARTS     AGE
oldboyedu-sonarqube-b4xqv   0/1     RunContainerError   0 (7s ago)   7s
[root@master231 case-demo]# kubectl -n devops describe pod oldboyedu-sonarqube-b4xqv
Name:         oldboyedu-sonarqube-b4xqv
Namespace:    devops
...
Events:
  Type     Reason   Age                From     Message
  ----     ------   ----               ----     -------
  Normal   Pulled   14s (x2 over 15s)  kubelet  Container image "harbor.oldboyedu.com/oldboyedu-devops/sonarqube:9.9.7-community" already present on machine
  Normal   Created  14s (x2 over 15s)  kubelet  Created container c1
  Warning  Failed   14s (x2 over 15s)  kubelet  Error: failed to start container "c1": Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/opt/sonarqube/docker/entrypoint.sh": permission denied: unknown
(2). Cause
The container runtime is not permitted to execute the startup script.
(3). Solution
Make sure the file "/opt/sonarqube/docker/entrypoint.sh" inside the container has execute permission.
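If the script is added to the image from a local file, its mode bits can be checked before the image is built. The sketch below only inspects the execute bits of a local file; the idea that the script is copied from the build host is an assumption about how the image was produced, and has_execute_bit is a made-up helper name.

```python
import os
import stat


def has_execute_bit(path: str) -> bool:
    """True if any of the user, group or other execute bits is set on the file."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))


if __name__ == "__main__":
    # "entrypoint.sh" is a hypothetical local copy of the startup script.
    script = "entrypoint.sh"
    if os.path.exists(script):
        print(script, has_execute_bit(script))
```

A file that fails this check can be fixed with chmod +x before it goes into the image.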
25. Pod mount fails because the directory does not exist on the NFS server
(1). Error message
[root@master231:zuoye]# kubectl get pods -o wide
NAME                 READY   STATUS              RESTARTS   AGE   IP       NODE        NOMINATED NODE   READINESS GATES
oldboyedu-wp-ztt26   0/1     ContainerCreating   0          9s    <none>   worker232   <none>           <none>
[root@master231:zuoye]# kubectl describe pod oldboyedu-wp-ztt26
Name:         oldboyedu-wp-ztt26
Namespace:    default
...
Events:
  Type     Reason       Age               From               Message
  ----     ------       ----              ----               -------
  Normal   Scheduled    14s               default-scheduler  Successfully assigned default/oldboyedu-wp-ztt26 to worker232
  Warning  FailedMount  6s (x5 over 14s)  kubelet            MountVolume.SetUp failed for volume "wp-data" : mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs 10.0.0.231:/oldboyedu/data/wordpress/wp /var/lib/kubelet/pods/a26ab131-5cf1-4378-89c1-c47059224177/volumes/kubernetes.io~nfs/wp-data
Output: mount.nfs: access denied by server while mounting 10.0.0.231:/oldboyedu/data/wordpress/wp
[root@master231:zuoye]# ll /oldboyedu/data/wordpress/wp
ls: cannot access '/oldboyedu/data/wordpress/wp': No such file or directory
[root@master231:zuoye]# exportfs
/oldboyedu/data/nfs-server   <world>
[root@master231:zuoye]# ll /oldboyedu/data/nfs-server/
total 4
-rw-r--r-- 1 root root 23 Nov 19 16:10 index.html
(2). Cause
The NFS server has no directory matching the path the Pod mounts. /oldboyedu/data/wordpress/wp does not exist, and the server only exports /oldboyedu/data/nfs-server.
(3). Solution
Check which paths the NFS server exports, make sure the path the Pod mounts lies under one of them, and create the directory on the server by hand.
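Whether a requested path is covered by an export can be checked with a simple path comparison. This is a sketch under the assumption that the export list has been copied from the exportfs output; it does not verify that the directory exists on the server, which still has to be checked there.

```python
from pathlib import PurePosixPath


def covered_by_export(requested: str, exports) -> bool:
    """True if `requested` is an exported path or lies below one."""
    req = PurePosixPath(requested)
    for export in exports:
        exp = PurePosixPath(export)
        if req == exp or exp in req.parents:
            return True
    return False


if __name__ == "__main__":
    # Export list as shown by exportfs above.
    exports = ["/oldboyedu/data/nfs-server"]
    print(covered_by_export("/oldboyedu/data/wordpress/wp", exports))
    print(covered_by_export("/oldboyedu/data/nfs-server/wp", exports))
```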
———————————————————————————————————————————————————————————————————————————
无敌小马爱学习