A Guide to 100+ Common K8S Cluster Errors
Author: Yin Zhengjie (尹正杰)
Copyright notice: this is original work; reproduction is prohibited and violators will be held legally liable.
Table of Contents
- Q1. Resource creation conflict (AlreadyExists)
- Q2. Resource name violates naming rules
- Q3. Misspelled field in a resource manifest
- Q4. Invalid image name
- Q5. Image pull error (ErrImagePull)
- Q6. Container fails to start and keeps restarting (CrashLoopBackOff)
- Q7. Pod stuck in Pending: nodeName points to a nonexistent node
- Q8. Environment variable values must be strings
- Q9. Container stuck in ContainerCreating (image still pulling)
- Q10. Label already exists; changing it requires --overwrite
- Q11. Docker client does not trust the Harbor server certificate
- Q12. ErrImageNeverPull: pull policy Never with no local image
- Q13. imagePullPolicy cannot be changed on a running Pod
- Q14. Manifest missing the required containers field
- Q15. selector does not match the template labels
- Q16. Wrong data type: expected array, got map
- Q17. nodePort outside the valid range (30000-32767)
- Q18. nodePort already allocated
- Q19. Namespace not found
- Q20. nodePort may not be used with type ClusterIP
- Q21. YAML-to-JSON parse error (indentation)
- Q22. volumeMounts references an undefined volume name
- Q23. Host cannot mount NFS filesystems
- Q24. Container entrypoint script lacks execute permission
- Q25. NFS server export directory does not exist
- Q26. configMapKeyRef is missing the ConfigMap name
- Q27. Referenced key not found in the ConfigMap
- Q28. Unauthorized to pull from Harbor (missing imagePullSecrets)
- Q29. Cannot create resources in a Terminating namespace
- Q30. CNI base components unhealthy
- Q31. Multi-port Services require port names
- Q32. Hostname cannot be resolved
- Q33. kubectl expose does not support DaemonSet
- Q34. The clusterIP field does not support in-place updates
- Q35. Unsupported restartPolicy value
- Q36. No free host ports, so scheduling fails
- Q37. Insufficient CPU
- Q38. Insufficient memory
- Q39. K8S does not support GPU resource limits by default
- Q40. requests must not exceed limits
- Q41: Taint already exists; modifying it requires overwrite
- Q42. DaemonSet-managed Pods cannot be evicted by drain
- Q43: No node matches the Pod anti-affinity rules
- Q44: Jenkins dependency fontconfig not installed
- Q45. Gitee authentication failure
- Q46. No docker runtime environment
- Q47. No kubectl runtime environment
- Q48. Jenkins lacks the K8S kubeconfig file
- Q49. Unknown resource kind
- Q50. Metrics server component not working
- Q51. HPA target stuck at <unknown>
- Q52. ServiceAccount lacks RBAC permissions
- Q53. PVC cannot bind a PV or StorageClass
- Q54. A PVC in use cannot be deleted
- Q55. Pod cannot find its PVC
- Q56. StorageClass reclaim policy does not support in-place updates
- Q57. StorageClass does not support the archiveOnDelete parameter
- Q58. Unsupported StorageClass reclaim policy value
- Q59. YAML parse error in a helm chart
- Q60. No local helm repositories
- Q61. Wrong API version in a resource manifest
- Q62. Chart version not defined
- Q63. Chart name not defined
- Q64. helm does not trust the Harbor certificate
- Q65. helm Release ownership mismatch
- Q66. Errors caused by admissionWebhooks being enabled by default
- Q67. Ingress cannot find the backend Service
- Q68. Tab characters in YAML
- Q69. Ingress cannot find the Service in the same namespace
- Q70. Could not get lock /var/lib/dpkg/lock-frontend.
- Q71. No kubeconfig file
- Q72. CoreDNS local loop problem
- Q73. fail to check rbd image status with: (executable file not found in $PATH)
- Q74: MountVolume.WaitForAttach failed for volume "data" : fail to check rbd image status with: (exit status 95), rbd output: (did not load config file, using default settings.
- Q75. pod has unbound immediate PersistentVolumeClaims.
- Q76: wrong fs type, bad option, bad superblock on /dev/rbd0, missing codepage ...
- Q77. missing configuration for cluster ID "0f06b0e2-b128-11ef-9a37-4971ded8a98b"
Q1. Resource creation conflict (AlreadyExists)
Error message
[root@master231 pods]# kubectl create -f 01-pods-xiuxian.yaml
Error from server (AlreadyExists): error when creating "01-pods-xiuxian.yaml": pods "oldboyedu-xiuxian-v1" already exists
[root@master231 pods]#
Cause
Within the same namespace, two resources of the same kind cannot have the same name.
Solution
- 1. Rename the new resource;
- 2. Delete the existing resource, then create it again.
Q2. Resource name violates naming rules
Error message
[root@master231 pods]# kubectl create -f 02-pods-xiuxian-hostNetwork.yaml
The Pod "oldboyedu-xiuxian-hostNetwork" is invalid: metadata.name: Invalid value: "oldboyedu-xiuxian-hostNetwork": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
[root@master231 pods]#
Cause
Resource names may not contain Chinese characters, uppercase letters, or other special symbols; the valid characters are lowercase letters, digits, hyphens (-), and dots (.).
Solution
- 1. Change the name to legal characters matching the pattern below:
[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*
Q3. Misspelled field in a resource manifest
Error message
[root@master231 pods]# kubectl create -f 02-pods-xiuxian-hostNetwork.yaml
error: error validating "02-pods-xiuxian-hostNetwork.yaml": error validating data: ValidationError(Pod.spec): unknown field "hostnetwork" in io.k8s.api.core.v1.PodSpec; if you choose to ignore these errors, turn validation off with --validate=false
[root@master231 pods]#
Cause
A field name in the manifest is misspelled: 'unknown field "hostnetwork"' means "hostnetwork" is not a valid field (the correct spelling is hostNetwork). Check the manifest.
Solution
Locate the field named in the error message and correct it.
Q4. Invalid image name
Error message
[root@master231 pods]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
oldboyedu-xiuxian-env 0/1 InvalidImageName 0 11s 10.0.0.233 worker233 <none> <none>
[root@master231 pods]#
[root@master231 pods]#
[root@master231 pods]# kubectl describe pod oldboyedu-xiuxian-env
Name: oldboyedu-xiuxian-env
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 31s default-scheduler Successfully assigned default/oldboyedu-xiuxian-env to worker233
Warning InspectFailed 3s (x4 over 31s) kubelet Failed to apply default image tag "registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/APPS:v1": couldn't parse image reference "registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/APPS:v1": invalid reference format: repository name must be lowercase
Warning Failed 3s (x4 over 31s) kubelet Error: InvalidImageName
[root@master231 pods]#
Cause
The image name is invalid; here the repository portion contains uppercase letters ("APPS"), and repository names must be lowercase.
Solution
Check and correct the image name; the quickest sanity check is to try pulling it by hand with docker pull.
Q5. Image pull error (ErrImagePull)
Error message
[root@master231 pods]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
oldboyedu-xiuxian-env 0/1 ErrImagePull 0 9s 10.0.0.233 worker233 <none> <none>
[root@master231 pods]#
[root@master231 pods]#
[root@master231 pods]# kubectl describe -f 03-pods-xiuxian-env.yaml
Name: oldboyedu-xiuxian-env
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 37s default-scheduler Successfully assigned default/oldboyedu-xiuxian-env to worker233
Normal Pulling 22s (x2 over 37s) kubelet Pulling image "registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/app:v11"
Warning Failed 21s (x2 over 37s) kubelet Failed to pull image "registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/app:v11": rpc error: code = Unknown desc = Error response from daemon: pull access denied for registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/app, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Warning Failed 21s (x2 over 37s) kubelet Error: ErrImagePull
Normal BackOff 9s (x2 over 36s) kubelet Back-off pulling image "registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/app:v11"
Warning Failed 9s (x2 over 36s) kubelet Error: ImagePullBackOff
[root@master231 pods]#
Cause
The image name is well-formed, but the image cannot be pulled.
Solution
- 1. The image name may be wrong and the remote repository does not contain that image;
- 2. The project may be private and require a login; invalid login credentials also produce this error.
Q6. Container fails to start and keeps restarting (CrashLoopBackOff)
Error message
[root@master231 pods]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
oldboyedu-xiuxian-env 0/1 CrashLoopBackOff 2 (28s ago) 51s 10.0.0.233 worker233 <none> <none>
oldboyedu-xiuxian-hostnetwork 1/1 Running 0 102s 10.0.0.233 worker233 <none> <none>
[root@master231 pods]#
[root@master231 pods]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
oldboyedu-xiuxian-env 0/1 Error 3 (34s ago) 57s 10.0.0.233 worker233 <none> <none>
oldboyedu-xiuxian-hostnetwork 1/1 Running 0 108s 10.0.0.233 worker233 <none> <none>
[root@master231 pods]#
[root@master231 pods]# kubectl describe pod oldboyedu-xiuxian-env
Name: oldboyedu-xiuxian-env
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 70s default-scheduler Successfully assigned default/oldboyedu-xiuxian-env to worker233
Normal Pulled 19s (x4 over 70s) kubelet Container image "registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v1" already present on machine
Normal Created 19s (x4 over 70s) kubelet Created container c1
Normal Started 19s (x4 over 70s) kubelet Started container c1
Warning BackOff 15s (x4 over 64s) kubelet Back-off restarting failed container
[root@master231 pods]#
Cause
The container may be failing at startup, possibly due to a port conflict or a similar problem.
Solution
- 1. Start the image manually with docker and observe whether it comes up; if it cannot start there either, the startup itself is broken;
- 2. If it starts fine under docker, check whether it conflicts with the ports of an existing Pod;
- 3. If the container really must be forced to run, consider overriding its startup command via command to work around the port conflict.
Q7. Pod stuck in Pending: nodeName points to a nonexistent node
Error message
[root@master231 pods]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
oldboyedu-xiuxian-nodename 0/1 Pending 0 5s <none> 10.0.0.232 <none> <none>
[root@master231 pods]#
[root@master231 pods]# kubectl describe pod oldboyedu-xiuxian-nodename
Name: oldboyedu-xiuxian-nodename
Namespace: default
Priority: 0
Node: 10.0.0.232/
Labels: <none>
Annotations: <none>
Status: Pending
...
Events: <none>
[root@master231 pods]#
[root@master231 pods]#
[root@master231 pods]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master231 Ready control-plane,master 21h v1.23.17
worker232 Ready <none> 21h v1.23.17
worker233 Ready <none> 21h v1.23.17
[root@master231 pods]#
Cause
The cluster accepted the resource, but it cannot be scheduled because no node named "10.0.0.232" exists.
Solution
List the current nodes and set the nodeName field to an existing worker node.
Q8. Environment variable values must be strings
Error message
[root@master231 case-demo]# kubectl apply -f 01-pods-mysql.yaml
Error from server (BadRequest): error when creating "01-pods-mysql.yaml": Pod in version "v1" cannot be handled as a Pod: json: cannot unmarshal bool into Go struct field EnvVar.spec.containers.env.value of type string
[root@master231 case-demo]#
Cause
Environment variable values must be strings. Literals such as numbers, yes, true, false, and no carry special meaning in YAML, so such values must be wrapped in double quotes ("").
Solution
Find the value under "spec.containers.env.value" that YAML parses as a boolean and wrap it in double quotes.
Q9. Container stuck in ContainerCreating (image still pulling)
Error message
[root@master231 case-demo]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
oldboyedu-mysql 1/1 Running 0 6s 10.0.0.233 worker233 <none> <none>
oldboyedu-wp 0/1 ContainerCreating 0 6s 10.0.0.232 worker232 <none> <none>
[root@master231 case-demo]#
[root@master231 case-demo]#
[root@master231 case-demo]# kubectl describe pod oldboyedu-wp
Name: oldboyedu-wp
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulling 21s kubelet Pulling image "wordpress:latest"
[root@master231 case-demo]#
Cause
The image is still being pulled over the network.
Solution
Wait for the pull to finish, or load the image onto the node ahead of time. If the node already has the image, set an appropriate image pull policy.
Q10. Label already exists; changing it requires --overwrite
Error message
[root@master231 pods]# kubectl get pods --show-labels
NAME READY STATUS RESTARTS AGE LABELS
oldboyedu-xiuxian-v1 1/1 Running 0 2m59s address=shahe,class=linux94,run=oldboyedu-xiuxian-v1,school=oldboyedu
[root@master231 pods]#
[root@master231 pods]#
[root@master231 pods]# kubectl label pods oldboyedu-xiuxian-v1 school=laonanhai
error: 'school' already has a value (oldboyedu), and --overwrite is false
[root@master231 pods]#
Cause
The label key already has a value, so it cannot be changed as-is; modifying it requires the overwrite flag.
Solution
Append "--overwrite" to the command.
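For example, re-running the command from the error message above with the flag appended:
kubectl label pods oldboyedu-xiuxian-v1 school=laonanhai --overwrite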
Q11. Docker client does not trust the Harbor server certificate
Error message
[root@worker232 ~]# docker login -u admin -p 1 harbor.oldboyedu.com
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
Error response from daemon: Get "https://harbor.oldboyedu.com/v2/": x509: certificate signed by unknown authority
[root@worker232 ~]#
Cause
The docker client does not recognize the Harbor server's self-signed certificate; copy the server's certificate files into the docker client's certificate directory.
Solution
1 Create the self-signed certificate directory structure on the docker client (the directory name must match the domain name)
[root@worker232 ~]# mkdir -pv /etc/docker/certs.d/harbor.oldboyedu.com/
mkdir: created directory '/etc/docker/certs.d'
mkdir: created directory '/etc/docker/certs.d/harbor.oldboyedu.com/'
[root@worker232 ~]#
2 Copy the client certificate files over
[root@worker232 ~]# scp 10.0.0.250:/oldboyedu/softwares/harbor/certs/docker-client/* /etc/docker/certs.d/harbor.oldboyedu.com/
The authenticity of host '10.0.0.250 (10.0.0.250)' can't be established.
ED25519 key fingerprint is SHA256:PSNwZBh2LYir/ha4+/eaLbzGbvh+XoiNr2zRcbL49AA.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '10.0.0.250' (ED25519) to the list of known hosts.
root@10.0.0.250's password:
ca.crt 100% 2049 4.3MB/s 00:00
harbor.oldboyedu.com.cert 100% 2147 3.1MB/s 00:00
harbor.oldboyedu.com.key 100% 3272 7.2MB/s 00:00
[root@worker232 ~]#
3 Verify from the docker client
[root@worker232 ~]# echo 10.0.0.250 harbor.oldboyedu.com >> /etc/hosts
[root@worker232 ~]#
[root@worker232 ~]# docker login -u admin -p 1 harbor.oldboyedu.com
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded
[root@worker232 ~]#
Q12. ErrImageNeverPull: pull policy Never with no local image
Error message
[root@master231 pods]# kubectl get pods -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
oldboyedu-xiuxian-imagepullpolicy 0/1 ErrImageNeverPull 0 5s 10.100.2.9 worker233 <none> <none>
oldboyedu-xiuxian-labeles 1/1 Running 0 34m 10.100.2.8 worker233 <none> <none>
[root@master231 pods]#
[root@master231 pods]# kubectl describe pod oldboyedu-xiuxian-imagepullpolicy
Name: oldboyedu-xiuxian-imagepullpolicy
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ErrImageNeverPull 2s (x3 over 15s) kubelet Container image "harbor.oldboyedu.com/oldboyedu-linux/alpine:latest" is not present with pull policy of Never
Warning Failed 2s (x3 over 15s) kubelet Error: ErrImageNeverPull
[root@master231 pods]#
Cause
An image pull policy of Never means the image is never pulled; the container starts only if the image already exists locally.
Solution
- 1. Load the image onto the target node; or
- 2. Change the image pull policy.
Q13. imagePullPolicy cannot be changed on a running Pod
Error message
[root@master231 pods]# kubectl apply -f 09-pods-xiuxian-imagePullPolicy.yaml
The Pod "oldboyedu-xiuxian-imagepullpolicy" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds`, `spec.tolerations` (only additions to existing tolerations) or `spec.terminationGracePeriodSeconds` (allow it to be set to 1 if it was previously negative)
core.PodSpec{
Volumes: {{Name: "kube-api-access-2hpgk", VolumeSource: {Projected: &{Sources: {{ServiceAccountToken: &{ExpirationSeconds: 3607, Path: "token"}}, {ConfigMap: &{LocalObjectReference: {Name: "kube-root-ca.crt"}, Items: {{Key: "ca.crt", Path: "ca.crt"}}}}, {DownwardAPI: &{Items: {{Path: "namespace", FieldRef: &{APIVersion: "v1", FieldPath: "metadata.namespace"}}}}}}, DefaultMode: &420}}}},
InitContainers: nil,
Containers: []core.Container{
{
... // 15 identical fields
TerminationMessagePath: "/dev/termination-log",
TerminationMessagePolicy: "File",
- ImagePullPolicy: "IfNotPresent",
+ ImagePullPolicy: "Never",
SecurityContext: nil,
Stdin: false,
... // 2 identical fields
},
},
EphemeralContainers: nil,
RestartPolicy: "Always",
... // 26 identical fields
}
[root@master231 pods]#
Cause
The imagePullPolicy field does not support modification:
[root@master231 pods]# kubectl explain po.spec.containers.imagePullPolicy
KIND: Pod
VERSION: v1
FIELD: imagePullPolicy <string>
DESCRIPTION:
Image pull policy. One of Always, Never, IfNotPresent. Defaults to Always
if :latest tag is specified, or IfNotPresent otherwise. Cannot be updated.
More info:
https://kubernetes.io/docs/concepts/containers/images#updating-images
[root@master231 pods]#
Solution
Delete the existing Pod and create a new one.
Q14. Manifest missing the required containers field
Error message
[root@master231 replicationcontrollers]# kubectl apply -f 01-rc-xiuxian.yaml
The ReplicationController "oldboyedu-rc-xiuxian" is invalid: spec.template.spec.containers: Required value
[root@master231 replicationcontrollers]#
Cause
The manifest is missing the required "spec.template.spec.containers" field.
Solution
Check the manifest for required fields that were left undefined.
Q15. selector does not match the template labels
Error message
[root@master231 replicationcontrollers]# kubectl apply -f 01-rc-xiuxian.yaml
The ReplicationController "oldboyedu-rc-xiuxian" is invalid: spec.template.metadata.labels: Invalid value: map[string]string{"apps":"v1", "class":"linux94"}: `selector` does not match template `labels`
[root@master231 replicationcontrollers]#
Cause
The label selector (selector) does not match the labels of the Pod template (template).
Solution
Make sure every label in the selector is also present in the template's metadata.labels.
Q16. Wrong data type: expected array, got map
Error message
[root@master231 services]# kubectl apply -f 01-svc-xiuxian.yaml
error: error validating "01-svc-xiuxian.yaml": error validating data: ValidationError(Service.spec.ports): invalid type for io.k8s.api.core.v1.ServiceSpec.ports: got "map", expected "array"; if you choose to ignore these errors, turn validation off with --validate=false
[root@master231 services]#
Cause
Data type mismatch: an array was expected, but a map was provided.
Solution
Fix the manifest; in this case spec.ports must be a YAML list (each port entry prefixed with "-").
Q17. nodePort outside the valid range (30000-32767)
Error message
[root@master231 services]# kubectl apply -f 03-svc-xiuxian-NodePort.yaml
The Service "svc-xiuxain-nodeport" is invalid: spec.ports[0].nodePort: Invalid value: 8080: provided port is not in the valid range. The range of valid ports is 30000-32767
[root@master231 services]#
Cause
The nodePort is out of range; the valid range is 30000-32767 by default.
Solution
- 1. Choose a port within the default valid range, 30000-32767;
- 2. Or change the default range itself on the kube-apiserver (sketch below the link); reference:
https://kubernetes.io/zh-cn/docs/reference/command-line-tools-reference/kube-apiserver/
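A sketch of option 2, assuming a kubeadm cluster where kube-apiserver runs as a static Pod (the widened range is illustrative):
# /etc/kubernetes/manifests/kube-apiserver.yaml
spec:
  containers:
  - command:
    - kube-apiserver
    - --service-node-port-range=3000-50000    # widen the allowed NodePort range
kubelet restarts the apiserver automatically once the static Pod manifest changes.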
Q18. nodePort already allocated
Error message
[root@master231 case-demo]# kubectl apply -f 03-rc-svc.yaml
replicationcontroller/oldboyedu-rc-xiuxian-v2 created
replicationcontroller/oldboyedu-rc-xiuxian-v3 created
Error from server (Invalid): error when creating "03-rc-svc.yaml": Service "svc-xiuxain-v2" is invalid: spec.ports[0].nodePort: Invalid value: 30082: provided port is already allocated
Error from server (Invalid): error when creating "03-rc-svc.yaml": Service "svc-xiuxain-v3" is invalid: spec.ports[0].nodePort: Invalid value: 30083: provided port is already allocated
[root@master231 case-demo]#
Cause
The requested nodePort is invalid because it is already allocated to another Service.
Solution
- 1. Delete the Service that currently holds the port; or
- 2. Pick a different, unused port.
Q19. Namespace not found
Error message
[root@master231 case-demo]# kubectl apply -f 05-rc-svc-wordpress.yaml
Error from server (NotFound): error when creating "05-rc-svc-wordpress.yaml": namespaces "oldboyedu" not found
Error from server (NotFound): error when creating "05-rc-svc-wordpress.yaml": namespaces "oldboyedu" not found
[root@master231 case-demo]#
Cause
The specified namespace does not exist.
Solution
- 1. Create the namespace manually; or
- 2. Use a namespace that exists.
Q20. nodePort may not be used with type ClusterIP
Error message
[root@master231 case-demo]# kubectl apply -f 05-rc-svc-wordpress.yaml
namespace/oldboyedu created
replicationcontroller/oldboyedu-db created
The Service "svc-db" is invalid: spec.ports[0].nodePort: Forbidden: may not be used when `type` is 'ClusterIP'
[root@master231 case-demo]#
Cause
A Service of type ClusterIP may not specify a nodePort, hence the Forbidden error.
Solution
- 1. Remove the nodePort field; or
- 2. If node-level exposure is really needed, switch the Service to type NodePort.
Q21. YAML-to-JSON parse error (indentation)
Error message
[root@master231 replicationcontrollers]# kubectl apply -f 04-rc-volumes-emptyDir-multiple.yaml
error: error parsing 04-rc-volumes-emptyDir-multiple.yaml: error converting YAML to JSON: yaml: line 19: did not find expected '-' indicator
[root@master231 replicationcontrollers]#
Cause
The manifest is converted from YAML to JSON, and that conversion failed; bad indentation is a common trigger.
Solution
Per the error message, check line 19 for the missing "-" indicator; if the content itself looks right, the failure is most likely an indentation problem!
Q22. volumeMounts references an undefined volume name
Error message
[root@master231 replicationcontrollers]# kubectl apply -f 06-rc-volumes-hostPath-localtime.yaml
The ReplicationController "oldboyedu-rc-hostpath-localtime" is invalid: spec.template.spec.containers[0].volumeMounts[0].name: Not found: "data"
[root@master231 replicationcontrollers]#
Cause
A container's volume mount refers to a volume name that is not defined.
Solution
- Make sure the name in volumeMounts matches a volume declared under volumes.
Q23. Host cannot mount NFS filesystems
Error message
[root@harbor250 ~]# mount -t nfs 10.0.0.231:/oldboyedu/data/nfs-server /mnt/
mount: /mnt: bad option; for several filesystems (e.g. nfs, cifs) you might need a /sbin/mount.<type> helper program.
[root@harbor250 ~]#
Cause
The host lacks an NFS mount helper (/sbin/mount.nfs).
Solution
Install the NFS client utilities; on Ubuntu, apt -y install nfs-common is sufficient for mounting (installing nfs-kernel-server pulls the client tools in as well).
Q24. Container entrypoint script lacks execute permission
Error message
[root@master231 case-demo]# kubectl get pods -n devops
NAME READY STATUS RESTARTS AGE
oldboyedu-sonarqube-b4xqv 0/1 RunContainerError 0 (7s ago) 7s
[root@master231 case-demo]#
[root@master231 case-demo]# kubectl -n devops describe pod oldboyedu-sonarqube-b4xqv
Name: oldboyedu-sonarqube-b4xqv
Namespace: devops
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 14s (x2 over 15s) kubelet Container image "harbor.oldboyedu.com/oldboyedu-devops/sonarqube:9.9.7-community" already present on machine
Normal Created 14s (x2 over 15s) kubelet Created container c1
Warning Failed 14s (x2 over 15s) kubelet Error: failed to start container "c1": Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/opt/sonarqube/docker/entrypoint.sh": permission denied: unknown
[root@master231 case-demo]#
Cause
The container cannot execute its entrypoint script.
Solution
Make sure "/opt/sonarqube/docker/entrypoint.sh" inside the image has execute permission.
Q25. NFS server export directory does not exist
Error message
[root@master231:zuoye]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
oldboyedu-wp-ztt26 0/1 ContainerCreating 0 9s <none> worker232 <none> <none>
[root@master231:zuoye]#
[root@master231:zuoye]# kubectl describe pod oldboyedu-wp-ztt26
Name: oldboyedu-wp-ztt26
Namespace: default
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 14s default-scheduler Successfully assigned default/oldboyedu-wp-ztt26 to worker232
Warning FailedMount 6s (x5 over 14s) kubelet MountVolume.SetUp failed for volume "wp-data" : mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs 10.0.0.231:/oldboyedu/data/wordpress/wp /var/lib/kubelet/pods/a26ab131-5cf1-4378-89c1-c47059224177/volumes/kubernetes.io~nfs/wp-data
Output: mount.nfs: access denied by server while mounting 10.0.0.231:/oldboyedu/data/wordpress/wp
[root@master231:zuoye]#
[root@master231:zuoye]# ll /oldboyedu/data/wordpress/wp
ls: cannot access '/oldboyedu/data/wordpress/wp': No such file or directory
[root@master231:zuoye]#
[root@master231:zuoye]# exportfs
/oldboyedu/data/nfs-server
<world>
[root@master231:zuoye]#
[root@master231:zuoye]# ll /oldboyedu/data/nfs-server/
total 4
-rw-r--r-- 1 root root 23 Nov 19 16:10 index.html
[root@master231:zuoye]#
Cause
The directory being mounted does not exist on the NFS server.
Solution
Check the paths the NFS server exports and create the missing directory by hand.
Q26. configMapKeyRef is missing the ConfigMap name
Error message
[root@master231 replicationcontrollers]# kubectl apply -f 08-rc-configmaps-env.yaml
The ReplicationController "oldboyedu-rc-cm-env" is invalid:
* spec.template.spec.containers[0].env[0].valueFrom.configMapKeyRef.name: Invalid value: "": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
* spec.template.spec.containers[0].env[1].valueFrom.configMapKeyRef.name: Invalid value: "": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
[root@master231 replicationcontrollers]#
Cause
The environment variable reference does not specify the name of the ConfigMap.
Solution
Fill in the ConfigMap name under valueFrom.configMapKeyRef.name.
Q27. Referenced key not found in the ConfigMap
Error message
[root@master231 replicationcontrollers]# kubectl describe pod oldboyedu-rc-cm-env-ml8pk
Name: oldboyedu-rc-cm-env-ml8pk
Namespace: default
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 31s default-scheduler Successfully assigned default/oldboyedu-rc-cm-env-ml8pk to worker233
Normal Pulled 3s (x4 over 30s) kubelet Container image "registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v1" already present on machine
Warning Failed 3s (x4 over 30s) kubelet Error: couldn't find key SchooL in ConfigMap default/oldboyedu-linux94
[root@master231 replicationcontrollers]#
Cause
The environment variable reference names a key that does not exist in the ConfigMap (note the casing: SchooL).
Solution
Check that the key referenced by the Pod matches a key that actually exists in the ConfigMap.
Q28. Unauthorized to pull from Harbor (missing imagePullSecrets)
Error message
[root@master231 case-demo]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
oldboyedu-rc-harbor-2kt2t 0/1 ErrImagePull 0 2s 10.100.2.116 worker233 <none> <none>
oldboyedu-rc-harbor-crhgh 0/1 ErrImagePull 0 2s 10.100.1.61 worker232 <none> <none>
oldboyedu-rc-harbor-q57ff 0/1 ErrImagePull 0 2s 10.100.2.117 worker233 <none> <none>
[root@master231 case-demo]#
[root@master231 case-demo]# kubectl describe pod oldboyedu-rc-harbor-q57ff
Name: oldboyedu-rc-harbor-q57ff
Namespace: default
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 14s default-scheduler Successfully assigned default/oldboyedu-rc-harbor-q57ff to worker233
Normal Pulling 13s kubelet Pulling image "harbor.oldboyedu.com/oldboyedu-linux/alpine:latest"
Warning Failed 13s kubelet Failed to pull image "harbor.oldboyedu.com/oldboyedu-linux/alpine:latest": rpc error: code = Unknown desc = Error response from daemon: unauthorized: unauthorized to access repository: oldboyedu-linux/alpine, action: pull: unauthorized to access repository: oldboyedu-linux/alpine, action: pull
Warning Failed 13s kubelet Error: ErrImagePull
Normal BackOff 12s (x2 over 13s) kubelet Back-off pulling image "harbor.oldboyedu.com/oldboyedu-linux/alpine:latest"
Warning Failed 12s (x2 over 13s) kubelet Error: ImagePullBackOff
[root@master231 case-demo]#
Cause
The image pull failed, most likely because the pull from a private repository is unauthenticated.
Solution
- Create a secret holding the registry credentials and reference it from the Pod, as sketched below.
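A minimal sketch, reusing the Harbor credentials shown in Q11 (the secret name harbor-secret is illustrative):
kubectl create secret docker-registry harbor-secret \
  --docker-server=harbor.oldboyedu.com \
  --docker-username=admin \
  --docker-password=1
Then reference it from the Pod template:
spec:
  imagePullSecrets:
  - name: harbor-secret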
Q29. Cannot create resources in a Terminating namespace
Error message
[root@master231 case-demo]# kubectl apply -f 18-ns-rc-svc-jenkins.yaml
Warning: Detected changes to resource devops which is currently being deleted.
namespace/devops unchanged
Error from server (Forbidden): error when creating "18-ns-rc-svc-jenkins.yaml": replicationcontrollers "oldboyedu-jenkins" is forbidden: unable to create new content in namespace devops because it is being terminated
Error from server (Forbidden): error when creating "18-ns-rc-svc-jenkins.yaml": services "svc-jenkins" is forbidden: unable to create new content in namespace devops because it is being terminated
[root@master231 case-demo]#
[root@master231 case-demo]#
[root@master231 case-demo]# kubectl get ns
NAME STATUS AGE
default Active 5d
devops Terminating 99s
kube-flannel Active 4d8h
kube-node-lease Active 5d
kube-public Active 5d
kube-system Active 5d
[root@master231 case-demo]#
Cause
The namespace is still being deleted (Terminating), and new resources were created in it before the deletion finished.
Solution
Wait for the deletion to complete. If it stays in Terminating for a long time, consider clearing the leftover state, e.g. by stripping the namespace's finalizers (sketch below) or deleting the records from etcd.
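A hedged sketch of force-finishing the deletion through the namespace finalize subresource (requires jq; this skips normal cleanup, so use it only on a namespace that is genuinely stuck):
kubectl get ns devops -o json \
  | jq 'del(.spec.finalizers[])' \
  | kubectl replace --raw /api/v1/namespaces/devops/finalize -f -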
Q30. CNI base components unhealthy
Error message
[root@master231:huidu]# kubectl apply -f metallb-ip-pool.yaml
Error from server (InternalError): error when creating "metallb-ip-pool.yaml": Internal error occurred: failed calling webhook "ipaddresspoolvalidationwebhook.metallb.io": failed to call webhook: Post "https://metallb-webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s": dial tcp 10.200.75.222:443: i/o timeout
Error from server (InternalError): error when creating "metallb-ip-pool.yaml": Internal error occurred: failed calling webhook "l2advertisementvalidationwebhook.metallb.io": failed to call webhook: Post "https://metallb-webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-l2advertisement?timeout=10s": context deadline exceeded
[root@master231:huidu]#
Cause
An internal server error; the message points to a connection timeout. Since "metallb-webhook-service.metallb-system.svc" was resolved to "10.200.75.222", CoreDNS is evidently working and can be ruled out as the cause.
Check whether the CNI base components are healthy.
Solution
Inspect the CNI components, e.g. following: "https://www.cnblogs.com/yinzhengjie/p/18353027#八k8s主机巡检流程"
Q31. Multi-port Services require port names
Error message
[root@master231 endpoints]# kubectl apply -f 01-ep-harbor.yaml
endpoints/oldboyedu-harbor created
The Service "oldboyedu-harbor" is invalid:
* spec.ports[0].name: Required value
* spec.ports[1].name: Required value
[root@master231 endpoints]#
Cause
When a Service maps multiple ports, each port must be named so the entries can be told apart. The names are arbitrary as long as they are unique.
Solution
Add a name to each port entry of the Service, as sketched below.
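A minimal sketch of named ports (the names and numbers are illustrative for a Harbor-style Service):
spec:
  ports:
  - name: http
    port: 80
  - name: https
    port: 443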
Q32. Hostname cannot be resolved
Error message
[root@harbor250 ~]# docker pull harbor.oldboyedu.com/oldboyedu-db/mysql:8.0.36-oracle
Error response from daemon: Get "https://harbor.oldboyedu.com/v2/": dial tcp: lookup harbor.oldboyedu.com on 127.0.0.53:53: no such host
[root@harbor250 ~]#
Cause
The domain name cannot be resolved locally.
Solution
Add a hosts-file entry for it; an example follows.
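For example, reusing the mapping from Q11:
echo "10.0.0.250 harbor.oldboyedu.com" >> /etc/hosts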
Q33. kubectl expose does not support DaemonSet
Error message
[root@master231 kubernetes]# kubectl expose ds ds-xiuxian --port=80 --target-port=80 --type=ClusterIP
error: cannot expose a DaemonSet.apps
[root@master231 kubernetes]#
Cause
kubectl expose does not support DaemonSet resources.
Solution
Expose the Pods another way: write a Service manifest whose selector matches the DaemonSet's Pod labels, as sketched below.
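A minimal sketch, assuming the DaemonSet's Pod template carries the label apps: xiuxian (the label, name, and ports are illustrative):
apiVersion: v1
kind: Service
metadata:
  name: svc-ds-xiuxian
spec:
  type: ClusterIP
  selector:
    apps: xiuxian       # must match the DaemonSet's Pod template labels
  ports:
  - port: 80
    targetPort: 80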
Q34. The clusterIP field does not support in-place updates
Error message
[root@master231 services]# kubectl apply -f 06-sessionAffinity.yaml
The Service "oldboyedu-xiuxian" is invalid: spec.clusterIPs[0]: Invalid value: []string{"10.200.0.88"}: may not change once set
[root@master231 services]#
Cause
Once set, a Service's clusterIP field cannot be changed.
Solution
If the Service address really must change, delete the Service and recreate it.
Q35. Unsupported restartPolicy value
Error message
[root@master231 replicationcontrollers]# kubectl apply -f 12-rc-xiuxian-restartPolicy.yaml
The ReplicationController "oldboyedu-rc-restartpolicy" is invalid: spec.template.spec.restartPolicy: Unsupported value: "Never": supported values: "Always"
[root@master231 replicationcontrollers]#
Cause
Per "Unsupported value", the restart policy Never is not accepted; this error comes from an rc resource.
Solution
Although the official docs list three restart policies, the rc, rs, and deploy controllers accept only Always; OnFailure and Never are rejected. If those policies are really needed, use a bare Pod or a Job instead.
[root@master231 replicationcontrollers]# kubectl explain rc.spec.template.spec.restartPolicy
KIND: ReplicationController
VERSION: v1
FIELD: restartPolicy <string>
DESCRIPTION:
Restart policy for all containers within the pod. One of Always, OnFailure,
Never. Default to Always. More info:
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy
[root@master231 replicationcontrollers]#
Q36. No free host ports, so scheduling fails
Error message
[root@master231 scheduler-pods]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deploy-scheduler-hostnetwork-78f5cfb654-6wwdj 1/1 Running 0 30s 10.0.0.233 worker233 <none> <none>
deploy-scheduler-hostnetwork-78f5cfb654-j7qnr 1/1 Running 0 30s 10.0.0.232 worker232 <none> <none>
deploy-scheduler-hostnetwork-78f5cfb654-l2624 0/1 Pending 0 30s <none> <none> <none> <none>
[root@master231 scheduler-pods]#
[root@master231 scheduler-pods]# kubectl describe pod deploy-scheduler-hostnetwork-78f5cfb654-l2624
Name: deploy-scheduler-hostnetwork-78f5cfb654-l2624
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 39s default-scheduler 0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't have free ports for the requested pod ports.
Warning FailedScheduling 19s (x1 over 37s) default-scheduler 0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't have free ports for the requested pod ports.
[root@master231 scheduler-pods]#
Cause
No node in the cluster currently has the requested host port free.
Solution
- Reduce the replica count
- Add nodes
Q37. Insufficient CPU
Error message
[root@master231 scheduler-pods]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deploy-scheduler-resources-68586785c4-dp5vn 0/1 Pending 0 4s <none> <none> <none> <none>
deploy-scheduler-resources-68586785c4-rkcdp 0/1 Pending 0 4s <none> <none> <none> <none>
deploy-scheduler-resources-68586785c4-zbkq9 0/1 Pending 0 4s <none> <none> <none> <none>
[root@master231 scheduler-pods]#
[root@master231 scheduler-pods]#
[root@master231 scheduler-pods]# kubectl describe pod deploy-scheduler-resources-68586785c4-dp5vn
Name: deploy-scheduler-resources-68586785c4-dp5vn
Namespace: default
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 12s default-scheduler 0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 Insufficient cpu.
[root@master231 scheduler-pods]#
Cause
The cluster nodes do not have enough allocatable CPU.
Solution
- Lower the Pod's requested resources.
- Add CPU capacity to the cluster.
Q38. Insufficient memory
Error message
[root@master231 scheduler-pods]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deploy-scheduler-resources-79d77c6758-4xzlf 0/1 Pending 0 3s <none> <none> <none> <none>
deploy-scheduler-resources-79d77c6758-9pghn 0/1 Pending 0 3s <none> <none> <none> <none>
deploy-scheduler-resources-79d77c6758-r6sdg 0/1 Pending 0 3s <none> <none> <none> <none>
[root@master231 scheduler-pods]#
[root@master231 scheduler-pods]#
[root@master231 scheduler-pods]# kubectl describe pod deploy-scheduler-resources-79d77c6758-4xzlf
Name: deploy-scheduler-resources-79d77c6758-4xzlf
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 10s default-scheduler 0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 Insufficient memory.
[root@master231 scheduler-pods]#
Cause
The cluster nodes do not have enough allocatable memory.
Solution
- Lower the Pod's requested resources.
- Add memory capacity to the cluster.
Q39. K8S does not support GPU resource limits by default
Error message
[root@master231 scheduler-pods]# kubectl apply -f 04-scheduler-resources.yaml
The Deployment "deploy-scheduler-resources" is invalid:
* spec.template.spec.containers[0].resources.requests[gpu]: Invalid value: "gpu": must be a standard resource type or fully qualified
* spec.template.spec.containers[0].resources.requests[gpu]: Invalid value: "gpu": must be a standard resource for containers
[root@master231 scheduler-pods]#
Cause
A K8S cluster has no built-in "gpu" resource type.
Solution
- Install a third-party device plugin (e.g. the NVIDIA device plugin), which registers an extended resource such as nvidia.com/gpu.
Q40. requests must not exceed limits
Error message
[root@master231 scheduler-pods]# kubectl apply -f 04-scheduler-resources.yaml
The Deployment "deploy-scheduler-resources" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: "2Gi": must be less than or equal to memory limit
[root@master231 scheduler-pods]#
Cause
A container's requests may not exceed its limits.
Solution
Set requests less than or equal to limits, as sketched below.
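A minimal sketch of a consistent resources block (the values are illustrative):
resources:
  requests:
    cpu: 500m
    memory: 1Gi     # must be <= limits.memory
  limits:
    cpu: "1"
    memory: 2Gi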
Q41: Taint already exists; modifying it requires overwrite
Error message
[root@master231 ~]# kubectl taint node worker233 school=laonanhai:NoSchedule
error: node worker233 already has school taint(s) with same effect(s) and --overwrite is false
[root@master231 ~]#
Cause
When adding a taint whose key and effect already exist, the values conflict, so "--overwrite" is required to replace it.
Solution
Overwrite the taint with "--overwrite".
Q42. DaemonSet-managed Pods cannot be evicted by drain
Error message
[root@master231 scheduler-pods]# kubectl drain worker232
node/worker232 cordoned
error: unable to drain node "worker232" due to error:cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-flannel/kube-flannel-ds-8x2nc, kube-system/kube-proxy-766pl, metallb-system/speaker-87q22, continuing command...
There are pending nodes to be drained:
worker232
cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-flannel/kube-flannel-ds-8x2nc, kube-system/kube-proxy-766pl, metallb-system/speaker-87q22
[root@master231 scheduler-pods]#
Cause
Pods created by a DaemonSet cannot be evicted, so a drain must skip them via the --ignore-daemonsets option.
Solution
Drain with "--ignore-daemonsets".
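For example, for the node from the error message above:
kubectl drain worker232 --ignore-daemonsets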
Q43: No node matches the Pod anti-affinity rules
Error message
[root@master231 scheduler-pods]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deploy-scheduler-podantiaffinity-77b58fc685-qz2fz 1/1 Running 0 29s 10.100.2.13 worker233 <none> <none>
deploy-scheduler-podantiaffinity-77b58fc685-r6brl 0/1 Pending 0 8s <none> <none> <none> <none>
deploy-scheduler-podantiaffinity-77b58fc685-smlbj 1/1 Running 0 29s 10.100.1.21 worker232 <none> <none>
deploy-scheduler-podantiaffinity-77b58fc685-trbhb 0/1 Pending 0 8s <none> <none> <none> <none>
deploy-scheduler-podantiaffinity-77b58fc685-w7sd4 1/1 Running 0 29s 10.100.0.24 master231 <none> <none>
[root@master231 scheduler-pods]#
[root@master231 scheduler-pods]#
[root@master231 scheduler-pods]# kubectl describe pod deploy-scheduler-podantiaffinity-77b58fc685-r6brl
Name: deploy-scheduler-podantiaffinity-77b58fc685-r6brl
Namespace: default
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 17s default-scheduler 0/3 nodes are available: 3 node(s) didn't match pod anti-affinity rules.
[root@master231 scheduler-pods]#
Cause
No node in the cluster satisfies the Pod anti-affinity rules.
Solution
Check whether any node can satisfy the rules, or relax the Pod's scheduling policy.
Q44: Jenkins dependency fontconfig not installed
Error message
java.lang.NullPointerException: Cannot load from short array because "sun.awt.FontConfiguration.head" is null
Cause
The fontconfig package, which provides the font tooling Jenkins needs, is not installed.
Solution
apt-get install fontconfig
Q45. Gitee authentication failure
Error message
Failed to connect to repository: Command "git ls-remote -h -- https://gitee.com/jasonyin2020/oldboyedu-linux94-yiliao.git HEAD" returned status code 128:
stdout:
stderr: remote: [session-3a49c78b] Username for 'https: Incorrect username or password (access token)
fatal: Authentication failed for 'https://gitee.com/jasonyin2020/oldboyedu-linux94-yiliao.git/'
Cause
Jenkins cannot pull the code from gitee.
Solution
- 1. Configure credentials, or set up key-based (password-less) access;
- 2. Make the project public;
- 3. Check that the password is correct; if forgotten, reset it.
Q46. No docker runtime environment
Error message
[oldboyedu-linux94-yiliao] $ /bin/sh -xe /tmp/jenkins16273452721317027695.sh
+ docker build -t harbor.oldboyedu.com/oldboyedu-jenkins/yiliao:v1 .
/tmp/jenkins16273452721317027695.sh: 2: docker: not found
Build step 'Execute shell' marked build as failure
Finished: FAILURE
Cause
The docker command is not available to Jenkins.
Solution
Install a docker environment on the Jenkins node.
Q47. No kubectl runtime environment
Error message
+ kubectl set image deploy deploy-yiliao c1=harbor.oldboyedu.com/oldboyedu-jenkins/yiliao:v2
/tmp/jenkins9730372368520249844.sh: 5: kubectl: not found
Build step 'Execute shell' marked build as failure
Cause
kubectl is the client command-line tool for managing a K8S cluster; it is not installed on the Jenkins node.
Solution
Install the kubectl tool on the Jenkins node.
Q48. Jenkins lacks the K8S kubeconfig file
Error message
+ kubectl set image deploy deploy-yiliao c1=harbor.oldboyedu.com/oldboyedu-jenkins/yiliao:v2
error: the server doesn't have a resource type "deploy"
Build step 'Execute shell' marked build as failure
Cause
The Jenkins node is missing the K8S cluster's kubeconfig file, so kubectl cannot resolve the "deploy" resource type against the API server.
Solution
Copy the K8S cluster's kubeconfig file to the Jenkins node.
Q49. Unknown resource kind
Error message
[root@master231 deployments]# kubectl apply -f 11-deploy-readinessProbe-tcpSocket.yaml
deployment.apps/deploy-livenessprobe-readinessprobe-tcpsocket created
service/svc-xiuxain created
error: unable to recognize "11-deploy-readinessProbe-tcpSocket.yaml": no matches for kind "configMap" in version "v1"
[root@master231 deployments]#
Cause
The resource kind in the manifest is wrong: it says "configMap", but the kind is "ConfigMap".
Solution
Check the kind field against the cluster's known types; list them with kubectl api-resources.
"kubectl explain cm" also shows the kind the cm abbreviation maps to.
Q50. Metrics server component not working
Error message
[root@master231 deployments]# kubectl top node
error: Metrics API not available
[root@master231 deployments]#
Cause
Per the error, the Metrics server component is not working properly.
Solution
- 1. Check whether the Metrics server component is installed;
- 2. Check the cluster's network plugin, which can also keep metrics server from working.
Q51. HPA target stuck at <unknown>
Error message
[root@master231 02-metrics-server]# kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
deploy-stress Deployment/deploy-stress <unknown>/95% 2 5 0 3s
[root@master231 02-metrics-server]#
Cause
If TARGETS stays at "<unknown>", the HPA is not receiving metrics for its target; metrics-server may also be unhealthy.
Solution
- 1. Check that the metrics-server version matches the K8S version and that the component works (kubectl top);
- 2. Check whether the monitored target (Deployment/deploy-stress) has been removed.
Q52. ServiceAccount lacks RBAC permissions
Error message
replicasets.apps is forbidden: User "system:serviceaccount:kubernetes-dashboard:kubernetes-dashboard" cannot list resource "replicasets" in API group "apps" in the namespace "default"
deployments.apps is forbidden: User "system:serviceaccount:kubernetes-dashboard:kubernetes-dashboard" cannot list resource "deployments" in API group "apps" in the namespace "default"
statefulsets.apps is forbidden: User "system:serviceaccount:kubernetes-dashboard:kubernetes-dashboard" cannot list resource "statefulsets" in API group "apps" in the namespace "default"
Cause
The ServiceAccount has no permission to access these cluster resources.
Solution
Grant the account RBAC authorization, as sketched below.
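A hedged sketch that grants the dashboard's ServiceAccount cluster-wide rights (the binding name is illustrative, and cluster-admin is the bluntest possible grant; scope the role down for production):
kubectl create clusterrolebinding kubernetes-dashboard-admin \
  --clusterrole=cluster-admin \
  --serviceaccount=kubernetes-dashboard:kubernetes-dashboard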
Q53. PVC cannot bind a PV or StorageClass
Error message
[root@master231 persistentvolumeclaims]# kubectl get pvc oldboyedu-linux-pvc-2
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
oldboyedu-linux-pvc-2 Pending 30s
[root@master231 persistentvolumeclaims]#
[root@master231 persistentvolumeclaims]# kubectl describe pvc oldboyedu-linux-pvc-2
Name: oldboyedu-linux-pvc-2
Namespace: default
StorageClass:
Status: Pending
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal FailedBinding 13s (x3 over 32s) persistentvolume-controller no persistent volumes available for this claim and no storage class is set
[root@master231 persistentvolumeclaims]#
Cause
The PVC cannot bind a PV or StorageClass; nothing in the cluster currently satisfies its request.
Solution
- Lower the PVC's storage request (not recommended)
- Create a matching PV or StorageClass.
Q54. A PVC in use cannot be deleted
Error message
[root@master231 ~]# kubectl get pvc oldboyedu-linux-pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
oldboyedu-linux-pvc Terminating oldboyedu-linux-pv02 5Gi RWX 55m
[root@master231 ~]#
[root@master231 ~]# kubectl describe pvc oldboyedu-linux-pvc
Name: oldboyedu-linux-pvc
Namespace: default
StorageClass:
Status: Terminating (lasts 5m2s)
Volume: oldboyedu-linux-pv02
Labels: <none>
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 5Gi
Access Modes: RWX
VolumeMode: Filesystem
Used By: deploy-xiuxian-pvc-6564dd4856-t52rq
Events: <none>
[root@master231 ~]#
Cause
"Used By" shows the PVC is still in use by the Pod named "deploy-xiuxian-pvc-6564dd4856-t52rq".
Solution
Remove the Pod's reference to the PVC (e.g. delete the workload using it); the deletion then completes.
Q55. Pod cannot find its PVC
Error message
[root@master231 deployments]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deploy-xiuxian-pvc-6564dd4856-86kds 0/1 Pending 0 41s <none> <none> <none> <none>
[root@master231 deployments]#
[root@master231 deployments]# kubectl describe pod deploy-xiuxian-pvc-6564dd4856-86kds
Name: deploy-xiuxian-pvc-6564dd4856-86kds
Namespace: default
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 42s default-scheduler 0/3 nodes are available: 3 persistentvolumeclaim "oldboyedu-linux-pvc" not found.
[root@master231 deployments]#
Cause
The Pod references a PVC that does not exist.
Solution
- 1. Create the PVC manually
- 2. Or point the Pod at an existing PVC
Q56. StorageClass reclaim policy does not support in-place updates
Error message
[root@master231 storageclasses]# kubectl apply -f sc.yaml
storageclass.storage.k8s.io/oldboyedu-sc-xixi unchanged
The StorageClass "oldboyedu-sc-haha" is invalid: reclaimPolicy: Forbidden: updates to reclaimPolicy are forbidden.
[root@master231 storageclasses]#
Cause
A StorageClass's reclaimPolicy cannot be updated in place.
Solution
Delete and recreate the StorageClass.
Q57. StorageClass does not support the archiveOnDelete parameter
Error message
[root@master231 persistentvolumeclaims]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
oldboyedu-linux-pvc Bound pvc-f47d1e5b-a2f1-463f-b06c-7940add76104 3Gi RWX nfs-csi 3h12m
pvc-linux94 Pending oldboyedu-sc-haha 7s
[root@master231 persistentvolumeclaims]#
[root@master231 persistentvolumeclaims]#
[root@master231 persistentvolumeclaims]# kubectl describe pvc pvc-linux94
Name: pvc-linux94
Namespace: default
StorageClass: oldboyedu-sc-haha
Status: Pending
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ExternalProvisioning 8s (x2 over 12s) persistentvolume-controller waiting for a volume to be created, either by external provisioner "nfs.csi.k8s.io" or manually created by system administrator
Normal Provisioning 5s (x4 over 12s) nfs.csi.k8s.io_worker233_d8fe1cb1-595b-4208-95c1-8599785034ad External provisioner is provisioning volume for claim "default/pvc-linux94"
Warning ProvisioningFailed 5s (x4 over 12s) nfs.csi.k8s.io_worker233_d8fe1cb1-595b-4208-95c1-8599785034ad failed to provision volume with StorageClass "oldboyedu-sc-haha": rpc error: code = InvalidArgument desc = invalid parameter "archiveOnDelete" in storage class
[root@master231 persistentvolumeclaims]#
Cause
The newer nfs csi driver (4.9.0) no longer supports the "archiveOnDelete" parameter; see the official docs.
Solution
Checking the docs linked below shows the "archiveOnDelete" parameter has been renamed to "onDelete".
Reference:
https://github.com/kubernetes-csi/csi-driver-nfs/blob/master/docs/driver-parameters.md#storage-class-usage-dynamic-provisioning
Q58. Unsupported StorageClass reclaim policy value
Error message
[root@master231 storageclasses]# kubectl apply -f sc.yaml
storageclass.storage.k8s.io/oldboyedu-sc-xixi created
The StorageClass "oldboyedu-sc-haha" is invalid: reclaimPolicy: Unsupported value: "archive": supported values: "Delete", "Retain"
[root@master231 storageclasses]#
Cause
"archive" is not a valid reclaimPolicy; the StorageClass API accepts only "Delete" and "Retain" (archive-on-delete behavior is configured through the driver's onDelete parameter instead, see Q57).
Solution
Change the reclaim policy to one of the supported values per the message.
Q59. YAML parse error in a helm chart
Error message
[root@master231 04-helm]# helm -n oldboyedu-helm install xiuxian oldboyedu-xiuxian-quote
Error: INSTALLATION FAILED: 1 error occurred:
* Deployment in version "v1" cannot be handled as a Deployment: json: cannot unmarshal number into Go struct field LabelSelector.spec.selector.matchLabels of type string
[root@master231 04-helm]#
Cause
For a deploy resource, "spec.selector.matchLabels" values must be strings; a non-string (here a bare number) was rendered instead.
Solution
With helm, wrap the value using the quote or squote template functions, as sketched below.
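A minimal sketch inside a chart template (the value name .Values.versionLabel is illustrative):
spec:
  selector:
    matchLabels:
      apps: {{ .Values.versionLabel | quote }}    # quote renders the value as a string even if it is numeric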
Q60. No local helm repositories
Error message
[root@master231 04-helm]# helm repo list
Error: no repositories to show
[root@master231 04-helm]#
Cause
No helm repositories are configured locally; add a third-party repository yourself.
Solution
Add a well-known, trusted repository; an example follows.
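For example (the repository choice is illustrative):
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update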
Q61. Wrong API version in a resource manifest
Error message
[root@master231 04-helm]# helm install es-exporter elasticsearch-exporter
Error: INSTALLATION FAILED: unable to build kubernetes objects from release manifest: resource mapping not found for name: "es-exporter-elasticsearch-exporter" namespace: "" from "": no matches for kind "Deployment" in version "apps/v1beta2"
ensure CRDs are installed first
[root@master231 04-helm]#
Cause
The kind's apiVersion is wrong: apps/v1beta2 no longer exists; Deployment now lives in apps/v1.
Solution
Update the manifest's apiVersion.
Q62. Chart version not defined
Error message
[root@master231 13-oldboyedu-xiuxian-package]# helm package .
Error: validation: chart.metadata.version is required
[root@master231 13-oldboyedu-xiuxian-package]#
Cause
The Chart's version information is not defined.
Solution
Check the Chart's Chart.yaml and make sure the version field is set.
Q63. Chart name not defined
Error message
[root@master231 13-oldboyedu-xiuxian-package]# helm package .
Error: validation: chart.metadata.name is required
[root@master231 13-oldboyedu-xiuxian-package]#
Cause
The Chart's name information is not defined.
Solution
Check the Chart's Chart.yaml and make sure the name field is set.
Q64. helm does not trust the Harbor certificate
Error message
[root@master231 13-oldboyedu-xiuxian-package]# helm push oldboyedu-apps-v1.tgz oci://harbor.oldboyedu.com/oldboyedu-helm
Error: failed to do request: Head "https://harbor.oldboyedu.com/v2/oldboyedu-helm/oldboyedu-apps/blobs/sha256:846795cdbf1cb14ec33d7c87b50795e59b17347e218c077d3fad66e8321f46c1": tls: failed to verify certificate: x509: certificate signed by unknown authority
[root@master231 13-oldboyedu-xiuxian-package]#
Cause
helm does not recognize the Harbor certificate.
Solution
Check helm's help output for the certificate-related flags. Example:
[root@master231 13-oldboyedu-xiuxian-package]# helm push oldboyedu-apps-v1.tgz oci://harbor.oldboyedu.com/oldboyedu-helm --ca-file /etc/docker/certs.d/harbor.oldboyedu.com/ca.crt --cert-file /etc/docker/certs.d/harbor.oldboyedu.com/harbor.oldboyedu.com.cert
Pushed: harbor.oldboyedu.com/oldboyedu-helm/oldboyedu-apps:v1
Digest: sha256:34a8cd21c9bc7a3c6361aa13768e3a1d5780ef7d1e64617c8b7fda4fb3d040dc
[root@master231 13-oldboyedu-xiuxian-package]#
Q65. helm Release ownership mismatch
Error message
[root@master231 kubeapps-12.2.10]# helm -n oldboyedu-helm install oldboyedu-kubeapps kubeapps
Error: INSTALLATION FAILED: Unable to continue with install: AppRepository "bitnami" in namespace "oldboyedu-helm" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-name" must equal "oldboyedu-kubeapps": current value is "myapps"
[root@master231 kubeapps-12.2.10]#
Cause
Most likely leftover data in etcd from a previous kubeapps deployment that was not fully purged.
Solution
- 1. Wait a while and retry;
- 2. Or reinstall under the Release name mentioned in the error;
- 3. Or test in a fresh namespace.
Q66. Errors caused by admissionWebhooks being enabled by default
Error message
[root@master231 ingresses]# kubectl apply -f 02-ingress-xiuxian.yaml
Error from server (InternalError): error when creating "02-ingress-xiuxian.yaml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": failed to call webhook: Post "https://myingress-ingress-nginx-controller-admission.yinzhengjie-ingress.svc:443/networking/v1/ingresses?timeout=10s": x509: certificate is not valid for any names, but wanted to match myingress-ingress-nginx-controller-admission.yinzhengjie-ingress.svc
[root@master231 ingresses]#
Cause
An error caused by admissionWebhooks, which are enabled by default.
Solution
Disable admissionWebhooks, e.g. as sketched below.
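A hedged sketch for an ingress-nginx deployed via helm (the chart value path follows the upstream ingress-nginx chart; the release and namespace names follow the error message above):
helm -n yinzhengjie-ingress upgrade myingress ingress-nginx/ingress-nginx \
  --set controller.admissionWebhooks.enabled=false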
Q67. Ingress cannot find the backend Service
Error message
[root@master231 ingresses]# kubectl describe -f 03-ingress-redirect.yaml
Name: apps-redirect
Labels: <none>
Namespace: default
Address: 10.0.0.150
Default backend: default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
Rules:
Host Path Backends
---- ---- --------
blog.oldboyedu.com
/ svc-apps:80 (<error: endpoints "svc-apps" not found>)
Annotations: nginx.ingress.kubernetes.io/permanent-redirect: https://www.cnblogs.com/yinzhengjie
nginx.ingress.kubernetes.io/permanent-redirect-code: 308
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Sync 3s (x2 over 7s) nginx-ingress-controller Scheduled for sync
Normal Sync 3s (x2 over 7s) nginx-ingress-controller Scheduled for sync
[root@master231 ingresses]#
Cause
Per the hint "error: endpoints "svc-apps" not found", the Service behind the ingress does not exist.
Solution
Check whether the Service exists in the environment and correct the configuration.
Q68. Tab characters in YAML
Error message
[root@master231 04-kuboard]# docker-compose up -d
parsing /root/cloud-computing-stack/linux94/kubernetes/projects/04-kuboard/docker-compose.yaml: yaml: line 7: found a tab character that violates indentation
[root@master231 04-kuboard]#
Cause
A tab character violates YAML indentation rules; "cat -A" can make it visible.
Solution
Verify with "cat -A" (tabs show up as ^I) and replace the offending tab with spaces per the line number in the error.
Q69. Ingress cannot find the Service in the same namespace
Error message
[root@master231 05-prometheus]# kubectl describe -f 01-ingress-prometheus.yaml
Name: ing-prometheus-grafana
Labels: <none>
Namespace: default
Address:
Default backend: default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
Rules:
Host Path Backends
---- ---- --------
grafana.oldboyedu.com
/ grafana:3000 (<error: endpoints "grafana" not found>)
prom.oldboyedu.com
/ prometheus-k8s:9090 (<error: endpoints "prometheus-k8s" not found>)
Annotations: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Sync 4s nginx-ingress-controller Scheduled for sync
Normal Sync 4s nginx-ingress-controller Scheduled for sync
[root@master231 05-prometheus]#
Cause
The Ingress cannot find the Services in its own namespace (an Ingress can only reference same-namespace Services).
Solution:
Put the Ingress and its Services in the same namespace.
Q70. Could not get lock /var/lib/dpkg/lock-frontend.
Error message
[root@node-exporter41 ~]# apt -y install ipvsadm ipset sysstat conntrack
Waiting for cache lock: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 2829 (unattended-Waiting for cache lock: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 2829 (unattended-upgr)
Cause
dpkg is locked by another package operation (here the unattended-upgrades process, PID 2829).
Solution
Wait for that process to finish, or terminate it with "kill -9 2829".
Q71. No kubeconfig file
Error message
[root@node-exporter42 ~]# kubectl get nodes
E1202 16:02:28.074900 13882 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp 127.0.0.1:8080: connect: connection refused"
E1202 16:02:28.076566 13882 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp 127.0.0.1:8080: connect: connection refused"
E1202 16:02:28.077953 13882 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp 127.0.0.1:8080: connect: connection refused"
E1202 16:02:28.079502 13882 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp 127.0.0.1:8080: connect: connection refused"
E1202 16:02:28.080963 13882 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp 127.0.0.1:8080: connect: connection refused"
The connection to the server localhost:8080 was refused - did you specify the right host or port?
[root@node-exporter42 ~]#
Cause
The error is caused by a missing kubeconfig file, so kubectl falls back to localhost:8080.
Solution
Configure a kubeconfig as covered in the K8S kubeconfig course notes.
Q72. CoreDNS local loop problem
Error message
[FATAL] plugin/loop: Loop (127.0.0.1:36030 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 8244365230594049349.2552766472385065880."
Cause
A DNS resolution loop between the node's local resolver and the CoreDNS component's Pod resolution.
Reference:
https://coredns.io/plugins/loop#troubleshooting
Solution
If you edit the local "/etc/resolv.conf" you will find your change gets overwritten, so define a dedicated resolver file for kubelet instead.
1. Add a resolver entry on every node
echo "nameserver 223.5.5.5" > /etc/kubernetes/resolv.conf
2. Point kubelet's config at it on every node
# vim /etc/kubernetes/kubelet-conf.yml
...
resolvConf: /etc/kubernetes/resolv.conf
3. Restart the kubelet component on every node
systemctl daemon-reload
systemctl restart kubelet
4. Verify the DNS component works
[root@node-exporter41 ~]# kubectl get svc,pods -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/coredns ClusterIP 10.200.0.254 <none> 53/UDP,53/TCP,9153/TCP 14h
NAME READY STATUS RESTARTS AGE
pod/coredns-859664f9d8-2fl7l 1/1 Running 0 89s
pod/coredns-859664f9d8-stdbs 1/1 Running 0 89s
[root@node-exporter41 ~]#
[root@node-exporter41 ~]# kubectl get svc -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
calico-apiserver calico-api ClusterIP 10.200.93.100 <none> 443/TCP 16h
calico-system calico-kube-controllers-metrics ClusterIP None <none> 9094/TCP 15h
calico-system calico-typha ClusterIP 10.200.250.163 <none> 5473/TCP 16h
default kubernetes ClusterIP 10.200.0.1 <none> 443/TCP 17h
kube-system coredns ClusterIP 10.200.0.254 <none> 53/UDP,53/TCP,9153/TCP 14h
[root@node-exporter41 ~]#
[root@node-exporter41 ~]# dig @10.200.0.254 calico-api.calico-apiserver.svc.oldboyedu.com +short
10.200.93.100
[root@node-exporter41 ~]#
[root@node-exporter41 ~]# dig @10.200.0.254 calico-typha.calico-system.svc.oldboyedu.com +short
10.200.250.163
[root@node-exporter41 ~]#
Q73. fail to check rbd image status with: (executable file not found in $PATH)
Error message
[root@master231 06-ceph]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deploy-xiuxian-rbd-698bbc5555-44dz6 0/1 ContainerCreating 0 34s <none> worker233 <none> <none>
[root@master231 06-ceph]#
[root@master231 06-ceph]#
[root@master231 06-ceph]# kubectl describe pod deploy-xiuxian-rbd-698bbc5555-44dz6
Name: deploy-xiuxian-rbd-698bbc5555-44dz6
Namespace: default
...
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 36s default-scheduler Successfully assigned default/deploy-xiuxian-rbd-698bbc5555-44dz6 to worker233
Normal SuccessfulAttachVolume 36s attachdetach-controller AttachVolume.Attach succeeded for volume "data"
Warning FailedMount 4s (x7 over 35s) kubelet MountVolume.WaitForAttach failed for volume "data" : fail to check rbd image status with: (executable file not found in $PATH), rbd output: ()
[root@master231 06-ceph]#
Cause
kubelet cannot run the rbd command to check the image status; in short, the K8S worker nodes do not have the rbd client tools installed.
Solution
Install the ceph-common package.
Q74: MountVolume.WaitForAttach failed for volume "data" : fail to check rbd image status with: (exit status 95), rbd output: (did not load config file, using default settings.
Error message
[root@master231 06-ceph]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deploy-xiuxian-rbd-698bbc5555-9wtf6 0/1 ContainerCreating 0 22s <none> worker232 <none> <none>
[root@master231 06-ceph]#
[root@master231 06-ceph]#
[root@master231 06-ceph]# kubectl describe po deploy-xiuxian-rbd-698bbc5555-9wtf6
Name: deploy-xiuxian-rbd-698bbc5555-9wtf6
Namespace: default
Priority: 0
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 24s default-scheduler Successfully assigned default/deploy-xiuxian-rbd-698bbc5555-9wtf6 to worker232
Normal SuccessfulAttachVolume 24s attachdetach-controller AttachVolume.Attach succeeded for volume "data"
...
Warning FailedMount 5s kubelet MountVolume.WaitForAttach failed for volume "data" : fail to check rbd image status with: (exit status 95), rbd output: (did not load config file, using default settings.
2024-12-09T11:00:25.991+0800 7fd4c748f4c0 -1 Errors while parsing config file!
2024-12-09T11:00:25.991+0800 7fd4c748f4c0 -1 can't open ceph.conf: (2) No such file or directory
2024-12-09T11:00:25.991+0800 7fd4c748f4c0 -1 Errors while parsing config file!
2024-12-09T11:00:25.991+0800 7fd4c748f4c0 -1 can't open ceph.conf: (2) No such file or directory
2024-12-09T11:00:25.991+0800 7fd4c748f4c0 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
2024-12-09T11:00:25.991+0800 7fd4c748f4c0 -1 AuthRegistry(0x558e897092e8) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin, disabling cephx
2024-12-09T11:00:25.991+0800 7fd4c748f4c0 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
2024-12-09T11:00:25.991+0800 7fd4c748f4c0 -1 AuthRegistry(0x7fff3c530800) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin, disabling cephx
2024-12-09T11:00:25.995+0800 7fd4c748f4c0 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
rbd: couldn't connect to the cluster!
)
[root@master231 06-ceph]#
Cause
The ceph keyring (authentication file) cannot be loaded.
Solution
- 1. Copy the authentication files onto the K8S worker nodes.
- 2. Make sure the keyring sits at one of the default paths such as "/etc/ceph/keyring"; if not, specify its location in the manifest.
Q75. pod has unbound immediate PersistentVolumeClaims.
Error message
[root@master231 06-ceph]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deploy-xiuxian-secretref-5d97b68548-rf9f9 0/1 Pending 0 5s <none> <none> <none> <none>
[root@master231 06-ceph]#
[root@master231 06-ceph]#
[root@master231 06-ceph]# kubectl describe pod deploy-xiuxian-secretref-5d97b68548-rf9f9
Name: deploy-xiuxian-secretref-5d97b68548-rf9f9
Namespace: default
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 9s default-scheduler 0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
[root@master231 06-ceph]#
Cause
The Pod has an unbound PVC, so its persistent volume cannot be mounted.
Solution
Check whether an unbound PVC is causing the error and fix the binding.
Q76: wrong fs type, bad option, bad superblock on /dev/rbd0, missing codepage ...
Error message
[root@master231 06-ceph]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deploy-xiuxian-secretref-5d97b68548-pxtkq 0/1 ContainerCreating 0 13s <none> worker233 <none> <none>
[root@master231 06-ceph]#
[root@master231 06-ceph]# kubectl describe pods deploy-xiuxian-secretref-5d97b68548-pxtkq
Name: deploy-xiuxian-secretref-5d97b68548-pxtkq
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 18s default-scheduler Successfully assigned default/deploy-xiuxian-secretref-5d97b68548-pxtkq to worker233
Normal SuccessfulAttachVolume 18s attachdetach-controller AttachVolume.Attach succeeded for volume "pv-rbd"
Warning FailedMount 16s kubelet MountVolume.MountDevice failed for volume "pv-rbd" : rbd: failed to mount device /dev/rbd0 at /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/yinzhengjie-k8s-image-xiuxian (fstype: ), error mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/yinzhengjie-k8s-image-xiuxian --scope -- mount -t ext4 -o defaults /dev/rbd0 /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/yinzhengjie-k8s-image-xiuxian
Output: Running scope as unit: run-refe6402e5fb941fb8114176a595fdaed.scope
mount: /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/yinzhengjie-k8s-image-xiuxian: wrong fs type, bad option, bad superblock on /dev/rbd0, missing codepage or helper program, or other error.
...
[root@master231 06-ceph]#
Cause
When ceph backs the storage and the fsType field is omitted, ext4 is assumed; check whether the block device's actual filesystem type matches.
Solution
For a device that already carries a filesystem, set fsType to the correct filesystem name, as sketched below.
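A minimal sketch of the relevant in-tree rbd volume fragment (the monitor address, pool, and image names are illustrative; fsType must match what is actually on the device):
rbd:
  monitors:
  - 10.0.0.141:6789
  pool: yinzhengjie-k8s
  image: xiuxian
  fsType: xfs       # the in-tree rbd plugin assumes ext4 when fsType is omitted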
Q77. missing configuration for cluster ID "0f06b0e2-b128-11ef-9a37-4971ded8a98b"
Error message
[root@master231 rbd]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
...
rbd-pvc01 Pending csi-rbd-sc 11s
rbd-pvc02 Pending csi-rbd-sc 11s
[root@master231 rbd]#
[root@master231 rbd]#
[root@master231 rbd]# kubectl describe pvc rbd-pvc01
Name: rbd-pvc01
Namespace: default
StorageClass: csi-rbd-sc
Status: Pending
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ProvisioningFailed 17s persistentvolume-controller storageclass.storage.k8s.io "csi-rbd-sc" not found
Normal ExternalProvisioning 3s (x2 over 3s) persistentvolume-controller waiting for a volume to be created, either by external provisioner "rbd.csi.ceph.com" or manually created by system administrator
Normal Provisioning 1s (x3 over 3s) rbd.csi.ceph.com_csi-rbdplugin-provisioner-5dfcf67885-jmplh_d864ea20-046e-430e-9867-7a34250c7d9d External provisioner is provisioning volume for claim "default/rbd-pvc01"
Warning ProvisioningFailed 1s (x3 over 3s) rbd.csi.ceph.com_csi-rbdplugin-provisioner-5dfcf67885-jmplh_d864ea20-046e-430e-9867-7a34250c7d9d failed to provision volume with StorageClass "csi-rbd-sc": rpc error: code = InvalidArgument desc = failed to fetch monitor list using clusterID (0f06b0e2-b128-11ef-9a37-4971ded8a98b): missing configuration for cluster ID "0f06b0e2-b128-11ef-9a37-4971ded8a98b"
[root@master231 rbd]#
Cause
Check the dynamic StorageClass configuration: whether the "clusterID" value is wrapped in double quotes.
Solution
The "clusterID" value must be double-quoted, otherwise this error occurs!
This post is from 博客园 (cnblogs). Author: Yin Zhengjie. When reposting, please cite the original link: https://www.cnblogs.com/yinzhengjie/p/18645161. Personal WeChat: "JasonYin2020" (when adding, please note where you found this and why; paid consulting).
When your talent cannot yet support your ambition, settle down and study; when your ability cannot yet carry your goals, calm down and train. Ask yourself what kind of life you want.