1. etcd fails to start right after the HA installation script finishes
Solution: rebooting all nodes fixed it. This happened three times, and each time it was because the machine was overloaded: CPU utilization was around 94% at the time. The script itself is correct; the failure has nothing to do with it.
It is therefore better to split the installation: install the etcd cluster first, reboot all nodes, and only then install the k8s components. A quick health check before continuing is sketched below.
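A minimal etcd health check before moving on to the k8s part (endpoints and certificate paths are the ones used elsewhere in these notes; adjust to your environment):
ETCDCTL_API=3 etcdctl \
    --endpoints=https://192.168.0.91:2379,https://192.168.0.92:2379,https://192.168.0.93:2379 \
    --cacert=/etc/kubernetes/cert/ca.pem \
    --cert=/etc/kubernetes/cert/kubernetes.pem \
    --key=/etc/kubernetes/cert/kubernetes-key.pem \
    endpoint health
# every endpoint should report "is healthy" before the k8s installation is started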
2. kube-apiserver error: Failed to list *core.Secret: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Symptoms:
[root@test1 ssl]# systemctl status kube-apiserver -l
● kube-apiserver.service - Kubernetes API Server
Loaded: loaded (/etc/systemd/system/kube-apiserver.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2019-02-06 18:14:58 EST; 1h 3min ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Main PID: 1684 (kube-apiserver)
Tasks: 16
Memory: 11.4M
CGroup: /system.slice/kube-apiserver.service
└─1684 /opt/k8s/bin/kube-apiserver --enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota --anonymous-auth=false # --experimental-encryption-provider-config=/etc/kubernetes/encryption-config.yaml --advertise-address=192.168.0.91 --bind-address=192.168.0.91 --insecure-port=8080 --authorization-mode=Node,RBAC # --runtime-config=api/all --enable-bootstrap-token-auth --token-auth-file=/etc/kubernetes/token.csv --service-cluster-ip-range=10.254.0.0/16 --service-node-port-range=8000-30000 --tls-cert-file=/etc/kubernetes/cert/kubernetes.pem --tls-private-key-file=/etc/kubernetes/cert/kubernetes-key.pem --client-ca-file=/etc/kubernetes/cert/ca.pem --kubelet-client-certificate=/etc/kubernetes/cert/kubernetes.pem --kubelet-client-key=/etc/kubernetes/cert/kubernetes-key.pem --etcd-cafile=/etc/kubernetes/cert/ca.pem --etcd-certfile=/etc/kubernetes/cert/kubernetes.pem --etcd-keyfile=/etc/kubernetes/cert/kubernetes-key.pem --service-account-key-file=/etc/kubernetes/cert/sa.pub --etcd-servers=https://192.168.0.91:2379,https://192.168.0.92:2379,https://192.168.0.93:2379 --enable-swagger-ui=true --secure-port=6443 --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --allow-privileged=true --apiserver-count=3 --audit-log-maxage=30 --audit-log-maxbackup=3 --audit-log-maxsize=100 --audit-log-path=/var/log/kube-apiserver-audit.log --event-ttl=1h --alsologtostderr=true --logtostderr=false --log-dir=/var/log/kubernetes --v=2
Feb 06 19:18:24 test1 kube-apiserver[1684]: E0206 19:18:24.055401 1684 cacher.go:272] unexpected ListAndWatch error: storage/cacher.go:/secrets: Failed to list *core.Secret: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:24 test1 kube-apiserver[1684]: E0206 19:18:24.650493 1684 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Secret: Internal error occurred: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:25 test1 kube-apiserver[1684]: E0206 19:18:25.074728 1684 cacher.go:272] unexpected ListAndWatch error: storage/cacher.go:/secrets: Failed to list *core.Secret: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:25 test1 kube-apiserver[1684]: E0206 19:18:25.666053 1684 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Secret: Internal error occurred: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:26 test1 kube-apiserver[1684]: E0206 19:18:26.103077 1684 cacher.go:272] unexpected ListAndWatch error: storage/cacher.go:/secrets: Failed to list *core.Secret: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:26 test1 kube-apiserver[1684]: E0206 19:18:26.689155 1684 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Secret: Internal error occurred: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:27 test1 kube-apiserver[1684]: E0206 19:18:27.123484 1684 cacher.go:272] unexpected ListAndWatch error: storage/cacher.go:/secrets: Failed to list *core.Secret: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:27 test1 kube-apiserver[1684]: E0206 19:18:27.707282 1684 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Secret: Internal error occurred: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:28 test1 kube-apiserver[1684]: E0206 19:18:28.246831 1684 cacher.go:272] unexpected ListAndWatch error: storage/cacher.go:/secrets: Failed to list *core.Secret: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:28 test1 kube-apiserver[1684]: E0206 19:18:28.729613 1684 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Secret: Internal error occurred: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Solution: reinstalling with the script made it go away. This error generally means the apiserver can no longer decrypt secrets that are already stored in etcd, typically because the encryption-provider configuration changed between when the data was written and when it is read.
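If a full reinstall is not desirable, a workaround that is sometimes used for this specific error is to delete the keys the apiserver can no longer decrypt directly from etcd and let them be recreated. This is destructive, so it is only a sketch and only reasonable when the affected secrets are disposable (the default service-account token here is recreated automatically). Endpoints and certificate paths follow the apiserver flags above:
ETCDCTL_API=3 etcdctl \
    --endpoints=https://192.168.0.91:2379 \
    --cacert=/etc/kubernetes/cert/ca.pem \
    --cert=/etc/kubernetes/cert/kubernetes.pem \
    --key=/etc/kubernetes/cert/kubernetes-key.pem \
    get /registry/secrets/ --prefix --keys-only
# delete the key named in the error, then restart the apiserver:
ETCDCTL_API=3 etcdctl \
    --endpoints=https://192.168.0.91:2379 \
    --cacert=/etc/kubernetes/cert/ca.pem \
    --cert=/etc/kubernetes/cert/kubernetes.pem \
    --key=/etc/kubernetes/cert/kubernetes-key.pem \
    del /registry/secrets/default/default-token-859zc
systemctl restart kube-apiserver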
3. kube-apiserver will not start: external host was not specified, using 192.168.0.91
Solution: delete every comment from the kube-apiserver systemd unit file.
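The reason is visible in the status output of issue 2 above: the "#"-prefixed lines placed inside the backslash-continued ExecStart were not treated as comments and ended up as literal arguments on the kube-apiserver command line. A quick check for such leftovers (unit path as used in these notes):
grep -n '#' /etc/systemd/system/kube-apiserver.service
# remove any comment lines inside the ExecStart block, then:
systemctl daemon-reload
systemctl restart kube-apiserver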
4. kubelet log error: No valid private key and/or certificate found, reusing existing private key or creating a new one
That message by itself is normal; the real symptom is the repeated "Failed to connect to apiserver: the server has asked for the client to provide credentials" lines below. Going through the setup anyway turned up two fatal mistakes.
[root@test4 kubernetes]# systemctl status kubelet -l
● kubelet.service - Kubernetes Kubelet
Loaded: loaded (/etc/systemd/system/kubelet.service; static; vendor preset: disabled)
Active: active (running) since Thu 2019-02-07 07:24:53 EST; 5s ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Main PID: 73646 (kubelet)
Tasks: 12
Memory: 15.2M
CGroup: /system.slice/kubelet.service
└─73646 /opt/k8s/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/kubelet-bootstrap.kubeconfig --cert-dir=/etc/kubernetes/cert --kubeconfig=/etc/kubernetes/kubelet.kubeconfig --config=/etc/kubernetes/kubelet.config.json --hostname-override=test4 --pod-infra-container-image=registry.access.redhat.com/rhel7/pod-infrastructure:latest --allow-privileged=true --alsologtostderr=true --logtostderr=false --log-dir=/var/log/kubernetes --v=2
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.021451 73646 server.go:407] Version: v1.13.0
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.024450 73646 feature_gate.go:206] feature gates: &{map[RotateKubeletClientCertificate:true RotateKubeletServerCertificate:true]}
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.024837 73646 feature_gate.go:206] feature gates: &{map[RotateKubeletClientCertificate:true RotateKubeletServerCertificate:true]}
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.025195 73646 plugins.go:103] No cloud provider specified.
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.025304 73646 server.go:523] No cloud provider specified: "" from the config file: ""
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.025410 73646 bootstrap.go:65] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.043219 73646 bootstrap.go:96] No valid private key and/or certificate found, reusing existing private key or creating a new one
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.176716 73646 bootstrap.go:239] Failed to connect to apiserver: the server has asked for the client to provide credentials
Feb 07 07:24:56 test4 kubelet[73646]: I0207 07:24:56.347469 73646 bootstrap.go:239] Failed to connect to apiserver: the server has asked for the client to provide credentials
Feb 07 07:24:58 test4 kubelet[73646]: I0207 07:24:58.451741 73646 bootstrap.go:239] Failed to connect to apiserver: the server has asked for the client to provide credentials
Mistake one:
In the script that generates the bootstrap kubeconfig files, the token assignment is wrong:
BOOTSTRAP_TOKEN=(kubeadm ...) is missing the $ for command substitution; it must be written as BOOTSTRAP_TOKEN=$(kubeadm ...). This is the main mistake; the second one is described further below. The line as found:
BOOTSTRAP_TOKEN=(kubeadm token create --description kubelet-bootstrap-token --groups system:bootstrappers:test1 --kubeconfig ~/.kube/config)
[root@test1 profile]# cat bootstrap-kubeconfig.sh
#!/bin/bash
# define variables
export MASTER_VIP="192.168.0.235"
export KUBE_APISERVER="https://192.168.0.235:8443"
export NODE_NAMES=(test1 test2 test3 test4)
cd $HOME/ssl/
for node_name in ${NODE_NAMES[*]}
do
# create token
export BOOTSTRAP_TOKEN=(kubeadm token create \
--description kubelet-bootstrap-token \
--groups system:bootstrappers:${node_name} \
--kubeconfig ~/.kube/config)
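# NOTE: the line above is the bug described earlier; the $ for command substitution
# is missing, so it should read: export BOOTSTRAP_TOKEN=$(kubeadm token create ... )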
# set cluster parameters
kubectl config set-cluster kubernetes \
--certificate-authority=/etc/kubernetes/cert/ca.pem \
--embed-certs=true \
--server=${KUBE_APISERVER} \
--kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig
# set client authentication parameters
kubectl config set-credentials kubelet-bootstrap \
--token=${BOOTSTRAP_TOKEN} \
--kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig
# set context parameters
kubectl config set-context default \
--cluster=kubernetes \
--user=kubelet-bootstrap \
--kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig
# set the default context
kubectl config use-context default --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig
done
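After adding the $ and re-running the script, a quick sanity check is that each generated kubeconfig carries a real token rather than the literal command text (file names follow the script above):
for node_name in test1 test2 test3 test4; do
    grep "token:" kubelet-bootstrap-${node_name}.kubeconfig
done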
Mistake two: there is also a problem in the kubelet parameter config file:
[root@test4 ~]# cat /etc/kubernetes/kubelet.config.json
{
"kind": "KubeletConfiguration",
"apiVersion": "kubelet.config.k8s.io/v1beta1",
"authentication": {
"x509": {
"clientCAFile": "/etc/kubernetes/cert/ca.pem"
},
"webhook": {
"enabled": true,
"cacheTTL": "2m0s"
},
"anonymous": {
"enabled": false
}
},
"authorization": {
"mode": "Webhook",
"webhook": {
"cacheAuthorizedTTL": "5m0s",
"cacheUnauthorizedTTL": "30s"
}
},
"address": "0.0.0.0",
"port": 10250,
"readOnlyPort": 0,
"cgroupDriver": "cgroupfs",
"hairpinMode": "promiscuous-bridge",
"serializeImagePulls": false,
"featureGates": {
"RotateKubeletClientCertificate": true,
"RotateKubeletServerCertificate": true
},
"clusterDomain": "cluster.local.",
"clusterDNS": ["10.254.0.2"]
}
The address field is 0.0.0.0 rather than a real IP. Strangely, hostname -i on the test4 node also returned 0.0.0.0. Changing address to the worker node's real IP fixes it; a sketch of the check and the fix follows.
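A sketch of the check and the fix, assuming test4's real address is 192.168.0.94 as recorded elsewhere in these notes (hostname -i resolves the host name through the resolver, usually /etc/hosts first, so a wrong or missing entry there is the usual reason for it returning 0.0.0.0):
hostname -i
grep test4 /etc/hosts   # should map the name to the real IP, e.g. "192.168.0.94 test4"
sed -i 's/"address": "0.0.0.0"/"address": "192.168.0.94"/' /etc/kubernetes/kubelet.config.json
systemctl restart kubelet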
5. After the CSRs were approved, no node showed up
Solution: kubelet had stopped. A cadvisor-related flag had been added to its config file and kubelet was restarted without checking its status afterwards; it had actually died. Restarting it fixed the problem.
6. kubectl cannot reach pod resources: Error attaching, falling back to logs: error dialing backend: dial tcp 0.0.0.0:10250: connect: connection refused
error: unable to upgrade connection: Forbidden (user=kubernetes, verb=create, resource=nodes, subresource=proxy)
Please read this whole item carefully.
Symptoms:
[root@test4 profile]# kubectl run -it --rm --image=infoblox/dnstools dns-client
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
If you don't see a command prompt, try pressing enter.
Error attaching, falling back to logs: error dialing backend: dial tcp 0.0.0.0:10250: connect: connection refused
deployment.apps "dns-client" deleted
Error from server: Get https://test4:10250/containerLogs/default/dns-client-86c6d59f7-tzh5c/dns-client: dial tcp 0.0.0.0:10250: connect: connection refused
Check the coredns.yaml file:
[root@test4 profile]# cat coredns.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: coredns
namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
name: kube-dns
namespace: kube-system
annotations:
prometheus.io/port: "9153"
prometheus.io/scrape: "true"
labels:
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
kubernetes.io/name: "CoreDNS"
spec:
selector:
k8s-app: kube-dns
clusterIP: cluster_dns_svc_ip
ports:
- name: dns
port: 53
protocol: UDP
- name: dns-tcp
port: 53
protocol: TCP
Two problems stand out: the Service clusterIP is still the placeholder cluster_dns_svc_ip instead of a real IP, and in kubelet.config.json the address is still "0.0.0.0":
[root@test4 profile]# cat /etc/kubernetes/kubelet.config.json
{
"kind": "KubeletConfiguration",
"apiVersion": "kubelet.config.k8s.io/v1beta1",
"authentication": {
"x509": {
"clientCAFile": "/etc/kubernetes/cert/ca.pem"
},
"webhook": {
"enabled": true,
"cacheTTL": "2m0s"
},
"anonymous": {
"enabled": false
}
},
"authorization": {
"mode": "Webhook",
"webhook": {
"cacheAuthorizedTTL": "5m0s",
"cacheUnauthorizedTTL": "30s"
}
},
"address": "0.0.0.0",
"port": 10250,
"readOnlyPort": 0,
"cgroupDriver": "cgroupfs",
"hairpinMode": "promiscuous-bridge",
"serializeImagePulls": false,
"featureGates": {
"RotateKubeletClientCertificate": true,
"RotateKubeletServerCertificate": true
},
"clusterDomain": "cluster.local.",
"clusterDNS": ["10.254.0.2"]
}
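A minimal sketch of fixing the clusterIP placeholder, assuming the DNS Service IP should match the clusterDNS value the kubelets already use (10.254.0.2 in this cluster):
sed -i 's/cluster_dns_svc_ip/10.254.0.2/' coredns.yaml
kubectl apply -f coredns.yaml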
Even after fixing the above it still did not work, so the suspicion moved to the apiserver. In the end the kube-apiserver startup configuration from this document was used as the reference: https://www.cnblogs.com/effortsing/p/10312081.html
The following flag has to be added to the kube-apiserver startup parameters on all master nodes: --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
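After adding the flag to the ExecStart line of the kube-apiserver unit on every master (unit path as used in these notes; adjust if yours differs), reload, restart, and confirm the running process actually carries it:
systemctl daemon-reload
systemctl restart kube-apiserver
ps -ef | grep '[k]ube-apiserver' | grep -o 'kubelet-preferred-address-types=[^ ]*'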
Once kube-apiserver has been restarted on all masters, the dial tcp 192.168.0.93:10250: connect: no route to host error no longer appears, but a new one shows up:
Querying resources now fails with: unable to upgrade connection: Forbidden (user=kubernetes, verb=create, resource=nodes, subresource=proxy)
[root@test4 ~]# kubectl exec -it http-test-dm2-6dbd76c7dd-cv9qf sh
error: unable to upgrade connection: Forbidden (user=kubernetes, verb=create, resource=nodes, subresource=proxy)
Solution: grant the apiserver access to the kubelet API. The kubernetes user simply has no RBAC authorization yet; granting it fixes the problem. Proceed as follows:
Note: user=kubernetes in the error message is the user name that goes into the yaml file below.
cat > apiserver-to-kubelet.yaml <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
annotations:
rbac.authorization.kubernetes.io/autoupdate: "true"
labels:
kubernetes.io/bootstrapping: rbac-defaults
name: system:kubernetes-to-kubelet
rules:
- apiGroups:
- ""
resources:
- nodes/proxy
- nodes/stats
- nodes/log
- nodes/spec
- nodes/metrics
verbs:
- "*"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: system:kubernetes
namespace: ""
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:kubernetes-to-kubelet
subjects:
- apiGroup: rbac.authorization.k8s.io
kind: User
name: kubernetes
EOF
Create the authorization:
kubectl create -f apiserver-to-kubelet.yaml
[root@test4 ~]# kubectl create -f apiserver-to-kubelet.yaml
clusterrole.rbac.authorization.k8s.io/system:kubernetes-to-kubelet created
clusterrolebinding.rbac.authorization.k8s.io/system:kubernetes created
Exec into a container again to check:
[root@test4 ~]# kubectl exec -it http-test-dm2-6dbd76c7dd-cv9qf sh
/ # exit
Now it is possible to exec into containers and inspect resources.
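A quick way to verify the binding took effect is to ask the API server whether the kubernetes user may now create the nodes/proxy subresource (run with a cluster-admin kubeconfig; the --subresource flag and impersonation via --as are assumed to be available in this kubectl version):
kubectl auth can-i create nodes --subresource=proxy --as=kubernetes
# expected output: yes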
Reference: https://www.jianshu.com/p/b3d8e8b8fd7e
7. flannel and coredns cannot be created: Unable to mount volumes for pod "coredns-69d58bd968-f9tn4_kube-system
Symptoms: the pods are all stuck:
[root@test4 profile]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-69d58bd968-mdskk 0/1 ContainerCreating 0 4s
coredns-69d58bd968-xjqpj 0/1 ContainerCreating 0 3m6s
kube-flannel-ds-4bgqb 0/1 Init:0/1 0 94s
Describing the pod shows the error:
Unable to mount volumes for pod "coredns-69d58bd968-f9tn4_kube-system
[root@test4 profile]# kubectl describe pod coredns-69d58bd968-f9tn4 --namespace kube-system
Name: coredns-69d58bd968-f9tn4
Namespace: kube-system
Priority: 0
PriorityClassName: <none>
Node: test4/192.168.0.94
Start Time: Fri, 08 Feb 2019 23:50:28 -0500
Labels: k8s-app=kube-dns
pod-template-hash=69d58bd968
Annotations: <none>
Status: Pending
IP:
Controlled By: ReplicaSet/coredns-69d58bd968
Containers:
coredns:
Container ID:
Image: coredns/coredns:1.2.0
Image ID:
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from coredns-token-29dbl (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
coredns-token-29dbl:
Type: Secret (a volume populated by a Secret)
SecretName: coredns-token-29dbl
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 16m default-scheduler Successfully assigned kube-system/coredns-69d58bd968-f9tn4 to test4
Warning FailedMount 68s (x7 over 14m) kubelet, test4 Unable to mount volumes for pod "coredns-69d58bd968-f9tn4_kube-system(38cb8d7e-2c26-11e9-8db2-000c2935f634)": timeout expired waiting for volumes to attach or mount for pod "kube-system"/"coredns-69d58bd968-f9tn4". list of unmounted volumes=[coredns-token-29dbl]. list of unattached volumes=[config-volume coredns-token-29dbl]
Warning FailedMount 7s (x16 over 16m) kubelet, test4 MountVolume.SetUp failed for volume "coredns-token-29dbl" : couldn't propagate object cache: timed out waiting for the condition
The docker log shows the same kind of error:
Failed to load container mount ebb0891f650ea9643caf4ec8f164a54e8c6dc9d54842ea1ea4bacc72ff4addff: mount does not exist"
[root@test4 profile]# systemctl status docker -l
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2019-02-08 23:23:56 EST; 50min ago
Docs: https://docs.docker.com
Main PID: 956 (dockerd)
CGroup: /system.slice/docker.service
├─ 956 /usr/bin/dockerd
└─1152 docker-containerd --config /var/run/docker/containerd/containerd.toml
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.245990170-05:00" level=error msg="Failed to load container mount ebb0891f650ea9643caf4ec8f164a54e8c6dc9d54842ea1ea4bacc72ff4addff: mount does not exist"
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.248503580-05:00" level=error msg="Failed to load container mount f4e32003f4c0fc39d292b2dd76dd0a0016a0b1e72028c7d4910749fc7836efde: mount does not exist"
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.250961209-05:00" level=error msg="Failed to load container mount fb5ca71237d38e0bb413ac95a858ee3e41c209a936a1f41081bf2b6a57f10a45: mount does not exist"
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.253042348-05:00" level=error msg="Failed to load container mount fb8dfb7d9813b638ac24dc9b0cde97ed095c222b22f8d44f082f5130e2f233e4: mount does not exist"
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.666363859-05:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.760913207-05:00" level=info msg="Loading containers: done."
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.864408002-05:00" level=info msg="Docker daemon" commit=0520e24 graphdriver(s)=overlay2 version=18.03.0-ce
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.867069598-05:00" level=info msg="Daemon has completed initialization"
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.883546083-05:00" level=info msg="API listen on /var/run/docker.sock"
Feb 08 23:23:56 test4 systemd[1]: Started Docker Application Container Engine.
Solution: restarting docker is enough:
systemctl restart docker
Checking the pods again right afterwards, everything is back to normal:
[root@test4 profile]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-69d58bd968-mdskk 1/1 Running 0 3m26s
coredns-69d58bd968-xjqpj 1/1 Running 0 6m28s
kube-flannel-ds-4bgqb 1/1 Running 0 4m56s
Check docker again.
This is what docker looks like when it is working properly under k8s:
[root@test4 profile]# systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2019-02-09 00:28:04 EST; 33s ago
Docs: https://docs.docker.com
Main PID: 18711 (dockerd)
Tasks: 246
Memory: 98.6M
CGroup: /system.slice/docker.service
├─18711 /usr/bin/dockerd
├─18718 docker-containerd --config /var/run/docker/containerd/containerd.toml
├─19312 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu...
├─19325 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu...
├─19337 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu...
├─19344 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu...
├─19384 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu...
├─19434 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu...
├─19463 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu...
├─19478 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu...
├─19509 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu...
├─19562 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu...
├─19566 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu...
├─20190 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu...
├─20473 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu...
├─20506 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu...
├─20670 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu...
├─20685 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu...
├─20702 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu...
├─20741 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu...
├─21002 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu...
├─21054 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu...
└─21270 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu...
Feb 09 00:28:16 test4 dockerd[18711]: time="2019-02-09T00:28:16-05:00" level=info msg="shim docker-containerd-shim started"...d=21054
Feb 09 00:28:16 test4 dockerd[18711]: time="2019-02-09T00:28:16-05:00" level=info msg="shim docker-containerd-shim started"...d=21070
Feb 09 00:28:17 test4 dockerd[18711]: time="2019-02-09T00:28:17-05:00" level=info msg="shim reaped" id=66d80ea2e0c9a995f325.../tasks"
Feb 09 00:28:17 test4 dockerd[18711]: time="2019-02-09T00:28:17.244294325-05:00" level=info msg="ignoring event" module=lib...Delete"
Feb 09 00:28:17 test4 dockerd[18711]: time="2019-02-09T00:28:17-05:00" level=info msg="shim reaped" id=42e08bbdf67aabd17173.../tasks"
Feb 09 00:28:17 test4 dockerd[18711]: time="2019-02-09T00:28:17.774220924-05:00" level=info msg="ignoring event" module=lib...Delete"
Feb 09 00:28:33 test4 dockerd[18711]: time="2019-02-09T00:28:33-05:00" level=info msg="shim docker-containerd-shim started"...d=21270
Feb 09 00:28:34 test4 dockerd[18711]: time="2019-02-09T00:28:34-05:00" level=info msg="shim docker-containerd-shim started"...d=21328
Feb 09 00:28:35 test4 dockerd[18711]: time="2019-02-09T00:28:35-05:00" level=info msg="shim reaped" id=5114cc9a4a74c294de17.../tasks"
Feb 09 00:28:35 test4 dockerd[18711]: time="2019-02-09T00:28:35.767869814-05:00" level=info msg="ignoring event" module=lib...Delete"
Hint: Some lines were ellipsized, use -l to show in full.
8. While testing CoreDNS, kubectl run -it --rm --image=infoblox/dnstools dns-client hangs
Symptoms:
[root@test4 ~]# kubectl run -it --rm --image=infoblox/dnstools dns-client
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
Check the pods:
[root@test4 ~]# kubectl get pod
NAME READY STATUS RESTARTS AGE
busybox1-54dc95466f-kmjcp 1/1 ContainerCreating 1 40m
dig-5c7554b84f-sdl8k 1/1 ContainerCreating 1 40m
dns-client-2-56bdd8dfd5-pn5zn 1/1 ContainerCreating 1 40m
dns-client-3-6f98f9f7df-g29d6 1/1 ContainerCreating 1 40m
dns-client-86c6d59f7-znnbb 1/1 ContainerCreating 1 40m
dnstools-6d4979fbbf-294ns 1/1 ContainerCreating 1 40m
Cause: most likely flannel or coredns had a problem (the docker log did show errors later), or the CPU was simply too loaded; it was at 86% at the time. It is almost certainly one of those two.
Solution:
Shut down one master node to bring the CPU load down.
Restart docker and check its status; the healthy state of docker running under k8s is the same as the output shown in issue 7 above.
Check the pods again:
[root@test4 ~]# kubectl get pod
NAME READY STATUS RESTARTS AGE
busybox1-54dc95466f-kmjcp 1/1 Running 1 40m
dig-5c7554b84f-sdl8k 1/1 Running 1 40m
dns-client-2-56bdd8dfd5-pn5zn 1/1 Running 1 40m
dns-client-3-6f98f9f7df-g29d6 1/1 Running 1 40m
dns-client-86c6d59f7-znnbb 1/1 Running 1 40m
dnstools-6d4979fbbf-294ns 1/1 Running 1 40m
9. Deleting pods has no effect
The first possibility is that the CPU has spiked; shut down one master node to lower the load.
The second is that another component has a problem, such as flannel, coredns, or docker. Check whether they are healthy, and above all check whether the docker log has errors; that is the key.
10. Whatever the error, always check the state of flannel, coredns, and docker; the problem is very often related to one of these components.
11. flannel stuck in Init:0/1 and coredns cannot be created
[root@test4 ~]# kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-69d58bd968-brz8w 0/1 Pending 0 9m6s
coredns-69d58bd968-jvfkf 0/1 Pending 0 9m7s
kube-flannel-ds-w2r7l 0/1 Init:0/1 0 3m32s
First check whether any docker containers exist at all:
[root@test4 profile]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
Not a single container exists. Even when pods failed to create before, there was at least a container ID to look at; this time there is not even an ID, which is strange, so the only option is to go through the logs. The kubelet error log:
[root@test4 profile]# cat /var/log/kubernetes/kubelet.test4.root.log.ERROR.20190210-071055.86336
Log file created at: 2019/02/10 07:10:55
Running on machine: test4
Binary: Built with gc go1.11.2 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0210 07:10:55.151126 86336 kubelet.go:1308] Image garbage collection failed once. Stats initialization may not have completed yet: failed to get imageFs info: unable to find data in memory cache
E0210 07:14:56.172087 86336 remote_runtime.go:96] RunPodSandbox from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
The key line is: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Searching online suggests this is a network problem. The network had never been touched, so what could be wrong? Then the firewall came to mind, and indeed firewalld turned out to be running:
The firewall had been disabled at the very start of the installation, so it is odd that it is active again. The guess is that configuring the ip_vs kernel modules brought it back up as some default behaviour; in any case it has to be switched off again.
[root@test4 profile]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
Active: active (running) since Sun 2019-02-10 04:56:42 EST; 2h 28min ago
Docs: man:firewalld(1)
Main PID: 28767 (firewalld)
Tasks: 2
Memory: 372.0K
CGroup: /system.slice/firewalld.service
└─28767 /usr/bin/python2.7 /usr/sbin/firewalld --nofork --nopid
Feb 10 07:10:27 test4 firewalld[28767]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -D FORWARD -i docker0 -o docker0 -...hain?).
Feb 10 07:10:27 test4 firewalld[28767]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -C PREROUTING -m addrtype -...t name.
Feb 10 07:10:28 test4 firewalld[28767]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -C OUTPUT -m addrtype --dst...t name.
Feb 10 07:10:28 test4 firewalld[28767]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -C POSTROUTING -s 172.17.0....t name.
Feb 10 07:10:28 test4 firewalld[28767]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -C DOCKER -i docker0 -j RET...hain?).
Feb 10 07:10:28 test4 firewalld[28767]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -D FORWARD -i docker0 -o docker0 -...hain?).
Feb 10 07:10:28 test4 firewalld[28767]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -C FORWARD -i docker0 -o...hain?).
Feb 10 07:10:29 test4 firewalld[28767]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -C FORWARD -i docker0 ! ...hain?).
Feb 10 07:10:29 test4 firewalld[28767]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -C FORWARD -o docker0 -j...t name.
Feb 10 07:10:29 test4 firewalld[28767]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -C FORWARD -o docker0 -m...hain?).
Hint: Some lines were ellipsized, use -l to show in full.
The firewall had never been touched, yet there it was running. Stop firewalld and recreate flannel; after a few minutes flannel reaches the Running state (a command sketch follows at the end of this item):
[root@test4 profile]# kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
kube-flannel-ds-w2r7l 1/1 Running 0 6m33s
Checking the docker containers again, the flannel container is now there:
[root@test4 profile]# docker ps -l
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
23930c2ebb47 b949a39093d6 "/opt/bin/flanneld -…" 13 minutes ago Up 12 minutes k8s_kube-flannel_kube-flannel-ds-w2r7l_kube-system_2f7fc646-2d2f-11e9-89e3-000c29dbd920_0
[root@test4 profile]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
23930c2ebb47 b949a39093d6 "/opt/bin/flanneld -…" 13 minutes ago Up 13 minutes k8s_kube-flannel_kube-flannel-ds-w2r7l_kube-system_2f7fc646-2d2f-11e9-89e3-000c29dbd920_0
9d886b599345 b949a39093d6 "cp -f /etc/kube-fla…" 13 minutes ago Exited (0) 13 minutes ago k8s_install-cni_kube-flannel-ds-w2r7l_kube-system_2f7fc646-2d2f-11e9-89e3-000c29dbd920_0
5c85c66161fb registry.access.redhat.com/rhel7/pod-infrastructure:latest "/usr/bin/pod" 13 minutes ago Up 13 minutes k8s_POD_kube-flannel-ds-w2r7l_kube-system_2f7fc646-2d2f-11e9-89e3-000c29dbd920_0
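For reference, the steps above as commands (the flannel manifest filename is an assumption; use whatever file the DaemonSet was originally created from):
systemctl stop firewalld
systemctl disable firewalld
kubectl delete -f kube-flannel.yml
kubectl apply -f kube-flannel.yml
kubectl get pod -n kube-system -w   # wait until kube-flannel-ds-* is Running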
12. Pods stuck in Pending
[root@test4 ~]# kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-69d58bd968-brz8w 0/1 Pending 0 9m6s
coredns-69d58bd968-jvfkf 0/1 Pending 0 9m7s
The kubelet log keeps reporting Unable to update cni config. Since the CNI plugin is not actually needed here, the simplest fix is to drop the CNI-related flags from the kubelet startup parameters (see the sketch after the log below):
[root@test4 ~]# systemctl status kubelet -l
● kubelet.service - Kubernetes Kubelet
Loaded: loaded (/etc/systemd/system/kubelet.service; static; vendor preset: disabled)
Active: active (running) since Sun 2019-02-10 05:21:42 EST; 12min ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Main PID: 33598 (kubelet)
CGroup: /system.slice/kubelet.service
└─33598 /opt/k8s/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/kubelet-bootstrap.kubeconfig --cert-dir=/etc/kubernetes/cert --network-plugin=cni --cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d --kubeconfig=/etc/kubernetes/kubelet.kubeconfig --config=/etc/kubernetes/kubelet.config.json --hostname-override=test4 --pod-infra-container-image=registry.access.redhat.com/rhel7/pod-infrastructure:latest --allow-privileged=true --alsologtostderr=true --logtostderr=false --log-dir=/var/log/kubernetes --v=2
Feb 10 05:34:16 test4 kubelet[33598]: W0210 05:34:16.651679 33598 cni.go:203] Unable to update cni config: No networks found in /etc/cni/net.d
Feb 10 05:34:16 test4 kubelet[33598]: E0210 05:34:16.652336 33598 kubelet.go:2192] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Feb 10 05:34:21 test4 kubelet[33598]: W0210 05:34:21.656085 33598 cni.go:203] Unable to update cni config: No networks found in /etc/cni/net.d
Feb 10 05:34:21 test4 kubelet[33598]: E0210 05:34:21.656587 33598 kubelet.go:2192] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Feb 10 05:34:26 test4 kubelet[33598]: W0210 05:34:26.665157 33598 cni.go:203] Unable to update cni config: No networks found in /etc/cni/net.d
Feb 10 05:34:26 test4 kubelet[33598]: E0210 05:34:26.666018 33598 kubelet.go:2192] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Feb 10 05:34:31 test4 kubelet[33598]: W0210 05:34:31.669777 33598 cni.go:203] Unable to update cni config: No networks found in /etc/cni/net.d
Feb 10 05:34:31 test4 kubelet[33598]: E0210 05:34:31.671423 33598 kubelet.go:2192] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Feb 10 05:34:36 test4 kubelet[33598]: W0210 05:34:36.677673 33598 cni.go:203] Unable to update cni config: No networks found in /etc/cni/net.d
Feb 10 05:34:36 test4 kubelet[33598]: E0210 05:34:36.679154 33598 kubelet.go:2192] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
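A sketch of that workaround, assuming the kubelet unit file is /etc/systemd/system/kubelet.service and the flags appear exactly as in the status output above:
sed -i 's| --network-plugin=cni --cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d||' \
    /etc/systemd/system/kubelet.service
systemctl daemon-reload
systemctl restart kubelet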
13. kubeadm token commands keep reporting that the token is not valid base64. This happens because the token has expired; re-running the cluster setup resolves it. It is not that kubeadm itself has stopped working, which was verified with:
kubeadm token list --kubeconfig ~/.kube/config
14. flannel stuck in Init:0/1 again during a second installation of k8s
[root@test4 ~]# kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-69d58bd968-gqsqz 0/1 ContainerCreating 0 11m
coredns-69d58bd968-wg2jf 0/1 ContainerCreating 0 11m
kube-flannel-ds-kljwc 0/1 Init:0/1 0 14m
At this point the CPU was already at 93%, so it could be CPU pressure, or the firewall again. The following two steps resolved it:
Start firewalld and then stop it again.
Shut down one of the master nodes, because CPU utilization was too high.
15. After installing the etcd cluster, one etcd member shows as unhealthy
Solution:
It does not prevent the cluster from being used; just continue with the installation. The cause here was that the installation script had been interrupted with errors partway through, and after it finally completed one etcd member was left unhealthy.
Alternatively, reinstalling from scratch also fixes it (verified).