Installing a Kubernetes v1.20.6 Cluster with kubeadm
This article documents installing a Kubernetes v1.20.6 cluster with kubeadm. It is intended only as a lab environment, with a single control-plane (master) node.
Per the official requirements, you need at least two servers, each with 2 CPU cores and 4 GB of RAM, running CentOS 7.8 or later.
1: Environment Overview
| IP Address | OS | Hostname | Role | Installed Software |
| --- | --- | --- | --- | --- |
| 192.168.6.81 | CentOS 7.9 | K8S-6-81 | Master | docker, kube-apiserver, kube-scheduler, kube-controller-manager, kubelet, etcd, flannel |
| 192.168.6.82 | CentOS 7.9 | K8S-6-82 | Node1 | docker, kubelet, kube-proxy, flannel |
| 192.168.6.83 | CentOS 7.9 | K8S-6-83 | Node2 | docker, kubelet, kube-proxy, flannel |
2: Environment Initialization (run on all servers)
1: Disable SELinux
```bash
# Temporary
setenforce 0
# Permanent
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
```
2: Disable the firewall
```bash
systemctl stop firewalld
systemctl disable firewalld
```
3: Synchronize the time
```bash
ntpdate time1.aliyun.com
```
4: Set the hostname
```bash
# Method 1:
hostnamectl set-hostname k8s-6-81
# Method 2:
vi /etc/hostname
```
```bash
# Add hostname resolution (/etc/hosts expects the IP first, then the hostname)
echo "192.168.6.81 k8s-6-81" >> /etc/hosts
```
5: Install dependency packages
```bash
yum install -y ntpdate telnet net-tools wget epel-release
```
6: Disable the swap partition
```bash
swapoff -a                          # temporary
sed -i '/swap/s/^/#/' /etc/fstab    # permanent
```
7: Kernel tuning
```bash
echo 'net.ipv4.ip_forward = 1' >> /etc/sysctl.d/k8s.conf
echo 'net.bridge.bridge-nf-call-ip6tables = 1' >> /etc/sysctl.d/k8s.conf
echo 'net.bridge.bridge-nf-call-iptables = 1' >> /etc/sysctl.d/k8s.conf
modprobe br_netfilter
sysctl -p /etc/sysctl.d/k8s.conf
```
8: Configure IPVS
```bash
vi /etc/sysconfig/modules/ipvs.modules
```
```bash
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
```
```bash
chmod 755 /etc/sysconfig/modules/ipvs.modules
sh /etc/sysconfig/modules/ipvs.modules
```
Verify that IPVS is enabled:
```bash
lsmod | grep ip_vs
ip_vs_sh               12688  0
ip_vs_wrr              12697  0
ip_vs_rr               12600  0
ip_vs                 145458  6 ip_vs_rr,ip_vs_sh,ip_vs_wrr
nf_conntrack          139264  2 ip_vs,nf_conntrack_ipv4
libcrc32c              12644  3 xfs,ip_vs,nf_conntrack
```
3: Install and Configure Docker
Run on 192.168.6.81, 192.168.6.82, and 192.168.6.83:
1: Install Docker
```bash
curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun
```
2: Configure Docker
```bash
mkdir /etc/docker
vi /etc/docker/daemon.json
```
```json
{
  "data-root": "/data/docker",
  "storage-driver": "overlay2",
  "insecure-registries": ["registry.access.redhat.com","quay.io","harbor.auth.com"],
  "registry-mirrors": ["https://q2gr04ke.mirror.aliyuncs.com"],
  "bip": "172.6.81.1/24",
  "exec-opts": ["native.cgroupdriver=systemd"],
  "live-restore": true
}
```
Note: bip must change with the host IP:
- k8s-6-81: bip 172.6.81.1/24
- k8s-6-82: bip 172.6.82.1/24
- k8s-6-83: bip 172.6.83.1/24

3: Start Docker
```bash
mkdir -p /data/docker
systemctl start docker
systemctl enable docker
docker --version
```
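Since bip differs on every host, you could derive it instead of editing by hand. A hypothetical helper (assuming the node's primary NIC is eth0 and the 192.168.6.x addressing above; adjust both for your environment):

```bash
#!/bin/bash
# Derive the Docker bridge IP from this host's last IP octet,
# e.g. 192.168.6.82 -> bip 172.6.82.1/24, matching the table above.
ip_addr=$(ip -4 addr show eth0 | awk '/inet /{sub(/\/.*/,"",$2); print $2}')
last_octet=${ip_addr##*.}
sed -i "s|\"bip\": \".*\"|\"bip\": \"172.6.${last_octet}.1/24\"|" /etc/docker/daemon.json
systemctl restart docker
```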
4: Add the Kubernetes yum Repository (run on 192.168.6.81, 192.168.6.82, 192.168.6.83)
```bash
cat /etc/yum.repos.d/kubernetes.repo
```
```ini
[kubernetes]
name=Kubernetes Repo
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
enabled=1
```
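The repo file can also be written in one step with a heredoc; a minimal sketch:

```bash
# Write the Aliyun Kubernetes repo in one shot
cat > /etc/yum.repos.d/kubernetes.repo <<'EOF'
[kubernetes]
name=Kubernetes Repo
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
enabled=1
EOF
yum makecache fast   # refresh the yum metadata cache
```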
5: Install kubeadm and Start kubelet (run on 192.168.6.81, 192.168.6.82, 192.168.6.83)
By default the latest version is installed; you can also pin a specific version, e.g. kubelet-1.20.6:
```bash
yum install kubelet-1.20.6 kubeadm-1.20.6 kubectl-1.20.6 -y
```
or
```bash
yum install kubelet-1.23.6 kubeadm-1.23.6 kubectl-1.23.6 -y
```
```bash
systemctl start kubelet && systemctl enable kubelet
```
Note: kubelet cannot start successfully at this point; you can see the errors in /var/log/messages. It will run normally once the master node has been initialized.
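If you want to watch the failure loop directly (purely optional), a quick check on a systemd host:

```bash
# Inspect why kubelet keeps restarting before kubeadm init has run
systemctl status kubelet
journalctl -xeu kubelet --no-pager | tail -n 20
```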
6: Pre-download the Required Images (master node only)
```bash
vim k8s-image-download.sh
```
```bash
#!/bin/bash
if [ $# -ne 1 ];then
    echo "USAGE: bash `basename $0` KUBERNETES-VERSION"
    exit 1
fi
version=$1
images=`kubeadm config images list --kubernetes-version=${version} | awk -F'/' '{print $2}'`
for imageName in ${images[@]};do
    docker pull registry.aliyuncs.com/google_containers/$imageName
done
```
Run the script:
```bash
sh k8s-image-download.sh 1.20.6
```
or
```bash
sh k8s-image-download.sh 1.23.6
```
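To confirm the downloads succeeded, you can list the images kubeadm expects and compare against the local Docker cache; a quick sketch:

```bash
# List the images kubeadm expects for this release
kubeadm config images list --kubernetes-version=v1.20.6
# Confirm they now exist locally under the Aliyun mirror prefix
docker images | grep registry.aliyuncs.com/google_containers
```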
7: Building the Cluster
7.1: Run on the master node
```bash
kubeadm init --kubernetes-version=v1.20.6 \
  --pod-network-cidr=172.6.0.0/16 \
  --service-cidr=10.100.0.0/16 \
  --apiserver-advertise-address=192.168.6.81 \
  --ignore-preflight-errors=Swap \
  --ignore-preflight-errors=NumCPU \
  --image-repository registry.aliyuncs.com/google_containers
```
Parameter notes:
- --kubernetes-version=v1.20.6: the version to install.
- --apiserver-advertise-address: the master IP address used to communicate with the other nodes in the cluster.
- --service-cidr: the Service network range, i.e. the IP range used for load-balancing VIPs.
- --pod-network-cidr: the Pod network range, i.e. the IP range assigned to Pods.
- --ignore-preflight-errors=: ignore specific preflight errors. For example, if the run reports [ERROR NumCPU] and [ERROR Swap], adding --ignore-preflight-errors=NumCPU and --ignore-preflight-errors=Swap suppresses those two errors.
- --image-repository: the default Kubernetes registry is k8s.gcr.io, which is generally unreachable from mainland China; point it at the Aliyun mirror registry.aliyuncs.com/google_containers instead (a pre-pull sketch follows this list).
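Optionally, the control-plane images can be pre-pulled before running init so the init itself completes faster; a sketch using kubeadm's built-in pull with the same flags as above:

```bash
# Pre-pull all control-plane images from the Aliyun mirror
kubeadm config images pull \
  --kubernetes-version=v1.20.6 \
  --image-repository registry.aliyuncs.com/google_containers
```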
The run produces output like the following:
```
[init] Using Kubernetes version: v1.20.6
[preflight] Running pre-flight checks
        [WARNING NumCPU]: the number of available CPUs 1 is less than the required 2
        [WARNING Service-Docker]: docker service is not enabled, please run 'systemctl enable docker.service'
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
        [WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [node-1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 1.1.1.101]
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [node-1 localhost] and IPs [1.1.1.101 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [node-1 localhost] and IPs [1.1.1.101 127.0.0.1 ::1]
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 22.503724 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.20" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node master as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node master as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: z1609x.bg2tkrsrfwlrl3rb
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.6.81:6443 --token 13ywnk.xkwedmlhj2z0lzz6 \
    --discovery-token-ca-cert-hash sha256:ccbcabf8f22c39ec40c5de522108c007fd93e66eb59052deb474fc36e0875001
```
The initialization goes through the 15 phases below; each phase's output begins with the [phase name]:
- [init]: initialize with the specified version.
- [preflight]: run pre-flight checks and pull the required Docker images.
- [kubelet-start]: generate the kubelet configuration file "/var/lib/kubelet/config.yaml"; kubelet cannot start without it, which is why kubelet fails to run before initialization.
- [certs]: generate the certificates used by Kubernetes, stored under /etc/kubernetes/pki.
- [kubeconfig]: generate the kubeconfig files under /etc/kubernetes; components use these files to communicate with each other.
- [control-plane]: install the master components from the YAML files under /etc/kubernetes/manifests.
- [etcd]: install the etcd service from /etc/kubernetes/manifests/etcd.yaml.
- [wait-control-plane]: wait for the master components deployed in control-plane to start.
- [apiclient]: check the health of the master components.
- [upload-config]: upload the configuration.
- [kubelet]: configure kubelet via a ConfigMap.
- [patchnode]: record CNI information on the Node, stored as annotations.
- [mark-control-plane]: label the current node with the master role and add a NoSchedule taint, so the master is not used to run Pods by default.
- [bootstrap-token]: generate the token; note it down, as it is used later by kubeadm join to add nodes to the cluster.
- [addons]: install the CoreDNS and kube-proxy add-ons.
7.2: Install the flannel network plugin (run once, on the master node; the manifest creates a DaemonSet, so flannel pods are scheduled onto 192.168.6.81, 192.168.6.82, and 192.168.6.83 automatically)
```bash
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
```
```
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds-amd64 created
daemonset.apps/kube-flannel-ds-arm64 created
daemonset.apps/kube-flannel-ds-arm created
daemonset.apps/kube-flannel-ds-ppc64le created
daemonset.apps/kube-flannel-ds-s390x created
```
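Note: the stock kube-flannel.yml assumes a pod network of 10.244.0.0/16, while the init above used --pod-network-cidr=172.6.0.0/16. If the two don't match, pods may receive addresses that flannel does not route. A hedged sketch that aligns the manifest with the init flag before applying:

```bash
# Download the manifest, align its Network field with --pod-network-cidr, then apply
curl -fsSLo kube-flannel.yml https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
sed -i 's#10.244.0.0/16#172.6.0.0/16#' kube-flannel.yml
kubectl apply -f kube-flannel.yml
```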
7.3: Join the Node servers to the cluster
```bash
kubeadm join 192.168.6.81:6443 --token 13ywnk.xkwedmlhj2z0lzz6 \
    --discovery-token-ca-cert-hash sha256:ccbcabf8f22c39ec40c5de522108c007fd93e66eb59052deb474fc36e0875001
```
Note: if you lose the join command, you can regenerate it with kubeadm token create --print-join-command.
The default token is valid for 24 hours; once it expires it can no longer be used. If more nodes need to join later, generate a new token as follows:
```bash
# Generate a token
kubeadm token create
0w3a92.ijgba9ia0e3scicg

# List tokens
kubeadm token list
TOKEN                     TTL   EXPIRES                     USAGES                   DESCRIPTION                                                EXTRA GROUPS
0w3a92.ijgba9ia0e3scicg   23h   2019-09-08T22:02:40+08:00   authentication,signing   <none>                                                     system:bootstrappers:kubeadm:default-node-token
t0ehj8.k4ef3gq0icr3etl0   22h   2019-09-08T20:58:34+08:00   authentication,signing   The default bootstrap token generated by 'kubeadm init'.  system:bootstrappers:kubeadm:default-node-token

# Get the sha256 hash of the CA certificate
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'

# Join a node to the cluster
kubeadm join --token aa78f6.8b4cafc8ed26c34f --discovery-token-ca-cert-hash sha256:0fd95a9bc67a7bf0ef42da968a0d55d92e52898ec37c971bd77ee501d845b538 192.168.6.81:6443 --skip-preflight-checks
```
7.4: Check the cluster status
```bash
[root@k8s-6-81 ~]# kubectl get pods -n kube-system
The connection to the server localhost:8080 was refused - did you specify the right host or port?
```
Cause: kubectl on this machine is not bound to the cluster; the kubeconfig was not set up during cluster initialization.
Solution:
1: Check admin.conf
```bash
[root@k8s-6-81 ~]# ll /etc/kubernetes/admin.conf
-rw------- 1 root root 5450 Oct 23 14:10 /etc/kubernetes/admin.conf
```
2: Set up the kubeconfig
```bash
[root@k8s-6-81 ~]# mkdir /root/.kube
[root@k8s-6-81 ~]# cp -i /etc/kubernetes/admin.conf /root/.kube/config
```
Note: if a Node does not have the admin.conf file, copy it from the master into /etc/kubernetes/ on that node and set up the kubeconfig in the same way.
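Alternatively (with the same effect), you can point kubectl at admin.conf via the KUBECONFIG environment variable; a sketch:

```bash
# Point kubectl at the admin kubeconfig for the current shell
export KUBECONFIG=/etc/kubernetes/admin.conf
# Make it permanent for root
echo 'export KUBECONFIG=/etc/kubernetes/admin.conf' >> /root/.bash_profile
```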
Run the command again to check the cluster status:
```bash
[root@k8s-6-81 ~]# kubectl get node
NAME       STATUS   ROLES                  AGE   VERSION
k8s-6-81   Ready    control-plane,master   34m   v1.20.6
k8s-6-82   Ready    <none>                 23m   v1.20.6
k8s-6-83   Ready    <none>                 22m   v1.20.6

[root@k8s-6-81 ~]# kubectl get pods -n kube-system
NAME                               READY   STATUS    RESTARTS   AGE
coredns-7f89b7bc75-bcqgt           1/1     Running   0          33m
coredns-7f89b7bc75-gghjr           1/1     Running   0          33m
etcd-k8s-6-81                      1/1     Running   0          33m
kube-apiserver-k8s-6-81            1/1     Running   0          33m
kube-controller-manager-k8s-6-81   1/1     Running   0          33m
kube-flannel-ds-99c9j              1/1     Running   0          24m
kube-flannel-ds-bnz4q              1/1     Running   0          22m
kube-flannel-ds-z2h46              1/1     Running   0          23m
kube-proxy-8bnhm                   1/1     Running   0          22m
kube-proxy-hchlz                   1/1     Running   0          23m
kube-proxy-rj26z                   1/1     Running   0          33m
kube-scheduler-k8s-6-81            1/1     Running   0          33m
```
All nodes are Ready and all Pods are Running, which means the cluster is working normally.
Note: when checking joined nodes with kubectl get nodes, you may see nodes stuck in NotReady:
```bash
[root@k8s-6-81 ~]# kubectl get nodes
NAME       STATUS     ROLES                  AGE   VERSION
k8s-6-81   NotReady   control-plane,master   31m   v1.20.6
k8s-6-82   NotReady   <none>                 21m   v1.20.6
k8s-6-83   NotReady   <none>                 20m   v1.20.6
```
This happens because some critical pods have not come up. First check the pod status in kube-system:
```bash
[root@localhost ~]# kubectl get pods -n kube-system
NAME                               READY   STATUS                  RESTARTS   AGE
coredns-7f89b7bc75-bcqgt           1/1     Running                 0          31m
coredns-7f89b7bc75-gghjr           1/1     Running                 0          31m
etcd-k8s-6-81                      1/1     Running                 0          31m
kube-apiserver-k8s-6-81            1/1     Running                 0          31m
kube-controller-manager-k8s-6-81   1/1     Running                 0          31m
kube-flannel-ds-99c9j              0/1     Init:ImagePullBackOff   0          22m
kube-flannel-ds-bnz4q              0/1     Init:ImagePullBackOff   0          20m
kube-flannel-ds-z2h46              0/1     Init:ImagePullBackOff   0          21m
kube-proxy-8bnhm                   1/1     Running                 0          20m
kube-proxy-hchlz                   1/1     Running                 0          21m
kube-proxy-rj26z                   1/1     Running                 0          31m
kube-scheduler-k8s-6-81            1/1     Running                 0          31m
```
As shown above, the kube-flannel pods are in ImagePullBackOff, meaning the image pull failed, so we need to pull the image manually. Some pods run three replicas because the cluster has three nodes.
You can also inspect a specific pod with kubectl describe pod -n kube-system <pod-name>; if the pod has a problem, you will see an Events section at the bottom of the output, like this:
```bash
[root@k8s-6-81 ~]# kubectl describe pod kube-flannel-ds-99c9j -n kube-system
......
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  11m                  default-scheduler  Successfully assigned kube-system/kube-flannel-ds-99c9j to k8s-6-81
  Warning  Failed     8m (x4 over 10m)     kubelet            Failed to pull image "quay.io/coreos/flannel:v0.14.0": rpc error: code = Unknown desc = Error response from daemon: Get http://quay.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Warning  Failed     8m (x4 over 10m)     kubelet            Error: ErrImagePull
  Normal   BackOff    7m20s (x7 over 10m)  kubelet            Back-off pulling image "quay.io/coreos/flannel:v0.14.0"
  Normal   Pulling    6m29s (x5 over 11m)  kubelet            Pulling image "quay.io/coreos/flannel:v0.14.0"
  Warning  Failed     84s (x28 over 10m)   kubelet            Error: ImagePullBackOff
```
Pull the image manually:
```bash
[root@k8s-6-81 ~]# docker pull quay.io/coreos/flannel:v0.14.0
Error response from daemon: Get http://quay.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
```
The pull still fails, so I pulled the image on an Aliyun cloud host and saved it to a tarball there:
```bash
2-223 ~]# docker pull quay.io/coreos/flannel:v0.14.0
2-223 ~]# docker images | grep flannel
2-223 ~]# docker save -o flannel.tar quay.io/coreos/flannel:v0.14.0
```
Upload flannel.tar to all three servers and import it:
```bash
[root@k8s-6-81 ~]# docker load -i flannel.tar
[root@k8s-6-81 ~]# docker images
REPOSITORY               TAG       IMAGE ID       CREATED        SIZE
quay.io/coreos/flannel   v0.14.0   8522d622299c   2 months ago   67.9MB
```
The flannel image is now available locally. Kubernetes retries the pull automatically after a few minutes, and soon not only flannel but all the other pods move to Running; checking the node status again shows the problem is resolved:
```bash
[root@k8s-6-81 ~]# kubectl get pods -n kube-system
NAME                               READY   STATUS    RESTARTS   AGE
coredns-7f89b7bc75-bcqgt           1/1     Running   0          33m
coredns-7f89b7bc75-gghjr           1/1     Running   0          33m
etcd-k8s-6-81                      1/1     Running   0          33m
kube-apiserver-k8s-6-81            1/1     Running   0          33m
kube-controller-manager-k8s-6-81   1/1     Running   0          33m
kube-flannel-ds-99c9j              1/1     Running   0          24m
kube-flannel-ds-bnz4q              1/1     Running   0          22m
kube-flannel-ds-z2h46              1/1     Running   0          23m
kube-proxy-8bnhm                   1/1     Running   0          22m
kube-proxy-hchlz                   1/1     Running   0          23m
kube-proxy-rj26z                   1/1     Running   0          33m
kube-scheduler-k8s-6-81            1/1     Running   0          33m

[root@k8s-6-81 ~]# kubectl get node
NAME       STATUS   ROLES                  AGE   VERSION
k8s-6-81   Ready    control-plane,master   34m   v1.20.6
k8s-6-82   Ready    <none>                 23m   v1.20.6
k8s-6-83   Ready    <none>                 22m   v1.20.6
```
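Copying the tarball by hand gets tedious as the node count grows. A small hedged helper (assuming passwordless root SSH from the machine that holds flannel.tar to the hosts used in this article):

```bash
#!/bin/bash
# Distribute and load a saved image tarball on every node.
# Assumes passwordless root SSH to the hosts below.
for host in 192.168.6.81 192.168.6.82 192.168.6.83; do
    scp flannel.tar root@${host}:/tmp/flannel.tar
    ssh root@${host} 'docker load -i /tmp/flannel.tar && rm -f /tmp/flannel.tar'
done
```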
7.5: Check cluster health and confirm every component is Healthy
After installing the cluster, you are very likely to run into the following:
```bash
[root@k8s-6-81 ~]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS      MESSAGE                                                                                       ERROR
scheduler            Unhealthy   Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused
controller-manager   Unhealthy   Get "http://127.0.0.1:10252/healthz": dial tcp 127.0.0.1:10252: connect: connection refused
etcd-0               Healthy     {"health":"true"}
```
This happens because kube-controller-manager.yaml and kube-scheduler.yaml set --port=0 by default; commenting out that line in both files fixes it. (Do this on every master node.)
Solution: edit the static pod manifests on the master node.
```bash
[root@k8s-6-81 ~]# vi /etc/kubernetes/manifests/kube-controller-manager.yaml
[root@k8s-6-81 ~]# vi /etc/kubernetes/manifests/kube-scheduler.yaml
# Comment out the "- --port=0" line in each file
```
Restart the kubelet service:
```bash
systemctl restart kubelet.service
```
Run kubectl get cs again and check the status:
```bash
[root@k8s-6-81 ~]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS    MESSAGE             ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health":"true"}
```
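If you prefer not to edit by hand, a sketch of the same fix with sed (kubelet also picks up static pod manifest changes automatically; the restart simply forces it):

```bash
# Comment out "- --port=0" in both static pod manifests
sed -i '/- --port=0/s/^/#/' /etc/kubernetes/manifests/kube-controller-manager.yaml
sed -i '/- --port=0/s/^/#/' /etc/kubernetes/manifests/kube-scheduler.yaml
systemctl restart kubelet.service
```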
