Kubernetes High Availability (Three Masters, Two Workers)

Installation Requirements

Machine environment for deploying the Kubernetes cluster

  • Five machines running CentOS 7.6+ (minimal install)
  • Hardware: 2 GB+ RAM, 2+ vCPUs, 30 GB+ disk
  • All machines in the cluster can reach each other over the network and have outbound Internet access

Installation Steps

Hostname   IP address      Role
-          188.188.4.110   virtual IP (VIP)
master1    188.188.4.111   master
master2    188.188.4.112   master
master3    188.188.4.113   master
node1      188.188.4.114   node
node2      188.188.4.115   node

Pre-installation Environment Setup

1) Set hostnames and configure passwordless SSH

# Set the hostname on each node (run the matching command on that machine) and update the hosts file
$ hostnamectl set-hostname master1
$ hostnamectl set-hostname master2
$ hostnamectl set-hostname master3
$ hostnamectl set-hostname node1
$ hostnamectl set-hostname node2

# Add the following hosts entries on every node
$ cat >> /etc/hosts << EOF
188.188.4.110 vip
188.188.4.111 master1
188.188.4.112 master2
188.188.4.113 master3
188.188.4.114 node1
188.188.4.115 node2
EOF
# Generate a key pair on master1 and distribute the public key to the other hosts
$ ssh-keygen -t rsa -b 2048
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@master1
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@master2
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@master3
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2
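
Once the keys are in place, you can push master1's hosts file out instead of editing each machine by hand; a small sketch using the hostnames from the table above:

# copy /etc/hosts from master1 to every other node
$ for h in master2 master3 node1 node2; do scp /etc/hosts $h:/etc/hosts; done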

2) Upgrade the kernel

Install a newer kernel from the ELRepo RPM packages.

$ rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
$ rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-4.el7.elrepo.noarch.rpm

# List the available kernel packages; kernel-lt is the long-term branch, kernel-ml the latest mainline
$ yum --disablerepo="*" --enablerepo="elrepo-kernel" list available
$ yum -y --enablerepo=elrepo-kernel install kernel-ml

# List the grub2 menu entries and set the new kernel as the default
$ sudo awk -F\' '$1=="menuentry " {print i++ " : " $2}' /etc/grub2.cfg
$ grub2-set-default 0

Note: the machine must be rebooted for the new kernel to take effect.
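
After the reboot, it is worth confirming that the machine actually came up on the new kernel (the exact version string depends on the kernel-ml build you installed):

$ reboot
# once the machine is back up
$ uname -r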

3) Disable the firewall, SELinux, and swap

$ systemctl disable --now firewalld
$ setenforce 0
$ sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
$ sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/sysconfig/selinux

$ swapoff -a
$ sed -i.bak 's/.*swap.*/#&/' /etc/fstab

4) Tune kernel parameters

$ cat > /etc/sysctl.d/k8s.conf << EOF
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
fs.may_detach_mounts = 1
vm.overcommit_memory=1
vm.panic_on_oom=0
fs.inotify.max_user_watches=89100
fs.file-max=52706963
fs.nr_open=52706963
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_max_tw_buckets = 36000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_orphans = 327680
net.ipv4.tcp_orphan_retries = 3
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 16384
net.netfilter.nf_conntrack_max = 65536
net.ipv4.tcp_timestamps = 0
net.core.somaxconn = 16384
EOF
$ sysctl --system
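
The net.bridge.* keys above only exist once the br_netfilter module is loaded (and nf_conntrack_max needs the conntrack module), so load them and re-apply the sysctls; the ip_vs modules matter only if you later switch kube-proxy to IPVS mode. A minimal sketch:

$ cat > /etc/modules-load.d/k8s.conf << EOF
br_netfilter
nf_conntrack
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
EOF
$ modprobe br_netfilter && modprobe nf_conntrack
$ sysctl --system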

echo "* soft nofile 655360" >> /etc/security/limits.conf
echo "* hard nofile 655360" >> /etc/security/limits.conf
echo "* soft nproc 655360"  >> /etc/security/limits.conf
echo "* hard nproc 655360"  >> /etc/security/limits.conf
echo "* soft memlock unlimited" >> /etc/security/limits.conf
echo "* hard memlock unlimited" >> /etc/security/limits.conf
echo "DefaultLimitNOFILE = 1024000" >> /etc/systemd/system.conf
echo "DefaultLimitNPROC = 1024000"  >> /etc/systemd/system.conf

# Verify the hard open-file limit is 655360 (log in again for the limits.conf change to take effect)
$ ulimit -Hn

5) Configure yum repositories

Configure the Aliyun base and EPEL repositories.

$ mv /etc/yum.repos.d/* /tmp
$ curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
$ curl -o /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
$ yum -y clean all && yum makecache

# Install dependencies
$ yum install -y conntrack ipvsadm ipset jq sysstat curl iptables libseccomp bash-completion yum-utils device-mapper-persistent-data lvm2 net-tools conntrack-tools vim libtool-ltdl dnf

# Time synchronization
$ yum -y install chrony
$ systemctl enable chronyd.service && systemctl start chronyd.service && systemctl status chronyd.service
$ chronyc sources

Install Docker

# Remove any old versions
$ yum remove -y docker docker-ce docker-common docker-selinux docker-engine

# Add the Docker CE repo, list the available versions, and install
$ curl -o /etc/yum.repos.d/docker-ce.repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
$ yum list docker-ce --showduplicates | sort -r
$ yum install -y docker-ce docker-ce-cli
$ systemctl enable --now docker
# Configure registry mirrors and the systemd cgroup driver
$ cat > /etc/docker/daemon.json << EOF
{
  "registry-mirrors": ["https://registry.docker-cn.com","https://docker.mirrors.ustc.edu.cn","https://v343s1uf.mirror.aliyuncs.com"],
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
$ systemctl daemon-reload && systemctl restart docker && systemctl enable docker && systemctl status docker
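
A quick sanity check that Docker actually picked up the systemd cgroup driver (kubelet expects the container runtime to use the same driver):

$ docker info 2>/dev/null | grep -i 'cgroup driver'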

Install Kubernetes

$ cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
$ dnf clean all
$ dnf makecache

1) Install kubeadm, kubelet, and kubectl

Install these on every node, workers included:

  • kubeadm: the command used to bootstrap the cluster
  • kubelet: the agent that runs on every machine in the cluster and manages the lifecycle of pods and containers
  • kubectl: the cluster management CLI
$ dnf list kubeadm --showduplicates
# Option 1: install the latest available version
$ yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
# Option 2: install a specific version chosen from the list above
$ yum install -y kubelet-1.19.4 kubeadm-1.19.4 kubectl-1.19.4
$ systemctl enable kubelet && systemctl start kubelet

Note: the kubelet service will fail to start for now; ignore it, it comes up once kubeadm generates its configuration.

2) Configure a highly available VIP with HAProxy + Keepalived

HAProxy and Keepalived run as daemons on all master nodes.

$ dnf install -y keepalived haproxy 
# Configure HAProxy. The configuration is identical on all master nodes; the file is /etc/haproxy/haproxy.cfg. After editing it, copy it to master2 and master3.
global
  maxconn  2000
  ulimit-n  16384
  log  127.0.0.1 local0 err
  stats timeout 30s

defaults
  log global
  mode  http
  option  httplog
  timeout connect 5000
  timeout client  50000
  timeout server  50000
  timeout http-request 15s
  timeout http-keep-alive 15s

frontend monitor-in
  bind *:33305
  mode http
  option httplog
  monitor-uri /monitor

listen stats
  bind    *:8006
  mode    http
  stats   enable
  stats   hide-version
  stats   uri       /stats
  stats   refresh   30s
  stats   realm     Haproxy\ Statistics
  stats   auth      admin:admin

frontend k8s-master
  bind 0.0.0.0:8443
  bind 127.0.0.1:8443
  mode tcp
  option tcplog
  tcp-request inspect-delay 5s
  default_backend k8s-master

backend k8s-master
  mode tcp
  option tcplog
  option tcp-check
  balance roundrobin
  default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
  server master1    188.188.4.111:6443  check inter 2000 fall 2 rise 2 weight 100
  server master2    188.188.4.112:6443  check inter 2000 fall 2 rise 2 weight 100
  server master3    188.188.4.113:6443  check inter 2000 fall 2 rise 2 weight 100

Note: adjust the three master IP addresses to match your own environment.
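
Before starting the service it is worth syntax-checking the file and pushing it to the other masters; a small sketch using the hostnames from the table above:

$ haproxy -c -f /etc/haproxy/haproxy.cfg
$ for h in master2 master3; do scp /etc/haproxy/haproxy.cfg $h:/etc/haproxy/; done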

# Configure Keepalived. A track_script probe checks whether the Kubernetes API server on this master is still alive and fails the VIP over to another node if it is not. The configuration file is /etc/keepalived/keepalived.conf.
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
}
vrrp_script chk_kubernetes {
    script "/etc/keepalived/check_kubernetes.sh"
    interval 2
    weight -5
    fall 3  
    rise 2
}
vrrp_instance VI_1 {
    state MASTER                  #BACKUP
    interface ens192
    mcast_src_ip 188.188.4.111    #188.188.4.112 188.188.4.113
    virtual_router_id 51
    priority 100                  #99 98
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass K8SHA_KA_AUTH
    }
    virtual_ipaddress {
        188.188.4.110     
    }
    track_script {
       chk_kubernetes
    }
}

A few things to note (remember to adjust the first two on each node; see the sketch after this list):

  • mcast_src_ip: the multicast source address, which is the IP address of the current host.
  • priority: Keepalived elects the MASTER based on this value. master1 serves the VIP for the cluster, so it gets 100, while master2 and master3 stay on standby with 99 and 98.
  • state: set to MASTER on master1 and to BACKUP on the other two nodes.
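
One way to produce the master2/master3 variants is to copy master1's file and patch the three per-node fields; a rough sketch (interface name and addresses as in this environment):

$ scp /etc/keepalived/keepalived.conf master2:/etc/keepalived/
$ ssh master2 "sed -i -e 's/state MASTER/state BACKUP/' -e 's/mcast_src_ip 188.188.4.111/mcast_src_ip 188.188.4.112/' -e 's/priority 100/priority 99/' /etc/keepalived/keepalived.conf"
# repeat for master3 with 188.188.4.113 and priority 98
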
# Health check script, placed under /etc/keepalived (make it executable with chmod +x)
#!/bin/bash
#****************************************************************#
# ScriptName: check_kubernetes.sh
# Author: YuiKuen.Yuen
# Create Date: 2020-12-21 16:17
#****************************************************************#

function check_kubernetes() {
 for ((i=0;i<5;i++));do
  apiserver_pid_id=$(pgrep kube-apiserver)
  if [[ ! -z $apiserver_pid_id ]];then
   return
  else
   sleep 2
  fi
  apiserver_pid_id=0
 done
}

# 1:running  0:stopped
check_kubernetes
if [[ $apiserver_pid_id -eq 0 ]];then
 /usr/bin/systemctl stop keepalived
 exit 1
else
 exit 0
fi
$ systemctl enable --now keepalived haproxy
$ systemctl status keepalived haproxy
$ ping 188.188.4.110                        # verify the VIP responds
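
To see which master currently holds the VIP (interface name ens192 as in the keepalived configuration):

$ ip addr show ens192 | grep 188.188.4.110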

Deploy the Master Nodes

1) Prepare the images

Bootstrapping Kubernetes with kubeadm requires the base images its components run from, such as kube-proxy, kube-apiserver, kube-controller-manager, and so on. How do you know which images to pull? Since kubeadm v1.11 there is a command that dumps kubeadm's default configuration to a file (today `kubeadm config print init-defaults`), which contains the base settings for the corresponding Kubernetes version. You can also run `kubeadm config images list` to see the list of required images.

$ kubeadm config images list
k8s.gcr.io/kube-apiserver:v1.19.4
k8s.gcr.io/kube-controller-manager:v1.19.4
k8s.gcr.io/kube-scheduler:v1.19.4
k8s.gcr.io/kube-proxy:v1.19.4
k8s.gcr.io/pause:3.2
k8s.gcr.io/etcd:3.4.13-0
k8s.gcr.io/coredns:1.7.0

By default the configuration pulls images from Google's registry, k8s.gcr.io, which is unreachable from mainland China without a proxy, so the downloads will fail. Point it at a domestic mirror instead, such as Aliyun's:

$ kubeadm config print init-defaults > kubeadm-init.yaml
$ cat kubeadm-init.yaml 
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 188.188.4.110               # the VIP address
  bindPort:  6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: master1
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:                                      # add the following two lines
  certSANs:
  - "188.188.4.110"                             # VIP address
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers   # Aliyun mirror
controlPlaneEndpoint: "188.188.4.110:8443"     # VIP address and HAProxy port
kind: ClusterConfiguration
kubernetesVersion: v1.19.4                     # Kubernetes version
networking:
  dnsDomain: cluster.local  
  serviceSubnet: 10.96.0.0/12                  # the default is fine, or use a custom CIDR
  podSubnet: 10.244.0.0/16                     # add the pod network CIDR
scheduler: {}

Note:
The advertiseAddress field is not the address of the local NIC but the VIP of the HA cluster.
controlPlaneEndpoint is also the VIP, and the port is HAProxy's 8443, i.e. the frontend we configured earlier:

frontend k8s-master
  bind 0.0.0.0:8443
  bind 127.0.0.1:8443
  mode tcp
$ kubeadm config images pull --config kubeadm-init.yaml
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-apiserver:v1.19.4
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-controller-manager:v1.19.4
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-scheduler:v1.19.4
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-proxy:v1.19.4
[config/images] Pulled registry.aliyuncs.com/google_containers/pause:3.2
[config/images] Pulled registry.aliyuncs.com/google_containers/etcd:3.4.13-0
[config/images] Pulled registry.aliyuncs.com/google_containers/coredns:1.7.0

# The images above were pulled from the Aliyun mirror. If kubeadm later complains because it only recognizes images named under k8s.gcr.io, retag them in bulk as needed:
$ cat tag.sh 
#!/bin/bash

newtag=k8s.gcr.io
for i in $(docker images | grep -v TAG |awk '{print $1 ":" $2}')
do
   image=$(echo $i | awk -F '/' '{print $3}')
   docker tag $i $newtag/$image
   docker rmi $i
done
$ bash tag.sh

2) Initialize the first master node

$ kubeadm init --config kubeadm-init.yaml --upload-certs
...
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:
kubeadm join 188.188.4.110:8443 --token abcdef.0123456789abcdef \
     --discovery-token-ca-cert-hash sha256:c116e1a1db5561189b9f12411ea69999fff95a79f8cd1d9ccbcdf866d7311a70 \
     --control-plane --certificate-key 346bac1ff7b1a52cf8e4bfe4d448f9216e585e61cf8492880728d94b969e4443
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 188.188.4.110:8443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:4c738bc8e2684c5d52d80687d48925613b66ab660403649145eb668d71d85648

Initialization takes only about 30 seconds, largely because the images were pulled in advance. If there are no errors and the output ends with the last ten or so lines shown above, master1 has initialized successfully.

3) What the join commands mean

The output contains two `kubeadm join 188.188.4.110:8443` commands; these are the credentials other master and node machines use to join the cluster. The discovery hash is computed with SHA-256 and is required to join this particular cluster. The command carrying `--control-plane --certificate-key xxxx` joins a machine as a control-plane (master) node; the one without it joins a worker node.
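
Both the bootstrap token (24h TTL in the config above) and the uploaded certificate key (2h) expire; if you join machines later, they can be regenerated on an existing master, roughly as follows:

# print a fresh worker join command
$ kubeadm token create --print-join-command
# re-upload the control-plane certificates and print a new --certificate-key
$ kubeadm init phase upload-certs --upload-certs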

4) Join the remaining master nodes

[root@master2 ~]# kubeadm join 188.188.4.110:8443 --token abcdef.0123456789abcdef \
>     --discovery-token-ca-cert-hash sha256:c116e1a1db5561189b9f12411ea69999fff95a79f8cd1d9ccbcdf866d7311a70 \
>     --control-plane --certificate-key 346bac1ff7b1a52cf8e4bfe4d448f9216e585e61cf8492880728d94b969e4443
[mark-control-plane] Marking the node master2 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
This node has joined the cluster and a new control plane instance was created:
* Certificate signing request was sent to apiserver and approval was received.
* The Kubelet was informed of the new secure connection details.
* Control plane (master) label and taint were applied to the new node.
* The Kubernetes control plane instances scaled up.
* A new etcd member was added to the local/stacked etcd cluster.
To start administering your cluster from this node, you need to run the following as a regular user:
	mkdir -p $HOME/.kube
	sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
	sudo chown $(id -u):$(id -g) $HOME/.kube/config
Run 'kubectl get nodes' to see this node join the cluster.

Finally, run the suggested kubeconfig setup on each master node:

$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config

# add the KUBECONFIG environment variable
$ cat >> ~/.bashrc <<EOF
export KUBECONFIG=/etc/kubernetes/admin.conf
EOF
$ source ~/.bashrc

# List the cluster's master nodes (can be run on any master)
$ kubectl get nodes
NAME      STATUS   ROLES    AGE   VERSION
master1   Ready    master   34m   v1.19.4
master2   Ready    master   33m   v1.19.4
master3   Ready    master   31m   v1.19.4

Deploy the Worker Nodes

$ kubeadm join 188.188.4.110:8443 --token abcdef.0123456789abcdef \
     --discovery-token-ca-cert-hash sha256:c116e1a1db5561189b9f12411ea69999fff95a79f8cd1d9ccbcdf866d7311a70
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

The line "This node has joined the cluster" shows that node1 joined successfully; run the same command on node2. Unlike the masters, workers need no follow-up steps such as copying admin.conf or setting environment variables.

# Check the cluster nodes again
$ kubectl get nodes
NAME      STATUS     ROLES    AGE   VERSION
master1   NotReady   master   34m   v1.19.4
master2   NotReady   master   32m   v1.19.4
master3   NotReady   master   31m   v1.19.4
node1     NotReady   <none>   30m   v1.19.4

The cluster's nodes are now all registered, but the cluster is not yet usable: the STATUS column shows every node as NotReady because no network plugin has been installed. Options include calico and flannel; here we use flannel.
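
If you want to confirm that the missing CNI plugin really is the cause, the node conditions spell it out (a quick check; the message usually mentions "cni config uninitialized"):

$ kubectl describe node node1 | grep -i networkplugin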

Install the Network Plugin

# The straightforward way: apply the upstream manifest directly
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# Because of network restrictions in mainland China, add a hosts entry on master1 so that raw.githubusercontent.com resolves, then download the flannel manifest
$ cat >> /etc/hosts << EOF
199.232.28.133  raw.githubusercontent.com
EOF
$ curl -o kube-flannel.yml   https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

# Edit the default image addresses: replace every quay.io in the manifest with quay-mirror.qiniu.com
$ sed -i 's/quay.io/quay-mirror.qiniu.com/g' kube-flannel.yml
$ kubectl apply -f kube-flannel.yml 
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds-amd64 created
daemonset.apps/kube-flannel-ds-arm64 created
daemonset.apps/kube-flannel-ds-arm created
daemonset.apps/kube-flannel-ds-ppc64le created
daemonset.apps/kube-flannel-ds-s390x created
The flannel images can now be pulled successfully.
# Check that the flannel pods are running
$ kubectl get pods -n kube-system | grep flannel
kube-flannel-ds-7s7k6             1/1     Running   1          26m
kube-flannel-ds-8855s             1/1     Running   2          26m
kube-flannel-ds-8sqnn             1/1     Running   2          26m
kube-flannel-ds-cttlq             1/1     Running   1          26m

If instead the steps above fail as shown below, resolve it as follows:

$ kubectl apply -f kube-flannel.yml
podsecuritypolicy.policy/psp.flannel.unprivileged configured
clusterrole.rbac.authorization.k8s.io/flannel unchanged
clusterrolebinding.rbac.authorization.k8s.io/flannel unchanged
serviceaccount/flannel unchanged
configmap/kube-flannel-cfg unchanged
daemonset.apps/kube-flannel-ds unchanged

$ kubectl get pods -n kube-system | grep flannel
kube-flannel-ds-7gbk8             0/1     Init:ImagePullBackOff   0          13s
kube-flannel-ds-b8lgg             0/1     Init:ErrImagePull       0          13s
kube-flannel-ds-b9xpd             0/1     Init:ImagePullBackOff   0          13s
kube-flannel-ds-ccklp             0/1     Init:ImagePullBackOff   0          13s

$ kubectl get nodes
NAME      STATUS     ROLES    AGE   VERSION
master1   NotReady   master   18m   v1.19.4
master2   NotReady   master   17m   v1.19.4
master3   NotReady   master   15m   v1.19.4
node1     NotReady   <none>   14m   v1.19.4

$ kubectl get pods -n kube-system
NAME                              READY   STATUS                  RESTARTS   AGE
coredns-6d56c8448f-dpjzw          0/1     Pending                 0          19m
coredns-6d56c8448f-rbsh5          0/1     Pending                 0          19m
etcd-master1                      1/1     Running                 0          19m
etcd-master2                      1/1     Running                 0          17m
etcd-master3                      1/1     Running                 0          16m
kube-apiserver-master1            1/1     Running                 0          19m
kube-apiserver-master2            1/1     Running                 0          18m
kube-apiserver-master3            1/1     Running                 0          16m
kube-controller-manager-master1   1/1     Running                 1          19m
kube-controller-manager-master2   1/1     Running                 0          18m
kube-controller-manager-master3   1/1     Running                 0          16m
kube-flannel-ds-7pwv2             0/1     Init:ImagePullBackOff   0          17s
kube-flannel-ds-k6zt2             0/1     Init:ImagePullBackOff   0          20s
kube-flannel-ds-rp94k             0/1     Init:ImagePullBackOff   0          18s
kube-flannel-ds-vbkns             0/1     Init:ImagePullBackOff   0          18s
kube-proxy-2zstb                  1/1     Running                 0          15m
kube-proxy-fs4z2                  1/1     Running                 0          16m
kube-proxy-h8r2h                  1/1     Running                 0          19m
kube-proxy-tmcnn                  1/1     Running                 0          18m
kube-scheduler-master1            1/1     Running                 1          19m
kube-scheduler-master2            1/1     Running                 0          18m
kube-scheduler-master3            1/1     Running                 0          16m

# First remove the flannel resources and their configuration (kubectl delete -f <manifest>)
$ kubectl delete -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

# On the nodes, clean up the interfaces and files flannel left behind
$ ifconfig cni0 down
$ ip link delete cni0
$ ifconfig flannel.1 down
$ ip link delete flannel.1
$ rm -rf /var/lib/cni/
$ rm -f /etc/cni/net.d/*
Note: after the cleanup above, restart kubelet.
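
The restart itself, run on every node that was cleaned up:

$ systemctl restart kubelet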

Download the flannel Docker image archive from GitHub: https://github.com/coreos/flannel/releases (a download manager is recommended; upload the file to the nodes afterwards).

$ docker load < flanneld-v0.13.1-rc1.docker 
ace0eda3e3be: Loading layer [==================================================>]  5.843MB/5.843MB
0a790f51c8dd: Loading layer [==================================================>]  11.42MB/11.42MB
db93500c64e6: Loading layer [==================================================>]  2.595MB/2.595MB
70351a035194: Loading layer [==================================================>]  45.68MB/45.68MB
cd38981c5610: Loading layer [==================================================>]   5.12kB/5.12kB
dce2fcdf3a87: Loading layer [==================================================>]  9.216kB/9.216kB
be155d1c86b7: Loading layer [==================================================>]   7.68kB/7.68kB
Loaded image: quay.io/coreos/flannel:v0.13.1-rc1-amd64

$ docker images
REPOSITORY                                                        TAG                 IMAGE ID            CREATED             SIZE
quay.io/coreos/flannel                                            v0.13.1-rc1-amd64   f03a23d55e57        3 days ago          64.6MB

# Edit the flannel manifest so the image references match the image that was just loaded (quay.io/coreos/flannel:v0.13.1-rc1-amd64); revert the earlier quay-mirror substitution if it was applied
$ sed -i 's/quay-mirror.qiniu.com/quay.io/g' kube-flannel.yml
$ cat kube-flannel.yml 
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: psp.flannel.unprivileged
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: docker/default
    seccomp.security.alpha.kubernetes.io/defaultProfileName: docker/default
    apparmor.security.beta.kubernetes.io/allowedProfileNames: runtime/default
    apparmor.security.beta.kubernetes.io/defaultProfileName: runtime/default
spec:
  privileged: false
  volumes:
  - configMap
  - secret
  - emptyDir
  - hostPath
  allowedHostPaths:
  - pathPrefix: "/etc/cni/net.d"
  - pathPrefix: "/etc/kube-flannel"
  - pathPrefix: "/run/flannel"
  readOnlyRootFilesystem: false
  # Users and groups
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  # Privilege Escalation
  allowPrivilegeEscalation: false
  defaultAllowPrivilegeEscalation: false
  # Capabilities
  allowedCapabilities: ['NET_ADMIN', 'NET_RAW']
  defaultAddCapabilities: []
  requiredDropCapabilities: []
  # Host namespaces
  hostPID: false
  hostIPC: false
  hostNetwork: true
  hostPorts:
  - min: 0
    max: 65535
  # SELinux
  seLinux:
    # SELinux is unused in CaaSP
    rule: 'RunAsAny'
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
rules:
- apiGroups: ['extensions']
  resources: ['podsecuritypolicies']
  verbs: ['use']
  resourceNames: ['psp.flannel.unprivileged']
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  selector:
    matchLabels:
      app: flannel
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
      hostNetwork: true
      priorityClassName: system-node-critical
      tolerations:
      - operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni
        image: quay.io/coreos/flannel:v0.13.1-rc1-amd64         # must match the image loaded above
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.13.1-rc1-amd64         # must match the image loaded above
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        resources:
          requests:
            cpu: "100m"
            memory: "50Mi"
          limits:
            cpu: "100m"
            memory: "50Mi"
        securityContext:
          privileged: false
          capabilities:
            add: ["NET_ADMIN", "NET_RAW"]
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: run
          mountPath: /run/flannel
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
      - name: run
        hostPath:
          path: /run/flannel
      - name: cni
        hostPath:
          path: /etc/cni/net.d
      - name: flannel-cfg
        configMap:
          name: kube-flannel-cfg
          
# Apply the manifest again
$ kubectl apply -f kube-flannel.yml
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created

$ kubectl get pods -n kube-system | grep flannel
kube-flannel-ds-7s7k6             1/1     Running   0          51s
kube-flannel-ds-8855s             1/1     Running   0          51s
kube-flannel-ds-8sqnn             1/1     Running   0          51s
kube-flannel-ds-cttlq             1/1     Running   0          51s

$ kubectl get nodes
NAME      STATUS   ROLES    AGE   VERSION
master1   Ready    master   34m   v1.19.4
master2   Ready    master   33m   v1.19.4
master3   Ready    master   31m   v1.19.4
node1     Ready    <none>   30m   v1.19.4

$ kubectl get pods -n kube-system
NAME                              READY   STATUS    RESTARTS   AGE
coredns-6d56c8448f-dpjzw          1/1     Running   0          35m
coredns-6d56c8448f-rbsh5          1/1     Running   0          35m
etcd-master1                      1/1     Running   0          35m
etcd-master2                      1/1     Running   0          34m
etcd-master3                      1/1     Running   0          33m
kube-apiserver-master1            1/1     Running   0          35m
kube-apiserver-master2            1/1     Running   0          34m
kube-apiserver-master3            1/1     Running   0          33m
kube-controller-manager-master1   1/1     Running   1          35m
kube-controller-manager-master2   1/1     Running   1          34m
kube-controller-manager-master3   1/1     Running   0          33m
kube-flannel-ds-7s7k6             1/1     Running   0          2m39s
kube-flannel-ds-8855s             1/1     Running   0          2m39s
kube-flannel-ds-8sqnn             1/1     Running   0          2m39s
kube-flannel-ds-cttlq             1/1     Running   0          2m39s
kube-proxy-2zstb                  1/1     Running   0          32m
kube-proxy-fs4z2                  1/1     Running   0          33m
kube-proxy-h8r2h                  1/1     Running   0          35m
kube-proxy-tmcnn                  1/1     Running   0          34m
kube-scheduler-master1            1/1     Running   2          35m
kube-scheduler-master2            1/1     Running   1          34m
kube-scheduler-master3            1/1     Running   0          33m

Test the Kubernetes Cluster

Create an nginx pod from a master node to verify that workloads run correctly.

$ kubectl create deployment nginx --image=nginx
deployment.apps/nginx created

$ kubectl expose deployment nginx --port=80 --type=NodePort
service/nginx exposed

# Check the pod and the service
$ kubectl get pod,svc -o wide
NAME                         READY   STATUS              RESTARTS   AGE   IP       NODE    NOMINATED NODE   READINESS GATES
pod/nginx-6799fc88d8-5kvdh   0/1     ContainerCreating   0          20s   <none>   node2   <none>           <none>

NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE   SELECTOR
service/kubernetes   ClusterIP   10.96.0.1      <none>        443/TCP        21h   <none>
service/nginx        NodePort    10.96.228.76   <none>        80:31494/TCP   8s    app=nginx

In the output, the first half is the pod information and the second half is the service information. The service/nginx line shows the service is exposed to the outside on NodePort 31494, and the pod details show it is running on node2, whose IP address is 188.188.4.115. Open a browser (Firefox is recommended) and visit http://188.188.4.115:31494; the same port on the VIP address works as well.
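
The same check from the command line (31494 is the NodePort assigned above; yours will differ):

$ curl -sI http://188.188.4.115:31494 | head -n 1
$ curl -sI http://188.188.4.110:31494 | head -n 1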

Install the Dashboard

# Download the dashboard manifest
$ cd /etc/kubernetes/
$ wget https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.4/aio/deploy/recommended.yaml

# By default the `Dashboard` is only reachable from inside the cluster; change the `Service` to `NodePort` to expose it externally
---
kind: Service
apiVersion: v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
spec:
  type: NodePort                       # add this line
  ports:
    - port: 443
      targetPort: 8443
      nodePort: 30001                  # add this line; the port can be customized
  selector:
    k8s-app: kubernetes-dashboard
---

# Apply the manifest
$ kubectl apply -f recommended.yaml 
namespace/kubernetes-dashboard created
serviceaccount/kubernetes-dashboard created
service/kubernetes-dashboard created
secret/kubernetes-dashboard-certs created
secret/kubernetes-dashboard-csrf created
secret/kubernetes-dashboard-key-holder created
configmap/kubernetes-dashboard-settings created
role.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrole.rbac.authorization.k8s.io/kubernetes-dashboard created
rolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
deployment.apps/kubernetes-dashboard created
service/dashboard-metrics-scraper created
deployment.apps/dashboard-metrics-scraper created

# Check that the dashboard is running and which node its pod landed on
$ kubectl get pods -n kubernetes-dashboard
NAME                                         READY   STATUS    RESTARTS   AGE
dashboard-metrics-scraper-7b59f7d4df-cjsrg   1/1     Running   0          37s
kubernetes-dashboard-665f4c5ff-5pswj         1/1     Running   0          37s

$ kubectl get pod,svc -n kubernetes-dashboard -o wide
NAME                                             READY   STATUS    RESTARTS   AGE   IP           NODE    NOMINATED NODE   READINESS GATES
pod/dashboard-metrics-scraper-7b59f7d4df-cjsrg   1/1     Running   0          77s   10.244.3.5   node1   <none>           <none>
pod/kubernetes-dashboard-665f4c5ff-5pswj         1/1     Running   0          77s   10.244.3.4   node1   <none>           <none>

NAME                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)         AGE   SELECTOR
service/dashboard-metrics-scraper   ClusterIP   10.106.88.97     <none>        8000/TCP        78s   k8s-app=dashboard-metrics-scraper
service/kubernetes-dashboard        NodePort    10.100.198.214   <none>        443:30001/TCP   79s   k8s-app=kubernetes-dashboard

Watch the STATUS column: Running, with a RESTARTS count of 0 (or at least not steadily climbing), means things are healthy, which is the case here, so we can continue.
kubernetes-dashboard-665f4c5ff-5pswj is running on node1 and the service is exposed on port 30001, so the access URL is https://188.188.4.114:30001.

# We can reach the login page now, but the account lacks the permissions to view cluster information because no cluster role has been bound yet. Bind the `cluster-admin` role as follows (Firefox is recommended for the self-signed certificate):
$ kubectl create serviceaccount dashboard-admin -n kube-system
$ kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=kube-system:dashboard-admin
$ kubectl describe secrets -n kube-system $(kubectl -n kube-system get secret | awk '/dashboard-admin/{print $1}')
....
Type:  kubernetes.io/service-account-token
Data
====
ca.crt:     1066 bytes
namespace:  11 bytes
token:      eyJhbGciOiJSUzI1NiIsImtpZCI6IjFCNk1fb1R0dHcxeEZFOXZjczVtVzJER21TSW00STIyR1l4NWZRcU90azAifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXNoYm9hcmQtYWRtaW4tdG9rZW4tbG5yejYiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiZGFzaGJvYXJkLWFkbWluIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiMTU3ZTVmYzktOGQ3Yi00NDcyLTllOTItMWE0Y2EwYTRmMmE1Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmUtc3lzdGVtOmRhc2hib2FyZC1hZG1pbiJ9.PBCRBqo85qEkR0eI0V1_3zE6MVS2ts5GM4dnX0RyH_oe8yCiE8UeFkEzOs1sStCKmlAPA_0ti_g2mVYVy09QqU50uLG1obuNghe_lYgkNmKxG0-_4iUQKAQGzNOPxgBwocsJTjIo9ghN19IMhzhy8RDZMVCGulyZRXvMza38qYRepeT-zhwodzwcqq3WGY8oiZlSDS8v2ynWU5ey1rWVRYDogX7y8QzkVRcrMws2Q8Z6GOReCCjGbY_V6_EunyTpgVOmJlTemyUSndoy2hmuy2225wkI6hR04YJj4NLC741I3Q6Y9nr6eZ3zEaVLoOF1dPAkZ47UC08D6Bl-gq2f7A