Kubernetes High Availability (Three Masters, Two Workers)
Installation Requirements
Machine environment for deploying the Kubernetes cluster:
- Five machines running CentOS 7.6+ (minimal install)
- Hardware: 2 GB+ RAM, 2+ vCPUs, 30 GB+ disk
- All machines in the cluster can reach each other over the network and have outbound internet access
Installation Steps
| Hostname | IP address | Role |
|---|---|---|
| - | 188.188.4.110 | Virtual IP (VIP) |
| master1 | 188.188.4.111 | master |
| master2 | 188.188.4.112 | master |
| master3 | 188.188.4.113 | master |
| node1 | 188.188.4.114 | node |
| node2 | 188.188.4.115 | node |
Pre-installation Environment Setup
1) Set hostnames and configure passwordless SSH
# Set the hostname on each node
$ hostnamectl set-hostname master1
$ hostnamectl set-hostname master2
$ hostnamectl set-hostname master3
$ hostnamectl set-hostname node1
$ hostnamectl set-hostname node2
# Add the following hosts entries on every node
$ cat >> /etc/hosts << EOF
188.188.4.110 vip
188.188.4.111 master1
188.188.4.112 master2
188.188.4.113 master3
188.188.4.114 node1
188.188.4.115 node2
EOF
# Generate a key pair on master1 and distribute it to the other hosts
$ ssh-keygen -t rsa -b 2048
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@master1
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@master2
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@master3
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2
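# To confirm passwordless login works from master1, a quick loop over all hosts should print each hostname without asking for a password
$ for h in master1 master2 master3 node1 node2; do ssh root@$h hostname; done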
2) Upgrade the kernel
Install a newer kernel from the ELRepo RPM packages
$ rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
$ rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-4.el7.elrepo.noarch.rpm
# List the available kernel packages: lt is the long-term support branch, ml is the latest mainline
$ yum --disablerepo="*" --enablerepo="elrepo-kernel" list available
$ yum -y --enablerepo=elrepo-kernel install kernel-ml
# Check the installed kernel entries and set the new default GRUB2 entry
$ sudo awk -F\' '$1=="menuentry " {print i++ " : " $2}' /etc/grub2.cfg
$ grub2-set-default 0
Note: the machine must be rebooted for the new kernel to take effect
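# A quick check: reboot, then confirm the running kernel version once the machine is back up
$ reboot
$ uname -r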
3) Disable the firewall, SELinux, and swap
$ systemctl disable --now firewalld
$ setenforce 0
$ sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
$ sed -i "s/^SELINUX=enforcing/SELINUX=disabled/g" /etc/sysconfig/selinux
$ sed -i "s/^SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
$ sed -i "s/^SELINUX=permissive/SELINUX=disabled/g" /etc/sysconfig/selinux
$ sed -i "s/^SELINUX=permissive/SELINUX=disabled/g" /etc/selinux/config
$ swapoff -a
$ sed -i.bak 's/.*swap.*/#&/' /etc/fstab
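# To confirm swap is fully off: swapon prints nothing and free shows 0B of swap
$ swapon --show
$ free -h | grep -i swap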
4) Tune kernel parameters
$ cat > /etc/sysctl.d/k8s.conf << EOF
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
fs.may_detach_mounts = 1
vm.overcommit_memory=1
vm.panic_on_oom=0
fs.inotify.max_user_watches=89100
fs.file-max=52706963
fs.nr_open=52706963
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_max_tw_buckets = 36000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_orphans = 327680
net.ipv4.tcp_orphan_retries = 3
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 16384
net.netfilter.nf_conntrack_max = 65536
net.ipv4.tcp_timestamps = 0
net.core.somaxconn = 16384
EOF
$ sysctl --system
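# The net.bridge.bridge-nf-call-* keys only exist once the br_netfilter module is loaded; a small sketch that loads it now and on every boot via systemd's modules-load.d
$ modprobe br_netfilter
$ cat > /etc/modules-load.d/k8s.conf << EOF
br_netfilter
EOF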
echo "* soft nofile 655360" >> /etc/security/limits.conf
echo "* hard nofile 655360" >> /etc/security/limits.conf
echo "* soft nproc 655360" >> /etc/security/limits.conf
echo "* hard nproc 655360" >> /etc/security/limits.conf
echo "* soft memlock unlimited" >> /etc/security/limits.conf
echo "* hard memlock unlimited" >> /etc/security/limits.conf
echo "DefaultLimitNOFILE = 1024000" >> /etc/systemd/system.conf
echo "DefaultLimitNPROC = 1024000" >> /etc/systemd/system.conf
# Verify that the hard limit on open files is 655360 (takes effect after logging in again)
$ ulimit -Hn
5) Configure the yum repositories
Configure the Aliyun base and EPEL repositories
$ mv /etc/yum.repos.d/* /tmp
$ curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
$ curl -o /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
$ yum -y clean all && yum makecache
# Install dependencies
$ yum install -y conntrack ipvsadm ipset jq sysstat curl iptables libseccomp bash-completion yum-utils device-mapper-persistent-data lvm2 net-tools conntrack-tools vim libtool-ltdl dnf
# Time synchronization
$ yum -y install chrony
$ systemctl enable chronyd.service && systemctl start chronyd.service && systemctl status chronyd.service
$ chronyc sources
Install Docker
# Remove any old versions
$ yum remove -y docker docker-ce docker-common docker-selinux docker-engine
# Add the Docker CE repo and choose a version to install
$ curl -o /etc/yum.repos.d/docker-ce.repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
$ yum list docker-ce --showduplicates | sort -r
$ yum install -y docker-ce docker-ce-cli
$ systemctl enable --now docker
$ systemctl start docker
# Configure registry mirrors and the systemd cgroup driver
$ cat > /etc/docker/daemon.json << EOF
{
"registry-mirrors": ["https://registry.docker-cn.com","https://docker.mirrors.ustc.edu.cn","https://v343s1uf.mirror.aliyuncs.com"],
"exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
$ systemctl daemon-reload && systemctl restart docker && systemctl enable docker && systemctl status docker
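# To confirm Docker is now using the systemd cgroup driver configured above
$ docker info | grep -i 'cgroup driver'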
Install Kubernetes
$ cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
$ dnf clean all
$ dnf makecache
1) Install kubeadm, kubelet, and kubectl
These must be installed on every node (including the worker nodes):
- kubeadm: the command used to bootstrap the cluster
- kubelet: the agent that runs on every machine in the cluster and manages the lifecycle of pods and containers
- kubectl: the cluster management CLI
$ dnf list kubeadm --showduplicates
# Option 1: install the latest available version
$ yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
# Option 2: list the available versions and install a specific one
$ yum install -y kubelet-1.19.4 kubeadm-1.19.4 kubectl-1.19.4
$ systemctl enable kubelet && systemctl start kubelet
Note: the kubelet service will not start successfully yet; this is expected and can be ignored for now
2) HAProxy + Keepalived for a highly available VIP
HAProxy and Keepalived are deployed as daemons on all master nodes
$ dnf install -y keepalived haproxy
# Configure the HAProxy service. The haproxy configuration is identical on all master nodes; the configuration file is /etc/haproxy/haproxy.cfg. After finishing it on master1, distribute it to master2 and master3 (a copy sketch follows the configuration below).
global
  maxconn 2000
  ulimit-n 16384
  log 127.0.0.1 local0 err
  stats timeout 30s
defaults
  log global
  mode http
  option httplog
  timeout connect 5000
  timeout client 50000
  timeout server 50000
  timeout http-request 15s
  timeout http-keep-alive 15s
frontend monitor-in
  bind *:33305
  mode http
  option httplog
  monitor-uri /monitor
listen stats
  bind *:8006
  mode http
  stats enable
  stats hide-version
  stats uri /stats
  stats refresh 30s
  stats realm Haproxy\ Statistics
  stats auth admin:admin
frontend k8s-master
  bind 0.0.0.0:8443
  bind 127.0.0.1:8443
  mode tcp
  option tcplog
  tcp-request inspect-delay 5s
  default_backend k8s-master
backend k8s-master
  mode tcp
  option tcplog
  option tcp-check
  balance roundrobin
  default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
  server master1 188.188.4.111:6443 check inter 2000 fall 2 rise 2 weight 100
  server master2 188.188.4.112:6443 check inter 2000 fall 2 rise 2 weight 100
  server master3 188.188.4.113:6443 check inter 2000 fall 2 rise 2 weight 100
Note: adjust the three master IP addresses above to match your own environment
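# A minimal sketch for distributing the finished haproxy.cfg to the other masters, relying on the passwordless SSH set up earlier
$ for h in master2 master3; do scp /etc/haproxy/haproxy.cfg root@$h:/etc/haproxy/haproxy.cfg; done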
# Configure the Keepalived service. The track_script mechanism runs a script that detects whether the Kubernetes master on this node has gone down and fails the VIP over accordingly, which is what provides high availability. The configuration file is /etc/keepalived/keepalived.conf.
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
}
vrrp_script chk_kubernetes {
    script "/etc/keepalived/check_kubernetes.sh"
    interval 2
    weight -5
    fall 3
    rise 2
}
vrrp_instance VI_1 {
    state MASTER                   # BACKUP on master2/master3
    interface ens192
    mcast_src_ip 188.188.4.111     # 188.188.4.112 / 188.188.4.113 on the other masters
    virtual_router_id 51
    priority 100                   # 99 / 98 on the other masters
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass K8SHA_KA_AUTH
    }
    virtual_ipaddress {
        188.188.4.110
    }
    track_script {
        chk_kubernetes
    }
}
A few points to note (remember to change the first two on each node); a sketch of master2's values follows this list:
- mcast_src_ip: the multicast source address, i.e. the IP address of the current host.
- priority: Keepalived uses this value to decide which node becomes MASTER. Here master1 serves Kubernetes while the other two act as backups, so master1 is set to 100, master2 to 99, and master3 to 98.
- state: set this field to MASTER on master1 and to BACKUP on the other two nodes.
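For illustration, the fields that change on master2 would look like this (master3 is analogous, with priority 98 and its own source IP):
    state BACKUP
    mcast_src_ip 188.188.4.112
    priority 99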
# Health check script, placed under /etc/keepalived
#!/bin/bash
#****************************************************************#
# ScriptName: check_kubernetes.sh
# Author: YuiKuen.Yuen
# Create Date: 2020-12-21 16:17
#****************************************************************#
function check_kubernetes() {
    for ((i=0;i<5;i++));do
        apiserver_pid_id=$(pgrep kube-apiserver)
        if [[ ! -z $apiserver_pid_id ]];then
            return
        else
            sleep 2
        fi
        apiserver_pid_id=0
    done
}
# 1:running 0:stopped
check_kubernetes
if [[ $apiserver_pid_id -eq 0 ]];then
    /usr/bin/systemctl stop keepalived
    exit 1
else
    exit 0
fi
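# Keepalived runs this script itself, so it must exist and be executable on every master node; a quick sketch from master1
$ chmod +x /etc/keepalived/check_kubernetes.sh
$ for h in master2 master3; do scp /etc/keepalived/check_kubernetes.sh root@$h:/etc/keepalived/; done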
$ systemctl enable --now keepalived haproxy
$ systemctl status keepalived haproxy
$ ping 188.188.4.110   # check that the VIP responds
Deploy the Master Nodes
1) Prepare the images
When building Kubernetes with kubeadm, the base images Kubernetes runs on must first be downloaded, such as kube-proxy, kube-apiserver, kube-controller-manager, and so on. How do we know which images are needed? Starting with kubeadm v1.11+ there is a kubeadm config print-default command (kubeadm config print init-defaults in current releases) that conveniently dumps kubeadm's default configuration to a file; this file contains the base configuration needed for the corresponding Kubernetes version. In addition, kubeadm config images list prints the list of images that need to be pulled.
$ kubeadm config images list
k8s.gcr.io/kube-apiserver:v1.19.4
k8s.gcr.io/kube-controller-manager:v1.19.4
k8s.gcr.io/kube-scheduler:v1.19.4
k8s.gcr.io/kube-proxy:v1.19.4
k8s.gcr.io/pause:3.2
k8s.gcr.io/etcd:3.4.13-0
k8s.gcr.io/coredns:1.7.0
By default the configuration pulls images from Google's registry k8s.gcr.io; without a way around the network restrictions those pulls will fail, so we switch the repository to a domestic mirror, for example Aliyun's, as follows.
$ kubeadm config print init-defaults > kubeadm-init.yaml
$ cat kubeadm-init.yaml
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 188.188.4.110     # the VIP address
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: master1
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:                            # add the two lines below
  certSANs:
  - "188.188.4.110"                   # the VIP address
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers   # Aliyun mirror
controlPlaneEndpoint: "188.188.4.110:8443"   # the VIP address and the haproxy port
kind: ClusterConfiguration
kubernetesVersion: v1.19.4                   # Kubernetes version
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12    # the default is fine, or use a custom CIDR
  podSubnet: 10.244.0.0/16       # add the pod network CIDR
scheduler: {}
Notes:
The advertiseAddress field is not the address of the local NIC but the VIP of the highly available cluster.
controlPlaneEndpoint is set to the VIP address, and the port is the 8443 port served by haproxy, i.e. this part of the haproxy configuration:
frontend k8s-master
  bind 0.0.0.0:8443
  bind 127.0.0.1:8443
  mode tcp
$ kubeadm config images pull --config kubeadm-init.yaml
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-apiserver:v1.19.4
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-controller-manager:v1.19.4
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-scheduler:v1.19.4
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-proxy:v1.19.4
[config/images] Pulled registry.aliyuncs.com/google_containers/pause:3.2
[config/images] Pulled registry.aliyuncs.com/google_containers/etcd:3.4.13-0
[config/images] Pulled registry.aliyuncs.com/google_containers/coredns:1.7.0
# kubeadm-init.yaml above was changed to pull images from Aliyun. If kubeadm runs into problems later, it may be because it only recognizes Google's own image names; in that case re-tag the images to k8s.gcr.io in bulk as needed
$ cat tag.sh
#!/bin/bash
newtag=k8s.gcr.io
for i in $(docker images | grep -v TAG | awk '{print $1 ":" $2}')
do
    image=$(echo $i | awk -F '/' '{print $3}')
    docker tag $i $newtag/$image
    docker rmi $i
done
$ bash tag.sh
2) Initialize the master node
$ kubeadm init --config kubeadm-init.yaml --upload-certs
...
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:
kubeadm join 188.188.4.110:8443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:c116e1a1db5561189b9f12411ea69999fff95a79f8cd1d9ccbcdf866d7311a70 \
--control-plane --certificate-key 346bac1ff7b1a52cf8e4bfe4d448f9216e585e61cf8492880728d94b969e4443
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 188.188.4.110:8443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:4c738bc8e2684c5d52d80687d48925613b66ab660403649145eb668d71d85648
The whole initialization takes only about 30 seconds, largely because the images were pulled in advance. If there are no errors and the output ends with roughly the last ten lines shown above, the master1 node has been initialized successfully.
3) Command explanation
The output above contains two kubeadm join 188.188.4.110:8443 commands; these are the authenticated commands for other master nodes and for worker nodes, respectively, to join the Kubernetes cluster. The hash is computed with the sha256 algorithm and is required to join this cluster. The variant with --control-plane --certificate-key xxxx joins a node as a control-plane (master) node; without those flags the node joins as a worker. If the token or certificate key has expired, new values can be generated as sketched below.
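The bootstrap token is valid for 24 hours and the uploaded certificate key for only 2 hours, so late joiners may need fresh values. A minimal sketch, run on an existing master:
# print a new worker join command (with a fresh token)
$ kubeadm token create --print-join-command
# re-upload the control-plane certificates and print a new value for --certificate-key
$ kubeadm init phase upload-certs --upload-certs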
4) Join the other master nodes
[root@master2 ~]# kubeadm join 188.188.4.110:8443 --token abcdef.0123456789abcdef \
> --discovery-token-ca-cert-hash sha256:c116e1a1db5561189b9f12411ea69999fff95a79f8cd1d9ccbcdf866d7311a70 \
> --control-plane --certificate-key 346bac1ff7b1a52cf8e4bfe4d448f9216e585e61cf8492880728d94b969e4443
[mark-control-plane] Marking the node master2 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
This node has joined the cluster and a new control plane instance was created:
* Certificate signing request was sent to apiserver and approval was received.
* The Kubelet was informed of the new secure connection details.
* Control plane (master) label and taint were applied to the new node.
* The Kubernetes control plane instances scaled up.
* A new etcd member was added to the local/stacked etcd cluster.
To start administering your cluster from this node, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Run 'kubectl get nodes' to see this node join the cluster.
Finally, run the configuration steps from the join output on each master node
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Add the environment variable
$ cat >> ~/.bashrc <<EOF
export KUBECONFIG=/etc/kubernetes/admin.conf
EOF
$ source ~/.bashrc
# View the cluster's master nodes (can be run on any master)
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master1 Ready master 34m v1.19.4
master2 Ready master 33m v1.19.4
master3 Ready master 31m v1.19.4
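Optionally, since bash-completion was installed earlier, kubectl command completion can be enabled on the masters as a small convenience:
$ echo 'source <(kubectl completion bash)' >> ~/.bashrc
$ source ~/.bashrc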
Deploy the Worker Nodes
$ kubeadm join 188.188.4.110:8443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:c116e1a1db5561189b9f12411ea69999fff95a79f8cd1d9ccbcdf866d7311a70
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
The line "This node has joined the cluster" near the end of the output indicates that node1 has joined the cluster successfully. Unlike the masters, there is no need to copy the kubeconfig or set environment variables afterwards.
# Check the cluster nodes again
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master1 NotReady master 34m v1.19.4
master2 NotReady master 32m v1.19.4
master3 NotReady master 31m v1.19.4
node1 NotReady <none> 30m v1.19.4
The joined nodes are all listed, but the cluster is not usable yet: the STATUS column shows every node as NotReady because no network plugin has been installed. Network plugins include calico, flannel, and others; here we use flannel.
Install the Network Plugin
# The default command to apply the upstream manifest
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# Because access from mainland China is restricted, the source needs adjusting. On master1, add the following entry to the local hosts file so that raw.githubusercontent.com resolves, then download the flannel manifest
$ cat >> /etc/hosts << EOF
199.232.28.133 raw.githubusercontent.com
EOF
$ curl -o kube-flannel.yml https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# Edit the default image addresses: replace every quay.io in the yaml file with quay-mirror.qiniu.com
$ sed -i 's/quay.io/quay-mirror.qiniu.com/g' kube-flannel.yml
$ kubectl apply -f kube-flannel.yml
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds-amd64 created
daemonset.apps/kube-flannel-ds-arm64 created
daemonset.apps/kube-flannel-ds-arm created
daemonset.apps/kube-flannel-ds-ppc64le created
daemonset.apps/kube-flannel-ds-s390x created
With that, the flannel image can be pulled successfully.
# Check whether the flannel pods are running normally
$ kubectl get pods -n kube-system | grep flannel
kube-flannel-ds-7s7k6 1/1 Running 1 26m
kube-flannel-ds-8855s 1/1 Running 2 26m
kube-flannel-ds-8sqnn 1/1 Running 2 26m
kube-flannel-ds-cttlq 1/1 Running 1 26m
If the steps above instead produce the following problem, the workaround is as follows:
$ kubectl apply -f kube-flannel.yml
podsecuritypolicy.policy/psp.flannel.unprivileged configured
clusterrole.rbac.authorization.k8s.io/flannel unchanged
clusterrolebinding.rbac.authorization.k8s.io/flannel unchanged
serviceaccount/flannel unchanged
configmap/kube-flannel-cfg unchanged
daemonset.apps/kube-flannel-ds unchanged
$ kubectl get pods -n kube-system | grep flannel
kube-flannel-ds-7gbk8 0/1 Init:ImagePullBackOff 0 13s
kube-flannel-ds-b8lgg 0/1 Init:ErrImagePull 0 13s
kube-flannel-ds-b9xpd 0/1 Init:ImagePullBackOff 0 13s
kube-flannel-ds-ccklp 0/1 Init:ImagePullBackOff 0 13s
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master1 NotReady master 18m v1.19.4
master2 NotReady master 17m v1.19.4
master3 NotReady master 15m v1.19.4
node1 NotReady <none> 14m v1.19.4
$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-6d56c8448f-dpjzw 0/1 Pending 0 19m
coredns-6d56c8448f-rbsh5 0/1 Pending 0 19m
etcd-master1 1/1 Running 0 19m
etcd-master2 1/1 Running 0 17m
etcd-master3 1/1 Running 0 16m
kube-apiserver-master1 1/1 Running 0 19m
kube-apiserver-master2 1/1 Running 0 18m
kube-apiserver-master3 1/1 Running 0 16m
kube-controller-manager-master1 1/1 Running 1 19m
kube-controller-manager-master2 1/1 Running 0 18m
kube-controller-manager-master3 1/1 Running 0 16m
kube-flannel-ds-7pwv2 0/1 Init:ImagePullBackOff 0 17s
kube-flannel-ds-k6zt2 0/1 Init:ImagePullBackOff 0 20s
kube-flannel-ds-rp94k 0/1 Init:ImagePullBackOff 0 18s
kube-flannel-ds-vbkns 0/1 Init:ImagePullBackOff 0 18s
kube-proxy-2zstb 1/1 Running 0 15m
kube-proxy-fs4z2 1/1 Running 0 16m
kube-proxy-h8r2h 1/1 Running 0 19m
kube-proxy-tmcnn 1/1 Running 0 18m
kube-scheduler-master1 1/1 Running 1 19m
kube-scheduler-master2 1/1 Running 0 18m
kube-scheduler-master3 1/1 Running 0 16m
# First remove the flannel plugin and its resources (kubectl delete -f <file>)
$ kubectl delete -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# On the node(s), clean up the files left behind by the flannel network
$ ifconfig cni0 down
$ ip link delete cni0
$ ifconfig flannel.1 down
$ ip link delete flannel.1
$ rm -rf /var/lib/cni/
$ rm -rf /etc/cni/net.d/*
Note: after completing the steps above, restart the kubelet (systemctl restart kubelet)
Download the flannel Docker image tarball from the GitHub releases page https://github.com/coreos/flannel/releases (a download manager is recommended; upload the file to the servers afterwards). A sketch for loading it on every node follows the docker load/images output below.
$ docker load < flanneld-v0.13.1-rc1.docker
ace0eda3e3be: Loading layer [==================================================>] 5.843MB/5.843MB
0a790f51c8dd: Loading layer [==================================================>] 11.42MB/11.42MB
db93500c64e6: Loading layer [==================================================>] 2.595MB/2.595MB
70351a035194: Loading layer [==================================================>] 45.68MB/45.68MB
cd38981c5610: Loading layer [==================================================>] 5.12kB/5.12kB
dce2fcdf3a87: Loading layer [==================================================>] 9.216kB/9.216kB
be155d1c86b7: Loading layer [==================================================>] 7.68kB/7.68kB
Loaded image: quay.io/coreos/flannel:v0.13.1-rc1-amd64
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
quay.io/coreos/flannel v0.13.1-rc1-amd64 f03a23d55e57 3 days ago 64.6MB
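# The flannel DaemonSet schedules a pod on every node, so the image must be available on each node (or in a registry they can reach). A minimal sketch that copies the tarball from master1 and loads it on the remaining machines, assuming the file name flanneld-v0.13.1-rc1.docker
$ for h in master2 master3 node1 node2; do scp flanneld-v0.13.1-rc1.docker root@$h:/root/; done
$ for h in master2 master3 node1 node2; do ssh root@$h 'docker load < /root/flanneld-v0.13.1-rc1.docker'; done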
# Edit the flannel manifest so the image: fields reference the tag of the image just loaded (quay.io/coreos/flannel:v0.13.1-rc1-amd64); if the earlier sed replaced quay.io with quay-mirror.qiniu.com, revert that change
$ sed -i 's/quay-mirror.qiniu.com/quay.io/g' kube-flannel.yml
$ cat kube-flannel.yml
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: psp.flannel.unprivileged
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: docker/default
    seccomp.security.alpha.kubernetes.io/defaultProfileName: docker/default
    apparmor.security.beta.kubernetes.io/allowedProfileNames: runtime/default
    apparmor.security.beta.kubernetes.io/defaultProfileName: runtime/default
spec:
  privileged: false
  volumes:
  - configMap
  - secret
  - emptyDir
  - hostPath
  allowedHostPaths:
  - pathPrefix: "/etc/cni/net.d"
  - pathPrefix: "/etc/kube-flannel"
  - pathPrefix: "/run/flannel"
  readOnlyRootFilesystem: false
  # Users and groups
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  # Privilege Escalation
  allowPrivilegeEscalation: false
  defaultAllowPrivilegeEscalation: false
  # Capabilities
  allowedCapabilities: ['NET_ADMIN', 'NET_RAW']
  defaultAddCapabilities: []
  requiredDropCapabilities: []
  # Host namespaces
  hostPID: false
  hostIPC: false
  hostNetwork: true
  hostPorts:
  - min: 0
    max: 65535
  # SELinux
  seLinux:
    # SELinux is unused in CaaSP
    rule: 'RunAsAny'
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
rules:
- apiGroups: ['extensions']
  resources: ['podsecuritypolicies']
  verbs: ['use']
  resourceNames: ['psp.flannel.unprivileged']
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  selector:
    matchLabels:
      app: flannel
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
      hostNetwork: true
      priorityClassName: system-node-critical
      tolerations:
      - operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni
        image: quay.io/coreos/flannel:v0.13.1-rc1-amd64   # use the locally loaded image tag
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.13.1-rc1-amd64   # use the locally loaded image tag
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        resources:
          requests:
            cpu: "100m"
            memory: "50Mi"
          limits:
            cpu: "100m"
            memory: "50Mi"
        securityContext:
          privileged: false
          capabilities:
            add: ["NET_ADMIN", "NET_RAW"]
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: run
          mountPath: /run/flannel
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
      - name: run
        hostPath:
          path: /run/flannel
      - name: cni
        hostPath:
          path: /etc/cni/net.d
      - name: flannel-cfg
        configMap:
          name: kube-flannel-cfg
# Apply the manifest again
$ kubectl apply -f kube-flannel.yml
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
$ kubectl get pods -n kube-system | grep flannel
kube-flannel-ds-7s7k6 1/1 Running 0 51s
kube-flannel-ds-8855s 1/1 Running 0 51s
kube-flannel-ds-8sqnn 1/1 Running 0 51s
kube-flannel-ds-cttlq 1/1 Running 0 51s
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master1 Ready master 34m v1.19.4
master2 Ready master 33m v1.19.4
master3 Ready master 31m v1.19.4
node1 Ready <none> 30m v1.19.4
$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-6d56c8448f-dpjzw 1/1 Running 0 35m
coredns-6d56c8448f-rbsh5 1/1 Running 0 35m
etcd-master1 1/1 Running 0 35m
etcd-master2 1/1 Running 0 34m
etcd-master3 1/1 Running 0 33m
kube-apiserver-master1 1/1 Running 0 35m
kube-apiserver-master2 1/1 Running 0 34m
kube-apiserver-master3 1/1 Running 0 33m
kube-controller-manager-master1 1/1 Running 1 35m
kube-controller-manager-master2 1/1 Running 1 34m
kube-controller-manager-master3 1/1 Running 0 33m
kube-flannel-ds-7s7k6 1/1 Running 0 2m39s
kube-flannel-ds-8855s 1/1 Running 0 2m39s
kube-flannel-ds-8sqnn 1/1 Running 0 2m39s
kube-flannel-ds-cttlq 1/1 Running 0 2m39s
kube-proxy-2zstb 1/1 Running 0 32m
kube-proxy-fs4z2 1/1 Running 0 33m
kube-proxy-h8r2h 1/1 Running 0 35m
kube-proxy-tmcnn 1/1 Running 0 34m
kube-scheduler-master1 1/1 Running 2 35m
kube-scheduler-master2 1/1 Running 1 34m
kube-scheduler-master3 1/1 Running 0 33m
Test the Kubernetes Cluster
Create an nginx pod on a master node to verify that workloads run normally
$ kubectl create deployment nginx --image=nginx
deployment.apps/nginx created
$ kubectl expose deployment nginx --port=80 --type=NodePort
service/nginx exposed
# View the pod and the service
$ kubectl get pod,svc -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/nginx-6799fc88d8-5kvdh 0/1 ContainerCreating 0 20s <none> node2 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 21h <none>
service/nginx NodePort 10.96.228.76 <none> 80:31494/TCP 8s app=nginx
In the output, the upper part is the pod information and the lower part the service information. The service/nginx line shows that the service is exposed on NodePort 31494, and the pod details show that the pod is running on node2, whose IP address is 188.188.4.115. Open a browser (Firefox is recommended) and visit http://188.188.4.115:31494; the same page is also reachable through the VIP address.
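# A quick check from the command line as well; any node IP works for a NodePort service, including the VIP (use whatever port your own kubectl expose output shows)
$ curl -I http://188.188.4.115:31494
$ curl -I http://188.188.4.110:31494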
Install the Dashboard
# Download the dashboard manifest
$ cd /etc/kubernetes/
$ wget https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.4/aio/deploy/recommended.yaml
# By default the `Dashboard` is only reachable from inside the cluster; change the `Service` to the `NodePort` type to expose it externally
---
kind: Service
apiVersion: v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
spec:
  type: NodePort          # add this line
  ports:
    - port: 443
      targetPort: 8443
      nodePort: 30001     # add this line; the port can be customized
  selector:
    k8s-app: kubernetes-dashboard
---
# Apply the yaml file
$ kubectl apply -f recommended.yaml
namespace/kubernetes-dashboard created
serviceaccount/kubernetes-dashboard created
service/kubernetes-dashboard created
secret/kubernetes-dashboard-certs created
secret/kubernetes-dashboard-csrf created
secret/kubernetes-dashboard-key-holder created
configmap/kubernetes-dashboard-settings created
role.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrole.rbac.authorization.k8s.io/kubernetes-dashboard created
rolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
deployment.apps/kubernetes-dashboard created
service/dashboard-metrics-scraper created
deployment.apps/dashboard-metrics-scraper created
# Check whether the dashboard is running normally and which node its pod is scheduled on
$ kubectl get pods -n kubernetes-dashboard
NAME READY STATUS RESTARTS AGE
dashboard-metrics-scraper-7b59f7d4df-cjsrg 1/1 Running 0 37s
kubernetes-dashboard-665f4c5ff-5pswj 1/1 Running 0 37s
$ kubectl get pod,svc -n kubernetes-dashboard -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/dashboard-metrics-scraper-7b59f7d4df-cjsrg 1/1 Running 0 77s 10.244.3.5 node1 <none> <none>
pod/kubernetes-dashboard-665f4c5ff-5pswj 1/1 Running 0 77s 10.244.3.4 node1 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/dashboard-metrics-scraper ClusterIP 10.106.88.97 <none> 8000/TCP 78s k8s-app=dashboard-metrics-scraper
service/kubernetes-dashboard NodePort 10.100.198.214 <none> 443:30001/TCP 79s k8s-app=kubernetes-dashboard
The main thing to watch is the STATUS column: as long as it says Running and the RESTARTS count is 0 (or at least not steadily climbing), everything is fine and we can move on.
As shown above, kubernetes-dashboard-665f4c5ff-5pswj is running on node1 and the exposed NodePort is 30001, so the access URL is https://188.188.4.114:30001
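# A quick reachability check from the command line (the dashboard serves a self-signed certificate, hence -k)
$ curl -k -I https://188.188.4.114:30001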
# We can log in now, but we do not yet have permission to view cluster information because no cluster role has been bound. Bind the cluster-admin role with the following steps (Firefox is recommended for access)
$ kubectl create serviceaccount dashboard-admin -n kube-system
$ kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=kube-system:dashboard-admin
$ kubectl describe secrets -n kube-system $(kubectl -n kube-system get secret | awk '/dashboard-admin/{print $1}')
....
Type: kubernetes.io/service-account-token
Data
====
ca.crt: 1066 bytes
namespace: 11 bytes
token: eyJhbGciOiJSUzI1NiIsImtpZCI6IjFCNk1fb1R0dHcxeEZFOXZjczVtVzJER21TSW00STIyR1l4NWZRcU90azAifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXNoYm9hcmQtYWRtaW4tdG9rZW4tbG5yejYiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiZGFzaGJvYXJkLWFkbWluIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiMTU3ZTVmYzktOGQ3Yi00NDcyLTllOTItMWE0Y2EwYTRmMmE1Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmUtc3lzdGVtOmRhc2hib2FyZC1hZG1pbiJ9.PBCRBqo85qEkR0eI0V1_3zE6MVS2ts5GM4dnX0RyH_oe8yCiE8UeFkEzOs1sStCKmlAPA_0ti_g2mVYVy09QqU50uLG1obuNghe_lYgkNmKxG0-_4iUQKAQGzNOPxgBwocsJTjIo9ghN19IMhzhy8RDZMVCGulyZRXvMza38qYRepeT-zhwodzwcqq3WGY8oiZlSDS8v2ynWU5ey1rWVRYDogX7y8QzkVRcrMws2Q8Z6GOReCCjGbY_V6_EunyTpgVOmJlTemyUSndoy2hmuy2225wkI6hR04YJj4NLC741I3Q6Y9nr6eZ3zEaVLoOF1dPAkZ47UC08D6Bl-gq2f7A
