Adding a master node to a k8s cluster (this approach has issues and is not recommended)
1. Preparation
- Confirm the state of the existing cluster:
Existing k8s cluster -> master1 (192.168.4.85:6443), node1, node2, node3, node4. A second master (master2) will be added for HA.
Use kubectl get nodes to check the current node status.
Make sure all existing master nodes are healthy.
- Version consistency:
The kubeadm, kubelet, and kubectl versions on the new node must match the cluster.
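A quick way to compare versions before joining (a minimal sketch; run it on master1 and on the machine that will become master2; the --short flags shown here are available on this v1.23 release):
# Locally installed component versions on each machine
kubeadm version -o short
kubelet --version
kubectl version --client --short
# On master1 only: the server version the new node must match
kubectl version --short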
2. Generate the token and certificate key needed for the join command
- Run on an existing master node:
# Generate a new token (if the existing one has expired)
kubeadm token create --print-join-command
# Generate and upload the certificate key with:
kubeadm init phase upload-certs --upload-certs
Record the token (<token>), the CA certificate hash (sha256:<hash>), and the certificate key (<certificate-key>) from the output.
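For reference, the first command prints a worker join command and the second prints the certificate key; the control-plane join command used in step 4 is assembled from both. A sketch of stitching them together on master1 (it assumes the certificate key is the last line of the upload-certs output, which is the usual format):
JOIN_CMD=$(kubeadm token create --print-join-command)
CERT_KEY=$(kubeadm init phase upload-certs --upload-certs | tail -1)
# Append the control-plane flags to turn the worker join line into a control-plane join
echo "${JOIN_CMD} --control-plane --certificate-key ${CERT_KEY}"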
3. Initialize the new master node
- Make sure the new master node is prepared with the same kernel, dependency libraries, kubelet/kubectl components, etc. as the rest of the cluster.
# See this document for reference
https://www.cnblogs.com/Leonardo-li/p/18648449
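As a rough sketch of the usual CentOS 7 prerequisites (a summary of common practice, not taken from the linked document; adjust to your environment), the new node typically needs swap disabled, the bridge sysctls enabled, and kubeadm/kubelet/kubectl pinned to the cluster version v1.23.17:
# Disable swap (required by kubelet)
swapoff -a
sed -ri 's/.*swap.*/#&/' /etc/fstab
# Let bridged traffic pass through iptables
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
modprobe br_netfilter && sysctl --system
# Install components pinned to the cluster version
yum install -y kubeadm-1.23.17 kubelet-1.23.17 kubectl-1.23.17 --disableexcludes=kubernetes
systemctl enable --now kubelet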
4. Run kubeadm join to add the node to the k8s cluster
- On the new node master2, run the following (replace <...> with the actual values, i.e. the information obtained on master1 in step 2):
sudo kubeadm join <load-balancer IP:port> \
--token <token> \
--discovery-token-ca-cert-hash sha256:<hash> \
--control-plane \
--certificate-key <certificate-key>
- With the values filled in:
kubeadm join 192.168.4.85:6443 \
--token vzcnpt.nrev6qteo7q3ucws \
--discovery-token-ca-cert-hash sha256:4fd5a431609cbe13041d9b80a845dcb40150c8427266fdd17602d16ed11cdd61 \
--control-plane \
--certificate-key eaa14c06bbea27fd813bf2659b889a1e1306a82b7fc3f28ac27c68b749116ae7
- Key parameters:
--control-plane: marks the node as a control-plane (master) node.
--certificate-key: the certificate key obtained in step 2.
<load-balancer IP:port>: the address of the cluster's load balancer (here 192.168.4.85:6443, which is actually master1's apiserver).
- Running the command fails with the following error:
[root@master2 data]# kubeadm join 192.168.4.85:6443 \
> --token vzcnpt.nrev6qteo7q3ucws \
> --discovery-token-ca-cert-hash sha256:4fd5a431609cbe13041d9b80a845dcb40150c8427266fdd17602d16ed11cdd61 \
> --control-plane \
> --certificate-key eaa14c06bbea27fd813bf2659b889a1e1306a82b7fc3f28ac27c68b749116ae7
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
error execution phase preflight:
One or more conditions for hosting a new control plane instance is not satisfied.
unable to add a new control plane instance to a cluster that doesn't have a stable controlPlaneEndpoint address
Please ensure that:
* The cluster has a stable controlPlaneEndpoint address.
* The certificates that must be shared among control plane instances are provided.
To see the stack trace of this error execute with --v=5 or higher
- Fix (run on master1; note, NOT on the newly added master node)
# Inspect the kubeadm-config ConfigMap
kubectl -n kube-system get cm kubeadm-config -o yaml
# controlPlaneEndpoint is missing, so add it
kubectl -n kube-system edit cm kubeadm-config
# Add it roughly at this position: controlPlaneEndpoint: 192.168.4.85:6443 (master1's apiserver address and port)
kind: ClusterConfiguration
kubernetesVersion: v1.23.17
controlPlaneEndpoint: 192.168.4.85:6443
Note: pointing controlPlaneEndpoint directly at master1's apiserver makes the join succeed, but it leaves master1 as a single point of failure; a real HA setup should put a load balancer or VIP address here, which is why this approach is not recommended.
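To confirm the edit took effect before retrying the join (a small check on master1):
kubectl -n kube-system get cm kubeadm-config -o yaml | grep controlPlaneEndpoint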
- Run the kubeadm join command again on the new master2 node to join the k8s cluster (run on master2)
[root@master2 data]# kubeadm join 192.168.4.85:6443 --token vzcnpt.nrev6qteo7q3ucws --discovery-token-ca-cert-hash sha256:4fd5a431609cbe13041d9b80a845dcb40150c8427266fdd17602d16ed11cdd61 --control-plane --certificate-key eaa14c06bbea27fd813bf2659b889a1e1306a82b7fc3f28ac27c68b749116ae7
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local master2] and IPs [10.96.0.1 192.168.4.92 192.168.4.85]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost master2] and IPs [192.168.4.92 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost master2] and IPs [192.168.4.92 127.0.0.1 ::1]
[certs] Generating "front-proxy-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
The 'update-status' phase is deprecated and will be removed in a future release. Currently it performs no operation
[mark-control-plane] Marking the node master2 as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node master2 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
This node has joined the cluster and a new control plane instance was created:
* Certificate signing request was sent to apiserver and approval was received.
* The Kubelet was informed of the new secure connection details.
* Control plane (master) label and taint were applied to the new node.
* The Kubernetes control plane instances scaled up.
* A new etcd member was added to the local/stacked etcd cluster.
To start administering your cluster from this node, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Run 'kubectl get nodes' to see this node join the cluster.
- Run the commands printed at the end of the previous step (run on the new master2 node)
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
What these three commands do:
1) mkdir -p $HOME/.kube
Purpose: creates the .kube directory under the user's home directory. By default, kubectl reads the cluster configuration (API server address, certificates, tokens, etc.) from ~/.kube/config. Copying the config file would fail if the directory did not exist, so it is created first.
2) sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
Purpose: copies the Kubernetes administrator kubeconfig (admin.conf) into the user's directory. /etc/kubernetes/admin.conf is generated by kubeadm when the control plane is set up and contains:
the API server address (e.g. https://192.168.4.85:6443);
the cluster CA certificate (used to verify the API server's identity);
the admin user's client certificate and private key (used for authentication).
By default admin.conf is owned by root and cannot be read directly by a regular user, which is why the copy needs sudo.
3) sudo chown $(id -u):$(id -g) $HOME/.kube/config
Purpose: changes ownership of the config file to the current user. Because the copy was made with sudo, the copied file is owned by root and a regular user cannot access it. $(id -u) returns the current user's UID and $(id -g) the GID; chown transfers ownership to the current user, so kubectl can then be used to manage the cluster without sudo.
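A quick sanity check afterwards (a minimal sketch, run as the regular user on master2):
kubectl cluster-info
kubectl config view --minify | grep server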
- Test the state of the new master2 node: check roles and cluster info
# Run on master1 (the existing master)
[root@master1 data]# kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master1 Ready control-plane,master 57d v1.23.17 172.16.4.85 <none> CentOS Linux 7 (Core) 5.4.278-1.el7.elrepo.x86_64 docker://20.10.9
master2 Ready control-plane,master 48m v1.23.17 172.16.4.92 <none> CentOS Linux 7 (Core) 5.4.278-1.el7.elrepo.x86_64 docker://20.10.9
node1 Ready <none> 57d v1.23.17 172.16.4.86 <none> CentOS Linux 7 (Core) 5.4.278-1.el7.elrepo.x86_64 docker://20.10.9
node2 Ready <none> 57d v1.23.17 172.16.4.87 <none> CentOS Linux 7 (Core) 5.4.278-1.el7.elrepo.x86_64 docker://20.10.9
node3 Ready <none> 30d v1.23.17 172.16.4.89 <none> CentOS Linux 7 (Core) 5.4.278-1.el7.elrepo.x86_64 docker://20.10.9
node4 Ready <none> 14d v1.23.17 172.16.4.90 <none> CentOS Linux 7 (Core) 5.4.278-1.el7.elrepo.x86_64 docker://20.10.9
# Run on the new master2 node
[root@master2 data]# kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master1 Ready control-plane,master 57d v1.23.17 172.16.4.85 <none> CentOS Linux 7 (Core) 5.4.278-1.el7.elrepo.x86_64 docker://20.10.9
master2 Ready control-plane,master 4m34s v1.23.17 172.16.4.92 <none> CentOS Linux 7 (Core) 5.4.278-1.el7.elrepo.x86_64 docker://20.10.9
node1 Ready <none> 57d v1.23.17 172.16.4.86 <none> CentOS Linux 7 (Core) 5.4.278-1.el7.elrepo.x86_64 docker://20.10.9
node2 Ready <none> 57d v1.23.17 172.16.4.87 <none> CentOS Linux 7 (Core) 5.4.278-1.el7.elrepo.x86_64 docker://20.10.9
node3 Ready <none> 30d v1.23.17 172.16.4.89 <none> CentOS Linux 7 (Core) 5.4.278-1.el7.elrepo.x86_64 docker://20.10.9
node4 Ready <none> 14d v1.23.17 172.16.4.90 <none> CentOS Linux 7 (Core) 5.4.278-1.el7.elrepo.x86_64 docker://20.10.9
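Beyond kubectl get nodes, it may be worth confirming that master2's own apiserver answers on port 6443 (a hedged sketch using the legacy /healthz endpoint, which this version still serves):
curl -k https://192.168.4.92:6443/healthz
# typically prints: ok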
- Check the status of etcd (run on any master node)
# Check that the etcd Pods are running
[root@master2 data]# kubectl get pods -n kube-system -l component=etcd -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
etcd-master1 1/1 Running 3 (29d ago) 57d 192.168.4.85 master1 <none> <none>
etcd-master2 1/1 Running 0 60m 192.168.4.92 master2 <none> <none>
# List the etcd cluster members
[root@master2 data]# kubectl exec -it -n kube-system etcd-master1 -- sh -c '
> ETCDCTL_API=3 etcdctl \
> --endpoints=https://127.0.0.1:2379 \
> --cacert=/etc/kubernetes/pki/etcd/ca.crt \
> --cert=/etc/kubernetes/pki/etcd/server.crt \
> --key=/etc/kubernetes/pki/etcd/server.key \
> member list
> '
b86cbc1bb95305e1, started, master1, https://192.168.4.85:2380, https://192.168.4.85:2379, false
c14c740c6e1238c2, started, master2, https://192.168.4.92:2380, https://192.168.4.92:2379, false
# Check the health of the etcd cluster
[root@master2 data]# kubectl exec -it -n kube-system etcd-master1 -- sh -c '
> ETCDCTL_API=3 etcdctl \
> --endpoints=https://127.0.0.1:2379 \
> --cacert=/etc/kubernetes/pki/etcd/ca.crt \
> --cert=/etc/kubernetes/pki/etcd/server.crt \
> --key=/etc/kubernetes/pki/etcd/server.key \
> endpoint health
> '
https://127.0.0.1:2379 is healthy: successfully committed proposal: took = 13.416709ms
[root@master2 data]# kubectl exec -it -n kube-system etcd-master2 -- sh -c '
ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
endpoint health
'
https://127.0.0.1:2379 is healthy: successfully committed proposal: took = 12.093902ms
# Check the etcd data/endpoint status
[root@master2 data]# kubectl exec -it -n kube-system etcd-master1 -- sh -c '
> ETCDCTL_API=3 etcdctl \
> --endpoints=https://127.0.0.1:2379 \
> --cacert=/etc/kubernetes/pki/etcd/ca.crt \
> --cert=/etc/kubernetes/pki/etcd/server.crt \
> --key=/etc/kubernetes/pki/etcd/server.key \
> endpoint status -w table
> '
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://127.0.0.1:2379 | b86cbc1bb95305e1 | 3.5.6 | 10 MB | true | false | 6 | 8531666 | 8531666 | |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
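The etcdctl TLS flags above repeat in every check; a small shell helper (a sketch, assuming the same pod name and certificate paths as above) keeps them in one place:
# Run any etcdctl subcommand inside the etcd-master1 pod with the TLS flags preset
etcdctl_exec() {
  kubectl exec -n kube-system etcd-master1 -- sh -c \
    "ETCDCTL_API=3 etcdctl \
      --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/server.crt \
      --key=/etc/kubernetes/pki/etcd/server.key \
      $*"
}
# Usage:
etcdctl_exec member list
etcdctl_exec endpoint health
etcdctl_exec endpoint status -w table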
5. References:
https://www.cnblogs.com/hewei-blogs/articles/17164545.html
https://www.cnblogs.com/qianyuliang/p/17044626.html
