Adding a master node to an existing k8s cluster (this approach has problems and is not recommended)

1. Preparation

  • Confirm the current cluster state:

         Existing k8s cluster: master1 (192.168.4.85:6443) plus node1, node2, node3, and node4. A new master2 node needs to be added for HA.

         Use kubectl get nodes to check the current node status.

         Make sure all existing master nodes are healthy.

  • Version consistency:

         The kubeadm, kubelet, and kubectl versions on the new node must match the cluster (see the quick check below).
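A minimal version check, assuming the cluster runs v1.23.17 as shown later in this post: run these commands on an existing master and on the machine that will become master2, and compare the output.

# Component versions on the local machine
kubeadm version -o short
kubelet --version
kubectl version --client

# Versions the cluster reports for the existing nodes (run where kubectl is already configured)
kubectl get nodes -o wide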

2. Generate the token and certificate key required by the join command

  • Run on an existing master node:
# Generate a new token (if the existing one has expired)
kubeadm token create --print-join-command
# Generate and upload the certificate key:
kubeadm init phase upload-certs --upload-certs

 Record the token (<token>), the discovery token CA cert hash (sha256:<hash>), and the certificate key (<certificate-key>) from the output.
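If you are not sure whether a token is still valid, it can be listed on master1 first; the delete line below reuses the token value from this post purely as an illustration.

# List bootstrap tokens and their expiry times
kubeadm token list
# Optionally remove a stale token (token value is just an example)
# kubeadm token delete vzcnpt.nrev6qteo7q3ucws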

3. Initialize the new master node

  • Make sure the new master node is prepared with the same kernel, dependency libraries, and Kubernetes component versions (kubeadm, kubelet, kubectl) as the existing nodes; a pinned-version install sketch follows the reference link below.
# Refer to this document
https://www.cnblogs.com/Leonardo-li/p/18648449
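As a rough sketch only, assuming CentOS 7 with the Kubernetes yum repository already configured (matching the v1.23.17 environment shown later), pinning the package versions on master2 might look like this:

# Install the same component versions as the existing cluster
yum install -y kubeadm-1.23.17 kubelet-1.23.17 kubectl-1.23.17 --disableexcludes=kubernetes
systemctl enable --now kubelet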

4. Run kubeadm join to add the node to the k8s cluster

  • Run on the new master2 node (replace <...> with the actual values obtained on master1 in step 2):
sudo kubeadm join <load-balancer-IP:port> \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane \
  --certificate-key <certificate-key>
  • With the values filled in:
kubeadm join 192.168.4.85:6443 \
  --token vzcnpt.nrev6qteo7q3ucws \
  --discovery-token-ca-cert-hash sha256:4fd5a431609cbe13041d9b80a845dcb40150c8427266fdd17602d16ed11cdd61 \
  --control-plane \
  --certificate-key eaa14c06bbea27fd813bf2659b889a1e1306a82b7fc3f28ac27c68b749116ae7
  • Key parameters:
    --control-plane: marks the node as a control-plane (master) node.
    --certificate-key: the certificate key obtained in step 2.
    <load-balancer-IP:port>: the address of the cluster's load balancer (e.g. 192.168.4.85:6443; in this setup it is actually master1's apiserver address and port).
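If the --discovery-token-ca-cert-hash value is lost, it can be recomputed from the CA certificate on master1 with the standard openssl pipeline from the kubeadm documentation:

# Recompute the discovery token CA cert hash on master1
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | sed 's/^.* //'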
  • The command fails with the following error:
[root@master2 data]# kubeadm join 192.168.4.85:6443 \
>   --token vzcnpt.nrev6qteo7q3ucws \
>   --discovery-token-ca-cert-hash sha256:4fd5a431609cbe13041d9b80a845dcb40150c8427266fdd17602d16ed11cdd61 \
>   --control-plane \
>   --certificate-key eaa14c06bbea27fd813bf2659b889a1e1306a82b7fc3f28ac27c68b749116ae7
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
error execution phase preflight: 
One or more conditions for hosting a new control plane instance is not satisfied.

unable to add a new control plane instance to a cluster that doesn't have a stable controlPlaneEndpoint address

Please ensure that:
* The cluster has a stable controlPlaneEndpoint address.
* The certificates that must be shared among control plane instances are provided.


To see the stack trace of this error execute with --v=5 or higher
  • Fix (run on master1, not on the newly added master node):
# View the kubeadm-config ConfigMap
kubectl -n kube-system get cm kubeadm-config -o yaml

# controlPlaneEndpoint is missing; add it
kubectl -n kube-system edit cm kubeadm-config

# Add it roughly at this position: controlPlaneEndpoint: 192.168.4.85:6443 (master1's apiserver address and port)
kind: ClusterConfiguration
kubernetesVersion: v1.23.17
controlPlaneEndpoint: 192.168.4.85:6443
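A quick check (still on master1) that the edit actually took effect before retrying the join:

# Confirm that controlPlaneEndpoint is now present
kubectl -n kube-system get cm kubeadm-config -o yaml | grep controlPlaneEndpoint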
  • Run the kubeadm join command again on the new master2 node:
[root@master2 data]# kubeadm join 192.168.4.85:6443 --token vzcnpt.nrev6qteo7q3ucws --discovery-token-ca-cert-hash sha256:4fd5a431609cbe13041d9b80a845dcb40150c8427266fdd17602d16ed11cdd61 --control-plane --certificate-key eaa14c06bbea27fd813bf2659b889a1e1306a82b7fc3f28ac27c68b749116ae7
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local master2] and IPs [10.96.0.1 192.168.4.92 192.168.4.85]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost master2] and IPs [192.168.4.92 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost master2] and IPs [192.168.4.92 127.0.0.1 ::1]
[certs] Generating "front-proxy-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
The 'update-status' phase is deprecated and will be removed in a future release. Currently it performs no operation
[mark-control-plane] Marking the node master2 as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node master2 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]

This node has joined the cluster and a new control plane instance was created:

* Certificate signing request was sent to apiserver and approval was received.
* The Kubelet was informed of the new secure connection details.
* Control plane (master) label and taint were applied to the new node.
* The Kubernetes control plane instances scaled up.
* A new etcd member was added to the local/stacked etcd cluster.

To start administering your cluster from this node, you need to run the following as a regular user:

	mkdir -p $HOME/.kube
	sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
	sudo chown $(id -u):$(id -g) $HOME/.kube/config

Run 'kubectl get nodes' to see this node join the cluster.
  • Run the commands printed by the successful join (on the new master2 node):
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
What these three commands do:
1. mkdir -p $HOME/.kube
Purpose: creates the .kube directory under the user's home directory. By default kubectl reads the cluster configuration (API server address, certificates, tokens, etc.) from ~/.kube/config. Copying the config file fails if the directory does not exist, so it is created first.

2. sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
Purpose: copies the Kubernetes administrator kubeconfig (admin.conf) into the user's home directory. /etc/kubernetes/admin.conf is generated when kubeadm initializes the cluster and contains:

The API server address (e.g. https://192.168.4.85:6443).
The cluster CA certificate (used to verify the API server's identity).
The administrator's client certificate and private key (used for authentication).

By default admin.conf is owned by root and cannot be read by a regular user, so sudo is required for the copy.

3. sudo chown $(id -u):$(id -g) $HOME/.kube/config
Purpose: changes ownership of the config file to the current user. Because the copy is done with sudo, the copied file is owned by root and a regular user cannot access it. $(id -u) returns the current user's UID and $(id -g) returns the GID; chown transfers ownership to the current user. After this, the user can run kubectl without sudo.
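Alternatively, when working as root on master2 it is enough to point kubectl at admin.conf directly instead of copying it; this is the same alternative kubeadm itself suggests for the root user:

# Root-only alternative: use admin.conf in place without copying it
export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl get nodes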
  • Check the status, role, and cluster information of the new master2 node

# Run on the original master1 node

[root@master1 data]# kubectl get node -o wide 
NAME      STATUS   ROLES                  AGE   VERSION    INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME
master1   Ready    control-plane,master   57d   v1.23.17   172.16.4.85   <none>        CentOS Linux 7 (Core)   5.4.278-1.el7.elrepo.x86_64   docker://20.10.9
master2   Ready    control-plane,master   48m   v1.23.17   172.16.4.92   <none>        CentOS Linux 7 (Core)   5.4.278-1.el7.elrepo.x86_64   docker://20.10.9
node1     Ready    <none>                 57d   v1.23.17   172.16.4.86   <none>        CentOS Linux 7 (Core)   5.4.278-1.el7.elrepo.x86_64   docker://20.10.9
node2     Ready    <none>                 57d   v1.23.17   172.16.4.87   <none>        CentOS Linux 7 (Core)   5.4.278-1.el7.elrepo.x86_64   docker://20.10.9
node3     Ready    <none>                 30d   v1.23.17   172.16.4.89   <none>        CentOS Linux 7 (Core)   5.4.278-1.el7.elrepo.x86_64   docker://20.10.9
node4     Ready    <none>                 14d   v1.23.17   172.16.4.90   <none>        CentOS Linux 7 (Core)   5.4.278-1.el7.elrepo.x86_64   docker://20.10.9

# Run on the new master2 node

[root@master2 data]# kubectl get node -o wide 
NAME      STATUS   ROLES                  AGE     VERSION    INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME
master1   Ready    control-plane,master   57d     v1.23.17   172.16.4.85   <none>        CentOS Linux 7 (Core)   5.4.278-1.el7.elrepo.x86_64   docker://20.10.9
master2   Ready    control-plane,master   4m34s   v1.23.17   172.16.4.92   <none>        CentOS Linux 7 (Core)   5.4.278-1.el7.elrepo.x86_64   docker://20.10.9
node1     Ready    <none>                 57d     v1.23.17   172.16.4.86   <none>        CentOS Linux 7 (Core)   5.4.278-1.el7.elrepo.x86_64   docker://20.10.9
node2     Ready    <none>                 57d     v1.23.17   172.16.4.87   <none>        CentOS Linux 7 (Core)   5.4.278-1.el7.elrepo.x86_64   docker://20.10.9
node3     Ready    <none>                 30d     v1.23.17   172.16.4.89   <none>        CentOS Linux 7 (Core)   5.4.278-1.el7.elrepo.x86_64   docker://20.10.9
node4     Ready    <none>                 14d     v1.23.17   172.16.4.90   <none>        CentOS Linux 7 (Core)   5.4.278-1.el7.elrepo.x86_64   docker://20.10.9
  • Check the etcd status (run on any master node)
# Check the running status of the etcd Pods
[root@master2 data]# kubectl get pods -n kube-system -l component=etcd -o wide
NAME           READY   STATUS    RESTARTS      AGE   IP            NODE      NOMINATED NODE   READINESS GATES
etcd-master1   1/1     Running   3 (29d ago)   57d   192.168.4.85   master1   <none>           <none>
etcd-master2   1/1     Running   0             60m   192.168.4.92   master2   <none>           <none>

# List the etcd cluster members
[root@master2 data]# kubectl exec -it -n kube-system etcd-master1 -- sh -c '
>   ETCDCTL_API=3 etcdctl \
>     --endpoints=https://127.0.0.1:2379 \
>     --cacert=/etc/kubernetes/pki/etcd/ca.crt \
>     --cert=/etc/kubernetes/pki/etcd/server.crt \
>     --key=/etc/kubernetes/pki/etcd/server.key \
>     member list
> '
b86cbc1bb95305e1, started, master1, https://192.168.4.85:2380, https://192.168.4.85:2379, false
c14c740c6e1238c2, started, master2, https://192.168.4.92:2380, https://192.168.4.92:2379, false

# Check the etcd cluster health
[root@master2 data]# kubectl exec -it -n kube-system etcd-master1 -- sh -c '
>   ETCDCTL_API=3 etcdctl \
>     --endpoints=https://127.0.0.1:2379 \
>     --cacert=/etc/kubernetes/pki/etcd/ca.crt \
>     --cert=/etc/kubernetes/pki/etcd/server.crt \
>     --key=/etc/kubernetes/pki/etcd/server.key \
>     endpoint health
> '
https://127.0.0.1:2379 is healthy: successfully committed proposal: took = 13.416709ms

[root@master2 data]# kubectl exec -it -n kube-system etcd-master2 -- sh -c '
  ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    endpoint health
'
https://127.0.0.1:2379 is healthy: successfully committed proposal: took = 12.093902ms

# Check the etcd data store status
[root@master2 data]# kubectl exec -it -n kube-system etcd-master1 -- sh -c '
>   ETCDCTL_API=3 etcdctl \
>     --endpoints=https://127.0.0.1:2379 \
>     --cacert=/etc/kubernetes/pki/etcd/ca.crt \
>     --cert=/etc/kubernetes/pki/etcd/server.crt \
>     --key=/etc/kubernetes/pki/etcd/server.key \
>     endpoint status -w table
> '
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|        ENDPOINT        |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://127.0.0.1:2379 | b86cbc1bb95305e1 |   3.5.6 |   10 MB |      true |      false |         6 |    8531666 |            8531666 |        |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
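The table above only queries the local endpoint on master1. To list both members in one table, a sketch using the member client URLs shown earlier (adjust the addresses if yours differ):

# Query both etcd members at once from inside the etcd-master1 Pod
kubectl exec -it -n kube-system etcd-master1 -- sh -c '
  ETCDCTL_API=3 etcdctl \
    --endpoints=https://192.168.4.85:2379,https://192.168.4.92:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    endpoint status -w table
'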

5. References:

https://www.cnblogs.com/hewei-blogs/articles/17164545.html
https://www.cnblogs.com/qianyuliang/p/17044626.html

 
