Adding a master node to an existing k8s cluster (this approach has problems and is not recommended)

1. Preparation

  • Confirm the current cluster state:

         Existing k8s cluster: master1 (192.168.4.85:6443) plus node1, node2, node3, and node4. A new master2 node needs to be added for HA.

         Use kubectl get nodes to check the current node status.

         Make sure all existing master nodes are healthy.

  • Version consistency:

         The kubeadm, kubelet, and kubectl versions on the new node must match the cluster (see the quick check below).
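A minimal version check, assuming the cluster runs v1.23.17 as shown later in this post: run these commands on an existing master and on the machine that will become master2, and compare the output.

# Component versions on the local machine
kubeadm version -o short
kubelet --version
kubectl version --client

# Versions the cluster reports for the existing nodes (run where kubectl is already configured)
kubectl get nodes -o wide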

2. Generate the token and certificate key required by the join command

  • Run on an existing master node:
# Generate a new token (if the existing one has expired)
kubeadm token create --print-join-command
# Generate and upload the certificate key:
kubeadm init phase upload-certs --upload-certs

 Record the token (<token>), the discovery token CA cert hash (sha256:<hash>), and the certificate key (<certificate-key>) from the output.
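If you are not sure whether a token is still valid, it can be listed on master1 first; the delete line below reuses the token value from this post purely as an illustration.

# List bootstrap tokens and their expiry times
kubeadm token list
# Optionally remove a stale token (token value is just an example)
# kubeadm token delete vzcnpt.nrev6qteo7q3ucws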

3. Initialize the new master node

  • Make sure the new master node is prepared with the same kernel, dependency libraries, and Kubernetes component versions (kubeadm, kubelet, kubectl) as the existing nodes; a pinned-version install sketch follows the reference link below.
# Refer to this document
https://www.cnblogs.com/Leonardo-li/p/18648449
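As a rough sketch only, assuming CentOS 7 with the Kubernetes yum repository already configured (matching the v1.23.17 environment shown later), pinning the package versions on master2 might look like this:

# Install the same component versions as the existing cluster
yum install -y kubeadm-1.23.17 kubelet-1.23.17 kubectl-1.23.17 --disableexcludes=kubernetes
systemctl enable --now kubelet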

4. Run kubeadm join to add the node to the k8s cluster

  • Run on the new master2 node (replace <...> with the actual values obtained on master1 in step 2):
sudo kubeadm join <load-balancer-IP:port> \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane \
  --certificate-key <certificate-key>
  • With the values filled in:
kubeadm join 192.168.4.85:6443 \
  --token vzcnpt.nrev6qteo7q3ucws \
  --discovery-token-ca-cert-hash sha256:4fd5a431609cbe13041d9b80a845dcb40150c8427266fdd17602d16ed11cdd61 \
  --control-plane \
  --certificate-key eaa14c06bbea27fd813bf2659b889a1e1306a82b7fc3f28ac27c68b749116ae7
  • Key parameters:
    --control-plane: marks the node as a control-plane (master) node.
    --certificate-key: the certificate key obtained in step 2.
    <load-balancer-IP:port>: the address of the cluster's load balancer (e.g. 192.168.4.85:6443; in this setup it is actually master1's apiserver address and port).
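If the --discovery-token-ca-cert-hash value is lost, it can be recomputed from the CA certificate on master1 with the standard openssl pipeline from the kubeadm documentation:

# Recompute the discovery token CA cert hash on master1
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | sed 's/^.* //'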
  • The command fails with the following error:
[root@master2 data]# kubeadm join 192.168.4.85:6443 \
>   --token vzcnpt.nrev6qteo7q3ucws \
>   --discovery-token-ca-cert-hash sha256:4fd5a431609cbe13041d9b80a845dcb40150c8427266fdd17602d16ed11cdd61 \
>   --control-plane \
>   --certificate-key eaa14c06bbea27fd813bf2659b889a1e1306a82b7fc3f28ac27c68b749116ae7
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
error execution phase preflight: 
One or more conditions for hosting a new control plane instance is not satisfied.

unable to add a new control plane instance to a cluster that doesn't have a stable controlPlaneEndpoint address

Please ensure that:
* The cluster has a stable controlPlaneEndpoint address.
* The certificates that must be shared among control plane instances are provided.


To see the stack trace of this error execute with --v=5 or higher
  • Fix (run on master1, not on the newly added master node):
# View the kubeadm-config ConfigMap
kubectl -n kube-system get cm kubeadm-config -o yaml

# controlPlaneEndpoint is missing; add it
kubectl -n kube-system edit cm kubeadm-config

# Add it roughly at this position: controlPlaneEndpoint: 192.168.4.85:6443 (master1's apiserver address and port)
kind: ClusterConfiguration
kubernetesVersion: v1.23.17
controlPlaneEndpoint: 192.168.4.85:6443
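A quick check (still on master1) that the edit actually took effect before retrying the join:

# Confirm that controlPlaneEndpoint is now present
kubectl -n kube-system get cm kubeadm-config -o yaml | grep controlPlaneEndpoint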
  • Run the kubeadm join command again on the new master2 node:
[root@master2 data]# kubeadm join 192.168.4.85:6443 --token vzcnpt.nrev6qteo7q3ucws --discovery-token-ca-cert-hash sha256:4fd5a431609cbe13041d9b80a845dcb40150c8427266fdd17602d16ed11cdd61 --control-plane --certificate-key eaa14c06bbea27fd813bf2659b889a1e1306a82b7fc3f28ac27c68b749116ae7
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local master2] and IPs [10.96.0.1 192.168.4.92 192.168.4.85]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost master2] and IPs [192.168.4.92 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost master2] and IPs [192.168.4.92 127.0.0.1 ::1]
[certs] Generating "front-proxy-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
The 'update-status' phase is deprecated and will be removed in a future release. Currently it performs no operation
[mark-control-plane] Marking the node master2 as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node master2 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]

This node has joined the cluster and a new control plane instance was created:

* Certificate signing request was sent to apiserver and approval was received.
* The Kubelet was informed of the new secure connection details.
* Control plane (master) label and taint were applied to the new node.
* The Kubernetes control plane instances scaled up.
* A new etcd member was added to the local/stacked etcd cluster.

To start administering your cluster from this node, you need to run the following as a regular user:

	mkdir -p $HOME/.kube
	sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
	sudo chown $(id -u):$(id -g) $HOME/.kube/config

Run 'kubectl get nodes' to see this node join the cluster.
  • Run the commands printed by the successful join (on the new master2 node):
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
What these three commands do:
1. mkdir -p $HOME/.kube
Purpose: creates the .kube directory under the user's home directory. By default kubectl reads the cluster configuration (API server address, certificates, tokens, etc.) from ~/.kube/config. Copying the config file fails if the directory does not exist, so it is created first.

2. sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
Purpose: copies the Kubernetes administrator kubeconfig (admin.conf) into the user's home directory. /etc/kubernetes/admin.conf is generated when kubeadm initializes the cluster and contains:

The API server address (e.g. https://192.168.4.85:6443).
The cluster CA certificate (used to verify the API server's identity).
The administrator's client certificate and private key (used for authentication).

By default admin.conf is owned by root and cannot be read by a regular user, so sudo is required for the copy.

3. sudo chown $(id -u):$(id -g) $HOME/.kube/config
Purpose: changes ownership of the config file to the current user. Because the copy is done with sudo, the copied file is owned by root and a regular user cannot access it. $(id -u) returns the current user's UID and $(id -g) returns the GID; chown transfers ownership to the current user. After this, the user can run kubectl without sudo.
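Alternatively, when working as root on master2 it is enough to point kubectl at admin.conf directly instead of copying it; this is the same alternative kubeadm itself suggests for the root user:

# Root-only alternative: use admin.conf in place without copying it
export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl get nodes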
  • Check the status, role, and cluster information of the new master2 node

# Run on the original master1 node

[root@master1 data]# kubectl get node -o wide 
NAME      STATUS   ROLES                  AGE   VERSION    INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME
master1   Ready    control-plane,master   57d   v1.23.17   172.16.4.85   <none>        CentOS Linux 7 (Core)   5.4.278-1.el7.elrepo.x86_64   docker://20.10.9
master2   Ready    control-plane,master   48m   v1.23.17   172.16.4.92   <none>        CentOS Linux 7 (Core)   5.4.278-1.el7.elrepo.x86_64   docker://20.10.9
node1     Ready    <none>                 57d   v1.23.17   172.16.4.86   <none>        CentOS Linux 7 (Core)   5.4.278-1.el7.elrepo.x86_64   docker://20.10.9
node2     Ready    <none>                 57d   v1.23.17   172.16.4.87   <none>        CentOS Linux 7 (Core)   5.4.278-1.el7.elrepo.x86_64   docker://20.10.9
node3     Ready    <none>                 30d   v1.23.17   172.16.4.89   <none>        CentOS Linux 7 (Core)   5.4.278-1.el7.elrepo.x86_64   docker://20.10.9
node4     Ready    <none>                 14d   v1.23.17   172.16.4.90   <none>        CentOS Linux 7 (Core)   5.4.278-1.el7.elrepo.x86_64   docker://20.10.9

# Run on the new master2 node

[root@master2 data]# kubectl get node -o wide 
NAME      STATUS   ROLES                  AGE     VERSION    INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME
master1   Ready    control-plane,master   57d     v1.23.17   172.16.4.85   <none>        CentOS Linux 7 (Core)   5.4.278-1.el7.elrepo.x86_64   docker://20.10.9
master2   Ready    control-plane,master   4m34s   v1.23.17   172.16.4.92   <none>        CentOS Linux 7 (Core)   5.4.278-1.el7.elrepo.x86_64   docker://20.10.9
node1     Ready    <none>                 57d     v1.23.17   172.16.4.86   <none>        CentOS Linux 7 (Core)   5.4.278-1.el7.elrepo.x86_64   docker://20.10.9
node2     Ready    <none>                 57d     v1.23.17   172.16.4.87   <none>        CentOS Linux 7 (Core)   5.4.278-1.el7.elrepo.x86_64   docker://20.10.9
node3     Ready    <none>                 30d     v1.23.17   172.16.4.89   <none>        CentOS Linux 7 (Core)   5.4.278-1.el7.elrepo.x86_64   docker://20.10.9
node4     Ready    <none>                 14d     v1.23.17   172.16.4.90   <none>        CentOS Linux 7 (Core)   5.4.278-1.el7.elrepo.x86_64   docker://20.10.9
  • Check the etcd status (run on any master node)
# Check the running status of the etcd Pods
[root@master2 data]# kubectl get pods -n kube-system -l component=etcd -o wide
NAME           READY   STATUS    RESTARTS      AGE   IP            NODE      NOMINATED NODE   READINESS GATES
etcd-master1   1/1     Running   3 (29d ago)   57d   192.168.4.85   master1   <none>           <none>
etcd-master2   1/1     Running   0             60m   192.168.4.92   master2   <none>           <none>

# List the etcd cluster members
[root@master2 data]# kubectl exec -it -n kube-system etcd-master1 -- sh -c '
>   ETCDCTL_API=3 etcdctl \
>     --endpoints=https://127.0.0.1:2379 \
>     --cacert=/etc/kubernetes/pki/etcd/ca.crt \
>     --cert=/etc/kubernetes/pki/etcd/server.crt \
>     --key=/etc/kubernetes/pki/etcd/server.key \
>     member list
> '
b86cbc1bb95305e1, started, master1, https://192.168.4.85:2380, https://192.168.4.85:2379, false
c14c740c6e1238c2, started, master2, https://192.168.4.92:2380, https://192.168.4.92:2379, false

# Check the etcd cluster health
[root@master2 data]# kubectl exec -it -n kube-system etcd-master1 -- sh -c '
>   ETCDCTL_API=3 etcdctl \
>     --endpoints=https://127.0.0.1:2379 \
>     --cacert=/etc/kubernetes/pki/etcd/ca.crt \
>     --cert=/etc/kubernetes/pki/etcd/server.crt \
>     --key=/etc/kubernetes/pki/etcd/server.key \
>     endpoint health
> '
https://127.0.0.1:2379 is healthy: successfully committed proposal: took = 13.416709ms

[root@master2 data]# kubectl exec -it -n kube-system etcd-master2 -- sh -c '
  ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    endpoint health
'
https://127.0.0.1:2379 is healthy: successfully committed proposal: took = 12.093902ms

# Check the etcd data store status
[root@master2 data]# kubectl exec -it -n kube-system etcd-master1 -- sh -c '
>   ETCDCTL_API=3 etcdctl \
>     --endpoints=https://127.0.0.1:2379 \
>     --cacert=/etc/kubernetes/pki/etcd/ca.crt \
>     --cert=/etc/kubernetes/pki/etcd/server.crt \
>     --key=/etc/kubernetes/pki/etcd/server.key \
>     endpoint status -w table
> '
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|        ENDPOINT        |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://127.0.0.1:2379 | b86cbc1bb95305e1 |   3.5.6 |   10 MB |      true |      false |         6 |    8531666 |            8531666 |        |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
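The table above only queries the local endpoint on master1. To list both members in one table, a sketch using the member client URLs shown earlier (adjust the addresses if yours differ):

# Query both etcd members at once from inside the etcd-master1 Pod
kubectl exec -it -n kube-system etcd-master1 -- sh -c '
  ETCDCTL_API=3 etcdctl \
    --endpoints=https://192.168.4.85:2379,https://192.168.4.92:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    endpoint status -w table
'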

5. References:

https://www.cnblogs.com/hewei-blogs/articles/17164545.html
https://www.cnblogs.com/qianyuliang/p/17044626.html

 
