
I have an HA k8s cluster with 3 control-plane nodes. When 2 of the 3 control-plane nodes are shut down, the cluster stops working.

After shutting down zubt3:
scnzzh@zubt2:~$ kubectl get node
NAME    STATUS     ROLES                  AGE   VERSION
zubt1   Ready      control-plane,master   61m   v1.20.2
zubt2   Ready      control-plane,master   31m   v1.20.2
zubt3   NotReady   control-plane,master   13m   v1.20.2

 

After also shutting down zubt2:
scnzzh@zubt1:~$ kubectl get node
Error from server: etcdserver: request timed out

scnzzh@zubt1:~$ kubectl get node
Error from server: rpc error: code = Unknown desc = OK: HTTP status code 200; transport: missing content-type field

scnzzh@zubt1:~$ kubectl get node
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get nodes)

scnzzh@zubt1:~$ kubectl get node
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get nodes)
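
At this point kubectl is no longer a useful diagnostic tool, because kube-apiserver on zubt1 cannot complete requests once its local etcd member has lost quorum. To confirm that quorum loss is really the problem, you can read the etcd container's logs directly on the surviving node, bypassing the API server. A rough sketch, assuming Docker is the container runtime (common for kubeadm v1.20 clusters); on a containerd node you would use crictl ps and crictl logs instead:

sudo docker ps | grep etcd                      # find the etcd container on this node
sudo docker logs --tail 20 <etcd-container-id>  # <etcd-container-id> is a placeholder for the ID printed above

With the other two members unreachable, the log typically fills with failed peer connections and leader elections that never complete.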

 

After starting zubt3 back up:
scnzzh@zubt1:~$ kubectl get node
NAME    STATUS     ROLES                  AGE   VERSION
zubt1   Ready      control-plane,master   68m   v1.20.2
zubt2   NotReady   control-plane,master   38m   v1.20.2
zubt3   Ready      control-plane,master   20m   v1.20.2

 

After shutting down zubt1:
scnzzh@zubt3:~$ kubectl get node
Error from server: etcdserver: request timed out

scnzzh@zubt3:~$ kubectl get node
Error from server: etcdserver: request timed out

scnzzh@zubt3:~$ kubectl get node
Unable to connect to the server: net/http: TLS handshake timeout

scnzzh@zubt3:~$ kubectl get node
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get nodes)

 

After starting zubt2 back up:
scnzzh@zubt3:~$ kubectl get node
NAME    STATUS     ROLES                  AGE   VERSION
zubt1   NotReady   control-plane,master   75m   v1.20.2
zubt2   Ready      control-plane,master   45m   v1.20.2
zubt3   Ready      control-plane,master   26m   v1.20.2

 

After starting zubt1 back up:
scnzzh@zubt1:~$ kubectl get node
NAME    STATUS     ROLES                  AGE   VERSION
zubt1   Ready      control-plane,master   88m   v1.20.2
zubt2   Ready      control-plane,master   58m   v1.20.2
zubt3   Ready      control-plane,master   40m   v1.20.2

 

If a Kubernetes cluster has 3 nodes, will it still work when 2 of them are shut down?

 

The short answer is: it depends on whether the nodes that went down were "Control Plane" nodes or "Worker" nodes.

In Kubernetes, nodes have different roles. If you lose 2 out of 3 nodes, the survival of your cluster depends on how those roles were distributed.
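
The ROLES column you see in kubectl get nodes is derived from node labels of the form node-role.kubernetes.io/<role>, so you can check a node's role directly from its labels. A minimal check (the grep pattern is only illustrative):

kubectl get nodes --show-labels | grep node-role
# control-plane nodes carry node-role.kubernetes.io/control-plane (and, on older
# releases such as v1.20, also node-role.kubernetes.io/master); plain workers
# usually have no node-role label, so their ROLES column shows <none>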


Scenario A: A Single-Master Cluster (1 Control Plane + 2 Workers)

In many small or development clusters, only one node acts as the "brain" (Control Plane), and the other two run your apps (Workers).

  • If the 2 Workers shut down: The cluster stays alive, but your apps (Pods) have nowhere to run. The Control Plane marks both nodes NotReady and waits for them to come back (a quick way to watch this is sketched after this list).

  • If the Control Plane + 1 Worker shut down: The cluster becomes unmanageable. You cannot issue commands (like kubectl get pods), and while pods already running on the surviving Worker keep running, nothing can be scheduled, rescheduled, or repaired until the Control Plane comes back.
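
For the first case, a quick way to watch the Control Plane react to a dead worker is to look at the node's conditions and taints. A sketch using a hypothetical worker named worker1; with default settings, the node-lifecycle controller marks an unreachable node NotReady, taints it, and evicts its pods after the default 300-second toleration:

kubectl get nodes                              # the down worker shows NotReady
kubectl describe node worker1 | grep -i taint  # expect node.kubernetes.io/unreachable
kubectl get pods -o wide -A                    # after ~5 minutes, pods are evicted and rescheduled, if capacity exists elsewhere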


Scenario B: A High-Availability (HA) Cluster (3 Control Planes)

In production, you often use 3 nodes that all act as the Control Plane to ensure "High Availability." This setup relies on a database called etcd, which requires a quorum (a majority) to function.

  • The Math: To make decisions, the cluster needs more than half of its etcd members to be online and able to reach each other.

    • For a 3-node cluster, the majority is 2.

  • Result: If 2 out of 3 nodes shut down, the cluster loses quorum and stops working. Even though 1 node is still powered on, its etcd member refuses to serve reads or writes because it can no longer win a majority "vote". That is exactly what the etcdserver: request timed out errors in the transcript above show (a health check you can run once quorum is back is sketched after this list).
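
Once quorum is restored (at least 2 members up), you can ask etcd directly who the members are and which endpoints are healthy. A sketch assuming a kubeadm-style stacked-etcd setup, where etcd runs as a static pod named etcd-<node-name> in kube-system and its certificates live under /etc/kubernetes/pki/etcd (adjust the pod name and paths to your cluster):

kubectl -n kube-system exec etcd-zubt1 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list -w table

kubectl -n kube-system exec etcd-zubt1 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health --cluster

Note that while quorum is lost these commands fail too, since kubectl exec goes through the same API server; that is why the earlier check had to read the container logs on the node directly.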


Summary Table

Setup                  Can it survive 2 nodes down?   Why?
Standard 3-Node HA     No                             Needs at least 2 nodes to maintain etcd quorum.
1 Master + 2 Workers   Partial                        Only if the 1 Master is the one that stayed online.
5-Node HA              Yes                            5 nodes can lose 2 and still have a majority of 3.
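
The quorum rule generalizes: with n etcd members, quorum is floor(n/2) + 1, so the cluster tolerates n minus quorum simultaneous member failures. A quick sanity check in bash:

for n in 1 3 5 7; do echo "$n members: quorum=$((n/2+1)), tolerates $((n-n/2-1)) failure(s)"; done
# 1 members: quorum=1, tolerates 0 failure(s)
# 3 members: quorum=2, tolerates 1 failure(s)
# 5 members: quorum=3, tolerates 2 failure(s)
# 7 members: quorum=4, tolerates 3 failure(s)

This is also why etcd clusters use odd sizes: 4 members tolerate the same single failure as 3, but add one more machine that can fail.
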
posted on 2021-02-01 09:40 by ZhangZhihuiAAA