
I have an HA k8s cluster with 3 control-plane nodes. When 2 of the 3 control-plane nodes are shut down, the cluster stops working.

After shutting down zubt3:
scnzzh@zubt2:~$ kubectl get node
NAME    STATUS     ROLES                  AGE   VERSION
zubt1   Ready      control-plane,master   61m   v1.20.2
zubt2   Ready      control-plane,master   31m   v1.20.2
zubt3   NotReady   control-plane,master   13m   v1.20.2

 

After also shutting down zubt2:
scnzzh@zubt1:~$ kubectl get node
Error from server: etcdserver: request timed out

scnzzh@zubt1:~$ kubectl get node
Error from server: rpc error: code = Unknown desc = OK: HTTP status code 200; transport: missing content-type field

scnzzh@zubt1:~$ kubectl get node
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get nodes)

scnzzh@zubt1:~$ kubectl get node
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get nodes)
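
At this point kubectl is no longer a useful diagnostic tool, because kube-apiserver on zubt1 cannot complete requests once its local etcd member has lost quorum. To confirm that quorum loss is really the problem, you can read the etcd container's logs directly on the surviving node, bypassing the API server. A rough sketch, assuming Docker is the container runtime (common for kubeadm v1.20 clusters); on a containerd node you would use crictl ps and crictl logs instead:

sudo docker ps | grep etcd                      # find the etcd container on this node
sudo docker logs --tail 20 <etcd-container-id>  # <etcd-container-id> is a placeholder for the ID printed above

With the other two members unreachable, the log typically fills with failed peer connections and leader elections that never complete.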

 

After starting zubt3 back up:
scnzzh@zubt1:~$ kubectl get node
NAME    STATUS     ROLES                  AGE   VERSION
zubt1   Ready      control-plane,master   68m   v1.20.2
zubt2   NotReady   control-plane,master   38m   v1.20.2
zubt3   Ready      control-plane,master   20m   v1.20.2

 

After shutting down zubt1:
scnzzh@zubt3:~$ kubectl get node
Error from server: etcdserver: request timed out

scnzzh@zubt3:~$ kubectl get node
Error from server: etcdserver: request timed out

scnzzh@zubt3:~$ kubectl get node
Unable to connect to the server: net/http: TLS handshake timeout

scnzzh@zubt3:~$ kubectl get node
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get nodes)

 

After starting zubt2 back up:
scnzzh@zubt3:~$ kubectl get node
NAME    STATUS     ROLES                  AGE   VERSION
zubt1   NotReady   control-plane,master   75m   v1.20.2
zubt2   Ready      control-plane,master   45m   v1.20.2
zubt3   Ready      control-plane,master   26m   v1.20.2

 

After starting zubt1 back up:
scnzzh@zubt1:~$ kubectl get node
NAME    STATUS     ROLES                  AGE   VERSION
zubt1   Ready      control-plane,master   88m   v1.20.2
zubt2   Ready      control-plane,master   58m   v1.20.2
zubt3   Ready      control-plane,master   40m   v1.20.2

 

If a Kubernetes cluster has 3 nodes, will it still work when 2 of them are shut down?

 

The short answer is: it depends on whether the nodes that went down were "Control Plane" nodes or "Worker" nodes.

In Kubernetes, nodes have different roles. If you lose 2 out of 3 nodes, the survival of your cluster depends on how those roles were distributed.
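
The ROLES column you see in kubectl get nodes is derived from node labels of the form node-role.kubernetes.io/<role>, so you can check a node's role directly from its labels. A minimal check (the grep pattern is only illustrative):

kubectl get nodes --show-labels | grep node-role
# control-plane nodes carry node-role.kubernetes.io/control-plane (and, on older
# releases such as v1.20, also node-role.kubernetes.io/master); plain workers
# usually have no node-role label, so their ROLES column shows <none>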


Scenario A: A Single-Master Cluster (1 Control Plane + 2 Workers)

In many small or development clusters, only one node acts as the "brain" (Control Plane), and the other two run your apps (Workers).

  • If the 2 Workers shut down: The cluster stays alive, but your apps (Pods) have nowhere to run. The Control Plane marks both nodes NotReady and waits for them to come back (a quick way to watch this is sketched after this list).

  • If the Control Plane + 1 Worker shut down: The cluster becomes unmanageable. You cannot issue commands (like kubectl get pods), and while pods already running on the surviving Worker keep running, nothing can be scheduled, rescheduled, or repaired until the Control Plane comes back.
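
For the first case, a quick way to watch the Control Plane react to a dead worker is to look at the node's conditions and taints. A sketch using a hypothetical worker named worker1; with default settings, the node-lifecycle controller marks an unreachable node NotReady, taints it, and evicts its pods after the default 300-second toleration:

kubectl get nodes                              # the down worker shows NotReady
kubectl describe node worker1 | grep -i taint  # expect node.kubernetes.io/unreachable
kubectl get pods -o wide -A                    # after ~5 minutes, pods are evicted and rescheduled, if capacity exists elsewhere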


Scenario B: A High-Availability (HA) Cluster (3 Control Planes)

In production, you often use 3 nodes that all act as the Control Plane to ensure "High Availability." This setup relies on a database called etcd, which requires a quorum (a majority) to function.

  • The Math: To make decisions, the cluster needs more than half of its etcd members to be online and able to reach each other.

    • For a 3-node cluster, the majority is 2.

  • Result: If 2 out of 3 nodes shut down, the cluster loses quorum and stops working. Even though 1 node is still powered on, its etcd member refuses to serve reads or writes because it can no longer win a majority "vote". That is exactly what the etcdserver: request timed out errors in the transcript above show (a health check you can run once quorum is back is sketched after this list).
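
Once quorum is restored (at least 2 members up), you can ask etcd directly who the members are and which endpoints are healthy. A sketch assuming a kubeadm-style stacked-etcd setup, where etcd runs as a static pod named etcd-<node-name> in kube-system and its certificates live under /etc/kubernetes/pki/etcd (adjust the pod name and paths to your cluster):

kubectl -n kube-system exec etcd-zubt1 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list -w table

kubectl -n kube-system exec etcd-zubt1 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health --cluster

Note that while quorum is lost these commands fail too, since kubectl exec goes through the same API server; that is why the earlier check had to read the container logs on the node directly.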


Summary Table

Setup                  Can it survive 2 nodes down?   Why?
Standard 3-Node HA     No                             Needs at least 2 nodes to maintain etcd quorum.
1 Master + 2 Workers   Partial                        Only if the 1 Master is the one that stayed online.
5-Node HA              Yes                            5 nodes can lose 2 and still have a majority of 3.
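
The quorum rule generalizes: with n etcd members, quorum is floor(n/2) + 1, so the cluster tolerates n minus quorum simultaneous member failures. A quick sanity check in bash:

for n in 1 3 5 7; do echo "$n members: quorum=$((n/2+1)), tolerates $((n-n/2-1)) failure(s)"; done
# 1 members: quorum=1, tolerates 0 failure(s)
# 3 members: quorum=2, tolerates 1 failure(s)
# 5 members: quorum=3, tolerates 2 failure(s)
# 7 members: quorum=4, tolerates 3 failure(s)

This is also why etcd clusters use odd sizes: 4 members tolerate the same single failure as 3, but add one more machine that can fail.
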
posted on 2021-02-01 09:40 by ZhangZhihuiAAA