First master crashed, all other nodes impacted. Why?


Cluster information:

Kubernetes version: 1.15.3
Cloud being used: bare-metal
Installation method: kubespray
Host OS: CentOS 7
CNI and version: docker.io/calico/cni:v3.7.3

Hi. Sorry for my bad English.

Today I had a big problem on my cluster. The kubelet client certificate had expired. I renewed it, and at the same time I upgraded my cluster to version 1.15.3. After that, my first master had a system crash.

The server was removed from the cluster (kubectl delete node xxxx) and, after a file restore, reset with kubeadm reset.
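For clarity, these are the commands I mean (xxxx is just the placeholder for the node name):

```bash
# Remove the crashed master from the cluster
kubectl delete node xxxx

# Reset the kubeadm/kubelet state on the crashed server itself, after the file restore
kubeadm reset
```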
Since the crash, all the other nodes have had problems creating or restarting containers:

Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "0c24b200a9cc5b9294a38f92c117ea8b4742586120bfb230d2b688d940196bbc" network for pod "trs-mop-6c5d87b585-7mkjq": NetworkPlugin cni failed to set up pod "trs-mop-6c5d87b585-7mkjq_trs" network: dial tcp 172.20.24.56:2379: connect: connection refused, failed to clean up sandbox container "0c24b200a9cc5b9294a38f92c117ea8b4742586120bfb230d2b688d940196bbc" network for pod "trs-mop-6c5d87b585-7mkjq": NetworkPlugin cni failed to teardown pod "trs-mop-6c5d87b585-7mkjq_trs" network: dial tcp 172.20.24.56:2379: connect: connection refused]

All nodes are still trying to send all their requests to the first master… The containers that were already running when the first master crashed stay up, as long as they don't restart.

I'd like to have more time to restore the first master properly. Why are all the nodes still using the first master? How can I force them to work with the other masters (I have 2 other masters that are running)?
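In case it helps, here is where I think the old etcd endpoint (172.20.24.56:2379) is still configured on the nodes. The file path, ConfigMap name and label are guesses based on the standard Calico manifests, so a kubespray install may use different names:

```bash
# On a worker node: the CNI config the kubelet uses for pod networking.
# I expect "etcd_endpoints" here to still list only the crashed master.
cat /etc/cni/net.d/10-calico.conflist

# From a surviving master: the Calico ConfigMap that may also hold the endpoint list
# (name taken from the upstream Calico manifests; not sure kubespray uses the same one).
kubectl -n kube-system get configmap calico-config -o yaml

# If I point etcd_endpoints at the two surviving masters, I assume the calico-node
# pods need a restart to pick up the change (label also from the upstream manifests).
kubectl -n kube-system delete pod -l k8s-app=calico-node
```

Is that the right place to change it, or does kubespray manage these files itself?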

Thanks in advance for your help.