First master crashed, all other nodes impacted. Why?


Cluster information:

Kubernetes version: 1.15.3
Cloud being used: bare-metal
Installation method: kubespray
Host OS: CentOS 7
CNI and version: docker.io/calico/cni:v3.7.3

Hi. Sorry for my bad English.

Today I had a big problem on my cluster. The kubelet client certificate had expired. I renewed it, and at the same time I upgraded my cluster to version 1.15.3. After that, my first master had a system crash.

The server was removed from the cluster (kubectl delete node xxxx) and, after a file restore, reset with kubeadm reset.
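For clarity, these are the commands I mean (xxxx is just the placeholder for the node name):

```bash
# Remove the crashed master from the cluster
kubectl delete node xxxx

# Reset the kubeadm/kubelet state on the crashed server itself, after the file restore
kubeadm reset
```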
Since the crash, all the other nodes have had problems creating or restarting containers:

Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "0c24b200a9cc5b9294a38f92c117ea8b4742586120bfb230d2b688d940196bbc" network for pod "trs-mop-6c5d87b585-7mkjq": NetworkPlugin cni failed to set up pod "trs-mop-6c5d87b585-7mkjq_trs" network: dial tcp 172.20.24.56:2379: connect: connection refused, failed to clean up sandbox container "0c24b200a9cc5b9294a38f92c117ea8b4742586120bfb230d2b688d940196bbc" network for pod "trs-mop-6c5d87b585-7mkjq": NetworkPlugin cni failed to teardown pod "trs-mop-6c5d87b585-7mkjq_trs" network: dial tcp 172.20.24.56:2379: connect: connection refused]

All nodes are still trying to send all their requests to the first master… The containers that were already running when the first master crashed stay up, as long as they don't restart.

I'd like to have more time to restore the first master properly. Why are all the nodes still using the first master? How can I force them to work with the other masters (I have 2 other masters that are running)?
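In case it helps, here is where I think the old etcd endpoint (172.20.24.56:2379) is still configured on the nodes. The file path, ConfigMap name and label are guesses based on the standard Calico manifests, so a kubespray install may use different names:

```bash
# On a worker node: the CNI config the kubelet uses for pod networking.
# I expect "etcd_endpoints" here to still list only the crashed master.
cat /etc/cni/net.d/10-calico.conflist

# From a surviving master: the Calico ConfigMap that may also hold the endpoint list
# (name taken from the upstream Calico manifests; not sure kubespray uses the same one).
kubectl -n kube-system get configmap calico-config -o yaml

# If I point etcd_endpoints at the two surviving masters, I assume the calico-node
# pods need a restart to pick up the change (label also from the upstream manifests).
kubectl -n kube-system delete pod -l k8s-app=calico-node
```

Is that the right place to change it, or does kubespray manage these files itself?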

Thanks in advance for your help.