Cluster information:
Kubernetes version: 1.17.3
Cloud being used: baremetal
Installation method: kubeadm (Stacked control plane and etcd nodes)
Host OS: CentOS 7.8
CNI and version: flannel 0.11.0
CRI and version: docker-ce-19.03.4
I have a three-node multi-master Kubernetes (1.17.3) cluster (stacked control plane and etcd nodes):
11.11.11.1 - master1
11.11.11.2 - master2
11.11.11.3 - master3
Before going to production, I am testing possible failure scenarios and performed the steps below.
Graceful Removal of Master Nodes
- Run kubectl drain 11.11.11.3 on master3
- Run kubeadm reset on master3
- Run kubectl delete node 11.11.11.3 on master3 (the exact commands I used are sketched below)
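For reference, this is roughly how I ran the graceful removal (my node names are the IP addresses, and the drain flags may differ on other setups; --ignore-daemonsets and --delete-local-data were needed here because of the flannel and kube-proxy DaemonSets):

```
# from a node with a working kubeconfig: evict workloads from master3
kubectl drain 11.11.11.3 --ignore-daemonsets --delete-local-data

# on master3: tear down the kubeadm-installed control plane
# (in my case this also removed the node's etcd member)
kubeadm reset

# from a surviving master: remove the Node object from the API
kubectl delete node 11.11.11.3
```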
After applying the above steps, all pods are running on master1 and master2, and the removed node's entries are gone from the kubeadm-config ConfigMap and from etcd. In fact, I then ran the same steps on master2, and with only one master left the cluster is still up and running and I can use kubectl.
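This is roughly how I verified that the removed master disappeared from both places (a sketch; the etcdctl certificate paths are the kubeadm defaults on my nodes, and etcdctl in this etcd image defaults to the v3 API):

```
# the removed master should no longer appear under apiEndpoints in ClusterStatus
kubectl -n kube-system get configmap kubeadm-config -o yaml

# on a surviving master, list the remaining etcd members via the etcd container
docker exec <etcd container> etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list
```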
Non-Graceful Removal of Master Nodes
- I shut down master3 and did not face any issue; the two remaining masters are still accessible and I can run kubectl and do administration.
- As soon as I shut down master2, I lose access to kubectl and it says the apiserver is not accessible. How can I restore master1 in this situation?
This can happen in production, where two nodes might have hardware issues at the same time. I want to be able to keep using the cluster (as one can with a single master) until I fix the hardware on master2 and master3, so effectively I want to scale the cluster down from 3 nodes to 1 node.
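My understanding is that this happens because a 3-member etcd cluster needs 2 healthy members for quorum, so with master2 and master3 down the etcd on master1 cannot serve requests and the apiserver fails with it. Roughly what I checked on master1 at that point (a sketch, not exact output):

```
# kubectl against the surviving master fails, saying the apiserver is not accessible
kubectl get nodes

# the local etcd container is not staying up
docker ps -a | grep etcd
```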
If I can access etcd and remove master2 and master3 from its member list, it may work. I thought to do docker ps and docker exec <etcd container>, but docker ps shows the etcd container in an exited state, and it keeps repeating the following in its logs:
raft2020/06/09 17:50:04 INFO: a2311c10a9fc5790 is starting a new election at term 297
raft2020/06/09 17:50:04 INFO: a2311c10a9fc5790 became candidate at term 298
raft2020/06/09 17:50:04 INFO: a2311c10a9fc5790 received MsgVoteResp from a2311c10a9fc5790 at term 298
raft2020/06/09 17:50:04 INFO: a2311c10a9fc5790 [logterm: 22, index: 66984] sent MsgVote request to 65d952fd463d8693 at term 298
raft2020/06/09 17:50:04 INFO: a2311c10a9fc5790 [logterm: 22, index: 66984] sent MsgVote request to 87b89f8e10cada7e at term 298
2020-06-09 17:50:05.752320 W | rafthttp: health check for peer 65d952fd463d8693 could not connect: dial tcp 11.11.11.2:2380: connect: no route to host
2020-06-09 17:50:05.752372 W | rafthttp: health check for peer 65d952fd463d8693 could not connect: dial tcp 11.11.11.2:2380: i/o timeout
2020-06-09 17:50:05.753299 W | rafthttp: health check for peer 87b89f8e10cada7e could not connect: dial tcp 11.11.11.3:2380: i/o timeout
2020-06-09 17:50:05.753349 W | rafthttp: health check for peer 87b89f8e10cada7e could not connect: dial tcp 11.11.11.3:2380: connect: no route to host
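For completeness, this is what I intended to run inside the etcd container once I could exec into it (only a sketch, using the default kubeadm certificate paths; I could not actually run it because the container keeps exiting):

```
# remove the two dead peers, using the IDs shown by 'member list'
docker exec <etcd container> etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member remove <member id of master2>
```

From the logs above it looks like etcd on master1 cannot reach its peers at 11.11.11.2 and 11.11.11.3 and keeps losing elections, and as far as I understand member remove itself goes through raft and needs a quorum that no longer exists. So my guess is that I would instead have to force etcd on master1 into a new single-member cluster (for example by temporarily adding --force-new-cluster to the etcd static pod manifest in /etc/kubernetes/manifests/etcd.yaml, or by restoring from an etcdctl snapshot) and rejoin master2 and master3 later. Is that the right way to get master1 working on its own, or is there a supported procedure for this?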