Kubeadm join to a re-created control plane

Cluster information:

Kubernetes version: 1.20
Cloud being used: bare-metal
Installation method: kubeadm
Host OS: ubuntu 18
CNI and version: weave-net
CRI and version: docker 20


Just trying to break-and-fix to play around with disaster recovery:

  • Using kubeadm, I created a cluster with 1 master and 2 workers (3 headless VirtualBox VMs in total), and the cluster worked fine: kubectl get nodes and kubectl get pods -n kube-system showed everything ready.
  • I created a secret (as a sentinel for later).
  • I backed up the etcd db, saved the etcd static pod yaml to the host, and completely deleted the master.
  • Then I recreated the master node, ran kubeadm init, and installed weave-net into the cluster. kubectl get pods -n kube-system showed all pods running, and kubectl get nodes showed the master ready.
  • Then I restored the etcd db snapshot and edited the etcd static pod yaml (data dir and initial cluster token). kubectl get secrets showed the secret I had created. HOWEVER, I noticed that weave-net (the CNI plugin) was getting authorization problems, so I deleted it and re-applied it. It ran fine after that.
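
For reference, the backup and restore steps above can be sketched roughly as follows. This is only an illustration: the snapshot path, the restored data directory and the token value are hypothetical, and the certificate paths assume kubeadm's default layout under /etc/kubernetes/pki/etcd.

```shell
# Sketch only: these three values are hypothetical, not taken from the post.
SNAPSHOT=${SNAPSHOT:-/root/etcd-snapshot.db}           # where the backup is written
RESTORE_DIR=${RESTORE_DIR:-/var/lib/etcd-restored}     # new etcd data dir
CLUSTER_TOKEN=${CLUSTER_TOKEN:-etcd-cluster-restored}  # new initial cluster token

# On the original master: take a snapshot of the running etcd,
# assuming the certificate paths kubeadm uses by default.
backup_etcd() {
  ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    snapshot save "$SNAPSHOT"
}

# On the re-created master: unpack the snapshot into a fresh data directory.
# The etcd static pod yaml then has to point at the same directory and token.
restore_etcd() {
  ETCDCTL_API=3 etcdctl snapshot restore "$SNAPSHOT" \
    --data-dir="$RESTORE_DIR" \
    --initial-cluster-token="$CLUSTER_TOKEN"
}
```

Running backup_etcd before deleting the master and restore_etcd after kubeadm init corresponds to the third and fifth bullets. Note that this sketch, like the steps above, only carries over the etcd contents and not the certificates and keys under /etc/kubernetes/pki.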


Two problems remain:

  1. coredns pods are non-ready because they seem (based on their logs) to be unauthorized to use the cluster
  2. the worker nodes are not authorized to rejoin the cluster

For #1, I’m guessing it is because the etcd I restored does not have a service account token that matches what is mounted in the coredns deployment. How do I fix that?
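
A minimal sketch of how that guess could be checked and acted on, not a confirmed fix. It assumes the guess is right: kubeadm init generates a new service account signing key, so token secrets restored from the old etcd would no longer verify. It also assumes the kubeadm default names (service account and deployment both called coredns in kube-system) and a cluster version, such as 1.20, where token secrets are still created automatically for service accounts.

```shell
NS=${NS:-kube-system}

# List the service account token secrets that came back with the snapshot.
list_sa_tokens() {
  kubectl -n "$NS" get secrets \
    --field-selector type=kubernetes.io/service-account-token
}

# Delete the token secret referenced by the coredns service account so that
# a new one can be issued with the current signing key, then restart the
# deployment so the pods mount the new token.
refresh_coredns_token() {
  secret=$(kubectl -n "$NS" get serviceaccount coredns \
    -o jsonpath='{.secrets[0].name}')
  kubectl -n "$NS" delete secret "$secret"
  kubectl -n "$NS" rollout restart deployment coredns
}
```

If the coredns pods become ready after refresh_coredns_token, that would support the stale-token explanation. Other workloads restored from the snapshot may need the same treatment.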

I’m hoping that once #1 is fixed, #2 will only require running kubeadm join on each node, but I suspect the workers will have the same issue as coredns, because their service account token will be obsolete too.
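
For context, the rejoin I have in mind would look roughly like this. It is a sketch under the assumption that the workers still hold credentials from the old control plane and therefore need a reset before joining. The function names are made up for illustration.

```shell
# On the re-created master: create a fresh bootstrap token and print the
# complete "kubeadm join ..." command for the workers.
print_join_command() {
  kubeadm token create --print-join-command
}

# On each worker: clear the state left over from the old cluster, then run
# the join command that was printed on the master (passed as arguments).
rejoin_worker() {
  kubeadm reset --force
  "$@"
}
```

On a worker this would be called as rejoin_worker followed by the full join command copied from the master's output.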