Kubernetes version: 1.17.11
Cloud being used: Baremetal, AWS
Installation method: Kubernetes Yum Repo
Host OS: CentOS 7
CNI and version: Calico 3.16.0
CRI and version: Docker 19.03.12
One of our Kubernetes etcd pods is stuck in CrashLoopBackOff. Logs show the error “state.commit is out of range”. Some investigation suggests the data is likely corrupted, and the standard procedure would be to remove the etcd member, delete its data, and rejoin it to the cluster.
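For reference, the generic member-replacement flow we understand would apply (run from a node whose etcd member is still healthy) looks roughly like this. This is a sketch, assuming etcdctl's v3 API and kubeadm's default certificate paths; the member ID, name, and IP are placeholders:

```shell
# Use the v3 API; cert paths below are kubeadm's defaults — adjust if yours differ.
export ETCDCTL_API=3
ETCDCTL="etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/peer.crt \
  --key=/etc/kubernetes/pki/etcd/peer.key"

# Find the ID of the broken member, then remove it from the cluster.
$ETCDCTL member list
$ETCDCTL member remove <member-id>

# After wiping the broken node's data, re-add it before its etcd starts:
$ETCDCTL member add <member-name> --peer-urls=https://<node-ip>:2380
```

The open question for us is how to do the equivalent when etcd is a kubeadm-managed pod rather than a standalone service.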
However, that’s made difficult by the fact that these pods are created by kubeadm. We’re unable to find details about the pods, such as where the data is persisted on disk or how to re-add a member from within the pod.
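For context, our understanding so far is that kubeadm runs etcd as a static pod rather than one backed by a PersistentVolume: the manifest lives at /etc/kubernetes/manifests/etcd.yaml on the control-plane node, and the data sits on a hostPath mount (by default /var/lib/etcd). The relevant excerpt of the manifest, if we’re reading the defaults right:

```yaml
# /etc/kubernetes/manifests/etcd.yaml (excerpt, kubeadm defaults)
volumes:
- hostPath:
    path: /var/lib/etcd
    type: DirectoryOrCreate
  name: etcd-data
```

If that’s correct, “deleting the data” would mean clearing that hostPath directory on the node, not touching any PV.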
Has anyone had to do this before? I can’t find any documentation on it. The closest I’ve found is kubeadm reset with the flag/phase that deletes the etcd member.
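If we’re reading the kubeadm reference right, reset is split into phases in this version, and one of them targets exactly this case. A sketch of what we’d try on the broken control-plane node (the rm path assumes the default hostPath above):

```shell
# Remove this node's member from the etcd cluster
# (uses the local static pod manifest to identify the member).
kubeadm reset phase remove-etcd-member

# Clear the corrupted local data so the member can rejoin cleanly.
rm -rf /var/lib/etcd
```

We’re unsure whether this phase works when the local etcd pod itself can’t start, which is why we’re asking.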
EDIT: It’s worth noting that because the etcd pod won’t come up, the kube-apiserver pod on this control-plane node won’t come up either. We’re currently trying to point the API server at a different etcd member.
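Concretely, we’re attempting this by editing the API server’s static pod manifest on the node and changing its etcd endpoint. A sketch, with a placeholder IP for a healthy member:

```yaml
# /etc/kubernetes/manifests/kube-apiserver.yaml (excerpt)
spec:
  containers:
  - command:
    - kube-apiserver
    # Point at a healthy etcd member on another control-plane node:
    - --etcd-servers=https://<healthy-etcd-node-ip>:2379
```

The kubelet should recreate the pod automatically when the manifest changes, but we haven’t confirmed the TLS certs will be accepted by the remote member.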