We’ve had an odd incident occur in a 1.15.7 cluster two days in a row. We’re not exactly sure about causality/order, but two things occur:
- The kube-apiserver stops accepting tokens it has issued
- Many `Kind: Secret` resources in the cluster disappear
This creates a very big mess in the cluster. Among other things, the kube-controller-manager’s controllers can no longer authenticate, so ReplicaSets can’t provision pods, etc. Calico also falls apart due to authentication failures, so getting the cluster back to a working state is a fairly big process.
It also effectively DoSes the kube-apiserver, given the volume of requests arriving with tokens it now rejects.
Has anyone experienced anything like this?
Cluster information:
Kubernetes version: 1.15.7
Cloud being used: AWS
Installation method: Kops
Host OS: Debian GNU/Linux 9 (stretch)
CNI and version: Calico 3.9.6
CRI and version: Docker 18.09.3
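
To help pin down the order of events, one option would be to snapshot the Secret inventory periodically and diff a snapshot taken before an incident against one taken after. The following is only a minimal sketch of that idea, not something we currently run: the function names are hypothetical, and it assumes each snapshot is the parsed JSON output of `kubectl get secrets --all-namespaces -o json`.

```python
import json
from collections import Counter


def secret_keys(snapshot):
    """Return the set of (namespace, name, type) for every Secret in a snapshot.

    `snapshot` is assumed to be the parsed JSON List object produced by
    `kubectl get secrets --all-namespaces -o json`.
    """
    return {
        (item["metadata"]["namespace"], item["metadata"]["name"], item.get("type", ""))
        for item in snapshot.get("items", [])
    }


def missing_secrets(before, after):
    """Secrets present in `before` but absent from `after`, sorted for stable output."""
    return sorted(secret_keys(before) - secret_keys(after))


def missing_by_type(before, after):
    """Count missing Secrets per type (e.g. kubernetes.io/service-account-token)."""
    return Counter(secret_type for _, _, secret_type in missing_secrets(before, after))


def load_snapshot(path):
    """Load a snapshot previously saved to disk as JSON."""
    with open(path) as fh:
        return json.load(fh)
```

For example, with snapshots saved as `before.json` and `after.json` (hypothetical file names), `missing_by_type(load_snapshot("before.json"), load_snapshot("after.json"))` would show whether the Secrets that vanished are mostly service account tokens or span every type. A token Secret that was deleted and recreated under a new generated name will show up as missing here, which is itself a useful signal.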