Cluster information:
Kubernetes version: v1.28.2
Cloud being used: VirtualBox
Installation method: Kubernetes Cluster VirtualBox
Host OS: Ubuntu 22.04.3 LTS
CNI and version: Calico
CRI and version: containerd://1.7.2
The cluster consists of 1 master node and 2 worker nodes.
Right after the cluster starts (for the first 1-2 minutes), everything looks good:
lab@master:~$ kubectl -nkube-system get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-7ddc4f45bc-4qx7l 1/1 Running 12 (2m11s ago) 13d 10.10.219.98 master <none> <none>
calico-node-bqlnm 1/1 Running 3 (2m11s ago) 4d2h 192.168.1.164 master <none> <none>
calico-node-mrd86 1/1 Running 105 (2d20h ago) 4d2h 192.168.1.165 worker01 <none> <none>
calico-node-r6w9s 1/1 Running 110 (2d20h ago) 4d2h 192.168.1.166 worker02 <none> <none>
coredns-5dd5756b68-njtpf 1/1 Running 11 (2m11s ago) 13d 10.10.219.100 master <none> <none>
coredns-5dd5756b68-pxn8l 1/1 Running 10 (2m11s ago) 13d 10.10.219.99 master <none> <none>
etcd-master 1/1 Running 67 (2m11s ago) 13d 192.168.1.164 master <none> <none>
kube-apiserver-master 1/1 Running 43 (2m11s ago) 13d 192.168.1.164 master <none> <none>
kube-controller-manager-master 1/1 Running 47 (2m11s ago) 13d 192.168.1.164 master <none> <none>
kube-proxy-ffnzb 1/1 Running 122 (95s ago) 12d 192.168.1.165 worker01 <none> <none>
kube-proxy-hf4mx 1/1 Running 108 (78s ago) 12d 192.168.1.166 worker02 <none> <none>
kube-proxy-ql576 1/1 Running 15 (2m11s ago) 13d 192.168.1.164 master <none> <none>
kube-scheduler-master 1/1 Running 46 (2m11s ago) 13d 192.168.1.164 master <none> <none>
metrics-server-54cb77cffd-q292x 0/1 CrashLoopBackOff 68 (18s ago) 3d21h 10.10.30.94 worker02 <none> <none>
However, a few minutes later, pods in the kube-system namespace start flapping/crashing:
lab@master:~$ kubectl -nkube-system get po
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-7ddc4f45bc-4qx7l 1/1 Running 12 (19m ago) 13d
calico-node-bqlnm 0/1 Running 3 (19m ago) 4d2h
calico-node-mrd86 0/1 CrashLoopBackOff 111 (2m28s ago) 4d2h
calico-node-r6w9s 0/1 CrashLoopBackOff 116 (2m15s ago) 4d2h
coredns-5dd5756b68-njtpf 1/1 Running 11 (19m ago) 13d
coredns-5dd5756b68-pxn8l 1/1 Running 10 (19m ago) 13d
etcd-master 1/1 Running 67 (19m ago) 13d
kube-apiserver-master 1/1 Running 43 (19m ago) 13d
kube-controller-manager-master 1/1 Running 47 (19m ago) 13d
kube-proxy-ffnzb 0/1 CrashLoopBackOff 127 (42s ago) 12d
kube-proxy-hf4mx 0/1 CrashLoopBackOff 113 (2m17s ago) 12d
kube-proxy-ql576 1/1 Running 15 (19m ago) 13d
kube-scheduler-master 1/1 Running 46 (19m ago) 13d
metrics-server-54cb77cffd-q292x 0/1 CrashLoopBackOff 73 (64s ago) 3d22h
It is completely unclear to me what is wrong. When I check the pod description, I see repeating events:
lab@master:~$ kubectl -nkube-system describe po kube-proxy-ffnzb
.
.
.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Killing 2d20h (x50 over 3d1h) kubelet Stopping container kube-proxy
Warning BackOff 2d20h (x1146 over 3d1h) kubelet Back-off restarting failed container kube-proxy in pod kube-proxy-ffnzb_kube-system(79f808ba-f450-4103-80a9-0e75af2e77cf)
Normal Pulled 8m11s (x3 over 10m) kubelet Container image "registry.k8s.io/kube-proxy:v1.28.6" already present on machine
Normal Created 8m10s (x3 over 10m) kubelet Created container kube-proxy
Normal Started 8m10s (x3 over 10m) kubelet Started container kube-proxy
Normal SandboxChanged 6m56s (x4 over 10m) kubelet Pod sandbox changed, it will be killed and re-created.
Normal Killing 4m41s (x4 over 10m) kubelet Stopping container kube-proxy
Warning BackOff 12s (x28 over 10m) kubelet Back-off restarting failed container kube-proxy in pod kube-proxy-ffnzb_kube-system(79f808ba-f450-4103-80a9-0e75af2e77cf)
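The events only show the back-off, not why the container exits. To get the actual crash reason, these are the commands I use (the pod name is the one from my output above; adjust as needed):

```shell
# Logs of the last crashed instance (the current one restarts too quickly):
kubectl -nkube-system logs kube-proxy-ffnzb --previous

# Exit code / reason recorded in the pod status (e.g. OOMKilled, Error):
kubectl -nkube-system get po kube-proxy-ffnzb \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'

# On the affected worker node, kubelet/containerd messages for the same window:
#   sudo journalctl -u kubelet -u containerd --since "15 min ago" --no-pager
```

So far these have not pointed me at an obvious root cause.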
Note!
This situation does not prevent me from deploying example workloads (an nginx Deployment appears to run stably).
However, when I tried to add metrics-server, it crashes as well (possibly related to the CrashLoopBackOff pods in the kube-system namespace).
Any ideas what might be wrong, or where else I should look to troubleshoot?
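For reference, this is the node-level sanity check I can run on each VM (master, worker01, worker02). Nothing here is confirmed as the root cause; swap coming back after a reboot and missing bridge sysctls are just common culprits for flapping kube-proxy/Calico pods on VirtualBox setups:

```shell
# Swap should be off: kubelet (by default) refuses to run cleanly with swap enabled.
echo "=== swap (should be empty / 0) ==="
swapon --show
free -m | grep -i '^swap'

# kube-proxy and Calico need IP forwarding; iptables mode also needs br_netfilter.
echo "=== required kernel settings (should be 1) ==="
sysctl -e net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables

# VirtualBox VMs are often undersized; OOM kills would also explain restarts.
echo "=== memory headroom ==="
free -m | head -3
```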