Cluster information:
Kubernetes version: 1.28.2
Cloud being used: AWS GovCloud
Installation method: CloudFormation / Chef / manual
Host OS: Red Hat Enterprise Linux 8.8
CNI and version: Calico v3.24.5
CRI and version: containerd.io 1.6.31-3.1.el8
I have a handful of nodes in AWS that are not connected to the internet, and are behind a web proxy for the limited web services available on the network.
I have pods that remain stuck in “ContainerCreating” with very little feedback. The one common error with all of them is:
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up network for sandbox "(sandbox ID omitted)": plugin type "calico" failed (add): netplugin failed but error parsing its diagnostic message "No valid options provided. Usage:\n": invalid character 'N' looking for beginning of value
Kubelet and containerd both log that message repeatedly for every pod stuck in ContainerCreating.
The nature of the message makes me think that either something is trying to reach another service and is getting the proxy's response instead, or some Calico command isn't completing. Searching for the error turns up nothing; I can't find a single recorded instance of it.
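For what it's worth, the tail of the message ("invalid character 'N' looking for beginning of value") has the shape of an error from Go's encoding/json decoder, which would mean the caller expected JSON from the plugin and received plain usage text instead. The sketch below only reproduces that decoding failure in isolation. The cniError type and decodeErrorFor helper are hypothetical names made up for illustration (the fields follow the error object in the CNI specification), and nothing here is taken from the Calico or containerd sources.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cniError is a hypothetical local type mirroring the error object described
// in the CNI specification (code, msg, details). It is declared here so the
// sketch has no external dependencies.
type cniError struct {
	Code    uint   `json:"code"`
	Msg     string `json:"msg"`
	Details string `json:"details,omitempty"`
}

// decodeErrorFor tries to decode a plugin's stdout as a CNI error object.
// It returns the decoder's complaint, or "" if the text was valid JSON.
func decodeErrorFor(stdout string) string {
	var e cniError
	if err := json.Unmarshal([]byte(stdout), &e); err != nil {
		return err.Error()
	}
	return ""
}

func main() {
	// The text quoted inside the kubelet/containerd error.
	usage := "No valid options provided. Usage:\n"
	fmt.Printf("usage text: %q\n", decodeErrorFor(usage))

	// A well-formed error object for contrast (values are made up).
	ok := `{"code": 100, "msg": "example failure"}`
	fmt.Printf("JSON error: %q\n", decodeErrorFor(ok))
}
```

If that reading is right, the question becomes which executable printed "No valid options provided. Usage:" where JSON was expected. That could be a binary invoked with arguments it does not accept, but this is a guess rather than a diagnosis.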
I’ve been staring at it for three weeks, and in that time I have tried pretty much everything I can think of, though I know I haven’t tried everything. I do know that this exact network was working with Kubernetes 1.14 on Red Hat 7.
Is there any way for me to get better debug logs from, say, metrics-server while it tries to start? Are there any (essentially) manual methods for stepping through a pod’s initialization? Does Kubernetes have any troubleshooting or debugging tooling for this, or is the only option to start over again and again?