We have a bare-metal k8s cluster spread across about 12 nodes at the moment. This has been working fine for almost 2 years at this point, however in the last few days strange things started to happen.
Pods seemingly hang in the ContainerCreating state with this error:

```
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container network for pod : networkPlugin cni failed to set up pod network: netplugin failed with no error message: signal: killed
```
We noticed this first with our cron jobs in the cluster. We don't currently dynamically scale anything, so the only things created periodically are the containers that run the cron tasks atm — we deploy our own apps onto it, but nothing has been deployed for over 2 weeks, and nothing in the base k8s install has been changed in a very long time. These cron pods started to pile up in this state with the error above. Unfortunately it doesn't actually say what the problem is, and we couldn't find anything meaningful in the logs. I tried looking for answers, but everything I find has a concrete error message that makes it easy to resolve — this one doesn't.
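For reference, this is roughly how we're finding the stuck pods and pulling the error out (standard kubectl, nothing custom; the pod/namespace names are placeholders):

```shell
# Pods stuck in ContainerCreating show up as Pending across all namespaces
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# The sandbox error appears in the events of an affected pod
kubectl describe pod <pod-name> -n <namespace> | tail -n 20
```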
What seemed to help was killing all the hanging jobs and/or restarting the Weave pods on the nodes.
Within a day the issue comes back. Does anyone have any idea how to diagnose this further?
Actually, slight edit: clearing the stuck pods and cycling Weave on all nodes doesn't work anymore. The really annoying bit is that if I hard-reboot a node, the services start up on it with no problem — but once they're all up, literally a minute later I'm unable to start anything on that node anymore. I tried random hello-world deployments and they don't start, with the same issue. I tried restarting services that had started up just fine right after the reboot, and they don't start either… (The nodes are not tainted in any way.)
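The hello-world test I mentioned is nothing fancy — roughly this, plus the logs I've been checking so far (the weave pod name is a placeholder; the dmesg check is just a guess, since "signal: killed" made me wonder whether something is SIGKILLing the CNI binary, e.g. the OOM killer):

```shell
# Throwaway deployment to check whether anything at all can start
kubectl create deployment hello-test --image=nginx
kubectl get pods -l app=hello-test -w     # sits in ContainerCreating
kubectl delete deployment hello-test      # clean up afterwards

# Weave logs on the affected node
kubectl -n kube-system logs <weave-pod> -c weave --tail=100

# Kernel log on the node, in case the netplugin process is being killed
dmesg -T | grep -i -E 'killed process|oom'
```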
Kubernetes version: 1.21.1
Cloud being used: bare-metal
Host OS: ubuntu 20.04
CNI and version: weave 2.8.1