Root cause analysis: pods stuck in ContainerCreating on a worker node after a restart or power cycle.
While the master was scheduling pods across the worker nodes, one worker node went down due to a power cycle. Once that worker node came back up, the master was able to schedule the remaining pods to it. However, all the pods scheduled to that node remained stuck in the ContainerCreating state for a long time, leaving the recovered node effectively unusable.
The networking pod (weave) skipped the network setup for all of these containers on the restarted node. With no IP assigned to these pods, they stay in the ContainerCreating state.
Jul 26 15:45:06 k8sworker3 kubelet: W0726 15:45:06.622277 1832 docker_sandbox.go:384] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "job-5d242b93c6ba2500011bfe3b-1564172924508-h9vw5_": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "bf4a489a1d46705163fdc228486398d8d33d2c6e41dc354f32de5f5d6986abcc"
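To confirm which pods are affected before cleaning up, one approach (a sketch, not from the original post) is to parse the output of `kubectl get pods -o json` and list pods on the restarted node that have no pod IP and containers stuck in a waiting state. The field paths follow the standard Kubernetes Pod API object; the helper name and sample pod names are hypothetical.

```python
def stuck_pods(pod_list, node_name):
    """Return names of pods on `node_name` that appear stuck in
    ContainerCreating: Pending/waiting containers with no podIP."""
    stuck = []
    for pod in pod_list["items"]:
        if pod["spec"].get("nodeName") != node_name:
            continue
        status = pod.get("status", {})
        # Pods stuck at sandbox/CNI setup never receive a podIP.
        if status.get("podIP"):
            continue
        waiting = [
            cs for cs in status.get("containerStatuses", [])
            if "waiting" in cs.get("state", {})
        ]
        if waiting or status.get("phase") == "Pending":
            stuck.append(pod["metadata"]["name"])
    return stuck

# Synthetic example in the shape of `kubectl get pods -o json` output:
example = {
    "items": [
        {
            "metadata": {"name": "job-abc-h9vw5"},
            "spec": {"nodeName": "k8sworker3"},
            "status": {
                "phase": "Pending",
                "containerStatuses": [
                    {"state": {"waiting": {"reason": "ContainerCreating"}}}
                ],
            },
        },
        {
            "metadata": {"name": "job-def-ok"},
            "spec": {"nodeName": "k8sworker3"},
            "status": {"phase": "Running", "podIP": "10.32.0.5"},
        },
    ]
}
print(stuck_pods(example, "k8sworker3"))  # ['job-abc-h9vw5']
```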
Recovery: delete all stuck pods on worker3 to release the node's pod capacity (the default limit of 110 pods per node). Once capacity is freed, newly created pods receive an IP address and proceed to completion.
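The recovery step can be sketched as a small helper that assembles the `kubectl delete pod` command for the stuck pods found on the affected node. The function name, namespace, and pod names are illustrative assumptions, not from the original post.

```python
def delete_command(pod_names, namespace="default"):
    """Build the kubectl command that deletes the stuck pods so the
    node's pod capacity (110 by default) is released."""
    if not pod_names:
        return None
    return "kubectl delete pod -n {} {}".format(namespace, " ".join(pod_names))

cmd = delete_command(["job-abc-h9vw5", "job-abc-x2k8q"])
print(cmd)  # kubectl delete pod -n default job-abc-h9vw5 job-abc-x2k8q
```

Alternatively, `kubectl delete pods --field-selector spec.nodeName=k8sworker3` should target every pod on the node in one command (assuming the cluster version supports the `spec.nodeName` field selector, which 1.15 does for listing pods).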
Kubernetes version: 1.15
Cloud being used: (put bare-metal if not on a public cloud)
Installation method: Ansible script
Host OS: CentOS 7
CNI and version: Weave (version not specified)
CRI and version: