Pods terminating on downed node

So last night something happened in my ESXi lab environment and one of my node VMs shut off completely. Not gracefully.

When I did a kubectl get pods I saw that the pods that had been running on that node showed Terminating, and new pods had spun up on the other node. However, since the old pods were still “Terminating”, it appears they were still being pulled into the svc. So the svc was not fully functional because it was trying to route traffic to pods that were terminating.

The system restored stability when the node booted back up. The pods completed their termination and the svcs came back online.

Is there something I can configure that would force delete terminating pods after X amount of time, or if the node is gone?

Thanks!

Try this, it’s worked for me in the past:
kubectl delete pod NAME --grace-period=0 --force
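
If the whole node is down and several pods are stuck, a loop over everything still in Terminating on that node saves some typing. This is just a rough sketch; “worker-1” is a placeholder for your dead node’s name:

# List pods scheduled on the dead node, pick the ones stuck in Terminating,
# and force delete each one. Swap "worker-1" for your own node name.
kubectl get pods --all-namespaces -o wide --no-headers --field-selector spec.nodeName=worker-1 \
  | awk '$4 == "Terminating" {print $1, $2}' \
  | while read ns pod; do kubectl delete pod "$pod" -n "$ns" --grace-period=0 --force; done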

I was wondering if there is a way to do it automatically instead of force killing it. It seems counterproductive for terminating pods that aren’t accessible to impact the rest of the service. I figured I must have missed something.

Yeah, sadly I haven’t come across anything that does that automatically. There is an open ticket for it here if you want to track it: https://github.com/kubernetes/kubernetes/issues/51835

CrankyCoder, March 26:

So last night something happened in my ESXi lab environment and one of my node VMs shut off completely. Not gracefully.

When I did a kubectl get pods I saw that the pods that had been running on that node showed Terminating, and new pods had spun up on the other node. However, since the old pods were still “Terminating”, it appears they were still being pulled into the svc. So the svc was not fully functional because it was trying to route traffic to pods that were terminating.

That seems more surprising. The pods in the Terminating state were still shown in the endpoints? Were the pods marked as ready too?

If that is the case, it smells like something worth reporting (if not already reported; please check before opening a new issue).
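
One way to check that next time it happens (the service, node, and pod names below are just example placeholders): compare what the Service’s endpoints still list against the pods that were scheduled on the dead node, and look at the Ready condition on the stuck ones.

# Which pod IPs is the Service still routing to? "my-svc" is a placeholder.
kubectl get endpoints my-svc -o yaml

# Cross-reference with the pods that were scheduled on the failed node.
kubectl get pods -o wide --field-selector spec.nodeName=worker-1

# The Ready condition on a stuck pod tells you whether it should still be in the endpoints.
kubectl get pod some-stuck-pod -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'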

The system restored stability when the node booted back up. The pods completed their termination and the svcs came back online.

Is there something I can configure that would force delete terminating pods after X amount of time, or if the node is gone?

I’m really not sure whether this is a bug to report that might be fixable, or whether there is something you can tune.

I’d consider looking for open issues, reading the documentation thoroughly, and trying to reproduce with different Kubernetes versions (if it’s easy for you to change them).
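
If you do try to reproduce it, something along these lines lets you watch what the endpoints do while the node is hard powered off (the service name is a placeholder, and run the watches in separate terminals):

# Terminal 1: watch whether the Terminating pods' IPs stay in the Service endpoints.
kubectl get endpoints my-svc -w

# Terminal 2: watch pod status; the old pods should flip to Terminating once the node
# is marked NotReady (after the node monitor grace period, 40s by default).
kubectl get pods -o wide -w

# Then hard power off the node VM in ESXi and compare what each watch shows.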