Pods terminating on downed node

So last night something happened in my ESXi lab environment and one of my node VMs shut off completely. Not gracefully.

When I did a kubectl get pods I saw that the pods that had been running on that node showed Terminating, and new pods had spun up on the other node. However, since the old pods were still “Terminating”, it appears they were still being pulled into the svc. So the svc was not fully functional because it was trying to route traffic to pods that were terminating.

The system restored stability when the node booted back up. The pods completed their termination and the svcs came back online.

Is there something I can configure that would force delete terminating pods after X amount of time, or if the node is gone?

Thanks!

Try this, it’s worked for me in the past:
kubectl delete pod NAME --grace-period=0 --force
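
If the whole node is down and several pods are stuck, a loop over everything still in Terminating on that node saves some typing. This is just a rough sketch; “worker-1” is a placeholder for your dead node’s name:

# List pods scheduled on the dead node, pick the ones stuck in Terminating,
# and force delete each one. Swap "worker-1" for your own node name.
kubectl get pods --all-namespaces -o wide --no-headers --field-selector spec.nodeName=worker-1 \
  | awk '$4 == "Terminating" {print $1, $2}' \
  | while read ns pod; do kubectl delete pod "$pod" -n "$ns" --grace-period=0 --force; done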

I was wondering if there is a way to do it automatically instead of force killing it. It seems counterproductive for terminating pods that aren’t accessible to impact the rest of the service. I figured I must have missed something.

Yeah, sadly I haven’t come across anything that does that automatically. There is an open ticket for it here if you want to track it: https://github.com/kubernetes/kubernetes/issues/51835

CrankyCoder, March 26:

So last night something happened in my ESXi lab environment and one of my node VMs shut off completely. Not gracefully.

When I did a kubectl get pods I saw that the pods that had been running on that node showed Terminating, and new pods had spun up on the other node. However, since the old pods were still “Terminating”, it appears they were still being pulled into the svc. So the svc was not fully functional because it was trying to route traffic to pods that were terminating.

That seems more surprising. The pods in the Terminating state were still shown in the endpoints? Were the pods marked as ready too?

If that is the case, it smells like something worth reporting (if not already reported; please check before opening a new issue).
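
One way to check that next time it happens (the service, node, and pod names below are just example placeholders): compare what the Service’s endpoints still list against the pods that were scheduled on the dead node, and look at the Ready condition on the stuck ones.

# Which pod IPs is the Service still routing to? "my-svc" is a placeholder.
kubectl get endpoints my-svc -o yaml

# Cross-reference with the pods that were scheduled on the failed node.
kubectl get pods -o wide --field-selector spec.nodeName=worker-1

# The Ready condition on a stuck pod tells you whether it should still be in the endpoints.
kubectl get pod some-stuck-pod -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'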

The system restored stability when the node booted back up. The pods completed their termination and the svcs came back online.

Is there something I can configure that would force delete terminating pods after X amount of time, or if the node is gone?

I’m really not sure whether this is a bug to report that might be fixable, or whether there is something you can tune.

I’d consider looking for open issues, reading the documentation thoroughly, and trying to reproduce with different Kubernetes versions (if it’s easy for you to change them).
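
If you do try to reproduce it, something along these lines lets you watch what the endpoints do while the node is hard powered off (the service name is a placeholder, and run the watches in separate terminals):

# Terminal 1: watch whether the Terminating pods' IPs stay in the Service endpoints.
kubectl get endpoints my-svc -w

# Terminal 2: watch pod status; the old pods should flip to Terminating once the node
# is marked NotReady (after the node monitor grace period, 40s by default).
kubectl get pods -o wide -w

# Then hard power off the node VM in ESXi and compare what each watch shows.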