How do I block pod termination in a controller?

Hi, I’m developing a controller that updates my load balancer’s upstream endpoints, but I’ve found no way to ensure the pod IP has been removed from the load balancer’s upstream before the pod is terminated during a normal rolling update. This can lead to timeouts.

I wondered how the endpoint controller handles this, so I took a glimpse at the controller manager’s source code. If I’m reading it right, the pod is not removed from the endpoint slice until it has been terminated (since the controller uses the podInformer’s delete hook). Isn’t that too late? Yet from experience, the endpoint controller seems to handle this rolling-update scenario well. Am I missing something?
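To be clear about what I mean by the delete hook, roughly this (a simplified client-go sketch, not the actual controller-manager code; `updateLBUpstreams` is just a placeholder for reconfiguring the LB). The `DeleteFunc` only fires once the Pod object is already gone, which is why it feels too late:

```go
// Simplified sketch of reacting to pod deletions via a client-go informer.
// updateLBUpstreams is a hypothetical placeholder for the LB reconfiguration.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func updateLBUpstreams(removedIP string) {
	// ... reconfigure the load balancer here (placeholder) ...
	fmt.Println("removing upstream", removedIP)
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	factory := informers.NewSharedInformerFactory(client, 0)
	podInformer := factory.Core().V1().Pods().Informer()

	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		DeleteFunc: func(obj interface{}) {
			// Fires only after the Pod object is gone from the API server,
			// i.e. after the containers have already been terminated.
			if pod, ok := obj.(*corev1.Pod); ok {
				updateLBUpstreams(pod.Status.PodIP)
			}
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {}
}
```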

The general answer is to set terminationGracePeriodSeconds on your pod so that upstream LBs have time to “catch up”. It’s not always practical to confirm that every deleted pod has been removed from the LBs, but you can choose to do so with a Pod finalizer. That will race with the kubelet sending SIGTERM, so you still need to manage that.
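A rough sketch of the finalizer side of that (client-go; the finalizer name `example.com/lb-deregistration` and the `drained` check are hypothetical; the real check would confirm the pod IP is gone from the LB upstream). Note that a finalizer only blocks the Pod object’s removal from the API; it does not stop the kubelet from sending SIGTERM, which is the race mentioned above:

```go
// Rough sketch of the Pod-finalizer approach. Removing the finalizer is what
// lets the API server finish deleting the Pod object.
package lbcontroller

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const lbFinalizer = "example.com/lb-deregistration" // hypothetical finalizer name

// removeFinalizerWhenDrained strips the finalizer once the pod's IP has been
// confirmed removed from the load balancer. Until then, the Pod object stays
// visible in the API even though the kubelet may already have sent SIGTERM.
func removeFinalizerWhenDrained(ctx context.Context, client kubernetes.Interface, pod *corev1.Pod, drained func(ip string) bool) error {
	if pod.DeletionTimestamp == nil || !drained(pod.Status.PodIP) {
		return nil // not terminating yet, or the LB still lists this IP
	}
	updated := pod.DeepCopy()
	var keep []string
	for _, f := range updated.Finalizers {
		if f != lbFinalizer {
			keep = append(keep, f)
		}
	}
	updated.Finalizers = keep
	_, err := client.CoreV1().Pods(updated.Namespace).Update(ctx, updated, metav1.UpdateOptions{})
	return err
}
```

The finalizer has to be added up front (for example, when the controller first registers the pod with the LB), and a real controller would retry on update conflicts; this is only the shape of the idea.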

We keep it in the slice while it is terminating to handle some edge cases, such as when the whole deployment is terminating or when the only pod on a node is terminating but can still serve, but it’s clearly marked as not-ready.

LBs should route away from not-ready endpoints, except as a last resort when there is no other choice.
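To make that concrete, here is a minimal sketch of how an LB controller might consume those EndpointSlice conditions: take Ready endpoints, and only fall back to still-serving (terminating) ones when nothing is ready. `pickUpstreams` is just an illustrative name:

```go
// Minimal sketch of selecting LB upstreams from EndpointSlice conditions:
// prefer Ready endpoints; fall back to Serving (terminating) ones only when
// no ready endpoint remains.
package lbcontroller

import discoveryv1 "k8s.io/api/discovery/v1"

func pickUpstreams(slices []*discoveryv1.EndpointSlice) []string {
	var ready, fallback []string
	for _, slice := range slices {
		for _, ep := range slice.Endpoints {
			for _, addr := range ep.Addresses {
				switch {
				case ep.Conditions.Ready != nil && *ep.Conditions.Ready:
					ready = append(ready, addr)
				case ep.Conditions.Serving != nil && *ep.Conditions.Serving:
					// Terminating but still able to serve; last resort only.
					fallback = append(fallback, addr)
				}
			}
		}
	}
	if len(ready) > 0 {
		return ready
	}
	return fallback
}
```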

Also https://twitter.com/thockin/status/1560398974929973248

Got it, thanks.