Pod End of Life: Still serving after SIGTERM?

AlexLo_hopper · December 13, 2022, 5:21pm

1. You use the kubectl tool to manually delete a specific Pod, with the default grace period (30 seconds).

2. The Pod in the API server is updated with the time beyond which the Pod is considered “dead” along with the grace period […]
2.2. The kubelet triggers the container runtime to send a TERM signal to process 1 inside each container.
[…]

3. At the same time as the kubelet is starting graceful shutdown, the control plane removes that shutting-down Pod from EndpointSlice (and Endpoints) objects where these represent a Service with a configured selector. ReplicaSets and other workload resources no longer treat the shutting-down Pod as a valid, in-service replica. Pods that shut down slowly cannot continue to serve traffic as load balancers (like the service proxy) remove the Pod from the list of endpoints as soon as the termination grace period begins.

We see traffic hitting our workloads for a short period of time after the SIGTERM is sent. My question is: why aren't steps 2 and 3 reversed? In other words, I would like the Pod to stop serving strictly before termination starts.

Thanks, Alex
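One way to tolerate the overlap between steps 2 and 3 on the application side (a common mitigation, not something proposed in this thread) is for the process to keep serving for a fixed drain delay after SIGTERM and only then exit. This is a minimal sketch under stated assumptions: a single-threaded serving loop, a hypothetical `handle_one_request` callable, and a drain value you choose yourself. That value must stay below the Pod's `terminationGracePeriodSeconds` (30 seconds by default), because the kubelet sends SIGKILL once the grace period runs out.

```python
import signal
import time
from typing import Callable, Optional


class DrainState:
    """Decides when a process may exit after receiving SIGTERM."""

    def __init__(self, drain_seconds: float) -> None:
        # Assumed value: long enough for the cluster to stop routing to the
        # Pod, but shorter than terminationGracePeriodSeconds.
        self.drain_seconds = drain_seconds
        self.sigterm_at: Optional[float] = None

    def on_sigterm(self, now: float) -> None:
        # Remember only the first SIGTERM so repeats cannot extend the drain.
        if self.sigterm_at is None:
            self.sigterm_at = now

    def should_exit(self, now: float) -> bool:
        # Keep serving until drain_seconds have passed since the first SIGTERM.
        if self.sigterm_at is None:
            return False
        return now - self.sigterm_at >= self.drain_seconds


def serve_until_drained(state: DrainState, handle_one_request: Callable[[], None]) -> None:
    # The signal handler only records the time; the loop decides when to stop.
    signal.signal(signal.SIGTERM, lambda signum, frame: state.on_sigterm(time.monotonic()))
    while not state.should_exit(time.monotonic()):
        # Hypothetical callable: serves one request or returns after a short
        # poll timeout, so the drain condition is re-checked regularly.
        handle_one_request()
```

This narrows the window rather than guaranteeing the strict ordering asked for: if some node takes longer than the drain delay to update, traffic can still arrive after the process exits.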

thockin · December 13, 2022, 5:42pm

Please read through this and see if it answers your question.

AlexLo_hopper · December 13, 2022, 5:49pm

Thank you @thockin !

I know it's not super satisfying. We want a deterministic answer, but I hope you can now reason through why this is hard. You can explore more by emulating the "better" process.

It does “seem” like the k8s controllers could enforce the ordering a bit more strictly, but I will take your word for it.

Thanks

AlexLo_hopper · December 13, 2022, 5:55pm

Thinking on it further (for anyone else looking at this): I believe the EndpointSlice update has to propagate to all the other nodes in the cluster before those nodes actually stop sending traffic to the Pod.
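The propagation effect described above can be put into numbers with a small timing model. It is a sketch under simplifying assumptions: all times are seconds measured from the moment the Pod is marked terminating, SIGTERM arrives at `sigterm_at`, and each node stops routing new connections to the Pod at the moment its own dataplane applies the EndpointSlice change. The function name and the example timings are hypothetical.

```python
def late_traffic_window(sigterm_at: float, node_update_at: dict) -> dict:
    """Seconds each node may keep routing to the Pod after SIGTERM.

    A node that applies the EndpointSlice update later than SIGTERM leaves
    a window in which new connections can still reach the shutting-down Pod;
    a node that updates earlier contributes no window (0.0).
    """
    return {node: max(0.0, t - sigterm_at) for node, t in node_update_at.items()}


# Hypothetical timings: SIGTERM delivered 0.5 s after the delete, and three
# nodes applying the EndpointSlice change at different times.
windows = late_traffic_window(0.5, {"node-a": 0.2, "node-b": 1.5, "node-c": 3.0})
print(windows)                # {'node-a': 0.0, 'node-b': 1.0, 'node-c': 2.5}
print(max(windows.values()))  # 2.5
```

The worst case is set by the slowest node (or external load balancer), and none of those timings are known to the kubelet when it sends SIGTERM.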

thockin · December 13, 2022, 6:49pm

The problem is that there is an arbitrary number of things that can each take an arbitrary amount of time to program, and almost all of them are OUTSIDE the core of Kubernetes.

The deterministic answer is something like:

1. every interested agent (subsystem, controller, etc.) registers its interest in individual endpoints
2. stopping a pod first changes something about the endpoint
3. wait for every interested agent to ACK, probably with a timeout
   - note: this means EVERY NODE and LB controller has to ACK that its own dataplane was updated
4. once that is complete, start pod shutdown
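The protocol above can be sketched as an ACK barrier. This is a minimal, single-process illustration under stated assumptions, not an existing Kubernetes API: `EndpointAckBarrier`, `stop_pod`, and the callbacks are hypothetical names, agents are plain strings, and an "ACK" is a method call rather than a round trip through the API server.

```python
import threading
from typing import Callable, Set


class EndpointAckBarrier:
    """Hypothetical registry: the Pod may start shutting down only after every
    registered agent has ACKed that its dataplane stopped using the endpoint."""

    def __init__(self) -> None:
        self._pending: Set[str] = set()
        self._cond = threading.Condition()

    def register(self, agent: str) -> None:
        # Step 1: an agent (node proxy, LB controller, ...) declares interest.
        with self._cond:
            self._pending.add(agent)

    def ack(self, agent: str) -> None:
        # Called by an agent once its own dataplane was updated.
        with self._cond:
            self._pending.discard(agent)
            self._cond.notify_all()

    def wait(self, timeout: float) -> Set[str]:
        # Block until every agent ACKs or the timeout expires; return the
        # agents that never answered (an empty set means all ACKed).
        with self._cond:
            self._cond.wait_for(lambda: not self._pending, timeout=timeout)
            return set(self._pending)


def stop_pod(barrier: EndpointAckBarrier,
             mark_endpoint_terminating: Callable[[], None],
             start_shutdown: Callable[[], None],
             timeout: float) -> Set[str]:
    # Step 2: change the endpoint first so agents begin reprogramming.
    mark_endpoint_terminating()
    # Step 3: wait for every interested agent to ACK, bounded by a timeout.
    missing = barrier.wait(timeout)
    # Step 4: only now begin the normal graceful shutdown (SIGTERM etc.).
    start_shutdown()
    return missing
```

On timeout the sketch proceeds with shutdown anyway and returns the agents that never answered, mirroring the "probably with a timeout" step; a real design would also have to cope with agents that register or disappear while a Pod is stopping.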

It’s not impossible; it’s just a fairly low-ROI effort.
