Clients are disconnected when the pod is terminating or fails the readiness probe

Hi,

I’m managing a server on k8s that serves an HTTP API whose requests take quite a long time to complete.
The pods are deployed as a StatefulSet and use RollingUpdate as the update strategy.
Also, the Service type is LoadBalancer.
For maintenance, when I update my server, each pod should wait for all in-flight requests to be answered before exiting (i.e., a graceful shutdown).

I read the following articles:

After reading them, my understanding of the pod termination process is as follows:

  1. Change the status of the pod to Terminating and remove it from the Service endpoints.
    : Once the pod is in this state, the LoadBalancer doesn’t send new requests to it.
  2. Execute the preStop hook if one is defined.
  3. Send SIGTERM to the pod’s containers.
  4. Wait for the pod to terminate within terminationGracePeriodSeconds.
  5. If terminationGracePeriodSeconds expires, send SIGKILL to the pod.
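For reference, what I mean by a graceful shutdown at steps 3-5 is roughly the sketch below (Go net/http purely as an illustration, not my actual code; the 60-second budget is only an example and has to stay below terminationGracePeriodSeconds):

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := &http.Server{Addr: ":8080"}

	go func() {
		// Serve requests until Shutdown is called.
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("listen: %v", err)
		}
	}()

	// Step 3 above: the kubelet sends SIGTERM to the container.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM, os.Interrupt)
	<-stop

	// Steps 4-5: stop accepting new connections and wait for in-flight
	// requests to finish before the grace period runs out.
	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("graceful shutdown incomplete: %v", err)
	}
}
```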

At step 1, I expected that the LoadBalancer would stop sending new requests to this pod, but would NOT disconnect the connections that were established before this step.
However, in my environment it closes all the client connections, and clients get a “connection reset by peer” error.
On the server side, the server isn’t aware of this; it tries to write a response to the closed connection and blocks.
Independently of the termination process, I see the same thing when I simply make the pod fail its readiness probe while it is processing client requests.
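To make the server side concrete, here is a rough, purely illustrative Go sketch of the kind of long-running handler I mean. In the sketch the handler watches the request context so it can at least notice that the connection has gone away instead of blocking on the final write, but that is a workaround rather than the behavior I expected at step 1:

```go
package main

import (
	"log"
	"net/http"
	"time"
)

// slowHandler stands in for a long-running API handler.
func slowHandler(w http.ResponseWriter, r *http.Request) {
	// Do the long work in slices so cancellation can be checked in between.
	for i := 0; i < 30; i++ {
		select {
		case <-r.Context().Done():
			// net/http cancels the request context when the underlying
			// connection is closed; there is no point writing a response.
			log.Printf("connection gone: %v", r.Context().Err())
			return
		case <-time.After(1 * time.Second):
			// one more second of "work"
		}
	}
	w.Write([]byte("done\n"))
}

func main() {
	http.HandleFunc("/slow", slowHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```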

I’m using my company’s internal k8s platform, and I raised this issue with its managers.
They said that closing client connections when a pod is removed from the Service endpoints is officially specified k8s behavior.
However, I think keeping the connections open and letting the pod handle them gracefully is more reasonable.

Could you please confirm whether this is really part of the k8s spec or not?
Several docs say that pods in the Terminating or not-ready state will not receive new connections, but it is hard to find an official doc that says whether already-established connections are closed or not.
Could you also suggest some k8s settings that I or our platform managers could try in order to solve this issue?

Thanks!

Cluster information:

Kubernetes version: v1.15.10
I’m sorry, but since I’m using my company’s internal k8s platform (as mentioned above), the detailed cluster information isn’t visible to me.

Hi Junghoon:

From the Pod Lifecycle documentation you’ve provided:

Pods that shut down slowly cannot continue to serve traffic as load balancers (like the service proxy) remove the Pod from the list of endpoints as soon as the termination grace period begins.

As the pod is removed as a valid endpoint, your client gets a connection reset by peer.

I am no developer, but regarding the 12 factor app:

Processes shut down gracefully when they receive a SIGTERM signal from the process manager. For a web process, graceful shutdown is achieved by ceasing to listen on the service port (thereby refusing any new requests), allowing any current requests to finish, and then exiting. Implicit in this model is that HTTP requests are short (no more than a few seconds), or in the case of long polling, the client should seamlessly attempt to reconnect when the connection is lost.

Your description seems to fit the “long polling” scenario described here, so maybe the application can be updated to retry the unprocessed request (against a different pod).
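As a rough sketch of that retry idea (not a drop-in solution: the URL, retry count, and backoff are placeholders, and it only makes sense if the request is idempotent and safe to replay), the client side could look something like this in Go:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// getWithRetry retries a GET a few times. A "connection reset by peer"
// surfaces as a transport error here, and the next attempt is routed by
// the Service to whichever pods are still ready.
func getWithRetry(url string, attempts int) (*http.Response, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		resp, err := http.Get(url)
		if err == nil {
			return resp, nil
		}
		lastErr = err
		time.Sleep(time.Duration(i+1) * time.Second) // simple linear backoff
	}
	return nil, fmt.Errorf("all %d attempts failed: %w", attempts, lastErr)
}

func main() {
	resp, err := getWithRetry("http://my-service.example/slow", 3)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```

A real client would also want to cap the total time spent retrying and only replay requests that are safe to repeat.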

Best regards,

Xavi

The unfortunate answer is that it was under-defined. Both behaviors exist.

With the rise of EndpointSlice, we have more metadata to work with, and sig-net is discussing what the ideal behavior should be. That said, we can’t just expect everyone to change their implementations overnight. There’s got to be some amount of “implementation-defined” freedom.

In MY opinion, connections MUST survive while an endpoint is marked as terminating but MAY be killed when an endpoint is removed. To do that cleanly, we have open KEPs to track that intermediate state.

Hi Xavi and thockin.

Thank you for answering my question.

It seems debatable whether removing a pod from the endpoints implies closing the client connections.
I tested the same server in another k8s environment, but it didn’t close the client connections when the pod was removed from the endpoints.
Therefore, I think saying the answer is undefined or under-defined is correct, as thockin said.

I’m not sure which part of the environment decides this behavior, but I hope that someday k8s offers an option to choose the behavior explicitly, or a hook like preStop that runs before the pod is removed from the endpoints.
For now, I think I have to find other ways to avoid this issue.

Thanks a lot!

I tested the same server in another k8s environment, but it didn’t close the client connections when the pod was removed from the endpoints.

Today we just don’t spec that, and so implementations do what they want. But also, today we do not distinguish “this endpoint exists but is terminating” from “this endpoint doesn’t exist”. Once we have that, I think implementations can be smarter.

I’m not sure which part of the environment decides this behavior, but I hope that someday k8s offers an option to choose the behavior explicitly

It’s a combination of the service proxy (kube-proxy, usually) and the LB implementation. I don’t want to add parameters here, but as I said, I think more metadata will allow better impl choices. Coming soon.