RollingUpdate order in deployments and statefulsets

Hi everyone!

I have an application installed as a Deployment in k8s. This application is not just serving incoming requests; at startup it also establishes outbound connections to an external service on the Internet. The external service limits the number of connections to N, and an attempt to open connection N+1 leads to errors on the client side (rebalancing and re-establishing connections, something like that).

This usually happens during an update, so I added a RollingUpdate strategy to the Deployment to prevent scheduling of excess replicas:

  strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate

However, it didn't solve my problem because of the specific Deployment rollout sequence:

  • set the old pod to Terminating (graceful shutdown takes some time in my case)
  • at the same moment, schedule the new pod
  • keep the new pod actually running, but not Ready, while the old one is still terminating
  • set the new pod to Ready only after the old pod has been deleted

Since, for Kubernetes, pod readiness only means the ability to serve inbound connections, it does not prevent the pod from actually running and establishing outbound connections.

At the same time, a StatefulSet would wait for the old pod to be completely deleted before spawning a new one.

It's not like my application is actually stateful, but it seems the only way to prevent over-spawning is to use a StatefulSet.
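
For reference, a minimal sketch of what I mean (names and image are placeholders, not my real manifest):

  apiVersion: apps/v1
  kind: StatefulSet
  metadata:
    name: outbound-worker            # placeholder name
  spec:
    serviceName: outbound-worker     # StatefulSets expect an accompanying headless Service
    replicas: 3
    selector:
      matchLabels:
        app: outbound-worker
    updateStrategy:
      type: RollingUpdate            # pods are replaced one at a time; the old pod is fully
                                     # deleted before its successor is created
    template:
      metadata:
        labels:
          app: outbound-worker
      spec:
        containers:
          - name: app
            image: registry.example.com/outbound-worker:1.2.3   # placeholder image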

So I have some questions:

  • why does k8s assume that an application is not doing anything if it's not reachable from outside? Web servers are not the only type of application
  • is there a way to force a Deployment rollout to wait for the actual deletion of the old pod before scheduling the new one, or is this a fundamental difference between Deployment and StatefulSet?
  • should I consider a limited pool of outbound connections to be state?

Cluster information:

Kubernetes version: v1.30.9-gke.1127000
Cloud being used: Google

A Deployment really manages pods in terms of their readiness to serve. As soon as a pod is terminating, it is considered “done” from the Deployment’s point of view and a new one is brought up.

AFAIK there is no way to ask for complete removal of the old pod before the replacement is created. The request does not sound unreasonable, but such an option doesn’t exist.

If you have such a finite pool, you need to handle retries internally, which you probably do anyway. Realistically, your limited outbound connections effectively ARE state: just like a volume that cannot be double-mounted, you have to “hand off” these connection “slots”. You can do that either through some external mechanism (e.g. N leases) or simply by trying and retrying.
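
For example, the hand-off could be modelled with one Lease object per slot in the coordination.k8s.io API: a pod only opens its outbound connection once it has written its own name into holderIdentity, and renews the lease while it holds the slot. The object below is just an illustration; the acquire/renew logic has to live in your application or a sidecar:

  apiVersion: coordination.k8s.io/v1
  kind: Lease
  metadata:
    name: outbound-slot-0                # one Lease per allowed connection: slot-0 .. slot-(N-1)
    namespace: my-app                    # placeholder namespace
  spec:
    holderIdentity: my-app-7d9f4-abcde   # pod name of the current slot owner (placeholder)
    leaseDurationSeconds: 30             # slot counts as free if renewTime is not refreshed in time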

If you don’t want the rolling update to proceed until the connection is established, you might use a startupProbe that only succeeds once the connection is up, but be careful: a probe that depends on an external service can lead to flapping of your service and an overall worse experience.
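
A rough sketch of such a probe, assuming your application exposes some local endpoint (the /outbound-ready path here is hypothetical) that only returns success once the upstream connection is established:

  containers:
    - name: app
      image: registry.example.com/outbound-worker:1.2.3   # placeholder image
      startupProbe:
        httpGet:
          path: /outbound-ready    # hypothetical endpoint reporting the outbound connection state
          port: 8080
        periodSeconds: 5
        failureThreshold: 30       # allow up to ~150s for the connection to come up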
