Hi everyone,
I've got a process to update clusters and nodes: I do a cordon and then a drain. Since some of my deployments use 1 replica, I'd like to achieve zero-downtime upgrades for these deployments.
I know I can use 2 replicas with a PDB; I tested that and it works, but I'd like to try a method that gives zero downtime with 1 replica. I saw it was possible to use maxSurge for that. Can you explain how to do it? Thanks!
Hi @sashalarsoo
You're absolutely right: achieving zero downtime with only 1 replica is inherently challenging, because kubectl drain will evict the only running pod, leading to temporary unavailability.
That said, here are a few insights and approaches:
- Understanding the limitation with 1 replica
With a single replica, evicting or deleting the only pod leaves a window in which no pod is running, so downtime is unavoidable unless the new pod is created and Ready before the old one is terminated.
- Why maxSurge can help (only in Deployments)
When using a Deployment with the RollingUpdate strategy, you can configure:
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0
This allows creating a new pod before deleting the old one, even when you only have replicas: 1.
However, this only applies to rollouts of the Deployment itself. kubectl drain evicts the pod directly and does not go through the rolling update strategy, so if the node is drained before a replacement pod is up, you lose availability anyway. This setting therefore works best for application updates, not for node draining.
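To make the strategy above concrete, here is a minimal sketch that applies it to an existing Deployment with kubectl patch. The function name set_surge_strategy and its arguments are made up for illustration; adapt the Deployment name and namespace to your cluster.

```shell
#!/usr/bin/env bash
# Sketch only: set a surge-first rolling update strategy on an existing Deployment.
# set_surge_strategy and its arguments are illustrative names, not from the thread.
set_surge_strategy() {
  local deployment="$1"
  local namespace="${2:-default}"
  # maxSurge=1 allows one extra pod during a rollout;
  # maxUnavailable=0 keeps the old pod until the new one is Ready.
  local patch='{"spec":{"strategy":{"type":"RollingUpdate","rollingUpdate":{"maxSurge":1,"maxUnavailable":0}}}}'
  kubectl patch deployment "$deployment" -n "$namespace" --type merge -p "$patch"
}
```

You would call it as set_surge_strategy my-app my-namespace, where both names are placeholders. Keep in mind that "Ready" is only meaningful if the pod defines a readiness probe.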
- How to leverage maxSurge effectively
- Ensure you use a Deployment, not a StatefulSet or DaemonSet.
- Set maxUnavailable: 0 and maxSurge: 1.
- Do a rolling update of the Deployment, not a node drain, to trigger the zero-downtime rollout.
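One possible way to combine these steps with node maintenance is to cordon the node first, trigger a rolling restart so the surge pod is scheduled elsewhere, and only drain afterwards. The sketch below assumes the Deployment already uses maxSurge: 1 and maxUnavailable: 0, has a readiness probe, and that another node has capacity; relocate_before_drain and its arguments are illustrative names.

```shell
#!/usr/bin/env bash
# Sketch only: move a single-replica Deployment off a node before draining it.
# Assumes maxSurge: 1 / maxUnavailable: 0 and a readiness probe on the pod.
relocate_before_drain() {
  local node="$1"
  local deployment="$2"
  local namespace="${3:-default}"
  # Mark the node unschedulable so the surge pod lands on another node.
  kubectl cordon "$node" || return 1
  # Trigger a rolling update of the Deployment without changing its spec.
  kubectl rollout restart "deployment/$deployment" -n "$namespace" || return 1
  # Wait until the new pod is Ready and the old one has been removed.
  kubectl rollout status "deployment/$deployment" -n "$namespace" --timeout=300s || return 1
  # The node no longer hosts this Deployment's pod, so draining does not interrupt it.
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
}
```

For example, relocate_before_drain node-1 my-app my-namespace (all placeholder names). Note that the restart recreates the pod even if it was not running on that node, and the 300s timeout is an arbitrary choice.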
- Alternative approach for node maintenance
If you really need to drain nodes, but still avoid downtime with 1 replica, consider:
- Temporarily scale to 2 replicas, do the node drain, and scale back to 1 after.
- Or upgrade the nodes in place, one at a time, without cordon/drain (only possible in some environments).
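The temporary scale-up option could be scripted roughly as follows. This is a sketch under assumptions: the node is cordoned first so the extra replica is scheduled on a different node, and drain_with_temporary_replica with its arguments is a hypothetical helper, not an existing tool.

```shell
#!/usr/bin/env bash
# Sketch only: temporarily run 2 replicas while a node is drained, then return to 1.
drain_with_temporary_replica() {
  local node="$1"
  local deployment="$2"
  local namespace="${3:-default}"
  # Cordon first so the second replica cannot be scheduled onto the node being drained.
  kubectl cordon "$node" || return 1
  kubectl scale "deployment/$deployment" -n "$namespace" --replicas=2 || return 1
  # Wait until both replicas are available before evicting anything.
  kubectl rollout status "deployment/$deployment" -n "$namespace" --timeout=300s || return 1
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data || return 1
  # Wait for the evicted pod's replacement to become Ready on another node.
  kubectl rollout status "deployment/$deployment" -n "$namespace" --timeout=300s || return 1
  kubectl scale "deployment/$deployment" -n "$namespace" --replicas=1
}
```

For example, drain_with_temporary_replica node-1 my-app my-namespace (placeholder names). A PDB with minAvailable: 1, as in the 2-replica setup mentioned in the question, would add a safety net while the second replica exists.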
Let me know what controller type you’re using (Deployment, StatefulSet, etc.) and whether this is during app update or node upgrade, and I can give more tailored advice.
Hope this helps!