I’m a bit new to k8s in real-world environments. For the most part my deployments have been smooth so far. But recently I’ve been working with a service whose current builds have stability issues. Right now all pods are crashing, and deployments of new and hopefully fixed code are failing because the minimum ReplicaSet availability requirements cannot be met.
We use templated deployment manifests maintained in a code repo and deployed via CD, so I can’t simply modify the deployment manifest (well, technically I could and deploy manually, but it’s not how we do things). The issue here (I think) is that the deployment has a rolling update strategy of 25% max unavailable.
So let’s assume a scenario where the 5 current pods are all constantly crashing - they may restart but never reach healthy status - is there any possibility this rolling updates strategy could ever work? What is the “best practice” way of resolving the issue?
What is the “best practice” way of resolving the issue?
That’s just my opinion, but as I see it, the problem is that the application has not been tested properly; the priority should be to make sure the application works (and then worry about how to deploy/update it).
With maxUnavailable set to 25%, you are asking Kubernetes to update the Deployment while keeping at least 75% of the desired pods available (at most 25% may be unavailable). That cannot be satisfied because of the stability issues of the application, so the update cannot be performed. Or it may succeed by chance, if enough pods stay up long enough for the update process to complete…
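To make the maxUnavailable arithmetic concrete, here is a minimal sketch in Python. It assumes the 5 replicas from the question and a maxSurge of 25% (the Kubernetes default; the question does not say what is actually configured). The function name rolling_update_bounds is made up for illustration. Kubernetes rounds a percentage maxUnavailable down and a percentage maxSurge up.

```python
def rolling_update_bounds(replicas: int, max_unavailable_pct: int, max_surge_pct: int = 25) -> dict:
    """Compute the pod-count limits a RollingUpdate strategy enforces.

    max_surge_pct defaults to 25 here as an assumption (the Kubernetes default).
    """
    # A percentage maxUnavailable is rounded down to a whole number of pods.
    max_unavailable = (replicas * max_unavailable_pct) // 100
    # A percentage maxSurge is rounded up (ceiling division on integers).
    max_surge = -(-(replicas * max_surge_pct) // 100)
    return {
        "min_available": replicas - max_unavailable,
        "max_total": replicas + max_surge,
    }


bounds = rolling_update_bounds(replicas=5, max_unavailable_pct=25)
print(bounds)
```

So with 5 replicas, 25% works out to 1 pod that may be unavailable, meaning 4 pods are expected to be available throughout the rollout. That is the requirement a set of constantly crashing pods cannot meet.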
I would remove the maxUnavailable requirement for testing and update the “templated deployment” once the application is stable.
If that’s not an option, I would suggest using Canary Deployments.
“Scaling a Deployment” describes how to pause and resume a Deployment update, and how to tweak the update process (e.g., changing progressDeadlineSeconds to make the update fail faster)…
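As a rough sketch of what such a tweak could look like, the Python below builds a JSON merge patch that sets progressDeadlineSeconds and relaxes maxUnavailable. The helper name build_deployment_patch and the values 120 and "100%" are illustrative assumptions, not recommendations; pick values that fit your service, and keep in mind that your CD pipeline will likely overwrite a manual patch on the next deploy.

```python
import json


def build_deployment_patch(progress_deadline_seconds: int, max_unavailable: str) -> str:
    """Return a JSON merge patch for a Deployment spec (hypothetical helper)."""
    patch = {
        "spec": {
            # How long the controller waits for progress before marking the rollout as failed.
            "progressDeadlineSeconds": progress_deadline_seconds,
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxUnavailable": max_unavailable},
            },
        }
    }
    return json.dumps(patch)


# Illustrative values only: fail after 2 minutes, allow all pods to be unavailable.
patch_json = build_deployment_patch(120, "100%")
print(patch_json)
```

The printed JSON could then be applied for testing with something like kubectl patch deployment <name> --type merge -p '<json>', where <name> is your Deployment. Pausing and resuming a rollout is done separately with kubectl rollout pause deployment/<name> and kubectl rollout resume deployment/<name>.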