Cluster information:
Kubernetes version: 1.22.13
Cloud being used: AWS
CNI and version: v1.11.4-eksbuild.1
Out of ~25 internally developed deployments, I have 4 that are not restarting properly. They all live in the same namespace in EKS and are deployed by Helm via ArgoCD. These apps are also deployed to 15+ AWS accounts and the behavior is consistent in every cluster.
When attempting a kubectl rollout restart, all but the four deployments restart as you’d expect. We have default settings in play, so 25% of each deployment’s pods go down at a time and new ones spin up. The other four, however, behave like this:
1. DeploymentApp1 is running ReplicaSetApp1.rev5 with 3 pods.
2. You perform a rollout restart on DeploymentApp1.
3. DeploymentApp1 scales up ReplicaSetApp1.rev6.
4. DeploymentApp1 scales ReplicaSetApp1.rev6 right back down.
5. ReplicaSetApp1.rev5 becomes ReplicaSetApp1.rev7, still with its 3 pods running and no change to pod status or age.
6. Events and logs show a successful restart.
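For context, this is roughly how the behavior can be reproduced and observed; the namespace, deployment name, and app label below are placeholders rather than our real values:

```bash
# Trigger the restart and follow it (names are placeholders).
kubectl -n my-namespace rollout restart deployment/app1
kubectl -n my-namespace rollout status deployment/app1

# Revision history as the Deployment sees it.
kubectl -n my-namespace rollout history deployment/app1

# Watch the ReplicaSets while the restart runs; on the broken apps the new
# ReplicaSet scales up and straight back down, and the old ReplicaSet gets
# re-stamped with the next revision number while its pods never change.
kubectl -n my-namespace get rs -l app=app1 -w \
  -o 'custom-columns=NAME:.metadata.name,REVISION:.metadata.annotations.deployment\.kubernetes\.io/revision,DESIRED:.spec.replicas,READY:.status.readyReplicas'
```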
- I’ve investigated leads where too many ReplicaSets caused confusion for others in the past, but we use the default revisionHistoryLimit of 10 and none of the deployments are over the limit. Deleting all the ReplicaSets to start back at one did not influence the behavior. (See the sketches after this list.)
- Deleting the deployment and allowing ArgoCD to put it back did not change the behavior on subsequent rollout restarts. It did of course result in everything being destroyed and spinning back up.
- Deleting a single pod results in a new pod spinning up as expected.
- Having ArgoCD send the restart duplicates the unexpected behavior; otherwise all of my testing has been done natively with kubectl.
- Labels and annotations seem to match up at every level.
- We don’t set maxSurge or maxUnavailable.
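For reference, the checks behind the list above look roughly like this; again the namespace, deployment name, and label are placeholders:

```bash
# Effective revision history limit and rolling update strategy
# (maxSurge and maxUnavailable default to 25% when not set).
kubectl -n my-namespace get deployment app1 \
  -o jsonpath='{.spec.revisionHistoryLimit}{"\n"}{.spec.strategy}{"\n"}'

# ReplicaSets owned by the app, with their revision annotations.
kubectl -n my-namespace get rs -l app=app1 \
  -o 'custom-columns=NAME:.metadata.name,REVISION:.metadata.annotations.deployment\.kubernetes\.io/revision,DESIRED:.spec.replicas'

# rollout restart only stamps the pod template with a timestamp annotation;
# this shows whether kubectl.kubernetes.io/restartedAt actually lands (and
# stays) on the deployment's pod template after a restart.
kubectl -n my-namespace get deployment app1 \
  -o jsonpath='{.spec.template.metadata.annotations}{"\n"}'
```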
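And this is roughly how the restart gets sent through ArgoCD, assuming an ArgoCD application named app1 (a placeholder) and the built-in Deployment restart action:

```bash
# Run the built-in "restart" resource action against the Deployment via the
# argocd CLI. Application and resource names are placeholders.
argocd app actions run app1 restart --kind Deployment --resource-name app1
```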
I’m out of ideas as to what may be influencing the problem apps.