Kubernetes version: 1.20.4
Cloud being used: bare-metal
Host OS: Ubuntu 20.04.2 LTS
CNI and version: flannel v0.13.0
CRI and version: containerd v1.4.4
We have a recently installed cluster with 20 nodes (3 masters and 17 workers) that we have been using for a couple of months without any major incidents.
However, this past weekend some of our deployments (13 in total) unexpectedly became unavailable. We logged into one of the master nodes and found that they had been scaled down to 0 for no apparent reason.
On further investigation of the controller manager logs, we saw multiple entries like the ones below, essentially one "scaled down" event for each service that went down.
I0411 10:36:36.761404 1 event.go:291] "Event occurred" object="prd/service1" kind="Deployment" apiVersion="apps/v1" type="Normal" reason="ScalingReplicaSet" message="Scaled down replica set service1-57db45b756 to 0"
I0411 10:36:36.785589 1 event.go:291] "Event occurred" object="prd/service1-57db45b756" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulDelete" message="Deleted pod: service1-57db45b756-pshps"
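To see exactly which deployments were hit and whether they all went down at the same moment, the entries above can be filtered with a small script. This is only a minimal sketch: it assumes every line follows the same format as the two entries shown, and the function name `scale_down_events` is made up for illustration. It accepts both straight and typographic quotes, since pasted logs sometimes carry the latter.

```python
import re

# key="value" pairs; accepts straight or curly quotes around the value.
PAIR = re.compile(r'(\w+)=["“”]([^"“”]*)["“”]')
# Line prefix as seen in the entries above: severity letter + MMDD, then a timestamp.
HEADER = re.compile(r'^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d+)')


def scale_down_events(lines):
    """Yield (date, time, object, message) for Deployment events that scaled to 0.

    Hypothetical helper: it only understands lines shaped like the
    controller manager entries quoted in this post.
    """
    for line in lines:
        header = HEADER.match(line)
        if header is None:
            continue  # not a log line in the expected format
        fields = dict(PAIR.findall(line))
        if (
            fields.get("kind") == "Deployment"
            and fields.get("reason") == "ScalingReplicaSet"
            and fields.get("message", "").endswith(" to 0")
        ):
            yield header.group(1), header.group(2), fields["object"], fields["message"]


# Example with the first entry from this post.
sample = (
    'I0411 10:36:36.761404 1 event.go:291] "Event occurred" '
    'object="prd/service1" kind="Deployment" apiVersion="apps/v1" '
    'type="Normal" reason="ScalingReplicaSet" '
    'message="Scaled down replica set service1-57db45b756 to 0"'
)
for date, time, obj, msg in scale_down_events([sample]):
    print(date, time, obj, msg)
```

Running this over the full controller manager log should show whether all 13 scale-downs share one timestamp, which would point to a single actor rather than 13 independent causes.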
My question is: is there anything in Kubernetes 1.20 that would scale down a bunch of deployments automatically, without user intervention (e.g. someone running kubectl and explicitly scaling them down)?
Is there anything else I can try to check in the cluster to determine what caused those deployments to scale down without user intervention?