We have the Cluster Autoscaler, overprovisioning, and the descheduler set up in our cluster.
Overprovisioning is set with a replica count of 44 (11 pods give a buffer of one EC2 instance).
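For context, the overprovisioning deployment follows roughly the standard pause-pod pattern from the Cluster Autoscaler FAQ. This is only a sketch: the PriorityClass name, pause image tag, and resource requests below are illustrative; only the replica count of 44 comes from our setup.

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1
globalDefault: false
description: "Low-priority class for overprovisioning pause pods, preempted when real workloads need room."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
  namespace: kube-system
spec:
  replicas: 44          # 11 pods reserve roughly one EC2 instance worth of capacity
  selector:
    matchLabels:
      app: overprovisioning
  template:
    metadata:
      labels:
        app: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9
        resources:
          requests:
            cpu: "200m"
            memory: "200Mi"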
We are seeing issues where nodes are scaled down after 10 minutes and new nodes are added about 20 minutes later, and this sometimes keeps repeating. When the Cluster Autoscaler finds that a node has been unneeded for 10 minutes, it scales it down, but within the next 20 minutes a new node has to be created.
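For reference, the scale-down timing is governed by these Cluster Autoscaler flags. This is just an excerpt of the container command in the cluster-autoscaler Deployment, and the values shown are the upstream defaults, not necessarily what we are running:

containers:
- name: aws-cluster-autoscaler
  command:
  - ./cluster-autoscaler
  # how long a node must be unneeded before it becomes eligible for scale-down
  - --scale-down-unneeded-time=10m
  # how long after a scale-up before scale-down evaluation resumes
  - --scale-down-delay-after-add=10m
  # node utilization (requests/allocatable) below which a node is considered for removal
  - --scale-down-utilization-threshold=0.5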
Not sure how to tune this. Is this because of rebalancing triggered by the descheduler CronJob, which runs every 30 minutes? Or is it because of the overprovisioning replica count of 44? Or is there anything else we should consider?
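On the descheduler side, a minimal sketch of the kind of policy the CronJob might be running is below; the thresholds are assumptions, not our actual values. LowNodeUtilization is the strategy that evicts pods off underutilized nodes to rebalance, which could explain churn on a 30-minute cycle.

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        # nodes below these are considered underutilized and drained
        thresholds:
          "cpu": 20
          "memory": 20
          "pods": 20
        # nodes below these (but above thresholds) can receive the evicted pods
        targetThresholds:
          "cpu": 50
          "memory": 50
          "pods": 50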
k logs -n kube-system cluster-autoscaler-aws-cluster-autoscaler-774bbb4cf-9mq4z aws-cluster-autoscaler | ag "(scale-up plan)|(removing empty node)"
I0505 22:42:14.659857 1 scale_down.go:938] Scale-down: removing empty node ip-10-120-36-56.ec2.internal
I0505 22:42:14.660428 1 event.go:258] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"28c7ffa0-de2e-4a5f-856f-69684065056b", APIVersion:"v1", ResourceVersion:"46611687", FieldPath:""}): type: 'Normal' reason: 'ScaleDownEmpty' Scale-down: removing empty node ip-10-120-36-56.ec2.internal
I0505 23:00:18.881836 1 scale_up.go:533] Final scale-up plan: [{nodes.us-east-1.apicentral.axwaydev.net 26->27 (max: 30)}]
I0505 23:13:01.741943 1 scale_down.go:938] Scale-down: removing empty node ip-10-120-42-5.ec2.internal
I0505 23:13:01.742378 1 event.go:258] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"28c7ffa0-de2e-4a5f-856f-69684065056b", APIVersion:"v1", ResourceVersion:"46624852", FieldPath:""}): type: 'Normal' reason: 'ScaleDownEmpty' Scale-down: removing empty node ip-10-120-42-5.ec2.internal
I0505 23:30:05.422452 1 scale_up.go:533] Final scale-up plan: [{nodes.us-east-1.apicentral.axwaydev.net 26->27 (max: 30)}]
I0505 23:42:08.867123 1 scale_down.go:938] Scale-down: removing empty node ip-10-120-36-45.ec2.internal
I0505 23:42:08.868311 1 event.go:258] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"28c7ffa0-de2e-4a5f-856f-69684065056b", APIVersion:"v1", ResourceVersion:"46637400", FieldPath:""}): type: 'Normal' reason: 'ScaleDownEmpty' Scale-down: removing empty node ip-10-120-36-45.ec2.internal
I0506 00:00:13.286783 1 scale_up.go:533] Final scale-up plan: [{nodes.us-east-1.apicentral.axwaydev.net 26->27 (max: 30)}]
I0506 00:12:26.796684 1 scale_down.go:938] Scale-down: removing empty node ip-10-120-59-39.ec2.internal
I0506 00:12:26.796974 1 event.go:258] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"28c7ffa0-de2e-4a5f-856f-69684065056b", APIVersion:"v1", ResourceVersion:"46650479", FieldPath:""}): type: 'Normal' reason: 'ScaleDownEmpty' Scale-down: removing empty node ip-10-120-59-39.ec2.internal
I0506 00:30:22.431082 1 scale_up.go:533] Final scale-up plan: [{nodes.us-east-1.apicentral.axwaydev.net 26->27 (max: 30)}]