Is HPA thrashing a real problem?

framigni · December 6, 2021, 5:31am

We have our website running on an AKS cluster, with HPA enabled on a couple of services (frontend and backend pods): min 2, max 4 replicas, targeting 50% average CPU utilization.
There is plenty of CPU available, but usage fluctuates constantly (probably because of crawling), forcing the HPA to scale up and down all the time. This issue is called thrashing.
We have been tinkering with different parameters to mitigate it, but there was always a side effect that cancelled out the benefit (e.g. website crashes or too many nodes or…).
I wonder if anyone knows more about thrashing: can it really be a problem, and at what rate should it be considered one?
We don’t see any performance issues at the moment, but we are worried that we may be in a hazardous state and things might go pear-shaped at any moment.
I’m uploading a snapshot of a 6-hour chart of the pod count, which should give an idea of the autoscaling rate.
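
To make the question more concrete, here is a minimal Python sketch of the replica rule as the Kubernetes HPA documentation describes it: desired = ceil(current × currentUtilization / target), skipped when the ratio is within a 10% tolerance of 1. The min/max/target values match the setup above. The CPU samples are made up for illustration (they are not taken from our cluster), the function names are hypothetical, and `window` is a simplified stand-in for the scale-down stabilization window, counted in samples rather than seconds. It also ignores the fact that per-pod utilization shifts when the replica count changes.

```python
import math


def desired_replicas(current, cpu_util, target=50.0, min_r=2, max_r=4, tolerance=0.1):
    """Core HPA rule: ceil(current * metric / target), clamped to [min_r, max_r]."""
    ratio = cpu_util / target
    # Within the tolerance band the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current
    return max(min_r, min(max_r, math.ceil(current * ratio)))


def simulate(cpu_samples, start=2, window=1):
    """Replay CPU samples; scale up at once, scale down only to the
    highest recommendation seen in the last `window` samples."""
    replicas = start
    recommendations = []
    history = []
    for cpu in cpu_samples:
        rec = desired_replicas(replicas, cpu)
        recommendations.append(rec)
        if rec < replicas:
            # Simplified scale-down stabilization (assumption: window in samples).
            rec = min(replicas, max(recommendations[-window:]))
        replicas = rec
        history.append(replicas)
    return history


def count_changes(history, start=2):
    """Number of scale events in a replica history."""
    previous = [start] + history[:-1]
    return sum(1 for before, after in zip(previous, history) if before != after)


if __name__ == "__main__":
    samples = [80, 30, 80, 30, 80, 30]  # made-up per-pod CPU %, oscillating around the target
    for window in (1, 3):
        history = simulate(samples, window=window)
        print(f"window={window}: {history}, scale events={count_changes(history)}")
```

With no effective stabilization (`window=1`) every sample triggers a scale event, while a window covering a few samples holds the replica count at the peak. That is the trade-off we keep hitting: fewer scale events, but more pods (and possibly nodes) kept running.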

Cluster information:

Kubernetes version: 1.21.2
Cloud being used: Azure
Installation method: Terraform
Host OS: Ubuntu

