We have our website running on an AKS cluster with HPA enabled on a couple of services (frontend and backend pods): min 2, max 4 replicas, targeting 50% average CPU utilization.
There is plenty of CPU available, but usage fluctuates constantly (probably because of crawling), forcing the HPA to scale up and down all the time. This is an issue known as thrashing.
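To make the thrashing concrete, here is a minimal sketch of the replica calculation the HPA documentation describes: desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric), with a default tolerance of 0.1 around the target. The CPU samples are made up for illustration, and the model is simplified: it treats average utilization as independent of the replica count and ignores any stabilization window.

```python
import math

def desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct=50,
                     min_replicas=2, max_replicas=4, tolerance=0.1):
    """Replica count from the documented HPA formula, clamped to min/max."""
    ratio = current_cpu_pct / target_cpu_pct
    # Within the tolerance band around the target, the HPA does not scale.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))

# Hypothetical average CPU utilization (%) at successive HPA evaluations.
cpu_samples = [80, 30, 85, 20, 90, 25]

replicas = 2
history = []
for cpu in cpu_samples:
    replicas = desired_replicas(replicas, cpu)
    history.append(replicas)

print(history)
```

With these samples the replica count moves on every evaluation, bouncing between 2 and 4, which is the pattern we see in our graph.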
We have been tinkering with different parameters to mitigate it, but there was always a side effect that offset the benefit (e.g. website crashes or too many nodes or…).
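One of the parameters in question is the scale-down stabilization window (`behavior.scaleDown.stabilizationWindowSeconds` in the `autoscaling/v2beta2` API, default 300 seconds). As far as I understand it, the HPA acts on the highest recommendation seen within that window, so short dips do not remove pods. The sketch below illustrates the trade-off under simplifying assumptions: the window is counted in evaluation cycles rather than seconds, and the recommendation values are invented.

```python
from collections import deque

def stabilized(recommendations, window):
    """Apply the highest recommendation seen in the last `window` evaluations."""
    recent = deque(maxlen=window)
    applied = []
    for rec in recommendations:
        recent.append(rec)
        applied.append(max(recent))
    return applied

def scale_events(replica_counts):
    """Count how many times the replica count changes between evaluations."""
    return sum(1 for a, b in zip(replica_counts, replica_counts[1:]) if a != b)

# Hypothetical raw HPA recommendations at successive evaluations.
raw = [4, 3, 4, 2, 4, 2, 2, 2, 2, 2]

no_window = stabilized(raw, window=1)    # no stabilization: follows raw values
with_window = stabilized(raw, window=4)  # holds the peak for 4 evaluations

print(no_window, scale_events(no_window))
print(with_window, scale_events(with_window))
```

In this toy run the window cuts the number of scaling events from 5 to 1, but the deployment sits at 4 replicas for most of the period. That matches the kind of side effect we hit: less thrashing in exchange for more pods (and potentially more nodes) running.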
I wonder if anyone knows more about thrashing: can it really be a problem, and at what rate should it be considered one?
We don’t see any performance issues at the moment, but we are worried that we may be in a hazardous state and that things might go pear-shaped at any moment.
I’m uploading a snapshot of a 6-hour graph of the number of pods, which should give you an idea of the autoscaling rate.
Kubernetes version: 1.21.2
Cloud being used: Azure
Installation method: Terraform
Host OS: Ubuntu