Handling long-running requests during HPA scale-down

I am exploring HPA using custom pod metrics.
HPA is able to scale-up and scale-down based on metrics exposed by the application.

During a scale-down triggered by the HPA, pods are terminated seemingly at random once the metric falls below the average target value.

How are long-running requests handled in practice during a scale-down?

I know preStop hooks and terminationGracePeriodSeconds exist, but these are values that are pre-defined.
If a long-running request exceeds terminationGracePeriodSeconds, the request gets terminated, which is what I am trying to avoid.
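For reference, both of those knobs are set statically in the pod spec. A minimal sketch (the pod name, image, and sleep duration below are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app            # placeholder name
spec:
  terminationGracePeriodSeconds: 120   # hard upper bound on shutdown time
  containers:
    - name: app
      image: example/app:latest        # placeholder image
      lifecycle:
        preStop:
          exec:
            # Delay SIGTERM so in-flight work can finish; the total
            # shutdown time is still capped by terminationGracePeriodSeconds.
            command: ["sh", "-c", "sleep 30"]
```

Because both values are fixed at deploy time, a request that runs longer than the grace period is still cut off.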

Is there a way for the HPA to scale down based on a different counter, something like active connections, so that a pod is only deleted once its active connections reach 0?
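The HPA itself does not offer connection-aware pod deletion, but one common workaround is to do the draining in the preStop hook: poll the application's own connection counter and return only when it reaches 0, so SIGTERM is deferred until the pod is idle (still bounded by terminationGracePeriodSeconds). A hedged sketch, assuming the app exposes a hypothetical /active_connections endpoint returning a plain integer:

```yaml
lifecycle:
  preStop:
    exec:
      command:
        - "sh"
        - "-c"
        - |
          # Block pod shutdown until the app reports zero active connections.
          # /active_connections is a hypothetical endpoint; substitute your own.
          while [ "$(wget -qO- http://localhost:8080/active_connections)" -gt 0 ]; do
            sleep 2
          done
```

Note that terminationGracePeriodSeconds must be set generously, since Kubernetes will still force-kill the pod once it elapses.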

I did find the Custom Pod Autoscaler operator (custom-pod-autoscaler/example at master · jthomperoo/custom-pod-autoscaler · GitHub), but I am not sure whether I can achieve my use case with it.

Any help/direction is highly appreciated.

I have the same problem… did you find a solution?

Hi,
Have a look at https://keda.sh/. It might be what you are looking for.
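For example, a KEDA ScaledObject with a Prometheus trigger can scale a Deployment on an application-level metric such as active connections. A sketch only; the names, query, and server address are placeholders, and KEDA still relies on standard pod termination (preStop/grace period) when removing replicas:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: app-scaledobject            # placeholder name
spec:
  scaleTargetRef:
    name: example-app               # placeholder Deployment
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090  # placeholder address
        query: sum(active_connections)                    # hypothetical metric
        threshold: "100"
```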
HTH