I am exploring HPA using custom pod metrics.
HPA is able to scale up and scale down based on metrics exposed by the application.
During a scale-down triggered by HPA, pods are terminated at random once the metric falls below the target average value.
How are long-running requests handled in the field during the scale-down?
I know preStop hooks and terminationGracePeriodSeconds exist, but those are pre-defined values.
If a long-running request exceeds terminationGracePeriodSeconds, the request gets terminated, which is exactly what I am trying to avoid.
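For reference, this is roughly the setup I mean, a minimal sketch (the image name, sleep duration, and grace period are illustrative placeholders, not recommendations):

```yaml
# Hypothetical sketch: a pod template with a preStop hook and an
# extended grace period. SIGKILL is still sent once
# terminationGracePeriodSeconds expires, regardless of in-flight requests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      terminationGracePeriodSeconds: 120   # hard upper bound before SIGKILL
      containers:
      - name: app
        image: example-app:latest          # placeholder image
        lifecycle:
          preStop:
            exec:
              command: ["sh", "-c", "sleep 15"]  # brief delay so load balancers can deregister the pod
```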
Is there a way for HPA to scale down based on a different counter, something like active connections? Only when active connections reach 0 would the pod be deleted.
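The closest thing I can think of (without changing how HPA picks pods) is a preStop hook that blocks until the app reports zero active connections, with a grace period longer than the worst-case drain time. This is just a sketch under assumptions: the /active-connections endpoint and port 8080 are hypothetical and would have to be exposed by the application itself.

```yaml
# Hypothetical sketch: drain-before-terminate via preStop.
# Assumes the app serves a plain-text count at /active-connections.
# The kubelet still enforces terminationGracePeriodSeconds as a hard cap,
# so it must exceed the longest expected request duration.
spec:
  terminationGracePeriodSeconds: 3600   # must exceed the worst-case drain time
  containers:
  - name: app
    image: example-app:latest           # placeholder image
    lifecycle:
      preStop:
        exec:
          command:
          - sh
          - -c
          - |
            # Poll the assumed endpoint until it reports 0 active connections.
            while [ "$(wget -qO- http://localhost:8080/active-connections)" != "0" ]; do
              sleep 5
            done
```

This does not make HPA itself connection-aware, but it keeps the pod alive until it has drained, which may be enough for the use case.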
I did find the Custom Pod Autoscaler operator (custom-pod-autoscaler/example at master · jthomperoo/custom-pod-autoscaler · GitHub), but I am not sure whether I can achieve my use case with it.
Any help/direction is highly appreciated.