Handling long-running requests during HPA scale-down

I am exploring HPA using custom pod metrics.
HPA is able to scale-up and scale-down based on metrics exposed by the application.

During a scale-down triggered by the HPA, pods are terminated seemingly at random once the metric falls below the average target value.

How are long-running requests handled in practice during a scale-down?

I know preStop hooks and terminationGracePeriodSeconds exist, but these are values that are pre-defined.
If a long-running request exceeds terminationGracePeriodSeconds, the request gets terminated, which is what I am trying to avoid.
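For reference, both of those knobs are set statically in the pod spec. A minimal sketch (the pod name, image, and sleep duration below are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app            # placeholder name
spec:
  terminationGracePeriodSeconds: 120   # hard upper bound on shutdown time
  containers:
    - name: app
      image: example/app:latest        # placeholder image
      lifecycle:
        preStop:
          exec:
            # Delay SIGTERM so in-flight work can finish; the total
            # shutdown time is still capped by terminationGracePeriodSeconds.
            command: ["sh", "-c", "sleep 30"]
```

Because both values are fixed at deploy time, a request that runs longer than the grace period is still cut off.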

Is there a way for the HPA to scale down based on a different counter, something like active connections, so that a pod is only deleted once its active connections reach 0?
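The HPA itself does not offer connection-aware pod deletion, but one common workaround is to do the draining in the preStop hook: poll the application's own connection counter and return only when it reaches 0, so SIGTERM is deferred until the pod is idle (still bounded by terminationGracePeriodSeconds). A hedged sketch, assuming the app exposes a hypothetical /active_connections endpoint returning a plain integer:

```yaml
lifecycle:
  preStop:
    exec:
      command:
        - "sh"
        - "-c"
        - |
          # Block pod shutdown until the app reports zero active connections.
          # /active_connections is a hypothetical endpoint; substitute your own.
          while [ "$(wget -qO- http://localhost:8080/active_connections)" -gt 0 ]; do
            sleep 2
          done
```

Note that terminationGracePeriodSeconds must be set generously, since Kubernetes will still force-kill the pod once it elapses.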

I did find the Custom Pod Autoscaler operator (custom-pod-autoscaler/example at master · jthomperoo/custom-pod-autoscaler · GitHub), but I am not sure whether I can achieve my use case with it.

Any help/direction is highly appreciated.

I have the same problem… did you find a solution?

Hi,
Have a look at https://keda.sh/. It might be what you are looking for.
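For example, a KEDA ScaledObject with a Prometheus trigger can scale a Deployment on an application-level metric such as active connections. A sketch only; the names, query, and server address are placeholders, and KEDA still relies on standard pod termination (preStop/grace period) when removing replicas:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: app-scaledobject            # placeholder name
spec:
  scaleTargetRef:
    name: example-app               # placeholder Deployment
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090  # placeholder address
        query: sum(active_connections)                    # hypothetical metric
        threshold: "100"
```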
HTH