Mark pods as busy

I have an application deployed on Kubernetes as a ReplicaSet with 2 pods.
The application has a function that is long-running and resource-heavy.
When a user calls it, it consumes most of the pod’s resources for about an hour before it completes.
During that time, however, the function may be called again by a different user.

I would therefore like to ensure that if a user calls the function, the request is not sent to a pod where the function is already running. In other words, I want to be able to mark the pod where the function is running as busy (but I do not want the pod deleted).

If possible I would also like to set up an autoscaler that makes sure there are always 2 ‘non-busy’ pods.

Is there any way to do this?

The simplest answer might be to write this as a custom router and scaler. I don’t know of any off-the-shelf components that do exactly this.

For example, what do you want to happen if there are more calls than pods? Does it queue? Return an error? Spin up a new replica on-demand?

If you send all requests through your custom router, it can know how many and which “slots” are free and steer traffic to those. When it hits a low-water mark, it can scale the backend set.
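
A very rough sketch of that idea in Python, just to show the bookkeeping (the pod addresses, port, and the 503-on-full behaviour are all placeholders; a real router would discover pods via the Kubernetes API or a headless Service rather than hard-coding them):

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests  # third-party: pip install requests

# Hypothetical backend pod addresses; a real router would discover these via
# the Kubernetes API or a headless Service instead of hard-coding them.
BACKENDS = ["http://10.0.0.11:8080", "http://10.0.0.12:8080"]
busy = {b: False for b in BACKENDS}
lock = threading.Lock()

class Router(BaseHTTPRequestHandler):
    def do_POST(self):
        # Claim a free backend slot, or reject if every slot is taken
        # (this is where you could queue or trigger a scale-up instead).
        with lock:
            free = next((b for b in BACKENDS if not busy[b]), None)
            if free is None:
                self.send_response(503)
                self.end_headers()
                return
            busy[free] = True
        try:
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length)
            resp = requests.post(free + self.path, data=body, timeout=3700)
            self.send_response(resp.status_code)
            self.end_headers()
            self.wfile.write(resp.content)
        except requests.RequestException:
            self.send_response(502)
            self.end_headers()
        finally:
            with lock:
                busy[free] = False

if __name__ == "__main__":
    ThreadingHTTPServer(("", 8000), Router).serve_forever()
```

The essential part is claiming a free slot before forwarding and releasing it when the call returns; whether a full router queues, errors, or scales up is the policy question above.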

If you are blasting a lot of traffic, this is pretty painful: you now have to proxy all of that data through the router, unless your app supports some sort of redirect.

You could maybe accept the request at the leaf and then fail your readiness probe, but that is asynchronous, and you still need to handle losing the race and taking a second request.
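
Here is a rough sketch of that second approach, using a hypothetical Flask app (the endpoint names and the 409 handling are assumptions, not anything Kubernetes gives you for free). The Deployment’s readinessProbe would point at /ready, so while the heavy function runs the pod drops out of the Service’s endpoints and stops receiving new traffic:

```python
import time
import threading

from flask import Flask  # third-party: pip install flask

app = Flask(__name__)
busy = threading.Lock()  # held while the heavy function is running

@app.route("/ready")
def ready():
    # Kubernetes treats any non-2xx/3xx status as "not ready" and removes the
    # pod from the Service's endpoints until the probe passes again.
    return ("busy", 503) if busy.locked() else ("ok", 200)

@app.route("/heavy", methods=["POST"])
def heavy():
    if not busy.acquire(blocking=False):
        # Lost the race: a second request arrived before the probe failed.
        # Reject it here (or hand it off to a queue) instead of running twice.
        return ("already running", 409)
    try:
        time.sleep(3600)  # placeholder for the real ~1 hour function
        return ("done", 200)
    finally:
        busy.release()

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080, threaded=True)
```

Because the probe only fires every periodSeconds, there is still a window where a second request can land on the pod, which is the race mentioned above; the 409 is one way to deal with it.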

I have thought some more about the routing problem.
I may be able to solve it by setting up a Kafka queue for the function calls and having the pods work through them one by one. That way, a pod that is already running the function should not start working on a new call until it has completed the previous one.
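
Roughly what I have in mind, as an untested sketch with the kafka-python client (the topic name, group id and message format are just placeholders):

```python
import json

from kafka import KafkaConsumer  # third-party: pip install kafka-python

def run_long_job(call):
    ...  # the ~1 hour heavy function goes here

consumer = KafkaConsumer(
    "heavy-function-calls",                   # placeholder topic name
    bootstrap_servers="kafka:9092",           # placeholder broker address
    group_id="heavy-workers",
    enable_auto_commit=False,                 # only commit once the call has finished
    max_poll_records=1,                       # hand this pod one call at a time
    max_poll_interval_ms=2 * 60 * 60 * 1000,  # allow ~2h before Kafka rebalances us away
    value_deserializer=lambda v: json.loads(v),
)

for message in consumer:
    run_long_job(message.value)  # blocks for ~an hour; no new poll happens meanwhile
    consumer.commit()            # mark this call as done so it is not redelivered
```

As far as I understand, the number of partitions on the topic would cap how many pods can work in parallel, so the topic would need at least as many partitions as pods.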

But that still leaves me with the question of how to autoscale the number of pods.
What I would need is a custom metric that gives a value of 0 for a pod where the function is running and a value of 1 for a pod where it is not. I would then want to set up an autoscaler with a target of keeping the sum of this metric equal to 2.
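
For the metric itself, I imagine each pod could expose something like this (a sketch with the prometheus_client library; the metric name and port are made up), with Prometheus scraping it and something like prometheus-adapter or KEDA making it available to the autoscaler:

```python
import time

from prometheus_client import Gauge, start_http_server  # third-party: pip install prometheus-client

# 1 while this pod is idle, 0 while the heavy function is running.
# Summed over all pods, it gives the number of 'non-busy' pods.
free_slot = Gauge(
    "app_free_slot",  # placeholder metric name
    "1 if this pod is idle, 0 if the heavy function is running",
)

def run_heavy_function(call):
    free_slot.set(0)      # mark this pod as busy
    try:
        time.sleep(3600)  # placeholder for the real work
    finally:
        free_slot.set(1)  # free again

start_http_server(9100)  # serves /metrics in a background thread inside the app process
free_slot.set(1)
```

From what I can tell, the built-in HPA averages per-pod metrics rather than summing them, so targeting an absolute count of 2 free pods would probably need an Object/External metric (or a tool like KEDA) rather than a plain Pods metric.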

Are there any off-the-shelf components to achieve this?

Here are some links you might want to look at! :slight_smile:

https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/

https://itnext.io/kubernetes-workers-autoscaling-based-on-rabbitmq-queue-size-cb0803193cdf

https://medium.com/faun/event-driven-autoscaling-for-kubernetes-with-kafka-keda-d68490200812

https://stackoverflow.com/questions/37377119/in-kubernetes-how-do-i-autoscale-based-on-the-size-of-a-queue