I have an application deployed on Kubernetes as a ReplicaSet of 2 pods.
The application has a function that is very long-running and resource-heavy.
When a user calls it, it consumes most of the pod’s resources for about an hour before it finishes.
During that time, however, the function may be called again by a different user.
I would therefore like to ensure that when a user calls the function, the request is not sent to a pod where the function is already running. In other words, I want to be able to mark the pod where it is running as busy (but I do not want the pod deleted).
If possible, I would also like to set up an autoscaler that makes sure there are always 2 ‘non-busy’ pods.
The simplest answer might be to write this as a custom router and scaler. I don’t know of any off-the-shelf components that do exactly this.
For example, what do you want to happen if there are more calls than pods? Does it queue? Return an error? Spin up a new replica on-demand?
If you send all requests to your custom router, it can know how many “slots” are free, and which ones, and steer traffic to those. When the number of free slots hits a low-water mark, it can scale up the backend set.
If you are blasting a lot of traffic, this is pretty bad: you now have to proxy all that data, unless your app supports some sort of redirect.
You could maybe accept a request at the leaf pod and then fail your readiness probe, but that is asynchronous, so you still need to handle losing the race and receiving a second request.
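A rough sketch of the bookkeeping such a router would need. Everything here is hypothetical: `SlotRouter`, the pod names and the low-water mark of 2 are illustrative. It assumes one heavy job per pod and leaves out the actual proxying and whatever call would change the replica count.

```python
import threading


class SlotRouter:
    """Hypothetical router bookkeeping: one job slot per backend pod."""

    def __init__(self, pods, low_water_mark=2):
        self._lock = threading.Lock()
        # pod name -> True while a heavy job is running on it
        self._busy = {pod: False for pod in pods}
        self.low_water_mark = low_water_mark

    def add_pod(self, pod):
        # Register a newly scaled-up replica as a free slot.
        with self._lock:
            self._busy.setdefault(pod, False)

    def acquire(self):
        """Mark the first free pod busy and return it, or None if all are busy."""
        with self._lock:
            for pod, busy in self._busy.items():
                if not busy:
                    self._busy[pod] = True
                    return pod
            return None

    def release(self, pod):
        # The job on `pod` has finished; its slot is free again.
        with self._lock:
            self._busy[pod] = False

    def free_count(self):
        with self._lock:
            return sum(1 for busy in self._busy.values() if not busy)

    def needs_scale_up(self):
        # Fewer free slots than the low-water mark: ask for more replicas.
        return self.free_count() < self.low_water_mark

    def desired_replicas(self):
        # One replica per busy pod plus `low_water_mark` free ones.
        with self._lock:
            busy = sum(1 for b in self._busy.values() if b)
        return busy + self.low_water_mark


router = SlotRouter(["pod-a", "pod-b"])
pod = router.acquire()
print(pod, router.needs_scale_up(), router.desired_replicas())
```

With one of the two pods busy, this prints `pod-a True 3`. When `acquire` returns None, the router still has to decide whether to queue, reject, or wait for a new replica, which is the open question of what should happen when there are more calls than pods.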
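A minimal sketch of the readiness-probe idea, assuming each pod serves both the job endpoint and the HTTP endpoint its readinessProbe checks (the wiring is not shown). A failing readiness probe only takes the pod out of the Service's endpoints; unlike a failing liveness probe, it does not restart the pod, which fits the "busy but not deleted" requirement. The class and method names are made up for illustration. The non-blocking lock is what handles losing the race: if a second request arrives before the probe has failed, it gets a 503 instead of starting a second job.

```python
import threading
import time


class SingleJobWorker:
    """Hypothetical pod-side gate: at most one heavy job at a time."""

    def __init__(self, job_fn):
        self._job_fn = job_fn
        self._lock = threading.Lock()

    def readiness_status(self):
        # 200 = ready for traffic; 503 = busy, so the readinessProbe fails
        # and the pod leaves the Service endpoints without being restarted.
        return 503 if self._lock.locked() else 200

    def submit(self, *args):
        # acquire(blocking=False) is atomic: exactly one caller wins.
        if not self._lock.acquire(blocking=False):
            return 503  # lost the race; the caller should retry elsewhere
        thread = threading.Thread(target=self._run, args=args, daemon=True)
        thread.start()
        return 202  # accepted; the job runs in the background

    def _run(self, *args):
        try:
            self._job_fn(*args)
        finally:
            # A plain Lock may be released from a different thread.
            self._lock.release()


if __name__ == "__main__":
    worker = SingleJobWorker(lambda: time.sleep(2))  # stand-in for the hour-long job
    print(worker.submit(), worker.readiness_status(), worker.submit())
```

Run directly, this prints `202 503 503`: the first submit wins, after which both the readiness check and a second submit report the pod as busy.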
I have thought some more about the routing problem.
I may be able to solve it by setting up a Kafka queue for the function calls and having the pods work through them one by one. That way, a pod that is already running the function should not start working on a new call until it has completed the previous one.
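A sketch of the pull model, using Python's `queue.Queue` as an in-process stand-in for the Kafka topic (an assumption; this is not a Kafka client). Each worker takes one job, finishes it, and only then takes the next, so a busy worker never picks up a second call. Two caveats if you do use Kafka: a consumer group assigns whole partitions rather than individual messages, so you need at least as many partitions as pods, and a message can still wait behind a long job on its own partition; and an hour-long job far exceeds the consumer's default `max.poll.interval.ms` of 5 minutes, so that setting would need to be raised.

```python
import queue
import threading


def worker(name, jobs, results):
    """Take one job at a time; a busy worker never starts a second job."""
    while True:
        job = jobs.get()  # blocks until a job (or the stop sentinel) arrives
        if job is None:  # sentinel: no more work
            return
        # Stand-in for the hour-long function: square the job id.
        results.put((name, job, job * job))


def run(job_ids, n_workers=2):
    jobs = queue.Queue()  # stand-in for the Kafka topic
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(f"pod-{i}", jobs, results))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for job in job_ids:
        jobs.put(job)
    for _ in threads:
        jobs.put(None)  # one stop sentinel per worker
    for t in threads:
        t.join()
    out = []
    while not results.empty():
        out.append(results.get())
    return out


if __name__ == "__main__":
    for pod, job, value in run([1, 2, 3, 4, 5]):
        print(pod, "finished job", job, "->", value)
```

Which worker handles which job varies from run to run; the point is only that each worker pulls new work when it is free.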
But that still leaves me with the question of how to autoscale the number of pods.
What I would need is a custom metric that gives a value of 0 for a pod where the function is running and a value of 1 for a pod where it is not. I would then want to set up an autoscaler with a target of keeping the sum of this metric equal to 2.
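A sketch of the arithmetic, assuming the per-pod metric is exactly the 0/1 "idle" flag described above. The Horizontal Pod Autoscaler scales on the rule desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), which assumes per-pod load drops as pods are added, so a target on the sum of idle flags does not map onto it directly. One possible mapping: export busy + 2 as a single external metric (for example via Prometheus Adapter or KEDA; the exact configuration is not shown here) with an `AverageValue` target of 1, so the rule reduces to ceil((busy + 2) / 1) = busy + 2. The function names and the cap of 10 are hypothetical.

```python
import math


def desired_replicas(idle_flags, min_idle=2, max_replicas=10):
    """Replicas needed so that `min_idle` pods stay free.

    idle_flags: the per-pod metric from the question, 1 = idle, 0 = busy.
    max_replicas is a hypothetical cap, like an HPA's maxReplicas.
    """
    busy = len(idle_flags) - sum(idle_flags)
    return min(busy + min_idle, max_replicas)


def external_metric_value(idle_flags, min_idle=2):
    # The single number to export: busy pods plus the idle headroom wanted.
    return (len(idle_flags) - sum(idle_flags)) + min_idle


def hpa_average_value_replicas(metric_total, target_average_value=1):
    """HPA rule for an AverageValue target (tolerance and bounds ignored).

    ceil(current * (metric_total / current) / target) == ceil(metric_total / target)
    """
    return math.ceil(metric_total / target_average_value)


flags = [0, 1]  # one pod busy, one idle
print(desired_replicas(flags), hpa_average_value_replicas(external_metric_value(flags)))
```

This prints `3 3`. Two practical caveats: the HPA ignores changes within its default 10% tolerance, which matters little at a handful of replicas but more at larger counts; and on scale-down the ReplicaSet controller chooses which pod to delete, which could be a busy one. The `controller.kubernetes.io/pod-deletion-cost` annotation is one way to make busy pods less likely to be chosen.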
Are there any off-the-shelf components to achieve this?