Custom load balancing in Kubernetes

We are developing simulation software that is deployed and scaled across multiple pods using Kubernetes. When a user makes a simulation request, a pod is selected, starts the job, and is considered busy. When another user makes a simulation request, it should be routed to the next free pod. Currently, a busy pod is often selected (even though free ones exist) because Kubernetes does not know which pods are busy or free.

Is it possible to balance requests in such a way that a free pod is always selected? (Assume that each app instance inside a pod exposes an HTTP endpoint that reports its current busy/free status.)
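For reference, a minimal sketch of such a status endpoint, using only the Python standard library; the path, port, and flag are illustrative, and a real app might use Flask or FastAPI instead:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

busy = threading.Event()  # set while a simulation job is running

class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/status":
            body = b"busy" if busy.is_set() else b"free"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), StatusHandler).serve_forever()
```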

Without knowing how the application works or communicates, I’m just treating this like a coding problem.

One option is to create an operator that spins up a new pod for each request. I essentially did that for some build software I'm working on, which runs one pod per build pipeline: an endpoint.py script adds builds to a CRD, and operator.py watches for added builds and runs a pipeline that includes creating a pod.
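A minimal sketch of that pod-per-request pattern (not my actual build software), assuming a hypothetical SimulationJob CRD (group example.com, version v1, plural simulationjobs) and the official kubernetes Python client; the image and field names are assumptions:

```python
from kubernetes import client, config, watch

config.load_incluster_config()  # or config.load_kube_config() outside the cluster
custom = client.CustomObjectsApi()
core = client.CoreV1Api()

def pod_for(job):
    # Build one pod per custom object; all names here are illustrative.
    name = job["metadata"]["name"]
    return client.V1Pod(
        metadata=client.V1ObjectMeta(name=f"sim-{name}",
                                     labels={"app": "simulator", "job": name}),
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[client.V1Container(
                name="simulator",
                image="registry.example.com/simulator:latest",  # assumed image
                args=[job.get("spec", {}).get("input", "")],
            )],
        ),
    )

w = watch.Watch()
for event in w.stream(custom.list_namespaced_custom_object,
                      group="example.com", version="v1",
                      namespace="default", plural="simulationjobs"):
    if event["type"] == "ADDED":
        core.create_namespaced_pod("default", pod_for(event["object"]))
```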

Another option is to use a managed field on your objects. There is some prerequisite knowledge that builds up to this, though.

While testing the API, I did some testing against custom objects. What that illustrates is that if you have multiple copies of a program watching a resource, the first update wins the resource and conflicting updates that arrive later are rejected. This makes it possible to claim work safely.
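A sketch of claiming a free pod that way with the kubernetes Python client; the label selector and annotation key are assumptions. Including metadata.resourceVersion in the patch body acts as an optimistic-concurrency precondition: the API server rejects the patch with 409 Conflict if another writer updated the pod first.

```python
from kubernetes import client, config
from kubernetes.client.rest import ApiException

config.load_kube_config()
v1 = client.CoreV1Api()

def claim_free_pod(namespace="default"):
    pods = v1.list_namespaced_pod(namespace, label_selector="app=simulator")
    for pod in pods.items:
        annotations = pod.metadata.annotations or {}
        if annotations.get("example.com/state", "free") != "free":
            continue
        patch = {"metadata": {
            "resourceVersion": pod.metadata.resource_version,  # precondition
            "annotations": {"example.com/state": "busy"},
        }}
        try:
            return v1.patch_namespaced_pod(pod.metadata.name, namespace, patch)
        except ApiException as e:
            if e.status == 409:  # someone else claimed it first; try the next pod
                continue
            raise
    return None  # no free pod right now
```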

It turns out the behavior I observed with custom objects applies to all objects. This is important to note because it means a controller can manage just one field on an existing resource, like a pod, and use .metadata to store state information.

Kubernetes knows what is managing each field (see .metadata.managedFields). So you can add entries to .metadata (personally I like .metadata.annotations) to track the state of your pod with a controller.
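A sketch of a controller owning a single annotation, again with hypothetical names; the field_manager argument is what records the ownership in .metadata.managedFields:

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

def set_state(pod_name, state, namespace="default"):
    # The annotation key is an assumption; pick one namespaced to your domain.
    patch = {"metadata": {"annotations": {"example.com/state": state}}}
    v1.patch_namespaced_pod(pod_name, namespace, patch,
                            field_manager="sim-state-controller")

# e.g. the app (or a sidecar) flips the annotation around each job:
# set_state("simulator-abc123", "busy")   ...run simulation...
# set_state("simulator-abc123", "free")
```

A router can then list pods, filter on the annotation, and claim one with the resourceVersion trick above before forwarding the request.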