Single threaded service load balancing/scaling

Hi,

This is a general question relating to a particular service we need to scale and access in a specific way.
The service itself is single threaded, in that only one connection can be established to it at a time. Specifically, it is a document generation service built around OpenOffice, which can only process one request at a time.
We need to manage a “pool” of these services and hand out connections to other applications as required, and hopefully scale the pool to match the number of concurrent connections required.
I am new to Kubernetes and have done some reading on the available load balancing and scaling options, but I cannot seem to find any reference to load balancing/scaling single-threaded applications like this.
Do we have to build our own front-end load balancer app to support this, or is there a way Kubernetes can do this for us?
Hopefully this makes sense and someone can assist. 🙂



What is your expected behaviour when no pods are available to serve a request?
Example: 5 single-threaded pods are each serving one request, and a 6th request comes in. What should happen? Should the request be queued? If so, for how long? Should it fail instead? And will the consumer retry the request in case of failure?
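To make those policy questions concrete, here is a minimal sketch of what answering them might look like in a front-end component. Everything here is hypothetical: the class name, the bounded-queue limit, and the timeout are assumptions, not an existing library. It shows "reject when too many are waiting" and "time out if no backend frees up in time":

```python
import queue
import threading

# Hypothetical sketch: a pool of single-threaded backends behind a
# bounded wait. Backend identifiers and limits are placeholders.
class BackendPool:
    def __init__(self, backends, max_queued, queue_timeout):
        self._free = queue.Queue()          # backends currently idle
        for b in backends:
            self._free.put(b)
        self._max_queued = max_queued       # policy: reject beyond this
        self._queue_timeout = queue_timeout # policy: how long to queue
        self._waiting = 0
        self._lock = threading.Lock()

    def handle(self, request, work):
        # Policy 1: fail fast if too many requests are already queued.
        with self._lock:
            if self._waiting >= self._max_queued:
                return ("rejected", request)
            self._waiting += 1
        try:
            # Policy 2: wait up to queue_timeout for a free backend.
            backend = self._free.get(timeout=self._queue_timeout)
        except queue.Empty:
            with self._lock:
                self._waiting -= 1
            return ("timed out", request)
        with self._lock:
            self._waiting -= 1
        try:
            # Exactly one request runs against this backend at a time.
            return ("done", work(backend, request))
        finally:
            self._free.put(backend)  # backend is available again
```

Whether the consumer retries after a "rejected" or "timed out" result is the remaining question, and it belongs to the callers, not the pool.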

You almost certainly want your own front-end “proxy” that receives N incoming requests and allocates them to backends. You’ll have to think about connection-lifecycle management (how do you know a backend is “available”) and how you handle overload (more incoming requests than backends) and scale-out (what if your front-end crashes or needs an update) and things like that.
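The "how do you know a backend is available" part of that proxy could be as simple as a periodic health probe. A rough sketch, where `check_health` is an assumed callable you would supply (e.g. a TCP connect or HTTP ping to each OpenOffice wrapper pod):

```python
import threading
import time

# Hypothetical availability bookkeeping for a front-end proxy.
# check_health(backend) -> bool is an assumption, not a real API.
class HealthTracker:
    def __init__(self, backends, check_health, interval=5.0):
        self._check = check_health
        self._interval = interval
        self._healthy = {b: False for b in backends}  # unknown = down
        self._lock = threading.Lock()

    def probe_once(self):
        # One pass over all backends; call this from a background thread.
        for backend in list(self._healthy):
            ok = self._check(backend)
            with self._lock:
                self._healthy[backend] = ok

    def available(self):
        # Backends the proxy may currently hand out.
        with self._lock:
            return [b for b, ok in self._healthy.items() if ok]

    def run_forever(self):
        while True:
            self.probe_once()
            time.sleep(self._interval)
```

In a real deployment you would likely lean on Kubernetes readiness probes for much of this, but the proxy still needs its own view of which backend is idle versus merely alive.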

There is no built-in support for this - it’s too domain-specific 🙂

Thanks for the responses. Very helpful.
Looks like we will be building a front end proxy to deliver our solution. 🙂

You can try using Knative Serving with a concurrency of 1. You can configure the service to handle only one concurrent request at a time; it comes with a queue-proxy sidecar container that makes sure the current request is done before sending the next request to the user container, and it scales out more pods as needed based on overall traffic to the service.
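For reference, the Knative setting being described is `containerConcurrency`. A minimal sketch of such a Service, where the image name and scale bounds are placeholders you would replace with your own:

```yaml
# Sketch only: limits each pod to one in-flight request; Knative's
# queue-proxy sidecar holds or spreads the rest and drives autoscaling.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: doc-generator          # placeholder name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "1"   # placeholder bounds
        autoscaling.knative.dev/max-scale: "10"
    spec:
      containerConcurrency: 1  # one request per pod at a time
      containers:
        - image: example.com/openoffice-docgen:latest  # placeholder image
```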