I have been experimenting with the Kubernetes Horizontal Pod Autoscaler (HPA) lately and have a use case I could not find a solution for.
One of my services can process only one task at a time: once it starts processing some data, it will not accept any other requests. I can run as many instances of this service as needed. Currently we start a fixed number of replicas for it, but I am looking for a way to have Kubernetes scale it automatically when it runs out of available instances.
The problem is that I cannot use the default CPU metric to determine whether a pod is busy: depending on the data it is processing, a busy pod might not generate enough CPU load to trigger the autoscaler.
One idea I had was to use the readiness probe. I have a command I can run in each pod to check whether it is busy. However, I could not find a way to tell the HPA to use only this information to start additional replicas. For example: run between 40 and 120 replicas, and start scaling up when the share of available pods (readiness probe returning true) drops below 10%. Is this possible?
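To make the rule concrete, here is a rough sketch in Python of the behavior I am after. It is not HPA configuration and does not call any Kubernetes API. The function name `desired_replicas` and its parameters are made up for illustration, and the choice to grow just enough to bring the free share back to the threshold is my own assumption about what a sensible scale-up step would look like.

```python
import math


def desired_replicas(current_replicas, ready_replicas,
                     min_replicas=40, max_replicas=120,
                     min_free_fraction=0.10):
    """Hypothetical scaling rule: keep at least `min_free_fraction`
    of the pods idle (ready), within the [min, max] replica bounds."""
    if current_replicas <= 0:
        return min_replicas

    # Pods that are not ready are assumed to be busy with a task.
    busy = current_replicas - ready_replicas
    free_fraction = ready_replicas / current_replicas

    if free_fraction >= min_free_fraction:
        # Enough idle pods: keep the current size (clamped to the bounds).
        target = current_replicas
    else:
        # Smallest total N such that (N - busy) / N >= min_free_fraction.
        target = math.ceil(busy / (1 - min_free_fraction))

    return max(min_replicas, min(max_replicas, target))
```

With 40 replicas of which only 2 are free, this would ask for 43 replicas. With 10 of 40 free it would stay at 40, and it would never go above 120.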