How to load balance gRPC connections when HPA autoscaling is enabled?

Hi,

I'm not sure whether this is the right place to ask this. I'm new to Kubernetes and need some guidance on achieving the requirement below.

In my Azure cluster, I have two gRPC services running. The first is exposed to the outside world through a load balancer. When I call the first one, it then sends multiple requests to the second gRPC service. I have enabled HPA with a 50% CPU utilization target for scaling. The process running in the second gRPC server is long-running and memory-intensive.
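For reference, the 50% CPU target described above would typically be expressed with an `autoscaling/v2` HPA manifest along these lines. All names here are placeholders, not the poster's actual resources:

```yaml
# Hypothetical HPA for the second gRPC service, scaling on 50% average CPU.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: second-grpc-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: second-grpc        # placeholder Deployment name
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```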

After lots of requests, the second service autoscales as I expect, but all my requests are still sent to the initial pod only, so the load is not balanced. I tried sending the requests with a small delay (2 s) and the behavior is the same.

When I researched this, I found that gRPC keeps the connection alive, so all requests are sent to the same pod. How can I overcome this?
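This is the expected behavior: gRPC multiplexes all RPCs over one long-lived HTTP/2 connection, and a normal ClusterIP Service load balances per connection, not per request. One common workaround is a headless Service (`clusterIP: None`), so DNS returns every pod IP and the client can balance across them itself with a `round_robin` policy. A minimal sketch, with placeholder names and ports:

```yaml
# Hypothetical headless Service for the second gRPC service.
# clusterIP: None makes cluster DNS return the individual pod IPs,
# which a gRPC client dialing dns:///second-grpc.default.svc.cluster.local:50051
# with round_robin load balancing can then spread requests across.
apiVersion: v1
kind: Service
metadata:
  name: second-grpc
spec:
  clusterIP: None          # headless: no virtual IP, DNS lists all pods
  selector:
    app: second-grpc       # placeholder pod label
  ports:
    - port: 50051
      targetPort: 50051
```

Note that the client must re-resolve DNS to pick up newly autoscaled pods; the keepalive settings discussed below help force that.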

I have also tried Linkerd, but the issue is the same. Also, is it possible to scale pods based on the number of requests received? I need to scale per request, so that only one request runs at a time in each pod.

TIA

You probably want to control the keepalive values.

Is there any other way to load balance the requests? Should I write a custom resolver?

Hi,

Were you able to find a solution?

On my side, gRPC load balancing works well with a service mesh (Linkerd).

But in an HPA context I encounter the same problem you mentioned: pods started by autoscaling do not receive any HTTP/2 traffic.

Thanks


Do you use the automatic sidecar trick, adding the annotation to the deployment so that the Linkerd sidecar gets added to any pod that comes up?
Does Linkerd see the new pods or not?
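The annotation in question goes on the pod template, not on the Deployment's own metadata, so that every replica created by the autoscaler gets the proxy injected. A sketch with placeholder names:

```yaml
# Hypothetical Deployment fragment: the linkerd.io/inject annotation on the
# pod template means every pod the HPA creates comes up with the Linkerd
# sidecar, so the mesh can load balance gRPC to it per request.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: second-grpc
spec:
  selector:
    matchLabels:
      app: second-grpc
  template:
    metadata:
      labels:
        app: second-grpc
      annotations:
        linkerd.io/inject: enabled
    spec:
      containers:
        - name: second-grpc
          image: example/second-grpc:latest   # placeholder image
          ports:
            - containerPort: 50051
```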