How to do Load Balance with GRPC connection when HPA autoscaling is enabled?

MuthukumarA · July 6, 2021, 9:09am

Hi,

Im not sure whether this is the right place to ask this. I’m new to kubernetes, i need some guidance in achieving below requirement.

In my azure cluster, im having 2 GRPC services running, 1st will be exposed to outside world using LB. When i call first one, it will then send multiple requests to 2nd GRPC service. Here i have enabled HPA with 50% CPU utilization for scaling. The process running in 2nd GRPC server is long running and memory consuming one.

After lots of request, 2nd pod is autoscaled like i expect, but all my requests are sending to initial pod only and my load is not balanced., i tried sending the request with some delay(2s) still same behavior.

When i researched about that, i found that GRPC will keep the connection alive so the requests are sending to same pod. How to overcome this?

Also, i have used linkerd still same issue. Also , is it possible to scale pod based on the requests it receives? i need to do the scaling for each request, so only one request runs at a time in a pod?

TIA

protosam · July 6, 2021, 3:55pm

You probably want to control the keep alive values.

github.com

grpc/grpc/blob/master/doc/keepalive.md

# Keepalive User Guide for gRPC Core (and dependents)

The keepalive ping is a way to check if a channel is currently working by sending HTTP2 pings over the transport. It is sent periodically, and if the ping is not acknowledged by the peer within a certain timeout period, the transport is disconnected.

This guide documents the knobs within gRPC core to control the current behavior of the keepalive ping.

The keepalive ping in core is controlled by the following channel arguments -

* **GRPC_ARG_KEEPALIVE_TIME_MS**
  * This channel argument controls the period (in milliseconds) after which a keepalive ping is sent on the transport.
* **GRPC_ARG_KEEPALIVE_TIMEOUT_MS**
  * This channel argument controls the amount of time (in milliseconds) the sender of the keepalive ping waits for an acknowledgement. If it does not receive an acknowledgment within this time, it will close the connection.
* **GRPC_ARG_HTTP2_MAX_PINGS_WITHOUT_DATA**
  * This channel argument controls the maximum number of pings that can be sent when there is no data/header frame to be sent. gRPC Core will not continue sending pings if we run over the limit. Setting it to 0 allows sending pings without such a restriction. (Note that this is an unfortunate setting that does not agree with [A8-client-side-keepalive.md](https://github.com/grpc/proposal/blob/master/A8-client-side-keepalive.md). There should ideally be no such restriction on the keepalive ping and we plan to deprecate it in the future.)
* **GRPC_ARG_KEEPALIVE_PERMIT_WITHOUT_CALLS**
  * This channel argument if set to 1 (0 : false; 1 : true), allows keepalive pings to be sent even if there are no calls in flight.

On the server-side, the following additional channel arguments need to be configured -

* **GRPC_ARG_HTTP2_MIN_RECV_PING_INTERVAL_WITHOUT_DATA_MS**

This file has been truncated. show original

Kevin_Su · October 6, 2023, 6:30am

Is there any other way to load balance request? should I write a custom resolver?

eoriou · October 28, 2024, 12:38am

Hi,

Were you able to find a solution?

On my side, gRPC load balancing works well with a service mesh (linkerd).

But in an HPA context I encounter the same problems as those you mentioned: pods started by autoscaling do not receive any HTTP/2 traffic.

Thanks

David_Thornton · October 28, 2024, 11:54pm

do you use the automatic sidecar trick, adding the annotation to the deployment, that gets the linkerd sidecar added to any pod that comes up?
Does linkerd see or not see the new pods?

Topic		Replies	Views
Kubernetes.io Blog: gRPC Load Balancing on Kubernetes without Tears General Discussions k8s-blog	0	1224	November 10, 2018
Load Balancing not working properly General Discussions development , loadbalancer , service	11	570	June 15, 2024
Loadbalancing across pods in a service using minikube General Discussions	7	2013	August 6, 2019
All the traffic is going to only single pod General Discussions loadbalancer , service , network	1	1499	April 22, 2024
Cluster Autoscaler and Horizontal Pod Autoscaler working together General Discussions	0	682	March 2, 2021

How to do Load Balance with GRPC connection when HPA autoscaling is enabled?

Related topics