Kubernetes service load balancing

Ahmad_Al-Masry · September 2, 2021, 8:44am

Hi;

We are trying to use HPA to scale our system based on load, and HPA correctly creates new pods. Our main issue is that load is not balanced fairly between the pods: the old pods handle most of the requests, as shown in the attached image.

As specified in the cluster information below, we use kube-proxy v1.21.2-eksbuild.2 in iptables mode, as it is the only mode supported by Bottlerocket. We also use the AWS App Mesh controller v1.4.1 with Envoy images 1.19.1.0 as the service mesh.

We use a ClusterIP service to point to the deployment, and the VirtualService is configured to use the DNS name of the service.

Is this behavior happening because kube-proxy is working in iptables mode, or is it because of App Mesh and Envoy?

Cluster information:

Kubernetes version: v1.21.2-eks-0389ca3
Cloud being used: AWS EKS
Installation method: Managed
Host OS: Bottlerocket 1.2.0
CNI and version: AWS vpc-cni v1.9.0-eksbuild.1
CRI and version: containerd://1.4.8+bottlerocket

protosam · September 6, 2021, 1:11am

I would aim to force the load balancer to use an algorithm that targets “least connections”, though I’m not quite sure how to do that yet.

My best guess is to start in this documentation though:

Ahmad_Al-Masry · September 6, 2021, 2:08pm

Hi @protosam,
In our situation, IPVS mode for kube-proxy (the mode that supports specifying least connections) is not supported on Bottlerocket OS on AWS EKS, so I am stuck with iptables mode, which does not have that option.
I just wanted to make sure that this behavior really comes from kube-proxy working in iptables mode and has nothing to do with AWS App Mesh and the Envoy proxy sidecars. If so, we want to try eBPF with Calico.

protosam · September 6, 2021, 4:56pm

It’s hard to say really, because iptables mode chooses a backend at random.
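To make "at random" concrete: in iptables mode, kube-proxy writes one rule per endpoint and uses the `statistic` match with cascading probabilities (1/n for the first rule, 1/(n-1) for the second, and so on), which works out to a uniform pick per new connection. The sketch below is a toy simulation of that cascade, not kube-proxy code; the pod names and the `pick_backend` / `simulate` helpers are made up for illustration.

```python
import random
from collections import Counter


def pick_backend(backends, rng):
    """Walk the rules in order, as a cascading-probability chain would."""
    n = len(backends)
    for i, backend in enumerate(backends):
        remaining = n - i
        # Rule i matches with probability 1/remaining.
        # The last rule has probability 1, so it always matches.
        if rng.random() < 1.0 / remaining:
            return backend
    return backends[-1]  # not reached; kept as a safety net


def simulate(backends, connections, seed=0):
    """Count how many new connections land on each backend."""
    rng = random.Random(seed)
    return Counter(pick_backend(backends, rng) for _ in range(connections))


if __name__ == "__main__":
    # Hypothetical pod names, for illustration only.
    counts = simulate(["pod-a", "pod-b", "pod-c", "pod-d"], 100_000)
    for pod, hits in sorted(counts.items()):
        print(pod, hits)
```

Over many connections each backend ends up with roughly the same share, so random selection by itself should not produce a strong bias toward old pods.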

thockin · September 6, 2021, 5:11pm

Keep in mind that any kube-proxy mode is effectively random, once you have more than one client node involved - they do not coordinate their selections nor do they get load info from backends.

Also keep in mind that many clients choose to reuse connections behind the scenes, so you might THINK you are doing many transactions, but at the TCP (and thus IPVS and iptables) level, it’s one connection to one backend. Common error in load tests.
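The connection-reuse point can be sketched with a small simulation. This is a toy model under stated assumptions: the balancing decision is made once per new TCP connection, and the `run_load_test` function, its parameters, and the pod names are hypothetical, not part of any real load-testing tool.

```python
import random
from collections import Counter


def run_load_test(backends, clients, requests_per_client, keep_alive, seed=0):
    """Return the number of requests served by each backend."""
    rng = random.Random(seed)
    hits = Counter({b: 0 for b in backends})
    for _ in range(clients):
        conn = None  # backend the client's current TCP connection is pinned to
        for _ in range(requests_per_client):
            if conn is None or not keep_alive:
                # A backend is only chosen when a new connection is opened.
                conn = rng.choice(backends)
            hits[conn] += 1
    return hits


if __name__ == "__main__":
    pods = ["pod-1", "pod-2", "pod-3", "pod-4", "pod-5"]  # hypothetical names
    print("keep-alive:   ", dict(run_load_test(pods, 2, 1000, keep_alive=True)))
    print("no keep-alive:", dict(run_load_test(pods, 2, 1000, keep_alive=False)))
```

With keep-alive on, two clients can reach at most two backends no matter how many requests they send, which is why a load test driven by a few long-lived connections can look badly unbalanced.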

Ahmad_Al-Masry · September 13, 2021, 9:44am

Hi;
Just to provide an update: after investigating with AWS EKS support, we found that the issue was caused by the Envoy proxies of App Mesh.
Envoy was caching the endpoint IPs instead of the cluster IP of the service.
So the solution was to change the service to a headless service and switch Envoy to Strict DNS for service discovery, which prevents that behavior.
Thanks, all, for the collaboration.
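For anyone landing here later, the effect can be illustrated with a toy model. It assumes a client that either keeps the endpoint list it learned before the scale-up or re-resolves the service name afterwards (a headless service returns the pod IPs in DNS, and Strict DNS discovery keeps re-resolving the name). It does not reproduce Envoy or App Mesh internals; the `simulate_scale_up` function and the pod names are hypothetical.

```python
import random
from collections import Counter


def simulate_scale_up(refresh_endpoints, requests_per_phase=10_000, seed=0):
    """Send traffic before and after new pods appear; return hits per pod."""
    rng = random.Random(seed)
    old_pods = ["pod-1", "pod-2"]  # running before the scale-up
    new_pods = ["pod-3", "pod-4"]  # created later by the autoscaler
    hits = Counter({p: 0 for p in old_pods + new_pods})

    # Phase 1: only the old pods exist; the client learns them once.
    known = list(old_pods)
    for _ in range(requests_per_phase):
        hits[rng.choice(known)] += 1

    # Phase 2: new pods are live. A client that re-resolves picks them up;
    # a client holding a stale list keeps sending to the old pods only.
    if refresh_endpoints:
        known = old_pods + new_pods
    for _ in range(requests_per_phase):
        hits[rng.choice(known)] += 1
    return hits


if __name__ == "__main__":
    print("stale endpoints:    ", dict(simulate_scale_up(False)))
    print("re-resolved via DNS:", dict(simulate_scale_up(True)))
```

In the stale case the new pods receive no traffic at all, which matches the symptom from the first post, where the old pods handled most of the requests.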
