Time taken by EndPoint controller to update iptables

Himanshu_Kumar · November 1, 2021, 4:09pm

Hello All,

I am hosting some REST APIs in AKS, and in my use case I expect bursts of load, such as a REST endpoint getting thousands of hits (5K to 10K) in a very short span of time. I am using the iptables-based approach to route traffic from a ClusterIP service to pods. But in the iptables-based approach pods are chosen at random, unlike the userspace-based approach where pods are chosen in a round-robin manner.

Each of my pods has 1Gi of RAM and 1 CPU core.

What I have been noticing is that the pods are not getting an even load, which is fine given the random choice of pods by the ClusterIP service. What is worse is that some pods are hitting 100% resource utilization (CPU and/or RAM) and then crashing.

I thought of using a readinessProbe, but my concern is whether it will be an efficient solution, given that my scenario is a load burst. For example, suppose the readinessProbe tells the EndPoint controller that a particular pod is reaching its resource limits: will the EndPoint controller be able to update iptables fast enough to avoid undesired effects in the cluster? Specifically, if there are 100 entries in iptables on a node, how many milliseconds or microseconds will the EndPoint controller need to update them?

Is there any documentation on how long the EndPoint controller takes to update a fairly large iptables ruleset?
My cluster does not have a huge number of pods, just a few fairly large pods on 4 to 5 large VMs.

I was also thinking of reducing the scan interval of the metrics server, which scans pods every 15 seconds by default. Is there any documentation or guidance on customizing that scan interval?

Thanks,
Himanshu.

Cluster information:

Kubernetes version: 1.20.9
Cloud being used: Azure
Installation method: Azure Kubernetes Cluster using Azure CLI
Host OS: Ubuntu 18.04
CNI and version: Azure CNI v1.4.14
CRI and version: containerd v1.4.9+azure


thockin · November 1, 2021, 4:37pm

Bear in mind that “round robin” means nothing when you have N nodes each making
independent decisions. It devolves into random.
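
As a rough illustration of that point, here is a toy Python model (not kube-proxy code). It assumes four hypothetical pods, that every node keeps its own round-robin pointer starting at the same position, and that the upstream alternates strictly between nodes. These are contrived assumptions chosen to make the effect easy to see.

```python
from collections import Counter
from itertools import cycle

# Hypothetical pod names, for illustration only.
PODS = ["pod-0", "pod-1", "pod-2", "pod-3"]


def burst_counts(num_nodes, num_requests):
    """Count hits per pod when each node round-robins independently."""
    # Every node has its own private round-robin pointer over the same pods.
    node_iters = [cycle(PODS) for _ in range(num_nodes)]
    counts = Counter({pod: 0 for pod in PODS})
    for i in range(num_requests):
        node = i % num_nodes  # assumption: upstream alternates between nodes
        counts[next(node_iters[node])] += 1
    return dict(counts)


# One node: a burst of 4 requests is spread perfectly.
print(burst_counts(1, 4))  # {'pod-0': 1, 'pod-1': 1, 'pod-2': 1, 'pod-3': 1}

# Two nodes: the same burst lands on only two pods.
print(burst_counts(2, 4))  # {'pod-0': 2, 'pod-1': 2, 'pod-2': 0, 'pod-3': 0}
```

In a real cluster the pointers would not be aligned like this, but they are also not coordinated, so within a short burst the pod that receives the next request is effectively arbitrary.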

Is this internal traffic (from the cluster) or external (from some LB)? Is it
HTTP or something else? When you layer LBs you also need to be aware of how
those are configured. Some will re-use connections (leading to “hot” nodes),
for example.

The roundtrip will be impacted by things like probe period, Kubelet telling the
apiserver, and ultimately kube-proxy has a “max frequency” control. Worst case
should be O(seconds). The actual iptables write is proportional to how big
your table is (how many total endpoints). For 4-5 nodes this should not be a
problem.
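
To put rough numbers on that chain, here is a back-of-the-envelope Python sketch. It is a simplified model, not a measurement: it assumes the probe must fail `failureThreshold` times in a row at `periodSeconds` intervals, and the status-propagation and kube-proxy sync delays are placeholder values you would need to replace with figures from your own cluster. The function and parameter names are made up for this example.

```python
def worst_case_removal_seconds(period_seconds, failure_threshold,
                               status_propagation_seconds,
                               proxy_min_sync_seconds):
    """Rough upper bound on the time to stop routing to an unready pod."""
    # Time for the readiness probe to fail enough consecutive times.
    detection = period_seconds * failure_threshold
    # Add the kubelet -> apiserver -> endpoints propagation (assumed value)
    # and the wait for kube-proxy's next allowed sync (assumed value).
    return detection + status_propagation_seconds + proxy_min_sync_seconds


# Probe defaults (periodSeconds=10, failureThreshold=3) with 1 s placeholders.
print(worst_case_removal_seconds(10, 3, 1, 1))  # 32

# A more aggressive probe: every 1 s, unready after a single failure.
print(worst_case_removal_seconds(1, 1, 1, 1))  # 3
```

Under these assumptions the probe settings dominate, and the iptables write itself is a small part of the total.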

Himanshu_Kumar · November 1, 2021, 4:57pm

This is from an external LB, and it is HTTP. Thanks for the pointer about hot nodes chosen by LBs.

thockin · November 1, 2021, 5:24pm

If the LB chooses a node (or pod) and then keeps the connection alive, it doesn’t matter how good the k8s balancing algorithm is - the upstream is choosing to pound on the same node until it decides to close that connection. This is a common enough problem that it keeps coming up.
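
A small Python simulation can make this concrete. It is only a sketch: it assumes the backend pod is picked once per connection, uniformly at random, and that every request on a connection goes to that same pod. The pod names and the function are hypothetical.

```python
import random
from collections import Counter


def requests_per_pod(num_pods, num_connections, requests_per_connection, seed=0):
    """Count requests per pod when the pod is chosen once per connection."""
    rng = random.Random(seed)
    counts = Counter({f"pod-{i}": 0 for i in range(num_pods)})
    for _ in range(num_connections):
        # The balancing decision happens here, at connection setup only.
        pod = f"pod-{rng.randrange(num_pods)}"
        # Every request reusing this connection hits the same pod.
        counts[pod] += requests_per_connection
    return dict(counts)


# One kept-alive connection carrying the whole burst: a single pod gets it all.
keepalive = requests_per_pod(5, 1, 10_000)
print(max(keepalive.values()))  # 10000

# A fresh connection per request: the burst is spread across the pods.
fresh = requests_per_pod(5, 10_000, 1)
print(fresh)
```

If this is what is happening, the usual places to look are the upstream LB's connection reuse and idle-timeout settings rather than the in-cluster balancing.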
