Surge during startup of pods crashing nodes in EKS 1.12

Trying to gain some additional perspective here.

We originally did not set resource limits on our workloads. We've had instances where, during pod startup, a node would hang at 100% CPU indefinitely.

Our first mistake was not setting kube-reserved and system-reserved, which we have since corrected. However, even with both set to a reasonable 100m of CPU each, I can still reproduce the issue by scaling up a large number of pods at once.
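For reference, this is roughly what the reservations look like in a kubelet config file (values here are the 100m mentioned above; the memory figures are illustrative assumptions, not what we actually run):

```yaml
# Sketch of a KubeletConfiguration fragment reserving resources
# for the kubelet and OS daemons, plus a hard eviction threshold.
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
kubeReserved:
  cpu: "100m"
  memory: "256Mi"   # assumed value for illustration
systemReserved:
  cpu: "100m"
  memory: "256Mi"   # assumed value for illustration
evictionHard:
  memory.available: "100Mi"
```

Note that these reservations only shrink the node's allocatable capacity; they don't throttle a pod that bursts during startup, which may be why the node can still be driven to 100% CPU.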

I've now taken the approach of setting requests = limits, but I'm having a hard time deciding whether this is the right approach, or if there's something else I'm missing.
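Concretely, setting requests = limits puts the pod in the Guaranteed QoS class, so the CPU limit caps each pod's startup burst. A minimal sketch (the container name and values are placeholders, not our real workload):

```yaml
# Sketch: requests = limits on every resource gives the pod
# Guaranteed QoS, so CPU usage is capped at the limit even
# during a startup surge.
apiVersion: v1
kind: Pod
metadata:
  name: example-app   # hypothetical name
spec:
  containers:
    - name: app
      image: example/app:latest   # placeholder image
      resources:
        requests:
          cpu: "500m"
          memory: "256Mi"
        limits:
          cpu: "500m"      # equal to the request
          memory: "256Mi"  # equal to the request
```

The trade-off I'm weighing is that CPU limits cause throttling once the cap is hit, which can noticeably slow startup for bursty workloads like JVM apps.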