Surge during startup of pods crashing nodes in EKS 1.12

Trying to gain some additional perspective here.

We originally did not set resource limits on our workloads. We've had instances where, during pod startup, a node would hang at 100% CPU indefinitely.

Our first mistake was not setting kube-reserved and system-reserved, which we have since corrected. However, even with both set to a reasonable 100m of CPU each, I can still reproduce the issue by scaling up a large number of pods at once.
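For reference, this is roughly what the reservations look like in a kubelet config file (values here are the 100m mentioned above; the memory figures are illustrative assumptions, not what we actually run):

```yaml
# Sketch of a KubeletConfiguration fragment reserving resources
# for the kubelet and OS daemons, plus a hard eviction threshold.
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
kubeReserved:
  cpu: "100m"
  memory: "256Mi"   # assumed value for illustration
systemReserved:
  cpu: "100m"
  memory: "256Mi"   # assumed value for illustration
evictionHard:
  memory.available: "100Mi"
```

Note that these reservations only shrink the node's allocatable capacity; they don't throttle a pod that bursts during startup, which may be why the node can still be driven to 100% CPU.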

I've now taken the approach of setting requests = limits, but I'm having a hard time deciding whether this is the right approach, or if there's something else I'm missing.
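Concretely, setting requests = limits puts the pod in the Guaranteed QoS class, so the CPU limit caps each pod's startup burst. A minimal sketch (the container name and values are placeholders, not our real workload):

```yaml
# Sketch: requests = limits on every resource gives the pod
# Guaranteed QoS, so CPU usage is capped at the limit even
# during a startup surge.
apiVersion: v1
kind: Pod
metadata:
  name: example-app   # hypothetical name
spec:
  containers:
    - name: app
      image: example/app:latest   # placeholder image
      resources:
        requests:
          cpu: "500m"
          memory: "256Mi"
        limits:
          cpu: "500m"      # equal to the request
          memory: "256Mi"  # equal to the request
```

The trade-off I'm weighing is that CPU limits cause throttling once the cap is hit, which can noticeably slow startup for bursty workloads like JVM apps.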