Kubernetes version: v1.18.3
Cloud being used: bare metal
Installation method: Rancher (RKE)
Host OS: Redhat 7
CRI and version: docker 19.3.12
I’m trying to set up eviction thresholds and resource reservations in such a way that there is always at least 1GiB of memory available.
However, in practice this does not work at all: the computation the kubelet does seems to differ from the computation the kernel does when deciding whether the OOM killer needs to be invoked. For example, when I load the system with a bunch of pods running an artificial memory hog, `free -m` reports the following:
```
              total        used        free      shared  buff/cache   available
Mem:          15866       14628         161          53        1077         859
```
According to the kernel, there are 859 MiB of memory available. Yet the kubelet does not trigger its eviction policy. In fact, I've managed to get the system OOM killer to fire before kubelet eviction kicked in, even when ramping up memory usage very slowly (to give the kubelet's housekeeping control loop, which runs every 10 seconds by default, a chance to react).
I found this script, which used to be in the Kubernetes documentation and is supposed to calculate available memory the same way the kubelet does. I ran it in parallel with `free -m` above and got the following result:
That’s almost 1000M difference!
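For anyone comparing: my understanding of what that script computes, reduced to the formula (a sketch, not the script itself). The kubelet's `memory.available` is node capacity minus the working set, where the working set is root-cgroup memory usage minus inactive file-backed pages. Active page cache therefore counts as "used", even though `free` reports it as reclaimable, which would explain the gap.

```python
def kubelet_memory_available(capacity_bytes, usage_bytes, total_inactive_file_bytes):
    """memory.available as the kubelet derives it from the root memory cgroup:
    capacity minus the working set (cgroup usage minus inactive file pages,
    clamped at zero)."""
    working_set = max(usage_bytes - total_inactive_file_bytes, 0)
    return capacity_bytes - working_set

# Illustrative numbers, not my actual readings: with 100 units capacity,
# 60 in use and 20 of that inactive page cache, the kubelet sees 60 available.
kubelet_memory_available(100, 60, 20)  # → 60
```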
Now, I understand this difference in calculation is by design, but it leaves me with the obvious question: how can I reliably manage system resource usage so that the system OOM killer is not invoked? What eviction policy can I set so that the kubelet starts evicting pods when less than a gigabyte of memory is available?
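For concreteness, what I am trying to express is something like the following KubeletConfiguration fragment (a sketch of my intent; the 1Gi threshold is the value I'm after, not something I've confirmed works):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "1Gi"
systemReserved:
  memory: "1Gi"
```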
I have been able to trigger the eviction mechanism by simply increasing the `eviction-hard` limit to 2Gi. Here too, though, the actual available memory at the moment eviction started was way, way lower than the configured limit.
Additionally, I tried the example from the out-of-resource handling doc page, which states that setting `eviction-hard` to 500Mi and `system-reserved` to 1.5Gi should result in pods being evicted when half a gigabyte of memory is left. That exhibits the same issues I described above.