Are requested memory and CPU reserved by a k8s pod upfront and not released even if it uses less than requested?

Hello All,

I have always had this confusion and it seems there are different opinions on this topic.
When we specify a memory and CPU request for a pod, Kubernetes uses it to schedule the pod onto a node that still has at least that much unrequested (allocatable) memory and CPU at that point.

Now, the question is: does the pod reserve the requested memory and CPU upfront even if it doesn't actually use that much at runtime?
Secondly, if the memory in use is less than the requested memory, does the pod still hold the full request for itself, so that the margin between requested and used memory is not made available to other pods on that node?

Example scenario: pod A has a memory request of 500MB and a limit of 600MB, and is scheduled on a node with 1GB of available memory. Another pod, B, with a memory request of 500MB is scheduled on the same node later.
Now, pod A is using 500MB while pod B is using 300MB.
Later, pod A needs to use 600MB, while pod B is still using only 300MB of its requested 500MB. Will pod A be allowed to use 600MB because pod B is using less than it requested, or has pod B reserved its 500MB upfront, so it won't release the difference even though it is only using 300MB?
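
For illustration, here is a minimal sketch of what the resources sections for this scenario could look like (pod names and the image are placeholders; only memory is shown):

```yaml
# Hypothetical pod A: memory request 500Mi, limit 600Mi
apiVersion: v1
kind: Pod
metadata:
  name: pod-a
spec:
  containers:
  - name: app
    image: nginx          # placeholder image
    resources:
      requests:
        memory: "500Mi"
      limits:
        memory: "600Mi"
---
# Hypothetical pod B: memory request 500Mi (no limit set in this sketch)
apiVersion: v1
kind: Pod
metadata:
  name: pod-b
spec:
  containers:
  - name: app
    image: nginx          # placeholder image
    resources:
      requests:
        memory: "500Mi"
```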

Cluster information:

Kubernetes version: 1.21.9
Cloud being used: Azure AKS
Installation method: AKS managed service on Azure cloud
Host OS: RHEL
CNI and version:
CRI and version:

No, they are not reserved in the sense you describe. Memory and CPU also work slightly differently from each other.

In the case you laid out, pod A can use the 600Mi. The problem comes when pod B eventually needs that memory and it isn't available. The OS will try to free up memory from anywhere (pod A is strictly under its own limit, so it is not specifically victimized), and if it can't find pages quickly enough it will OOM. So pod A caused an OOM for pod B. Hence the oft-repeated guidance to always set memory limit == request.
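
To make that guidance concrete, here is a minimal sketch with the memory limit set equal to the request (pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-a
spec:
  containers:
  - name: app
    image: nginx            # placeholder image
    resources:
      requests:
        memory: "500Mi"
      limits:
        memory: "500Mi"     # limit == request: if this pod exceeds 500Mi it is
                            # OOM-killed in its own cgroup rather than eating into
                            # memory that other pods have requested
```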

CPU is more easily reclaimed: pod A would just get less CPU time in the future if pod B wanted to use its request.
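
As a sketch of how CPU behaves differently (values, names, and image are illustrative): the request only affects scheduling and the pod's relative weight when CPUs are contended, while an optional limit caps usage via CFS throttling.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-b
spec:
  containers:
  - name: app
    image: nginx        # placeholder image
    resources:
      requests:
        cpu: "500m"     # used for scheduling and as a relative weight under contention;
                        # idle CPU below this is freely usable by other pods
      limits:
        cpu: "1"        # optional hard cap enforced by CFS throttling
```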

Tim

Thanks @thockin for the reply… so as I understand this, when the OOM for pod B occurs, the used memory on the node should show as roughly 100% in the Kubernetes node metrics, is that correct?

More or less - that is a system-OOM (as opposed to a local cgroup OOM).