I have a Kubernetes cluster with 5 nodes running around 30 pods.
Some of the pods regularly use a lot of memory. At one point a node went into the “not ready” state when the combined memory usage of all its running pods exceeded the node's memory. I have since raised the memory resource requests for the high-memory pods, but shouldn't the node controller kill the pods and restart them instead of letting the node go “not ready”?
Suppose 4 pods are already running on a node and the scheduler places another pod there because its memory request fits within the node's remaining memory capacity. Over time, for whatever reason, the memory usage of all the pods grows. Each pod is still under its own memory limit, but the sum of all the pods' memory usage exceeds the node's memory, and this puts the node into the “not ready” state.
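To make the scenario concrete, here is a rough Python sketch of the arithmetic. All the numbers (node size, requests, limits, usage) and the pod names are made up for illustration and are not measurements from my cluster. For simplicity it also treats the whole node memory as available to pods.

```python
# Hypothetical numbers, in MiB, chosen only to illustrate the scenario.
NODE_MEMORY_MIB = 16384

# (name, memory request, memory limit, current usage) for each pod on the node
pods = [
    ("pod-1", 2048, 4096, 3500),
    ("pod-2", 2048, 4096, 3500),
    ("pod-3", 2048, 4096, 3500),
    ("pod-4", 2048, 4096, 3500),
    ("pod-5", 2048, 4096, 3500),
]

total_requests = sum(req for _, req, _, _ in pods)
total_limits = sum(lim for _, _, lim, _ in pods)
total_usage = sum(use for _, _, _, use in pods)

# The scheduler places pods based on their requests, so all five fit.
fits_by_requests = total_requests <= NODE_MEMORY_MIB
# Every pod is still under its own limit.
all_under_limit = all(use <= lim for _, _, lim, use in pods)
# Combined usage is nevertheless more than the node has.
node_overcommitted = total_usage > NODE_MEMORY_MIB

print("requests:", total_requests, "limits:", total_limits, "usage:", total_usage)
print(fits_by_requests, all_under_limit, node_overcommitted)
```

With these made-up values the requests add up to 10240 MiB, which fits on the node, while actual usage adds up to 17500 MiB, which does not, even though no single pod has crossed its limit.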
Is there any way to overcome this situation?
When this happens, all the pods on that node get moved to other nodes, or some of them stay in Pending because of their higher resource request values.
Please advise on how to handle this.
Kubernetes version: 1.10.6
Cloud being used: AWS
CNI and version:
CRI and version: