Rescheduling pods after scale-up

I have a question and I hope you can help me.
We have multiple deployments of the same Java-based application, with two pods each, distributed across our two-node AKS cluster. Both nodes are at 90% RAM (of the request limit), so to improve performance I added a third node.
None of the deployments' pods were rescheduled to the new node; they stay on the old two nodes, and when we have load peaks the service is affected, since for very short periods the nodes peak at 95% RAM, and we suspect that is hurting us.
What should I do so that, without redeploying the deployments, Kubernetes reschedules some of the pods to the new node that has no load?
We have this RAM configuration on the pods:
resources:
  limits:
    memory: 1024Mi
  requests:
    memory: 1024Mi

Thank you very much for the answers

Cluster information:

Kubernetes version: 1.21.2
Cloud being used: Azure
Host OS: Linux

You could use a node selector or node affinity to schedule the pods on the new node. In addition, you can drain the nodes with high RAM usage.
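
For example, a minimal nodeSelector sketch (the node name and label are placeholders; label the new node first, then add the selector to the deployment's pod template):

# label the new node (name is a placeholder)
kubectl label nodes aks-nodepool1-12345678-vmss000002 workload=java-app

# in the deployment's pod template spec:
spec:
  nodeSelector:
    workload: java-app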

Hi ga, thanks for the answer

I know about affinity and selectors, but I want something automatic.

I mean, isn't it possible for the kube-scheduler to reschedule pods if the nodes are stressed?

I was looking for information, but I didn't find anything.

The kubelet (on the node) will evict pods if:

  1. the node is scarce on memory and needs to free some up (see the threshold sketch after this list), or
  2. a kubectl drain was issued.
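
For reference, the memory threshold for case 1 comes from the kubelet's eviction settings; a minimal KubeletConfiguration sketch (on AKS these are normally set through the node pool's kubelet configuration rather than edited by hand):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "100Mi"   # evict pods once free node memory falls below this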

As @ga2202 mentioned, you can use podAntiAffinity to make sure two pods will not be deployed on the same node, but the only way to move a pod from one node to another is to kill the old pod and deploy a new one instead (on a new node, if so configured).
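
A minimal podAntiAffinity sketch for that (the app label is a placeholder and must match your deployment's pod labels):

spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: java-app
        topologyKey: kubernetes.io/hostname   # at most one matching pod per node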

So when you mention rescheduling: you can do that (any change to the deployment yaml will cause a redeploy of the pods), but it will kill the pods and redeploy them.
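
For example, one way to trigger such a redeploy without editing the yaml (deployment name is a placeholder):

kubectl rollout restart deployment/java-app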

I do not know the exact architecture you are implementing, but memory consumption does not indicate stress (as opposed to CPU, which can throttle). The worst that can happen memory-wise is that your pod will be OOM-killed if it exceeds its memory limit.


Hi theog, first of all, thank you for the answer.

In short, it is not possible for Kubernetes to move pods that are already deployed, as long as the node is working properly, even if one node is at 90% RAM and another at 30%.
I suppose in the end I will have to redeploy the app so the resources get balanced.

A redeploy will not necessarily redistribute the pods: if the node has enough unrequested memory, the pod might be deployed on the same node again. You can drain the node, and that will evict the pod (move it to a different node).
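
For example (node name is a placeholder; the flags cover DaemonSets and emptyDir volumes, which otherwise block a drain):

kubectl drain aks-nodepool1-12345678-vmss000000 --ignore-daemonsets --delete-emptydir-data
kubectl uncordon aks-nodepool1-12345678-vmss000000   # make the node schedulable again afterwards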

I am also trying to understand what the stress you mentioned above means.

The Kubernetes scheduler only checks the node’s load (among other things) before it schedules the pod on the most suitable node.

Once the pod is scheduled on a node, it is bound to that node for the duration of its lifetime (see Pod Lifecycle):

Pods are only scheduled once in their lifetime. Once a Pod is scheduled (assigned) to a Node, the Pod runs on that Node until it stops or is terminated.

So the answer to your question is “no”, as others have already mentioned: the pod will not be re-scheduled to any other node.

Maybe you would like to consider using a horizontal pod autoscaler (see Horizontal Pod Autoscaling) to increase the number of pods in a deployment if the CPU or memory usage goes over a threshold. That way, if your application is heavily used, new pods of your application will be created to share the load (assuming you have available resources on the existing nodes, or you manually add new nodes to your cluster for the new pods to be created on).
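
A minimal HPA sketch based on memory utilization (names and thresholds are placeholders; autoscaling/v2beta2 is the API version available on 1.21, newer clusters use autoscaling/v2):

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: java-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: java-app
  minReplicas: 2
  maxReplicas: 6
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80   # scale out when average memory usage exceeds 80% of requests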

So, in your scenario, when your application is stressed, the HPA will try to create a new pod; the scheduler will check whether any of the nodes has at least your pod’s requested memory (1024Mi) available. If a third node is available, and assuming the current nodes are at 95% RAM utilization and therefore do not have 1024Mi available to host the new pod, the scheduler will deploy the new pod on the third (empty) node. This increases the number of pods from 2 to 3, resulting in one pod of your application on each node (it will not move a pod from, say, node 2 to node 3). This distributes your application’s load between 3 pods instead of 2 and decreases the load on your application’s pods.

If the load then goes down, the HPA can also reduce the number of pods in your deployment.

Using the cluster autoscaler (see Automatically scale a cluster to meet application demands on Azure Kubernetes Service (AKS)), you can increase the number of nodes in your cluster automatically when they are stressed.
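
For example (resource group, cluster name, and node counts are placeholders):

az aks update \
  --resource-group my-resource-group \
  --name my-aks-cluster \
  --enable-cluster-autoscaler \
  --min-count 2 \
  --max-count 5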

So, using a combination of an HPA for your application and the cluster autoscaler, you can have a fully “elastic” solution for both your app and your cluster.


Thank you all for the answers. We will work on horizontal scaling to unload the saturated nodes, and we will redeploy so that the scheduler re-evaluates the nodes.
