Does a second pod schedule based on requested CPU or actual available CPU?

Hi,

Simple scenario I want to confirm.

Cluster information:

  • Kubernetes version: 1.35 (EKS managed)

  • Cloud: AWS EKS

  • Installation method: AWS Managed EKS

  • Host OS: Amazon Linux

  • CNI: Amazon VPC CNI (latest managed version)

  • CRI: containerd (EKS default)

Setup:

  • Node: t3.large (2 vCPU, ~1.9 vCPU allocatable after OS/kubelet reservation)

  • Pod resource config:

```yaml
resources:
  requests:
    cpu: "1000m"
    memory: "2Gi"
  limits:
    cpu: "1500m"
    memory: "3Gi"
```

Scenario:

Pod 1 starts on the node. It has 1000m requested, but actual usage is very low — say 50m at idle. So realistically the node still has ~1.85 vCPU free in actual usage.

Now Pod 2 (the same workload, just another replica) needs to be scheduled. It also requests 1000m.

From the scheduler’s perspective — does it look at:

  • (A) Actual current CPU usage (~0.05 vCPU used → ~1.85 free) → Pod 2 fits on the same node, or

  • (B) Requested/reserved CPU (~1.0 vCPU reserved by Pod 1 → only ~0.9 remaining) → Pod 2 goes Pending, the Cluster Autoscaler provisions a new node, and the new pod lands on the second node

My assumption is (B) — the scheduler always goes by requests, never actual usage. So even though the node is nearly idle, Pod 2 forces a new node.

Is this correct? And if so, is the standard recommendation to set requests closer to real baseline usage to allow better bin-packing?

Thanks

Hello Parth,

Scheduler behaviour:

The scheduler has a two-step process to select the right node for your workloads.

  • Filtering: The filtering step finds the set of Nodes where it’s feasible to schedule the Pod.
  • Scoring: The scheduler ranks the nodes that pass filtering to choose the most suitable placement for the Pod.

If a node's remaining allocatable capacity (its allocatable resources minus the requests of Pods already bound to it) cannot cover the new Pod's requests, the filtering step removes that node from consideration, so it is never even scored.

Actual utilization plays no part in this: the scheduler goes by requests alone, because the node must be able to guarantee the full requested CPU/memory to the workload whenever it needs it, whether or not it is using it right now. So your assumption (B) is correct.
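To make that concrete, here is the arithmetic for the t3.large scenario above (figures taken from the question, approximate):

```
allocatable CPU                 ~1900m
requests already on the node     1000m  (Pod 1; its ~50m actual usage is ignored)
remaining schedulable CPU        ~900m
Pod 2 requests 1000m > ~900m  →  node filtered out, Pod 2 goes Pending
```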

Check your node's resource allocation:

```bash
kubectl describe node | grep -i allocated -A7
```

If you have the Cluster Autoscaler, it is designed both to spin up new nodes and to consolidate existing workloads onto fewer nodes.

Unless you annotate a Pod with cluster-autoscaler.kubernetes.io/safe-to-evict: "false", the Cluster Autoscaler may evict workloads with low utilization in order to scale a node down as part of consolidation. So regardless of what a workload requests, if it is not actually using that CPU/memory, the autoscaler can evict it and let the scheduler place it again elsewhere.
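If you want to opt a workload out of that consolidation behaviour, the annotation goes on the Pod template. A minimal sketch (the Deployment name and image are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                 # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        # Tells the Cluster Autoscaler not to evict these Pods during scale-down
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    spec:
      containers:
        - name: app
          image: nginx         # placeholder image
```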

In your case:

Now, you can either decrease the resources requested by your workload so other Pods can be scheduled on the same node, letting you use the available capacity and save costs (a sketch of tuned requests follows the next paragraph).

Or, you can increase the size of your node to make room for the additional workloads.
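For the first option, here is a hedged sketch of requests tuned closer to the observed baseline while keeping the limits for bursts. The 250m and 1Gi figures are assumptions for illustration; derive yours from real metrics (the question mentions ~50m CPU at idle):

```yaml
resources:
  requests:
    cpu: "250m"       # assumption: comfortable headroom above the ~50m idle baseline
    memory: "1Gi"     # assumption: size this from the observed working set, not idle usage
  limits:
    cpu: "1500m"      # unchanged, still allows bursting
    memory: "3Gi"
```

With requests like these, two replicas reserve only 500m CPU in total, well within the ~1.9 vCPU allocatable, so both can bin-pack onto one t3.large. Be more conservative with memory: unlike CPU, it is not compressible, and a Pod exceeding its memory gets OOM-killed rather than throttled.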

Consider: kubeReserved, systemReserved & Eviction Threshold

Example:

  • Node has 32Gi of memory, 16 CPUs and 100Gi of Storage

  • kubeReserved is set to {cpu: 1000m, memory: 2Gi, ephemeral-storage: 1Gi}

  • systemReserved is set to {cpu: 500m, memory: 1Gi, ephemeral-storage: 1Gi}

  • evictionHard is set to {memory.available: "<500Mi", nodefs.available: "<10%"}

Under this scenario, ‘Allocatable’ will be 14.5 CPUs, 28.5Gi of memory, and 88Gi of local storage. The scheduler ensures that the total memory requests across all Pods on this node do not exceed 28.5Gi and that storage requests do not exceed 88Gi.
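On a self-managed node these reservations are set in the kubelet configuration. A minimal sketch with the example values above (on EKS, managed node groups apply their own defaults for you):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
kubeReserved:
  cpu: "1000m"
  memory: "2Gi"
  ephemeral-storage: "1Gi"
systemReserved:
  cpu: "500m"
  memory: "1Gi"
  ephemeral-storage: "1Gi"
evictionHard:
  memory.available: "500Mi"    # the "<" from the flag notation is implied here
  nodefs.available: "10%"
```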

I hope this gives you some insight and clarity.


Thank you @ImMnan

This information is very useful for efficiently managing and optimizing node utilization.