Hi,
Simple scenario I want to confirm.
Cluster information:
- Kubernetes version: 1.35 (EKS managed)
- Cloud: AWS EKS
- Installation method: AWS managed EKS
- Host OS: Amazon Linux
- CNI: Amazon VPC CNI (latest managed version)
- CRI: containerd (EKS default)
Setup:
- Node: t3.large (2 vCPU, ~1.9 vCPU allocatable after OS/kubelet reservation)
- Pod resource config:

```yaml
resources:
  requests:
    cpu: "1000m"
    memory: "2Gi"
  limits:
    cpu: "1500m"
    memory: "3Gi"
```
Scenario:
Pod 1 starts on the node. It requests 1000m, but its actual usage is very low, say 50m at idle. So in terms of real usage the node still has ~1.85 vCPU free.
Now Pod 2 (a second replica of the same workload) needs to be scheduled. It also requests 1000m.
From the scheduler's perspective, does it look at:
- (A) Actual current CPU usage (~0.05 vCPU used → ~1.85 free) → Pod 2 fits on the same node, or
- (B) Requested/reserved CPU (~1.0 vCPU reserved by Pod 1 → only ~0.9 remaining) → Pod 2 goes Pending, the Cluster Autoscaler provisions a new node, and Pod 2 lands on that second node?
My assumption is (B): the scheduler always goes by requests, never by actual usage. So even though the node is nearly idle, Pod 2 forces a new node.
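To make the arithmetic I'm describing concrete, here is a minimal sketch of the request-based accounting I believe the scheduler does (the numbers are the assumed values from my scenario, in millicores; `fits` is just my illustration, not a real scheduler API):

```python
# Sketch of request-based scheduling: compare the SUM of pod CPU *requests*
# on a node against its allocatable capacity. Live usage never enters into it.
ALLOCATABLE_MCPU = 1900  # ~1.9 vCPU allocatable on the t3.large (assumed)

def fits(existing_requests_mcpu: list[int], new_request_mcpu: int) -> bool:
    """Would a new pod's request fit within the node's remaining allocatable CPU?"""
    return sum(existing_requests_mcpu) + new_request_mcpu <= ALLOCATABLE_MCPU

# Pod 1 has already reserved 1000m, regardless of its ~50m actual usage:
print(fits([1000], 1000))  # False -> Pod 2 stays Pending on this node

# If requests were sized closer to real baseline usage (e.g. 200m), both fit:
print(fits([200], 200))    # True
```

Under this model, 1000m + 1000m = 2000m > 1900m allocatable, so Pod 2 cannot fit even though the node is nearly idle.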
Is this correct? And if so, is the standard recommendation to set requests closer to real baseline usage to allow better bin-packing?
Thanks