Hey there,
We are running an OpenShift cluster with NVIDIA GPUs. Since the GPUs don't support MIG and we don't have any vGPU licenses, we cannot limit the maximum vRAM usage per pod (by the way, if you have any ideas on how to do that, please let me know!).
To work around this limitation, I would like to at least take vRAM utilization into account during pod scheduling. I have already searched for this, but didn't find anything out of the box.
Therefore, it seems most suitable to me to extend the kube-scheduler so that it considers vRAM utilization. The information can be retrieved by querying Prometheus.
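For context, here is a minimal sketch of what pulling per-node GPU memory out of Prometheus could look like. It assumes the NVIDIA DCGM exporter's `DCGM_FI_DEV_FB_USED` metric (framebuffer memory used, in MiB) and the standard `/api/v1/query` instant-query API; the `Hostname` label and the query itself are assumptions about the setup, not something I've settled on:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// promResponse mirrors the relevant parts of a Prometheus instant-query
// response (/api/v1/query), which returns a vector of samples.
type promResponse struct {
	Status string `json:"status"`
	Data   struct {
		Result []struct {
			Metric map[string]string `json:"metric"`
			Value  [2]interface{}    `json:"value"` // [timestamp, "value-as-string"]
		} `json:"result"`
	} `json:"data"`
}

// vramByNode extracts a node -> used-vRAM map from a raw query response.
// "Hostname" is the label the DCGM exporter attaches; adjust it to whatever
// label identifies the node in your deployment (an assumption here).
func vramByNode(raw []byte) (map[string]float64, error) {
	var resp promResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, err
	}
	out := map[string]float64{}
	for _, r := range resp.Data.Result {
		var v float64
		// Prometheus encodes sample values as strings.
		if s, ok := r.Value[1].(string); ok {
			fmt.Sscanf(s, "%f", &v)
		}
		// Sum over multiple GPUs on the same node.
		out[r.Metric["Hostname"]] += v
	}
	return out, nil
}

func main() {
	// In a real extension you would GET something like:
	//   http://prometheus:9090/api/v1/query?query=sum(DCGM_FI_DEV_FB_USED) by (Hostname)
	// A canned response is parsed here to keep the sketch self-contained.
	raw := []byte(`{"status":"success","data":{"resultType":"vector","result":[
		{"metric":{"Hostname":"node-a"},"value":[1700000000,"4096"]},
		{"metric":{"Hostname":"node-b"},"value":[1700000000,"1024"]}]}}`)
	usage, err := vramByNode(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(usage["node-a"], usage["node-b"]) // 4096 1024
}
```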
How exactly can I extend the kube-scheduler? While researching the topic, I stumbled upon several different (and possibly outdated) blog posts and solutions: writing a scheduler framework plugin, using a webhook (scheduler extender), cloning the existing scheduler and making additional changes, …
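From what I can tell so far, the webhook route boils down to pure configuration plus a small HTTP service, with no fork or recompile of the scheduler. A sketch of the `KubeSchedulerConfiguration` for that option (the service name `vram-extender` is just a placeholder I made up):

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
extenders:
  - urlPrefix: "http://vram-extender.kube-system.svc:8080"
    prioritizeVerb: "prioritize"  # called to score the feasible nodes left after filtering
    weight: 1
    nodeCacheCapable: false
    ignorable: true               # scheduling keeps working if the extender is down
```

But I've also read that extenders are considered the legacy mechanism and that framework plugins are preferred nowadays, which is exactly why I'm asking.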
What's the preferred way to take vRAM into account as well? I want to keep all the default kube-scheduler mechanics; only when several feasible nodes remain after filtering should the node with the lowest vRAM utilization be chosen.
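To illustrate the scoring behavior I'm after, here is a framework-independent sketch: the node with the lowest vRAM usage gets the highest score. In a real Score plugin this normalization would map onto the scheduler framework's 0–100 node score range; the function and its inputs are hypothetical:

```go
package main

import "fmt"

// scoreNodes maps per-node used vRAM (e.g. MiB) to scores in [0, 100],
// where the node with the LOWEST usage gets the HIGHEST score, mirroring
// how a kube-scheduler Score plugin normalizes node scores.
func scoreNodes(usage map[string]float64) map[string]int64 {
	// Find the maximum usage to normalize against.
	var max float64
	for _, u := range usage {
		if u > max {
			max = u
		}
	}
	scores := map[string]int64{}
	for node, u := range usage {
		if max == 0 {
			scores[node] = 100 // no vRAM used anywhere: all nodes tie
			continue
		}
		scores[node] = int64(100 * (max - u) / max)
	}
	return scores
}

func main() {
	scores := scoreNodes(map[string]float64{
		"node-a": 4096, // busiest GPU node
		"node-b": 1024,
		"node-c": 0, // idle
	})
	fmt.Println(scores["node-a"], scores["node-b"], scores["node-c"]) // 0 75 100
}
```

The default scheduler would still do all the filtering; this would only break ties among the nodes that survive it.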
Best regards
Paul