Hey there,
We are running an OpenShift cluster with NVIDIA GPUs. Since the GPUs don't support MIG and we don't have any vGPU licenses, we cannot limit the maximum vRAM usage per pod (by the way, if you have any ideas on how to do that, please let me know!).
To work around this limitation, I would like to at least take vRAM utilization into account during pod scheduling. I have already searched for this, but didn't find anything out of the box.
Therefore, it seems most suitable to me to extend the kube-scheduler so that it considers vRAM utilization. The information can be retrieved by querying Prometheus.
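For context, here is a minimal sketch of what pulling per-node GPU memory out of Prometheus could look like. It assumes the NVIDIA DCGM exporter's `DCGM_FI_DEV_FB_USED` metric (framebuffer memory used, in MiB) and the standard `/api/v1/query` instant-query API; the `Hostname` label and the query itself are assumptions about the setup, not something I've settled on:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// promResponse mirrors the relevant parts of a Prometheus instant-query
// response (/api/v1/query), which returns a vector of samples.
type promResponse struct {
	Status string `json:"status"`
	Data   struct {
		Result []struct {
			Metric map[string]string `json:"metric"`
			Value  [2]interface{}    `json:"value"` // [timestamp, "value-as-string"]
		} `json:"result"`
	} `json:"data"`
}

// vramByNode extracts a node -> used-vRAM map from a raw query response.
// "Hostname" is the label the DCGM exporter attaches; adjust it to whatever
// label identifies the node in your deployment (an assumption here).
func vramByNode(raw []byte) (map[string]float64, error) {
	var resp promResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, err
	}
	out := map[string]float64{}
	for _, r := range resp.Data.Result {
		var v float64
		// Prometheus encodes sample values as strings.
		if s, ok := r.Value[1].(string); ok {
			fmt.Sscanf(s, "%f", &v)
		}
		// Sum over multiple GPUs on the same node.
		out[r.Metric["Hostname"]] += v
	}
	return out, nil
}

func main() {
	// In a real extension you would GET something like:
	//   http://prometheus:9090/api/v1/query?query=sum(DCGM_FI_DEV_FB_USED) by (Hostname)
	// A canned response is parsed here to keep the sketch self-contained.
	raw := []byte(`{"status":"success","data":{"resultType":"vector","result":[
		{"metric":{"Hostname":"node-a"},"value":[1700000000,"4096"]},
		{"metric":{"Hostname":"node-b"},"value":[1700000000,"1024"]}]}}`)
	usage, err := vramByNode(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(usage["node-a"], usage["node-b"]) // 4096 1024
}
```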
How exactly can I extend the kube-scheduler? While researching the topic, I stumbled upon several different (and possibly outdated) blog posts and solutions: writing a scheduler framework plugin, using a webhook (scheduler extender), cloning the existing scheduler and making additional changes, …
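From what I can tell so far, the webhook route boils down to pure configuration plus a small HTTP service, with no fork or recompile of the scheduler. A sketch of the `KubeSchedulerConfiguration` for that option (the service name `vram-extender` is just a placeholder I made up):

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
extenders:
  - urlPrefix: "http://vram-extender.kube-system.svc:8080"
    prioritizeVerb: "prioritize"  # called to score the feasible nodes left after filtering
    weight: 1
    nodeCacheCapable: false
    ignorable: true               # scheduling keeps working if the extender is down
```

But I've also read that extenders are considered the legacy mechanism and that framework plugins are preferred nowadays, which is exactly why I'm asking.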
What's the preferred way to take vRAM into account as well? I want to keep all the default kube-scheduler mechanics; only when several feasible nodes remain after filtering should the node with the lowest vRAM utilization be chosen.
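To illustrate the scoring behavior I'm after, here is a framework-independent sketch: the node with the lowest vRAM usage gets the highest score. In a real Score plugin this normalization would map onto the scheduler framework's 0–100 node score range; the function and its inputs are hypothetical:

```go
package main

import "fmt"

// scoreNodes maps per-node used vRAM (e.g. MiB) to scores in [0, 100],
// where the node with the LOWEST usage gets the HIGHEST score, mirroring
// how a kube-scheduler Score plugin normalizes node scores.
func scoreNodes(usage map[string]float64) map[string]int64 {
	// Find the maximum usage to normalize against.
	var max float64
	for _, u := range usage {
		if u > max {
			max = u
		}
	}
	scores := map[string]int64{}
	for node, u := range usage {
		if max == 0 {
			scores[node] = 100 // no vRAM used anywhere: all nodes tie
			continue
		}
		scores[node] = int64(100 * (max - u) / max)
	}
	return scores
}

func main() {
	scores := scoreNodes(map[string]float64{
		"node-a": 4096, // busiest GPU node
		"node-b": 1024,
		"node-c": 0, // idle
	})
	fmt.Println(scores["node-a"], scores["node-b"], scores["node-c"]) // 0 75 100
}
```

The default scheduler would still do all the filtering; this would only break ties among the nodes that survive it.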
Best regards
Paul