Gauging interest here!
Most existing Kubernetes schedulers (the default scheduler, Volcano, YuniKorn, Kueue, etc.) are still largely hardware-agnostic. This creates inefficiencies when running AI/ML workloads on specialized accelerators like GPUs, TPUs, Trainium, or Inferentia. The result: resource contention, GPU fragmentation, and unnecessary infrastructure costs.
I’m working on a new scheduler that will:
- Match jobs to hardware based on actual requirements (GPU memory, compute power, etc.); a rough sketch of this and the cost-aware piece follows the list.
- Support multi-job sharing on the same accelerator to improve throughput.
- Enable adaptive prioritization and preemption policies.
- Incorporate cloud pricing models for cost-aware scheduling (spot vs. on-demand).
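
To make the hardware-matching and cost-aware bullets concrete, here's a minimal sketch of how they could plug into the Kubernetes scheduler framework as a Filter + Score plugin. Everything specific here is a placeholder, not a committed design: the `AcceleratorFit` name and the `example.com/gpu-memory-gib` and `example.com/capacity-type` annotation/label keys are hypothetical, invented for illustration.

```go
package acceleratorfit

import (
	"context"
	"strconv"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

// Hypothetical keys, for illustration only; not an existing convention.
const (
	gpuMemAnnotation = "example.com/gpu-memory-gib" // set on the pod: per-GPU memory it needs
	gpuMemLabel      = "example.com/gpu-memory-gib" // set on the node: per-GPU memory it offers
	capacityLabel    = "example.com/capacity-type"  // set on the node: "spot" or "on-demand"
)

const Name = "AcceleratorFit"

type AcceleratorFit struct {
	handle framework.Handle
}

var (
	_ framework.FilterPlugin = &AcceleratorFit{}
	_ framework.ScorePlugin  = &AcceleratorFit{}
)

// New is the plugin factory; the exact signature varies across scheduler
// framework versions.
func New(_ runtime.Object, h framework.Handle) (framework.Plugin, error) {
	return &AcceleratorFit{handle: h}, nil
}

func (pl *AcceleratorFit) Name() string { return Name }

// Filter drops nodes whose GPUs have less memory than the pod declares it
// needs, instead of treating every "1 GPU" as interchangeable.
func (pl *AcceleratorFit) Filter(_ context.Context, _ *framework.CycleState, pod *v1.Pod, nodeInfo *framework.NodeInfo) *framework.Status {
	want, err := strconv.Atoi(pod.Annotations[gpuMemAnnotation])
	if err != nil {
		return nil // pod declares no GPU-memory requirement; nothing to filter on
	}
	have, err := strconv.Atoi(nodeInfo.Node().Labels[gpuMemLabel])
	if err != nil || have < want {
		return framework.NewStatus(framework.Unschedulable, "insufficient GPU memory")
	}
	return nil
}

// Score nudges placement toward spot capacity; a real implementation would
// pull live pricing from the cloud provider rather than a static label.
func (pl *AcceleratorFit) Score(_ context.Context, _ *framework.CycleState, _ *v1.Pod, nodeName string) (int64, *framework.Status) {
	nodeInfo, err := pl.handle.SnapshotSharedLister().NodeInfos().Get(nodeName)
	if err != nil {
		return 0, framework.AsStatus(err)
	}
	if nodeInfo.Node().Labels[capacityLabel] == "spot" {
		return framework.MaxNodeScore, nil
	}
	return 0, nil
}

func (pl *AcceleratorFit) ScoreExtensions() framework.ScoreExtensions { return nil }
```

In practice the filter data would come from a device plugin or node feature discovery rather than hand-set labels, and the score would fold in live pricing feeds; the sketch is just meant to show where these policies slot into the scheduling cycle.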
The plan is to release this as an open-source library and contribute it back to the K8s community, with active engagement at KubeCon and beyond. The goal is to maximize accelerator efficiency while reducing costs, creating real impact for AI/ML workloads at scale.
Would love to hear thoughts from the community—what pain points do you see today with GPU/accelerator scheduling?