I have 2 nodes, each with 2 GPUs (cuda_0 and cuda_1). I would like to schedule pods on a node only if it has sufficient GPU memory available.
So I set a custom resource limit for each GPU on each node by annotating the nodes:
k annotate node webserver1 cluster-autoscaler.kubernetes.io/resource.cuda_0=47000
k annotate node webserver1 cluster-autoscaler.kubernetes.io/resource.cuda_1=47000
k annotate node john-development cluster-autoscaler.kubernetes.io/resource.cuda_0=47000
k annotate node john-development cluster-autoscaler.kubernetes.io/resource.cuda_1=47000
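The annotations can be verified by dumping a node's metadata, e.g.:

k get node webserver1 -o jsonpath='{.metadata.annotations}'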
I specify how much of each of these resources is needed per pod:
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: StatefulSet
metadata:
  name: transcribe-worker-statefulset-name
spec:
  podManagementPolicy: Parallel
  replicas: 20
  selector:
    matchLabels:
      app: transcribe-worker-pod # has to match .spec.template.metadata.labels below
  serviceName: transcribe-worker-service # needed for service to assign dns entries for each pod
  template:
    metadata:
      labels:
        app: transcribe-worker-pod # has to match .spec.selector.matchLabels above
    spec:
      containers:
      - image: localhost:32000/transcribe_worker_health_monitor:2022-12-03-m
        name: transcribe-worker-health-monitor
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: '/health-of-health-monitor'
            port: 8080
          initialDelaySeconds: 300
          periodSeconds: 15
          failureThreshold: 3
          timeoutSeconds: 10
      - image: localhost:32000/transcribe_worker:2023-07-18-b
        name: transcribe-worker-container # container name inside of the pod
        ports:
        - containerPort: 55001
          name: name-b
        livenessProbe:
          httpGet:
            path: '/health-of-transcriber'
            port: 8080
          initialDelaySeconds: 300
          periodSeconds: 15
          failureThreshold: 3
          timeoutSeconds: 10
        env:
        - name: DEVICE
          value: "cuda:0" # "cuda:1" for the second GPU
        resources:
          requests:
            cuda_0: 2100
          limits:
            cuda_0: 2100
When I apply this YAML configuration, nothing gets scheduled: the pods stay Pending.
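To dig into why, the events of one of the stuck pods can be inspected (StatefulSet pods are named <statefulset-name>-<ordinal>, so the first replica here is):

k describe pod transcribe-worker-statefulset-name-0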
When I delete the resources section, the pods get launched across both nodes, but without regard to whether they actually fit.
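In case it is relevant, this is how I check which resources the scheduler actually sees on a node (annotations are stored separately from a node's capacity/allocatable):

k get node webserver1 -o jsonpath='{.status.allocatable}'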
What am I missing?
Any help is appreciated.