Custom resource limits for GPU memory

I have 2 nodes, each with 2 GPUs (cuda_0 and cuda_1). I would like to schedule pods on a node only if it has sufficient GPU memory available.

So I set a custom resource limit for each GPU on each node:
k annotate node webserver1 cluster-autoscaler.kubernetes.io/resource.cuda_0=47000
k annotate node webserver1 cluster-autoscaler.kubernetes.io/resource.cuda_1=47000
k annotate node john-development cluster-autoscaler.kubernetes.io/resource.cuda_0=47000
k annotate node john-development cluster-autoscaler.kubernetes.io/resource.cuda_1=47000
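
To double-check that the annotations were actually applied, they can be dumped back out of the node metadata (the jsonpath query below is just one way to check this; I have left the output out):

k get node webserver1 -o jsonpath='{.metadata.annotations}'
k get node john-development -o jsonpath='{.metadata.annotations}'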

I specify how much of each of these resources is needed per pod:

apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: StatefulSet
metadata:
  name: transcribe-worker-statefulset-name
spec:
  podManagementPolicy: Parallel
  replicas: 20
  selector:
    matchLabels:
      app: transcribe-worker-pod # has to match .spec.template.metadata.labels below
  serviceName: transcribe-worker-service # needed for service to assign dns entries for each pod
  template:
    metadata:
      labels:
        app: transcribe-worker-pod # has to match .spec.selector.matchLabels above
    spec:
      containers:
        - image: localhost:32000/transcribe_worker_health_monitor:2022-12-03-m
          name: transcribe-worker-health-monitor
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: '/health-of-health-monitor'
              port: 8080
            initialDelaySeconds: 300
            periodSeconds: 15
            failureThreshold: 3
            timeoutSeconds: 10

        - image: localhost:32000/transcribe_worker:2023-07-18-b
          name: transcribe-worker-container # container name inside of the pod
          ports:
            - containerPort: 55001
              name: name-b
          livenessProbe:
            httpGet:
              path: '/health-of-transcriber'
              port: 8080
            initialDelaySeconds: 300
            periodSeconds: 15
            failureThreshold: 3
            timeoutSeconds: 10

          env:
            - name: DEVICE
              value: "cuda:0" #"cuda:1" 

          resources:
            requests:
              cuda_0: 2100
            limits:
              cuda_0: 2100

I apply the YAML configuration, and nothing gets scheduled.
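
In case it helps with diagnosis, this is how the scheduler's reason for not placing one of the pods can be inspected (the pod name below just follows the StatefulSet's <name>-<ordinal> naming convention; I have not pasted the output here):

k describe pod transcribe-worker-statefulset-name-0
k get events --field-selector reason=FailedScheduling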

If I delete the resource specification, the pods do get launched across both nodes, but without regard to whether they actually fit.
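
My understanding is that the scheduler compares resource requests against what a node reports in status.capacity / status.allocatable, and I am not sure whether these annotations are supposed to end up there. This is how I would inspect what the nodes actually advertise (output omitted):

k get node webserver1 -o jsonpath='{.status.capacity}'
k get node webserver1 -o jsonpath='{.status.allocatable}'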

What am I missing?
Any help is appreciated.