I have 2 nodes, each with 2 GPUs (cuda_0 and cuda_1). I would like to schedule pods on a node only if it has sufficient GPU memory available.
So I set a custom resource limit for each GPU on each node by annotating the nodes:
k annotate node webserver1 cluster-autoscaler.kubernetes.io/resource.cuda_0=47000
k annotate node webserver1 cluster-autoscaler.kubernetes.io/resource.cuda_1=47000
k annotate node john-development cluster-autoscaler.kubernetes.io/resource.cuda_0=47000
k annotate node john-development cluster-autoscaler.kubernetes.io/resource.cuda_1=47000
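The annotations can be verified by dumping a node's metadata, e.g.:

k get node webserver1 -o jsonpath='{.metadata.annotations}'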
I specify how much of each of these resources is needed per pod:
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: StatefulSet
metadata:
  name: transcribe-worker-statefulset-name
spec:
  podManagementPolicy: Parallel
  replicas: 20
  selector:
    matchLabels:
      app: transcribe-worker-pod # has to match .spec.template.metadata.labels below
  serviceName: transcribe-worker-service # needed for service to assign dns entries for each pod
  template:
    metadata:
      labels:
        app: transcribe-worker-pod # has to match .spec.selector.matchLabels above
    spec:
      containers:
      - image: localhost:32000/transcribe_worker_health_monitor:2022-12-03-m
        name: transcribe-worker-health-monitor
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: '/health-of-health-monitor'
            port: 8080
          initialDelaySeconds: 300
          periodSeconds: 15
          failureThreshold: 3
          timeoutSeconds: 10
      - image: localhost:32000/transcribe_worker:2023-07-18-b
        name: transcribe-worker-container # container name inside of the pod
        ports:
        - containerPort: 55001
          name: name-b
        livenessProbe:
          httpGet:
            path: '/health-of-transcriber'
            port: 8080
          initialDelaySeconds: 300
          periodSeconds: 15
          failureThreshold: 3
          timeoutSeconds: 10
        env:
        - name: DEVICE
          value: "cuda:0" # "cuda:1" for the second GPU
        resources:
          requests:
            cuda_0: 2100
          limits:
            cuda_0: 2100
When I apply this YAML configuration, nothing gets scheduled: the pods stay Pending.
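To dig into why, the events of one of the stuck pods can be inspected (StatefulSet pods are named <statefulset-name>-<ordinal>, so the first replica here is):

k describe pod transcribe-worker-statefulset-name-0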
When I delete the resources section, the pods get launched across both nodes, but without regard to whether they actually fit.
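In case it is relevant, this is how I check which resources the scheduler actually sees on a node (annotations are stored separately from a node's capacity/allocatable):

k get node webserver1 -o jsonpath='{.status.allocatable}'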
What am I missing?
Any help is appreciated.