Setup
I am using microk8s 1.32.3. I have 2 nodes in my cluster: the first is my local system, which is the master node, and the other is a worker server with 2 GPUs (NVIDIA RTX 3080). I am deploying DeepStream instances in Pods using the manifest shown below. For multiple deployments I change the name and the necessary labels to app-deepstream-1, app-deepstream-2, and so on.
apiVersion: v1
kind: Pod
metadata:
  name: app-deepstream-1 # modify
  labels:
    name: app-deepstream-1 # modify
    family: app-deepstream
spec:
  restartPolicy: Always
  runtimeClassName: nvidia
  nodeSelector:
    nvidia.com/gpu.present: "true"
  containers:
    - name: app-ai
      image: 192.168.65.106:32000/nvcr.io/nvidia/deepstream
      securityContext:
        privileged: true
      imagePullPolicy: IfNotPresent
      tty: true
      resources:
        limits:
          nvidia.com/gpu: 1
      workingDir: /opt/app/ai-app-prod/
      command: ["bash", "run.sh"]
      volumeMounts:
        - name: app-volume
          mountPath: /opt/app/
  volumes:
    - name: app-volume
      persistentVolumeClaim:
        claimName: app-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: app-deepstream-svc-1 # modify
  labels:
    name: app-deepstream-svc-1 # modify
    family: app-deepstream
spec:
  type: NodePort
  selector:
    name: app-deepstream-1 # modify
  ports:
    - port: 9000       # ClusterIP port
      targetPort: 9000 # Container port
      protocol: TCP
I have enabled the gpu and registry add-ons in microk8s. The GPU node is labelled correctly, and I have checked that its MIG capability is marked as false (this will become important later).
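For reference, this is roughly how I check the labels and the advertised GPU count (a minimal sketch using the kubernetes Python client; it assumes the default kubeconfig is loaded and that the GPU operator's feature discovery has added the nvidia.com/* labels to the node):

from kubernetes import client, config

# Assumes cluster credentials are in the default kubeconfig
# (e.g. exported via `microk8s config`).
config.load_kube_config()
v1 = client.CoreV1Api()

for node in v1.list_node().items:
    labels = node.metadata.labels or {}
    if labels.get("nvidia.com/gpu.present") != "true":
        continue
    print("node:", node.metadata.name)
    # Label added by GPU feature discovery; "false" means MIG is not enabled/available.
    print("  nvidia.com/mig.capable:", labels.get("nvidia.com/mig.capable"))
    # Number of GPUs the device plugin advertises to the scheduler.
    print("  allocatable nvidia.com/gpu:", node.status.allocatable.get("nvidia.com/gpu"))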
I needed a custom auto-scaler that scales the number of Pods up or down based on the number of streams running in an instance. I used Python 3.10 with the kubernetes package for this. The upscale and downscale scripts simply modify the manifest template and deploy (or remove) the Pod and the Service in the cluster. I face no issues at all when I run these scripts; they are just wrappers around the v1.create_namespaced_pod() and v1.delete_namespaced_pod() calls provided by the Kubernetes library, placed in a try-except block.
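For context, the core of both scripts is essentially the sketch below (simplified, with placeholder file/function names; the manifest template is rendered to a single Pod document beforehand, and the Service is handled the same way with create_namespaced_service() / delete_namespaced_service()):

import yaml
from kubernetes import client, config
from kubernetes.client.rest import ApiException

config.load_kube_config()
v1 = client.CoreV1Api()

def scale_up(pod_manifest_path, namespace="default"):
    # The template has already been rendered with the new name/labels
    # (app-deepstream-N) and contains only the Pod document.
    with open(pod_manifest_path) as f:
        pod_body = yaml.safe_load(f)
    try:
        v1.create_namespaced_pod(namespace=namespace, body=pod_body)
    except ApiException as e:
        print(f"Pod creation failed: {e}")

def scale_down(pod_name, namespace="default"):
    try:
        v1.delete_namespaced_pod(name=pod_name, namespace=namespace)
    except ApiException as e:
        print(f"Pod deletion failed: {e}")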
Note: The DeepStream app config (present in the mount) specifies the GPU index, which is set to 0, with the expectation that a single GPU will be assigned and hence visible to the container.
Problem
The Pods and their respective Services are deployed without any problems, even when I deploy multiple Pods. Once multiple Pods are deployed and in the Running phase, I checked nvidia-smi on my GPU node and found that the DeepStream apps in the 2 deployed Pods are both running on the same GPU (0). The interesting thing is that when I run microk8s kubectl describe node <node>, I can see that 2 GPUs have been allocated, as shown below.
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                365m (2%)   0 (0%)
  memory             320Mi (0%)  5632Mi (4%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
  nvidia.com/gpu     2           2
Now when I go inside a container and check with nvidia-smi, I see that 2 GPUs are visible, as shown below.
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.230.02             Driver Version: 535.230.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3080        Off | 00000000:01:00.0 Off |                  N/A |
|  0%   34C    P8              16W / 340W |    564MiB / 10240MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 3080        Off | 00000000:03:00.0 Off |                  N/A |
|  0%   30C    P8              18W / 340W |     12MiB / 10240MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=========================================================================================|
+---------------------------------------------------------------------------------------+
Expectation
Each container that requests a GPU should be allocated a GPU exclusively and should run on that GPU only. I thought GPUs are not shared by default unless MIG or time-slicing is enabled explicitly.
- Why is this happening?
- Are there any changes to microk8s or the manifest of the Pod that might resolve this issue?
- Is this an issue with the Python Kubernetes client?
This is the first time I am using Kubernetes and microk8s, so I am unsure what the root problem might be.