NVIDIA_VISIBLE_DEVICES not being respected by the nvidia-container-runtime

Summary

Multiple GPUs are visible inside a container despite setting a limit of nvidia.com/gpu: 1 in the Kubernetes manifest.

Expected behaviour: each container that requests a GPU should be allocated one GPU exclusively and should run on that GPU only.

Setup

Here’s the setup in which I am getting this problem.
I am using microk8s 1.32.3 with 2 nodes in my cluster. The first is my local system, which is the master node; the other is a worker server with 2 GPUs (NVIDIA GeForce RTX 3080). I am deploying DeepStream instances as Pods using the manifest shown below. For multiple deployments I change the names and the necessary labels to app-deepstream-1, app-deepstream-2, and so on.

apiVersion: v1
kind: Pod
metadata:
  name: app-deepstream-1                                # modify
  labels:
    name: app-deepstream-1                              # modify
    family: app-deepstream
spec:
  restartPolicy: Always
  runtimeClassName: nvidia
  nodeSelector:
    nvidia.com/gpu.present: "true"
  containers:
    - name: app-ai
      image: 192.168.65.106:32000/nvcr.io/nvidia/deepstream
      securityContext:
        privileged: true
      imagePullPolicy: IfNotPresent
      tty: true
      resources:
        limits:
          nvidia.com/gpu: 1
      workingDir: /opt/app/ai-app-prod/
      command: ["bash", "run.sh"]
      volumeMounts:
        - name: app-volume
          mountPath: /opt/app/
  volumes:
    - name: app-volume
      persistentVolumeClaim:
        claimName: app-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: app-deepstream-svc-1                           # modify
  labels:
    name: app-deepstream-svc-1                         # modify
    family: app-deepstream
spec:
  type: NodePort
  selector:
    name: app-deepstream-1                             # modify
  ports:
    - port: 9000           # ClusterIP port
      targetPort: 9000     # Container port
      protocol: TCP

I have enabled the gpu and registry add-ons in microk8s. The GPU node is correctly labelled, and I have checked that the MIG capability is marked as false (this will become important later). A sketch of how the labels can be read back is shown below.
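
A rough sketch of how those labels can be verified with the Python kubernetes client (the node name gpu-worker is just a placeholder, and I am assuming the standard GPU feature discovery labels such as nvidia.com/gpu.present and nvidia.com/mig.capable):

from kubernetes import client, config

config.load_kube_config()   # or config.load_incluster_config() when run in-cluster
v1 = client.CoreV1Api()

# "gpu-worker" is a placeholder; substitute the actual name of the GPU node
node = v1.read_node("gpu-worker")

# Print only the NVIDIA-related labels added by GPU feature discovery,
# e.g. nvidia.com/gpu.present and nvidia.com/mig.capable
for key in sorted(node.metadata.labels):
    if key.startswith("nvidia.com/"):
        print(f"{key}={node.metadata.labels[key]}")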

I needed a custom auto-scaler that scales the number of Pods up or down based on the number of streams running in each instance. I used Python 3.10 with the kubernetes package for this. The upscale and downscale scripts just modify the manifest template and deploy or remove the Pod and the Service in the cluster. I face no issues at all when I run these scripts; they are just wrappers around v1.create_namespaced_pod() and v1.delete_namespaced_pod() from the Kubernetes client library, inside try-except blocks.
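
The scripts are essentially equivalent to the following minimal sketch (the template path and namespace here are illustrative, and the Service handling is omitted for brevity):

import copy

import yaml
from kubernetes import client, config
from kubernetes.client.rest import ApiException

config.load_kube_config()
v1 = client.CoreV1Api()

# Illustrative path: a Pod-only copy of the manifest template shown above
with open("deepstream-pod-template.yaml") as f:
    POD_TEMPLATE = yaml.safe_load(f)

def scale_up(index: int, namespace: str = "default") -> None:
    # Deploy app-deepstream-<index> from the manifest template
    pod = copy.deepcopy(POD_TEMPLATE)
    name = f"app-deepstream-{index}"
    pod["metadata"]["name"] = name
    pod["metadata"]["labels"]["name"] = name
    try:
        v1.create_namespaced_pod(namespace=namespace, body=pod)
    except ApiException as exc:
        print(f"Failed to create {name}: {exc.reason}")

def scale_down(index: int, namespace: str = "default") -> None:
    # Remove app-deepstream-<index>
    name = f"app-deepstream-{index}"
    try:
        v1.delete_namespaced_pod(name=name, namespace=namespace)
    except ApiException as exc:
        print(f"Failed to delete {name}: {exc.reason}")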

Note: the DeepStream app config (present in the mounted volume) specifies the GPU index, which is set to 0, with the expectation that a single GPU will be assigned and hence visible inside the container.

The Pods and their respective Services are deployed without any problems, even when I deploy multiple Pods. Once multiple Pods are deployed and in the Running phase, I checked nvidia-smi on the GPU node and found that the DeepStream apps in both deployed Pods are running on the same GPU (GPU 0).

The interesting thing is that when I run microk8s kubectl describe node <node>, I can see that 2 GPUs have been allocated, as shown below.

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                365m (2%)   0 (0%)
  memory             320Mi (0%)  5632Mi (4%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
  nvidia.com/gpu     2           2
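
The per-Pod limits can also be read back through the API to confirm that each Pod requests exactly one GPU (a rough sketch, reusing the family=app-deepstream label from the manifest and assuming the default namespace):

from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# The family=app-deepstream label comes from the manifest above
pods = v1.list_namespaced_pod(namespace="default",
                              label_selector="family=app-deepstream")
for pod in pods.items:
    for container in pod.spec.containers:
        limits = container.resources.limits or {}
        print(pod.metadata.name, limits.get("nvidia.com/gpu"))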

Now, when I go inside a container and run nvidia-smi, I see that both GPUs are visible, as shown below.

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.230.02             Driver Version: 535.230.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3080        Off | 00000000:01:00.0 Off |                  N/A |
|  0%   34C    P8              16W / 340W |    564MiB / 10240MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 3080        Off | 00000000:03:00.0 Off |                  N/A |
|  0%   30C    P8              18W / 340W |     12MiB / 10240MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

As stated above, each container that requests a GPU should get exclusive access to a single GPU and should run on that GPU only. I thought GPUs are not shared by default unless MIG or time-slicing is enabled explicitly.

There seems to be a communication gap between the nvidia-device-plugin and the nvidia-container-runtime.

Some further details

  1. I checked whether the nvidia-device-plugin was working correctly. To do this, I checked the value of the environment variable NVIDIA_VISIBLE_DEVICES inside each container. The values are unique in each container, as they should be:
nvidia-smi -L          # executed on the GPU host
GPU 0: NVIDIA GeForce RTX 3080 (UUID: GPU-e2a7d5f1-f4e4-9585-61ee-b64cce744228)
GPU 1: NVIDIA GeForce RTX 3080 (UUID: GPU-95f9f092-00da-bbb0-7d4e-180cd0f6d7ff)

# NVIDIA_VISIBLE_DEVICES in each container
app-deepstream-1: GPU-e2a7d5f1-f4e4-9585-61ee-b64cce744228
app-deepstream-2: GPU-95f9f092-00da-bbb0-7d4e-180cd0f6d7ff

Despite this, when I run nvidia-smi I can see both GPUs in both containers, which I don't believe should be happening. (One way to script this per-container check is sketched below.)
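
A sketch of the per-container check with the Python client (kubectl exec <pod> -- env works equally well; the default namespace is assumed):

from kubernetes import client, config
from kubernetes.stream import stream

config.load_kube_config()
v1 = client.CoreV1Api()

for pod_name in ("app-deepstream-1", "app-deepstream-2"):
    # Run `env` inside the (single) container and pick out NVIDIA_VISIBLE_DEVICES
    output = stream(v1.connect_get_namespaced_pod_exec,
                    pod_name, "default",
                    command=["env"],
                    stderr=True, stdin=False, stdout=True, tty=False)
    for line in output.splitlines():
        if line.startswith("NVIDIA_VISIBLE_DEVICES"):
            print(f"{pod_name}: {line}")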

  2. I was using the nvidia-container-runtime set up by the add-on. Since I also have nvidia-container-runtime installed on my host, I thought there might be some interference. Hence, I switched to the host runtime using the steps mentioned here. This did not solve the issue. I have attached containerd.toml and containerd-template.toml for reference (using the .txt extension, as the .toml extension is not allowed).

containerd.txt
containerd-template.txt

  3. I also came across these settings recommended by the official NVIDIA documentation for the GPU Operator and injected the options during install. However, the validator got stuck in the Init:3/4 state, so I could not verify whether this solved the issue.

  4. I also created a RuntimeClass (shown below) and selected it in the Pod manifest via runtimeClassName: nvidia. This too did not resolve the issue.

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia  # <-- This name MUST match what you put in runtimeClassName
handler: nvidia  # <-- This name MUST match a runtime defined in your containerd-template.toml
...
...
...

Inspection report tarball (generated after deploying the 2 Pods):

inspection-report-20250625_105200.tar.gz