0/1 nodes are available: 1 Insufficient nvidia.com/gpu

Todoroki02 · August 4, 2023, 5:51am

The pod I created is stuck in the Pending state, and kubectl describe pod shows this error:

root@ttogpu:~# kubectl describe pod triton-inference-server-5b6c7f889c-f54c6 
Name:             triton-inference-server-5b6c7f889c-f54c6
Namespace:        default
Priority:         0
Service Account:  default
Node:             <none>
Labels:           app=triton-inference-server
                  pod-template-hash=5b6c7f889c
Annotations:      <none>
Status:           Pending
IP:
IPs:              <none>
Controlled By:    ReplicaSet/triton-inference-server-5b6c7f889c
Containers:
  triton-server:
    Image:       triton_server:latest
    Ports:       8000/TCP, 8001/TCP, 8002/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP
    Limits:
      nvidia.com/gpu:  1
    Requests:
      nvidia.com/gpu:  1
    Environment:
      DP_DISABLE_HEALTHCHECKS:  xids
    Mounts:
      /models from model-repository (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sczwq (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  model-repository:
    Type:          HostPath (bare host directory volume)
    Path:          /path/to/host/model/directory
    HostPathType:
  kube-api-access-sczwq:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
                             nvidia.com/gpu:NoSchedule op=Exists
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  2m58s  default-scheduler  0/1 nodes are available: 1 Insufficient nvidia.com/gpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod..

Any suggestions on how to solve this error?

Yang_Dongshan · September 2, 2024, 1:09pm

I seem to have the same issue.
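
The scheduler message "Insufficient nvidia.com/gpu" means no node currently has enough allocatable nvidia.com/gpu to satisfy the pod's request of 1. A common cause (an assumption here, since the thread does not show the node's state) is that the node does not advertise the resource at all, for example because the NVIDIA device plugin is not running on it. As a rough sketch for checking this, the snippet below reads a node list shaped like the output of kubectl get nodes -o json and reports the allocatable GPU count per node. The function name allocatable_gpus and the sample node "node-a" are hypothetical, chosen only for illustration.

```python
import json

GPU_RESOURCE = "nvidia.com/gpu"


def allocatable_gpus(node_list):
    """Map node name -> allocatable GPU count (0 if the resource is not advertised)."""
    result = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        # Extended resources such as nvidia.com/gpu are reported as integer strings.
        result[name] = int(allocatable.get(GPU_RESOURCE, "0"))
    return result


# Hypothetical sample shaped like `kubectl get nodes -o json`
# for a single node that advertises no GPUs.
sample = json.loads(
    '{"items": [{"metadata": {"name": "node-a"},'
    ' "status": {"allocatable": {"cpu": "8"}}}]}'
)
print(allocatable_gpus(sample))  # {'node-a': 0}
```

To use it against a real cluster, replace the sample with the parsed output of kubectl get nodes -o json. A count of 0 for every node would be consistent with the FailedScheduling event above; a non-zero count would instead suggest the GPUs are already claimed by other pods.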
