Pods requiring resources advertised by a device plugin get stuck in the Pending state

Cluster information:

Kubernetes version:

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.5+k3s2", GitCommit:"724ef700bab896ff252a75e2be996d5f4ff1b842", GitTreeState:"clean", BuildDate:"2021-10-05T19:59:14Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.5+k3s2", GitCommit:"724ef700bab896ff252a75e2be996d5f4ff1b842", GitTreeState:"clean", BuildDate:"2021-10-05T19:59:14Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}

Cloud being used: bare-metal
Installation method: K3S
Host OS: Ubuntu 18.04.3 LTS
CNI and version:
CRI and version:

Client: Docker Engine - Community
 Version:           20.10.11
 API version:       1.41
 Go version:        go1.16.9
 Git commit:        dea9396
 Built:             Thu Nov 18 00:37:08 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.11
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.9
  Git commit:       847da18
  Built:            Thu Nov 18 00:35:16 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.12
  GitCommit:        7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Hi,

I built a proof-of-concept Python device plugin based on the template in this repo, and I was able to deploy it as a DaemonSet on my K3s cluster.

example-device-plugin-ds-r85dg                1/1     Running     15         19h

The plugin registration succeeds and ListAndWatch is called, as shown by the device plugin pod logs:

kubectl logs example-device-plugin-ds-nffh6 -n kube-system
DEBUG:asyncio:Using selector: EpollSelector
INFO:example-device-plugin.server:DevicePluginServicer::__init__
INFO:example-device-plugin.server:Listening to gRPC request from /var/lib/kubelet/device-plugins/example-plugin.sock
INFO:example-device-plugin.server:Registering Device Plugin to Kubelet
DEBUG:example-device-plugin.server:Trying to connect to gRPC Channel /var/lib/kubelet/device-plugins/kubelet.sock
DEBUG:example-device-plugin.server:Created gRPC Client
DEBUG:example-device-plugin.server:Trying to send request version: "v1beta1"
endpoint: "example-plugin.sock"
resource_name: "example.ai/foo"
DEBUG:example-device-plugin.server:Device plugin registered. Closing channel
INFO:example-device-plugin.server:gRPC::ListAndWatch
INFO:example-device-plugin.server:foo: 8
DEBUG:example-device-plugin.server:gRPC::ListAndWatch device:
DEBUG:example-device-plugin.server:ID: "668f39ae-4ef9-4fc6-826a-2e4af4594685"
health: "Healthy"
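
For context, the servicer is roughly the following shape (a simplified, grpcio-style sketch; my actual code uses grpclib/asyncio, and module names like api_pb2 are just whatever grpc_tools.protoc generated from api.proto):

from concurrent import futures
import time

import grpc
import api_pb2        # generated from api.proto (v1beta1 device plugin API)
import api_pb2_grpc

KUBELET_SOCK = "unix:///var/lib/kubelet/device-plugins/kubelet.sock"
PLUGIN_SOCK = "unix:///var/lib/kubelet/device-plugins/example-plugin.sock"


def register_with_kubelet():
    # One-shot Register call against the kubelet's registration socket.
    with grpc.insecure_channel(KUBELET_SOCK) as channel:
        stub = api_pb2_grpc.RegistrationStub(channel)
        stub.Register(api_pb2.RegisterRequest(
            version="v1beta1",
            endpoint="example-plugin.sock",  # relative to the device-plugins dir
            resource_name="example.ai/foo",
        ))


class DevicePluginServicer(api_pb2_grpc.DevicePluginServicer):
    def ListAndWatch(self, request, context):
        # Send the device list once, then keep the stream open.
        devices = [api_pb2.Device(ID=str(i), health="Healthy") for i in range(8)]
        yield api_pb2.ListAndWatchResponse(devices=devices)
        while True:
            time.sleep(10)  # re-yield here if device health changes


def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    api_pb2_grpc.add_DevicePluginServicer_to_server(DevicePluginServicer(), server)
    server.add_insecure_port(PLUGIN_SOCK)
    server.start()
    register_with_kubelet()  # register only after the plugin socket is listening
    server.wait_for_termination()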

From the node description I can also see that the resource is allocatable:

Capacity:
  cpu:                128
  ephemeral-storage:  921378636Ki
  hugepages-1Gi:      264Gi
  memory:             1055395216Ki
  pods:               110
  example.ai/foo:   8
Allocatable:
  cpu:                128
  ephemeral-storage:  896317136398
  hugepages-1Gi:      264Gi
  memory:             778571152Ki
  pods:               110
  example.ai/foo:   8

But when I submit a Job that requires 1 foo device, the pod is scheduled successfully, yet the container never starts and the pod stays stuck in the Pending state.

test.yaml:

apiVersion: batch/v1
kind: Job
metadata:
  generateName: hello-node-
spec:
  template:
    spec:
      containers:
        - name: hello-node
          image: k8s.gcr.io/echoserver:1.4
          command: ["pwd"]
          resources:
            limits:
              example.ai/foo: "1"
      restartPolicy: Never

The pod stays Pending:

hello-node-hfvrx-2snfn   0/1     Pending     0          49s

description:

Name:           hello-node-hfvrx-2snfn
Namespace:      default
Priority:       0
Node:           host/
Labels:         controller-uid=540b61e1-894e-438a-af55-700a79dd6bfa
                job-name=hello-node-hfvrx
Annotations:    <none>
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  Job/hello-node-hfvrx
Containers:
  hello-node:
    Image:      k8s.gcr.io/echoserver:1.4
    Port:       <none>
    Host Port:  <none>
    Command:
      pwd
    Limits:
      example.ai/foo:  1
    Requests:
      example.ai/foo:  1
    Environment:         <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qszlx (ro)
Conditions:
  Type           Status
  PodScheduled   True 
Volumes:
  kube-api-access-qszlx:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  98s   default-scheduler  Successfully assigned default/hello-node-hfvrx-2snfn to host

The same pod runs to completion just fine without the example.ai/foo: 1 resource requirement, so it is not a problem with the general cluster setup.
kubectl describe node shows:

Taints:             <none>
Unschedulable:      false

Another interesting thing: after one pod that requires the resource goes into the Pending state, any additional Jobs I submit that do NOT require the resource also get stuck in Pending. This leads me to believe that the scheduler is somehow hung up by the plugin.

Once the job is scheduled, I was expecting additional logs to show up in the device plugin pod, in particular calls to the Allocate function. But the device plugin pod stays alive and running without producing any new logs.

This leads me to suspect that the kubelet is not calling the Allocate function, or ran into an issue while doing so. However, neither kubectl get events nor the pod description shows that any error has occurred.
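
For reference, the Allocate handler I expected the kubelet to call is shaped roughly like this (same simplified grpcio-style sketch as above; field names follow the v1beta1 api.proto, and the env var is purely illustrative):

class DevicePluginServicer(api_pb2_grpc.DevicePluginServicer):
    def Allocate(self, request, context):
        responses = []
        for container_req in request.container_requests:
            # This log line is what never shows up.
            print("Allocate called for devices", list(container_req.devicesIDs))
            responses.append(api_pb2.ContainerAllocateResponse(
                envs={"EXAMPLE_FOO_DEVICES": ",".join(container_req.devicesIDs)},
            ))
        return api_pb2.AllocateResponse(container_responses=responses)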

While the hello pod is Pending, the node description shows that 1 foo device is allocated, but the total allocatable remains 8:

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests   Limits
  --------           --------   ------
  cpu                100m (0%)  0 (0%)
  memory             70Mi (0%)  170Mi (0%)
  ephemeral-storage  0 (0%)     0 (0%)
  hugepages-1Gi      0 (0%)     0 (0%)
  example.ai/foo   1          1

Then I noticed that the most recent api.proto is slightly different from the one listed in this repo, with the addition of GetPreferredAllocation and its supporting request/response protobuf message definitions.
So I updated api.proto to match and regenerated the gRPC files, but I still run into exactly the same situation.
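
As far as I understand, the additions boil down to something like this (a minimal sketch based on my reading of the newer api.proto; I may be missing details):

class DevicePluginServicer(api_pb2_grpc.DevicePluginServicer):
    def GetDevicePluginOptions(self, request, context):
        # Tell the kubelet we support neither preferred allocation nor pre-start.
        return api_pb2.DevicePluginOptions(
            pre_start_required=False,
            get_preferred_allocation_available=False,
        )

    def GetPreferredAllocation(self, request, context):
        # Should never be called while the option above is False.
        return api_pb2.PreferredAllocationResponse()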

I'm using K3s to manage my single-node cluster, and I can't seem to find a way to enable debug-level kubelet logs to see where it gets stuck.
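
The only approach I can think of is passing kubelet flags through K3s itself, roughly as below, but I have not verified that this surfaces the device-manager logs I need (assumption: K3s was installed via the standard install script, i.e. there is a systemd unit named k3s and /etc/rancher/k3s/config.yaml is honored):

# Raise kubelet verbosity by passing the flag through K3s, e.g. in
# /etc/rancher/k3s/config.yaml:
#
#   kubelet-arg:
#     - "v=4"
#
# (equivalently: start k3s with --kubelet-arg=v=4), then restart:
sudo systemctl restart k3s

# The kubelet runs inside the k3s process, so its logs end up in the k3s journal:
sudo journalctl -u k3s -f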

The system is Ubuntu 18.04.3 LTS, and I'm using a Python 3.7 base image in my Dockerfile.
The other dependencies are installed with pip:

grpclib==0.4.2
grpcio==1.42.0
grpcio-tools==1.42.0
google-api-python-client==2.31.0

Any pointers or advice would be appreciated.

Thank you!