Kubernetes version: "v1.21.1+k3s-2d1556d1-dirty" and "v1.11.0+d4cacc0"
Cloud being used: bare-metal.
Installation method: ??
Host OS: Centos7 / atomic
CNI and version: ??
CRI and version: ??
I wrote a device-plugin a few years ago, and it has worked well. Until now, our main responsibility was running tenants under kubevirt. A tenant includes a resource-group id and the number of devices it requires. The plugin eventually gets an AllocateRequest from kubelet, makes adjustments to the PCIe devices that the resource-ids refer to, and produces an AllocateResponse. This works fine.
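For context, our Allocate path looks roughly like the sketch below. This is a minimal stand-alone sketch, not our actual plugin: the request/response structs are trimmed-down stand-ins for the real k8s.io/kubelet device-plugin v1beta1 messages, and configurePCIDevice is a hypothetical placeholder for the PCIe adjustments we make per resource-id.

```go
package main

import "fmt"

// Trimmed stand-ins for the device-plugin API messages
// (the real ones live in k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1).
type ContainerAllocateRequest struct{ DevicesIDs []string }
type AllocateRequest struct{ ContainerRequests []*ContainerAllocateRequest }
type ContainerAllocateResponse struct{ Envs map[string]string }
type AllocateResponse struct{ ContainerResponses []*ContainerAllocateResponse }

// configurePCIDevice is a hypothetical placeholder for the per-device
// PCIe adjustments our plugin performs for each assigned resource-id.
func configurePCIDevice(id string) error {
	fmt.Println("configuring", id)
	return nil
}

// allocate mirrors the kubelet -> plugin Allocate call: for each
// container, adjust the PCIe devices behind the requested resource-ids,
// then hand anything the container needs back (env vars in this sketch).
func allocate(req *AllocateRequest) (*AllocateResponse, error) {
	resp := &AllocateResponse{}
	for _, cr := range req.ContainerRequests {
		for _, id := range cr.DevicesIDs {
			if err := configurePCIDevice(id); err != nil {
				return nil, err
			}
		}
		resp.ContainerResponses = append(resp.ContainerResponses,
			&ContainerAllocateResponse{Envs: map[string]string{
				"ASSIGNED_DEVICE_IDS": fmt.Sprint(cr.DevicesIDs),
			}})
	}
	return resp, nil
}

func main() {
	req := &AllocateRequest{ContainerRequests: []*ContainerAllocateRequest{
		{DevicesIDs: []string{"pcidev-0", "pcidev-1"}},
	}}
	resp, err := allocate(req)
	if err != nil {
		panic(err)
	}
	fmt.Println("containers in response:", len(resp.ContainerResponses))
}
```

The key point is that the plugin only ever sees devices when kubelet chooses to call Allocate; it has no say in when that call happens.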
Recently we added support for starting pods that do not use a kubevirt/hypervisor. Everything works during normal run: pod deploys, we get an AllocateRequest, we do our adjustments to the pci devices, and return an AllocateResponse. Everyone is happy.
And then someone went and rebooted.
The kubevirt pods came up, we got an AllocateRequest, and we did our normal thing; all the kubevirt pods worked as expected. But the non-kubevirt pods came back up without triggering a new AllocateRequest at all. That was a surprise, since until now we had only ever dealt with kubevirt pods, which do generate AllocateRequests after a reboot.
My first concern was: after a reboot, could our kubevirt pods ever be assigned resource-ids that are presumably still held by the non-kubevirt pods?
To test this, I created my non-kubevirt pod, deployed it, and saved the resources list from the AllocateRequest our plugin received. Then I rebooted.
After reboot, I deployed a new kubevirt pod requesting the MAXIMUM allowed resources. The plugin got the AllocateRequest and, to my complete surprise, it skipped the resource-ids assigned to the non-kubevirt pod.
So here are my questions.
- Where does k8s serialize the resource-ids assigned to pods?
- How can I get to the list? I’d like to see the pod name and the specific resource-ids that went to the pod.
- How come a "redeploy" of a kubevirt pod results in a new AllocateRequest, but a "redeploy" of the other pod does not? Is there some magic in kubevirt that we are not leveraging correctly in our pod deployment, such that our non-kubevirt pod never triggers an AllocateRequest?