Pod startup time increases as pod count on node increases

Hi!

I’m new here so bear with me please.

I’m having issues with pods taking in excess of 10 minutes between node allocation and PVC mounting, and the delay grows as the pod count on a single node increases. CPU/memory on the node is not an issue: this is an 80-core, 600 GB node and most pods are idle. My suspicion is that it’s related to the mount count (mount | wc -l = 2700+). If someone has a direct answer that would be amazing, but my real question is: where do I go next to diagnose this issue? I see a 2-minute gap between “SyncLoop ADD” and the first mount error, and a 6-minute gap before the mount even appears to start. What in the world is happening in those 2 minutes? Both of these are “local” mounts within the cluster.

The following is an export from my kubelet logs:

Jan 20 18:35:51 aks-e80ids-62052009-vmss00000F kubelet[18395]: I0120 18:35:51.322846   18395 kubelet.go:1932] "SyncLoop ADD" source="api" pods=[default/node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s]
Jan 20 18:37:54 aks-e80ids-62052009-vmss00000F kubelet[18395]: E0120 18:37:54.379648   18395 kubelet.go:1701] "Unable to attach or mount volumes for pod; skipping pod" err="unmounted volumes=[host-root kube-api-access-pnkpf], unattached volumes=[host-root kube-api-access-pnkpf]: timed out waiting for the condition" pod="default/node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s"
Jan 20 18:37:54 aks-e80ids-62052009-vmss00000F kubelet[18395]: E0120 18:37:54.379699   18395 pod_workers.go:190] "Error syncing pod, skipping" err="unmounted volumes=[host-root kube-api-access-pnkpf], unattached volumes=[host-root kube-api-access-pnkpf]: timed out waiting for the condition" pod="default/node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s" podUID=557358f5-19f4-4f7d-94fe-eb7bc6155040
Jan 20 18:40:08 aks-e80ids-62052009-vmss00000F kubelet[18395]: E0120 18:40:08.313649   18395 kubelet.go:1701] "Unable to attach or mount volumes for pod; skipping pod" err="unmounted volumes=[host-root kube-api-access-pnkpf], unattached volumes=[host-root kube-api-access-pnkpf]: timed out waiting for the condition" pod="default/node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s"
Jan 20 18:40:08 aks-e80ids-62052009-vmss00000F kubelet[18395]: E0120 18:40:08.313694   18395 pod_workers.go:190] "Error syncing pod, skipping" err="unmounted volumes=[host-root kube-api-access-pnkpf], unattached volumes=[host-root kube-api-access-pnkpf]: timed out waiting for the condition" pod="default/node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s" podUID=557358f5-19f4-4f7d-94fe-eb7bc6155040
Jan 20 18:42:10 aks-e80ids-62052009-vmss00000F kubelet[18395]: I0120 18:42:10.237083   18395 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-pnkpf\" (UniqueName: \"kubernetes.io/projected/557358f5-19f4-4f7d-94fe-eb7bc6155040-kube-api-access-pnkpf\") pod \"node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s\" (UID: \"557358f5-19f4-4f7d-94fe-eb7bc6155040\") "
Jan 20 18:42:10 aks-e80ids-62052009-vmss00000F kubelet[18395]: I0120 18:42:10.237204   18395 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"host-root\" (UniqueName: \"kubernetes.io/host-path/557358f5-19f4-4f7d-94fe-eb7bc6155040-host-root\") pod \"node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s\" (UID: \"557358f5-19f4-4f7d-94fe-eb7bc6155040\") "
Jan 20 18:42:10 aks-e80ids-62052009-vmss00000F kubelet[18395]: I0120 18:42:10.340402   18395 reconciler.go:269] "operationExecutor.MountVolume started for volume \"kube-api-access-pnkpf\" (UniqueName: \"kubernetes.io/projected/557358f5-19f4-4f7d-94fe-eb7bc6155040-kube-api-access-pnkpf\") pod \"node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s\" (UID: \"557358f5-19f4-4f7d-94fe-eb7bc6155040\") "
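(For reference, this is roughly how I pulled those lines and the mount count. It assumes the kubelet runs as the systemd unit “kubelet”, and uses this example pod’s name, so substitute your own:)

# Kubelet entries for one pod, to see the gap between "SyncLoop ADD" and the first mount activity
journalctl -u kubelet --since "2022-01-20 18:30" | grep node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s

# Current mount count on the node
mount | wc -l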

Cluster information:

Kubernetes version: 1.21.2
Cloud being used: AKS
Installation method: Bicep template
Host OS:
CNI and version: Azure CNI? (I don’t know how to answer these 2 questions)
CRI and version: ContainerD?

Kernel-wise there shouldn’t be a problem supporting this number of mounts (I think the limit is around 100k).
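If you want to double-check that on your node, recent kernels expose the per-namespace mount cap as a sysctl (if the file doesn’t exist on your kernel, skip this), so you can compare it against the current count:

cat /proc/sys/fs/mount-max
mount | wc -l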

With that said, a few questions would help in understanding what causes the delays:

  1. Are you using a storage class or the Azure CSI driver?
  2. If using the Azure CSI driver, are you using Azure Disk or Azure Files PVs? (Enable Container Storage Interface (CSI) drivers on Azure Kubernetes Service (AKS) - Azure Kubernetes Service | Microsoft Docs)
  3. Can you paste the output of

journalctl -f

while the pods are redeploying? It should expose a bit more about the mount process than the kubelet logs do.
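If the raw journalctl -f output is too noisy, narrowing it to the kubelet and container runtime units should still show the mount path (the unit names here assume the standard systemd setup on the node):

journalctl -f -u kubelet -u containerd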

Thanks for taking the time to respond!

  1. The 80 pods are using NFS mounts, with no storage class on most of them, along with an emptyDir mount. The example pod in the log is just using a hostPath mount and a ConfigMap. No Azure Files or Azure Disks. Again, the mount count is just a suspicion of mine (a rough count of the NFS mounts is below); I’m an expert beginner to be sure.
  2. Neither
  3. Here is a link to a gist with the journalctl log for a simple pod (node-debugger-aks-e80ids-62052009-vmss00000f-qj4cd) with a host mount and config map. There are a boatload of CSI registration probes running; I can filter them out if you would like (e.g. with the grep shown below): journalctl gist
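(As mentioned above, here is roughly how I’m counting the NFS mounts, and how the registrar probes could be filtered out of the journalctl output; the grep patterns are just what matches on my node:)

# Total mounts vs. NFS mounts (nfs and nfs4)
mount | wc -l
mount | grep -cE ' type nfs4? '

# journalctl with the CSI registration probe noise filtered out
journalctl -u kubelet -u containerd | grep -v csi-node-driver-registrar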

From your logs:

Jan 21 04:15:07 aks-e80ids-62052009-vmss00000F containerd[21328]: time="2022-01-21T04:15:07.958743390Z" level=info msg="ExecSync for \"1f35f9d1be57b4d81665ca9b3940f260619e2973b1ef6b65f65e0b812562a2d5\" with command [/csi-node-driver-registrar --kubelet-registration-path=/var/lib/kubelet/plugins/disk.csi.azure.com/csi.sock --mode=kubelet-registration-probe] and timeout 30 (s)"

  1. You are using the Azure Disk CSI driver (/var/lib/kubelet/plugins/disk.csi.azure.com) :wink:
  2. I would go through this; maybe your cap is the IOPS. From the logs it seems the CSI driver is timing out (a quick check for whether any volumes actually go through the disk driver is below).
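Something along these lines should confirm whether anything actually uses the disk driver (assuming you have kubectl access to the cluster; adjust the node name if the casing differs):

# Any PVs backed by the Azure Disk CSI driver?
kubectl get pv -o yaml | grep disk.csi.azure.com

# Any volumes currently attached to that node?
kubectl get volumeattachments | grep -i aks-e80ids-62052009-vmss00000f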

The CSI driver is installed, but I’m not actually using any volumes that go through it.

Ok. If you perform a manual mount while the node is in this “loaded” state… how long does it take?
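Something like this would do it; the server address and export path are placeholders, so use one of the NFS shares your pods already mount:

# Time a manual NFS mount on the node while it is in that state
mkdir -p /mnt/mount-test
time mount -t nfs <nfs-server>:/<export> /mnt/mount-test
umount /mnt/mount-test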