Hi!
I’m new here so bear with me please.
I’m having issues with pods taking in excess of 10 minutes between node allocation and pvc mounting as the pod count on a single node increases. CPU\Memory on the node is not an issue, this is an 80 core 600 GB node and most pods are idle. My suspicion is it’s related to mount count. (mount | wc -l = 2700+) If someone has a direct answer that would be amazing, but my real question is where do I go next to try and diagnose this issue? I see a 2 minute gap between “SyncLoop Add” and the first mount error. A 6 minute gap before the mount appears to even be started? What in the world is happening in that 2 minutes? These are both “local” mounts to the cluster?
The following is an export from my kubelet logs:
Jan 20 18:35:51 aks-e80ids-62052009-vmss00000F kubelet[18395]: I0120 18:35:51.322846 18395 kubelet.go:1932] "SyncLoop ADD" source="api" pods=[default/node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s]
Jan 20 18:37:54 aks-e80ids-62052009-vmss00000F kubelet[18395]: E0120 18:37:54.379648 18395 kubelet.go:1701] "Unable to attach or mount volumes for pod; skipping pod" err="unmounted volumes=[host-root kube-api-access-pnkpf], unattached volumes=[host-root kube-api-access-pnkpf]: timed out waiting for the condition" pod="default/node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s"
Jan 20 18:37:54 aks-e80ids-62052009-vmss00000F kubelet[18395]: E0120 18:37:54.379699 18395 pod_workers.go:190] "Error syncing pod, skipping" err="unmounted volumes=[host-root kube-api-access-pnkpf], unattached volumes=[host-root kube-api-access-pnkpf]: timed out waiting for the condition" pod="default/node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s" podUID=557358f5-19f4-4f7d-94fe-eb7bc6155040
Jan 20 18:40:08 aks-e80ids-62052009-vmss00000F kubelet[18395]: E0120 18:40:08.313649 18395 kubelet.go:1701] "Unable to attach or mount volumes for pod; skipping pod" err="unmounted volumes=[host-root kube-api-access-pnkpf], unattached volumes=[host-root kube-api-access-pnkpf]: timed out waiting for the condition" pod="default/node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s"
Jan 20 18:40:08 aks-e80ids-62052009-vmss00000F kubelet[18395]: E0120 18:40:08.313694 18395 pod_workers.go:190] "Error syncing pod, skipping" err="unmounted volumes=[host-root kube-api-access-pnkpf], unattached volumes=[host-root kube-api-access-pnkpf]: timed out waiting for the condition" pod="default/node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s" podUID=557358f5-19f4-4f7d-94fe-eb7bc6155040
Jan 20 18:42:10 aks-e80ids-62052009-vmss00000F kubelet[18395]: I0120 18:42:10.237083 18395 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-pnkpf\" (UniqueName: \"kubernetes.io/projected/557358f5-19f4-4f7d-94fe-eb7bc6155040-kube-api-access-pnkpf\") pod \"node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s\" (UID: \"557358f5-19f4-4f7d-94fe-eb7bc6155040\") "
Jan 20 18:42:10 aks-e80ids-62052009-vmss00000F kubelet[18395]: I0120 18:42:10.237204 18395 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"host-root\" (UniqueName: \"kubernetes.io/host-path/557358f5-19f4-4f7d-94fe-eb7bc6155040-host-root\") pod \"node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s\" (UID: \"557358f5-19f4-4f7d-94fe-eb7bc6155040\") "
Jan 20 18:42:10 aks-e80ids-62052009-vmss00000F kubelet[18395]: I0120 18:42:10.340402 18395 reconciler.go:269] "operationExecutor.MountVolume started for volume \"kube-api-access-pnkpf\" (UniqueName: \"kubernetes.io/projected/557358f5-19f4-4f7d-94fe-eb7bc6155040-kube-api-access-pnkpf\") pod \"node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s\" (UID: \"557358f5-19f4-4f7d-94fe-eb7bc6155040\") "
Cluster information:
Kubernetes version: 1.21.2
Cloud being used: AKS
Installation method: Bicep template
Host OS:
CNI and version: Azure CNI? (I don’t know how to answer these 2 questions)
CRI and version: ContainerD?