Cluster information:
Kubernetes version: 1.24.2
Cloud being used: Google GKE in Singapore (asia-southeast1)
Installation method: Created with GKE web console
Host OS: Container-Optimized OS (cos_containerd) on arm64 nodes
CNI and version: GKE v1.24.2-gke.1900
CRI and version: containerd 1.6.2 (GKE v1.24.2-gke.1900)
Hey there.
I created a brand-new cluster on GKE with the new T2A arm64 instances and noticed that the pods I created were stuck in the Pending state.
I had a look at the nodes' information and found that the new nodes carry a kubernetes.io/arch=arm64:NoSchedule taint. Here is the full kubectl describe node output for one of them:
Name: gke-arm64-node-test-default-pool-e01e1958-l2mr
Roles: <none>
Labels: beta.kubernetes.io/arch=arm64
beta.kubernetes.io/instance-type=t2a-standard-8
beta.kubernetes.io/os=linux
cloud.google.com/gke-boot-disk=pd-balanced
cloud.google.com/gke-container-runtime=containerd
cloud.google.com/gke-cpu-scaling-level=8
cloud.google.com/gke-logging-variant=DEFAULT
cloud.google.com/gke-max-pods-per-node=110
cloud.google.com/gke-nodepool=default-pool
cloud.google.com/gke-os-distribution=cos
cloud.google.com/gke-spot=true
cloud.google.com/machine-family=t2a
cloud.google.com/private-node=false
failure-domain.beta.kubernetes.io/region=asia-southeast1
failure-domain.beta.kubernetes.io/zone=asia-southeast1-c
kubernetes.io/arch=arm64
kubernetes.io/hostname=gke-arm64-node-test-default-pool-e01e1958-l2mr
kubernetes.io/os=linux
node.kubernetes.io/instance-type=t2a-standard-8
topology.gke.io/zone=asia-southeast1-c
topology.kubernetes.io/region=asia-southeast1
topology.kubernetes.io/zone=asia-southeast1-c
Annotations: container.googleapis.com/instance_id: 3431168218254450398
csi.volume.kubernetes.io/nodeid:
{"pd.csi.storage.gke.io":"projects/<PROJECT_NAME>/zones/asia-southeast1-c/instances/gke-arm64-node-test-default-pool-e01e1958-l2mr"}
node.alpha.kubernetes.io/ttl: 0
node.gke.io/last-applied-node-labels:
cloud.google.com/gke-boot-disk=pd-balanced,cloud.google.com/gke-container-runtime=containerd,cloud.google.com/gke-cpu-scaling-level=8,clou...
node.gke.io/last-applied-node-taints: kubernetes.io/arch=arm64:NoSchedule
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Thu, 25 Aug 2022 16:12:13 +0900
Taints: kubernetes.io/arch=arm64:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: gke-arm64-node-test-default-pool-e01e1958-l2mr
AcquireTime: <unset>
RenewTime: Thu, 25 Aug 2022 16:21:35 +0900
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
FrequentDockerRestart False Thu, 25 Aug 2022 16:17:16 +0900 Thu, 25 Aug 2022 16:12:15 +0900 NoFrequentDockerRestart docker is functioning properly
FrequentContainerdRestart False Thu, 25 Aug 2022 16:17:16 +0900 Thu, 25 Aug 2022 16:12:15 +0900 NoFrequentContainerdRestart containerd is functioning properly
FrequentUnregisterNetDevice False Thu, 25 Aug 2022 16:17:16 +0900 Thu, 25 Aug 2022 16:12:15 +0900 NoFrequentUnregisterNetDevice node is functioning properly
KernelDeadlock False Thu, 25 Aug 2022 16:17:16 +0900 Thu, 25 Aug 2022 16:12:15 +0900 KernelHasNoDeadlock kernel has no deadlock
ReadonlyFilesystem False Thu, 25 Aug 2022 16:17:16 +0900 Thu, 25 Aug 2022 16:12:15 +0900 FilesystemIsNotReadOnly Filesystem is not read-only
CorruptDockerOverlay2 False Thu, 25 Aug 2022 16:17:16 +0900 Thu, 25 Aug 2022 16:12:15 +0900 NoCorruptDockerOverlay2 docker overlay2 is functioning properly
FrequentKubeletRestart False Thu, 25 Aug 2022 16:17:16 +0900 Thu, 25 Aug 2022 16:12:15 +0900 NoFrequentKubeletRestart kubelet is functioning properly
NetworkUnavailable False Thu, 25 Aug 2022 16:12:14 +0900 Thu, 25 Aug 2022 16:12:14 +0900 RouteCreated NodeController create implicit route
MemoryPressure False Thu, 25 Aug 2022 16:18:21 +0900 Thu, 25 Aug 2022 16:10:14 +0900 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Thu, 25 Aug 2022 16:18:21 +0900 Thu, 25 Aug 2022 16:10:14 +0900 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Thu, 25 Aug 2022 16:18:21 +0900 Thu, 25 Aug 2022 16:10:14 +0900 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Thu, 25 Aug 2022 16:18:21 +0900 Thu, 25 Aug 2022 16:12:14 +0900 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 10.148.0.6
ExternalIP: 34.142.163.28
InternalDNS: gke-arm64-node-test-default-pool-e01e1958-l2mr.c.<PROJECT_NAME>.internal
Hostname: gke-arm64-node-test-default-pool-e01e1958-l2mr.c.<PROJECT_NAME>.internal
Capacity:
attachable-volumes-gce-pd: 127
cpu: 8
ephemeral-storage: 21389116Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
hugepages-32Mi: 0
hugepages-64Ki: 0
memory: 32823604Ki
pods: 110
Allocatable:
attachable-volumes-gce-pd: 127
cpu: 7910m
ephemeral-storage: 6827307385
hugepages-1Gi: 0
hugepages-2Mi: 0
hugepages-32Mi: 0
hugepages-64Ki: 0
memory: 29028660Ki
pods: 110
System Info:
Machine ID: bb504e228285f65771e9d5f3cf53f85b
System UUID: bb504e22-8285-f657-71e9-d5f3cf53f85b
Boot ID: 31d735ed-5e7e-4cea-a2da-710b2a14117d
Kernel Version: 5.10.107+
OS Image: Container-Optimized OS from Google
Operating System: linux
Architecture: arm64
Container Runtime Version: containerd://1.6.2
Kubelet Version: v1.24.2-gke.1900
Kube-Proxy Version: v1.24.2-gke.1900
PodCIDR: 10.76.0.0/24
PodCIDRs: 10.76.0.0/24
ProviderID: gce://<PROJECT_NAME>/asia-southeast1-c/gke-arm64-node-test-default-pool-e01e1958-l2mr
Non-terminated Pods: (10 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system event-exporter-gke-857959888b-n4pb9 0 (0%) 0 (0%) 0 (0%) 0 (0%) 10m
kube-system fluentbit-gke-ff4n2 100m (1%) 0 (0%) 200Mi (0%) 500Mi (1%) 9m32s
kube-system gke-metrics-agent-k5v9b 8m (0%) 0 (0%) 110Mi (0%) 110Mi (0%) 9m32s
kube-system konnectivity-agent-7bb5998b8-wg9hw 10m (0%) 0 (0%) 30Mi (0%) 125Mi (0%) 10m
kube-system konnectivity-agent-autoscaler-7b4cb89b88-2n6rf 10m (0%) 0 (0%) 10M (0%) 0 (0%) 10m
kube-system kube-dns-74bbfc7776-hhpw6 260m (3%) 0 (0%) 110Mi (0%) 210Mi (0%) 9m8s
kube-system kube-dns-autoscaler-9f89698b6-j2749 20m (0%) 0 (0%) 10Mi (0%) 0 (0%) 10m
kube-system kube-proxy-gke-arm64-node-test-default-pool-e01e1958-l2mr 100m (1%) 0 (0%) 0 (0%) 0 (0%) 9m
kube-system l7-default-backend-58fd4695c8-cghsl 10m (0%) 0 (0%) 20Mi (0%) 0 (0%) 9m59s
kube-system pdcsi-node-42zkx 10m (0%) 0 (0%) 20Mi (0%) 100Mi (0%) 9m32s
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 528m (6%) 0 (0%)
memory 534288000 (1%) 1045Mi (3%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
hugepages-32Mi 0 (0%) 0 (0%)
hugepages-64Ki 0 (0%) 0 (0%)
attachable-volumes-gce-pd 0 0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 9m30s kube-proxy
Warning InvalidDiskCapacity 11m kubelet invalid capacity 0 on image filesystem
Normal NodeAllocatableEnforced 11m kubelet Updated Node Allocatable limit across pods
Normal Starting 11m kubelet Starting kubelet.
Normal NodeHasSufficientMemory 10m (x7 over 11m) kubelet Node gke-arm64-node-test-default-pool-e01e1958-l2mr status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 10m (x7 over 11m) kubelet Node gke-arm64-node-test-default-pool-e01e1958-l2mr status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 10m (x7 over 11m) kubelet Node gke-arm64-node-test-default-pool-e01e1958-l2mr status is now: NodeHasSufficientPID
Normal NodeReady 9m31s kubelet Node gke-arm64-node-test-default-pool-e01e1958-l2mr status is now: NodeReady
Warning NodeRegistrationCheckerStart 9m30s node-registration-checker-monitor Thu Aug 25 07:10:14 UTC 2022 - ** Starting Node Registration Checker **
Warning ContainerdStart 9m30s (x2 over 9m30s) systemd-monitor Starting containerd container runtime...
Warning DockerStart 9m30s (x3 over 9m30s) systemd-monitor Starting Docker Application Container Engine...
Warning KubeletStart 9m30s systemd-monitor Started Kubernetes kubelet.
Warning NodeRegistrationCheckerDidNotRunChecks 4m31s node-registration-checker-monitor Thu Aug 25 07:17:14 UTC 2022 - ** Node ready and registered. **
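For anyone who wants a quicker check than the full describe output, something like this should list the taint keys across all nodes (a sketch using kubectl's custom-columns; output formatting may vary):

kubectl get nodes -o custom-columns='NODE:.metadata.name,TAINTS:.spec.taints[*].key'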
The taint was preventing the pods from being scheduled, so they sat in Pending forever.
I was able to run pods right after I dropped the taint myself.
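For reference, dropping the taint was roughly this (the trailing - removes the taint; swap in your own node name, or repeat per node):

kubectl taint nodes gke-arm64-node-test-default-pool-e01e1958-l2mr kubernetes.io/arch=arm64:NoSchedule-

Alternatively, keeping the taint and adding a toleration (plus a nodeSelector) to the workload should also get pods scheduled. A minimal sketch with a hypothetical test pod, assuming the image has an arm64 variant:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: arm64-test            # hypothetical name, just for testing
spec:
  nodeSelector:
    kubernetes.io/arch: arm64 # keep the pod on the arm64 nodes
  tolerations:
  - key: kubernetes.io/arch   # tolerate the taint instead of removing it
    operator: Equal
    value: arm64
    effect: NoSchedule
  containers:
  - name: app
    image: nginx              # assumes a multi-arch image with arm64 support
EOF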
Is there a reason these nodes get a NoSchedule taint by default?
If it's an unintended bug, it would be good to get it fixed so it doesn't confuse users.
Thank you.