Nodes crashed. node.kubernetes.io/unreachable:NoSchedule taint

Ubuntu 20.04.5 LTS
microk8s 1.26.1 installed via snap
3-node HA cluster

I came back one morning and my cluster, which was working fine the previous day, was completely down. Unlike a similar incident I had before, this time the disks still have free space.

microk8s status
microk8s is not running. Use microk8s inspect for a deeper inspection.

microk8s start
Nothing happens for a few minutes, then it exits with no message; the status still says not running.
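To see which of the MicroK8s daemons actually came up, I assume listing the snap services on each node would help (I can run this and post the output if useful):

sudo snap services microk8s
sudo systemctl status snap.microk8s.daemon-kubelite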

I tried to check the logs using sudo journalctl -u snap.microk8s.daemon-kubelite, but there's too much output and I couldn't find anything relevant.
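If a narrower slice of that log would help, I can run something like this and share the result (the grep keywords are just my guess at what to look for):

sudo journalctl -u snap.microk8s.daemon-kubelite --since "1 hour ago" --no-pager | grep -iE "error|fatal|panic" | tail -n 50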

I then rebooted the nodes (sudo reboot) and they actually came back online with status Ready.

The conditions look good:

Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Thu, 23 Feb 2023 04:36:12 +0000   Thu, 23 Feb 2023 04:36:12 +0000   CalicoIsUp                   Calico is running on this node
  MemoryPressure       False   Mon, 20 Mar 2023 20:38:48 +0000   Mon, 20 Mar 2023 20:10:26 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Mon, 20 Mar 2023 20:38:48 +0000   Mon, 20 Mar 2023 20:10:26 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Mon, 20 Mar 2023 20:38:48 +0000   Mon, 20 Mar 2023 20:10:26 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Mon, 20 Mar 2023 20:38:48 +0000   Mon, 20 Mar 2023 20:10:26 +0000   KubeletReady                 kubelet is posting ready status. AppArmor enabled

But there is a taint preventing any workload from being scheduled on them:

Taints: node.kubernetes.io/unreachable:NoSchedule

Events on pods:

0/3 nodes are available: 3 node(s) had untolerated taint {node.kubernetes.io/unreachable: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling…
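From what I understand, the taint could be removed by hand with kubectl (jldocker-1 is one of my nodes, used here as an example), but I would rather not do that blindly if whatever put it there just keeps re-adding it:

microk8s kubectl taint nodes jldocker-1 node.kubernetes.io/unreachable:NoSchedule-
# the trailing "-" removes the taint; it would need to be repeated for each node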

The events seem to show the kubelet service crashing in a loop:

  Type     Reason                   Age    From     Message
  ----     ------                   ----   ----     -------
  Warning  InvalidDiskCapacity      9m38s  kubelet  invalid capacity 0 on image filesystem
  Normal   NodeHasSufficientPID     9m38s  kubelet  Node jldocker-1 status is now: NodeHasSufficientPID
  Normal   NodeAllocatableEnforced  9m38s  kubelet  Updated Node Allocatable limit across pods
  Normal   NodeHasNoDiskPressure    9m38s  kubelet  Node jldocker-1 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientMemory  9m38s  kubelet  Node jldocker-1 status is now: NodeHasSufficientMemory
  Normal   Starting                 9m38s  kubelet  Starting kubelet.
  Normal   Starting                 7m7s   kubelet  Starting kubelet.
  Warning  InvalidDiskCapacity      7m7s   kubelet  invalid capacity 0 on image filesystem
  Normal   NodeHasNoDiskPressure    7m6s   kubelet  Node jldocker-1 status is now: NodeHasNoDiskPressure
  Normal   NodeAllocatableEnforced  7m6s   kubelet  Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientPID     7m6s   kubelet  Node jldocker-1 status is now: NodeHasSufficientPID
  Normal   NodeHasSufficientMemory  7m6s   kubelet  Node jldocker-1 status is now: NodeHasSufficientMemory
  Normal   Starting                 2m57s  kubelet  Starting kubelet.
  Warning  InvalidDiskCapacity      2m57s  kubelet  invalid capacity 0 on image filesystem
  Normal   NodeHasSufficientMemory  2m57s  kubelet  Node jldocker-1 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    2m57s  kubelet  Node jldocker-1 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     2m57s  kubelet  Node jldocker-1 status is now: NodeHasSufficientPID
  Normal   NodeAllocatableEnforced  2m57s  kubelet  Updated Node Allocatable limit across pods
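To double-check that the service really is restarting rather than the kubelet just logging "Starting" repeatedly, I assume the systemd restart counter and the unit log would show it (again, happy to post this output):

sudo systemctl show snap.microk8s.daemon-kubelite --property=NRestarts
sudo journalctl -u snap.microk8s.daemon-kubelite | grep -i "started\|stopped" | tail -n 20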

Any command I send to microk8s kubectl prints error messages about memcache.go:

E0320 20:41:19.730002 50241 memcache.go:255] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0320 20:41:19.733484 50241 memcache.go:106] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0320 20:41:19.735017 50241 memcache.go:106] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0320 20:41:19.738322 50241 memcache.go:106] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0320 20:41:20.827654 50241 memcache.go:106] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0320 20:41:20.830214 50241 memcache.go:106] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
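If I read those errors right, they are about the aggregated metrics API rather than the core API server itself. I assume checking the corresponding APIService would confirm whether metrics-server is the piece that is unavailable (v1beta1.metrics.k8s.io is the name I expect it to have):

microk8s kubectl get apiservice v1beta1.metrics.k8s.io
# if AVAILABLE shows False here, that would at least explain the memcache.go noise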

Any idea what is causing this? From my point of view, the nodes seem healthy: RAM, CPU, and disk are fine, and networking between them is also functional.

I uploaded the tarball generated by microk8s inspect to my Google Drive:

Please help me recover my cluster!

Can you log this in the MicroK8s GitHub issues?

Sure, here it is:

I wasn't sure where to post; since this is not really a bug report but more of a request for help, I assumed it was preferable to post on the community forum, but as you please! :slight_smile:

@balchua1 Did you get a chance to take a look? I'm still stuck!