Remove nodes stuck in NotReady state

lachezar.kolev · March 21, 2023, 10:56am

Cluster information:

Kubernetes version: 1.25
Cloud being used: AWS
Installation method: Console
Host OS: Amazon Linux 2
CNI and version: Amazon VPC CNI; version: v1.12.5-eksbuild.2
CRI and version: Containerd; version: 1.6.6

Hey folks, I’m running an EKS cluster with a few worker nodes (scaling is handled by Karpenter). From time to time, a node gets stuck in the NotReady state, which leaves all of its pods stuck in Terminating. The node is completely unreachable at that point, so the only fix is for someone to delete the worker node manually, which is operationally heavy and very inefficient.

Is there a tool or configuration option that would automatically delete a node if its kubelet has not responded for N minutes?
From what I’ve seen, neither aws-node-termination-handler nor Karpenter offers such an option.
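
To make the ask concrete, here is a rough sketch of the behavior in question. It is not an existing tool. It assumes `kubectl` is on the PATH and authenticated against the cluster, and it reads the `Ready` condition and its `lastTransitionTime` from each node's status. The names `stale_not_ready_nodes` and `main`, and the 10-minute default, are made up for illustration.

```python
import json
import subprocess
from datetime import datetime, timedelta, timezone


def stale_not_ready_nodes(node_list, max_age, now=None):
    """Return names of nodes whose Ready condition has been non-"True" for longer than max_age.

    node_list is the parsed output of `kubectl get nodes -o json`.
    """
    now = now or datetime.now(timezone.utc)
    stale = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        # Skip nodes that are healthy or have not reported a Ready condition yet.
        if ready is None or ready.get("status") == "True":
            continue
        # lastTransitionTime is an RFC 3339 UTC timestamp, e.g. "2023-03-21T10:56:00Z".
        since = datetime.strptime(
            ready["lastTransitionTime"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if now - since > max_age:
            stale.append(node["metadata"]["name"])
    return stale


def main(threshold_minutes=10, dry_run=True):
    """Hypothetical entry point: list nodes via kubectl and delete the stale ones."""
    raw = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    for name in stale_not_ready_nodes(json.loads(raw), timedelta(minutes=threshold_minutes)):
        if dry_run:
            print(f"would delete node {name}")
        else:
            subprocess.run(["kubectl", "delete", "node", name], check=True)
```

Something like `main(dry_run=False)` could run on a schedule, for example from a CronJob. Note that this only deletes the Node object. Whether the underlying EC2 instance also gets terminated depends on the setup, so treat that part as an open assumption.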

Would appreciate any feedback & help on this. Thanks!

