What's the recommendation for handling dead nodes?


#1

Hello, in my EKS (AWS Kubernetes) cluster I ran into a situation where the kubelet on a node stopped responding. I still can’t tell from the logs what happened, but the StatefulSet pods on that node went into the Unknown state and eventually became inaccessible.

As I understand it, StatefulSet pods on an unreachable node are not rescheduled automatically, and a manual action is needed (delete the pods or restore the node).
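
To make the "delete pods" option concrete, here is a rough sketch of what I mean. It only builds the `kubectl delete pod ... --grace-period=0 --force` commands for pods sitting on nodes I consider dead; it does not talk to the cluster. The helper name `force_delete_commands`, the pod dictionaries, and the node name are all made up for illustration. As far as I know, force-deleting StatefulSet pods is risky, because the old pod may still be running on the unreachable node, so this is not something I would run blindly.

```python
import shlex


def force_delete_commands(pods, dead_nodes):
    """Build kubectl force-delete commands for pods on dead nodes.

    `pods` is a list of dicts with "name", "namespace" and "node" keys
    (a hypothetical shape, not the Kubernetes API format).
    `dead_nodes` is a set of node names assumed to be unreachable.
    """
    commands = []
    for pod in pods:
        if pod["node"] in dead_nodes:
            args = [
                "kubectl", "delete", "pod", pod["name"],
                "--namespace", pod["namespace"],
                "--grace-period=0", "--force",
            ]
            # shlex.join quotes each argument so the line is shell-safe.
            commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    # Illustrative data only: 2 of 3 replicas sit on the dead node.
    example_pods = [
        {"name": "db-0", "namespace": "default", "node": "node-a"},
        {"name": "db-1", "namespace": "default", "node": "node-a"},
        {"name": "db-2", "namespace": "default", "node": "node-b"},
    ]
    for line in force_delete_commands(example_pods, {"node-a"}):
        print(line)
```

The commands are printed rather than executed so that each one can be reviewed first.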

What’s the recommended way to handle this? I’m not sure exactly what to do in these cases. In one case, 2 of the 3 pods had been scheduled on the dead node, so my application went down.