I was playing around and wanted to see how quickly Kubernetes would notice a failed node and reprovision the pods.
However, it's still showing the pods as running even though the node was powered off.
I'm guessing the cause is that I have no liveness or readiness probes. I wanted to confirm whether there is something that regularly checks the state of the pods, even if it's just asking the node whether they are still running.
Node communication is handled by the kubelet, so liveness/readiness probes wouldn't be part of it. Normally I would see the missing node flagged within 2-3 minutes of losing it, so it's odd that you're not seeing similar behaviour.
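If I remember the defaults right (an assumption on my part, this depends on your controller settings), the node should go NotReady after roughly 40 seconds and its pods should start getting evicted around 5 minutes after that, so you can watch it happen with something like:

kubectl get nodes                      # the powered-off node should flip to NotReady
kubectl describe node <node-name>      # check Conditions and any node.kubernetes.io/unreachable taints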
A couple of questions to help debug: when you run kubectl get pods -o wide, do the pods still show the missing node? Are the services still accessible? Is there anything in the host's kube-apiserver or scheduler?
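For reference, these are the sort of commands I'd run (the exact flags are just what I normally reach for):

kubectl get pods -o wide --all-namespaces                   # the NODE column shows which node each pod is bound to
kubectl get events --sort-by=.metadata.creationTimestamp    # look for NodeNotReady or eviction events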
When I run kubectl get pods -o wide, it shows the pods running on the various nodes, including the node that is down. The pods are not accessible, though.
Can you clarify what you mean by "anything in the host kube-apiserver or scheduler"?
I'm wondering if there is something odd with my cluster and whether I should just rebuild it with a quick Rancher setup or something.
That's very odd. On one of the other hosts still running in the cluster, check the scheduler logs to see if it is throwing any errors regarding the downed node. Given you're running Rancher, you should be able to do that on the host node using docker logs kube-scheduler --tail <number-of-logs-back>. There might be a clue in there.
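For example (assuming the usual RKE control-plane container names; the controller manager is the component that actually marks a node NotReady and evicts its pods, so it's worth checking alongside the scheduler):

docker logs kube-scheduler --tail 200
docker logs kube-controller-manager --tail 200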
How did the cluster get provisioned, through the Rancher UI or via RKE/Other?