K8s DaemonSet pod is in Running state even when the respective node is down

We are observing an issue where a DaemonSet pod stays in the Running state even when its node is down. I would have expected the pod on the down node to be evicted. However, since we expose the DaemonSet via a headless service, a DNS query to the headless service returns the IPs of all the DaemonSet pods (including the one on the down node). As a result, the application pod keeps trying to connect to the DaemonSet pod on the down node and traffic is interrupted.

Cluster information:

Kubernetes version: 1.21.1
Cloud being used: bare-metal
Installation method: K3s
Host OS: Ubuntu 18.04
CNI and version:
CRI and version:

How long did you wait after the node was down? The node-failure timeouts are on the order of 5 minutes by default, so things take some time to correct themselves when a node abruptly dies.
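If you want to see where the node is in that cycle, checking its status and taints usually tells the story (the node name is a placeholder):

kubectl get nodes
kubectl describe node <node-name>

Once the heartbeats stop you should see the node go NotReady and pick up the node.kubernetes.io/unreachable taint.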

Do you have health checks configured for the DaemonSet? I've not tested this, but my assumption is that a failing health check on the pod could cause it to be handled faster than just waiting for the node-failure machinery to kick in.
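For reference, a minimal readiness probe on the daemon container could look something like the following; the /health path is an assumption, and the 9191 port is the one that shows up later in this thread:

readinessProbe:
  httpGet:
    path: /health        # hypothetical health endpoint, use whatever the daemon exposes
    port: 9191
  periodSeconds: 5
  failureThreshold: 3

A pod that fails its readiness probe gets moved out of the ready addresses of the Services that select it.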

Yes, we waited long enough and do have health checks enabled. In fact, we left the node down to observe the behavior, and all the DaemonSet pods remain in the Running state.

The main concern is that the headless service returns the IP address of a pod that no longer exists. We have the same daemon exposed via both a ClusterIP service and a headless service. The ClusterIP service works as expected and doesn't forward traffic to the dead pod, but a DNS query to the headless service still returns that pod's IP address, which throws the application's load-balancing logic off. Any thoughts/suggestions on how to kick the DaemonSet pod out when its node is not ready?
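For context, the two Services look roughly like this; the ClusterIP service name and the selector are assumptions, while the headless name, namespace, and port are the ones shown further down in this thread:

apiVersion: v1
kind: Service
metadata:
  name: opa              # assumed name; plain ClusterIP service
  namespace: ztna
spec:
  selector:
    service: opa         # assumed selector
  ports:
  - port: 9191
    targetPort: 9191
---
apiVersion: v1
kind: Service
metadata:
  name: opa-headless     # headless: clients resolve pod IPs directly from DNS
  namespace: ztna
spec:
  clusterIP: None
  selector:
    service: opa         # assumed selector
  ports:
  - port: 9191
    targetPort: 9191

With the headless variant there is no virtual IP and no kube-proxy in the path, so whatever the DNS answer contains is what the application load-balances over.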

Perhaps with tolerations?

I came across this in a Stack Overflow post.

Read the documentation carefully though; it seems DaemonSets have some tolerations set by default.

Thanks. Yes, the DaemonSet pods are not evicted because of the tolerations added by default.
We could try fixing it by removing those tolerations, but that would require monitoring the DaemonSet pod deployments to automate it.
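For reference, the DaemonSet controller automatically adds tolerations along these lines to its pods; since they carry no tolerationSeconds, the pods tolerate the NoExecute taints on a dead node indefinitely and are never evicted:

tolerations:
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute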

Do you think a k8s headless service returning an endpoint that is not reachable is an issue that should be fixed in k8s?

Looking at the headless services docs, they just kind of say that DNS happens, not really why it happens.

When I check out the ServiceSpec API Reference, the setting “publishNotReadyAddresses” strikes me as interesting.
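In the Service spec it sits next to clusterIP, roughly like this for the headless service from this thread:

apiVersion: v1
kind: Service
metadata:
  name: opa-headless
  namespace: ztna
spec:
  clusterIP: None
  publishNotReadyAddresses: false   # the default; not-ready endpoints should then stay out of DNS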

What’s the value of “publishNotReadyAddresses” in your cluster and what’s the status of the pod on the dead node?

Thanks for the update!
It seems the default value of publishNotReadyAddresses is false, but the DNS query still returns the IP address of the pod on the dead node. I tried the service.alpha.kubernetes.io/tolerate-unready-endpoints: "false" annotation as well, but no luck.

kubectl describe does show that pod under NotReadyAddresses, but the DNS query resolves the not-ready pod's IP address as well. Please see below.

kubectl describe endpoints opa-headless -n ztna

Name:         opa-headless
Namespace:    ztna
Labels:       app.kubernetes.io/managed-by=Helm
              service=opa
              service.kubernetes.io/headless=
Annotations:  <none>
Subsets:
  Addresses:          10.42.0.10,10.42.2.6
  NotReadyAddresses:  10.42.1.4
  Ports:
    Name  Port  Protocol
    ----  ----  --------
    9191  9191  TCP

Events:  <none>

DNS resolution logs:

[2021-08-18 08:37:09.901][1][debug][upstream] [source/common/upstream/upstream_impl.cc:279] transport socket match, socket default selected for host with address 10.42.2.6:9191
[2021-08-18 08:37:09.901][1][debug][upstream] [source/common/upstream/upstream_impl.cc:279] transport socket match, socket default selected for host with address 10.42.1.4:9191
[2021-08-18 08:37:09.901][1][debug][upstream] [source/common/upstream/upstream_impl.cc:279] transport socket match, socket default selected for host with address 10.42.0.10:9191
[2021-08-18 08:37:09.901][1][debug][upstream] [source/common/upstream/strict_dns_cluster.cc:170] DNS refresh rate reset for opa-headless, refresh rate 5000 ms
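For what it's worth, to see what the cluster DNS itself returns for the headless service (independent of Envoy's own resolution and refresh), a throwaway pod query along these lines should list the A records; this assumes the default cluster.local domain:

kubectl run -it --rm dnstest --image=busybox:1.28 --restart=Never -n ztna -- nslookup opa-headless.ztna.svc.cluster.local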