Kube-proxy really wants to use my ISP's nameservers

Cluster information:

Kubernetes version: 1.23
Cloud being used: bare-metal
Installation method: kubeadm init
Host OS: Ubuntu 20.04
CNI and version: flannel 0.17.0
CRI and version: Docker CE 20.10.14

When I initially ran kubeadm init, flannel had a problem with timeouts on dial 10.96.0.1. I couldn’t figure it out at the time and after a few tries it seemed to resolve itself. However, today I restarted my cluster master and flannel started acting up again with the same issue. A post suggested that kube-proxy might not be working, so I checked its logs: it was failing to resolve a hostname that is defined in Tailscale, and it was trying to do so through my ISP’s nameservers. Sure enough, the kube-proxy container’s /etc/resolv.conf contains my ISP’s nameservers instead of 127.0.0.53. I could not find any information about when and how these nameservers are introduced; they even show up when I run the base image on its own (docker run --rm k8s.gcr.io/kube-proxy:v1.23.5 cat /etc/resolv.conf prints them).
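For reference, the check looked roughly like this (the pod name is a placeholder; k8s-app=kube-proxy is the label kubeadm puts on the DaemonSet):

    # Logs show name resolution failing through the ISP's nameservers
    kubectl -n kube-system logs -l k8s-app=kube-proxy --tail=50
    # The resolv.conf inside a kube-proxy pod (substitute a real pod name)
    kubectl -n kube-system exec kube-proxy-xxxxx -- cat /etc/resolv.conf
    # Even the plain image run under Docker shows the same nameservers
    docker run --rm k8s.gcr.io/kube-proxy:v1.23.5 cat /etc/resolv.conf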

My questions are:

  1. When/where are these nameservers set? Are they embedded into the image during the kubeadm init process, or is there something else going on?
  2. How can I make kube-proxy properly use the nameservers defined in the host’s /etc/resolv.conf? For now I have hacked the custom nameservers in directly by editing the DaemonSet via kubectl edit (see the sketch after this list), but that is obviously far from ideal and I’d like to avoid hitting this issue every time I set up a cluster.
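For completeness, the kubectl edit hack from question 2 boils down to something like the patch below. The 100.100.100.100 address is Tailscale’s MagicDNS resolver, used here only as an example; substitute whatever nameservers you actually need.

    # Rough equivalent of the kubectl edit workaround, as a strategic-merge patch;
    # dnsPolicy "None" makes the pod use only the nameservers listed in dnsConfig.
    kubectl -n kube-system patch daemonset kube-proxy -p \
      '{"spec":{"template":{"spec":{"dnsPolicy":"None","dnsConfig":{"nameservers":["100.100.100.100"]}}}}}'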

Both Docker and Kubernetes craft an in-container resolv.conf from the host’s resolv.conf. The kubelet uses the pod’s dnsPolicy to decide exactly what to put into it.
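For example, to see which policy the kube-proxy pods ended up with (and therefore which resolv.conf the kubelet hands them), something like:

    # Print the DNS policy of each kube-proxy pod
    kubectl -n kube-system get pods -l k8s-app=kube-proxy \
      -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.dnsPolicy}{"\n"}{end}'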

I see. Is it using my default route’s DNS servers (obtained through DHCP), then? Is there a way I can say something like “please use these DNS servers”? Alternatively, is there a way for me to specify what the kube-proxy DaemonSet should use for DNS during kubeadm init?

I solved my issue and am detailing it here for future reference. The problem is pretty difficult to spot and basically comes from systemd-resolved, Tailscale and Docker interacting badly. It can be summarized as:

  • Tailscale sees that the node is using systemd-resolved, so it registers its nameserver with systemd-resolved instead of adding itself to /etc/resolv.conf.
  • When Docker sees 127.0.0.53 in the host’s resolv.conf, it instead uses systemd-resolved’s non-stub file at /run/systemd/resolve/resolv.conf, which does not contain the Tailscale nameserver needed to resolve Tailscale hostnames.
  • Containers therefore get a resolv.conf without the Tailscale nameserver and cannot resolve those hostnames (the commands below show the mismatch on the host).
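Roughly how to see the mismatch on the host (tailscale0 is the interface tailscaled creates, and Ubuntu 20.04 ships resolvectl):

    # The stub file the host uses vs. the non-stub file Docker falls back to
    cat /etc/resolv.conf                    # nameserver 127.0.0.53 (systemd-resolved stub)
    cat /run/systemd/resolve/resolv.conf    # upstream (DHCP/ISP) nameservers only
    # Tailscale's resolver is registered per-link with systemd-resolved instead
    resolvectl status tailscale0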

Unfortunately I could not find a way to make Tailscale add itself to the non-stub list of nameservers, so the solution here seems to be disabling systemd-resolved, replacing the /etc/resolv.conf symlink with a regular file and then restarting tailscaled. Hope this helps someone with a similar setup.
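Concretely, the steps were roughly the following, run on the node (the upstream nameserver is only an example; use whatever your network needs):

    # Disable the stub resolver and stop it from managing /etc/resolv.conf
    sudo systemctl disable --now systemd-resolved
    # /etc/resolv.conf is a symlink to the stub file; replace it with a plain file
    sudo rm /etc/resolv.conf
    echo "nameserver 1.1.1.1" | sudo tee /etc/resolv.conf
    # With systemd-resolved out of the picture, tailscaled should manage the plain file itself
    sudo systemctl restart tailscaled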

The kubelet has a --resolv-conf flag which you can use to point it at something other than /etc/resolv.conf. I am not super familiar with how systemd-resolved works or how Tailscale integrates with it, personally.
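If you go that route on a kubeadm cluster, the setting usually lives in the kubelet’s config file rather than being passed as a raw flag; something along these lines (the replacement path is just an example):

    # See what the kubelet is configured with (kubeadm default config path)
    grep resolvConf /var/lib/kubelet/config.yaml
    # Point it at a file you control and restart the kubelet
    sudo sed -i 's|^resolvConf:.*|resolvConf: /etc/kubernetes/custom-resolv.conf|' /var/lib/kubelet/config.yaml
    sudo systemctl restart kubelet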