CoreDNS [ERROR] plugin/errors: 2

Hello everyone

Kubernetes version: v1.21.1
Cloud being used: LAB Private
Installation method: repo install
Host OS: centos7
CNI: flannel

I’ve set up a lab cluster with 3 nodes (node1, node2, node3); node1 is the master. I set up the environment and joined the nodes to the cluster, and all nodes are Ready. But I have an issue with CoreDNS: it can’t resolve the node names (nslookup fails).
CoreDNS pod logs:
> [ERROR] plugin/errors: 2 node3. A: read udp 10.244.0.25:43976->8.8.8.8:53: i/o timeout
> [ERROR] plugin/errors: 2 node3. A: read udp 10.244.0.25:55132->8.8.8.8:53: i/o timeout
> [ERROR] plugin/errors: 2 node3. A: read udp 10.244.0.25:42492->8.8.8.8:53: i/o timeout
> [ERROR] plugin/errors: 2 node3. AAAA: read udp 10.244.0.25:54212->8.8.8.8:53: i/o timeout

Metricbeat service log:

2021-05-31T12:00:40.639Z WARN [transport] transport/tcp.go:52 DNS lookup failure "node3": lookup node3 on 10.96.0.10:53: read udp 10.244.2.45:59110->10.96.0.10:53: i/o timeout

However, services deployed in the cluster still work when addressed by service name.
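To narrow down whether cluster DNS works at all, or only node-name lookups fail, a quick test from a throwaway pod can help (a sketch; busybox:1.36 is just an assumed test image):

    # Cluster-internal name: should resolve via CoreDNS
    kubectl run dnstest --rm -it --image=busybox:1.36 --restart=Never -- \
      nslookup kubernetes.default.svc.cluster.local
    # Bare node name: this is what times out in the logs above
    kubectl run dnstest2 --rm -it --image=busybox:1.36 --restart=Never -- \
      nslookup node3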

Topology:
[topology diagram image]

I’d appreciate any help.
Thanks!

What CNI driver are you using? I’ve seen that sort of error before when there’s a problem at the CNI level.

Yes! I’ve updated the post above with the CNI: flannel.

It seems I’ve found where the issue is. I checked the coredns ConfigMap and saw that it forwards DNS queries to the nameservers in /etc/resolv.conf. But I have no idea how to fix it, because my nodes are virtual machines. I’ve added the hostnames to the /etc/hosts file, but the issue is still not resolved.
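For reference, you can see that forwarding setup in the Corefile inside the ConfigMap (a read-only check; this is the default layout from a kubeadm install):

    kubectl -n kube-system get configmap coredns -o yaml
    # In the Corefile, look for the forward plugin, typically:
    #   forward . /etc/resolv.conf
    # Anything that is not a cluster name (such as a bare node name)
    # is sent to the upstream nameservers from resolv.conf - 8.8.8.8 in
    # this case, which knows nothing about your nodes.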

Quick question on this: are you expecting CoreDNS to observe your node’s /etc/hosts file in this case?

Where are node1, node2, and node3 set up DNS-wise?

Containers don’t use the host’s /etc/hosts file; each container has its own nsswitch configuration. You could probably do something like this.
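One option along those lines (a sketch of my own, with placeholder IPs) is to have CoreDNS answer for the node names itself via its hosts plugin, placed before the forward plugin in the Corefile:

    kubectl -n kube-system edit configmap coredns
    # Inside the Corefile's .:53 block, before "forward", add:
    #   hosts {
    #       192.168.1.11 node1
    #       192.168.1.12 node2
    #       192.168.1.13 node3
    #       fallthrough
    #   }
    # "fallthrough" lets non-matching queries continue to the next plugin.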

I’m a bit biased here, though. If you need to connect to a node to do something on it, you can just do it directly from a pod.

To illustrate how to do it, you might want to check out the krew plugin called node-shell.

All node-shell does is create a pod that runs on the node you want to work on and attach you to it. At that point, if you thought you needed to connect to a service on the node, it’s now always going to be localhost.
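For example (assuming krew is already installed):

    kubectl krew install node-shell
    kubectl node-shell node3   # spawns a privileged pod on node3 and attaches a shell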

May I know how you fixed it?

Facing the same problem here.

@akala515 - I found that my CNI was using iptables-legacy even though my Debian 10 VMs were set to use nftables. This caused the rules for 10.96.0.0/12 to be applied via iptables-legacy (and iptables-legacy -t nat) by the CNI, which caused this breakage.
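To see which backend the rules actually landed in, checks like these can help (a sketch; KUBE-SERVICES is the nat chain kube-proxy creates):

    iptables-legacy -t nat -L KUBE-SERVICES -n | head   # rules in the legacy backend?
    iptables -t nat -L KUBE-SERVICES -n | head          # rules in the nft backend?
    update-alternatives --display iptables              # which backend the OS selects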

The resolution for me was to start over: remove all iptables rules from both the legacy and nft backends, and start from scratch.
kubeadm init…
Then, instead of installing Calico directly from one of their manifests, pull the manifest down and add
- name: FELIX_IPTABLESBACKEND
  value: "NFT"
to the calico-node env vars. This forces the NFT backend. After installing Calico from this updated manifest, it all worked for me: I had no iptables-legacy changes and everything lived in nft. A rough sketch of the whole sequence follows.
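This is roughly what that looked like (a sketch under my assumptions; the Calico manifest URL is the one that was current at the time, and the flush commands are destructive):

    kubeadm reset -f
    # Wipe rules from both backends before re-initializing
    for ipt in iptables-legacy iptables; do
        $ipt -F; $ipt -t nat -F; $ipt -t mangle -F; $ipt -X
    done
    kubeadm init ...   # your original init flags
    curl -LO https://docs.projectcalico.org/manifests/calico.yaml
    # Edit calico.yaml: in the calico-node container's env list, add
    #   - name: FELIX_IPTABLESBACKEND
    #     value: "NFT"
    kubectl apply -f calico.yaml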

@Brandon_Ojeda I managed to get it working with your instructions (networking - Kubernetes: unreachable backend: read udp 10.200.0.9:46159->183.60.83.19:53: i/o timeout - Stack Overflow), plus HINFO: unreachable backend: read udp 10.200.0.9:46159->183.60.83.19:53: i/o timeout · Issue #2693 · coredns/coredns · GitHub, plus rebooting my system at some point. I am still confused, but happy that it is working now. Thanks for that!

I faced the same issue and resolved it. Details below:

K8s: v1.26.1
OS: RHEL 8.7
VMs are hosted on the Azure cloud provider.

CoreDNS logs:
CoreDNS-1.9.3
linux/amd64, go1.18.2, 45b0a11
[ERROR] plugin/errors: 2 2568389657905608835.8295431261288812352. HINFO: read udp 192.168.54.66:34391->168.63.129.16:53: read: no route to host
[ERROR] plugin/errors: 2 2568389657905608835.8295431261288812352. HINFO: read udp 192.168.54.66:34275->168.63.129.16:53: read: no route to host
[ERROR] plugin/errors: 2 2568389657905608835.8295431261288812352. HINFO: read udp 192.168.54.66:59435->168.63.129.16:53: read: no route to host

I was unable to reach the internet from inside the pod; nslookup failed with the error below.
[nslookup error screenshot]

That could be a matter of proper firewall configuration:
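For example, on RHEL 8 with firewalld, opening the ports Kubernetes needs and enabling masquerading looks roughly like this (a sketch; adjust ports and zones to your setup):

    firewall-cmd --permanent --add-port=6443/tcp     # kube-apiserver
    firewall-cmd --permanent --add-port=10250/tcp    # kubelet API
    firewall-cmd --permanent --add-masquerade        # let pod traffic reach outside
    firewall-cmd --reload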