Cloud being used: LAB Private
Installation method: repo install
Host OS: CentOS 7
I’ve set up a LAB cluster with 3 nodes (node1, node2, node3); node1 is the master.
I set up the environment and joined the nodes to the cluster; all nodes are Ready. But I have an issue with CoreDNS: CoreDNS can’t nslookup the node names.
CoreDNS pod logs:
> [ERROR] plugin/errors: 2 node3. A: read udp 10.244.0.25:43976->22.214.171.124:53: i/o timeout
> [ERROR] plugin/errors: 2 node3. A: read udp 10.244.0.25:55132->126.96.36.199:53: i/o timeout
> [ERROR] plugin/errors: 2 node3. A: read udp 10.244.0.25:42492->188.8.131.52:53: i/o timeout
> [ERROR] plugin/errors: 2 node3. AAAA: read udp 10.244.0.25:54212->184.108.40.206:53: i/o timeout
metricbeat service log:
2021-05-31T12:00:40.639Z WARN [transport] transport/tcp.go:52 DNS lookup failure "node3": lookup node3 on 10.96.0.10:53: read udp 10.244.2.45:59110->10.96.0.10:53: i/o timeout
However, service names inside the cluster still resolve; deployments that talk to services by name keep working.
I hope someone can help me.
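To reproduce this, I run nslookup from a throwaway pod (the pod name and busybox image here are just examples):

```
# Node-name lookup from inside the cluster times out:
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- nslookup node3

# A service name resolved the same way works fine:
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- nslookup kubernetes.default
```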
What CNI driver are you using? I’ve seen that sort of error before when there’s a problem at the CNI level.
Yes! My CNI is flannel.
It seems the issue is somewhere here: I checked the coredns ConfigMap and saw that it forwards DNS queries to /etc/resolv.conf. But I have no idea how to fix it, because my nodes are virtual machines. I’ve added the hostnames to the hosts file, but the issue is not resolved yet.
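The Corefile in my coredns ConfigMap looks roughly like the kubeadm default; the `forward` line is what sends any name outside `cluster.local` to the upstream servers listed in the host’s /etc/resolv.conf:

```
# kubectl -n kube-system get configmap coredns -o yaml   (Corefile excerpt)
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}
```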
Quick question on this: are you expecting CoreDNS to observe your node’s /etc/hosts file in this case?
Where are node1, node2, and node3 set up DNS-wise?
Containers don’t use the host’s /etc/hosts file. Each container has its own nsswitch configuration. You could probably do something like this.
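For example, the CoreDNS `hosts` plugin can answer for the node names directly. The IPs below are placeholders, so substitute your actual node addresses, edit the coredns ConfigMap accordingly, and let the coredns pods reload:

```
.:53 {
    errors
    health
    hosts {
        # placeholder addresses - use your real node IPs
        192.168.1.101 node1
        192.168.1.102 node2
        192.168.1.103 node3
        fallthrough
    }
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf
    cache 30
}
```

The `fallthrough` directive matters: any name not listed in the `hosts` block continues on to the `kubernetes` and `forward` plugins as before.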
I’m a bit biased here, though. If you need to connect to a node to do something on it, you can just do it directly from a pod.
To illustrate how to do it, you might want to check out the krew plugin called node-shell.
All node-shell does is create a pod that runs on the node you want to work on and attach you to it. At that point, if you thought you needed to connect to a service on the node, it’s now always going to be localhost.
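Assuming you already have krew installed, usage looks roughly like this:

```
kubectl krew install node-shell
kubectl node-shell node3   # drops you into a root shell on node3
```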
May I know how you fixed it?
Facing the same problem here.
@akala515 - I found that my CNI was using iptables-legacy even though my Debian 10 VMs were set to use nf_tables. This caused my rules for 10.96 to land in iptables-legacy (and iptables-legacy -t nat) for the CNI, which caused this breakage.
The resolution for me was to start over: remove all iptables rules for both legacy and nft, and rebuild from scratch.
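The cleanup I ran on each node was roughly the following (as root; this flushes every firewall rule in both backends, so only do this on a lab box):

```
# Flush and delete all chains in the legacy backend
iptables-legacy -F; iptables-legacy -t nat -F; iptables-legacy -t mangle -F; iptables-legacy -X

# Same for the nft backend
iptables-nft -F; iptables-nft -t nat -F; iptables-nft -t mangle -F; iptables-nft -X
```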
Then, instead of installing Calico directly from one of their manifests, pull the manifest down and add
- name: FELIX_IPTABLESBACKEND
to the calico-node env vars. This forces NFT. After installing Calico from this updated manifest, it all worked for me: there were no more iptables-legacy changes, and everything lived in nft.
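In the calico-node container spec of the DaemonSet, the added env var looks like this (the value `NFT` is what selects the nftables backend):

```yaml
        - name: FELIX_IPTABLESBACKEND
          value: "NFT"
```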
I faced the same issue and resolved it. Details are available here.