However, this address doesn’t resolve BEFORE pod creation if I try to use it, i.e.:
k run -it my-app --image=pidocker-docker-registry:5000 (or pidocker-docker-registry.default.svc.cluster.local:5000) --command -- /bin/bash
What happens is that I get an immediate ImagePull error because the node can’t resolve the Docker registry hostname.
But if I create a Pod first (say dnsutils) and then do a dig from inside it, everything works (as it should, since the dnsutils Pod has the right /etc/resolv.conf with the cluster search/domain entries).
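For reference, the kind of check I mean is roughly the following (the dnsutils image reference is the one from the DNS debugging docs and may have moved; any image with dig or nslookup will do):

kubectl run dnsutils --image=registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3 --command -- sleep infinity
kubectl exec -it dnsutils -- cat /etc/resolv.conf
kubectl exec -it dnsutils -- dig pidocker-docker-registry.default.svc.cluster.local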
I noticed in the doc under DNS troubleshooting there is the following line:
Kubernetes installs do not configure the nodes’ resolv.conf files to use the cluster DNS by default, because that process is inherently distribution-specific. This should probably be implemented eventually.
Is this what I’m running into? The only way for me to “fix” this is to add the CoreDNS IP address to /etc/resolv.conf on the worker node itself which then allows the kubelet to resolve the FQDN of my docker registry and pull successfully.
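Concretely, the workaround looks roughly like this (the Service is typically named kube-dns even when CoreDNS is the backend; the IPs below are only examples):

kubectl -n kube-system get svc kube-dns -o jsonpath='{.spec.clusterIP}'

and then /etc/resolv.conf on the worker ends up looking something like:

# cluster DNS first (example IP), then the node's original resolver
nameserver 10.96.0.10
nameserver 192.168.1.1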
Best to do that explicitly, then. I don’t think we’d want to do this on all clusters.
So you could add cluster DNS to your resolvers list or you could expose that registry as a NodePort service and always access localhost:30000 or something like that.
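A rough sketch of that NodePort option, assuming the registry Pods carry an app: pidocker-docker-registry label (the label, Service name, and port numbers here are illustrative, not taken from this thread):

apiVersion: v1
kind: Service
metadata:
  name: pidocker-docker-registry-nodeport
spec:
  type: NodePort
  selector:
    app: pidocker-docker-registry
  ports:
  - port: 5000
    targetPort: 5000
    nodePort: 30000

With something like that in place, every node can pull images as localhost:30000/<image>.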
I did the former and added the CoreDNS server to the resolvers list of the workers and that does the trick.
TBH, you still haven’t made a very good case for why the kubelet doesn’t at least consult CoreDNS as a secondary resolver at pod creation time, though. It seems extremely logical to me (rely on the worker’s resolv.conf and use CoreDNS as a backup before giving up).
DNS is fiddly. You rarely want to be consulting 2 DNS servers that have different information, because then they are not interchangeable. In most cases the cluster DNS’s upstream is the node’s resolver anyway, so that’s cyclical. IMO it’s better to special-case this one time, but there’s a huge diversity of use-cases, so other solutions have their place.
You always consult multiple DNS servers with different information. That’s the whole point of DNS! When you find the one that says it’s authoritative, you get a response.
I really do not see why the kubelet doesn’t use the CoreDNS server pre-pod-creation as a secondary DNS source. Especially since it is authoritative for anything under .cluster.local.
Btw, from the official doc:
Kubernetes installs do not configure the nodes’ resolv.conf files to use the cluster DNS by default, because that process is inherently distribution-specific. This should probably be implemented eventually.
Yeah, I think it really should. Once you install the kubelet and run it, you are saying this is a worker node, and consequently it should know about the CoreDNS nameserver at runtime.
You always consult multiple DNS servers with different information.
Beg to differ, or at least to clarify. If you have a local config (e.g. resolv.conf) with 2 different nameservers which return different answers for the same query, you will eventually have chaos. There is no spec for how clients are expected to consume resolv.conf. Some libcs try them in sequence. Some try them in parallel, and some randomize. Some check all responses for one that succeeded, others take the first response (including NXDOMAIN).
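To make that concrete, take a node resolv.conf like this sketch (the IPs are placeholders): glibc tries the servers in the listed order, musl-based clients query them all in parallel and take the first answer, and options rotate round-robins the starting server, so identical lookups can land on either nameserver:

# 10.96.0.10 (cluster DNS) answers *.cluster.local; 192.168.1.1 (site resolver) returns NXDOMAIN for it
nameserver 10.96.0.10
nameserver 192.168.1.1
options rotate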
It does not always behave as you would expect. We chased COUNTLESS bugs in Kubernetes when we used to set up pods as you describe.
You’re going to have the best client compatibility by having the local-est resolver forward or pass queries through to its upstream.
Btw, from the official doc:
I wrote that. I was probably wrong. To do what you want, we would really want kubelet to have a different config than the rest of the machine (to scope the impact) and probably we’d need to tightly spec and verify the resolver behavior we need and make sure that kubelet ALWAYS has that behavior.
a) You are talking about broken DNS. k8s can’t avoid that, and trying to “work” around it or make more design decisions based on it is not what I would do.
b) Are you really taking the position that a .cluster.local address is going to resolve differently on two different nameservers on a worker node? What exactly are you trying to avoid?
c) What is “upstream” in your case? The CoreDNS server? That makes no sense to me.
Look, you made a design decision. I don’t have to agree with it, and I accept my workaround.
But it does seem absolutely silly to me that the kubelet won’t even consider using the CoreDNS server at pod creation time automatically (or even optionally) - especially if I give an FQDN of the registry under the .cluster.local domain.
There is NO SPEC (that I can find?) for how client resolvers are supposed to operate in the face of multiple nameservers. To a first approximation, all DNS clients are broken.
Are you really taking the position that a .cluster.local address is going to resolve differently on two different nameservers on a worker node?
I am saying that one nameserver might respond NXDOMAIN while the other successfully answers the query. SOME clients will issue those queries in the order you specify nameservers, giving the answer you expect. Some clients will issue those queries in parallel and take whichever returns first, giving you utterly non-deterministic results. Some clients will randomize the nameserver order, also giving you non-deterministic results.
I am saying that, unless you CAREFULLY control ALL of the clients, it will explode in your face. We know this because we TRIED doing this in Kubernetes and were beaten into submission by a relentless stream of bug reports, which ended at exactly the same time we stopped doing this.
What is “upstream” in your case?
I meant that clients ask cluster DNS exclusively. Cluster DNS is canonical for cluster.local and any non-result there is NXDOMAIN. All other queries are forwarded to $someone, usually the node’s own DNS nameserver(s). That means clients always get a consistent view of names, and there’s no ambiguity when asking 2 nameservers and getting 2 responses.
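That split is what a stock CoreDNS Corefile expresses; trimmed and from memory, so treat it as a sketch rather than your exact ConfigMap:

.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
}

The forward . /etc/resolv.conf line is the “forwarded to $someone” part: everything outside cluster.local goes to the node’s own nameservers.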
But it does seem absolutely silly to me that the kubelet won’t even consider using the CoreDNS server at pod creation time automatically (or even optionally) - especially if I give an FQDN of the registry under the .cluster.local domain.
I didn’t actually say that. What I said was that you have to control the clients. Hypothetically, kubelet could have a distinct resolv.conf or could internally query DNS differently for the cluster suffix (which it knows). It’s just not as simple as throwing another nameserver line into /etc/resolv.conf on the node. If someone wanted to write a KEP about this, to explore how to make it happen, I’d help review that KEP.
Tim, we are in violent agreement. That was MY point: I shouldn’t have to muck with the client’s resolv.conf. I am saying the kubelet should have its own predefined ordering for name server resolution which, as you said here, captures EXACTLY my thoughts on the matter:
Bingo! That’s exactly what I was thinking. The kubelet uses the cluster DNS ALWAYS as the primary DNS server for .cluster.local, i.e. it’s authoritative and only kicks non-cluster queries upstream, not the reverse, which is what is happening now (or in my case, I’m working around it by flipping the nameserver order manually in resolv.conf - yikes!).
I would love to write this…how do we make this happen? Some general guidance would be appreciated (yeah, yeah, I can Google too but sometimes that yields conflicting paths to salvation).
I am saying the kubelet should have its own predefined ordering for name server resolution
Kubelet’s DNS is defined by /etc/resolv.conf. We don’t have a mechanism for kubelet to have its own ordering. That would need a KEP.
The kubelet uses the cluster DNS ALWAYS as the primary DNS server for .cluster.local
Unfortunately there’s no way (that I know of) to switch nameservers based on suffix match. DNS resolution is not something Kubelet ever does explicitly (today) so we’d need some significantly clever work to make this happen.
I’m working around it by flipping the nameserver order manually in resolv.conf
…which is exactly what I am saying we CAN’T do in general, though it may work for your specific case.
I would love to write this…how do we make this happen?
You write a KEP and lay out the problem statement, some possible implementations, the pros/cons/risks, and propose a path forward. Not too hard, except the “possible solutions” part.
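Roughly, the mechanics (the directory name below is just a placeholder): KEPs live in the kubernetes/enhancements repo, so you open a tracking issue there, copy the KEP template into a new directory under the owning SIG (sig-network here), fill it in, and send a PR:

keps/sig-network/NNNN-kubelet-cluster-dns/
  kep.yaml    (metadata: title, owning SIG, authors, status, milestones)
  README.md   (summary, motivation, proposal, alternatives, risks)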
Hello, I know this discussion is old, but I’m facing the exact same problem with an AKS cluster. I want to pull pod images from a registry deployed on another cluster. I configured the CoreDNS ConfigMap with the FQDNs of the container registry, and I can ping it from inside my pods but not from the nodes. I tried to edit resolv.conf on the node, but the file is managed by systemd-resolved and my changes are not saved.
Sorry, but I’m new to networking, so if you could help it would be great.
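(In case it helps anyone landing here later: on nodes where systemd-resolved manages /etc/resolv.conf, direct edits get overwritten, so the usual pattern is a drop-in that routes the cluster suffix to the cluster DNS Service IP; 10.0.0.10 below is only a placeholder, and on managed nodes like AKS such changes may be reverted by node upgrades or reimaging:)

# /etc/systemd/resolved.conf.d/cluster-dns.conf  (create the directory if it does not exist)
[Resolve]
DNS=10.0.0.10
Domains=~cluster.local

followed by systemctl restart systemd-resolved on the node.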