I have a three-node MicroK8s cluster running on AWS. I’m currently setting it up for production, but I’m free to wipe it and start again, which I have done a few times during testing.
I have three namespaces.
- Dev
- UAT
- Prod
Dev uses a node selector to run only on Node 2, with single replicas.
UAT uses a node selector to run only on Node 3, with single replicas.
Prod runs on all three nodes and uses podAntiAffinity to ensure that the pods are evenly distributed across them.
This is all working as expected.
The issue is DNS resolution between pods.
If I run CoreDNS with the default replica count of 1 and execute the following command from a pod on the same node where CoreDNS is running, the name resolves.
k -n dev exec -it <pod> -- nslookup mongodb
If I run that command from a pod on any other node, the DNS lookup fails.
If I then increase the replica count to three, CoreDNS runs on each node.
Now, when I run the nslookup from the pod, the issue is intermittent. If, by chance, the pod uses a CoreDNS instance on the same node for the lookup, it works. If it picks a CoreDNS instance on another node, it fails.
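The intermittent case can be made deterministic by querying each CoreDNS pod IP directly instead of going through the kube-dns service. The following is only a rough sketch of that check, not something taken from my actual setup. It assumes that `kubectl` is available (on MicroK8s this may need to be `microk8s kubectl` or an alias), that the CoreDNS pods live in `kube-system` with the label `k8s-app=kube-dns`, and that the application image ships `nslookup`. The function name `check_coredns_pods` is made up for illustration.

```shell
# Sketch: from one application pod, run nslookup against every CoreDNS
# pod IP and report which CoreDNS instances (and nodes) answer.
# Assumptions: CoreDNS pods are in kube-system with label k8s-app=kube-dns,
# and the target pod's image contains nslookup.
check_coredns_pods() {
  ns="$1"    # namespace of the application pod, e.g. dev
  pod="$2"   # application pod to run the lookup from
  name="$3"  # name to resolve, e.g. mongodb

  # One line per CoreDNS pod: "<podIP> <nodeName>"
  kubectl -n kube-system get pods -l k8s-app=kube-dns \
    -o jsonpath='{range .items[*]}{.status.podIP}{" "}{.spec.nodeName}{"\n"}{end}' |
  while read -r ip node; do
    # Passing the IP as the second nslookup argument forces that server.
    if kubectl -n "$ns" exec "$pod" -- nslookup "$name" "$ip" \
         </dev/null >/dev/null 2>&1; then
      printf 'OK coredns=%s node=%s\n' "$ip" "$node"
    else
      printf 'FAIL coredns=%s node=%s\n' "$ip" "$node"
    fi
  done
}
```

Calling it as `check_coredns_pods dev <pod> mongodb` should print one OK or FAIL line per CoreDNS pod. If only the CoreDNS instance on the pod's own node reports OK, that would point at pod-to-pod traffic between nodes rather than at CoreDNS itself.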
I have spent a day and a half debugging this, and I keep hitting a brick wall.
I have experience with MicroK8s on a single node, but I am relatively new to running it on three nodes.
microk8s status
microk8s is running
high-availability: yes
  datastore master nodes: 10.0.11.198:19001 10.0.11.206:19001 10.0.11.207:19001
  datastore standby nodes: none
addons:
  enabled:
    cert-manager         # (core) Cloud native certificate management
    dashboard            # (core) The Kubernetes dashboard
    dns                  # (core) CoreDNS
    ha-cluster           # (core) Configure high availability on the current node
    helm                 # (core) Helm - the package manager for Kubernetes
    helm3                # (core) Helm 3 - the package manager for Kubernetes
    hostpath-storage     # (core) Storage class; allocates storage from host directory
    ingress              # (core) Ingress controller for external access
    metallb              # (core) Loadbalancer for your Kubernetes cluster
    metrics-server       # (core) K8s Metrics Server for API access to service metrics
    observability        # (core) A lightweight observability stack for logs, traces and metrics
    rbac                 # (core) Role-Based Access Control for authorisation
    registry             # (core) Private image registry exposed on localhost:32000
    storage              # (core) Alias to hostpath-storage add-on, deprecated
  disabled:
    cis-hardening        # (core) Apply CIS K8s hardening
    community            # (core) The community addons repository
    gpu                  # (core) Alias to nvidia add-on
    host-access          # (core) Allow Pods connecting to Host services smoothly
    kube-ovn             # (core) An advanced network fabric for Kubernetes
    mayastor             # (core) OpenEBS MayaStor
    minio                # (core) MinIO object storage
    nvidia               # (core) NVIDIA hardware (GPU and network) support
    prometheus           # (core) Prometheus operator for monitoring and logging
    rook-ceph            # (core) Distributed Ceph storage using Rook