MicroK8s three-node HA cluster: intermittent issue with CoreDNS and pod-to-pod communication

I have a three-node MicroK8s cluster running on AWS. I’m currently setting it up for production, but I’m free to wipe it and start again, which I have done a few times during testing.

I have three namespaces.

  • Dev
  • UAT
  • Prod

Dev uses a node selector to run only on Node 2, with single replicas.
UAT uses a node selector to run only on Node 3, with single replicas.

Prod runs on all three nodes and uses podAntiAffinity to ensure that the pods are evenly distributed across them.
This is all working as expected.

The issue is DNS resolution between pods.

If I run CoreDNS with the default replica count of 1 and execute the following command against a pod on the same node where CoreDNS is running, the name resolves.
k -n dev exec -it <pod> -- nslookup mongodb

If I run the same command against a pod on any other node, the DNS lookup fails.

If I then increase the replica count to three, CoreDNS runs on each node.
Now, when I run the nslookup from a pod, the issue is intermittent. If the pod happens to use the CoreDNS replica on its own node for the lookup, it works. If it picks a CoreDNS replica on another node, it fails.
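
One way to confirm the pattern described above is to bypass the cluster DNS Service and query each CoreDNS replica directly by its pod IP. The following is a minimal sketch, not something verified on this cluster. It assumes the CoreDNS pods in kube-system carry the label k8s-app=kube-dns (the usual label, but worth checking), that kubectl (or the k alias) points at the cluster, and that the pod image ships nslookup. The function name check_coredns_replicas and the KUBECTL, NAMESPACE and NAME variables are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: query every CoreDNS replica by pod IP from one pod.
# Assumptions: CoreDNS pods are labelled k8s-app=kube-dns in kube-system,
# and the target pod image contains nslookup.

KUBECTL="${KUBECTL:-kubectl}"     # e.g. KUBECTL="microk8s kubectl"
NAMESPACE="${NAMESPACE:-dev}"     # namespace of the pod running the lookup
NAME="${NAME:-mongodb}"           # service name to resolve

check_coredns_replicas() {
  local pod="$1"
  # Print "<podIP> <nodeName>" for each CoreDNS replica.
  $KUBECTL -n kube-system get pods -l k8s-app=kube-dns \
    -o jsonpath='{range .items[*]}{.status.podIP}{" "}{.spec.nodeName}{"\n"}{end}' |
  while read -r ip node; do
    # Pass the replica IP as the DNS server so the Service is not involved.
    # stdin is redirected so the exec cannot swallow the loop's input.
    if $KUBECTL -n "$NAMESPACE" exec "$pod" -- nslookup "$NAME" "$ip" \
        </dev/null >/dev/null 2>&1; then
      echo "OK   coredns=$ip node=$node"
    else
      echo "FAIL coredns=$ip node=$node"
    fi
  done
}
```

Calling check_coredns_replicas <pod> after sourcing the script prints one line per replica. If only the replica on the pod's own node reports OK, that would suggest the failure lies in pod-to-pod traffic between nodes in general rather than in CoreDNS itself.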

I have spent a day and a half debugging this, and I keep hitting a brick wall.

I have experience with MicroK8s on a single node, but I am relatively new to running it on three nodes.

Output of microk8s status:

microk8s is running
high-availability: yes
  datastore master nodes: 10.0.11.198:19001 10.0.11.206:19001 10.0.11.207:19001
  datastore standby nodes: none
addons:
  enabled:
    cert-manager         # (core) Cloud native certificate management
    dashboard            # (core) The Kubernetes dashboard
    dns                  # (core) CoreDNS
    ha-cluster           # (core) Configure high availability on the current node
    helm                 # (core) Helm - the package manager for Kubernetes
    helm3                # (core) Helm 3 - the package manager for Kubernetes
    hostpath-storage     # (core) Storage class; allocates storage from host directory
    ingress              # (core) Ingress controller for external access
    metallb              # (core) Loadbalancer for your Kubernetes cluster
    metrics-server       # (core) K8s Metrics Server for API access to service metrics
    observability        # (core) A lightweight observability stack for logs, traces and metrics
    rbac                 # (core) Role-Based Access Control for authorisation
    registry             # (core) Private image registry exposed on localhost:32000
    storage              # (core) Alias to hostpath-storage add-on, deprecated
  disabled:
    cis-hardening        # (core) Apply CIS K8s hardening
    community            # (core) The community addons repository
    gpu                  # (core) Alias to nvidia add-on
    host-access          # (core) Allow Pods connecting to Host services smoothly
    kube-ovn             # (core) An advanced network fabric for Kubernetes
    mayastor             # (core) OpenEBS MayaStor
    minio                # (core) MinIO object storage
    nvidia               # (core) NVIDIA hardware (GPU and network) support
    prometheus           # (core) Prometheus operator for monitoring and logging
    rook-ceph            # (core) Distributed Ceph storage using Rook