How does load balancing to a ClusterIP work

I have a case where the load balancing to a ClusterIP is not working as I would expect.

I have many (12) CoreDNS replicas, but sometimes all the requests are routed to a single replica, and that replica pretty much dies under the load.

I’m trying to understand how traffic to a ClusterIP is load balanced.
From a pod’s perspective, CoreDNS is just a ClusterIP (from the kube-dns Service), so each pod sees exactly one IP address.

How is this translated to the 12 separate IPs of the replicas?
And how does the node or the pod load balance traffic to those replicas?
Are there logs that I can look at to understand why only one of my replicas is getting traffic?
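For background on the mechanism in question: with kube-proxy in iptables mode, the ClusterIP is a virtual IP with no pod behind it; kube-proxy writes DNAT rules that rewrite the destination to one of the endpoint pod IPs, chosen randomly using the iptables `statistic` match. Here is a small sketch of that selection logic (the endpoint IPs are made up; the real rules live in the `KUBE-SVC-*`/`KUBE-SEP-*` chains in the nat table):

```python
import random
from collections import Counter

# Hypothetical pod IPs standing in for the 12 CoreDNS endpoints.
ENDPOINTS = [f"10.0.0.{i}" for i in range(1, 13)]

def pick_endpoint(endpoints):
    """Mimic kube-proxy's cascading iptables rules: rule k matches with
    probability 1/(number of endpoints remaining), which works out to a
    uniform choice over all endpoints."""
    remaining = list(endpoints)
    while len(remaining) > 1:
        # iptables equivalent: -m statistic --mode random --probability 1/len(remaining)
        if random.random() < 1.0 / len(remaining):
            return remaining[0]
        remaining.pop(0)
    return remaining[0]  # the final rule matches unconditionally

counts = Counter(pick_endpoint(ENDPOINTS) for _ in range(120_000))
for ip, n in sorted(counts.items()):
    print(ip, n)  # each replica should land near 10,000 of 120,000 picks
```

The catch is that this random pick happens once per tracked connection, not once per packet — which is where conntrack comes in for UDP.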

Cluster information:

Kubernetes version: 1.13
Cloud being used: AWS
Installation method: EKS
Host OS: Ubuntu
CNI and version: amazon-k8s-cni:v1.5.3
CRI and version: N/A

It should spread across the replicas, BUT a single pod will, in general, keep getting the same replica unless you let the connection-tracking entry time out.

See, since we don’t know when a UDP “session” is over, we have to just wait. After some period of inactivity (a few minutes, if I recall?), we discard the tracking entry and pick a new backend.
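A minimal sketch of that pinning behavior, assuming an illustrative 30-second idle timeout (the real value is a sysctl such as `nf_conntrack_udp_timeout`; the flow tuple and IPs below are made up):

```python
import random

ENDPOINTS = [f"10.0.0.{i}" for i in range(1, 13)]  # stand-ins for the 12 pod IPs
UDP_TIMEOUT = 30.0  # illustrative idle timeout in seconds

# conntrack table: flow 5-tuple -> (chosen backend, time of last packet)
conntrack = {}

def route(flow, now):
    """Route one UDP packet: reuse the tracked backend while the entry is
    still fresh, otherwise pick a new backend at random (as the DNAT
    rules would for a new 'connection')."""
    entry = conntrack.get(flow)
    if entry and now - entry[1] < UDP_TIMEOUT:
        backend = entry[0]                   # pinned: same replica as before
    else:
        backend = random.choice(ENDPOINTS)   # fresh random pick
    conntrack[flow] = (backend, now)
    return backend

# Hypothetical client pod repeatedly querying the kube-dns ClusterIP
flow = ("10.1.2.3", 40000, "172.20.0.10", 53)
a = route(flow, now=0.0)
b = route(flow, now=10.0)    # within the timeout: pinned to the same replica
c = route(flow, now=100.0)   # idle past the timeout: a new replica may be chosen
print(a == b)  # True: the pod keeps hitting one replica
```

This is why a single chatty client that reuses the same source port can hammer one CoreDNS replica: every query refreshes the conntrack entry, so the timeout never fires and the backend is never re-picked.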