Is AWS EKS Windows Ready For Production? (and recent DNS issues)

I was not sure which forum to ask this on, but since it is meant as a general discussion, it seems OK to ask here.

Is AWS EKS Windows Ready For Production? We are now on EKS 1.17 and continually run into obscure problems, most recently around DNS. Examples:

  1. Sometimes Windows nodes hold the wrong MAC address for pods in the cluster. This matters when CoreDNS is one of those pods: pods running on the affected Windows node cannot resolve DNS whenever their requests go to the CoreDNS pod whose MAC address the node has wrong.
  2. Sometimes pods that start on a Windows node cannot resolve DNS at all, i.e., DNS is broken for the entire lifetime of the pod, and the pod cannot connect to any internal cluster IP. However, when a fresh pod is started, the problem goes away.
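
For anyone trying to tell these two symptoms apart, here is a minimal sketch of a probe that could be run inside an affected pod. It assumes Python is available in the container. The names `probe_dns` and `classify` are hypothetical helpers made up for illustration, not part of any EKS or Kubernetes tooling. It only tallies how often lookups fail and is not a diagnosis of the root cause.

```python
import socket
from collections import Counter


def probe_dns(hostname, attempts=20, resolve=socket.getaddrinfo):
    """Resolve `hostname` repeatedly and tally successes and failures.

    `resolve` is injectable so the function can be exercised without
    a real cluster; by default it uses the OS resolver.
    """
    results = Counter()
    for _ in range(attempts):
        try:
            resolve(hostname, None)
            results["ok"] += 1
        except socket.gaierror:
            # Raised by getaddrinfo when the name cannot be resolved.
            results["failed"] += 1
    return results


def classify(results):
    """Map a tally onto the two symptoms described above (a rough heuristic)."""
    if results["failed"] == 0:
        return "healthy"
    if results["ok"] == 0:
        # Consistent with symptom 2: DNS never works in this pod.
        return "always failing"
    # Consistent with symptom 1: only some requests fail, for example
    # those routed to one particular CoreDNS pod.
    return "intermittent"
```

Inside a pod this might be called as `classify(probe_dns("kubernetes.default.svc.cluster.local"))`, assuming the default `cluster.local` cluster domain. Keep in mind that the OS resolver may cache answers, which could hide intermittent failures when the same name is looked up repeatedly.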

It’s hard to tell whether this is a Windows Containers problem or an EKS problem, but I am wondering whether other people are successfully running important workloads on Windows EKS. Any comments would be helpful as I assess this offering.

Well, to contribute my own thoughts, I do not think AWS EKS Windows is ready for Production. As someone who runs many thousands of Linux containers, I love AWS’s EKS Linux offering. But the experience with Windows over the past 8 months has been terrible and inconsistent. Whenever obscure issues come up, it can take many hours or even days of going through things with AWS Support to make any progress, and often you end up needing to come up with workarounds and accept handwavy suggestions about “something something Microsoft issue” or “something something local Domain issue” or “something something Windows Containers are full of issues.” Most of the obscure issues I have run into are related to DNS, logging inconsistencies, and gMSA inconsistencies. As of their recent AMIs, logging and gMSA have both improved, but DNS continues to be a major problem for them.

If anyone else is having good success on the platform, it would be so great to hear about it. Or if anyone else is having success running Windows Containers in Kubernetes on a different platform (e.g., AKS or something else), it would also be great to hear about that. As of now I am moving away from the AWS EKS Windows offering.

@Howard_Roark Did you ever figure out the cause of the DNS issues? We are experiencing similar problems with Windows pods in AKS.