We are running Kubernetes on AWS EKS.
We discovered an issue in our cluster: one of its three AZs was running out of IPs while the other two had plenty (each AZ's subnet has the same number of IPs).
Further investigation showed that the AZ in question had more node capacity than the other two.
We configured our ASG to allow multiple instance types (for resiliency), and the AZ with more capacity turned out to be running all 9xl nodes while the other AZs had smaller machines. We believe this caused more pods to be scheduled in the larger AZ, using up a disproportionate number of IPs there.
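To illustrate what we think is happening, here is a rough sketch of the arithmetic. All numbers (subnet size, node counts, pods per node) are made up for illustration and are not measurements from our cluster. It assumes each pod consumes one IP from its node's subnet, as we understand is the case with the default AWS VPC CNI.

```python
# Illustrative only: every number below is hypothetical.
SUBNET_IPS_PER_AZ = 4096  # assumed: each AZ's subnet is the same size

# Same node count in every AZ, but one AZ got much larger instances.
azs = {
    "az-a": {"nodes": 10, "pods_per_node": 200},  # all 9xl nodes
    "az-b": {"nodes": 10, "pods_per_node": 50},   # smaller nodes
    "az-c": {"nodes": 10, "pods_per_node": 50},   # smaller nodes
}

def ips_used(nodes, pods_per_node):
    # Assumption: one subnet IP per pod, plus one primary IP per node.
    return nodes * pods_per_node + nodes

def usage_by_az(azs, subnet_ips):
    # Fraction of each AZ's subnet consumed by its nodes and pods.
    return {name: ips_used(**spec) / subnet_ips for name, spec in azs.items()}

if __name__ == "__main__":
    for name, fraction in usage_by_az(azs, SUBNET_IPS_PER_AZ).items():
        print(f"{name}: {fraction:.0%} of subnet IPs in use")
```

With equal node counts and equal subnet sizes, the AZ with the larger nodes uses several times as many IPs as the others in this sketch, which matches the imbalance we are seeing.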
While we believe this is the root cause, we are not sure how to address it (other than using a single instance size in the cluster).
Does anyone have suggestions on how to handle this? We believe we are following best practices by allowing a list of instance types.
Thank you.