CoreDNS issues with AWS & aws-vpc-cni on a self-managed EC2 cluster

I’m new to k8s and have been trying to learn it. I ran into a problem trying to set up aws-vpc-k8s-cni on my fresh k8s cluster with CoreDNS. Here’s the problem in detail.

Cluster & Network Architecture

The cluster setup consists of a single master node (an EC2 instance) for now. The network consists of one VPC with CIDR range 10.0.0.0/16 and four subnets with CIDR ranges (public_1, 10.0.0.0/20), (public_2, 10.0.16.0/20), (private_1, 10.0.32.0/20), (private_2, 10.0.48.0/20). The public subnets route outbound traffic to the internet via an internet gateway, and the private subnets route outbound traffic to the internet via a NAT proxy (an EC2 instance). The private subnets are also connected to each other via a VPC endpoint. I access machines on my private subnets via another EC2 instance that acts as a bastion. Hence, there are a total of 3 machines:

  • K8S Master Instance.
  • Bastion Instance (SSH only)
  • NAT Proxy Instance (Private Subnet → Internet [Outbound Only])

AWS Setup: Master Instance

I’ve attached a security group that allows the ports listed in kubernetes.io/docs/reference/networking/ports-and-protocols. I’ve also created an IAM role (EC2 instance profile) and attached the policies listed at cloud-provider-aws.sigs.k8s.io/prerequisites.
(The inbound rules allowed on my master instance.)
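For reference, the control-plane ports on that page are 6443/TCP (API server), 2379-2380/TCP (etcd), 10250/TCP (kubelet), 10257/TCP (kube-controller-manager) and 10259/TCP (kube-scheduler). Here’s a hedged sketch of how they could be opened to the VPC with the AWS CLI; the security-group ID is a placeholder, and the CIDR is my VPC’s:

# placeholder security-group ID; VPC CIDR taken from the architecture above
SG_ID=sg-0123456789abcdef0
VPC_CIDR=10.0.0.0/16
for PORT in 6443 2379 2380 10250 10257 10259; do
  aws ec2 authorize-security-group-ingress \
    --group-id "$SG_ID" --protocol tcp --port "$PORT" --cidr "$VPC_CIDR"
done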

On Master Instance

  1. I set the hostname of the instance to the private IPv4 DNS name returned by curl http://169.254.169.254/latest/meta-data/hostname.
  2. I set the iptables FORWARD policy with iptables -P FORWARD ACCEPT.
  3. I installed containerd, kubelet, kubectl, and kubeadm.
  4. I loaded the modules br_netfilter & overlay.
  5. I set the kernel flags net.bridge.bridge-nf-call-iptables = 1, net.bridge.bridge-nf-call-ip6tables = 1, and net.ipv4.ip_forward = 1.
  6. I added KUBELET_EXTRA_ARGS=--node-ip=<ipv4_address> --cloud-provider=external in /etc/systemd/system/kubelet.service.d/20-aws.conf (a condensed command sketch of steps 2-6 follows this list).
  7. I added --cloud-provider=external to /etc/kubernetes/manifests/kube-controller-manager.yaml & /etc/kubernetes/manifests/kube-apiserver.yaml.
  8. I initialized a fresh k8s cluster with kubeadm init --token-ttl 0 and it initialized successfully.
  9. I deployed aws-vpc-cni-k8s via kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/v1.16.4/config/master/aws-k8s-cni.yaml.
  10. I deployed aws-cloud-controller-manager via kubectl apply -k 'github.com/kubernetes/cloud-provider-aws/examples/existing-cluster/base/?ref=master'.
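Here is a condensed, hedged sketch of steps 2-6 as shell commands (values taken from the list above; the sysctl file name is just an example, and <ipv4_address> is the instance’s private IP):

# step 2: default-accept forwarding
sudo iptables -P FORWARD ACCEPT

# step 4: load the kernel modules
sudo modprobe overlay
sudo modprobe br_netfilter

# step 5: kernel flags (example file name)
cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sudo sysctl --system

# step 6: kubelet drop-in
cat <<EOF | sudo tee /etc/systemd/system/kubelet.service.d/20-aws.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=--node-ip=<ipv4_address> --cloud-provider=external"
EOF
sudo systemctl daemon-reload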

Problem

Both aws-cloud-controller-manager and aws-vpc-cni deployed successfully and show no errors. While the node shows up as “Ready”, I’m unable to get CoreDNS working: it never becomes ready and keeps getting restarted. Neither journalctl -u kubelet.service nor journalctl -u containerd.service reveals anything about the source of the problem. I believe my ports are open correctly, and I’m unsure what’s causing CoreDNS not to run.

My assumption is that the CoreDNS pod is forwarding DNS lookups to the nameserver in my master node’s /etc/resolv.conf because of the forward . /etc/resolv.conf line in its ConfigMap. The master node is unaware of the cluster, and therefore no lookup is successful. I could be wrong here; I’m not sure how to resolve this.
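To narrow this down, here is a hedged sketch of the checks I would run with the dnsutils pod (assuming the dnsutils image ships dig and nslookup, and that dig is installed on the node):

# 1) from the master node: is the VPC resolver responding at all?
dig @10.0.0.2 amazonaws.com +time=2 +tries=1

# 2) from inside the pod network: can a pod reach that same resolver?
kubectl -n kube-system exec dnsutils -- dig @10.0.0.2 amazonaws.com +time=2 +tries=1

# 3) cluster DNS via the kube-dns service IP in the pod's /etc/resolv.conf
kubectl -n kube-system exec dnsutils -- nslookup kubernetes.default

If (1) works but (2) times out, the problem would be pod-to-VPC connectivity through the CNI rather than the forward . /etc/resolv.conf line itself.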

Here are the relevant logs:

  • kubectl logs coredns-76f75df574-465x5 -n kube-system
[INFO] plugin/reload: Running configuration SHA512 = 591cf328cccc12bc490481273e738df59329c62c0b729d94e8b61db9961c2fa5f046dd37f1cf888b953814040d180f52594972691cd6ff41be96639138a43908
CoreDNS-1.11.1
linux/amd64, go1.20.7, ae2bbc2
[ERROR] plugin/errors: 2 908260726288404915.5102928999241227370. HINFO: read udp 10.0.43.148:39074->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 908260726288404915.5102928999241227370. HINFO: read udp 10.0.43.148:47632->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 908260726288404915.5102928999241227370. HINFO: read udp 10.0.43.148:34254->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 908260726288404915.5102928999241227370. HINFO: read udp 10.0.43.148:39274->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 908260726288404915.5102928999241227370. HINFO: read udp 10.0.43.148:49990->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 908260726288404915.5102928999241227370. HINFO: read udp 10.0.43.148:54792->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 908260726288404915.5102928999241227370. HINFO: read udp 10.0.43.148:46469->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 908260726288404915.5102928999241227370. HINFO: read udp 10.0.43.148:33147->10.0.0.2:53: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[597383834]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (16-Mar-2024 09:09:06.572) (total time: 30002ms):
Trace[597383834]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30001ms (09:09:36.574)
Trace[597383834]: [30.002080972s] [30.002080972s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1358557007]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (16-Mar-2024 09:09:06.573) (total time: 30001ms):
Trace[1358557007]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30001ms (09:09:36.574)
Trace[1358557007]: [30.001364092s] [30.001364092s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[784641658]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231 (16-Mar-2024 09:09:06.574) (total time: 30001ms):
Trace[784641658]: ---"Objects listed" error:Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30000ms (09:09:36.575)
Trace[784641658]: [30.001209965s] [30.001209965s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[ERROR] plugin/errors: 2 908260726288404915.5102928999241227370. HINFO: read udp 10.0.43.148:33770->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 908260726288404915.5102928999241227370. HINFO: read udp 10.0.43.148:54282->10.0.0.2:53: i/o timeout
  • kubectl describe pod coredns-76f75df574-465x5 -n kube-system
admin@ip-10-0-32-163:~$ kubectl describe pod  coredns-76f75df574-465x5 -n kube-system
Name:                 coredns-76f75df574-465x5
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      coredns
Node:                 ip-10-0-32-163.ap-south-1.compute.internal/10.0.32.163
Start Time:           Sat, 16 Mar 2024 09:09:05 +0000
Labels:               k8s-app=kube-dns
                      pod-template-hash=76f75df574
Annotations:          <none>
Status:               Running
IP:                   10.0.43.148
IPs:
  IP:           10.0.43.148
Controlled By:  ReplicaSet/coredns-76f75df574
Containers:
  coredns:
    Container ID:  containerd://cb07b34ca26b0b5d51dfcf658ba03fed0979346da243abd98ab8190a6df46800
    Image:         registry.k8s.io/coredns/coredns:v1.11.1
    Image ID:      registry.k8s.io/coredns/coredns@sha256:1eeb4c7316bacb1d4c8ead65571cd92dd21e27359f0d4917f1a5822a73b75db1
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Running
      Started:      Sat, 16 Mar 2024 09:31:42 +0000
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sat, 16 Mar 2024 09:24:46 +0000
      Finished:     Sat, 16 Mar 2024 09:26:40 +0000
    Ready:          False
    Restart Count:  8
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5l9f6 (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       False
  ContainersReady             False
  PodScheduled                True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  kube-api-access-5l9f6:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 CriticalAddonsOnly op=Exists
                             node-role.kubernetes.io/control-plane:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  27m                   default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/unreachable: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
  Normal   Scheduled         23m                   default-scheduler  Successfully assigned kube-system/coredns-76f75df574-465x5 to ip-10-0-32-163.ap-south-1.compute.internal
  Normal   Created           23m                   kubelet            Created container coredns
  Normal   Started           23m                   kubelet            Started container coredns
  Warning  Unhealthy         23m                   kubelet            Readiness probe failed: Get "http://10.0.43.148:8181/ready": dial tcp 10.0.43.148:8181: i/o timeout (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy         21m (x5 over 22m)     kubelet            Liveness probe failed: Get "http://10.0.43.148:8080/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Normal   Killing           21m                   kubelet            Container coredns failed liveness probe, will be restarted
  Normal   Pulled            21m (x2 over 23m)     kubelet            Container image "registry.k8s.io/coredns/coredns:v1.11.1" already present on machine
  Warning  Unhealthy         13m (x80 over 23m)    kubelet            Readiness probe failed: Get "http://10.0.43.148:8181/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  BackOff           3m36s (x28 over 10m)  kubelet            Back-off restarting failed container coredns in pod coredns-76f75df574-465x5_kube-system(73e7c967-cf8f-4ee5-b525-1839d64e5058)
  • kubectl get pods -A -o wide
admin@ip-10-0-32-163:~$ kubectl get pods -A -o wide
NAMESPACE     NAME                                                                 READY   STATUS             RESTARTS         AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
default       dnsutils                                                             1/1     Running            1 (65m ago)      10h   10.0.35.236   ip-10-0-32-163.ap-south-1.compute.internal   <none>           <none>
kube-system   aws-cloud-controller-manager-czwrm                                   1/1     Running            4 (65m ago)      19h   10.0.32.163   ip-10-0-32-163.ap-south-1.compute.internal   <none>           <none>
kube-system   aws-node-4rtrh                                                       2/2     Running            8 (65m ago)      19h   10.0.32.163   ip-10-0-32-163.ap-south-1.compute.internal   <none>           <none>
kube-system   coredns-76f75df574-465x5                                             0/1     CrashLoopBackOff   11 (4m24s ago)   43m   10.0.43.148   ip-10-0-32-163.ap-south-1.compute.internal   <none>           <none>
kube-system   coredns-76f75df574-b9r5t                                             0/1     CrashLoopBackOff   11 (2m53s ago)   43m   10.0.44.228   ip-10-0-32-163.ap-south-1.compute.internal   <none>           <none>
kube-system   etcd-ip-10-0-32-163.ap-south-1.compute.internal                      1/1     Running            4 (65m ago)      19h   10.0.32.163   ip-10-0-32-163.ap-south-1.compute.internal   <none>           <none>
kube-system   kube-apiserver-ip-10-0-32-163.ap-south-1.compute.internal            1/1     Running            4 (65m ago)      19h   10.0.32.163   ip-10-0-32-163.ap-south-1.compute.internal   <none>           <none>
kube-system   kube-controller-manager-ip-10-0-32-163.ap-south-1.compute.internal   1/1     Running            4 (65m ago)      19h   10.0.32.163   ip-10-0-32-163.ap-south-1.compute.internal   <none>           <none>
kube-system   kube-proxy-6cg92                                                     1/1     Running            4 (65m ago)      19h   10.0.32.163   ip-10-0-32-163.ap-south-1.compute.internal   <none>           <none>
kube-system   kube-scheduler-ip-10-0-32-163.ap-south-1.compute.internal            1/1     Running            4 (65m ago)      19h   10.0.32.163   ip-10-0-32-163.ap-south-1.compute.internal   <none>           <none>
  • kubectl get all -A
admin@ip-10-0-32-163:~$ kubectl get all -A
NAMESPACE     NAME                                                                     READY   STATUS    RESTARTS        AGE
kube-system   pod/aws-cloud-controller-manager-8d4m9                                   1/1     Running   3 (2m17s ago)   17h
kube-system   pod/aws-node-lvl5p                                                       2/2     Running   4 (3m7s ago)    17h
kube-system   pod/coredns-76f75df574-jqjk4                                             0/1     Running   13 (38s ago)    10h
kube-system   pod/coredns-76f75df574-nbd9c                                             0/1     Running   13 (50s ago)    10h
kube-system   pod/dnsutils                                                             1/1     Running   1 (3m7s ago)    10h
kube-system   pod/etcd-ip-10-0-32-163.ap-south-1.compute.internal                      1/1     Running   14 (3m7s ago)   17h
kube-system   pod/kube-apiserver-ip-10-0-32-163.ap-south-1.compute.internal            1/1     Running   2 (3m7s ago)    17h
kube-system   pod/kube-controller-manager-ip-10-0-32-163.ap-south-1.compute.internal   1/1     Running   2 (3m7s ago)    17h
kube-system   pod/kube-proxy-r2sh2                                                     1/1     Running   2 (3m7s ago)    17h
kube-system   pod/kube-scheduler-ip-10-0-32-163.ap-south-1.compute.internal            1/1     Running   5 (3m7s ago)    17h

NAMESPACE     NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
default       service/kubernetes   ClusterIP   10.97.0.1    <none>        443/TCP                  17h
kube-system   service/kube-dns     ClusterIP   10.97.0.10   <none>        53/UDP,53/TCP,9153/TCP   17h

NAMESPACE     NAME                                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                            AGE
kube-system   daemonset.apps/aws-cloud-controller-manager   1         1         1       1            1           node-role.kubernetes.io/control-plane=   17h
kube-system   daemonset.apps/aws-node                       1         1         1       1            1           <none>                                   17h
kube-system   daemonset.apps/kube-proxy                     1         1         1       1            1           kubernetes.io/os=linux                   17h

NAMESPACE     NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/coredns   0/2     2            0           17h

NAMESPACE     NAME                                 DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/coredns-76f75df574   2         2         0       10h
  • sudo iptables -vnL
admin@ip-10-0-32-163:~$ sudo iptables -vnL
Chain INPUT (policy ACCEPT 576K packets, 102M bytes)
 pkts bytes target     prot opt in     out     source               destination
 5175  311K KUBE-PROXY-FIREWALL  0    --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW /* kubernetes load balancer firewall */
 571K  100M KUBE-NODEPORTS  0    --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes health check service ports */
 5175  311K KUBE-EXTERNAL-SERVICES  0    --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW /* kubernetes externally-visible service portals */
 575K  102M KUBE-FIREWALL  0    --  *      *       0.0.0.0/0            0.0.0.0/0

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 KUBE-PROXY-FIREWALL  0    --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW /* kubernetes load balancer firewall */
    0     0 KUBE-FORWARD  0    --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes forwarding rules */
    0     0 KUBE-SERVICES  0    --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW /* kubernetes service portals */
    0     0 KUBE-EXTERNAL-SERVICES  0    --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW /* kubernetes externally-visible service portals */
    0     0 ACCEPT     0    --  *      *       0.0.0.0/0            0.0.0.0/0

Chain OUTPUT (policy ACCEPT 580K packets, 110M bytes)
 pkts bytes target     prot opt in     out     source               destination
 7345  442K KUBE-PROXY-FIREWALL  0    --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW /* kubernetes load balancer firewall */
 7345  442K KUBE-SERVICES  0    --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW /* kubernetes service portals */
 579K  110M KUBE-FIREWALL  0    --  *      *       0.0.0.0/0            0.0.0.0/0

Chain KUBE-EXTERNAL-SERVICES (2 references)
 pkts bytes target     prot opt in     out     source               destination

Chain KUBE-FIREWALL (2 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 DROP       0    --  *      *      !127.0.0.0/8          127.0.0.0/8          /* block incoming localnet connections */ ! ctstate RELATED,ESTABLISHED,DNAT

Chain KUBE-FORWARD (1 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 DROP       0    --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate INVALID
    0     0 ACCEPT     0    --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes forwarding rules */ mark match 0x4000/0x4000
    0     0 ACCEPT     0    --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes forwarding conntrack rule */ ctstate RELATED,ESTABLISHED

Chain KUBE-KUBELET-CANARY (0 references)
 pkts bytes target     prot opt in     out     source               destination

Chain KUBE-NODEPORTS (1 references)
 pkts bytes target     prot opt in     out     source               destination

Chain KUBE-PROXY-CANARY (0 references)
 pkts bytes target     prot opt in     out     source               destination

Chain KUBE-PROXY-FIREWALL (3 references)
 pkts bytes target     prot opt in     out     source               destination

Chain KUBE-SERVICES (2 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 REJECT     6    --  *      *       0.0.0.0/0            10.96.0.10           /* kube-system/kube-dns:metrics has no endpoints */ tcp dpt:9153 reject-with icmp-port-unreachable
    0     0 REJECT     17   --  *      *       0.0.0.0/0            10.96.0.10           /* kube-system/kube-dns:dns has no endpoints */ udp dpt:53 reject-with icmp-port-unreachable
    0     0 REJECT     6    --  *      *       0.0.0.0/0            10.96.0.10           /* kube-system/kube-dns:dns-tcp has no endpoints */ tcp dpt:53 reject-with icmp-port-unreachable
  • kubectl cluster-info
Kubernetes control plane is running at https://10.0.32.163:6443
CoreDNS is running at https://10.0.32.163:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'
  • cat /etc/resolv.conf (on my master node)
admin@ip-10-0-32-163:~$ cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "resolvectl status" to see details about the actual nameservers.
nameserver 10.0.0.2
  • kubectl -n kube-system exec dnsutils -- cat /etc/resolv.conf (on a test pod deployed to kube-system to debug dns)
admin@ip-10-0-32-163:~$ kubectl -n kube-system exec dnsutils -- cat /etc/resolv.conf
search kube-system.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.97.0.10
options ndots:5
  • kubectl describe cm coredns -n kube-system
admin@ip-10-0-32-163:~$ kubectl describe cm coredns -n kube-system
Name:         coredns
Namespace:    kube-system
Labels:       <none>
Annotations:  <none>

Data
====
Corefile:
----
.:53 {
    log
    errors
    health {
       lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       fallthrough in-addr.arpa ip6.arpa
       ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf {
       max_concurrent 1000
    }
    cache 30
    loop
    reload
    loadbalance
}


BinaryData
====

Events:  <none>

Cluster information:

  • Kubernetes version: v1.29.2
  • Cloud being used: AWS
  • Installation method: kubeadm init --pod-network-cidr=10.96.0.0/16 --service-cidr=10.97.0.0/16 --token-ttl 0
  • Host OS: Linux ip-10-0-32-163.ap-south-1.compute.internal 6.1.0-18-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux
  • CNI and version: aws-vpc-k8s-cni v1.16.4
  • CRI and version: containerd containerd.io 1.6.28 ae07eda36dd25f8a1b98dfbf587313b99c0190bb

Were you able to resolve the issue?
You could have checked by using the “kubectl get events -n kube-system” command.
It will tell you what is happening inside the cluster and why the pod is not coming up.
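For example, a time-sorted view, or events filtered to a single pod (standard kubectl flags; the pod name is the one from your outputs above):

kubectl get events -n kube-system --sort-by=.metadata.creationTimestamp
kubectl get events -n kube-system --field-selector involvedObject.name=coredns-76f75df574-465x5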

Unfortunately, no! I did more digging over the next two months. I was using the in-tree provider for AWS, which has been deprecated. I also learnt that, since I was using kubeadm, I had to manually set the providerID that aws-vpc-cni and other AWS controllers use to link the running instance to AWS cloud resources. CoreDNS fails to boot up and throws errors if networking isn’t set up correctly (I tested this a lot when I moved away from aws-vpc-cni). I think a combination of these issues and my lack of knowledge when I started learning led to the failure of the above cluster. I switched away from aws-vpc-cni entirely to Cilium and have never had connection problems in the cluster since.
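For anyone who lands here later, this is a hedged sketch of setting the providerID by hand, assuming the usual aws:///<availability-zone>/<instance-id> format; the AZ and instance ID below are placeholders:

# option A: pass it to kubelet before the node registers (same drop-in as step 6 above)
# KUBELET_EXTRA_ARGS="... --provider-id=aws:///ap-south-1a/i-0123456789abcdef0"

# option B: patch an already-registered node (only works while spec.providerID is still empty)
kubectl patch node ip-10-0-32-163.ap-south-1.compute.internal \
  -p '{"spec":{"providerID":"aws:///ap-south-1a/i-0123456789abcdef0"}}'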

I kept trying, although it seems a lack of proper support from AWS for self-managed clusters was the reason my setup above failed to integrate with their cloud. This problem goes beyond aws-vpc-cni and occurs in their other controllers too. Here’s an issue I filed recently for their aws-load-balancer-controller, which suffers a similar fate, i.e. incompatibility with self-managed (non-EKS) clusters on a vanilla k8s distro (i.e. kubeadm).

In my case, I am using Flannel and it works like a charm.
CoreDNS was not working for me either. I checked the events in kube-system and they showed it was not coming up due to the “node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule” taint.
I temporarily removed the taint and CoreDNS started working.
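For reference, removing it looks roughly like this (substitute your node name; the trailing dash means “remove”):

kubectl taint nodes <node-name> node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule-

The cloud-controller-manager is supposed to remove this taint itself once it initializes the node, so having to remove it by hand usually points back at the ClusterID problem below.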
Now I am stuck getting the aws-cloud-controller-manager-wngz9 pod to work. I am getting this error:
E0627 08:35:57.390169 1 tags.go:95] Tag “KubernetesCluster” nor “kubernetes.io/cluster/…” not found; Kubernetes may behave unexpectedly.
F0627 08:35:57.390202 1 main.go:106] Cloud provider could not be initialized: could not init cloud provider “aws”: AWS cloud failed to find ClusterID
I have placed the proper tags on all the resources.
For EC2 I have given kubernetes.io/cluster/kubernetes = owned.
Even for the security group it’s the same.
For the VPC, subnets, route tables, and internet gateway I have changed the value from owned to shared, and the rest is the same.

Still, k8s is not able to find the ClusterID.
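To be concrete, this is roughly the tag I have on the instance, expressed with the AWS CLI (the instance ID is a placeholder; “kubernetes” is the cluster name mentioned above):

aws ec2 create-tags \
  --resources i-0123456789abcdef0 \
  --tags Key=kubernetes.io/cluster/kubernetes,Value=owned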

Also, I am using the out-of-tree AWS cloud provider, which looks like the latest setup that they are maintaining.
Any help will be appreciated.