CoreDNS is stuck in CrashLoopBackOff

Hey, I have set up the following cluster with kubeadm. kubectl get pods --all-namespaces shows:

NAMESPACE      NAME                              READY   STATUS             RESTARTS        AGE
kube-flannel   kube-flannel-ds-h969m             1/1     Running            1 (45h ago)     45h
kube-system    coredns-55cb58b774-2ht9j          0/1     CrashLoopBackOff   551 (66s ago)   45h
kube-system    coredns-55cb58b774-xbtmb          0/1     CrashLoopBackOff   551 (72s ago)   45h
kube-system    etcd-michael                      1/1     Running            68 (45h ago)    45h
kube-system    kube-apiserver-michael            1/1     Running            73 (45h ago)    45h
kube-system    kube-controller-manager-michael   1/1     Running            6 (45h ago)     45h
kube-system    kube-proxy-hc75n                  1/1     Running            7 (45h ago)     45h
kube-system    kube-scheduler-michael            1/1     Running            78 (45h ago)    45h

The problem is that CoreDNS keeps crashing. Here is the relevant part of kubectl describe pod:

Events:
  Type     Reason   Age                    From     Message
  ----     ------   ----                   ----     -------
  Warning  BackOff  16m (x13377 over 45h)  kubelet  Back-off restarting failed container coredns in pod coredns-55cb58b774-2ht9j_kube-system(03fd6415-8b41-4412-ad09-c4fc2bccb762)
  Normal   Pulled   11m (x4 over 12m)      kubelet  Container image "registry.k8s.io/coredns/coredns:v1.11.3" already present on machine
  Normal   Created  11m (x4 over 12m)      kubelet  Created container coredns
  Warning  Failed   11m (x4 over 12m)      kubelet  Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/coredns": permission denied: unknown
  Warning  BackOff  2m44s (x57 over 12m)   kubelet  Back-off restarting failed container coredns in pod coredns-55cb58b774-2ht9j_kube-system(03fd6415-8b41-4412-ad09-c4fc2bccb762)

and kubectl logs -n kube-system coredns-55cb58b774-xbtmb returns nothing.

Does anyone have an idea what's going wrong and how I should go about debugging/fixing it? Thanks!

Update: the binary does exist in the snapshot and has the execute bit set (for root, at least), so I'm guessing something is wrong inside the container?

sudo ls -l /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/30/fs/coredns
-rwx------ 1 root root 59035648 jul 29 19:17 /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/30/fs/coredns
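
The only other thing I can think of checking is whether that snapshot directory sits on a filesystem mounted noexec, which would explain the runc "permission denied" even with the execute bit set, though that is just a guess on my part:

findmnt -T /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs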

Cluster information:

Kubernetes version: v1.30.6
Cloud being used: bare-metal
Installation method: kubeadm
Host OS: Ubuntu 22.04.2 LTS x86_64
CNI and version: flannel v0.26.1
CRI and version: containerd 1.7.18

Hi, Simon

Generally speaking: when a container crashes, Kubernetes restarts it automatically. If it keeps crashing, Kubernetes inserts an exponentially growing delay between restart attempts (10 s, 20 s, 40 s, and so on, capped at five minutes by default), and while a pod is waiting for its next attempt it is reported in the CrashLoopBackOff state. This mechanism protects cluster resources from a container that restarts in a tight loop.
But in your case, to fix the CrashLoopBackOff in CoreDNS, I would work through the following (rough example commands after the list):

1. Check permissions and execution status: make sure the CoreDNS binary has the correct permissions and is executable, as you already did with sudo ls -l on the snapshot path.
2. Check the CoreDNS configuration: inspect the Corefile in the coredns ConfigMap and look for misconfigurations, especially around DNS resolution, with kubectl -n kube-system edit configmap coredns.
3. Check /etc/resolv.conf: make sure the resolv.conf handed to pods points to external DNS servers (such as 8.8.8.8) and not 127.0.0.1, otherwise CoreDNS ends up forwarding queries to itself and exits when its loop plugin detects that. If you are using systemd-resolved, make sure it is configured correctly.
4. Allow privilege escalation: CoreDNS may require higher privileges to function properly; you can modify the deployment to allow privilege escalation.
5. Restart the CoreDNS pods: after making the changes, delete the existing pods so they are recreated with the new settings.
6. Finally, if none of that works, I suggest you try Calico instead of flannel.
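
For reference, here is a rough, untested sketch of commands for those steps. It assumes the kubeadm defaults (deployment name coredns, two replicas, label k8s-app=kube-dns, image registry.k8s.io/coredns/coredns:v1.11.3); adjust to your cluster, and the image re-pull in step 1 is my own assumption about a possibly corrupted layer:

# 1. Rule out a corrupted image layer: scale CoreDNS down so the image
#    is unused, delete it, and scale back up so the kubelet re-pulls it
kubectl -n kube-system scale deployment coredns --replicas=0
sudo crictl rmi registry.k8s.io/coredns/coredns:v1.11.3
kubectl -n kube-system scale deployment coredns --replicas=2

# 2. Inspect the Corefile without opening an editor
kubectl -n kube-system get configmap coredns -o yaml

# 3. On Ubuntu with systemd-resolved, the kubelet should hand pods this
#    file, not /etc/resolv.conf (which points at the 127.0.0.53 stub)
cat /run/systemd/resolve/resolv.conf

# 4. Allow privilege escalation (kubeadm's deployment sets it to false)
kubectl -n kube-system patch deployment coredns --type=json \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/securityContext/allowPrivilegeEscalation","value":true}]'

# 5. Recreate the pods so they pick up the new settings
kubectl -n kube-system delete pod -l k8s-app=kube-dns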

I think you may want the -p option of kubectl logs (it shows the logs of the previous container instance), but I haven't been debugging this sort of thing in a while. Might also check journalctl on the node.
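
Given that the runc error fires before the process even starts, the container may never write any logs (so -p could come back empty too), and the node journals are more likely to show something useful. Roughly, assuming containerd and the kubelet run as systemd units with those names:

kubectl -n kube-system logs -p coredns-55cb58b774-xbtmb
journalctl -u containerd --since "30 min ago" | grep -i coredns
journalctl -u kubelet --since "30 min ago" | grep -i coredns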