CoreDNS pending, control-plane NotReady

Cluster information:

Kubernetes version: v1.26.1
Cloud being used: bare metal
Installation method: kubeadm
Host OS: CentOS stream 8
CNI and version: 0.3.1, flannel with RBAC integrated
CRI and version: cgroup as installed with Docker 20.10.17

My kubeadm config:

kind: ClusterConfiguration
apiVersion: kubeadm.k8s.io/v1beta3
kubernetesVersion: v1.26.0
networking:
  podSubnet: 10.244.0.0/16  # --pod-network-cidr
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: cgroupfs

My pods:

kubectl get pods --all-namespaces -o wide
NAMESPACE      NAME                                     READY   STATUS    RESTARTS   AGE     IP             NODE             NOMINATED NODE   READINESS GATES
kube-flannel   kube-flannel-ds-ph2mx                    1/1     Running   0          2d19h   141.52.72.27   scc-fineci-amd   <none>           <none>
kube-system    coredns-787d4945fb-9k5cl                 0/1     Pending   0          2d19h   <none>         <none>           <none>           <none>
kube-system    coredns-787d4945fb-vn7l6                 0/1     Pending   0          2d19h   <none>         <none>           <none>           <none>
kube-system    etcd-scc-fineci-amd                      1/1     Running   0          2d19h   141.52.72.27   scc-fineci-amd   <none>           <none>
kube-system    kube-apiserver-scc-fineci-amd            1/1     Running   0          2d19h   141.52.72.27   scc-fineci-amd   <none>           <none>
kube-system    kube-controller-manager-scc-fineci-amd   1/1     Running   0          2d19h   141.52.72.27   scc-fineci-amd   <none>           <none>
kube-system    kube-proxy-5dltp                         1/1     Running   0          2d19h   141.52.72.27   scc-fineci-amd   <none>           <none>
kube-system    kube-scheduler-scc-fineci-amd            1/1     Running   0          2d19h   141.52.72.27   scc-fineci-amd   <none>           <none>

And finally, the control-plane node:

kubectl get nodes
NAME             STATUS     ROLES           AGE     VERSION
scc-fineci-amd   NotReady   control-plane   2d19h   v1.26.1

The problem is that CoreDNS stays in the Pending state and the control-plane node remains NotReady. I'm not sure what I did wrong.

Could it be the cgroupfs driver? My hands are somewhat tied there, because I have a lot of Docker activity on that server that could be disrupted.
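One way to answer the cgroupfs question is to compare the driver the kubelet is configured with against the one the container runtime reports. The kubeadm docs require the two to match; cgroupfs on its own is not an error (systemd is recommended when systemd is the init system). The sketch below assumes the kubelet talks to Docker through cri-dockerd, since Kubernetes 1.24+ no longer has the built-in Docker shim. If it talks to the containerd that ships with Docker instead, the value to compare is SystemdCgroup in /etc/containerd/config.toml. The function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare the kubelet's cgroup driver with Docker's.
# Assumes Docker (via cri-dockerd) is the runtime the kubelet uses.

# Read cgroupDriver from a KubeletConfiguration file
# (kubeadm writes it to /var/lib/kubelet/config.yaml by default).
kubelet_cgroup_driver() {
  local cfg="${1:-/var/lib/kubelet/config.yaml}"
  awk -F': *' '$1 == "cgroupDriver" { print $2 }' "$cfg"
}

# Ask Docker which cgroup driver it uses (cgroupfs or systemd).
docker_cgroup_driver() {
  docker info --format '{{.CgroupDriver}}'
}

# Print both values and whether they agree.
compare_cgroup_drivers() {
  local k d
  k="$(kubelet_cgroup_driver "$@")"
  d="$(docker_cgroup_driver)"
  echo "kubelet: ${k:-<unset>}  docker: ${d:-<unknown>}"
  if [ -n "$k" ] && [ "$k" = "$d" ]; then
    echo "drivers match"
  else
    echo "drivers differ"
  fi
}
```

Running compare_cgroup_drivers as root on the node prints both values; with the config above, the kubelet side should report cgroupfs.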


What is the result of kubectl describe pod coredns-787d4945fb-9k5cl -n kube-system?

Here is the output:

# kubectl describe pod coredns-787d4945fb-9k5cl -n kube-system
Name:                 coredns-787d4945fb-9k5cl
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      coredns
Node:                 <none>
Labels:               k8s-app=kube-dns
                      pod-template-hash=787d4945fb
Annotations:          <none>
Status:               Pending
IP:                   
IPs:                  <none>
Controlled By:        ReplicaSet/coredns-787d4945fb
Containers:
  coredns:
    Image:       registry.k8s.io/coredns/coredns:v1.9.3
    Ports:       53/UDP, 53/TCP, 9153/TCP
    Host Ports:  0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-w2r49 (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  kube-api-access-w2r49:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 CriticalAddonsOnly op=Exists
                             node-role.kubernetes.io/control-plane:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                     From               Message
  ----     ------            ----                    ----               -------
  Warning  FailedScheduling  24s (x1379 over 4d18h)  default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..

The relevant part is the scheduler event: "0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }". Have you deployed a network plugin?

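CoreDNS will remain Pending until the pod network is up; it essentially has no network to run on until then. While the network plugin is not ready, the node stays NotReady and carries the node.kubernetes.io/not-ready taint, and the CoreDNS pods only tolerate that taint for NoExecute, not for scheduling.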
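To confirm that diagnosis, it helps to look at the message on the node's Ready condition and at the taints the scheduler is complaining about. A minimal sketch; the function name is hypothetical and the node name defaults to the one in this thread:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a node's conditions and taints.
node_status() {
  local node="${1:-scc-fineci-amd}"
  # One line per condition (Ready, MemoryPressure, DiskPressure, PIDPressure, ...)
  kubectl get node "$node" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}: {.message}{"\n"}{end}'
  # The taints the scheduler checks before it can place the CoreDNS pods
  kubectl get node "$node" \
    -o jsonpath='{range .spec.taints[*]}{.key}:{.effect}{"\n"}{end}'
}
```

If Ready is False and its message says the container runtime network is not ready, the problem is typically on the CNI side. If DiskPressure is True, the node also gets a node.kubernetes.io/disk-pressure taint and the kubelet's eviction thresholds are involved.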


I installed flannel by applying its YAML manifest with kubectl. By network plugin, do you mean something like flannel-io/cni-plugin on GitHub?

I did not see it listed as a requirement in the official Kubernetes installation docs. According to its page, it will create a flannel executable under /bin, but I already have one under /opt/cni/bin (not on the PATH) that is linked from /usr/bin (on the PATH). Will these executables conflict?
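As far as I know, the runtime does not look up CNI plugins on the PATH: cri-dockerd and containerd both default to /opt/cni/bin for plugin binaries and /etc/cni/net.d for network configs, so what matters is what sits in those two directories. The flannel manifest normally copies its own binary into /opt/cni/bin and writes 10-flannel.conflist, which in turn relies on the standard bridge and portmap plugins. A small hypothetical helper to list both directories:

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the CNI plugin binaries and network configs.
# Defaults are the usual runtime defaults; pass other paths if yours differ.
cni_inventory() {
  local bin_dir="${1:-/opt/cni/bin}" conf_dir="${2:-/etc/cni/net.d}"
  echo "plugins in ${bin_dir}:"
  ls -1 "$bin_dir" 2>/dev/null || echo "  (directory missing)"
  echo "configs in ${conf_dir}:"
  ls -1 "$conf_dir" 2>/dev/null || echo "  (directory missing)"
}
```

flannel, bridge and portmap should all show up under the plugins directory. If only flannel is there, the standard plugins from containernetworking/plugins are probably missing.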

Fixed it! It was the classic ephemeral-storage issue: low free disk space on the node was tripping the kubelet's eviction manager. I added less restrictive eviction manager settings to /var/lib/kubelet/config.yaml, restarted kubelet.service, and it's all good now.
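Before loosening the thresholds, it can be worth checking how close the node actually is to them. The kubelet's default hard eviction thresholds on Linux include nodefs.available<10%, nodefs.inodesFree<5% and imagefs.available<15% (plus memory.available<100Mi). The sketch below reports free space and free inodes with df. It assumes nodefs is the filesystem holding /var/lib/kubelet and imagefs the one holding Docker's data root, /var/lib/docker; both are assumptions that may not hold on your setup, and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: free space and free inodes (in %) on the filesystem
# that holds a given path, from POSIX df output (column 5 is "used %").
free_pct()       { df -P  "$1" | awk 'NR == 2 { sub(/%/, "", $5); print 100 - $5 }'; }
free_inode_pct() { df -Pi "$1" | awk 'NR == 2 { sub(/%/, "", $5); print 100 - $5 }'; }

# Compare against the kubelet's default hard eviction thresholds.
# Assumed locations: nodefs = /var/lib/kubelet, imagefs = /var/lib/docker.
eviction_headroom() {
  local nodefs="${1:-/var/lib/kubelet}" imagefs="${2:-/var/lib/docker}"
  echo "nodefs  free space:  $(free_pct "$nodefs")%  (default threshold 10%)"
  echo "nodefs  free inodes: $(free_inode_pct "$nodefs")%  (default threshold 5%)"
  echo "imagefs free space:  $(free_pct "$imagefs")%  (default threshold 15%)"
}
```

If any value is at or below its threshold, freeing disk space addresses the cause, while lowering the thresholds only moves the point at which eviction starts.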

This is the snippet I added to /var/lib/kubelet/config.yaml; it is a common workaround for an overly aggressive eviction manager:

evictionHard:
  imagefs.available: 1%
  memory.available: 100Mi
  nodefs.available: 1%
  nodefs.inodesFree: 1%

After that, restart kubelet.service (prefix the command with sudo if you are not root):
systemctl restart kubelet
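After the restart, the kubelet's live configuration can be read back through the API server's node proxy (the configz endpoint), which confirms whether the new evictionHard values were actually picked up. A sketch, assuming python3 is available on the machine running kubectl; the helper names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: read the running kubelet's evictionHard settings.

# Extract kubeletconfig.evictionHard from configz JSON on stdin.
extract_eviction_hard() {
  python3 -c 'import json, sys; print(json.dumps(json.load(sys.stdin)["kubeletconfig"].get("evictionHard", {}), indent=2, sort_keys=True))'
}

# Fetch the live config of one node's kubelet via the API server proxy.
kubelet_eviction_hard() {
  local node="${1:-scc-fineci-amd}"
  kubectl get --raw "/api/v1/nodes/${node}/proxy/configz" | extract_eviction_hard
}
```

If the printed values still look like the defaults, the kubelet did not read the edited file.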

Check your nodes and pods:

$> kubectl get nodes
NAME             STATUS   ROLES           AGE     VERSION
scc-fineci-amd   Ready    control-plane   4d21h   v1.26.1
$> kubectl get pods -A
NAMESPACE      NAME                                     READY   STATUS    RESTARTS       AGE
kube-flannel   kube-flannel-ds-ph2mx                    1/1     Running   1 (107m ago)   4d21h
kube-system    coredns-787d4945fb-9k5cl                 1/1     Running   0              4d21h
kube-system    coredns-787d4945fb-vn7l6                 1/1     Running   0              4d21h
kube-system    etcd-scc-fineci-amd                      1/1     Running   1 (107m ago)   4d21h
kube-system    kube-apiserver-scc-fineci-amd            1/1     Running   1 (107m ago)   4d21h
kube-system    kube-controller-manager-scc-fineci-amd   1/1     Running   1 (107m ago)   4d21h
kube-system    kube-proxy-5dltp                         1/1     Running   1 (107m ago)   4d21h
kube-system    kube-scheduler-scc-fineci-amd            1/1     Running   1 (107m ago)   4d21h

🍻


I have the same problem: CoreDNS stays in Pending.
I tried the solution of adding this configuration to /var/lib/kubelet/config.yaml:

evictionHard:
  imagefs.available: 1%
  memory.available: 100Mi
  nodefs.available: 1%
  nodefs.inodesFree: 1%

But I still have the same problem.
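Two things may be worth checking in that case. First, the four keys must be indented under evictionHard:; if they sit at the top level, YAML parses evictionHard as empty and the keys become separate top-level fields. Second, the kubelet log often says why the node is NotReady or why pods are being evicted. A sketch with hypothetical helper names:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for debugging the evictionHard change.

# Print "evictionHard:" and the indented lines that belong to it.
# If only the first line comes back, the keys are not nested under it.
eviction_block() {
  awk '/^evictionHard:/ { on = 1; print; next }
       on && /^[^ ]/    { on = 0 }
       on' "${1:-/var/lib/kubelet/config.yaml}"
}

# Recent kubelet log lines about eviction, resource pressure or CNI.
kubelet_recent_issues() {
  journalctl -u kubelet --since "${1:-30 min ago}" --no-pager \
    | grep -iE 'evict|pressure|cni|network plugin' \
    | tail -n "${2:-20}"
}
```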
