Hi all,
Yesterday I started installing a brand-new cluster with kubeadm, but it keeps going down a few seconds after coming up.
The cluster consists of 3 control plane nodes and, for now, only two dedicated worker nodes.
I tried running `kubeadm init…` on each of the control plane nodes, and every time it only takes a few seconds for things to start failing.
Here's the same command I have used with many other (working) clusters:
```bash
KUBE_VERSIONE=1.28.0
CP1=cp-1
CP1_IP=192.168.17.11
kubeadm init --control-plane-endpoint="$CP1_IP" --apiserver-advertise-address="$CP1_IP" \
  --apiserver-bind-port=6443 --apiserver-cert-extra-sans="$CP1_IP" \
  --pod-network-cidr=10.244.0.0/16 --kubernetes-version="$KUBE_VERSIONE" \
  --node-name="$CP1" --cert-dir=/etc/kubernetes/pki --cri-socket=/run/containerd/containerd.sock \
  --image-repository=registry.k8s.io --upload-certs
```
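For comparison, roughly the same flags can be written as a kubeadm configuration file. This is a sketch under assumptions, not a file used for this cluster: it assumes the `kubeadm.k8s.io/v1beta3` config API used by kubeadm 1.28, `write_kubeadm_config` is a hypothetical helper name, and the `KubeletConfiguration` part only spells out `cgroupDriver: systemd`, which kubeadm already defaults to since v1.22.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a kubeadm config file roughly equivalent to the
# kubeadm init flags above. Assumes the kubeadm.k8s.io/v1beta3 API (kubeadm 1.28).
# Arguments: output file, node name, node IP, Kubernetes version (without "v").
write_kubeadm_config() {
  local out="$1" name="$2" ip="$3" version="$4"
  cat > "$out" <<EOF
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: ${ip}   # --apiserver-advertise-address
  bindPort: 6443            # --apiserver-bind-port
nodeRegistration:
  name: ${name}             # --node-name
  criSocket: unix:///run/containerd/containerd.sock   # --cri-socket
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v${version}         # --kubernetes-version
controlPlaneEndpoint: "${ip}"          # --control-plane-endpoint
certificatesDir: /etc/kubernetes/pki   # --cert-dir
imageRepository: registry.k8s.io       # --image-repository
apiServer:
  certSANs:
    - "${ip}"                          # --apiserver-cert-extra-sans
networking:
  podSubnet: 10.244.0.0/16             # --pod-network-cidr
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd   # already the kubeadm default since v1.22; spelled out for clarity
EOF
}
```

Hypothetical usage, reusing the variables above: `write_kubeadm_config kubeadm-config.yaml "$CP1" "$CP1_IP" "$KUBE_VERSIONE"`, then `kubeadm init --config kubeadm-config.yaml --upload-certs` (assumption: kubeadm accepts `--upload-certs` together with `--config`).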
```
root@cp-1:~# kubectl create -f custom-resources.yaml
installation.operator.tigera.io/default created
apiserver.operator.tigera.io/default created
```
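One thing worth double-checking is that the Calico IP pool in `custom-resources.yaml` is consistent with `--pod-network-cidr=10.244.0.0/16`. The stock Calico `custom-resources.yaml` ships with an IP pool of `192.168.0.0/16`; if the file was applied unmodified, that pool would differ from the pod CIDR and would also overlap the node addresses (`192.168.17.x`). A minimal sketch, IPv4 only, with hypothetical helper names, assuming the nodes sit in `192.168.17.0/24`:

```bash
#!/usr/bin/env bash
# Hypothetical helpers: compare Calico IP pool CIDR(s) with other ranges.
# IPv4 only; no input validation.

# Print every "cidr: a.b.c.d/n" value found in a manifest (e.g. custom-resources.yaml).
pool_cidrs() {
  grep -oE 'cidr:[[:space:]]*[0-9.]+/[0-9]+' "$1" | sed -E 's/cidr:[[:space:]]*//'
}

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Print "overlap" if the two CIDRs share any address, "disjoint" otherwise.
# Two prefixes overlap exactly when they agree on the bits of the shorter prefix.
cidr_overlap() {
  local n1 n2 len1=${1#*/} len2=${2#*/} len mask
  n1=$(ip_to_int "${1%/*}")
  n2=$(ip_to_int "${2%/*}")
  len=$(( len1 < len2 ? len1 : len2 ))
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  if (( (n1 & mask) == (n2 & mask) )); then echo overlap; else echo disjoint; fi
}
```

Hypothetical usage: `for c in $(pool_cidrs custom-resources.yaml); do echo "$c vs nodes: $(cidr_overlap "$c" 192.168.17.0/24)"; done`. A pool should lie inside the pod CIDR and be disjoint from the node network.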
```
root@cp-1:~# k describe no cp-1
Name:               cp-1
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=cp-1
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 06 Oct 2023 10:13:36 +0200
Taints:             node-role.kubernetes.io/control-plane:NoSchedule
                    node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  cp-1
  AcquireTime:     <unset>
  RenewTime:       Fri, 06 Oct 2023 10:15:39 +0200
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  MemoryPressure   False   Fri, 06 Oct 2023 10:14:21 +0200   Fri, 06 Oct 2023 10:13:31 +0200   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Fri, 06 Oct 2023 10:14:21 +0200   Fri, 06 Oct 2023 10:13:31 +0200   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Fri, 06 Oct 2023 10:14:21 +0200   Fri, 06 Oct 2023 10:13:31 +0200   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Fri, 06 Oct 2023 10:14:21 +0200   Fri, 06 Oct 2023 10:13:31 +0200   KubeletNotReady              container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
Addresses:
  InternalIP:  192.168.17.11
  Hostname:    cp-1
Capacity:
  cpu:                4
  ephemeral-storage:  39937312Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             8092168Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  36806226679
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             7989768Ki
  pods:               110
System Info:
  Machine ID:                 165fab314945431d86578c6e879dd8f5
  System UUID:                4207de69-e688-30cb-5062-2bb88b6ec64a
  Boot ID:                    e66bdde2-8aa3-42c2-b2c2-319b050d7f33
  Kernel Version:             5.15.0-86-generic
  OS Image:                   Ubuntu 22.04.3 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.6.24
  Kubelet Version:            v1.28.0
  Kube-Proxy Version:         v1.28.0
Non-terminated Pods:          (6 in total)
  Namespace        Name                              CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  kube-system      etcd-cp-1                         100m (2%)     0 (0%)      100Mi (1%)       0 (0%)         2m4s
  kube-system      kube-apiserver-cp-1               250m (6%)     0 (0%)      0 (0%)           0 (0%)         84s
  kube-system      kube-controller-manager-cp-1      200m (5%)     0 (0%)      0 (0%)           0 (0%)         2m2s
  kube-system      kube-proxy-2sjnd                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         62s
  kube-system      kube-scheduler-cp-1               100m (2%)     0 (0%)      0 (0%)           0 (0%)         23s
  tigera-operator  tigera-operator-94d7f7696-ts59d   0 (0%)        0 (0%)      0 (0%)           0 (0%)         42s
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  cpu                650m (16%)  0 (0%)
  memory             100Mi (1%)  0 (0%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:
  Type     Reason                   Age                    From             Message
  Normal   Starting                 61s                    kube-proxy
  Normal   Starting                 59s                    kube-proxy
  Normal   NodeHasSufficientMemory  2m11s (x8 over 2m11s)  kubelet          Node cp-1 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    2m11s (x7 over 2m11s)  kubelet          Node cp-1 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     2m11s (x7 over 2m11s)  kubelet          Node cp-1 status is now: NodeHasSufficientPID
  Normal   NodeAllocatableEnforced  2m11s                  kubelet          Updated Node Allocatable limit across pods
  Warning  InvalidDiskCapacity      2m1s                   kubelet          invalid capacity 0 on image filesystem
  Normal   Starting                 2m1s                   kubelet          Starting kubelet.
  Normal   NodeHasSufficientMemory  2m                     kubelet          Node cp-1 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    2m                     kubelet          Node cp-1 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     2m                     kubelet          Node cp-1 status is now: NodeHasSufficientPID
  Normal   NodeAllocatableEnforced  2m                     kubelet          Updated Node Allocatable limit across pods
  Normal   RegisteredNode           63s                    node-controller  Node cp-1 event: Registered Node cp-1 in Controller
```
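The `Ready=False` condition above (`cni plugin not initialized`) is, as far as I know, what containerd reports while it finds no network configuration in its CNI config directory, so by itself it mainly says that Calico has not finished starting on this node. A quick sketch to see whether any CNI config has been written yet, assuming containerd's default config directory `/etc/cni/net.d` (the function name `cni_configs` is hypothetical):

```bash
#!/usr/bin/env bash
# Hypothetical helper: list CNI network configs in a directory.
# Assumption: containerd's CRI plugin uses the default conf_dir /etc/cni/net.d.
# The body runs in a subshell ( ... ) so that nullglob does not leak to the caller.
cni_configs() (
  dir="${1:-/etc/cni/net.d}"
  shopt -s nullglob
  # Extensions that libcni loads as network configs
  files=("$dir"/*.conf "$dir"/*.conflist "$dir"/*.json)
  if (( ${#files[@]} == 0 )); then
    echo "no CNI config in $dir"
    exit 1   # exits only the subshell, so the function returns 1
  fi
  printf '%s\n' "${files[@]}"
)
```

Once `calico-node` is running on the node, a file such as `10-calico.conflist` normally shows up there; if the directory stays empty, the Calico pods themselves are the next thing to look at.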
Here are some excerpts from the kubelet logs:
```
Oct 06 10:32:08 cp-1 kubelet[162397]: I1006 10:32:08.072000 162397 scope.go:117] "RemoveContainer" containerID="b5743bcd9b21786be0a7e83ac18246b8b54867febf6a5966160abc8fd83b71e3"
Oct 06 10:32:08 cp-1 kubelet[162397]: E1006 10:32:08.072570 162397 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-scheduler\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-scheduler pod=kube-scheduler-cp-1_kube-system(c93d408edd64d1a82383021a45676636)\"" pod="kube-system/kube-scheduler-cp-1" podUID="c93d408edd64d1a82383021a45676636"
Oct 06 10:32:08 cp-1 kubelet[162397]: I1006 10:32:08.112308 162397 scope.go:117] "RemoveContainer" containerID="cc7217087a8d2e720caa5abecf0b16a645d5ff2d181815cdf103d8c3f19b9297"
Oct 06 10:32:08 cp-1 kubelet[162397]: E1006 10:32:08.112781 162397 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-proxy\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-proxy pod=kube-proxy-2sjnd_kube-system(598f8acf-57b8-4738-9e7b-2e8a873e3174)\"" pod="kube-system/kube-proxy-2sjnd" podUID="598f8acf-57b8-4738-9e7b-2e8a873e3174"
Oct 06 10:32:08 cp-1 kubelet[162397]: E1006 10:32:08.253233 162397 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"tigera-operator-94d7f7696-ts59d.178b76a9a5007811", GenerateName:"", Namespace:"tigera-operator", SelfLink:"", UID:"", ResourceVersion:"774", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"tigera-operator", Name:"tigera-operator-94d7f7696-ts59d", UID:"d503f3d3-950e-4e7a-b40d-6e7dc2a6d83b", APIVersion:"v1", ResourceVersion:"524", FieldPath:"spec.containers{tigera-operator}"}, Reason:"BackOff", Message:"Back-off restarting failed container tigera-operator in pod tigera-operator-94d7f7696-ts59d_tigera-operator(d503f3d3-950e-4e7a-b40d-6e7dc2a6d83b)", Source:v1.EventSource{Component:"kubelet", Host:"cp-1"}, FirstTimestamp:time.Date(2023, time.October, 6, 10, 15, 55, 0, time.Local), LastTimestamp:time.Date(2023, time.October, 6, 10, 25, 49, 72957407, time.Local), Count:45, Type:"Warning", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"kubelet", ReportingInstance:"cp-1"}': 'Patch "https://192.168.17.11:6443/api/v1/namespaces/tigera-operator/events/tigera-operator-94d7f7696-ts59d.178b76a9a5007811": dial tcp 192.168.17.11:6443: connect: connection refused'(may retry after sleeping)
Oct 06 10:32:10 cp-1 kubelet[162397]: E1006 10:32:10.063764 162397 kubelet_node_status.go:540] "Error updating node status, will retry" err="error getting node \"cp-1\": Get \"https://192.168.17.11:6443/api/v1/nodes/cp-1?resourceVersion=0&timeout=10s\": dial tcp 192.168.17.11:6443: connect: connection refused"
Oct 06 10:32:10 cp-1 kubelet[162397]: E1006 10:32:10.064659 162397 kubelet_node_status.go:540] "Error updating node status, will retry" err="error getting node \"cp-1\": Get \"https://192.168.17.11:6443/api/v1/nodes/cp-1?timeout=10s\": dial tcp 192.168.17.11:6443: connect: connection refused"
Oct 06 10:32:10 cp-1 kubelet[162397]: E1006 10:32:10.065370 162397 kubelet_node_status.go:540] "Error updating node status, will retry" err="error getting node \"cp-1\": Get \"https://192.168.17.11:6443/api/v1/nodes/cp-1?timeout=10s\": dial tcp 192.168.17.11:6443: connect: connection refused"
Oct 06 10:32:10 cp-1 kubelet[162397]: E1006 10:32:10.066000 162397 kubelet_node_status.go:540] "Error updating node status, will retry" err="error getting node \"cp-1\": Get \"https://192.168.17.11:6443/api/v1/nodes/cp-1?timeout=10s\": dial tcp 192.168.17.11:6443: connect: connection refused"
Oct 06 10:32:10 cp-1 kubelet[162397]: E1006 10:32:10.066632 162397 kubelet_node_status.go:540] "Error updating node status, will retry" err="error getting node \"cp-1\": Get \"https://192.168.17.11:6443/api/v1/nodes/cp-1?timeout=10s\": dial tcp 192.168.17.11:6443: connect: connection refused"
Oct 06 10:32:10 cp-1 kubelet[162397]: E1006 10:32:10.066678 162397 kubelet_node_status.go:527] "Unable to update node status" err="update node status exceeds retry count"
Oct 06 10:32:10 cp-1 kubelet[162397]: E1006 10:32:10.190536 162397 controller.go:146] "Failed to ensure lease exists, will retry" err="Get \"https://192.168.17.11:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/cp-1?timeout=10s\": dial tcp 192.168.17.11:6443: connect: connection refused" interval="7s"
Oct 06 10:32:11 cp-1 kubelet[162397]: I1006 10:32:11.077548 162397 status_manager.go:853] "Failed to get status for pod" podUID="598f8acf-57b8-4738-9e7b-2e8a873e3174" pod="kube-system/kube-proxy-2sjnd" err="Get \"https://192.168.17.11:6443/api/v1/namespaces/kube-system/pods/kube-proxy-2sjnd\": dial tcp 192.168.17.11:6443: connect: connection refused"
Oct 06 10:32:11 cp-1 kubelet[162397]: I1006 10:32:11.079283 162397 status_manager.go:853] "Failed to get status for pod" podUID="d503f3d3-950e-4e7a-b40d-6e7dc2a6d83b" pod="tigera-operator/tigera-operator-94d7f7696-ts59d" err="Get \"https://192.168.17.11:6443/api/v1/namespaces/tigera-operator/pods/tigera-operator-94d7f7696-ts59d\": dial tcp 192.168.17.11:6443: connect: connection refused"
Oct 06 10:32:11 cp-1 kubelet[162397]: I1006 10:32:11.080319 162397 status_manager.go:853] "Failed to get status for pod" podUID="b9d3980e82268177cc646c7de29f9a83" pod="kube-system/kube-controller-manager-cp-1" err="Get \"https://192.168.17.11:6443/api/v1/namespaces/kube-system/pods/kube-controller-manager-cp-1\": dial tcp 192.168.17.11:6443: connect: connection refused"
Oct 06 10:32:11 cp-1 kubelet[162397]: I1006 10:32:11.080969 162397 status_manager.go:853] "Failed to get status for pod" podUID="96622ec9e0bcf3e14f8e635f1b525c92" pod="kube-system/kube-apiserver-cp-1" err="Get \"https://192.168.17.11:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-cp-1\": dial tcp 192.168.17.11:6443: connect: connection refused"
Oct 06 10:32:11 cp-1 kubelet[162397]: I1006 10:32:11.081448 162397 status_manager.go:853] "Failed to get status for pod" podUID="0d3a7de83df9718fc14b197959ebdfa8" pod="kube-system/etcd-cp-1" err="Get \"https://192.168.17.11:6443/api/v1/namespaces/kube-system/pods/etcd-cp-1\": dial tcp 192.168.17.11:6443: connect: connection refused"
Oct 06 10:32:11 cp-1 kubelet[162397]: I1006 10:32:11.082036 162397 status_manager.go:853] "Failed to get status for pod" podUID="c93d408edd64d1a82383021a45676636" pod="kube-system/kube-scheduler-cp-1" err="Get \"https://192.168.17.11:6443/api/v1/namespaces/kube-system/pods/kube-scheduler-cp-1\": dial tcp 192.168.17.11:6443: connect: connection refused"
Oct 06 10:32:11 cp-1 kubelet[162397]: E1006 10:32:11.687130 162397 kubelet.go:2855] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
```
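Since the API server on 192.168.17.11:6443 refuses connections in these logs, `kubectl logs` cannot be used at that point; the logs of the exited control plane containers can still be read straight from containerd. A minimal sketch with `crictl`, assuming it is installed and that the socket is the one passed via `--cri-socket` (`last_container_logs` and `CRI_ENDPOINT` are hypothetical names):

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the last lines of the newest container whose name
# matches $1 (e.g. kube-apiserver, etcd, kube-scheduler), including exited ones.
# Assumptions: crictl is installed; the endpoint is the socket given to kubeadm.
CRI_ENDPOINT="${CRI_ENDPOINT:-unix:///run/containerd/containerd.sock}"

last_container_logs() {
  local name="$1" lines="${2:-50}" id
  # -a: include exited containers; --name: regex on the container name;
  # -q: print IDs only. Assumed: crictl lists the most recently created container first.
  id=$(crictl --runtime-endpoint "$CRI_ENDPOINT" ps -a --name "$name" -q | head -n 1)
  if [[ -z "$id" ]]; then
    echo "no container matching '$name'" >&2
    return 1
  fi
  crictl --runtime-endpoint "$CRI_ENDPOINT" logs --tail "$lines" "$id" 2>&1
}
```

For example `last_container_logs etcd` or `last_container_logs kube-apiserver 100`; since `--name` is a regular expression, an overly short pattern can match more than one component.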
Has anybody faced something similar?
Thanks in advance!
Cluster information:
Kubernetes version: 1.28.0
Cloud being used: bare-metal (on a VMware hypervisor)
Installation method: kubeadm
Host OS: Ubuntu 22.04.3 LTS
CNI and version: calico/v3.26.1
CRI and version: containerd 1.6.24
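With containerd 1.6 on Ubuntu 22.04 (which uses cgroup v2 by default), a commonly reported cause of control plane static pods being restarted every few minutes is a cgroup driver mismatch: kubeadm configures the kubelet with `cgroupDriver: systemd` by default since v1.22, whereas the `config.toml` generated by `containerd config default` contains `SystemdCgroup = false`. Whether that applies here is not visible in the output above; the sketch below compares the two settings, assuming the default file paths (the function names are hypothetical):

```bash
#!/usr/bin/env bash
# Hypothetical helpers: compare the cgroup driver used by containerd (runc)
# with the one the kubelet is configured for. Default paths are assumptions:
# /etc/containerd/config.toml and /var/lib/kubelet/config.yaml (written by kubeadm).

# Print the SystemdCgroup value from containerd's config, or "unset".
containerd_systemd_cgroup() {
  local cfg="${1:-/etc/containerd/config.toml}" v
  v=$(grep -E '^[[:space:]]*SystemdCgroup[[:space:]]*=' "$cfg" 2>/dev/null | head -n 1 | sed -E 's/.*=[[:space:]]*//')
  echo "${v:-unset}"
}

# Print the kubelet's cgroupDriver setting, or "unset".
kubelet_cgroup_driver() {
  local cfg="${1:-/var/lib/kubelet/config.yaml}" v
  v=$(awk '$1 == "cgroupDriver:" { print $2; exit }' "$cfg" 2>/dev/null)
  echo "${v:-unset}"
}

# Return 1 if the kubelet expects systemd but containerd's runc does not use it.
check_cgroup_drivers() {
  local ctr kub
  ctr=$(containerd_systemd_cgroup "$1")
  kub=$(kubelet_cgroup_driver "$2")
  if [[ "$kub" == "systemd" && "$ctr" != "true" ]]; then
    echo "mismatch: kubelet cgroupDriver=$kub, containerd SystemdCgroup=$ctr"
    return 1
  fi
  echo "ok: kubelet cgroupDriver=$kub, containerd SystemdCgroup=$ctr"
}
```

Running `check_cgroup_drivers` with no arguments on each control plane node checks the default paths. If it reports a mismatch, the Kubernetes container runtime docs describe setting `SystemdCgroup = true` under the runc options in `config.toml` and restarting containerd.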