SOLVED: `kubeadm init` kinda dies after minutes; 6443 connection refused

Cluster information:

Kubernetes version: 1.25.4

Cloud being used: bare-metal; personal laptop

Installation method: manual

Host OS: Debian GNU/Linux 11 (bullseye); Linux birl-work-laptop 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux

CNI and version: Didn't even make it that far.
CRI and version:

I'm reading through "Creating a cluster with kubeadm" for the first time. I got through the init, I think:

# echo '1' > /proc/sys/net/ipv4/ip_forward ; modprobe br_netfilter ; swapoff -va ; kubeadm init
swapoff /dev/mapper/birl--laptop--2018--vg-swap_1
[init] Using Kubernetes version: v1.25.4
[preflight] Running pre-flight checks
	[WARNING SystemVerification]: missing optional cgroups: blkio
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [birl-work-laptop kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.0.0.168]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [birl-work-laptop localhost] and IPs [10.0.0.168 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [birl-work-laptop localhost] and IPs [10.0.0.168 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 6.505112 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node birl-work-laptop as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node birl-work-laptop as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: kco2t8.mq3zud2fsqmvrssp
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Looked over the initial setup of pods:

# export KUBECONFIG=/etc/kubernetes/admin.conf
# kubectl get pods --all-namespaces
NAMESPACE     NAME                                       READY   STATUS             RESTARTS        AGE
kube-system   coredns-565d847f94-q2cks                   0/1     Pending            0               4m57s
kube-system   coredns-565d847f94-zcsdn                   0/1     Pending            0               4m57s
kube-system   etcd-birl-work-laptop                      1/1     Running            4 (2m39s ago)   5m5s
kube-system   kube-apiserver-birl-work-laptop            1/1     Running            4 (69s ago)     5m5s
kube-system   kube-controller-manager-birl-work-laptop   0/1     Running            6 (98s ago)     5m5s
kube-system   kube-proxy-b8qq8                           0/1     CrashLoopBackOff   3 (5s ago)      4m58s
kube-system   kube-scheduler-birl-work-laptop            1/1     Running            6 (2m39s ago)   5m5s
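For anyone reading a table like this, here is a rough sketch of a filter that prints only the pods that look unhealthy. The function name `unhealthy_pods` is made up for illustration, and it assumes the column layout of `kubectl get pods --all-namespaces` shown above (NAMESPACE, NAME, READY, STATUS, RESTARTS):

```shell
# Hypothetical helper: reads `kubectl get pods --all-namespaces` output on stdin
# and prints pods that are not fully ready, not Running, or have restarted.
unhealthy_pods() {
  awk 'NR > 1 {
    # READY looks like "0/1": ready containers / total containers
    split($3, r, "/")
    # $5 is the restart count; a trailing "(5s ago)" lands in later fields
    if (r[1] != r[2] || $4 != "Running" || $5 + 0 > 0)
      print $2, $3, $4, "restarts=" $5
  }'
}
```

Usage would be something like `kubectl get pods --all-namespaces | unhealthy_pods`. On the output above it would list every pod, since each one is either Pending, crash-looping, or already restarted several times. Note that it would also flag Completed pods, which may or may not be what you want.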

Was looking over the CNI addons page. Decided on Calico:

# kubectl apply -f calico.yaml
The connection to the server 10.0.0.168:6443 was refused - did you specify the right host or port?

Guess I need to be faster in my decision making? :smiley:

I ran `journalctl -xeu kubelet` and got about 1000 lines of output. Here's a `head -20`:

-- Journal begins at Wed 2022-11-09 10:13:49 EST, ends at Wed 2022-11-30 12:38:44 EST. --
Nov 30 12:31:02 birl-work-laptop kubelet[54504]: I1130 12:31:02.060965   54504 scope.go:115] "RemoveContainer" containerID="08576bbf095ab004519368b5a18d6f461f6db2a58f20c163a0de25b09964bc0c"
Nov 30 12:31:02 birl-work-laptop kubelet[54504]: E1130 12:31:02.061247   54504 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-proxy\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-proxy pod=kube-proxy-b8qq8_kube-system(b4b6ae6a-2ac5-4ba8-bd0a-841bada1fdd1)\"" pod="kube-system/kube-proxy-b8qq8" podUID=b4b6ae6a-2ac5-4ba8-bd0a-841bada1fdd1
Nov 30 12:31:05 birl-work-laptop kubelet[54504]: I1130 12:31:05.012391   54504 status_manager.go:667] "Failed to get status for pod" podUID=b4b6ae6a-2ac5-4ba8-bd0a-841bada1fdd1 pod="kube-system/kube-proxy-b8qq8" err="Get \"https://10.0.0.168:6443/api/v1/namespaces/kube-system/pods/kube-proxy-b8qq8\": dial tcp 10.0.0.168:6443: connect: connection refused"
Nov 30 12:31:05 birl-work-laptop kubelet[54504]: I1130 12:31:05.012870   54504 status_manager.go:667] "Failed to get status for pod" podUID=170c7129a4f554e546a772fa7a3e2724 pod="kube-system/etcd-birl-work-laptop" err="Get \"https://10.0.0.168:6443/api/v1/namespaces/kube-system/pods/etcd-birl-work-laptop\": dial tcp 10.0.0.168:6443: connect: connection refused"
Nov 30 12:31:05 birl-work-laptop kubelet[54504]: I1130 12:31:05.013290   54504 status_manager.go:667] "Failed to get status for pod" podUID=b706d133717e6b128f68b2ae26ef5267 pod="kube-system/kube-controller-manager-birl-work-laptop" err="Get \"https://10.0.0.168:6443/api/v1/namespaces/kube-system/pods/kube-controller-manager-birl-work-laptop\": dial tcp 10.0.0.168:6443: connect: connection refused"
Nov 30 12:31:05 birl-work-laptop kubelet[54504]: I1130 12:31:05.013781   54504 status_manager.go:667] "Failed to get status for pod" podUID=0fa3a7ee79ee8742d4353760695ed280 pod="kube-system/kube-apiserver-birl-work-laptop" err="Get \"https://10.0.0.168:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-birl-work-laptop\": dial tcp 10.0.0.168:6443: connect: connection refused"
Nov 30 12:31:05 birl-work-laptop kubelet[54504]: I1130 12:31:05.014478   54504 status_manager.go:667] "Failed to get status for pod" podUID=19b646392d15d5f4dba48aa1ff548254 pod="kube-system/kube-scheduler-birl-work-laptop" err="Get \"https://10.0.0.168:6443/api/v1/namespaces/kube-system/pods/kube-scheduler-birl-work-laptop\": dial tcp 10.0.0.168:6443: connect: connection refused"
Nov 30 12:31:05 birl-work-laptop kubelet[54504]: I1130 12:31:05.204514   54504 status_manager.go:667] "Failed to get status for pod" podUID=b706d133717e6b128f68b2ae26ef5267 pod="kube-system/kube-controller-manager-birl-work-laptop" err="Get \"https://10.0.0.168:6443/api/v1/namespaces/kube-system/pods/kube-controller-manager-birl-work-laptop\": dial tcp 10.0.0.168:6443: connect: connection refused"
Nov 30 12:31:05 birl-work-laptop kubelet[54504]: I1130 12:31:05.204839   54504 status_manager.go:667] "Failed to get status for pod" podUID=b706d133717e6b128f68b2ae26ef5267 pod="kube-system/kube-controller-manager-birl-work-laptop" err="Get \"https://10.0.0.168:6443/api/v1/namespaces/kube-system/pods/kube-controller-manager-birl-work-laptop\": dial tcp 10.0.0.168:6443: connect: connection refused"
Nov 30 12:31:05 birl-work-laptop kubelet[54504]: I1130 12:31:05.366361   54504 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="13f6ee2e94bb3eef728ef3037cdf0cda47e1086c57543d18b7fd42224a6aa750"
Nov 30 12:31:05 birl-work-laptop kubelet[54504]: I1130 12:31:05.366391   54504 scope.go:115] "RemoveContainer" containerID="4b58bfb7d77d2e8feae66ea8fc88140d3ffe42d5e7d5ddb827c08ffe9ab7f532"
Nov 30 12:31:05 birl-work-laptop kubelet[54504]: E1130 12:31:05.701345   54504 kubelet.go:2373] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Nov 30 12:31:06 birl-work-laptop kubelet[54504]: I1130 12:31:06.012106   54504 scope.go:115] "RemoveContainer" containerID="919bf2782a64d3a539371ffc539e5873dc1b931050fcef29c3dbffd17cd8a29a"
Nov 30 12:31:06 birl-work-laptop kubelet[54504]: E1130 12:31:06.014056   54504 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-birl-work-laptop_kube-system(0fa3a7ee79ee8742d4353760695ed280)\"" pod="kube-system/kube-apiserver-birl-work-laptop" podUID=0fa3a7ee79ee8742d4353760695ed280
Nov 30 12:31:06 birl-work-laptop kubelet[54504]: I1130 12:31:06.375531   54504 status_manager.go:667] "Failed to get status for pod" podUID=b706d133717e6b128f68b2ae26ef5267 pod="kube-system/kube-controller-manager-birl-work-laptop" err="Get \"https://10.0.0.168:6443/api/v1/namespaces/kube-system/pods/kube-controller-manager-birl-work-laptop\": dial tcp 10.0.0.168:6443: connect: connection refused"
Nov 30 12:31:06 birl-work-laptop kubelet[54504]: E1130 12:31:06.614325   54504 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-controller-manager\" with CrashLoopBackOff: \"back-off 1m20s restarting failed container=kube-controller-manager pod=kube-controller-manager-birl-work-laptop_kube-system(b706d133717e6b128f68b2ae26ef5267)\"" pod="kube-system/kube-controller-manager-birl-work-laptop" podUID=b706d133717e6b128f68b2ae26ef5267
Nov 30 12:31:06 birl-work-laptop kubelet[54504]: E1130 12:31:06.701573   54504 controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://10.0.0.168:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/birl-work-laptop?timeout=10s": dial tcp 10.0.0.168:6443: connect: connection refused
Nov 30 12:31:07 birl-work-laptop kubelet[54504]: I1130 12:31:07.390824   54504 status_manager.go:667] "Failed to get status for pod" podUID=b706d133717e6b128f68b2ae26ef5267 pod="kube-system/kube-controller-manager-birl-work-laptop" err="Get \"https://10.0.0.168:6443/api/v1/namespaces/kube-system/pods/kube-controller-manager-birl-work-laptop\": dial tcp 10.0.0.168:6443: connect: connection refused"
Nov 30 12:31:07 birl-work-laptop kubelet[54504]: I1130 12:31:07.419439   54504 scope.go:115] "RemoveContainer" containerID="67a6ba7f6ae8439cd743a20c00e1427db52306cd3eab9673fac663c6319b9f26"
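Since 1000 lines of this is hard to read by eye, here is a minimal sketch for counting which containers the kubelet reports as crash-looping. The function name `crashloop_summary` is hypothetical, and it assumes the `restarting failed container=NAME` wording seen in the log lines above:

```shell
# Hypothetical helper: reads kubelet log lines on stdin and prints
# "COUNT CONTAINER" for each container mentioned in a CrashLoopBackOff message.
crashloop_summary() {
  grep -o 'restarting failed container=[^ ]*' |
    sed 's/.*container=//' |
    sort |
    uniq -c |
    awk '{print $1, $2}'   # normalize the padding that uniq -c adds
}
```

It could be fed with `journalctl -u kubelet --no-pager | crashloop_summary`. In the excerpt above it would report kube-apiserver, kube-controller-manager and kube-proxy once each.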

Any insight appreciated. Thanks in advance.

(Edit: realized I had spelled the CNI “calicao”, but it fails even with the corrected spelling. Also tried kubectl as non-root.)

When I say “kubeadm init kinda dies”, I mean that the kube-apiserver listening on 6443 dies off after about 1-2 minutes, even though I had seen established connections from the kubelet (via lsof -Pi).
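A rough way to watch the listener come and go, as an alternative to re-running lsof by hand. This is only a sketch: `port_open` is a made-up name, and it assumes bash (for its `/dev/tcp` redirection) and coreutils `timeout` are installed:

```shell
# Hypothetical helper: succeeds if a TCP connection to HOST PORT can be opened
# within 2 seconds, fails otherwise. Relies on bash's /dev/tcp pseudo-device.
port_open() {
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}
```

For example, `while sleep 5; do if port_open 10.0.0.168 6443; then echo "$(date +%T) open"; else echo "$(date +%T) closed"; fi; done` would print a timestamped open/closed line every five seconds, which makes the 1-2 minute die-off easy to spot.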

Apparently the problem was two-fold: I had misspelled calico, and kube-apiserver needed a good 30s before it was ready to accept my kubectl apply.

# kubectl apply -f calico.yaml
poddisruptionbudget.policy/calico-kube-controllers created
serviceaccount/calico-kube-controllers created
serviceaccount/calico-node created
configmap/calico-config created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/caliconodestatuses.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipreservations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/kubecontrollersconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org created
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
daemonset.apps/calico-node created
deployment.apps/calico-kube-controllers created
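Rather than guessing at the 30s, one option is to retry until the API server answers before applying the manifest. The sketch below is a generic retry loop, not anything kubeadm provides; the name `retry` and the attempt and delay values are my own assumptions:

```shell
# Hypothetical helper: retry ATTEMPTS DELAY COMMAND [ARGS...]
# Runs COMMAND until it succeeds, sleeping DELAY seconds between tries.
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # no point sleeping after the final failed attempt
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

With that defined, `retry 30 2 kubectl get --raw=/readyz && kubectl apply -f calico.yaml` would poll the API server's readiness endpoint for up to about a minute and only apply Calico once it responds.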