Kubeadm init gets stuck with "Error writing Crisocket information for the control-plane node"


Cluster information:

Kubernetes version: 1.16.3
Cloud being used: bare-metal
Installation method: Gentoo Packages
Host OS: Gentoo Linux
CNI and version: none yet
CRI and version: using docker 19.03.5

Cluster initialization gets stuck, printing the error message from the title. It looks similar to many bugs/wishes/questions I found, but none of the answers has helped so far. It may be related to [1], I’m not sure, but that thread has no answer either…

kubelet was started with the args "--cgroup-driver=cgroupfs --fail-swap-on=false --pod-manifest-path=/etc/kubernetes/manifests --hostname-override=my-domain.com --container-runtime-endpoint=/var/run/docker/containerd/containerd.sock --v=9"; the kubelet output is given in [2]

kubeadm was started with "kubeadm init --ignore-preflight-errors=Swap,Port-10250 --apiserver-cert-extra-sans my-hostname --v=9 --node-name my-domain.com --cri-socket=/var/run/docker/containerd/containerd.sock > kubeadm.log 2>&1"; the output is given in [3]

I ran most tests both with and without swap enabled, which had no effect on this specific problem. And I cannot use the systemd cgroup driver, since there is no systemd on this particular machine.

I suspected a network/iptables problem, but the logs show that the API server is being queried and simply does not know the local node (does kubeadm init create any entry for the local machine?). I’m kind of stuck, too…

[1] (help needed) unable to init cluster - fails on CRI Upload
[2] https://drive.google.com/file/d/1ddnEYbLG-ApsSftUhbpr0LdxsZ6enzU-/view?usp=sharing
[3] https://drive.google.com/file/d/11ij1RsMJXaaQaUm3dlI6CKz3187htRaG/view?usp=sharing

I think the main thing missing here is OpenRC support in kubeadm ([1]). I’ll probably just have to wait, since I cannot switch init systems that easily or quickly. Or I’ll have to go the hard way and manage installation/configuration myself…

[1] https://github.com/kubernetes/kubeadm/issues/1295

OK, since this took me far too long to figure out and the docs didn’t help that much, here’s how it works (on my machine :smiley:)

(you can omit all the --v=9 stuff, but it helps enormously when tracking down problems)
(I have swap enabled on my machine; that is not supported, I know, so you should probably not use swap)

  1. Start kubelet:
    kubelet --cgroup-driver=cgroupfs --fail-swap-on=false --pod-manifest-path=/etc/kubernetes/manifests --hostname-override=${HOSTNAME} --v=9
    The --pod-manifest-path=/etc/kubernetes/manifests option is essential, since otherwise the downloaded Docker images will never be started

  2. Run kubeadm:
    kubeadm init --ignore-preflight-errors=Swap,Port-10250 --v=9 --node-name ${HOSTNAME} > kubeadm.log 2>&1
    It will get stuck, and the kubeadm log will repeatedly print:

I1204 15:52:08.196086 18835 round_trippers.go:443] GET https://1.2.3.4:6443/api/v1/nodes/${HOSTNAME} 404 Not Found in 2 milliseconds
I1204 15:52:08.196118 18835 round_trippers.go:449] Response Headers:
I1204 15:52:08.196154 18835 round_trippers.go:452] Cache-Control: no-cache, private
I1204 15:52:08.196171 18835 round_trippers.go:452] Content-Type: application/json
I1204 15:52:08.196186 18835 round_trippers.go:452] Content-Length: 178
I1204 15:52:08.196201 18835 round_trippers.go:452] Date: Wed, 04 Dec 2019 14:52:08 GMT
I1204 15:52:08.196241 18835 request.go:968] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"nodes \"${HOSTNAME}\" not found","reason":"NotFound","details":{"name":"${HOSTNAME}","kind":"nodes"},"code":404}

  3. Restart kubelet:
    kubelet --cgroup-driver=cgroupfs --fail-swap-on=false --pod-manifest-path=/etc/kubernetes/manifests --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --hostname-override=${HOSTNAME}
    The additional options (--kubeconfig and --config) are mandatory; they make the machine register as a node known to the cluster (the missing node was the error before)

  4. Finish the work that kubeadm would have done itself if it supported OpenRC:
    kubeadm init phase upload-config all --v=9 → it got stuck there, so we restart here…
    kubeadm init phase mark-control-plane --v=9
    kubeadm init phase bootstrap-token --v=9
    kubeadm init phase addon all --v=9
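
For anyone who wants to script this, here is a rough shell sketch of the "detect the hang, then finish the phases" part. The helper names `node_lookup_failing` and `finish_init_phases` and the `DRY_RUN` variable are made up for illustration and are not part of kubeadm; only the phase names and the log pattern come from the steps above. By default it just prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are hypothetical, not part of kubeadm.

# Return 0 if the given kubeadm log shows the node lookup failing with 404,
# i.e. the point where kubeadm init hangs and kubelet needs the restart.
node_lookup_failing() {
    grep -q '/api/v1/nodes/.* 404 Not Found' "$1"
}

# Run (or, with DRY_RUN=1, which is the default, only print)
# the remaining init phases in the order given above.
finish_init_phases() {
    for phase in "upload-config all" "mark-control-plane" "bootstrap-token" "addon all"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "kubeadm init phase $phase --v=9"
        else
            # $phase is left unquoted on purpose so "upload-config all"
            # is split into two arguments.
            kubeadm init phase $phase --v=9 || return 1
        fi
    done
}
```

A possible use: `node_lookup_failing kubeadm.log && echo "restart kubelet, then run finish_init_phases"`. Setting `DRY_RUN=0` makes the second helper actually call kubeadm, so only do that after kubelet has been restarted with the extra options.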

That’s it for the init on OpenRC. If only it had been documented somewhere…

PS: For testing purposes, if an attempt to init a cluster failed, kubeadm reset -f is your friend. Don’t forget to also remove ${HOME}/.kube and /var/run/dockershim.sock (the latter is generated by kubelet if you’re using Docker as CRI, but is never deleted after the process stops, so better remove it…)
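
The cleanup could be wrapped up roughly like this. It is only a sketch: `run_or_print`, `cleanup_failed_init` and `DRY_RUN` are names I made up, and it prints the commands by default so nothing gets deleted by accident.

```shell
#!/bin/sh
# Sketch only: function names and DRY_RUN are hypothetical.

# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run_or_print() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Clean up after a failed kubeadm init attempt, as described in the PS.
# Stop kubelet first; the real run needs root.
cleanup_failed_init() {
    run_or_print kubeadm reset -f
    run_or_print rm -rf "${HOME}/.kube"
    run_or_print rm -f /var/run/dockershim.sock
}
```

Call `cleanup_failed_init` to see what would be done, and `DRY_RUN=0 cleanup_failed_init` to really do it.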

Best Jan