Kubeadm init fails with controlPlaneEndpoint

Cluster information:

Kubernetes version: v1.21.1
Cloud being used: bare-metal
Installation method: kubeadm
Host OS: Ubuntu 20.04
CNI and version: calico
CRI and version: containerd

If I add controlPlaneEndpoint: "DNS_NAME:6443" to ClusterConfiguration.yaml, initializing the cluster fails. I believe it's because of the cert. If I remove controlPlaneEndpoint: "DNS_NAME:6443", the cluster initializes just fine.
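For context, this is roughly the ClusterConfiguration I'm feeding to kubeadm (a minimal sketch: kubernetesVersion and the endpoint match my setup, the other fields are shown with illustrative defaults):

```yaml
# Minimal ClusterConfiguration sketch for kubeadm 1.21 (v1beta2 API).
# The endpoint is the load-balanced DNS name in front of the control-plane
# nodes; podSubnet is Calico's default pod CIDR, shown for illustration.
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.21.1
controlPlaneEndpoint: "kube-apiserver.suw1.trolleyesecurity.com:6443"
networking:
  podSubnet: "192.168.0.0/16"
```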

Unfortunately, an error has occurred:
	timed out waiting for the condition

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.

Here is one example how you may list all Kubernetes containers running in cri-o/containerd using crictl:
	- 'crictl --runtime-endpoint /run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
	Once you have found the failing container, you can inspect its logs with:
	- 'crictl --runtime-endpoint /run/containerd/containerd.sock logs CONTAINERID'

Running journalctl -xeu kubelet gets me this information.

May 18 23:34:39 k8s-cp-1 kubelet[19261]: E0518 23:34:39.595057 19261 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://kube-apiserver.suw1.trolleyesecurity.com:6443/api/v1/services?limit=500&resourceVersion=0": x509: certificate signe>
May 18 23:34:39 k8s-cp-1 kubelet[19261]: E0518 23:34:39.688243 19261 kubelet.go:2291] "Error getting node" err="node \"k8s-cp-1\" not found"

I created the cert with the same CA as my etcd certs.

 Certificate #0 ( _RSAPublicKey )
   SHA1 Fingerprint:                  1c3d3cc6cbc73d9c3a32aafa907b88899b1c3643
   Common Name:                       kube-apiserver.suw1.trolleyesecurity.com
   Issuer:                            CA
   Serial Number:                     199899588946265901135907970683992546094909252481
   Not Before:                        2021-05-18
   Not After:                         2121-04-24
   Public Key Algorithm:              _RSAPublicKey
   Signature Algorithm:               sha256
   Key Size:                          2048
   Exponent:                          65537
   DNS Subject Alternative Names:     ['kube-apiserver.suw1.trolleyesecurity.com', 'k8s-cp-1.suw1.trolleyesecurity.com', 'k8s-cp-2.suw1.trolleyesecurity.com', 'k8s-cp-3.suw1.trolleyesecurity.com']
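To double-check the SANs myself, I use an openssl one-liner. Here is a self-contained sketch: it generates a throwaway cert with the same SANs as the dump above and reads them back; the same `-ext subjectAltName` check works against the real cert file (the /tmp paths and the demo cert itself are just for illustration, and the flags require OpenSSL 1.1.1+):

```shell
# Generate a throwaway self-signed cert carrying the SANs from the dump above.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout /tmp/demo-key.pem -out /tmp/demo-cert.pem -days 1 \
  -subj "/CN=kube-apiserver.suw1.trolleyesecurity.com" \
  -addext "subjectAltName=DNS:kube-apiserver.suw1.trolleyesecurity.com,DNS:k8s-cp-1.suw1.trolleyesecurity.com,DNS:k8s-cp-2.suw1.trolleyesecurity.com,DNS:k8s-cp-3.suw1.trolleyesecurity.com"

# Read the SANs back out of the cert.
openssl x509 -in /tmp/demo-cert.pem -noout -ext subjectAltName
```

Against the live endpoint, `echo | openssl s_client -connect kube-apiserver.suw1.trolleyesecurity.com:6443 2>/dev/null | openssl x509 -noout -ext subjectAltName` shows the certificate the proxy is actually presenting to the kubelet, which is the one the x509 error above is complaining about.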

What am I missing?

I figured out the issue, or at least got it working, and would love to hear feedback on whether this is the right setup.

I changed the nginx proxy from HTTPS (terminating SSL/TLS) to a TCP stream that does not terminate SSL/TLS.

stream {
	upstream kube-apiserver-backend {
		server k8s-cp-1:6443;
		server k8s-cp-2:6443;
		server k8s-cp-3:6443;
	}

	server {
		listen                6443;
		proxy_pass            kube-apiserver-backend;
		proxy_timeout         3s;
		proxy_connect_timeout 1s;
	}
}

That looks fine. However, one item of caution: you're not going to enjoy that 3s proxy_timeout when you get your cluster functional.

One of my common tasks when interacting with deployed pods is to hop directly into a container (e.g. kubectl exec -it <pod> -- /bin/bash). Your proxy configuration will kill that session if it's idle for 3 seconds. My HAProxy conf defaulted to 20s and that was annoying enough. It's now 60m. :wink:
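For illustration, the same stream block with more forgiving timeouts (values are just a suggestion; proxy_timeout in the nginx stream module is the idle timeout between two successive reads or writes, so long-lived exec and watch sessions need it to be generous):

```nginx
stream {
	upstream kube-apiserver-backend {
		server k8s-cp-1:6443;
		server k8s-cp-2:6443;
		server k8s-cp-3:6443;
	}

	server {
		listen                6443;
		proxy_pass            kube-apiserver-backend;
		proxy_timeout         60m;  # idle timeout; keeps kubectl exec sessions alive
		proxy_connect_timeout 5s;
	}
}
```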