I deployed a Kubernetes cluster with Kubespray, using the default settings, on three bare-metal servers. The master node and one of the workers work fine, but the other worker node cannot connect to the Kubernetes service at 10.233.0.1.
I can run curl -k https://10.233.0.1:443/api/ from two of the nodes, but it times out from the problematic worker. Here is the output on the broken worker node:
cloud-user@ubuntuworker:~$ curl -v -k https://10.233.0.1:443/api/
* Trying 10.233.0.1:443...
* TCP_NODELAY set
* Connected to 10.233.0.1 (10.233.0.1) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* Operation timed out after 10001 milliseconds with 0 out of 0 bytes received
* Closing connection 0
curl: (28) Operation timed out after 10001 milliseconds with 0 out of 0 bytes received
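In the output above, the TCP connection completes ("Connected to 10.233.0.1") and the client sends its TLS Client Hello, but no reply ever arrives. A small probe that reports which stage fails can make it easier to compare nodes and targets. This is only a sketch: the function name probe_tls and its return labels are hypothetical, and it mimics curl -k by disabling certificate and hostname verification.

```python
import socket
import ssl


def probe_tls(host: str, port: int, timeout: float = 10.0) -> str:
    """Report which stage of an HTTPS connection fails (hypothetical helper).

    Returns one of:
      "tcp_failed"  - the TCP connection could not be established
      "tls_timeout" - TCP connected, but the TLS handshake got no reply in time
      "tls_error"   - the TLS handshake failed with an error
      "ok"          - both the TCP connection and the TLS handshake completed
    """
    try:
        # Stage 1: plain TCP connect (refusal or timeout lands here).
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "tcp_failed"

    # Mimic `curl -k`: skip certificate and hostname verification.
    # check_hostname must be disabled before setting CERT_NONE.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    with raw:
        try:
            # Stage 2: wrap_socket performs the handshake immediately and
            # inherits the timeout that create_connection set on the socket.
            with ctx.wrap_socket(raw, server_hostname=host):
                return "ok"
        except socket.timeout:
            # Client Hello sent, but the server never answered.
            return "tls_timeout"
        except OSError:
            # ssl.SSLError is a subclass of OSError.
            return "tls_error"
```

For example, running probe_tls("10.233.0.1", 443) and probe_tls(MASTER_NODE_IP, 6443) on each node (MASTER_NODE_IP being a placeholder for the real address) would show whether only the ClusterIP path stalls at the TLS stage on the Ubuntu worker.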
I tried deploying the cluster with Flannel instead of Calico, but the issue persists. It is worth noting that I can curl -k https://MASTER_NODE_IP:6443/api/ from all of the nodes. Only the Kubernetes ClusterIP (10.233.0.1) has issues, and only from this single worker. The problematic node runs Ubuntu 20.04, while the other two nodes run CentOS 7. I believe I have turned off the firewalls on all of the nodes. Any advice on how to debug this is much appreciated.
Cluster information:
Kubernetes version: client/server v1.19.7
Cloud being used: bare-metal
Installation method: kubespray
Host OS: Ubuntu 20.04 and CentOS 7 nodes
CNI and version: Calico v3.16.6
CRI and version: Docker 19.03.14