Cluster information:
Kubernetes version: v1.32.0+rke2r1
Installation method: RKE2 via the Lablabs Ansible role
Host OS: Rocky Linux 9.3
CNI and version: CANAL
CRI and version:
Hi everyone,
I'm facing an issue with kube-proxy in my RKE2 HA cluster, which consists of 3 master nodes, 3 worker nodes, and an external load balancer. The kube-proxy instances on all 3 master nodes fail to connect to the API server. Below are the error logs from kube-proxy:
E0122 16:18:27.308126 1 proxier.go:733] "Error cleaning up nftables rules" err="could not find nftables binary: exec: \"nft\": executable file not found in $PATH"
E0122 16:18:27.308193 1 proxier.go:733] "Error cleaning up nftables rules" err="could not find nftables binary: exec: \"nft\": executable file not found in $PATH"
E0122 16:18:27.310875 1 server.go:687] "Failed to retrieve node info" err="Get \"https://127.0.0.1:6443/api/v1/nodes/master01\": dial tcp 127.0.0.1:6443: connect: connection refused"
E0122 16:18:28.392318 1 server.go:687] "Failed to retrieve node info" err="Get \"https://127.0.0.1:6443/api/v1/nodes/master01\": dial tcp 127.0.0.1:6443: connect: connection refused"
E0122 16:18:30.765162 1 server.go:687] "Failed to retrieve node info" err="Get \"https://127.0.0.1:6443/api/v1/nodes/master01\": dial tcp 127.0.0.1:6443: connect: connection refused"
E0122 16:18:34.773599 1 server.go:687] "Failed to retrieve node info" err="Get \"https://127.0.0.1:6443/api/v1/nodes/master01\": dial tcp 127.0.0.1:6443: connect: connection refused"
When I check the process with ps aux | grep proxy, I see it using the kubeconfig file /var/lib/rancher/rke2/agent/kubeproxy.kubeconfig. Its contents are:
apiVersion: v1
clusters:
- cluster:
    server: https://127.0.0.1:6443
    certificate-authority: /var/lib/rancher/rke2/agent/server-ca.crt
  name: local
contexts:
- context:
    cluster: local
    namespace: default
    user: user
  name: Default
current-context: Default
kind: Config
preferences: {}
users:
- name: user
  user:
    client-certificate: /var/lib/rancher/rke2/agent/client-kube-proxy.crt
    client-key: /var/lib/rancher/rke2/agent/client-kube-proxy.key
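In case it helps, the client certificate referenced there can be inspected with openssl (paths taken from the kubeconfig above); the subject should be kube-proxy's client identity:

# print the subject, issuer, and validity window of the kube-proxy client cert
openssl x509 -in /var/lib/rancher/rke2/agent/client-kube-proxy.crt -noout -subject -issuer -dates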
Interestingly, when I manually test the API server with curl using the same certificates and URL, I don't get a 403 or a connection refused error. Instead, I receive a valid JSON response:
curl --cacert /var/lib/rancher/rke2/agent/server-ca.crt \
--cert /var/lib/rancher/rke2/agent/client-kube-proxy.crt \
--key /var/lib/rancher/rke2/agent/client-kube-proxy.key \
https://127.0.0.1:6443/api/v1/nodes/master01
This proves the certificates are valid and the API server is reachable. However, kube-proxy still cannot connect.
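For completeness, the equivalent check through RKE2's bundled kubectl, pointed at the same kubeconfig that kube-proxy uses, would be (a sketch; get --raw issues the same GET request as the curl above):

# hit the same API path through kubectl, using kube-proxy's own kubeconfig
/var/lib/rancher/rke2/bin/kubectl \
  --kubeconfig /var/lib/rancher/rke2/agent/kubeproxy.kubeconfig \
  get --raw /api/v1/nodes/master01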
Additional Notes:
- I deployed the RKE2 cluster with the Lablabs RKE2 Ansible role
- kube-proxy runs with its default configuration (iptables mode)
- I set SELinux to permissive mode
- Running ss -tunlp, I can see that the API server is listening on 6443:
LISTEN 0 4096 *:6443 *:* users:(("kube-apiserver",pid=15859,fd=3))
- I get the same behavior after flushing the iptables rules (iptables -F); see the rule-count check at the end of this post
- The log mentions issues with nftables: exec: "nft": executable file not found in $PATH. Could this be related? I don't think it's the root cause, though, since I get the same connection refused errors with an older RKE2 version that doesn't raise the "nft" log error (see the nft check at the end of this post)
- The server address in the kubeconfig is https://127.0.0.1:6443. Should it point to the external load balancer instead?
- What else could trigger this issue?
- Overall, rke2-server logs "no route to host" errors when it tries to send requests to internal pod addresses. I believe this is because kube-proxy isn't working properly