Kube proxy to API Server connection refused in RKE2 HA

Cluster information:

Kubernetes version: v1.32.0+rke2r1
Installation method: RKE2 via the lablabs Ansible role
Host OS: Rocky Linux 9.3
CNI and version: CANAL
CRI and version:

Hi everyone,

I’m facing an issue with kube-proxy in my RKE2 HA cluster setup, which consists of 3 master nodes, 3 worker nodes, and an external load balancer. The kube-proxy instances on all 3 master nodes fail to connect to the API server. Below are the error logs from kube-proxy:

E0122 16:18:27.308126       1 proxier.go:733] "Error cleaning up nftables rules" err="could not find nftables binary: exec: \"nft\": executable file not found in $PATH"
E0122 16:18:27.308193       1 proxier.go:733] "Error cleaning up nftables rules" err="could not find nftables binary: exec: \"nft\": executable file not found in $PATH"
E0122 16:18:27.310875       1 server.go:687] "Failed to retrieve node info" err="Get \"https://127.0.0.1:6443/api/v1/nodes/master01\": dial tcp 127.0.0.1:6443: connect: connection refused"
E0122 16:18:28.392318       1 server.go:687] "Failed to retrieve node info" err="Get \"https://127.0.0.1:6443/api/v1/nodes/master01\": dial tcp 127.0.0.1:6443: connect: connection refused"
E0122 16:18:30.765162       1 server.go:687] "Failed to retrieve node info" err="Get \"https://127.0.0.1:6443/api/v1/nodes/master01\": dial tcp 127.0.0.1:6443: connect: connection refused"
E0122 16:18:34.773599       1 server.go:687] "Failed to retrieve node info" err="Get \"https://127.0.0.1:6443/api/v1/nodes/master01\": dial tcp 127.0.0.1:6443: connect: connection refused"
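
For reference, these logs can be pulled from RKE2's bundled containerd with crictl; roughly like this (assuming the default RKE2 locations for the crictl binary and its config):

export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
# List the kube-proxy container and dump its logs
/var/lib/rancher/rke2/bin/crictl ps -a | grep kube-proxy
/var/lib/rancher/rke2/bin/crictl logs <kube-proxy-container-id>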

When I check the process with ps aux | grep proxy, I see it using the following kubeconfig file: /var/lib/rancher/rke2/agent/kubeproxy.kubeconfig. The contents of the file are:

apiVersion: v1
clusters:
- cluster:
    server: https://127.0.0.1:6443
    certificate-authority: /var/lib/rancher/rke2/agent/server-ca.crt
  name: local
contexts:
- context:
    cluster: local
    namespace: default
    user: user
  name: Default
current-context: Default
kind: Config
preferences: {}
users:
- name: user
  user:
    client-certificate: /var/lib/rancher/rke2/agent/client-kube-proxy.crt
    client-key: /var/lib/rancher/rke2/agent/client-kube-proxy.key
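
Since kube-proxy in RKE2 runs as a static pod, its manifest shows which kubeconfig and flags it is actually started with; a quick way to check (the manifest path below is the default RKE2 location on my nodes and may differ between versions):

# Static pod manifest for kube-proxy (assumed default RKE2 location)
cat /var/lib/rancher/rke2/agent/pod-manifests/kube-proxy.yaml
# Double-check which API server address the kubeconfig points at
grep 'server:' /var/lib/rancher/rke2/agent/kubeproxy.kubeconfig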

Interestingly, when I manually test the API server with curl using the same certificates and URL, I don't get a connection refused or a 403 error. Instead, I receive a valid JSON response:

curl --cacert /var/lib/rancher/rke2/agent/server-ca.crt \
     --cert /var/lib/rancher/rke2/agent/client-kube-proxy.crt \
     --key /var/lib/rancher/rke2/agent/client-kube-proxy.key \
     https://127.0.0.1:6443/api/v1/nodes/master01

This shows the certificates are valid and the API server is reachable from the node, at least at the moment of the test. However, kube-proxy still cannot connect.
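
To rule out the API server being reachable only intermittently (for example if kube-apiserver keeps restarting), a small loop against the same endpoint with the same certificates can help; a minimal sketch, using /readyz as a lightweight target:

# Print only the HTTP status code every 2 seconds; curl prints 000 when the connection is refused
while true; do
  curl -s -o /dev/null -w '%{http_code}\n' \
       --cacert /var/lib/rancher/rke2/agent/server-ca.crt \
       --cert /var/lib/rancher/rke2/agent/client-kube-proxy.crt \
       --key /var/lib/rancher/rke2/agent/client-kube-proxy.key \
       https://127.0.0.1:6443/readyz
  sleep 2
done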

Additional Notes:

  • I deployed the rke2 cluster with the following Ansible role: [labLabs][1]
  • kube-proxy runs default configuration (iptables mode)
  • I changed SELinux to Permissive mode
  • By executing ss -tunlp, I can see that the API server listens on 6443:
LISTEN   0    4096    *:6443      *:*      users:(("kube-apiserver",pid=15859,fd=3))   
  • I get the same behavior after flushing the iptables rules
  • The log mentions an issue with nftables: exec: "nft": executable file not found in $PATH. Could this be related? I don't think it is the root cause, since I get the same connection refused error with an older RKE2 version that does not log the "nft" error (see the quick check after this list)
  • The server address in the kubeconfig is https://127.0.0.1:6443. Should it point to the external load balancer instead?
  • What else could trigger this issue?
  • Overall, rke2-server logs "no route to host" errors when it tries to send requests to internal pod addresses. I believe this is because kube-proxy is not working properly
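
Regarding the missing nft binary, this is the quick check I mean (Rocky Linux 9, so dnf; the package name nftables is an assumption based on the default repos, and note that kube-proxy runs in a container, so it may be looking for nft inside its own image rather than on the host):

# Is nft present on the host at all?
command -v nft || echo "nft not found on host"
# On Rocky Linux 9 the binary is provided by the nftables package
dnf list installed nftables || sudo dnf install -y nftables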