I have a cluster with 4 nodes (3 raspis, 1 NUC) and have set up several different workloads. The cluster itself has worked perfectly fine so far, so I doubt it is a general problem with the configuration. After a reboot of all nodes the cluster came back up fine and all pods are running without issues. Unfortunately, pods running on one of my nodes (the NUC) are no longer reachable via ingress. If I access them through kube-proxy, I can see that the pods themselves run fine and the HTTP services behave as expected. I upgraded the NUC node from Ubuntu 20.10 to 21.04, which may be related to the issue, but that is not confirmed.
When the same pods are scheduled onto the other nodes, everything works as expected. For pods on the NUC node, I see the following in the ingress-controller logs:
2021/08/09 09:17:28 [error] 1497#1497: *1027899 upstream timed out (110: Operation timed out) while connecting to upstream, client: 10.244.1.1, server: gitea.fritz.box, request: "GET / HTTP/2.0", upstream: "http://10.244.3.50:3000/", host: "gitea.fritz.box"
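For what it's worth, the same path can be checked outside of nginx by probing the upstream directly from one of the other nodes, roughly like this (pod IP and port taken from the log line above):

    # probe the pod IP/port that nginx reports as "upstream";
    # run from one of the other nodes or from inside the ingress-controller pod
    curl -v --connect-timeout 5 http://10.244.3.50:3000/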
I can only assume that the problem is related to the cluster-internal network. I have compared iptables rules and the like between the nodes, but have not found any differences that seem relevant.
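For reference, the comparison was along these lines (file names are just examples):

    # dump and sort the rules on each node, then diff the dumps
    sudo iptables-save | sort > /tmp/iptables-nuc.txt      # on the NUC
    sudo iptables-save | sort > /tmp/iptables-raspi.txt    # on one of the raspis
    diff /tmp/iptables-nuc.txt /tmp/iptables-raspi.txt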
The NUC node is running Ubuntu 21.04 with kube v1.21.1, the raspis run Ubuntu 20.04.2 LTS. The master node still runs v1.21.1, while the two worker nodes already run v1.22.0, which works fine.
I have found a thread that points out an incompatibility between MetalLB and nftables (Update documentation for Debian Buster gotchas · Issue #451 · metallb/metallb · GitHub), and although it's a bit older, I have already switched to the legacy xtables backend as suggested (update-alternatives --set iptables /usr/sbin/iptables-legacy …), without success.
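To double-check which backend is actually active on the NUC after the switch, something like the following should show it (the version string carries a "(legacy)" or "(nf_tables)" suffix):

    # show the selected alternative and which backend the binary reports
    update-alternatives --display iptables
    iptables --version    # e.g. "iptables v1.8.x (legacy)" or "... (nf_tables)"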
Currently I'm running out of ideas on where else to look. Can anyone suggest possible causes?
Thanks in advance!
P.S.:
I posted this question on stackoverflow as well, but afterwards thought it probably fits better to ask here.