I have a cluster with 4 nodes (3 raspis, 1 NUC) and have set up several different workloads. The cluster itself has worked perfectly fine so far, so I doubt it is a general problem with the configuration. After a reboot of all nodes the cluster came back up fine and all pods are running without issues. Unfortunately, pods running on one of my nodes (the NUC) are no longer reachable via ingress. If I access them through kube-proxy, I can see that the pods themselves run fine and the HTTP services behave as expected. I upgraded the NUC node from Ubuntu 20.10 to 21.04, which may be related to the issue, but that is not confirmed.
When the same pods are scheduled onto the other nodes, everything works as expected. For pods on the NUC node, I see the following in the ingress-controller logs:
2021/08/09 09:17:28 [error] 1497#1497: *1027899 upstream timed out (110: Operation timed out) while connecting to upstream, client: 10.244.1.1, server: gitea.fritz.box, request: "GET / HTTP/2.0", upstream: "http://10.244.3.50:3000/", host: "gitea.fritz.box"
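For what it's worth, the same path can be checked outside of nginx by probing the upstream directly from one of the other nodes, roughly like this (pod IP and port taken from the log line above):

    # probe the pod IP/port that nginx reports as "upstream";
    # run from one of the other nodes or from inside the ingress-controller pod
    curl -v --connect-timeout 5 http://10.244.3.50:3000/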
I can only assume that the problem is related to the cluster-internal network. I have compared iptables rules and the like between the nodes, but have not found any differences that seem relevant.
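For reference, the comparison was along these lines (file names are just examples):

    # dump and sort the rules on each node, then diff the dumps
    sudo iptables-save | sort > /tmp/iptables-nuc.txt      # on the NUC
    sudo iptables-save | sort > /tmp/iptables-raspi.txt    # on one of the raspis
    diff /tmp/iptables-nuc.txt /tmp/iptables-raspi.txt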
The NUC node is running Ubuntu 21.04 with kube v1.21.1, the raspis run Ubuntu 20.04.2 LTS. The master node still runs v1.21.1, while the two worker nodes already run v1.22.0, which works fine.
I have found a thread that points out an incompatibility between MetalLB and nftables (Update documentation for Debian Buster gotchas · Issue #451 · metallb/metallb · GitHub), and although it's a bit older, I have already switched to the legacy xtables backend as suggested (update-alternatives --set iptables /usr/sbin/iptables-legacy …), without success.
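To double-check which backend is actually active on the NUC after the switch, something like the following should show it (the version string carries a "(legacy)" or "(nf_tables)" suffix):

    # show the selected alternative and which backend the binary reports
    update-alternatives --display iptables
    iptables --version    # e.g. "iptables v1.8.x (legacy)" or "... (nf_tables)"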
Currently I'm running out of ideas on where else to look. Can anyone suggest possible causes?
Thanks in advance!
P.S.:
I posted this question on stackoverflow as well, but afterwards thought it probably fits better to ask here.