Flannel CNI fails to create route to service CIDR, pods cannot contact API server

I recently created a Kubernetes cluster with kubeadm init --pod-network-cidr=10.0.0.0/16 and attempted to install flannel as my CNI by running kubectl apply -f on the most recent kube-flannel.yml from the project's GitHub page. (I did change the "Network" key in data.net-conf.json to match my pod CIDR of 10.0.0.0/16.) However, after doing so and restarting containerd and kubelet via systemctl, pods are unable to reach the Kubernetes API server, or seemingly any other service IP. Running route -n inside a busybox pod shows the following output.
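For concreteness, the edit I made can be sketched against a local copy of net-conf.json; the stock manifest's default Network of 10.244.0.0/16 is an assumption on my part, and the file name here is just for illustration:

```shell
# Hypothetical local copy of the net-conf.json embedded in kube-flannel.yml
# (the 10.244.0.0/16 default is my assumption about the stock manifest).
printf '%s\n' '{ "Network": "10.244.0.0/16", "Backend": { "Type": "vxlan" } }' > net-conf.json
# Patch the pod network to match kubeadm's --pod-network-cidr before applying:
sed -i 's|10\.244\.0\.0/16|10.0.0.0/16|' net-conf.json
cat net-conf.json
```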

/ # route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.0.1        0.0.0.0         UG    0      0        0 eth0
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
10.0.0.0        10.0.0.1        255.255.0.0     UG    0      0        0 eth0

This looks incorrect to me for two reasons. First, there is no route corresponding to the service CIDR 10.96.0.0/16, which would explain why pods cannot reach the Kubernetes API server. Second, every route points at the interface eth0 rather than a CNI-specific interface such as cni0. In fact, running ifconfig shows only two interfaces inside the busybox pod: eth0 and lo.

I have explored a few potential fixes. Based on this GitHub issue, I switched the "Backend.Type" key in data.net-conf.json from "vxlan" to "host-gw", which did not fix the problem. I have also run iptables -P INPUT ACCEPT && iptables -P FORWARD ACCEPT && iptables -P OUTPUT ACCEPT as well as sudo systemctl stop firewalld on the host machine to rule out any firewall issues. Neither resolved the problem. The flannel pod's logs are at the bottom of the post, in case they are helpful.

I should also mention that I am installing flannel after previously having installed Calico. When uninstalling Calico, I performed a full cluster reset (kubeadm reset) and ran rm -rf /etc/cni/net.d, so I don't think the removed Calico installation is interfering with my flannel one, but it is a possibility.
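One way I double-checked for Calico leftovers was to list what kubelet would load from the CNI config directory (my understanding, stated as an assumption, is that kubelet picks the lexicographically first file in /etc/cni/net.d, so a stale Calico conflist there would shadow flannel's):

```shell
# List any CNI configs still on the node; the directory may simply be gone
# after the rm -rf, in which case fall back to a message.
ls -l /etc/cni/net.d/ 2>/dev/null || echo "no CNI config directory"
```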

Can anyone recommend potential fixes to this issue? I would appreciate any suggestion. Thanks.

Cluster information:

Kubernetes version: v1.30.3
Cloud being used: bare-metal
Installation method: kubeadm
Host OS: RHEL 7.9
CNI and version: flannel v0.25.5 (flannel/Documentation/kube-flannel.yml at master · flannel-io/flannel · GitHub)
CRI and version: containerd 1.6.33


Flannel Logs:

$ kubectl logs kube-flannel-ds-7ln52 -n kube-flannel
Defaulted container "kube-flannel" out of: kube-flannel, install-cni-plugin (init), install-cni (init)
I0726 14:17:47.170091       1 main.go:211] CLI flags config: {etcdEndpoints:http://127.0.0.1:4001,http://127.0.0.1:2379 etcdPrefix:/coreos.com/network etcdKeyfile: etcdCertfile: etcdCAFile: etcdUsername: etcdPassword: version:false kubeSubnetMgr:true kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[] ifaceRegex:[] ipMasq:true ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true}
W0726 14:17:47.170246       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0726 14:17:47.182362       1 kube.go:139] Waiting 10m0s for node controller to sync
I0726 14:17:47.182429       1 kube.go:469] Starting kube subnet manager
I0726 14:17:48.182920       1 kube.go:146] Node controller sync successful
I0726 14:17:48.182987       1 main.go:231] Created subnet manager: Kubernetes Subnet Manager - [REDACTED]
I0726 14:17:48.182996       1 main.go:234] Installing signal handlers
I0726 14:17:48.183170       1 main.go:452] Found network config - Backend type: host-gw
I0726 14:17:48.186311       1 kube.go:669] List of node([REDACTED]) annotations: map[string]string{"kubeadm.alpha.kubernetes.io/cri-socket":"unix:///var/run/containerd/containerd.sock", "node.alpha.kubernetes.io/ttl":"0", "volumes.kubernetes.io/controller-managed-attach-detach":"true"}
I0726 14:17:48.186382       1 match.go:210] Determining IP address of default interface
I0726 14:17:48.187839       1 match.go:263] Using interface with name eno1 and address 10.129.37.221
I0726 14:17:48.187878       1 match.go:285] Defaulting external address to interface address (10.129.37.221)
I0726 14:17:48.195066       1 iptables.go:51] Starting flannel in iptables mode...
I0726 14:17:48.195863       1 kube.go:490] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.0.0.0/24]
W0726 14:17:48.196107       1 main.go:540] no subnet found for key: FLANNEL_IPV6_NETWORK in file: /run/flannel/subnet.env
W0726 14:17:48.196154       1 main.go:540] no subnet found for key: FLANNEL_IPV6_SUBNET in file: /run/flannel/subnet.env
I0726 14:17:48.196169       1 iptables.go:125] Setting up masking rules
I0726 14:17:48.223989       1 iptables.go:226] Changing default FORWARD chain policy to ACCEPT
I0726 14:17:48.231297       1 main.go:396] Wrote subnet file to /run/flannel/subnet.env
I0726 14:17:48.231320       1 main.go:400] Running backend.
I0726 14:17:48.231449       1 route_network.go:56] Watching for new subnet leases
I0726 14:17:48.238159       1 main.go:421] Waiting for all goroutines to exit
I0726 14:17:48.252821       1 iptables.go:372] bootstrap done
I0726 14:17:48.264594       1 iptables.go:372] bootstrap done

The solution that ultimately resolved this required performing a kubeadm reset, cleaning up all CNI state, and rebooting:

kubeadm reset
systemctl stop kubelet containerd
rm -rf /var/lib/cni/
rm -rf /var/lib/kubelet/*
rm -rf /etc/cni/
ifconfig cni0 down
ifconfig flannel.1 down
ifconfig docker0 down
ip link delete cni0
ip link delete flannel.1
systemctl start kubelet containerd

(See "Failed to setup network for pod \ using network plugins \"cni\": no IP addresses available in network: podnet; Skipping pod" · Issue #39557 · kubernetes/kubernetes · GitHub for more info.) However, I had to reinitialize the cluster, reapply flannel, and reboot before anything worked correctly. I have no idea why the reboot was necessary.
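For anyone else verifying the rebuilt state, a few read-only checks confirmed flannel had come back for me. Note that the flannel.1 interface only exists with the vxlan backend (host-gw programs plain routes instead), so treat the interface names as backend-dependent:

```shell
# Inspect flannel's node-side state; each check falls back to a message so
# the commands are safe to run even where flannel is not (yet) up.
ip link show flannel.1 2>/dev/null || echo "no flannel.1 (host-gw backend, or flannel not running)"
ip route | grep '^10\.0\.' || echo "no pod-network routes yet"
cat /run/flannel/subnet.env 2>/dev/null || echo "no subnet.env yet"
```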