Multi-network cluster broken without --masquerade-all

While setting up a new cluster on nodes with multiple networks, nothing worked until I set masqueradeAll for kube-proxy. This seems like a fairly straightforward setup, so I suspect I either did something wrong or missed something obvious, because --masquerade-all shouldn’t be necessary.

Network details

All nodes are on three networks:

  • E: X.X.X.0/27 external internet, default GW, public IPs (masked for privacy, bonded eth with vlan tag)
  • M: 10.1/16 management network, for ssh only, should not be used for traffic
  • I: 10.2/16 internal network, for inter-node communication (actually IP-over-IB)

At some point I noticed the nodes’ internal IPs were the 10.1 addresses on M, so I added kubelet --node-ip=10.2... to put them on I. However, this made no difference to the problem below.

I set up the master, node1 (X.X.X.8, 10.1.250.1, 10.2.250.1), with kubeadm, advertising 10.2.250.1, with service CIDR 10.96/12 and the default pod CIDR (192.168…). I untainted it and set up pods and services; everything worked perfectly.

I then joined a second node, node2 (X.X.X.10, 10.1.250.3, 10.2.250.3). The join succeeds and kube-proxy runs, but calico-node fails.

calico-node on node2 tries to connect to https://10.96.0.1:443/ and times out. From tcpdumps, I see that the SYNs are correctly DNATed to 10.2.250.1:6443, but their source is node2’s E address, X.X.X.10 (which would be the default IP of this node, since E has the default gw). The calico-node pod does have the right I2 IP, 10.2.250.3. (Here E2 and I2 mean node2’s addresses on E and I, and E1 and I1 mean node1’s.)

So, on the I1 interface I see packets from the E2 IP to the I1 IP. They get no response. I’m not exactly sure why, but it’s not surprising given the asymmetric routing: node1 may drop them as martians, or respond on the E1 interface to the E2 IP from the I1 IP, which probably won’t work either.
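To illustrate why the SYNs carry the E2 source address, here is a minimal Python sketch of source selection by longest-prefix route match. It assumes the source address is picked from the route to the original destination (10.96.0.1) before the DNAT to 10.2.250.1 takes place, which is consistent with the tcpdump but is not verified here. The ROUTES table and the pick_source helper are hypothetical, and 203.0.113.0/27 is only a stand-in for the masked X.X.X.0/27 network.

```python
import ipaddress

# Hypothetical routing table for node2. 203.0.113.0/27 is a placeholder for
# the masked X.X.X.0/27 external network, not the real prefix.
ROUTES = [
    # (destination prefix, source address preferred for that route)
    ("0.0.0.0/0", "203.0.113.10"),       # default route via E
    ("203.0.113.0/27", "203.0.113.10"),  # E
    ("10.1.0.0/16", "10.1.250.3"),       # M
    ("10.2.0.0/16", "10.2.250.3"),       # I
]


def pick_source(dst):
    """Return the preferred source of the most specific route matching dst."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(prefix), src)
        for prefix, src in ROUTES
        if addr in ipaddress.ip_network(prefix)
    ]
    # The longest prefix (most specific route) wins.
    _, src = max(matches, key=lambda m: m[0].prefixlen)
    return src


# The service IP only matches the default route, so the E address is chosen.
print(pick_source("10.96.0.1"))
# A direct connection to node1's I address would have used node2's I address.
print(pick_source("10.2.250.1"))
```

In this model the service IP falls through to the default route, so the connection is sourced from E2 even though the DNATed destination lives on I.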

With masqueradeAll enabled, the packets are NATed to go from I2 to I1 correctly and everything works. Should this be necessary? Is something else going wrong? Do I need to manually specify a --cluster-cidr instead?
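For reference, a rough Python model of the two kube-proxy options in question, assuming the iptables mode behaves as the option descriptions say: --masquerade-all SNATs all traffic sent via service cluster IPs, and --cluster-cidr, when set, causes traffic to a cluster IP from outside that range to be masqueraded. The would_masquerade function is hypothetical, 203.0.113.10 is a placeholder for the masked E2 address, and 192.168.0.0/16 is an assumed pod CIDR, since only "192.168…" is given above.

```python
import ipaddress

# Assumption: the pod CIDR is only given as "192.168…" above; this value is
# a guess used purely for illustration.
ASSUMED_POD_CIDR = "192.168.0.0/16"


def would_masquerade(src, masquerade_all, cluster_cidr):
    """Model whether traffic from src to a service cluster IP gets SNATed."""
    if masquerade_all:
        # --masquerade-all: every connection via a cluster IP is SNATed.
        return True
    if not cluster_cidr:
        # No cluster CIDR configured: no source-based masquerade in this model.
        return False
    # With a cluster CIDR, only sources outside the pod range are SNATed.
    return ipaddress.ip_address(src) not in ipaddress.ip_network(cluster_cidr)


e2 = "203.0.113.10"  # placeholder for the masked E2 address
print(would_masquerade(e2, False, None))              # neither option set
print(would_masquerade(e2, True, None))               # masqueradeAll
print(would_masquerade(e2, False, ASSUMED_POD_CIDR))  # cluster CIDR only
```

If this model is right, setting the cluster CIDR would masquerade the host-network traffic from E2 without also SNATing pod-to-service traffic, which is what masqueradeAll does.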

Cluster information:

Kubernetes version: v1.19.3
Cloud being used: bare-metal
Installation method: kubeadm
Host OS: centos7
CNI and version: calico v3.16.0
CRI and version: