Apologies in advance, as I may not be using the correct terminology for some of these concepts.
Host OS is Debian, but as part of an appliance: TrueNAS Scale by IXSystems. This is a bare-metal deployment.
Pod is a Truecharts pod deployed on IP 172.16.x.y/16
NAS is on 192.168.38.32/24
Gateway is 192.168.38.15/24
DNS Server is on 192.168.38.10/24
To get to services on a pod I have to port forward from the NAS address (192.168.38.32) to the port on the pod. I don’t actually get to see the IP address of the pod, but I do know it’s in 172.16.x.y/16. The actual address doesn’t matter for this discussion anyhow.
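For reference, the layout above can be sanity-checked with Python’s `ipaddress` module (172.16.4.7 below is just a hypothetical stand-in for the pod’s 172.16.x.y address):

```python
import ipaddress

# The two networks described above
lan = ipaddress.ip_network("192.168.38.0/24")
pod_net = ipaddress.ip_network("172.16.0.0/16")

nas = ipaddress.ip_address("192.168.38.32")
gateway = ipaddress.ip_address("192.168.38.15")
dns = ipaddress.ip_address("192.168.38.10")
pod = ipaddress.ip_address("172.16.4.7")  # hypothetical pod address

# NAS, gateway and DNS server all sit on the same /24...
assert all(a in lan for a in (nas, gateway, dns))
# ...while the pod network is a separate, non-overlapping range.
assert pod in pod_net and not lan.overlaps(pod_net)
```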
I am getting what I consider weird / faulty routing behaviour from pods when attempting (in this example) to resolve DNS using the DNS server on the local network. I am trying to find out if this is a bug or by design, and if it’s a bug, whose bug it is. I am in discussions with IXSystems, who are telling me one thing, but to me it’s highly illogical and incorrect behaviour.
If I run a traceroute from a pod (172.16.x.y), it first goes to 172.16.0.1, which is described as the kube-router. I consider this correct.
I am told that the kube-router then SNATs the packet, and presumably maintains a state table so that return packets know where to go.
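A toy model of that SNAT-plus-state-table behaviour, just to pin down what I believe is happening (this is an illustration, not kube-router’s actual implementation; the pod address and port allocation are made up):

```python
NAS_IP = "192.168.38.32"
state = {}  # (nat_ip, nat_port) -> (pod_ip, pod_port), i.e. the conntrack table

def snat_outbound(pod_ip, pod_port, dst_ip):
    """Rewrite the source to the NAS address and remember the mapping."""
    nat_port = 40000 + len(state)            # hypothetical port allocation
    state[(NAS_IP, nat_port)] = (pod_ip, pod_port)
    return (NAS_IP, nat_port, dst_ip)        # packet now carries the NAS source

def denat_reply(dst_ip, dst_port):
    """A reply addressed to the NAS is rewritten back to the original pod."""
    return state[(dst_ip, dst_port)]

# Pod 172.16.4.7 (hypothetical) sends a DNS query to 192.168.38.10
out = snat_outbound("172.16.4.7", 33333, "192.168.38.10")
assert out[0] == NAS_IP                      # source is now the NAS address
assert denat_reply(NAS_IP, out[1]) == ("172.16.4.7", 33333)
```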
The packet then goes to the default gateway of the LAN, 192.168.38.15/24. I consider this incorrect.
The packet is then redirected back out of the same interface to 192.168.38.10/24.
I have a problem with the routing of the packet after the SNAT.
- Destination should be 192.168.38.10
- Source should be 192.168.38.32, assuming SNAT is happening, and I believe it is
- So the destination is on-net and should go directly to 192.168.38.10 using the routing table of the host OS. But this appears to be bypassed, and the packet is instead sent to the DG on the main LAN, which has to redirect the traffic back to the correct destination.
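The routing decision I would expect can be sketched as a longest-prefix match over the host’s routing table (a simplification of the kernel’s FIB lookup, but it follows the same rule; `eth0` and `kube-bridge` are hypothetical interface names):

```python
import ipaddress

# Simplified host routing table, as described above
routes = [
    (ipaddress.ip_network("0.0.0.0/0"), "via 192.168.38.15"),       # default route
    (ipaddress.ip_network("192.168.38.0/24"), "dev eth0 (on-link)"),
    (ipaddress.ip_network("172.16.0.0/16"), "dev kube-bridge"),
]

def lookup(dst):
    """Pick the matching route with the longest prefix."""
    dst = ipaddress.ip_address(dst)
    matches = [(net, nh) for net, nh in routes if dst in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

# The /24 route wins over the default route, so the packet should go
# straight to the DNS server, not via the gateway.
print(lookup("192.168.38.10"))  # -> dev eth0 (on-link)
```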
The explanation given for this behaviour is that “The destination, once it leaves the NAS is not 192.168.38.10, it’s 192.168.38.1. kube-router doesn’t know how to get to 192.168.38.10 so it forwards it to the default gateway of the interface on the 192.168.38 subnet”. I consider this explanation highly suspicious: a packet has two address fields, one source and one destination. It does not have a third address field for a router. So if the packet really had a destination of 192.168.38.15, that is where it would stop.
Unless DNAT is involved, the destination address never changes.
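The two-address point can be shown directly from the IPv4 header layout (per RFC 791): a minimal 20-byte header carries exactly one source and one destination address, and nothing else address-shaped. The next hop is a link-layer decision, not a field in the packet:

```python
import struct
import socket

# Minimal IPv4 header (20 bytes, no options): version/IHL, TOS, total length,
# ID, flags/fragment offset, TTL, protocol (17 = UDP), checksum, src, dst.
header = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 20, 0, 0, 64, 17, 0,
    socket.inet_aton("192.168.38.32"),   # source (post-SNAT)
    socket.inet_aton("192.168.38.10"),   # destination (the DNS server)
)

assert len(header) == 20
# Bytes 12-15 and 16-19 are the only address fields in the header.
assert socket.inet_ntoa(header[12:16]) == "192.168.38.32"
assert socket.inet_ntoa(header[16:20]) == "192.168.38.10"
```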
SNAT has to be involved, as no device other than the host OS knows where 172.16.x.y is. The DG certainly doesn’t, and neither does the DNS server.
I found this behaviour because I had implemented policy-based routing on the gateway, which was grabbing this traffic and sending it out the wrong interface. That part is easy enough to work around, but the underlying issue remains that all pod traffic destined for 192.168.38.0/24 is going via the gateway, which is hardly optimal.
I did notice a previous discussion that got no traction, where someone pointed out that his firewall people were complaining about the firewall being hit by vast amounts of traffic only to be redirected back onto the LAN. But there were no responses, and that was K8S (and to be honest I have no idea what the difference is).