Hi,
Why is kubernetes communicating with the netfilter modules in the kernel by explicitly executing “iptables” and not by sending/receiving commands on AF_NETLINK sockets?
Thank you,
Cristian
Hi,
no answer so far… so let me add some more details to my question.
There are two ways of interacting with the netfilter modules in the kernel: executing the iptables binary, or exchanging messages with the kernel over AF_NETLINK sockets. The binary is installed here on my machine:
cco@DEU1145:~$ which iptables
/usr/sbin/iptables
cco@DEU1145:~$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.4 LTS"
It looks like kubernetes is interacting with the netfilter kernel modules by explicitly executing "/usr/sbin/iptables" on the nodes where the respective iptables rules are needed. This involves one of the exec*() syscalls from the exec() family; see also exec(3). It is also well known that spawning an external process via fork()/exec() consumes far more system resources than a plain syscall on an already-open socket.
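Just to make the exec-based path concrete, here is a minimal Go sketch; the chain, port and rule are made up for illustration, and this is not the actual kube-proxy code (kube-proxy batches whole rule sets through iptables-restore rather than running iptables once per rule):

package main

import (
	"fmt"
	"os/exec"
)

func main() {
	// Each call like this fork()s and execve()s the external iptables binary.
	// The rule itself (INPUT/tcp/8080/ACCEPT) is hypothetical, for illustration only.
	cmd := exec.Command("iptables", "-w", "-A", "INPUT",
		"-p", "tcp", "--dport", "8080", "-j", "ACCEPT")
	if out, err := cmd.CombinedOutput(); err != nil {
		fmt.Printf("iptables failed: %v: %s\n", err, out)
		return
	}
	fmt.Println("rule appended by executing the iptables binary")
}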
So, my question is:
Why is kubernetes using exec*() syscalls for interacting with the netfilter modules instead of send(2)/recv(2) syscalls on AF_NETLINK sockets?
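For comparison, the socket-based path would start roughly like the sketch below (Go, using golang.org/x/sys/unix). This only opens and binds the socket; actually programming rules this way means building and parsing nfnetlink messages, as nftables does, which is where most of the complexity lives:

package main

import (
	"fmt"
	"golang.org/x/sys/unix"
)

func main() {
	// Open a raw netlink socket to the netfilter subsystem; no external
	// binary and no exec*() involved.
	fd, err := unix.Socket(unix.AF_NETLINK, unix.SOCK_RAW, unix.NETLINK_NETFILTER)
	if err != nil {
		panic(err)
	}
	defer unix.Close(fd)

	// Bind with Pid 0 so the kernel assigns the netlink port id.
	if err := unix.Bind(fd, &unix.SockaddrNetlink{Family: unix.AF_NETLINK}); err != nil {
		panic(err)
	}
	fmt.Println("netlink socket open; rules would be programmed by exchanging")
	fmt.Println("nfnetlink messages via unix.Sendto()/unix.Recvfrom()")
}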
Thanks a lot,
Cristian
The only real answer is “that’s just how it was written”.
In truth, it’s FAR easier to comprehend this way. It’s trivial to try things by hand, and then bring them into the code. The man pages are very complete, unlike the alternative, and lots of examples and help can be found all over the internet. In short, this is the better approach, but I am biased.
I dispute your assertion that exec consumes a lot of resources. It’s never, not once, come up as a problem in 10 years. There are OTHER things that are tricky about using iptables, but not that. The interplay between kube-proxy and OTHER users of iptables has been a long-standing pain point, since the actual kernel API is not as stable as you might hope for.