WireGuard client container can't reach server in K8s but can in Docker

Cluster information:

Kubernetes version: 1.29.3
Cloud being used: bare metal
Installation method: kubeadm
Host OS: Ubuntu Server 23.04
CNI and version: Calico v3.27.2
CRI and version: containerd 1.7.2

I am trying to run a simple WireGuard container as part of a BitTorrent combo, but I’m running into connectivity issues that only occur under Kubernetes: the same configuration works perfectly in Docker.

Since the WireGuard container requires net.ipv4.conf.all.src_valid_mark=1 in client mode, and because I want IPv6 forwarding, I used the following kubeadm config to initialize the cluster and allow both sysctls on the kubelet:

apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  kubeletExtraArgs:
    allowed-unsafe-sysctls: "net.ipv4.conf.all.src_valid_mark,net.ipv6.conf.all.forwarding"
--- 
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration 
networking: 
  podSubnet: 192.168.0.0/16

I then deploy the following Deployment, alongside various Services and an nginx gateway:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: bittorrent 
  annotations:
    keel.sh/policy: all
    keel.sh/trigger: poll
    keel.sh/pollSchedule: "@hourly"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: bittorrent
  template:
    metadata:
      labels:
        app: bittorrent
    spec:
      nodeSelector:
        kubernetes.io/hostname: obsidiana
      securityContext:
        sysctls:
        - name: net.ipv4.conf.all.src_valid_mark
          value: "1"
        - name: net.ipv6.conf.all.forwarding
          value: "1"
      containers:
      - name: airvpn
        image: lscr.io/linuxserver/wireguard:latest
        livenessProbe:
          exec:
            command:
              - /bin/sh
              - -c
              - "wg show | grep -q transfer"
          initialDelaySeconds: 65
          periodSeconds: 120
        securityContext:
          privileged: true
          capabilities:
            add: ["NET_ADMIN", "SYS_MODULE"]
        env:
        - name: PUID
          value: "1000"
        - name: PGID
          value: "1000"
        - name: TZ
          value: America/Los_Angeles
        volumeMounts:
        - name: airvpn-config
          mountPath: /etc/wireguard/
        - name: lib-modules
          mountPath: /lib/modules
        ports:
        - containerPort: 9091
          protocol: TCP
      - name: transmission
        image: lscr.io/linuxserver/transmission:latest
        livenessProbe:
          httpGet:
            path: /rpc
            port: 9091
            httpHeaders:
              - name: Authorization
                value: Basic <redacted>
        env:
        - name: PUID
          value: "1000"
        - name: PGID
          value: "1000"
        - name: TZ
          value: America/Los_Angeles
        - name: USER
          valueFrom:
            secretKeyRef:
              name: transmission-secrets
              key: USER
        - name: PASS
          valueFrom:
            secretKeyRef:
              name: transmission-secrets
              key: PASS
        volumeMounts:
        - name: transmission-config
          mountPath: /config
        - name: downloads
          mountPath: /downloads
      volumes:
      - name: transmission-config
        hostPath:
          path: /srv/bittorrent/transmission/config
      - name: airvpn-config
        hostPath: 
          path: /srv/bittorrent/airvpn
      - name: lib-modules
        hostPath:
          path: /lib/modules 
      - name: downloads 
        hostPath:
          path: /downloads

with the following wg0.conf file:

[Interface]
Address = 10.145.<redacted>/32, fd7d:76ee:e68f:a993:<redacted>/128
PrivateKey = <redacted>
MTU = 1320
DNS = 10.128.0.1, fd7d:76ee:e68f:a993::1

[Peer]
PublicKey = <redacted>
PresharedKey = <redacted>
Endpoint = america3.vpn.airdns.org:1637
AllowedIPs = 0.0.0.0/0, ::/0
PersistentKeepalive = 15
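The `Endpoint` here is a hostname, so `wg setconf` (the `[#] wg setconf wg0 /dev/fd/63` step in the logs) has to resolve it through the pod's ordinary resolver before the tunnel exists; the `DNS =` line is only applied later by wg-quick (the `resolvconf` step in the Docker log). The sketch below uses hypothetical helpers, not part of any WireGuard tooling, to pull each peer's endpoint out of a wg-quick config and flag whether it needs a DNS lookup. It assumes one `Key = Value` per line and bracketed IPv6 endpoints.

```python
# Hypothetical helpers (not part of wireguard-tools): list each [Peer]
# Endpoint in a wg-quick config and say whether it needs a DNS lookup.
import ipaddress


def parse_endpoint(value: str) -> tuple[str, int]:
    """Split "host:port", "1.2.3.4:port" or "[v6addr]:port" into (host, port)."""
    value = value.strip()
    if value.startswith("["):
        host, sep, rest = value[1:].partition("]")
        if not sep or not rest.startswith(":"):
            raise ValueError(f"bad IPv6 endpoint: {value!r}")
        port = rest[1:]
    else:
        host, sep, port = value.rpartition(":")
        if not sep:
            raise ValueError(f"missing port: {value!r}")
    return host, int(port)


def needs_dns(host: str) -> bool:
    """True when host is a name rather than a literal IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return True
    return False


def peer_endpoints(conf_text: str) -> list[tuple[str, int]]:
    """Return (host, port) for the Endpoint of every [Peer] section."""
    endpoints, section = [], None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
        elif section == "peer" and "=" in line:
            key, _, val = line.partition("=")  # keys never contain "="
            if key.strip().lower() == "endpoint":
                endpoints.append(parse_endpoint(val))
    return endpoints


if __name__ == "__main__":
    sample = "[Peer]\nEndpoint = america3.vpn.airdns.org:1637\n"
    for host, port in peer_endpoints(sample):
        print(host, port, "needs DNS" if needs_dns(host) else "literal IP")
```

If name resolution turns out to be the blocker, temporarily swapping the hostname for a literal address (for instance the 184.75.223.205 that the nettools pod resolves below) is one way to test that theory, with the caveat that the address behind the name could change.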

I’ve also tried other servers (ca3 and europe3, for example) with identical results: it always works in Docker and almost never works in Kubernetes. The WireGuard client does connect on rare occasions, but the vast majority of the time its stdout looks like this:

Uname info: Linux bittorrent-6db8674f9-6bv9s 6.2.0-39-generic #40-Ubuntu SMP PREEMPT_DYNAMIC Tue Nov 14 14:18:00 UTC 2023 x86_64 GNU/Linux
** It seems the wireguard module is already active. Skipping kernel header install and module compilation. **
** Client mode selected. **
[custom-init] No custom files found, skipping...
** Disabling CoreDNS **
** Found WG conf /config/wg_confs/wg0.conf, adding to list **
** Activating tunnel /config/wg_confs/wg0.conf **
[#] ip link add wg0 type wireguard
[#] wg setconf wg0 /dev/fd/63
Try again: `america3.vpn.airdns.org:1637'. Trying again in 1.00 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 1.20 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 1.44 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 1.73 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 2.07 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 2.49 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 2.99 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 3.58 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 4.30 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 5.16 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 6.19 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 7.43 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 8.92 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 10.70 seconds...
Try again: `america3.vpn.airdns.org:1637'. Trying again in 12.84 seconds...
Try again: `america3.vpn.airdns.org:1637'
Configuration parsing error
[#] ip link delete dev wg0
** Tunnel /config/wg_confs/wg0.conf failed, will stop all others! **
** All tunnels are now down. Please fix the tunnel config /config/wg_confs/wg0.conf and restart the container **
[ls.io-init] done.
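The "Try again" lines look like a DNS failure rather than a failed handshake. The linuxserver image is, as far as I know, Alpine-based, and musl's `gai_strerror()` text for `EAI_AGAIN` (a temporary resolver failure) is "Try again". From my reading of the wireguard-tools source, `wg` retries transient `getaddrinfo()` errors with a backoff that starts at 1 s, grows by 6/5 and is capped at 20 s, giving up after 15 retries (adjustable through `WG_ENDPOINT_RESOLUTION_RETRIES`, per the wg(8) man page), and fails immediately on permanent errors such as `EAI_NONAME`. The sketch below models that behaviour; the function names are my own and it is not the actual implementation.

```python
# Hypothetical model of wg's endpoint-resolution retry loop, reconstructed
# from the log above and my reading of wireguard-tools; names are my own.
import socket

# getaddrinfo() errors treated as permanent (no retry). EAI_NODATA is not
# defined on every platform, hence the hasattr() check.
PERMANENT = {socket.EAI_NONAME, socket.EAI_FAIL}
if hasattr(socket, "EAI_NODATA"):
    PERMANENT.add(socket.EAI_NODATA)


def is_transient(code: int) -> bool:
    """EAI_AGAIN ("Try again" in musl) and other non-permanent codes get retried."""
    return code not in PERMANENT


def wg_retry_delays(retries: int = 15) -> list[float]:
    """Seconds slept before each retry: start at 1 s, multiply by 6/5 in
    integer microseconds (which reproduces the logged values), cap at 20 s."""
    delays, timeout_us = [], 1_000_000
    for _ in range(retries):
        delays.append(timeout_us / 1_000_000)
        timeout_us = min(20_000_000, timeout_us * 6 // 5)
    return delays


if __name__ == "__main__":
    delays = wg_retry_delays()
    print(" ".join(f"{d:.2f}" for d in delays))
    print(f"total wait before giving up: {sum(delays):.1f} s")
```

Printed to two decimals, the modelled delays match the 15 values in the log above exactly, and they add up to roughly 72 seconds of waiting before `Configuration parsing error` appears.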

Here is the working docker-compose file for reference:

version: "3.9"
services:
  airvpn:
    image: linuxserver/wireguard:latest
    container_name: airvpn
    cap_add:
      - NET_ADMIN
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/Los_Angeles
    volumes:
      - ./airvpn/wg0.conf:/config/wg0.conf
      - /lib/modules:/lib/modules
    sysctls:
      net.ipv4.conf.all.src_valid_mark: 1
      net.ipv6.conf.all.disable_ipv6: 0
    ports:
      - 9091:9091
    privileged: true
    restart: always

  transmission:
    image: linuxserver/transmission:latest
    container_name: transmission
    network_mode: service:airvpn
    depends_on:
      - airvpn
    volumes:
      - ./transmission/config:/config:rw
      - /downloads:/downloads:rw
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/Los_Angeles
    env_file:
      - ./.env
    restart: always

…and the output I get every time I connect through Docker instead of Kubernetes:

Uname info: Linux 5b3141a4c699 6.2.0-39-generic #40-Ubuntu SMP PREEMPT_DYNAMIC Tue Nov 14 14:18:00 UTC 2023 x86_64 GNU/Linux
**** It seems the wireguard module is already active. Skipping kernel header install and module compilation. ****
**** Performing migration to new folder structure for confs. Please see the image changelog 2023-10-03 entry for more details. ****
rm: cannot remove '/config/wg0.conf': Resource busy
**** Client mode selected. ****
[custom-init] No custom files found, skipping...
**** Disabling CoreDNS ****
**** Found WG conf /config/wg_confs/wg0.conf, adding to list ****
**** Activating tunnel /config/wg_confs/wg0.conf ****
[#] ip link add wg0 type wireguard
[#] wg setconf wg0 /dev/fd/63
[#] ip -4 address add 10.145.<redacted>/32 dev wg0
[#] ip -6 address add fd7d:76ee:e68f:a993:<redacted>/128 dev wg0
[#] ip link set mtu 1320 up dev wg0
[#] resolvconf -a wg0 -m 0 -x
s6-rc: fatal: unable to take locks: Resource busy
[#] wg set wg0 fwmark 5182x
[#] ip -6 route add ::/0 dev wg0 table 5182x
[#] ip -6 rule add not fwmark 5182x table 5182x
[#] ip -6 rule add table main suppress_prefixlength 0
[#] ip6tables-restore -n
[#] ip -4 route add 0.0.0.0/0 dev wg0 table 5182x
[#] ip -4 rule add not fwmark 5182x table 5182x
[#] ip -4 rule add table main suppress_prefixlength 0
[#] iptables-restore -n
**** All tunnels are now active ****
[ls.io-init] done.

My custom nettools testing pod has no trouble resolving the address:

-> % kubectl exec -it nettools-test-674f556b96-2vv5j -- nslookup america3.vpn.airdns.org
Server:		10.96.0.10
Address:	10.96.0.10#53

Non-authoritative answer:
Name:	america3.vpn.airdns.org
Address: 184.75.223.205
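The `Server: 10.96.0.10` line shows the nettools pod querying the cluster DNS ClusterIP (10.96.0.10 is kubeadm's default kube-dns address), which comes from the pod's Kubernetes-generated /etc/resolv.conf. It seems worth checking that the airvpn container sees the same file, for example with `kubectl exec <pod> -c airvpn -- cat /etc/resolv.conf`, since wg-quick's `DNS =` handling rewrites it through `resolvconf`. A minimal parser for comparing the two copies, as a hypothetical helper:

```python
# Hypothetical helper: summarise a resolv.conf so the airvpn container's
# copy can be compared with the nettools pod's.


def parse_resolv_conf(text: str) -> dict:
    """Collect nameserver, search and options entries from resolv.conf text."""
    result = {"nameserver": [], "search": [], "options": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # blank lines and comments
            continue
        key, *values = line.split()
        if key == "nameserver" and values:
            result["nameserver"].append(values[0])
        elif key in ("search", "domain"):
            result["search"] = values  # glibc behaviour: the last such line wins
        elif key == "options":
            result["options"].extend(values)
    return result


if __name__ == "__main__":
    try:
        with open("/etc/resolv.conf") as f:
            print(parse_resolv_conf(f.read()))
    except OSError as exc:
        print(f"cannot read /etc/resolv.conf: {exc}")
```

If the airvpn copy lists something other than 10.96.0.10 (for instance a resolver that is only reachable through the tunnel), that alone would explain the timeouts.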

I’m new to Kubernetes, so I’m really stumped by this one and would appreciate any help or guidance. I’m not even sure where to start troubleshooting, as I’ve never interacted with Calico directly beyond deploying it.

I should mention that, while my nettools-test deployment has no problem resolving the address, the WireGuard container itself cannot resolve anything (not even kubernetes.default) and cannot ping the DNS server:

-> % kubectl exec bittorrent-6db8674f9-mbszb -c airvpn -- nslookup america3.vpn.airdns.org
;; connection timed out; no servers could be reached

command terminated with exit code 1

-> % kubectl exec bittorrent-6db8674f9-mbszb -c airvpn -- nslookup google.com             
;; connection timed out; no servers could be reached

command terminated with exit code 1

-> % kubectl exec bittorrent-6db8674f9-mbszb -c airvpn -- nslookup kubernetes.default
;; connection timed out; no servers could be reached

command terminated with exit code 1

-> % kubectl exec bittorrent-6db8674f9-mbszb -c airvpn -- ping -c3 10.96.0.10        
PING 10.96.0.10 (10.96.0.10) 56(84) bytes of data.

--- 10.96.0.10 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2054ms

command terminated with exit code 1
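Two caveats on these tests. Assuming kube-proxy runs in its default iptables mode, a ClusterIP such as 10.96.0.10 is virtual and only the Service's ports are forwarded, so it usually does not answer ping even when DNS is healthy; the failed ping is not conclusive on its own. And nslookup goes through /etc/resolv.conf. The sketch below sends one UDP query straight to port 53, bypassing both. It assumes Python 3 is available in the container (possibly not the case in the linuxserver image) and the default `cluster.local` cluster domain; `build_query` and `probe` are hypothetical names.

```python
# Hypothetical probe: send one DNS query straight to a resolver's UDP port 53
# and report whether anything comes back, instead of relying on ping.
import random
import socket
import struct


def build_query(name: str, qtype: int = 1) -> tuple[int, bytes]:
    """Build a minimal DNS query: recursion desired, one question, class IN.

    qtype 1 asks for an A record. Returns (transaction id, packet bytes).
    """
    txid = random.randrange(1 << 16)
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    return txid, header + qname + b"\x00" + struct.pack("!HH", qtype, 1)


def probe(server: str, name: str, timeout: float = 2.0) -> str:
    """Query server:53 once over UDP and describe the outcome."""
    txid, packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (server, 53))
            reply, _ = sock.recvfrom(4096)
        except socket.timeout:
            return "timeout: no reply within the deadline"
        except OSError as exc:
            return f"error: {exc}"
    if len(reply) < 12 or struct.unpack("!H", reply[:2])[0] != txid:
        return "unexpected reply"
    rcode = reply[3] & 0x0F  # low four bits of the flags word
    answers = struct.unpack("!H", reply[6:8])[0]
    return f"reply: rcode={rcode}, answers={answers}"


if __name__ == "__main__":
    # 10.96.0.10 is the cluster DNS address the nettools pod used above.
    print(probe("10.96.0.10", "kubernetes.default.svc.cluster.local"))
```

Running it from both the airvpn container and the nettools pod gives a like-for-like comparison. Because the probe ignores /etc/resolv.conf, a reply here combined with failing nslookup would point at resolver configuration, while a timeout would point at packets leaving the pod's network namespace (which all containers in the bittorrent pod share).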