Configuring TCP Keepalive

rittneje · May 21, 2022, 5:26pm

We are running an AKS cluster behind a firewall. The firewall severs inactive TCP connections after a few minutes, so we’d like to modify the default TCP keepalive configuration. (The default for Linux is to wait 2 hours, which is way too long.) We tried to configure net.ipv4.tcp_keepalive_time, etc. on the nodes, but unfortunately Kubernetes ignores this and our pods continue to use the original Linux defaults.
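A quick way to confirm which values a pod actually sees is to read them from inside a container. The following is only a minimal sketch: the helper name `read_keepalive_sysctls` and its `base` parameter are made up for illustration, and it assumes a Linux container with `/proc` mounted and Python available.

```python
from pathlib import Path

# The three keepalive sysctls discussed in this thread.
KEEPALIVE_SYSCTLS = (
    "tcp_keepalive_time",
    "tcp_keepalive_intvl",
    "tcp_keepalive_probes",
)


def read_keepalive_sysctls(base="/proc/sys/net/ipv4"):
    """Return the keepalive sysctls visible to this process as a dict of ints.

    `base` is a hypothetical parameter so the function can be pointed at
    another directory (for example in a test).
    """
    values = {}
    for name in KEEPALIVE_SYSCTLS:
        values[name] = int((Path(base) / name).read_text().strip())
    return values


if __name__ == "__main__":
    # Run inside the pod to see what its network namespace is using.
    for name, value in read_keepalive_sysctls().items():
        print(f"net.ipv4.{name} = {value}")
```

Running this once on the node and once inside a pod should show whether the node-level settings reached the pod's network namespace.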

It seems our only option is to use securityContext.sysctls in every pod spec. Is that correct? Unfortunately, the TCP keepalive sysctls are not considered “safe” so it seems this would require passing --allowed-unsafe-sysctls to kubelet. Are these particular sysctls actually “unsafe”? If so, why? If not, can they be added to the default allowlist?

Note: I know we can also configure TCP keepalive in the application itself via socket options. However, some third-party libraries/applications (e.g., boto3) do not offer any way to set these, so changing the system defaults is our only option.

https://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html

rittneje · June 1, 2022, 3:22am

Bumping this. Does anyone have an answer?

Trolldemorted · August 19, 2022, 8:05am

Did you ever find a solution for this issue?

rittneje · August 19, 2022, 10:21am

No, we did not.

miskr-instructure · June 19, 2023, 6:10pm

The really ugly workaround that may not help everyone is to run a privileged: true initContainer that sets these flags.

      initContainers:
      - name: sysctls
        image: alpine
        command:
        - "sh"
        - "-c"
        - |-
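          set -x
          cd /proc/sys/net/ipv4
          # Seconds of idle time before the first keepalive probe is sent.
          echo 240 > tcp_keepalive_time
          # Unanswered probes before the connection is considered dead.
          echo 3 > tcp_keepalive_probes
          # Seconds between successive probes.
          echo 10 > tcp_keepalive_intvl
          # Print the resulting values to the container log.
          cat tcp_keepalive*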
        securityContext:
          privileged: true
          runAsUser: 0
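For a sense of what these values mean in practice, the arithmetic can be sketched as below. This is only an illustration: the function name `keepalive_timeline` is made up, and the 300-second firewall timeout is a hypothetical stand-in for the "few minutes" mentioned earlier in the thread.

```python
def keepalive_timeline(time_s, probes, intvl_s):
    """Return (first_probe_s, give_up_s) for an idle connection.

    first_probe_s: idle seconds before the first keepalive probe is sent.
    give_up_s: idle seconds after which an unresponsive peer is declared
               dead, i.e. time + probes * interval.
    """
    first_probe = time_s
    give_up = time_s + probes * intvl_s
    return first_probe, give_up


if __name__ == "__main__":
    # Hypothetical firewall idle timeout; the thread only says "a few minutes".
    firewall_idle_timeout_s = 300

    for label, args in (
        ("Linux defaults", (7200, 9, 75)),
        ("values from the snippet above", (240, 3, 10)),
    ):
        first_probe, give_up = keepalive_timeline(*args)
        ok = first_probe < firewall_idle_timeout_s
        print(f"{label}: first probe at {first_probe}s, "
              f"dead peer detected at {give_up}s, "
              f"probe beats firewall timeout: {ok}")
```

The point of the comparison is that the first probe has to go out before the firewall drops the idle connection, which the 2-hour default cannot do.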

… or alternatively, using hostPath (if your PSP allows hostPath but not privileged: true):

      volumes:
      - name: proc-sys-net-ipv4
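        hostPath:
          # hostPath only takes "path" and "type"; read-only vs. read-write
          # is controlled on the volumeMount, which defaults to read-write.
          path: "/proc/sys/net/ipv4"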
      initContainers:
      - name: sysctls
        image: alpine:3.15
        command:
        - "sh"
        - "-c"
        - |-
          set -x
          cd /mnt/proc-sys-net-ipv4
          # Same three settings as in the privileged variant.
          echo 240 > tcp_keepalive_time
          echo 3 > tcp_keepalive_probes
          echo 10 > tcp_keepalive_intvl
          cat tcp_keepalive*
        volumeMounts:
        - name: proc-sys-net-ipv4
          mountPath: /mnt/proc-sys-net-ipv4
        securityContext:
          runAsUser: 0

I do not understand why the people who decided the “safe” sysctls consider these 3 flags “unsafe”. It’s very easy to verify that they are network-namespace-scoped with a simple unshare --map-root-user --net test on a Linux VM.

Everything you can accomplish with these flags you can also accomplish with setsockopt syscalls from an unprivileged program, but many libraries do not expose that kernel API (hence the need to set the defaults for the whole network namespace instead).
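For code that does own its sockets, the per-socket equivalent looks roughly like this. It is a minimal sketch, not taken from any particular library: the helper name `enable_keepalive` is made up, the defaults mirror the values used earlier in this thread, and the `TCP_KEEP*` constants are platform-specific, so each one is guarded.

```python
import socket


def enable_keepalive(sock, idle_s=240, interval_s=10, probes=3):
    """Turn on TCP keepalive for one socket and override the sysctl defaults.

    Hypothetical helper; the TCP_KEEP* options are Linux-style constants
    and are skipped on platforms where Python does not define them.
    """
    # Keepalive is off per socket unless SO_KEEPALIVE is set.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Per-socket counterpart of net.ipv4.tcp_keepalive_time.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Per-socket counterpart of net.ipv4.tcp_keepalive_intvl.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Per-socket counterpart of net.ipv4.tcp_keepalive_probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


if __name__ == "__main__":
    s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    print("SO_KEEPALIVE:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    s.close()
```

None of this needs any privileges, which is the point being made above. Note that the sysctls only supply default timings for sockets that already have SO_KEEPALIVE enabled; they do not switch keepalive on by themselves.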

I am guessing the people who implemented sysctls for Kubernetes did not study each net.ipv4.* flag individually; they just picked a few that the stakeholders of the feature wanted and left the rest in the dust as “unsafe”.
