Faster client failover on node failure for long-lived TCP connections

Looking for guidance on TCP connection failover behavior with Kubernetes Services + MetalLB.

Setup:

  • Kubernetes cluster with 3 worker nodes
  • DaemonSet running one pod per node
  • Service type = LoadBalancer
  • MetalLB in L2 mode
  • externalTrafficPolicy: Cluster
  • Clients establish long-lived TCP connections to the Service VIP
  • Clients are external and not under our control
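
For reference, a Service matching this setup might look like the following. This is a minimal sketch that builds the manifest in Python and prints it as JSON, which `kubectl apply -f -` accepts. The name `tcp-daemon`, the `app` label and port 9000 are hypothetical placeholders, not taken from the actual cluster.

```python
import json


def build_service(name: str, app_label: str, port: int,
                  traffic_policy: str = "Cluster") -> dict:
    """Return a LoadBalancer Service manifest for the DaemonSet pods.

    name, app_label and port are hypothetical placeholders.
    """
    if traffic_policy not in ("Cluster", "Local"):
        raise ValueError("externalTrafficPolicy must be 'Cluster' or 'Local'")
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",
            # Cluster: the node holding the VIP may forward to a pod on any node.
            # Local: a node forwards external traffic only to pods on itself.
            "externalTrafficPolicy": traffic_policy,
            # Matches the DaemonSet's pod labels (one pod per node).
            "selector": {"app": app_label},
            "ports": [{"protocol": "TCP", "port": port, "targetPort": port}],
        },
    }


if __name__ == "__main__":
    manifest = build_service("tcp-daemon", "tcp-daemon", 9000)
    print(json.dumps(manifest, indent=2))  # e.g. pipe into: kubectl apply -f -
```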

Example flow:

Client
  |
  v
VIP (owned by Node1)
  |
  v
kube-proxy
  |
  v
Pod on Node3

Failure scenario:

  1. Client establishes a TCP connection to the Service VIP.
  2. MetalLB advertises the VIP from Node1.
  3. kube-proxy selects a backend pod running on Node3.
  4. Node3 crashes (or becomes unreachable).
  5. The existing TCP connection becomes unusable.
  6. The client does not open a new connection for ~60 seconds, apparently because it is waiting on TCP timeout/retransmission behavior.
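
For scale, here is a rough model of how long a Linux client keeps retransmitting before it aborts a connection whose peer has silently disappeared. It is a sketch that assumes the RTO starts at the 200 ms minimum, doubles after every retransmission and is capped at 120 s, the same model the kernel's ip-sysctl documentation uses for tcp_retries2; real RTOs start from the measured RTT, so actual times are longer. Retransmission only happens while the client has unacknowledged data in flight; an idle client without keepalive never notices the loss at all.

```python
def retransmission_giveup_seconds(retries: int, rto_min: float = 0.2,
                                  rto_max: float = 120.0) -> float:
    """Approximate seconds before Linux aborts a connection whose peer vanished.

    Models the RTO starting at rto_min, doubling after every retransmission
    and capping at rto_max; the last term is the wait after the final retry.
    Real RTOs start from the measured RTT, so this is a lower bound.
    """
    total = 0.0
    rto = rto_min
    for _ in range(retries + 1):  # each retransmission plus the final wait
        total += min(rto, rto_max)
        rto *= 2
    return total


if __name__ == "__main__":
    for retries in (3, 8, 15):
        print(retries, round(retransmission_giveup_seconds(retries), 1))
```

Under these assumptions the default net.ipv4.tcp_retries2 = 15 gives about 924.6 s, the figure quoted in the kernel documentation, which suggests the observed ~60 s is more likely set by a shorter client-side timeout (application, keepalive or TCP_USER_TIMEOUT) or by a non-Linux client stack.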

Question:

Is there any Kubernetes networking mechanism (Service, kube-proxy, conntrack tuning, MetalLB configuration, etc.) that can reduce the failure detection time for an already-established TCP connection when the selected backend node disappears?

More specifically:

  • Can Kubernetes/MetalLB cause the client to receive a faster TCP failure indication (RST/ICMP/etc.) when the backend node hosting the selected endpoint dies?
  • Is the ~60 second wait fundamentally a client TCP behavior once the backend connection state is lost?
  • Would moving to MetalLB BGP mode with externalTrafficPolicy: Local change anything for existing TCP sessions, or only improve routing of new connections after node failure?
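
On the RST/ICMP question, the practical difference is easy to see locally. The sketch below (hypothetical helper name, loopback only, assuming Linux socket semantics) compares a peer that answers with a RST, forced here with SO_LINGER set to zero so close() sends RST instead of FIN, against a peer that simply goes silent like a crashed node. The RST fails the blocked recv() immediately with ConnectionResetError, while silence is only detected when some timer expires; a short socket timeout stands in for the client's real one.

```python
import socket
import struct
import time


def probe(peer_sends_rst: bool, timeout: float = 0.5):
    """Block in recv() on a loopback connection and report how it ends.

    Returns (outcome, elapsed_seconds); outcome is "reset", "timeout" or "data-or-eof".
    """
    with socket.create_server(("127.0.0.1", 0)) as listener:
        client = socket.create_connection(listener.getsockname())
        server, _ = listener.accept()
        try:
            if peer_sends_rst:
                # l_onoff=1, l_linger=0: close() aborts with RST instead of FIN,
                # standing in for "something on the path answers with a RST".
                server.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                                  struct.pack("ii", 1, 0))
                server.close()
            # Otherwise the server stays open but silent, like a crashed node.
            client.settimeout(timeout)
            start = time.monotonic()
            try:
                client.recv(1)
                outcome = "data-or-eof"
            except ConnectionResetError:
                outcome = "reset"
            except socket.timeout:
                outcome = "timeout"
            return outcome, time.monotonic() - start
        finally:
            client.close()
            server.close()


if __name__ == "__main__":
    print(probe(peer_sends_rst=True))
    print(probe(peer_sends_rst=False))
```

In other words, a faster failure indication requires some live host to answer the client's next packet with a RST or ICMP error; without that, the client waits on its own timers.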

My current understanding is that Kubernetes can help steer new connections away from failed endpoints, but cannot accelerate failure detection of an already-established TCP session when the endpoint node hard-crashes. Looking to confirm whether that’s correct or if I’m missing any networking-level options.
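
If that understanding is right, the detection knobs live in the client's socket options. Since the clients here are not under our control, the following only illustrates what they would need to set; it is a Linux-only sketch, the helper names are hypothetical, and the values (10 s idle, 5 s interval, 3 probes, 20 s user timeout) are arbitrary examples. With keepalive alone, an idle connection to a vanished peer is declared dead after roughly idle + interval * count seconds; TCP_USER_TIMEOUT bounds how long transmitted data may stay unacknowledged, and per tcp(7) it also overrides the keepalive probe count when both are set.

```python
import socket
from typing import Optional


def enable_fast_failure(sock: socket.socket, idle: int = 10, interval: int = 5,
                        count: int = 3, user_timeout_ms: Optional[int] = None) -> None:
    """Hypothetical client-side helper: shorten dead-peer detection (Linux only)."""
    # Send keepalive probes on an otherwise idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idleness before the first probe.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    # Seconds between unanswered probes.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    # Unanswered probes before the kernel aborts the connection.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    if user_timeout_ms is not None:
        # Max time transmitted data may remain unacknowledged before the
        # kernel aborts the connection (also overrides TCP_KEEPCNT, see tcp(7)).
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, user_timeout_ms)


def keepalive_detection_seconds(idle: int, interval: int, count: int) -> int:
    """Approximate time to declare an idle, silently dead peer gone (keepalive only)."""
    return idle + interval * count


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        enable_fast_failure(s, idle=10, interval=5, count=3, user_timeout_ms=20000)
        # Connect afterwards, e.g. s.connect((vip, port)) with a hypothetical address.
    print(keepalive_detection_seconds(10, 5, 3))  # 25
```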