In a multi-node Kubernetes cluster, when a Service call is translated to a Pod IP (via iptables DNAT), how is the correct node chosen to forward the request?

In a service call, when a Service is translated to a backend Pod IP by iptables (DNAT), how does the request reach the right node when it first arrives on a node where that Pod is not running?


The request for a service call is made to the ClusterIP, and kube-proxy on the node intercepts it. kube-proxy manages rules in iptables or IPVS; these rules perform destination NAT (DNAT) to translate the Service IP into one of the backend Pod IPs.
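As a rough sketch, the chains kube-proxy installs in iptables mode look something like this (trimmed, iptables-save style output; the chain names, IPs and ports are made-up placeholders):

    # KUBE-SERVICES: match traffic to the ClusterIP:port and jump to the per-Service chain
    -A KUBE-SERVICES -d 10.96.0.1/32 -p tcp --dport 443 -j KUBE-SVC-EXAMPLE
    # per-Service chain: choose one endpoint (Pod) chain
    -A KUBE-SVC-EXAMPLE -j KUBE-SEP-EXAMPLE1
    # per-endpoint chain: DNAT the packet to that Pod's IP and port
    -A KUBE-SEP-EXAMPLE1 -p tcp -j DNAT --to-destination 192.168.1.2:443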

I wrote about it recently if you want to take a look here


Yes, the Service IP is translated to a backend Pod IP using iptables. But suppose the node where the iptables rule is executed doesn't host that Pod, and the Pod is running on another node. There must be an extra step that decides which node hosts the Pod and forwards the request there. So the question is: how is that destination node selected? Is there a pod-to-node mapping, and if so, where can those details be found?

The CNI knows the mapping between Pods and nodes. Once the Service traffic has been DNAT'ed to a Pod IP by the iptables chains, the routing set up by the CNI forwards it to the node that owns that Pod.
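To make that concrete: most CNIs give every node its own Pod CIDR and program routes (or an overlay) so any Pod IP is reachable from any node. A minimal sketch of a node's routing table with a routed CNI (all CIDRs and node IPs below are made-up examples):

    # on Node A: Pods local to this node, delivered via the local bridge
    192.168.1.0/24 dev cni0 proto kernel scope link
    # Pods that live on Node B are reached through Node B's address
    192.168.2.0/24 via 10.0.0.12 dev eth0
    # so a packet that was DNAT'ed to 192.168.2.7 is routed straight to Node B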


Traffic Flow Explanation

  1. Client Sends Traffic to a Kubernetes Service:
  • The client sends a request to a Service's ClusterIP (e.g., 10.96.0.1), or to a node IP/port or load balancer IP when using NodePort or LoadBalancer.
  2. Service-Level Load Balancing in kube-proxy:
  • The kube-proxy on the receiving node intercepts this traffic.
  • It uses iptables rules (or IPVS in some configurations) to match the destination IP and port.
  • An iptables rule matches the Service IP and jumps to a service-specific chain.
  3. Selection of a Pod:
  • Within the service-specific chain, a rule selects a backend pod based on probability (random load balancing).
  • This rule jumps to an endpoint-specific chain.
  4. DNAT to Pod IP:
  • In the endpoint-specific chain, the traffic is DNAT'ed:
    • The destination IP is replaced with the pod's IP (e.g., 192.168.1.2), and the destination port is replaced with the pod's port (if necessary).
    • The traffic is directed toward the selected pod.
  5. Traffic to a Pod on a Different Node:
  • If the selected pod is on a different node, the traffic is forwarded to the pod's node via an overlay network or host routing, depending on the CNI (e.g., Flannel, Calico).
  6. Arrival at the Target Node:
  • On the target node, the traffic enters through the node's network interface.
  • It bypasses kube-proxy and reaches the pod directly via the CNI, since the DNAT'ed destination already matches the pod IP.
  7. Pod Processes the Request:
  • The pod receives the request and processes it.
  • The pod sends a response back to the original client using the same DNAT/IP translation rules to maintain connection state.

Traffic Flow Diagram

Here's a simple diagram to visualize the flow:

Client -> Service (ClusterIP) -> kube-proxy -> iptables chain:
     - Match ServiceIP:Port
     - Select Pod using probability
     - DNAT to PodIP:Port

Node A (kube-proxy):
   - DNAT traffic to PodIP (on Node B)

Node B:
   - Receive DNAT'ed traffic
   - Route to Pod
   - Pod processes request and sends response back

Key Points about DNAT:

  1. iptables and Chains: The DNAT step in kube-proxy's iptables mode rewrites the destination IP and port to the pod's IP and port.
  2. Probability Matching: The service's iptables chain uses random probabilities so traffic is spread roughly evenly across all endpoints (pods); a three-endpoint example follows below.
  3. Cross-Node Traffic: When the pod is on a different node, traffic is routed to that node via the overlay network or routes provided by the Kubernetes CNI plugin.
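For example, with three endpoints, kube-proxy in iptables mode emits statistic-module rules roughly like the following (chain names are placeholders), so each pod ends up with about a 1/3 share:

    -A KUBE-SVC-EXAMPLE -m statistic --mode random --probability 0.33333 -j KUBE-SEP-POD1
    -A KUBE-SVC-EXAMPLE -m statistic --mode random --probability 0.50000 -j KUBE-SEP-POD2
    -A KUBE-SVC-EXAMPLE -j KUBE-SEP-POD3
    # 1/3 of connections hit POD1; half of the remaining 2/3 (another 1/3) hit POD2; the rest fall through to POD3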