Deprecated topologyKeys alternative for node locality

Hello,

Kubernetes documentation has the following statement regarding the Service's topologyKeys field:

Note: This feature, specifically the alpha topologyKeys API, is deprecated since Kubernetes v1.21. Topology Aware Hints, introduced in Kubernetes v1.21, provide similar functionality.

The new mechanism is a different way of handling Kubernetes availability zones, but it does not provide all the functionality topologyKeys did, especially host-based routing.

Consider the following example: in our Kubernetes clusters we have a proxy meant to forward traffic to other nodes. This proxy runs as a DaemonSet, so it is present on each node. Traffic to this proxy goes through a ClusterIP Service containing the following snippet:

spec:
  topologyKeys:
  - kubernetes.io/hostname
  - '*'

With this configuration the traffic always reaches the local instance of the proxy and is then forwarded to pods, possibly on other nodes. When the local instance is not present, e.g. during a rolling update, traffic originating from that node alone is directed to other machines.
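For illustration, a complete Service using this configuration might look roughly like the following (the name, selector, and ports are hypothetical; topologyKeys also required the ServiceTopology feature gate to be enabled):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: node-local-proxy        # hypothetical name
spec:
  selector:
    app: proxy                  # matches the DaemonSet's pods
  ports:
  - port: 8080
    targetPort: 8080
  topologyKeys:
  - kubernetes.io/hostname      # prefer an endpoint on the same node
  - '*'                         # otherwise fall back to any endpoint
```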

This works very well as it minimizes traffic between nodes and decreases latency. With heavily loaded nodes there is a significant gain. From Kubernetes 1.22 on, this is no longer possible.

If we wanted to retain this optimization we would need to (ab)use the new Topology Aware Hints, labeling each node with a different zone. This would probably work in steady state, but during a rolling update (or with larger differences in allocatable CPU) it would disable the optimization on all nodes.
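A sketch of that workaround, assuming the hints annotation name used around v1.21–v1.23 and illustrative node/Service names:

```yaml
# Give every node its own unique "zone" so hints pin traffic node-locally.
apiVersion: v1
kind: Node
metadata:
  name: node1
  labels:
    topology.kubernetes.io/zone: zone-node1   # unique per node
---
apiVersion: v1
kind: Service
metadata:
  name: node-local-proxy                      # hypothetical name
  annotations:
    # Opts the Service in to Topology Aware Hints
    service.kubernetes.io/topology-aware-hints: auto
spec:
  selector:
    app: proxy
  ports:
  - port: 8080
```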

Is there a better way of doing this? Will Topology Aware Hints support host-based routing in addition to the zone-based one?

Or maybe using such optimizations is considered a bad practice and should be avoided?

I’m looking for general advice on this topic, as I’m hesitant to create a GitHub issue in the Kubernetes project.


Regards,
Bartosz Borkowski


We’re working on internalTrafficPolicy as a way to express “use endpoints on this node”.

Thanks for your response.

I think you mean the feature described here: Service Internal Traffic Policy | Kubernetes

This indeed allows using local endpoints, but it has no fallback behaviour, so when a local endpoint does not exist the Service simply doesn’t work. It looks just like externalTrafficPolicy: Local - when there are no local endpoints it behaves as a black hole for packets, and health checks must be used to disable such an upstream in the clients (and it seems we don’t have health checks for the internal counterpart).
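For reference, the field in question is a single line on the Service spec (names are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: node-local-proxy           # hypothetical name
spec:
  internalTrafficPolicy: Local     # only node-local endpoints; no fallback
  selector:
    app: proxy
  ports:
  - port: 8080
```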

Such behaviour makes both policies hard to use, or even unusable, in some scenarios. DaemonSet updates are also problematic.

It would be great if we could have fallback policies for both internalTrafficPolicy and externalTrafficPolicy, such as LocalWithFallback or PreferLocal. These could work just like the topologyKeys set mentioned in the original post. Such a change would be backwards compatible and would allow users to change the behaviour as needed.
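In other words, the wish is for something like the following - to be clear, PreferLocal is a hypothetical value that does not exist in the Kubernetes API today:

```yaml
spec:
  # Hypothetical: try node-local endpoints first,
  # fall back to cluster-wide endpoints when none exist locally.
  internalTrafficPolicy: PreferLocal
```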

Would you consider extending traffic policies to retain functionality available in previous kubernetes versions?

As I think about how a k8s cluster looks, I can see 3 layers of topology. At the top there is the cluster. In the cluster we have zones. And in each zone we have some nodes.

L1 |            [CLUSTER]
   |           /         \
L2 |    [zoneA]          [zoneB]
   |    /     \          /     \
L3 | [node1] [node2]   [node3] [node4]

We can name these layers:

  • L1 - cluster wide
  • L2 - zone wide
  • L3 - node wide
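These layers map onto well-known labels that Kubernetes already sets on nodes (the values shown are illustrative):

```yaml
# Node labels corresponding to the layers above:
kubernetes.io/hostname: node1           # L3 - node wide
topology.kubernetes.io/zone: zoneA      # L2 - zone wide
topology.kubernetes.io/region: region1  # grouping of zones within the cluster
```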

To achieve the best traffic performance (lowest network stack latency) we should go through as few layers as possible.

All our internal traffic starts at the node layer (L3).
I think the optimal policy would be to try L3 = local node first. If the target service is not available locally, we should fall back to L2 = local zone. If the service is not available at the zone level, then we should fall back to L1 = cluster wide.

Topology Aware Hints already allow us to try L2 and fall back to L1 if necessary. We are lacking an option to try L3 and fall back to L2. I think we should not implement LocalWithFallbackToCluster (skipping L2), but we should try to introduce LocalWithFallbackToTopology.

There’s a followup proposal to add “PreferLocal” - we didn’t want to rush that until we were sure we had the kinks worked out, and indeed we have found some bugs in how a couple of different features interact.