Cluster information:
Kubernetes version: 1.24.9
Installation: RKE on AWS ec2 nodes
Hello everyone,
I have been trying for days to test the Topology Aware Hints feature (see "Topology Aware Hints" in the Kubernetes documentation) on an RKE 1.24 cluster, with no success.
- The feature is enabled by default from k8s v1.24 on, so no feature-gate activation should be needed
- The doc says “If there are less endpoints than zones in a cluster, the controller will not assign any hints.” My cluster is composed of 2 nodes distributed over 2 AWS zones, and I see 2 endpoints in my EndpointSlice, so hints should be assigned. Incidentally, how does k8s know how many zones are in the cluster? Is it a value to be set somewhere in the cluster configuration, or does it derive this information from the node labels themselves?
- All nodes have the needed topology key (topology.kubernetes.io/zone) and correctly report the required allocatable CPU value under “Capacity”
- The svc has “internalTrafficPolicy: Cluster” set in its spec
- The svc has been annotated with the required “service.kubernetes.io/topology-aware-hints: auto”
No hints are being generated on the endpointslice. I have tried looking at the kube-controller-manager logs after having raised its verbosity to 5, but I cannot see anything relevant appear when I annotate the svc: no errors, no activities. I would expect something to be logged by topologycache.go, but I see nothing.
Can someone give me any pointers on what could be missing from my configuration, or on what logs I should expect to see on the controller when the svc is annotated?
Thank you,
Francesco
The hints heuristic is not really built for very small scale. This seems like an opportunity for improvement of the “auto” mode. We are also looking at adding a more explicitly defined mode, but that is not done yet.
Hello Tim,
first of all thank you for your answer.
My cluster is a test cluster; if the issue is scale related, I can scale it up at will, within reason. I don’t see any mention of scale in the documentation, however, and I struggle to understand why a feature should only work in a “big” cluster. I don’t mean to be critical of Kubernetes: I understand that things are always more complex behind the scenes than they appear.
Is there a specific or approximate number of nodes I should add to the cluster to see the feature kick in? Also, my original questions still stand if anyone has anything useful to add: can you see something I might have overlooked in my config / can someone provide some info on the expected controller logs when the feature works as intended?
Thanks again,
Francesco
The current heuristic is VERY conservative and will only add hints when it is reasonably confident that it can do so “safely”. In particular, it seeks to avoid a case where one endpoint can be overwhelmed disproportionately. To figure that out, it makes some assumptions (it’s a heuristic, not a mind reader), such as that clients of a service can come from any node, and that client traffic from a node or zone is proportional to the number of CPUs in that node/zone.
For larger services in larger clusters, this seems pretty safe. But it breaks down with a small number of endpoints.
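To make the proportionality assumption concrete, here is a minimal Go sketch. It is not the controller's code: the function names, the zone names, and the CPU figures are all hypothetical, and the "relative load" ratio is only one simple way to illustrate the concern described above. It assumes traffic from a zone is proportional to that zone's CPU count and asks how loaded each endpoint would be if traffic stayed in its own zone.

```go
package main

import "fmt"

// trafficShare returns each zone's assumed share of client traffic,
// taken here to be its share of the total CPU count (the assumption
// described in the post above).
func trafficShare(cpuByZone map[string]float64) map[string]float64 {
	var total float64
	for _, c := range cpuByZone {
		total += c
	}
	share := make(map[string]float64, len(cpuByZone))
	if total == 0 {
		return share
	}
	for zone, c := range cpuByZone {
		share[zone] = c / total
	}
	return share
}

// relativeLoad estimates the load on one endpoint of each zone if traffic
// were kept in-zone, relative to spreading traffic evenly over all
// endpoints. 1.0 means "same as today"; 1.6 means 60% more.
// Zones without endpoints are omitted from the result.
func relativeLoad(cpuByZone map[string]float64, endpointsByZone map[string]int) map[string]float64 {
	totalEndpoints := 0
	for _, n := range endpointsByZone {
		totalEndpoints += n
	}
	load := make(map[string]float64)
	for zone, s := range trafficShare(cpuByZone) {
		n := endpointsByZone[zone]
		if n == 0 {
			continue
		}
		load[zone] = s * float64(totalEndpoints) / float64(n)
	}
	return load
}

func main() {
	// Hypothetical cluster: zone "a" has four times the CPU of zone "b",
	// but the service has one endpoint in each zone.
	cpu := map[string]float64{"a": 8, "b": 2}
	endpoints := map[string]int{"a": 1, "b": 1}

	load := relativeLoad(cpu, endpoints)
	for _, zone := range []string{"a", "b"} {
		fmt.Printf("zone %s: relative load per endpoint %.2f\n", zone, load[zone])
	}
}
```

With only one endpoint per zone, any imbalance in CPU between zones lands entirely on a single endpoint, which is the kind of situation a conservative heuristic would want to avoid. With many endpoints, the endpoint counts can be made to track the CPU shares much more closely.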
Hello Tim,
I had to dig into the source code of Kubernetes (topologycache.go on the release-1.24 branch of kubernetes/kubernetes on GitHub) to find the following snippet:
// Nodes with any of these labels set to any value will be excluded from
// topology capacity calculations.
func hasExcludedLabels(labels map[string]string) bool {
	if len(labels) == 0 {
		return false
	}
	if _, ok := labels["node-role.kubernetes.io/control-plane"]; ok {
		return true
	}
	if _, ok := labels["node-role.kubernetes.io/master"]; ok {
		return true
	}
	return false
}
Basically, any node flagged as control-plane is ignored in the topology calculation. In my (objectively poorly configured) test cluster, all nodes serve as control-plane, etcd, and worker. As a result, the hints are not calculated.
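As a rough illustration of the effect, the sketch below applies the quoted function to a cluster shaped like mine. Only hasExcludedLabels comes from the Kubernetes source; the node struct, the cpuByZone helper, the node names, the zone names, and the CPU counts are hypothetical stand-ins, and the real controller works on full Node objects rather than this simplified type.

```go
package main

import "fmt"

// node is a hypothetical, stripped-down stand-in for a Kubernetes Node.
type node struct {
	name   string
	labels map[string]string
	cpus   int64 // allocatable CPUs, in whole cores for simplicity
}

// hasExcludedLabels is copied from the release-1.24 snippet quoted above.
func hasExcludedLabels(labels map[string]string) bool {
	if len(labels) == 0 {
		return false
	}
	if _, ok := labels["node-role.kubernetes.io/control-plane"]; ok {
		return true
	}
	if _, ok := labels["node-role.kubernetes.io/master"]; ok {
		return true
	}
	return false
}

// cpuByZone sums CPUs per zone, skipping excluded nodes and nodes that
// carry no zone label. This helper is an illustration, not controller code.
func cpuByZone(nodes []node) map[string]int64 {
	out := make(map[string]int64)
	for _, n := range nodes {
		if hasExcludedLabels(n.labels) {
			continue
		}
		zone, ok := n.labels["topology.kubernetes.io/zone"]
		if !ok || zone == "" {
			continue
		}
		out[zone] += n.cpus
	}
	return out
}

func main() {
	// Two nodes in two zones, both also labelled as control-plane.
	nodes := []node{
		{name: "node-1", cpus: 4, labels: map[string]string{
			"topology.kubernetes.io/zone":           "zone-a",
			"node-role.kubernetes.io/control-plane": "true",
		}},
		{name: "node-2", cpus: 4, labels: map[string]string{
			"topology.kubernetes.io/zone":           "zone-b",
			"node-role.kubernetes.io/control-plane": "true",
		}},
	}
	fmt.Println("zones with usable capacity:", len(cpuByZone(nodes)))
}
```

Every node is filtered out before its zone is even looked at, so from the point of view of this calculation the cluster has no zones with capacity at all, regardless of how many endpoints the service has.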
It would probably be appropriate for this precondition to be reported in the official documentation.
Anyway, thanks again for your time.
Francesco
Hey @Francesco, thanks for catching this! Sorry we missed this in the docs. I’ve added this to my WIP PR to update the docs for the 1.27 cycle: kubernetes/website Pull Request #40069, “Updating documentation for Topology Aware Routing in 1.27”.
Hey @Francesco , were you able to achieve the topology hints with non-control plane nodes?
Ciao Bharath,
we were not able to achieve the target result in k8s 1.24. It is likely that this could be done in a newer version, but we haven’t tried yet since we went with a different solution.
Sorry I could not be of more use.
Francesco
Okay, thanks Francesco.
Have a great day!!! 