[TAH] Topology Aware Hints not generated: Insufficient Node information: allocatable CPU or zone not specified on one or more nodes

Hello,

I am trying to set up Topology Aware Hints (TAH) on a test 1.26.4 cluster (on-premises on VMware with RHEL 9.2 VMs), with one master (2 vCPU) and 2 workers (4 vCPU each).

The master should be excluded from TAH, as it has a node-role.kubernetes.io/master label.

I tried with 2, 4 and 6 replicas of my deployment, but I still get an “Insufficient Node information: allocatable CPU or zone not specified on one or more nodes” error on the service.

I tried setting both the kubernetes.io/zone and topology.kubernetes.io/zone labels on the nodes (e.g. zone-a), but no luck.
The allocatable CPU is shown in the node YAML.

With replica=6 I have 3 pods on each node, so 3 endpoints in each zone.
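As I understand the documentation, the controller expects the share of endpoints in each zone to be roughly proportional to that zone's allocatable CPU. The sketch below only does this arithmetic for my setup. It assumes both workers report 3900m and that the master is left out; the function name `expected_endpoints_per_zone` is made up for the illustration.

```python
def expected_endpoints_per_zone(cpu_millicores_by_zone, total_endpoints):
    """Split total_endpoints across zones in proportion to allocatable CPU."""
    total_cpu = sum(cpu_millicores_by_zone.values())
    return {
        zone: total_endpoints * cpu / total_cpu
        for zone, cpu in cpu_millicores_by_zone.items()
    }


# Assumption: one 3900m worker per zone, master excluded from the calculation.
cpu_by_zone = {"zone-a": 3900, "zone-b": 3900}
print(expected_endpoints_per_zone(cpu_by_zone, 6))
# {'zone-a': 3.0, 'zone-b': 3.0}
```

With 6 replicas spread 3 and 3, the actual distribution matches the expected one, so an unbalanced allocation should not be what disables the hints here.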

Do you know what I could try to enable TAH?

Thanks.

(My need is for app A to call app B preferably on the same node (= same on-premises datacenter), and to fall back to B on any other node/DC if the pod on the same node is down.)

@robscott

Service:

apiVersion: v1
kind: Service
metadata:
  name: nodejs2-service
  namespace: dev
  annotations:
    service.kubernetes.io/topology-aware-hints: "auto"
spec:
  type: ClusterIP
  selector:
    app: nodejs2
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 3000

EndpointSlices:

addressType: IPv4
apiVersion: discovery.k8s.io/v1
endpoints:
- addresses:
  - 100.96.0.52
  conditions:
    ready: true
    serving: true
    terminating: false
  nodeName: 10.61.10.3
  targetRef:
    kind: Pod
    name: node1-c6b879cb9-2v452
    namespace: dev
  zone: zone-a
- addresses:
  - 100.96.3.51
  conditions:
    ready: true
    serving: true
    terminating: false
  nodeName: 10.61.10.4
  targetRef:
    kind: Pod
    name: node1-c6b879cb9-njhx2
    namespace: dev
  zone: zone-b
- addresses:
  - 100.96.0.53
  conditions:
    ready: true
    serving: true
    terminating: false
  nodeName: 10.61.10.3
  targetRef:
    kind: Pod
    name: node1-c6b879cb9-vmnh6
    namespace: dev
  zone: zone-a
- addresses:
  - 100.96.3.87
  conditions:
    ready: true
    serving: true
    terminating: false
  nodeName: 10.61.10.4
  targetRef:
    kind: Pod
    name: node1-c6b879cb9-vnsrf
    namespace: dev
  zone: zone-b
- addresses:
  - 100.96.0.56
  conditions:
    ready: true
    serving: true
    terminating: false
  nodeName: 10.61.10.3
  targetRef:
    kind: Pod
    name: node1-c6b879cb9-fmr2n
    namespace: dev
  zone: zone-a
- addresses:
  - 100.96.3.91
  conditions:
    ready: true
    serving: true
    terminating: false
  nodeName: 10.61.10.4
  targetRef:
    kind: Pod
    name: node1-c6b879cb9-5z4ts
    namespace: dev
  zone: zone-b
kind: EndpointSlice
metadata:
  labels:
    endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io
    kubernetes.io/service-name: node1-service
  name: node1-service-plv6k
  namespace: dev

Nodes:

10.61.10.2   master   v1.26.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=10.61.10.2,kubernetes.io/os=linux, kubernetes.io/zone=zone-a,topology.kubernetes.io/zone=zone-a,node-role.kubernetes.io/master=
10.61.10.3   <none>   v1.26.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=10.61.10.3,kubernetes.io/os=linux, kubernetes.io/zone=zone-a,topology.kubernetes.io/zone=zone-a
10.61.10.4   <none>   v1.26.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=10.61.10.4,kubernetes.io/os=linux, kubernetes.io/zone=zone-b,topology.kubernetes.io/zone=zone-b

Node allocatable CPU:

  allocatable:
    cpu: 3900m
    ephemeral-storage: "22398631896"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 14938628Ki
    pods: "110"
  capacity:
    cpu: "4"
    ephemeral-storage: 26010Mi
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 16089604Ki
    pods: "110"
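The event message names two per-node conditions (zone and allocatable CPU), so one way to narrow things down is to check them directly. The sketch below is only an illustration, not the controller's code. It assumes node objects shaped like the items of `kubectl get nodes -o json`, treats `topology.kubernetes.io/zone` as the zone label, and skips nodes carrying a control-plane or master role label. The helper names and the sample data are hypothetical, and the 3900m value for the second worker is assumed to match the one shown above.

```python
# Sketch only: follows the wording of the event message, not the controller source.
ZONE_LABEL = "topology.kubernetes.io/zone"
EXCLUDED_ROLE_LABELS = (
    "node-role.kubernetes.io/control-plane",
    "node-role.kubernetes.io/master",
)


def cpu_to_millicores(quantity):
    """Convert a CPU quantity such as '3900m' or '4' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def nodes_missing_info(nodes):
    """Return names of non-excluded nodes lacking a zone label or allocatable CPU."""
    missing = []
    for node in nodes:
        labels = node["metadata"].get("labels", {})
        if any(label in labels for label in EXCLUDED_ROLE_LABELS):
            continue  # control-plane nodes are assumed to be left out
        zone = labels.get(ZONE_LABEL, "")
        cpu = node.get("status", {}).get("allocatable", {}).get("cpu", "0")
        if not zone or cpu_to_millicores(cpu) == 0:
            missing.append(node["metadata"]["name"])
    return missing


# Hypothetical, abbreviated copies of the three nodes in this cluster.
nodes = [
    {"metadata": {"name": "10.61.10.2", "labels": {
        ZONE_LABEL: "zone-a", "node-role.kubernetes.io/master": ""}}},
    {"metadata": {"name": "10.61.10.3", "labels": {ZONE_LABEL: "zone-a"}},
     "status": {"allocatable": {"cpu": "3900m"}}},
    {"metadata": {"name": "10.61.10.4", "labels": {ZONE_LABEL: "zone-b"}},
     "status": {"allocatable": {"cpu": "3900m"}}},  # assumed equal to the other worker
]
print(nodes_missing_info(nodes))
# []
```

On the data as posted the list is empty, so the labels and allocatable CPU stored in the API look sufficient under these assumptions.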

Service events:

Events:
  Type     Reason                      Age                 From                       Message
  ----     ------                      ----                ----                       -------
  Warning  TopologyAwareHintsDisabled  68s (x35 over 94m)  endpoint-slice-controller  Insufficient Node information: allocatable CPU or zone not specified on one or more nodes, addressType: IPv4

kube-proxy logs:

topology.go:171] "Skipping topology aware endpoint filtering since one or more endpoints is missing a zone hint"
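This kube-proxy message is consistent with the EndpointSlice above, where each endpoint has a `zone` but no `hints` field. A small sketch for spotting that, assuming the slice is loaded as a dict (for example from `kubectl get endpointslices -o json`) and that populated hints appear under `hints.forZones` as in the discovery.k8s.io/v1 API. The function name and the abbreviated sample data are hypothetical.

```python
def endpoints_without_hints(endpoint_slice):
    """Return pod names of endpoints that carry no zone hint."""
    missing = []
    for endpoint in endpoint_slice.get("endpoints", []):
        for_zones = endpoint.get("hints", {}).get("forZones", [])
        if not for_zones:
            missing.append(endpoint["targetRef"]["name"])
    return missing


# Two endpoints abbreviated from the slice posted above: zone set, no hints.
posted_slice = {
    "endpoints": [
        {"addresses": ["100.96.0.52"], "zone": "zone-a",
         "targetRef": {"name": "node1-c6b879cb9-2v452"}},
        {"addresses": ["100.96.3.51"], "zone": "zone-b",
         "targetRef": {"name": "node1-c6b879cb9-njhx2"}},
    ]
}
print(endpoints_without_hints(posted_slice))
# ['node1-c6b879cb9-2v452', 'node1-c6b879cb9-njhx2']

# Hypothetical endpoint showing what a populated hint would look like.
hinted_slice = {
    "endpoints": [
        {"addresses": ["100.96.0.52"], "zone": "zone-a",
         "hints": {"forZones": [{"name": "zone-a"}]},
         "targetRef": {"name": "node1-c6b879cb9-2v452"}},
    ]
}
print(endpoints_without_hints(hinted_slice))
# []
```

As long as the first list is non-empty, kube-proxy would be expected to keep skipping the topology-aware filtering, as the log line says.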

Cluster information:

Kubernetes version: 1.26.4
Cloud being used: bare-metal
Installation method: Kublr
Host OS: RHEL 9.2
CNI and version:
CRI and version:


Update: I don’t understand why, but for debugging I set --v=2 in the kubelet systemd service and restarted it. After that, TAH started working without any other change…