GKE Metrics are gone

Hi All,

I do not know if this is the right place, so forgive me if it is not. I have 3 clusters in GKE: development, staging and production. They're all configured to collect metrics and logs from the pods. Everything was fine until 10 days ago, when the production cluster stopped collecting metrics data; here is an image showing the issue:


Since then I have started another node in that node pool, and the new node is collecting data normally.

Any ideas on why that node has stopped collecting metrics data?

Any help or hints would be much appreciated.

Thank you.

I forgot the node details:

Probably the metrics pod on that node is either malfunctioning/failing or not sending data (my best guess, anyway).

Check that the metrics DaemonSet is healthy:

kubectl get ds -n kube-system workload-metrics

Then check that all pods in that DaemonSet are running, including the one on the affected node; a sketch of how to do that is below.
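As a rough sketch (the node name below is hypothetical; substitute the name of your misbehaving node), you can list the metrics pods together with the node each one runs on, then inspect and check the logs of the pod scheduled on that node:

# List kube-system pods with the node each one is scheduled on
kubectl get pods -n kube-system -o wide

# Narrow the list to pods on the affected node (node name is a placeholder)
kubectl get pods -n kube-system -o wide --field-selector spec.nodeName=gke-prod-pool-1-abc123

# Inspect restarts/events and logs of the metrics agent pod found above (pod name is a placeholder)
kubectl describe pod gke-metrics-agent-xxxxx -n kube-system
kubectl logs gke-metrics-agent-xxxxx -n kube-system

A pod stuck in CrashLoopBackOff, or one with errors in its logs, would explain the gap in the metrics.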

Resolved:

it was the gke-metrics-agent DaemonSet pod running on the misbehaving node that was in an error state. Restarting it resolved the issue.
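For anyone hitting the same thing, a minimal way to restart just the agent on the affected node (the pod name is a placeholder; use the one kubectl reports) is to delete that pod and let the DaemonSet controller recreate it:

# Delete the gke-metrics-agent pod on the bad node; the DaemonSet recreates it automatically
kubectl delete pod gke-metrics-agent-xxxxx -n kube-system

# Alternatively, restart the whole DaemonSet (this restarts the agent on every node)
kubectl rollout restart daemonset gke-metrics-agent -n kube-system

Deleting the single pod is the more targeted option, since the rollout restart cycles the agent across the entire cluster.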
