CPU and memory metrics not working

Hi All,

I’m in the process of getting the HPA working for my application, and I’m seeing that, by default, neither memory nor CPU metrics are visible to the master node / autoscaler. I’ve found a couple of suggestions, but mostly these seem to apply either to different setups (AWS) or to older versions.

Should I expect these basic metrics, available via the GKE console, to work OOTB? Or is it expected that I roll some code to expose these metrics?

I have configured my Java application to work with the memory available in the container (a relatively new feature of Java 8+), requested a safe amount of memory (only 512Mi), and set a low-ish memory limit (1Gi). I have two Java apps (REST services) and an Nginx instance serving static content (an SPA), with a generous request/limit at half that of the Java services.
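Roughly, each Java container is set up like this (a sketch; the names, image, and CPU request below are illustrative, not exact):

```yaml
# Sketch of one Java container's pod spec; names and the CPU request are
# placeholders, not my real values.
containers:
- name: rest-service                              # hypothetical name
  image: gcr.io/my-project/rest-service:1.0       # hypothetical image
  env:
  - name: JAVA_TOOL_OPTIONS
    # Container-aware heap sizing (backported to Java 8 in 8u191)
    value: "-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0"
  resources:
    requests:
      memory: 512Mi   # the HPA's "utilization" percentage is measured against this
      cpu: 250m       # assumed; a CPU request is required for CPU-based autoscaling
    limits:
      memory: 1Gi
```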

With this configuration, I think the OS/process-level memory and CPU metrics should work. Do I have this right? If so, what do I need to do to get the metrics visible to the HPA?
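To be concrete, these are the kinds of standard kubectl checks I mean for the resource-metrics pipeline (the HPA name is a placeholder):

```shell
kubectl get apiservice v1beta1.metrics.k8s.io   # should report Available=True
kubectl top pods                                # per-pod CPU/memory from the metrics API
kubectl describe hpa my-hpa                     # look for "unable to get metrics" events
```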

I have a separate question about all the various metrics-oriented services running on the nodes, but that seems like a different, if related, topic. Here I’m specifically trying to get the OOTB CPU and memory metrics working, not to alter any system pods, per se.

Thanks for any pointers.

–Charlie

Cluster information:
Kubernetes version: 1.14.8-gke.33
Cloud being used: GKE
Installation method: Console + kubectl apply
Host OS: Container Optimized
CNI and version: VPC Network (?)
CRI and version: ?

@creitzel

Hey Charles, monitoring in GKE isn’t terribly useful by default.
Try the services listed below as an alternative.

For my humble needs, bluematador adds the most value for a very reasonable price past the free trial.

@marvin-hansen, thanks for the tip. When I get to monitoring, I’ll check these out.

But, at the moment, I’m just trying to get the Horizontal Pod Autoscaler (HPA) to work for my Java REST services … :–)

Did you enable Stackdriver for the cluster? https://cloud.google.com/monitoring/kubernetes-engine/
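If you’re not sure, one way to check is via the gcloud SDK (the cluster name and zone below are placeholders):

```shell
gcloud container clusters describe my-cluster --zone us-central1-a \
  --format="value(loggingService,monitoringService)"
# Expect logging.googleapis.com / monitoring.googleapis.com (or their
# ".../kubernetes" variants) rather than "none".
```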

I took the defaults, as of last week. So, yes, to both Stackdriver Logging and Monitoring.

Ok, it seems to have repaired itself. I must have dropped the CPU metric, because the HPA for one of my deployments is down to one memory metric (80% of requested).
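For reference, an HPA targeting 80% of requested memory looks roughly like this on 1.14, where the autoscaling/v2beta2 API is available (names and replica bounds below are placeholders, not my actual values):

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: rest-service-hpa          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rest-service            # hypothetical Deployment name
  minReplicas: 1
  maxReplicas: 5                  # assumed bound
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80    # percent of the pod's memory *request*
```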

The warning in the Deployment Details page of the console is gone.

Not sure what changed; I haven’t changed anything since posting. I did update the deployment with a new container image (code push), and I saw that it briefly went up to 3-4 pods and then fell back down to 1 after a bit. So the “surge” feature seems a bit overeager, especially since I didn’t turn it on :–)
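For what it’s worth, a rolling update does create extra pods by default (the Deployment’s maxSurge defaults to 25%), though with a single replica the brief jump to 3-4 pods could just as well be the HPA reacting to startup CPU/memory. If you want to constrain the update surge itself, that’s a Deployment strategy setting, e.g.:

```yaml
# Sketch: tighten the rolling-update surge on the Deployment (a suggestion,
# not something from this thread).
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1         # at most one extra pod above the desired count during an update
    maxUnavailable: 0   # keep the full replica count serving throughout
```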

Anyway, it appears the HPA memory trigger is based on my requested memory, not the limit, which seems wrong to me. I mean, the scheduler shouldn’t place the pod at all if the requested size isn’t available, right? So it hardly seems right to scale up another pod just because you hit what is, in effect, your minimum. With my numbers, 80% of the 512Mi request is about 410Mi, which is only around 40% of the 1Gi limit.

OTOH, if you hit 80% of your limit, that would be a good time to see about some additional capacity. Does anyone know if there’s a way to make the HPA trigger based on a percentage of the limit? The same logic should apply to CPU as well, I believe.
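As far as I know, the HPA’s Utilization target type is always computed against requests, not limits. One possible workaround on autoscaling/v2beta2 is an absolute AverageValue target set near the limit; e.g. 80% of the 1Gi limit is roughly 819Mi (a sketch, reusing the hypothetical HPA above):

```yaml
metrics:
- type: Resource
  resource:
    name: memory
    target:
      type: AverageValue
      averageValue: 819Mi   # ~80% of the 1Gi limit, expressed as an absolute value
```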