Container CPU usage reported by cadvisor

Suppose we want to measure CPU usage in containers. cadvisor provides a metric named “container_cpu_usage_seconds_total”, which is of metric type Counter. Since a counter is an accumulator, we need to apply a function to it to get the kind of values we need.

In most of the articles I have seen, the rate function is applied. As we know, the rate function “calculates the per-second average rate of increase of the time series in the range vector”. In other words, the output of a rate query is a value averaged over the chosen period of time.

Example: rate(container_cpu_usage_seconds_total{namespace="default"}[5m]) gives the average CPU usage over the last 5 minutes.
That means that, for each instant t at which the query is evaluated, the rate function uses the samples from t - 5m to t to calculate the average. So, for example, the value at 08:30 describes the average number of CPU seconds consumed per second (that is, the average number of cores in use) between 08:25 and 08:30, the value at 08:31 describes the same average between 08:26 and 08:31, and so on.

If we use such values to identify the peak CPU usage at a given instant, we might miss the peak, because each value is an average.

My questions are:

  1. Is it not possible to measure the absolute value of “container_cpu_usage_seconds_total”? For example: what is the absolute current CPU usage at this instant in time?

  2. The memory metric “container_memory_usage_bytes” is defined as a Gauge. Why could “container_cpu_usage_seconds_total” not be a Gauge as well? In that case we would get the absolute current value of CPU usage, wouldn’t we? Please correct me if my understanding is wrong.

  3. In general, when dimensioning an application, is it recommended to use the output of the rate function?

Talking about rate(), I’ve noticed that it needs enough samples inside the window to build an average and return something (at least two; a window of about four scrape intervals is a common rule of thumb), otherwise it returns an empty result.
So, depending on how often you scrape your metrics (AFAIR, the default is 15 seconds), you can narrow your rate window to catch peaks.
In general, though, I think the idea is not to show very short peaks of a few seconds, but rather something more serious and impactful, which is likely to last minutes.
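As a rough illustration of the trade-off between window size and scrape interval, the Python sketch below evaluates a simplified rate over several window sizes and records the highest value each one ever reports. Everything in it is an assumption for illustration: the 15-second scrape interval, the synthetic 30-second burst at 2 cores, the 7-second offset between scrapes and query evaluations, and the helper names. Real rate() also extrapolates and handles counter resets, so actual numbers will differ.

```python
SCRAPE_INTERVAL = 15  # seconds; assumed, matching the value recalled above


def build_counter(duration=600, burst=(270, 300), idle_cores=0.25, burst_cores=2.0):
    """Synthetic Counter: (timestamp, cumulative CPU seconds) per scrape."""
    samples = [(0, 0.0)]
    total = 0.0
    for t in range(SCRAPE_INTERVAL, duration + 1, SCRAPE_INTERVAL):
        start = t - SCRAPE_INTERVAL
        cores = burst_cores if burst[0] <= start < burst[1] else idle_cores
        total += cores * SCRAPE_INTERVAL
        samples.append((t, total))
    return samples


def simple_rate(samples, t_end, window):
    """Simplified rate over [t_end - window, t_end]; None with < 2 samples."""
    in_window = [(t, v) for t, v in samples if t_end - window <= t <= t_end]
    if len(in_window) < 2:
        return None
    (t0, v0), (t1, v1) = in_window[0], in_window[-1]
    return (v1 - v0) / (t1 - t0)


def peak_rate(samples, window, offset=7):
    """Highest simplified rate seen when evaluating every scrape interval,
    `offset` seconds after each scrape (evaluations rarely align with scrapes)."""
    values = []
    for t_end in range(offset, samples[-1][0], SCRAPE_INTERVAL):
        r = simple_rate(samples, t_end, window)
        if r is not None:
            values.append(r)
    return max(values) if values else None


samples = build_counter()
peaks = {w: peak_rate(samples, w) for w in (15, 30, 60, 300)}
for window, peak in peaks.items():
    print(f"window={window:>3}s  peak={peak}")
```

In this setup the 15-second window never contains two samples and returns nothing, the 30-second window sees the full 2 cores, the 60-second window (four scrape intervals) peaks at roughly 1.4 cores, and the 5-minute window at roughly 0.43 cores. A narrower window shows sharper peaks but leaves less margin if a scrape is late or missing.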