Hi,
I’m looking to monitor a production Kubernetes cluster with Prometheus. I have a pretty solid grasp of Prometheus - I’ve been using it for a while to monitor various devices with node_exporter, snmp_exporter etc. I also found kubernetes_sd in Prometheus, and it seems it can discover nodes and pods via the k8s API.
However, I’d like to know where the actual metrics endpoints are. This is so I can scrape them manually with curl, understand what metrics are available, and decide which ones to include/exclude. What’s difficult is finding a simple list of the actual URLs!
For testing purposes I’m using microk8s. What I’ve discovered so far by poking around:
- There is a `/metrics` path at the main API endpoint (microk8s: https port 16443). This returns about 12,000 metrics, mainly the response times for the API and etcd split over lots of buckets:

  ```
  # curl -sk https://admin:<password>@localhost:16443/metrics
  ```
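
  To turn those ~12,000 lines into something reviewable, here is a quick sketch (same basic-auth credentials as above) that strips the comments and label sets, leaving one line per metric name:

  ```
  # List unique metric names, ignoring HELP/TYPE comment lines and
  # collapsing label permutations into a single entry per family.
  curl -sk "https://admin:<password>@localhost:16443/metrics" \
    | grep -v '^#' \
    | sed -e 's/[{ ].*$//' \
    | sort -u
  ```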
- There is an endpoint `/api/v1/nodes/<nodename>/proxy/metrics`, with kubelet stats and rest client stats. There seems to be very little of interest there unless you’re debugging kubelet performance: looks like I still need node_exporter for full host stats.
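
  If kubectl is already configured for the cluster, the same endpoint can be read without embedding admin credentials in the URL - just a convenience sketch:

  ```
  # Let kubectl handle authentication to the API server;
  # <nodename> is the same placeholder as above.
  kubectl get --raw "/api/v1/nodes/<nodename>/proxy/metrics" | less
  ```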
- There is an endpoint `/api/v1/namespaces/<namespace>/pods/<pod>/proxy/metrics`, with things like:

  ```
  process_cpu_seconds_total 524.6
  process_max_fds 65536
  process_open_fds 24
  process_resident_memory_bytes 2.7062272e+07
  process_start_time_seconds 1.57953723378e+09
  process_virtual_memory_bytes 1.33914624e+08
  process_virtual_memory_max_bytes -1
  ```

  That looks to be very interesting at pod level. Each pod needs scraping separately though (`kubernetes_sd` will help there).
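
  Before wiring up kubernetes_sd, a rough sketch for eyeballing what every pod exposes (assumes kubectl access; pods without a metrics endpoint will simply error, hence the fallback message):

  ```
  # Walk every pod the API knows about and fetch its /proxy/metrics,
  # roughly what kubernetes_sd discovery would end up scraping.
  kubectl get pods --all-namespaces --no-headers \
      -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name \
    | while read -r ns pod; do
        echo "=== $ns/$pod ==="
        kubectl get --raw "/api/v1/namespaces/$ns/pods/$pod/proxy/metrics" \
          2>/dev/null || echo "(no metrics endpoint)"
      done
  ```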
- I’ve seen mention of “cAdvisor” metrics collected by kubelet - and after more browsing I finally stumbled upon `/api/v1/nodes/<nodename>/proxy/metrics/cadvisor`. Despite the metrics being prefixed `container_*`, this appears to be node-level info, e.g. counts of network bytes per interface, information on filesystems - all metrics are labelled with `{container=""}`. (Trying `/api/v1/namespaces/<namespace>/pods/<pod>/proxy/metrics/cadvisor` gives a 404.) Note that microk8s uses containerd, not docker, as its runtime.
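
  To gauge how heavy this endpoint is before pointing Prometheus at it, a sketch that counts series per `container_*` metric family on one node:

  ```
  # Count series per metric family to get a feel for cardinality;
  # <nodename> as above, with kubectl handling auth.
  kubectl get --raw "/api/v1/nodes/<nodename>/proxy/metrics/cadvisor" \
    | grep -v '^#' \
    | sed -e 's/[{ ].*$//' \
    | sort | uniq -c | sort -rn | head -20
  ```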
- If I install the metrics-server package (via `microk8s.enable metrics-server`) then I can find very basic node and pod metrics at `/apis/metrics.k8s.io/v1beta1/nodes/` and `/apis/metrics.k8s.io/v1beta1/pods/` in a single call - but these are JSON, not Prometheus format. The per-pod metrics are even fewer than the `process_*` ones shown above:

  ```
  "containers": [
    {
      "name": "speaker",
      "usage": {
        "cpu": "3m",
        "memory": "8012Ki"
      }
    }
  ]
  ```
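
  These can also be fetched through kubectl rather than curl (`kubectl top nodes` and `kubectl top pods` read the same API). A sketch, where jq is optional and only pretty-prints:

  ```
  # Fetch the metrics-server summaries via the API server;
  # drop the "| jq ." if jq isn't installed.
  kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq .
  kubectl get --raw "/apis/metrics.k8s.io/v1beta1/pods" | jq .
  ```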
- If I install kube-state-metrics (via `kubectl apply -f examples/standard` in the kube-state-metrics repo) then it installs a new service with a `/metrics` endpoint on port 8080. To get to it temporarily from outside the cluster I did:

  ```
  kubectl port-forward -n kube-system service/kube-state-metrics 8111:8080
  ...
  curl localhost:8111/metrics
  ```

  Here I see info on the status of deployments, daemonsets, configmaps, services etc, and some resource info (`kube_pod_container_resource_limits`, `kube_pod_container_resource_requests`).
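
  With that port-forward still running, a quick sanity check that pulls out just the resource request/limit series mentioned above:

  ```
  # Filter the kube-state-metrics output down to the resource series.
  curl -s localhost:8111/metrics \
    | grep -E '^kube_pod_container_resource_(limits|requests)'
  ```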
Questions:
- Are there other core metrics that I’ve not listed above? Or is there a comprehensive list somewhere?
- In particular, are there any more metrics under the `/proxy/metrics/` prefix? I only discovered `/proxy/metrics/cadvisor` via a config I saw in a GitHub issue.
Many thanks,
Brian.
UPDATE: on microk8s I’ve also discovered I can get some of these metrics directly from the kubelet using `<host>:10255/metrics` and `<host>:10255/metrics/cadvisor`.
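
As I understand it, 10255 is the kubelet’s read-only port (plain HTTP, no auth), so whether it’s open at all will vary by distribution. A sketch, with `<host>` being a node’s address:

```
# Hit the kubelet's read-only port directly, bypassing the API server.
curl -s http://<host>:10255/metrics | head
curl -s http://<host>:10255/metrics/cadvisor | head
```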