There are two Kubernetes scheduler metrics that report scheduling duration:
scheduler_scheduling_attempt_duration_seconds_bucket
scheduler_pod_scheduling_duration_seconds_bucket
What is the difference between these two metrics, and what does their output mean? For example, I see this output for scheduler_scheduling_attempt_duration_seconds_bucket:
scheduler_scheduling_attempt_duration_seconds_bucket{endpoint="https", instance="172.19.2.1:10259", job="kube-scheduler", le="+Inf", namespace="kube-system", pod="kube-scheduler-master1", profile="default-scheduler", result="scheduled", service="kube-scheduler-prometheus-discovery"}
905943
It definitely doesn’t take 905943 seconds to schedule a pod in my cluster, so how am I supposed to measure scheduling time in order to meet SLI/SLO requirements?
Below is the output for the second metric, which also shows a high value:
scheduler_pod_scheduling_duration_seconds_bucket{attempts="1", endpoint="https", instance="172.19.2.1:10259", job="kube-scheduler", le="+Inf", namespace="kube-system", pod="kube-scheduler-master1", service="kube-scheduler-prometheus-discovery"}
882723
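A note on reading these numbers: in a Prometheus histogram, each `_bucket` series is a cumulative counter of observations whose value was at most `le`, so the `le="+Inf"` series is the total number of observations recorded so far, not a duration in seconds. A latency figure has to be estimated from the whole set of buckets. The sketch below shows roughly how such an estimate can be derived by linear interpolation inside the bucket that contains the target rank. The bucket bounds and counts are made-up illustration values, not the scheduler's real buckets, and `histogram_quantile` here is a hypothetical helper name chosen for this example.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs and is
    assumed to include a final bucket with upper bound math.inf.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]  # the +Inf bucket holds the total observation count
    if total == 0:
        return math.nan

    rank = q * total  # how many observations lie at or below the quantile
    prev_bound, prev_count = 0.0, 0.0  # the first bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls beyond the last finite bound, so the
                # best available answer is that bound itself.
                return prev_bound
            if count == prev_count:
                # No observations in this bucket (only reachable when rank is 0).
                return prev_bound
            # Interpolate linearly between the bucket's lower and upper bound.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative bucket counts (upper bound in seconds, count).
example_buckets = [
    (0.001, 100),
    (0.002, 400),
    (0.004, 800),
    (0.008, 950),
    (math.inf, 1000),
]

median = histogram_quantile(0.5, example_buckets)
p99 = histogram_quantile(0.99, example_buckets)
print(f"median ~ {median:.4f}s, p99 ~ {p99:.4f}s")
```

In a live cluster the same kind of estimate is normally taken over a time window rather than over the raw lifetime counters, for example with the PromQL function `histogram_quantile` applied to `rate(...)` of the `_bucket` series summed by `le`; the 5-minute window in a query such as `histogram_quantile(0.99, sum by (le) (rate(scheduler_pod_scheduling_duration_seconds_bucket[5m])))` is an arbitrary choice for illustration.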