Hi, I think it would be nice to be able to define generic health checks for each container, in a similar fashion to startup/readiness/liveness probes, but without any negative effects.
In my case, I cannot use a readiness probe because I still want to be able to connect with the failing containers, and I cannot use a liveness probe, because I don’t want it to be restarted. I just want to know if the container is fine. Ideally it should be possible to define multiple probes for each container.
Regarding the effect of the probe, it might be an annotation or custom status, or anything that can be conveniently queried from the API. The failing probes would also generate pod events that show up in k8s monitoring software.
What do you think about it?
BTW, I have no clear idea who is responsible for this area or how to get their attention/opinion. If you think this is a good idea, please let them know.
The kubelet uses liveness probes to know when to restart a container.
The kubelet uses readiness probes to know when a container is ready to start accepting traffic.
The kubelet uses startup probes to know when a container application has started.
(from Configure Liveness, Readiness and Startup Probes | Kubernetes)
The goal of the probes, AFAIK, is to prevent unresponsive pods from impacting the customer, either because the pod is still starting or because it has stopped working as it should (in which case restarting the pod re-establishes the expected behaviour).
You can define an endpoint in your application as you would for a probe (like /healthz). But instead of checking the health using a probe (which will result in the kubelet restarting the pod), you use a different method to check the well-being of the application and act depending on what the application returns when it “misbehaves”.
Instead of the kubelet restarting the pod when an HTTP probe returns an “error code”…
Any code greater than or equal to 200 and less than 400 indicates success. Any other code indicates failure.
… your “monitoring system” can apply a label to indicate that the pod is failing, or remove a label from the pod (if that label is used as a Service selector, removing it takes the pod out of the Service, so it stops impacting your users while you can still identify and connect to the “failing” pod), apply some annotation to the pod, and so on…
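The label idea above could be sketched roughly like this. It only builds the patch body; the label key `example.com/healthy` and the function name are hypothetical, not something from this thread, and how the patch gets applied is up to your monitoring system.

```python
import json


def label_patch(label_key: str, healthy: bool) -> str:
    """Build a JSON merge patch that sets a label on healthy pods and removes it otherwise."""
    # In a JSON merge patch, a null value deletes the key, so an unhealthy
    # pod loses the label (and drops out of any Service selecting on it).
    value = "true" if healthy else None
    return json.dumps({"metadata": {"labels": {label_key: value}}})


# Hypothetical label key, chosen only for illustration.
print(label_patch("example.com/healthy", False))
```

The printed JSON could then be passed to something like `kubectl patch pod <pod-name> --type=merge -p '<patch>'`, assuming the monitoring system has permission to patch pods.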
Not using probes does not prevent you from using the /healthz endpoint (or any other exposed by your application) for checking the health of the pods; it just prevents the kubelet from trying to fix failing pods (restarting them) or “ignoring” them until they are ready.
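As a minimal sketch of such an external check, assuming the application exposes an HTTP health endpoint (the function names, the URL you pass in, and the timeout are placeholders), the same success rule quoted above can be applied outside the kubelet:

```python
import urllib.error
import urllib.request


def probe_succeeded(status_code: int) -> bool:
    """Apply the HTTP probe rule quoted above: 200 <= code < 400 is success."""
    return 200 <= status_code < 400


def check_health(url: str, timeout: float = 2.0) -> bool:
    """Query a health endpoint and classify the result; nothing gets restarted."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return probe_succeeded(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status code is still available.
        return probe_succeeded(err.code)
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, etc. count as a failure.
        return False
```

What happens on a `False` result (labelling, annotating, alerting) is then entirely your decision rather than the kubelet's.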
Thanks for the response, but I’m not convinced. The problem is that often I’m not the author of the container image (in my case it’s MongoDB), so I cannot just implement an endpoint to obtain health info. I’d have to either:
- run a pod (or side container inside the pod) to query the status and then expose/push it somewhere
- run kubectl exec for every pod from some external monitoring system
In both cases you have to do some tricks, either build and run additional images, or remote exec calls. The first solution may be impossible if the software we want to check doesn’t provide network access, while the second may not work if the software image doesn’t include a shell and/or commands that could be executed.
Given that Kubernetes has this functionality already in place, it should be rather easy to implement.
Well, maybe you’d like to consider Prometheus: it gets metrics automagically from (well designed) applications and can be configured to alert you or it can be combined with Grafana to get nice dashboards.
The Bitnami Helm Chart lets you enable MongoDB metrics automatically (see “Enable metrics” in the chart documentation), so it’s also very easy to set up.
We’ve been running MongoDB on OpenShift (in production environments) for years (~4yrs) and I can only recall having problems once or twice because the pod was not automatically restarted (it got stuck and somehow the default --grace-period=30 (seconds) was ignored).
I agree that maybe the kubelet behaviour should be more configurable… But pods are designed to be disposable, and restarting them (re-creating them, in fact) is the best, quickest and unattended way to restore the service, even for databases (that’s what StatefulSets are for, right?)
I have, in the past, advocated for arbitrary probes as a way to trigger things. The problem is that it gets very complex very quickly, in a place where we REALLY don’t have a lot of budget for wasted complexity (kubelet).
Given that there is a workaround (run a sidecar) that works (even if you don’t like it), I think we can’t justify taking something like this as a goal.
Yep, I’ve been using Prometheus with Grafana too
I don’t use external charts as I prefer to have full control over the config. Anyway, as they said:
The chart can optionally start a sidecar metrics exporter
So they just do all these tricks for you. But it’s not always available.
Regarding MongoDB in my case, I want to monitor the status of members within the replica set. If something goes wrong, a pod restart is not going to help, but rather make things worse. For instance, restarting and attempting to resync a stale replica member over and over again won’t fix the problem, but will needlessly put a huge I/O strain on the remaining replicas.
My idea is about reducing unnecessary complexity, because Kubernetes itself already provides Prometheus compatible metrics via kube-state-metrics exporter that reports information about all Kubernetes resources within the cluster, out of the box, from a single endpoint, without any custom exporters and annotations.
Well, the point is that they would not trigger anything other than generate an Event and set some Pod state field. IMHO this would actually simplify things, because lots of bloated sidecars could finally go away
Also, as I pointed out, in some cases sidecars just cannot replace exec.
Every proposal needs sufficient rationale to justify it. I’m not sure we have enough rationale to justify this work in Kubelet.
OK, so here is my rationale:
- it’s beneficial for the pod’s health status to be reflected in the Kubernetes API
- in some cases startup, readiness and liveness probes cannot be used because their hard effects on the pod may be undesired
- in many cases running a sidecar is overkill: it needs additional development, config and cluster resources
- using native Kubernetes API for pod status allows us to use the built-in mechanisms like Events, kube-state-metrics exporter, etc. so that existing k8s monitoring software works out of the box
- sometimes network-based monitoring (sidecars, pods, external software) cannot be used, because the monitored pod doesn’t communicate through network, so we can only use an exec probe or
Whether it convinces you or not, I’m grateful for your valuable inputs which helped me clarify this idea
Let me be clear - I have, myself, advocated for something like this. I like the IDEA, but it is problematic for a number of reasons.
Probes are not free - kubelet has to track and manage and run them. Aside from the API and code complexity argument, which is real IMO, we have practical issues.
Allowing every pod to do second-granular probes that require waiting for results is a potential problem for scale and complexity.
Writes to kube apiserver are not free - kubelet already tries to batch things up, so fidelity here is dubious, anyway.
Readiness is really what you want. If you want a service to include not-ready endpoints, we have a seldom-used feature for this already -
OK, I see. But a readiness probe together with the publishNotReadyAddresses service option (which I’ve never seen before) looks like a perfect workaround. Thanks a lot!
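For anyone finding this later, here is a rough sketch of that workaround, written as Python that prints JSON manifests (kubectl accepts JSON as well as YAML). Everything specific in it is an assumption for illustration: the `mongodb` name and label, port 27017, the probe timings, and especially the `/scripts/check-replica.sh` check script, which is hypothetical and would have to exist in the image.

```python
import json

# Readiness probe using exec, so no network endpoint is needed.
# The script path is hypothetical; substitute whatever check your image can run.
readiness_probe = {
    "exec": {"command": ["/scripts/check-replica.sh"]},
    "periodSeconds": 30,
    "failureThreshold": 3,
}

# Fragment of a container spec carrying the probe (image tag is a placeholder).
container = {
    "name": "mongodb",
    "image": "mongo",
    "readinessProbe": readiness_probe,
}

# Headless Service that keeps publishing pods even when they are not ready,
# so a failing replica member stays reachable.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "mongodb"},
    "spec": {
        "clusterIP": "None",
        "selector": {"app": "mongodb"},
        "ports": [{"port": 27017}],
        "publishNotReadyAddresses": True,
    },
}

print(json.dumps(container, indent=2))
print(json.dumps(service, indent=2))
```

With this setup a failing check should only flip the container's ready flag in the pod status (and produce the usual probe-failure events), without restarting anything and without cutting the pod off from the Service.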