How can I have a pod delete itself on failure

I have a deployment that ensures there are x number of pods running. These pods use emptyDir for their working files, but the data is not needed. When my pod fails/restarts (OOM or similar), since the pod tries to restart on the same node, it still has the same data available to it in its emptyDir. In this case, that causes a number of problems for me, since the application behaves differently when it sees it has existing data (attempts repair, which is not wanted here), and also I have other systems trying to reconnect to the same pod.
If there is a way for me to tell the pod to just delete on failure instead of restart on failure, the deployment will spin up a new pod and my issue with pods trying to repair from their partial storage and some other headaches just go away, so I'm hoping there is a way for me to have pods delete on failure.
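For reference, a minimal sketch of the kind of Deployment described (the names, image, and mount path are placeholders, not from the actual setup):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:latest
        volumeMounts:
        - name: workdir
          mountPath: /data       # working files go here
      volumes:
      - name: workdir
        emptyDir: {}             # scratch space; contents survive container restarts within the same pod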

Cluster information:
Kubernetes version: 1.14
Cloud being used: AWS

Thanks!

I would take a look at this.
https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/



I have a deployment that ensures there are x number of pods running. These pods use emptyDir for their working files, but the data is not needed. When my pod fails/restarts (OOM or similar), since the pod tries to restart on the same node, it still has the same data available to it in its emptyDir.

No, it really shouldn't. If the pod crashes, the emptyDir data should be lost, no matter if it's on the same node or not. Are you sure it is there when it crashes?

Can you please check that again?

In this case, that causes a number of problems for me, since the application behaves differently when it sees it has existing data (attempts repair, which is not wanted here), and also I have other systems trying to reconnect to the same pod.

If there is a way for me to tell the pod to just delete on failure instead of restart on failure, the deployment will spin up a new pod and my issue with pods trying to repair from their partial storage and some other headaches just go away, so I'm hoping there is a way for me to have pods delete on failure.

I think the previous assumption (data is there) might be false, and therefore the problem completely different.

Please check that again, to see if the analysis might be completely different 🙂

Hi rata. Yes, I'm sure the data is still there in the emptyDir after a crash. The Kubernetes documentation on how emptyDir works explicitly confirms that this is intended behavior.

From the Volumes documentation (https://kubernetes.io/docs/concepts/storage/volumes/):

Note: A Container crashing does NOT remove a Pod from a node, so the data in an emptyDir volume is safe across Container crashes.

Hi miker256,
Which part of that are you thinking is a good fit here? I was hoping for a simple way of having the pod just delete on failure. Is there a way to trigger that automatically from what you linked, or were you saying I could write something to watch the pod for a failure and then call the eviction API when it fails? I think that would end up being the same as just calling a delete on the pod. I was hoping to avoid writing a new service to watch the pod and trigger a delete, and was instead thinking there might be a way to have a pod delete itself on failure so an entirely new pod starts up, using native k8s capabilities/configuration.

This is right - data is preserved unless the pod is failed by the kubelet.

Thanks for re-confirming that, thockin.
I guess my question boils down to whether anyone knows of a way I can tell k8s NOT to restart the pods in this deployment, but rather delete them if they fail for any reason. I'm trying to avoid writing a service to watch these pods and delete them when they fail, so I was hoping there was an option/configuration/trick to get pods to delete on failure.

Just one idea…
If I understand you right, you could live with a restarting pod as long as its emptyDir is in fact empty?
You could use an init container that deletes the contents before the new pod spins up.
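A sketch of what that could look like in the pod spec, assuming a setup like the one quoted earlier (the init container name, image, and /data path are placeholders):

spec:
  initContainers:
  - name: clean-workdir
    image: busybox:1.36
    # wipe the emptyDir before the app container starts (dotfiles are not covered by this glob)
    command: ["sh", "-c", "rm -rf /data/*"]
    volumeMounts:
    - name: workdir
      mountPath: /data
  containers:
  - name: myapp
    image: myapp:latest
    volumeMounts:
    - name: workdir
      mountPath: /data
  volumes:
  - name: workdir
    emptyDir: {}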

Historically, Deployments required that pods be restart-always, but I guess I don't see why (logically) they have to be. restart-never is semantically consistent, if a little odd. But that's not what is implemented right now.
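For context, that restriction applies to the Deployment's pod template; a minimal sketch (placeholder names):

spec:
  template:
    spec:
      restartPolicy: Always    # the only value a Deployment's pod template accepts; Never and OnFailure fail validation
      containers:
      - name: myapp
        image: myapp:latest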

Sorry, my bad! :-/

Thanks, sorry again!

The system is big enough that we all lose track of some details. Thanks for being such a great question-answerer!


Hey malagant.
I tried the init container route, but that only fires when the pod initializes, not when a single container OOMs and restarts. I then looked at using a lifecycle hook to flush the data dir, but that seems less reliable about when it executes (the docs do mention they don't guarantee exactly when hooks run). It improves the situation, but I still see some random issues because of it. Being able to have the pod delete on error would make this behave consistently, and would also get me past a secondary issue with some clients that handle keepalives and active connections oddly and stick with my unhealthy pod. So a very valid suggestion, but one that doesn't fully fix this particular issue.
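For reference, the lifecycle-hook attempt presumably looked something like this postStart sketch (the hook type, path, and names here are assumptions, not the actual config):

containers:
- name: myapp
  image: myapp:latest
  lifecycle:
    postStart:
      exec:
        # flush the working dir when the container (re)starts;
        # caveat: postStart is not guaranteed to run before the container's entrypoint,
        # which matches the timing issues described above
        command: ["sh", "-c", "rm -rf /data/*"]
  volumeMounts:
  - name: workdir
    mountPath: /data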

OK, I see. But then my last bet is to change the command attribute of the container to something like this:

rm -rf /content/* && exec actual_container_command

This will delete the contents of the dir before your application starts. This might be the solution, but only because you don't need the content to exist before each start.
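Wired into the container spec, that could look roughly like this (again, /content and actual_container_command are placeholders; note that setting command replaces the image's ENTRYPOINT, so the real start command has to be spelled out here):

containers:
- name: myapp
  image: myapp:latest
  # clear the emptyDir contents, then exec the real entrypoint so it runs as PID 1
  command: ["sh", "-c", "rm -rf /content/* && exec actual_container_command"]
  volumeMounts:
  - name: content
    mountPath: /content
volumes:
- name: content
  emptyDir: {}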

I'm having almost exactly the same problem.
Did you manage to find a solution for this?

As far as I know, this has not fundamentally changed. If we want to push something, we'll need a GitHub issue to discuss.

I suspect that a restartPolicy which allowed a maximum number of restarts would address this problem, assuming a pod is replaced via its Deployment/HPA configuration once it has exited.

This isn't supported today, but there is an ongoing enhancement: https://github.com/kubernetes/enhancements/pull/3339