(CronJob) Pods are deleted immediately when a job fails


#1

I have a question regarding the CronJob resource. In the spec I have set both successfulJobsHistoryLimit and failedJobsHistoryLimit to 3, and backoffLimit to 0, so that the history of at least the last 3 jobs is always kept and a job is not retried if it fails for any reason (e.g. OOM). This works, but what I am seeing is that the Pods are terminated (deleted) immediately when they fail, even though the Job objects are kept according to the history limits.
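For context, this is a minimal sketch of the kind of CronJob spec described above; the name, schedule, image, and command are placeholders, not the actual workload:

```yaml
apiVersion: batch/v1beta1        # CronJob API group on Kubernetes 1.11
kind: CronJob
metadata:
  name: example-cron             # placeholder name
spec:
  schedule: "*/5 * * * *"        # placeholder schedule
  successfulJobsHistoryLimit: 3  # keep the last 3 successful Jobs
  failedJobsHistoryLimit: 3      # keep the last 3 failed Jobs
  jobTemplate:
    spec:
      backoffLimit: 0            # never retry a failed pod
      template:
        spec:
          restartPolicy: Never   # required for Job pods; OnFailure is the other option
          containers:
          - name: main
            image: busybox       # placeholder image
            command: ["sh", "-c", "exit 1"]
```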

According to the Job docs:
When a Job completes, no more Pods are created, but the Pods are not deleted either. Keeping them around allows you to still view the logs of completed pods to check for errors, warnings, or other diagnostic output. The job object also remains after it is completed so that you can view its status.

This seems to contradict the behavior I am seeing (pods are deleted). Is this by any chance different for CronJobs or is there a bug?

I am running GKE 1.11.7-gke.4

I want to be able to find out why a previous job has failed but this is not possible if the pod is deleted immediately.


#2

So, you see this behavior only with failed CronJobs and not with successful ones?

Have you checked whether it happens with plain Jobs (instead of CronJobs) too?

Also, what do you mean by “even though the jobs are kept according to the history limits”? Can you be more specific about what exactly is preserved? I get that the pods aren’t, but I’m not sure I follow what is :slight_smile:

Thanks!


#3

@rata

I only tested this with failed jobs, as I ran into this while debugging an issue with my CronJob. I have not tried it with a plain Job.

What I meant was that the Job objects themselves were not deleted and were kept in accordance with the numbers set for successfulJobsHistoryLimit and failedJobsHistoryLimit, respectively. The Pods were deleted as soon as they failed, though.

After posting I deleted and re-created the CronJob object (instead of just editing it) and now it seems to work fine with the following settings:

jobTemplate.spec.backoffLimit = 0
successfulJobsHistoryLimit = 1
failedJobsHistoryLimit = 1
podSpec.restartPolicy = Never
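For reference, those dotted paths nest as follows in the CronJob manifest (a fragment showing only the relevant fields, not a complete spec):

```yaml
spec:
  successfulJobsHistoryLimit: 1  # keep only the last successful Job
  failedJobsHistoryLimit: 1      # keep only the last failed Job
  jobTemplate:
    spec:
      backoffLimit: 0            # do not retry a failed pod
      template:
        spec:
          restartPolicy: Never   # podSpec.restartPolicy
```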

#4

Pods are not deleted now? (Only the last one is kept?)

And if you set the limits back to 3, does the same thing happen again?

It would be nice to try with successful pods too, to better understand the problem, IMHO.


#5

Yes, I will do more testing with both failed and successful pods but so far it seems to be working.


#6

Glad!

Please report back. Even if it just happens to work with both limits set to 3 now :slight_smile: