Hello! I’m reaching out to ask for help with a specific issue I’m encountering with a CronJob in Kubernetes, and to understand what could be causing this behaviour.
I have one specific cluster where a daily CronJob is consistently delayed by 90-120 seconds. The exact same job in three other clusters runs on time, with a normal 3-4 second delay, so the failure is specific to this one cluster.
Here are the success timestamps for the same job, all scheduled for 03:00:00Z:

- Healthy Cluster 1: 2025-11-14T03:00:03.928Z (3s delay)
- Healthy Cluster 2: 2025-11-14T03:00:03.375Z (3s delay)
- Healthy Cluster 3: 2025-11-14T03:00:04.617Z (4s delay)
- Problem Cluster: 2025-11-14T03:01:38.500Z (98s delay)
We suspect the cause of the delay is a roughly 92-second stall inside the kube-controller-manager on the problem cluster:
I1112 03:00:00.107020 1 job_controller.go:568] "enqueueing job" logger="job-controller" delay="0s"
I1112 03:00:00.111085 1 job_controller.go:568] "enqueueing job" logger="job-controller" delay="0s"
I1112 03:00:00.138959 1 job_controller.go:568] "enqueueing job" logger="job-controller" delay="1s"
...
I1112 03:00:00.207072 1 job_controller.go:568] "enqueueing job" logger="job-controller" delay="1s"
[... 92-SECOND GAP IN LOGS ...]
I1112 03:01:32.532598 1 job_controller.go:568] "enqueueing job" logger="job-controller" delay="0s"
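For context, these log lines were pulled roughly like this (the pod name is a placeholder for the actual kube-controller-manager static pod on the problem cluster's control-plane node):

```bash
# Filter the job controller's "enqueueing job" entries from the controller logs
kubectl logs -n kube-system kube-controller-manager-<node-name> \
  | grep 'job_controller.go' | grep 'enqueueing job'
```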
The CronJob’s own status field confirms this consistent delay.
# From: kubectl get cronjob control-message-sender -o yaml
status:
  lastScheduleTime: "2025-11-12T03:00:00Z"
  lastSuccessfulTime: "2025-11-12T03:01:36Z"
The configurations of the “good” (on-time) jobs and the “bad” (delayed) job show no meaningful differences (such as `startingDeadlineSeconds`) that would explain this behavior.
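For reference, this is roughly how the scheduling-related fields were compared across the four clusters (the context names are placeholders for the actual kubeconfig contexts):

```bash
# Print schedule, startingDeadlineSeconds and concurrencyPolicy for each cluster side by side
for ctx in healthy-1 healthy-2 healthy-3 problem; do
  kubectl --context "$ctx" get cronjob control-message-sender \
    -o jsonpath='{.spec.schedule} {.spec.startingDeadlineSeconds} {.spec.concurrencyPolicy}{"\n"}'
done
```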
Factors Investigated (Ruled Out)
- Clock Skew: We have checked the clock on the node and in the `kube-controller-manager` pod; both are in sync with UTC. The logs from 03:00 also show other jobs being enqueued at the correct time, so the clock is not the issue. (The check is sketched after this list.)
- Ghost Resource Errors: This cluster did have a `Failed to list *v1.PartialObjectMetadata` error loop. We fixed it by restarting the controller pod. The errors are now gone, but the 90-120 second delay persists, so the error loop was a separate symptom, not the root cause of the stall.
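For reference, the clock check was done roughly along these lines (the node IP and pod name are placeholders; since Talos has no SSH, `talosctl time` is used to read the node clock, and `--timestamps` makes kubectl print the container runtime's timestamps next to klog's own):

```bash
# Current time on the problem cluster's control-plane node (Talos)
talosctl time --nodes <control-plane-ip>

# Compare the runtime timestamps against the timestamps klog itself writes
kubectl logs -n kube-system kube-controller-manager-<node-name> --since=5m --timestamps | tail -n 20
```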
Question:
Is it possible that this systematic delay on this specific cluster is due to the approximate scheduling behaviour that Kubernetes CronJobs have? Or is there another reason that could be causing it?
Thank you for your help!
Environment:
- Kubernetes version: v1.31.1
- Cloud being used: bare-metal
- Installation method: Talos OS
- Host OS: Talos Linux
- CNI and version: Cilium
- CRI and version: containerd