My Kubernetes cluster runs a daily CronJob that automatically renews expiring public TLS certificates. As long as no certificate needs renewing, the Jobs and Pods for this CronJob are created successfully, but on the day a Pod would actually have to update the certs, it doesn’t run at all. The logs and events shown by kubectl don’t tell me anything useful, so I’m at a loss as to what causes this.
The CronJob successfully creates the Job, but the Job then fails to create the corresponding Pod. The Events list is empty for both the CronJob and the Job, and kubectl logs for the Job prints only a single line: “error: timed out waiting for the condition” (see the kubectl output below).
Other CronJobs still work as expected, but this one has failed under the same circumstances before.
Can anyone help me figure out what is going on?
Cluster information:
Kubernetes version: v1.13.4
Cloud being used: bare-metal
Installation method: kubectl manual
Host OS: Fedora 29.20190318.0 (Atomic Host)
CNI and version: cilium 1.4.90
CRI and version: docker 1.13.1
kubectl describe CronJob/letsencrypt-certupdate
Name:                          letsencrypt-certupdate
Namespace:                     default
Labels:                        app=letsencrypt-server
Annotations:                   kubectl.kubernetes.io/last-applied-configuration:
                                 {"apiVersion":"batch/v1beta1","kind":"CronJob","metadata":{"annotations":{},"labels":{"app":"letsencrypt-server"},"name":"letsencrypt-cert...
                               kubernetes.io/change-cause:
                                 kubectl set image cronjob/letsencrypt-certupdate letsencrypt=docker-registry.dev:5000/letsencrypt-docker:57269320 --namespace=default --re...
Schedule:                      46 3 * * *
Concurrency Policy:            Replace
Suspend:                       False
Starting Deadline Seconds:     <unset>
Selector:                      <unset>
Parallelism:                   <unset>
Completions:                   <unset>
Pod Template:
  Labels:           <none>
  Service Account:  tls-cert-update
  Containers:
   letsencrypt:
    Image:       docker-registry.dev:5000/letsencrypt-docker:57269320
    Ports:       5353/TCP, 5353/UDP, 8000/TCP
    Host Ports:  0/TCP, 0/UDP, 0/TCP
    Limits:
      cpu:        300m
      memory:     300Mi
    Environment:  <none>
    Mounts:
      /mnt/crt/ from crt (ro)
      /mnt/csr_dns/ from csr-dns (ro)
      /mnt/csr_http/ from csr-http (ro)
      /mnt/key/ from key (ro)
  Volumes:
   key:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  letsencrypt-account-key
    Optional:    false
   crt:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      tls-certs
    Optional:  false
   csr-http:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      tls-csr-http
    Optional:  false
   csr-dns:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      tls-csr-dns
    Optional:  false
Last Schedule Time:  Sun, 27 Oct 2019 04:46:00 +0100
Active Jobs:         <none>
Events:              <none>
kubectl describe job letsencrypt-certupdate-1572147960
Name:           letsencrypt-certupdate-1572147960
Namespace:      default
Selector:       controller-uid=4e33824e-f86c-11e9-adda-d43d7eed0e14
Labels:         controller-uid=4e33824e-f86c-11e9-adda-d43d7eed0e14
                job-name=letsencrypt-certupdate-1572147960
Annotations:    <none>
Controlled By:  CronJob/letsencrypt-certupdate
Parallelism:    1
Completions:    1
Start Time:     Sun, 27 Oct 2019 04:46:06 +0100
Pods Statuses:  0 Running / 0 Succeeded / 1 Failed
Pod Template:
  Labels:           controller-uid=4e33824e-f86c-11e9-adda-d43d7eed0e14
                    job-name=letsencrypt-certupdate-1572147960
  Service Account:  tls-cert-update
  Containers:
   letsencrypt:
    Image:       docker-registry.dev:5000/letsencrypt-docker:57269320
    Ports:       5353/TCP, 5353/UDP, 8000/TCP
    Host Ports:  0/TCP, 0/UDP, 0/TCP
    Limits:
      cpu:        300m
      memory:     300Mi
    Environment:  <none>
    Mounts:
      /mnt/crt/ from crt (ro)
      /mnt/csr_dns/ from csr-dns (ro)
      /mnt/csr_http/ from csr-http (ro)
      /mnt/key/ from key (ro)
  Volumes:
   key:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  letsencrypt-account-key
    Optional:    false
   crt:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      tls-certs
    Optional:  false
   csr-http:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      tls-csr-http
    Optional:  false
   csr-dns:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      tls-csr-dns
    Optional:  false
Events:           <none>
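A side note on the Job name: its numeric suffix looks like the Unix timestamp of the scheduled run. That is an assumption about how the CronJob controller in this Kubernetes version names its Jobs; the output itself does not say so. If it holds, decoding the suffix shows which wall-clock time the schedule was evaluated against. A minimal sketch, assuming GNU date is available:

```shell
#!/bin/sh
# Job name copied from the describe output above.
JOB=letsencrypt-certupdate-1572147960

# Strip everything up to the last "-" to get the numeric suffix.
SUFFIX="${JOB##*-}"

# Interpret the suffix as seconds since the Unix epoch (assumption) and print it in UTC.
# "-d @SECONDS" is GNU date syntax; BSD/macOS date uses "-r SECONDS" instead.
SCHEDULED_UTC="$(date -u -d "@${SUFFIX}" '+%Y-%m-%dT%H:%M:%SZ')"

echo "scheduled run (UTC): ${SCHEDULED_UTC}"
```

This prints 2019-10-27T03:46:00Z, i.e. 03:46 UTC, which lines up with the schedule "46 3 * * *" and with the Last Schedule Time of 04:46 +0100 shown for the CronJob.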
kubectl logs job/letsencrypt-certupdate-1572147960
error: timed out waiting for the condition
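Some further checks that might narrow this down, offered as a sketch rather than a known fix. The Job's Pods Statuses line reports 1 Failed, which suggests a Pod may have been created at some point and removed since. Events are also only kept for a limited time (one hour with the API server's default --event-ttl, assuming it was not changed on this cluster), which could explain the empty Events lists. The script below only prints the kubectl commands so they can be reviewed first; the names are taken from the output above, and the choice of checks is mine.

```shell
#!/bin/sh
# Names copied from the describe output above.
JOB=letsencrypt-certupdate-1572147960
NS=default

# Label the Job controller sets on its Pods (see "Labels" in the Job's Pod Template).
SELECTOR="job-name=${JOB}"

# Print the commands instead of running them.
print_checks() {
  cat <<EOF
kubectl get pods --namespace=${NS} --selector=${SELECTOR} --output=wide
kubectl get events --namespace=${NS} --sort-by=.metadata.creationTimestamp
kubectl get job ${JOB} --namespace=${NS} --output=yaml
EOF
}

print_checks
```

Piping the output to sh would run the commands against the current kubectl context. The first lists any Pods that still exist for the Job, the second shows whatever events remain in the namespace, and the third dumps the full Job object, whose status.conditions field may carry a failure reason.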