SLL certificate expired but should have been auto-renewed: cert-manager

Our SLL certificate expired but was not (yet) renewed. It should have been renewed automatically but it didn’t. As a result, one of our production systems is down – can you help?

We are using is cert-manager, certificate manager tool. All other SSL certificates look healthy except one.

  1. +Command to see SSL certificates via kubectl: kubectl get certificates --all-namespaces
    Results of that command are like:
NAMESPACE        NAME                               		READY   SECRET                             		AGE
elastic          	kibana-tls                         		True    	kibana-tls                         		154d
prod-company     	company-prod-company-com-tls       	False   	company-prod-company-com-tls       	154d
prod-aaa      	aaa-prod-company-com-tls        	True    	aaa-prod-company-com-tls        	154d
prod-bbb     	bbb-prod-company-com-tls       	True    	bbb-prod-company-com-tls       	154d
prod-ccc      	ccc-prod-company-com-tls        	True    	ccc-prod-company-com-tls        	154d
  1. Then error details can be found from:
    kubectl describe certificate -n prod-company company-prod-company-com-tls

In the Events section we can see that cert-manager is trying to renew the certificate periodically, but it is unable.

  Type     Reason   Age                   From          Message
  ----     ------   ----                  ----          -------
  Normal   Issuing  48m (x106 over 4d9h)  cert-manager  Renewing certificate as renewal was scheduled at 2022-04-20 17:35:26 +0000 UTC
  Normal   Reused   48m (x106 over 4d9h)  cert-manager  Reusing private key stored in existing Secret resource “company-prod-company-com-tls"
  Warning  Failed   48m (x106 over 4d9h)  cert-manager  The certificate request has failed to complete and will be retried: Failed to wait for order resource “company-prod-company-com-tls-ksc5v-1897634450" to become ready: order is in "invalid" state:
  1. The final error found in the logs is:

“The certificate request has failed to complete and will be retried: Failed to wait for order resource “company-prod-company-com-tls-ksc5v-1897634450" to become ready: order is in “invalid” state:”

I have technology in place in each of our environments to automatically renew expired SSL certificates. However, that has emerged to be unreliable. When I previously saw this issue, after two days (of downtime) the certificate was eventually automatically renewed and the issue disappeared. I need to fix the underlying issue.

  • Kubernetes server version: 1.21
  • Google cloud SDK: 386.0.0

I’d really appreciate help on what could be the root cause and how to fix it?

We resolved it.

It was due to a bug in cert-manager. The bug was fixed in Release 1.5 - cert-manager Documentation release.

Once we upgraded the version of cert-manager, the SSL certificate was renewed automatically.