Our SSL certificate expired and has not yet been renewed. It should have been renewed automatically, but it wasn't. As a result, one of our production systems is down. Can you help?
We are using cert-manager, a certificate management tool. All other SSL certificates look healthy except one.
- Command to see SSL certificates via kubectl:
kubectl get certificates --all-namespaces
The output of that command looks like this:
NAMESPACE      NAME                           READY   SECRET                         AGE
elastic        kibana-tls                     True    kibana-tls                     154d
prod-company   company-prod-company-com-tls   False   company-prod-company-com-tls   154d
prod-aaa       aaa-prod-company-com-tls       True    aaa-prod-company-com-tls       154d
prod-bbb       bbb-prod-company-com-tls       True    bbb-prod-company-com-tls       154d
prod-ccc       ccc-prod-company-com-tls       True    ccc-prod-company-com-tls       154d
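With many namespaces, the failing entry is easy to miss in that listing. A minimal sketch for filtering it, assuming the column layout shown above (READY in the third column); the helper name `not_ready_certs` is hypothetical and not part of kubectl or cert-manager:

```shell
# Hypothetical helper: reads `kubectl get certificates --all-namespaces`
# output on stdin and prints namespace/name for every row whose READY
# column (third field) is not "True". The header row is skipped.
not_ready_certs() {
  awk 'NR > 1 && $3 != "True" { print $1 "/" $2 }'
}

# Against a live cluster this would be:
#   kubectl get certificates --all-namespaces | not_ready_certs

# Sample run against a few rows of the output quoted above:
sample='NAMESPACE NAME READY SECRET AGE
elastic kibana-tls True kibana-tls 154d
prod-company company-prod-company-com-tls False company-prod-company-com-tls 154d
prod-aaa aaa-prod-company-com-tls True aaa-prod-company-com-tls 154d'
printf '%s\n' "$sample" | not_ready_certs
```

For the sample rows this prints only the prod-company certificate, which matches the one reported as failing.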
- The error details can then be found with:
kubectl describe certificate -n prod-company company-prod-company-com-tls
In the Events section we can see that cert-manager periodically tries to renew the certificate, but fails each time.
Type      Reason    Age                    From           Message
----      ------    ----                   ----           -------
Normal    Issuing   48m (x106 over 4d9h)   cert-manager   Renewing certificate as renewal was scheduled at 2022-04-20 17:35:26 +0000 UTC
Normal    Reused    48m (x106 over 4d9h)   cert-manager   Reusing private key stored in existing Secret resource "company-prod-company-com-tls"
Warning   Failed    48m (x106 over 4d9h)   cert-manager   The certificate request has failed to complete and will be retried: Failed to wait for order resource "company-prod-company-com-tls-ksc5v-1897634450" to become ready: order is in "invalid" state:
- The final error found in the logs is:
The certificate request has failed to complete and will be retried: Failed to wait for order resource "company-prod-company-com-tls-ksc5v-1897634450" to become ready: order is in "invalid" state:
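The error names an ACME Order resource, and the message is cut off before it gives a reason. A rough sketch of how the Order and its Challenges could be inspected next, assuming the order name from the error still exists in the prod-company namespace (cert-manager may have created a newer Order since). The function name `inspect_order` and the variables `NS` and `ORDER` are hypothetical; `order` and `challenges` are resource types installed by cert-manager's CRDs:

```shell
# Assumption: namespace and order name are copied from the error above.
NS=prod-company
ORDER=company-prod-company-com-tls-ksc5v-1897634450

# Hypothetical helper: dump the Order named in the error, then list and
# describe the ACME Challenges in the same namespace. The Status and
# Events sections of these resources usually carry the failure reason.
inspect_order() {
  kubectl describe order -n "$NS" "$ORDER"
  kubectl get challenges -n "$NS"
  kubectl describe challenges -n "$NS"
}

# Usage against a live cluster:
#   inspect_order
```

If the Order has already been replaced, `kubectl get orders -n prod-company` should list the current ones.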
I have tooling in place in each of our environments to automatically renew expiring SSL certificates. However, it has proven unreliable. The last time I saw this issue, the certificate was eventually renewed automatically after two days of downtime, and the issue disappeared. I need to fix the underlying cause.
- Kubernetes server version: 1.21
- Google Cloud SDK: 386.0.0
I'd really appreciate help identifying the root cause and how to fix it.