Our SSL certificate expired and has not (yet) been renewed. It should have been renewed automatically, but it wasn't. As a result, one of our production systems is down. Can you help?
We are using cert-manager as our certificate management tool. All other SSL certificates look healthy; only one is failing.
- The SSL certificates can be listed with:
kubectl get certificates --all-namespaces
The output looks like this:
NAMESPACE      NAME                           READY   SECRET                         AGE
elastic        kibana-tls                     True    kibana-tls                     154d
prod-company   company-prod-company-com-tls   False   company-prod-company-com-tls   154d
prod-aaa       aaa-prod-company-com-tls       True    aaa-prod-company-com-tls       154d
prod-bbb       bbb-prod-company-com-tls       True    bbb-prod-company-com-tls       154d
prod-ccc       ccc-prod-company-com-tls       True    ccc-prod-company-com-tls       154d
- The error details can then be found with:
kubectl describe certificate -n prod-company company-prod-company-com-tls
In the Events section we can see that cert-manager periodically tries to renew the certificate, but each attempt fails.
Type      Reason    Age                    From           Message
----      ------    ----                   ----           -------
Normal    Issuing   48m (x106 over 4d9h)   cert-manager   Renewing certificate as renewal was scheduled at 2022-04-20 17:35:26 +0000 UTC
Normal    Reused    48m (x106 over 4d9h)   cert-manager   Reusing private key stored in existing Secret resource "company-prod-company-com-tls"
Warning   Failed    48m (x106 over 4d9h)   cert-manager   The certificate request has failed to complete and will be retried: Failed to wait for order resource "company-prod-company-com-tls-ksc5v-1897634450" to become ready: order is in "invalid" state:
- The final error found in the logs is:
“The certificate request has failed to complete and will be retried: Failed to wait for order resource "company-prod-company-com-tls-ksc5v-1897634450" to become ready: order is in "invalid" state:”
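The steps above could be scripted roughly as follows. This is only a sketch: the helper names `not_ready_certs` and `order_name` are made up for illustration, and the follow-up commands assume the certificate comes from an ACME issuer, which the "order" resource in the error message suggests but which is not confirmed here. The first helper filters the certificate list down to entries whose READY column is not True; the second pulls the order name out of the event message so the order itself can be inspected.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of kubectl or cert-manager.

# Print "namespace name" for every certificate whose READY column is not "True".
# Expects the output of
#   kubectl get certificates --all-namespaces --no-headers
# on stdin (columns: NAMESPACE NAME READY SECRET AGE).
not_ready_certs() {
  awk '$3 != "True" { print $1, $2 }'
}

# Extract the order name from a cert-manager event message on stdin.
# Matches the text between the double quotes that follow "order resource".
order_name() {
  sed -n 's/.*order resource "\([^"]*\)".*/\1/p'
}

# Possible usage against a live cluster (assumes an ACME issuer; not run here):
#   kubectl get certificates --all-namespaces --no-headers | not_ready_certs
#   order=$(kubectl describe certificate -n prod-company company-prod-company-com-tls \
#             | order_name | tail -n 1)
#   kubectl describe order -n prod-company "$order"
#   kubectl get challenges -n prod-company
```

If the order is indeed an ACME order, its description and any associated challenge resources would normally be the next place to look for the reason behind the "invalid" state.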
I have an automated mechanism in place in each of our environments to renew SSL certificates. However, it has turned out to be unreliable. The last time I saw this issue, the certificate was eventually renewed automatically after two days (of downtime) and the problem disappeared. I need to fix the underlying issue.
- Kubernetes server version: 1.21
- Google Cloud SDK: 386.0.0
I’d really appreciate help identifying the root cause and how to fix it.