Now it shows my certificates as updated, and it seemed to restart the required services, so I am wondering:
Do I need to leave and re-join the cluster with the other 2 nodes? (This is a cluster, not a multi-node setup, so it is unclear. Also, if I go to the other nodes, either before or after renewing the cert, and run “microk8s refresh-certs -c”, all I get back is the CA certificate.)
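For reference, one way I can sanity-check what is actually on disk is to read the certificates directly with openssl (assuming the default MicroK8s location under /var/snap/microk8s/current/certs/ — adjust the path if your install differs):

# Print the validity window of the CA and the API server certificate on this node
openssl x509 -noout -dates -in /var/snap/microk8s/current/certs/ca.crt
openssl x509 -noout -dates -in /var/snap/microk8s/current/certs/server.crt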
Not OP, but I just did a cert renewal on my 4-node microk8s cluster (running v1.21.3) and saw that metrics-server stopped working with errors like:
[metrics-server-7b9c4d7fd9-ggvtd] E0802 16:53:10.892364 1 scraper.go:139] "Failed to scrape node" err="Get \"https://10.31.1.8:10250/stats/summary?only_cpu_and_memory=true\": x509: certificate has expired or is not yet valid: current time 2022-08-02T16:53:10Z is after 2022-07-29T03:10:28Z" node="k8s-master"
[metrics-server-7b9c4d7fd9-ggvtd] E0802 16:53:10.894897 1 scraper.go:139] "Failed to scrape node" err="Get \"https://10.31.1.14:10250/stats/summary?only_cpu_and_memory=true\": x509: certificate has expired or is not yet valid: current time 2022-08-02T16:53:10Z is after 2022-07-29T03:10:27Z" node="k8s-worker-03"
[metrics-server-7b9c4d7fd9-ggvtd] E0802 16:53:10.904273 1 scraper.go:139] "Failed to scrape node" err="Get \"https://10.31.1.12:10250/stats/summary?only_cpu_and_memory=true\": x509: certificate has expired or is not yet valid: current time 2022-08-02T16:53:10Z is after 2022-07-29T03:10:27Z" node="k8s-worker-01"
[metrics-server-7b9c4d7fd9-ggvtd] E0802 16:53:10.915843 1 scraper.go:139] "Failed to scrape node" err="Get \"https://10.31.1.13:10250/stats/summary?only_cpu_and_memory=true\": x509: certificate has expired or is not yet valid: current time 2022-08-02T16:53:10Z is after 2022-07-29T03:10:25Z" node="k8s-worker-02"
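Those scrape targets are the kubelet port (10250), so a quick way to confirm a node is still serving the expired certificate is to read the live cert straight off that port — plain openssl, nothing MicroK8s-specific; substitute your own node IP:

# Fetch the TLS certificate presented on the kubelet port and print its validity window
openssl s_client -connect 10.31.1.8:10250 </dev/null 2>/dev/null | openssl x509 -noout -dates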
My other services are randomly failing with calico errors like:
error getting ClusterInformation. https://<calico pod IP>:443/apis/crd.projectcalico.org/v1/clusterinformations/default: x509 certificate has expired or is not yet valid.
I’ll try disabling/enabling metrics server and restart my app deployments to see if it helps.
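For the restarts I have in mind something like the following — the deployment name is just a placeholder for my own apps, and calico-node is, as far as I can tell, the daemonset name the stock MicroK8s calico addon uses in kube-system:

# Restart calico-node so the pods pick up the renewed certificates
microk8s kubectl rollout restart daemonset/calico-node -n kube-system
# Restart an application deployment (placeholder name) for the same reason
microk8s kubectl rollout restart deployment/my-app -n default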
OK, interesting observation. When trying to disable the metrics-server addon, I see:
root@k8s-master:~# microk8s disable metrics-server
Disabling Metrics-Server
Error from server (NotFound): configmaps "metrics-server-config" not found
Error from server (NotFound): deployments.apps "metrics-server-v0.2.1" not found
clusterrole.rbac.authorization.k8s.io "system:aggregated-metrics-reader" deleted
clusterrolebinding.rbac.authorization.k8s.io "metrics-server:system:auth-delegator" deleted
rolebinding.rbac.authorization.k8s.io "metrics-server-auth-reader" deleted
Warning: apiregistration.k8s.io/v1beta1 APIService is deprecated in v1.19+, unavailable in v1.22+; use apiregistration.k8s.io/v1 APIService
apiservice.apiregistration.k8s.io "v1beta1.metrics.k8s.io" deleted
serviceaccount "metrics-server" deleted
deployment.apps "metrics-server" deleted
service "metrics-server" deleted
clusterrole.rbac.authorization.k8s.io "system:metrics-server" deleted
clusterrolebinding.rbac.authorization.k8s.io "system:metrics-server" deleted
Error from server (NotFound): error when deleting "/root/snap/microk8s/3410/tmp/temp.metrics-server.yaml": clusterrolebindings.rbac.authorization.k8s.io "microk8s-admin" not found
which is strange.
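To see which metrics-server objects actually still exist at this point (rather than what the addon script expects), a quick check like this helps:

# List whatever metrics-server resources are still present
microk8s kubectl get deploy,svc,sa -n kube-system | grep -i metrics-server
microk8s kubectl get apiservices | grep -i metrics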
Now, if I try enabling or disabling metrics-server, this is what I see:
root@k8s-master:~# microk8s enable metrics-server
Addon metrics-server is already enabled.
root@k8s-master:~# microk8s disable metrics-server
Addon metrics-server is already disabled.
Looking at the cluster, I could see the metrics-server pods were still being terminated. After giving it a few minutes, enabling metrics-server worked fine.
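In case it helps anyone else hitting the same race, a crude way to wait for the old pods to disappear before re-enabling is:

# Wait for the old metrics-server pods to finish terminating, then re-enable the addon
while microk8s kubectl get pods -n kube-system | grep -q metrics-server; do
  echo "metrics-server pods still terminating..."
  sleep 10
done
microk8s enable metrics-server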
No longer seeing the x509 cert issues.
So far, I haven’t had to leave the cluster, and no reboots were required on the nodes.
Will update the thread if anything changes.