Cluster information:
Kubernetes version: 1.22
Cloud being used: bare-metal
Installation method: OnPremise
Host OS: Red Hat Enterprise Linux release 8.10
CNI and version:
CRI and version:
Good morning. I have a problem after renewing the certificates of my Kubernetes cluster with the kubeadm certs renew all command. We have an on-premise Kubernetes cluster with 2 masters and 6 worker nodes, and after the renewal we lost management of the cluster. What we did was run the command above to renew the certificates on one of the masters, then copy the /etc/kubernetes folder with the renewed certificates to the other master, so that both masters had their certificates renewed:
Renewal on the primary master TMT102 (10.164.5.236) and on the secondary master TCOLD013 (10.161.169.26)
However, the apiserver pod does not start, and on both masters it gives the following error:
*I0909 15:46:36.537724 1 server.go:553] external host was not specified, using 10.164.5.236*
*I0909 15:46:36.538897 1 server.go:161] Version: v1.22.0*
*I0909 15:46:37.156242 1 shared_informer.go:240] Waiting for caches to sync for node_authorizer*
*I0909 15:46:37.158840 1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.*
*I0909 15:46:37.158879 1 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.*
*I0909 15:46:37.161155 1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.*
*I0909 15:46:37.161190 1 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.*
*Error: context deadline exceeded*
Checking the status of etcd on both masters shows the following on the primary master:
*[root@TMT102 jenkinsqa]# systemctl status etcd*
*● etcd.service - etcd*
* Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: disabled)*
* Active: active (running) since Fri 2024-09-06 08:41:51 -04; 3 days ago*
* Docs: https://github.com/coreos*
* Main PID: 921 (etcd)*
* Tasks: 10 (limit: 23184)*
* Memory: 70.3M*
* CGroup: /system.slice/etcd.service*
* └─921 /usr/local/bin/etcd --name TMT102 --cert-file=/etc/etcd/kubernetes.pem --key-file=/etc/etcd/kubernetes-key.pem --peer-cert-file=/etc/e>*
*Sep 09 11:49:08 TMT102 etcd[921]: health check for peer 38b126bffa9e7ff7 could not connect: x509: certificate has expired or is not yet valid*
*Sep 09 11:49:08 TMT102 etcd[921]: rejected connection from "10.161.169.26:36578" (error "remote error: tls: bad certificate", ServerName "")*
*Sep 09 11:49:08 TMT102 etcd[921]: rejected connection from "10.161.169.26:36588" (error "remote error: tls: bad certificate", ServerName "")*
*Sep 09 11:49:08 TMT102 etcd[921]: rejected connection from "10.161.169.26:36590" (error "remote error: tls: bad certificate", ServerName "")*
*Sep 09 11:49:08 TMT102 etcd[921]: rejected connection from "10.161.169.26:36600" (error "remote error: tls: bad certificate", ServerName "")*
*Sep 09 11:49:08 TMT102 etcd[921]: rejected connection from "10.161.169.26:36616" (error "remote error: tls: bad certificate", ServerName "")*
*Sep 09 11:49:08 TMT102 etcd[921]: rejected connection from "10.161.169.26:36632" (error "remote error: tls: bad certificate", ServerName "")*
*Sep 09 11:49:08 TMT102 etcd[921]: rejected connection from "10.161.169.26:36644" (error "remote error: tls: bad certificate", ServerName "")*
*Sep 09 11:49:08 TMT102 etcd[921]: health check for peer 38b126bffa9e7ff7 could not connect: x509: certificate has expired or is not yet valid*
*Sep 09 11:49:08 TMT102 etcd[921]: rejected connection from "10.161.169.26:36648" (error "remote error: tls: bad certificate", ServerName "")*
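Since the etcd log above reports an expired certificate, one way to narrow this down would be to read the expiry date of the certificate file that etcd loads. The following is only a minimal sketch, assuming the openssl CLI is installed and an English/C locale for month names. The function names are hypothetical; the path /etc/etcd/kubernetes.pem is the one visible in the etcd command line above.

```python
import subprocess
from datetime import datetime, timezone


def parse_not_after(openssl_output: str) -> datetime:
    """Parse a line such as 'notAfter=Sep  9 12:00:00 2024 GMT'.

    This is the format printed by `openssl x509 -noout -enddate`.
    Month-name parsing assumes an English/C locale.
    """
    value = openssl_output.strip().split("=", 1)[1]
    parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y %Z")
    # openssl reports the date in GMT, so attach UTC explicitly.
    return parsed.replace(tzinfo=timezone.utc)


def is_expired(not_after: datetime, now: datetime) -> bool:
    """Return True when the certificate's notAfter date has passed."""
    return now >= not_after


def cert_not_after(path: str) -> datetime:
    """Ask the openssl CLI for the expiry date of a PEM certificate file."""
    result = subprocess.run(
        ["openssl", "x509", "-in", path, "-noout", "-enddate"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_not_after(result.stdout)
```

For example, calling cert_not_after("/etc/etcd/kubernetes.pem") on each master and passing the result to is_expired together with datetime.now(timezone.utc) would show whether the file etcd is using has actually expired.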
Checking the status of the kubelet shows that it cannot find the node:
[root@TMT102 jenkinsqa]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Fri 2024-09-06 08:41:52 -04; 3 days ago
Docs: https://kubernetes.io/docs/
Main PID: 1055 (kubelet)
Tasks: 17 (limit: 23184)
Memory: 117.9M
CGroup: /system.slice/kubelet.service
└─1055 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/va>
Sep 09 11:50:02 TMT102 kubelet[1055]: E0909 11:50:02.966092 1055 kubelet.go:2407] "Error getting node" err="node \"tmt102\" not found"
Sep 09 11:50:03 TMT102 kubelet[1055]: E0909 11:50:03.066193 1055 kubelet.go:2407] "Error getting node" err="node \"tmt102\" not found"
Sep 09 11:50:03 TMT102 kubelet[1055]: E0909 11:50:03.167212 1055 kubelet.go:2407] "Error getting node" err="node \"tmt102\" not found"
Sep 09 11:50:03 TMT102 kubelet[1055]: E0909 11:50:03.267684 1055 kubelet.go:2407] "Error getting node" err="node \"tmt102\" not found"
Sep 09 11:50:03 TMT102 kubelet[1055]: E0909 11:50:03.368502 1055 kubelet.go:2407] "Error getting node" err="node \"tmt102\" not found"
Sep 09 11:50:03 TMT102 kubelet[1055]: E0909 11:50:03.468755 1055 kubelet.go:2407] "Error getting node" err="node \"tmt102\" not found"
Sep 09 11:50:03 TMT102 kubelet[1055]: E0909 11:50:03.569086 1055 kubelet.go:2407] "Error getting node" err="node \"tmt102\" not found"
Sep 09 11:50:03 TMT102 kubelet[1055]: E0909 11:50:03.670261 1055 kubelet.go:2407] "Error getting node" err="node \"tmt102\" not found"
Sep 09 11:50:03 TMT102 kubelet[1055]: E0909 11:50:03.771753 1055 kubelet.go:2407] "Error getting node" err="node \"tmt102\" not found"
Sep 09 11:50:03 TMT102 kubelet[1055]: E0909 11:50:03.872367 1055 kubelet.go:2407] "Error getting node" err="node \"tmt102\" not found"
With these errors we cannot connect to manage the Kubernetes cluster. What could be happening? This is an on-premise bare-metal installation.