I hope to get some help here with a strange problem I ran into with our cluster.
It started last week when I tried to update the Kubernetes version. For this I ran a simple
kubeadm upgrade plan
This failed with a message:
FATAL: failed to get node registration: failed to get corresponding node: nodes "ogni-lin-backend-k8s-master.my.local" not found
The node name and the hostname are: ogni-lin-backend-k8s-master
So the name Kubernetes uses internally and the actual hostname are different. Why? I do not know. Maybe this was the hostname when the cluster was created (about a year ago).
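For reference, the node name as the cluster sees it can be checked with:
kubectl get nodes -o wide
which lists the master under the short name ogni-lin-backend-k8s-master.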
Anyway, I restarted the master node after draining it first. The server came up fine, I uncordoned the master node, and it ran well again.
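The drain and uncordon were just the usual commands, roughly:
kubectl drain ogni-lin-backend-k8s-master --ignore-daemonsets
kubectl uncordon ogni-lin-backend-k8s-master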
I ran the preflight again, but got the same error message.
I checked the configuration of the master node and finally found that inside kubelet.conf the master node was named ogni-lin-backend-k8s-master.my.local.
I changed this to ogni-lin-backend-k8s-master
and after saving kubelet.conf I ran:
systemctl daemon-reload
systemctl restart kubelet
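For context, the relevant part of kubelet.conf (the kubeadm kubeconfig at /etc/kubernetes/kubelet.conf) looked roughly like this before I edited it; this is shortened and from memory, so treat it as a sketch. The FQDN appears in the context and user entries:
contexts:
- context:
    cluster: kubernetes
    user: system:node:ogni-lin-backend-k8s-master.my.local
  name: system:node:ogni-lin-backend-k8s-master.my.local@kubernetes
current-context: system:node:ogni-lin-backend-k8s-master.my.local@kubernetes
users:
- name: system:node:ogni-lin-backend-k8s-master.my.local
  user:
    client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
    client-key: /var/lib/kubelet/pki/kubelet-client-current.pem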
After this restart the node went haywire. Since I did this, the master node cannot authorize itself and cannot manage itself. I do not know why.
All pods on the master switched to the Terminating state. In the log files I can see:
E0223 08:56:25.112958 31774 kubelet_node_status.go:92] Unable to register node "ogni-lin-backend-k8s-master" with API server: nodes "ogni-lin-backend-k8s-master" is forbidden: node "ogni-lin-backend-k8s-master.my.local" is not allowed to modify node "ogni-lin-backend-k8s-master"
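In case it helps with diagnosing: I assume the FQDN in that error comes from the client certificate the kubelet presents, not from the node name I edited in kubelet.conf. The certificate subject can be checked with something like (the path is the usual kubeadm one, so take it as an assumption):
openssl x509 -noout -subject -in /var/lib/kubelet/pki/kubelet-client-current.pem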
OK, I changed kubelet.conf back to the old settings, ran
systemctl daemon-reload
systemctl restart kubelet
again, and got the same problem. I also drained and restarted the node. Same problem.
I tried to override the hostname for the kubelet service with --hostname-override ogni-lin-backend-k8s-master
and then again:
systemctl daemon-reload
systemctl restart kubelet
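For reference, with the standard kubeadm systemd drop-in the flag ends up in the kubelet extra-args file, roughly like this (the path differs per distro, e.g. /etc/default/kubelet or /etc/sysconfig/kubelet, so treat it as a sketch):
KUBELET_EXTRA_ARGS="--hostname-override=ogni-lin-backend-k8s-master"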
But the master node does not manage to come up again. The cluster itself is running fine.
Does anybody have an idea how to find the problem with the master node? Why can it not manage itself?
Is there a way to repair this?
Cluster information:
Linux cluster with one master and 5 workers.
Kubernetes version: 1.18.8
Cloud being used: bare-metal
Installation method: kubeadm
Host OS: Linux