Wanting to get up and running quickly to start investigating the viability of a Raspberry Pi 4 based ha-cluster for my local home workloads, I neglected to wire every node to an Ethernet switch before I started playing: two of the three nodes had ONLY their wlan interfaces active when I joined them to the first (wired) node. This has left the cluster in a state that concerns me, and I cannot find the way to resolve it that will cause the least damage.
```
# microk8s status
microk8s is running
high-availability: yes
datastore master nodes: 192.168.1.240:19001 192.168.1.117:19001 192.168.1.118:19001
datastore standby nodes: none
addons:
  enabled:
    dns           # CoreDNS
    ha-cluster    # Configure high availability on the current node
    storage       # Storage class; allocates storage from host directory
```
So we are basically dealing with two sets of addresses in the same /24 subnet: Ethernet is the .240–.242 range, and the wifi addresses are the .117 and .118 you see above. Notice that the ha-cluster addon lists the wlan addresses for those two nodes (the only interfaces that were up when they joined).
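For anyone wanting to verify this on their own cluster, this is roughly how I confirmed which addresses dqlite had recorded on each node; the paths below are what I understand a default MicroK8s snap install uses, so take them as an assumption rather than gospel:

```
# This node's own dqlite address (the one it registered at join time)
sudo cat /var/snap/microk8s/current/var/kubernetes/backend/info.yaml

# The full membership list as dqlite sees it, including each node's role
sudo cat /var/snap/microk8s/current/var/kubernetes/backend/cluster.yaml
```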
But… interestingly enough, nothing else in k8s has decided to prefer the wlan interfaces over the Ethernet ones except this ha-cluster addon. This:
```
# kubectl get node -o wide
NAME   STATUS   ROLES    AGE   VERSION                    INTERNAL-IP     EXTERNAL-IP   OS-IMAGE       KERNEL-VERSION      CONTAINER-RUNTIME
m1     Ready    <none>   13d   v1.21.1-3+08fd9d63ea534e   192.168.1.240   <none>        Ubuntu 21.04   5.11.0-1012-raspi   containerd://1.4.4
w1     Ready    <none>   13d   v1.21.1-3+08fd9d63ea534e   192.168.1.241   <none>        Ubuntu 21.04   5.11.0-1012-raspi   containerd://1.4.4
w2     Ready    <none>   13d   v1.21.1-3+08fd9d63ea534e   192.168.1.242   <none>        Ubuntu 21.04   5.11.0-1012-raspi   containerd://1.4.4
```
and this:
```
# kubectl get all -A -o wide | grep 192.168.1.117
# kubectl get all -A -o wide | grep 192.168.1.118
```
(no result)
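If it helps anyone double-check, a more targeted version of the same check would be to print only the InternalIP each kubelet registered with (this is a suggested check, not output from my cluster):

```
# Print each node's name and its registered InternalIP, tab-separated
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'
```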
Of further concern, the two "wifi" nodes used to produce the same output as the wired one when running microk8s status, but when I started tackling this today I noticed a worrying change. The two "wifi" nodes are both reporting the following:
```
microk8s is running
high-availability: no
datastore master nodes: none
datastore standby nodes: none
addons:
  enabled:
    dns           # CoreDNS
    ha-cluster    # Configure high availability on the current node
    storage       # Storage class; allocates storage from host directory
```
Despite that, the cluster has remained fully functional for almost two weeks, as you can see above from the output of kubectl get nodes -o wide.
I currently have some important workloads running in this cluster, and I want to get the cluster "healthy" with minimal downtime for them.
What would be the least impactful method of resolving this? My initial thought was to just run microk8s stop on all nodes, disable all wlan interfaces, and see if the cluster would start up again and the nodes would become ready. Failing that, I was thinking of having each of the two nodes that joined with the wrong IP leave the cluster and join again with wlan disabled. A sketch of both options follows below.
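To make the two options concrete, here is roughly what I have in mind; how you disable wlan depends on your setup (rfkill here is just one way, a netplan change would be another), so treat that part as an assumption:

```
# --- Option A: stop everything, drop wlan, restart ---
# On every node:
microk8s stop
sudo rfkill block wlan        # or remove the wlan stanza from netplan and apply it
microk8s start
microk8s status --wait-ready

# --- Option B (fallback): re-join the two wifi nodes over Ethernet ---
# On the affected node (w1, then w2):
microk8s leave
# On a surviving node, remove the stale membership entry:
microk8s remove-node w1
# On a surviving node, generate a fresh join token:
microk8s add-node             # prints a `microk8s join <ip>:25000/<token>` command
# Back on the affected node, with wlan disabled, run the printed join command
```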
Any ideas on the best route forward? There is data in PVs that I don't want to lose or have to restore. I am hoping to just be able to scale my deployments down to zero replicas, stop microk8s, "fix the cluster" and network settings, start it back up, and scale the deployments back up to 1 replica each.
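For the scale-down/scale-up part, this is what I'd run (the `my-workloads` namespace is a placeholder for wherever your deployments actually live):

```
# Before the fix: quiesce the workloads (repeat per namespace)
kubectl scale deployment --all --replicas=0 -n my-workloads

# ...stop microk8s, fix the networking, start it back up...

# After the fix: bring everything back to one replica each
kubectl scale deployment --all --replicas=1 -n my-workloads
```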
Thanks!