Without thinking about the consequences, and wanting to get up and running quickly to investigate the viability of a Raspberry Pi 4 based HA cluster for my local home workloads, I neglected to wire each node to an ethernet switch before I started playing. As a result, two of the three nodes had ONLY their wlan interfaces active when I joined them to the first (wired) node. This has left the cluster in a state that concerns me, and I cannot find the way to resolve it that will do the least damage.
```
# microk8s status
microk8s is running
high-availability: yes
  datastore master nodes: 192.168.1.240:19001 192.168.1.117:19001 192.168.1.118:19001
  datastore standby nodes: none
addons:
  enabled:
    dns          # CoreDNS
    ha-cluster   # Configure high availability on the current node
    storage      # Storage class; allocates storage from host directory
```
So we are basically dealing with two IP ranges within the same /24 subnet: ethernet is the .240 range, and wifi accounts for the .117 and .118 addresses you see above. Notice that the ha-cluster addon lists the wlan addresses as datastore masters (they were the only interfaces available on those nodes when they joined).
But, interestingly enough, nothing else in k8s has decided to prefer the wlan interfaces over the ethernet ones except this ha-cluster addon. This:
```
# kubectl get node -o wide
NAME   STATUS   ROLES    AGE   VERSION                    INTERNAL-IP     EXTERNAL-IP   OS-IMAGE       KERNEL-VERSION      CONTAINER-RUNTIME
m1     Ready    <none>   13d   v1.21.1-3+08fd9d63ea534e   192.168.1.240   <none>        Ubuntu 21.04   5.11.0-1012-raspi   containerd://1.4.4
w1     Ready    <none>   13d   v1.21.1-3+08fd9d63ea534e   192.168.1.241   <none>        Ubuntu 21.04   5.11.0-1012-raspi   containerd://1.4.4
w2     Ready    <none>   13d   v1.21.1-3+08fd9d63ea534e   192.168.1.242   <none>        Ubuntu 21.04   5.11.0-1012-raspi   containerd://1.4.4
```
```
# kubectl get all -A -o wide | grep 192.168.1.117
# kubectl get all -A -o wide | grep 192.168.1.118
(no result)
```
Of further concern: the two "wifi" nodes used to produce the same `microk8s status` output as the wired one, but when I started tackling this today I noticed a worrying change. The two "wifi" nodes are now both reporting the following:
```
microk8s is running
high-availability: no
  datastore master nodes: none
  datastore standby nodes: none
addons:
  enabled:
    dns          # CoreDNS
    ha-cluster   # Configure high availability on the current node
    storage      # Storage class; allocates storage from host directory
```
But the cluster has remained fully functional for almost 2 weeks, as you can see above from the output of `kubectl get nodes -o wide`.
I currently have some important workloads running in this cluster, and I want to get it "healthy" again with minimal downtime for them.
What would be the least impactful method of resolving this? My initial thought was to run `microk8s stop` on all nodes, disable all wlan interfaces, and see whether the cluster would start up again and the nodes would become Ready. Failing that, I was thinking of having each of the two nodes that joined with the wrong IP leave the cluster and join again with wlan disabled.
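For the "leave and rejoin" option, the sequence I have in mind looks roughly like this. It is print-only so the plan can be reviewed before anything is run; the interface name `wlan0` and ssh access to each node are assumptions on my part:

```shell
#!/usr/bin/env bash
# Hypothetical per-node rejoin plan (print-only, nothing is executed).
# Assumes: wlan interface is wlan0, ssh access to each node works.
plan_rejoin() {
  local node="$1" master="$2"
  cat <<EOF
ssh $node sudo ip link set wlan0 down
ssh $node sudo microk8s leave
ssh $master sudo microk8s remove-node $node
ssh $master sudo microk8s add-node   # prints a one-time join command to run on $node
EOF
}

plan_rejoin 192.168.1.117 192.168.1.240
plan_rejoin 192.168.1.118 192.168.1.240
```

The `microk8s add-node` step prints a fresh join command each time, so each node would be rejoined one at a time, and the join should then pick up the wired address since wlan is down.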
Any ideas on the best route forward? There is data in PVs that I don't want to lose or have to restore. I am hoping to simply scale my deployments down to zero replicas, stop microk8s, "fix" the cluster and network settings, start it back up, and scale the deployments back to 1 replica for each workload.
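For the scale-down/scale-up part, this is a minimal sketch of what I would run; the namespace (`default`) and the `replicas.txt` file name are assumptions, and the actual sequence at the bottom is commented out so nothing fires by accident:

```shell
#!/usr/bin/env bash
# Hypothetical quiesce/restore helpers for the "scale to zero, fix, scale
# back up" plan. Namespace and file name below are assumptions.
set -euo pipefail

# Record current replica counts, one "name count" pair per line.
record_replicas() {
  kubectl get deploy -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name} {.spec.replicas}{"\n"}{end}'
}

# Scale every deployment in the namespace to the given replica count.
scale_all() {
  kubectl scale deploy --all -n "$1" --replicas="$2"
}

# Restore the counts recorded by record_replicas.
restore_replicas() {
  while read -r name count; do
    kubectl scale deploy "$name" -n "$1" --replicas="$count"
  done < "$2"
}

# Intended sequence (commented out deliberately):
# record_replicas default > replicas.txt
# scale_all default 0
# microk8s stop     # on every node, then fix the network settings
# microk8s start
# restore_replicas default replicas.txt
```

Recording the counts first means I am not relying on remembering which workload runs how many replicas when scaling back up.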