MicroK8s failed to join RPI cluster error code 500

Hello there. I can’t for the life of me figure out why the designated leaf nodes won’t join the master node’s cluster. I’m following the Ubuntu tutorial on MicroK8s and the official MicroK8s documentation page here: MicroK8s - Clustering with MicroK8s

I can issue the add-node command on the master node fine, but the join command that I paste into one of the leaf nodes to bring them into the cluster fails with error code 500, and the message isn’t helpful. Could anyone point me in the right direction? I’m using Ubuntu Server for ARM.
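For context, the workflow is just the two documented steps (the token below is a placeholder for whatever add-node prints):

```
# On the master node (192.168.0.125 in my case): generate a join token
microk8s add-node

# On the leaf node: paste the exact command the master printed, e.g.
microk8s join 192.168.0.125:25000/<token>
```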

I am using carrier-grade NAT (a 4G router with a 4G SIM card inside); could this be the cause of the problem? I thought the nodes would at least join the cluster, since that shouldn’t require WAN connectivity and everything is on the same LAN (192.168.0.x). Thanks for your time.

The join command just returns:

```
contacting cluster at 192.168.0.125
failed to join cluster, error code 500
```

The error is the same on all the Pi 3Bs and the Pi 4; it just executes much faster on the Pi 4.


Hi,
There is a cluster-agent service running on the main node (in your example, 192.168.0.125).

Running journalctl -u snap.microk8s.daemon-cluster-agent on it might give some hints.
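Something along these lines should show whether the agent is up and listening (25000 is the default clustering port, assuming a stock install):

```
# On the main node: is the cluster agent running, and what has it logged?
sudo systemctl status snap.microk8s.daemon-cluster-agent
sudo journalctl -u snap.microk8s.daemon-cluster-agent -n 100 --no-pager

# Is anything listening on the clustering port the join command talks to?
sudo ss -tlnp | grep 25000
```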

Shows no logs. I followed this tutorial exactly:

I’ve tried enabling DMZ to the master node Pi in case it was a port-forwarding issue. Could carrier-grade NAT be blocking port 25000 on the LAN regardless of router settings? I may try plugging everything into the dial-up ADSL connection we’re about to cancel, just to see if that resolves things. The 4G router / carrier-grade NAT gives 3 megabytes per second as opposed to dial-up speeds, which is why I use it.

Hi, if you have a firewall in between, check the ports MicroK8s uses.

Hello, thanks for your reply. There is no firewall enabled on the master Pi and DMZ is enabled to it. I’m willing to try anything though, so I enabled ufw and allowed all of the ports from the MicroK8s services and ports page, on all the Pis.
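For the record, this is roughly what I ran on each Pi (from memory, so double-check the list against the services-and-ports page for your version):

```
# Allow the MicroK8s ports listed on the services-and-ports page
sudo ufw allow 16443/tcp   # API server
sudo ufw allow 10250/tcp   # kubelet
sudo ufw allow 10255/tcp   # kubelet read-only
sudo ufw allow 25000/tcp   # cluster-agent (what the join command contacts)
sudo ufw allow 12379/tcp   # etcd
sudo ufw allow 19001/tcp   # dqlite

# Let pod traffic through, as the MicroK8s docs suggest for ufw
sudo ufw allow in on cni0 && sudo ufw allow out on cni0
sudo ufw default allow routed
```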

Still no luck :frowning: Any other suggestions before I plug the Pis into an ADSL router to see if that resolves it? I’m currently running the microk8s.inspect command, but it just hangs on the “Inspecting cluster” heading forever.

Edit: same results on the ADSL connection, so it wasn’t carrier-grade NAT after all. I’m going to try removing the MicroK8s snap packages and installing a newer build. There are no problems in the network configuration: DMZ was enabled to the master on both the ADSL and the 4G router, and although no firewall was enabled I turned one on anyway and allowed all the MicroK8s-specific ports. I’m not sure what else I can try; I’ve hit a wall with my uni dissertation because of this error, and there doesn’t seem to be any indication of what’s causing it.
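The plan for the reinstall is roughly this (the channel is just an example):

```
# Clean out the existing snap and reinstall from a chosen channel
sudo snap remove microk8s --purge
sudo snap install microk8s --classic --channel=1.20/stable
sudo microk8s status --wait-ready
```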

Image of the same result on a different LAN, using the ADSL router instead of the 4G router, to rule out carrier-grade NAT as the issue:


I’ve tried all three addresses, same results.

I don’t understand why this issue is so uncommon. I’m starting to wonder if it’s Ubuntu Server, but it’s anyone’s guess at this point. Hopefully someone can point me in the right direction.

I tried to run the ‘microk8s.enable dns storage’ command and even that failed, so something is clearly wrong with the MicroK8s config on the master Pi. The microk8s status command also just hangs forever.

Doesn’t seem to be starting? :thinking:
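In case it helps anyone debugging the same hang, these are the kinds of checks I’m running (service names vary by release; older builds have separate daemon-apiserver/daemon-kubelet units, newer ones bundle them into daemon-kubelite):

```
# Does the node ever report ready, or does it block forever?
microk8s status --wait-ready

# Are the MicroK8s daemons actually active?
snap services microk8s

# Recent logs from the API server / kubelet services
sudo journalctl -u snap.microk8s.daemon-apiserver -n 50 --no-pager
sudo journalctl -u snap.microk8s.daemon-kubelet -n 50 --no-pager
```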

I am suffering from the same issue. My system automatically updated to 1.20.2 today, which killed my cluster. All worker nodes went into the “Not Ready” state and I did not get a chance to reactivate them. It is the second time something has gone wrong because of a snap update, and I haven’t really found a good guide on how to disable automatic updates altogether (maybe I’ll just set a refresh time I will never be alive to see).

While MicroK8s is lovely, these issues are really annoying, particularly since I had almost finished my move from a single Docker host to a 4-node cluster :frowning:

Thanks for helping to get this fixed and stable. I will now try to go back and install v1.19/stable.

Jens

I ended up just installing k3s and successfully got my Pi 3B workers up, although they’re not doing anything yet.
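For anyone curious, the k3s setup was basically the quick-start commands (the server IP and token below are placeholders):

```
# On the server node
curl -sfL https://get.k3s.io | sh -

# Grab the join token from the server
sudo cat /var/lib/rancher/k3s/server/node-token

# On each Pi 3B worker
curl -sfL https://get.k3s.io | K3S_URL=https://<server-ip>:6443 K3S_TOKEN=<token> sh -
```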

This is pretty frustrating; I’m not even getting MicroK8s 1.19 up and running anymore, despite running snap remove microk8s --purge first.

After disabling the HA cluster, the 1.19 version always complained that it could not connect nodes to a dqlite / HA cluster, even though HA was disabled on all the nodes and on the master…

I’ll give it one last try with snap install microk8s --classic, leaving HA enabled. If I again cannot add nodes, I will revert to kubespray (as that will not kill my cluster with auto-updates).

OK, this is working now: snap install microk8s --classic. Until now I had always added channel=1.20/stable (or 1.19, …), and I had also always deactivated the HA cluster, since I wanted to use all 4 nodes rather than setting one aside as a spare.
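To spell out the difference, this is only a sketch of what I did, but in case it matters:

```
# What I had been doing before (pinned channel, HA disabled afterwards)
sudo snap install microk8s --classic --channel=1.20/stable
sudo microk8s disable ha-cluster

# What worked now: default channel, HA left enabled
sudo snap install microk8s --classic
```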

I’ve decided to try the plain install now and also keep monitoring for the next snap update. If it all fails once more I will have to reconsider, since I want to move to production and need reliability.

If anyone has ideas about what is causing these effects, that would be very interesting.

It seems more people here are running k3s than MicroK8s, which is why I switched; it just worked, thankfully. (Coming from a noob.) Perhaps use that, or kubespray if you are already familiar with it. The Kubernetes Discord channel was helpful, if you want to ask for help identifying the cause of your issues in there.

@JensF @robtheslob sorry about your issues.
There’s a fix coming into 1.20 and 1.19 in the next few days concerning the memory leaks on dqlite.

Hi @JensF, you can set the date and time when snap updates will reach you, as described in Managing updates | Snapcraft documentation.
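As a sketch of the knobs described there (to my knowledge there is no permanent opt-out; a hold has to be renewed before it expires):

```
# Only let refreshes happen in a narrow window, e.g. Saturdays at 01:00
sudo snap set system refresh.timer=sat,01:00

# Or hold refreshes until a specific date (renew it periodically)
sudo snap set system refresh.hold="$(date --iso-8601=seconds -d '+60 days')"

# See when the next refresh is scheduled
snap refresh --time
```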

Yes, I am aware, thanks for the reminder. I just did not find a full opt-out from automated updates; maybe I’ll just pick a date in the far future…

That said, I do like being able to stay up to date, but I’m frustrated that I’ve had issues with two out of two releases, i.e. 100% of updates have failed for me :grinning:

I have now left the HA cluster enabled, and it looks like all 4 nodes are being scheduled, unlike when I first tried HA and one node was set aside as a spare.

We’ll see what happens with the 1.20.3 update. Is there any way to subscribe to release notifications?

Thx
Jens

Continuing the discussion from MicroK8s failed to join RPI cluster error code 500:

Yes, that’s pretty frustrating. Same problems here. It seems impossible with MicroK8s to create a cluster if the cluster members are on different physical hardware. I did get it working if I use only one server running multiple LXC containers to build the cluster, but as soon as multiple servers are involved it just doesn’t work. Even worse, joining a cluster from a remote node seems to work OK, with no complaints on the joining node, but then the master and all the other nodes are no longer accessible: kubectl runs into a timeout, and so does microk8s inspect.
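If it helps anyone comparing notes, when the cluster locks up like that I’ve been looking at which members dqlite thinks it has (paths taken from the MicroK8s HA docs, so treat this as a rough pointer and double-check for your version):

```
# On the original node: which members and addresses has dqlite recorded?
sudo cat /var/snap/microk8s/current/var/kubernetes/backend/cluster.yaml
sudo cat /var/snap/microk8s/current/var/kubernetes/backend/info.yaml
```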