Joining VM-based microk8s nodes to existing cluster

Hi, this has been driving me mad for a couple of days.

I have a two-node microk8s cluster running on two separate physical Ubuntu hosts.

These nodes have IP addresses like 192.168.0.4 and 192.168.0.5.

Now I want to add nodes from a third host, but I’d like to run them on VMs.

  • I’ve set up Multipass with lxd
  • I can spin up VMs
  • the VMs have a bridged network to the 192.168.0.0/24 network
  • the VMs then have two IP addresses - 10.133.171.xyz/24 and 192.168.0.xyz/24.
  • I can ping the existing nodes from the new VMs that I spin up.
  • the node that I run microk8s add-node on can resolve the name/IP of the new VM
  • the new VM is ping-able from the existing node

I can run sudo snap install microk8s --classic, and I can issue the microk8s join [...] command, but the node doesn’t join the cluster.

I’ve followed the instructions on the microk8s docs website to try to control what addresses the various bits of microk8s are supposed to listen on, but it stubbornly listens on the wrong interface, so I think that might be preventing the VM from joining the cluster.

Can anyone help me with this?

[Image: current setup/network diagram]

[Image: desired network architecture diagram]

Progress: following the hints at the microk8s docs on host interfaces

The docs say: “NOTE: For the rest of this document, by default interface we refer to the host interface that includes a default gateway route.”

I inspected the routes using ip route:

ubuntu@still-horntail:~$ ip route
default via 10.133.171.1 dev enp5s0 proto dhcp src 10.133.171.136 metric 100 
default via 192.168.0.1 dev enp6s0 proto dhcp src 192.168.0.38 metric 200 
[...other routes omitted]

The metric parameter is the route priority (a lower value means higher priority), so I thought “hey, let’s bump up the priority of the route I care about to see if the microk8s snap picks it up instead”:

ubuntu@still-horntail:~$ sudo ip route del default via 192.168.0.1
ubuntu@still-horntail:~$ sudo ip route add default via 192.168.0.1 dev enp6s0 proto dhcp src 192.168.0.38 metric 90

so now my ip route output looks like this:

ubuntu@still-horntail:~$ ip route
default via 192.168.0.1 dev enp6s0 proto dhcp src 192.168.0.38 metric 90 
default via 10.133.171.1 dev enp5s0 proto dhcp src 10.133.171.136 metric 100 
[...other routes omitted]
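
For anyone repeating this, here is a minimal sketch that reads ip route output and prints the interface of the default route with the lowest metric, i.e. the one that wins under the reasoning above. The function name preferred_default_dev is made up for illustration, and it is only an assumption that microk8s picks its "default interface" the same way. Note also that a route changed with ip route by hand is not persistent, so DHCP or a reboot may put the old metric back.

```shell
#!/bin/sh
# Hypothetical helper: print the device of the default route that has the
# lowest metric. Reads `ip route`-style lines on stdin. A route without an
# explicit metric is treated as metric 0.
preferred_default_dev() {
    awk '
        $1 == "default" {
            dev = ""; metric = 0
            # Walk the key/value pairs that follow the word "default".
            for (i = 2; i < NF; i++) {
                if ($i == "dev")    dev = $(i + 1)
                if ($i == "metric") metric = $(i + 1) + 0
            }
            if (!found || metric < best_metric) {
                found = 1; best = dev; best_metric = metric
            }
        }
        END { if (found) print best }
    '
}

# Usage on the VM (not run here):
#   ip route | preferred_default_dev
```

With the first routing table shown above this should print enp5s0, and with the second one enp6s0.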

Cool. Now, on this new node (once I’ve set 192.168.0.38 still-horntail in the master node’s /etc/hosts file), I can:

  • sudo snap install microk8s --classic
  • microk8s join ...

and it joins the node.
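
The /etc/hosts step on the existing node is easy to get wrong when repeated for several VMs, so here is a rough sketch that only appends the entry if the name is not already mapped. The function name ensure_hosts_entry and its optional third argument (a file path, handy for trying it on a copy first) are my own inventions, not anything from microk8s.

```shell
#!/bin/sh
# Hypothetical helper: append "IP NAME" to a hosts file unless a
# non-comment line already lists NAME as a hostname or alias.
# Usage: ensure_hosts_entry IP NAME [FILE]   (FILE defaults to /etc/hosts)
ensure_hosts_entry() {
    ip=$1
    name=$2
    file=${3:-/etc/hosts}

    # awk exits 0 when the name is already present, 1 otherwise.
    if awk -v n="$name" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$file"; then
        return 0
    fi

    printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Example for this thread (needs root to write /etc/hosts, not run here):
#   ensure_hosts_entry 192.168.0.38 still-horntail
```

It does not update an existing entry that points at a different address; it simply leaves the file alone in that case.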

Still got problems though - calico-node pod on that node is really angry:

2023-04-23 11:13:17.776 [ERROR][114166] ipsets.go 561: Bad return code from 'ipset list'. error=exit status 1 family="inet" stderr="ipset v7.1: Kernel and userspace incompatible: settype hash:net with revision 7 not supported by userspace.\n"
2023-04-23 11:13:18.289 [ERROR][114166] ipsets.go 916: Failed to read IP sets error=exit status 1 family="inet"
2023-04-23 11:13:18.289 [PANIC][114166] ipsets.go 342: Failed to update IP sets after multiple retries. family="inet"
panic: (*logrus.Entry) (0x1d26240,0xc00058ee10)

goroutine 205 [running]:
github.com/sirupsen/logrus.Entry.log(0xc0000981e0, 0xc0004103f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/pkg/mod/github.com/projectcalico/logrus@v1.0.4-calico/entry.go:128 +0x697
github.com/sirupsen/logrus.(*Entry).Panic(0xc000235720, 0xc00087dbb8, 0x1, 0x1)
	/go/pkg/mod/github.com/projectcalico/logrus@v1.0.4-calico/entry.go:173 +0x102
github.com/projectcalico/felix/ipsets.(*IPSets).ApplyUpdates(0xc00062ac60)
	/go/pkg/mod/github.com/projectcalico/felix@v0.0.0-20200311115901-aa0a22d97c2d/ipsets/ipsets.go:342 +0x34e
github.com/projectcalico/felix/dataplane/linux.(*InternalDataplane).apply.func1(0xc0005e6500, 0xc000844474, 0xc00062ac60)
	/go/pkg/mod/github.com/projectcalico/felix@v0.0.0-20200311115901-aa0a22d97c2d/dataplane/linux/int_dataplane.go:1290 +0x2f
created by github.com/projectcalico/felix/dataplane/linux.(*InternalDataplane).apply
	/go/pkg/mod/github.com/projectcalico/felix@v0.0.0-20200311115901-aa0a22d97c2d/dataplane/linux/int_dataplane.go:1289 +0x3f8

There are other weird things too: something seems to be wrong with DNS (haha, isn’t it always DNS), as pods running on this new node can’t resolve any domain names.
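
Going back to the calico-node panic: as far as I can read it, the ipset binary inside the calico-node container (v7.1) does not understand revision 7 of the hash:net set type that the VM's kernel uses, and the Felix module version in the stack trace is dated 2020-03-11, so the Calico image looks old compared to the VM's kernel. A small sketch for pulling those details out of the logs, so nodes can be compared; the function name and the output field names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read calico-node log lines on stdin and print a
# one-line summary for each ipset kernel/userspace mismatch error.
ipset_mismatch_summary() {
    sed -n 's/.*stderr="ipset v\([0-9.]*\): Kernel and userspace incompatible: settype \([a-z:,]*\) with revision \([0-9]*\) not supported by userspace.*/userspace_ipset=\1 settype=\2 kernel_revision=\3/p'
}

# Possible usage on the cluster (not run here):
#   microk8s kubectl -n kube-system logs <calico-node pod on the new VM> | ipset_mismatch_summary
```

If the existing physical nodes print nothing and only the new VM does, that would point at the VM's newer kernel rather than at the cluster configuration.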