High Availability (HA)

:warning: HA for MicroK8s is currently only available as a tech preview for testing purposes.

A highly available Kubernetes cluster is a cluster that can withstand a failure on any one of its components and continue serving workloads without interruption. There are three components necessary for a highly available Kubernetes cluster:

  1. There must be more than one node available at any time.
  2. The control plane must be running on more than one node, so that losing a single node would not render the cluster inoperable.
  3. The cluster state must be in a datastore that is itself highly available.

This documentation describes the steps needed to form an HA cluster in MicroK8s and to check its state.

As HA is a tech preview, this documentation is also a work in progress and subject to change. Please add comments if any parts aren’t working for you!

Testing HA for MicroK8s

To test the HA implementation, you will need:

  1. To install the ‘ha-preview’ version of MicroK8s
  2. At least three nodes. For testing on a single machine, please see the documentation for installing on LXD

Install the first node

HA is currently offered as a tech preview in the latest/edge/ha-preview channel.
On Linux, you can install this with:

sudo snap install microk8s --classic --channel=latest/edge/ha-preview

or update an existing installation with:

sudo snap refresh microk8s --classic --channel=latest/edge/ha-preview

For Windows and macOS, you can update your installation with:

multipass exec microk8s -- sudo snap refresh microk8s --classic --channel=latest/edge/ha-preview

(see the install docs for Windows and macOS if you need to install MicroK8s.)

Add at least two other nodes

As before, install the ha-preview version of MicroK8s on at least two additional machines (or LXD containers).
Follow the usual procedure for clustering (described in the clustering documentation):

On the initial node, run:

microk8s add-node

This will output a command with a generated token such as microk8s join 10.128.63.86:25000/567a21bdfc9a64738ef4b3286b2b8a69. Copy this command and run it from the next node. It may take a few minutes to successfully join.
Repeat this process (generate a token, run it from the joining node) for the third and any additional nodes.
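The add-node/join cycle above can be sketched as a loop. This is only a dry-run illustration: the node names (node-2, node-3) are hypothetical, and echo stands in for actually running each command (e.g. over ssh), since every join token must be generated fresh on the initial node.

```shell
# Dry-run sketch of the clustering loop: for each joining node, a fresh
# token is generated on the initial node and consumed by `microk8s join`.
for node in node-2 node-3; do
  echo "on initial node: microk8s add-node"
  echo "on ${node}: microk8s join <ip>:25000/<token>"
done
```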

Check the status

Run the status command:

microk8s status

With HA enabled, this will now inform you of the HA status and the addresses and roles of additional nodes. For example:

microk8s is running
high-availability: yes
  datastore master nodes: 10.128.63.86:19001 10.128.63.166:19001 10.128.63.43:19001
  datastore standby nodes: none
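If you need the HA state in a script, output like the example above can be parsed with standard tools. This is a sketch against the sample output shown here; in practice you would capture the real output with `status_output=$(microk8s status)`.

```shell
# Sample `microk8s status` output from an HA cluster (copied from this doc)
status_output='microk8s is running
high-availability: yes
  datastore master nodes: 10.128.63.86:19001 10.128.63.166:19001 10.128.63.43:19001
  datastore standby nodes: none'

# Extract the HA flag and the list of voter (datastore master) addresses.
# The field separator is ": " (colon-space), so the ":19001" ports are
# left intact.
ha=$(printf '%s\n' "$status_output" | awk -F': ' '/^high-availability:/ {print $2}')
voters=$(printf '%s\n' "$status_output" | awk -F': ' '/datastore master nodes:/ {print $2}')

echo "HA enabled: $ha"       # → HA enabled: yes
echo "Voters: $voters"
```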

Working with HA

All nodes of the HA cluster run the master control plane. A subset of the cluster nodes (at least three) maintain a copy of the Kubernetes dqlite database. Database maintenance involves a voting process through which a leader is elected. Apart from the voting nodes there are non-voting nodes silently keeping a copy of the database. These nodes are on standby to take over the position of a departing voter. Finally, there are nodes that neither vote nor replicate the database. These nodes are called spare. To sum up, the three node roles are:

  • voters: replicating the database, participating in leader election
  • standby: replicating the database, not participating in leader election
  • spare: not replicating the database, not participating in leader election

Cluster formation, database syncing, voter and leader elections are all transparent to the administrator.

The current state of the HA cluster is shown with:

microk8s status

The output of the HA inspection reports:

  • If HA is achieved or not.
  • The voter and standby nodes.

Since all nodes of the HA cluster run the master control plane, the microk8s * commands are now available everywhere. Should one of the nodes crash, we can move to any other node and continue working without much disruption.

Almost all of the HA cluster management is transparent to the admin and requires minimal configuration. The administrator can only add or remove nodes. To ensure the health of the cluster the following timings should be taken into account:

  • If the leader node gets “removed” ungracefully, e.g. it crashes and never comes back, it will take up to 5 seconds for the cluster to elect a new leader.
  • Promoting a non-voter to a voter takes up to 30 seconds. This promotion takes place when a new node enters the cluster or when a voter crashes.

To remove a node gracefully, first run the leave command on the departing node:

microk8s leave

The node will be marked as ‘NotReady’ (unreachable) in Kubernetes. To complete the removal of the departing node, issue the following on any of the remaining nodes:

microk8s remove-node <node>

If we are unable to call microk8s leave from the departing node, e.g. due to a node crash, we need to call microk8s remove-node with the --force flag:

microk8s remove-node <node> --force
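The two removal paths can be summarised in a small dry-run helper. It only prints the command sequence (the node name is hypothetical); it does not run microk8s itself.

```shell
# Dry-run sketch of node removal. Pass the node name and, optionally,
# "crashed" if the node can no longer run `microk8s leave` itself.
remove_ha_node() {
  node="$1"; state="${2:-healthy}"
  if [ "$state" = "healthy" ]; then
    echo "on ${node}: microk8s leave"
    echo "on a remaining node: microk8s remove-node ${node}"
  else
    # The departing node cannot leave gracefully, so force the removal
    echo "on a remaining node: microk8s remove-node ${node} --force"
  fi
}

remove_ha_node node-3            # graceful removal
remove_ha_node node-3 crashed    # node crashed and never came back
```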

Add-ons on an HA cluster

Certain add-ons download and “install” client binaries. These binaries will be available only on the node the add-on was enabled from. For example, the helm client that gets installed with microk8s enable helm will be available only on the node the user issued the microk8s enable command.

Upgrading an existing cluster

If you have an existing cluster, you can upgrade to the ha-preview channel:

sudo snap refresh microk8s --channel=latest/edge/ha-preview

You then need to enable HA clustering:

microk8s enable ha-cluster

Any machines which are already nodes in a cluster will need to exit and rejoin
in order to establish HA.

To do so, cycle through the nodes to drain, remove, and rejoin them:

microk8s kubectl drain <node>

On the node machine, force it to leave the cluster with:

microk8s leave

Then enable HA with microk8s enable ha-cluster and rejoin the node to the cluster with microk8s add-node and microk8s join, issued on the master and the rejoining node respectively.
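The per-node upgrade cycle above can be sketched as a loop. This is a dry-run: node names are hypothetical, each command is only printed, and in practice the "on <node>" steps run on that machine.

```shell
# Dry-run sketch of cycling each node through drain / leave / enable /
# rejoin to establish HA after refreshing to the ha-preview channel.
for node in node-1 node-2 node-3; do
  echo "microk8s kubectl drain ${node}"
  echo "on ${node}: microk8s leave"
  echo "on ${node}: microk8s enable ha-cluster"
  echo "on ${node}: microk8s join <ip>:25000/<token>  # after add-node on the master"
done
```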

What about an etcd based HA?

MicroK8s ships the upstream Kubernetes, so an etcd-based HA setup is also possible; see the upstream documentation on how this can be achieved.
The etcd approach is more involved and outside the scope of this document. Overall you will need to maintain your own etcd HA cluster. You will then need to configure the API server and flannel to point to that etcd. Finally you will need to provide a load balancer in front of the nodes acting as masters and configure the workers to reach the masters through the load-balanced endpoint.


The output of the add-node cmd can be improved to avoid people expecting both master and worker nodes. I suggest the following:

“This will output a command with a generated token such as ‘microk8s join ip-address:port/token’. Copy this command and run it on the second node you want to distribute the control plane to. It may […]”

There is already a PR for this: https://github.com/ubuntu/microk8s/pull/1389

Thanks. Should I add my comment there then?

For sure the PR is needed. But I think the ask from Alex is slightly different.

In our docs we say “… a generated token such as microk8s join master:25000/DDOkUupkmaBezNnMheTBqFYHLWINGDbf” but in an HA setup there is no master, all nodes are acting as masters. Furthermore, the fact that we say “master” implies that there is also a worker which is not right.

Yeah, I agree. I think the confusion about nodes should be addressed in the output from the command as the PR currently does. The master/worker issue should probably be resolved in the docs.
In future will ALL clusters be ha-enabled?
is it easy to change the output of add-node depending on whether HA is enabled?

Only pre-1.19 clusters will have a master node.

It is relatively easy to change the output of add-node depending on whether HA is enabled. I see we do not mention the word “master” in the add-node output. What do you have in mind?

From 1.19, what happens in the case of only two nodes then? Are they master/worker or a not-ready HA cluster?

If HA becomes the only type of cluster then we merge the HA and cluster docs (possibly keep a legacy page for pre-1.19) and most of this can be settled there, then we only need to have one output command.

@Alex_Chalkias I updated the example command in this doc from the current output :+1:

We report high-availability: no. Both nodes can act as K8s masters but only one of them is the datastore master.

+1 makes sense.


Wanted to check here, is there plan to have an HA control plane while maintaining worker nodes separately?
The advantage of having a “worker” only nodes is for constrained environments. There’s no need to run control plane components.

From MicroK8s 1.19+, there will be no need to do microk8s enable ha-cluster?
Which means, if I understand correctly, microk8s will form an HA cluster as soon as the user performs the following.

$ microk8s add-node

Followed by (on a different node)

$ microk8s join .....

Thanks!

Yes, that’s the plan. You will need 3 nodes for HA.


Thanks. :+1: So HA is available as long as there are more than 2 nodes?

That is the intention, yes.