Upgrading a MicroK8s cluster

Description

This document outlines the process of upgrading a 3-node MicroK8s cluster from version 1.22/stable to 1.23/candidate.

General notes

The following is generic advice to follow when upgrading your Kubernetes cluster:

  • For production clusters, always make sure that you have a working backup of the Kubernetes cluster database before starting.
  • To minimize the margin of error, and to ensure that you can always roll back to a working state, upgrade only a single Kubernetes node at a time.
  • Start with the Kubernetes control plane nodes. After all control plane nodes are upgraded, proceed with the Kubernetes worker nodes, upgrading them one by one (or, for larger clusters, in small batches) as well.
  • Upgrade by only one minor version at a time (for example, from 1.22 to 1.23, not directly from 1.22 to 1.24). Before proceeding with any upgrade, read the Kubernetes release notes for breaking changes and removed or deprecated APIs, and make sure that they do not affect your cluster.
  • Cordon and drain each node before upgrading it, and uncordon it afterwards, so that the application workloads hosted in your Kubernetes cluster are not disrupted.
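
The "one minor version at a time" rule can be checked mechanically before each refresh. The following is a minimal POSIX shell sketch, assuming snap channels of the form MAJOR.MINOR/RISK (such as 1.22/stable); the function name is_single_minor_step is hypothetical, and it compares channel names only, so it is no substitute for reading the release notes.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if channel $2 is exactly one minor
# version above channel $1, e.g. 1.22/stable -> 1.23/candidate.
is_single_minor_step() {
    current=${1%%/*}    # "1.22/stable"    -> "1.22"
    target=${2%%/*}     # "1.23/candidate" -> "1.23"
    for track in "$current" "$target"; do
        case "$track" in
            # Reject anything that is not exactly MAJOR.MINOR with digits,
            # such as "latest" or "1.22.3".
            *[!0-9.]* | *.*.* | .* | *. | '') return 1 ;;
            *.*) ;;
            *) return 1 ;;
        esac
    done
    # Same major version, and the minor version goes up by exactly one.
    [ "${current%%.*}" -eq "${target%%.*}" ] &&
        [ $(( ${target#*.} - ${current#*.} )) -eq 1 ]
}
```

For example, is_single_minor_step 1.22/stable 1.23/candidate succeeds, while is_single_minor_step 1.22/stable 1.24/stable fails. The channel a node currently follows is shown in the Tracking column of snap list microk8s.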

NOTE: For finer-grained control over MicroK8s revision upgrades in clusters running in production, consider using a Snap Store Proxy.

Upgrade a 3-node cluster

In this example, the cluster runs on three nodes (k8s-1, k8s-2 and k8s-3) and hosts two workloads, an nginx deployment and a microbot deployment:

microk8s kubectl get node
microk8s kubectl get pod -o wide

The output for our deployments looks like this (nginx has 3 pods, microbot has 10):

NAME    STATUS   ROLES    AGE   VERSION
k8s-3   Ready    <none>   19d   v1.22.3-3+9ec7c40ec93c73
k8s-2   Ready    <none>   19d   v1.22.3-3+9ec7c40ec93c73
k8s-1   Ready    <none>   19d   v1.22.3-3+9ec7c40ec93c73

NAME                       READY   STATUS    RESTARTS   AGE     IP             NODE    NOMINATED NODE   READINESS GATES
nginx-7848d4b86f-xwhcp     1/1     Running   0          5m41s   10.1.200.196   k8s-2   <none>           <none>
nginx-7848d4b86f-kxxjv     1/1     Running   0          4m51s   10.1.200.197   k8s-2   <none>           <none>
nginx-7848d4b86f-wsdws     1/1     Running   0          4m51s   10.1.13.71     k8s-3   <none>           <none>
microbot-fdcc4594f-mlqr7   1/1     Running   0          2m34s   10.1.13.73     k8s-3   <none>           <none>
microbot-fdcc4594f-kjcjq   1/1     Running   0          2m34s   10.1.200.199   k8s-2   <none>           <none>
microbot-fdcc4594f-4vsrd   1/1     Running   0          2m27s   10.1.231.202   k8s-1   <none>           <none>
microbot-fdcc4594f-hkqrw   1/1     Running   0          2m26s   10.1.231.203   k8s-1   <none>           <none>
microbot-fdcc4594f-qmjhq   1/1     Running   0          16s     10.1.200.200   k8s-2   <none>           <none>
microbot-fdcc4594f-nxx9j   1/1     Running   0          16s     10.1.13.74     k8s-3   <none>           <none>
microbot-fdcc4594f-pbndr   1/1     Running   0          8s      10.1.200.202   k8s-2   <none>           <none>
microbot-fdcc4594f-f2jmm   1/1     Running   0          16s     10.1.13.75     k8s-3   <none>           <none>
microbot-fdcc4594f-jtfdf   1/1     Running   0          8s      10.1.200.201   k8s-2   <none>           <none>
microbot-fdcc4594f-zl2sl   1/1     Running   0          8s      10.1.13.76     k8s-3   <none>           <none>
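
To compare how pods are spread across the nodes before and after each drain, the listing can be reduced to a count per node. A minimal sketch, assuming the pods of interest are in the current namespace (extra arguments such as -A are passed on to kubectl get pod); pods_per_node is a hypothetical helper name. It reads the node name through custom-columns instead of relying on the position of the NODE column in the wide output.

```shell
#!/bin/sh
# Hypothetical helper: print "<node> <pod count>" for every node that runs pods.
# Any extra arguments (e.g. -A or -n kube-system) are passed to `kubectl get pod`.
pods_per_node() {
    microk8s kubectl get pod --no-headers \
        -o custom-columns=NODE:.spec.nodeName "$@" |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

For the default-namespace pods listed above, it would print k8s-1 2, k8s-2 6 and k8s-3 5.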

Upgrade first node

We will start the cluster upgrade with k8s-1.

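  1. Run kubectl drain k8s-1. This command cordons the node (marking it unschedulable with the NoSchedule taint, so that no new workloads are scheduled on it) and evicts all pods running on it, after which their controllers recreate them on the other nodes. The --ignore-daemonsets flag is needed because drain otherwise refuses to proceed while DaemonSet-managed pods (such as calico-node) are present: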

    microk8s kubectl drain k8s-1 --ignore-daemonsets
    

    The output should look like this:

    node/k8s-1 cordoned
    WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-mhbqw, ingress/nginx-ingress-microk8s-controller-gtb8p
    evicting pod default/microbot-fdcc4594f-hkqrw
    evicting pod kube-system/hostpath-provisioner-5c65fbdb4f-gfdpj
    evicting pod kube-system/coredns-7f9c69c78c-nfd4b
    evicting pod default/microbot-fdcc4594f-4vsrd
    pod/hostpath-provisioner-5c65fbdb4f-gfdpj evicted
    pod/coredns-7f9c69c78c-nfd4b evicted
    pod/microbot-fdcc4594f-hkqrw evicted
    pod/microbot-fdcc4594f-4vsrd evicted
    node/k8s-1 drained
    
  2. Verify that all pods previously running on k8s-1 have been removed and that replacements have been scheduled on the other cluster nodes. Also make sure that the node is marked SchedulingDisabled:

    microk8s kubectl get node
    microk8s kubectl get pod -o wide
    

    Note that no pods are running on k8s-1 any more:

    NAME    STATUS                     ROLES    AGE   VERSION
    k8s-3   Ready                      <none>   19d   v1.22.3-3+9ec7c40ec93c73
    k8s-2   Ready                      <none>   19d   v1.22.3-3+9ec7c40ec93c73
    k8s-1   Ready,SchedulingDisabled   <none>   19d   v1.22.3-3+9ec7c40ec93c73
    
    NAME                       READY   STATUS    RESTARTS   AGE     IP             NODE    NOMINATED NODE   READINESS GATES
    nginx-7848d4b86f-xwhcp     1/1     Running   0          14m     10.1.200.196   k8s-2   <none>           <none>
    nginx-7848d4b86f-kxxjv     1/1     Running   0          13m     10.1.200.197   k8s-2   <none>           <none>
    nginx-7848d4b86f-wsdws     1/1     Running   0          13m     10.1.13.71     k8s-3   <none>           <none>
    microbot-fdcc4594f-mlqr7   1/1     Running   0          11m     10.1.13.73     k8s-3   <none>           <none>
    microbot-fdcc4594f-kjcjq   1/1     Running   0          11m     10.1.200.199   k8s-2   <none>           <none>
    microbot-fdcc4594f-qmjhq   1/1     Running   0          9m      10.1.200.200   k8s-2   <none>           <none>
    microbot-fdcc4594f-nxx9j   1/1     Running   0          9m      10.1.13.74     k8s-3   <none>           <none>
    microbot-fdcc4594f-f2jmm   1/1     Running   0          9m      10.1.13.75     k8s-3   <none>           <none>
    microbot-fdcc4594f-jtfdf   1/1     Running   0          8m52s   10.1.200.201   k8s-2   <none>           <none>
    microbot-fdcc4594f-zl2sl   1/1     Running   0          8m52s   10.1.13.76     k8s-3   <none>           <none>
    microbot-fdcc4594f-pbndr   1/1     Running   0          8m52s   10.1.200.202   k8s-2   <none>           <none>
    microbot-fdcc4594f-nrqh9   1/1     Running   0          8m18s   10.1.200.204   k8s-2   <none>           <none>
    microbot-fdcc4594f-dx2pk   1/1     Running   0          8m17s   10.1.13.78     k8s-3   <none>           <none>
    
  3. Refresh the MicroK8s snap to track the 1.23/candidate channel (1.23/stable is not yet released as of this writing). This command needs to run on k8s-1.

    sudo snap refresh microk8s --channel 1.23/candidate
    

    The output should look like this:

    microk8s (1.23/candidate) v1.23.0-rc.0 from Canonical✓ refreshed
    
  4. Shortly afterwards, we can see that k8s-1 is now running version 1.23.0-rc.0:

    microk8s kubectl get node
    
    NAME    STATUS                     ROLES    AGE   VERSION
    k8s-2   Ready                      <none>   19d   v1.22.3-3+9ec7c40ec93c73
    k8s-1   Ready,SchedulingDisabled   <none>   19d   v1.23.0-rc.0.2+f4d3c97c512f07
    k8s-3   Ready                      <none>   19d   v1.22.3-3+9ec7c40ec93c73
    
  5. The final step is to uncordon the node, so that the cluster can start scheduling new workloads on it:

    microk8s kubectl uncordon k8s-1
    microk8s kubectl get node
    
    node/k8s-1 uncordoned
    
    NAME    STATUS  ROLES    AGE   VERSION
    k8s-1   Ready   <none>   19d   v1.23.0-rc.0.2+f4d3c97c512f07
    k8s-3   Ready   <none>   19d   v1.22.3-3+9ec7c40ec93c73
    k8s-2   Ready   <none>   19d   v1.22.3-3+9ec7c40ec93c73
    
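
Step 4 relies on waiting until the node reports the new version. Instead of re-running kubectl get node by hand, the wait can be scripted by polling .status.nodeInfo.kubeletVersion, the value shown in the VERSION column. A minimal sketch; the function name wait_for_kubelet_version, the default of 30 attempts and the 10-second interval are arbitrary choices, not MicroK8s defaults.

```shell
#!/bin/sh
# Hypothetical helper: poll until node $1 reports a kubelet version starting
# with $2 (e.g. "v1.23."), trying at most $3 times (default 30), 10 s apart.
wait_for_kubelet_version() {
    node=$1
    prefix=$2
    tries=${3:-30}
    while :; do
        # Same value as the VERSION column of `kubectl get node`.
        version=$(microk8s kubectl get node "$node" \
            -o jsonpath='{.status.nodeInfo.kubeletVersion}' 2>/dev/null)
        case "$version" in
            "$prefix"*)
                echo "$node is running $version"
                return 0
                ;;
        esac
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] || break
        sleep 10
    done
    echo "$node did not report $prefix* (last seen: ${version:-unknown})" >&2
    return 1
}
```

For example, wait_for_kubelet_version k8s-1 v1.23. && microk8s kubectl uncordon k8s-1 uncordons the node only once it reports a 1.23 kubelet.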

Rollback in case of failure

At this point, let’s assume that a hypothetical error has occurred and that the Kubernetes cluster is not behaving as it should (e.g. connectivity issues, pods entering an error state, an increased number of error logs on the upgraded node, etc.). In that case, you may need to roll the node back to the previous version (for our example, 1.22).

With MicroK8s, this is as simple as running sudo snap revert microk8s:

  1. If any new workloads have been scheduled on the node, drain it before making any changes:

    microk8s kubectl drain k8s-1 --ignore-daemonsets
    
  2. Revert to the previous MicroK8s version. This reinstalls the previous snap revision and restores all configuration files of the control plane services:

    sudo snap revert microk8s
    microk8s kubectl get node
    
    microk8s reverted to v1.22.3
    
    NAME    STATUS                     ROLES    AGE   VERSION
    k8s-3   Ready                      <none>   19d   v1.22.3-3+9ec7c40ec93c73
    k8s-2   Ready                      <none>   19d   v1.22.3-3+9ec7c40ec93c73
    k8s-1   Ready,SchedulingDisabled   <none>   19d   v1.22.3-3+9ec7c40ec93c73
    
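
The rollback steps can be grouped into a single function. A minimal sketch, assuming it runs on the node being rolled back (snap revert only affects the local machine) as a user who can run microk8s and sudo; rollback_node is a hypothetical name. microk8s status --wait-ready blocks until the local MicroK8s services are up again.

```shell
#!/bin/sh
# Hypothetical rollback helper; run it ON the node being rolled back.
rollback_node() {
    node=$1
    # Evict any workloads scheduled since the upgrade; DaemonSet pods stay.
    microk8s kubectl drain "$node" --ignore-daemonsets || return 1
    # Switch back to the previously installed snap revision.
    sudo snap revert microk8s || return 1
    # Wait for the local MicroK8s services to come back up.
    microk8s status --wait-ready >/dev/null || return 1
    # Print the kubelet version the node now reports (e.g. v1.22.x).
    microk8s kubectl get node "$node" -o jsonpath='{.status.nodeInfo.kubeletVersion}'
}
```

As in the output above, the node stays cordoned (SchedulingDisabled) after the revert; once it is healthy again, run microk8s kubectl uncordon k8s-1. Running snap list --all microk8s beforehand shows the disabled previous revision that snap revert switches back to.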

Upgrade second node

Follow the same steps as for k8s-1. Note that the kubectl commands can be run from any node in the cluster; only the snap refresh must run on the node being upgraded.

  1. Cordon and drain the node:

    microk8s kubectl drain k8s-2 --ignore-daemonsets
    
  2. Ensure that all workloads have been moved to other cluster nodes:

    microk8s kubectl get node
    microk8s kubectl get pod -o wide
    

    The output shows k8s-2 as SchedulingDisabled, with no pods running on it:

    NAME    STATUS                     ROLES    AGE   VERSION
    k8s-3   Ready                      <none>   19d   v1.22.3-3+9ec7c40ec93c73
    k8s-2   Ready,SchedulingDisabled   <none>   19d   v1.22.3-3+9ec7c40ec93c73
    k8s-1   Ready                      <none>   19d   v1.23.0-rc.0.2+f4d3c97c512f07
    
    NAME                       READY   STATUS    RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
    nginx-7848d4b86f-wsdws     1/1     Running   0          96m   10.1.13.71     k8s-3   <none>           <none>
    microbot-fdcc4594f-mlqr7   1/1     Running   0          94m   10.1.13.73     k8s-3   <none>           <none>
    microbot-fdcc4594f-nxx9j   1/1     Running   0          92m   10.1.13.74     k8s-3   <none>           <none>
    microbot-fdcc4594f-f2jmm   1/1     Running   0          92m   10.1.13.75     k8s-3   <none>           <none>
    microbot-fdcc4594f-zl2sl   1/1     Running   0          91m   10.1.13.76     k8s-3   <none>           <none>
    microbot-fdcc4594f-dx2pk   1/1     Running   0          91m   10.1.13.78     k8s-3   <none>           <none>
    nginx-7848d4b86f-lsjq6     1/1     Running   0          33m   10.1.231.204   k8s-1   <none>           <none>
    microbot-fdcc4594f-h9cjg   1/1     Running   0          33m   10.1.231.205   k8s-1   <none>           <none>
    microbot-fdcc4594f-98vnj   1/1     Running   0          33m   10.1.231.207   k8s-1   <none>           <none>
    microbot-fdcc4594f-glvcm   1/1     Running   0          33m   10.1.231.208   k8s-1   <none>           <none>
    microbot-fdcc4594f-m5wzj   1/1     Running   0          33m   10.1.231.209   k8s-1   <none>           <none>
    microbot-fdcc4594f-7n5k5   1/1     Running   0          33m   10.1.231.210   k8s-1   <none>           <none>
    nginx-7848d4b86f-skshj     1/1     Running   0          33m   10.1.231.211   k8s-1   <none>           <none>
    
  3. Upgrade. This command must run on k8s-2:

    sudo snap refresh microk8s --channel 1.23/candidate
    
  4. Uncordon the node and verify that k8s-2 is now also running 1.23.0-rc.0:

    microk8s kubectl uncordon k8s-2
    microk8s kubectl get node
    
    node/k8s-2 uncordoned
    
    NAME    STATUS  ROLES    AGE   VERSION
    k8s-1   Ready   <none>   19d   v1.23.0-rc.0.2+f4d3c97c512f07
    k8s-3   Ready   <none>   19d   v1.22.3-3+9ec7c40ec93c73
    k8s-2   Ready   <none>   19d   v1.23.0-rc.0.2+f4d3c97c512f07
    
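
When checking that a node has been emptied, DaemonSet-managed pods (such as calico-node) are expected to remain, because drain was run with --ignore-daemonsets. A minimal sketch of a check that lists every other pod on a node, across all namespaces, using the spec.nodeName field selector; non_daemonset_pods_on_node is a hypothetical helper name, and it only inspects the first owner reference of each pod.

```shell
#!/bin/sh
# Hypothetical helper: print "<namespace>/<pod>" for every pod on node $1
# that is NOT owned by a DaemonSet. Empty output means the node is drained.
non_daemonset_pods_on_node() {
    microk8s kubectl get pod --all-namespaces --no-headers \
        --field-selector "spec.nodeName=$1" \
        -o 'custom-columns=NS:.metadata.namespace,NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].kind' |
        awk '$3 != "DaemonSet" { print $1 "/" $2 }'
}
```

For example, [ -z "$(non_daemonset_pods_on_node k8s-2)" ] && echo "k8s-2 is drained" confirms the state shown above before refreshing the snap.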

Upgrade third node

The process is exactly the same as with the previous two nodes:

  1. Cordon and drain the node:

    microk8s kubectl drain k8s-3 --ignore-daemonsets
    
  2. Verify that all workloads have been evicted; no pods should be running on k8s-3:

    microk8s kubectl get node
    microk8s kubectl get pod -o wide
    
  3. Upgrade. This command must run on k8s-3:

    sudo snap refresh microk8s --channel 1.23/candidate
    
  4. Uncordon the node and verify that k8s-3 is now also running 1.23.0-rc.0:

    microk8s kubectl uncordon k8s-3
    microk8s kubectl get node
    
    node/k8s-3 uncordoned
    
    NAME    STATUS  ROLES    AGE   VERSION
    k8s-1   Ready   <none>   19d   v1.23.0-rc.0.2+f4d3c97c512f07
    k8s-3   Ready   <none>   19d   v1.23.0-rc.0.2+f4d3c97c512f07
    k8s-2   Ready   <none>   19d   v1.23.0-rc.0.2+f4d3c97c512f07
    
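
Since the three node upgrades differ only in the node name, the procedure can be scripted. A minimal sketch, assuming that it runs on a machine where microk8s kubectl reaches the cluster (for example one of the nodes), that every node is reachable over ssh under its Kubernetes node name, and that the remote user can run sudo without an interactive password prompt. upgrade_node, rolling_upgrade and the CHANNEL variable are hypothetical names. The loop stops at the first failure, so at most one node ever needs rolling back.

```shell
#!/bin/sh
# Hypothetical rolling-upgrade sketch. Assumes every node is reachable over
# ssh under its Kubernetes node name and that sudo needs no password prompt.
CHANNEL=${CHANNEL:-1.23/candidate}

upgrade_node() {
    node=$1
    # Cordon and evict; DaemonSet-managed pods (calico, ingress) stay put.
    microk8s kubectl drain "$node" --ignore-daemonsets || return 1
    # The snap refresh must run on the node being upgraded.
    ssh "$node" sudo snap refresh microk8s --channel "$CHANNEL" || return 1
    # Wait until MicroK8s on that node reports it is ready again.
    ssh "$node" sudo microk8s status --wait-ready >/dev/null || return 1
    # Allow new workloads to be scheduled on the node again.
    microk8s kubectl uncordon "$node"
}

rolling_upgrade() {
    for node in "$@"; do
        echo "Upgrading $node to $CHANNEL"
        # Stop at the first failure, leaving at most one node to roll back.
        upgrade_node "$node" || { echo "Upgrade of $node failed" >&2; return 1; }
    done
}
```

With the sketch loaded, rolling_upgrade k8s-1 k8s-2 k8s-3 would repeat the three sections above. It does not pause between nodes, so for production clusters you may prefer to call upgrade_node for one node at a time and inspect the cluster in between.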

Verify by deploying new workloads

We will verify that our cluster is still working as expected by creating a new microbot-2 test deployment:

microk8s kubectl create deploy --image dontrebootme/microbot:v1 microbot-2
microk8s kubectl scale deploy microbot-2 --replicas 4
microk8s kubectl expose deploy microbot-2 --port 80 --type NodePort
deployment.apps/microbot-2 created
deployment.apps/microbot-2 scaled
service/microbot-2 exposed

Once the deployment has finished, the pods and the service should look like this:

microk8s kubectl get pod -l app=microbot-2 -o wide
microk8s kubectl get svc microbot-2
NAME                          READY   STATUS    RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
microbot-2-5484459568-g299z   1/1     Running   0          43m   10.1.13.83     k8s-3   <none>           <none>
microbot-2-5484459568-2dj7z   1/1     Running   0          43m   10.1.200.214   k8s-2   <none>           <none>
microbot-2-5484459568-52cgn   1/1     Running   0          43m   10.1.231.212   k8s-1   <none>           <none>
microbot-2-5484459568-nb52k   1/1     Running   0          43m   10.1.13.84     k8s-3   <none>           <none>

NAME         TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
microbot-2   NodePort   10.152.183.221   <none>        80:30032/TCP   42m
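
Rather than waiting for the pods by eye and reading the port from the PORT(S) column, both steps can be scripted: kubectl rollout status blocks until the deployment has rolled out, and the assigned port can be read from .spec.ports[0].nodePort. A minimal sketch, assuming the service has the same name as the deployment (the default for kubectl expose, as with microbot-2 above); wait_and_get_nodeport and the 120s timeout are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical helper: wait for deployment $1 to finish rolling out, then
# print the NodePort of the service with the same name.
wait_and_get_nodeport() {
    deploy=$1
    # Blocks until all replicas are updated and available (or the timeout hits).
    microk8s kubectl rollout status "deployment/$deploy" --timeout=120s >/dev/null || return 1
    # First port of the service; a NodePort service exposes it on every node.
    microk8s kubectl get svc "$deploy" -o jsonpath='{.spec.ports[0].nodePort}'
}
```

For the service above, NODE_PORT=$(wait_and_get_nodeport microbot-2) would set NODE_PORT to 30032.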

Finally, we access the NodePort service through each of the 3 cluster nodes and verify that requests are load balanced across the 4 deployed pods:

curl --silent k8s-1:30032 | grep hostname
curl --silent k8s-1:30032 | grep hostname
curl --silent k8s-2:30032 | grep hostname
curl --silent k8s-2:30032 | grep hostname
curl --silent k8s-3:30032 | grep hostname

In the output, we can see that the requests are answered by different pods.

<p class="centered">Container hostname: microbot-2-5484459568-g299z</p>
<p class="centered">Container hostname: microbot-2-5484459568-nb52k</p>
<p class="centered">Container hostname: microbot-2-5484459568-52cgn</p>
<p class="centered">Container hostname: microbot-2-5484459568-2dj7z</p>
<p class="centered">Container hostname: microbot-2-5484459568-2dj7z</p>
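
The curl check can also be reduced to a single number: the count of distinct container hostnames that answered. A minimal sketch, assuming the response format shown above ("Container hostname: ..." inside an HTML paragraph); distinct_backends is a hypothetical helper name. A count greater than 1 shows that requests are spread over several pods; with only a handful of requests, a count of 1 is unlikely but possible even when load balancing works.

```shell
#!/bin/sh
# Hypothetical helper: send one request per node argument to port $1 and
# print how many distinct container hostnames answered.
distinct_backends() {
    port=$1
    shift
    for node in "$@"; do
        curl --silent "$node:$port"
    done |
        # Keep only the hostname from "<p ...>Container hostname: NAME</p>".
        sed -n 's/.*Container hostname: \([^<]*\)<.*/\1/p' |
        sort -u | wc -l | tr -d ' '
}
```

For example, distinct_backends 30032 k8s-1 k8s-1 k8s-2 k8s-2 k8s-3 repeats the five requests above; for the responses shown, the count is 4.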
