Description
This document outlines the process of upgrading a 3-node MicroK8s cluster from version `1.22/stable` to `1.23/candidate`.
General notes
The following is generic advice to follow when upgrading your Kubernetes cluster:
- For production clusters, always make sure that you have a working backup of the Kubernetes cluster database before starting.
- To minimize the margin of error, and to ensure that you can always roll back to a working state, only upgrade a single Kubernetes node at a time.
- Start with the Kubernetes control plane nodes. After all control plane nodes are upgraded, proceed with the Kubernetes worker nodes, upgrading them one-by-one (or, for larger clusters, in small batches) as well.
- Make sure that you update by one minor version at a time. Before proceeding with any upgrade, refer to the Kubernetes release notes to read about any breaking changes, removed and/or deprecated APIs, and make sure that they do not affect your cluster.
- Cordon and drain each node prior to upgrading it, and uncordon it afterwards, to ensure that the application workloads hosted in your Kubernetes cluster are not affected.
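Since the one-minor-version rule is mechanical, it can be checked in a script before refreshing. A small sketch, using version strings taken from this document's cluster and assuming the usual `vMAJOR.MINOR.PATCH` kubelet version format:

```shell
# Guard against skipping a minor version during an upgrade.
# Version strings below are examples from this document's cluster.
CURRENT_VERSION="v1.22.3-3+9ec7c40ec93c73"
TARGET_VERSION="v1.23.0-rc.0"

# The minor version is the second dot-separated field.
current_minor=$(printf '%s' "$CURRENT_VERSION" | cut -d. -f2)
target_minor=$(printf '%s' "$TARGET_VERSION" | cut -d. -f2)

if [ $((target_minor - current_minor)) -le 1 ]; then
    echo "OK: $CURRENT_VERSION -> $TARGET_VERSION is at most one minor version"
else
    echo "Refusing to skip minor versions ($CURRENT_VERSION -> $TARGET_VERSION)"
fi
```

On a live cluster, the current version string can be read from the `VERSION` column of `microk8s kubectl get node`.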
NOTE: For finer-grained control over MicroK8s revision upgrades in clusters running in production, consider using a Snap Store Proxy.
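For the backup itself, MicroK8s keeps its datastore (dqlite) under the snap's data directory. A minimal sketch that archives it before the upgrade; the path below is the default for current MicroK8s snaps, but verify it on your own installation, and prefer stopping MicroK8s first for a consistent snapshot:

```shell
# Archive the MicroK8s dqlite datastore before upgrading. For a consistent
# snapshot, stop MicroK8s first (`microk8s stop`) and restart it afterwards.
BACKEND_DIR="/var/snap/microk8s/current/var/kubernetes/backend"
BACKUP_FILE="$HOME/microk8s-backend-$(date +%Y%m%d-%H%M%S).tar.gz"

if [ -d "$BACKEND_DIR" ]; then
    tar -czf "$BACKUP_FILE" -C "$(dirname "$BACKEND_DIR")" "$(basename "$BACKEND_DIR")"
    echo "Backup written to $BACKUP_FILE"
else
    echo "No MicroK8s backend directory at $BACKEND_DIR on this machine"
fi
```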
Upgrade a 3-node cluster
For our example, we have the following cluster, running on `k8s-1`, `k8s-2` and `k8s-3`. Two services are running in the workload (an `nginx` and a `microbot` deployment):
microk8s kubectl get node
microk8s kubectl get pod -o wide
The output for our deployments looks like this (nginx has 3 pods, microbot has 10):
NAME STATUS ROLES AGE VERSION
k8s-3 Ready <none> 19d v1.22.3-3+9ec7c40ec93c73
k8s-2 Ready <none> 19d v1.22.3-3+9ec7c40ec93c73
k8s-1 Ready <none> 19d v1.22.3-3+9ec7c40ec93c73
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-7848d4b86f-xwhcp 1/1 Running 0 5m41s 10.1.200.196 k8s-2 <none> <none>
nginx-7848d4b86f-kxxjv 1/1 Running 0 4m51s 10.1.200.197 k8s-2 <none> <none>
nginx-7848d4b86f-wsdws 1/1 Running 0 4m51s 10.1.13.71 k8s-3 <none> <none>
microbot-fdcc4594f-mlqr7 1/1 Running 0 2m34s 10.1.13.73 k8s-3 <none> <none>
microbot-fdcc4594f-kjcjq 1/1 Running 0 2m34s 10.1.200.199 k8s-2 <none> <none>
microbot-fdcc4594f-4vsrd 1/1 Running 0 2m27s 10.1.231.202 k8s-1 <none> <none>
microbot-fdcc4594f-hkqrw 1/1 Running 0 2m26s 10.1.231.203 k8s-1 <none> <none>
microbot-fdcc4594f-qmjhq 1/1 Running 0 16s 10.1.200.200 k8s-2 <none> <none>
microbot-fdcc4594f-nxx9j 1/1 Running 0 16s 10.1.13.74 k8s-3 <none> <none>
microbot-fdcc4594f-pbndr 1/1 Running 0 8s 10.1.200.202 k8s-2 <none> <none>
microbot-fdcc4594f-f2jmm 1/1 Running 0 16s 10.1.13.75 k8s-3 <none> <none>
microbot-fdcc4594f-jtfdf 1/1 Running 0 8s 10.1.200.201 k8s-2 <none> <none>
microbot-fdcc4594f-zl2sl 1/1 Running 0 8s 10.1.13.76 k8s-3 <none> <none>
Upgrade first node
We will start the cluster upgrade with `k8s-1`.
1. Run `microk8s kubectl drain`. This command cordons the node (marking it with the `NoSchedule` taint, so that no new workloads are scheduled on it) and evicts all running pods to other nodes:

microk8s kubectl drain k8s-1 --ignore-daemonsets

The output should look like this:

node/k8s-1 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-mhbqw, ingress/nginx-ingress-microk8s-controller-gtb8p
evicting pod default/microbot-fdcc4594f-hkqrw
evicting pod kube-system/hostpath-provisioner-5c65fbdb4f-gfdpj
evicting pod kube-system/coredns-7f9c69c78c-nfd4b
evicting pod default/microbot-fdcc4594f-4vsrd
pod/hostpath-provisioner-5c65fbdb4f-gfdpj evicted
pod/coredns-7f9c69c78c-nfd4b evicted
pod/microbot-fdcc4594f-hkqrw evicted
pod/microbot-fdcc4594f-4vsrd evicted
node/k8s-1 evicted
2. Verify that all pods previously running on `k8s-1` were removed, and new ones have been deployed on the other cluster nodes. Also, make sure that the node has been marked with `SchedulingDisabled`:

microk8s kubectl get node
microk8s kubectl get pod -o wide

Note that no pods are seen running on `k8s-1`:

NAME STATUS ROLES AGE VERSION
k8s-3 Ready <none> 19d v1.22.3-3+9ec7c40ec93c73
k8s-2 Ready <none> 19d v1.22.3-3+9ec7c40ec93c73
k8s-1 Ready,SchedulingDisabled <none> 19d v1.22.3-3+9ec7c40ec93c73

NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-7848d4b86f-xwhcp 1/1 Running 0 14m 10.1.200.196 k8s-2 <none> <none>
nginx-7848d4b86f-kxxjv 1/1 Running 0 13m 10.1.200.197 k8s-2 <none> <none>
nginx-7848d4b86f-wsdws 1/1 Running 0 13m 10.1.13.71 k8s-3 <none> <none>
microbot-fdcc4594f-mlqr7 1/1 Running 0 11m 10.1.13.73 k8s-3 <none> <none>
microbot-fdcc4594f-kjcjq 1/1 Running 0 11m 10.1.200.199 k8s-2 <none> <none>
microbot-fdcc4594f-qmjhq 1/1 Running 0 9m 10.1.200.200 k8s-2 <none> <none>
microbot-fdcc4594f-nxx9j 1/1 Running 0 9m 10.1.13.74 k8s-3 <none> <none>
microbot-fdcc4594f-f2jmm 1/1 Running 0 9m 10.1.13.75 k8s-3 <none> <none>
microbot-fdcc4594f-jtfdf 1/1 Running 0 8m52s 10.1.200.201 k8s-2 <none> <none>
microbot-fdcc4594f-zl2sl 1/1 Running 0 8m52s 10.1.13.76 k8s-3 <none> <none>
microbot-fdcc4594f-pbndr 1/1 Running 0 8m52s 10.1.200.202 k8s-2 <none> <none>
microbot-fdcc4594f-nrqh9 1/1 Running 0 8m18s 10.1.200.204 k8s-2 <none> <none>
microbot-fdcc4594f-dx2pk 1/1 Running 0 8m17s 10.1.13.78 k8s-3 <none> <none>
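This manual inspection can be automated: filter the `get pod -o wide` listing for the drained node and expect zero matching rows. A sketch, shown here against a shortened sample of the listing above (on a live cluster, pipe the real `microk8s kubectl get pod -o wide` output into the same filter):

```shell
# Count pods whose NODE column matches the drained node; after a successful
# drain, the count for that node should be zero. Sample rows taken from the
# listing above, trimmed to NAME, READY, STATUS, NODE for brevity.
NODE="k8s-1"
SAMPLE='NAME                      READY  STATUS   NODE
nginx-7848d4b86f-xwhcp    1/1    Running  k8s-2
nginx-7848d4b86f-wsdws    1/1    Running  k8s-3
microbot-fdcc4594f-mlqr7  1/1    Running  k8s-3'
COUNT=$(printf '%s\n' "$SAMPLE" | awk -v node="$NODE" \
    'NR > 1 && $NF == node { n++ } END { print n + 0 }')
echo "Pods still on $NODE: $COUNT"
```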
3. Refresh the MicroK8s snap to track the `1.23/candidate` channel (`1.23/stable` is not yet released as of this writing). This command needs to run on `k8s-1`:

sudo snap refresh microk8s --channel 1.23/candidate

The output should look like this:

microk8s (1.23/candidate) v1.23.0-rc.0 from Canonical✓ refreshed
4. Shortly afterwards, we can see that `k8s-1` is now running version `1.23.0-rc.0`:

microk8s kubectl get node

NAME STATUS ROLES AGE VERSION
k8s-2 Ready <none> 19d v1.22.3-3+9ec7c40ec93c73
k8s-1 Ready,SchedulingDisabled <none> 19d v1.23.0-rc.0.2+f4d3c97c512f07
k8s-3 Ready <none> 19d v1.22.3-3+9ec7c40ec93c73
5. The final step is to uncordon the node, so that the cluster can start scheduling new workloads on it:

microk8s kubectl uncordon k8s-1
microk8s kubectl get node

node/k8s-1 uncordoned
NAME STATUS ROLES AGE VERSION
k8s-1 Ready <none> 19d v1.23.0-rc.0.2+f4d3c97c512f07
k8s-3 Ready <none> 19d v1.22.3-3+9ec7c40ec93c73
k8s-2 Ready <none> 19d v1.22.3-3+9ec7c40ec93c73
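Before moving on to the next node, it is worth waiting until the uncordoned node reports `Ready` again. A sketch of the status check, demonstrated on a sample `get node` row; the commented loop shows the live-cluster equivalent:

```shell
# Extract the STATUS column from a `kubectl get node` row. Sample line
# taken from the node listing above.
SAMPLE_LINE="k8s-1   Ready   <none>   19d   v1.23.0-rc.0.2+f4d3c97c512f07"
STATUS=$(printf '%s\n' "$SAMPLE_LINE" | awk '{ print $2 }')
echo "k8s-1 status: $STATUS"

# Live-cluster equivalent (not run here):
# until microk8s kubectl get node k8s-1 --no-headers | awk '{print $2}' | grep -qx Ready; do
#     sleep 5
# done
```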
Rollback in case of failure
At this point, let's assume that a hypothetical error has occurred, and we observe that our Kubernetes cluster is not behaving as it should (e.g. connectivity issues, pods entering an error state, an increased number of error logs on the upgraded node, etc.). In that case, it may be necessary to roll the node back to the previous version (for our example, `1.22`).
With MicroK8s, this is as simple as running `sudo snap revert microk8s`:
1. If the node has picked up any new workloads, make sure to drain it before making any changes:

microk8s kubectl drain k8s-1 --ignore-daemonsets
2. Revert to the previous MicroK8s version. This will re-install the previous snap revision and restore all configuration files of the control plane services:

sudo snap revert microk8s
microk8s kubectl get node

microk8s reverted to v1.22.3
NAME STATUS ROLES AGE VERSION
k8s-3 Ready <none> 19d v1.22.3-3+9ec7c40ec93c73
k8s-2 Ready <none> 19d v1.22.3-3+9ec7c40ec93c73
k8s-1 Ready,SchedulingDisabled <none> 19d v1.22.3-3+9ec7c40ec93c73
Upgrade second node
Follow the same steps as previously. Note that all `kubectl` commands can run from any node in the cluster.
1. Drain and cordon the node:

microk8s kubectl drain k8s-2 --ignore-daemonsets
2. Ensure that all workloads have been moved to other cluster nodes:

microk8s kubectl get node
microk8s kubectl get pod -o wide

Notice the output, showing `SchedulingDisabled` for `k8s-2`, and no pods running on it:

NAME STATUS ROLES AGE VERSION
k8s-3 Ready <none> 19d v1.22.3-3+9ec7c40ec93c73
k8s-2 Ready,SchedulingDisabled <none> 19d v1.22.3-3+9ec7c40ec93c73
k8s-1 Ready <none> 19d v1.23.0-rc.0.2+f4d3c97c512f0

NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-7848d4b86f-wsdws 1/1 Running 0 96m 10.1.13.71 k8s-3 <none> <none>
microbot-fdcc4594f-mlqr7 1/1 Running 0 94m 10.1.13.73 k8s-3 <none> <none>
microbot-fdcc4594f-nxx9j 1/1 Running 0 92m 10.1.13.74 k8s-3 <none> <none>
microbot-fdcc4594f-f2jmm 1/1 Running 0 92m 10.1.13.75 k8s-3 <none> <none>
microbot-fdcc4594f-zl2sl 1/1 Running 0 91m 10.1.13.76 k8s-3 <none> <none>
microbot-fdcc4594f-dx2pk 1/1 Running 0 91m 10.1.13.78 k8s-3 <none> <none>
nginx-7848d4b86f-lsjq6 1/1 Running 0 33m 10.1.231.204 k8s-1 <none> <none>
microbot-fdcc4594f-h9cjg 1/1 Running 0 33m 10.1.231.205 k8s-1 <none> <none>
microbot-fdcc4594f-98vnj 1/1 Running 0 33m 10.1.231.207 k8s-1 <none> <none>
microbot-fdcc4594f-glvcm 1/1 Running 0 33m 10.1.231.208 k8s-1 <none> <none>
microbot-fdcc4594f-m5wzj 1/1 Running 0 33m 10.1.231.209 k8s-1 <none> <none>
microbot-fdcc4594f-7n5k5 1/1 Running 0 33m 10.1.231.210 k8s-1 <none> <none>
nginx-7848d4b86f-skshj 1/1 Running 0 33m 10.1.231.211 k8s-1 <none> <none>
3. Upgrade. This command must run on `k8s-2`:

sudo snap refresh microk8s --channel 1.23/candidate
4. Uncordon the node, and verify that `k8s-2` is also now running `1.23.0-rc.0`:

microk8s kubectl uncordon k8s-2
microk8s kubectl get node

node/k8s-2 uncordoned
NAME STATUS ROLES AGE VERSION
k8s-1 Ready <none> 19d v1.23.0-rc.0.2+f4d3c97c512f07
k8s-3 Ready <none> 19d v1.22.3-3+9ec7c40ec93c73
k8s-2 Ready <none> 19d v1.23.0-rc.0.2+f4d3c97c512f07
Upgrade third node
The process is exactly the same as with the previous two nodes:
1. Drain and cordon the node:

microk8s kubectl drain k8s-3 --ignore-daemonsets
2. Verify that all workloads have been evicted. No pods should be shown running on `k8s-3`:

microk8s kubectl get node
microk8s kubectl get pod -o wide
3. Upgrade. This command must run on `k8s-3`:

sudo snap refresh microk8s --channel 1.23/candidate
4. Uncordon the node, and verify that `k8s-3` is also now running `1.23.0-rc.0`:

microk8s kubectl uncordon k8s-3
microk8s kubectl get node

node/k8s-3 uncordoned
NAME STATUS ROLES AGE VERSION
k8s-1 Ready <none> 19d v1.23.0-rc.0.2+f4d3c97c512f07
k8s-3 Ready <none> 19d v1.23.0-rc.0.2+f4d3c97c512f07
k8s-2 Ready <none> 19d v1.23.0-rc.0.2+f4d3c97c512f07
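Since the same steps repeat on every node, they can be captured in a small helper. A sketch, assuming `kubectl` access from the machine running it and password-less SSH to each node; the `ssh` invocation is an assumption, since `snap refresh` must execute on the node being upgraded, however you reach it:

```shell
# Repeatable per-node upgrade: drain, refresh the snap on that node,
# then uncordon. Assumes password-less SSH to each node (an assumption,
# not part of the original walkthrough).
upgrade_node() {
    node="$1"
    channel="$2"
    microk8s kubectl drain "$node" --ignore-daemonsets || return 1
    ssh "$node" "sudo snap refresh microk8s --channel $channel" || return 1
    microk8s kubectl uncordon "$node"
}

# Example invocation (not run here):
# upgrade_node k8s-2 1.23/candidate
```

If you loop this over all nodes, still verify cluster health between nodes before continuing, as the walkthrough above does.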
Verify by deploying new workloads
We will verify that our cluster is still working as expected by creating a new `microbot-2` test deployment:
microk8s kubectl create deploy --image dontrebootme/microbot:v1 microbot-2
microk8s kubectl scale deploy microbot-2 --replicas 4
microk8s kubectl expose deploy microbot-2 --port 80 --type NodePort
deployment.apps/microbot-2 created
deployment.apps/microbot-2 scaled
service/microbot-2 exposed
After the deployment has finished, our services should look like this:
microk8s kubectl get pod -l app=microbot-2 -o wide
microk8s kubectl get svc microbot-2
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
microbot-2-5484459568-g299z 1/1 Running 0 43m 10.1.13.83 k8s-3 <none> <none>
microbot-2-5484459568-2dj7z 1/1 Running 0 43m 10.1.200.214 k8s-2 <none> <none>
microbot-2-5484459568-52cgn 1/1 Running 0 43m 10.1.231.212 k8s-1 <none> <none>
microbot-2-5484459568-nb52k 1/1 Running 0 43m 10.1.13.84 k8s-3 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
microbot-2 NodePort 10.152.183.221 <none> 80:30032/TCP 42m
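The allocated port (`30032` here) is assigned by Kubernetes, so scripted checks should read it from the service rather than hard-code it. A sketch parsing the `PORT(S)` column shown above; the commented `jsonpath` line is the more robust live-cluster equivalent:

```shell
# Extract the node port from a `PORT(S)` value like "80:30032/TCP".
# Sample value taken from the service listing above.
PORTS="80:30032/TCP"
NODE_PORT=$(printf '%s' "$PORTS" | cut -d: -f2 | cut -d/ -f1)
echo "NodePort: $NODE_PORT"

# Live-cluster equivalent (not run here):
# microk8s kubectl get svc microbot-2 -o jsonpath='{.spec.ports[0].nodePort}'
```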
Finally, we use the `NodePort` service we just created from all 3 cluster nodes, and verify that load balancing across the 4 deployed pods is working:
curl --silent k8s-1:30032 | grep hostname
curl --silent k8s-1:30032 | grep hostname
curl --silent k8s-2:30032 | grep hostname
curl --silent k8s-2:30032 | grep hostname
curl --silent k8s-3:30032 | grep hostname
In the output, we can see that different pods answer the individual `curl` requests:
<p class="centered">Container hostname: microbot-2-5484459568-g299z</p>
<p class="centered">Container hostname: microbot-2-5484459568-nb52k</p>
<p class="centered">Container hostname: microbot-2-5484459568-52cgn</p>
<p class="centered">Container hostname: microbot-2-5484459568-2dj7z</p>
<p class="centered">Container hostname: microbot-2-5484459568-2dj7z</p>
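The spot check above can be turned into a simple assertion by counting the distinct hostnames in the responses. A sketch, run here against the captured responses above; on a live cluster, collect the `curl ... | grep hostname` lines in a loop first:

```shell
# Count distinct pod hostnames in the curl responses; with 4 replicas,
# seeing several distinct names shows the NodePort service is balancing
# across pods. Response lines taken from the output above.
RESPONSES='Container hostname: microbot-2-5484459568-g299z
Container hostname: microbot-2-5484459568-nb52k
Container hostname: microbot-2-5484459568-52cgn
Container hostname: microbot-2-5484459568-2dj7z
Container hostname: microbot-2-5484459568-2dj7z'
DISTINCT=$(printf '%s\n' "$RESPONSES" | awk \
    '{ if (!($3 in seen)) { seen[$3] = 1; n++ } } END { print n + 0 }')
echo "Distinct pods answering: $DISTINCT"
```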