Sometimes we need to backup etcd. In addition to that, we need the certificates and optionally the kubeadm configuration file for easily restoring the master. If you set up your cluster using kubeadm (with no special configuration) you can do it similar to this:
# Backup certificates
sudo cp -r /etc/kubernetes/pki backup/
# Make etcd snapshot
sudo docker run --rm -v $(pwd)/backup:/backup \
--network host \
-v /etc/kubernetes/pki/etcd:/etc/kubernetes/pki/etcd \
--env ETCDCTL_API=3 \
k8s.gcr.io/etcd-amd64:3.2.18 \
etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
--key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
snapshot save /backup/etcd-snapshot-latest.db
# Backup kubeadm-config
sudo cp /etc/kubeadm/kubeadm-config.yaml backup/
Note that the contents of the backup folder should then be stored somewhere safe, where it can survive if the master is completely destroyed. You perhaps want to use e.g. AWS S3 (or similar) for this.
So what is really going on here? There are three commands in the example and all of them should be run on the master node. The first one copies the folder containing all the certificates that kubeadm creates. These certificates are used for secure communications between the various components in a Kubernetes cluster. The final command is optional and only relevant if you use a configuration file for kubeadm. Storing this file makes it easy to initialize the master with the exact same configuration as before when restoring it.
- Use the host network in order to access 127.0.0.1:2379, where etcd is exposed (–network host)
- Mount the backup folder where we want to save the snapshot (-v $(pwd)/backup:/backup)
- Mount the folder containing the certificates needed to access etcd (-v /etc/kubernetes/pki/etcd:/etc/kubernetes/pki/etcd)
- Specify the correct etcd API version as environment variable (–env ETCDCTL_API=3)
- The actual command for creating a snapshot (etcdctl snapshot save /backup/etcd-snapshot-latest.db)
- Some flags for the etcdctl command
- Specify where to connect to (–endpoints=https://127.0.0.1:2379)
- Specify certificates to use (–cacert=…, --cert=…, --key=…)
Restore a single master
When the time has come to restore the master, just copy everything back from the backup and initiate the master again. If you want to simulate a master failing you can for example run “kubeadm reset” for a “soft” destruction. But if you really want to make sure you can set it up from zero, you should delete the VM or format the disk. In this case you must remember to do all the prerequisites before initializing it again (e.g. install kubeadm).
The restoration may look something like this:
# Restore certificates
sudo cp -r backup/pki /etc/kubernetes/
# Restore etcd backup
sudo mkdir -p /var/lib/etcd
sudo docker run --rm \
-v $(pwd)/backup:/backup \
-v /var/lib/etcd:/var/lib/etcd \
--env ETCDCTL_API=3 \
k8s.gcr.io/etcd-amd64:3.2.18 \
/bin/sh -c "etcdctl snapshot restore '/backup/etcd-snapshot-latest.db' ; mv /default.etcd/member/ /var/lib/etcd/"
# Restore kubeadm-config
sudo mkdir /etc/kubeadm
sudo cp backup/kubeadm-config.yaml /etc/kubeadm/
# Initialize the master with backup
sudo kubeadm init --ignore-preflight-errors=DirAvailable--var-lib-etcd \
--config /etc/kubeadm/kubeadm-config.yaml
This is pretty much a reversal of the previous steps. Certificates and kubeadm configuration file are restored from the backup location simply by copying files and folders back to where they were. For etcd we restore the snapshot and then move the data to /var/lib/etcd, since that is where kubeadm will tell etcd to store its data.
Note that we have to add an extra flag to the kubeadm init command (–ignore-preflight-errors=DirAvailable–var-lib-etcd) to acknowledge that we want to use the pre-existing data.