ETCD - backup and restore management

I tried to back up etcd on my local cluster, but it seems specifying just the endpoint didn't work properly; only this command worked for me:

ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt \
     --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key \
     snapshot save 

Are those certs mandatory for backing up the etcd db?

While restoring, why do these additional parameters need to be passed in, such as:

ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt \
     --name=master \
     --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key \
     --data-dir /var/lib/etcd-from-backup \
     --initial-cluster=master=https://127.0.0.1:2380 \
     --initial-cluster-token etcd-cluster-1 \
     --initial-advertise-peer-urls=https://127.0.0.1:2380 \
     snapshot restore ...

When the kubelet is not working properly, you can use journalctl -u kubelet to check its status. The config file normally resides in /var/lib/kubelet, and the apiserver URL port should generally be 6443. You can then use systemctl restart kubelet to restart the service and check its status again.
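For example, a typical check-and-restart sequence might look like this (a sketch; exact paths and unit names vary by setup):

# Inspect recent kubelet logs for errors
journalctl -u kubelet --no-pager | tail -n 50

# Restart the kubelet and confirm it is active again
systemctl restart kubelet
systemctl status kubelet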

The etcd server sometimes has a different CA from the kube-apiserver, so do check that carefully.
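A quick way to compare the two on a kubeadm cluster (a sketch, assuming the default manifest paths):

# CA the apiserver uses to verify etcd, vs. the CA etcd itself trusts
grep etcd-cafile /etc/kubernetes/manifests/kube-apiserver.yaml
grep trusted-ca-file /etc/kubernetes/manifests/etcd.yaml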

set number
set tabstop=2
set expandtab
set shiftwidth=2

for easier editing in vim

Hi, can you provide the link for

ETCDCTL_API=3 etcdctl --endpoints=https://....

Here is the link: Operating etcd clusters for Kubernetes | Kubernetes

This is the complete setup to back up etcd and validate the status of the backup:

ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key member list

ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot save

ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot status
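To avoid retyping the TLS flags for each of these, you could wrap them in a shell alias (my own sketch, assuming the kubeadm default cert paths used above; etcdctl-k8s is a made-up name):

export ETCDCTL_API=3
alias etcdctl-k8s="etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key"

# Same three checks, without the boilerplate
etcdctl-k8s member list
etcdctl-k8s snapshot save /tmp/snapshot.db
etcdctl-k8s snapshot status /tmp/snapshot.db --write-out=table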

alias k='kubectl'
alias kgp='kubectl get pods'
alias kgs='kubectl get service'
alias kd='kubectl delete'
alias kcf='kubectl create -f'
alias kaf='kubectl apply -f'
alias kgpa='kubectl get pods --all-namespaces'
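To keep these across sessions, append them to your shell rc file (a sketch, assuming bash):

# Persist one of the aliases; repeat for the others
echo "alias k='kubectl'" >> ~/.bashrc
source ~/.bashrc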

vi ~/.vimrc
set number
set tabstop=2
set expandtab
set shiftwidth=2
set cursorline


Here's what my lecturer told me about the steps.
To make use of etcdctl for tasks such as backup and restore, make sure that you set ETCDCTL_API to 3.

You can do this by exporting the variable ETCDCTL_API prior to using the etcdctl client. This can be done as follows:

Backup

master $ export ETCDCTL_API=3
master $ etcdctl -h | grep -A 1 API
    API VERSION:
            3.3
master $
master $ head -n 35 /etc/kubernetes/manifests/etcd.yaml  | grep -A 20 containers
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://172.17.0.12:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://172.17.0.12:2380
    - --initial-cluster=master=https://172.17.0.12:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://172.17.0.12:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://172.17.0.12:2380
    - --name=master
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: k8s.gcr.io/etcd:3.4.3-0

master $ etcdctl \
> --endpoints=https://127.0.0.1:2379 \
> --cacert=/etc/kubernetes/pki/etcd/ca.crt \
> --cert=/etc/kubernetes/pki/etcd/server.crt \
> --key=/etc/kubernetes/pki/etcd/server.key \
> snapshot save /tmp/snapshot-pre-boot.db
Snapshot saved at /tmp/snapshot-pre-boot.db
master $

Restore, referencing the configuration from /etc/kubernetes/manifests/etcd.yaml, adding in --initial-cluster-token=etcd-cluster-1, and modifying --data-dir=/var/lib/etcd to point to a new location, --data-dir=/var/lib/etcd-from-backup:

ETCDCTL_API=3 etcdctl snapshot restore /tmp/snapshot-pre-boot.db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--name=master \
--data-dir=/var/lib/etcd-from-backup \
--initial-cluster=master=https://127.0.0.1:2380 \
--initial-cluster-token=etcd-cluster-1 \
--initial-advertise-peer-urls=https://127.0.0.1:2380

Next, edit /etc/kubernetes/manifests/etcd.yaml and replace all data-dir entries that have /var/lib/etcd with /var/lib/etcd-from-backup.
Then add the line --initial-cluster-token=etcd-cluster-1 to the container's command list.

Next, validate that the cluster is restored with kubectl get all --all-namespaces.

It may take a while for the restore to complete, depending on how large it is.
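While you wait, note that kubectl itself depends on etcd being back, so it can help to watch the container runtime first (a sketch, using the docker check that appears later in this thread):

# etcd must be up before the apiserver (and kubectl) will respond
docker ps -a | grep etcd
kubectl -n kube-system get pods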


I tried all the ways above; my etcd container comes up in docker ps -a | grep etcd …
But my etcd static pod does not come up; it shows as Pending. Can you tell me why?

Kindly let me know

I had the exact same issue.
The way to resolve it is to update the volumeMounts to reflect the new data path, in this case the new data directory /var/lib/etcd-from-backup.
So the new volumeMounts section looks like this:

volumeMounts:
- mountPath: /var/lib/etcd-from-backup
  name: etcd-data
- mountPath: /etc/kubernetes/pki/etcd
  name: etcd-certs
volumes:
- hostPath:
    path: /etc/kubernetes/pki/etcd
    type: DirectoryOrCreate
  name: etcd-certs
- hostPath:
    path: /var/lib/etcd-from-backup
    type: DirectoryOrCreate
  name: etcd-data
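Once the kubelet picks up the change, you can confirm the new mount (a sketch; the pod name assumes a control-plane node called master, as elsewhere in this thread):

# The Mounts section should now show /var/lib/etcd-from-backup
kubectl -n kube-system describe pod etcd-master | grep -A 4 Mounts: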

Hope that helps!


See the following link, which has all the steps for backup/restore of etcd in case a disaster occurs.

ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt \
     --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key \
     snapshot save /tmp/snapshot-pre-boot.db
ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt \
     --name=master \
     --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key \
     --data-dir /var/lib/etcd-from-backup \
     --initial-cluster=master=https://127.0.0.1:2380 \
     --initial-cluster-token=etcd-cluster-1 \
     --initial-advertise-peer-urls=https://127.0.0.1:2380 \
     snapshot restore /tmp/snapshot-pre-boot.db

What worked for me was:
Step 1: save db file
ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot save /opt/snapshot-pre-boot.db

Step 2: restore the db file
ETCDCTL_API=3 etcdctl snapshot restore /opt/snapshot-pre-boot.db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--name=master \
--data-dir=/var/lib/etcd-from-backup \
--initial-cluster=master=https://127.0.0.1:2380 \
--initial-cluster-token=etcd-cluster-1 \
--initial-advertise-peer-urls=https://127.0.0.1:2380

Step 3: edit the etcd yaml file. Start by identifying the lines that need to be updated:
cat /etc/kubernetes/manifests/etcd.yaml | grep -i lib/etcd -n
Replace all of them with lib/etcd-from-backup (see the sed sketch below), and then add the initial cluster token to the spec.containers[0].command list:
- --initial-cluster-token=etcd-cluster-1
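If you prefer to script that replacement, something like this should work (my own sketch, not from the original post; run it only once, and keep the backup copy):

# Back up the manifest, then replace every /var/lib/etcd with /var/lib/etcd-from-backup
cp /etc/kubernetes/manifests/etcd.yaml /tmp/etcd.yaml.bak
sed -i 's#/var/lib/etcd#/var/lib/etcd-from-backup#g' /etc/kubernetes/manifests/etcd.yaml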


Backup

ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt \
     --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key \
     snapshot save /opt/snapshot-pre-boot.db

Restore ETCD Snapshot to a new folder

ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt \
     --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key \
     --data-dir /var/lib/etcd-from-backup \
     snapshot restore /opt/snapshot-pre-boot.db
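Note that snapshot restore operates purely on the local .db file (it doesn't contact the running server), so --data-dir is the important flag here. You can sanity-check that the new data directory was populated (a sketch):

# The restore writes a fresh member directory under the new data dir
ls -l /var/lib/etcd-from-backup/member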

Modify /etc/kubernetes/manifests/etcd.yaml

 volumes:
  - hostPath:
      path: /var/lib/etcd-from-backup
      type: DirectoryOrCreate
    name: etcd-data
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs

ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/etcd/ca.crt --cert=/etc/etcd/etcd.crt --key=/etc/etcd/etcd.key snapshot save /tmp/snapshot.db

etcdctl --write-out=table snapshot status /tmp/snapshot.db

cd /tmp
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/etcd/ca.crt --cert=/etc/etcd/etcd.crt --key=/etc/etcd/etcd.key snapshot restore snapshot.db

Step 6. Remove the files from the /var/lib/etcd directory

systemctl stop etcd

rm -rf /var/lib/etcd/*

Step 7. Copy the files from the restored snapshot to /var/lib/etcd

mv /tmp/default.etcd/* /var/lib/etcd

Step 8. Start the etcd service

systemctl start etcd

Step 9. Verify the restore

ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/etcd/ca.crt --cert=/etc/etcd/etcd.crt --key=/etc/etcd/etcd.key get course

I generally get Error: expected sha256 when trying to restore from a backup. In that case, I used --skip-hash-check=true and it worked for me. Here are my steps:

ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key \
snapshot save /tmp/etcd-backup.db

ETCDCTL_API=3 etcdctl --write-out=table snapshot status /tmp/etcd-backup.db

ETCDCTL_API=3 etcdctl --data-dir /var/lib/etcd-from-backup \
snapshot restore /tmp/etcd-backup.db

Use --skip-hash-check=true in the restore command if you get Error: expected sha256.
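Put together, the restore with that flag would look like this (a sketch based on the commands above; the flag is needed when the snapshot was copied straight from the data dir rather than produced by etcdctl snapshot save, which embeds the integrity hash):

# Restore, skipping the sha256 integrity check
ETCDCTL_API=3 etcdctl --data-dir /var/lib/etcd-from-backup \
snapshot restore /tmp/etcd-backup.db --skip-hash-check=true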

Modify /etc/kubernetes/manifests/etcd.yaml:

volumes:
- hostPath:
    path: /var/lib/etcd-from-backup
    type: DirectoryOrCreate
  name: etcd-data
- hostPath:
    path: /etc/kubernetes/pki/etcd
    type: DirectoryOrCreate
  name: etcd-certs

Simple solution:

After you restore the etcd backup to, let's say, the /var/lib/last-backup directory, edit the static pod manifest and update the hostPath; that will be reflected in your etcd container.

Why do we need to set --initial-cluster-token? Is it something specific to the cluster, or can we give it any name?

I think you also have to fix ownership of the directory which you removed and restored, so:
sudo chown -R etcd:etcd /var/lib/etcd

All steps to back up and restore etcd.

  1. Verify the cluster name
ETCDCTL_API=3 etcdctl get cluster.name \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/etcd/ca.pem \
--cert=/etc/kubernetes/etcd/server.crt \
--key=/etc/kubernetes/etcd/server.key
  2. Back up etcd
ETCDCTL_API=3 etcdctl snapshot save etcd_backup.db \
--endpoints=https://10.0.1.101:2379 \
--cacert=/etc/kubernetes/etcd/ca.pem \
--cert=/etc/kubernetes/etcd/server.crt \
--key=/etc/kubernetes/etcd/server.key
  3. Stop etcd and remove the existing data

sudo systemctl stop etcd
sudo rm -rf /var/lib/etcd

  4. Restore
sudo ETCDCTL_API=3 etcdctl snapshot restore etcd_backup.db \
--initial-cluster etcd-restore=https://127.0.0.1:2380 \
--initial-advertise-peer-urls https://127.0.0.1:2380 \
--name etcd-restore \
--data-dir /var/lib/etcd
  5. Set ownership on the directory

sudo chown -R etcd:etcd /var/lib/etcd

  6. Start etcd

sudo systemctl start etcd
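Optionally, verify the restored member is healthy (a sketch, reusing the endpoint and cert paths from the steps above):

# endpoint health returns an error if the member is not serving
ETCDCTL_API=3 etcdctl endpoint health \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/etcd/ca.pem \
--cert=/etc/kubernetes/etcd/server.crt \
--key=/etc/kubernetes/etcd/server.key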

Hi. While taking the backup and trying to save to a .db file… the error showed no access. Any idea about this error?

Backup:

export ETCDCTL_API=3

etcdctl snapshot save --endpoints=127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
/opt/snapshot-pre-boot.db

Restore:

etcdctl snapshot restore --data-dir /var/lib/etcd-from-backup /opt/snapshot-pre-boot.db

Update the directory path in the etcd-data volume in the /etc/kubernetes/manifests/etcd.yaml file.
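After that, a quick check that the cluster came back (a generic sketch):

# Give the kubelet a moment to recreate the static pod, then verify
kubectl -n kube-system get pods | grep etcd
kubectl get nodes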