Etcdserver: mvcc: database space exceeded


Cluster information:

Kubernetes version: 1.23.6
Cloud being used: Bare-metal
Installation method: Using Kubeadm
Host OS: Ubuntu
CNI and version:
CRI and version:

Hi Team,
This is very URGENT!!
The whole cluster is down due to the error “etcdserver: mvcc: database space exceeded”.
Nothing is working.
Could you please help me with this?
Hoping for solutions.

Regards,
Ankit

@ankitkaushik94 were you able to fix this issue? I am getting the same error.

Hello,
we had the same issue today and I was able to solve it as follows:

On my cluster manager node:

ETCDCTL_API=3 etcdctl \
	--endpoints=https://127.0.0.1:2379 \
	--cacert=/etc/kubernetes/pki/etcd/ca.crt \
	--cert=/etc/kubernetes/pki/etcd/peer.crt \
	--key=/etc/kubernetes/pki/etcd/peer.key \
	endpoint status --write-out=table

Output looks like this:

+------------------------+------------------+---------+---------+-----------+-----------+------------+
|        ENDPOINT        |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://127.0.0.1:2379 | 123456789ABCDEFG |   3.5.6 |  2.1 GB |      true |       985 |  269601165 |
+------------------------+------------------+---------+---------+-----------+-----------+------------+
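
If you want the DB SIZE as a number for scripting, here is a rough sketch that reads the JSON output of the same status command and prints the size as a percentage of the quota. The function name db_usage_percent and the sample JSON are made up for illustration; I am assuming the "dbSize" field as printed by etcdctl 3.5 and the default 2 GiB quota.

```shell
# Print the etcd DB size as a percentage of the backend quota.
# Reads `endpoint status --write-out=json` output on stdin.
# $1: quota in bytes (defaults to 2 GiB, the etcd default).
db_usage_percent() {
	quota=${1:-2147483648}
	size=$(grep -Eo '"dbSize":[0-9]+' | grep -Eo '[0-9]+' | head -n 1)
	echo $(( size * 100 / quota ))
}

# Illustrative sample only; on a live node, pipe the real etcdctl output instead.
SAMPLE='[{"Endpoint":"https://127.0.0.1:2379","Status":{"header":{"revision":111111111},"version":"3.5.6","dbSize":2150000000}}]'
echo "$SAMPLE" | db_usage_percent
```

A value at or above 100 means the database has reached the quota, which is when etcd raises the alarm.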

The etcd database is really big and exceeds the quota (2 GB by default). Check the etcd alarms:

ETCDCTL_API=3 etcdctl \
	--endpoints=https://127.0.0.1:2379 \
	--cacert=/etc/kubernetes/pki/etcd/ca.crt \
	--cert=/etc/kubernetes/pki/etcd/peer.crt \
	--key=/etc/kubernetes/pki/etcd/peer.key \
	alarm list

The output shows the NOSPACE alarm:

memberID:123456789ABCDEFG alarm:NOSPACE

Get the current revision, compact the history up to that revision, and defragment the database:

REVISION=$(ETCDCTL_API=3 etcdctl \
	--endpoints=https://127.0.0.1:2379 \
	--cacert=/etc/kubernetes/pki/etcd/ca.crt \
	--cert=/etc/kubernetes/pki/etcd/peer.crt \
	--key=/etc/kubernetes/pki/etcd/peer.key \
	endpoint status --write-out=json \
	| grep -Eo '"revision":[0-9]+' | grep -Eo '[0-9]+')
ETCDCTL_API=3 etcdctl \
	--endpoints=https://127.0.0.1:2379 \
	--cacert=/etc/kubernetes/pki/etcd/ca.crt \
	--cert=/etc/kubernetes/pki/etcd/peer.crt \
	--key=/etc/kubernetes/pki/etcd/peer.key \
	compact "${REVISION}"
ETCDCTL_API=3 etcdctl \
	--endpoints=https://127.0.0.1:2379 \
	--cacert=/etc/kubernetes/pki/etcd/ca.crt \
	--cert=/etc/kubernetes/pki/etcd/peer.crt \
	--key=/etc/kubernetes/pki/etcd/peer.key \
	defrag

Output should look like this:

compacted revision 111111111
Finished defragmenting etcd member[https://127.0.0.1:2379]
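
One step that is not in my commands above: the etcd maintenance docs also clear the alarm with "alarm disarm" after compacting and defragmenting. If "alarm list" still shows NOSPACE afterwards, something like the following sketch should do it. The function names etcd and clear_alarms are just helpers I made up; the connection flags are the same ones as in the commands above.

```shell
# Wrapper that passes the same connection flags as the commands above.
etcd() {
	ETCDCTL_API=3 etcdctl \
		--endpoints=https://127.0.0.1:2379 \
		--cacert=/etc/kubernetes/pki/etcd/ca.crt \
		--cert=/etc/kubernetes/pki/etcd/peer.crt \
		--key=/etc/kubernetes/pki/etcd/peer.key \
		"$@"
}

# Disarm all alarms, then list them again to confirm none are left.
clear_alarms() {
	etcd alarm disarm && etcd alarm list
}
```

Run clear_alarms on the same node. The final "alarm list" should print nothing once the alarm is gone.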

When you now repeat the status request from the beginning, it should look like this:

+------------------------+------------------+---------+---------+-----------+-----------+------------+
|        ENDPOINT        |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://127.0.0.1:2379 | 123456789ABCDEFG |   3.5.6 |  228 MB |      true |       985 |  269601165 |
+------------------------+------------------+---------+---------+-----------+-----------+------------+

Note that the DB SIZE is now a lot smaller again.
After that, my cluster worked again.
We will now enable auto-compaction and, to be safe, increase the quota. See the etcd maintenance docs for details.
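
As a rough sketch of what that could look like, the snippet below prints the etcd flags involved. The values (8 GiB quota, periodic compaction with 1h retention) are only examples I picked, not what we actually run, and the helper name etcd_maintenance_flags is made up. On a kubeadm cluster I would expect these to go into the etcd command in /etc/kubernetes/manifests/etcd.yaml, but check that against your own setup.

```shell
# Example values only: 8 GiB quota, expressed in bytes.
QUOTA_BYTES=$((8 * 1024 * 1024 * 1024))

# Print the etcd flags for a larger quota and periodic auto-compaction.
etcd_maintenance_flags() {
	printf '%s\n' \
		"--quota-backend-bytes=${QUOTA_BYTES}" \
		"--auto-compaction-mode=periodic" \
		"--auto-compaction-retention=1h"
}

etcd_maintenance_flags
```

Note that auto-compaction only removes old revisions. The freed space is returned to the file system only by a defrag, so that step still has to be run from time to time.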

Hope this helps someone solve their problems, too.

Kind regards
Timo (b+m Informatik AG)
