Etcdserver: mvcc: database space exceeded

Hello,
we had the same issue today and I was able to solve it as follows:

On my control plane node:

ETCDCTL_API=3 etcdctl \
	--endpoints=https://127.0.0.1:2379 \
	--cacert=/etc/kubernetes/pki/etcd/ca.crt \
	--cert=/etc/kubernetes/pki/etcd/peer.crt \
	--key=/etc/kubernetes/pki/etcd/peer.key \
	endpoint status --write-out=table

Output looks like this:

+------------------------+------------------+---------+---------+-----------+-----------+------------+
|        ENDPOINT        |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://127.0.0.1:2379 | 123456789ABCDEFG |   3.5.6 |  2.1 GB |      true |       985 |  269601165 |
+------------------------+------------------+---------+---------+-----------+-----------+------------+

The etcd database is really big and exceeds the quota. Check etcd's alarms:

ETCDCTL_API=3 etcdctl \
	--endpoints=https://127.0.0.1:2379 \
	--cacert=/etc/kubernetes/pki/etcd/ca.crt \
	--cert=/etc/kubernetes/pki/etcd/peer.crt \
	--key=/etc/kubernetes/pki/etcd/peer.key \
	alarm list

The output shows the NOSPACE alarm, which etcd raises once the database size exceeds the backend quota (2 GB by default):

memberID:123456789ABCDEFG alarm:NOSPACE

Get the current revision, compact the keyspace up to it, and defragment the database (an alternative way to extract the revision with jq is sketched after these commands):

REVISION=$(ETCDCTL_API=3 etcdctl \
	--endpoints=https://127.0.0.1:2379 \
	--cacert=/etc/kubernetes/pki/etcd/ca.crt \
	--cert=/etc/kubernetes/pki/etcd/peer.crt \
	--key=/etc/kubernetes/pki/etcd/peer.key \
	endpoint status --write-out="json" \
	| egrep -o '"revision":[0-9]*' | egrep -o '[0-9].*')
ETCDCTL_API=3 etcdctl \
	--endpoints=https://127.0.0.1:2379 \
	--cacert=/etc/kubernetes/pki/etcd/ca.crt \
	--cert=/etc/kubernetes/pki/etcd/peer.crt \
	--key=/etc/kubernetes/pki/etcd/peer.key \
	compact ${REVISION}
ETCDCTL_API=3 etcdctl \
	--endpoints=https://127.0.0.1:2379 \
	--cacert=/etc/kubernetes/pki/etcd/ca.crt \
	--cert=/etc/kubernetes/pki/etcd/peer.crt \
	--key=/etc/kubernetes/pki/etcd/peer.key \
	defrag
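
As a side note: if jq is available on the node, the revision can be extracted from the JSON output a bit more robustly than with egrep. A minimal sketch, assuming the JSON layout of etcdctl 3.5:

REVISION=$(ETCDCTL_API=3 etcdctl \
	--endpoints=https://127.0.0.1:2379 \
	--cacert=/etc/kubernetes/pki/etcd/ca.crt \
	--cert=/etc/kubernetes/pki/etcd/peer.crt \
	--key=/etc/kubernetes/pki/etcd/peer.key \
	endpoint status --write-out="json" \
	| jq -r '.[0].Status.header.revision')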

The output of the compact and defrag commands should look like this:

compacted revision 111111111
Finished defragmenting etcd member[https://127.0.0.1:2379]
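
One step that is easy to miss: according to the etcd docs, the cluster stays in a limited maintenance mode (rejecting writes) until the NOSPACE alarm is explicitly disarmed, so you may also need to clear it after compacting and defragmenting:

ETCDCTL_API=3 etcdctl \
	--endpoints=https://127.0.0.1:2379 \
	--cacert=/etc/kubernetes/pki/etcd/ca.crt \
	--cert=/etc/kubernetes/pki/etcd/peer.crt \
	--key=/etc/kubernetes/pki/etcd/peer.key \
	alarm disarm

Repeating the alarm list command from above should then return no output.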

When you now repeat the status request from the beginning, it should look like this:

+------------------------+------------------+---------+---------+-----------+-----------+------------+
|        ENDPOINT        |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://127.0.0.1:2379 | 123456789ABCDEFG |   3.5.6 |  228 MB |      true |       985 |  269601165 |
+------------------------+------------------+---------+---------+-----------+-----------+------------+

Note that the DB SIZE is now a lot smaller again.
After that, my cluster worked again.
We will now enable auto-compaction and, to be safe, increase the quota; a sketch of what that can look like follows below. See details for etcd maintenance here: etcd maintenance docs
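
If your cluster was set up with kubeadm (the certificate paths above suggest it), this means editing the etcd static pod manifest; the kubelet restarts etcd automatically when the file changes. A minimal sketch (the retention and quota values are just examples, not recommendations):

# /etc/kubernetes/manifests/etcd.yaml (excerpt)
spec:
  containers:
  - command:
    - etcd
    - --auto-compaction-mode=periodic
    - --auto-compaction-retention=8h
    # 8 GB instead of the 2 GB default
    - --quota-backend-bytes=8589934592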

Hope this helps someone solve their problems, too.

Kind regards
Timo (b+m Informatik AG)
