Node Disk IO saturation


Hi guys, I have a 3-node cluster deployed in virtual machines on Hyper-V hosts. The cluster runs RKE2.

Over the last few weeks, every day at 00:00 and 12:00 I have received alerts from Prometheus about node disk I/O saturation. It looks like the issue comes from the etcd backups.
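For context, I believe RKE2 takes etcd snapshots on a cron schedule (every 12 hours by default), which would line up with those times. This is roughly how it can be checked on a server node (paths assume a default RKE2 install, adjust if yours differ):

    # Show any explicit snapshot settings (RKE2 defaults to a snapshot every 12 hours).
    grep -i etcd-snapshot /etc/rancher/rke2/config.yaml

    # List existing snapshots; the timestamps should match the alert times.
    ls -lh /var/lib/rancher/rke2/server/db/snapshots/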

How is this possible? Is there any way to prevent these issues?

When this issue happens, sometimes one or two workers have problems recreating pods or detaching volumes from them. This is very frustrating.

I don’t know what to do

Thanks in advance

Cluster information:

Kubernetes version: 1.27.15
Cloud being used: bare-metal
Installation method: RKE2
Host OS: Redhat 8.10
CNI and version:
CRI and version:


Try the following to see if it helps:

  1. If your etcd instances are running on the same nodes as your worker nodes, consider isolating etcd onto its own set of dedicated control plane nodes. This ensures that the etcd backup process doesn’t interfere with other Kubernetes operations like pod scheduling or volume detachment (see the taint sketch after this list).
  2. If it is supported on your system, use ionice or cgroups to lower the I/O priority of the backup process, as long as you are OK with the backup taking longer. This gives critical operations like pod scheduling and volume detachment priority access to disk I/O (see the cgroup sketch after this list).

    ionice -c3 /path/to/your-backup-script.sh   # -c3 = idle I/O class: the backup only gets disk time other processes don't need
  3. If a delta or incremental backup is an option, give it a try.
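For point 1, a minimal sketch of keeping regular workloads off dedicated etcd/control-plane nodes (the node names below are placeholders for your own):

    # Taint the control-plane/etcd nodes so ordinary pods are not scheduled there.
    kubectl taint nodes cp-node-1 cp-node-2 cp-node-3 node-role.kubernetes.io/control-plane=:NoSchedule

    # Verify the taint is in place on one of the nodes.
    kubectl describe node cp-node-1 | grep -A2 Taints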
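For point 2, a cgroup-based alternative to ionice, assuming the host uses cgroup v2 with the io controller enabled and the backup runs as a standalone script (the script path is a placeholder):

    # Run the backup in a transient systemd scope with a low I/O weight
    # (IOWeight range is 1-10000, default 100; lower means less I/O priority).
    systemd-run --scope -p IOWeight=10 /path/to/your-backup-script.sh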
