a solution to the severe throughput loss caused by I/O-bandwidth management

paolo · March 6, 2019, 12:22pm

Hi,
I’m a developer of the BFQ I/O scheduler [1, 2], and this topic is about
the techniques used to guarantee I/O bandwidth to clients, containers,
virtual machines and any other type of entity accessing shared
storage. A Google storage manager suggested that I create this topic,
as he thinks that my results might be interesting for some of
you, or maybe even useful for some company.

I found out that the above techniques entail dramatic throughput
losses: up to 80-90% of the throughput reachable by your storage.
Experts, including people working on these techniques, confirmed this
severe underutilization. So I decided to analyze the problem, and put
my results in a short article:
http://ow.ly/vsrW50mBAGl

On the bright side, I also tried to extend BFQ so that it becomes a
viable solution to this problem. BFQ now seems to reduce the loss to
just 10%. This result is reported in the article too.

If anyone has any questions, I’ll be happy to answer, if I can.

Thanks,
Paolo

[1] https://www.kernel.org/doc/Documentation/block/bfq-iosched.txt
[2] https://algo.ing.unimo.it/people/paolo/disk_sched/results.php

rata · March 7, 2019, 2:33am

Very interesting, thanks a lot for sharing!

Just curious, do you know which managed service or distribution is using bfq?

A Flatcar Linux machine I have is using CFQ; Container Linux probably does the same. I don’t have others handy.

paolo · March 7, 2019, 3:12pm

Thank you for reading my post!

I guess you can now find bfq in every distribution. A different story is whether the distribution is configured to use it as the default I/O scheduler for every device. More and more distributions are being configured this way. SUSE and Fedora are two examples of top distributions that are currently switching to, or considering switching to, bfq as the default I/O scheduler. Chromium OS has already done it, but I guess this last example is of little relevance to this discussion.

As for Flatcar and Container Linux, I’m afraid they’ll have to give up CFQ soon, as CFQ belongs to the (now) old legacy Linux I/O stack, which was removed in Linux 5.0. This is actually one of the reasons why several distributions are considering BFQ, as it is the most natural replacement for CFQ.
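
In case it helps to check what a given machine is actually using: on Linux, each block device exposes its schedulers in /sys/block/<device>/queue/scheduler, with the active one shown in square brackets (for example "mq-deadline kyber [bfq] none"). Below is a minimal sketch in Python that reads those files. The function names are just illustrative choices, and I'm assuming the bracket format described above; a device with no bracketed entry is reported as None.

```python
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def parse_scheduler_line(line: str) -> Tuple[Optional[str], List[str]]:
    """Parse the contents of a queue/scheduler file.

    Returns (active, available). The active scheduler is the token
    wrapped in square brackets; it is None if no token is bracketed.
    """
    active = None
    available = []
    for token in line.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def active_schedulers(sys_block: str = "/sys/block") -> Dict[str, Optional[str]]:
    """Map each block device name to its active I/O scheduler."""
    result = {}
    for sched_file in Path(sys_block).glob("*/queue/scheduler"):
        device = sched_file.parent.parent.name
        try:
            active, _ = parse_scheduler_line(sched_file.read_text())
        except OSError:
            # Skip devices whose scheduler file cannot be read.
            continue
        result[device] = active
    return result


if __name__ == "__main__":
    for dev, sched in sorted(active_schedulers().items()):
        print(f"{dev}: {sched}")
```

If bfq appears in the list but is not the bracketed entry, it is available but not selected for that device.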

Should I have news soon, I’ll try to remember to share it in this discussion.

rata · March 7, 2019, 4:19pm

Thank you! And thanks a lot for the examples

Yes, please share in the future too!

dlezcano · March 8, 2019, 8:21am

Hi Paolo,
thanks for the post, very interesting.
