[LTP] [PATCH v2] io: fix really slow dio_sparse on certain systems

Andrea Cervesato andrea.cervesato@suse.com
Mon Jan 26 16:07:15 CET 2026


Hi!

On Mon Jan 26, 2026 at 12:42 PM CET, Cyril Hrubis wrote:
> Hi!
> > The reason why dio_sparse is happening to be slow on certain systems is
> > that, if data buffering is slow, we run more buffered read() for one
> > single dio write(). This slows down the whole test, because for each
> > read() we always need to move data from kernel space to user space.
>
> I guess it's not about slow buffering. What I suppose happens is that
> every time the writer thread writes with O_DIRECT it invalidates the
> page cache and we have to re-read everything from disk. Which means that
> the data are often removed from the cache between the reads and the
> reader processes are often forced to re-read the data from the disk. If
> there was no O_DIRECT reader thread the first child that happens to read
> a file block would cause kernel to put it into the page cache and all
> other children would just copy that data without a need to reach the
> disk at all.

This is definitely a possible explanation. I sent this patch in order to
get some feedback and other opinions. What puzzles me is that it's only
happening on POWER10, on a random node during kernel tests. Other
architectures seem to work fine.

Kernel 6.6+ seems to be the affected version.

>
> However the test should finish as fast as the writer finishes writing
> the file. So slow readers shouldn't matter unless there is some serious
> contention on the disk I/O. That's probably the reason you are aligning
> the writer as well.

Exactly, I would expect that.

>
> What is the difference in runtime between test before and after this
> patch on the slow hardware?

DS009 goes from 4 hours to 30 seconds. I also profiled the syscalls
with perf, and io_read() accounts for 63+ % of the time. Even on my
laptop, this patch moves the execution from ~10 secs to ~3 secs. There's
a big difference between a 4h and a 10 secs runtime, regardless of the
underlying disk.

>
> The only thing I wonder about is that if we aren't dropping some
> coverage along with speeding up the test. For the reading part I guess
> it doesn't matter that much how big the blocks are (if we speed up the
> test we finish faster and do less operations, but that is something we
> can live with). If we align the writer it may write directly whole
> blocks instead of reading a block, modifying it and writing it back.
> Looking at the runtest files, we do have dio_sparse there with a
> different write block sizes, so the default shouldn't matter that much,
> so why do we bother changing it?

My patch is meant to fix the problem by default for all cases, instead
of adding parameters to the runtest file.
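
To make the alignment point above concrete, here is a minimal sketch (not
the actual patch) of how a direct I/O writer could round its write size up
to a whole block and allocate a matching aligned buffer. The helper names
round_up_to_block() and alloc_aligned_buf() are hypothetical, and the
block size is assumed to be a power of two that is a multiple of
sizeof(void *), as posix_memalign() requires. The exact O_DIRECT alignment
rules depend on the filesystem and kernel.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: round len up to the next multiple of blksz. */
static size_t round_up_to_block(size_t len, size_t blksz)
{
	assert(blksz > 0);
	return ((len + blksz - 1) / blksz) * blksz;
}

/*
 * Hypothetical helper: allocate a zeroed buffer whose address and size
 * are both multiples of blksz, so it could be passed to write() on a
 * file descriptor opened with O_DIRECT.
 * Assumption: blksz is a power of two and a multiple of sizeof(void *).
 */
static void *alloc_aligned_buf(size_t len, size_t blksz)
{
	void *buf = NULL;
	size_t aligned_len = round_up_to_block(len, blksz);

	if (posix_memalign(&buf, blksz, aligned_len) != 0)
		return NULL;

	memset(buf, 0, aligned_len);
	return buf;
}
```

With a buffer like this, the writer can submit whole blocks, which is the
case Cyril describes as writing blocks directly instead of
read-modify-write.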


-- 
Andrea Cervesato
SUSE QE Automation Engineer Linux
andrea.cervesato@suse.com


