[LTP] [PATCH] madvise06: Raise the bar for judging failure

Richard Palethorpe rpalethorpe@suse.de
Mon Feb 27 12:33:18 CET 2023


Hello Li,

Li Wang <liwang@redhat.com> writes:

> There is an intermittent failure which we have observed many times whether
> on rhel or mainline kernel. But we're unable to stable reproduce it:
>
>     43	madvise06.c:201: TFAIL: less than 102400 Kb were moved to the swap cache
>     ...
>
> However it does not look like a kernel issue, because SwapCached change is
> not strictly abiding by the principle of MADV_WILLNEED advice. That means it
> all depends on the kernel's specific circumstances. The value of the threshold
> is debatable at least from my point of view, its use 1/4 is not guaranteed
> 100% safe.
>
> As MADV_WILLNEED is just advice to the kernel, not a guarantee. The kernel may
> choose to ignore the advice, or may prioritize other memory management tasks
> over pre-loading the advised pages.
>
> So this patch is aimed at improving the accuracy and clarity of the test results.
> Specifically, the use of two separate variables to track the results of different
> comparisons will make it easier to understand what the test is doing.
>
> Additionally, the change to report a test result of "TINFO" instead of "TFAIL"
> when the swap cache size is less than expected would be intended to indicate
> that this is an acceptable outcome.
>
> Finally, the change to the second tst_res call is intended to make the test more
> lenient, as it now passes if either no page faults occur or the swap cache size
> is larger than expected.

Why not skip to making them all TINFO?

It's undefined what action will result from MADV_WILLNEED. If it were
better for performance *not* to read in pages, then it would be valid
for the kernel to ignore it.
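
To illustrate, here is a minimal sketch (not taken from madvise06.c;
the helper names advise_willneed() and willneed_demo() are made up for
this mail) of what a caller can actually rely on. It assumes Linux with
anonymous private mappings and a kernel built with swap support. The
only thing checked is the return value of madvise(); how many pages get
read in afterwards is left to the kernel.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, not part of LTP: pass MADV_WILLNEED for a range.
 * Returns 0 if the call was accepted, -1 if madvise() itself failed.
 * A return of 0 says nothing about how many pages were read in.
 */
static int advise_willneed(void *addr, size_t len)
{
	if (madvise(addr, len, MADV_WILLNEED) == -1) {
		perror("madvise(MADV_WILLNEED)");
		return -1;
	}

	return 0;
}

/*
 * Hypothetical demo: map anonymous memory, touch the first page and
 * advise the whole range. Only the syscall result is reported back.
 */
static int willneed_demo(size_t len)
{
	int ret;
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED) {
		perror("mmap");
		return -1;
	}

	p[0] = 1;	/* make sure at least one page is populated */
	ret = advise_willneed(p, len);
	munmap(p, len);

	return ret;
}
```

Anything beyond that return value, such as the change in SwapCached
after the call, is an observation about what the kernel chose to do
rather than something the interface promises.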

Yang Xu added a tag for a perf regression that this test could
reproduce. However, looking at the kernel commit, the regression was
first found by stress-ng.

commit 66383800df9cbdbf3b0c34d5a51bf35bcdb72fd2
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Sat Nov 21 22:17:22 2020 -0800

    mm: fix madvise WILLNEED performance problem

    The calculation of the end page index was incorrect, leading to a
    regression of 70% when running stress-ng.

    With this fix, we instead see a performance improvement of 3%

I found a bug with this test, but it was causing an Oops, so it would
not have mattered whether the test printed pass or fail.

So I think we are wasting our time by constantly tweaking this test.

-- 
Thank you,
Richard.

