[LTP] madvise07.c:72: FAIL: Did not receive SIGBUS
Li Wang
liwang@redhat.com
Wed Feb 15 10:45:12 CET 2017
On Wed, Feb 15, 2017 at 5:38 PM, Li Wang <liwang@redhat.com> wrote:
> On Tue, Feb 14, 2017 at 10:06 PM, Jan Stancek <jstancek@redhat.com> wrote:
>>
>>
>> ----- Original Message -----
>>> From: "Cyril Hrubis" <chrubis@suse.cz>
>>> To: "Li Wang" <liwang@redhat.com>
>>> Cc: richiejp@f-m.fm, ltp@lists.linux.it
>>> Sent: Monday, 13 February, 2017 10:08:37 AM
>>> Subject: Re: [LTP] madvise07.c:72: FAIL: Did not receive SIGBUS
>>>
>>> Hi!
>>> > I'm trying to run ltp on upstream kernel-4.10.0-rc7, and found that
>>> > madvise07 always failing with no SIGBUS received when mmap the PRIVATE
>>> > memory. I hope to know if there're some relevant stuff about this
>>> > issue.
>>> > Any discussion or document for that?
>>>
>>> Looks like a plain old kernel bug to me.
>>
>> Or maybe MADV_HWPOISON is supposed to work only for faulted-in pages?
>
> That seems reasonable. Since MAP_PRIVATE creates a private
> copy-on-write mapping, the testcase ends up poisoning the read-only
> empty zero page, and does so repeatedly if we map more than one page.
> I ran a test that confirms this guess.
>
> E.g. running only the MAP_PRIVATE part of madvise07 with 4 pages on RHEL 7.3:
>
> # dmesg
> [ 62.322637] Injecting memory failure for page 1c9d at 7f0594254000
> [ 62.329660] MCE 0x1c9d: reserved kernel page still referenced by 1 users
> [ 62.337143] MCE 0x1c9d: reserved kernel page recovery: Failed
> [ 91.505460] Injecting memory failure for page 1c9d at 7f09ab16e000
> [ 91.512363] MCE 0x1c9d: already hardware poisoned
> [ 91.517620] Injecting memory failure for page 1c9d at 7f09ab16f000
> [ 91.524516] MCE 0x1c9d: already hardware poisoned
> [ 91.529763] Injecting memory failure for page 1c9d at 7f09ab170000
> [ 91.536659] MCE 0x1c9d: already hardware poisoned
>
>
>
> There is also an upstream kernel patch that fixes a similar problem, so
> it makes sense to fix our LTP testcase madvise07.c.
>
> commit 29b4eedee67b449534214058e1bcb36307a7f1dc
> Author: Wanpeng Li <liwanp@linux.vnet.ibm.com>
> Date: Wed Sep 11 14:22:59 2013 -0700
>
> mm/hwpoison.c: fix held reference count after unpoisoning empty zero page
>
>
>
>> It works fine for me with change below:
>>
>> diff --git a/testcases/kernel/syscalls/madvise/madvise07.c b/testcases/kernel/syscalls/madvise/madvise07.c
>> index 2f8c42e..f5fd4b7 100644
>> --- a/testcases/kernel/syscalls/madvise/madvise07.c
>> +++ b/testcases/kernel/syscalls/madvise/madvise07.c
>> @@ -44,13 +44,13 @@ static int maptypes[] = {
>>
>> static void run_child(int maptype)
>> {
>> - const size_t msize = 4096;
>> + const size_t msize = getpagesize();
>> void *mem = NULL;
>>
>> mem = SAFE_MMAP(NULL,
>> msize,
>> PROT_READ | PROT_WRITE,
>> - MAP_ANONYMOUS | maptype,
>> + MAP_ANONYMOUS | maptype | MAP_POPULATE,
>> -1,
>> 0);
>>
>
> Another way I propose to fix the problem is simply to write to the page
> before calling madvise():
>
> $ git diff
> diff --git a/testcases/kernel/syscalls/madvise/madvise07.c b/testcases/kernel/syscalls/madvise/madvise07.c
> index 2f8c42e..0ed5307 100644
> --- a/testcases/kernel/syscalls/madvise/madvise07.c
> +++ b/testcases/kernel/syscalls/madvise/madvise07.c
> @@ -54,6 +54,8 @@ static void run_child(int maptype)
> -1,
> 0);
>
> + *((char *)mem) = 'a';
> +
> tst_res(TINFO, "madvise(%p, %zu, MADV_HWPOISON)", mem, msize);
> if (madvise(mem, msize, MADV_HWPOISON) == -1) {
> if (errno == EINVAL)
>
The result of the patched madvise07 is attached below:
# ./madvise07
tst_test.c:792: INFO: Timeout per run is 0h 05m 00s
madvise07.c:54: INFO: madvise(0x7f864a116000, 4096, MADV_HWPOISON)
madvise07.c:88: PASS: madvise(..., MADV_HWPOISON) on MAP_PRIVATE memory
madvise07.c:54: INFO: madvise(0x7f864a116000, 4096, MADV_HWPOISON)
madvise07.c:88: PASS: madvise(..., MADV_HWPOISON) on MAP_SHARED memory
Summary:
passed 2
failed 0
skipped 0
warnings 0
# dmesg
[ 636.254254] Injecting memory failure for page 223cfd at 7f864a116000
[ 636.261400] MCE 0x223cfd: dirty LRU page recovery: Recovered
[ 636.267722] MCE: Killing madvise07:2498 due to hardware memory corruption fault at 7f864a116000
[ 636.277674] Injecting memory failure for page 223d18 at 7f864a116000
[ 636.284811] MCE 0x223d18: dirty LRU page recovery: Recovered
[ 636.291133] MCE: Killing madvise07:2499 due to hardware memory corruption fault at 7f864a116000
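
For anyone who wants to check the "faulted-in pages" theory without needing
CAP_SYS_ADMIN, here is a minimal sketch that uses mincore() to ask whether a
freshly mapped anonymous page is resident. It does not call
madvise(MADV_HWPOISON) at all. The helper names map_one_page(),
page_is_resident() and touch_page() are made up for illustration and are not
part of LTP. I would expect an untouched MAP_PRIVATE page to be reported as
not resident on a typical kernel, but I have not verified that everywhere, so
treat it as an assumption.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for MAP_ANONYMOUS, MAP_POPULATE, mincore() */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one anonymous page of the given type
 * (MAP_PRIVATE or MAP_SHARED). extra_flags is 0 or MAP_POPULATE.
 * Returns NULL if mmap() fails.
 */
static void *map_one_page(int maptype, int extra_flags)
{
	void *mem = mmap(NULL, (size_t)getpagesize(),
			 PROT_READ | PROT_WRITE,
			 MAP_ANONYMOUS | maptype | extra_flags,
			 -1, 0);

	return mem == MAP_FAILED ? NULL : mem;
}

/*
 * Hypothetical helper: returns 1 if mincore() reports the page as
 * resident, 0 if not, and -1 if mincore() itself fails.
 */
static int page_is_resident(void *mem)
{
	unsigned char vec = 0;

	if (mincore(mem, (size_t)getpagesize(), &vec) == -1)
		return -1;

	return vec & 1;
}

/* Same idea as the one-line fix above: write one byte to fault the page in. */
static void touch_page(void *mem)
{
	assert(mem != NULL);
	*(volatile char *)mem = 'a';
}
```

Calling page_is_resident() on a page from map_one_page(MAP_PRIVATE, 0) before
and after touch_page() shows the effect of the write, and doing the same with
map_one_page(MAP_PRIVATE, MAP_POPULATE) shows the effect of Jan's variant.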
Regards,
Li Wang