<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#ffffff">
<pre>on 2019/06/24 10:43, Li Wang wrote:</pre>
<blockquote
cite="mid:CAEemH2c+CWAOmAH=1WT+GR-iZ=5RoDcCmD=-zBpc63PHg6xXyQ@mail.gmail.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div dir="ltr">
<div class="gmail_default" style="font-size: small;">Hi Xu
Yang,</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri, Jun 21, 2019 at 1:58
PM Yang Xu <<a moz-do-not-send="true"
href="mailto:xuyang2018.jy@cn.fujitsu.com">xuyang2018.jy@cn.fujitsu.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin: 0px 0px 0px
0.8ex; border-left: 1px solid rgb(204, 204, 204);
padding-left: 1ex;"><br>
> Hi Li Wang,<br>
><br>
> Thank you for maintaining the testcase.<br>
><br>
> Recently (since 4.19) we have a semantics change on the
return value of<br>
> madvise(MADV_SOFT_OFFLINE), and we see -EBUSY when
hugepage migration<br>
> succeeded and error containment failed:<br>
><br>
> commit 6bc9b56433b76e40d11099338d27fbc5cd2935ca<br>
> Author: Naoya Horiguchi <<a moz-do-not-send="true"
href="mailto:n-horiguchi@ah.jp.nec.com" target="_blank">n-horiguchi@ah.jp.nec.com</a>><br>
> Date: Thu Aug 23 17:00:38 2018 -0700<br>
> <br>
> mm: fix race on soft-offlining free huge pages<br>
><br>
> , so we don't have to consider this EBUSY as error, but
a good report<br>
> for application. Your change meets the change.<br>
><br>
> Feel free to add my ack:<br>
><br>
> Acked-by: Naoya Horiguchi <<a moz-do-not-send="true"
href="mailto:n-horiguchi@ah.jp.nec.com" target="_blank">n-horiguchi@ah.jp.nec.com</a>><br>
><br>
> Thanks,<br>
> - Naoya<br>
><br>
> On Fri, Jun 07, 2019 at 05:52:13PM +0800, Li Wang
wrote:<br>
>> The test#2 is going to simulate the race condition,
where move_pages()<br>
>> and soft offline are called on a single hugetlb
page concurrently. But,<br>
>> it return EBUSY and report FAIL in soft-offline a
moving hugepage as a<br>
>> result sometimes.<br>
>><br>
>> The root cause seems a call to page_huge_active
return false, then the<br>
>> soft offline action will failed to isolate hugepage
with EBUSY return as<br>
>> below call trace:<br>
>><br>
>> In Parent:<br>
>> madvise(..., MADV_SOFT_OFFLINE)<br>
>> ...<br>
>> soft_offline_page<br>
>> soft_offline_in_use_page<br>
>> soft_offline_huge_page<br>
>> isolate_huge_page<br>
>> page_huge_active --> return false
at here<br>
>><br>
>> In Child:<br>
>> move_pages()<br>
>> ...<br>
>> do_move_pages<br>
>> do_move_pages_to_node<br>
>> add_page_for_migration<br>
>> isolate_huge_page --> it has already
isolated the hugepage<br>
>><br>
>> In this patch, I simply regard the returned EBUSY
as a normal situation and<br>
>> mask it in error handler. Because move_pages is
calling add_page_for_migration<br>
>> to isolate hugepage before do migration, so that's
very possible to hit the<br>
>> collision and return EBUSY on the same page.<br>
>><br>
>> Error log:<br>
>> ----------<br>
>> move_pages12.c:235: INFO: Free RAM 8386256 kB<br>
>> move_pages12.c:253: INFO: Increasing 2048kB
hugepages pool on node 0 to 4<br>
>> move_pages12.c:263: INFO: Increasing 2048kB
hugepages pool on node 1 to 6<br>
>> move_pages12.c:179: INFO: Allocating and freeing 4
hugepages on node 0<br>
>> move_pages12.c:179: INFO: Allocating and freeing 4
hugepages on node 1<br>
>> move_pages12.c:169: PASS: Bug not reproduced<br>
>> move_pages12.c:81: FAIL: madvise failed: SUCCESS<br>
>> move_pages12.c:81: FAIL: madvise failed: SUCCESS<br>
>> move_pages12.c:143: BROK:
mmap((nil),4194304,3,262178,-1,0) failed: ENOMEM<br>
>> move_pages12.c:114: FAIL: move_pages failed: EINVAL<br>
>><br>
>> Dmesg:<br>
>> ------<br>
>> [165435.492170] soft offline: 0x61c00 hugepage
failed to isolate<br>
>> [165435.590252] soft offline: 0x61c00 hugepage
failed to isolate<br>
>> [165435.725493] soft offline: 0x61400 hugepage
failed to isolate<br>
>><br>
>> Other two fixes in this patch:<br>
>> * use TERRNO(but not TTERRNO) to catch
madvise(..., MADV_SOFT_OFFLINE) errno<br>
>> * go out test when hugepage allocating failed with
ENOMEM<br>
Hi Li<br>
<br>
Your patch can handle EBUSY errno correctly for soft
offline. <br>
But move page may be killed by SIGBUS because of MCE when
we soft offline concurrently. <br>
That leads to move_page failed with ESRCH. Also, move page
may fails with ENOMEM .<br>
Do you notice it ?<br>
</blockquote>
<div><br>
</div>
<div>
<div class="gmail_default" style="font-size: small;">I
didn't get this failure, it seems not related to this
patch. Two questions:</div>
<div class="gmail_default" style="font-size: small;"><br>
</div>
<div class="gmail_default" style="font-size: small;">1.
which kernel version do you test?</div>
<div class="gmail_default" style="font-size: small;">2. can
you reproduce this without my patch?</div>
</div>
</div>
</div>
</blockquote>
Hi Li<br>
<br>
I test it on 3.10.0-957.el7.x86_64 kvm(my machine was not support
numa and i enable it on kvm. as below: <br>
<cpu mode='custom' match='exact' check='full'><br>
<model fallback='forbid'>Penryn</model><br>
<feature policy='require' name='x2apic'/><br>
<feature policy='require' name='hypervisor'/><br>
<numa><br>
<cell id='0' cpus='0' memory='1048576' unit='KiB'/><br>
<cell id='1' cpus='1' memory='1048576' unit='KiB'/><br>
</numa><br>
</cpu><br>
<br>
Does it only exist on kvm and doesn't exist on physical machine? I
don't have physical machine that supports numa.<br>
<br>
And the fix patch has been merged since 3.10.0-957.el7.x86_64 .<br>
Yes, I can reproduce this without your patch because MCE kills
child process and move_page gets ESRCH error.<br>
<br>
<br>
<blockquote
cite="mid:CAEemH2c+CWAOmAH=1WT+GR-iZ=5RoDcCmD=-zBpc63PHg6xXyQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin: 0px 0px 0px
0.8ex; border-left: 1px solid rgb(204, 204, 204);
padding-left: 1ex;">
<br>
I think ESRCH error can represent the soft offline bug not
reproduce because it don't trigger a crash.<br>
What do you think about it?<br>
</blockquote>
<div><br>
</div>
<div class="gmail_default" style="font-size: small;">Maybe,
but it needs to check details on your kernel.</div>
<blockquote class="gmail_quote" style="margin: 0px 0px 0px
0.8ex; border-left: 1px solid rgb(204, 204, 204);
padding-left: 1ex;">
<br>
err_log:<br>
tst_test.c:1096: INFO: Timeout per run is 0h 05m 00s<br>
move_pages12.c:236: INFO: Free RAM 119568 kB<br>
move_pages12.c:254: INFO: Increasing 2048kB hugepages pool
on node 0 to 83<br>
move_pages12.c:264: INFO: Increasing 2048kB hugepages pool
on node 1 to 94<br>
move_pages12.c:180: INFO: Allocating and freeing 4 hugepages
on node 0<br>
move_pages12.c:180: INFO: Allocating and freeing 4 hugepages
on node 1<br>
move_pages12.c:170: PASS: Bug not reproduced<br>
tst_test.c:1141: BROK: Test killed by SIGBUS!<br>
<br>
Summary:<br>
passed 1<br>
failed 0<br>
skipped 0<br>
warnings 0<br>
<br>
move_pages12.c:114: FAIL: move_pages failed: ESRCH<br>
<br>
dmesg<br>
[ 9868.180669] MCE: Killing move_pages12:29616 due to
hardware memory corruption fault at 2aaaaac00018<br>
[ 9990.049875] Soft offlining page 50e00 at 2aaaaac00000<br>
[ 9990.052218] Soft offlining page 50c00 at 2aaaaae00000<br>
[ 9990.060395] Soft offlining page 51000 at 2aaaaac00000<br>
<br>
Kind Regards,<br>
Yang Xu<br>
<br>
>> Signed-off-by: Li Wang <<a
moz-do-not-send="true" href="mailto:liwang@redhat.com"
target="_blank">liwang@redhat.com</a>><br>
>> Cc: Naoya Horiguchi <<a moz-do-not-send="true"
href="mailto:n-horiguchi@ah.jp.nec.com" target="_blank">n-horiguchi@ah.jp.nec.com</a>><br>
>> Cc: Xiao Yang <<a moz-do-not-send="true"
href="mailto:yangx.jy@cn.fujitsu.com" target="_blank">yangx.jy@cn.fujitsu.com</a>><br>
>> Cc: Yang Xu <<a moz-do-not-send="true"
href="mailto:xuyang2018.jy@cn.fujitsu.com" target="_blank">xuyang2018.jy@cn.fujitsu.com</a>><br>
>> ---<br>
>> .../kernel/syscalls/move_pages/move_pages12.c | 33
++++++++++++++-----<br>
>> 1 file changed, 24 insertions(+), 9 deletions(-)<br>
>><br>
>> diff --git
a/testcases/kernel/syscalls/move_pages/move_pages12.c
b/testcases/kernel/syscalls/move_pages/move_pages12.c<br>
>> index 964b712fb..c446396dc 100644<br>
>> ---
a/testcases/kernel/syscalls/move_pages/move_pages12.c<br>
>> +++
b/testcases/kernel/syscalls/move_pages/move_pages12.c<br>
>> @@ -77,8 +77,8 @@ static void *addr;<br>
>> static int do_soft_offline(int tpgs)<br>
>> {<br>
>> if (madvise(addr, tpgs * hpsz,
MADV_SOFT_OFFLINE) == -1) {<br>
>> - if (errno != EINVAL)<br>
>> - tst_res(TFAIL | TTERRNO,
"madvise failed");<br>
>> + if (errno != EINVAL && errno
!= EBUSY)<br>
>> + tst_res(TFAIL | TERRNO,
"madvise failed");<br>
>> return errno;<br>
>> }<br>
>> return 0;<br>
>> @@ -121,7 +121,8 @@ static void do_child(int tpgs)<br>
>> <br>
>> static void do_test(unsigned int n)<br>
>> {<br>
>> - int i;<br>
>> + int i, ret;<br>
>> + void *ptr;<br>
>> pid_t cpid = -1;<br>
>> int status;<br>
>> unsigned int twenty_percent =
(tst_timeout_remaining() / 5);<br>
>> @@ -136,24 +137,37 @@ static void do_test(unsigned
int n)<br>
>> do_child(tcases[n].tpages);<br>
>> <br>
>> for (i = 0; i < LOOPS; i++) {<br>
>> - void *ptr;<br>
>> + ptr = mmap(NULL, tcases[n].tpages *
hpsz,<br>
>> + PROT_READ |
PROT_WRITE,<br>
>> + MAP_PRIVATE |
MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);<br>
>> + if (ptr == MAP_FAILED) {<br>
>> + if (errno == ENOMEM) {<br>
>> + tst_res(TCONF,<br>
>> + "Cannot
allocate hugepage, memory too fragmented?");<br>
>> + goto out;<br>
>> + }<br>
>> +<br>
>> + tst_brk(TBROK | TERRNO,
"Cannot allocate hugepage");<br>
>> + }<br>
>> <br>
>> - ptr = SAFE_MMAP(NULL, tcases[n].tpages
* hpsz,<br>
>> - PROT_READ | PROT_WRITE,<br>
>> - MAP_PRIVATE | MAP_ANONYMOUS |
MAP_HUGETLB, -1, 0);<br>
>> if (ptr != addr)<br>
>> tst_brk(TBROK, "Failed to mmap
at desired addr");<br>
>> <br>
>> memset(addr, 0, tcases[n].tpages *
hpsz);<br>
>> <br>
>> if (tcases[n].offline) {<br>
>> - if
(do_soft_offline(tcases[n].tpages) == EINVAL) {<br>
>> + ret =
do_soft_offline(tcases[n].tpages);<br>
>> +<br>
>> + if (ret == EINVAL) {<br>
>> SAFE_KILL(cpid,
SIGKILL);<br>
>> SAFE_WAITPID(cpid,
&status, 0);<br>
>> SAFE_MUNMAP(addr,
tcases[n].tpages * hpsz);<br>
>> tst_res(TCONF,<br>
>> "madvise()
didn't support MADV_SOFT_OFFLINE");<br>
>> return;<br>
>> + } else if (ret == EBUSY) {<br>
>> + SAFE_MUNMAP(addr,
tcases[n].tpages * hpsz);<br>
>> + goto out;<br>
>> }<br>
>> }<br>
>> <br>
>> @@ -163,9 +177,10 @@ static void do_test(unsigned
int n)<br>
>> break;<br>
>> }<br>
>> <br>
>> +out:<br>
>> SAFE_KILL(cpid, SIGKILL);<br>
>> SAFE_WAITPID(cpid, &status, 0);<br>
>> - if (!WIFEXITED(status))<br>
>> + if (!WIFEXITED(status) && ptr !=
MAP_FAILED)<br>
>> tst_res(TPASS, "Bug not reproduced");<br>
>> }<br>
>> <br>
>> -- <br>
>> 2.20.1<br>
>><br>
>><br>
><br>
> .<br>
><br>
<br>
<br>
<br>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr" class="gmail_signature">
<div dir="ltr">
<div>Regards,<br>
</div>
<div>Li Wang<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</body>
</html>