[LTP] [PATCH RFC] move_pages12: handle errno EBUSY for madvise(..., MADV_SOFT_OFFLINE)

Thu Jun 27 04:50:30 CEST 2019

on 2019/06/24 10:43, Li Wang wrote:

> Hi Xu Yang,
>
> On Fri, Jun 21, 2019 at 1:58 PM Yang Xu <xuyang2018.jy@cn.fujitsu.com 
> <mailto:xuyang2018.jy@cn.fujitsu.com>> wrote:
>
>
>     > Hi Li Wang,
>     >
>     > Thank you for maintaining the testcase.
>     >
>     > Recently (since 4.19) we have a semantics change on the return
>     value of
>     > madvise(MADV_SOFT_OFFLINE), and we see -EBUSY when hugepage
>     migration
>     > succeeded and error containment failed:
>     >
>     >   commit 6bc9b56433b76e40d11099338d27fbc5cd2935ca
>     >   Author: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com
>     <mailto:n-horiguchi@ah.jp.nec.com>>
>     >   Date:   Thu Aug 23 17:00:38 2018 -0700
>     >
>     >       mm: fix race on soft-offlining free huge pages
>     >
>     > , so we don't have to consider this EBUSY as error, but a good
>     report
>     > for application. Your change meets the change.
>     >
>     > Feel free to add my ack:
>     >
>     > Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com
>     <mailto:n-horiguchi@ah.jp.nec.com>>
>     >
>     > Thanks,
>     > - Naoya
>     >
>     > On Fri, Jun 07, 2019 at 05:52:13PM +0800, Li Wang wrote:
>     >> The test#2 is going to simulate the race condition, where
>     move_pages()
>     >> and soft offline are called on a single hugetlb page
>     concurrently. But,
>     >> it return EBUSY and report FAIL in soft-offline a moving
>     hugepage as a
>     >> result sometimes.
>     >>
>     >> The root cause seems a call to page_huge_active return false,
>     then the
>     >> soft offline action will failed to isolate hugepage with EBUSY
>     return as
>     >> below call trace:
>     >>
>     >> In Parent:
>     >>   madvise(..., MADV_SOFT_OFFLINE)
>     >>   ...
>     >>     soft_offline_page
>     >>       soft_offline_in_use_page
>     >>         soft_offline_huge_page
>     >>           isolate_huge_page
>     >>             page_huge_active  --> return false at here
>     >>
>     >> In Child:
>     >>   move_pages()
>     >>   ...
>     >>     do_move_pages
>     >>       do_move_pages_to_node
>     >>         add_page_for_migration
>     >>           isolate_huge_page   --> it has already isolated the
>     hugepage
>     >>
>     >> In this patch, I simply regard the returned EBUSY as a normal
>     situation and
>     >> mask it in error handler. Because move_pages is calling
>     add_page_for_migration
>     >> to isolate hugepage before do migration, so that's very
>     possible to hit the
>     >> collision and return EBUSY on the same page.
>     >>
>     >> Error log:
>     >> ----------
>     >> move_pages12.c:235: INFO: Free RAM 8386256 kB
>     >> move_pages12.c:253: INFO: Increasing 2048kB hugepages pool on
>     node 0 to 4
>     >> move_pages12.c:263: INFO: Increasing 2048kB hugepages pool on
>     node 1 to 6
>     >> move_pages12.c:179: INFO: Allocating and freeing 4 hugepages on
>     node 0
>     >> move_pages12.c:179: INFO: Allocating and freeing 4 hugepages on
>     node 1
>     >> move_pages12.c:169: PASS: Bug not reproduced
>     >> move_pages12.c:81: FAIL: madvise failed: SUCCESS
>     >> move_pages12.c:81: FAIL: madvise failed: SUCCESS
>     >> move_pages12.c:143: BROK: mmap((nil),4194304,3,262178,-1,0)
>     failed: ENOMEM
>     >> move_pages12.c:114: FAIL: move_pages failed: EINVAL
>     >>
>     >> Dmesg:
>     >> ------
>     >> [165435.492170] soft offline: 0x61c00 hugepage failed to isolate
>     >> [165435.590252] soft offline: 0x61c00 hugepage failed to isolate
>     >> [165435.725493] soft offline: 0x61400 hugepage failed to isolate
>     >>
>     >> Other two fixes in this patch:
>     >>  * use TERRNO(but not TTERRNO) to catch madvise(...,
>     MADV_SOFT_OFFLINE) errno
>     >>  * go out test when hugepage allocating failed with ENOMEM
>     Hi Li
>
>     Your patch can handle EBUSY errno correctly for soft offline.
>     But move page  may be killed by SIGBUS because of  MCE  when we
>     soft offline concurrently.
>     That leads to move_page failed with ESRCH.   Also, move page may
>     fails with ENOMEM .
>     Do you notice it ?
>
>
> I didn't get this failure, it seems not related to this patch. Two 
> questions:
>
> 1. which kernel version do you test?
> 2. can you reproduce this without my patch?
Hi Li

I test it on 3.10.0-957.el7.x86_64  kvm(my machine was not support numa 
and i enable it on kvm. as below:
<cpu mode='custom' match='exact' check='full'>
<model fallback='forbid'>Penryn</model>
<feature policy='require' name='x2apic'/>
<feature policy='require' name='hypervisor'/>
<numa>
<cell id='0' cpus='0' memory='1048576' unit='KiB'/>
<cell id='1' cpus='1' memory='1048576' unit='KiB'/>
</numa>
</cpu>

Does it only exist on kvm and doesn't  exist on physical machine?  I 
don't have physical machine that supports numa.

And the fix patch has been merged since  3.10.0-957.el7.x86_64 .
Yes,   I can reproduce this without your patch because MCE kills child 
process and move_page gets ESRCH error.

>
>
>     I think ESRCH error can represent the soft offline bug not
>     reproduce because it don't trigger a crash.
>     What do you think about it?
>
>
> Maybe, but it needs to check details on your kernel.
>
>
>     err_log:
>     tst_test.c:1096: INFO: Timeout per run is 0h 05m 00s
>     move_pages12.c:236: INFO: Free RAM 119568 kB
>     move_pages12.c:254: INFO: Increasing 2048kB hugepages pool on node
>     0 to 83
>     move_pages12.c:264: INFO: Increasing 2048kB hugepages pool on node
>     1 to 94
>     move_pages12.c:180: INFO: Allocating and freeing 4 hugepages on node 0
>     move_pages12.c:180: INFO: Allocating and freeing 4 hugepages on node 1
>     move_pages12.c:170: PASS: Bug not reproduced
>     tst_test.c:1141: BROK: Test killed by SIGBUS!
>
>     Summary:
>     passed   1
>     failed   0
>     skipped  0
>     warnings 0
>
>     move_pages12.c:114: FAIL: move_pages failed: ESRCH
>
>     dmesg
>     [ 9868.180669] MCE: Killing move_pages12:29616 due to hardware
>     memory corruption fault at 2aaaaac00018
>     [ 9990.049875] Soft offlining page 50e00 at 2aaaaac00000
>     [ 9990.052218] Soft offlining page 50c00 at 2aaaaae00000
>     [ 9990.060395] Soft offlining page 51000 at 2aaaaac00000
>
>     Kind Regards,
>     Yang Xu
>
>     >> Signed-off-by: Li Wang <liwang@redhat.com
>     <mailto:liwang@redhat.com>>
>     >> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com
>     <mailto:n-horiguchi@ah.jp.nec.com>>
>     >> Cc: Xiao Yang <yangx.jy@cn.fujitsu.com
>     <mailto:yangx.jy@cn.fujitsu.com>>
>     >> Cc: Yang Xu <xuyang2018.jy@cn.fujitsu.com
>     <mailto:xuyang2018.jy@cn.fujitsu.com>>
>     >> ---
>     >>  .../kernel/syscalls/move_pages/move_pages12.c | 33
>     ++++++++++++++-----
>     >>  1 file changed, 24 insertions(+), 9 deletions(-)
>     >>
>     >> diff --git
>     a/testcases/kernel/syscalls/move_pages/move_pages12.c
>     b/testcases/kernel/syscalls/move_pages/move_pages12.c
>     >> index 964b712fb..c446396dc 100644
>     >> --- a/testcases/kernel/syscalls/move_pages/move_pages12.c
>     >> +++ b/testcases/kernel/syscalls/move_pages/move_pages12.c
>     >> @@ -77,8 +77,8 @@ static void *addr;
>     >>  static int do_soft_offline(int tpgs)
>     >>  {
>     >>      if (madvise(addr, tpgs * hpsz, MADV_SOFT_OFFLINE) == -1) {
>     >> -            if (errno != EINVAL)
>     >> -                    tst_res(TFAIL | TTERRNO, "madvise failed");
>     >> +            if (errno != EINVAL && errno != EBUSY)
>     >> +                    tst_res(TFAIL | TERRNO, "madvise failed");
>     >>              return errno;
>     >>      }
>     >>      return 0;
>     >> @@ -121,7 +121,8 @@ static void do_child(int tpgs)
>     >>
>     >>  static void do_test(unsigned int n)
>     >>  {
>     >> -    int i;
>     >> +    int i, ret;
>     >> +    void *ptr;
>     >>      pid_t cpid = -1;
>     >>      int status;
>     >>      unsigned int twenty_percent = (tst_timeout_remaining() / 5);
>     >> @@ -136,24 +137,37 @@ static void do_test(unsigned int n)
>     >>              do_child(tcases[n].tpages);
>     >>
>     >>      for (i = 0; i < LOOPS; i++) {
>     >> -            void *ptr;
>     >> +            ptr = mmap(NULL, tcases[n].tpages * hpsz,
>     >> +                            PROT_READ | PROT_WRITE,
>     >> +                            MAP_PRIVATE | MAP_ANONYMOUS |
>     MAP_HUGETLB, -1, 0);
>     >> +            if (ptr == MAP_FAILED) {
>     >> +                    if (errno == ENOMEM) {
>     >> +                            tst_res(TCONF,
>     >> +                                    "Cannot allocate hugepage,
>     memory too fragmented?");
>     >> +                            goto out;
>     >> +                    }
>     >> +
>     >> +                    tst_brk(TBROK | TERRNO, "Cannot allocate
>     hugepage");
>     >> +            }
>     >>
>     >> -            ptr = SAFE_MMAP(NULL, tcases[n].tpages * hpsz,
>     >> -                    PROT_READ | PROT_WRITE,
>     >> -                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
>     -1, 0);
>     >>              if (ptr != addr)
>     >>                      tst_brk(TBROK, "Failed to mmap at desired
>     addr");
>     >>
>     >>              memset(addr, 0, tcases[n].tpages * hpsz);
>     >>
>     >>              if (tcases[n].offline) {
>     >> -                    if (do_soft_offline(tcases[n].tpages) ==
>     EINVAL) {
>     >> +                    ret = do_soft_offline(tcases[n].tpages);
>     >> +
>     >> +                    if (ret == EINVAL) {
>     >>                              SAFE_KILL(cpid, SIGKILL);
>     >>                              SAFE_WAITPID(cpid, &status, 0);
>     >>                              SAFE_MUNMAP(addr, tcases[n].tpages
>     * hpsz);
>     >>                              tst_res(TCONF,
>     >>                                      "madvise() didn't support
>     MADV_SOFT_OFFLINE");
>     >>                              return;
>     >> +                    } else if (ret == EBUSY) {
>     >> +                            SAFE_MUNMAP(addr, tcases[n].tpages
>     * hpsz);
>     >> +                            goto out;
>     >>                      }
>     >>              }
>     >>
>     >> @@ -163,9 +177,10 @@ static void do_test(unsigned int n)
>     >>                      break;
>     >>      }
>     >>
>     >> +out:
>     >>      SAFE_KILL(cpid, SIGKILL);
>     >>      SAFE_WAITPID(cpid, &status, 0);
>     >> -    if (!WIFEXITED(status))
>     >> +    if (!WIFEXITED(status) && ptr != MAP_FAILED)
>     >>              tst_res(TPASS, "Bug not reproduced");
>     >>  }
>     >>
>     >> --
>     >> 2.20.1
>     >>
>     >>
>     >
>     > .
>     >
>
>
>
>
>
> -- 
> Regards,
> Li Wang

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linux.it/pipermail/ltp/attachments/20190627/b8863f73/attachment-0001.htm>