<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#ffffff">
    <pre>On 2019/06/24 10:43, Li Wang wrote:</pre>
    <blockquote
cite="mid:CAEemH2c+CWAOmAH=1WT+GR-iZ=5RoDcCmD=-zBpc63PHg6xXyQ@mail.gmail.com"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div dir="ltr">
          <div class="gmail_default" style="font-size: small;">Hi Xu
            Yang,</div>
        </div>
        <br>
        <div class="gmail_quote">
          <div dir="ltr" class="gmail_attr">On Fri, Jun 21, 2019 at 1:58
            PM Yang Xu <<a moz-do-not-send="true"
              href="mailto:xuyang2018.jy@cn.fujitsu.com">xuyang2018.jy@cn.fujitsu.com</a>>
            wrote:<br>
          </div>
          <blockquote class="gmail_quote" style="margin: 0px 0px 0px
            0.8ex; border-left: 1px solid rgb(204, 204, 204);
            padding-left: 1ex;"><br>
            > Hi Li Wang,<br>
            ><br>
            > Thank you for maintaining the testcase.<br>
            ><br>
            > Recently (since 4.19) there has been a semantic change in the
            return value of<br>
            > madvise(MADV_SOFT_OFFLINE): we see -EBUSY when
            hugepage migration<br>
            > succeeded but error containment failed:<br>
            ><br>
            >   commit 6bc9b56433b76e40d11099338d27fbc5cd2935ca<br>
            >   Author: Naoya Horiguchi <<a moz-do-not-send="true"
              href="mailto:n-horiguchi@ah.jp.nec.com" target="_blank">n-horiguchi@ah.jp.nec.com</a>><br>
            >   Date:   Thu Aug 23 17:00:38 2018 -0700<br>
            >   <br>
            >       mm: fix race on soft-offlining free huge pages<br>
            ><br>
            > So we don't have to treat this EBUSY as an error; it is
            a valid report<br>
            > to the application. Your change is consistent with that.<br>
            ><br>
            > Feel free to add my ack:<br>
            ><br>
            > Acked-by: Naoya Horiguchi <<a moz-do-not-send="true"
              href="mailto:n-horiguchi@ah.jp.nec.com" target="_blank">n-horiguchi@ah.jp.nec.com</a>><br>
            ><br>
            > Thanks,<br>
            > - Naoya<br>
            ><br>
            > On Fri, Jun 07, 2019 at 05:52:13PM +0800, Li Wang
            wrote:<br>
            >> Test #2 simulates the race condition where move_pages()<br>
            >> and soft offline are called on a single hugetlb
            page concurrently. However,<br>
            >> soft-offlining a hugepage that is being moved sometimes
            returns EBUSY,<br>
            >> and the test reports FAIL as a result.<br>
            >><br>
            >> The root cause seems to be that page_huge_active()
            returns false, so the<br>
            >> soft offline fails to isolate the hugepage and returns
            EBUSY, as in the<br>
            >> call trace below:<br>
            >><br>
            >> In Parent:<br>
            >>   madvise(..., MADV_SOFT_OFFLINE)<br>
            >>   ...<br>
            >>     soft_offline_page<br>
            >>       soft_offline_in_use_page<br>
            >>         soft_offline_huge_page<br>
            >>           isolate_huge_page<br>
            >>             page_huge_active  --> returns false
            here<br>
            >><br>
            >> In Child:<br>
            >>   move_pages()<br>
            >>   ...<br>
            >>     do_move_pages<br>
            >>       do_move_pages_to_node<br>
            >>         add_page_for_migration<br>
            >>           isolate_huge_page   --> it has already
            isolated the hugepage<br>
            >><br>
            >> In this patch, I simply treat the returned EBUSY
            as a normal situation and<br>
            >> mask it in the error handler. Because move_pages()
            calls add_page_for_migration()<br>
            >> to isolate the hugepage before migrating it, it is
            quite possible to hit the<br>
            >> collision and get EBUSY on the same page.<br>
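            The errno handling described above can be sketched as a small classifier. This is a minimal sketch, not the LTP code: the enum and classify_soft_offline() are hypothetical names. The mapping only mirrors the patch: EINVAL means MADV_SOFT_OFFLINE is unsupported (TCONF), EBUSY is a collision with a concurrent move_pages() and is not an error, and any other errno is a failure.<br>

```c
#include <errno.h>

/* Hypothetical verdicts; the real test reports through tst_res() instead. */
enum offline_verdict {
	OFFLINE_OK,          /* madvise() succeeded */
	OFFLINE_UNSUPPORTED, /* EINVAL: no MADV_SOFT_OFFLINE support, report TCONF */
	OFFLINE_BUSY,        /* EBUSY: hugepage already isolated by move_pages(), benign */
	OFFLINE_FAIL,        /* any other errno is a real failure */
};

/* Map the result of madvise(addr, len, MADV_SOFT_OFFLINE) to a verdict. */
static enum offline_verdict classify_soft_offline(int ret, int err)
{
	if (ret == 0)
		return OFFLINE_OK;

	switch (err) {
	case EINVAL:
		return OFFLINE_UNSUPPORTED;
	case EBUSY:
		return OFFLINE_BUSY;
	default:
		return OFFLINE_FAIL;
	}
}
```

            In do_test(), the OFFLINE_BUSY case corresponds to the new "goto out" path rather than a TFAIL.<br>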
            >><br>
            >> Error log:<br>
            >> ----------<br>
            >> move_pages12.c:235: INFO: Free RAM 8386256 kB<br>
            >> move_pages12.c:253: INFO: Increasing 2048kB
            hugepages pool on node 0 to 4<br>
            >> move_pages12.c:263: INFO: Increasing 2048kB
            hugepages pool on node 1 to 6<br>
            >> move_pages12.c:179: INFO: Allocating and freeing 4
            hugepages on node 0<br>
            >> move_pages12.c:179: INFO: Allocating and freeing 4
            hugepages on node 1<br>
            >> move_pages12.c:169: PASS: Bug not reproduced<br>
            >> move_pages12.c:81: FAIL: madvise failed: SUCCESS<br>
            >> move_pages12.c:81: FAIL: madvise failed: SUCCESS<br>
            >> move_pages12.c:143: BROK:
            mmap((nil),4194304,3,262178,-1,0) failed: ENOMEM<br>
            >> move_pages12.c:114: FAIL: move_pages failed: EINVAL<br>
            >><br>
            >> Dmesg:<br>
            >> ------<br>
            >> [165435.492170] soft offline: 0x61c00 hugepage
            failed to isolate<br>
            >> [165435.590252] soft offline: 0x61c00 hugepage
            failed to isolate<br>
            >> [165435.725493] soft offline: 0x61400 hugepage
            failed to isolate<br>
            >><br>
            >> Two other fixes in this patch:<br>
            >>  * use TERRNO (not TTERRNO) to report the errno of
            madvise(..., MADV_SOFT_OFFLINE)<br>
            >>  * end the test with TCONF when hugepage allocation
            fails with ENOMEM<br>
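            In LTP, TERRNO appends the current errno, while TTERRNO appends the errno saved by the TEST() macro (TST_ERR). Because madvise() is called directly here rather than through TEST(), the saved value is still 0, which is most likely why the error log shows "madvise failed: SUCCESS". A minimal sketch of the difference follows; saved_err, test_call() and busy_call() are hypothetical stand-ins for TST_ERR, TEST() and the failing madvise() call, not LTP code.<br>

```c
#include <errno.h>

/* Hypothetical stand-in for LTP's TST_ERR: only test_call() updates it. */
static int saved_err;

/* Hypothetical TEST()-style wrapper: run fn and remember the errno it left. */
static long test_call(long (*fn)(void))
{
	long ret;

	errno = 0;
	ret = fn();
	saved_err = errno;
	return ret;
}

/* Stands in for a madvise() call that fails with EBUSY. */
static long busy_call(void)
{
	errno = EBUSY;
	return -1;
}

/* TERRNO-like: the live errno right after the call. */
static int terrno_value(void)
{
	return errno;
}

/* TTERRNO-like: whatever the last TEST()-style wrapper saved. */
static int tterrno_value(void)
{
	return saved_err;
}
```

            Calling busy_call() directly leaves terrno_value() at EBUSY while tterrno_value() stays 0; only a call routed through test_call() updates the saved value.<br>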
            <br>
            Hi Li,<br>
            <br>
            Your patch handles the EBUSY errno from soft offline correctly.
            <br>
            But the move_pages12 process may be killed by SIGBUS due to an
            MCE when we soft offline concurrently,<br>
            which makes move_pages() fail with ESRCH. move_pages() may also
            fail with ENOMEM.<br>
            Have you noticed this?<br>
          </blockquote>
          <div><br>
          </div>
          <div>
            <div class="gmail_default" style="font-size: small;">I
              didn't get this failure, it seems not related to this
              patch. Two questions:</div>
            <div class="gmail_default" style="font-size: small;"><br>
            </div>
            <div class="gmail_default" style="font-size: small;">1.
              which kernel version do you test?</div>
            <div class="gmail_default" style="font-size: small;">2. can
              you reproduce this without my patch?</div>
          </div>
        </div>
      </div>
    </blockquote>
    Hi Li<br>
    <br>
    I tested it on 3.10.0-957.el7.x86_64 in a KVM guest (my machine does
    not support NUMA, so I enabled it in KVM as below):<br>
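    &lt;cpu mode='custom' match='exact' check='full'&gt;<br>
    &nbsp;&nbsp;&lt;model fallback='forbid'&gt;Penryn&lt;/model&gt;<br>
    &nbsp;&nbsp;&lt;feature policy='require' name='x2apic'/&gt;<br>
    &nbsp;&nbsp;&lt;feature policy='require' name='hypervisor'/&gt;<br>
    &nbsp;&nbsp;&lt;numa&gt;<br>
    &nbsp;&nbsp;&nbsp;&nbsp;&lt;cell id='0' cpus='0' memory='1048576' unit='KiB'/&gt;<br>
    &nbsp;&nbsp;&nbsp;&nbsp;&lt;cell id='1' cpus='1' memory='1048576' unit='KiB'/&gt;<br>
    &nbsp;&nbsp;&lt;/numa&gt;<br>
    &lt;/cpu&gt;<br>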
    <br>
    Does this only happen on KVM and not on a physical machine? I don't
    have a physical machine that supports NUMA.<br>
    <br>
    The fix patch is already included in 3.10.0-957.el7.x86_64.<br>
    And yes, I can reproduce this without your patch, because the MCE
    kills the child process and move_pages() gets an ESRCH error.<br>
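    To tell such an MCE kill apart from a normal exit, the waiting process can decode the wait status with WIFSIGNALED() and WTERMSIG(). A minimal sketch, assuming the child simply raises SIGBUS itself as a stand-in for the MCE handler; child_killed_by() is a hypothetical helper, not part of move_pages12.c.<br>

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that terminates itself with `sig`
 * (a stand-in for the MCE handler sending SIGBUS), or exits normally
 * when sig == 0. Returns 1 if the wait status shows the child was killed
 * by `sig`, 0 otherwise, and -1 if fork() or waitpid() fails.
 */
static int child_killed_by(int sig)
{
	int status;
	pid_t pid = fork();

	if (pid == -1)
		return -1;

	if (pid == 0) {
		if (sig != 0) {
			signal(sig, SIG_DFL); /* undo an inherited SIG_IGN */
			raise(sig);           /* default action terminates the child */
		}
		_exit(0);
	}

	if (waitpid(pid, &status, 0) == -1)
		return -1;

	return WIFSIGNALED(status) && WTERMSIG(status) == sig;
}
```

    child_killed_by(SIGBUS) returns 1 because the child dies from the signal, while child_killed_by(0) returns 0 because that child exits normally.<br>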
    <br>
    <br>
    <blockquote
cite="mid:CAEemH2c+CWAOmAH=1WT+GR-iZ=5RoDcCmD=-zBpc63PHg6xXyQ@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_quote">
          <div><br>
          </div>
          <div> </div>
          <blockquote class="gmail_quote" style="margin: 0px 0px 0px
            0.8ex; border-left: 1px solid rgb(204, 204, 204);
            padding-left: 1ex;">
            <br>
            I think an ESRCH error can be taken to mean that the soft offline
            bug was not reproduced, because it doesn't trigger a crash.<br>
            What do you think?<br>
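            The proposed policy could look roughly like the sketch below: ESRCH from move_pages() means the target process no longer exists, and the sketch reports it separately instead of as a failure. This only illustrates the proposal under discussion; the enum and classify_move_pages() are hypothetical, and whether ESRCH really means "not reproduced" depends on the kernel details.<br>

```c
#include <errno.h>

/* Hypothetical outcomes of one move_pages() call in the child. */
enum move_verdict {
	MOVE_OK,        /* 0, or a positive count of pages not migrated */
	MOVE_PEER_GONE, /* ESRCH: the target process no longer exists */
	MOVE_FAIL,      /* any other errno */
};

/* ret and err are move_pages()'s return value and the errno it left. */
static enum move_verdict classify_move_pages(long ret, int err)
{
	if (ret >= 0)
		return MOVE_OK;

	return err == ESRCH ? MOVE_PEER_GONE : MOVE_FAIL;
}
```

            In this sketch the ENOMEM case mentioned earlier would still count as a failure unless it gets its own branch.<br>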
          </blockquote>
          <div><br>
          </div>
          <div class="gmail_default" style="font-size: small;">Maybe,
            but it needs to check details on your kernel.</div>
          <blockquote class="gmail_quote" style="margin: 0px 0px 0px
            0.8ex; border-left: 1px solid rgb(204, 204, 204);
            padding-left: 1ex;">
            <br>
            err_log:<br>
            tst_test.c:1096: INFO: Timeout per run is 0h 05m 00s<br>
            move_pages12.c:236: INFO: Free RAM 119568 kB<br>
            move_pages12.c:254: INFO: Increasing 2048kB hugepages pool
            on node 0 to 83<br>
            move_pages12.c:264: INFO: Increasing 2048kB hugepages pool
            on node 1 to 94<br>
            move_pages12.c:180: INFO: Allocating and freeing 4 hugepages
            on node 0<br>
            move_pages12.c:180: INFO: Allocating and freeing 4 hugepages
            on node 1<br>
            move_pages12.c:170: PASS: Bug not reproduced<br>
            tst_test.c:1141: BROK: Test killed by SIGBUS!<br>
            <br>
            Summary:<br>
            passed   1<br>
            failed   0<br>
            skipped  0<br>
            warnings 0<br>
            <br>
            move_pages12.c:114: FAIL: move_pages failed: ESRCH<br>
            <br>
            dmesg<br>
            [ 9868.180669] MCE: Killing move_pages12:29616 due to
            hardware memory corruption fault at 2aaaaac00018<br>
            [ 9990.049875] Soft offlining page 50e00 at 2aaaaac00000<br>
            [ 9990.052218] Soft offlining page 50c00 at 2aaaaae00000<br>
            [ 9990.060395] Soft offlining page 51000 at 2aaaaac00000<br>
            <br>
            Kind Regards,<br>
            Yang Xu<br>
            <br>
            >> Signed-off-by: Li Wang <<a
              moz-do-not-send="true" href="mailto:liwang@redhat.com"
              target="_blank">liwang@redhat.com</a>><br>
            >> Cc: Naoya Horiguchi <<a moz-do-not-send="true"
              href="mailto:n-horiguchi@ah.jp.nec.com" target="_blank">n-horiguchi@ah.jp.nec.com</a>><br>
            >> Cc: Xiao Yang <<a moz-do-not-send="true"
              href="mailto:yangx.jy@cn.fujitsu.com" target="_blank">yangx.jy@cn.fujitsu.com</a>><br>
            >> Cc: Yang Xu <<a moz-do-not-send="true"
              href="mailto:xuyang2018.jy@cn.fujitsu.com" target="_blank">xuyang2018.jy@cn.fujitsu.com</a>><br>
            >> ---<br>
            >>  .../kernel/syscalls/move_pages/move_pages12.c | 33
            ++++++++++++++-----<br>
            >>  1 file changed, 24 insertions(+), 9 deletions(-)<br>
            >><br>
            >> diff --git
            a/testcases/kernel/syscalls/move_pages/move_pages12.c
            b/testcases/kernel/syscalls/move_pages/move_pages12.c<br>
            >> index 964b712fb..c446396dc 100644<br>
            >> ---
            a/testcases/kernel/syscalls/move_pages/move_pages12.c<br>
            >> +++
            b/testcases/kernel/syscalls/move_pages/move_pages12.c<br>
            >> @@ -77,8 +77,8 @@ static void *addr;<br>
            >>  static int do_soft_offline(int tpgs)<br>
            >>  {<br>
            >>      if (madvise(addr, tpgs * hpsz,
            MADV_SOFT_OFFLINE) == -1) {<br>
            >> -            if (errno != EINVAL)<br>
            >> -                    tst_res(TFAIL | TTERRNO,
            "madvise failed");<br>
            >> +            if (errno != EINVAL && errno
            != EBUSY)<br>
            >> +                    tst_res(TFAIL | TERRNO,
            "madvise failed");<br>
            >>              return errno;<br>
            >>      }<br>
            >>      return 0;<br>
            >> @@ -121,7 +121,8 @@ static void do_child(int tpgs)<br>
            >>  <br>
            >>  static void do_test(unsigned int n)<br>
            >>  {<br>
            >> -    int i;<br>
            >> +    int i, ret;<br>
            >> +    void *ptr;<br>
            >>      pid_t cpid = -1;<br>
            >>      int status;<br>
            >>      unsigned int twenty_percent =
            (tst_timeout_remaining() / 5);<br>
            >> @@ -136,24 +137,37 @@ static void do_test(unsigned
            int n)<br>
            >>              do_child(tcases[n].tpages);<br>
            >>  <br>
            >>      for (i = 0; i < LOOPS; i++) {<br>
            >> -            void *ptr;<br>
            >> +            ptr = mmap(NULL, tcases[n].tpages *
            hpsz,<br>
            >> +                            PROT_READ |
            PROT_WRITE,<br>
            >> +                            MAP_PRIVATE |
            MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);<br>
            >> +            if (ptr == MAP_FAILED) {<br>
            >> +                    if (errno == ENOMEM) {<br>
            >> +                            tst_res(TCONF,<br>
            >> +                                    "Cannot
            allocate hugepage, memory too fragmented?");<br>
            >> +                            goto out;<br>
            >> +                    }<br>
            >> +<br>
            >> +                    tst_brk(TBROK | TERRNO,
            "Cannot allocate hugepage");<br>
            >> +            }<br>
            >>  <br>
            >> -            ptr = SAFE_MMAP(NULL, tcases[n].tpages
            * hpsz,<br>
            >> -                    PROT_READ | PROT_WRITE,<br>
            >> -                    MAP_PRIVATE | MAP_ANONYMOUS |
            MAP_HUGETLB, -1, 0);<br>
            >>              if (ptr != addr)<br>
            >>                      tst_brk(TBROK, "Failed to mmap
            at desired addr");<br>
            >>  <br>
            >>              memset(addr, 0, tcases[n].tpages *
            hpsz);<br>
            >>  <br>
            >>              if (tcases[n].offline) {<br>
            >> -                    if
            (do_soft_offline(tcases[n].tpages) == EINVAL) {<br>
            >> +                    ret =
            do_soft_offline(tcases[n].tpages);<br>
            >> +<br>
            >> +                    if (ret == EINVAL) {<br>
            >>                              SAFE_KILL(cpid,
            SIGKILL);<br>
            >>                              SAFE_WAITPID(cpid,
            &status, 0);<br>
            >>                              SAFE_MUNMAP(addr,
            tcases[n].tpages * hpsz);<br>
            >>                              tst_res(TCONF,<br>
            >>                                      "madvise()
            didn't support MADV_SOFT_OFFLINE");<br>
            >>                              return;<br>
            >> +                    } else if (ret == EBUSY) {<br>
            >> +                            SAFE_MUNMAP(addr,
            tcases[n].tpages * hpsz);<br>
            >> +                            goto out;<br>
            >>                      }<br>
            >>              }<br>
            >>  <br>
            >> @@ -163,9 +177,10 @@ static void do_test(unsigned
            int n)<br>
            >>                      break;<br>
            >>      }<br>
            >>  <br>
            >> +out:<br>
            >>      SAFE_KILL(cpid, SIGKILL);<br>
            >>      SAFE_WAITPID(cpid, &status, 0);<br>
            >> -    if (!WIFEXITED(status))<br>
            >> +    if (!WIFEXITED(status) && ptr !=
            MAP_FAILED)<br>
            >>              tst_res(TPASS, "Bug not reproduced");<br>
            >>  }<br>
            >>  <br>
            >> -- <br>
            >> 2.20.1<br>
            >><br>
            >><br>
            ><br>
            ><br>
            <br>
            <br>
            <br>
          </blockquote>
        </div>
        <br clear="all">
        <div><br>
        </div>
        -- <br>
        <div dir="ltr" class="gmail_signature">
          <div dir="ltr">
            <div>Regards,<br>
            </div>
            <div>Li Wang<br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>