<div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-size:small">Hi Xu,</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jun 27, 2019 at 10:50 AM Yang Xu <<a href="mailto:xuyang2018.jy@cn.fujitsu.com">xuyang2018.jy@cn.fujitsu.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><u></u>

  <div bgcolor="#ffffff">

    <blockquote type="cite"><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail_default" style="font-size:small">...</span><br>

            Hi Li<br>

            <br>

            Your patch handles the EBUSY errno correctly for soft

            offline. <br>

            But the move_pages test may be killed by SIGBUS because of an MCE when

            we soft offline concurrently.  <br>

            That leads to move_pages failing with ESRCH.   Also, move_pages

            may fail with ENOMEM.<br>

            Did you notice this?<br>

          </blockquote>

          <div><br>

          </div>

          <div>

            <div style="font-size:small">I

              didn't get this failure; it does not seem to be related to this

              patch. Two questions:</div>

            <div style="font-size:small"><br>

            </div>

            <div style="font-size:small">1.

              Which kernel version did you test?</div>

            <div style="font-size:small">2. Can

              you reproduce this without my patch?</div>

          </div>

        </div>

      </div>

    </blockquote>

    Hi Li<br>

    <br>

    I tested it on 3.10.0-957.el7.x86_64 under KVM (my machine does not support

    NUMA, so I enabled it in the KVM guest), as below: <br>

     <cpu mode='custom' match='exact' check='full'><br>

        <model fallback='forbid'>Penryn</model><br>

        <feature policy='require' name='x2apic'/><br>

        <feature policy='require' name='hypervisor'/><br>

        <numa><br>

          <cell id='0' cpus='0' memory='1048576' unit='KiB'/><br>

          <cell id='1' cpus='1' memory='1048576' unit='KiB'/><br>

        </numa><br>

      </cpu><br>

    <br>

    Does it only exist on KVM and not on a physical machine?  I

    don't have a physical machine that supports NUMA.<br></div></blockquote><div><br></div><div><div class="gmail_default" style="font-size:small">I can reproduce your problem on bare metal too. It looks like you hit the bug described in commit 6bc9b56433b (<span class="gmail_default"></span>mm: fix race on soft-offlining free huge pages), which Naoya pointed out before:</div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">See:</div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">+               /*</div>+                * We set PG_hwpoison only when the migration source hugepage<br>+                * was successfully dissolved, because otherwise hwpoisoned<br>+                * hugepage remains on free hugepage list, then userspace will<br>+                * find it as SIGBUS by allocation failure. That's not expected<br>+                * in soft-offlining.<br>+                */<br>+               ret = dissolve_free_huge_page(page);<br>+               if (!ret) {<br>+                       if (set_hwpoison_free_buddy_page(page))<br>+                               num_poisoned_pages_inc();<br>+               }<br><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">This bug still exists in the latest RHEL7 kernel, so I will file a bug against the RHEL7 product.<br></div></div></div><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div>Regards,<br></div><div>Li Wang<br></div></div></div></div>