<div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-size:small">Hi Xu,</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jun 27, 2019 at 10:50 AM Yang Xu &lt;<a href="mailto:xuyang2018.jy@cn.fujitsu.com">xuyang2018.jy@cn.fujitsu.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><u></u>
<div bgcolor="#ffffff">
<blockquote type="cite"><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail_default" style="font-size:small">...</span><br>
Hi Li<br>
<br>
Your patch handles the EBUSY errno correctly for soft
offline.<br>
But the move_pages test may be killed by SIGBUS because of an MCE when
we soft offline concurrently.<br>
That leads to move_pages failing with ESRCH. Also, move_pages
may fail with ENOMEM.<br>
Did you notice this?<br>
</blockquote>
<div><br>
</div>
<div>
<div style="font-size:small">I
didn't hit this failure, and it doesn't seem to be related to this
patch. Two questions:</div>
<div style="font-size:small"><br>
</div>
<div style="font-size:small">1.
Which kernel version did you test?</div>
<div style="font-size:small">2. Can
you reproduce this without my patch?</div>
</div>
</div>
</div>
</blockquote>
Hi Li<br>
<br>
I tested it on a 3.10.0-957.el7.x86_64 KVM guest (my machine does not support
NUMA, so I enabled it in KVM as below):<br>
&lt;cpu mode='custom' match='exact' check='full'&gt;<br>
&lt;model fallback='forbid'&gt;Penryn&lt;/model&gt;<br>
&lt;feature policy='require' name='x2apic'/&gt;<br>
&lt;feature policy='require' name='hypervisor'/&gt;<br>
&lt;numa&gt;<br>
&lt;cell id='0' cpus='0' memory='1048576' unit='KiB'/&gt;<br>
&lt;cell id='1' cpus='1' memory='1048576' unit='KiB'/&gt;<br>
&lt;/numa&gt;<br>
&lt;/cpu&gt;<br>
<br>
Does this only happen on KVM and not on a physical machine? I
don't have a physical machine that supports NUMA.<br></div></blockquote><div><br></div><div><div class="gmail_default" style="font-size:small">I can reproduce your problem on bare metal too. It seems you hit the bug described in commit 6bc9b56433b (<span class="gmail_default"></span>"mm: fix race on soft-offlining free huge pages"), which Naoya pointed out before:</div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">See:</div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">+ /*</div>+ * We set PG_hwpoison only when the migration source hugepage<br>+ * was successfully dissolved, because otherwise hwpoisoned<br>+ * hugepage remains on free hugepage list, then userspace will<br>+ * find it as SIGBUS by allocation failure. That's not expected<br>+ * in soft-offlining.<br>+ */<br>+ ret = dissolve_free_huge_page(page);<br>+ if (!ret) {<br>+ if (set_hwpoison_free_buddy_page(page))<br>+ num_poisoned_pages_inc();<br>+ }<br><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">This bug still exists in the latest RHEL7 kernel, so I will file a bug against the RHEL7 product.<br></div></div></div><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div>Regards,<br></div><div>Li Wang<br></div></div></div></div>