[LTP] [MM Bug?] mmap() triggers SIGBUS while doing the numa_move_pages() for offlined hugepage in background
Naoya Horiguchi
n-horiguchi@ah.jp.nec.com
Fri Aug 2 05:48:26 CEST 2019
On Mon, Jul 29, 2019 at 01:17:27PM +0800, Li Wang wrote:
> Hi Naoya and Linux-MMers,
>
> The LTP move_pages12 (V2) test triggers SIGBUS when testing kernel v5.2.3.
> https://github.com/wangli5665/ltp/blob/master/testcases/kernel/syscalls/move_pages/move_pages12.c
>
> It seems like the retried mmap() triggers SIGBUS while numa_move_pages() is
> running in the background. That is very similar to the kernel bug mentioned in
> commit 6bc9b56433b76e40d ("mm: fix race on soft-offlining"): a race condition
> between soft offline and hugetlb_fault which causes the process to be killed
> unexpectedly with SIGBUS.
>
> I'm not sure if the patch below makes sense for memory-failure.c, but after
> building a new kernel-5.2.3 with this change, the problem can NOT be reproduced.
>
> Any comments?
>
> ----------------------------------
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1695,15 +1695,16 @@ static int soft_offline_huge_page(struct page *page, int flags)
>         unlock_page(hpage);
>
>         ret = isolate_huge_page(hpage, &pagelist);
> +       if (!ret) {
> +               pr_info("soft offline: %#lx hugepage failed to isolate\n", pfn);
> +               return -EBUSY;
> +       }
> +
>         /*
>          * get_any_page() and isolate_huge_page() takes a refcount each,
>          * so need to drop one here.
>          */
>         put_hwpoison_page(hpage);
> -       if (!ret) {
> -               pr_info("soft offline: %#lx hugepage failed to isolate\n", pfn);
> -               return -EBUSY;
> -       }
Sorry for my late response.
This change skips put_hwpoison_page() in the failure path, so soft_offline_page()
would return without releasing the refcount on hpage taken by get_any_page(),
which is probably not what we want.
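To make the refcount concern concrete, here is a minimal userspace sketch, not
kernel code. struct toy_page, toy_get(), toy_put(), toy_isolate() and the two
offline_*_order() functions are hypothetical names invented for illustration,
and the model assumes that isolate_huge_page() takes its own reference only
when it succeeds.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the refcount is modelled. */
struct toy_page {
	int refcount;
};

static void toy_get(struct toy_page *p) { p->refcount++; }
static void toy_put(struct toy_page *p) { p->refcount--; }

/* Models isolate_huge_page(); assumed to take a reference only on success. */
static bool toy_isolate(struct toy_page *p, bool can_isolate)
{
	if (!can_isolate)
		return false;
	toy_get(p);
	return true;
}

/* Ordering before the proposed change: put first, then check ret. */
static int offline_original_order(struct toy_page *p, bool can_isolate)
{
	bool ret;

	toy_get(p);			/* models the get_any_page() reference */
	ret = toy_isolate(p, can_isolate);
	toy_put(p);			/* dropped on both success and failure */
	if (!ret)
		return -EBUSY;
	return 0;
}

/* Ordering after the proposed change: check ret first, then put. */
static int offline_patched_order(struct toy_page *p, bool can_isolate)
{
	toy_get(p);			/* models the get_any_page() reference */
	if (!toy_isolate(p, can_isolate))
		return -EBUSY;		/* early return: reference is not dropped */
	toy_put(p);
	return 0;
}
```

Starting from a refcount of 1 and a failing isolation, the original ordering
returns -EBUSY with the refcount back at 1, while the patched ordering returns
-EBUSY with the refcount left at 2. When isolation succeeds, both orderings
behave the same in this model.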
- Naoya