[LTP] [MM Bug?] mmap() triggers SIGBUS while doing the​ ​numa_move_pages() for offlined hugepage in background

Naoya Horiguchi n-horiguchi@ah.jp.nec.com
Fri Aug 2 05:48:26 CEST 2019


On Mon, Jul 29, 2019 at 01:17:27PM +0800, Li Wang wrote:
> Hi Naoya and Linux-MMers,
> 
> The LTP/move_page12 V2 triggers SIGBUS in the kernel-v5.2.3 testing.
> https://github.com/wangli5665/ltp/blob/master/testcases/kernel/syscalls/
> move_pages/move_pages12.c
> 
> It seems like the retry mmap() triggers SIGBUS while doing the numa_move_pages
> () in background. That is very similar to the kernel bug which was mentioned by
> commit 6bc9b56433b76e40d(mm: fix race on soft-offlining ): A race condition
> between soft offline and hugetlb_fault which causes unexpected process SIGBUS
> killing.
> 
> I'm not sure if that below patch is making sene to memory-failures.c, but after
> building a new kernel-5.2.3 with this change, the problem can NOT be reproduced
> . 
> 
> Any comments?
> 
> ----------------------------------
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1695,15 +1695,16 @@ static int soft_offline_huge_page(struct page *page,
> int flags)
>         unlock_page(hpage);
> 
>         ret = isolate_huge_page(hpage, &pagelist);
> +       if (!ret) {
> +               pr_info("soft offline: %#lx hugepage failed to isolate\n",
> pfn);
> +               return -EBUSY;
> +       }
> +
>         /*
>          * get_any_page() and isolate_huge_page() takes a refcount each,
>          * so need to drop one here.
>          */
>         put_hwpoison_page(hpage);
> -       if (!ret) {
> -               pr_info("soft offline: %#lx hugepage failed to isolate\n",
> pfn);
> -               return -EBUSY;
> -       }

Sorry for my late response.

This change skips put_hwpoison_page() in failure path, so soft_offline_page()
should return without releasing hpage's refcount taken by get_any_page(),
maybe which is not what we want.

- Naoya


More information about the ltp mailing list