[LTP] [MM Bug?] mmap() triggers SIGBUS while doing the​ ​numa_move_pages() for offlined hugepage in background

Michal Hocko mhocko@suse.com
Wed Aug 7 09:39:09 CEST 2019


On Tue 06-08-19 17:07:25, Mike Kravetz wrote:
> On 8/5/19 10:36 AM, Mike Kravetz wrote:
> >>>>> Can you try this patch in your environment?  I am not sure if it will
> >>>>> be the final fix, but just wanted to see if it addresses issue for you.
> >>>>>
> >>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >>>>> index ede7e7f5d1ab..f3156c5432e3 100644
> >>>>> --- a/mm/hugetlb.c
> >>>>> +++ b/mm/hugetlb.c
> >>>>> @@ -3856,6 +3856,20 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
> >>>>>  
> >>>>>  		page = alloc_huge_page(vma, haddr, 0);
> >>>>>  		if (IS_ERR(page)) {
> >>>>> +			/*
> >>>>> +			 * We could race with page migration (try_to_unmap_one)
> >>>>> +			 * which is modifying page table with lock.  However,
> >>>>> +			 * we are not holding lock here.  Before returning
> >>>>> +			 * error that will SIGBUS caller, get ptl and make
> >>>>> +			 * sure there really is no entry.
> >>>>> +			 */
> >>>>> +			ptl = huge_pte_lock(h, mm, ptep);
> >>>>> +			if (!huge_pte_none(huge_ptep_get(ptep))) {
> >>>>> +				ret = 0;
> >>>>> +				spin_unlock(ptl);
> >>>>> +				goto out;
> >>>>> +			}
> >>>>> +			spin_unlock(ptl);
> >>>>
> >>>> Thanks you for investigation, Mike.
> >>>> I tried this change and found no SIGBUS, so it works well.
> 
> Here is another way to address the issue.  Take the hugetlb fault mutex in
> the migration code when modifying the page tables.  IIUC, the fault mutex
> was introduced to prevent this same issue when there were two page faults
> on the same page (and we were unable to allocate an 'extra' page).  The
> downside to such an approach is that we add more hugetlbfs specific code
> to try_to_unmap_one.

I would rather go with the hugetlb_no_page which is better isolated.
-- 
Michal Hocko
SUSE Labs


More information about the ltp mailing list