[LTP] [linux-next:master] [mm/migrate] b28dd7507f: ltp.move_pages04.fail

David Hildenbrand david@redhat.com
Wed Aug 21 10:44:32 CEST 2024


On 21.08.24 06:44, kernel test robot wrote:
> 
> 
> Hello,
> 
> kernel test robot noticed "ltp.move_pages04.fail" on:
> 
> commit: b28dd7507f2dd7923325eab6ea1f291416dcc396 ("mm/migrate: convert add_page_for_migration() from follow_page() to folio_walk")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> 
> [test failed on linux-next/master bb1b0acdcd66e0d8eedee3570d249e076b89ab32]
> 
> in testcase: ltp
> version: ltp-x86_64-14c1f76-1_20240817
> with following parameters:
> 
> 	test: numa/move_pages04
> 
> 
> 
> compiler: gcc-12
> test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) with 256G memory
> 
> (please refer to attached dmesg/kmsg for entire log/backtrace)
> 
> 
> 
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202408211026.636ade1a-oliver.sang@intel.com
> 
> 
> 
> Running tests.......
> <<<test_start>>>
> tag=move_pages04 stime=1724192393
> cmdline="move_pages04"
> contacts=""
> analysis=exit
> <<<test_output>>>
> move_pages04    1  TFAIL  :  move_pages04.c:142: status[1] is ENOENT, expected EFAULT

This change is to be expected, and I touched on that in the patch 
description. I am rather surprised that we have a test for that 
handling, especially because it changed already (see below).

The man page says:

	-EFAULT: This is a zero page or the memory area is not mapped
		 by the process.

"memory area not mapped" to me translates to "there is no mmap()", not 
"there is no page mapped". And it says:

	-ENOENT: The page is not present.


It's not really specifies what happens when "The memory area is mapped 
by the process, but no page is faulted in.".

And the old handling was even inconsistent: to achieve the old behavior, 
we abused FOLL_DUMP, which triggers in mm/gup.c:no_page_table():

(1) is_vm_hugetlb_page() *and* a hugetlb page is in the pagecache?
     Return -EFAULT. Otherwise return NULL -> -ENOENT.
(2) vma_is_anonymous() || !vma->vm_ops->fault ? Return -EFAULT.
     Otherwise return NULL -> -ENOENT.

So, if nothing is mapped, for things like shmem we would always return 
"-ENOENT", for anonymous memory always "-EFAULT", and for hugetlb 
"-ENOENT" or "-EFAULT", depending on the page cache state. Inconsistent, 
and that handling is only in place because we abused FOLL_DUMP.

(there are other issues in the old implementation: on PMD migration 
entries we would likely have returned -EFAULT in some cases where we 
should have returned -ENOENT ...)

While writing folio_walk, I temporarily had a version that would return 
error codes instead of NULL to indicate "there is something, but we 
cannot return  it" and "there is nothing", but it didn't feel right. And 
I'm not really interested in revisiting that :)

----

Staring at the test,  I realized the that behavior *changed* already, 
because we wanted to fix the "zero page" and started abusing FOLL_DUMP, 
but ended up changing the behavior for unpopulated (nothing mapped) 
memory as well:

  * NAME
  *	move_pages04.c
  *
  * DESCRIPTION
  *      Failure when page does not exit.
  *
  * ALGORITHM
  *
  *      1. Pass zero page (allocated, but not written to) as one of the
  *         page addresses to move_pages().
  *      2. Check if the corresponding status is set to:
  *         -ENOENT for kernels < 4.3
  *         -EFAULT for kernels >= 4.3 [1]
  *
  * [1]
  * d899844e9c98 "mm: fix status code which move_pages() returns for 
zero page"
  *

Likely test is *wrong*, because it claims to test the "zero page" but it 
just passes "unpopulated" memory.

Let me dig deeper into the test.

-- 
Cheers,

David / dhildenb



More information about the ltp mailing list