[LTP] ptrace07 regression w/ Intel Sapphire Rapids

Eric DeVolder eric.devolder@oracle.com
Tue Aug 23 21:23:08 CEST 2022


While working on an Intel Sapphire Rapids machine and a recent
upstream kernel, we have discovered that the LTP test ptrace07 fails.

With kernel v6.0.0-rc1, on a non-Sapphire Rapids machine, the test
passes:

  # PATH=$PATH:$PWD ./ptrace07
  tst_test.c:1528: TINFO: Timeout per run is 0h 00m 30s
  ptrace07.c:139: TINFO: PTRACE_SETREGSET with reserved bits failed with EINVAL
  ptrace07.c:162: TINFO: test child 7762 exited, retcode: 0
  ptrace07.c:175: TPASS: wasn't able to set invalid FPU state

  Summary:
  passed   1
  failed   0
  broken   0
  skipped  0
  warnings 0

With the same kernel on a Sapphire Rapids machine, the test fails:

  # PATH=$PATH:$PWD ./ptrace07
  tst_test.c:1528: TINFO: Timeout per run is 0h 00m 30s
  ptrace07.c:143: TBROK: PTRACE_SETREGSET failed with unexpected error: EFAULT (14)
  tst_test.c:1571: TINFO: Killed the leftover descendant processes

  Summary:
  passed   0
  failed   0
  broken   1
  skipped  0
  warnings 0

I have bisected and determined that the failure point occurs at the
following commit:

  Chang S. Bae <chang.seok.bae@intel.com> 2021-10-21
  2308ee57 x86/fpu/amx: Enable the AMX feature in 64-bit mode

This commit simply turns on support for AMX; in reality, however, the
issue lies elsewhere. Instrumentation reveals that the PTRACE_SETREGSET
request issued by the test is rejected in the kernel's xstateregs_set()
handler (arch/x86/kernel/fpu/regset.c) here:

         if (!cpu_feature_enabled(X86_FEATURE_XSAVE))
                 return -ENODEV;

         /*
          * A whole standard-format XSAVE buffer is needed:
          */
         if (pos != 0 || count != fpu_user_cfg.max_size)
                 return -EFAULT;

The test passes the X86_FEATURE_XSAVE check, but then fails at the
buffer check with EFAULT.

In this instance, pos is 0, count is 4096, and fpu_user_cfg.max_size
is 11008, so the check fails because count != fpu_user_cfg.max_size.
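
For context, the count the kernel sees is simply the iov_len supplied
by the test in its iovec. Below is a paraphrased, untested sketch of
the relevant call (set_xstate() is a name of my own invention; see
ptrace07.c for the exact code):

  #include <elf.h>          /* NT_X86_XSTATE */
  #include <stdint.h>
  #include <sys/ptrace.h>
  #include <sys/types.h>
  #include <sys/uio.h>

  /* Paraphrased sketch: iov_len is what reaches the kernel as 'count'
   * in xstateregs_set(). A 512-element uint64_t array gives 4096
   * bytes, which no longer matches fpu_user_cfg.max_size (11008) once
   * the AMX state components are enabled. (The real test first fills
   * the buffer via PTRACE_GETREGSET and then corrupts it.) */
  static long set_xstate(pid_t pid)
  {
          uint64_t xstate[512] = { 0 };
          struct iovec iov = {
                  .iov_base = xstate,
                  .iov_len  = sizeof(xstate),   /* 512 * 8 = 4096 bytes */
          };

          return ptrace(PTRACE_SETREGSET, pid, (void *)NT_X86_XSTATE, &iov);
  }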

This appears to be an issue with the ptrace07 test itself: according
to the code comment above, the userspace-supplied buffer must cover
the whole standard-format XSAVE buffer, which on this machine is
11008 bytes.

I've edited ltp/testcases/kernel/syscalls/ptrace/ptrace07.c,
do_test(), as such:

  --- a/testcases/kernel/syscalls/ptrace/ptrace07.c
  +++ b/testcases/kernel/syscalls/ptrace/ptrace07.c
  @@ -83,7 +83,7 @@ static void do_test(void)
          int i;
          int num_cpus = tst_ncpus();
          pid_t pid;
  -       uint64_t xstate[512];
  +       uint64_t xstate[1376];
          struct iovec iov = { .iov_base = xstate, .iov_len = sizeof(xstate) };
          int status;
          bool okay;

An array of 512 uint64_t elements yields the 4096-byte buffer observed
originally (512 * 8 = 4096); 1376 elements yield 11008 bytes
(1376 * 8), matching fpu_user_cfg.max_size. When I run the ptrace07
test this way, it now passes!

This problem has existed for about a year, but because it manifests
only on Sapphire Rapids machines, it has likely gone undetected until
now.

Chang Bae has corroborated these findings. Furthermore, he suggests
that the test should consult CPUID leaf EAX=0xd, ECX=0x0, where EBX
indicates the required XSAVE buffer size.
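
Along those lines, here is a minimal, untested sketch of how the
buffer size could be obtained at run time instead of hard-coding 1376
elements. It assumes GCC/clang's <cpuid.h> and its __get_cpuid_count()
helper; xsave_buf_size() is a name I made up:

  #include <cpuid.h>

  /* Query CPUID leaf EAX=0xd, ECX=0: per the suggestion above, EBX
   * reports the size in bytes of the XSAVE area for the currently
   * enabled extended state components, which is the size
   * xstateregs_set() expects as 'count'. Returns 0 if the leaf is
   * unavailable. */
  static unsigned int xsave_buf_size(void)
  {
          unsigned int eax, ebx, ecx, edx;

          if (!__get_cpuid_count(0xd, 0, &eax, &ebx, &ecx, &edx))
                  return 0;

          return ebx;
  }

The test could then allocate iov_base with this size (e.g. via LTP's
SAFE_MALLOC()) and set iov_len to match, so the buffer tracks whatever
xfeatures the kernel has enabled rather than a fixed array size.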

Regards,
eric

