[LTP] mmstress[1309]: segfault at 7f3d71a36ee8 ip 00007f3d77132bdf sp 00007f3d71a36ee8 error 4 in libc-2.27.so[7f3d77058000+1aa000]

Sean Christopherson sean.j.christopherson@intel.com
Fri Oct 23 07:02:14 CEST 2020


On Thu, Oct 22, 2020 at 08:05:05PM -0700, Linus Torvalds wrote:
> On Thu, Oct 22, 2020 at 6:36 PM Daniel Díaz <daniel.diaz@linaro.org> wrote:
> >
> > The kernel Naresh originally referred to is here:
> >   https://builds.tuxbuild.com/SCI7Xyjb7V2NbfQ2lbKBZw/
> 
> Thanks.
> 
> And when I started looking at it, I realized that my original idea
> ("just look for __put_user_nocheck_X calls, there aren't so many of
> those") was garbage, and that I was just being stupid.
> 
> Yes, the commit that broke was about __put_user(), but in order to not
> duplicate all the code, it re-used the regular put_user()
> infrastructure, and so all the normal put_user() calls are potential
> problem spots too if this is about the compiler interaction with KASAN
> and the asm changes.
> 
> So it's not just a couple of special cases to look at, it's all the
> normal cases too.
> 
> Ok, back to the drawing board, but I think reverting it is probably
> the right thing to do if I can't think of something smart.
> 
> That said, since you see this on x86-64, where the whole ugly trick with that
> 
>    register asm("%"_ASM_AX)
> 
> is unnecessary (because the 8-byte case is still just a single
> register, no %eax:%edx games needed), it would be interesting to hear
> if the attached patch fixes it. That would confirm that the problem
> really is due to some register allocation issue interaction (or,
> alternatively, it would tell me that there's something else going on).

I haven't reproduced the crash, but I did find a smoking gun that confirms the
"register shenanigans are evil shenanigans" theory.  I ran into a similar thing
recently where a seemingly innocuous line of code after loading a value into a
register variable wreaked havoc because it clobbered the input register.

This put_user() in schedule_tail():

   if (current->set_child_tid)
           put_user(task_pid_vnr(current), current->set_child_tid);

generates the following assembly with KASAN out-of-line:

   0xffffffff810dccc9 <+73>: xor    %edx,%edx
   0xffffffff810dcccb <+75>: xor    %esi,%esi
   0xffffffff810dcccd <+77>: mov    %rbp,%rdi
   0xffffffff810dccd0 <+80>: callq  0xffffffff810bf5e0 <__task_pid_nr_ns>
   0xffffffff810dccd5 <+85>: mov    %r12,%rdi
   0xffffffff810dccd8 <+88>: callq  0xffffffff81388c60 <__asan_load8>
   0xffffffff810dccdd <+93>: mov    0x590(%rbp),%rcx
   0xffffffff810dcce4 <+100>: callq  0xffffffff817708a0 <__put_user_4>
   0xffffffff810dcce9 <+105>: pop    %rbx
   0xffffffff810dccea <+106>: pop    %rbp
   0xffffffff810dcceb <+107>: pop    %r12

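__task_pid_nr_ns() returns the pid in %rax, and __put_user_4() expects its
value in %eax.  But the __asan_load8() call emitted in between (KASAN's check
on current for the current->set_child_tid dereference) is an ordinary function
call, so it is free to clobber %rax before __put_user_4() ever sees the pid.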
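
For illustration only, here is a minimal userspace sketch of one way to keep a
register-pinned value safe: evaluate both macro arguments into ordinary
temporaries first, and assign the register variable immediately before the asm
so that no call can land in between.  This is not the kernel's put_user()
code and is not being proposed as the fix; store_pinned(), check_access(),
get_value(), get_slot() and demo() are all hypothetical names, and
check_access() merely stands in for an instrumentation hook such as
__asan_load8().  It assumes GCC or clang on x86-64, with a plain store as the
fallback elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an instrumentation hook such as __asan_load8():
 * like any out-of-line call it may overwrite caller-saved registers, and on
 * x86-64 its return value lands in rax. */
__attribute__((noinline)) static uintptr_t check_access(const void *p)
{
    return (uintptr_t)p ^ 0x5a5a5a5aUL;
}

/* Hypothetical value producer, playing the role of task_pid_vnr(). */
__attribute__((noinline)) static int get_value(void)
{
    return 42;
}

/* Hypothetical pointer producer whose evaluation triggers a call. */
__attribute__((noinline)) static int *get_slot(int *slot)
{
    (void)check_access(slot);
    return slot;
}

#if defined(__GNUC__) && defined(__x86_64__)
#define store_pinned(x, ptr)                                            \
    do {                                                                \
        int x_ = (x);       /* evaluate the value expression once */    \
        int *ptr_ = (ptr);  /* evaluate the pointer; calls end here */  \
        register int val_ __asm__("eax");                               \
        val_ = x_;          /* pin right before the asm, no calls */    \
        __asm__ volatile("movl %0, (%1)"                                \
                         : : "r"(val_), "r"(ptr_) : "memory");          \
    } while (0)
#else
/* Fallback for other targets: a plain store, no register pinning. */
#define store_pinned(x, ptr) do { *(ptr) = (x); } while (0)
#endif

/* Both arguments involve function calls, as in the schedule_tail() case. */
static int demo(void)
{
    int slot = 0;
    store_pinned(get_value(), get_slot(&slot));
    assert(slot == 42);
    return slot;
}
```

The point of the ordering is that every call, including any the compiler
inserts on behalf of an argument expression, completes before the register
variable is written.  GCC only documents local register variables as reliable
when they are used as asm operands with nothing intervening.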

