[LTP] LTP in valgrind :)

Thu May 8 18:40:34 CEST 2025

Hi Petr, Hi ltp hackers,

On Tue, 2025-05-06 at 10:05 +0200, Petr Vorel wrote:
> > > > It also opens some interesting questions, i.e. how do we make comparing
> > > > results from two different tests easier. Currently they grep the test
> > > > results for a summary, but maybe we can do better.
> 
> > > One option is to extract all TPASS/TFAIL/TWARN/TBROK/TCONF messages, 
> > > discard any message contents past the file:line header and then compare 
> > > whether the sanitized output is identical. That'll take care of random 
> > > values in the output while ensuring that the test went through the same 
> > > code paths as before. We could provide a sanitizer script for that.
> 
> > Maybe we can even add an option to the test library to suppress the
> > messages in output, that would be fairly simple.
> 
> @Martin @Mark: feel free to comment what we can do for you :).
> Whole thread:
> https://lore.kernel.org/ltp/20250505195003.GB137650@pevik/T/#t

Thanks. Something small, but if you could CC
valgrind-developers@lists.sourceforge.net on new ltp releases that
would be great.
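
On the sanitizer idea quoted above, here is a minimal sketch of what
such a script could look like. It assumes result lines have the shape
"file.c:LINE: TPASS: message"; the regex and the function names
(sanitize, same_code_paths) are made up for illustration and are not
part of ltp or valgrind.

```python
import re

# Assumption: a result line looks like "file.c:LINE: TPASS: message".
RESULT_RE = re.compile(r"^(\S+:\d+): (TPASS|TFAIL|TWARN|TBROK|TCONF)\b")


def sanitize(log_text):
    """Keep only 'file:line: TYPE' per result line, dropping the message."""
    kept = []
    for line in log_text.splitlines():
        m = RESULT_RE.match(line)
        if m:
            kept.append(f"{m.group(1)}: {m.group(2)}")
    return kept


def same_code_paths(log_a, log_b):
    """True if both logs report the same results from the same places."""
    return sanitize(log_a) == sanitize(log_b)
```

Two runs whose messages differ only in random values (pids,
addresses) then compare equal, while a TPASS turning into a TFAIL at
the same file:line does not.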

I saw a question at the top of that thread about "They consider some
test long running". In general that is not because of the actual ltp
tests (we currently only run the testcases/kernel/syscalls ones), but
because valgrind:

a) is basically a giant Just In Time compiler, which instruments all
code, and might slow down your code 20x (yeah, we know).

b) serializes all threads, so only one thread runs at a time and all
parallel code is executed as if it were serial code.

c) memcheck keeps track of all memory to see if values are "valid"
(have a known value written to them). This costs about 25% more memory
than the program itself uses (and might actually double the amount).
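
As a back-of-the-envelope illustration of how the three points above
add up, here is a small sketch. The 20x, 1.25x and 2x factors are just
the rough numbers from this mail, and the estimate() helper is
hypothetical; real overhead varies a lot per test.

```python
SLOWDOWN = 20           # a) rough instrumentation slowdown
MEM_FACTOR_LOW = 1.25   # c) memcheck: about 25% more memory
MEM_FACTOR_HIGH = 2.0   # c) memcheck: might double the amount


def estimate(cpu_seconds_per_thread, native_mem_mb):
    """Return (seconds, low_mb, high_mb) as a rough guess under memcheck."""
    # b) threads are serialized, so per-thread CPU times add up
    # instead of overlapping.
    serial_seconds = sum(cpu_seconds_per_thread)
    return (
        serial_seconds * SLOWDOWN,
        native_mem_mb * MEM_FACTOR_LOW,
        native_mem_mb * MEM_FACTOR_HIGH,
    )
```

So a test with four threads that each burn one CPU second natively
could plausibly take over a minute, even though it finishes in about
a second outside valgrind.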

So for now we have the following exclude file:
https://sourceware.org/cgit/valgrind/tree/auxprogs/ltp-excludes.txt

bind06
epoll-ltp
fork14
futex_cmp_requeue01
futex_cmp_requeue02
inotify09
msgstress01
pidfd_send_signal01
pidfd_send_signal02
pidfd_send_signal03
sendmsg03
setsockopt06
setsockopt07
signal05
signal06
timerfd_settime02

We haven't yet analyzed them to see what exactly makes them so slow
(under valgrind). It was more important to get the ltpchecks running
on all our CI systems:
https://builder.sourceware.org/buildbot/#/builders?tags=valgrind
Some of those systems only have 8GB of memory or are fairly slow
(specifically the arm32 and riscv builders).

The pidfd_send_signal tests are excluded because they seemed to kill
the container they were running inside. Again, this could be a valgrind
bug. valgrind doesn't implement a proper syscall wrapper for
pidfd_send_signal, so it is possible we are just killing a random
process...

Cheers,

Mark