[LTP] [REGRESSION] lkft ltp for 6763a36

Richard Palethorpe rpalethorpe@suse.de
Tue Jun 21 13:38:40 CEST 2022


Hello Li,

Li Wang <liwang@redhat.com> writes:

> On Tue, Jun 21, 2022 at 4:56 PM Richard Palethorpe <rpalethorpe@suse.de> wrote:
>
>  Hello,
>
>  Joerg Vehlow <lkml@jv-coder.de> writes:
>
>  > Hi Jan,
>  >
>  > On 6/21/2022 at 9:22 AM, Jan Stancek wrote:
>  >> On Tue, Jun 21, 2022 at 9:15 AM Joerg Vehlow <lkml@jv-coder.de> wrote:
>  >>>
>  >>> Hi,
>  >>>
>  >>> On 6/17/2022 at 3:17 AM, lkft@linaro.org wrote:
>  >>>> * qemu_i386, ltp-fs-tests
>  >>>>   - read_all_proc
>  >>> I've seen this test fail a lot. Has anyone ever tried to analyze it? I
>  >>> was unable to reproduce the problem when running the test in isolation.
>  >> 
>  >> I see it hit timeouts too (read_all_sys as well). I think it needs
>  >> its runtime restored to 5 minutes as well; atm it has 30s.
>  > Didn't think about that, but at least for the failures I've seen, this
>  > is not the reason. The message printed by the test is "Test timeout 5
>  > minutes exceeded."
>  >
>  > Joerg
>
>  The main issue with read_all is that it also acts as a stress
>  test. Reading some files in proc and sys is very resource intensive
>  (e.g. due to lock contention) and varies depending on what state the
>  system is in. On some systems this test will take a long time. Also
>  there are some files which have to be filtered from the test. This
>  varies by system as well.
>
> Does it make sense to have a lite version of read_all_sys, which would
> only go through files sequentially or under slight stress?

IIRC the reason I started doing it in parallel is that sequential opens
and reads are even slower and less reliable. Some level of parallelism is
required, but too much of it causes issues.
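To illustrate the idea, here is a simplified sketch of the parallel
pattern -- not the actual read_all.c code (which walks the target
directory and hands paths to the workers), and the paths and worker
count below are made up for the example:

/* Simplified sketch: fork a few workers that each open and read proc
 * files concurrently, so one slow or blocking read does not stall the
 * whole scan. Illustrative only, not the real read_all.c.
 */
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

static const char *const paths[] = {
    "/proc/meminfo",
    "/proc/stat",
    "/proc/interrupts",
};

/* Open a file and read it to EOF, discarding the data. */
static void read_one(const char *path)
{
    char buf[4096];
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return;

    while (read(fd, buf, sizeof(buf)) > 0)
        ;

    close(fd);
}

int main(void)
{
    const int workers = 3; /* illustrative; read_all derives this from the CPU count */

    for (int i = 0; i < workers; i++) {
        if (fork() == 0) {
            for (size_t j = 0; j < sizeof(paths) / sizeof(paths[0]); j++)
                read_one(paths[j]);
            _exit(0);
        }
    }

    /* Reap all workers. */
    while (wait(NULL) > 0)
        ;

    return 0;
}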

Thinking about it now, on a single- or two-core system only one worker
process will be spawned, which could get blocked for a long time on some
reads because of the way certain sys/proc files are implemented.

The worker count can be overridden with -w if someone wants to try
increasing it to see whether that actually helps on systems with fewer
than 3 CPUs. Also, the number of reads is set to 3 in the runtest file;
that can be reduced to 1 with -r.

>
> With regard to this stressful read_all, I guess we can put it into a
> dedicated set and run it separately in stress testing.

I don't think I'd want to run that. IMO just doing enough to test
parallel accesses is what's required; more than that and we run into
diminishing returns. However, I'm not against creating another runtest
file/entry for that.

On bigger systems I think the test is already quite limited, even though
it does 3 reads. It only spawns a maximum of 15 workers, which should
prevent it from causing huge lock contention on machines with >16 CPUs.
At least I've not seen problems with that.
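As far as I can infer from the behaviour above, the default worker count
boils down to one worker per CPU minus one, clamped between 1 and 15.
Roughly (this is my approximation, not the read_all.c source, and the
function name is mine; the real test also honours the -w override):

/* Rough sketch of the default worker-count heuristic discussed above:
 * one worker per online CPU minus one, clamped to the range [1, 15].
 */
#include <stdio.h>
#include <unistd.h>

#define MAX_WORKERS 15

static long default_worker_count(void)
{
    long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
    long workers;

    if (ncpus < 1)
        ncpus = 1;

    workers = ncpus - 1;

    if (workers < 1)
        workers = 1;            /* 1-2 CPUs: a single worker */
    if (workers > MAX_WORKERS)
        workers = MAX_WORKERS;  /* many CPUs: capped at 15 */

    return workers;
}

int main(void)
{
    printf("workers: %ld\n", default_worker_count());
    return 0;
}

On a 1-2 CPU qemu target that means a single worker, so one slow sysfs
read can dominate the whole run.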

It looks like the log from lkft is for a smaller machine?

-- 
Thank you,
Richard.

