[LTP] [REGRESSION] pidns05 timeout (was: [PATCH 1/2] lib: multiply the timeout if detect slow kconfigs)

Mon Jan 20 11:18:02 CET 2025

On Mon, Jan 20, 2025 at 5:12 PM Petr Vorel <pvorel@suse.cz> wrote:

> > > On Thu, Jan 16, 2025 at 6:42 AM Petr Vorel <pvorel@suse.cz> wrote:
>
> > > > Hi Li, Cyril, all,
>
> > > > ...
> > > > > +++ b/lib/tst_test.c
> > > > > @@ -555,9 +555,6 @@ static int multiply_runtime(int max_runtime)
>
> > > > >       parse_mul(&runtime_mul, "LTP_RUNTIME_MUL", 0.0099, 100);
>
> > > > > -     if (tst_has_slow_kconfig())
> > > > > -             max_runtime *= 4;
> > > > > -
> > > > >       return max_runtime * runtime_mul;
> > > > >  }
>
> > > > > @@ -1706,6 +1703,9 @@ unsigned int tst_multiply_timeout(unsigned int timeout)
> > > > >       if (timeout < 1)
> > > > >               tst_brk(TBROK, "timeout must to be >= 1! (%d)", timeout);
>
> > > > > +     if (tst_has_slow_kconfig())
> > > > > +             timeout *= 4;
>
> > > > FYI this change, merged as 893ca0abe7 ("lib: multiply the timeout if detect
> > > > slow kconfigs") caused a regression on *all* tests which use tst_net.sh.
> > ...
>
> > FYI, at least pidns05.c also sometimes fails due to a timeout with this change.
> > On some SLES products pidns05.c previously ran for 3 sec. With this change it
> > runs for 12s and therefore times out.
>
> I'm sorry for the wrong report. Looking at it a second time, there is "*** stack
> smashing detected ***: terminated" => some other problem, which causes the
> slowdown. IMHO it's not optimal to run the detection many times + basically now
> require a kernel config for each LTP test, but the performance impact is
> probably low.
>

Yes, I understand your concern. I agree that we can avoid the
kconfig detection for most of the faster test cases.

I compared my previous pre-release LTP test run with a run on the
current main branch, and the new one is not noticeably more
time-consuming. Different HW/kernel factors may play a role, but the
performance impact can be ignored.

>
> Kind regards,
> Petr
>
> > In the pidns05.c case the child is run 5x. For each of these children we again
> > detect whether we run on a slow config. Maybe we should have used a struct
> > tst_test member to cache the value.
>
> > What bothers me more is how much time we waste over the whole LTP testing by
> > repeatedly detecting the slow config for all tests (runtest/syscalls has 1457
> > items, and we run it several times for each product with different kernel
> > cmdline parameters). I don't know what was supposed to be fixed by this
> > feature; is it really worth the slowdown? Why not just set LTP_RUNTIME_MUL=2
> > on slow kernels? We could have a tool which would 'exit 1' on a "slow" kernel
> > and 'exit 0' on a normal kernel to do automatic detection, which could be run
> > by frameworks just once.
>
> > Kind regards,
> > Petr
>
>

-- 
Regards,
Li Wang