[LTP] [PATCH v3] sched: starvation: Autocallibrate the timeout

Anders Roxell anders.roxell@linaro.org
Thu Aug 1 14:15:12 CEST 2024


On Mon, 29 Jul 2024 at 22:51, Petr Vorel <pvorel@suse.cz> wrote:
>
> Hi all,
>
> > Instead of hardcoding the values we attempt to measure the CPU speed and
> > set the runtime accordingly. Given that the difference in the duration
> > of the test when the kernel is buggy is about 30x we do not have to have
> > a precise callibration, just very rough estimate if we are running on a
> > server or small ARM board would suffice.
>
> > So we attempt to measure how long does a bussy loop take and base the
> > default timeout on that. On x86_64 CPUs the resulting runtime seems to
> > be between 2x and 10x of the actual runtime which seems to be in the
> > required range.
>
> > We also make sure to check the runtime at the end of the test because
> > the failures could have been masked by a timeout multiplier, i.e. if you
> > set LTP_TIMEOUT_MUL=10 the test would previously pass on a buggy kernel
> > as well. The side efect is that we now get better PASS/FAIL messages as
> > well.
>
> > Signed-off-by: Cyril Hrubis <chrubis@suse.cz>
> > ---
>
> > Changes in v3:
>
> > - Increased the CALLIBRATE_LOOPS a bit, since some of the numbers
> >   reported by the linaro lab had the runtime very close to the
> >   calculated runtime.
>
> Anders, Joe, can you please recheck?

I've tested this patch on 1 arm HW and 2 arm64 HW's. see the results below.

arm HW:
===== [1;36mstarvation[0m =====
command: starvation
[ 42.782470] /usr/local/bin/kirk[369]: starting test starvation (starvation)
tst_test.c:1734: [1;34mTINFO: [0mLTP version: 20180118-5676-gc035435ba
tst_test.c:1618: [1;34mTINFO: [0mTimeout per run is 0h 05m 24s
starvation.c:84: [1;32mTPASS: [0msched_setaffinity(0, sizeof(mask),
&mask) returned 0
starvation.c:52: [1;34mTINFO: [0mCPU did 120000000 loops in 106903us
tst_test.c:1626: [1;34mTINFO: [0mUpdating max runtime to 0h 01m 46s
tst_test.c:1618: [1;34mTINFO: [0mTimeout per run is 0h 07m 10s
starvation.c:133: [1;32mTPASS: [0mHaven't reproduced scheduller starvation.
Summary:
passed   2
failed   0
broken   0
skipped  0
warnings 0
Duration: 1m 2s

arm64 HW1:
===== [1;36mstarvation[0m =====
command: starvation
[   52.326682] /usr/local/bin/kirk[371]: starting test starvation (starvation)
[   52.326682] /usr/local/bin/kirk[371]: starting test starvation (starvation)
tst_test.c:1734: [1;34mTINFO: [0mLTP version: 20180118-5676-gc035435ba
tst_test.c:1618: [1;34mTINFO: [0mTimeout per run is 0h 05m 24s
starvation.c:84: [1;32mTPASS: [0msched_setaffinity(0, sizeof(mask),
&mask) returned 0
starvation.c:52: [1;34mTINFO: [0mCPU did 120000000 loops in 170664us
tst_test.c:1626: [1;34mTINFO: [0mUpdating max runtime to 0h 02m 50s
tst_test.c:1618: [1;34mTINFO: [0mTimeout per run is 0h 08m 14s
starvation.c:133: [1;32mTPASS: [0mHaven't reproduced scheduller starvation.
Summary:
passed   2
failed   0
broken   0
skipped  0
warnings 0
Duration: 1m 31s

arm64 HW2:
===== [1;36mstarvation[0m =====
command: starvation
[   25.461437] /usr/local/bin/kirk[781]: starting test starvation (starvation)
tst_test.c:1734: [1;34mTINFO: [0mLTP version: 20180118-5676-gc035435ba
tst_test.c:1618: [1;34mTINFO: [0mTimeout per run is 0h 05m 24s
starvation.c:84: [1;32mTPASS: [0msched_setaffinity(0, sizeof(mask),
&mask) returned 0
starvation.c:52: [1;34mTINFO: [0mCPU did 120000000 loops in 136451us
tst_test.c:1626: [1;34mTINFO: [0mUpdating max runtime to 0h 02m 16s
tst_test.c:1618: [1;34mTINFO: [0mTimeout per run is 0h 07m 40s
[   29.165003] sd 0:0:0:0: [sda] Starting disk
[   35.815791] sd 0:0:0:0: [sda] Starting disk
[   39.911757] sd 0:0:0:0: [sda] Starting disk
[   47.079469] sd 0:0:0:0: [sda] Starting disk
[   52.716832] sd 0:0:0:0: [sda] Starting disk
[   57.831781] sd 0:0:0:0: [sda] Starting disk
[   63.975740] sd 0:0:0:0: [sda] Starting disk
[   69.095887] sd 0:0:0:0: [sda] Starting disk
[   74.983623] sd 0:0:0:0: [sda] Starting disk
[   77.799749] sd 0:0:0:0: [sda] Starting disk
[   80.871712] sd 0:0:0:0: [sda] Starting disk
[   87.015762] sd 0:0:0:0: [sda] Starting disk
[   92.903786] sd 0:0:0:0: [sda] Starting disk
starvation.c:133: [1;32mTPASS: [0mHaven't reproduced scheduller starvation.
Summary:
passed   2
failed   0
broken   0
skipped  0
warnings 0
Duration: 1m 7s

Tested-by: Anders Roxell <anders.roxell@linaro.org>

Cheers,
Anders

>
> > - Removed some curly braces, as suggested by pvorel
>
> > - Added runtime check at the end of test to avoid false positives with
> >   LTP_TIMEOUT_MUL.
>
> Great!
>
> LGTM.
>
> Reviewed-by: Petr Vorel <pvorel@suse.cz>
>
> Tested-by: Petr Vorel <pvorel@suse.cz>
>
> Tested on Tumbleweed (kernel 6.10.1):
>
> tst_tmpdir.c:316: TINFO: Using /tmp/LTP_starv8seE as tmpdir (tmpfs filesystem)
> tst_test.c:1806: TINFO: LTP version: 20240524
> tst_test.c:1650: TINFO: Timeout per run is 0h 00m 30s
> starvation.c:71: TINFO: Setting affinity to CPU 0
> tst_test.c:1658: TINFO: Updating max runtime to 0h 04m 00s
> tst_test.c:1650: TINFO: Timeout per run is 0h 04m 30s
> starvation.c:117: TPASS: wait_for_pid(child_pid) passed
>
> => test runs ~ 13s - 19s on aarch64, ppc64le and x86_64. Therefore not sure if
> 04m max runtime is good.
>
> I'll have tomorrow some tests on various SLES versions.
>
> Kind regards,
> Petr
>
> > .../kernel/sched/cfs-scheduler/starvation.c   | 41 +++++++++++++++++--
> >  1 file changed, 38 insertions(+), 3 deletions(-)
>
> > diff --git a/testcases/kernel/sched/cfs-scheduler/starvation.c b/testcases/kernel/sched/cfs-scheduler/starvation.c
> > index 9ac388fdc..e707e0865 100644
> > --- a/testcases/kernel/sched/cfs-scheduler/starvation.c
> > +++ b/testcases/kernel/sched/cfs-scheduler/starvation.c
> > @@ -21,11 +21,38 @@
> >  #include <sched.h>
>
> >  #include "tst_test.h"
> > +#include "tst_safe_clocks.h"
> > +#include "tst_timer.h"
>
> >  static char *str_loop;
> > -static long loop = 2000000;
> > +static long loop = 1000000;
> >  static char *str_timeout;
> > -static int timeout = 240;
> > +static int timeout;
> > +
> > +#define CALLIBRATE_LOOPS 120000000
> > +
> > +static int callibrate(void)
> > +{
> > +     int i;
> > +     struct timespec start, stop;
> > +     long long diff;
> > +
> > +     for (i = 0; i < CALLIBRATE_LOOPS; i++)
> > +             __asm__ __volatile__ ("" : "+g" (i) : :);
> > +
> > +     SAFE_CLOCK_GETTIME(CLOCK_MONOTONIC_RAW, &start);
> > +
> > +     for (i = 0; i < CALLIBRATE_LOOPS; i++)
> > +             __asm__ __volatile__ ("" : "+g" (i) : :);
> > +
> > +     SAFE_CLOCK_GETTIME(CLOCK_MONOTONIC_RAW, &stop);
> > +
> > +     diff = tst_timespec_diff_us(stop, start);
> > +
> > +     tst_res(TINFO, "CPU did %i loops in %llius", CALLIBRATE_LOOPS, diff);
> > +
> > +     return diff;
> > +}
>
> >  static int wait_for_pid(pid_t pid)
> >  {
> > @@ -78,6 +105,8 @@ static void setup(void)
>
> >       if (tst_parse_int(str_timeout, &timeout, 1, INT_MAX))
> >               tst_brk(TBROK, "Invalid number of timeout '%s'", str_timeout);
> > +     else
> > +             timeout = callibrate() / 1000;
>
> >       tst_set_max_runtime(timeout);
> >  }
> > @@ -114,7 +143,13 @@ static void do_test(void)
> >               sleep(1);
>
> >       SAFE_KILL(child_pid, SIGTERM);
> > -     TST_EXP_PASS(wait_for_pid(child_pid));
> > +
> > +     if (!tst_remaining_runtime())
> > +             tst_res(TFAIL, "Scheduller starvation reproduced.");
> > +     else
> > +             tst_res(TPASS, "Haven't reproduced scheduller starvation.");
> > +
> > +     TST_EXP_PASS_SILENT(wait_for_pid(child_pid));
> >  }
>
> >  static struct tst_test test = {


More information about the ltp mailing list