[LTP] [PATCH] sched_football: fix false failures on many-CPU systems

Jan Polensky japo@linux.ibm.com
Tue Apr 14 17:54:49 CEST 2026


On Sun, Apr 12, 2026 at 06:35:42PM +0530, Soma Das wrote:
> Test logs attached: 40+ consecutive runs passing after the fix on an
> 80-CPU ppc64le LPAR.
> ---
> From 3fc17dd06907785f5b1b65aebfe150c8ac73d54a Mon Sep 17 00:00:00 2001
> From: Soma Das <somadas1@linux.ibm.com>
> Date: Sun, 12 Apr 2026 13:13:08 +0000
> Subject: [PATCH] sched_football: fix false failures on many-CPU systems
>
> On large SMP systems with CONFIG_RT_GROUP_SCHED=n, four independent
> issues cause false failures.
>
> 1. RT throttling freezes all SCHED_FIFO threads simultaneously. On
>    release, the kernel does not always reschedule the highest-priority
>    thread first on every CPU, so offense briefly runs and increments
>    the_ball before defense is rescheduled. Fix by saving and disabling
>    sched_rt_runtime_us in setup and restoring it in a new cleanup
>    callback.
>
> 2. Offense and defense threads were unpinned, allowing the scheduler
>    to migrate them freely. An offense thread could land on a CPU with
>    no defense thread present and run unchecked. Fix by passing a CPU
>    index as the thread arg and calling sched_setaffinity() at thread
>    start. Pairs are distributed round-robin (i % ncpus) so each
>    offense thread shares its CPU with a defense thread.
>
> 3. game_over was never reset between iterations, causing all threads
>    to exit immediately on reruns (-i N), making the test a no-op.
>    Fix by resetting both kickoff_flag and game_over at the top of
>    do_test().
>
> 4. sched_setscheduler() failure for the referee was silently ignored.
>    If the call fails the test produces meaningless results. Fix by
>    checking the return value and calling tst_brk(TBROK) on failure.
>
> Signed-off-by: Soma Das <somadas1@linux.ibm.com>
> ---
[snip]
Hi Soma,

this patch makes the test much more deterministic by changing the
scheduling scenario: the offense and defense threads were previously
unpinned and could migrate freely, but are now pinned via
sched_setaffinity() with explicit CPU pairing (round-robin across CPUs).
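
For readers following along, the pinning could look roughly like the
sketch below. This is not the code from the patch: pin_to_cpu() and
cpu_for_pair() are hypothetical names, and the plain i % ncpus mapping
assumes CPUs 0..ncpus-1 are online and allowed for the process, which may
not hold inside a restricted cpuset.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/*
 * Hypothetical helper: map player pair i to a CPU index, round-robin.
 * Assumes CPUs 0..ncpus-1 are all usable by the calling process.
 */
static int cpu_for_pair(int i, int ncpus)
{
	assert(ncpus > 0);
	return i % ncpus;
}

/*
 * Hypothetical helper: restrict the calling thread to a single CPU.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);

	/* A pid of 0 applies the mask to the calling thread. */
	return sched_setaffinity(0, sizeof(set), &set);
}
```

An offense thread and its defense thread would both call
pin_to_cpu(cpu_for_pair(i, ncpus)) at thread start, so they always
compete on the same CPU.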

That likely fixes the false failures you describe (offense briefly
running "unchecked" when it lands on a CPU without a local
higher-priority defense thread), but it also means we're no longer
exercising migration/load-balancing behavior here.

We're effectively testing per-CPU SCHED_FIFO priority ordering under
controlled placement. It would be great to call this out explicitly in
the test description and commit message so readers don't assume the test
still covers migration fairness.

Tested-by: Jan Polensky <japo@linux.ibm.com>

Tested on z17:

    [************ ~]# lscpu
    Architecture:                s390x
      CPU op-mode(s):            32-bit, 64-bit
      Byte Order:                Big Endian
    CPU(s):                      256
      On-line CPU(s) list:       0-255
    Vendor ID:                   IBM/S390
      Model name:                -
        Machine type:            9175
        Thread(s) per core:      2
[snip]

Best regards
Jan


