[LTP] [PATCH] sched_football: fix false failures on many-CPU systems
Li Wang
wangli.ahau@gmail.com
Wed Apr 15 11:52:11 CEST 2026
Hi Soma and Jan,
> > 1. RT throttling freezes all SCHED_FIFO threads simultaneously. On
> > release, the kernel does not always reschedule the highest-priority
> > thread first on every CPU, so offense briefly runs and increments
> > the_ball before defense is rescheduled. Fix by saving and disabling
> > sched_rt_runtime_us in setup and restoring it in a new cleanup
> > callback.
Makes sense, and as the AI reviewer points out, LTP provides a
save_restore option that handles this automatically.
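
For anyone who wants to see the pattern spelled out, here is a rough
standalone sketch of save / disable / restore using plain stdio rather
than the LTP library. The helper names knob_read() and knob_write() are
made up for illustration; in the test itself the library's save_restore
mechanism should be preferred over open-coding this.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read the first line of a tunable into buf and
 * strip the trailing newline. Returns 0 on success, -1 on failure.
 */
static int knob_read(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/*
 * Hypothetical helper: overwrite a tunable with val.
 * Returns 0 on success, -1 on failure.
 */
static int knob_write(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");
	int ret;

	if (!f)
		return -1;
	ret = (fputs(val, f) < 0) ? -1 : 0;
	/* Write errors on /proc files often only show up at close time. */
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}
```

Setup would call knob_read() on /proc/sys/kernel/sched_rt_runtime_us to
remember the old value and then knob_write() it to "-1"; cleanup would
knob_write() the remembered value back. Both writes need root.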
> > 2. Offense and defense threads were unpinned, allowing the scheduler
> > to migrate them freely. An offense thread could land on a CPU with
> > no defense thread present and run unchecked. Fix by passing a CPU
> > index as the thread arg and calling sched_setaffinity() at thread
> > start. Pairs are distributed round-robin (i % ncpus) so each
> > offense thread shares its CPU with a defense thread.
This is a good thought: for SCHED_FIFO the kernel manages a runqueue
per CPU and simply picks the highest-priority runnable task on it.
So pinning the threads to CPUs makes sense, but maybe we could
pin only the defense threads, because:
With N defense threads pinned one per CPU, every CPU has a defense
thread at priority 30 permanently runnable. The offense threads at priority
15, regardless of which CPU the scheduler places them on, will always find
a higher-priority defense thread on the same CPU's runqueue. Since
SCHED_FIFO strictly favors the higher-priority runnable task, offense can
never be picked.
Pinning offense as well would be redundant: it doesn't matter where offense
lands, because defense already covers every CPU. This also has the advantage
of letting the scheduler freely migrate offense threads without affecting
the test outcome, which avoids interfering with the kernel's load balancing
logic during the test.
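
A rough sketch of what pinning a defense thread could look like, written
against plain glibc rather than the LTP helpers. pin_self_to_slot() is a
made-up name, and walking the thread's allowed-CPU mask (instead of
assuming CPU ids 0..N-1 are all usable) is my assumption about how to
pick a target, not something taken from the patch.

```c
#define _GNU_SOURCE
#include <sched.h>

/*
 * Hypothetical helper: pin the calling thread to the slot-th CPU of its
 * current affinity mask, wrapping around if slot exceeds the number of
 * allowed CPUs. Returns the chosen CPU id, or -1 on failure.
 *
 * Each defense thread is expected to call this once at thread start,
 * while it still has the unpinned mask inherited from its creator.
 */
static int pin_self_to_slot(int slot)
{
	cpu_set_t allowed, target;
	int cpu, navail, want, seen = 0;

	if (slot < 0)
		return -1;
	if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
		return -1;

	navail = CPU_COUNT(&allowed);
	if (navail <= 0)
		return -1;
	want = slot % navail;

	/* Find the want-th CPU that is actually set in the mask. */
	for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (!CPU_ISSET(cpu, &allowed))
			continue;
		if (seen++ == want)
			break;
	}
	if (cpu == CPU_SETSIZE)
		return -1;

	CPU_ZERO(&target);
	CPU_SET(cpu, &target);
	if (sched_setaffinity(0, sizeof(target), &target) != 0)
		return -1;

	return cpu;
}
```

With one defense thread per allowed CPU, thread i would call
pin_self_to_slot(i) and the offense threads would not call it at all.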
And, I'd suggest using tst_ncpus_available() instead of get_numcpus()
when distributing defense threads across CPUs, in case some CPUs are
offline. Pinning a defense thread to an offline CPU would leave that
CPU uncovered and allow offense to run unchecked. See:
Also, the short usleep() after setting kickoff_flag, which was originally
there to give the RT load balancer time to distribute threads across CPUs,
can now be removed, since defense is already pinned to every CPU and
coverage is guaranteed without any extra settling time.
-	if (tst_check_preempt_rt())
-		usleep(20000);
-	else
-		usleep(2000000);
> > 3. game_over was never reset between iterations, causing all threads
> > to exit immediately on reruns (-i N), making the test a no-op.
> > Fix by resetting both kickoff_flag and game_over at the top of
> > do_test().
Good catch.
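
To show why the stale flags matter, here is a stripped-down model with a
single player thread and C11 atomics. It is not the test's code: the
player() loop, the plays counter and run_iteration() are all invented for
this illustration, and only the two flag names come from the patch.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_int kickoff_flag;
static atomic_int game_over;
static atomic_long plays;	/* work done by the player this iteration */

static void *player(void *arg)
{
	(void)arg;
	/* Wait for kickoff, then play until the game is over. */
	while (!atomic_load(&kickoff_flag))
		sched_yield();
	while (!atomic_load(&game_over))
		atomic_fetch_add(&plays, 1);
	return NULL;
}

/* One iteration; returns the number of plays made, or -1 on error. */
static long run_iteration(void)
{
	pthread_t tid;

	/*
	 * Reset state left over from the previous iteration. Without
	 * this, a stale game_over == 1 makes the player leave its loop
	 * immediately without making a single play.
	 */
	atomic_store(&kickoff_flag, 0);
	atomic_store(&game_over, 0);
	atomic_store(&plays, 0);

	if (pthread_create(&tid, NULL, player, NULL) != 0)
		return -1;

	atomic_store(&kickoff_flag, 1);
	/* Let the player make at least one play before ending the game. */
	while (atomic_load(&plays) == 0)
		sched_yield();
	atomic_store(&game_over, 1);
	pthread_join(tid, NULL);

	return atomic_load(&plays);
}
```

Calling run_iteration() twice in a row models -i 2: because of the reset
at the top, the second call does real work just like the first.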
> > 4. sched_setscheduler() failure for the referee was silently ignored.
> > If the call fails the test produces meaningless results. Fix by
> > checking the return value and calling tst_brk(TBROK) on failure.
+1
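
The shape of the check, as a standalone sketch. set_fifo_prio() is a
made-up helper that only reports the error and returns -1; it stands in
for whatever the patch does around the referee's sched_setscheduler()
call.

```c
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: switch the calling thread to SCHED_FIFO at the
 * given priority. Returns 0 on success, -1 (with a message on stderr)
 * on failure, e.g. EPERM without the needed privileges or EINVAL for an
 * out-of-range priority.
 */
static int set_fifo_prio(int prio)
{
	struct sched_param param = { .sched_priority = prio };

	if (sched_setscheduler(0, SCHED_FIFO, &param) != 0) {
		fprintf(stderr, "sched_setscheduler(SCHED_FIFO, %d): %s\n",
			prio, strerror(errno));
		return -1;
	}
	return 0;
}
```

In sched_football the failure branch would call tst_brk(TBROK, ...) as
the patch describes, instead of returning -1.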
> Tested-by: Jan Polensky <japo@linux.ibm.com>
Thanks, can you re-test it with the revisions suggested above?
--
Regards,
Li Wang