[LTP] [PATCH] sched_football: fix false failures on many-CPU systems

Thu Apr 16 05:23:35 CEST 2026

Hi John,

John Stultz <jstultz@google.com> wrote:

> > > > > 1. RT throttling freezes all SCHED_FIFO threads simultaneously. On
> > > > > release, the kernel does not always reschedule the highest-priority
> > > > > thread first on every CPU, so offense briefly runs and increments
> > > > > the_ball before defense is rescheduled. Fix by saving and disabling
> > > > > sched_rt_runtime_us in setup and restoring it in a new cleanup
> > > > > callback.
> > >
> > > Make sense, and like the AI-reviewer points out LTP provides an option
> > > to save_restore it automatically.
>
> Throttling shouldn't break the test. The fact that SCHED_NORMAL tasks
> ran shouldn't change the ordering when we go back to running RT tasks.
> This is likely a kernel bug.

In theory, you are right. Restoring global priority ordering the instant
throttling ends would be ideal, but that expectation ignores how
unthrottling actually works on SMP systems.

If we look at do_sched_rt_period_timer(), CPUs are unthrottled sequentially
in a for_each_cpu loop. As soon as an early CPU in the loop is unthrottled,
it can schedule its local RT task, 'offense'.
See:
  https://elixir.bootlin.com/linux/v7.0/source/kernel/sched/rt.c#L797

Meanwhile, later CPUs in the loop (which may hold the higher-priority
'defense' task) are still throttled.

Expecting atomic, zero-latency global unthrottling is unrealistic on
multi-core systems.
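
To make the window concrete, here is a toy model of that sequential loop.
It is only a sketch, not kernel code: struct toy_cpu and
count_inversion_windows() are names I made up, and it assumes exactly one
RT task per CPU. It counts the steps at which an already-unthrottled CPU
runs a task of lower priority than one that is still throttled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: one RT task per CPU, described by its priority. */
struct toy_cpu {
	int local_prio;   /* priority of the RT task queued on this CPU */
	bool throttled;   /* true while RT tasks on this CPU cannot run */
};

/*
 * Unthrottle CPUs one at a time in index order, as a sequential per-CPU
 * loop would. After each step, check whether any unthrottled CPU is
 * running a task whose priority is lower than that of a task sitting on
 * a still-throttled CPU. Returns the number of steps where this happens.
 */
size_t count_inversion_windows(struct toy_cpu *cpus, size_t ncpus)
{
	size_t windows = 0;

	for (size_t i = 0; i < ncpus; i++) {
		bool inverted = false;

		cpus[i].throttled = false;   /* CPU i may run its task now */

		for (size_t r = 0; r < ncpus && !inverted; r++) {
			if (cpus[r].throttled)
				continue;            /* r is not running yet */
			for (size_t t = 0; t < ncpus; t++) {
				if (cpus[t].throttled &&
				    cpus[t].local_prio > cpus[r].local_prio) {
					inverted = true;
					break;
				}
			}
		}
		if (inverted)
			windows++;
	}
	return windows;
}
```

With offense (priority 15) on CPU 0 and defense (priority 30) on CPU 1,
both throttled, the model reports one such window. With the two swapped
it reports none.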

That's why I believe disabling throttling for this specific test is the
practical approach.
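
For illustration, the save/disable/restore sequence could look roughly
like the sketch below. The helper names (read_setting, write_setting,
save_and_set) are hypothetical and this is plain C, not the LTP
save_restore option mentioned above, which would be the preferred way to
do this inside the test. Writing -1 to sched_rt_runtime_us disables RT
throttling and needs root.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of `path` into `buf`, newline stripped. 0 on success. */
int read_setting(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/* Overwrite `path` with `val`. 0 on success, -1 on any I/O error. */
int write_setting(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(val, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Save the current value into `saved`, then write `new_val`. */
int save_and_set(const char *path, const char *new_val,
		 char *saved, size_t len)
{
	if (read_setting(path, saved, len) != 0)
		return -1;
	return write_setting(path, new_val);
}
```

In setup this would be called as
save_and_set("/proc/sys/kernel/sched_rt_runtime_us", "-1", saved,
sizeof(saved)), and cleanup would call write_setting() with the saved
value.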

> > > > > 2. Offense and defense threads were unpinned, allowing the scheduler
> > > > > to migrate them freely. An offense thread could land on a CPU with
> > > > > no defense thread present and run unchecked. Fix by passing a CPU
> > > > > index as the thread arg and calling sched_setaffinity() at thread
> > > > > start. Pairs are distributed round-robin (i % ncpus) so each
> > > > > offense thread shares its CPU with a defense thread.
> > >
> > > This is a good thought, as for SCHED_FIFO it manages the corresponding
> > > runqueue for each CPU and simply picks the higher priority task to run.
> > > So pinning the threads to each CPU makes sense, but maybe we could
> > > only pin the defense because:
> > >
> > > With N defense threads pinned one per CPU, every CPU has a defense
> > > thread at priority 30 permanently runnable. The offense threads at priority
> > > 15, regardless of which CPU the scheduler places them on, will always find
> > > a higher-priority defense thread on the same CPU's runqueue. Since
> > > SCHED_FIFO strictly favors the higher-priority runnable task, offense can
> > > never be picked.
> > >
> > > Pinning offense as well would be redundant, it doesn't matter where offense
> > > lands, because defense already covers every CPU. This also has the advantage
> > > of letting the scheduler freely migrate offense threads without
> > > affecting the test outcome, which avoids interfering with the
> > > kernel's load balancing logic during the test.
> > >
> > > And, I'd suggest using tst_ncpus_available() instead of get_numcpus()
> > > when distributing defense threads across CPUs, in case some CPUs are
> > > offline. Pinning a defense thread to an offline CPU would leave that
> > > CPU uncovered and allow offense to run unchecked. See:
>
> I didn't see the orignal patch here, but the whole point of
> sched_football is to ensure the top <num cpu> (unaffined) priority
> tasks are always run and no lower priority rt tasks are run instead.
>
> So none of the tasks should be pinned to any cpus. The scheduler is
> supposed to ensure the RT invariant holds.
> There are some known bugs at the moment that will cause sched_football
> to fail (the RT_PUSH_IPI feature, for instance). That's a problem with
> the kernel, not the test.

Even setting aside the known RT_PUSH_IPI bug, the test is still not
guaranteed to pass in real scenarios.

After a closer look at how the RT scheduler works, I found that the
RT_PUSH_IPI mechanism is designed as a "best-effort" optimization rather
than a guaranteed operation. Because the kernel's scheduling state is
highly dynamic and asynchronous, a push attempt will deliberately abort
if conditions change between the time the IPI is sent and the time it is
actually processed.

The abort is by design, to avoid acting on stale state. The main causes
are state that has expired since the IPI was sent, CPU affinity
restrictions, sudden priority changes, or the lack of an eligible target
CPU.

See push_rt_task() in kernel/sched/rt.c:
  https://elixir.bootlin.com/linux/v7.0/source/kernel/sched/rt.c#L1939

Hence, if we explicitly pin one defense thread to each CPU, each of them
sits on its own CPU's runqueue, which matches what the kernel actually
guarantees: the RT scheduler guarantees per-CPU priority ordering, not
global placement. The RT load balancer is asynchronous and doesn't
guarantee that all high-priority threads are placed before any
low-priority thread runs.

On the other hand, if the test expects a global guarantee, it's testing
something the kernel doesn't claim to provide.
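
As a rough sketch of the pinning itself (not the actual patch), each
defense thread could do something like the following at startup. The
helper names pin_self_to_cpu() and nth_allowed_cpu() are hypothetical.
Walking the mask returned by sched_getaffinity() is one way to avoid
picking a CPU the thread cannot run on. Inside LTP, the library's own CPU
helpers would be the natural choice instead.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for cpu_set_t, CPU_SET() and sched_setaffinity() */
#endif
#include <sched.h>

/* Pin the calling thread to a single CPU. 0 on success, -1 on error. */
int pin_self_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);
}

/*
 * Return the n-th CPU (0-based) in the caller's current affinity mask,
 * or -1 if the mask holds fewer than n + 1 CPUs or cannot be read.
 */
int nth_allowed_cpu(int n)
{
	cpu_set_t set;

	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -1;
	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &set) && n-- == 0)
			return cpu;
	return -1;
}
```

The parent would resolve nth_allowed_cpu(i) for each defense thread
before any thread pins itself (pinning narrows the mask of the calling
thread), pass the result as the thread argument, and have the thread call
pin_self_to_cpu() first thing. Offense threads stay unpinned.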

As for the rest of the changes, the current test has several non-kernel
issues (RT throttling interference, uninitialized game_over on reruns,
silent sched_setscheduler failures) that produce false failures. These
mask the real kernel bugs you want to detect.
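
On the silent sched_setscheduler failures: the point is simply that the
return value has to be checked, since a thread that failed to become
SCHED_FIFO runs as a normal task and skews the result. A minimal sketch
follows. set_fifo_checked() is a hypothetical name, and the real test
would report through the LTP result macros rather than stderr.

```c
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/*
 * Switch the calling thread to SCHED_FIFO at priority `prio`.
 * Returns 0 on success, or the errno value on failure after logging it,
 * so the caller can abort the test instead of carrying on silently.
 */
int set_fifo_checked(int prio)
{
	struct sched_param param = { .sched_priority = prio };

	if (sched_setscheduler(0, SCHED_FIFO, &param) != 0) {
		int err = errno;

		fprintf(stderr, "sched_setscheduler(SCHED_FIFO, %d) failed: %s\n",
			prio, strerror(err));
		return err;
	}
	return 0;
}
```

A caller would treat any non-zero return as a setup failure, not as a
scheduling result.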

--
Regards,
Li Wang