[LTP] [PATCH] sched_football: fix false failures on many-CPU systems

Jan Polensky japo@linux.ibm.com
Wed Apr 15 17:20:08 CEST 2026


On Wed, Apr 15, 2026 at 05:52:11PM +0800, Li Wang wrote:
> Hi Soma and Jan,
>
> > > 1. RT throttling freezes all SCHED_FIFO threads simultaneously. On
> > > release, the kernel does not always reschedule the highest-priority
> > > thread first on every CPU, so offense briefly runs and increments
> > > the_ball before defense is rescheduled. Fix by saving and disabling
> > > sched_rt_runtime_us in setup and restoring it in a new cleanup
> > > callback.
>
> Makes sense, and as the AI reviewer points out, LTP provides a
> save_restore option that does this automatically.
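For readers unfamiliar with the save/restore pattern discussed here, a minimal standalone sketch follows. It is not the LTP mechanism itself (the save_restore option mentioned above handles this declaratively inside the test library). The helper names save_value() and write_value() are made up for illustration, and the sketch assumes the caller passes /proc/sys/kernel/sched_rt_runtime_us as the path, which needs root.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read a one-line sysctl-style file into old. */
static int save_value(const char *path, char *old, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(old, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	/* strip the trailing newline so the value can be written back as-is */
	old[strcspn(old, "\n")] = '\0';
	return 0;
}

/* Hypothetical helper: overwrite the file; used to disable and to restore. */
static int write_value(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputs(val, f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```

In setup one would call save_value() and then write_value(path, "-1") to turn RT throttling off; cleanup would call write_value() with the saved string.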
>
> > > 2. Offense and defense threads were unpinned, allowing the scheduler
> > > to migrate them freely. An offense thread could land on a CPU with
> > > no defense thread present and run unchecked. Fix by passing a CPU
> > > index as the thread arg and calling sched_setaffinity() at thread
> > > start. Pairs are distributed round-robin (i % ncpus) so each
> > > offense thread shares its CPU with a defense thread.
>
> This is a good thought: for SCHED_FIFO the kernel manages a separate
> runqueue for each CPU and simply picks the higher-priority task on it to run.
> So pinning the threads to each CPU makes sense, but maybe we could
> only pin the defense because:
>
> With N defense threads pinned one per CPU, every CPU has a defense
> thread at priority 30 permanently runnable. The offense threads at priority
> 15, regardless of which CPU the scheduler places them on, will always find
> a higher-priority defense thread on the same CPU's runqueue. Since
> SCHED_FIFO strictly favors the higher-priority runnable task, offense can
> never be picked.
>
> Pinning offense as well would be redundant: it doesn't matter where offense
> lands, because defense already covers every CPU. This also has the advantage
> of letting the scheduler freely migrate offense threads without affecting the
> test outcome, which avoids interfering with the kernel's load balancing logic
> during the test.
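The pinning step itself is small. A minimal sketch of what a defense thread could do at start-up, assuming Linux with glibc-style CPU set macros; pin_self_to_cpu() is a hypothetical name, not a function from the patch:

```c
#define _GNU_SOURCE
#include <sched.h>

/*
 * Pin the calling thread to a single CPU. Returns 0 on success, -1 with
 * errno set on failure (for example if the CPU is offline or outside the
 * allowed mask).
 */
static int pin_self_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	/* pid 0 means "the caller" */
	return sched_setaffinity(0, sizeof(set), &set);
}
```

A defense thread would call this once before waiting on the start barrier and treat a non-zero return as fatal, since an unpinned defender leaves a CPU uncovered.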
>
> I'd also suggest using tst_ncpus_available() instead of get_numcpus()
> when distributing defense threads across CPUs, in case some CPUs are
> offline. Pinning a defense thread to an offline CPU would leave that
> CPU uncovered and allow offense to run unchecked. See:
>
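On the offline-CPU point: a count alone does not say which CPU IDs are usable, because IDs need not be contiguous when some CPUs are offline. The sketch below lists the IDs in the caller's affinity mask, which on Linux is expected to contain only CPUs the thread can actually run on. allowed_cpu_ids() is a hypothetical helper, and the fixed-size cpu_set_t limits it to CPU_SETSIZE CPUs; larger systems would need CPU_ALLOC().

```c
#define _GNU_SOURCE
#include <sched.h>

/*
 * Fill cpus[] with the IDs of the CPUs the caller may run on and return
 * how many were found, or -1 on error. IDs are returned in ascending
 * order and may have gaps, e.g. {0, 1, 4, 5}.
 */
static int allowed_cpu_ids(int *cpus, int max)
{
	cpu_set_t set;
	int cpu, n = 0;

	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -1;

	for (cpu = 0; cpu < CPU_SETSIZE && n < max; cpu++) {
		if (CPU_ISSET(cpu, &set))
			cpus[n++] = cpu;
	}
	return n;
}
```

With such a list, defense thread i could be pinned to cpus[i % n] rather than to the raw index i % n. Whether the patch needs this depends on how the CPU index is derived, which the thread does not spell out.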
> Also, the short usleep() after setting kickoff_flag, which was originally
> there to give the RT load balancer time to distribute threads across CPUs,
> can now be removed, since defense is already pinned to every CPU and
> coverage is guaranteed without any extra settling time.
>
> -       if (tst_check_preempt_rt())
> -               usleep(20000);
> -       else
> -               usleep(2000000);
>
>
> > > 3. game_over was never reset between iterations, causing all threads
> > > to exit immediately on reruns (-i N), making the test a no-op.
> > > Fix by resetting both kickoff_flag and game_over at the top of
> > > do_test().
>
> Good catch.
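The effect of stale flags is easy to reproduce outside LTP. A minimal sketch using C11 atomics instead of the tst_atomic helpers; plays and run_game() are hypothetical names, and the reset_flags parameter exists only to show both behaviours:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_int kickoff_flag;
static atomic_int game_over;
static atomic_int plays;	/* hypothetical: loop passes made by the player */

static void *player(void *arg)
{
	(void)arg;
	while (!atomic_load(&kickoff_flag))
		sched_yield();
	/* with a stale game_over == 1 this loop body never runs */
	while (!atomic_load(&game_over))
		atomic_fetch_add(&plays, 1);
	return NULL;
}

/* One test iteration; returns the number of plays, or -1 on error. */
static int run_game(int reset_flags)
{
	pthread_t tid;

	if (reset_flags) {
		atomic_store(&kickoff_flag, 0);
		atomic_store(&game_over, 0);
	}
	atomic_store(&plays, 0);

	if (pthread_create(&tid, NULL, player, NULL) != 0)
		return -1;
	atomic_store(&kickoff_flag, 1);
	/* wait for the first play unless the game is already flagged over */
	while (!atomic_load(&game_over) && atomic_load(&plays) == 0)
		sched_yield();
	atomic_store(&game_over, 1);
	pthread_join(tid, NULL);
	return atomic_load(&plays);
}
```

Calling run_game(0) after a completed game returns 0 plays, which is the no-op behaviour described in item 3; run_game(1) makes progress every time.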
>
> > > 4. sched_setscheduler() failure for the referee was silently ignored.
> > > If the call fails the test produces meaningless results. Fix by
> > > checking the return value and calling tst_brk(TBROK) on failure.
>
> +1
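A sketch of the checked call, written without the LTP helpers so it stands alone. set_fifo_priority() is a hypothetical wrapper; in the test itself the failure branch would be the tst_brk(TBROK) call described in item 4.

```c
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/*
 * Switch the caller to SCHED_FIFO at the given priority. Returns 0 on
 * success or a negative errno value on failure, so the caller can abort
 * instead of running the game under the wrong policy.
 */
static int set_fifo_priority(int prio)
{
	struct sched_param param = { .sched_priority = prio };

	if (sched_setscheduler(0, SCHED_FIFO, &param) != 0) {
		int err = errno;

		fprintf(stderr, "sched_setscheduler(SCHED_FIFO, %d): %s\n",
			prio, strerror(err));
		return -err;
	}
	return 0;
}
```

Typical failures are insufficient privileges or an out-of-range priority; either way the result of a run that continues would say nothing about SCHED_FIFO behaviour.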
>
> > Tested-by: Jan Polensky <japo@linux.ibm.com>
>
> Thanks, can you re-test it based on what my above revision suggests?
>
> --
> Regards,
> Li Wang

Hi Li, hi Soma,

@Li: I incorporated your suggested changes and re-tested. Works fine for
me on the same machine as before.

	./sched_football -i 10
	[snip]
	Summary:
	passed   10
	failed   0
	broken   0
	skipped  0
	warnings 0

Tested-by: Jan Polensky <japo@linux.ibm.com>

@Soma: Feel free to pick it up; the diff is attached below. Please note
the additional join_threads() call, which is needed if you run the test
with multiple iterations, for example:

	./sched_football -i 10

It seems the threads do not terminate correctly without it.
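The ordering that matters is: signal game over, join every thread, and only then destroy the barrier or start the next iteration. A self-contained sketch of that ordering with plain pthreads; NTHREADS, running and play_once() are hypothetical, and error handling for pthread_create() is left out for brevity:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define NTHREADS 4	/* hypothetical team size for the sketch */

static pthread_barrier_t start_barrier;
static atomic_int game_over;
static atomic_int running;	/* threads currently inside the game loop */

static void *player(void *arg)
{
	(void)arg;
	pthread_barrier_wait(&start_barrier);
	atomic_fetch_add(&running, 1);
	while (!atomic_load(&game_over))
		sched_yield();
	atomic_fetch_sub(&running, 1);
	return NULL;
}

/* Run one game; returns the number of threads still running afterwards. */
static int play_once(void)
{
	pthread_t tids[NTHREADS];
	int i;

	atomic_store(&game_over, 0);
	/* +1 so the main thread takes part in the barrier too */
	pthread_barrier_init(&start_barrier, NULL, NTHREADS + 1);

	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tids[i], NULL, player, NULL);

	pthread_barrier_wait(&start_barrier);
	atomic_store(&game_over, 1);

	/* Join before destroying the barrier or starting another game. */
	for (i = 0; i < NTHREADS; i++)
		pthread_join(tids[i], NULL);

	pthread_barrier_destroy(&start_barrier);
	return atomic_load(&running);
}
```

Without the join loop, threads from one iteration could still be running when the flags are reset and the barrier is reinitialised for the next one.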

diff --git a/testcases/realtime/func/sched_football/sched_football.c b/testcases/realtime/func/sched_football/sched_football.c
index 43bac9468693..be0334da812e 100644
--- a/testcases/realtime/func/sched_football/sched_football.c
+++ b/testcases/realtime/func/sched_football/sched_football.c
@@ -118,23 +118,17 @@ void *thread_defense(void *arg)
 }

 /* This is the offensive team. They're trying to move the ball */
-void *thread_offense(void *arg)
+void *thread_offense(void *arg LTP_ATTRIBUTE_UNUSED)
 {
-	struct thread *t = (struct thread *)arg;
-	int cpu = (int)(intptr_t)t->arg;
-	cpu_set_t cpuset;
-
 	prctl(PR_SET_NAME, "offense", 0, 0, 0);

 	/*
-	 * Pin to the same CPU as the paired defense thread so there is
-	 * always a higher-priority defense thread locally available to
-	 * preempt this one without requiring cross-CPU migration.
+	 * Offense threads are not pinned. With defense threads pinned one per
+	 * CPU at priority 30, every CPU has a higher-priority defense thread
+	 * runnable. Offense at priority 15 can never be picked by SCHED_FIFO
+	 * regardless of which CPU it lands on. Leaving offense unpinned avoids
+	 * interfering with the kernel's load balancing logic.
 	 */
-	CPU_ZERO(&cpuset);
-	CPU_SET(cpu, &cpuset);
-	if (sched_setaffinity(0, sizeof(cpuset), &cpuset) != 0)
-		tst_brk(TBROK | TERRNO, "sched_setaffinity failed for offense on CPU %d", cpu);

 	pthread_barrier_wait(&start_barrier);
 	while (!tst_atomic_load(&kickoff_flag))
@@ -169,10 +163,6 @@ void referee(int game_length)

 	tst_atomic_store(0, &the_ball);
 	tst_atomic_store(1, &kickoff_flag);
-	if (tst_check_preempt_rt())
-		usleep(20000);
-	else
-		usleep(2000000);

 	/* Watch the game */
 	while ((now.tv_sec - start.tv_sec) < game_length) {
@@ -199,10 +189,10 @@ static void do_test(void)
 	int i;

 	if (players_per_team == 0)
-		players_per_team = get_numcpus();
+		players_per_team = tst_ncpus_available();

 	/* actual CPU count used for affinity assignment, independent of players_per_team */
-	ncpus = get_numcpus();
+	ncpus = tst_ncpus_available();

 	tst_res(TINFO, "players_per_team: %d game_length: %d",
 	       players_per_team, game_length);
@@ -227,10 +217,8 @@ static void do_test(void)
 	priority = 15;
 	tst_res(TINFO, "Starting %d offense threads at priority %d",
 	       players_per_team, priority);
-	/* i % ncpus distributes threads round-robin across CPUs; the CPU index is
-	 * passed as the thread arg so offense and defense pairs share the same CPU */
 	for (i = 0; i < players_per_team; i++)
-		create_fifo_thread(thread_offense, (void *)(intptr_t)(i % ncpus), priority);
+		create_fifo_thread(thread_offense, NULL, priority);

 	/* Start the defense */
 	priority = 30;
@@ -248,6 +236,7 @@ static void do_test(void)

 	referee(game_length);

+	join_threads();
 	pthread_barrier_destroy(&start_barrier);
 }



