[LTP] [PATCH v1 1/1] sched_football: harden kickoff synchronization on high-CPU systems

Jan Polensky japo@linux.ibm.com
Mon Mar 2 09:56:31 CET 2026


On Wed, Feb 25, 2026 at 11:11:53AM -0800, John Stultz wrote:
> On Wed, Feb 25, 2026 at 1:23 AM Andrea Cervesato
> <andrea.cervesato@suse.com> wrote:
> > On Tue Feb 24, 2026 at 10:03 PM CET, John Stultz via ltp wrote:
> > > On Tue, Feb 24, 2026 at 2:45 AM Jan Polensky <japo@linux.ibm.com> wrote:
> > > >
> > > > The sched_football test has been intermittently failing, most noticeably
> > > > on systems with many CPUs and/or under load, due to a startup ordering
> > > > hole around kickoff.
> > > >
> > >
> > > I've not had time to closely review your suggestion here, but it
> > > sounds reasonable.
> > >
> > > That said, I want to warn you and ensure you are aware: the
> > > RT_PUSH_IPI feature in the scheduler breaks the RT invariant
> > > sched_football is testing.
> > >
> > > I see this as a bug with that feature, but the scalability RT_PUSH_IPI
> > > allows for seems more important to folks who are doing RT work than
> > > the misbehavior.  Steven and I talked a while back about some ideas on
> > > how we might be able to do the pull in a more efficient way while
> > > still holding the invariant true, and I have a bug to track it, but
> > > it's not been high enough priority to get bandwidth yet.
> > >
> > > So you might want to make sure you disable that feature before testing via:
> > > # echo NO_RT_PUSH_IPI > /sys/kernel/debug/sched/features
> > >
> > > thanks
> > > -john
> >
> > Thanks for your deep analysis of the possible issue. I'm not an RT expert,
> > but I trust your expertise here :-) I will leave this patch review to
> > someone who's more skilled than me in this topic.
> >
> > I have a suggestion though.
> >
> > About NO_RT_PUSH_IPI, we are lucky: LTP provides a safe mechanism to
> > set system configuration values and restore them to their previous
> > values after the test. You can find it in `struct tst_test`; it's
> > called `.save_restore` [1]
> >
> > I think it's worth forcing this option according to the underlying
> > variant (and properly documenting this in the code with a comment).
> >
> > WDYT?
>
> That seems reasonable, as long as it's clearly labeled as a workaround
> and hopefully can be dropped (or kernel version limited) when the
> issue is finally addressed.
>
> thanks
> -john

Hi Andrea, hi John,

Thank you for the thorough review and the helpful remarks.

After going through the feedback, I think it makes sense to step back and
rework the patch. The main objective is to drive the failure rate down as
much as possible, and the current version still shows weaknesses,
especially with respect to steal time. On heavily loaded systems I also
still observe frequent TBROK results, so the timing clearly needs further
tuning.

I will take some time to revisit the design and incorporate these aspects
before posting a revised version.
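
As a rough illustration of the RT_PUSH_IPI workaround discussed above, here
is a minimal sketch in plain C. It is not the revised patch and does not use
LTP's `.save_restore`; the helper names (has_token, write_feature,
disable_rt_push_ipi) are hypothetical. It assumes that reading the features
file returns a whitespace-separated list in which disabled features carry a
NO_ prefix, and that writing a single feature name toggles that feature.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `name` is a whole whitespace-separated token in `list`. */
static int has_token(const char *list, const char *name)
{
	size_t len = strlen(name);
	const char *p = list;

	while (*p) {
		p += strspn(p, " \t\n");            /* skip separators */
		size_t tok = strcspn(p, " \t\n");   /* length of next token */

		if (tok == len && strncmp(p, name, len) == 0)
			return 1;
		p += tok;
	}
	return 0;
}

/* Write one feature name to the features file. 0 on success, -1 on error. */
static int write_feature(const char *path, const char *name)
{
	FILE *f = fopen(path, "w");
	int rc;

	if (!f)
		return -1;
	rc = (fputs(name, f) < 0) ? -1 : 0;
	if (fclose(f) != 0)
		rc = -1;
	return rc;
}

/*
 * Disable RT_PUSH_IPI if it is currently enabled.
 * Returns 1 if it was enabled and is now disabled (caller should restore),
 * 0 if there was nothing to do, -1 on error.
 */
static int disable_rt_push_ipi(const char *path)
{
	char buf[4096];
	size_t n;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';

	if (!has_token(buf, "RT_PUSH_IPI"))
		return 0;   /* already disabled, or not known to this kernel */
	return write_feature(path, "NO_RT_PUSH_IPI") == 0 ? 1 : -1;
}
```

A test could call disable_rt_push_ipi("/sys/kernel/debug/sched/features") in
its setup and, only if that returned 1, call write_feature() with
"RT_PUSH_IPI" on the same path in its cleanup. This needs root and a mounted
debugfs, and should be labeled as a workaround in a comment.
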
Thanks again for your comments and suggestions.

Thanks & Greetings
Jan