[LTP] [PATCH v2] sched_football: synchronize with kickoff flag to reduce skew
Li Wang
liwang@redhat.com
Fri Sep 5 06:03:59 CEST 2025
On Fri, Sep 5, 2025 at 8:54 AM Li Wang <liwang@redhat.com> wrote:
>
>
> On Thu, Sep 4, 2025 at 11:28 PM Cyril Hrubis <chrubis@suse.cz> wrote:
>
>> Hi!
>> > > > > static void do_setup(void)
>> > > > > {
>> > > > > + if (!tst_check_preempt_rt())
>> > > > > + tst_brk(TCONF, "Test requires real-time kernel");
>> > > >
>> > > > I understood Cyril is really suggesting to keep it [1]. I would also
>> > > > vote to keep it (we still have some time to see if it got fixed before
>> > > > release).
>> > > >
>> > > > I know we had this discussion in the past (some of your colleagues
>> > > > suggesting it should not be run on a non-RT kernel), so I'm not
>> > > > pushing for it.
>> > >
>> > > I still do not understand the reasons for disabling the test. The POSIX
>> > > realtime scheduling classes have to work properly regardless of the
>> > > kernel flavor. Why should we turn the test off on non-RT kernels then?
>> > >
>> >
>> > No special reasons. I can still sporadically catch the failure on a
>> > non-RT kernel, even with a 2-second sleep.
>>
>> That is very strange. The SCHED_FIFO threads should preempt any
>> lower-priority thread as soon as they become runnable and should stay
>> running until they finish or yield. Two seconds should be more than
>> enough for that to happen.
>>
>> > Thus, I took this very extreme approach: on a non-RT kernel, sleep
>> > may not have a perfect effect. I guess the stock kernel with
>> > sched_setscheduler(, SCHED_FIFO, ) still has scheduling skew under
>> > workload.
>>
>> Does this happen on vanilla Linux as well or only on RedHat kernels?
>>
>
> Yes, both vanilla Linux and CentOS kernels.
>
> More CI test history for sched_football:
>
> Without barrier patch:
>   Fails on both RT and non-RT CentOS Stream 9/10 kernels.
>   Fails on the non-RT mainline v6.17-rc4 kernel (I did not build a
>   v6.17 RT kernel).
>
> (^ that's why we started to look into the failure and submitted commit
> e523ba88dd9b)
>
> With barrier patch:
>   Fails on both RT and non-RT CentOS kernels, but the final ball
>   position is noticeably lower.
>   Fails on the non-RT mainline v6.17-rc4 kernel (I did not build a
>   v6.17 RT kernel).
>
> With barrier patch + kickoff flag enhancement:
>   Fails on the non-RT CentOS Stream 10 kernel.
>   Fails on the non-RT mainline v6.17-rc4 kernel (I did not build a
>   v6.17 RT kernel).
>   Passes on the RT CentOS Stream kernel.
>
> (^ here I started to suspect that SCHED_FIFO threads cannot perform as
> well on a stock kernel as they do on an RT kernel)
>
>
> [root@dell-per7625-01 sched_football]# uname -r
> 6.17.0-rc4.liwang
>
> [root@dell-per7625-01 sched_football]# ./sched_football
> tst_test.c:2004: TINFO: LTP version: 20250530
> tst_test.c:2007: TINFO: Tested kernel: 6.17.0-rc4.liwang #1 SMP PREEMPT_DYNAMIC Thu Sep 4 20:07:20 EDT 2025 x86_64
> tst_kconfig.c:88: TINFO: Parsing kernel config '/lib/modules/6.17.0-rc4.liwang/build/.config'
> tst_kconfig.c:676: TINFO: CONFIG_FAULT_INJECTION kernel option detected which might slow the execution
> tst_test.c:1825: TINFO: Overall timeout per run is 0h 02m 00s
> sched_football.c:162: TINFO: players_per_team: 32 game_length: 5
> sched_football.c:178: TINFO: Starting 32 offense threads at priority 15
> sched_football.c:185: TINFO: Starting 32 defense threads at priority 30
> sched_football.c:192: TINFO: Starting 64 crazy-fan threads at priority 50
> sched_football.c:118: TINFO: Starting referee thread
> sched_football.c:121: TINFO: Starting the game (5 sec)
> sched_football.c:144: TINFO: Final ball position: 20205
> sched_football.c:150: TFAIL: Expect: final_ball == 0
>
Comparing the configurations of the stock kernel and the real-time
kernel, the stock kernel uses CONFIG_PREEMPT_VOLUNTARY=y, which only
provides voluntary preemption.

This preemption model is designed to strike a balance between throughput
and latency. It only allows kernel code to be preempted at specific,
well-defined "safe points", potentially resulting in long, unbounded
latencies.

However, the sched_football test was most likely designed to measure or
stress-test the deterministic, low-latency scheduling behavior that is
characteristic of a real-time (RT) kernel.

So, I tend to believe the test's failure on the stock kernel is acceptable.

And, by the way, what does the SUSE kernel configuration look like?

# grep CONFIG_PREEMPT /boot/config-6.12.0-55.29.1.el10_0.x86_64
CONFIG_PREEMPT_BUILD=y
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
# CONFIG_PREEMPT_RT is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
CONFIG_PREEMPT_DYNAMIC=y
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT_NOTIFIERS=y

(^ I built my v6.17-rc4 with this config too)

# grep CONFIG_PREEMPT /boot/config-6.12.0-55.31.1.el10_0.x86_64+rt
CONFIG_PREEMPT_BUILD=y
CONFIG_PREEMPT=y
CONFIG_PREEMPT_RT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
# CONFIG_PREEMPT_DYNAMIC is not set
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT_NOTIFIERS=y
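
To make the comparison of the two configs above concrete, here is a
minimal sketch of how the build-time preemption model could be
classified from config text. Both helpers, config_enabled() and
preempt_model(), are hypothetical names made up for illustration; they
are not part of the LTP library. Note this only reports the build-time
default: with CONFIG_PREEMPT_DYNAMIC=y the running model may have been
changed at boot, which this sketch does not detect.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if the config text contains a line
 * that is exactly "<opt>=y". Commented-out lines such as
 * "# CONFIG_PREEMPT is not set" do not match, and neither do longer
 * option names that merely share the prefix (CONFIG_PREEMPT_BUILD).
 */
static int config_enabled(const char *config, const char *opt)
{
	size_t len = strlen(opt);
	const char *line = config;

	while (line && *line) {
		if (!strncmp(line, opt, len) &&
		    !strncmp(line + len, "=y", 2) &&
		    (line[len + 2] == '\n' || line[len + 2] == '\0'))
			return 1;

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}

	return 0;
}

/*
 * Hypothetical helper: classify the build-time preemption model.
 * CONFIG_PREEMPT_RT is checked first because the RT config shown
 * above sets CONFIG_PREEMPT=y as well.
 */
static const char *preempt_model(const char *config)
{
	if (config_enabled(config, "CONFIG_PREEMPT_RT"))
		return "rt";
	if (config_enabled(config, "CONFIG_PREEMPT"))
		return "full";
	if (config_enabled(config, "CONFIG_PREEMPT_VOLUNTARY"))
		return "voluntary";
	return "none";
}
```

Fed the two grep outputs above as strings, preempt_model() would
classify the stock el10 config as "voluntary" and the +rt config as
"rt", which matches the pass/fail split I see in CI.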
--
Regards,
Li Wang