[LTP] [sched/fair] 1c35b07e6d: RIP:native_queued_spin_lock_slowpath
kernel test robot
rong.a.chen@intel.com
Wed Jul 7 11:07:22 CEST 2021
On Tue, Jul 06, 2021 at 11:08:54AM +0200, Vincent Guittot wrote:
> Hi Rong
>
> On Tue, 6 Jul 2021 at 10:56, kernel test robot <rong.a.chen@intel.com> wrote:
> >
> > Greeting,
> >
> > FYI, we noticed the following commit (built with gcc-9):
> >
> > commit: 1c35b07e6d3986474e5635be566e7bc79d97c64d ("sched/fair: Ensure _sum and _avg values stay consistent")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> I don't think this commit is the real culprit as it mainly replaces a
> sub by a mul whereas the dmesg mentioned spinlock deadlock . Have you
> bisect the problem down to this commit or you faced the problem while
> testing latest master branch ?
Hi Vincent,
It's bisected by 0day-CI, I tried to run more times and found the issue is not
first introduced by this commit, but boot always failed with this commit.
adf3c31e18b765ea 1c35b07e6d3986474e5635be566
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
7:15 -47% :12 last_state.booting
2:15 67% 12:12 dmesg.Kernel_panic-not_syncing:Hard_LOCKUP
:15 7% 1:12 dmesg.RIP:cpuidle_enter_state
2:15 67% 12:12 dmesg.RIP:native_queued_spin_lock_slowpath
:15 7% 1:12 dmesg.RIP:sd_init_command[sd_mod]
1:15 -7% :12 dmesg.RIP:update_blocked_averages
1:15 -7% :12 dmesg.WARNING:at_kernel/sched/fair.c:#update_blocked_averages
3:15 60% 12:12 dmesg.boot_failures
Best Regards,
Rong Chen
>
> >
> >
> > in testcase: ltp
> > version: ltp-x86_64-14c1f76-1_20210703
> > with following parameters:
> >
> > disk: 1HDD
> > fs: ext4
> > test: dio-01
> > ucode: 0xe2
> >
> > test-description: The LTP testsuite contains a collection of tools for testing the Linux kernel and related features.
> > test-url: http://linux-test-project.github.io/
> >
> >
> > on test machine: 8 threads Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz with 28G memory
> >
> > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> >
> >
> >
> > If you fix the issue, kindly add following tag
> > Reported-by: kernel test robot <rong.a.chen@intel.com>
> >
> >
> > [ 160.446205]
> > [ 160.451594] <<<test_output>>>
> > [ 160.451595]
> > [ 178.116525] ------------[ cut here ]------------
> > [ 203.592757] NMI watchdog: Watchdog detected hard LOCKUP on cpu 3
> > [ 203.592758] Modules linked in: dm_mod btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c ipmi_devintf ipmi_msghandler sd_mod t10_pi sg intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp i915 kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mei_wdt intel_gtt drm_kms_helper ahci rapl syscopyarea libahci sysfillrect intel_cstate sysimgblt mei_me fb_sys_fops wmi_bmof drm intel_uncore libata mei joydev intel_pch_thermal wmi video intel_pmc_core acpi_pad ip_tables
> > [ 203.592770] CPU: 3 PID: 3103 Comm: diotest6 Tainted: G I 5.13.0-rc6-00076-g1c35b07e6d39 #1
> > [ 203.592770] Hardware name: Dell Inc. OptiPlex 7040/0Y7WYT, BIOS 1.2.8 01/26/2016
> > [ 203.592771] RIP: 0010:native_queued_spin_lock_slowpath (kbuild/src/consumer/kernel/locking/qspinlock.c:382 kbuild/src/consumer/kernel/locking/qspinlock.c:315)
> > [ 203.592771] Code: 6c f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 46 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 c3 8b 37 b8 00 02 00 00
> > All code
> > ========
> > 0: 6c insb (%dx),%es:(%rdi)
> > 1: f0 0f ba 2f 08 lock btsl $0x8,(%rdi)
> > 6: 0f 92 c0 setb %al
> > 9: 0f b6 c0 movzbl %al,%eax
> > c: c1 e0 08 shl $0x8,%eax
> > f: 89 c2 mov %eax,%edx
> > 11: 8b 07 mov (%rdi),%eax
> > 13: 30 e4 xor %ah,%ah
> > 15: 09 d0 or %edx,%eax
> > 17: a9 00 01 ff ff test $0xffff0100,%eax
> > 1c: 75 46 jne 0x64
> > 1e: 85 c0 test %eax,%eax
> > 20: 74 0e je 0x30
> > 22: 8b 07 mov (%rdi),%eax
> > 24: 84 c0 test %al,%al
> > 26: 74 08 je 0x30
> > 28: f3 90 pause
> > 2a:* 8b 07 mov (%rdi),%eax <-- trapping instruction
> > 2c: 84 c0 test %al,%al
> > 2e: 75 f8 jne 0x28
> > 30: b8 01 00 00 00 mov $0x1,%eax
> > 35: 66 89 07 mov %ax,(%rdi)
> > 38: c3 retq
> > 39: 8b 37 mov (%rdi),%esi
> > 3b: b8 00 02 00 00 mov $0x200,%eax
> >
> > Code starting with the faulting instruction
> > ===========================================
> > 0: 8b 07 mov (%rdi),%eax
> > 2: 84 c0 test %al,%al
> > 4: 75 f8 jne 0xfffffffffffffffe
> > 6: b8 01 00 00 00 mov $0x1,%eax
> > b: 66 89 07 mov %ax,(%rdi)
> > e: c3 retq
> > f: 8b 37 mov (%rdi),%esi
> > 11: b8 00 02 00 00 mov $0x200,%eax
> > [ 203.592772] RSP: 0018:ffffc90001f032d8 EFLAGS: 00000002
> > [ 203.592773] RAX: 0000000000000101 RBX: ffff88810d4a0000 RCX: ffff888759cc0000
> > [ 203.592773] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888759ceba80
> > [ 203.592774] RBP: ffffc90001f032e8 R08: ffff888759ceb420 R09: ffff888759ceb420
> > [ 203.592774] R10: ffff88810cc01500 R11: 0000000000000000 R12: ffff888759ceba80
> > [ 203.592774] R13: 0000000000000000 R14: 0000000000000087 R15: ffff88810d4a0c8c
> > [ 203.592775] FS: 00007fc252ae2740(0000) GS:ffff888759cc0000(0000) knlGS:0000000000000000
> > [ 203.592775] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 203.592776] CR2: 00007fa0a4d577f8 CR3: 000000074d22a005 CR4: 00000000003706e0
> > [ 203.592776] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 203.592776] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 203.592777] Call Trace:
> > [ 203.592777] _raw_spin_lock (kbuild/src/consumer/arch/x86/include/asm/paravirt.h:585 kbuild/src/consumer/arch/x86/include/asm/qspinlock.h:51 kbuild/src/consumer/include/asm-generic/qspinlock.h:85 kbuild/src/consumer/include/linux/spinlock.h:183 kbuild/src/consumer/include/linux/spinlock_api_smp.h:143 kbuild/src/consumer/kernel/locking/spinlock.c:151)
> > [ 203.592777] raw_spin_rq_lock_nested (kbuild/src/consumer/arch/x86/include/asm/preempt.h:85 kbuild/src/consumer/kernel/sched/core.c:462)
> > [ 203.592778] try_to_wake_up (kbuild/src/consumer/kernel/sched/sched.h:1536 kbuild/src/consumer/kernel/sched/sched.h:1611 kbuild/src/consumer/kernel/sched/core.c:3555 kbuild/src/consumer/kernel/sched/core.c:3835)
> > [ 203.592778] __queue_work (kbuild/src/consumer/arch/x86/include/asm/paravirt.h:590 kbuild/src/consumer/arch/x86/include/asm/qspinlock.h:56 kbuild/src/consumer/include/linux/spinlock.h:212 kbuild/src/consumer/include/linux/spinlock_api_smp.h:151 kbuild/src/consumer/kernel/workqueue.c:1501)
> > [ 203.592778] queue_work_on (kbuild/src/consumer/kernel/workqueue.c:1526)
> >
> >
> > To reproduce:
> >
> > git clone https://github.com/intel/lkp-tests.git
> > cd lkp-tests
> > bin/lkp install job.yaml # job file is attached in this email
> > bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
> > bin/lkp run generated-yaml-file
> >
> >
> >
> > ---
> > 0DAY/LKP+ Test Infrastructure Open Source Technology Center
> > https://lists.01.org/hyperkitty/list/lkp@lists.01.org Intel Corporation
> >
> > Thanks,
> > Rong Chen
> >
More information about the ltp
mailing list