[LTP] LTP: cfs_bandwidth01: Unable to handle kernel NULL pointer dereference

Yong Wang yongw.pur@gmail.com
Thu Sep 14 17:18:39 CEST 2023


Hello!
>Following kernel crash noticed on Linux stable-rc 6.5.3-rc1 on qemu-arm64 while
>running LTP sched tests cases.
>
>This is not always reproducible.
I also hit this problem on Linux 5.10 (5.10.59-rt52) on an arm64 system.
The kernel log is as follows:
[ 2893.003795] ================================================================== 
[ 2893.003822] BUG: KASAN: null-ptr-deref in pick_next_task_fair+0x130/0x4e0 
[ 2893.003880] Read of size 8 at addr 0000000000000080 by task ksoftirqd/0/12 
[ 2893.003901]  
[ 2893.003914] CPU: 0 PID: 12 Comm: ksoftirqd/0 Tainted: P           O      5.10.59-rt52#1 
[ 2893.003959] Call trace: 
[ 2893.003968]  dump_backtrace+0x0/0x2e8 
[ 2893.004009]  show_stack+0x18/0x28 
[ 2893.004032]  dump_stack+0x104/0x174 
[ 2893.004067]  kasan_report+0x1d0/0x258 
[ 2893.004098]  __asan_load8+0x94/0xd0 
[ 2893.004126]  pick_next_task_fair+0x130/0x4e0 
[ 2893.004164]  __schedule+0x220/0xbd0 
[ 2893.004192]  schedule+0xec/0x1a0 
[ 2893.004216]  smpboot_thread_fn+0x124/0x548 
[ 2893.004246]  kthread+0x24c/0x278 
[ 2893.004277]  ret_from_fork+0x10/0x34 
[ 2893.004306] ================================================================== 
[ 2893.004325] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000080 
[ 2893.152267] Mem abort info: 
[ 2893.152639]   ESR = 0x96000004 
[ 2893.153045]   EC = 0x25: DABT (current EL), IL = 32 bits 
[ 2893.153739]   SET = 0, FnV = 0 
[ 2893.154143]   EA = 0, S1PTW = 0 
[ 2893.154560] Data abort info: 
[ 2893.154940]   ISV = 0, ISS = 0x00000004 
[ 2893.155443]   CM = 0, WnR = 0 
[ 2893.155838] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000188edb000 
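
The ESR value in the abort info can be checked by hand. Below is a minimal sketch, assuming the standard AArch64 ESR_ELx layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]; for data aborts, WnR is ISS bit 6 and DFSC is ISS bits [5:0]). The struct and function names are made up for illustration and are not kernel APIs.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields (not a kernel type). */
struct esr_fields {
    unsigned ec;    /* exception class, bits [31:26] */
    unsigned il;    /* instruction length bit, bit 25 */
    uint32_t iss;   /* instruction specific syndrome, bits [24:0] */
    unsigned wnr;   /* data abort only: 1 = write, 0 = read (ISS bit 6) */
    unsigned dfsc;  /* data abort only: fault status code (ISS bits [5:0]) */
};

/* Split a 32-bit ESR value into the fields printed in the log. */
static struct esr_fields decode_esr(uint32_t esr)
{
    struct esr_fields f;

    f.ec   = (esr >> 26) & 0x3f;
    f.il   = (esr >> 25) & 0x1;
    f.iss  = esr & 0x1ffffff;
    f.wnr  = (f.iss >> 6) & 0x1;
    f.dfsc = f.iss & 0x3f;
    return f;
}
```

For ESR = 0x96000004 this gives EC = 0x25, IL = 1, ISS = 0x4 and WnR = 0, matching the log: a data abort taken at the current EL, caused by a read. DFSC = 0x04 is a level 0 translation fault, which is what one would expect for an unmapped address such as 0x80.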

The source code in pick_next_task_fair() where the problem occurs is:
  se = pick_next_entity(cfs_rq, curr);
  cfs_rq = group_cfs_rq(se); // se is NULL here!

pick_next_entity() returns NULL, so a NULL pointer dereference occurs when the members of se are accessed afterwards.
However, it is not clear under what circumstances pick_next_entity() returns NULL.
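
The fault address 0x80 is consistent with this reading: loading a field of a NULL struct pointer touches the address equal to that field's offset. Below is a minimal user-space sketch of that arithmetic. The struct is a hypothetical stand-in, not the kernel's struct sched_entity, and the 0x80 bytes of padding are an assumption chosen only to mirror the address in the log. I have not checked the real offset of my_q in 5.10.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct cfs_rq_sketch;   /* opaque; only a pointer to it is needed */

/* Hypothetical layout, NOT the kernel's struct sched_entity. */
struct sched_entity_sketch {
    char other_fields[0x80];        /* assumed padding before my_q */
    struct cfs_rq_sketch *my_q;     /* the field group_cfs_rq() would read */
};

/* Compile-time check that the sketch places my_q at offset 0x80. */
static_assert(offsetof(struct sched_entity_sketch, my_q) == 0x80,
              "sketch layout must place my_q at 0x80");

/*
 * Address that a load of se->my_q would touch. Computed arithmetically,
 * so passing NULL is safe here; nothing is dereferenced.
 */
static uintptr_t my_q_load_address(const struct sched_entity_sketch *se)
{
    return (uintptr_t)se + offsetof(struct sched_entity_sketch, my_q);
}
```

With se == NULL, my_q_load_address() returns 0x80, i.e. the same small address that KASAN and the abort handler report.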

In addition, in my environment the following steps often reproduce the problem:
  stress-ng -c 8 --cpu-load 100 --sched fifo --sched-prio 1 --cpu-method pi -t 900 &
  runltp -s cfs_bandwidth01

I hope this helps to solve the problem.
Thanks.

