[LTP] [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab
Tetsuo Handa
penguin-kernel@I-love.SAKURA.ne.jp
Tue Jan 26 15:46:23 CET 2016
Jan Stancek wrote:
> On 01/22/2016 04:14 PM, Jan Stancek wrote:
> > On 01/19/2016 11:29 AM, Tetsuo Handa wrote:
> >> although I
>> couldn't find evidence that mlock() and madvise() are related to this hangup,
> >
> > I simplified the reproducer by having only a single thread allocating
> > memory when OOM triggers:
> > http://jan.stancek.eu/tmp/oom_hangs/console.log.3-v4.4-8606-with-memalloc.txt
> >
> > In this instance it was mmap + mlock, as you can see from the oom call trace.
> > It made it to do_exit(), but couldn't complete it:
>
> I have extracted the test from LTP into a standalone reproducer (attached),
> if you want to give it a try. It usually hangs my system within ~30
> minutes. If it takes too long, you can try disabling swap. In my past
> experience this usually helped reproduce it faster on small KVM guests.
>
> # gcc oom_mlock.c -pthread -O2
> # echo 1 > /proc/sys/vm/overcommit_memory
> (optionally) # swapoff -a
> # ./a.out
>
> Also, it's interesting to note that when I disabled the mlock() calls,
> the test ran fine overnight. I'll look into confirming this observation
> on more systems.
>
Thank you for the reproducer. I tried it with
----------
--- oom_mlock.c
+++ oom_mlock.c
@@ -33,7 +33,7 @@
if (s == MAP_FAILED)
return errno;
- if (do_mlock) {
+ if (0 && do_mlock) {
while (mlock(s, length) == -1 && loop > 0) {
if (EAGAIN != errno)
return errno;
----------
applied (i.e. with the mlock() calls disabled) on a VM with 4 CPUs / 5120MB RAM, and
successfully reproduced a livelock. Therefore, I think mlock() is irrelevant.
What I observed is that while the disk_events_workfn workqueue item was looping,
"Node 0 Normal free:" remained smaller than "min:", yet the "Node 0 Normal:"
buddy-list total stayed larger than "Node 0 Normal free:".
Is this difference caused by the pending vmstat_update, vmstat_shepherd and vmpressure_work_fn work items?
Can we somehow check how long these workqueue items have remained pending?
Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20160126.txt.xz .
----------
[ 312.018243] Out of memory: Kill process 9785 (a.out) score 937 or sacrifice child
[ 312.020469] Killed process 9785 (a.out) total-vm:6314312kB, anon-rss:4637568kB, file-rss:0kB
[ 323.867935] MemAlloc-Info: 4 stalling task, 0 dying task, 0 victim task.
[ 323.870075] MemAlloc: kworker/1:8(9698) seq=10 gfp=0x2400000 order=0 delay=9828
[ 323.872259] kworker/1:8 R running task 0 9698 2 0x00000080
[ 323.874452] Workqueue: events_freezable_power_ disk_events_workfn
(...snipped...)
[ 324.194104] Mem-Info:
[ 324.195095] active_anon:1165831 inactive_anon:3486 isolated_anon:0
[ 324.195095] active_file:1 inactive_file:0 isolated_file:0
[ 324.195095] unevictable:0 dirty:0 writeback:0 unstable:0
[ 324.195095] slab_reclaimable:1906 slab_unreclaimable:5202
[ 324.195095] mapped:1175 shmem:4204 pagetables:2555 bounce:0
[ 324.195095] free:8087 free_pcp:0 free_cma:0
[ 324.204574] Node 0 DMA free:15888kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 324.213795] lowmem_reserve[]: 0 2708 4673 4673
[ 324.215372] Node 0 DMA32 free:12836kB min:5008kB low:6260kB high:7512kB active_anon:2712464kB inactive_anon:7984kB active_file:4kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:2776588kB mlocked:0kB dirty:0kB writeback:0kB mapped:2880kB shmem:9704kB slab_reclaimable:4556kB slab_unreclaimable:9116kB kernel_stack:1184kB pagetables:4644kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 324.226159] lowmem_reserve[]: 0 0 1965 1965
[ 324.227795] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:184 all_unreclaimable? yes
[ 324.238987] lowmem_reserve[]: 0 0 0 0
[ 324.240614] Node 0 DMA: 0*4kB 0*8kB 1*16kB (U) 2*32kB (U) 3*64kB (U) 0*128kB 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15888kB
[ 324.244417] Node 0 DMA32: 38*4kB (UM) 24*8kB (UME) 146*16kB (UME) 139*32kB (UME) 52*64kB (UME) 11*128kB (UME) 2*256kB (UM) 1*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 12888kB
[ 324.249520] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
[ 324.253481] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 324.255910] 4219 total pagecache pages
[ 324.257497] 0 pages in swap cache
[ 324.258996] Swap cache stats: add 0, delete 0, find 0/0
[ 324.260835] Free swap = 0kB
[ 324.262243] Total swap = 0kB
[ 324.263648] 1310589 pages RAM
[ 324.265051] 0 pages HighMem/MovableOnly
[ 324.266607] 109257 pages reserved
[ 324.268079] 0 pages hwpoisoned
[ 324.269503] Showing busy workqueues and worker pools:
[ 324.271285] workqueue events: flags=0x0
[ 324.272865] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=3/256
[ 324.274933] pending: vmstat_shepherd, vmpressure_work_fn, vmw_fb_dirty_flush [vmwgfx]
[ 324.277426] workqueue events_freezable_power_: flags=0x84
[ 324.279303] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[ 324.281398] in-flight: 9698:disk_events_workfn
[ 324.283242] workqueue vmstat: flags=0xc
[ 324.284849] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[ 324.286947] pending: vmstat_update
[ 324.288647] pool 2: cpus=1 node=0 flags=0x0 nice=0 workers=10 idle: 9696 14 9665 9701 9675 9664 407 46 9691
(...snipped...)
[ 334.771342] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:184 all_unreclaimable? yes
(...snipped...)
[ 334.792901] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
(...snipped...)
[ 345.351647] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:184 all_unreclaimable? yes
(...snipped...)
[ 345.373938] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
(...snipped...)
[ 355.983345] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:184 all_unreclaimable? yes
(...snipped...)
[ 356.008948] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
(...snipped...)
[ 366.569312] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
(...snipped...)
[ 366.591131] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
(...snipped...)
[ 377.255040] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
(...snipped...)
[ 377.276782] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
(...snipped...)
[ 387.948890] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
(...snipped...)
[ 387.970629] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
(...snipped...)
[ 398.582824] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
(...snipped...)
[ 398.604575] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
(...snipped...)
[ 409.307406] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
(...snipped...)
[ 409.329012] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
(...snipped...)
[ 419.866932] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
(...snipped...)
[ 419.888642] Node 0 Normal: 24*4kB (UM) 0*8kB 2*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3744kB
(...snipped...)
[ 430.444391] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
(...snipped...)
[ 430.469660] Node 0 Normal: 24*4kB (UM) 0*8kB 3*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3760kB
(...snipped...)
[ 441.055258] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
(...snipped...)
[ 441.078312] Node 0 Normal: 24*4kB (UM) 0*8kB 3*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3760kB
(...snipped...)
[ 451.624394] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
(...snipped...)
[ 451.646614] Node 0 Normal: 24*4kB (UM) 0*8kB 3*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3760kB
(...snipped...)
[ 461.701021] MemAlloc-Info: 6 stalling task, 0 dying task, 0 victim task.
(...snipped...)
[ 461.798182] MemAlloc: kworker/1:8(9698) seq=10 gfp=0x2400000 order=0 delay=147661
[ 461.800145] kworker/1:8 R running task 0 9698 2 0x00000080
[ 461.802061] Workqueue: events_freezable_power_ disk_events_workfn
(...snipped...)
[ 462.141865] Mem-Info:
[ 462.142877] active_anon:1165831 inactive_anon:3486 isolated_anon:0
[ 462.142877] active_file:1 inactive_file:0 isolated_file:0
[ 462.142877] unevictable:0 dirty:0 writeback:0 unstable:0
[ 462.142877] slab_reclaimable:1906 slab_unreclaimable:5202
[ 462.142877] mapped:1175 shmem:4204 pagetables:2555 bounce:0
[ 462.142877] free:8087 free_pcp:0 free_cma:0
[ 462.152641] Node 0 DMA free:15888kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 462.161919] lowmem_reserve[]: 0 2708 4673 4673
[ 462.163515] Node 0 DMA32 free:12836kB min:5008kB low:6260kB high:7512kB active_anon:2712464kB inactive_anon:7984kB active_file:4kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:2776588kB mlocked:0kB dirty:0kB writeback:0kB mapped:2880kB shmem:9704kB slab_reclaimable:4556kB slab_unreclaimable:9116kB kernel_stack:1184kB pagetables:4644kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 462.174286] lowmem_reserve[]: 0 0 1965 1965
[ 462.175919] Node 0 Normal free:3624kB min:3632kB low:4540kB high:5448kB active_anon:1950860kB inactive_anon:5960kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2012836kB mlocked:0kB dirty:0kB writeback:0kB mapped:1820kB shmem:7112kB slab_reclaimable:3068kB slab_unreclaimable:11676kB kernel_stack:1952kB pagetables:5576kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 462.187058] lowmem_reserve[]: 0 0 0 0
[ 462.188704] Node 0 DMA: 0*4kB 0*8kB 1*16kB (U) 2*32kB (U) 3*64kB (U) 0*128kB 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15888kB
[ 462.192473] Node 0 DMA32: 38*4kB (UM) 24*8kB (UME) 146*16kB (UME) 139*32kB (UME) 52*64kB (UME) 11*128kB (UME) 2*256kB (UM) 1*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 12888kB
[ 462.198054] Node 0 Normal: 24*4kB (UM) 0*8kB 3*16kB (U) 29*32kB (U) 16*64kB (UM) 1*128kB (U) 0*256kB 1*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 3760kB
[ 462.202046] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 462.204481] 4219 total pagecache pages
[ 462.206068] 0 pages in swap cache
[ 462.207569] Swap cache stats: add 0, delete 0, find 0/0
[ 462.209416] Free swap = 0kB
[ 462.210833] Total swap = 0kB
[ 462.212240] 1310589 pages RAM
[ 462.213656] 0 pages HighMem/MovableOnly
[ 462.215229] 109257 pages reserved
[ 462.216696] 0 pages hwpoisoned
[ 462.218116] Showing busy workqueues and worker pools:
[ 462.219928] workqueue events: flags=0x0
[ 462.221501] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=3/256
[ 462.223582] pending: vmstat_shepherd, vmpressure_work_fn, vmw_fb_dirty_flush [vmwgfx]
[ 462.226085] workqueue events_power_efficient: flags=0x80
[ 462.227943] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[ 462.230056] pending: check_lifetime
[ 462.231713] workqueue events_freezable_power_: flags=0x84
[ 462.233577] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[ 462.235693] in-flight: 9698:disk_events_workfn
[ 462.237540] workqueue vmstat: flags=0xc
[ 462.239172] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[ 462.241287] pending: vmstat_update
[ 462.242970] pool 2: cpus=1 node=0 flags=0x0 nice=0 workers=10 idle: 9696 14 9665 9701 9675 9664 407 46 9691
----------
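To make the discrepancy in the repeated lines above concrete: the per-order breakdown that "Node 0 Normal:" prints sums to 3744kB, while the "free:" counter that the min watermark is checked against reports only 3624kB. A quick standalone check of the arithmetic (the helper is mine, just to illustrate the gap):

```c
/* Sum a "Node 0 Normal: 24*4kB 0*8kB 2*16kB ..." buddy breakdown.
 * counts[i] is the number of free blocks of size (4 << i) kB,
 * i.e. orders 0..10 covering 4kB up to 4096kB blocks. */
static int buddy_total_kb(const int counts[11])
{
	int total = 0, kb = 4;

	for (int i = 0; i < 11; i++, kb *= 2)
		total += counts[i] * kb;
	return total;
}
```

For the Node 0 Normal line above, counts {24, 0, 2, 29, 16, 1, 0, 1, 1, 0, 0} give 3744kB: 120kB (30 pages) more than the free:3624kB compared against min:3632kB. If those 30 pages are sitting in per-cpu vmstat diffs, folding them back via the pending vmstat_update would lift the zone back above its watermark, which is exactly why the pending work items look suspicious.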
The above result was obtained with the patch below applied to Linux 4.4.
----------
diff --git a/include/linux/sched.h b/include/linux/sched.h
index fa39434..daf2a1a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1375,6 +1375,28 @@ struct tlbflush_unmap_batch {
bool writable;
};
+struct memalloc_info {
+ /* For locking and progress monitoring. */
+ unsigned int sequence;
+ /*
+ * 0: not doing __GFP_RECLAIM allocation.
+ * 1: doing non-recursive __GFP_RECLAIM allocation.
+ * 2: doing recursive __GFP_RECLAIM allocation.
+ */
+ u8 valid;
+ /*
+ * bit 0: Will be reported as OOM victim.
+ * bit 1: Will be reported as dying task.
+ * bit 2: Will be reported as stalling task.
+ */
+ u8 type;
+ /* Started time in jiffies as of valid == 1. */
+ unsigned long start;
+ /* Requested order and gfp flags as of valid == 1. */
+ unsigned int order;
+ gfp_t gfp;
+};
+
struct task_struct {
volatile long state; /* -1 unrunnable, 0 runnable, >0 stopped */
void *stack;
@@ -1813,6 +1835,9 @@ struct task_struct {
unsigned long task_state_change;
#endif
int pagefault_disabled;
+#ifdef CONFIG_DETECT_MEMALLOC_STALL_TASK
+ struct memalloc_info memalloc;
+#endif
/* CPU-specific state of this task */
struct thread_struct thread;
/*
diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index c9e4731..fb3004a 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -9,6 +9,9 @@ extern int sysctl_hung_task_warnings;
extern int proc_dohung_task_timeout_secs(struct ctl_table *table, int write,
void __user *buffer,
size_t *lenp, loff_t *ppos);
+#ifdef CONFIG_DETECT_MEMALLOC_STALL_TASK
+extern unsigned long sysctl_memalloc_task_timeout_secs;
+#endif
#else
/* Avoid need for ifdefs elsewhere in the code */
enum { sysctl_hung_task_timeout_secs = 0 };
diff --git a/kernel/fork.c b/kernel/fork.c
index 1155eac..894221df 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1416,6 +1416,10 @@ static struct task_struct *copy_process(unsigned long clone_flags,
p->sequential_io_avg = 0;
#endif
+#ifdef CONFIG_DETECT_MEMALLOC_STALL_TASK
+ p->memalloc.sequence = 0;
+#endif
+
/* Perform scheduler related setup. Assign this task to a CPU. */
retval = sched_fork(clone_flags, p);
if (retval)
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index e0f90c2..2abebb9 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -16,6 +16,7 @@
#include <linux/export.h>
#include <linux/sysctl.h>
#include <linux/utsname.h>
+#include <linux/console.h>
#include <trace/events/sched.h>
/*
@@ -72,6 +73,207 @@ static struct notifier_block panic_block = {
.notifier_call = hung_task_panic,
};
+#ifdef CONFIG_DETECT_MEMALLOC_STALL_TASK
+/*
+ * Zero means infinite timeout - no checking done:
+ */
+unsigned long __read_mostly sysctl_memalloc_task_timeout_secs =
+ CONFIG_DEFAULT_MEMALLOC_TASK_TIMEOUT;
+static struct memalloc_info memalloc; /* Filled by is_stalling_task(). */
+
+static long memalloc_timeout_jiffies(unsigned long last_checked, long timeout)
+{
+ struct task_struct *g, *p;
+ long t;
+ unsigned long delta;
+
+ /* timeout of 0 will disable the watchdog */
+ if (!timeout)
+ return MAX_SCHEDULE_TIMEOUT;
+ /* At least wait for timeout duration. */
+ t = last_checked - jiffies + timeout * HZ;
+ if (t > 0)
+ return t;
+ /* Calculate how long to wait more. */
+ t = timeout * HZ;
+ delta = t - jiffies;
+
+ /*
+ * We might see outdated values in "struct memalloc_info" here.
+ * We will recheck later using is_stalling_task().
+ */
+ preempt_disable();
+ rcu_read_lock();
+ for_each_process_thread(g, p) {
+ if (likely(!p->memalloc.valid))
+ continue;
+ t = min_t(long, t, p->memalloc.start + delta);
+ if (unlikely(t <= 0))
+ goto stalling;
+ }
+ stalling:
+ rcu_read_unlock();
+ preempt_enable();
+ return t;
+}
+
+/**
+ * is_stalling_task - Check and copy a task's memalloc variable.
+ *
+ * @task: A task to check.
+ * @expire: Timeout in jiffies.
+ *
+ * Returns true if a task is stalling, false otherwise.
+ */
+static bool is_stalling_task(const struct task_struct *task,
+ const unsigned long expire)
+{
+ const struct memalloc_info *m = &task->memalloc;
+
+ /*
+ * If start_memalloc_timer() is updating "struct memalloc_info" now,
+ * we can ignore it because timeout jiffies cannot be expired as soon
+ * as updating it completes.
+ */
+ if (!m->valid || (m->sequence & 1))
+ return false;
+ smp_rmb(); /* Block start_memalloc_timer(). */
+ memalloc.start = m->start;
+ memalloc.order = m->order;
+ memalloc.gfp = m->gfp;
+ smp_rmb(); /* Unblock start_memalloc_timer(). */
+ memalloc.sequence = m->sequence;
+ /*
+ * If start_memalloc_timer() started updating it while we read it,
+ * we can ignore it for the same reason.
+ */
+ if (!m->valid || (memalloc.sequence & 1))
+ return false;
+ /* This is a valid "struct memalloc_info". Check for timeout. */
+ return time_after_eq(expire, memalloc.start);
+}
+
+/* Check for memory allocation stalls. */
+static void check_memalloc_stalling_tasks(unsigned long timeout)
+{
+ char buf[128];
+ struct task_struct *g, *p;
+ unsigned long now;
+ unsigned long expire;
+ unsigned int sigkill_pending;
+ unsigned int memdie_pending;
+ unsigned int stalling_tasks;
+
+ cond_resched();
+ now = jiffies;
+ /*
+ * Report tasks that stalled for more than half of timeout duration
+ * because such tasks might be correlated with tasks that already
+ * stalled for full timeout duration.
+ */
+ expire = now - timeout * (HZ / 2);
+ /* Count stalling tasks, dying and victim tasks. */
+ sigkill_pending = 0;
+ memdie_pending = 0;
+ stalling_tasks = 0;
+ preempt_disable();
+ rcu_read_lock();
+ for_each_process_thread(g, p) {
+ u8 type = 0;
+
+ if (test_tsk_thread_flag(p, TIF_MEMDIE)) {
+ type |= 1;
+ memdie_pending++;
+ }
+ if (fatal_signal_pending(p)) {
+ type |= 2;
+ sigkill_pending++;
+ }
+ if (is_stalling_task(p, expire)) {
+ type |= 4;
+ stalling_tasks++;
+ }
+ p->memalloc.type = type;
+ }
+ rcu_read_unlock();
+ preempt_enable();
+ if (!stalling_tasks)
+ return;
+ /* Report stalling tasks, dying and victim tasks. */
+ pr_warn("MemAlloc-Info: %u stalling task, %u dying task, %u victim task.\n",
+ stalling_tasks, sigkill_pending, memdie_pending);
+ cond_resched();
+ preempt_disable();
+ rcu_read_lock();
+ restart_report:
+ for_each_process_thread(g, p) {
+ bool can_cont;
+ u8 type;
+
+ if (likely(!p->memalloc.type))
+ continue;
+ p->memalloc.type = 0;
+ /* Recheck in case state changed meanwhile. */
+ type = 0;
+ if (test_tsk_thread_flag(p, TIF_MEMDIE))
+ type |= 1;
+ if (fatal_signal_pending(p))
+ type |= 2;
+ if (is_stalling_task(p, expire)) {
+ type |= 4;
+ snprintf(buf, sizeof(buf),
+ " seq=%u gfp=0x%x order=%u delay=%lu",
+ memalloc.sequence >> 1, memalloc.gfp,
+ memalloc.order, now - memalloc.start);
+ } else {
+ buf[0] = '\0';
+ }
+ if (unlikely(!type))
+ continue;
+ /*
+ * Victim tasks get pending SIGKILL removed before arriving at
+ * do_exit(). Therefore, print " exiting" instead of " dying".
+ */
+ pr_warn("MemAlloc: %s(%u)%s%s%s%s%s\n", p->comm, p->pid, buf,
+ (p->state & TASK_UNINTERRUPTIBLE) ?
+ " uninterruptible" : "",
+ (p->flags & PF_EXITING) ? " exiting" : "",
+ (type & 2) ? " dying" : "",
+ (type & 1) ? " victim" : "");
+ sched_show_task(p);
+ debug_show_held_locks(p);
+ /*
+ * Since there could be thousands of tasks to report, we always
+ * sleep and try to flush printk() buffer after each report, in
+ * order to avoid RCU stalls and reduce possibility of messages
+ * being dropped by continuous printk() flood.
+ *
+ * Since not yet reported tasks have p->memalloc.type > 0, we
+ * can simply restart this loop in case "g" or "p" went away.
+ */
+ get_task_struct(g);
+ get_task_struct(p);
+ rcu_read_unlock();
+ preempt_enable();
+ schedule_timeout_interruptible(1);
+ preempt_disable();
+ rcu_read_lock();
+ can_cont = pid_alive(g) && pid_alive(p);
+ put_task_struct(p);
+ put_task_struct(g);
+ if (!can_cont)
+ goto restart_report;
+ }
+ rcu_read_unlock();
+ preempt_enable();
+ cond_resched();
+ /* Show memory information. (SysRq-m) */
+ show_mem(0);
+ /* Show workqueue state. */
+ show_workqueue_state();
+}
+#endif /* CONFIG_DETECT_MEMALLOC_STALL_TASK */
+
static void check_hung_task(struct task_struct *t, unsigned long timeout)
{
unsigned long switch_count = t->nvcsw + t->nivcsw;
@@ -185,10 +387,12 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
rcu_read_unlock();
}
-static unsigned long timeout_jiffies(unsigned long timeout)
+static unsigned long hung_timeout_jiffies(long last_checked, long timeout)
{
/* timeout of 0 will disable the watchdog */
- return timeout ? timeout * HZ : MAX_SCHEDULE_TIMEOUT;
+ if (!timeout)
+ return MAX_SCHEDULE_TIMEOUT;
+ return last_checked - jiffies + timeout * HZ;
}
/*
@@ -224,18 +428,36 @@ EXPORT_SYMBOL_GPL(reset_hung_task_detector);
*/
static int watchdog(void *dummy)
{
+ unsigned long hung_last_checked = jiffies;
+#ifdef CONFIG_DETECT_MEMALLOC_STALL_TASK
+ unsigned long stall_last_checked = hung_last_checked;
+#endif
+
set_user_nice(current, 0);
for ( ; ; ) {
unsigned long timeout = sysctl_hung_task_timeout_secs;
-
- while (schedule_timeout_interruptible(timeout_jiffies(timeout)))
- timeout = sysctl_hung_task_timeout_secs;
-
- if (atomic_xchg(&reset_hung_task, 0))
+ long t = hung_timeout_jiffies(hung_last_checked, timeout);
+#ifdef CONFIG_DETECT_MEMALLOC_STALL_TASK
+ unsigned long timeout2 = sysctl_memalloc_task_timeout_secs;
+ long t2 = memalloc_timeout_jiffies(stall_last_checked,
+ timeout2);
+
+ if (t2 <= 0) {
+ check_memalloc_stalling_tasks(timeout2);
+ stall_last_checked = jiffies;
continue;
-
- check_hung_uninterruptible_tasks(timeout);
+ }
+#else
+ long t2 = t;
+#endif
+ if (t <= 0) {
+ if (!atomic_xchg(&reset_hung_task, 0))
+ check_hung_uninterruptible_tasks(timeout);
+ hung_last_checked = jiffies;
+ continue;
+ }
+ schedule_timeout_interruptible(min(t, t2));
}
return 0;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index dc6858d..96460aa 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1061,6 +1061,16 @@ static struct ctl_table kern_table[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = &neg_one,
},
+#ifdef CONFIG_DETECT_MEMALLOC_STALL_TASK
+ {
+ .procname = "memalloc_task_timeout_secs",
+ .data = &sysctl_memalloc_task_timeout_secs,
+ .maxlen = sizeof(unsigned long),
+ .mode = 0644,
+ .proc_handler = proc_dohung_task_timeout_secs,
+ .extra2 = &hung_task_timeout_max,
+ },
+#endif
#endif
#ifdef CONFIG_COMPAT
{
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 8c15b29..26d2c91 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -812,6 +812,30 @@ config BOOTPARAM_HUNG_TASK_PANIC_VALUE
default 0 if !BOOTPARAM_HUNG_TASK_PANIC
default 1 if BOOTPARAM_HUNG_TASK_PANIC
+config DETECT_MEMALLOC_STALL_TASK
+ bool "Detect tasks stalling inside memory allocator"
+ default n
+ depends on DETECT_HUNG_TASK
+ help
+ This option emits warning messages and traces when memory
+ allocation requests are stalling, in order to catch unexplained
+ hangups/reboots caused by memory allocation stalls.
+
+config DEFAULT_MEMALLOC_TASK_TIMEOUT
+ int "Default timeout for stalling task detection (in seconds)"
+ depends on DETECT_MEMALLOC_STALL_TASK
+ default 10
+ help
+ This option controls the default timeout (in seconds) used
+ to determine when a task has become non-responsive and should
+	  be considered stalling inside the memory allocator.
+
+ It can be adjusted at runtime via the kernel.memalloc_task_timeout_secs
+ sysctl or by writing a value to
+ /proc/sys/kernel/memalloc_task_timeout_secs.
+
+ A timeout of 0 disables the check. The default is 10 seconds.
+
endmenu # "Debug lockups and hangs"
config PANIC_ON_OOPS
diff --git a/mm/mlock.c b/mm/mlock.c
index 339d9e0..d6006b1 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -172,7 +172,7 @@ static void __munlock_isolation_failed(struct page *page)
*/
unsigned int munlock_vma_page(struct page *page)
{
- unsigned int nr_pages;
+ int nr_pages;
struct zone *zone = page_zone(page);
/* For try_to_munlock() and to serialize with page migration */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9d666df..4e4e4b6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3165,6 +3165,37 @@ got_pg:
return page;
}
+#ifdef CONFIG_DETECT_MEMALLOC_STALL_TASK
+static void start_memalloc_timer(const gfp_t gfp_mask, const int order)
+{
+	struct memalloc_info *m = &current->memalloc;
+
+ /* We don't check for stalls for !__GFP_RECLAIM allocations. */
+ if (!(gfp_mask & __GFP_RECLAIM))
+ return;
+ /* We don't check for stalls for nested __GFP_RECLAIM allocations */
+ if (!m->valid) {
+ m->sequence++;
+ smp_wmb(); /* Block is_stalling_task(). */
+ m->start = jiffies;
+ m->order = order;
+ m->gfp = gfp_mask;
+ smp_wmb(); /* Unblock is_stalling_task(). */
+ m->sequence++;
+ }
+ m->valid++;
+}
+
+static void stop_memalloc_timer(const gfp_t gfp_mask)
+{
+ if (gfp_mask & __GFP_RECLAIM)
+ current->memalloc.valid--;
+}
+#else
+#define start_memalloc_timer(gfp_mask, order) do { } while (0)
+#define stop_memalloc_timer(gfp_mask) do { } while (0)
+#endif
+
/*
* This is the 'heart' of the zoned buddy allocator.
*/
@@ -3232,7 +3263,9 @@ retry_cpuset:
alloc_mask = memalloc_noio_flags(gfp_mask);
ac.spread_dirty_pages = false;
+ start_memalloc_timer(alloc_mask, order);
page = __alloc_pages_slowpath(alloc_mask, order, &ac);
+ stop_memalloc_timer(alloc_mask);
}
if (kmemcheck_enabled && page)
----------
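The start_memalloc_timer() / is_stalling_task() pair above uses a seqlock-style even/odd sequence counter so the watchdog can take a lock-free snapshot of another task's memalloc info: the writer makes ->sequence odd while updating and even when done, and a reader discards any snapshot taken while the counter was odd or changed underneath it. A minimal userspace model of that protocol (single-threaded here just to show the state machine; all names are mine):

```c
#include <stdbool.h>

/* Model of the patch's protocol: sequence is odd while an update is
 * in progress, even when the fields form a consistent snapshot. */
struct snapshot {
	unsigned int sequence;
	unsigned long start;
	unsigned int order;
};

/* Writer side, as in start_memalloc_timer(). */
static void writer_update(struct snapshot *s, unsigned long start,
			  unsigned int order)
{
	s->sequence++;		/* odd: update in progress */
	/* the kernel patch issues smp_wmb() here */
	s->start = start;
	s->order = order;
	/* smp_wmb() again */
	s->sequence++;		/* even: snapshot stable */
}

/* Reader side, as in is_stalling_task(): returns false if it raced
 * with a writer, in which case this pass simply skips the task. */
static bool reader_snapshot(const struct snapshot *s, struct snapshot *out)
{
	unsigned int seq = s->sequence;

	if (seq & 1)			/* caught the writer mid-update */
		return false;
	/* smp_rmb() */
	out->start = s->start;
	out->order = s->order;
	/* smp_rmb() */
	out->sequence = s->sequence;
	return out->sequence == seq;	/* false if a writer raced us */
}
```

Note the patch's reader only re-checks m->valid and oddness rather than comparing against the first read; the equality check used here is the stricter classic seqcount retry condition, which is safe because a skipped task is simply re-examined on the next watchdog pass.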