[LTP] [BUG] oom hangs the system, NMI backtrace shows most CPUs in shrink_slab
Tetsuo Handa
penguin-kernel@I-love.SAKURA.ne.jp
Sat Jan 23 07:30:31 CET 2016
Jan Stancek wrote:
> On 01/19/2016 11:29 AM, Tetsuo Handa wrote:
> > although I
> > couldn't find evidence that mlock() and madvise() are related to this hangup,
>
> I simplified reproducer by having only single thread allocating
> memory when OOM triggers:
> http://jan.stancek.eu/tmp/oom_hangs/console.log.3-v4.4-8606-with-memalloc.txt
>
> In this instance it was mmap + mlock, as you can see from the oom call trace.
> It made it to do_exit(), but couldn't complete it:
Thank you for retesting.
Comparing console.log.2-v4.4-8606-with-memalloc_wc.txt.bz2 and
console.log.3-v4.4-8606-with-memalloc.txt :

Differences:

  "Free swap" is 0kB for the former but 7556632kB for the latter.

Common points:

  All stalling allocations are order 0.
  "Swap cache stats:" stopped increasing.
  "Node 0 Normal free:" remained below "min:".
  A kworker got stuck inside a 0x2400000 (GFP_NOIO) allocation within 1 second
  after other allocations (0x24280ca (GFP_HIGHUSER_MOVABLE) or 0x24201ca
  (GFP_HIGHUSER_MOVABLE | __GFP_COLD)) got stuck. (The gfp masks are decoded
  in the sketch after the excerpts below.)
----------
[ 6904.555880] MemAlloc-Info: 2 stalling task, 0 dying task, 0 victim task.
[ 6904.563387] MemAlloc: oom01(22011) seq=5135 gfp=0x24280ca order=0 delay=10001
[ 6904.571353] MemAlloc: oom01(22013) seq=5101 gfp=0x24280ca order=0 delay=10001
[ 6915.195869] MemAlloc-Info: 16 stalling task, 0 dying task, 0 victim task.
[ 6915.203458] MemAlloc: systemd-journal(592) seq=33409 gfp=0x24201ca order=0 delay=20495
[ 6915.212300] MemAlloc: NetworkManager(807) seq=42042 gfp=0x24200ca order=0 delay=12030
[ 6915.221042] MemAlloc: gssproxy(815) seq=1551 gfp=0x24201ca order=0 delay=19414
[ 6915.229104] MemAlloc: irqbalance(825) seq=6763 gfp=0x24201ca order=0 delay=11234
[ 6915.237363] MemAlloc: tuned(1339) seq=74664 gfp=0x24201ca order=0 delay=20354
[ 6915.245329] MemAlloc: top(10485) seq=486624 gfp=0x24201ca order=0 delay=20124
[ 6915.253288] MemAlloc: kworker/1:1(20708) seq=48 gfp=0x2400000 order=0 delay=20248
[ 6915.261640] MemAlloc: sendmail(21855) seq=207 gfp=0x24201ca order=0 delay=19977
[ 6915.269800] MemAlloc: oom01(22007) seq=2 gfp=0x24201ca order=0 delay=20269
[ 6915.277466] MemAlloc: oom01(22008) seq=5659 gfp=0x24280ca order=0 delay=20502
[ 6915.285432] MemAlloc: oom01(22009) seq=5189 gfp=0x24280ca order=0 delay=20502
[ 6915.293389] MemAlloc: oom01(22010) seq=4795 gfp=0x24280ca order=0 delay=20502
[ 6915.301353] MemAlloc: oom01(22011) seq=5135 gfp=0x24280ca order=0 delay=20641
[ 6915.309316] MemAlloc: oom01(22012) seq=3828 gfp=0x24280ca order=0 delay=20502
[ 6915.317280] MemAlloc: oom01(22013) seq=5101 gfp=0x24280ca order=0 delay=20641
[ 6915.325244] MemAlloc: oom01(22014) seq=3633 gfp=0x24280ca order=0 delay=20502
----------
[19394.048063] MemAlloc-Info: 1 stalling task, 0 dying task, 0 victim task.
[19394.055562] MemAlloc: systemd-journal(22961) seq=151917 gfp=0x24201ca order=0 delay=10001
[19404.625516] MemAlloc-Info: 10 stalling task, 0 dying task, 0 victim task.
[19404.633107] MemAlloc: auditd(783) seq=615 gfp=0x24201ca order=0 delay=15101
[19404.640877] MemAlloc: irqbalance(806) seq=8107 gfp=0x24201ca order=0 delay=18440
[19404.649135] MemAlloc: NetworkManager(820) seq=10854 gfp=0x24200ca order=0 delay=19527
[19404.657874] MemAlloc: gssproxy(826) seq=586 gfp=0x24201ca order=0 delay=18487
[19404.665841] MemAlloc: tuned(1337) seq=40098 gfp=0x24201ca order=0 delay=19900
[19404.673805] MemAlloc: crond(2242) seq=5612 gfp=0x24201ca order=0 delay=15329
[19404.681674] MemAlloc: systemd-journal(22961) seq=151917 gfp=0x24201ca order=0 delay=20579
[19404.690796] MemAlloc: sendmail(31908) seq=7256 gfp=0x24200ca order=0 delay=17633
[19404.699051] MemAlloc: kworker/2:2(32161) seq=9 gfp=0x2400000 order=0 delay=19889
[19404.707306] MemAlloc: oom01(32704) seq=6391 gfp=0x24200ca order=0 delay=19164 exiting
----------
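For reference, the gfp masks quoted above can be cross-checked with a small
user-space helper. This is only an illustrative sketch: the ___GFP_* bit values
and the GFP_NOIO / GFP_HIGHUSER_MOVABLE compositions below are what I read from
v4.4's include/linux/gfp.h and would need adjusting for other kernel versions.
---------- gfp_decode.c (illustrative only) ----------
/*
 * Recompute the gfp masks shown in the MemAlloc lines.
 * NOTE: the ___GFP_* values are assumed from v4.4's include/linux/gfp.h.
 */
#include <stdio.h>

#define ___GFP_HIGHMEM        0x02u
#define ___GFP_MOVABLE        0x08u
#define ___GFP_IO             0x40u
#define ___GFP_FS             0x80u
#define ___GFP_COLD           0x100u
#define ___GFP_HARDWALL       0x20000u
#define ___GFP_DIRECT_RECLAIM 0x400000u
#define ___GFP_KSWAPD_RECLAIM 0x2000000u

#define __GFP_RECLAIM        (___GFP_DIRECT_RECLAIM | ___GFP_KSWAPD_RECLAIM)
#define GFP_NOIO             __GFP_RECLAIM
#define GFP_USER             (__GFP_RECLAIM | ___GFP_IO | ___GFP_FS | ___GFP_HARDWALL)
#define GFP_HIGHUSER_MOVABLE (GFP_USER | ___GFP_HIGHMEM | ___GFP_MOVABLE)

int main(void)
{
	/* Expected: 0x2400000 (kworker), 0x24200ca and 0x24201ca (userspace). */
	printf("GFP_NOIO                          = 0x%x\n", GFP_NOIO);
	printf("GFP_HIGHUSER_MOVABLE              = 0x%x\n", GFP_HIGHUSER_MOVABLE);
	printf("GFP_HIGHUSER_MOVABLE | __GFP_COLD = 0x%x\n",
	       GFP_HIGHUSER_MOVABLE | ___GFP_COLD);
	return 0;
}
---------- gfp_decode.c (illustrative only) ----------
It should print 0x2400000, 0x24200ca and 0x24201ca, matching the kworker and
userspace entries above; 0x24280ca looks like the same GFP_HIGHUSER_MOVABLE mask
with (what I assume is) __GFP_ZERO (0x8000u) also set.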
Does somebody know whether GFP_HIGHUSER_MOVABLE allocations depend on workqueue status?
* GFP_HIGHUSER_MOVABLE is for userspace allocations that the kernel does not
* need direct access to but can use kmap() when access is required. They
* are expected to be movable via page reclaim or page migration. Typically,
* pages on the LRU would also be allocated with GFP_HIGHUSER_MOVABLE.
I don't have a reproducer environment. But if this problem involves workqueues,
running the kernel module below, which requests GFP_NOIO allocations more frequently
than disk_check_events() does, might help reproduce this problem.
---------- test/wq_test.c ----------
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/workqueue.h>

static void wq_test_fn(struct work_struct *work);
static struct task_struct *task;
static bool pending;
static DECLARE_WORK(wq_test_work, wq_test_fn);

/* Work item: do a GFP_NOIO allocation, then signal completion. */
static void wq_test_fn(struct work_struct *unused)
{
	kfree(kmalloc(PAGE_SIZE, GFP_NOIO));
	pending = false;
}

/* Queue the work item roughly every 100 milliseconds and wait for it. */
static int wq_test_thread(void *unused)
{
	while (!kthread_should_stop()) {
		msleep(100);
		pending = true;
		queue_work(system_freezable_power_efficient_wq, &wq_test_work);
		while (pending)
			msleep(1);
	}
	return 0;
}

static int __init wq_test_init(void)
{
	task = kthread_run(wq_test_thread, NULL, "wq_test");
	return IS_ERR(task) ? PTR_ERR(task) : 0;
}

static void __exit wq_test_exit(void)
{
	kthread_stop(task);
	/* Give a just-completed work item time to return before unload. */
	ssleep(1);
}

module_init(wq_test_init);
module_exit(wq_test_exit);
MODULE_LICENSE("GPL");
---------- test/wq_test.c ----------
---------- test/Makefile ----------
obj-m += wq_test.o
---------- test/Makefile ----------
$ make SUBDIRS=$PWD/test
# insmod test/wq_test.ko
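Once the test is done, the module can be unloaded again (wq_test_exit() stops the
kthread and sleeps one second so that a queued work item can finish):
# rmmod wq_test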