[LTP] Question about oom02 testcase

Li Wang <liwang@redhat.com>
Thu Jun 1 10:18:51 CEST 2023


Hi Hao,

Thanks for reporting this; see my comments below.

On Tue, May 30, 2023 at 9:26 AM Gou Hao <gouhao@uniontech.com> wrote:

> hello everyone,
>
> Recently, the kernel restarted while I was running oom02.
> log:
> ```
> [480156.950100] Tasks state (memory values in pages):
> [480156.950101] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes
> swapents oom_score_adj name
> [480156.950302] [   2578]    81  2578      523        0 393216
> 6          -900 dbus-daemon
> [480156.950309] [   2648]   172  2596     2435        0 393216
> 5             0 rtkit-daemon
> [480156.950322] [   5256]     0  2826    25411        0 589824
> 0             0 DetectThread
> [480156.950328] [   5404]     0  5404      412        2 393216
> 64         -1000 sshd
> [480156.950357] [  10518]     0 10518     2586        0 393216
> 10             0 at-spi2-registr
> [480156.950361] [  10553]     0 10551    10543        0 458752
> 9             0 QXcbEventQueue
> [480156.950365] [  10867]     0 10567    17579        0 589824
> 16             0 QXcbEventQueue
> [480156.950370] [  10928]     0 10921     6999        0 458752
> 17             0 QXcbEventQueue
> [480156.950390] [  11882]     0 11811     7377        0 458752
> 10             0 QXcbEventQueue
> [480156.950394] [  12052]     0 12052     5823        0 458752
> 21             0 fcitx
> [480156.950404] [  12115]     0 12114    11678        0 524288
> 21             0 QXcbEventQueue
> [480156.950408] [ 101558]     0 101558     3549        0 393216
> 0             0 runltp
> [480156.950486] [1068864]     0 1068864      771        6 327680
> 85         -1000 systemd-udevd
> [480156.950552] [1035639]     0 1035639       52        0 393216
> 14         -1000 oom02
> [480156.950556] [1035640]     0 1035640       52        0 393216
> 23         -1000 oom02
> [480156.950561] [1036065]     0 1036065      493       60 393216
> 0          -250 systemd-journal
> [480156.950565] [1036087]     0 1036073  6258739  3543942
> 37814272        0             0 oom02
> [480156.950572] Out of memory and no killable processes...
> [480156.950575] Kernel panic - not syncing: System is deadlocked on memory
> ```
>
> oom02 (pid 1036073) had already been killed before the crash.
> log:
> ```
> [480152.242506] [1035177]     0 1035177     4773       20 393216
> 115             0 sssd_nss
> [480152.242510] [1035376]     0 1035376    25500      391 589824
> 602             0 tuned
> [480152.242514] [1035639]     0 1035639       52        0 393216
> 14         -1000 oom02
> [480152.242517] [1035640]     0 1035640       52        0 393216
> 19         -1000 oom02
> [480152.242522] [1036065]     0 1036065      493      114 393216
> 62          -250 systemd-journal
> [480152.242525] [1036073]     0 1036073  6258739  3540314 37814272
> 104             0 oom02
> [480152.242529] Out of memory: Kill process 1036073 (oom02) score 755 or
> sacrifice child
> [480152.243869] Killed process 1036073 (oom02) total-vm:400559296kB,
> anon-rss:226578368kB, file-rss:1728kB, shmem-rss:0kB
> [480152.365804] oom_reaper: reaped process 1036073 (oom02), now
> anon-rss:226594048kB, file-rss:0kB, shmem-rss:0kB
> ```
> but its memory could not be reclaimed. I added trace logging to the
> oom_reaper code in the kernel, and I found a large-range VMA that cannot
> be reclaimed: the VMA has the `VM_LOCKED` flag, so it cannot be reclaimed
> immediately.
> ```log
>        oom_reaper-57    [007] ....   126.063581: __oom_reap_task_mm: gh:
> vma is anon:1048691, range=65536
>        oom_reaper-57    [007] ....   126.063581: __oom_reap_task_mm: gh:
> vma is anon:1048691, range=196608
>        oom_reaper-57    [007] ....   126.063582: __oom_reap_task_mm: gh:
> vma continue: 1056883, range:3221225472
>        oom_reaper-57    [007] ....   126.063583: __oom_reap_task_mm: gh:
> vma is anon:112, range=65536
>        oom_reaper-57    [007] ....   126.063584: __oom_reap_task_mm: gh:
> vma is anon:1048691, range=8388608
> ```
> `vma continue: 1056883, range:3221225472` is the memory that cannot be
> reclaimed. 1056883 (0x102073) is vma->vm_flags, which has the `VM_LOCKED`
> flag set.
>
> oom02 creates `nr_cpu` threads and uses mmap to allocate memory. The
> kernel merges contiguous VMAs with identical flags into one, so as long
> as one thread is still running, the entire VMA will not be released.
>
> In extreme cases, crashes may occur due to the lack of memory reclamation.
>
> My question is: is the crash in this case a normal situation, or is it
> a bug (a kernel or LTP bug)?
>


The ltp-oom test is originally designed to verify that the OOM
mechanism works as expected for memory allocated in three ways
(normal, mlock, and KSM).
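
Roughly speaking, each allocation flavor boils down to something like
the sketch below (a simplified illustration only, not the actual LTP
source; the alloc_chunk() helper and the 1 MiB chunk size are made up
for the example):

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Simplified sketch of the three allocation flavors the OOM tests
 * exercise: map a chunk of anonymous memory, optionally lock it or
 * mark it mergeable for KSM, then touch every byte so it is really
 * backed by RAM. */
enum alloc_type { ALLOC_NORMAL, ALLOC_MLOCK, ALLOC_KSM };

static void *alloc_chunk(size_t len, enum alloc_type type)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	if (type == ALLOC_MLOCK)
		mlock(p, len);                   /* sets VM_LOCKED on the VMA */
	else if (type == ALLOC_KSM)
		madvise(p, len, MADV_MERGEABLE); /* candidate for KSM merging */

	memset(p, 'a', len);                     /* fault in every page */
	return p;
}

int main(void)
{
	printf("normal %p, mlock %p, ksm %p\n",
	       alloc_chunk(1 << 20, ALLOC_NORMAL),
	       alloc_chunk(1 << 20, ALLOC_MLOCK),
	       alloc_chunk(1 << 20, ALLOC_KSM));
	return 0;
}
```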

Yes, your analysis is reasonable to some degree: the oom_reaper
might not reclaim a VMA with locked pages even after the process
has been terminated.
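
To make the VMA-merging point concrete, here is a minimal sketch
(assuming Linux; the MAP_FIXED placement and 2 MiB chunk size are just
for illustration, and merging behavior can differ across kernel versions
and configurations) that maps two adjacent anonymous chunks with
identical flags, mlock()s the whole range, and prints the resulting
single entry from /proc/<pid>/maps:

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

#define CHUNK (2UL * 1024 * 1024)   /* 2 MiB per "thread" allocation */

int main(void)
{
	char cmd[64];
	void *a, *b;

	/* Reserve an address range, then re-map its two halves with
	 * identical prot/flags so they end up adjacent. */
	a = mmap(NULL, 2 * CHUNK, PROT_NONE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (a == MAP_FAILED)
		return 1;

	a = mmap(a, CHUNK, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	b = mmap((char *)a + CHUNK, CHUNK, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	printf("chunk 1 at %p, chunk 2 at %p\n", a, b);

	/* Lock the whole range; needs a sufficient RLIMIT_MEMLOCK
	 * (or root, as the LTP tests usually run). */
	if (mlock(a, 2 * CHUNK))
		perror("mlock");

	/* The two chunks show up as ONE maps entry covering 2 * CHUNK;
	 * the matching /proc/<pid>/smaps entry has "lo" (VM_LOCKED) in
	 * its VmFlags line. */
	snprintf(cmd, sizeof(cmd), "grep ^%lx /proc/%d/maps",
		 (unsigned long)a, (int)getpid());
	return system(cmd);
}
```

With `nr_cpu` threads each faulting in memory this way, the merged
VM_LOCKED range can grow very large, which is consistent with the
~3 GB locked VMA in your trace output.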

But the exact behavior of the oom_reaper, and the conditions under
which it can or cannot reclaim VMAs, may vary depending on the
specific kernel version and configuration. So we shouldn't simply
regard the system panic as a kernel or LTP defect.
And BTW, what is your tested kernel version?


-- 
Regards,
Li Wang

