[LTP] Question about oom02 testcase

Gou Hao gouhao@uniontech.com
Thu Jun 1 11:45:50 CEST 2023


On 6/1/23 16:18, Li Wang wrote:

> Hi Hao,
>
> Thanks for reporting this, comments see below.
>
> On Tue, May 30, 2023 at 9:26 AM Gou Hao <gouhao@uniontech.com> wrote:
>
>     hello everyone,
>
>     Recently, kernel restarted while I was running oom02.
>     log:
>     ```
>     [480156.950100] Tasks state (memory values in pages):
>     [480156.950101] [  pid  ]   uid    tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
>     [480156.950302] [   2578]    81    2578      523        0         393216        6          -900 dbus-daemon
>     [480156.950309] [   2648]   172    2596     2435        0         393216        5             0 rtkit-daemon
>     [480156.950322] [   5256]     0    2826    25411        0         589824        0             0 DetectThread
>     [480156.950328] [   5404]     0    5404      412        2         393216       64         -1000 sshd
>     [480156.950357] [  10518]     0   10518     2586        0         393216       10             0 at-spi2-registr
>     [480156.950361] [  10553]     0   10551    10543        0         458752        9             0 QXcbEventQueue
>     [480156.950365] [  10867]     0   10567    17579        0         589824       16             0 QXcbEventQueue
>     [480156.950370] [  10928]     0   10921     6999        0         458752       17             0 QXcbEventQueue
>     [480156.950390] [  11882]     0   11811     7377        0         458752       10             0 QXcbEventQueue
>     [480156.950394] [  12052]     0   12052     5823        0         458752       21             0 fcitx
>     [480156.950404] [  12115]     0   12114    11678        0         524288       21             0 QXcbEventQueue
>     [480156.950408] [ 101558]     0  101558     3549        0         393216        0             0 runltp
>     [480156.950486] [1068864]     0 1068864      771        6         327680       85         -1000 systemd-udevd
>     [480156.950552] [1035639]     0 1035639       52        0         393216       14         -1000 oom02
>     [480156.950556] [1035640]     0 1035640       52        0         393216       23         -1000 oom02
>     [480156.950561] [1036065]     0 1036065      493       60         393216        0          -250 systemd-journal
>     [480156.950565] [1036087]     0 1036073  6258739  3543942       37814272        0             0 oom02
>     [480156.950572] Out of memory and no killable processes...
>     [480156.950575] Kernel panic - not syncing: System is deadlocked on memory
>     ```
>
>     oom02 (pid 1036073) had already been OOM-killed before the crash.
>     log:
>     ```
>     [480152.242506] [1035177]     0 1035177     4773       20         393216      115             0 sssd_nss
>     [480152.242510] [1035376]     0 1035376    25500      391         589824      602             0 tuned
>     [480152.242514] [1035639]     0 1035639       52        0         393216       14         -1000 oom02
>     [480152.242517] [1035640]     0 1035640       52        0         393216       19         -1000 oom02
>     [480152.242522] [1036065]     0 1036065      493      114         393216       62          -250 systemd-journal
>     [480152.242525] [1036073]     0 1036073  6258739  3540314       37814272      104             0 oom02
>     [480152.242529] Out of memory: Kill process 1036073 (oom02) score 755 or sacrifice child
>     [480152.243869] Killed process 1036073 (oom02) total-vm:400559296kB, anon-rss:226578368kB, file-rss:1728kB, shmem-rss:0kB
>     [480152.365804] oom_reaper: reaped process 1036073 (oom02), now anon-rss:226594048kB, file-rss:0kB, shmem-rss:0kB
>     ```
>     but its memory could not be reclaimed. I added trace logging to the
>     oom_reaper code in the kernel and found that one large VMA could not
>     be reclaimed: that VMA has the `VM_LOCKED` flag, so it cannot be
>     reaped immediately.
>     ```log
>            oom_reaper-57    [007] ....   126.063581: __oom_reap_task_mm: gh: vma is anon:1048691, range=65536
>            oom_reaper-57    [007] ....   126.063581: __oom_reap_task_mm: gh: vma is anon:1048691, range=196608
>            oom_reaper-57    [007] ....   126.063582: __oom_reap_task_mm: gh: vma continue: 1056883, range:3221225472
>            oom_reaper-57    [007] ....   126.063583: __oom_reap_task_mm: gh: vma is anon:112, range=65536
>            oom_reaper-57    [007] ....   126.063584: __oom_reap_task_mm: gh: vma is anon:1048691, range=8388608
>     ```
>     `vma continue: 1056883, range:3221225472` is the memory that cannot
>     be reclaimed. 1056883 (0x102073) is vma->vm_flags, and it includes
>     the `VM_LOCKED` flag.
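>
>     For reference, below is a simplified sketch of what I believe the
>     4.19-era reaper check looks like (not verbatim kernel code; the
>     helper name here is made up, only `__oom_reap_task_mm` and the
>     vm_flags bits are real):
>     ```c
>     /* Sketch only: the reaper walks the victim's VMAs and unmaps just
>      * the ones it can treat like MADV_DONTNEED; VM_LOCKED VMAs are
>      * skipped and left to the (possibly stuck) normal exit path. */
>     static bool can_reap_vma(struct vm_area_struct *vma)
>     {
>             return !(vma->vm_flags & (VM_LOCKED | VM_HUGETLB | VM_PFNMAP));
>     }
>
>     /* inside __oom_reap_task_mm(mm): */
>     for (vma = mm->mmap; vma; vma = vma->vm_next) {
>             if (!can_reap_vma(vma))
>                     continue;   /* e.g. the 3 GB VM_LOCKED VMA above */
>             /* ... unmap the anonymous pages of this VMA ... */
>     }
>     ```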
>
>     oom02 creates `nr_cpu` threads and uses mmap to allocate memory. The
>     kernel merges adjacent, compatible VMAs into a single one, so as long
>     as one thread is still running, the entire merged VMA will not be
>     released.
>
>     In extreme cases, a crash like the one above can occur because that
>     memory is never reclaimed.
>
>     My question is: is the crash in this case expected behavior, or a
>     bug (in the kernel or in LTP)?
>
>
>
> The ltp-oom tests were originally designed to verify that the OOM
> mechanism works as expected for three types of memory allocation
> (normal, mlock, ksm).
>
> Yes, your analysis is reasonable to some degree: the oom_reaper
> might not reclaim a VMA with locked pages even after the process
> has been terminated.
>
> But the exact behavior of the oom_reaper and the conditions under
> which it can or cannot reclaim VMAs may vary depending on the
> specific kernel version and configuration. So we shouldn't simply
> regard the system panic as a Kernel or LTP defect.
> And BTW, what is your tested kernel version?
>
hi Li Wang,

Thank you for your reply.

My kernel version is 4.19, but it's not a community version.

I have only encountered the crash once, and most of the time oom_reaper 
can handle it well.

I tried to find a method or flag to prevent VMA merging during mmap, but
couldn't find one.
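
For what it's worth, here is a minimal user-space sketch (not part of
oom02, just my own illustration of the merging behaviour as I understand
it) showing that two adjacent anonymous mappings with identical flags end
up as a single VMA in /proc/self/maps, and that mlock() then marks the
whole merged VMA as locked:

```c
/* vma_merge_demo.c -- standalone sketch, not part of LTP.
 * Reserve a range, map two adjacent anonymous RW regions into it,
 * then mlock the whole thing; /proc/self/maps should show one VMA. */
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

#define LEN (16UL * 1024 * 1024)

int main(void)
{
	/* Reserve the address range first so MAP_FIXED below cannot
	 * clobber an unrelated mapping. */
	char *base = mmap(NULL, 2 * LEN, PROT_NONE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return 1;

	/* Two adjacent mappings with identical prot/flags ... */
	char *a = mmap(base, LEN, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	char *b = mmap(base + LEN, LEN, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (a == MAP_FAILED || b == MAP_FAILED)
		return 1;

	/* ... show up as ONE VMA, and mlock() sets VM_LOCKED on all of
	 * it (may need root or a raised RLIMIT_MEMLOCK). */
	if (mlock(base, 2 * LEN) != 0)
		perror("mlock");

	printf("a=%p b=%p -- check /proc/%d/maps, then press Enter\n",
	       (void *)a, (void *)b, getpid());
	getchar();
	return 0;
}
```

As far as I understand, the kernel only merges VMAs whose protection and
flags match and whose ranges are adjacent, so keeping the per-thread
allocations non-adjacent (for example with a PROT_NONE guard page between
them) would keep the VMAs separate, but that means changing the testcase
itself rather than passing some mmap flag.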


>
> -- 
> Regards,
> Li Wang

-- 
thanks,
Gou Hao<gouhao@uniontech.com>

