[LTP] Question about oom02 testcase
Li Wang
liwang@redhat.com
Thu Jun 1 10:18:51 CEST 2023
Hi Hao,
Thanks for reporting this; see my comments below.
On Tue, May 30, 2023 at 9:26 AM Gou Hao <gouhao@uniontech.com> wrote:
> hello everyone,
>
> Recently, the kernel restarted while I was running oom02.
> log:
> ```
> [480156.950100] Tasks state (memory values in pages):
> [480156.950101] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
> [480156.950302] [ 2578] 81 2578 523 0 393216 6 -900 dbus-daemon
> [480156.950309] [ 2648] 172 2596 2435 0 393216 5 0 rtkit-daemon
> [480156.950322] [ 5256] 0 2826 25411 0 589824 0 0 DetectThread
> [480156.950328] [ 5404] 0 5404 412 2 393216 64 -1000 sshd
> [480156.950357] [ 10518] 0 10518 2586 0 393216 10 0 at-spi2-registr
> [480156.950361] [ 10553] 0 10551 10543 0 458752 9 0 QXcbEventQueue
> [480156.950365] [ 10867] 0 10567 17579 0 589824 16 0 QXcbEventQueue
> [480156.950370] [ 10928] 0 10921 6999 0 458752 17 0 QXcbEventQueue
> [480156.950390] [ 11882] 0 11811 7377 0 458752 10 0 QXcbEventQueue
> [480156.950394] [ 12052] 0 12052 5823 0 458752 21 0 fcitx
> [480156.950404] [ 12115] 0 12114 11678 0 524288 21 0 QXcbEventQueue
> [480156.950408] [ 101558] 0 101558 3549 0 393216 0 0 runltp
> [480156.950486] [1068864] 0 1068864 771 6 327680 85 -1000 systemd-udevd
> [480156.950552] [1035639] 0 1035639 52 0 393216 14 -1000 oom02
> [480156.950556] [1035640] 0 1035640 52 0 393216 23 -1000 oom02
> [480156.950561] [1036065] 0 1036065 493 60 393216 0 -250 systemd-journal
> [480156.950565] [1036087] 0 1036073 6258739 3543942 37814272 0 0 oom02
> [480156.950572] Out of memory and no killable processes...
> [480156.950575] Kernel panic - not syncing: System is deadlocked on memory
> ```
>
> oom02-1036073 had already been killed before the crash.
> log:
> ```
> [480152.242506] [1035177] 0 1035177 4773 20 393216 115 0 sssd_nss
> [480152.242510] [1035376] 0 1035376 25500 391 589824 602 0 tuned
> [480152.242514] [1035639] 0 1035639 52 0 393216 14 -1000 oom02
> [480152.242517] [1035640] 0 1035640 52 0 393216 19 -1000 oom02
> [480152.242522] [1036065] 0 1036065 493 114 393216 62 -250 systemd-journal
> [480152.242525] [1036073] 0 1036073 6258739 3540314 37814272 104 0 oom02
> [480152.242529] Out of memory: Kill process 1036073 (oom02) score 755 or sacrifice child
> [480152.243869] Killed process 1036073 (oom02) total-vm:400559296kB, anon-rss:226578368kB, file-rss:1728kB, shmem-rss:0kB
> [480152.365804] oom_reaper: reaped process 1036073 (oom02), now anon-rss:226594048kB, file-rss:0kB, shmem-rss:0kB
> ```
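A quick consistency check on the kill record quoted above: the task dump reports sizes in pages, while the "Killed process" line reports kB. The sketch below only divides one by the other. The variable and function names are mine, and the inputs are the numbers from the quoted log. Under those assumptions, both ratios come out to 64, which suggests a kernel built with 64 kB pages.

```python
# Values copied from the quoted OOM log for pid 1036073 (oom02).
total_vm_pages = 6258739   # "total_vm" column of the task dump, in pages
rss_pages = 3540314        # "rss" column of the task dump, in pages
total_vm_kb = 400559296    # "total-vm" from the "Killed process" line
anon_rss_kb = 226578368    # "anon-rss"
file_rss_kb = 1728         # "file-rss"
shmem_rss_kb = 0           # "shmem-rss"


def page_size_kb(size_kb, pages):
    """Return the page size in kB implied by a (kB, pages) pair."""
    if size_kb % pages:
        raise ValueError("size is not a whole number of pages")
    return size_kb // pages


# Assumption: the rss column is the sum of anon, file and shmem RSS.
vm_page_kb = page_size_kb(total_vm_kb, total_vm_pages)
rss_page_kb = page_size_kb(anon_rss_kb + file_rss_kb + shmem_rss_kb, rss_pages)
print(vm_page_kb, rss_page_kb)
```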
> However, its memory could not be reclaimed. I added trace logging to the
> oom_reaper code in the kernel and found that there is a VMA with a large
> range that was not reclaimed. The VMA has the `VM_LOCKED` flag set, so it
> cannot be reclaimed immediately.
> ```log
> oom_reaper-57 [007] .... 126.063581: __oom_reap_task_mm: gh: vma is anon:1048691, range=65536
> oom_reaper-57 [007] .... 126.063581: __oom_reap_task_mm: gh: vma is anon:1048691, range=196608
> oom_reaper-57 [007] .... 126.063582: __oom_reap_task_mm: gh: vma continue: 1056883, range:3221225472
> oom_reaper-57 [007] .... 126.063583: __oom_reap_task_mm: gh: vma is anon:112, range=65536
> oom_reaper-57 [007] .... 126.063584: __oom_reap_task_mm: gh: vma is anon:1048691, range=8388608
> ```
> `vma continue: 1056883, range:3221225472` is the memory that cannot be
> reclaimed. 1056883 (0x102073) is vma->vm_flags, and it has the `VM_LOCKED` flag set.
>
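To make the flag values in the trace easier to read, here is a minimal sketch that decodes them into names. The bit values are the ones I remember from the kernel's include/linux/mm.h, so please check them against your own tree. The table is deliberately incomplete, and `decode_vm_flags` is a helper name made up for this mail.

```python
# Assumed bit values from include/linux/mm.h (verify against your kernel).
# Only the bits needed for the values seen in the trace are listed.
VM_FLAGS = {
    0x00000001: "VM_READ",
    0x00000002: "VM_WRITE",
    0x00000004: "VM_EXEC",
    0x00000008: "VM_SHARED",
    0x00000010: "VM_MAYREAD",
    0x00000020: "VM_MAYWRITE",
    0x00000040: "VM_MAYEXEC",
    0x00000080: "VM_MAYSHARE",
    0x00002000: "VM_LOCKED",
    0x00100000: "VM_ACCOUNT",
}


def decode_vm_flags(value):
    """Return the names of the known flag bits set in value."""
    names = [name for bit, name in VM_FLAGS.items() if value & bit]
    known = sum(bit for bit in VM_FLAGS if value & bit)
    unknown = value & ~known
    if unknown:
        # Bits that are not in the table above are reported as raw hex.
        names.append(hex(unknown))
    return names


# The three distinct vm_flags values that appear in the trace.
for raw in (1048691, 1056883, 112):
    print(f"{raw} = {raw:#x}: {' | '.join(decode_vm_flags(raw))}")
```

With this table, only the 3221225472-byte VMA decodes with `VM_LOCKED`; the other two values differ from it in that one bit or are `VM_MAY*` bits only.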
> oom02 creates `nr_cpu` threads and uses mmap to allocate memory. mmap
> merges contiguous VMAs into one, so as long as one thread is still
> running, the entire VMA will not be released.
>
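The merging effect described here can be illustrated with a toy model. This is not kernel code: the real merge rules check more than adjacency and flags, and `Vma`, `map_region`, `reapable_bytes`, the base address and the thread count are all hypothetical names and values chosen for the sketch. It only assumes, as the report does, that the reaper skips any VMA with `VM_LOCKED` set.

```python
from dataclasses import dataclass

VM_LOCKED = 0x2000  # assumed value, as in include/linux/mm.h
GIB = 1 << 30


@dataclass
class Vma:
    start: int
    end: int
    flags: int


def map_region(vmas, start, length, flags):
    """Add [start, start + length) and merge adjacent VMAs with equal flags."""
    ordered = sorted(vmas + [Vma(start, start + length, flags)],
                     key=lambda v: v.start)
    merged = []
    for vma in ordered:
        last = merged[-1] if merged else None
        if last and last.end == vma.start and last.flags == vma.flags:
            last.end = vma.end  # extend the previous VMA instead of adding one
        else:
            merged.append(Vma(vma.start, vma.end, vma.flags))
    return merged


def reapable_bytes(vmas):
    """Bytes in VMAs that a reaper skipping VM_LOCKED areas could free."""
    return sum(v.end - v.start for v in vmas if not v.flags & VM_LOCKED)


flags = 0x102073          # the locked vm_flags value from the trace
base = 0x400000000000     # hypothetical start address
vmas = []
for i in range(4):        # four threads, each mapping 3 GiB back to back
    vmas = map_region(vmas, base + i * 3 * GIB, 3 * GIB, flags)

print(len(vmas), (vmas[0].end - vmas[0].start) // GIB, reapable_bytes(vmas))
```

In this model the four mappings collapse into a single 12 GiB locked VMA, and nothing in it counts as reapable.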
> In extreme cases, crashes may occur due to the lack of memory reclamation.
>
> My question is: is the crash in this case expected behavior, or is it a
> bug (in the kernel or in LTP)?
>
The LTP oom tests were originally designed to verify that the OOM
mechanism works as expected for three types of memory allocation
(normal, mlock, ksm).
Yes, your analysis is reasonable to some degree: oom_reaper
might not reclaim a VMA with locked pages even after the
process has terminated.
But the exact behavior of oom_reaper, and the conditions under
which it can or cannot reclaim VMAs, may vary depending on the
specific kernel version and configuration. So we shouldn't simply
regard the system panic as a kernel or LTP defect.
BTW, which kernel version did you test?
--
Regards,
Li Wang