[LTP] Question about oom02 testcase
Gou Hao
gouhao@uniontech.com
Fri Jun 2 04:47:15 CEST 2023
On 6/1/23 18:50, Li Wang wrote:
> On Thu, Jun 1, 2023 at 5:46 PM Gou Hao <gouhao@uniontech.com> wrote:
>
>> On 6/1/23 16:18, Li Wang wrote:
>>
>> Hi Hao,
>>
>> Thanks for reporting this, comments see below.
>>
>> On Tue, May 30, 2023 at 9:26 AM Gou Hao <gouhao@uniontech.com> wrote:
>>
>>> hello everyone,
>>>
>>> Recently, the kernel panicked and restarted while I was running oom02.
>>> log:
>>> ```
>>> [480156.950100] Tasks state (memory values in pages):
>>> [480156.950101] [  pid  ]   uid    tgid total_vm     rss pgtables_bytes swapents oom_score_adj name
>>> [480156.950302] [   2578]    81    2578      523       0   393216        6   -900 dbus-daemon
>>> [480156.950309] [   2648]   172    2596     2435       0   393216        5      0 rtkit-daemon
>>> [480156.950322] [   5256]     0    2826    25411       0   589824        0      0 DetectThread
>>> [480156.950328] [   5404]     0    5404      412       2   393216       64  -1000 sshd
>>> [480156.950357] [  10518]     0   10518     2586       0   393216       10      0 at-spi2-registr
>>> [480156.950361] [  10553]     0   10551    10543       0   458752        9      0 QXcbEventQueue
>>> [480156.950365] [  10867]     0   10567    17579       0   589824       16      0 QXcbEventQueue
>>> [480156.950370] [  10928]     0   10921     6999       0   458752       17      0 QXcbEventQueue
>>> [480156.950390] [  11882]     0   11811     7377       0   458752       10      0 QXcbEventQueue
>>> [480156.950394] [  12052]     0   12052     5823       0   458752       21      0 fcitx
>>> [480156.950404] [  12115]     0   12114    11678       0   524288       21      0 QXcbEventQueue
>>> [480156.950408] [ 101558]     0  101558     3549       0   393216        0      0 runltp
>>> [480156.950486] [1068864]     0 1068864      771       6   327680       85  -1000 systemd-udevd
>>> [480156.950552] [1035639]     0 1035639       52       0   393216       14  -1000 oom02
>>> [480156.950556] [1035640]     0 1035640       52       0   393216       23  -1000 oom02
>>> [480156.950561] [1036065]     0 1036065      493      60   393216        0   -250 systemd-journal
>>> [480156.950565] [1036087]     0 1036073  6258739 3543942 37814272        0      0 oom02
>>> [480156.950572] Out of memory and no killable processes...
>>> [480156.950575] Kernel panic - not syncing: System is deadlocked on memory
>>> ```
>>>
>>> oom02 (pid 1036073) had already been killed before the crash.
>>> log:
>>> ```
>>> [480152.242506] [1035177]     0 1035177     4773      20   393216      115      0 sssd_nss
>>> [480152.242510] [1035376]     0 1035376    25500     391   589824      602      0 tuned
>>> [480152.242514] [1035639]     0 1035639       52       0   393216       14  -1000 oom02
>>> [480152.242517] [1035640]     0 1035640       52       0   393216       19  -1000 oom02
>>> [480152.242522] [1036065]     0 1036065      493     114   393216       62   -250 systemd-journal
>>> [480152.242525] [1036073]     0 1036073  6258739 3540314 37814272      104      0 oom02
>>> [480152.242529] Out of memory: Kill process 1036073 (oom02) score 755 or sacrifice child
>>> [480152.243869] Killed process 1036073 (oom02) total-vm:400559296kB, anon-rss:226578368kB, file-rss:1728kB, shmem-rss:0kB
>>> [480152.365804] oom_reaper: reaped process 1036073 (oom02), now anon-rss:226594048kB, file-rss:0kB, shmem-rss:0kB
>>> ```
>>> but its memory could not be reclaimed. I added trace logging to the
>>> oom_reaper code in the kernel and found one large-range VMA whose
>>> memory cannot be reclaimed: the VMA has the `VM_LOCKED` flag, so it
>>> cannot be reclaimed immediately.
>>> ```log
>>> oom_reaper-57 [007] .... 126.063581: __oom_reap_task_mm: gh: vma is anon:1048691, range=65536
>>> oom_reaper-57 [007] .... 126.063581: __oom_reap_task_mm: gh: vma is anon:1048691, range=196608
>>> oom_reaper-57 [007] .... 126.063582: __oom_reap_task_mm: gh: vma continue: 1056883, range:3221225472
>>> oom_reaper-57 [007] .... 126.063583: __oom_reap_task_mm: gh: vma is anon:112, range=65536
>>> oom_reaper-57 [007] .... 126.063584: __oom_reap_task_mm: gh: vma is anon:1048691, range=8388608
>>> ```
>>> `vma continue: 1056883, range:3221225472` is the memory that cannot be
>>> reclaimed. 1056883 (0x102073) is vma->vm_flags, and it has the
>>> `VM_LOCKED` flag set.
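>>>
>>> (A quick sanity check on that decoding; VM_LOCKED is 0x00002000 in
>>> include/linux/mm.h on 4.19, and I am assuming the vendor kernel keeps
>>> the mainline value:)
>>> ```c
>>> #include <stdio.h>
>>>
>>> #define VM_LOCKED 0x00002000UL	/* include/linux/mm.h, v4.19 */
>>>
>>> int main(void)
>>> {
>>> 	unsigned long vm_flags = 0x102073UL;	/* 1056883 from the trace */
>>>
>>> 	printf("VM_LOCKED is %s\n",
>>> 	       (vm_flags & VM_LOCKED) ? "set" : "clear");
>>> 	return 0;
>>> }
>>> ```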
>>>
>>> oom02 creates `nr_cpu` threads and uses mmap to allocate memory. The
>>> kernel merges contiguous, compatible VMAs into one, so as long as one
>>> thread is still running, the entire merged VMA will not be released.
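>>>
>>> Here is a minimal sketch of that merging (a hypothetical demo, not the
>>> oom02 source): two adjacent anonymous mappings with identical
>>> prot/flags show up as a single line in /proc/self/maps.
>>> ```c
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <sys/mman.h>
>>>
>>> int main(void)
>>> {
>>> 	size_t len = 1024 * 1024;
>>> 	char cmd[64];
>>> 	char *a = mmap(NULL, len, PROT_READ | PROT_WRITE,
>>> 		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>>> 	/* Map the second region directly behind the first one. */
>>> 	char *b = mmap(a + len, len, PROT_READ | PROT_WRITE,
>>> 		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
>>>
>>> 	if (a == MAP_FAILED || b == MAP_FAILED)
>>> 		return 1;
>>>
>>> 	/* Both regions are covered by one merged VMA. */
>>> 	snprintf(cmd, sizeof(cmd), "grep %lx /proc/self/maps",
>>> 		 (unsigned long)a);
>>> 	system(cmd);
>>> 	return 0;
>>> }
>>> ```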
>>>
>>> In extreme cases, a crash may occur because the memory is never reclaimed.
>>>
>>> My question is: is the crash in this case expected behavior, or a bug
>>> (in the kernel or in LTP)?
>>>
>>
>> The LTP OOM tests were originally designed to verify that the OOM
>> mechanism works as expected for three types of memory allocation
>> (normal, mlock, ksm).
>>
>> Yes, your analysis is reasonable to some degree: the oom_reaper
>> might not reclaim a VMA with locked pages even after the process
>> terminates.
>>
>> But the exact behavior of the oom_reaper and the conditions under
>> which it can or cannot reclaim VMAs may vary depending on the
>> specific kernel version and configuration. So we shouldn't simply
>> regard the system panic as a kernel or LTP defect.
>> And BTW, what is your tested kernel version?
>>
>> hi Li Wang,
>>
>> Thank you for your reply.
>>
>> My kernel version is 4.19, but it's not a community version.
>>
>> I have only encountered the crash once, and most of the time oom_reaper
>> can handle it well.
>>
>> I tried to find a method or flag to prevent VMA merging during mmap,
>> but couldn't find one.
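>>
>> One untested idea (there is no mmap flag for it, but VMAs with
>> different protections never merge): leave a one-page PROT_NONE guard
>> behind each thread's region, e.g.:
>> ```c
>> #include <sys/mman.h>
>> #include <unistd.h>
>>
>> /* Sketch only: assumes len is page-aligned. */
>> void *alloc_unmergeable(size_t len)
>> {
>> 	long page = sysconf(_SC_PAGESIZE);
>> 	/* Reserve the region plus a guard page in one mapping. */
>> 	char *p = mmap(NULL, len + page, PROT_READ | PROT_WRITE,
>> 		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>>
>> 	if (p == MAP_FAILED)
>> 		return NULL;
>>
>> 	/* Different protection, so this page can't merge with neighbours. */
>> 	mprotect(p + len, page, PROT_NONE);
>> 	return p;
>> }
>> ```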
>>
> That also might be related to the value of overcommit_memory:
> if we set it to 2 (strict mode), the oom_reaper can reclaim VM_LOCKED
> memory more aggressively.
>
> But as you can see, in oom02 it is set to 1 (always mode) for the
> whole test, which might be the reason your system couldn't recover
> from the overcommit and finally crashed.
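>
> For reference, switching the policy is just a write to /proc (a
> plain-C sketch, roughly what the test's setup does through LTP's
> set_sys_tune()):
> ```c
> #include <stdio.h>
>
> /* 0 = heuristic, 1 = always, 2 = strict; see
>  * Documentation/vm/overcommit-accounting. */
> static int set_overcommit(int mode)
> {
> 	FILE *f = fopen("/proc/sys/vm/overcommit_memory", "w");
>
> 	if (!f)
> 		return -1;
> 	fprintf(f, "%d\n", mode);
> 	return fclose(f);
> }
> ```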
I ran an oom02 test following your suggestion and set overcommit_memory
to 2. Most of the time mmap() returns ENOMEM directly; the OOM killer
was triggered only about once, and the memory still could not be
reclaimed quickly by the oom_reaper.
```
Jun 2 10:24:51 ltptest kernel: [  71588]     0  71588      792     244   393216        0      0 sshd
Jun 2 10:24:51 ltptest kernel: [  71590]     0  71590      792     150   393216        0      0 sshd
Jun 2 10:24:51 ltptest kernel: [  71591]     0  71591     3565     109   393216        0      0 bash
Jun 2 10:24:51 ltptest kernel: [  72118]     0  72118     3364      17   458752        0      0 sleep
Jun 2 10:24:51 ltptest kernel: [  72134]     0  72134     3364      17   393216        0      0 tail
Jun 2 10:24:51 ltptest kernel: [  72157]     0  72157       52      25   393216        0  -1000 oom02
Jun 2 10:24:51 ltptest kernel: [  72158]     0  72158       52      14   393216        0  -1000 oom02
Jun 2 10:24:51 ltptest kernel: [  72203]     0  72203   295609  244870  2359296        0      0 oom02
Jun 2 10:24:51 ltptest kernel: Out of memory: Kill process 72203 (oom02) score 373 or sacrifice child
Jun 2 10:24:51 ltptest kernel: Killed process 72203 (oom02) total-vm:18918976kB, anon-rss:15671680kB, file-rss:0kB, shmem-rss:0kB
Jun 2 10:24:51 ltptest kernel: oom_reaper: reaped process 72203 (oom02), now anon-rss:15681280kB, file-rss:0kB, shmem-rss:0kB
```
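
For reference, this is roughly what each allocation thread does under
the test (a simplified sketch, not the actual oom02 source): with
overcommit_memory set to 2, the failure surfaces as ENOMEM from mmap()
itself instead of a later OOM kill.
```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define CHUNK (20L * 1024 * 1024)

int main(void)
{
	for (;;) {
		void *p = mmap(NULL, CHUNK, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED) {
			/* Under strict overcommit this is the common path. */
			if (errno == ENOMEM)
				printf("mmap: ENOMEM\n");
			break;
		}
		mlock(p, CHUNK);	/* oom02 locks its pages (VM_LOCKED) */
		memset(p, 'a', CHUNK);	/* touch every page */
	}
	return 0;
}
```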
--
thanks,
Gou Hao <gouhao@uniontech.com>