[LTP] [RFC] enable OOM protection for the library and test process?
Jan Stancek
jstancek@redhat.com
Mon Dec 13 10:32:23 CET 2021
On Mon, Dec 13, 2021 at 9:04 AM Li Wang <liwang@redhat.com> wrote:
>
> Hi All,
>
> We have observed that the OOM tests occasionally end with TBROK (Test killed) on small-RAM
> systems. The reason seems to be that the test process (test_pid) gets killed earlier than the
> expected victim process, so it can't report the status correctly.
>
> I'm thinking maybe we could purposely make the OOM killer ignore the test process (test_pid)
> and the main process? (This would be done only in the mem library, for the OOM tests.)
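A minimal sketch of what the proposal could look like, assuming the protection is applied from the test process itself by writing to /proc/<pid>/oom_score_adj. The helper names `set_oom_score_adj()`, `get_oom_score_adj()` and `protect_test_processes()` are hypothetical, not existing LTP library functions. It also assumes the main process is the parent of the test process, as in the oom03 call chain quoted below. Lowering the value below its current setting requires CAP_SYS_RESOURCE (normally root).

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Build /proc/<pid>/oom_score_adj; pid 0 means the calling process. */
static void oom_adj_path(char *buf, size_t len, pid_t pid)
{
	if (pid == 0)
		snprintf(buf, len, "/proc/self/oom_score_adj");
	else
		snprintf(buf, len, "/proc/%d/oom_score_adj", (int)pid);
}

/* Hypothetical helper (not an LTP API): write adj (-1000..1000).
 * Returns 0 on success, -1 on error. */
int set_oom_score_adj(pid_t pid, int adj)
{
	char path[64];
	FILE *f;
	int ret;

	oom_adj_path(path, sizeof(path), pid);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ret = fprintf(f, "%d", adj);
	/* The kernel validates the value when the buffer is flushed, so a
	 * rejected write (out of range, or lowering without privilege)
	 * shows up as an fclose() failure. */
	if (fclose(f) != 0 || ret < 0)
		return -1;
	return 0;
}

/* Hypothetical helper: read the current value back. */
int get_oom_score_adj(pid_t pid, int *adj)
{
	char path[64];
	FILE *f;
	int n;

	oom_adj_path(path, sizeof(path), pid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fscanf(f, "%d", adj);
	fclose(f);
	return n == 1 ? 0 : -1;
}

/* Sketch of the proposal: shield the test process (self) and the main
 * process (assumed to be our parent) from the OOM killer. */
int protect_test_processes(void)
{
	if (set_oom_score_adj(0, -1000))
		return -1;
	return set_oom_score_adj(getppid(), -1000);
}
```

Because oom_score_adj is inherited across fork(), a child forked after this call (such as the intended victim) would start at -1000 and would need to raise its own value again, e.g. `set_oom_score_adj(0, 0)`; raising the value does not require privileges.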
There are likely more processes that could become unintended targets
(e.g. the harness process).
(If we haven't tried already) could we make the expected victim process
a more appealing target by tweaking its oom_score/oom_score_adj?
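To illustrate how oom_score_adj can steer the choice, here is a simplified model of the per-task badness score, based on oom_badness() in mm/oom_kill.c of 5.x kernels: resident pages plus swap entries plus page-table pages, shifted by oom_score_adj scaled to totalpages/1000, with -1000 making the task ineligible. The sample values come from the "Tasks state" dump quoted below. `EXAMPLE_TOTALPAGES` is a hypothetical constant, because the real value (for a memcg OOM, the cgroup's limit) is not in the log. Other kernel checks are not modeled; for example, tasks already marked MMF_OOM_SKIP are passed over.

```c
#include <limits.h>
#include <stdio.h>

#define OOM_SCORE_ADJ_MIN (-1000)
#define EXAMPLE_PAGE_SIZE 4096L
/* Hypothetical: the real totalpages is not shown in the log. */
#define EXAMPLE_TOTALPAGES 2000000L

struct task_sample {
	int pid;
	long rss;            /* resident pages */
	long swapents;       /* swapped-out pages */
	long pgtables_bytes; /* page-table memory in bytes */
	int oom_score_adj;   /* -1000 .. 1000 */
};

/* Values copied from the "Tasks state" dump in the quoted log. */
static const struct task_sample tasks[] = {
	{ 305071,     31,       4,    61440, 0 },
	{ 305072, 259389, 1019452, 10326016, 0 },
};

/* Simplified model of the kernel's badness heuristic. */
static long badness(const struct task_sample *t, long totalpages)
{
	long points;

	/* -1000 makes the task ineligible for the OOM killer. */
	if (t->oom_score_adj == OOM_SCORE_ADJ_MIN)
		return LONG_MIN;

	points = t->rss + t->swapents + t->pgtables_bytes / EXAMPLE_PAGE_SIZE;
	/* Each oom_score_adj unit is worth totalpages / 1000 pages. */
	points += (long)t->oom_score_adj * (totalpages / 1000);
	return points;
}

/* Print the modeled score of each sampled task. */
void print_scores(long totalpages)
{
	for (size_t i = 0; i < sizeof(tasks) / sizeof(tasks[0]); i++)
		printf("pid %d: badness %ld\n", tasks[i].pid,
		       badness(&tasks[i], totalpages));
}
```

With these inputs the allocating child scores 1281362 against 50 for its parent, and setting the child's oom_score_adj to +1000 would add another `totalpages` points. So, under this model, the badness score alone would not explain why 305071 was picked in the log quoted below.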
>
> e.g.
>
> set oom_score_adj to -1000 for pid 305071 and the main process
>
> oom03:
> main ---> tst_run_tcases --> ... --> fork_testrun
> (pid 305071) testrun --> run_tests --> ... --> testoom --> oom()
> (pid 305072) child_alloc --> child_alloc_thread --> alloc_mem
>
>
> =============
>
> cmdline="oom03"
> ...
> mem.c:218: TINFO: start normal OOM testing.
> mem.c:140: TINFO: expected victim is 305072.
>
> mem.c:39: TINFO: thread (7fe173d1a700), allocating 3221225472 bytes.
> mem.c:39: TINFO: thread (7fe173d1a700), allocating 3221225472 bytes.
>
> tst_test.c:1410: TINFO: If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
> tst_test.c:1411: TBROK: Test killed! (timeout?)
>
> ==========
>
> [ 1117.558867] Tasks state (memory values in pages):
> [ 1117.559373] [  pid  ]  uid   tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
> [ 1117.560167] [ 305071]    0 305071     2215       31          61440        4             0 oom03
> [ 1117.560889] [ 305072]    0 305072  1577128   259389       10326016  1019452             0 oom03
> ...
>
> [ 1117.596510] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ltp/test-305071,task_memcg=/ltp/test-305071,task=oom03,pid=305071,uid=0
> [ 1117.597963] Memory cgroup out of memory: Killed process 305071 (oom03) total-vm:8860kB, anon-rss:124kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:60kB oom_score_adj:0
>
> =============
>
> # free -h
>               total        used        free      shared  buff/cache   available
> Mem:          3.6Gi       270Mi       2.3Gi        18Mi       1.1Gi       3.3Gi
> Swap:         4.0Gi          0B       4.0Gi
>
> # lscpu
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 2
> On-line CPU(s) list: 0,1
> Thread(s) per core: 1
> Core(s) per socket: 1
> Socket(s): 2
> NUMA node(s): 1
>
>
> --
> Regards,
> Li Wang
>
> --
> Mailing list info: https://lists.linux.it/listinfo/ltp