[LTP] [RFC] enable OOM protection for the library and test process?

Jan Stancek jstancek@redhat.com
Mon Dec 13 10:32:23 CET 2021


On Mon, Dec 13, 2021 at 9:04 AM Li Wang <liwang@redhat.com> wrote:
>
> Hi All,
>
> We have observed that the oom tests occasionally end with TBROK (Test killed) on
> systems with little RAM. The reason seems to be that the test process (test_pid) gets
> killed earlier than the expected victim process, so it can't report the status correctly.
>
> I'm thinking maybe we can purposely make the OOM killer ignore the test process (test_pid)
> and the main process? (Do this only in the mem library, for the OOM tests.)

There are likely more processes that could become unintended targets
(e.g. the harness process).
If we haven't tried it already, could we make the expected victim process
a more appealing target by tweaking its oom_score (via oom_score_adj)?
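
A minimal sketch of what such a tweak could look like, assuming a
hypothetical helper named set_oom_score_adj() (it is not an existing LTP
function). It writes a value in the kernel's documented range of -1000 to
1000 into /proc/<pid>/oom_score_adj:

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>

/* Range accepted by /proc/<pid>/oom_score_adj. */
#define ADJ_MIN (-1000)
#define ADJ_MAX 1000

/*
 * Hypothetical helper (an assumption for illustration, not LTP API):
 * set oom_score_adj for the given pid.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int set_oom_score_adj(pid_t pid, int value)
{
	char path[64];
	FILE *f;
	int ret = 0;

	/* Reject out-of-range values before touching /proc. */
	if (value < ADJ_MIN || value > ADJ_MAX) {
		errno = EINVAL;
		return -1;
	}

	snprintf(path, sizeof(path), "/proc/%ld/oom_score_adj", (long)pid);

	f = fopen(path, "w");
	if (!f)
		return -1;

	if (fprintf(f, "%d", value) < 0)
		ret = -1;

	/* The buffered write reaches the kernel on flush, so check fclose(). */
	if (fclose(f) != 0)
		ret = -1;

	return ret;
}
```

The expected victim could call set_oom_score_adj(getpid(), 1000) right
before it starts allocating. Raising the value needs no extra privileges,
whereas lowering it (e.g. to -1000 for the test process) generally requires
CAP_SYS_RESOURCE.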

>
> e.g.
>
> set oom_score_adj to -1000 for pid-305071 and main-process
>
> oom03:
> main ---> tst_run_tcases --> ... --> fork_testrun
>    (pid 305071)    testrun  --> run_tests --> ... --> testoom --> oom()
>             (pid 305072)    child_alloc --> child_alloc_thread --> alloc_mem
>
>
> =============
>
> 3 cmdline="oom03"
> ...
> 10 mem.c:218: TINFO: start normal OOM testing.
> 11 mem.c:140: TINFO: expected victim is 305072.
>
> 12 mem.c:39: TINFO: thread (7fe173d1a700), allocating 3221225472 bytes.
> 13 mem.c:39: TINFO: thread (7fe173d1a700), allocating 3221225472 bytes.
>
> 14 tst_test.c:1410: TINFO: If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
> 15 tst_test.c:1411: TBROK: Test killed! (timeout?)
>
> ==========
>
> [ 1117.558867] Tasks state (memory values in pages):
> [ 1117.559373] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
> [ 1117.560167] [ 305071]     0 305071     2215       31    61440        4             0 oom03
> [ 1117.560889] [ 305072]     0 305072  1577128   259389 10326016  1019452             0 oom03
> ...
>
> [ 1117.596510] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ltp/test-305071,task_memcg=/ltp/test-305071,task=oom03,pid=305071,uid=0
> [ 1117.597963] Memory cgroup out of memory: Killed process 305071 (oom03) total-vm:8860kB, anon-rss:124kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:60kB oom_score_adj:0
>
> =============
>
> # free -h
>               total        used        free      shared  buff/cache   available
> Mem:          3.6Gi       270Mi       2.3Gi        18Mi       1.1Gi       3.3Gi
> Swap:         4.0Gi          0B       4.0Gi
>
> # lscpu
> Architecture:        x86_64
> CPU op-mode(s):      32-bit, 64-bit
> Byte Order:          Little Endian
> CPU(s):              2
> On-line CPU(s) list: 0,1
> Thread(s) per core:  1
> Core(s) per socket:  1
> Socket(s):           2
> NUMA node(s):        1
>
>
> --
> Regards,
> Li Wang
>
> --
> Mailing list info: https://lists.linux.it/listinfo/ltp


