[LTP] [RFC] enable OOM protection for the library and test process?

Mon Dec 13 17:06:47 CET 2021

On 13. 12. 21 10:32, Jan Stancek wrote:
> On Mon, Dec 13, 2021 at 9:04 AM Li Wang <liwang@redhat.com> wrote:
>>
>> Hi All,
>>
>> We have observed that OOM tests occasionally end with TBROK (Test killed) on
>> small-RAM systems. The reason seems to be that the test process (test_pid) gets
>> killed earlier than the expected victim process, so it can't report the status correctly.
>>
>> I'm thinking maybe we could deliberately make the OOM killer ignore the test
>> process (test_pid) and the main process? (Only in the mem library, for OOM tests.)
> 
> There are likely more processes that could become unintended targets
> (e.g. the harness process).
> If we haven't tried already, could we make the expected victim process
> a more appealing target by tweaking its oom_score/oom_score_adj?

I'm afraid it won't be that easy. The main reason the OOM killer goes
postal and kills processes with a tiny memory footprint is that a
process executing the mlock() syscall cannot be targeted by the OOM
killer at all. That's a known kernel issue with no easy fix.

You can protect the main test process using oom_score_adj, but chances
are that the OOM killer will then just kill PID 1 (kernel panic) or find
no killable process left (also a kernel panic).

Protecting the test harness is a bad idea because oom_score_adj is
inherited by child processes, so it'll affect other tests as well. Given
the nature of OOM tests, I'd rather not assume that the protection will
be properly removed at the end.

-- 
Martin Doucha   mdoucha@suse.cz
QA Engineer for Software Maintenance
SUSE LINUX, s.r.o.
CORSO IIa
Krizikova 148/34
186 00 Prague 8
Czech Republic