[LTP] [PATCH] lib: memutils: don't pollute entire system memory to avoid OoM
Krzysztof Kozlowski
krzysztof.kozlowski@canonical.com
Thu Jun 24 17:46:04 CEST 2021
On 24/06/2021 17:34, Krzysztof Kozlowski wrote:
> On 24/06/2021 17:07, Krzysztof Kozlowski wrote:
>>
>> On 24/06/2021 15:33, Martin Doucha wrote:
>>> On 24. 06. 21 15:22, Krzysztof Kozlowski wrote:
>>>> On big-memory systems, e.g. a machine with 196 GB of RAM, the ioctl_sg01
>>>> test was failing because of the OoM killer during memory pollution:
>>>>
>>>> ...
>>>>
>>>> It seems that leaving a hard-coded 128 MB of free memory works for small
>>>> or medium systems, but on such a big machine it creates significant
>>>> memory pressure, triggering the out-of-memory reaper.
>>>>
>>>> Memory pressure is usually defined by the ratio between free and total
>>>> memory, so adjust the safety/spare memory similarly to always keep 0.5%
>>>> of memory free.
>>>
>>> Hi,
>>> I've sent a similar patch for the same issue a while ago. It covers a
>>> few more edge cases. See [1] for the discussion about it.
>>>
>>
>> Thanks for the pointer. I see we partly used a similar solution:
>> always leave some percentage of memory free.
>>
>> Different kernels might have different limits here, for example v5.11
>> where this happened has two additional restrictions:
>>
>> 1. vm.min_free_kbytes = 90112
>> The min_free_kbytes will grow non-linearly up to 256 MB (still for v5.11).
>>
>> 2. vm.lowmem_reserve_ratio = 256 256 32 0 0
>> This is a ratio of 1/X for specific zones, and since this was a highmem
>> allocation, it does not matter here (the machine has plenty of Normal
>> zone memory).
>>
>> Therefore the OoM seems to be caused by min_free_kbytes. The machine has
>> two nodes and the limit appears to be split between them:
>>
>> [76578.062366] Node 0 Normal free:44536kB min:44600kB ...
>> [76578.062373] Node 1 Normal free:44824kB min:45060kB ...
>>
>> The rest of the free memory is in other zones (11 MB in DMA and 380 MB in
>> DMA32), which were not used for this allocation. Therefore, to be
>> accurate, the safety limit should process /proc/zoneinfo and count the
>> amount of free memory in the Normal zone. This 128 MB safety limit should
>> not be counted from total memory, but from the Normal zone.
>>
>> But this is a much more complex task, and a simple limit of 0.5% usually
>> does the trick.
>>
>> P.S. For 32-bit systems the Highmem zone should also be included in Normal.
>
> Just to back this up with some numbers:
> MemTotal: 198067420 kB
> MemFree: 109125196 kB => 27 281 299 pages
> MemAvailable: 108425900 kB
>
> Node 1 free pages: 2732177
> Node 0 free pages: 24305662
> 2732177+24305662 = 27037839
> DMA32 free pages: 240511
> DMA free pages: 2949
>
> You can see that MemFree, which is returned by sysinfo, includes the DMA32
> and DMA zones, which is not valid here. Under low-memory pressure, user
> space (allocating highmem pages) cannot allocate memory from the DMA
> zones, so the Normal zone counters are in reality lower and hit the
> minimum level.
Which brings us to the point that sysinfo is not reliable in the first
place. It returns free memory, not available memory, even though the
man page says otherwise.
It would be better to read /proc/meminfo, use MemAvailable and subtract
swap from it, since MemAvailable takes the watermarks (low limit) into
account.
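To illustrate the idea, here is a rough sketch in Python (not LTP code) of
reading MemAvailable from /proc/meminfo and keeping the 0.5% margin from the
patch. The helper names parse_meminfo, pollutable_kb and read_meminfo, and
the SPARE_DIVISOR constant, are made up for this example. Swap handling is
left out, and the sketch assumes a kernel that reports MemAvailable.

```python
# Sketch only: hypothetical helpers, not part of LTP.
SPARE_DIVISOR = 200  # keep 1/200 = 0.5% of MemTotal free (ratio from the patch)


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}; sizes are in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip anything that is not "Name: <number> [unit]".
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def pollutable_kb(fields):
    """Return the kB that could be filled while leaving the spare margin."""
    spare = fields["MemTotal"] // SPARE_DIVISOR
    return max(fields["MemAvailable"] - spare, 0)


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (assumes a Linux system)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

On a Linux machine, pollutable_kb(read_meminfo()) would give the amount to
fill under these assumptions. A proper version would live in the C helper in
lib/ and would still need to decide how swap is treated.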
Best regards,
Krzysztof