[LTP] [PATCH 1/2] syscalls/ioctl: ioctl_sg01.c: ioctl_sg01 invoked oom-killer
Li Wang
liwang@redhat.com
Thu Mar 4 08:52:27 CET 2021
Hi Xinpeng,
> [root@test-env-nm05-compute-14e5e72e38 ~]# cat /proc/meminfo
> MemTotal: 526997420 kB
> MemFree: 520224908 kB
> MemAvailable: 519936744 kB
> Buffers: 0 kB
> Cached: 2509036 kB
> SwapCached: 0 kB
> ...
> SwapTotal: 0 kB
> SwapFree: 0 kB
> ...
> CommitLimit: 263498708 kB
> Committed_AS: 10263760 kB
>
> [root@test-env-nm05-compute-14e5e72e38 ~]# cat /proc/sys/vm/min_free_kbytes
> 90112
>
Looking back at this problem, I think the OOM is caused by the low
CommitLimit:
CommitLimit: 263498708 kB < MemAvailable: 519936744 kB
If you enable swap, or raise overcommit_ratio so that CommitLimit becomes
larger than MemAvailable, the OOM will probably no longer occur.
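To make the relation between swap, overcommit_ratio and CommitLimit concrete, here is a minimal sketch of the calculation. It assumes no hugetlb pages are reserved and that vm.overcommit_kbytes is 0, so only overcommit_ratio applies; the function name is hypothetical and not part of LTP.

```c
#include <stdint.h>

/*
 * Sketch of how CommitLimit (in kB) follows from MemTotal, SwapTotal and
 * vm.overcommit_ratio. Assumptions: no hugetlb pages are reserved and
 * vm.overcommit_kbytes is 0. The kernel works in pages, so the value in
 * /proc/meminfo can differ from this result by a few kB.
 */
static uint64_t commit_limit_kb(uint64_t mem_total_kb, uint64_t swap_total_kb,
				unsigned int overcommit_ratio)
{
	return swap_total_kb + mem_total_kb * overcommit_ratio / 100;
}
```

With the quoted values (MemTotal 526997420 kB, no swap), a ratio of 50 gives 263498710 kB, which matches the reported CommitLimit up to rounding, while a ratio of 100 would lift the limit above MemAvailable.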
Btw, I also observed that ioctl_sg01 is killed by the OOM killer almost
every time on an aarch64 system with little swap space, but if I add more
swap or set a higher overcommit_ratio, the problem goes away.
(I also tried this manually on an x86_64 system to confirm.)
              total        used        free      shared  buff/cache   available
Mem:         259828        5365      247383          68        7079      231296
Swap:          4095          55        4040
---
MemTotal: 266063872 kB
MemFree: 253320768 kB
MemAvailable: 236848064 kB
Buffers: 1472 kB
Cached: 6755456 kB
SwapCached: 12160 kB
...
CommitLimit: 137226176 kB
Committed_AS: 1206912 kB
---
The previous method in the patch [1] does not seem good enough, but it can
help to verify whether the OOM disappears when overcommit_ratio is reset.
[1] http://lists.linux.it/pipermail/ltp/2021-February/020907.html
Hence, another improvement based on the above is to size the allocation
according to CommitLimit whenever CommitLimit is less than MemAvailable.
That keeps the test working on systems with little swap space.
Any thoughts or comments?
--- a/lib/tst_memutils.c
+++ b/lib/tst_memutils.c
@@ -36,6 +36,13 @@ void tst_pollute_memory(size_t maxsize, int fillchar)
 	if (info.freeram - safety < maxsize / info.mem_unit)
 		maxsize = (info.freeram - safety) * info.mem_unit;
+	/*
+	 * Respect CommitLimit to keep the test from invoking the OOM killer;
+	 * this can happen on systems with little swap space (or none).
+	 */
+	if (SAFE_READ_MEMINFO("CommitLimit:") < SAFE_READ_MEMINFO("MemAvailable:"))
+		maxsize = SAFE_READ_MEMINFO("CommitLimit:") * 1024 - (safety * info.mem_unit);
+
 	blocksize = MIN(maxsize, blocksize);
 	map_count = maxsize / blocksize;
 	map_blocks = SAFE_MALLOC(map_count * sizeof(void *));
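For anyone who wants to try the idea outside of the LTP library, below is a rough standalone sketch of the same check. The helper names are hypothetical and plain stdio replaces SAFE_READ_MEMINFO. Unlike the hunk above, the sketch takes the minimum of the current maxsize and the CommitLimit-based value so that the request never grows; that is my own assumption, not what the patch does. It also assumes a 64-bit size_t.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the kB value of a /proc/meminfo style field such as
 * "CommitLimit:", or -1 if the key is not found. The stream is rewound
 * first so the helper can be called repeatedly on the same FILE.
 */
static long long read_meminfo_kb(FILE *f, const char *key)
{
	char line[256];
	size_t klen = strlen(key);
	long long val;

	rewind(f);
	while (fgets(line, sizeof(line), f)) {
		if (strncmp(line, key, klen) == 0 &&
		    sscanf(line + klen, "%lld", &val) == 1)
			return val;
	}
	return -1;
}

/*
 * Cap maxsize (bytes) by CommitLimit minus a safety margin, but only
 * when CommitLimit is below MemAvailable. Negative inputs (field not
 * found) leave maxsize untouched.
 */
static size_t cap_to_commit_limit(size_t maxsize, long long commit_kb,
				  long long avail_kb, size_t safety_bytes)
{
	size_t limit;

	if (commit_kb < 0 || avail_kb < 0 || commit_kb >= avail_kb)
		return maxsize;

	limit = (size_t)commit_kb * 1024;
	if (limit <= safety_bytes)
		return 0;
	limit -= safety_bytes;

	return maxsize < limit ? maxsize : limit;
}
```

A caller would open /proc/meminfo with fopen(), read "CommitLimit:" and "MemAvailable:" with read_meminfo_kb(), and pass both to cap_to_commit_limit() together with the safety margin converted to bytes.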
========================
As for MemAvailable < MemFree, I think that is correct behavior on
your system and not the root cause of the OOM.
Generally we assume that MemAvailable is higher than MemFree, but there
are situations where that does not hold. A better check is to sum the low
watermarks of all zones from /proc/zoneinfo and add that sum to
MemAvailable; if the result is larger than MemFree, that should be OK
from my perspective.
-----
# echo 675840 > /proc/sys/vm/min_free_kbytes
# cat /proc/meminfo |grep -i mem
MemTotal: 5888584 kB
MemFree: 4518064 kB
MemAvailable: 3692008 kB
Shmem: 21128 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
# cat /proc/zoneinfo |grep low -B 3
...
pages free 3840
min 440
low 550
--
Node 0, zone DMA32
pages free 355602
min 79706
low 99632
--
Node 0, zone Normal
pages free 0
min 0
low 0
--
Node 0, zone Movable
pages free 0
min 0
low 0
--
Node 0, zone Device
pages free 0
min 0
low 0
--
Node 1, zone DMA
pages free 0
min 0
low 0
--
Node 1, zone DMA32
pages free 0
min 0
low 0
--
nr_kernel_misc_reclaimable 0
pages free 769192
min 88812
low 111015
(111015+99632+550)*4 + 3692008(MemAvailable) = 4536796 > 4518064(MemFree)
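The same sanity check can be written down as a small sketch. It assumes a 4 kB page size, as in the arithmetic above, and takes the low watermarks as an array that the caller has already parsed from /proc/zoneinfo; both function names are hypothetical.

```c
#include <stddef.h>

/* Sum the per-zone low watermarks (in pages) and convert the sum to kB. */
static unsigned long long sum_low_kb(const unsigned long *low_pages,
				     size_t nzones, unsigned long page_kb)
{
	unsigned long long sum = 0;
	size_t i;

	for (i = 0; i < nzones; i++)
		sum += low_pages[i];

	return sum * page_kb;
}

/*
 * MemAvailable below MemFree is treated as plausible when adding the
 * low watermarks back brings the value above MemFree.
 */
static int mem_available_is_plausible(unsigned long long low_kb,
				      unsigned long long mem_available_kb,
				      unsigned long long mem_free_kb)
{
	return low_kb + mem_available_kb > mem_free_kb;
}
```

For the three non-zero zones above the low watermarks add up to 844788 kB, and 844788 + 3692008 is larger than the MemFree value of 4518064 kB.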
Btw, the formula used to compute MemAvailable is:
available = MemFree - totalreserve_pages + pages[LRU_ACTIVE_FILE] +
            pages[LRU_INACTIVE_FILE] - min(pagecache / 2, wmark_low)
where pagecache = pages[LRU_ACTIVE_FILE] + pages[LRU_INACTIVE_FILE].
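Written as a small function, the formula looks roughly like the sketch below. It only covers the terms quoted here, with all values in pages; as far as I know the kernel additionally counts part of the reclaimable slab, which is left out. The clamp at zero and the function name are my own additions for illustration.

```c
/*
 * Sketch of the MemAvailable estimate, all arguments in pages.
 * pagecache is the sum of the active and inactive file LRU lists; at
 * most half of it, or the low watermark if that is smaller, is assumed
 * to be unreclaimable.
 */
static long long mem_available_pages(long long free_pages,
				     long long totalreserve_pages,
				     long long active_file,
				     long long inactive_file,
				     long long wmark_low)
{
	long long pagecache = active_file + inactive_file;
	long long half = pagecache / 2;
	long long keep = half < wmark_low ? half : wmark_low;
	long long available = free_pages - totalreserve_pages + pagecache - keep;

	/* Assumption: never report a negative amount. */
	return available < 0 ? 0 : available;
}
```

With a large totalreserve_pages (it grows with min_free_kbytes) and little page cache, the result drops below the free page count, which is the MemAvailable < MemFree situation discussed above.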
--
Regards,
Li Wang