[LTP] Issue faced in memcg_stat_rss while running mainline kernels between 6.7 and 6.8
Petr Vorel
pvorel@suse.cz
Wed Jan 15 13:52:41 CET 2025
Hi Harshvardhan,
We run the mainline stable kernel on Tumbleweed with cgroup v2, so we get:
TCONF: memory controller mounted on cgroup v2 hierarchy, skipping test.
I will try to have a look at it today or tomorrow. I wonder whether I can even
find a system with cgroup v1.
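A quick way to see which hierarchy a machine would give the test is to look at the mount table. The snippet below is only a sketch: the function name memcg_hierarchy is made up for illustration, and "v2" here merely means that a cgroup2 mount exists and no cgroup v1 mount carries the memory controller.

```shell
#!/bin/sh
# Hypothetical helper (not part of LTP): report where the memory controller
# appears to live, based on a mount table in /proc/mounts format.
# Prints "v1", "v2" or "none".
memcg_hierarchy() {
    memcg_mounts="${1:-/proc/mounts}"
    # cgroup v1: filesystem type "cgroup" with "memory" among the mount options
    if awk '$3 == "cgroup" && $4 ~ /(^|,)memory(,|$)/ { found = 1 }
            END { exit !found }' "$memcg_mounts"; then
        echo v1
    # cgroup v2: a unified "cgroup2" mount and no v1 memory controller mount
    elif awk '$3 == "cgroup2" { found = 1 }
              END { exit !found }' "$memcg_mounts"; then
        echo v2
    else
        echo none
    fi
}

# Check the running system
memcg_hierarchy
```

On a hybrid setup both mount types can be present, which is why the v1 check comes first.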
@Li: maybe you have an idea what's wrong?
Kind regards,
Petr
> Hi there,
> I saw your name appear the most in the commit log of memcg_stat_rss.sh, so I was wondering if you had any information as to why this is happening. I feel that we have enough reason to believe that this is due to outdated testcases. I would highly appreciate it if you could verify this.
> Thanks & Regards,
> Harshvardhan
> From: ltp <ltp-bounces+harshvardhan.j.jha=oracle.com@lists.linux.it> on behalf of Harshvardhan Jha via ltp <ltp@lists.linux.it>
> Date: Thursday, 28 November 2024 at 3:20 PM
> To: ltp@lists.linux.it <ltp@lists.linux.it>
> Subject: [LTP] Issue faced in memcg_stat_rss while running mainline kernels between 6.7 and 6.8
> Hi there,
> I've been getting test failures on the memcg_stat_rss testcase for
> mainline 6.12 kernels, with three tests failing and one broken.
> Running tests.......
> <<<test_start>>>
> tag=memcg_stat_rss stime=1732003500
> cmdline="memcg_stat_rss.sh"
> contacts=""
> analysis=exit
> <<<test_output>>>
> incrementing stop
> memcg_stat_rss 1 TINFO: Running: memcg_stat_rss.sh
> memcg_stat_rss 1 TINFO: Tested kernel: Linux harjha-ol9kdevltp
> 6.12.0-master.20241021.el9.v1.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Oct 21
> 06:24:22 PDT 2024 x86_64 x86_64 x86_64 GNU/Linux
> memcg_stat_rss 1 TINFO: Using
> /tempdir/ltp-Y4AEUmKVIE/LTP_memcg_stat_rss.kEhD0QvvMw as tmpdir (xfs
> filesystem)
> memcg_stat_rss 1 TINFO: timeout per run is 0h 5m 0s
> memcg_stat_rss 1 TINFO: set /sys/fs/cgroup/memory/memory.use_hierarchy
> to 0 failed
> memcg_stat_rss 1 TINFO: Setting shmmax
> memcg_stat_rss 1 TINFO: Running memcg_process --mmap-anon -s 266240
> memcg_stat_rss 1 TINFO: Warming up pid: 9367
> memcg_stat_rss 1 TINFO: Process is still here after warm up: 9367
> memcg_stat_rss 1 TFAIL: rss is 0, 266240 expected
> memcg_stat_rss 2 TINFO: Running memcg_process --mmap-file -s 4096
> memcg_stat_rss 2 TINFO: Warming up pid: 9383
> memcg_stat_rss 2 TINFO: Process is still here after warm up: 9383
> memcg_stat_rss 2 TPASS: rss is 0 as expected
> memcg_stat_rss 3 TINFO: Running memcg_process --shm -k 3 -s 4096
> memcg_stat_rss 3 TINFO: Warming up pid: 9446
> memcg_stat_rss 3 TINFO: Process is still here after warm up: 9446
> memcg_stat_rss 3 TPASS: rss is 0 as expected
> memcg_stat_rss 4 TINFO: Running memcg_process --mmap-anon --mmap-file
> --shm -s 266240
> memcg_stat_rss 4 TINFO: Warming up pid: 9462
> memcg_stat_rss 4 TINFO: Process is still here after warm up: 9462
> memcg_stat_rss 4 TPASS: rss is 266240 as expected
> memcg_stat_rss 5 TINFO: Running memcg_process --mmap-lock1 -s 266240
> memcg_stat_rss 5 TINFO: Warming up pid: 9479
> memcg_stat_rss 5 TINFO: Process is still here after warm up: 9479
> memcg_stat_rss 5 TFAIL: rss is 0, 266240 expected
> memcg_stat_rss 6 TINFO: Running memcg_process --mmap-anon -s 266240
> memcg_stat_rss 6 TINFO: Warming up pid: 9495
> memcg_stat_rss 6 TINFO: Process is still here after warm up: 9495
> memcg_stat_rss 6 TFAIL: rss is 0, 266240 expected
> memcg_stat_rss 6 TBROK: timed out on memory.usage_in_bytes 4096 266240
> 266240
> /opt/ltp-20240930/testcases/bin/tst_test.sh: line 158: 9495
> Killed memcg_process "$@" (wd:
> /sys/fs/cgroup/memory/ltp/test-9308/ltp_9308)
> Summary:
> passed 3
> failed 3
> broken 1
> skipped 0
> warnings 0
> <<<execution_status>>>
> initiation_status="ok"
> duration=17 termination_type=exited termination_id=3 corefile=no
> cutime=13 cstime=58
> <<<test_end>>>
> INFO: ltp-pan reported some tests FAIL
> LTP Version: 20240930
> I'm not sure whether this error is due to the kernel or to the testcase
> being outdated. I know that since cgroup v2 is the default upstream and
> cgroup v1 is now a legacy option, this specific testcase is not
> particularly high on the priority list, but just to be sure, I wanted
> to verify this from your side. Please let me know whether this error
> comes from the testcase being outdated or is in fact a valid
> kernel error.
> I ran a bisect with the memcg_stat_rss test on mainline kernels and saw the
> bisect range narrow down to between 6.7 and 6.8, which I further isolated to:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7d7ef0a4686abe43cd76a141b340a348f45ecdf2
> This commit was part of a 5-patch series and I wasn't able to revert it
> on 6.12 without getting a series of conflicts.
> So, what I did was check out the SHA before this patch series
> (4a3bfbd1699e2306731809d50d480634012ed4de) and the SHA after the patch series
> (7d7ef0a4686abe43cd76a141b340a348f45ecdf2) and run this test on each.
> The machine had 32 GB of RAM and 4 CPUs.
> The steps to reproduce this are:
> #!/bin/bash
> # After setting the default kernel to the desired one, boot with cgroup v1
> # and SELinux disabled (reboots once if the command line lacks the option).
> if ! grep -q "unified_cgroup_hierarchy=0" /proc/cmdline; then
>     sudo grubby --update-kernel DEFAULT --args="systemd.unified_cgroup_hierarchy=0"
>     sudo grubby --update-kernel DEFAULT --args="systemd.legacy_systemd_cgroup_controller"
>     sudo grubby --update-kernel DEFAULT --args selinux=0
>     sudo sed -i "/^SELINUX=/s/=.*/=disabled/" /etc/selinux/config
>     sudo reboot
> fi
> # Run the test with a fresh temporary directory
> cd /opt/ltp
> rm -rf /tempdir
> mkdir /tempdir
> ./runltp -d /tempdir -s memcg_stat_rss
> The results obtained were:
> Pre bisect culprit (4a3bfbd1699e2306731809d50d480634012ed4de):
> <<<test_start>>>
> tag=memcg_stat_rss stime=1731754078
> cmdline="memcg_stat_rss.sh"
> contacts=""
> analysis=exit
> <<<test_output>>>
> incrementing stop
> memcg_stat_rss 1 TINFO: Running: memcg_stat_rss.sh
> memcg_stat_rss 1 TINFO: Tested kernel: Linux harjha-ol9kdevltp
> 6.7.0-masterpre.2024111.el9.rc1.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Nov 15
> 11:56:10 PST 2024 x86_64 x86_64 x86_64 GNU/Linux
> memcg_stat_rss 1 TINFO: Using
> /tempdir/ltp-SzE9ADK6MM/LTP_memcg_stat_rss.6op28sMXO2 as tmpdir (xfs
> filesystem)
> memcg_stat_rss 1 TINFO: timeout per run is 0h 5m 0s
> memcg_stat_rss 1 TINFO: set /sys/fs/cgroup/memory/memory.use_hierarchy
> to 0 failed
> memcg_stat_rss 1 TINFO: Setting shmmax
> memcg_stat_rss 1 TINFO: Running memcg_process --mmap-anon -s 266240
> memcg_stat_rss 1 TINFO: Warming up pid: 34237
> memcg_stat_rss 1 TINFO: Process is still here after warm up: 34237
> memcg_stat_rss 1 TPASS: rss is 266240 as expected
> memcg_stat_rss 1 TBROK: timed out on memory.usage_in_bytes 4096 266240
> 266240
> /opt/ltp-20240930/testcases/bin/tst_test.sh: line 158: 34237
> Killed memcg_process "$@" (wd:
> /sys/fs/cgroup/memory/ltp/test-34180/ltp_34180)
> Summary:
> passed 1
> failed 0
> broken 1
> skipped 0
> warnings 0
> <<<execution_status>>>
> Post bisect culprit (7d7ef0a4686abe43cd76a141b340a348f45ecdf2):
> <<<test_start>>>
> tag=memcg_stat_rss stime=1731755339
> cmdline="memcg_stat_rss.sh"
> contacts=""
> analysis=exit
> <<<test_output>>>
> incrementing stop
> memcg_stat_rss 1 TINFO: Running: memcg_stat_rss.sh
> memcg_stat_rss 1 TINFO: Tested kernel: Linux harjha-ol9kdevltp
> 6.7.0-masterpost.2024111.el9.rc1.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Nov
> 15 11:55:41 PST 2024 x86_64 x86_64 x86_64 GNU/Linux
> memcg_stat_rss 1 TINFO: Using
> /tempdir/ltp-G6cge4CkrR/LTP_memcg_stat_rss.1zrm6X02CO as tmpdir (xfs
> filesystem)
> memcg_stat_rss 1 TINFO: timeout per run is 0h 5m 0s
> memcg_stat_rss 1 TINFO: set /sys/fs/cgroup/memory/memory.use_hierarchy
> to 0 failed
> memcg_stat_rss 1 TINFO: Setting shmmax
> memcg_stat_rss 1 TINFO: Running memcg_process --mmap-anon -s 266240
> memcg_stat_rss 1 TINFO: Warming up pid: 9083
> memcg_stat_rss 1 TINFO: Process is still here after warm up: 9083
> memcg_stat_rss 1 TFAIL: rss is 0, 266240 expected
> memcg_stat_rss 1 TBROK: timed out on memory.usage_in_bytes 4096 266240
> 266240
> /opt/ltp-20240930/testcases/bin/tst_test.sh: line 158: 9083
> Killed memcg_process "$@" (wd:
> /sys/fs/cgroup/memory/ltp/test-9024/ltp_9024)
> Summary:
> passed 0
> failed 1
> broken 1
> skipped 0
> warnings 0
> <<<execution_status>>>
> Thanks & Regards,
> Harshvardhan
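To cross-check the "rss is 0" result in the quoted logs outside of LTP, the counter can be read directly from the cgroup v1 memory.stat file of the test group. This is only a sketch: memcg_stat_field is a hypothetical helper name, and it assumes the usual memory.stat layout of one "name value" pair per line.

```shell
#!/bin/sh
# Hypothetical helper (not part of LTP): print the value of one counter from
# a cgroup v1 memory.stat file. Returns non-zero if the counter is missing.
# Usage: memcg_stat_field FIELD STAT_FILE
memcg_stat_field() {
    # Exact match on the first column, so "rss" does not match "rss_huge"
    awk -v key="$1" '
        $1 == key { print $2; found = 1; exit }
        END { if (!found) exit 1 }
    ' "$2"
}
```

For example, while memcg_process is still running in the test group, `memcg_stat_field rss <group directory>/memory.stat` prints the value that the test compares against the expected 266240 bytes.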