[LTP] Issue faced in memcg_stat_rss while running mainline kernels between 6.7 and 6.8
Petr Vorel
pvorel@suse.cz
Wed Jan 15 23:59:20 CET 2025
Hi Harshvardhan,
[ Cc cgroups@vger.kernel.org: FYI problem in recent kernel using cgroup v1 ]
> Kind regards,
> Petr
> > Hi there,
> > I saw your name appear the most in the commit log of memcg_stat_rss.sh so I was wondering if you had any information as to why this is happening. I feel that we have enough reason to believe that this is due to outdated testcases. It’ll be highly appreciated if you could verify this fact.
> > Thanks & Regards,
> > Harshvardhan
> > From: ltp <ltp-bounces+harshvardhan.j.jha=oracle.com@lists.linux.it> on behalf of Harshvardhan Jha via ltp <ltp@lists.linux.it>
> > Date: Thursday, 28 November 2024 at 3:20 PM
> > To: ltp@lists.linux.it <ltp@lists.linux.it>
> > Subject: [LTP] Issue faced in memcg_stat_rss while running mainline kernels between 6.7 and 6.8
> > Hi there,
> > I've been getting test failures on the memcg_stat_rss testcase for
> > mainline 6.12 kernels with 3 tests failing and one being broken.
> > Running tests.......
> > <<<test_start>>>
> > tag=memcg_stat_rss stime=1732003500
> > cmdline="memcg_stat_rss.sh"
> > contacts=""
> > analysis=exit
> > <<<test_output>>>
> > incrementing stop
> > memcg_stat_rss 1 TINFO: Running: memcg_stat_rss.sh
> > memcg_stat_rss 1 TINFO: Tested kernel: Linux harjha-ol9kdevltp
> > 6.12.0-master.20241021.el9.v1.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Oct 21
> > 06:24:22 PDT 2024 x86_64 x86_64 x86_64 GNU/Linux
> > memcg_stat_rss 1 TINFO: Using
> > /tempdir/ltp-Y4AEUmKVIE/LTP_memcg_stat_rss.kEhD0QvvMw as tmpdir (xfs
> > filesystem)
> > memcg_stat_rss 1 TINFO: timeout per run is 0h 5m 0s
> > memcg_stat_rss 1 TINFO: set /sys/fs/cgroup/memory/memory.use_hierarchy
> > to 0 failed
> > memcg_stat_rss 1 TINFO: Setting shmmax
> > memcg_stat_rss 1 TINFO: Running memcg_process --mmap-anon -s 266240
> > memcg_stat_rss 1 TINFO: Warming up pid: 9367
> > memcg_stat_rss 1 TINFO: Process is still here after warm up: 9367
> > memcg_stat_rss 1 TFAIL: rss is 0, 266240 expected
> > memcg_stat_rss 2 TINFO: Running memcg_process --mmap-file -s 4096
> > memcg_stat_rss 2 TINFO: Warming up pid: 9383
> > memcg_stat_rss 2 TINFO: Process is still here after warm up: 9383
> > memcg_stat_rss 2 TPASS: rss is 0 as expected
> > memcg_stat_rss 3 TINFO: Running memcg_process --shm -k 3 -s 4096
> > memcg_stat_rss 3 TINFO: Warming up pid: 9446
> > memcg_stat_rss 3 TINFO: Process is still here after warm up: 9446
> > memcg_stat_rss 3 TPASS: rss is 0 as expected
> > memcg_stat_rss 4 TINFO: Running memcg_process --mmap-anon --mmap-file
> > --shm -s 266240
> > memcg_stat_rss 4 TINFO: Warming up pid: 9462
> > memcg_stat_rss 4 TINFO: Process is still here after warm up: 9462
> > memcg_stat_rss 4 TPASS: rss is 266240 as expected
> > memcg_stat_rss 5 TINFO: Running memcg_process --mmap-lock1 -s 266240
> > memcg_stat_rss 5 TINFO: Warming up pid: 9479
> > memcg_stat_rss 5 TINFO: Process is still here after warm up: 9479
> > memcg_stat_rss 5 TFAIL: rss is 0, 266240 expected
> > memcg_stat_rss 6 TINFO: Running memcg_process --mmap-anon -s 266240
> > memcg_stat_rss 6 TINFO: Warming up pid: 9495
> > memcg_stat_rss 6 TINFO: Process is still here after warm up: 9495
> > memcg_stat_rss 6 TFAIL: rss is 0, 266240 expected
> > memcg_stat_rss 6 TBROK: timed out on memory.usage_in_bytes 4096 266240
> > 266240
> > /opt/ltp-20240930/testcases/bin/tst_test.sh: line 158: 9495
> > Killed memcg_process "$@" (wd:
> > /sys/fs/cgroup/memory/ltp/test-9308/ltp_9308)
> > Summary:
> > passed 3
> > failed 3
> > broken 1
> > skipped 0
> > warnings 0
> > <<<execution_status>>>
> > initiation_status="ok"
> > duration=17 termination_type=exited termination_id=3 corefile=no
> > cutime=13 cstime=58
> > <<<test_end>>>
> > INFO: ltp-pan reported some tests FAIL
> > LTP Version: 20240930
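When comparing runs across kernels it can help to tally the result tags from a saved log rather than reading the whole output. A minimal sketch, assuming the test output was saved to a file; `count_ltp_results` is a hypothetical helper name, not part of LTP:

```shell
#!/bin/sh
# count_ltp_results FILE
# Count lines carrying each LTP result tag in a saved test log.
# Hypothetical helper for illustration; LTP does not ship it.
count_ltp_results() {
    for tag in TPASS TFAIL TBROK; do
        # grep -c prints 0 when nothing matches, which is what we want here.
        printf '%s=%s\n' "$tag" "$(grep -c " $tag: " "$1")"
    done
}

# Example: count_ltp_results memcg_stat_rss.log
```

The pattern matches the tag followed by a colon, so the numbers in the "Summary:" block are not counted a second time.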
> > I'm not sure whether this error is due to the kernel or the testcase
> > being outdated. I know that since cgroup v2 is the default upstream and
> > cgroup v1 is now a legacy option, this specific testcase is not
Yes, exactly. I have a system with cgroup v1, but it's based on 4.12.14.
Even an old Debian VM with a 5.10 kernel uses cgroup v2. Therefore I have no
chance to debug the problem.
> > particularly higher in the priority list, but just to be sure, I wanted
> > to verify this from your side. Please let me know whether this error is
> > coming due to the testcase being outdated or this in fact is a valid
> > kernel error.
> > I ran a bisect on memcg_stat_rss test upon mainline kernels and saw the
> > bisect range narrow down between 6.7 and 6.8 which further isolated to:
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7d7ef0a4686abe43cd76a141b340a348f45ecdf2
This was the reason to Cc cgroups@vger.kernel.org.
> > This commit was part of a 5 patch series and I wasn't able to revert it
> > on 6.12 without getting a series of conflicts.
> > So, what I did was checkout the SHA before this patch series
> > 4a3bfbd1699e2306731809d50d480634012ed4de and after the patch series
> > 7d7ef0a4686abe43cd76a141b340a348f45ecdf2 and ran this test.
> > The machine had 32 GB of RAM and 4 CPUs.
> > The steps to reproduce this are:
> > #!/bin/bash
> > # After setting default kernel to the desired one:
> > # switch to cgroup v1 and disable SELinux, then reboot.
> > # Run the script again after the reboot to start the test.
> > if ! grep -q "unified_cgroup_hierarchy=0" /proc/cmdline; then
> >     sudo grubby --update-kernel DEFAULT --args="systemd.unified_cgroup_hierarchy=0"
> >     sudo grubby --update-kernel DEFAULT --args="systemd.legacy_systemd_cgroup_controller"
> >     sudo grubby --update-kernel DEFAULT --args selinux=0
> >     sudo sed -i "/^SELINUX=/s/=.*/=disabled/" /etc/selinux/config
> >     sudo reboot
> > fi
> > cd /opt/ltp
> > # Recreate the temporary directory passed to runltp below.
> > rm -rf /tempdir
> > mkdir /tempdir
> > ./runltp -d /tempdir -s memcg_stat_rss
Or just:
# PATH="/opt/ltp/testcases/bin:$PATH" memcg_stat_rss.sh
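Before running the test directly it is worth confirming that the machine really booted with cgroup v1. A minimal sketch, assuming GNU stat is available; `cgroup_mode_from_fstype` is a hypothetical helper name. On a unified (v2) setup /sys/fs/cgroup is a cgroup2fs mount, while on a legacy or hybrid setup it is a tmpfs holding the per-controller mounts:

```shell
#!/bin/sh
# cgroup_mode_from_fstype FSTYPE
# Map the filesystem type of /sys/fs/cgroup to a cgroup mode.
# Hypothetical helper for illustration; not part of LTP.
cgroup_mode_from_fstype() {
    case "$1" in
        cgroup2fs) echo "v2" ;;            # unified hierarchy
        tmpfs)     echo "v1-or-hybrid" ;;  # per-controller mounts below a tmpfs
        *)         echo "unknown" ;;
    esac
}

# Report the mode of the running system, if /sys/fs/cgroup exists.
if [ -d /sys/fs/cgroup ]; then
    fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo unknown)
    echo "cgroup mode: $(cgroup_mode_from_fstype "$fstype")"
fi
```

If this reports v2, the kernel command line options from the reproducer have not taken effect yet.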
Kind regards,
Petr
> > The results obtained were:
> > Pre bisect culprit (4a3bfbd1699e2306731809d50d480634012ed4de):
> > <<<test_start>>>
> > tag=memcg_stat_rss stime=1731754078
> > cmdline="memcg_stat_rss.sh"
> > contacts=""
> > analysis=exit
> > <<<test_output>>>
> > incrementing stop
> > memcg_stat_rss 1 TINFO: Running: memcg_stat_rss.sh
> > memcg_stat_rss 1 TINFO: Tested kernel: Linux harjha-ol9kdevltp
> > 6.7.0-masterpre.2024111.el9.rc1.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Nov 15
> > 11:56:10 PST 2024 x86_64 x86_64 x86_64 GNU/Linux
> > memcg_stat_rss 1 TINFO: Using
> > /tempdir/ltp-SzE9ADK6MM/LTP_memcg_stat_rss.6op28sMXO2 as tmpdir (xfs
> > filesystem)
> > memcg_stat_rss 1 TINFO: timeout per run is 0h 5m 0s
> > memcg_stat_rss 1 TINFO: set /sys/fs/cgroup/memory/memory.use_hierarchy
> > to 0 failed
> > memcg_stat_rss 1 TINFO: Setting shmmax
> > memcg_stat_rss 1 TINFO: Running memcg_process --mmap-anon -s 266240
> > memcg_stat_rss 1 TINFO: Warming up pid: 34237
> > memcg_stat_rss 1 TINFO: Process is still here after warm up: 34237
> > memcg_stat_rss 1 TPASS: rss is 266240 as expected
> > memcg_stat_rss 1 TBROK: timed out on memory.usage_in_bytes 4096 266240
> > 266240
> > /opt/ltp-20240930/testcases/bin/tst_test.sh: line 158: 34237
> > Killed memcg_process "$@" (wd:
> > /sys/fs/cgroup/memory/ltp/test-34180/ltp_34180)
> > Summary:
> > passed 1
> > failed 0
> > broken 1
> > skipped 0
> > warnings 0
> > <<<execution_status>>>
> > Post bisect culprit(7d7ef0a4686abe43cd76a141b340a348f45ecdf2):
> > <<<test_start>>>
> > tag=memcg_stat_rss stime=1731755339
> > cmdline="memcg_stat_rss.sh"
> > contacts=""
> > analysis=exit
> > <<<test_output>>>
> > incrementing stop
> > memcg_stat_rss 1 TINFO: Running: memcg_stat_rss.sh
> > memcg_stat_rss 1 TINFO: Tested kernel: Linux harjha-ol9kdevltp
> > 6.7.0-masterpost.2024111.el9.rc1.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Nov
> > 15 11:55:41 PST 2024 x86_64 x86_64 x86_64 GNU/Linux
> > memcg_stat_rss 1 TINFO: Using
> > /tempdir/ltp-G6cge4CkrR/LTP_memcg_stat_rss.1zrm6X02CO as tmpdir (xfs
> > filesystem)
> > memcg_stat_rss 1 TINFO: timeout per run is 0h 5m 0s
> > memcg_stat_rss 1 TINFO: set /sys/fs/cgroup/memory/memory.use_hierarchy
> > to 0 failed
> > memcg_stat_rss 1 TINFO: Setting shmmax
> > memcg_stat_rss 1 TINFO: Running memcg_process --mmap-anon -s 266240
> > memcg_stat_rss 1 TINFO: Warming up pid: 9083
> > memcg_stat_rss 1 TINFO: Process is still here after warm up: 9083
> > memcg_stat_rss 1 TFAIL: rss is 0, 266240 expected
> > memcg_stat_rss 1 TBROK: timed out on memory.usage_in_bytes 4096 266240
> > 266240
> > /opt/ltp-20240930/testcases/bin/tst_test.sh: line 158: 9083
> > Killed memcg_process "$@" (wd:
> > /sys/fs/cgroup/memory/ltp/test-9024/ltp_9024)
> > Summary:
> > passed 0
> > failed 1
> > broken 1
> > skipped 0
> > warnings 0
> > <<<execution_status>>>
> > Thanks & Regards,
> > Harshvardhan