[LTP] Issue faced in memcg_stat_rss while running mainline kernels between 6.7 and 6.8

Petr Vorel pvorel@suse.cz
Wed Jan 15 23:59:20 CET 2025


Hi Harshvardhan,

[ Cc cgroups@vger.kernel.org: FYI problem in recent kernel using cgroup v1 ]

> Kind regards,
> Petr

> > Hi there,
> > I saw your name appear the most in the commit log of memcg_stat_rss.sh so I was wondering if you had any information as to why this is happening. I feel that we have enough reason to believe that this is due to outdated testcases. It’ll be highly appreciated if you could verify this fact.

> > Thanks & Regards,
> > Harshvardhan

> > From: ltp <ltp-bounces+harshvardhan.j.jha=oracle.com@lists.linux.it> on behalf of Harshvardhan Jha via ltp <ltp@lists.linux.it>
> > Date: Thursday, 28 November 2024 at 3:20 PM
> > To: ltp@lists.linux.it <ltp@lists.linux.it>
> > Subject: [LTP] Issue faced in memcg_stat_rss while running mainline kernels between 6.7 and 6.8
> > Hi there,

> > I've been getting test failures on the memcg_stat_rss testcase for
> > mainline 6.12 kernels with 3 tests failing and one being broken.

> > Running tests.......
> > <<<test_start>>>
> > tag=memcg_stat_rss stime=1732003500
> > cmdline="memcg_stat_rss.sh"
> > contacts=""
> > analysis=exit
> > <<<test_output>>>
> > incrementing stop
> > memcg_stat_rss 1 TINFO: Running: memcg_stat_rss.sh
> > memcg_stat_rss 1 TINFO: Tested kernel: Linux harjha-ol9kdevltp
> > 6.12.0-master.20241021.el9.v1.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Oct 21
> > 06:24:22 PDT 2024 x86_64 x86_64 x86_64 GNU/Linux
> > memcg_stat_rss 1 TINFO: Using
> > /tempdir/ltp-Y4AEUmKVIE/LTP_memcg_stat_rss.kEhD0QvvMw as tmpdir (xfs
> > filesystem)
> > memcg_stat_rss 1 TINFO: timeout per run is 0h 5m 0s
> > memcg_stat_rss 1 TINFO: set /sys/fs/cgroup/memory/memory.use_hierarchy
> > to 0 failed
> > memcg_stat_rss 1 TINFO: Setting shmmax
> > memcg_stat_rss 1 TINFO: Running memcg_process --mmap-anon -s 266240
> > memcg_stat_rss 1 TINFO: Warming up pid: 9367
> > memcg_stat_rss 1 TINFO: Process is still here after warm up: 9367
> > memcg_stat_rss 1 TFAIL: rss is 0, 266240 expected
> > memcg_stat_rss 2 TINFO: Running memcg_process --mmap-file -s 4096
> > memcg_stat_rss 2 TINFO: Warming up pid: 9383
> > memcg_stat_rss 2 TINFO: Process is still here after warm up: 9383
> > memcg_stat_rss 2 TPASS: rss is 0 as expected
> > memcg_stat_rss 3 TINFO: Running memcg_process --shm -k 3 -s 4096
> > memcg_stat_rss 3 TINFO: Warming up pid: 9446
> > memcg_stat_rss 3 TINFO: Process is still here after warm up: 9446
> > memcg_stat_rss 3 TPASS: rss is 0 as expected
> > memcg_stat_rss 4 TINFO: Running memcg_process --mmap-anon --mmap-file
> > --shm -s 266240
> > memcg_stat_rss 4 TINFO: Warming up pid: 9462
> > memcg_stat_rss 4 TINFO: Process is still here after warm up: 9462
> > memcg_stat_rss 4 TPASS: rss is 266240 as expected
> > memcg_stat_rss 5 TINFO: Running memcg_process --mmap-lock1 -s 266240
> > memcg_stat_rss 5 TINFO: Warming up pid: 9479
> > memcg_stat_rss 5 TINFO: Process is still here after warm up: 9479
> > memcg_stat_rss 5 TFAIL: rss is 0, 266240 expected
> > memcg_stat_rss 6 TINFO: Running memcg_process --mmap-anon -s 266240
> > memcg_stat_rss 6 TINFO: Warming up pid: 9495
> > memcg_stat_rss 6 TINFO: Process is still here after warm up: 9495
> > memcg_stat_rss 6 TFAIL: rss is 0, 266240 expected
> > memcg_stat_rss 6 TBROK: timed out on memory.usage_in_bytes 4096 266240
> > 266240
> > /opt/ltp-20240930/testcases/bin/tst_test.sh: line 158:  9495
> > Killed                  memcg_process "$@"  (wd:
> > /sys/fs/cgroup/memory/ltp/test-9308/ltp_9308)

> > Summary:
> > passed   3
> > failed   3
> > broken   1
> > skipped  0
> > warnings 0
> > <<<execution_status>>>
> > initiation_status="ok"
> > duration=17 termination_type=exited termination_id=3 corefile=no
> > cutime=13 cstime=58
> > <<<test_end>>>
> > INFO: ltp-pan reported some tests FAIL
> > LTP Version: 20240930
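
When comparing runs like the one above, it can help to count the result lines instead of reading the whole log. A minimal sketch, assuming the log is fed on stdin and that results are marked with TPASS/TFAIL/TBROK as in the output above (ltp_count_results is a made-up helper name, not part of LTP):

```shell
#!/bin/sh
# Hypothetical helper (not part of LTP): count TPASS/TFAIL/TBROK lines
# in an LTP shell test log read from stdin.
ltp_count_results() {
    awk '
        / TPASS: / { passed++ }
        / TFAIL: / { failed++ }
        / TBROK: / { broken++ }
        END { printf "passed=%d failed=%d broken=%d\n", passed, failed, broken }
    '
}

# Usage (the log file name is only an example):
#   ltp_count_results < memcg_stat_rss.log
```

The counts should match the "Summary:" block that the test prints at the end of its own output.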

> > I'm not sure whether this error is due to the kernel or the testcase
> > being outdated. I know that since cgroup v2 is the default upstream and
> > cgroup v1 is now a legacy option, this specific testcase is not

Yes, exactly. I have a system with cgroup v1, but it's based on kernel 4.12.14.
Even an old Debian VM with the old 5.10 kernel uses cgroup v2. Therefore I have
no chance to debug the problem.
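
For anybody who wants to check which setup a machine really uses, the filesystem type of /sys/fs/cgroup is a quick hint. A small sketch, assuming GNU stat; cgroup_setup_from_fstype is a made-up name used only for illustration, and "tmpfs" covers both the legacy and the hybrid layout:

```shell
#!/bin/sh
# Hypothetical helper: map the filesystem type reported for /sys/fs/cgroup
# to the cgroup setup in use.
cgroup_setup_from_fstype() {
    case "$1" in
        cgroup2fs) echo "v2" ;;       # unified hierarchy
        tmpfs)     echo "v1" ;;       # legacy or hybrid: per-controller mounts below
        *)         echo "unknown" ;;
    esac
}

# Usage on a live system (GNU stat assumed):
#   cgroup_setup_from_fstype "$(stat -fc %T /sys/fs/cgroup)"
```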

> > particularly higher in the priority list, but just to be sure, I wanted
> > to verify this from your side. Please let me know whether this error is
> > coming due to the testcase being outdated or this in fact is a valid
> > kernel error.

> > I ran a bisect on memcg_stat_rss test upon mainline kernels and saw the
> > bisect range narrow down between 6.7 and 6.8 which further isolated to:
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7d7ef0a4686abe43cd76a141b340a348f45ecdf2

That was the reason to Cc cgroups@vger.kernel.org.

> > This commit was part of a 5 patch series and I wasn't able to revert it
> > on 6.12 without getting a series of conflicts.
> > So, what I did was checkout the SHA before this patch series
> > 4a3bfbd1699e2306731809d50d480634012ed4de and after the patch series
> > 7d7ef0a4686abe43cd76a141b340a348f45ecdf2 and ran this test.

> > The machine had 32 GB of RAM and 4 CPUs.

> > The steps to reproduce this are:

> > #!/bin/bash

> > # After setting default kernel to the desired one
> > if ! grep -q "unified_cgroup_hierarchy=0" /proc/cmdline; then
> >         sudo grubby --update-kernel DEFAULT --args="systemd.unified_cgroup_hierarchy=0"
> >         sudo grubby --update-kernel DEFAULT --args="systemd.legacy_systemd_cgroup_controller"
> >         sudo grubby --update-kernel DEFAULT --args selinux=0
> >         sudo sed -i "/^SELINUX=/s/=.*/=disabled/" /etc/selinux/config
> >         sudo reboot
> > fi

> > cd /opt/ltp
> > rm -rf /tempdir
> > mkdir /tempdir
> > ./runltp -d /tempdir  -s memcg_stat_rss

Or just:

# PATH="/opt/ltp/testcases/bin:$PATH" memcg_stat_rss.sh
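
To look at the value by hand while memcg_process is still running, the rss line of memory.stat in the test cgroup can be read directly. A minimal sketch, assuming the cgroup v1 memory.stat format of "key value" pairs, one per line (memcg_stat_value is a made-up helper name, not an LTP function):

```shell
#!/bin/sh
# Hypothetical helper: print the value of KEY from a memory.stat-style file.
# $1: path to the stat file, $2: key (e.g. rss)
# Returns non-zero if the key is not present.
memcg_stat_value() {
    awk -v key="$2" '
        $1 == key { print $2; found = 1; exit }
        END { exit !found }
    ' "$1"
}

# Usage (replace <group> with the test cgroup directory shown in the log):
#   memcg_stat_value /sys/fs/cgroup/memory/<group>/memory.stat rss
```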

Kind regards,
Petr

> > The results obtained were:

> > Pre bisect culprit (4a3bfbd1699e2306731809d50d480634012ed4de):

> > <<<test_start>>>
> > tag=memcg_stat_rss stime=1731754078
> > cmdline="memcg_stat_rss.sh"
> > contacts=""
> > analysis=exit
> > <<<test_output>>>
> > incrementing stop
> > memcg_stat_rss 1 TINFO: Running: memcg_stat_rss.sh
> > memcg_stat_rss 1 TINFO: Tested kernel: Linux harjha-ol9kdevltp
> > 6.7.0-masterpre.2024111.el9.rc1.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Nov 15
> > 11:56:10 PST 2024 x86_64 x86_64 x86_64 GNU/Linux
> > memcg_stat_rss 1 TINFO: Using
> > /tempdir/ltp-SzE9ADK6MM/LTP_memcg_stat_rss.6op28sMXO2 as tmpdir (xfs
> > filesystem)
> > memcg_stat_rss 1 TINFO: timeout per run is 0h 5m 0s
> > memcg_stat_rss 1 TINFO: set /sys/fs/cgroup/memory/memory.use_hierarchy
> > to 0 failed
> > memcg_stat_rss 1 TINFO: Setting shmmax
> > memcg_stat_rss 1 TINFO: Running memcg_process --mmap-anon -s 266240
> > memcg_stat_rss 1 TINFO: Warming up pid: 34237
> > memcg_stat_rss 1 TINFO: Process is still here after warm up: 34237
> > memcg_stat_rss 1 TPASS: rss is 266240 as expected
> > memcg_stat_rss 1 TBROK: timed out on memory.usage_in_bytes 4096 266240
> > 266240
> > /opt/ltp-20240930/testcases/bin/tst_test.sh: line 158: 34237
> > Killed                  memcg_process "$@"  (wd:
> > /sys/fs/cgroup/memory/ltp/test-34180/ltp_34180)

> > Summary:
> > passed   1
> > failed   0
> > broken   1
> > skipped  0
> > warnings 0
> > <<<execution_status>>>


> > Post bisect culprit(7d7ef0a4686abe43cd76a141b340a348f45ecdf2):

> > <<<test_start>>>
> > tag=memcg_stat_rss stime=1731755339
> > cmdline="memcg_stat_rss.sh"
> > contacts=""
> > analysis=exit
> > <<<test_output>>>
> > incrementing stop
> > memcg_stat_rss 1 TINFO: Running: memcg_stat_rss.sh
> > memcg_stat_rss 1 TINFO: Tested kernel: Linux harjha-ol9kdevltp
> > 6.7.0-masterpost.2024111.el9.rc1.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Nov
> > 15 11:55:41 PST 2024 x86_64 x86_64 x86_64 GNU/Linux
> > memcg_stat_rss 1 TINFO: Using
> > /tempdir/ltp-G6cge4CkrR/LTP_memcg_stat_rss.1zrm6X02CO as tmpdir (xfs
> > filesystem)
> > memcg_stat_rss 1 TINFO: timeout per run is 0h 5m 0s
> > memcg_stat_rss 1 TINFO: set /sys/fs/cgroup/memory/memory.use_hierarchy
> > to 0 failed
> > memcg_stat_rss 1 TINFO: Setting shmmax
> > memcg_stat_rss 1 TINFO: Running memcg_process --mmap-anon -s 266240
> > memcg_stat_rss 1 TINFO: Warming up pid: 9083
> > memcg_stat_rss 1 TINFO: Process is still here after warm up: 9083
> > memcg_stat_rss 1 TFAIL: rss is 0, 266240 expected
> > memcg_stat_rss 1 TBROK: timed out on memory.usage_in_bytes 4096 266240
> > 266240
> > /opt/ltp-20240930/testcases/bin/tst_test.sh: line 158:  9083
> > Killed                  memcg_process "$@"  (wd:
> > /sys/fs/cgroup/memory/ltp/test-9024/ltp_9024)

> > Summary:
> > passed   0
> > failed   1
> > broken   1
> > skipped  0
> > warnings 0
> > <<<execution_status>>>

> > Thanks & Regards,
> > Harshvardhan

