[LTP] msgstress03: "Fork failed (may be OK if under stress)" problem observed on qemu.

Kautuk Consul kautuk.consul.80@gmail.com
Fri Jan 21 09:23:50 CET 2022


Hi All,

I am running a RISC-V kernel on QEMU. When I execute the msgstress03
testcase, it fails with the following log:
msgstress03    0  TINFO  :  Cannot read session user limits from '/sys/fs/cgroup/user.slice/user-0.slice/pids.max'
msgstress03    0  TINFO  :  Found limit of processes 10178 (from /sys/fs/cgroup/pids/user.slice/user-0.slice/pids.max)
msgstress03    0  TINFO  :  Requested number of processes higher than limit (10000 > 9991), setting to 9991
msgstress03    1  TFAIL  :  msgstress03.c:163:  Fork failed (may be OK if under stress)

The kernel dmesg shows the following:
[ 3731.980951] cgroup: fork rejected by pids controller in /user.slice/user-0.slice/session-c1.scope

I added some logging to the kernel and confirmed that the cgroup pids
limit, i.e. 10178, is being exceeded by the msgstress03 testcase, so
the kernel is rejecting the fork() legitimately.
On analyzing the msgstress03 code I see that the testcase assumes that
only "nprocs" forks are done, and nprocs is correctly clamped to the
limit, which is 9991. However, the total number of forks is much
larger (i.e. 2*nprocs), because each of the nprocs children does an
additional fork within do_test().
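
To make the arithmetic concrete, here is a minimal sketch of the pid
accounting. The helper names are hypothetical and are not part of LTP; I
am also assuming that 9991 in the log above is the number of free pids
and that none of the children has been reaped yet.

```c
#include <stdbool.h>

/* Hypothetical helper, not LTP code: pids consumed in the worst case,
 * where each of the nprocs first-level children forks once more (as
 * described for do_test()) and nothing has been reaped yet. */
long worst_case_pids(long nprocs)
{
	return 2 * nprocs;
}

/* Hypothetical helper: true if that worst case fits into free_pids. */
bool fits_in_pid_budget(long nprocs, long free_pids)
{
	return worst_case_pids(nprocs) <= free_pids;
}
```

With nprocs = 9991 the worst case is 19982 pids, well above both the
9991 reported as available and the 10178 cgroup limit.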

Because of this, on slower machines (where the children do not run
fast enough and the parent does not call wait() fast enough) this
testcase can fail. The initial children may even reach exit(), but
they remain defunct (zombies) because the parent process will not
necessarily be able to call wait() on all of them fast enough to free
their pids for reuse.
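
The defunct-child effect can be shown in isolation. This is only a
sketch using standard POSIX calls, not code from msgstress03, and the
function name is made up: it forks a child that exits at once, waits for
the exit with WNOWAIT so the child is left unreaped, and checks that the
pid still exists until waitpid() collects it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if an exited but unreaped child still
 * holds its pid and the pid is gone after reaping, 0 if not, and -1 if
 * one of the system calls fails. */
int zombie_holds_pid(void)
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(0);	/* child: exit immediately */

	/* Block until the child has exited, but leave it as a zombie. */
	siginfo_t info;
	if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) < 0)
		return -1;

	/* Signal 0 only checks for existence: the zombie is still there. */
	int held_before = (kill(pid, 0) == 0);

	/* Reap the child; only now is the pid released. */
	if (waitpid(pid, NULL, 0) != pid)
		return -1;

	errno = 0;
	int gone_after = (kill(pid, 0) == -1 && errno == ESRCH);

	return held_before && gone_after;
}
```

The sketch assumes SIGCHLD is not being ignored, since ignored SIGCHLD
makes the kernel reap children automatically.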

I made the following changes and the test-case passed:
diff --git a/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c
b/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c
index 3cb70ab18..75cfc109d 100644
--- a/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c
+++ b/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c
@@ -131,7 +131,7 @@ int main(int argc, char **argv)
        /* Set up array of unique keys for use in allocating message
         * queues
         */
-       for (i = 0; i < nprocs; i++) {
+       for (i = 0; i < nprocs/2; i++) {
                ok = 1;
                do {
                        /* Get random key */
@@ -157,7 +157,7 @@ int main(int argc, char **argv)
         * of random length messages with specific values.
         */

-       for (i = 0; i < nprocs; i++) {
+       for (i = 0; i < nprocs/2; i++) {
                fflush(stdout);
                if ((pid = FORK_OR_VFORK()) < 0) {
                        tst_brkm(TFAIL,
@@ -191,11 +191,11 @@ int main(int argc, char **argv)
                }
        }
        /* Make sure proper number of children exited */
-       if (count != nprocs) {
+       if (count != nprocs/2) {
                tst_brkm(TFAIL,
                         NULL,
                         "Wrong number of children exited, Saw %d, Expected %d",
-                        count, nprocs);
+                        count, nprocs/2);
        }

Other testcases such as msgstress04 don't fail because they compute
nprocs differently. Specifically, I observe that msgstress04 uses only
free_pids / 2 processes instead of the full free_pids.
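
As an illustration of that kind of limit (a sketch only, not the actual
msgstress04 code; the function name is hypothetical), the requested
process count could be capped at half of the free pids so that the
second-level forks also fit:

```c
/* Hypothetical helper, not LTP code: cap the requested number of
 * first-level children at free_pids / 2, leaving room for the one
 * extra fork that each child performs. */
long clamp_nprocs(long requested, long free_pids)
{
	long cap = free_pids / 2;

	return requested < cap ? requested : cap;
}
```

With the values from my log, clamp_nprocs(10000, 9991) gives 4995.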

Can someone confirm my findings? If needed, I can send out a patch
with the nprocs/2 changes above.
If there is a better fix, or any other opinion, kindly reply.

Thanks and Regards.

