[LTP] [PATCH] Terminate leftover subprocesses when main test process crashes
Li Wang
liwang@redhat.com
Fri Feb 11 12:01:39 CET 2022
On Fri, Feb 11, 2022 at 6:34 PM Li Wang <liwang@redhat.com> wrote:
>
>
> On Fri, Feb 11, 2022 at 5:17 PM Martin Doucha <mdoucha@suse.cz> wrote:
>
>> On 11. 02. 22 7:47, Li Wang wrote:
>> > On Fri, Feb 11, 2022 at 12:18 AM Martin Doucha <mdoucha@suse.cz
>> > <mailto:mdoucha@suse.cz>> wrote:
>> > @@ -1560,6 +1568,7 @@ void tst_run_tcases(int argc, char *argv[],
>> > struct tst_test *self)
>> >
>> > SAFE_SIGNAL(SIGALRM, alarm_handler);
>> > SAFE_SIGNAL(SIGUSR1, heartbeat_handler);
>> > + SAFE_SIGNAL(SIGCHLD, sigchild_handler);
>> >
>> >
>> > Do we really need to set up this signal handler for SIGCHLD?
>> >
>> > We have already called 'SAFE_WAITPID(test_pid, &status, 0)'
>> > in the library process (lib_pid), which relies on SIGCHLD as well.
>> > Also, this handler will be called every time the test exits normally.
>> >
>> > Maybe it is better to just add a kill call to clean up the remaining
>> > descendants if the main test process exits with an abnormal status.
>> >
>> > e.g.
>> >
>> > --- a/lib/tst_test.c
>> > +++ b/lib/tst_test.c
>> > @@ -1503,6 +1503,8 @@ static int fork_testrun(void)
>> > if (WIFEXITED(status) && WEXITSTATUS(status))
>> > return WEXITSTATUS(status);
>> >
>> > + kill(-test_pid, SIGKILL);
>>
>> This will not work because at this point, the child process was already
>> destroyed by waitpid() and all its remaining children were moved under
>> PID 1 (init). The only place where the grandchildren are still reachable
>> this way is in SIGCHLD handler while the dead child process still exists
>> in zombie state.
>
>
> Signal communication is asynchronous, so setting up a SIGCHLD
> handler cannot 100% guarantee that the library process responds
> in time either.
>
> Even though the children of test_pid are moved under PID 1 (init),
> kill(-test_pid, SIGKILL) still works well for killing them. That is
> because the dead child process still exists until the kernel reclaims
> all of its parents.
>
I added a 5-second sleep before sending SIGKILL in the library process
and modified test_children_cleanup.c to print the ppid every second
to verify this:
# ./test_children_cleanup
tst_test.c:1452: TINFO: Timeout per run is 0h 00m 10s
test_children_cleanup.c:20: TINFO: Main process 173236 starting
test_children_cleanup.c:39: TINFO: Forked child 173238
test_children_cleanup.c:33: TINFO: ppid is 173236
test_children_cleanup.c:33: TINFO: ppid is 1
test_children_cleanup.c:33: TINFO: ppid is 1
test_children_cleanup.c:33: TINFO: ppid is 1
test_children_cleanup.c:33: TINFO: ppid is 1
tst_test.c:1502: TINFO: If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
tst_test.c:1504: TBROK: Test killed! (timeout?)
Summary:
passed 0
failed 0
broken 1
skipped 0
warnings 0
=======
--- a/lib/newlib_tests/test_children_cleanup.c
+++ b/lib/newlib_tests/test_children_cleanup.c
@@ -28,7 +28,11 @@ static void run(void)
 	/* Start child that will outlive the main test process */
 	if (!child_pid) {
-		sleep(30);
+		int i;
+		for (i = 0; i < 30; i++) {
+			tst_res(TINFO, "ppid is %d", getppid());
+			sleep(1);
+		}
 		return;
 	}
diff --git a/lib/tst_test.c b/lib/tst_test.c
index 84ce0a5d3..6f2d93611 100644
--- a/lib/tst_test.c
+++ b/lib/tst_test.c
@@ -1503,6 +1503,9 @@ static int fork_testrun(void)
 	if (WIFEXITED(status) && WEXITSTATUS(status))
 		return WEXITSTATUS(status);
+	sleep(5);
+	kill(-test_pid, SIGKILL);
+
 	if (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL) {
 		tst_res(TINFO, "If you are running on slow machine, "
 			"try exporting LTP_TIMEOUT_MUL > 1");
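
For anyone who wants to reproduce this outside of LTP, below is a minimal
standalone sketch of the same experiment. It assumes that the test process
is the leader of its own process group (the child calls setpgid(0, 0) for
that). The name kill_reaped_group_demo() is made up for illustration and is
not part of the LTP library. The pipe is there only so that the caller can
observe that the orphaned grandchild really terminated.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not LTP code. Returns 0 if a grandchild left behind
 * in the process group of an already reaped child could still be killed
 * with kill(-pgid, SIGKILL), -1 otherwise.
 */
int kill_reaped_group_demo(void)
{
	int pfd[2];
	int status;
	char buf;
	ssize_t n;
	pid_t child, grandchild;

	if (pipe(pfd) < 0)
		return -1;

	child = fork();
	if (child < 0)
		return -1;

	if (child == 0) {
		/* Become group leader (assumed to mirror the test process) */
		close(pfd[0]);
		setpgid(0, 0);

		grandchild = fork();
		if (grandchild == 0) {
			/* Inherits the group and the write end of the pipe */
			for (;;)
				pause();
		}

		/* Die abnormally, leaving the grandchild behind */
		raise(SIGKILL);
		_exit(1);
	}

	close(pfd[1]);

	/* Reap the leader; the grandchild is reparented at this point */
	if (waitpid(child, &status, 0) != child)
		return -1;

	/* The group still has a live member, so its ID is still usable */
	if (kill(-child, SIGKILL) < 0)
		return -1;

	/* EOF arrives only after every write end is closed, i.e. the
	 * grandchild is gone */
	do {
		n = read(pfd[0], &buf, 1);
	} while (n < 0 && errno == EINTR);
	close(pfd[0]);

	return n == 0 ? 0 : -1;
}

#ifdef STANDALONE_DEMO
int main(void)
{
	int ret = kill_reaped_group_demo();

	printf("kill after waitpid: %s\n", ret == 0 ? "group killed" : "failed");
	return ret == 0 ? 0 : 1;
}
#endif
```

The function returns 0 only when kill(-child, SIGKILL) succeeded after
waitpid() had already reaped the group leader and the grandchild was then
seen to exit. Building with -DSTANDALONE_DEMO adds a small main() that
prints the result.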
--
Regards,
Li Wang