[LTP] [PATCH] [RFC] lib: shell: Fix timeout process races
Petr Vorel
pvorel@suse.cz
Mon Sep 20 09:52:57 CEST 2021
Hi Cyril,
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Tested-by: Petr Vorel <pvorel@suse.cz>
> There were actually several races in the shell library timeout handling.
> This commit hopefully fixes all of them by:
> * Reimplementing the background timer in C
+1
> * Making sure that the timer has started before we leave test setup
+1
> The rewrite of the background timer to C allows us to run all the timeout
> logic in a single process, which simplifies the whole problem greatly
> since previously we had a chain of processes that established signal
> handlers to kill their descendants, which in the end had a few races in
> it.
> The race that caused the problems is, as far as I can tell, in the way
> the shell spawns its children. I haven't checked the shell code, but I
> guess that when the shell runs a process in the background it does
> vfork() + exec() and signals are ignored during that operation, so if
> SIGTERM arrives at that point it gets lost.
> That means that we created a race window in the shell library each time
> we started a new process. The rewrite to C simplifies the code but we
> still end up with a single place where this can happen and that is when
> we execute the tst_timeout_kill binary. This is now fixed in the shell
> library by waiting until the background process gets to a sleep state,
> which means that the process has been executed and is waiting for the
> timeout.
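For anyone following along, the "wait until the background process is
sleeping" step could be sketched roughly as below. This is only an
illustration, not the code from the patch: wait_for_sleep is a hypothetical
helper name, "sleep 30" stands in for the real timer binary, and the sketch
assumes a Linux /proc and a sleep(1) that accepts fractional seconds.

```shell
#!/bin/sh
# Hypothetical helper (not from the patch): poll /proc/PID/stat until the
# process $1 reports state "S" (sleeping), for roughly $2 seconds (default 5).
wait_for_sleep()
{
	pid=$1
	tries=$(( ${2:-5} * 10 ))

	while [ "$tries" -gt 0 ]; do
		# The process does not exist (any more), nothing to wait for.
		[ -r "/proc/$pid/stat" ] || return 1

		# The state is the third field. Strip everything up to the
		# last ") " first, since the command name may contain spaces.
		state=$(sed 's/.*) //' "/proc/$pid/stat" 2>/dev/null | cut -d' ' -f1)
		[ "$state" = "S" ] && return 0

		# Assumption: sleep accepts fractional seconds (GNU, busybox).
		sleep 0.1
		tries=$((tries - 1))
	done

	return 1
}

# Stand-in for the background timer process.
sleep 30 &
timer_pid=$!

if wait_for_sleep "$timer_pid"; then
	echo "timer is sleeping"
fi

kill "$timer_pid"
```

Note that state "S" only shows that the process is blocked, so this is a
heuristic for "the binary has been executed and is waiting", not a proof.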
> After these fixes I haven't been able to reproduce the hang with:
> cat > debug.sh <<EOF
> #!/bin/sh
> TST_SETUP="setup"
> TST_TESTFUNC="do_test"
> . tst_test.sh
> setup()
> {
> 	tst_brk TCONF "quit now!"
> }
> do_test()
> {
> 	tst_res TPASS "pass :)"
> }
> tst_run
> EOF
> # while true; do ./debug.sh; done
I verified it's OK on both VMs that were previously affected.
After the release I might write a test for tst_timeout_kill.c.
Thanks for fixing it!
Kind regards,
Petr