[LTP] [PATCH v2] tst_test: using SIGTERM to terminate process
Petr Vorel
pvorel@suse.cz
Thu Sep 16 01:01:28 CEST 2021
Hi Li, all,
> Hi Li, all,
[ Cc Cyril and Alexey ]
> > We'd better avoid using SIGINT for terminating processes because
> > its behavior differs between shells.
> > From Joerg Vehlow's test:
> > - bash does not seem to care about SIGINT delivery to background
> > processes, but can be blocked using trap
> > - zsh ignores SIGINT for background processes by default, but can be
> > allowed using trap
> > - dash and busybox sh ignore the signal to background processes, and
> > this cannot be changed with trap
> > This patch covers the following situations:
> > 1. SIGINT (Ctrl+C) terminates the main process, which does its cleanup
> > correctly before a timeout
> > 2. The test finishes normally and terminates the _tst_timeout_process in the
> > background via SIGTERM (sent by _tst_cleanup_timer)
> > 3. The test times out and _tst_kill_test sends SIGTERM to
> > terminate all processes, and the main process does its cleanup
> > 4. The test times out but processes are still alive after _tst_kill_test
> > sends SIGTERM, so SIGKILL is sent to the whole group
> > 5. The test is terminated by SIGTERM unexpectedly (e.g. system shutdown or
> > process manager) and does its cleanup as well
> > Co-authored-by: Joerg Vehlow <joerg.vehlow@aox-tech.de>
> > Signed-off-by: Li Wang <liwang@redhat.com>
> > Reviewed-by: Joerg Vehlow <joerg.vehlow@aox-tech.de>
> ...
> > +++ b/testcases/lib/tst_test.sh
> > @@ -21,7 +21,8 @@ export TST_LIB_LOADED=1
> > . tst_security.sh
> > # default trap function
> > -trap "tst_brk TBROK 'test interrupted or timed out'" INT
> > +trap "tst_brk TBROK 'test interrupted'" INT
> > +trap "unset _tst_setup_timer_pid; tst_brk TBROK 'test terminated'" TERM
> FYI this commit (merged as 4a6b8a697 ("tst_test: using SIGTERM to terminate process"))
> broke the net_stress_interface tests, particularly the tst_require_cmds() call (which
> calls tst_brk TCONF):
> # ./if-addr-adddel.sh -c ifconfig
> if-addr-adddel 1 TINFO: initialize 'lhost' 'ltp_ns_veth2' interface
> if-addr-adddel 1 TINFO: add local addr 10.0.0.2/24
> if-addr-adddel 1 TINFO: add local addr fd00:1:1:1::2/64
> if-addr-adddel 1 TINFO: initialize 'rhost' 'ltp_ns_veth1' interface
> if-addr-adddel 1 TINFO: add remote addr 10.0.0.1/24
> if-addr-adddel 1 TINFO: add remote addr fd00:1:1:1::1/64
> if-addr-adddel 1 TINFO: Network config (local -- remote):
> if-addr-adddel 1 TINFO: ltp_ns_veth2 -- ltp_ns_veth1
> if-addr-adddel 1 TINFO: 10.0.0.2/24 -- 10.0.0.1/24
> if-addr-adddel 1 TINFO: fd00:1:1:1::2/64 -- fd00:1:1:1::1/64
> if-addr-adddel 1 TINFO: timeout per run is 0h 5m 0s
> if-addr-adddel 1 TCONF: 'ifconfig' not found
> => waits till timeout
> if-addr-adddel 1 TBROK: Test timed out, sending SIGTERM! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
> if-addr-adddel 1 TWARN: test terminated
> Debugging shows that it hangs in wait in _tst_cleanup_timer():
> kill -TERM $_tst_setup_timer_pid 2>/dev/null
> wait $_tst_setup_timer_pid 2>/dev/null
> because kill does not kill the test.
> The problem looks to be that unset actually does not work.
> trap "unset _tst_setup_timer_pid; tst_brk TBROK 'test terminated'" TERM
> It looks to be setup-specific, because I discovered this on SLES with
> both bash and dash. On current Debian testing it works with both bash
> and dash. I checked the shopt output on both, but don't see anything obvious. It
> must be something else.
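For what it's worth, the hang above has a simple shape: if the target of kill -TERM ignores (or never gets) the signal, the following wait blocks until the child ends on its own. A minimal sketch of that shape, not LTP code: the TERM-ignoring sleep is a hypothetical stand-in for the timer process, blocked_wait_seconds is a made-up helper name, and date +%s is assumed to be available (it is not strictly POSIX).

```shell
#!/bin/sh
# Sketch only: the TERM-ignoring child is a hypothetical stand-in,
# not the real LTP timer process.

# Run "kill -TERM; wait" against a child that ignores SIGTERM and print
# how many seconds passed until wait returned.
blocked_wait_seconds()
{
	start=$(date +%s)

	# An ignored signal disposition is inherited across fork and exec,
	# so the child ignores SIGTERM from its very first instruction.
	trap '' TERM
	sleep 2 &
	pid=$!
	trap - TERM

	kill -TERM "$pid" 2>/dev/null
	# Blocks until sleep finishes on its own, because TERM had no effect.
	wait "$pid" 2>/dev/null

	end=$(date +%s)
	echo $((end - start))
}

echo "wait blocked for $(blocked_wait_seconds)s"
```

With a 5 minute test timeout in place of the 2 second sleep, this would look like the "waits till timeout" symptom in the log.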
OK, repeatedly running on Debian with dash I managed to get a hang as well.
Here it does not even quit the test:
if-addr-adddel 1 TCONF: 'ifconfig' not found
if-addr-adddel 1 TBROK: Test timed out, sending SIGTERM! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
if-addr-adddel 1 TBROK: Test timed out, sending SIGTERM! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
if-addr-adddel 1 TBROK: Test timed out, sending SIGTERM! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
if-addr-adddel 1 TBROK: Test timed out, sending SIGTERM! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
if-addr-adddel 1 TBROK: Test timed out, sending SIGTERM! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
if-addr-adddel 1 TWARN: test terminated
Maybe not only SIGINT, but also SIGTERM is unreliable for background processes?
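One way to check that in isolation, outside the LTP library, is a probe like the sketch below. It starts a background subshell with a TERM trap, waits for a ready marker so the trap is certainly installed, sends SIGTERM and reports whether the trap ran. All names (probe_term_delivery, the marker files) are hypothetical, and mktemp -d is assumed to be available.

```shell
#!/bin/sh
# Sketch only: a standalone probe, not part of tst_test.sh.
# All names here are hypothetical.

# Prints "delivered" if a background subshell ran its TERM trap,
# otherwise "lost".
probe_term_delivery()
{
	dir=$(mktemp -d) || return 1

	(
		sleep 5 &
		sleep_pid=$!
		# On TERM: leave a marker, stop the helper sleep, exit cleanly.
		trap 'touch "$dir/got_term"; kill "$sleep_pid" 2>/dev/null; exit 0' TERM
		touch "$dir/ready"
		wait "$sleep_pid"
	) &
	pid=$!

	# Do not send the signal before the trap is installed.
	tries=5
	while [ ! -e "$dir/ready" ] && [ "$tries" -gt 0 ]; do
		sleep 1
		tries=$((tries - 1))
	done

	kill -TERM "$pid" 2>/dev/null
	wait "$pid" 2>/dev/null

	if [ -e "$dir/got_term" ]; then
		echo "delivered"
	else
		echo "lost"
	fi
	rm -rf "$dir"
}

echo "SIGTERM to background subshell: $(probe_term_delivery)"
```

Running it in a loop under each shell would show whether signal delivery itself is flaky or whether the problem is elsewhere in the library.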
Minimal reproducible example; on dash it needs a few runs to hang:
cat > debug.sh <<EOF
#!/bin/sh
TST_SETUP="setup"
TST_TESTFUNC="do_test"
. tst_test.sh
setup()
{
	tst_brk TCONF "quit now!"
}
do_test()
{
	tst_res TPASS "pass :)"
}
tst_run
EOF
# chmod +x debug.sh
# while true; do ./debug.sh; done
Kind regards,
Petr
> Kind regards,
> Petr
> > _tst_do_exit()
> > {
> > @@ -439,9 +440,9 @@ _tst_kill_test()
> > {
> > local i=10
> > - trap '' INT
> > - tst_res TBROK "Test timeouted, sending SIGINT! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1"
> > - kill -INT -$pid
> > + trap '' TERM
> > + tst_res TBROK "Test timed out, sending SIGTERM! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1"
> > + kill -TERM -$pid
> > tst_sleep 100ms
> > while kill -0 $pid >/dev/null 2>&1 && [ $i -gt 0 ]; do
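The escalation the commit message describes (SIGTERM first, SIGKILL if something is still alive) can be sketched for a single child as below. This is a simplification under stated assumptions, not the _tst_kill_test implementation: it signals one PID rather than a process group, uses a fixed grace period in whole seconds, and relies on the shell reporting 128+signal as the wait status (true for bash and dash, not for every shell). stop_with_escalation and stopped_by are hypothetical names.

```shell
#!/bin/sh
# Sketch only: single-PID simplification, not the LTP implementation.
# stop_with_escalation and stopped_by are hypothetical names.

# Usage: stop_with_escalation PID GRACE_SECONDS
# Must be called from the shell that started PID, otherwise wait fails.
# Sets stopped_by to TERM, KILL or exit:<status>.
stop_with_escalation()
{
	pid=$1
	grace=$2

	kill -TERM "$pid" 2>/dev/null
	sleep "$grace"
	# Normally fails harmlessly if the child already died from SIGTERM.
	kill -KILL "$pid" 2>/dev/null
	wait "$pid" 2>/dev/null
	status=$?

	# Assumes the 128+signal convention: TERM is 15, KILL is 9.
	case $status in
	143) stopped_by=TERM ;;
	137) stopped_by=KILL ;;
	*) stopped_by="exit:$status" ;;
	esac
}

# A child with the default SIGTERM disposition.
sleep 30 &
stop_with_escalation $! 1
echo "plain child stopped by: $stopped_by"

# A child that inherits an ignored SIGTERM and needs the escalation.
trap '' TERM
sleep 30 &
stubborn_pid=$!
trap - TERM
stop_with_escalation "$stubborn_pid" 1
echo "TERM-ignoring child stopped by: $stopped_by"
```

The unconditional SIGKILL after the grace period keeps the sketch short; polling with kill -0, as the patch does, avoids signalling a process that is already gone.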