[LTP] [PATCH v2] tst_test: using SIGTERM to terminate process

Thu Sep 16 00:40:28 CEST 2021

Hi Li, all,

> We'd better avoid using SIGINT for terminating processes, because
> it behaves differently depending on the shell.

> From Joerg Vehlow's test:

>  - bash does not seem to care about SIGINT delivery to background
>    processes, but can be blocked using trap

>  - zsh ignores SIGINT for background processes by default, but can be
>    allowed using trap

>  - dash and busybox sh ignore the signal to background processes, and
>    this cannot be changed with trap

> This patch covers the following situations:

>  1. SIGINT (Ctrl+C) terminates the main process, which does cleanup
>     correctly before a timeout

>  2. The test finishes normally and reclaims the _tst_timeout_process in
>     the background via SIGTERM (sent by _tst_cleanup_timer)

>  3. The test times out and _tst_kill_test sends SIGTERM to terminate
>     all processes, and the main process does the cleanup work

>  4. The test times out but processes are still alive after _tst_kill_test
>     sends SIGTERM, so SIGKILL is sent to the whole group

>  5. The test is terminated by SIGTERM unexpectedly (e.g. system shutdown or
>     process manager) and does the cleanup work as well

> Co-authored-by: Joerg Vehlow <joerg.vehlow@aox-tech.de>
> Signed-off-by: Li Wang <liwang@redhat.com>
> Reviewed-by: Joerg Vehlow <joerg.vehlow@aox-tech.de>
...

> +++ b/testcases/lib/tst_test.sh
> @@ -21,7 +21,8 @@ export TST_LIB_LOADED=1
>  . tst_security.sh

>  # default trap function
> -trap "tst_brk TBROK 'test interrupted or timed out'" INT
> +trap "tst_brk TBROK 'test interrupted'" INT
> +trap "unset _tst_setup_timer_pid; tst_brk TBROK 'test terminated'" TERM

FYI this commit (merged as 4a6b8a697 ("tst_test: using SIGTERM to terminate process"))
broke the net_stress_interface tests, particularly the tst_require_cmds() call
(which calls tst_brk TCONF):

# ./if-addr-adddel.sh -c ifconfig
if-addr-adddel 1 TINFO: initialize 'lhost' 'ltp_ns_veth2' interface
if-addr-adddel 1 TINFO: add local addr 10.0.0.2/24
if-addr-adddel 1 TINFO: add local addr fd00:1:1:1::2/64
if-addr-adddel 1 TINFO: initialize 'rhost' 'ltp_ns_veth1' interface
if-addr-adddel 1 TINFO: add remote addr 10.0.0.1/24
if-addr-adddel 1 TINFO: add remote addr fd00:1:1:1::1/64
if-addr-adddel 1 TINFO: Network config (local -- remote):
if-addr-adddel 1 TINFO: ltp_ns_veth2 -- ltp_ns_veth1
if-addr-adddel 1 TINFO: 10.0.0.2/24 -- 10.0.0.1/24
if-addr-adddel 1 TINFO: fd00:1:1:1::2/64 -- fd00:1:1:1::1/64
if-addr-adddel 1 TINFO: timeout per run is 0h 5m 0s
if-addr-adddel 1 TCONF: 'ifconfig' not found
=> waits till timeout
if-addr-adddel 1 TBROK: Test timed out, sending SIGTERM! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
if-addr-adddel 1 TWARN: test terminated

Debugging shows that it hangs in wait in _tst_cleanup_timer():

kill -TERM $_tst_setup_timer_pid 2>/dev/null
wait $_tst_setup_timer_pid 2>/dev/null

because the kill does not terminate the timeout process.
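
To illustrate the symptom only (this is not the LTP code), here is a minimal
sketch, assuming a background process that ignores SIGTERM. The subshell and
the timer_pid variable are hypothetical stand-ins for the timeout process and
$_tst_setup_timer_pid:

```shell
#!/bin/sh
# Hypothetical stand-in for the timer process: it ignores SIGTERM.
( trap '' TERM; sleep 3 ) &
timer_pid=$!

sleep 1                          # give the child time to install its trap
start=$(date +%s)
kill -TERM "$timer_pid" 2>/dev/null
wait "$timer_pid" 2>/dev/null    # blocks until the child exits by itself
elapsed=$(( $(date +%s) - start ))
echo "wait blocked for about ${elapsed}s"
```

If the signal had terminated the child, wait would return immediately. Here it
returns only when the sleep ends, which matches the "waits till timeout"
behaviour in the log above.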

The problem seems to be that the unset in the TERM trap does not actually work:
trap "unset _tst_setup_timer_pid; tst_brk TBROK 'test terminated'" TERM

It looks to be setup-specific, because I hit this on SLES with both bash and
dash, while on current Debian testing it works with both bash and dash. I
compared shopt output on both, but don't see anything obvious. It must be
something else.
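
A standalone check of the intended trap behaviour might look like the sketch
below. It is not taken from tst_test.sh: timer_pid, result and cleanup() are
hypothetical names, and the [ -n ... ] guard in cleanup() is my assumption
about how the cleanup decides whether to kill and wait:

```shell
#!/bin/sh
timer_pid=12345                  # hypothetical PID, never signalled here
result=""

cleanup()
{
	# Assumed guard: only kill and wait when the variable is still set.
	if [ -n "$timer_pid" ]; then
		result="kill+wait"
	else
		result="skipped"
	fi
}

# Same shape as the TERM trap from the patch: unset first, then clean up.
trap "unset timer_pid; cleanup" TERM

kill -TERM $$                    # deliver SIGTERM to this shell
echo "cleanup: $result"
```

If the unset takes effect, cleanup() sees an empty variable and skips the
kill and wait. Running this under the bash and dash of an affected setup could
help tell whether the unset itself misbehaves or something else is involved.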

Kind regards,
Petr

>  _tst_do_exit()
>  {
> @@ -439,9 +440,9 @@ _tst_kill_test()
>  {
>  	local i=10

> -	trap '' INT
> -	tst_res TBROK "Test timeouted, sending SIGINT! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1"
> -	kill -INT -$pid
> +	trap '' TERM
> +	tst_res TBROK "Test timed out, sending SIGTERM! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1"
> +	kill -TERM -$pid
>  	tst_sleep 100ms

>  	while kill -0 $pid >/dev/null 2>&1 && [ $i -gt 0 ]; do