[LTP] 72b1728674 causing regressions [ [PATCH v2] Terminate leftover subprocesses when main test process crashes]
Petr Vorel
pvorel@suse.cz
Fri Feb 18 13:30:21 CET 2022
Hi all,
> On Fri, Feb 11, 2022 at 9:30 PM Martin Doucha <mdoucha@suse.cz> wrote:
> > On 11. 02. 22 13:55, Cyril Hrubis wrote:
> > > Hi!
> > >> --- a/lib/tst_test.c
> > >> +++ b/lib/tst_test.c
> > >> @@ -1495,6 +1495,9 @@ static int fork_testrun(void)
> > >> return TFAIL;
> > >> }
> > >> + if (tst_test->forks_child)
> > >> + kill(-test_pid, SIGKILL);
FYI: this broke all LTP network tests which use the netstress.c binary;
they now randomly fail after "tst_test.c:1499: TINFO: Killed the leftover descendant processes".
I was wondering whether it is actually a kernel bug which has now become visible,
but the behavior is the same on various kernels: SLES 5.14, openSUSE 5.16.8,
older Debian 5.3, and with a different VM setup (the firewall is disabled, and
the random nature of the failures also suggests it's not a firewall issue).
I'm not sure yet whether netstress.c should be altered or whether we should add
a flag to the API to skip this cleanup.
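To illustrate the mechanism: kill(-test_pid, SIGKILL) signals every process in
the test's process group, so a server which the test left running in the
background is killed as well, unless it has moved to another group. Below is a
rough shell sketch of that behavior, not the LTP code itself. It assumes Linux
with setsid(1) from util-linux (including the -w option) and ps(1) from procps;
run_case, is_running and the sleep standing in for the netstress server are
made up for the illustration.

```shell
#!/bin/sh
# Sketch only: mimics kill(-test_pid, SIGKILL) with shell tools.
# Assumes Linux with setsid(1) from util-linux and ps(1) from procps.

is_running() {
	# A zombie has already been killed, so report it as not running.
	state=$(ps -o stat= -p "$1" 2>/dev/null | tr -d ' ')
	case "$state" in
		''|Z*) return 1 ;;
	esac
	return 0
}

# run_case LAUNCHER: start a fake test in its own process group, let it leave
# a background "server" (sleep) behind, kill the group and report what
# happened to the server. LAUNCHER is empty (server stays in the group of the
# test) or "setsid" (server moves to a new session and group of its own).
run_case() {
	dir=$(mktemp -d)
	setsid -w sh -c '
		echo $$ > "$1/pgid"	# after setsid, PID == PGID
		$2 sleep 30 </dev/null >/dev/null 2>&1 &
		echo $! > "$1/server.pid"
	' sh "$dir" "$1"
	pgid=$(cat "$dir/pgid")
	server_pid=$(cat "$dir/server.pid")
	sleep 1		# give the server time to exec (and to detach, if asked)

	# Shell equivalent of kill(-test_pid, SIGKILL).
	kill -s KILL -- "-$pgid" 2>/dev/null
	sleep 1

	if is_running "$server_pid"; then
		echo "survived"
		kill -s KILL "$server_pid"	# clean up the detached server
	else
		echo "killed"
	fi
	rm -rf "$dir"
}

same_group=$(run_case "")
detached=$(run_case setsid)
echo "server in the test process group: $same_group"
echo "server started via setsid:        $detached"
```

The first case corresponds to what the logs in this mail suggest happens to the
netstress server. The second case only shows that a process outside the group
is not touched by the cleanup; whether detaching the server is an acceptable
change for netstress.c is a separate question.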
DEBUGGING:
The reason is hidden, because netstress.c output is redirected and printed only
on error.
Sometimes it's just a warning:
# ./tcp_ipsec.sh -s 100:1000:65535:R65535
...
tcp_ipsec 1 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.Qn3NINBzja'
tcp_ipsec 1 TINFO: run client 'netstress -l -H 10.0.0.1 -n 100 -N 100 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 1 TWARN: netstress failed, ret: 2
tcp_ipsec 1 TPASS: netstress passed, median time 4 ms, data: 4 5 4 4
tcp_ipsec 2 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.Qn3NINBzja'
tcp_ipsec 2 TINFO: run client 'netstress -l -H 10.0.0.1 -n 1000 -N 1000 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 2 TPASS: netstress passed, median time 6 ms, data: 6 6 4 5 6
tcp_ipsec 3 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.Qn3NINBzja'
tcp_ipsec 3 TINFO: run client 'netstress -l -H 10.0.0.1 -n 65535 -N 65535 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 3 TPASS: netstress passed, median time 9 ms, data: 11 10 9 9 9
tcp_ipsec 4 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.Qn3NINBzja'
tcp_ipsec 4 TINFO: run client 'netstress -l -H 10.0.0.1 -A 65535 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 4 TPASS: netstress passed, median time 8 ms, data: 8 8 8 9 7
tcp_ipsec 5 TINFO: AppArmor enabled, this may affect test results
tcp_ipsec 5 TINFO: it can be disabled with TST_DISABLE_APPARMOR=1 (requires super/root)
tcp_ipsec 5 TINFO: loaded AppArmor profiles: none
# ./tcp_ipsec.sh -s 100:1000:65535:R65535
...
tcp_ipsec 1 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.4I7mEMaCeK'
tcp_ipsec 1 TINFO: run client 'netstress -l -H 10.0.0.1 -n 100 -N 100 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 1 TPASS: netstress passed, median time 6 ms, data: 5 5 6 6 6
tcp_ipsec 2 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.4I7mEMaCeK'
tcp_ipsec 2 TINFO: run client 'netstress -l -H 10.0.0.1 -n 1000 -N 1000 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 2 TWARN: netstress failed, ret: 2
tcp_ipsec 2 TPASS: netstress passed, median time 5 ms, data: 4 6 5 5
tcp_ipsec 3 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.4I7mEMaCeK'
tcp_ipsec 3 TINFO: run client 'netstress -l -H 10.0.0.1 -n 65535 -N 65535 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 3 TPASS: netstress passed, median time 10 ms, data: 10 10 8 9 10
tcp_ipsec 4 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.4I7mEMaCeK'
tcp_ipsec 4 TINFO: run client 'netstress -l -H 10.0.0.1 -A 65535 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 4 TPASS: netstress passed, median time 11 ms, data: 12 11 11 11 11
tcp_ipsec 5 TINFO: AppArmor enabled, this may affect test results
tcp_ipsec 5 TINFO: it can be disabled with TST_DISABLE_APPARMOR=1 (requires super/root)
tcp_ipsec 5 TINFO: loaded AppArmor profiles: none
Sometimes it's a hard failure, where we at least see the log:
tcp_ipsec 1 TPASS: netstress passed, median time 5 ms, data: 4 7 4 8 5
tcp_ipsec 2 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.rEORDqdaS6'
tcp_ipsec 2 TINFO: run client 'netstress -l -H 10.0.0.1 -n 1000 -N 1000 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 2 TPASS: netstress passed, median time 6 ms, data: 4 6 6 4 6
tcp_ipsec 3 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.rEORDqdaS6'
tcp_ipsec 3 TINFO: run client 'netstress -l -H 10.0.0.1 -n 65535 -N 65535 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 3 TWARN: netstress failed, ret: 2
netstress.c:642: TBROK: Server closed
tst_test.c:1457: TINFO: Timeout per run is 0h 05m 00s
netstress.c:895: TINFO: connection: addr '10.0.0.1', port '33985'
netstress.c:896: TINFO: client max req: 100
netstress.c:897: TINFO: clients num: 2
netstress.c:902: TINFO: client msg size: 65535
netstress.c:903: TINFO: server msg size: 65535
netstress.c:817: TINFO: tcp_tw_reuse is already set
netstress.c:947: TINFO: TCP client is using old TCP API.
netstress.c:789: TINFO: '/proc/sys/net/ipv4/tcp_fastopen' is 1
netstress.c:476: TINFO: Running the test over IPv4
netstress.c:344: TBROK: connect(4, 10.0.0.1:33985, 16) failed: ECONNREFUSED (111)
netstress.c:344: TBROK: connect(3, 10.0.0.1:33985, 16) failed: ECONNREFUSED (111)
But with the patch below, the log shows that the server process is killed:
tcp_ipsec 1 TPASS: netstress passed, median time 5 ms, data: 6 5 5 4 5
tcp_ipsec 2 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.DId6DBCQ2W'
tcp_ipsec 2 TINFO: run client 'netstress -l -H 10.0.0.1 -n 1000 -N 1000 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times
tcp_ipsec 2 TINFO: ===== 1: remote netstress, ret: 0, cat tst_netload.log =====
tst_test.c:1457: TINFO: Timeout per run is 0h 05m 00s
netstress.c:923: TINFO: max requests '10'
netstress.c:947: TINFO: TCP server is using old TCP API.
netstress.c:789: TINFO: '/proc/sys/net/ipv4/tcp_fastopen' is 1
netstress.c:678: TINFO: assigning a name to the server socket...
netstress.c:685: TINFO: bind to port 36103
netstress.c:706: TINFO: Listen on the socket '5'
tst_test.c:1499: TINFO: Killed the leftover descendant processes
=> HERE netstress server process is killed after TPASS
Summary:
passed 0
failed 0
broken 0
skipped 0
warnings 0
---
tcp_ipsec 2 TWARN: netstress failed, ret: 2
=> causing TWARN for client.
And hard failure:
tcp_ipsec 4 TINFO: ===== 5: remote netstress, ret: 0, cat tst_netload.log =====
tst_test.c:1457: TINFO: Timeout per run is 0h 05m 00s
netstress.c:923: TINFO: max requests '10'
netstress.c:947: TINFO: TCP server is using old TCP API.
netstress.c:789: TINFO: '/proc/sys/net/ipv4/tcp_fastopen' is 1
netstress.c:678: TINFO: assigning a name to the server socket...
netstress.c:685: TINFO: bind to port 36709
netstress.c:706: TINFO: Listen on the socket '5'
tst_test.c:1499: TINFO: Killed the leftover descendant processes
Summary:
passed 0
failed 0
broken 0
skipped 0
warnings 0
---
tcp_ipsec 4 TWARN: netstress failed, ret: 2
netstress.c:642: TBROK: Server closed
tst_test.c:1457: TINFO: Timeout per run is 0h 05m 00s
netstress.c:874: TINFO: rand start seed 0xff9e
netstress.c:895: TINFO: connection: addr '10.0.0.1', port '36709'
netstress.c:896: TINFO: client max req: 100
netstress.c:897: TINFO: clients num: 2
netstress.c:900: TINFO: random msg size [5 65530]
netstress.c:817: TINFO: tcp_tw_reuse is already set
netstress.c:947: TINFO: TCP client is using old TCP API.
netstress.c:789: TINFO: '/proc/sys/net/ipv4/tcp_fastopen' is 1
netstress.c:476: TINFO: Running the test over IPv4
netstress.c:344: TBROK: connect(4, 10.0.0.1:36709, 16) failed: ECONNREFUSED (111)
netstress.c:344: TBROK: connect(3, 10.0.0.1:36709, 16) failed: ECONNREFUSED (111)
Summary:
passed 0
failed 0
broken 2
skipped 0
warnings 0
tcp_ipsec 4 TFAIL: expected 'pass' but ret: '2'
Kind regards,
Petr
+++ testcases/lib/tst_net.sh
@@ -728,6 +728,10 @@ tst_netload()
for i in $(seq 1 $run_cnt); do
tst_rhost_run -c "netstress $s_opts" > tst_netload.log 2>&1
+ tst_res_ TINFO "===== $i: remote netstress, ret: $ret, cat tst_netload.log ====="
+ cat tst_netload.log
+ printf -- "---\n\n"
+
if [ $? -ne 0 ]; then
cat tst_netload.log
local ttype="TFAIL"
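
A note on debug output of this kind: any command placed between the call and
the "$?" check overwrites "$?", so the status has to be saved first if it is
still needed afterwards. A minimal sketch of that pattern follows; run_logged
is a hypothetical helper and not part of tst_net.sh, and the sh -c command
only stands in for the real netstress call.

```shell
#!/bin/sh
# Hypothetical helper, not part of tst_net.sh: run a command with its output
# redirected to a log file, always print the log, and keep the exit status.
run_logged() {
	log=$1
	shift
	"$@" > "$log" 2>&1
	ret=$?		# save it before any other command overwrites $?
	echo "===== ret: $ret, cat $log ====="
	cat "$log"
	printf -- "---\n\n"
	return $ret
}

log=$(mktemp)
# Stand-in for the real command: prints one line and fails with status 2.
run_logged "$log" sh -c 'echo "Listen on the socket"; exit 2'
status=$?
rm -f "$log"
echo "status seen by caller: $status"
```

With the status saved like this, the caller still sees the exit code of the
command itself (2 in this example) and not that of cat or printf.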