[LTP] [PATCH v2] tst_test: using SIGTERM to terminate process

Cyril Hrubis chrubis@suse.cz
Fri Sep 17 12:59:55 CEST 2021


Hi!
> > > I managed to reproduce this in dash. I bet that this is a bug where
> > > signal handler inside dash is temporarily disabled when we install the
> > > trap and if we manage to hit that window the signal is discarded. At
> > > least that is my working theory. After I've installed debug prints, in
> > > the cases where it hangs the signal was sent just before we installed
> > > the trap. And in some cases when the signal arrives the timer process is
> > > killed but the trap is not invoked. So it really looks like signal
> > > handling in dash is simply broken. Not sure what we can do about bugs
> > > like this apart from switching to a real programming language.
> Which version of bash and dash are you testing on?
> 
> > Thanks for the debugging. *bash* is also affected, at least some releases.
> > I reproduced it also on some older SLES, with bash 4.4.
> dash 0.5.11.4 and 5.1.8 on my Tumbleweed laptop are OK.
> 
> I tested it on various VMs of mine:
> 
> dash *failing*: 0.5.8 (SLES), 0.5.11.3 (Tumbleweed), 0.5.11+git20200708+dd9ef66-5 (Debian), 0.5.7-4+b1 (Debian)
> dash *OK*: 0.5.11.2 (SLES 15), 0.5.10.2 (CentOS)
> 
> bash *failing*: 5.1.4-1.4 (Tumbleweed), 4.4-9.7.1 (SLES 15)
> bash *ok*: 4.4-17 (SLES 15), 4.3-83 (SLES 12), 4.3-11+deb8u (Debian), 5.1-2+b3
> (Debian), 4.2.46-34 (CentOS)
> 
> I have no idea what causes it, whether some bash and dash versions are really
> buggy or whether it's reproducible only in certain environments.

bash 5.1.8 seems to work okay, dash 0.5.11.3-r2 seems to fail here.

> Any tips on what to search for?

Not really, apart from reading source code and figuring out exactly what
happens here.

I guess that we can make things more predictable and easier to read by
shifting parts of the shell library to a C process.

For instance, if we wrote a utility that would implement all of the
tst_kill_test and timeout process in C, we would simplify things greatly.

Something like:

tst_timeout_kill

that would be used as:


tst_timeout_kill 300 12342
                  ^    ^
                  |    process group leader pid
                  timeout in seconds


That would implement both the loop for killing tests and the timeout
processing. That way we would get rid of the trap in the subshell, end
up with a single pid for the whole timeout process, and avoid the
recursive SIGKILL to begin with.
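
To make the idea a bit more concrete, here is a rough sketch of what the
core of such a helper could look like. Everything in it is an assumption
made for illustration: the function names timeout_kill(), wait_group() and
group_alive(), the grace period between SIGTERM and SIGKILL, and the return
codes are hypothetical and not existing LTP API. It also assumes that the
helper is not a member of the target process group and that exited group
members get reaped by their parent, since an unreaped zombie still counts
as a group member for kill(2).

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 while at least one process in group pgid still exists. */
static int group_alive(pid_t pgid)
{
	/* Signal 0 only performs the existence and permission checks. */
	return kill(-pgid, 0) == 0 || errno == EPERM;
}

/*
 * Polls once per second for at most secs seconds and stops early when
 * the group disappears. Returns 1 if the group is still alive afterwards.
 */
static int wait_group(pid_t pgid, unsigned int secs)
{
	while (secs-- > 0 && group_alive(pgid))
		sleep(1);

	return group_alive(pgid);
}

/*
 * Hypothetical core of tst_timeout_kill. The caller must not be a member
 * of pgid, otherwise it would signal itself as well.
 *
 * Returns  0 if the group went away before the timeout expired,
 *          1 if it went away after SIGTERM,
 *          2 if SIGKILL had to be sent,
 *         -1 with errno set to EINVAL for an unsafe pgid.
 */
int timeout_kill(pid_t pgid, unsigned int timeout, unsigned int grace)
{
	/* kill(0, ...) and kill(-1, ...) would hit far more than the test. */
	if (pgid <= 1) {
		errno = EINVAL;
		return -1;
	}

	if (!wait_group(pgid, timeout))
		return 0;

	kill(-pgid, SIGTERM);

	if (!wait_group(pgid, grace))
		return 1;

	kill(-pgid, SIGKILL);
	return 2;
}
```

A main() for tst_timeout_kill would then only have to parse its two
arguments and call, for example, timeout_kill(12342, 300, 5), where the
5 second grace period is an arbitrary choice.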

-- 
Cyril Hrubis
chrubis@suse.cz

