[LTP] [PATCH v2] tst_test: using SIGTERM to terminate process

Petr Vorel pvorel@suse.cz
Fri Sep 17 13:03:00 CEST 2021


> Hi!
> > > > I managed to reproduce this in dash. I bet that this is a bug where
> > > > signal handler inside dash is temporarily disabled when we install the
> > > > trap and if we manage to hit that window the signal is discarded. At
> > > > least that is my working theory. After I've installed debug prints, in
> > > > the cases where it hangs the signal was sent just before we have installed
> > > > the trap. And in some cases when the signal arrives the timer process is
> > > > killed but the trap is not invoked. So it really looks like signal
> > > > handling in dash is simply broken. Not sure what we can do about bugs
> > > > like this apart from switching to a real programming language.
> > Which version of bash and dash are you testing on?

> > > Thanks for the debugging. *bash* is also affected, at least some releases.
> > > I reproduced it also on some older SLES, with bash 4.4.
> > dash 0.5.11.4 and bash 5.1.8 on my Tumbleweed laptop are OK.

> > I tested it on several of my VMs:

> > dash *failing*: 0.5.8 (SLES), 0.5.11.3 (Tumbleweed), 0.5.11+git20200708+dd9ef66-5 (Debian), 0.5.7-4+b1 (Debian)
> > dash *OK*: 0.5.11.2 (SLES 15), 0.5.10.2 (CentOS)

> > bash *failing*: 5.1.4-1.4 (Tumbleweed), 4.4-9.7.1 (SLES 15)
> > bash *ok*: 4.4-17 (SLES 15), 4.3-83 (SLES 12), 4.3-11+deb8u (Debian), 5.1-2+b3
> > (Debian), 4.2.46-34 (CentOS)

> > I have no idea what causes it, whether some bash and dash versions are
> > really buggy or whether it's reproducible only in certain environments.

> bash 5.1.8 seems to work okay, dash 0.5.11.3-r2 seems to fail here.

> > Any tip what to search for?

> Not really, apart from reading source code and figuring out exactly what
> happens here.

> I guess that we can make things more predictable and easier to read by
> shifting parts of the shell library to a C process.

> For instance if we wrote a utility that would implement all the
> tst_kill_test and timeout process in C we would simplify things greatly.

> Something as:

> tst_timeout_kill

> that would be used as:


> tst_timeout_kill 300 12342
>                   ^    ^
>                   |    process group leader pid
>                   timeout in seconds


> That would implement both the loop for killing tests and timeout
> processing as well. That way we would get rid of the trap in the
> subshell and we would end up with a single pid for the whole timeout
> process and avoid the recursive sigkill to begin with.

+1 for this idea instead of the never-ending story of fixing various shells.
If you have time, please write that.
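
To make the proposal a bit more concrete, here is a rough sketch of what the
core of such a utility might look like. Only the name tst_timeout_kill and its
two arguments come from the mail above; the function names timeout_kill() and
leader_alive(), the extra grace parameter and the return codes are my
assumptions for illustration, and the main() wrapper that would parse the two
arguments is left out.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns nonzero while a process with this pid still exists. */
int leader_alive(pid_t pid)
{
	return kill(pid, 0) == 0 || errno == EPERM;
}

/*
 * Hypothetical core of tst_timeout_kill (names and return codes are
 * assumptions): sleep for `timeout` seconds, send SIGTERM to the process
 * group led by `pgid`, wait up to `grace` seconds for the leader to exit,
 * then fall back to SIGKILL.
 *
 * Returns 0 if the leader was gone after SIGTERM, 1 if SIGKILL was sent,
 * -1 on error.
 */
int timeout_kill(unsigned int timeout, pid_t pgid, unsigned int grace)
{
	unsigned int left = timeout;
	unsigned int i;

	/* Refuse pids that would make kill(-pgid, ...) hit unrelated processes. */
	if (pgid <= 1) {
		errno = EINVAL;
		return -1;
	}

	/* We may share the group with the test, so ignore our own SIGTERM. */
	signal(SIGTERM, SIG_IGN);

	/* sleep() returns the unslept remainder when a signal interrupts it. */
	while (left > 0)
		left = sleep(left);

	/* A negative pid addresses the whole process group. */
	if (kill(-pgid, SIGTERM) == -1)
		return errno == ESRCH ? 0 : -1;

	for (i = 0; i < grace && leader_alive(pgid); i++)
		sleep(1);

	if (!leader_alive(pgid))
		return 0;

	/* SIGKILL cannot be ignored; if we are in the group we die here too. */
	if (kill(-pgid, SIGKILL) == -1 && errno != ESRCH)
		return -1;

	return 1;
}
```

With this shape, "tst_timeout_kill 300 12342" would boil down to something like
timeout_kill(300, 12342, 10), where the 10 second grace period is again just an
assumed value. Note that kill(pid, 0) still succeeds for a zombie, so the leader
counts as alive until whoever started it has reaped it.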

Kind regards,
Petr

