[LTP] [PATCH 1/2] read_all: Add worker timeout

Richard Palethorpe rpalethorpe@suse.de
Mon Jul 18 12:37:59 CEST 2022


Hello,

Petr Vorel <pvorel@suse.cz> writes:

> Hi all,
>
> Reviewed-by: Petr Vorel <pvorel@suse.cz>
>
>> > +static void restart_worker(struct worker *const worker)
>> > +{
>> > +       int wstatus, ret, i, q_len;
>> > +       struct timespec now;
>> > +
>> > +       kill(worker->pid, SIGKILL);
>> > +       ret = waitpid(worker->pid, &wstatus, 0);
>
>> Is there a chance we could get stuck in uninterruptible read? I think I saw some
>> in past, but those may be blacklisted already, so this may only be something
>> to watch for if we still get test timeouts in future.
>
> +1
>
> ...
>> > +       if (ret != worker->pid) {
>> > +               tst_brk(TBROK | TERRNO, "waitpid(%d, ...) = %d",
>> > +                       worker->pid, ret);
>> > +       }
>> > +
>> > +       /* Make sure the queue length and semaphore match. Threre is a
>> > +        * race in qeuue_pop where the semaphore can be decremented
>> ^^ typo in queue_pop above
>
> ...
>> > +               tst_clock_gettime(CLOCK_MONOTONIC_RAW, &now);
>> > +               elapsed =
>> > +                       tst_timespec_to_ms(now) - tst_atomic_load(&w->last_seen);
>> > +
>> > +               if (elapsed > worker_timeout) {
>> > +                       if (!quiet) {
>> > +                               tst_res(TINFO,
>> > +                                       "Worker %d (%d) stuck for %dms, restarting it",
>> > +                                       i, w->pid, elapsed);
>
>> Can we also print file worker is stuck on?
>> And as Li pointed out, I'd also extend max_runtime to 60 seconds.
>
> +1, all these additional changes make sense to me.

OK I'll make these changes, thanks!

>
> Kind regards,
> Petr


-- 
Thank you,
Richard.


More information about the ltp mailing list