[LTP] [PATCH 1/2] read_all: Add worker timeout
Jan Stancek
jstancek@redhat.com
Thu Jul 14 08:58:30 CEST 2022
On Tue, Jul 12, 2022 at 2:46 PM Richard Palethorpe via ltp
<ltp@lists.linux.it> wrote:
>
> Kill and restart workers that take too long to read a file. The
> default being one second. A custom time can be set with the new -t
> option.
>
> This is to prevent a worker from blocking forever in a read. Currently
> when this happens the whole test times out and any remaining files in
> the worker's queue are not tested.
>
> As a side effect we can now also set the timeout very low to cause
> partial reads.
>
> Signed-off-by: Richard Palethorpe <rpalethorpe@suse.com>
> Cc: Joerg Vehlow <lkml@jv-coder.de>
> Cc: Li Wang <liwang@redhat.com>
> ---
> testcases/kernel/fs/read_all/read_all.c | 83 ++++++++++++++++++++++++-
> 1 file changed, 82 insertions(+), 1 deletion(-)
>
> +static void restart_worker(struct worker *const worker)
> +{
> + int wstatus, ret, i, q_len;
> + struct timespec now;
> +
> + kill(worker->pid, SIGKILL);
> + ret = waitpid(worker->pid, &wstatus, 0);
Is there a chance we could get stuck in uninterruptible read? I think I saw some
in past, but those may be blacklisted already, so this may only be something
to watch for if we still get test timeouts in future.
> +
> + if (ret != worker->pid) {
> + tst_brk(TBROK | TERRNO, "waitpid(%d, ...) = %d",
> + worker->pid, ret);
> + }
> +
> + /* Make sure the queue length and semaphore match. Threre is a
> + * race in qeuue_pop where the semaphore can be decremented
^^ typo in queue_pop above
> + * then the worker killed before updating q->front
> + */
> + q_len = 0;
> + i = worker->q->front;
> + while (i != worker->q->back) {
> + if (!worker->q->data[i])
> + q_len++;
> +
> + i = (i + 1) % QUEUE_SIZE;
> + }
> +
> + ret = sem_destroy(&worker->q->sem);
> + if (ret == -1)
> + tst_brk(TBROK | TERRNO, "sem_destroy");
> + ret = sem_init(&worker->q->sem, 1, q_len);
> + if (ret == -1)
> + tst_brk(TBROK | TERRNO, "sem_init");
> +
> + tst_clock_gettime(CLOCK_MONOTONIC_RAW, &now);
> + worker->last_seen = tst_timespec_to_ms(now);
> + worker->pid = SAFE_FORK();
> +
> + if (!worker->pid)
> + exit(worker_run(worker));
> +}
> +
> static void work_push_retry(int worker, const char *buf)
> {
> int i, ret, worker_min, worker_max, usleep_time = 100;
> + struct timespec now;
> + int elapsed;
>
> if (worker < 0) {
> /* pick any, try -worker first */
> @@ -299,10 +357,25 @@ static void work_push_retry(int worker, const char *buf)
> i = worker_min;
>
> for (;;) {
> - ret = queue_push(workers[i].q, buf);
> + struct worker *const w = workers + i;
> +
> + ret = queue_push(w->q, buf);
> if (ret == 1)
> break;
>
> + tst_clock_gettime(CLOCK_MONOTONIC_RAW, &now);
> + elapsed =
> + tst_timespec_to_ms(now) - tst_atomic_load(&w->last_seen);
> +
> + if (elapsed > worker_timeout) {
> + if (!quiet) {
> + tst_res(TINFO,
> + "Worker %d (%d) stuck for %dms, restarting it",
> + i, w->pid, elapsed);
Can we also print file worker is stuck on?
And as Li pointed out, I'd also extend max_runtime to 60 seconds.
Acked-by: Jan Stancek <jstancek@redhat.com>
More information about the ltp
mailing list