[LTP] [PATCH v2 1/4] read_all: Add worker timeout and rewrite scheduling

Wed Aug 3 09:49:47 CEST 2022

Hi Richard, All,

On Mon, Jul 25, 2022 at 6:06 PM Richard Palethorpe <rpalethorpe@suse.com>
wrote:

> Kill and restart workers that take too long to read a file. The
> default being 10% of max_runtime. A custom time can be set with the
> new -t option.
>
> This is to prevent a worker from blocking forever in a read. Currently
> when this happens the whole test times out and any remaining files in
> the worker's queue are not tested.
>
> As a side effect we can now also set the timeout very low to cause
> partial reads. However setting it to less than the time it takes to
> start (fork) a new worker is treated as an error. Forking takes much
> longer than most reads.
>
> A number of other possible issues were fixed as well as a side effect
> of changing the scheduling:
>
> + The worker queues are now filled in a
>   "round robin" way. Before this only happened when -r was large
>   enough.
> + When testing is finished the main process waits on the child procs before
>   destroying the semaphores and worker queues.
> + max_runtime is set to 100 secs. This is so that the worker timeout
>   is a round number.
> + Files which don't implement O_NONBLOCK and may block, no longer need
>   to be avoided. Even if they refuse to die immediately;
>   although this may result in TBROK due to zombie processes.
>
> Note that with a worker timeout of 1s, 2 to 3 files will usually timeout on
> my workstation. With 2s, none will. In any case testing completes in
> under 3s for proc, sys or dev.
>
> This is much faster than many other machines. It's quite likely the
> worker timeout and max_runtime need to be increased on small and very
> large machines. This can be done manually by setting LTP_RUNTIME_MUL.
>

Yes, apart from a bit of difficulty (at least for me) to comprehend the
detailed
behavior of this scheduler :).

Thanks for your improvements! Just one tiny query below.

Reviewed-by: Li Wang <liwang@redhat.com>

> --- a/testcases/kernel/fs/read_all/read_all.c
> +++ b/testcases/kernel/fs/read_all/read_all.c
> ...

> +#include <signal.h>
>  #include <sys/types.h>
>  #include <sys/stat.h>
> +#include <sys/wait.h>
>  #include <lapi/fnmatch.h>
>  #include <stdlib.h>
>  #include <stdio.h>
> @@ -43,7 +45,10 @@
>  #include <pwd.h>
>  #include <grp.h>
>
> +#include "tst_atomic.h"
> +#include "tst_safe_clocks.h"
>  #include "tst_test.h"
> +#include "tst_timer.h"
>
>  #define QUEUE_SIZE 16384
>  #define BUFFER_SIZE 1024
> @@ -55,11 +60,15 @@ struct queue {
>         int front;
>         int back;
>         char data[QUEUE_SIZE];
>

I doubt whether we need to maintain a queue to store the paths.
During the test it seems only one path is being pushed in the q->data[],
so the rest of the space is wasted, right?

By shrinking the QUEUE_SIZE to equal BUFFER_SIZE, the test
still works normally.

> +       char popped[BUFFER_SIZE];
>  };
>
>  struct worker {
> +       int i;
>         pid_t pid;
>         struct queue *q;
> +       int last_seen;
> +       unsigned int kill_sent:1;
>  };
>
>

-- 
Regards,
Li Wang
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linux.it/pipermail/ltp/attachments/20220803/afecb075/attachment.htm>