[LTP] [PATCH v2 1/4] read_all: Add worker timeout and rewrite scheduling

Richard Palethorpe <rpalethorpe@suse.de>
Thu Aug 4 11:47:43 CEST 2022


Hello Li,

Li Wang <liwang@redhat.com> writes:

> Hi Richard, All,
>
> On Mon, Jul 25, 2022 at 6:06 PM Richard Palethorpe <rpalethorpe@suse.com> wrote:
>
>  Kill and restart workers that take too long to read a file. The
>  default being 10% of max_runtime. A custom time can be set with the
>  new -t option.
>
>  This is to prevent a worker from blocking forever in a read. Currently
>  when this happens the whole test times out and any remaining files in
>  the worker's queue are not tested.
>
>  As a side effect we can now also set the timeout very low to cause
>  partial reads. However setting it to less than the time it takes to
>  start (fork) a new worker is treated as an error. Forking takes much
>  longer than most reads.
>
>  A number of other possible issues were fixed as well as a side effect
>  of changing the scheduling:
>
>  + The worker queues are now filled in a
>    "round robin" way. Before this only happened when -r was large
>    enough.
>  + When testing is finished the main process waits on the child procs before
>    destroying the semaphores and worker queues.
>  + max_runtime is set to 100 secs. This is so that the worker timeout
>    is a round number.
>  + Files which don't implement O_NONBLOCK and may block no longer need
>    to be avoided, even if they refuse to die immediately, although this
>    may result in TBROK due to zombie processes.
>
>  Note that with a worker timeout of 1s, 2 to 3 files will usually time out on
>  my workstation. With 2s, none will. In any case, testing completes in
>  under 3s for proc, sys or dev.
>
>  This is much faster than many other machines. It's quite likely the
>  worker timeout and max_runtime need to be increased on small and very
>  large machines. This can be done manually by setting LTP_RUNTIME_MUL.
>
> Yes, apart from a bit of difficulty (at least for me) in comprehending the
> detailed behavior of this scheduler :).
>
> Thanks for your improvements! Just one tiny query below.
>
> Reviewed-by: Li Wang <liwang@redhat.com>

Thanks!
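
(For anyone skimming the archive: the kill-and-restart behaviour described in
the commit message above boils down to roughly the following on the main
process side. This is only an illustrative sketch; check_timeouts(),
spawn_worker() and the struct fields are invented names rather than the actual
read_all.c code, and read_started would really have to live in memory shared
with the worker.)

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

struct worker {
        pid_t pid;
        time_t read_started;    /* when the worker picked up its current path */
};

/* Stand-in for forking a replacement worker; the real child would
 * re-enter the worker read loop. */
static pid_t spawn_worker(void)
{
        pid_t pid = fork();

        if (!pid)
                _exit(0);

        return pid;
}

/* Main process: kill and replace any worker stuck in a read for longer
 * than worker_timeout seconds. */
static void check_timeouts(struct worker *workers, int n, time_t worker_timeout)
{
        int i;
        time_t now = time(NULL);

        for (i = 0; i < n; i++) {
                if (now - workers[i].read_started < worker_timeout)
                        continue;

                kill(workers[i].pid, SIGKILL);
                waitpid(workers[i].pid, NULL, 0);

                workers[i].pid = spawn_worker();
                workers[i].read_started = time(NULL);
        }
}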
>
>   
>  --- a/testcases/kernel/fs/read_all/read_all.c
>  +++ b/testcases/kernel/fs/read_all/read_all.c
>  ...
>
>  +#include <signal.h>
>   #include <sys/types.h>
>   #include <sys/stat.h>
>  +#include <sys/wait.h>
>   #include <lapi/fnmatch.h>
>   #include <stdlib.h>
>   #include <stdio.h>
>  @@ -43,7 +45,10 @@
>   #include <pwd.h>
>   #include <grp.h>
>
>  +#include "tst_atomic.h"
>  +#include "tst_safe_clocks.h"
>   #include "tst_test.h"
>  +#include "tst_timer.h"
>
>   #define QUEUE_SIZE 16384
>   #define BUFFER_SIZE 1024
>  @@ -55,11 +60,15 @@ struct queue {
>          int front;
>          int back;
>          char data[QUEUE_SIZE];
>
> I doubt whether we need to maintain a queue to store the paths.
> During the test it seems only one path is being pushed into q->data[],
> so the rest of the space is wasted, right?
>
> By shrinking the QUEUE_SIZE to equal BUFFER_SIZE, the test
> still works normally.

The rest of the space should be in use unless there is a bug. It's a
circular buffer holding multiple (variable-length) items delimited by \0.
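
To illustrate, a minimal sketch of that kind of ring buffer (simplified and
with made-up helper names, not a copy of the read_all.c implementation):

#include <string.h>

#define QUEUE_SIZE 16384

struct queue {
        int front;
        int back;
        char data[QUEUE_SIZE];
};

/* Bytes currently stored in the ring. */
static int queue_used(const struct queue *q)
{
        return (q->back - q->front + QUEUE_SIZE) % QUEUE_SIZE;
}

/* Push one \0-terminated path; fails if there is not enough room left. */
static int queue_push(struct queue *q, const char *path)
{
        int i, len = strlen(path) + 1;

        if (QUEUE_SIZE - queue_used(q) <= len)
                return 0;

        for (i = 0; i < len; i++)
                q->data[(q->back + i) % QUEUE_SIZE] = path[i];

        q->back = (q->back + len) % QUEUE_SIZE;
        return 1;
}

/* Pop the oldest path into buf (which must hold at least QUEUE_SIZE bytes). */
static int queue_pop(struct queue *q, char *buf)
{
        int i = 0;

        if (q->front == q->back)
                return 0;

        do {
                buf[i] = q->data[(q->front + i) % QUEUE_SIZE];
        } while (buf[i++] != '\0');

        q->front = (q->front + i) % QUEUE_SIZE;
        return 1;
}

The main process pushes \0-terminated paths at the back and each worker pops
them from the front, so several paths can sit in data[] at once.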

However, it's a very good point you make. It's true that having a
multi-item queue on each worker is not necessary.

At most we need two buffers (or a flip buffer) on each worker, one
containing the current path being worked on and one with the next path
to be worked on.

This would be much simpler than the circular buffer queue I originally
implemented. It may mean the workers are more likely to starve while
waiting for the main process. However, the main process usually adds work
far faster than the workers can consume it.
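
Roughly something like this (purely a sketch with invented names; the real
thing would still need the existing semaphore/atomic machinery because the
buffers have to live in memory shared across fork()):

#include <string.h>

#define BUFFER_SIZE 1024

/* Two path slots per worker: the path currently being read and the next
 * one supplied by the main process. In the real test this struct would
 * sit in shared memory and next_full would need atomic access. */
struct worker_bufs {
        char cur[BUFFER_SIZE];
        char next[BUFFER_SIZE];
        int next_full;
};

/* Main process: offer the next path; fails if the worker has not yet
 * consumed the previous one. */
static int offer_path(struct worker_bufs *b, const char *path)
{
        if (b->next_full)
                return 0;

        strncpy(b->next, path, BUFFER_SIZE - 1);
        b->next[BUFFER_SIZE - 1] = '\0';
        b->next_full = 1;
        return 1;
}

/* Worker: flip the pending path into the current slot. */
static int take_path(struct worker_bufs *b)
{
        if (!b->next_full)
                return 0;

        memcpy(b->cur, b->next, BUFFER_SIZE);
        b->next_full = 0;
        return 1;
}

If offer_path() fails, the main process would simply move on to another worker
or retry later, which is where the extra starvation risk comes from.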

So we should probably swap the queue for a flip buffer. I'll look into
that and merge this patchset as it is for now.

-- 
Thank you,
Richard.

