[LTP] [PATCH v2 1/4] read_all: Add worker timeout and rewrite scheduling

Jan Stancek <jstancek@redhat.com>
Fri Aug 5 08:26:17 CEST 2022


On Thu, Aug 4, 2022 at 12:14 PM Richard Palethorpe <rpalethorpe@suse.de> wrote:
>
> Hello Li,
>
> Li Wang <liwang@redhat.com> writes:
>
> > Hi Richard, All,
> >
> > On Mon, Jul 25, 2022 at 6:06 PM Richard Palethorpe <rpalethorpe@suse.com> wrote:
> >
> >  Kill and restart workers that take too long to read a file. The
> >  default timeout is 10% of max_runtime. A custom timeout can be set
> >  with the new -t option.
> >
> >  This is to prevent a worker from blocking forever in a read. Currently
> >  when this happens the whole test times out and any remaining files in
> >  the worker's queue are not tested.
> >
> >  As a side effect we can now also set the timeout very low to cause
> >  partial reads. However, setting it to less than the time it takes to
> >  start (fork) a new worker is treated as an error; forking takes much
> >  longer than most reads.
> >
> >  A number of other possible issues were fixed as well as a side effect
> >  of changing the scheduling:
> >
> >  + The worker queues are now filled in a round-robin way. Before,
> >    this only happened when -r was large enough.
> >  + When testing is finished, the main process waits on the child
> >    processes before destroying the semaphores and worker queues.
> >  + max_runtime is set to 100 secs. This is so that the worker timeout
> >    is a round number.
> >  + Files which don't implement O_NONBLOCK and may block no longer need
> >    to be avoided, even if the blocked workers refuse to die immediately,
> >    although this may result in TBROK due to zombie processes.
> >
> >  Note that with a worker timeout of 1s, 2 to 3 files will usually time out
> >  on my workstation. With 2s, none will. In any case testing completes in
> >  under 3s for proc, sys or dev.
> >
> >  My workstation is much faster than many other machines, so it's quite
> >  likely the worker timeout and max_runtime will need to be increased on
> >  small and very large machines. This can be done manually by setting
> >  LTP_RUNTIME_MUL.
> >
> > Yes, apart from it being a bit difficult (at least for me) to comprehend
> > the detailed behavior of this scheduler :).
> >
> > Thanks for your improvements! Just one tiny query below.
> >
> > Reviewed-by: Li Wang <liwang@redhat.com>
>
> Thanks!
> >
> >
> >  --- a/testcases/kernel/fs/read_all/read_all.c
> >  +++ b/testcases/kernel/fs/read_all/read_all.c
> >  ...
> >
> >  +#include <signal.h>
> >   #include <sys/types.h>
> >   #include <sys/stat.h>
> >  +#include <sys/wait.h>
> >   #include <lapi/fnmatch.h>
> >   #include <stdlib.h>
> >   #include <stdio.h>
> >  @@ -43,7 +45,10 @@
> >   #include <pwd.h>
> >   #include <grp.h>
> >
> >  +#include "tst_atomic.h"
> >  +#include "tst_safe_clocks.h"
> >   #include "tst_test.h"
> >  +#include "tst_timer.h"
> >
> >   #define QUEUE_SIZE 16384
> >   #define BUFFER_SIZE 1024
> >  @@ -55,11 +60,15 @@ struct queue {
> >          int front;
> >          int back;
> >          char data[QUEUE_SIZE];
> >
> > I doubt whether we need to maintain a queue to store the paths.
> > During the test it seems only one path at a time is pushed into
> > q->data[], so the rest of the space is wasted, right?
> >
> > After shrinking QUEUE_SIZE to equal BUFFER_SIZE, the test still
> > works normally.
>
> The rest of the space should be in use unless there is a bug. It's a
> circular buffer with multiple variable-length items delimited by \0.
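
A minimal, single-producer/single-consumer sketch of that idea, using the
front/back/data fields shown in the diff above. This is illustrative only;
the real read_all.c also has atomic and semaphore signalling between the
processes, which is omitted here:

#define QUEUE_SIZE 16384

struct queue {
        int front;              /* next byte the consumer will read */
        int back;               /* next free byte the producer will write */
        char data[QUEUE_SIZE];
};

/* Append one \0-terminated path; returns 0 if there is not enough room */
static int queue_push(struct queue *q, const char *buf)
{
        int i = q->back;
        const char *c = buf;

        do {
                /* Writing here would overtake the unread data at front */
                if ((i + 1) % QUEUE_SIZE == q->front)
                        return 0;

                q->data[i] = *c;
                i = (i + 1) % QUEUE_SIZE;
        } while (*c++);

        q->back = i;
        return 1;
}

/* Copy out the next path; returns 0 if the queue is empty */
static int queue_pop(struct queue *q, char *buf)
{
        int i = q->front, j = 0;

        if (i == q->back)
                return 0;

        while (q->data[i]) {
                buf[j++] = q->data[i];
                i = (i + 1) % QUEUE_SIZE;
        }
        buf[j] = '\0';

        q->front = (i + 1) % QUEUE_SIZE;
        return 1;
}
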
>
> However, it's a very good point you make. It's true that having a
> multi-item queue on each worker is not necessary.
>
> At most we need two buffers (or a flip buffer) on each worker, one
> containing the current path being worked on and one with the next path
> to be worked on.
>
> This would be much simpler than the circular buffer queue I originally
> implemented. It may mean the workers are more likely to starve while
> waiting for the main process. However, the main process usually adds work
> far faster than the workers can consume it.
>
> So we should probably swap the queue for a flip buffer. I'll look into
> that and merge this patchset as it is for now.
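
A rough sketch of such a flip buffer, for illustration; the names are made
up and the cross-process synchronisation (atomics or a semaphore in shared
memory) is left out:

#include <stddef.h>
#include <string.h>

#define BUFFER_SIZE 1024

struct flip_buf {
        char path[2][BUFFER_SIZE];
        int cur;        /* slot the worker is currently reading from */
        int next_full;  /* set when path[!cur] holds a new path */
};

/* Producer (main process): returns 0 if the spare slot is still occupied */
static int flip_buf_put(struct flip_buf *fb, const char *path)
{
        if (fb->next_full)
                return 0;

        strncpy(fb->path[!fb->cur], path, BUFFER_SIZE - 1);
        fb->path[!fb->cur][BUFFER_SIZE - 1] = '\0';
        fb->next_full = 1;
        return 1;
}

/* Consumer (worker): flip to the spare slot once the current path is done */
static const char *flip_buf_get(struct flip_buf *fb)
{
        if (!fb->next_full)
                return NULL;

        fb->cur = !fb->cur;
        fb->next_full = 0;
        return fb->path[fb->cur];
}

Compared to the circular queue, each worker can hold at most one path in
reserve, so it may wait on the main process more often, as noted above.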

Thanks for the improvements.


