[LTP] [RFC PATCH] read_all: give more time to wait children finish read action
Li Wang
liwang@redhat.com
Mon Apr 9 11:22:06 CEST 2018
Hi Richard,
Richard Palethorpe <rpalethorpe@suse.de> wrote:
> Hello,
>
> Li Wang writes:
>
> > 1. Some children are still working on the read I/O but the parent tries
> > to stop them immediately after visit_dir(). Although stop_attempts
> > is 65535, it still sometimes fails, so we get the following worker
> > stalled messages in the test.
> >
> > # uname -rm
> > 4.16.0-rc7 ppc64
> > # ./read_all -d /sys -q -r 10
> > tst_test.c:987: INFO: Timeout per run is 0h 05m 00s
> > read_all.c:280: BROK: Worker 26075 is stalled
> > read_all.c:280: WARN: Worker 26075 is stalled
> > read_all.c:280: WARN: Worker 26079 is stalled
> > read_all.c:280: WARN: Worker 26087 is stalled
>
> wow, three workers have their queues perfectly filled... I guess I
> accidentally created a brute force box packing algorithm.
>
> >
> > 2. The sched_work() push action runs in an infinite loop; here I propose
> > to limit the number of attempts.
>
> I think this is moving the problem instead of solving it. Increasing the
> number of stop_attempts should have the same effect unless the workers
> are permanently blocked on I/O. However this might be better because it
> removes the sleep.
>
Hmm, I'm not sure you fully got my point; maybe I (apologies!) shouldn't
have fixed two problems in one patch.

For the blocked I/O issue, I am just adding 'usleep(100)' in the
stop_workers() function. Increasing stop_attempts, as you suggest, is also
acceptable, but without a sleep the loop still finishes very quickly, and
we would probably need a very large stop_attempts value just to waste time.

The second change, in sched_work(), is just to guarantee that we can exit
the infinite loop if something goes wrong with the queue_push action.
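
To make the bounded-retry idea concrete, here is a minimal standalone
sketch. It is not the code from read_all.c: push_with_limit(), the
try_push_fn callback type and flaky_push() are hypothetical names made up
for illustration, and the real patch calls tst_brk(TBROK, ...) instead of
returning an error code.

```c
#define _DEFAULT_SOURCE /* assumption: glibc, needed for usleep() in strict modes */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical callback: returns true once the item has been queued. */
typedef bool (*try_push_fn)(void *ctx, const char *path);

/*
 * Call try_push() at most max_attempts times, sleeping sleep_us
 * microseconds after each failed attempt.
 * Returns the number of attempts used on success, or -1 if the
 * limit was reached without a successful push.
 */
static int push_with_limit(try_push_fn try_push, void *ctx, const char *path,
			   int max_attempts, useconds_t sleep_us)
{
	assert(try_push != NULL && max_attempts > 0);

	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (try_push(ctx, path))
			return attempt;
		usleep(sleep_us);
	}

	return -1;
}

/* Illustrative stand-in for a queue push: fails until *ctx reaches zero. */
static bool flaky_push(void *ctx, const char *path)
{
	int *remaining_failures = ctx;

	(void)path;
	if (*remaining_failures > 0) {
		(*remaining_failures)--;
		return false;
	}

	return true;
}
```

With a 100 microsecond sleep and a limit of 0xffff attempts, a push that
never succeeds gives up after at least 65535 * 100 us, i.e. roughly 6.5
seconds, instead of spinning forever.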
>
> Possibly we should actually try to determine if a worker is blocked
> reading a file and print the file name.
>
You are right, I'm still looking for a better way to avoid this blocked
I/O issue.
>
> >
> > Signed-off-by: Li Wang <liwang@redhat.com>
> > ---
> > testcases/kernel/fs/read_all/read_all.c | 10 +++++++---
> > 1 file changed, 7 insertions(+), 3 deletions(-)
> >
> > diff --git a/testcases/kernel/fs/read_all/read_all.c b/testcases/kernel/fs/read_all/read_all.c
> > index b7ed540..ab206e7 100644
> > --- a/testcases/kernel/fs/read_all/read_all.c
> > +++ b/testcases/kernel/fs/read_all/read_all.c
> > @@ -280,6 +280,7 @@ static void stop_workers(void)
> > workers[i].pid);
> > break;
> > }
> > + usleep(100);
>
>
> > }
> > }
> > }
> > @@ -306,9 +307,12 @@ static void sched_work(const char *path)
> > if (pushed)
> > break;
> >
> > - if (++push_attempts > worker_count) {
> > - usleep(100);
> > - push_attempts = 0;
> > + usleep(100);
> > + if (++push_attempts > 0xffff) {
>
> Maybe add another f to this.
>
There's no need for that many attempts; in my testing the push succeeds in
fewer than 20 tries.
>
> > + tst_brk(TBROK,
> > + "Attempts %d times but still failed to push %s",
> > + push_attempts, path);
> > + break;
> > }
> > }
> > }
>
>
> --
> Thank you,
> Richard.
>
--
Li Wang
liwang@redhat.com