[LTP] [PATCH v2] read_all: give more time to wait children finish read action
Richard Palethorpe
rpalethorpe@suse.de
Wed Apr 11 13:19:42 CEST 2018
Hello Li,
Li Wang writes:
> 1. We get the following worker stalled messages in test:
> # ./read_all -d /sys -q -r 10
> tst_test.c:987: INFO: Timeout per run is 0h 05m 00s
> read_all.c:280: BROK: Worker 26075 is stalled
> read_all.c:280: WARN: Worker 26075 is stalled
> read_all.c:280: WARN: Worker 26079 is stalled
> read_all.c:280: WARN: Worker 26087 is stalled
>
> The reason is that some children are still working on the read I/O but
> the parent tries to stop them immediately after visit_dir(). Although
> stop_attempts is 65535, it still sometimes fails.
>
> Instead, we use exponential backoff to retry the stop operation
> for a limited time.
>
> 2. sched_work() retries the push in an infinite loop; make it
> give up after a limited time as well.
>
> Signed-off-by: Li Wang <liwang@redhat.com>
> ---
> testcases/kernel/fs/read_all/read_all.c | 24 ++++++++++++++++--------
> 1 file changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/testcases/kernel/fs/read_all/read_all.c b/testcases/kernel/fs/read_all/read_all.c
> index b7ed540..a9f9707 100644
> --- a/testcases/kernel/fs/read_all/read_all.c
> +++ b/testcases/kernel/fs/read_all/read_all.c
> @@ -57,6 +57,8 @@
> #define BUFFER_SIZE 1024
> #define MAX_PATH 4096
> #define MAX_DISPLAY 40
> +#define MICROSECOND 1
Not necessary; usleep() already takes microseconds.
> +#define SECOND MICROSECOND * 1000000
>
> struct queue {
> sem_t sem;
> @@ -265,20 +267,21 @@ static void spawn_workers(void)
> static void stop_workers(void)
> {
> const char stop_code[1] = { '\0' };
> - int i, stop_attempts;
> + int i, delay = 1;
>
> if (!workers)
> return;
>
> for (i = 0; i < worker_count; i++) {
> - stop_attempts = 0xffff;
> if (workers[i].q) {
Maybe change this to:
	if (!workers[i].q)
		continue;
to avoid a level of indentation.
> while (!queue_push(workers[i].q, stop_code)) {
> - if (--stop_attempts < 0) {
> + if (delay < SECOND) {
> + usleep(delay);
> + delay *= 2;
> + } else {
> tst_brk(TBROK,
> "Worker %d is stalled",
> workers[i].pid);
> - break;
> }
> }
> }
> @@ -295,7 +298,7 @@ static void stop_workers(void)
> static void sched_work(const char *path)
> {
> static int cur;
> - int push_attempts = 0, pushed;
> + int push_attempts = 0, pushed, delay = 1;
>
> while (1) {
> pushed = queue_push(workers[cur].q, path);
> @@ -306,9 +309,14 @@ static void sched_work(const char *path)
> if (pushed)
> break;
>
> - if (++push_attempts > worker_count) {
> - usleep(100);
> - push_attempts = 0;
> + if (delay < SECOND) {
> + push_attempts++;
> + usleep(delay);
> + delay *= 2;
> + } else {
> + tst_brk(TBROK,
> + "Attempts %d times but still failed to push %s",
Typo: "Attempts" should be "Attempted".
> + push_attempts, path);
> }
> }
> }
Maybe you could put the "if (delay < SECOND) ..." into a function?
Otherwise this looks good to me. There are some other things I want to
change on this test, but we can leave those for another patch.
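As a rough sketch of the helper suggested above, something along these lines might work. The name backoff_sleep() and its signature are placeholders of mine, not anything in the patch or in the LTP library, and SECOND is defined directly in microseconds here. Starting from a delay of 1 with a limit of 1000000, it would sleep 20 times (1, 2, 4, ..., 524288 usec), roughly 1.05 seconds in total, before reporting failure.

```c
#include <unistd.h>

#define SECOND 1000000	/* in microseconds, the unit usleep() takes */

/*
 * Hypothetical helper: the name and signature are assumptions made for
 * illustration only.
 *
 * Sleeps for *delay microseconds and then doubles *delay. Returns 1 if
 * it slept, or 0 without sleeping once *delay has reached limit, so the
 * caller can decide how to report the stall.
 */
static int backoff_sleep(int *delay, int limit)
{
	if (*delay >= limit)
		return 0;

	usleep(*delay);
	*delay *= 2;

	return 1;
}

/*
 * Possible use in stop_workers(), assuming delay starts at 1:
 *
 *	while (!queue_push(workers[i].q, stop_code)) {
 *		if (!backoff_sleep(&delay, SECOND))
 *			tst_brk(TBROK, "Worker %d is stalled",
 *				workers[i].pid);
 *	}
 */
```

sched_work() could call the same helper and keep its own push_attempts counter for the error message.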
--
Thank you,
Richard.