[LTP] [PATCH v1] fzsync: break inf loop with flag vs pthread_cancel

Mon Apr 11 09:51:43 CEST 2022

Hello Li and Edward,

Sorry for slow response I was AFK.

Li Wang <liwang@redhat.com> writes:

> Hi Edward,
>
> On Wed, Apr 6, 2022 at 1:06 AM Edward Liaw via ltp <ltp@lists.linux.it> wrote:
>
>  Hi, I'm working to get fzsync working with the Android kernel, which
>  does not have pthread_cancel available.
>
>  In the absence of pthread_cancel, when thread A exits due to a break,
>  thread B will get stuck in an infinite loop while waiting for thread A
>  to progress.
>
>  Instead of cancelling thread B, we can use the exit flag to break out of
>  thread B's loop.  This should also remove the need for the wrapper
>  around the thread.
>
> This method is more graceful, but involves a new potential issue.
>
> Looking at tst_fzsync_run_a, if anything goes wrong in other places
> (thread_b) and break with setting 'pair->exit' to 1 to end the looping. 
> It doesn't work for thread_a because tst_atomic_store(exit, &pair->exit)
> will reset it back to 0 (int exit = 0).

I don't think we have ever handled thread B breaking gracefully?

If B breaks and it calls tst_fzsync_pair_cleanup then it will try to
join to itself and we will probably get EDEADLK.

In most cases though thread B is very simple and doesn't even use SAFE_
functions.

>
> Another suggestion is to distinguish the abnormal invoke for
> tst_fzsync_pair_cleanup, because that is rarely a situation we
> encounter, no need to reset pair->exit over again.
>
> So better to have this improvement:
>
> --- a/include/tst_fuzzy_sync.h
> +++ b/include/tst_fuzzy_sync.h
> @@ -232,7 +232,11 @@ static inline void tst_fzsync_pair_init(struct tst_fzsync_pair *pair)
>  static inline void tst_fzsync_pair_cleanup(struct tst_fzsync_pair *pair)
>  {
>         if (pair->thread_b) {
> -               tst_atomic_store(1, &pair->exit);
> +               /* Terminate thread B if parent hits accidental break */
> +               if (!pair->exit) {
> +                       tst_atomic_store(1, &pair->exit);
> +                       usleep(100000);

Why do we need usleep?

> +               }
>                 SAFE_PTHREAD_JOIN(pair->thread_b, NULL);
>                 pair->thread_b = 0;
>         }
> @@ -642,7 +646,6 @@ static inline void tst_fzsync_wait_b(struct tst_fzsync_pair *pair)
>   */
>  static inline int tst_fzsync_run_a(struct tst_fzsync_pair *pair)
>  {
> -       int exit = 0;
>         float rem_p = 1 - tst_timeout_remaining() / pair->exec_time_start;
>  
>         if ((pair->exec_time_p * SAMPLING_SLICE < rem_p)
> @@ -657,19 +660,18 @@ static inline int tst_fzsync_run_a(struct tst_fzsync_pair *pair)
>         if (pair->exec_time_p < rem_p) {
>                 tst_res(TINFO,
>                         "Exceeded execution time, requesting exit");
> -               exit = 1;
> +               tst_atomic_store(1, &pair->exit);
>         }
>  
>         if (++pair->exec_loop > pair->exec_loops) {
>                 tst_res(TINFO,
>                         "Exceeded execution loops, requesting exit");
> -               exit = 1;
> +               tst_atomic_store(1, &pair->exit);
>         }
>  
> -       tst_atomic_store(exit, &pair->exit);
>         tst_fzsync_wait_a(pair);
>  
> -       if (exit) {
> +       if (pair->exit) {
>                 tst_fzsync_pair_cleanup(pair);
>                 return 0;
>         }

I think this part is reasonable however. Even if we don't handle B
breaking fully gracefully it is better to avoid setting exit back to
zero unecessarily and risking deadlock.

>
> -- 
> Regards,
> Li Wang

-- 
Thank you,
Richard.