[LTP] [PATCH 1/2] read_all: Add worker timeout

Richard Palethorpe rpalethorpe@suse.de
Mon Jul 18 12:57:39 CEST 2022


Hello,

Jan Stancek <jstancek@redhat.com> writes:

> On Tue, Jul 12, 2022 at 2:46 PM Richard Palethorpe via ltp
> <ltp@lists.linux.it> wrote:
>>
>> Kill and restart workers that take too long to read a file. The
>> default being one second. A custom time can be set with the new -t
>> option.
>>
>> This is to prevent a worker from blocking forever in a read. Currently
>> when this happens the whole test times out and any remaining files in
>> the worker's queue are not tested.
>>
>> As a side effect we can now also set the timeout very low to cause
>> partial reads.
>>
>> Signed-off-by: Richard Palethorpe <rpalethorpe@suse.com>
>> Cc: Joerg Vehlow <lkml@jv-coder.de>
>> Cc: Li Wang <liwang@redhat.com>
>> ---
>>  testcases/kernel/fs/read_all/read_all.c | 83 ++++++++++++++++++++++++-
>>  1 file changed, 82 insertions(+), 1 deletion(-)
>
>>
>> +static void restart_worker(struct worker *const worker)
>> +{
>> +       int wstatus, ret, i, q_len;
>> +       struct timespec now;
>> +
>> +       kill(worker->pid, SIGKILL);
>> +       ret = waitpid(worker->pid, &wstatus, 0);
>
> Is there a chance we could get stuck in an uninterruptible read? I think I saw
> some in the past, but those may be blacklisted already, so this may only be
> something to watch for if we still get test timeouts in the future.
>

I was hoping that SIGKILL is special somehow, but I suppose I should
check exactly what happens. If the process is stuck inside the kernel
then we don't want to wait too long for it; we just need to know that
the kill signal was delivered and that the process will not return to
userland. A large number of zombies could exhaust PIDs or some other
resource, but most reads complete very quickly and don't need
interrupting.
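
Something along the lines of the following untested sketch is what I
have in mind: send SIGKILL, try a non-blocking waitpid(), and if the
worker has not exited yet (e.g. it is stuck in an uninterruptible
read), remember the PID and reap it on a later pass instead of
blocking. The zombies array and helper names here are just
illustrative, not what the final patch would use.

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Illustrative only: PIDs we have killed but not yet reaped */
#define MAX_ZOMBIES 128
static pid_t zombies[MAX_ZOMBIES];
static int zombie_count;

/*
 * Send SIGKILL and try to reap without blocking. A child stuck in an
 * uninterruptible read will not have exited yet, so remember its PID
 * and reap it on a later pass instead of hanging the whole test.
 */
static void kill_worker(const pid_t pid)
{
        kill(pid, SIGKILL);

        if (waitpid(pid, NULL, WNOHANG) == pid)
                return;

        if (zombie_count < MAX_ZOMBIES)
                zombies[zombie_count++] = pid;
}

/* Call this now and then, e.g. before spawning a replacement worker */
static void reap_zombies(void)
{
        int i = 0;

        while (i < zombie_count) {
                if (waitpid(zombies[i], NULL, WNOHANG) != zombies[i]) {
                        i++;
                        continue;
                }

                zombies[i] = zombies[--zombie_count];
        }
}

Note that even a healthy worker may not have exited by the time the
WNOHANG waitpid() runs, which is fine; it just gets reaped on the next
pass. Only the genuinely stuck processes would linger in the list.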

-- 
Thank you,
Richard.

