[LTP] [PATCH v3 2/3] lib/tst_test.c: Update result counters when calling tst_brk()

Mon Jan 7 16:06:19 CET 2019

Hi!
> 1) Catch and report the TFAIL exit status of child process.

Looking at the codebase we do have a few usages of tst_brk(TFAIL, "...")
to exit the child process, which sort of works but it's incorrect. The
tst_brk() always meant "unrecoverable failure have happened, exit the
current process as fast as possible". Looking over our codebase most of
the tst_brk(TFAIL, "...") should not actually cause the main test
process to exit, these were only meant to exit the child and report the
result in one call. It will for instance break the test with -i option
on the first failure, which is incorrect.

So if we ever want to have a function to exit child process with a result we
should implement tst_ret() that would be equivalent to tst_res() followed by
exit(0).

It could be even implemented as:

#define tst_ret(ttype, fmt, ...) \
	do { \
		tst_res_(__FILE__, __LINE__, (ttype), (fmt), ##__VA_ARGS__); \
		exit(0); \
	} while (0)

This function has one big advantage, it increments the results counters
before the child process exits.

Actually one of the big points of the new test library was that the
results counters are atomically increased, because passing the results
in exit values is nightmare that cannot be done correclty.

> 2) Only update result counters in library process and main test
>    process because the exit status of child can be reported by
>    main test process.

Actually after I spend some time on it I think that the best solution is
to update the results in the piece of shared memory as fast as possible,
anything else is prone to various races and corner cases.

> 3) Print TCONF message and increase skipped when calling tst_brk(TCONF).
>    Print TBROK message and increase broken when calling tst_brk(TBROK).
>    Print TFAIL message and increase failed when calling tst_brk(TFAIL).
> 4) Remove duplicate update_results() in run_tcases_per_fs().

I've been thinking about this and the problem is more complex, and I'm
even not sure that it's possible to write the library so that the
counters are consistent at the time we exit the test if something
unexpected happened and we called tst_brk().

Consider for instance this example:

#include "tst_test.h"

static void do_test(void)
{
        if (!SAFE_FORK())
                tst_brk(TBROK, "child");
        tst_brk(TBROK, "parent");
}

static struct tst_test test = {
        .test_all = do_test,
        .forks_child = 1,
};

When tst_brk() is called both in parent and child the counter would be
incremented only once because the child is not waited for by the main
test.

We can close this special case by changing the main test pid to wait for the
children before it calls exit() in the tst_brk() but that may cause the
main process to get stuck undefinitely if the child processes get stuck,
so we would have to be careful.

Also from the very definition of the TBROK return status the test
results would be incomplete at best, since TBROK really means
"unrecoverable error happened during the test" which would mostly means
that something as low level as filesystem got corrupted and there is no
point in presenting the results in that case, so I guess that the best
we could do in the case of TBROK is to print big message that says
"things went horribly wrong!" or something similar.

All in all I would like to avoid applying patches to the test library
before we finalize the release, since there is not much time for
testing now.

-- 
Cyril Hrubis
chrubis@suse.cz