[LTP] [PATCH 1/2] scripts: Add simple script for calculating timeouts

Tue Jan 21 14:12:23 CET 2025

Hi!

I like the general idea, a couple of comments on the python code.

On 1/21/25 13:34, Cyril Hrubis wrote:
> This script parses JSON results from kirk and LTP metadata in order
> calculate timeouts for tests based on the result file and can even patch
> tests automatically.
>
> The script does:
>
> - Take the results and pick all tests that run for longer than 0.5s.
>    Multiplies the time with a constant (currently 1.2) to get a suggested
>    timeout.
>
> - Exclude tests that have runtime defined since these are controller
>    by the runtime (that filters out all fuzzy sync tests).
>
>    There is a special case for timer tests that define runtime only
>    dynamically in the timer library code. This should be possibly fixed
>    with special value for the .runtime in tst_test. E.g.
>    TST_RUNTIME_DYNAMIC for tests that only set runtime in the setup.
>
> - Normalize the timeout per a single filesystem run if test is running for
>    more than one filesystem.
>
> - Tests that do not have a metadata record are old library tests which
>    which cannot be patched but are printed in a separate table if we
>    request a table to be printed.
>
> - If patching option is selected tests are update with newly calculated
>    timeout. By default we only increase timeouts but that can be
>    overrided with the -o option.
>
> Signed-off-by: Cyril Hrubis <chrubis@suse.cz>
> ---
>   scripts/calc_timeouts.py | 133 +++++++++++++++++++++++++++++++++++++++
>   1 file changed, 133 insertions(+)
>   create mode 100755 scripts/calc_timeouts.py
>
> diff --git a/scripts/calc_timeouts.py b/scripts/calc_timeouts.py
> new file mode 100755
> index 000000000..c69ab8f57
> --- /dev/null
> +++ b/scripts/calc_timeouts.py
> @@ -0,0 +1,133 @@
> +#!/usr/bin/python
> +import re
> +import json
> +import getopt
> +import sys
> +from os import system
> +
> +# Top level directory path
> +top_dir = '../'
Better to pass this folder to the script using parameters, also because 
metadata can be found in the install folder as well.
> +
> +# The test runtime is multiplied by this to get a timeout
> +timeout_mul = 1.2
Global variables in python use TIMEOUT_MUL format.
> +
> +def patch(fname, new_timeout, patch_override):
> +
> +    orig_timeout = None
> +    file_path = top_dir + fname
os.path.join()
> +
> +    with open(file_path, 'r') as c_source:
> +        for line in c_source:
> +            timeout = re.search(r'\s*.timeout\s*=\s*(\d+).', line)
This regex should be compiled before loop using re.compile(), otherwise 
re.search() will compile it for each line.
> +            if timeout:
> +                orig_timeout = int(timeout.group(1))
> +
> +    if orig_timeout:
> +        if orig_timeout < new_timeout or patch_override:
> +            print("CHANGE %s timeout %i -> %i" % (fname, orig_timeout, new_timeout))
> +            system("sed -i 's/\\.timeout = [0-9]*/\\.timeout = " + str(new_timeout) + "/' " + file_path)

This can be substituted with a python version of sed, since system() is 
never a good idea, unless we really need to use it for external tools 
which cannot be translated into python. The solution would be:

content = []
matcher = re.match(r'\s*.timeout\s*=\s*(\d+).')
with open(file_path, 'r') as data:
     for line in data:
         # use regex here to find the matching string
         if matcher.search(line):
             content.append(f'timeout = {new_timeout}')
         else:
             content.append(line)

with open(file_path, 'w') as data:
         data.writelines(content)

> +        else:
> +            print("KEEP   %s timeout %i (new %i)" % (fname, orig_timeout, new_timeout))
> +    else:
> +        print("ADD    %s timeout %i" % (fname, new_timeout))
> +        system("sed -i '/static struct tst_test test = {/a\\\\t.timeout = " + str(new_timeout) + ",' " + file_path)
Same here. Maybe we can create a generic sed() function for both cases.
> +
> +def patch_all(timeouts, patch_override):
> +    for timeout in timeouts:
> +        if timeout[3]:
> +            patch(timeout[3], timeout[1], patch_override)
> +
> +def print_table(timeouts):
> +    timeouts.sort(key=lambda x: x[1], reverse=True)
> +
> +    total = 0;
> +
> +    print("Old library tests\n-----------------\n");
> +    for timeout in timeouts:
> +        if not timeout[2]:
> +            print("%-30s %i" % (timeout[0], timeout[1]))
> +            total+=1
> +
> +    print("\n\t%i tests in total" % total)
> +
> +    total = 0;
> +
> +    print("\nNew library tests\n-----------------\n");
> +    for timeout in timeouts:
> +        if timeout[2]:
> +            print("%-30s %i" % (timeout[0], timeout[1]))
> +            total+=1
> +
> +    print("\n\t%i tests in total" % total)
> +
> +def parse_data(results_path):
> +    timeouts = []
> +
> +    with open(results_path, 'r') as file:
> +        results = json.load(file)
> +
> +    with open(top_dir + 'metadata/ltp.json', 'r') as file:
> +        metadata = json.load(file)
> +
> +    for test in results['results']:
> +        name = test['test_fqn']
> +        duration = test['test']['duration']
> +        # If test runs for all_filesystems normalize the runtime per a single filesystem
> +        filesystems = max(1, test['test']['log'].count('TINFO: Formatting /'))
> +        # Check if test is new library test
> +        test_is_newlib = name in metadata['tests']
> +        # Store test file path
> +        path = None if not test_is_newlib else metadata['tests'][name]['fname']
> +        # Filter out tests with runtime
> +        test_has_runtime = False if not test_is_newlib else 'runtime' in metadata['tests'][name]
> +        # Timer tests define runtime dynamically in timer library
> +        if test_is_newlib and 'sample' in metadata['tests'][name]:
> +            test_has_runtime = True
> +        # Select tests that does not have runtime and are executed for longer time
> +        if not test_has_runtime and duration >= 0.5:
> +            timeouts.append((name, int(timeout_mul * duration/filesystems + 0.5), test_is_newlib, path))
> +
> +    return timeouts

In this case it's better to avoid tuples because they make code more 
difficult to read. Hash is a better approach in general:

data["name"] = name
data["timeout"] = int(timeout_mul * duration/filesystems + 0.5)
data["new_lib"] = test_is_newlib
data["path"] = path

return data

> +
> +def print_help():
> +    print('calc_timeouts.py [OPTION] [RESULT].json')
> +    print('\t-h prints this help')
> +    print('\t-o override test timeouts, by default timeouts are only increased')
> +    print('\t-p patch testcases with updated timeouts')
> +    print('\t-t prints table of tests with suggested timeouts')
> +
> +try:
> +    opts, args = getopt.getopt(sys.argv[1:], "hopt")
> +except:
> +    print_help()
> +    sys.exit(1)
> +
> +opt_print_table = False
> +opt_patch_tests = False
> +opt_patch_override = False
> +
> +for opt,arg in opts:
> +    if opt == '-h':
> +        print_help()
> +        sys.exit(0)
> +    if opt == '-o':
> +        opt_patch_override = True
> +    if opt == '-p':
> +        opt_patch_tests = True
> +    if opt == '-t':
> +        opt_print_table = True
> +
> +if not opt_print_table and not opt_patch_tests:
> +    print("No action selected!\n")
> +    print_help()
> +    sys.exit(1)
> +
> +results = args[0] if args else 'results.json'
> +
> +timeouts = parse_data(results)
> +
> +if opt_print_table:
> +    print_table(timeouts)
> +
> +if opt_patch_tests:
> +    patch_all(timeouts, opt_patch_override)
Better to wrap everything into a main() function and to use:

if __name__ == "__main__":

     run()

Andrea