[LTP] [External] : [PATCH] [RFC] doc: Add testers guide

ALOK TIWARI alok.a.tiwari@oracle.com
Sun Jun 8 18:21:24 CEST 2025



There are some minor typos in addition to Ricardo's comments.

On 06-06-2025 18:51, Cyril Hrubis wrote:
> While we have added documentation on supported kernels, compilers, how
> to compile and run LTP, etc., a comprehensive guide for
> people new to LTP and kernel testing was missing. This patch adds a
> short guide that tries to explain some of the pitfalls of kernel
> testing. Feel free to point out what is missing and suggest additional
> chapters.
> 
> Signed-off-by: Cyril Hrubis <chrubis@suse.cz>
> ---
>   doc/index.rst               |   1 +
>   doc/users/testers_guide.rst | 129 ++++++++++++++++++++++++++++++++++++
>   2 files changed, 130 insertions(+)
>   create mode 100644 doc/users/testers_guide.rst
[snip]
> +
> +Multidimensionality
> +-------------------
> +
> +First of all, kernel testing is a multidimensional problem; just compiling
> +and running LTP will give you some coverage, but very likely not enough.
> +There are several big gaps that may easily be missed.
> +
> +For example, the 64bit Linux kernel provides a compatibility layer for 32bit
> +applications, whose code quality is usually a bit worse than the 64bit ABI's.
> +Hence recompiling LTP with -m32 in compiler flags and runnig both 64bit and

typo runnig -> running

> +32bit test binaries is a good start. If you want to argue that your
> +applications do not need 32bit support, it's better to disable the compat
> +layer completely, since it's a possible source of security bugs.
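
Maybe also worth inlining the actual build commands so readers do not have
to look them up, something like (assuming the usual autotools build and an
installed 32bit toolchain, e.g. gcc-multilib):

    $ make autotools
    $ ./configure CFLAGS=-m32 LDFLAGS=-m32
    $ make -j$(nproc)
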
> +
> +Another dimension is the number of architectures you need to test; for
> +general distribution testing you may end up with a couple of them. Different
> +architectures have different platform code as well as differencies in memory

differencies -> differences

> +orderings, etc. All of that means that running tests on one architecture out
> +of several will give you incomplete coverage.
> +
> +You also have to decide if you are going to run tests in a virtual machine,
> +e.g. qemu-kvm, on bare metal, or both. Testing in a virtual machine will give
> +you about 90% of the coverage of bare metal, and vice versa.
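
A minimal qemu-kvm invocation could be useful here too; the kernel and
disk image names below are made up:

    $ qemu-system-x86_64 -enable-kvm -m 2G -smp 2 \
          -kernel bzImage -append "root=/dev/vda console=ttyS0" \
          -drive file=rootfs.img,if=virtio -nographic
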
> +
> +There are other options worth considering too. The Linux kernel has many
> +debugging options that are usually disabled in production, since they incur a
> +significant runtime performance penalty. Having a few more LTP test runs with
> +different debug options enabled, e.g. KASAN, may help catch bugs before they
> +materialize in production.
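
Perhaps list a few of the usual suspects as a .config fragment, e.g.:

    CONFIG_KASAN=y
    CONFIG_UBSAN=y
    CONFIG_PROVE_LOCKING=y
    CONFIG_DEBUG_ATOMIC_SLEEP=y

(just an example set, the right mix depends on what you want to catch)
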
> +
> +In practice your test matrix may easily explode and you may end up with
> +dozens of differently configured test runs based on different considerations.
> +The hard task at hand is not to have too many, since computing power is not
> +an infinite resource and does not scale that easily. If you have managed to
> +read up to this point: "Don't Panic", things are almost never as bad as they
> +may seem at first glance.
> +
> +It's a good idea to start small with an environment that models your
> +production systems. Once that works well you can try different
> +configurations. Select a few interesting ones and run them for some time in
> +order to get an idea of their usefulness. If you are feeling adventurous you
> +may try to measure and compare actual test coverage with a tool such as
> +lcov. If you do so, do not fall into the trap of attempting to reach 100%
> +line coverage. Having 100% of lines executed during the test does not mean
> +that your test coverage is 100%. Good tests validate much more than just how
> +much code from the tested binary was executed.
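
Since lcov is mentioned, a short example of the kernel coverage workflow
could help (this assumes a kernel built with CONFIG_GCOV_KERNEL=y and
CONFIG_GCOV_PROFILE_ALL=y; run as root, and without --directory lcov
captures kernel data from debugfs):

    # ./runltp -f syscalls
    # lcov --capture --output-file kernel.info
    # genhtml kernel.info --output-directory html
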
> +
> +You may need to sacrifice some coverage in order to match the tests' runtime
> +to the available computing power. When doing so, the Pareto principle is your
> +friend.
> +
> +
> +Test scope
> +----------
> +
> +So far we have been talking about code coverage from the point of view of
> +maximizing test coverage while keeping our test matrix as small as possible.
> +While that is a noble goal, it's not the universal holy grail of testing.
> +Different use cases
> +have different considerations and scope. For a testing before the final release

testing before a final release

> +such testing is very desirable; however, for continuous integration or smoke
> +testing the main requirement is that feedback loops are as short as possible.
> +
> +When a developer changes the kernel and submits the changes to be merged,
> +it's desirable to run some tests. Again, the hard question is which tests. If
> +we run all possible tests in all possible combinations it may take a day or
> +two, and
> +the developer will move to a diffrent task before the tests have a chance to

typo diffrent -> different

> +finish. If you multiply that by the number of developers in the team you may end
> +up in a situation where a developer will retire before tests for his patch may
> +have chance to finish.

have had a chance to finish

> +
> +In this case careful selection of tests is even more important; less is more
> +in this context. One of the first ideas for CI is to skip tests that run for
> +more than a second or so; happily, this can be easily done with kirk. In the
> +future we may want to explore heuristics that would map code changes in the
> +kernel to a subset of tests, which would allow for very quick feedback.
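
Maybe spell out the kirk invocation readers should start from, e.g.:

    $ ./kirk --framework ltp --run-suite syscalls

and name the exact option that implements the one-second cut-off, since
that is the selling point here.
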
> +
> +
> +Debugging test failures
> +-----------------------
> +
> +You may think that you will enjoy some rest once you have your test matrix
> +ready and your tests are running. Unfortunately, that's where the actual work
> +starts. Debugging test failures is probably the hardest part of the testing
> +process. In some cases failures are easily reproducible and it's not that
> +hard to locate the bug, either in the test or in the kernel itself. There
> +are, however, quite common cases where the test failure reproduces in only
> +10% or even 1% of the test runs. That does not mean that there is no bug; that usually
> +means that the bug depends on more prerequisities that have to manifest at the

prerequisities -> prerequisites

> +right time in order to trigger the failure. Sadly, for modern systems, which
> +are asynchronous in nature, such bugs are more and more common.
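
For the 1% case it might help to show the obvious brute-force first step,
i.e. looping the test until it fails (the test name is just a placeholder):

    $ for i in $(seq 1000); do ./some_flaky_test || break; done

LTP test binaries exit non-zero on failure, so the loop stops at the first
failing run.
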
> +
> +The debugging process itself is not complicated at its nature. You have to

not complicated at its nature -> not complicated by nature

> +attempt to undestand the failure by checking the logs, reading the source code,

typo undestand -> understand

> +debugging with strace, gdb, etc. Then form a hypotesis and either prove or

hypotesis -> hypothesis

> +disprove it. Rinse and repeat until you end up with a clear description of what
> +went wrong. Hopefully you will manage to find the root cause, but you should
> +not be discouraged if you do not. Debugging kernel bugs takes a lot of
> +experience and skill; one could say as much as is needed to write the kernel
> +code.
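
An example of the first step would anchor this paragraph nicely, e.g.
capturing a syscall trace of the failing run (the log path is arbitrary):

    $ strace -f -o /tmp/failure.log ./some_flaky_test

-f follows forked children, which matters for tests that spawn processes.
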
> +
> +
> +Happy testing!


Thanks,
Alok


