[LTP] [PATCH] fzsync: skip test when avaliable CPUs less than 2

Mon Nov 30 15:16:22 CET 2020

On 30. 11. 20 10:01, Richard Palethorpe wrote:
> Hello,
> 
> Li Wang <liwang@redhat.com> writes:
> 
>> Hi Joerg,
>>
>> On Mon, Nov 30, 2020 at 3:53 PM Joerg Vehlow <lkml@jv-coder.de> wrote:
>>
>>> Hi,
>>>>> No, af_alg07 requires 2 CPUs, otherwise it'll report false positives.
>>>>> The test will pass only if fchownat() hits a half-closed socket and
>>>>> returns error. But IIRC the half-closed socket will be destroyed during
>>>>> reschedule which means there's no race window to hit anymore. But it
>>>>> would be better to put the TCONF condition into the test itself.
>>>> Interesting, I wonder if this is also true for the real-time kernel with
>>>> the threads set to RT priority?
>>> It looks like the test can fail even with more than one CPU. I've seen
>>> this sporadic failure on different hardware with more than two cores, at
>>> least on Intel Denverton (x86_64) and Renesas R-Car (aarch64) systems.
>>> Both ran kernel 4.19 with the fix included; the Denverton system had the
>>> RT patches applied and the R-Car did not. The test passes most of
>>> the time, but sometimes fails with the message Li posted.
>>>
>>> It also seems to fail sporadically on other systems as well:
>>> https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1892860
>>>
>>> Additionally I tested on qemu-x86 with 4.19 with and without rt patches.
>>> The test succeeds even with only one virtualized cpu. So either Martin's
>>> assumption is wrong or it holds only for newer kernel versions?
>>>
>>
>> No, Martin is not wrong, and you are also right.
>>
>> These are two entirely different issues with af_alg07. The 1-CPU case
>> should be fixed with TCONF, but the failure on aarch64 looks more like a
>> hardware issue. Chunyu has a draft patch that adds an initial delay value
>> for such systems.
>>
>> Can you try this on your aarch64 platform?
>> -----------------------------
>> fzsync can't get a random delay range on hpe-moonshot systems, so it runs
>> with delay=0 during all the tests. This is probably a hardware issue, such
>> as the cache line design, that prevents reaching a stable state while the
>> critical section executes. Provide an empirical delay value on hpe-moonshot
>> so the test hits the race window immediately without exceeding the sample
>> count.
>>
>> ---
>>  testcases/kernel/crypto/af_alg07.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/testcases/kernel/crypto/af_alg07.c b/testcases/kernel/crypto/af_alg07.c
>> index 6ad86f4f3..24f5b8088 100644
>> --- a/testcases/kernel/crypto/af_alg07.c
>> +++ b/testcases/kernel/crypto/af_alg07.c
>> @@ -47,6 +47,7 @@ static void setup(void)
>>   fd = SAFE_OPEN("tmpfile", O_RDWR | O_CREAT, 0644);
>>
>>   tst_fzsync_pair_init(&fzsync_pair);
>> + fzsync_pair.delay_bias = 700;
> 
> I hope there is some way to set this dynamically. Similar to
> CVE-2016-7117.
> 
> If we know that we should get some particular error we could modify the
> bias until the error happens.

There are three possible outcomes of the race:
1) fchownat() returns 0 => fchownat() was called too early or the kernel
   is vulnerable; you can adjust the bias here
2) fchownat() fails with ENOENT => kernel is fixed, print TPASS and exit
3) fchownat() fails with EBADF => fchownat() was called too late; you
   can adjust the bias here

IIRC I didn't play with bias in this test because on x86_64 it passes
almost instantly on a fixed kernel. Feel free to add dynamic bias
adjustment for ARM.
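
A rough sketch of what such an adjustment could look like, based only on
the three outcomes listed above. This is not the LTP implementation: the
names race_outcome, classify_race() and adjust_bias() are hypothetical,
and the sign convention (a larger bias makes fchownat() run later) is an
assumption that may need to be swapped for the real fzsync delay_bias.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper types, not part of LTP or tst_fuzzy_sync.h. */
enum race_outcome {
	RACE_TOO_EARLY,   /* fchownat() returned 0 (or kernel is vulnerable) */
	RACE_HIT,         /* fchownat() failed with ENOENT: kernel is fixed */
	RACE_TOO_LATE,    /* fchownat() failed with EBADF */
	RACE_UNEXPECTED,  /* any other errno: leave the bias alone */
};

/* Map the fchownat() return value and errno to one of the outcomes. */
static enum race_outcome classify_race(int ret, int err)
{
	if (ret == 0)
		return RACE_TOO_EARLY;
	if (err == ENOENT)
		return RACE_HIT;
	if (err == EBADF)
		return RACE_TOO_LATE;
	return RACE_UNEXPECTED;
}

/*
 * Return the bias to use for the next iteration.
 * ASSUMPTION: increasing the bias delays fchownat() relative to the
 * other thread. If the real convention is the opposite, swap the signs.
 */
static int adjust_bias(int bias, enum race_outcome outcome, int step)
{
	assert(step > 0);

	switch (outcome) {
	case RACE_TOO_EARLY:
		return bias + step;   /* called too early: push it later */
	case RACE_TOO_LATE:
		return bias - step;   /* called too late: pull it earlier */
	default:
		return bias;          /* hit or unexpected: keep the bias */
	}
}
```

In the test loop the result would be written back to the bias after each
fchownat() call, e.g. bias = adjust_bias(bias, classify_race(ret, errno), 1).
Since a return value of 0 can also mean the kernel is vulnerable, the loop
still needs an upper bound on iterations before reporting a failure.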

-- 
Martin Doucha   mdoucha@suse.cz
QA Engineer for Software Maintenance
SUSE LINUX, s.r.o.
CORSO IIa
Krizikova 148/34
186 00 Prague 8
Czech Republic