[LTP] LTP test df01.sh detected different size of loop device in v5.19
Eric Sandeen
sandeen@sandeen.net
Mon Aug 15 22:09:41 CEST 2022
On 8/15/22 4:31 AM, Petr Vorel wrote:
> Hi Dave,
>
>> On Fri, Aug 12, 2022 at 03:20:37PM +0200, Petr Vorel wrote:
>>> Hi all,
>
>>> LTP test df01.sh found different size of loop device in v5.19.
>>> Test uses loop device formatted on various file systems, only XFS fails.
>>> It randomly fails during verifying that loop size usage changes:
>
>>> grep ${TST_DEVICE} output | grep -q "${total}.*${used}" [1]
>
>>> How to reproduce:
>>> # PATH="/opt/ltp/testcases/bin:$PATH" df01.sh -f xfs # it needs several tries to hit
>
>>> df saved output:
>>> Filesystem 1024-blocks Used Available Capacity Mounted on
>>> ...
>>> /dev/loop0 256672 16208 240464 7% /tmp/LTP_df01.1kRwoUCCR7/mntpoint
>>> df output:
>>> Filesystem 1024-blocks Used Available Capacity Mounted on
>>> ...
>>> tmpfs 201780 0 201780 0% /run/user/0
>>> /dev/loop0 256672 15160 241512 6% /tmp/LTP_df01.1kRwoUCCR7/mntpoint
>>> => different size
>>> df01 4 TFAIL: 'df -k -P' failed, not expected.
>
>> Yup, most likely because we changed something in XFS related to
>> internal block reservation spaces. That is, the test is making
>> fundamentally flawed assumptions about filesystem used space
>> accounting.
>
>> It is wrong to assuming that the available capacity of a given empty
>> filesystem will never change. Assumptions like this have been
>> invalid for decades because the available space can change based on
>> the underlying configuration or the filesystem. e.g. different
>> versions of mkfs.xfs set different default parameters and so simply
>> changing the version of xfsprogs you use between the two comparision
>> tests will make it fail....
>
>> And, well, XFS also has XFS_IOC_{GS}ET_RESBLKS ioctls that allow
>> userspace to change the amount of reserved blocks. They were
>> introduced in 1997, and since then we've changed the default
>> reservation the filesystem takes at least a dozen times.
>
> Thanks a lot for valuable info.
>
>>>> It might be a false positive / bug in the test, but it's at least a changed behavior.
>
>> Yup, any test that assumes "available space" does not change from
>> kernel version to kernel version is flawed. There is no guarantee
>> that this ever stays the same, nor that it needs to stay the same.
> I'm sorry I was not clear. Test [1] does not measure "available space" between
> kernel releases. It just run df command with parameters, saves it's output
> and compares "1024-blocks" and "Used" columns of df output with stat output:
Annotating what these do...
> local total=$(stat -f mntpoint --printf=%b) # number of blocks allocated
> local free=$(stat -f mntpoint --printf=%f) # free blocks in filesystem
> local used=$((total-free)) # (number of blocks free)
>
> local bsize=$(stat -f mntpoint --printf=%s) # block size ("for faster transfers")
> total=$((($total * $bsize + 512)/ 1024)) # number of 1k blocks allocated?
> used=$((($used * $bsize + 512) / 1024)) # number of 1k blocks used?
>
> And comparison with "$used" is what sometimes fails.
>
> BTW this happens on both distros when loop device is on tmpfs. I'm trying to
> trigger it on ext4 and btrfs, not successful so far. Looks like it's tmpfs
> related.
>
> If that's really expected, we might remove this check for used for XFS
> (not sure if check only for total makes sense).
It's kind of hard to follow this test, but it seems to be trying to
ensure that df output is consistent with du (statfs) numbers, before
and after creating and removing a 1MB file. I guess it's literally
testing df itself, i.e. that it actually presents the numbers it obtained
from statfs.
AFAICT the difference in the failure is 1024 1K blocks, which is the size
the file that's been created and removed during the test.
My best guess is that this is xfs inode deferred inode inactivation hanging
onto the space a little longer than the test expects.
This is probably because the new-ish xfs inode inactivation no longer blocks
on inode garbage collection during statfs().
IOWS, I think the test expects that free space is reflected in statfs numbers
immediately after a file is removed, and that's no longer the case here. They
change in between the df check and the statfs check.
(The test isn't just checking that the values are correct, it is checking that
the values are /immediately/ correct.)
Putting a "sleep 1" after the "rm -f" in the test seems to fix it; IIRC
the max time to wait for inodegc is 1s. This does slow the test down a bit.
-Eric
More information about the ltp
mailing list