[LTP] [PATCH] crypto: api - Fix generic algorithm self-test races

Eric Biggers ebiggers@kernel.org
Sun Oct 6 00:24:48 CEST 2024


On Tue, Sep 03, 2024 at 07:07:38AM +0800, Herbert Xu wrote:
> On Mon, Sep 02, 2024 at 10:05:54AM -0700, Eric Biggers wrote:
> >
> > With both this patch "crypto: api - Fix generic algorithm self-test races" and
> > your other patch "crypto: algboss - Pass instance creation error up" applied,
> > I'm still getting errors occasionally, e.g.:
> > 
> >     [    5.155587] alg: skcipher: failed to allocate transform for cbc(sm4-generic): -2
> >     [    5.155954] alg: self-tests for cbc(sm4) using cbc(sm4-generic) failed (rc=-2)
> >     [    5.372511] alg: aead: failed to allocate transform for gcm_base(ctr(aes-generic),ghash-generic): -2
> >     [    5.372861] alg: self-tests for gcm(aes) using gcm_base(ctr(aes-generic),ghash-generic) failed (rc=-2)
> > 
> > I can't follow your explanation of what is going on here and what the fix is.
> > Would it make any sense to just revert the commits that introduced this problem?
> 
> As I said earlier, these errors are expected.  What's happening
> is this:
> 
> __ecb-sm4-aesni-avx gets registered (but not tested)
> 
> cbc(sm4-generic) gets registered (but not tested)
> 
> __ecb-sm4-aesni-avx finishes testing
> 	with lskcipher this is equivalent to crypto_cipher sm4
> 	so it triggers the destruction of all instances of sm4
> 
> cbc(sm4-generic) gets marked as dead
> 
> cbc(sm4-generic) fails self-test because it's already dead (ENOENT)
> 
> It's harmless because whatever that is asking for cbc(sm4-generic)
> (in this case it's the extra-test mechanism) will simply retry the
> allocation which will then succeed.
> 
> I will send a patch to disable the warning when allocating X returns
> ENOENT while we're testing X itself.  This can always happen if X
> gets killed for the reason mentioned above and it's perfectly harmless.
> 
> It's just that the race window was tiny previously because testing
> occurred immediately after registration.  But now we've magnified
> that window many times with asynchronous testing.
> 

The tests are still failing on upstream:

[    0.343845] alg: self-tests for rfc4106(gcm(aes)) using rfc4106(gcm_base(ctr(aes-generic),ghash-generic)) failed (rc=-2)

To me it still seems like commit 37da5d0ffa7b ("crypto: api - Do not wait for
tests during registration") is just broken and should be reverted.

Besides the test failures, it looks like there's no longer any guarantee that
algorithms are actually available now that their module is loaded.

E.g. consider if someone does 'modprobe aesni-intel' and then immediately
creates a dm-crypt device.  Now it sounds like the AES-NI algorithms might not
have finished being tested yet and the generic algorithms can be used instead,
resulting in a performance regression.

I understand that you want to try to fix the edge cases in "fallback" ciphers.
But "fallback" ciphers have always seemed like a bad design due to how they use
the crypto API recursively.  I think the algorithms that use them should
generally be migrated off of them, e.g. as I did in commit f235bc11cc95
("crypto: arm/aes-neonbs - go back to using aes-arm directly").  That fixed the
problem in aes-neonbs that seems to have triggered this work in the first place.

- Eric


More information about the ltp mailing list