mbox series

[v2,00/11] Multibuffer hashing take two

Message ID cover.1739674648.git.herbert@gondor.apana.org.au
Headers show
Series Multibuffer hashing take two | expand

Message

Herbert Xu Feb. 16, 2025, 3:07 a.m. UTC
This patch-set introduces two additions to the ahash interface.
First of all request chaining is added so that an arbitrary number
of requests can be submitted in one go.  Incidentally this also
reduces the cost of indirect calls by amortisation.

It then adds virtual address support to ahash.  This allows the
user to supply a virtual address as the input instead of an SG
list.

This is assumed to be not DMA-capable so it is always copied
before it's passed to an existing ahash driver.  New drivers can
elect to take virtual addresses directly.  Of course existing shash
algorithms are able to take virtual addresses without any copying.

The next patch resurrects the old SHA2 AVX2 muiltibuffer code as
a proof of concept that this API works.  The result shows that with
a full complement of 8 requests, this API is able to achieve parity
with the more modern but single-threaded SHA-NI code.  This passes
the multibuffer fuzz tests.

Finally introduce a sync hash interface that is similar to the sync
skcipher interface.  This will replace the shash interface for users.
Use it in fsverity and enable multibuffer hashing.

Eric Biggers (1):
  fsverity: improve performance by using multibuffer hashing

Herbert Xu (10):
  crypto: ahash - Only save callback and data in ahash_save_req
  crypto: x86/ghash - Use proper helpers to clone request
  crypto: hash - Add request chaining API
  crypto: tcrypt - Restore multibuffer ahash tests
  crypto: ahash - Add virtual address support
  crypto: ahash - Set default reqsize from ahash_alg
  crypto: testmgr - Add multibuffer hash testing
  crypto: x86/sha2 - Restore multibuffer AVX2 support
  crypto: hash - Add sync hash interface
  fsverity: Use sync hash instead of shash

 arch/x86/crypto/Makefile                   |   2 +-
 arch/x86/crypto/ghash-clmulni-intel_glue.c |  23 +-
 arch/x86/crypto/sha256_mb_mgr_datastruct.S | 304 +++++++++++
 arch/x86/crypto/sha256_ssse3_glue.c        | 540 ++++++++++++++++--
 arch/x86/crypto/sha256_x8_avx2.S           | 598 ++++++++++++++++++++
 crypto/ahash.c                             | 605 ++++++++++++++++++---
 crypto/algapi.c                            |   2 +-
 crypto/tcrypt.c                            | 231 ++++++++
 crypto/testmgr.c                           | 132 ++++-
 fs/verity/fsverity_private.h               |   4 +-
 fs/verity/hash_algs.c                      |  41 +-
 fs/verity/verify.c                         | 179 +++++-
 include/crypto/algapi.h                    |  11 +
 include/crypto/hash.h                      | 172 +++++-
 include/crypto/internal/hash.h             |  17 +-
 include/linux/crypto.h                     |  24 +
 16 files changed, 2659 insertions(+), 226 deletions(-)
 create mode 100644 arch/x86/crypto/sha256_mb_mgr_datastruct.S
 create mode 100644 arch/x86/crypto/sha256_x8_avx2.S

Comments

Eric Biggers Feb. 18, 2025, 5:48 p.m. UTC | #1
On Tue, Feb 18, 2025 at 06:10:36PM +0800, Herbert Xu wrote:
> On Sun, Feb 16, 2025 at 11:51:29AM -0800, Eric Biggers wrote:
> >
> > But of course, there is no need to go there in the first place.  Cryptographic
> > APIs should be simple and not include unnecessary edge cases.  It seems you
> > still have a misconception that your more complex API would make my work useful
> > for IPsec, but again that is still incorrect, as I've explained many times.  The
> > latest bogus claims that you've been making, like that GHASH is not
> > parallelizable, don't exactly inspire confidence either.
> 
> Sure, everyone hates complexity.  But you're not removing it.

I'm avoiding adding it in the first place.

> You're simply pushing the complexity into the algorithm implementation
> and more importantly, the user.  With your interface the user has to
> jump through unnecessary hoops to get multiple requests going, which
> is probably why you limited it to just 2.
> 
> If anything we should be pushing the complexity into the API code
> itself and away from the algorithm implementation.  Why? Because
> it's shared and therefore the test coverage works much better.
> 
> Look over the years at how many buggy edge cases such as block
> left-overs we have had in arch crypto code.  Now if those edge
> cases were moved into shared API code it would be much better.
> Sure it could still be buggy, but it would affect everyone
> equally and that means it's much easier to catch.

You cannot ignore complexity in the API, as that is the worst kind.

In addition, your (slower) solution has a large amount of complexity in the
per-algorithm glue code, making it still more lines of code *per algorithm* than
my (faster) solution, which you're ignoring.

Also, users still have to queue up multiple requests anyway.  There are no
"unnecessary hoops" with my patches -- just a faster, simpler, easier to use and
less error-prone API.

>     Memory allocations can always fail, but they *rarely* do.  Resolve
>     the OOM case by using a stack request as a fallback.

Rarely executed fallbacks that are only executed in extremely rare OOM
situations that won't be covered by xfstests?  No thank you.  Why would you even
think that would be reasonable?

Anyway, I am getting tired of responding to all your weird arguments that don't
bring anything new to the table.  Please continue to treat your patches as
nacked and don't treat silence as agreement.  I am just tired of this.

- Eric