Message ID | 0171515-7267-624-5a22-238af829698f@redhat.com |
---|---|
State | New |
Headers | show |
Series | [RFC] crypto: de-prioritize QAT | expand |
On Mon, Oct 16, 2023 at 01:26:47PM +0200, Mikulas Patocka wrote: > Hi > > I created this kernel module that stress-tests the crypto API: > https://people.redhat.com/~mpatocka/benchmarks/qat/tools/module-multithreaded.c > > It shows that QAT underperforms significantly compared to AES-NI (for > large requests it is 10 times slower; for small requests it is even worse) > - see the second table in this document: > https://people.redhat.com/~mpatocka/benchmarks/qat/kernel-module.txt > > QAT has higher priority than AES-NI, so the kernel prefers it (it is not > used for dm-crypt because it has the flag "CRYPTO_ALG_ALLOCATES_MEMORY", > but it is preferred over AES-NI in other cases). Probably you can get better performance by modifying your configuration and test.
On Fri, 20 Oct 2023, Giovanni Cabiddu wrote: > On Mon, Oct 16, 2023 at 01:26:47PM +0200, Mikulas Patocka wrote: > > Hi > > > > I created this kernel module that stress-tests the crypto API: > > https://people.redhat.com/~mpatocka/benchmarks/qat/tools/module-multithreaded.c > > > > It shows that QAT underperforms significantly compared to AES-NI (for > > large requests it is 10 times slower; for small requests it is even worse) > > - see the second table in this document: > > https://people.redhat.com/~mpatocka/benchmarks/qat/kernel-module.txt > > > > QAT has higher priority than AES-NI, so the kernel prefers it (it is not > > used for dm-crypt because it has the flag "CRYPTO_ALG_ALLOCATES_MEMORY", > > but it is preferred over AES-NI in other cases). > Probably you can get better performance by modifying your configuration > and test. > >From your test application I can infer that you are using a single QAT > device. The driver allocates a ring pair per TFM and it loads balances > allocations between devices. > In addition, jobs are submitted synchronously. This way the cost of > offload is not amortised between requests. > > Regards, > > -- > Giovanni Yes, I thought about using more TFMs, but I don't have access to the dual Xeon machine anymore (I would have to request access and wait until people who are using it release it). We can run more tests, but it would be best to batch all the tests in a small timeframe, to minimize blocking the machine for other people. Regarding synchronous submission - the current test submits 112 requests concurrently. Do you think that it is too small and we should submit more? What would be the appropriate number of requests to submit concurrently? I did synchronous submission because it was easier to write it than asynchronous submission, but if you think that 112 requests is too small and we need to submit more, I can try to do it. Mikulas
Index: linux-2.6/drivers/crypto/intel/qat/qat_common/qat_algs.c =================================================================== --- linux-2.6.orig/drivers/crypto/intel/qat/qat_common/qat_algs.c +++ linux-2.6/drivers/crypto/intel/qat/qat_common/qat_algs.c @@ -1277,7 +1277,7 @@ static struct aead_alg qat_aeads[] = { { .base = { .cra_name = "authenc(hmac(sha1),cbc(aes))", .cra_driver_name = "qat_aes_cbc_hmac_sha1", - .cra_priority = 4001, + .cra_priority = 201, .cra_flags = CRYPTO_ALG_ASYNC | CRYPTO_ALG_ALLOCATES_MEMORY, .cra_blocksize = AES_BLOCK_SIZE, .cra_ctxsize = sizeof(struct qat_alg_aead_ctx), @@ -1294,7 +1294,7 @@ static struct aead_alg qat_aeads[] = { { .base = { .cra_name = "authenc(hmac(sha256),cbc(aes))", .cra_driver_name = "qat_aes_cbc_hmac_sha256", - .cra_priority = 4001, + .cra_priority = 201, .cra_flags = CRYPTO_ALG_ASYNC | CRYPTO_ALG_ALLOCATES_MEMORY, .cra_blocksize = AES_BLOCK_SIZE, .cra_ctxsize = sizeof(struct qat_alg_aead_ctx), @@ -1311,7 +1311,7 @@ static struct aead_alg qat_aeads[] = { { .base = { .cra_name = "authenc(hmac(sha512),cbc(aes))", .cra_driver_name = "qat_aes_cbc_hmac_sha512", - .cra_priority = 4001, + .cra_priority = 201, .cra_flags = CRYPTO_ALG_ASYNC | CRYPTO_ALG_ALLOCATES_MEMORY, .cra_blocksize = AES_BLOCK_SIZE, .cra_ctxsize = sizeof(struct qat_alg_aead_ctx), @@ -1329,7 +1329,7 @@ static struct aead_alg qat_aeads[] = { { static struct skcipher_alg qat_skciphers[] = { { .base.cra_name = "cbc(aes)", .base.cra_driver_name = "qat_aes_cbc", - .base.cra_priority = 4001, + .base.cra_priority = 201, .base.cra_flags = CRYPTO_ALG_ASYNC | CRYPTO_ALG_ALLOCATES_MEMORY, .base.cra_blocksize = AES_BLOCK_SIZE, .base.cra_ctxsize = sizeof(struct qat_alg_skcipher_ctx), @@ -1347,7 +1347,7 @@ static struct skcipher_alg qat_skciphers }, { .base.cra_name = "ctr(aes)", .base.cra_driver_name = "qat_aes_ctr", - .base.cra_priority = 4001, + .base.cra_priority = 201, .base.cra_flags = CRYPTO_ALG_ASYNC | CRYPTO_ALG_ALLOCATES_MEMORY, .base.cra_blocksize = 1, .base.cra_ctxsize = sizeof(struct qat_alg_skcipher_ctx), @@ -1365,7 +1365,7 @@ static struct skcipher_alg qat_skciphers }, { .base.cra_name = "xts(aes)", .base.cra_driver_name = "qat_aes_xts", - .base.cra_priority = 4001, + .base.cra_priority = 201, .base.cra_flags = CRYPTO_ALG_ASYNC | CRYPTO_ALG_NEED_FALLBACK | CRYPTO_ALG_ALLOCATES_MEMORY, .base.cra_blocksize = AES_BLOCK_SIZE,
Hi I created this kernel module that stress-tests the crypto API: https://people.redhat.com/~mpatocka/benchmarks/qat/tools/module-multithreaded.c It shows that QAT underperforms significantly compared to AES-NI (for large requests it is 10 times slower; for small requests it is even worse) - see the second table in this document: https://people.redhat.com/~mpatocka/benchmarks/qat/kernel-module.txt QAT has higher priority than AES-NI, so the kernel prefers it (it is not used for dm-crypt because it has the flag "CRYPTO_ALG_ALLOCATES_MEMORY", but it is preferred over AES-NI in other cases). AES-NI has priority 300 or 400, unaccelerated AES has 100, so I suggest to lower the priority of QAT from 4001 to 201. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> --- drivers/crypto/intel/qat/qat_common/qat_algs.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)