Message ID | 20220930024320.29922-1-liulongfang@huawei.com |
---|---|
State | New |
Headers | show |
Series | crypto/hisilicon: Add null judgment to the callback interface | expand |
On Fri, Sep 30, 2022 at 10:43:20AM +0800, Longfang Liu wrote: > The algorithm acceleration function interface provided by the > current crypto subsystem is in asynchronous mode, but some users > will call it in synchronous mode, thus not providing a callback > interface for the "base.complete" of the request. Please give more details. Who is calling the callback functions in synchronous mode? Thanks,
On 2022/9/30 10:49, Herbert Xu wrote: > On Fri, Sep 30, 2022 at 10:43:20AM +0800, Longfang Liu wrote: >> The algorithm acceleration function interface provided by the >> current crypto subsystem is in asynchronous mode, but some users >> will call it in synchronous mode, thus not providing a callback >> interface for the "base.complete" of the request. > > Please give more details. Who is calling the callback functions > in synchronous mode? > > Thanks, > Hi, Herbert. Do you consider merging this patch? Thansk, Longfang.
On Fri, Sep 30, 2022 at 11:48:02AM +0800, liulongfang wrote: > > Even if the task is sent in synchronous mode, when using the hardware > driver, the hardware still informs the driver software through an > interrupt after completing the task, and the workqueue in the driver > software will call this callback function. > > And I found that the device drivers of other manufacturers under the > crypto subsystem are also in this asynchronous mode, and this problem > is also encountered when using the synchronous mode. This still makes no sense to me. Who is making an async request with no callback? Cheers,
On 2022/10/28 11:57, Herbert Xu wrote: > On Fri, Sep 30, 2022 at 11:48:02AM +0800, liulongfang wrote: >> >> Even if the task is sent in synchronous mode, when using the hardware >> driver, the hardware still informs the driver software through an >> interrupt after completing the task, and the workqueue in the driver >> software will call this callback function. >> >> And I found that the device drivers of other manufacturers under the >> crypto subsystem are also in this asynchronous mode, and this problem >> is also encountered when using the synchronous mode. > > This still makes no sense to me. > > Who is making an async request with no callback? > The context of the problem may not have been clearly stated in the previous description. This is a problem found in one of our real user scenarios: In this scenario of using an accelerator to perform encryption services, it was originally intended to use the CPU to perform encryption services in synchronous mode (without loading the hardware device driver, and without registering the asynchronous callback function), but due to a deployment error, the Hisi hardware device driver was loaded into the system, this wrong operation causes the encryption service to call the device driver of the hisi hardware, but the hardware device driver executes the asynchronous mode, so the callback interface is called after the service is completed. This leads to this system calltrace problem. The purpose of this patch is to ensure that the device does not appear calltrace in this abnormal situation. > Cheers, > Thanks, Longfang.
On Sat, Oct 29, 2022 at 09:25:18AM +0800, liulongfang wrote: > > The context of the problem may not have been clearly stated in the previous > description. > > This is a problem found in one of our real user scenarios: > In this scenario of using an accelerator to perform encryption services, > it was originally intended to use the CPU to perform encryption services > in synchronous mode (without loading the hardware device driver, and without > registering the asynchronous callback function), but due to a deployment > error, the Hisi hardware device driver was loaded into the system, > this wrong operation causes the encryption service to call the device > driver of the hisi hardware, but the hardware device driver executes the > asynchronous mode, so the callback interface is called after the service > is completed. > This leads to this system calltrace problem. > > The purpose of this patch is to ensure that the device does not appear > calltrace in this abnormal situation. I'm still having trouble understanding this. Please give an exact call-trace that triggers the callback with a NULL callback pointer. Thanks,
On 2022/11/4 17:08, Herbert Xu wrote: > On Sat, Oct 29, 2022 at 09:25:18AM +0800, liulongfang wrote: >> >> The context of the problem may not have been clearly stated in the previous >> description. >> >> This is a problem found in one of our real user scenarios: >> In this scenario of using an accelerator to perform encryption services, >> it was originally intended to use the CPU to perform encryption services >> in synchronous mode (without loading the hardware device driver, and without >> registering the asynchronous callback function), but due to a deployment >> error, the Hisi hardware device driver was loaded into the system, >> this wrong operation causes the encryption service to call the device >> driver of the hisi hardware, but the hardware device driver executes the >> asynchronous mode, so the callback interface is called after the service >> is completed. >> This leads to this system calltrace problem. >> >> The purpose of this patch is to ensure that the device does not appear >> calltrace in this abnormal situation. > > I'm still having trouble understanding this. Please give an > exact call-trace that triggers the callback with a NULL callback > pointer. > What do you need is the log of this call trace? Kernel log: Workqueue: 0000:76:00.0 qm_work_process [hisi_qm] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : 0x0 lr : sec_skcipher_callback+0x108/0x140 [hisi_sec2] sp : ffff80002cda3cb0 x29: ffff80002cda3cb0 x28: ffff0020a9a56080 x27: ffff2040076a4800 x26: 0000000000000000 x25: dead000000000100 x24: dead000000000122 x23: ffff0020881eb4d0 x22: ffff0020960c6f00 x21: ffff0020881eb4b0 x20: ffff0020881eb478 x19: ffff2040076a4880 x18: 000000000000001c x17: 00000000a3f246e1 x16: ffffb38bc5e73d40 x15: 000000006c63877d x14: 0000000000000000 x13: 0000000000000000 x12: ffff2040063b1dd0 x11: ffff2040063b1da8 x10: ffff2040063b1da8 x9 : ffffb38b8da71f78 x8 : ffff2040063b1dd0 x7 : 0000000000000000 x6 : ffff2040063b1fd0 x5 : 0000000000000228 x4 : 0000000000000000 x3 : ffff00209ba33000 x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff2040076a4820 Call trace: 0x0 sec_req_cb+0xc8/0x254 [hisi_sec2] qm_work_process+0x1d8/0x330 [hisi_qm] process_one_work+0x1e0/0x450 worker_thread+0x158/0x450 kthread+0x118/0x120 ret_from_fork+0x10/0x20 Code: bad PC value ---[ end trace 0000000000000000 ]--- > Thanks, > Thanks, Longfang.
On Mon, Nov 07, 2022 at 09:22:22PM +0800, liulongfang wrote:
.
> What do you need is the log of this call trace?
I mean the functions in the call trace starting from the one that
sets the callback to NULL.
Cheers
On 2022/11/8 17:59, Herbert Xu wrote: > On Mon, Nov 07, 2022 at 09:22:22PM +0800, liulongfang wrote: > . >> What do you need is the log of this call trace? > > I mean the functions in the call trace starting from the one that > sets the callback to NULL. > The trigger method is to not call the function skcipher_request_set_callback() when using the skcipher interface for encryption and decryption services. > Cheers > Thanks, Longfang.
On Wed, Nov 09, 2022 at 02:21:11PM +0800, liulongfang wrote: > > The trigger method is to not call the function skcipher_request_set_callback() > when using the skcipher interface for encryption and decryption services. Yes but which function exactly? Please give the exact call path leading to this crash. Cheers,
On 2022/11/9 17:18, Herbert Xu wrote: > On Wed, Nov 09, 2022 at 02:21:11PM +0800, liulongfang wrote: >> >> The trigger method is to not call the function skcipher_request_set_callback() >> when using the skcipher interface for encryption and decryption services. > > Yes but which function exactly? Please give the exact call path > leading to this crash. > This problem occurs in the application code of the encryption usage scenario (unfortunately, these codes are not open to the public and cannot be given to you), but its code logic is similar to test_skcipher_vec_cfg() in kernel\crypto\testmgr.c, and it is also in accordance with this call logic execution: crypto_alloc_skcipher()--->skcipher_request_alloc()--->crypto_skcipher_setkey()---> sg_init_table()--->skcipher_request_set_crypt()---> crypto_skcipher_encrypt()/crypto_skcipher_decrypt()---> skcipher_request_free()--->crypto_free_skcipher() Just don't add "wait" and skcipher_request_set_callback(), use it as a synchronous mode. Thanks Longfang. > Cheers, >
On Thu, Nov 10, 2022 at 10:03:53AM +0800, liulongfang wrote: . > This problem occurs in the application code of the encryption usage scenario > (unfortunately, these codes are not open to the public and cannot be given to you), Are you saying this requires out-of-tree kernel code to trigger? Then you should fix that out-of-tree code. Thanks,
On 2022/11/10 11:20, Herbert Xu wrote: > On Thu, Nov 10, 2022 at 10:03:53AM +0800, liulongfang wrote: > . >> This problem occurs in the application code of the encryption usage scenario >> (unfortunately, these codes are not open to the public and cannot be given to you), > > Are you saying this requires out-of-tree kernel code to trigger? > Yes, this problem is triggered by application layer code, but it happens on kernel driver code. > Then you should fix that out-of-tree code. > When using crypto's skcipher series interfaces for encryption and decryption services, User can use synchronous mode(by adjusting some skcipher interfaces, here is to remove skcipher_request_set_callback()) or asynchronous mode, but when using synchronous mode and the current asynchronous mode is loaded it will cause a calltrace. The current problem is that the interface of skcipher does not restrict users to call functions in this way for encryption services. If the current driver doesn't handle this, there is a possibility that some users deliberately create this kind of problem to cause the kernel to crash. > Thanks, >
On Thu, Nov 10, 2022 at 12:11:15PM +0800, liulongfang wrote: > > When using crypto's skcipher series interfaces for encryption and decryption > services, User can use synchronous mode(by adjusting some skcipher interfaces, > here is to remove skcipher_request_set_callback()) or asynchronous mode, > but when using synchronous mode and the current asynchronous mode is loaded > it will cause a calltrace. > > The current problem is that the interface of skcipher does not restrict users > to call functions in this way for encryption services. > > If the current driver doesn't handle this, there is a possibility that some users > deliberately create this kind of problem to cause the kernel to crash. It sounds like your code is misusing the skcipher API. By default skcipher is always async. You must always set a callback. The only way to legally use skcipher without setting a callback is by allocating it with crypto_alloc_sync_skcipher. In which case unless your driver incorrectly declares itself as sync instead of async, then it will never be used by such a user. Cheers,
On 2022/11/10 16:53, Herbert Xu wrote: > On Thu, Nov 10, 2022 at 12:11:15PM +0800, liulongfang wrote: >> >> When using crypto's skcipher series interfaces for encryption and decryption >> services, User can use synchronous mode(by adjusting some skcipher interfaces, >> here is to remove skcipher_request_set_callback()) or asynchronous mode, >> but when using synchronous mode and the current asynchronous mode is loaded >> it will cause a calltrace. >> >> The current problem is that the interface of skcipher does not restrict users >> to call functions in this way for encryption services. >> >> If the current driver doesn't handle this, there is a possibility that some users >> deliberately create this kind of problem to cause the kernel to crash. > > It sounds like your code is misusing the skcipher API. By default > skcipher is always async. You must always set a callback. > > The only way to legally use skcipher without setting a callback > is by allocating it with crypto_alloc_sync_skcipher. In which case OK! I found in Documentation/crypto/architecture.rst the description that async mode must provide a callback function. However, what is confusing is that this document does not describe the synchronization mode so clearly. > unless your driver incorrectly declares itself as sync instead of > async, then it will never be used by such a user. > > Cheers, > Thanks, Longfang.
diff --git a/drivers/crypto/hisilicon/hpre/hpre_crypto.c b/drivers/crypto/hisilicon/hpre/hpre_crypto.c index 3ba6f15deafc..5b09f4a72020 100644 --- a/drivers/crypto/hisilicon/hpre/hpre_crypto.c +++ b/drivers/crypto/hisilicon/hpre/hpre_crypto.c @@ -430,7 +430,8 @@ static void hpre_dh_cb(struct hpre_ctx *ctx, void *resp) atomic64_inc(&dfx[HPRE_OVER_THRHLD_CNT].value); hpre_hw_data_clr_all(ctx, req, areq->dst, areq->src); - kpp_request_complete(areq, ret); + if (areq->base.complete) + kpp_request_complete(areq, ret); atomic64_inc(&dfx[HPRE_RECV_CNT].value); } @@ -451,7 +452,8 @@ static void hpre_rsa_cb(struct hpre_ctx *ctx, void *resp) areq = req->areq.rsa; areq->dst_len = ctx->key_sz; hpre_hw_data_clr_all(ctx, req, areq->dst, areq->src); - akcipher_request_complete(areq, ret); + if (areq->base.complete) + akcipher_request_complete(areq, ret); atomic64_inc(&dfx[HPRE_RECV_CNT].value); } @@ -1460,7 +1462,8 @@ static void hpre_ecdh_cb(struct hpre_ctx *ctx, void *resp) memmove(p + curve_sz, p + areq->dst_len - curve_sz, curve_sz); hpre_ecdh_hw_data_clr_all(ctx, req, areq->dst, areq->src); - kpp_request_complete(areq, ret); + if (areq->base.complete) + kpp_request_complete(areq, ret); atomic64_inc(&dfx[HPRE_RECV_CNT].value); } @@ -1766,7 +1769,8 @@ static void hpre_curve25519_cb(struct hpre_ctx *ctx, void *resp) hpre_key_to_big_end(sg_virt(areq->dst), CURVE25519_KEY_SIZE); hpre_curve25519_hw_data_clr_all(ctx, req, areq->dst, areq->src); - kpp_request_complete(areq, ret); + if (areq->base.complete) + kpp_request_complete(areq, ret); atomic64_inc(&dfx[HPRE_RECV_CNT].value); } diff --git a/drivers/crypto/hisilicon/sec2/sec_crypto.c b/drivers/crypto/hisilicon/sec2/sec_crypto.c index 77c9f13cf69a..b9d74d3ac12c 100644 --- a/drivers/crypto/hisilicon/sec2/sec_crypto.c +++ b/drivers/crypto/hisilicon/sec2/sec_crypto.c @@ -1416,12 +1416,14 @@ static void sec_skcipher_callback(struct sec_ctx *ctx, struct sec_req *req, break; backlog_sk_req = backlog_req->c_req.sk_req; - backlog_sk_req->base.complete(&backlog_sk_req->base, + if (backlog_sk_req->base.complete) + backlog_sk_req->base.complete(&backlog_sk_req->base, -EINPROGRESS); atomic64_inc(&ctx->sec->debug.dfx.recv_busy_cnt); } - sk_req->base.complete(&sk_req->base, err); + if (sk_req->base.complete) + sk_req->base.complete(&sk_req->base, err); } static void set_aead_auth_iv(struct sec_ctx *ctx, struct sec_req *req) @@ -1694,12 +1696,14 @@ static void sec_aead_callback(struct sec_ctx *c, struct sec_req *req, int err) break; backlog_aead_req = backlog_req->aead_req.aead_req; - backlog_aead_req->base.complete(&backlog_aead_req->base, + if (backlog_aead_req->base.complete) + backlog_aead_req->base.complete(&backlog_aead_req->base, -EINPROGRESS); atomic64_inc(&c->sec->debug.dfx.recv_busy_cnt); } - a_req->base.complete(&a_req->base, err); + if (a_req->base.complete) + a_req->base.complete(&a_req->base, err); } static void sec_request_uninit(struct sec_ctx *ctx, struct sec_req *req)
The algorithm acceleration function interface provided by the current crypto subsystem is in asynchronous mode, but some users will call it in synchronous mode, thus not providing a callback interface for the "base.complete" of the request. In this usage scenario, there is no problem in using software to complete encryption and decryption operations. But if the hardware driver is loaded, a kernel calltrace error will be generated because the "base.complete" callback interface is empty. Kernel log: pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : 0x0 lr : sec_skcipher_callback+0x108/0x140 [hisi_sec2] sp : ffff80002cda3cb0 x29: ffff80002cda3cb0 x28: ffff0020a9a56080 x27: ffff2040076a4800 x26: 0000000000000000 x25: dead000000000100 x24: dead000000000122 x23: ffff0020881eb4d0 x22: ffff0020960c6f00 x21: ffff0020881eb4b0 x20: ffff0020881eb478 x19: ffff2040076a4880 x18: 000000000000001c x17: 00000000a3f246e1 x16: ffffb38bc5e73d40 x15: 000000006c63877d x14: 0000000000000000 x13: 0000000000000000 x12: ffff2040063b1dd0 x11: ffff2040063b1da8 x10: ffff2040063b1da8 x9 : ffffb38b8da71f78 x8 : ffff2040063b1dd0 x7 : 0000000000000000 x6 : ffff2040063b1fd0 x5 : 0000000000000228 x4 : 0000000000000000 x3 : ffff00209ba33000 x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff2040076a4820 Call trace: 0x0 sec_req_cb+0xc8/0x254 [hisi_sec2] qm_work_process+0x1d8/0x330 [hisi_qm] process_one_work+0x1e0/0x450 worker_thread+0x158/0x450 kthread+0x118/0x120 ret_from_fork+0x10/0x20 Code: bad PC value ---[ end trace 0000000000000000 ]--- In order to prevent the occurrence of kernel errors in this scenario, it is necessary to add a judgment on whether the "base.complete" interface of the hardware device driver is empty. Signed-off-by: Longfang Liu <liulongfang@huawei.com> --- drivers/crypto/hisilicon/hpre/hpre_crypto.c | 12 ++++++++---- drivers/crypto/hisilicon/sec2/sec_crypto.c | 12 ++++++++---- 2 files changed, 16 insertions(+), 8 deletions(-)