Message ID | 20211119195743.2817-14-bvanassche@acm.org |
---|---|
State | New |
Series | UFS patches for kernel v5.17 |
On 11/30/21 12:54 AM, Bean Huo wrote:
> The concern with this patch is that it reduces the UFS data transfer
> queue depth. The cost is a bit high. We are looking for alternative
> methods: for example, to fix this problem from the SCSI layer, or to
> add a new dedicated hardware device management queue on the UFS device
> side.

Are any performance numbers available? My performance measurements did
not show a significant difference between queue depths of 31 and 32.

Thanks,

Bart.
On Tue, 2021-11-30 at 11:32 -0800, Bart Van Assche wrote:
> On 11/30/21 12:54 AM, Bean Huo wrote:
> > We are looking for alternative methods: for example, to fix this
> > problem from the SCSI layer, or to add a new dedicated hardware
> > device management queue on the UFS device side.
>
> Hi Bean,
>
> I don't think that there are any alternatives. If all host controller
> tags are in use, it is not possible to implement a mechanism in
> software to create an additional tag. Additionally, stealing a tag is
> not possible since no new request can be submitted to a UFS controller
> if all tags are in use.
>
> Thanks,
>
> Bart.

Bart,

How about adding a hardware-specific device management queue to the UFS
device? NVMe and eMMC CMDQ both have dedicated resources for device
management commands: NVMe uses the admin queue (queue 0) and eMMC CMDQ
uses slot 31.

I'm thinking about the benefits of such a device management queue. If the
benefits are significant, we can submit a JEDEC proposal for a UFS device
management queue.

Kind regards,
Bean
On 12/1/21 5:44 AM, Bean Huo wrote:
> How about adding a hardware-specific device management queue to the UFS
> device? NVMe and eMMC CMDQ both have dedicated resources for device
> management commands: NVMe uses the admin queue (queue 0) and eMMC CMDQ
> uses slot 31.
>
> I'm thinking about the benefits of such a device management queue. If
> the benefits are significant, we can submit a JEDEC proposal for a UFS
> device management queue.

Hi Bean,

Do you want to modify UFSHCI 4.0 or UFSHCI 3.0? As you may know, UFSHCI
4.0 will support multiple queues. The proposal is called multi-circular
queue (MCQ). Since UFSHCI 4.0 has not yet been ratified, the draft is only
available to JEDEC members.

For UFSHCI 3.0, how about increasing the number of tags instead of adding
an additional queue? Increasing the number of tags will probably be easier
to implement in existing UFS host controllers than adding a new queue.

Thanks,

Bart.
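The thread above contrasts two ways of guaranteeing forward progress for device management commands: a dedicated queue with its own tags (Bean's proposal, analogous to NVMe's admin queue and eMMC CMDQ slot 31) versus reserving a tag in the existing shared pool (the patch below). The following is only a conceptual user-space C sketch of the dedicated-queue idea, using hypothetical names (`hw_queue`, `submit`) rather than UFS or kernel APIs:

```c
#include <stdio.h>

/* Hypothetical model: each hardware queue owns its own slots, so
 * exhausting the I/O queue cannot block the management queue.
 */
struct hw_queue {
	const char *name;
	int depth;
	int in_flight;
};

static int submit(struct hw_queue *q)
{
	if (q->in_flight == q->depth)
		return -1;	/* queue full, caller must wait */
	q->in_flight++;
	return 0;
}

int main(void)
{
	struct hw_queue io_q   = { .name = "I/O",        .depth = 32 };
	struct hw_queue mgmt_q = { .name = "management", .depth = 1 };

	/* Saturate the I/O queue, as in the reported deadlock. */
	while (submit(&io_q) == 0)
		;

	/* A device management command still has a slot of its own. */
	printf("management submit: %s\n",
	       submit(&mgmt_q) == 0 ? "ok" : "blocked");
	return 0;
}
```

The trade-off, as the thread notes, is that a dedicated management queue requires new controller hardware and a JEDEC specification change, while the reserved-tag approach in this patch works with existing UFSHCI 3.0 controllers.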
```diff
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index a241ef6bbc6f..03f4772fc2e2 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -128,8 +128,9 @@ EXPORT_SYMBOL_GPL(ufshcd_dump_regs);
 enum {
 	UFSHCD_MAX_CHANNEL	= 0,
 	UFSHCD_MAX_ID		= 1,
-	UFSHCD_CMD_PER_LUN	= 32,
-	UFSHCD_CAN_QUEUE	= 32,
+	UFSHCD_NUM_RESERVED	= 1,
+	UFSHCD_CMD_PER_LUN	= 32 - UFSHCD_NUM_RESERVED,
+	UFSHCD_CAN_QUEUE	= 32 - UFSHCD_NUM_RESERVED,
 };
 
 static const char *const ufshcd_state_name[] = {
@@ -2941,12 +2942,7 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
 
 	down_read(&hba->clk_scaling_lock);
 
-	/*
-	 * Get free slot, sleep if slots are unavailable.
-	 * Even though we use wait_event() which sleeps indefinitely,
-	 * the maximum wait time is bounded by SCSI request timeout.
-	 */
-	scmd = scsi_get_internal_cmd(q, DMA_TO_DEVICE, 0);
+	scmd = scsi_get_internal_cmd(q, DMA_TO_DEVICE, BLK_MQ_REQ_RESERVED);
 	if (IS_ERR(scmd)) {
 		err = PTR_ERR(scmd);
 		goto out_unlock;
@@ -8171,6 +8167,7 @@ static struct scsi_host_template ufshcd_driver_template = {
 	.sg_tablesize		= SG_ALL,
 	.cmd_per_lun		= UFSHCD_CMD_PER_LUN,
 	.can_queue		= UFSHCD_CAN_QUEUE,
+	.reserved_tags		= UFSHCD_NUM_RESERVED,
 	.max_segment_size	= PRDT_DATA_BYTE_COUNT_MAX,
 	.max_host_blocked	= 1,
 	.track_queue_depth	= 1,
@@ -9531,8 +9528,8 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
 	/* Configure LRB */
 	ufshcd_host_memory_configure(hba);
 
-	host->can_queue = hba->nutrs;
-	host->cmd_per_lun = hba->nutrs;
+	host->can_queue = hba->nutrs - UFSHCD_NUM_RESERVED;
+	host->cmd_per_lun = hba->nutrs - UFSHCD_NUM_RESERVED;
 	host->max_id = UFSHCD_MAX_ID;
 	host->max_lun = UFS_MAX_LUNS;
 	host->max_channel = UFSHCD_MAX_CHANNEL;
```
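For readers unfamiliar with the blk-mq mechanism the patch relies on: reserved tags are kept out of the regular allocation path, and only allocations that pass BLK_MQ_REQ_RESERVED may take them, so a device management command can still obtain a tag when regular I/O holds every other one. In contrast to the dedicated-queue sketch above, this is a single pool with a held-back tail. Below is a simplified user-space model with hypothetical names (`tag_set`, `get_tag`), not the kernel's sbitmap-based implementation:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a blk-mq style tag set: the last
 * nr_reserved tags are never handed to regular I/O.
 */
struct tag_set {
	int nr_tags;		/* total tags, e.g. hba->nutrs == 32	*/
	int nr_reserved;	/* e.g. UFSHCD_NUM_RESERVED == 1	*/
	bool in_use[32];
};

/* Returns a free tag or -1. Regular allocations only see the first
 * nr_tags - nr_reserved tags; reserved allocations only see the tail
 * that regular I/O can never exhaust.
 */
static int get_tag(struct tag_set *set, bool reserved)
{
	int start = reserved ? set->nr_tags - set->nr_reserved : 0;
	int end = reserved ? set->nr_tags : set->nr_tags - set->nr_reserved;

	for (int i = start; i < end; i++) {
		if (!set->in_use[i]) {
			set->in_use[i] = true;
			return i;
		}
	}
	return -1;
}

int main(void)
{
	struct tag_set set = { .nr_tags = 32, .nr_reserved = 1 };

	/* Fill every regular tag, as in the reported deadlock. */
	while (get_tag(&set, false) >= 0)
		;

	/* A device management command still makes progress. */
	printf("reserved tag: %d\n", get_tag(&set, true));
	return 0;
}
```

This is also why the patch lowers UFSHCD_CAN_QUEUE and cmd_per_lun by UFSHCD_NUM_RESERVED: the SCSI midlayer must never have more regular commands in flight than there are non-reserved tags.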
The following deadlock has been observed on a test setup:

* All tags allocated.
* The SCSI error handler calls ufshcd_eh_host_reset_handler()
* ufshcd_eh_host_reset_handler() queues work that calls ufshcd_err_handler()
* ufshcd_err_handler() locks up as follows:

```
Workqueue: ufs_eh_wq_0 ufshcd_err_handler.cfi_jt
Call trace:
 __switch_to+0x298/0x5d8
 __schedule+0x6cc/0xa94
 schedule+0x12c/0x298
 blk_mq_get_tag+0x210/0x480
 __blk_mq_alloc_request+0x1c8/0x284
 blk_get_request+0x74/0x134
 ufshcd_exec_dev_cmd+0x68/0x640
 ufshcd_verify_dev_init+0x68/0x35c
 ufshcd_probe_hba+0x12c/0x1cb8
 ufshcd_host_reset_and_restore+0x88/0x254
 ufshcd_reset_and_restore+0xd0/0x354
 ufshcd_err_handler+0x408/0xc58
 process_one_work+0x24c/0x66c
 worker_thread+0x3e8/0xa4c
 kthread+0x150/0x1b4
 ret_from_fork+0x10/0x30
```

Fix this lockup by making ufshcd_exec_dev_cmd() allocate a reserved request.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/ufs/ufshcd.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)
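To make the cost discussed in the thread concrete: assuming a host controller that exposes 32 UTP transfer request slots (hba->nutrs == 32, a common value but not stated in the patch itself), holding back one tag drops the queue depth visible to the SCSI midlayer from 32 to 31, which is the 31-versus-32 difference Bean raised and Bart measured. A stand-alone arithmetic check mirroring the constants from the patch, not driver code:

```c
#include <stdio.h>

/* Constants mirroring what the patch adds to ufshcd.c. */
enum {
	UFSHCD_NUM_RESERVED = 1,
	UFSHCD_CMD_PER_LUN  = 32 - UFSHCD_NUM_RESERVED,
	UFSHCD_CAN_QUEUE    = 32 - UFSHCD_NUM_RESERVED,
};

int main(void)
{
	int nutrs = 32;	/* assumed: controller with 32 transfer request slots */

	/* Values the SCSI midlayer sees after the patch. */
	printf("UFSHCD_CAN_QUEUE   = %d\n", UFSHCD_CAN_QUEUE);			/* 31 */
	printf("UFSHCD_CMD_PER_LUN = %d\n", UFSHCD_CMD_PER_LUN);		/* 31 */
	printf("host->can_queue    = %d\n", nutrs - UFSHCD_NUM_RESERVED);	/* 31 */
	return 0;
}
```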