[0/4] Fix a rare crash in the UFS driver


Message ID

20240412000246.1167600-1-bvanassche@acm.org

Headers

From: Bart Van Assche <bvanassche@acm.org>
To: "Martin K . Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org,
	Bart Van Assche <bvanassche@acm.org>
Subject: [PATCH 0/4] Fix a rare crash in the UFS driver
Date: Thu, 11 Apr 2024 17:02:20 -0700
Message-ID: <20240412000246.1167600-1-bvanassche@acm.org>
Precedence: bulk
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Series

Fix a rare crash in the UFS driver

Message

Bart Van Assche April 12, 2024, 12:02 a.m. UTC

Hi Martin,

Sporadic crashes have been observed with the UFS kernel driver if a timeout
occurs. This patch series fixes these crashes. Please consider this patch
series for the next merge window.

Thanks,

Bart.

Bart Van Assche (4):
  scsi: ufs: Declare ufshcd_mcq_poll_cqe_lock() once
  scsi: ufs: Make ufshcd_poll() complain about unsupported arguments
  scsi: ufs: Make the polling code report which command has been
    completed
  scsi: ufs: Check for completion from the timeout handler

 drivers/ufs/core/ufs-mcq.c     | 23 ++++++++-----
 drivers/ufs/core/ufshcd-priv.h |  6 ++--
 drivers/ufs/core/ufshcd.c      | 59 +++++++++++++++++++++++++++-------
 drivers/ufs/host/ufs-qcom.c    |  2 +-
 include/ufs/ufshcd.h           |  3 +-
 5 files changed, 67 insertions(+), 26 deletions(-)

Comments

Peter Wang (王信友) April 12, 2024, 9:34 a.m. UTC | #1

On Thu, 2024-04-11 at 17:02 -0700, Bart Van Assche wrote:
> If ufshcd_abort() returns SUCCESS for an already completed command then
> that command is completed twice. This results in a crash. Prevent this by
> checking whether a command has completed without completion interrupt from
> the timeout handler. This CL fixes the following kernel crash:
> 
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> Call trace:
>  dma_direct_map_sg+0x70/0x274
>  scsi_dma_map+0x84/0x124
>  ufshcd_queuecommand+0x3fc/0x880
>  scsi_queue_rq+0x7d0/0x111c
>  blk_mq_dispatch_rq_list+0x440/0xebc
>  blk_mq_do_dispatch_sched+0x5a4/0x6b8
>  __blk_mq_sched_dispatch_requests+0x150/0x220
>  __blk_mq_run_hw_queue+0xf0/0x218
>  __blk_mq_delay_run_hw_queue+0x8c/0x18c
>  blk_mq_run_hw_queue+0x1a4/0x360
>  blk_mq_sched_insert_requests+0x130/0x334
>  blk_mq_flush_plug_list+0x138/0x234
>  blk_flush_plug_list+0x118/0x164
>  blk_finish_plug()
>  read_pages+0x38c/0x408
>  page_cache_ra_unbounded+0x230/0x2f8
>  do_sync_mmap_readahead+0x1a4/0x208
>  filemap_fault+0x27c/0x8f4
>  f2fs_filemap_fault+0x28/0xfc
>  __do_fault+0xc4/0x208
>  handle_pte_fault+0x290/0xe04
>  do_handle_mm_fault+0x52c/0x858
>  do_page_fault+0x5dc/0x798
>  do_translation_fault+0x40/0x54
>  do_mem_abort+0x60/0x134
>  el0_da+0x40/0xb8
>  el0t_64_sync_handler+0xc4/0xe4
>  el0t_64_sync+0x1b4/0x1b8
> 
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/ufs/core/ufshcd.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 08abdd763c51..98b14623317f 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -9022,6 +9022,25 @@ static void ufshcd_async_scan(void *data, async_cookie_t cookie)
>  static enum scsi_timeout_action ufshcd_eh_timed_out(struct scsi_cmnd *scmd)
>  {
>  	struct ufs_hba *hba = shost_priv(scmd->device->host);
> +	struct scsi_cmnd *cmd2 = scmd;
> +
> +	WARN_ON_ONCE(!scmd);
> +
> +	if (is_mcq_enabled(hba)) {
> +		struct ufs_hw_queue *hwq =
> +			ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(scmd));
> +
> +		ufshcd_mcq_poll_cqe_lock(hba, hwq, &cmd2);
> +	} else {
> +		__ufshcd_poll(hba->host, UFSHCD_POLL_FROM_INTERRUPT_CONTEXT,
> +			      &cmd2);
> 

Should the return value of __ufshcd_poll() be checked here?

> +	}
> +	if (cmd2 == NULL) {
> +		sdev_printk(KERN_INFO, scmd->device,
> +			    "%s: cmd with tag %#x has already been completed\n",
> +			    __func__, blk_mq_unique_tag(scsi_cmd_to_rq(scmd)));
> +		return SCSI_EH_DONE;
> +	}
>  
>  	if (!hba->system_suspending) {
>  		/* Activate the error handler in the SCSI core. */

Bart Van Assche April 12, 2024, 4:50 p.m. UTC | #2

On 4/12/24 02:34, Peter Wang (王信友) wrote:
>> +	if (is_mcq_enabled(hba)) {
>> +		struct ufs_hw_queue *hwq =
>> +			ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(scmd));
>> +
>> +		ufshcd_mcq_poll_cqe_lock(hba, hwq, &cmd2);
>> +	} else {
>> +		__ufshcd_poll(hba->host, UFSHCD_POLL_FROM_INTERRUPT_CONTEXT,
>> +			      &cmd2);
> 
> Should the return value of __ufshcd_poll() be checked here?

I don't think we should do that. __ufshcd_poll() returns a value > 0 if
it has completed any SCSI command and may complete commands other than
@scmd. In this function we need to know whether or not @scmd has been
completed.

Thanks,

Bart.
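
The point made in reply #2 can be illustrated with a small userspace
model. This is only a sketch and is not code from the series: struct
toy_cmd, toy_poll(), toy_timed_out() and toy_demo() are hypothetical
names, and the model ignores locking and all hardware details. It shows
why a positive poll count does not tell the timeout handler whether the
command it cares about has completed, while a pointer that the poll
routine clears does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a SCSI command; not the kernel's struct scsi_cmnd. */
struct toy_cmd {
	int tag;
	bool hw_done;	/* "hardware" has finished, completion not yet processed */
	bool completed;	/* completion already handed back to the upper layer */
};

/*
 * Toy poll routine: completes every command that has finished and returns
 * how many it completed. If *target was among them, *target is set to NULL
 * so the caller can tell that this particular command is done.
 */
static int toy_poll(struct toy_cmd *cmds, size_t n, struct toy_cmd **target)
{
	int done = 0;

	for (size_t i = 0; i < n; i++) {
		if (!cmds[i].hw_done || cmds[i].completed)
			continue;
		cmds[i].completed = true;
		done++;
		if (target && *target == &cmds[i])
			*target = NULL;
	}
	return done;
}

/*
 * Toy timeout handler: polls once and returns true if the timed-out command
 * turned out to be complete. The poll count is deliberately ignored.
 */
static bool toy_timed_out(struct toy_cmd *cmds, size_t n, struct toy_cmd *cmd)
{
	struct toy_cmd *cmd2 = cmd;

	(void)toy_poll(cmds, n, &cmd2);
	return cmd2 == NULL;
}

void toy_demo(void)
{
	struct toy_cmd cmds[2] = {
		{ .tag = 0, .hw_done = true },	/* unrelated command, finished */
		{ .tag = 1 },			/* command that timed out */
	};
	struct toy_cmd *cmd2 = &cmds[1];

	/* Tag 0 gets completed, so the count is positive... */
	assert(toy_poll(cmds, 2, &cmd2) == 1);
	/* ...but tag 1 is still outstanding: the pointer is untouched. */
	assert(cmd2 == &cmds[1]);

	/* Once tag 1 has finished, the handler sees it as completed. */
	cmds[1].hw_done = true;
	assert(toy_timed_out(cmds, 2, &cmds[1]));
	assert(cmds[1].completed);
}
```

Calling toy_demo() from a main() function exercises both cases. In the
first poll the return value is 1 even though the command of interest is
still pending, which is the situation in which checking only the return
value would be misleading.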