Message ID: 20201216190225.2769012-1-jaegeuk@kernel.org
State: New
Series: scsi: ufs: fix livelock on ufshcd_clear_ua_wlun
Hi Jaegeuk,

On Wed, 2020-12-16 at 11:02 -0800, Jaegeuk Kim wrote:
> From: Jaegeuk Kim <jaegeuk@google.com>
>
> This fixes the below livelock, which is caused by issuing a SCSI command
> before ufshcd_scsi_unblock_requests() in ufshcd_ungate_work().
>
> Workqueue: ufs_clk_gating_0 ufshcd_ungate_work
> Call trace:
>  __switch_to+0x298/0x2bc
>  __schedule+0x59c/0x760
>  schedule+0xac/0xf0
>  schedule_timeout+0x44/0x1b4
>  io_schedule_timeout+0x44/0x68
>  wait_for_common_io+0x7c/0x100
>  wait_for_completion_io+0x14/0x20
>  blk_execute_rq+0x94/0xd0
>  __scsi_execute+0x100/0x1c0
>  ufshcd_clear_ua_wlun+0x124/0x1c8
>  ufshcd_host_reset_and_restore+0x1d0/0x2cc
>  ufshcd_link_recovery+0xac/0x134
>  ufshcd_uic_hibern8_exit+0x1e8/0x1f0
>  ufshcd_ungate_work+0xac/0x130

In the latest mainline kernel, once ufshcd_uic_hibern8_exit() encounters
an error, the error handler work is scheduled instead, without blocking
ufshcd_uic_hibern8_exit(). In addition, ufshcd_scsi_unblock_requests()
is invoked before leaving ufshcd_uic_hibern8_exit(), so this call stack
no longer exists.

Thanks,
Stanley Chu

> process_one_work+0x270/0x47c
> worker_thread+0x27c/0x4d8
> kthread+0x13c/0x320
> ret_from_fork+0x10/0x18
>
> Fixes: 1918651f2d7e ("scsi: ufs: Clear UAC for RPMB after ufshcd resets")
> Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
> ---
>  drivers/scsi/ufs/ufshcd.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index e221add25a7e..b0998db1b781 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -1603,6 +1603,7 @@ static void ufshcd_ungate_work(struct work_struct *work)
>  	}
>  unblock_reqs:
>  	ufshcd_scsi_unblock_requests(hba);
> +	ufshcd_clear_ua_wluns(hba);
>  }
>
>  /**
> @@ -6913,7 +6914,7 @@ static int ufshcd_host_reset_and_restore(struct ufs_hba *hba)
>
>  	/* Establish the link again and restore the device */
>  	err = ufshcd_probe_hba(hba, false);
> -	if (!err)
> +	if (!err && !hba->clk_gating.is_suspended)
>  		ufshcd_clear_ua_wluns(hba);
>  out:
>  	if (err)
> @@ -8745,6 +8746,7 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
>  		ufshcd_resume_clkscaling(hba);
>  	hba->clk_gating.is_suspended = false;
>  	hba->dev_info.b_rpm_dev_flush_capable = false;
> +	ufshcd_clear_ua_wluns(hba);
>  	ufshcd_release(hba);
>  out:
>  	if (hba->dev_info.b_rpm_dev_flush_capable) {
> @@ -8855,6 +8857,8 @@ static int ufshcd_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op)
>  		cancel_delayed_work(&hba->rpm_dev_flush_recheck_work);
>  	}
>
> +	ufshcd_clear_ua_wluns(hba);
> +
>  	/* Schedule clock gating in case of no access to UFS device yet */
>  	ufshcd_release(hba);
>
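For readers comparing with mainline, the flow Stanley describes looks
roughly like the sketch below. This is illustrative pseudo-driver code,
not the actual ufshcd_uic_hibern8_exit(): do_hibern8_exit() is a
hypothetical stand-in for the real UIC command path, while
ufshcd_schedule_eh_work() and ufshcd_scsi_unblock_requests() are real
ufshcd.c functions after commit 4db7a2360597.

/*
 * Sketch only: on a hibern8-exit failure, mainline defers recovery to
 * the error-handler work instead of recovering synchronously, and
 * requests are unblocked before the function returns, so no blocking
 * SCSI command is ever issued while the request queue is still blocked.
 */
static int hibern8_exit_sketch(struct ufs_hba *hba)
{
	int ret = do_hibern8_exit(hba);		/* hypothetical helper */

	if (ret)
		ufshcd_schedule_eh_work(hba);	/* defer recovery, don't block */

	ufshcd_scsi_unblock_requests(hba);	/* unblock before returning */
	return ret;
}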
On 12/17, Stanley Chu wrote:
> Hi Jaegeuk,
>
> On Wed, 2020-12-16 at 11:02 -0800, Jaegeuk Kim wrote:
> > From: Jaegeuk Kim <jaegeuk@google.com>
> >
> > This fixes the below livelock, which is caused by issuing a SCSI command
> > before ufshcd_scsi_unblock_requests() in ufshcd_ungate_work().
> >
> > Workqueue: ufs_clk_gating_0 ufshcd_ungate_work
> > Call trace:
> >  __switch_to+0x298/0x2bc
> >  __schedule+0x59c/0x760
> >  schedule+0xac/0xf0
> >  schedule_timeout+0x44/0x1b4
> >  io_schedule_timeout+0x44/0x68
> >  wait_for_common_io+0x7c/0x100
> >  wait_for_completion_io+0x14/0x20
> >  blk_execute_rq+0x94/0xd0
> >  __scsi_execute+0x100/0x1c0
> >  ufshcd_clear_ua_wlun+0x124/0x1c8
> >  ufshcd_host_reset_and_restore+0x1d0/0x2cc
> >  ufshcd_link_recovery+0xac/0x134
> >  ufshcd_uic_hibern8_exit+0x1e8/0x1f0
> >  ufshcd_ungate_work+0xac/0x130
>
> In the latest mainline kernel, once ufshcd_uic_hibern8_exit() encounters
> an error, the error handler work is scheduled instead, without blocking
> ufshcd_uic_hibern8_exit(). In addition, ufshcd_scsi_unblock_requests()
> is invoked before leaving ufshcd_uic_hibern8_exit(), so this call stack
> no longer exists.

Oh, thank you for pointing this out. It seems the below patch addressed it:

4db7a2360597 ("scsi: ufs: Fix concurrency of error handler and other
error recovery paths")

Next time, I need to check upstream more carefully. :P

> Thanks,
> Stanley Chu
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index e221add25a7e..b0998db1b781 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -1603,6 +1603,7 @@ static void ufshcd_ungate_work(struct work_struct *work)
 	}
 unblock_reqs:
 	ufshcd_scsi_unblock_requests(hba);
+	ufshcd_clear_ua_wluns(hba);
 }

 /**
@@ -6913,7 +6914,7 @@ static int ufshcd_host_reset_and_restore(struct ufs_hba *hba)

 	/* Establish the link again and restore the device */
 	err = ufshcd_probe_hba(hba, false);
-	if (!err)
+	if (!err && !hba->clk_gating.is_suspended)
 		ufshcd_clear_ua_wluns(hba);
 out:
 	if (err)
@@ -8745,6 +8746,7 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
 		ufshcd_resume_clkscaling(hba);
 	hba->clk_gating.is_suspended = false;
 	hba->dev_info.b_rpm_dev_flush_capable = false;
+	ufshcd_clear_ua_wluns(hba);
 	ufshcd_release(hba);
 out:
 	if (hba->dev_info.b_rpm_dev_flush_capable) {
@@ -8855,6 +8857,8 @@ static int ufshcd_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op)
 		cancel_delayed_work(&hba->rpm_dev_flush_recheck_work);
 	}

+	ufshcd_clear_ua_wluns(hba);
+
 	/* Schedule clock gating in case of no access to UFS device yet */
 	ufshcd_release(hba);
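To see why the ordering matters, here is a small self-contained
userspace model of the livelock (an illustrative analogy, not driver
code): the gate stands in for ufshcd_scsi_block_requests() /
ufshcd_scsi_unblock_requests(), and issue_blocking_cmd() stands in for
the SCSI command that ufshcd_clear_ua_wluns() sends via __scsi_execute(),
which cannot complete while requests are blocked.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool gate_open;		/* false == requests blocked */

/* Models ufshcd_scsi_unblock_requests(): open the gate and wake waiters. */
static void unblock_requests(void)
{
	pthread_mutex_lock(&lock);
	gate_open = true;
	pthread_cond_broadcast(&cond);
	pthread_mutex_unlock(&lock);
}

/* Models the blocking UA-clearing command: waits until the gate opens. */
static void issue_blocking_cmd(void)
{
	pthread_mutex_lock(&lock);
	while (!gate_open)	/* hangs forever if nobody opens the gate */
		pthread_cond_wait(&cond, &lock);
	pthread_mutex_unlock(&lock);
	printf("command completed\n");
}

int main(void)
{
	/*
	 * The buggy order from the trace, run in a single context:
	 *   issue_blocking_cmd();   -- waits on the gate
	 *   unblock_requests();     -- never reached: livelock
	 *
	 * The safe order used by the patch: unblock first, then issue
	 * the UA-clearing command.
	 */
	unblock_requests();
	issue_blocking_cmd();
	return 0;
}

With the two calls swapped back (issue before unblock), the program
hangs exactly like the trace above, which is why the patch moves
ufshcd_clear_ua_wluns() after ufshcd_scsi_unblock_requests() in
ufshcd_ungate_work().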