diff mbox series

scsi: ufs: Start the RTC update work later

Message ID 20241031212632.2799127-1-bvanassche@acm.org
State New
Headers show
Series scsi: ufs: Start the RTC update work later | expand

Commit Message

Bart Van Assche Oct. 31, 2024, 9:26 p.m. UTC
The RTC update work involves runtime resuming the UFS controller. Hence,
only start the RTC update work after runtime power management in the UFS
driver has been fully initialized. This patch fixes the following kernel
crash:

Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP
Workqueue: events ufshcd_rtc_work
Call trace:
 _raw_spin_lock_irqsave+0x34/0x8c (P)
 pm_runtime_get_if_active+0x24/0x9c (L)
 pm_runtime_get_if_active+0x24/0x9c
 ufshcd_rtc_work+0x138/0x1b4
 process_one_work+0x148/0x288
 worker_thread+0x2cc/0x3d4
 kthread+0x110/0x114
 ret_from_fork+0x10/0x20

Reported-by: Neil Armstrong <neil.armstrong@linaro.org>
Closes: https://lore.kernel.org/linux-scsi/0c0bc528-fdc2-4106-bc99-f23ae377f6f5@linaro.org/
Fixes: 6bf999e0eb41 ("scsi: ufs: core: Add UFS RTC support")
Cc: Bean Huo <beanhuo@micron.com>
Cc: stable@vger.kernel.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/ufs/core/ufshcd.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Bart Van Assche Nov. 1, 2024, 4:31 p.m. UTC | #1
On 11/1/24 12:53 AM, Manivannan Sadhasivam wrote:
> On Thu, Oct 31, 2024 at 02:26:24PM -0700, Bart Van Assche wrote:
>> The RTC update work involves runtime resuming the UFS controller. Hence,
>> only start the RTC update work after runtime power management in the UFS
>> driver has been fully initialized. This patch fixes the following kernel
>> crash:
>>
>> Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP
>> Workqueue: events ufshcd_rtc_work
>> Call trace:
>>   _raw_spin_lock_irqsave+0x34/0x8c (P)
>>   pm_runtime_get_if_active+0x24/0x9c (L)
>>   pm_runtime_get_if_active+0x24/0x9c
>>   ufshcd_rtc_work+0x138/0x1b4
>>   process_one_work+0x148/0x288
>>   worker_thread+0x2cc/0x3d4
>>   kthread+0x110/0x114
>>   ret_from_fork+0x10/0x20
>>
>> Reported-by: Neil Armstrong <neil.armstrong@linaro.org>
>> Closes: https://lore.kernel.org/linux-scsi/0c0bc528-fdc2-4106-bc99-f23ae377f6f5@linaro.org/
>> Fixes: 6bf999e0eb41 ("scsi: ufs: core: Add UFS RTC support")
>> Cc: Bean Huo <beanhuo@micron.com>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> 
> Bart, Thanks for the fix! While looking into this patch, I also found the
> weirdness of the ufshcd_rpm_*() helpers in ufshcd-priv.h. Their naming doesn't
> seem to indicate whether those helpers are for WLUN or for HBA. Also, I don't
> see the benefit of these helpers since they just wrap generic pm_runtime*
> calls. Then there are other open coding instances in the ufshcd.c. Like
> 
> pm_runtime_suspended(&hba->ufs_device_wlun->sdev_gendev)
> pm_runtime_set_active(&hba->ufs_device_wlun->sdev_gendev)
> 
> Moreover, we do check for the presence of hba->ufs_device_wlun before calling
> ufshcd_rpm_get_sync() in ufshcd_remove(). This could be one other way to fix
> this null ptr dereference even though I wouldn't recommend doing so as calling
> rtc_work early is pointless.
> 
> So I think we should remove these helpers to avoid having these discrepancies.
> WDYT?

Hi Manivannan,

In the context of the Linux kernel, in general, one-line helper
functions are considered questionable. In this case I prefer to keep the
helper functions since these encapsulate an implementation detail,
namely that the WLUN sdev_gendev member is used to control runtime power
management of the UFS host controller.

Checking whether or not the hba->ufs_device_wlun pointer is NULL from
the ufshcd_rtc_work() function would be racy since that pointer is 
modified from another thread. So I prefer the patch at the start of this
thread instead of adding a hba->ufs_device_wlun pointer check.

Thanks,

Bart.
Bean Huo Nov. 1, 2024, 9:03 p.m. UTC | #2
On Thu, 2024-10-31 at 14:26 -0700, Bart Van Assche wrote:
> The RTC update work involves runtime resuming the UFS controller.
> Hence,
> only start the RTC update work after runtime power management in the
> UFS
> driver has been fully initialized. This patch fixes the following
> kernel
> crash:
> 
> Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP
> Workqueue: events ufshcd_rtc_work
> Call trace:
>  _raw_spin_lock_irqsave+0x34/0x8c (P)
>  pm_runtime_get_if_active+0x24/0x9c (L)
>  pm_runtime_get_if_active+0x24/0x9c
>  ufshcd_rtc_work+0x138/0x1b4
>  process_one_work+0x148/0x288
>  worker_thread+0x2cc/0x3d4
>  kthread+0x110/0x114
>  ret_from_fork+0x10/0x20
> 
> Reported-by: Neil Armstrong <neil.armstrong@linaro.org>
> Closes:
> https://lore.kernel.org/linux-scsi/0c0bc528-fdc2-4106-bc99-f23ae377f6f5@linaro.org/
> Fixes: 6bf999e0eb41 ("scsi: ufs: core: Add UFS RTC support")
> Cc: Bean Huo <beanhuo@micron.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>


Reviewed-by: Bean Huo <beanhuo@micron.com>
Manivannan Sadhasivam Nov. 2, 2024, 10:21 a.m. UTC | #3
On Fri, Nov 01, 2024 at 09:31:27AM -0700, Bart Van Assche wrote:
> On 11/1/24 12:53 AM, Manivannan Sadhasivam wrote:
> > On Thu, Oct 31, 2024 at 02:26:24PM -0700, Bart Van Assche wrote:
> > > The RTC update work involves runtime resuming the UFS controller. Hence,
> > > only start the RTC update work after runtime power management in the UFS
> > > driver has been fully initialized. This patch fixes the following kernel
> > > crash:
> > > 
> > > Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP
> > > Workqueue: events ufshcd_rtc_work
> > > Call trace:
> > >   _raw_spin_lock_irqsave+0x34/0x8c (P)
> > >   pm_runtime_get_if_active+0x24/0x9c (L)
> > >   pm_runtime_get_if_active+0x24/0x9c
> > >   ufshcd_rtc_work+0x138/0x1b4
> > >   process_one_work+0x148/0x288
> > >   worker_thread+0x2cc/0x3d4
> > >   kthread+0x110/0x114
> > >   ret_from_fork+0x10/0x20
> > > 
> > > Reported-by: Neil Armstrong <neil.armstrong@linaro.org>
> > > Closes: https://lore.kernel.org/linux-scsi/0c0bc528-fdc2-4106-bc99-f23ae377f6f5@linaro.org/
> > > Fixes: 6bf999e0eb41 ("scsi: ufs: core: Add UFS RTC support")
> > > Cc: Bean Huo <beanhuo@micron.com>
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> > 
> > Bart, Thanks for the fix! While looking into this patch, I also found the
> > weirdness of the ufshcd_rpm_*() helpers in ufshcd-priv.h. Their naming doesn't
> > seem to indicate whether those helpers are for WLUN or for HBA. Also, I don't
> > see the benefit of these helpers since they just wrap generic pm_runtime*
> > calls. Then there are other open coding instances in the ufshcd.c. Like
> > 
> > pm_runtime_suspended(&hba->ufs_device_wlun->sdev_gendev)
> > pm_runtime_set_active(&hba->ufs_device_wlun->sdev_gendev)
> > 
> > Moreover, we do check for the presence of hba->ufs_device_wlun before calling
> > ufshcd_rpm_get_sync() in ufshcd_remove(). This could be one other way to fix
> > this null ptr dereference even though I wouldn't recommend doing so as calling
> > rtc_work early is pointless.
> > 
> > So I think we should remove these helpers to avoid having these discrepancies.
> > WDYT?
> 
> Hi Manivannan,
> 
> In the context of the Linux kernel, in general, one-line helper
> functions are considered questionable. In this case I prefer to keep the
> helper functions since these encapsulate an implementation detail,
> namely that the WLUN sdev_gendev member is used to control runtime power
> management of the UFS host controller.
> 

IMO this encapsulation is causing confusion since we have a separate PM handling
for the UFS controller itself.

> Checking whether or not the hba->ufs_device_wlun pointer is NULL from
> the ufshcd_rtc_work() function would be racy since that pointer is modified
> from another thread. So I prefer the patch at the start of this
> thread instead of adding a hba->ufs_device_wlun pointer check.
> 

Absolutely! I just pointed out it as a bad example which one could think of due
to these helpers.

- Mani
Neil Armstrong Nov. 4, 2024, 1:43 p.m. UTC | #4
On 31/10/2024 22:26, Bart Van Assche wrote:
> The RTC update work involves runtime resuming the UFS controller. Hence,
> only start the RTC update work after runtime power management in the UFS
> driver has been fully initialized. This patch fixes the following kernel
> crash:
> 
> Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP
> Workqueue: events ufshcd_rtc_work
> Call trace:
>   _raw_spin_lock_irqsave+0x34/0x8c (P)
>   pm_runtime_get_if_active+0x24/0x9c (L)
>   pm_runtime_get_if_active+0x24/0x9c
>   ufshcd_rtc_work+0x138/0x1b4
>   process_one_work+0x148/0x288
>   worker_thread+0x2cc/0x3d4
>   kthread+0x110/0x114
>   ret_from_fork+0x10/0x20
> 
> Reported-by: Neil Armstrong <neil.armstrong@linaro.org>
> Closes: https://lore.kernel.org/linux-scsi/0c0bc528-fdc2-4106-bc99-f23ae377f6f5@linaro.org/
> Fixes: 6bf999e0eb41 ("scsi: ufs: core: Add UFS RTC support")
> Cc: Bean Huo <beanhuo@micron.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   drivers/ufs/core/ufshcd.c | 10 ++++++++--
>   1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 585557eaa9a2..ed82ff329314 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -8633,6 +8633,14 @@ static int ufshcd_add_lus(struct ufs_hba *hba)
>   		ufshcd_init_clk_scaling_sysfs(hba);
>   	}
>   
> +	/*
> +	 * The RTC update code accesses the hba->ufs_device_wlun->sdev_gendev
> +	 * pointer and hence must only be started after the WLUN pointer has
> +	 * been initialized by ufshcd_scsi_add_wlus().
> +	 */
> +	schedule_delayed_work(&hba->ufs_rtc_update_work,
> +			      msecs_to_jiffies(UFS_RTC_UPDATE_INTERVAL_MS));
> +
>   	ufs_bsg_probe(hba);
>   	scsi_scan_host(hba->host);
>   
> @@ -8727,8 +8735,6 @@ static int ufshcd_post_device_init(struct ufs_hba *hba)
>   	ufshcd_force_reset_auto_bkops(hba);
>   
>   	ufshcd_set_timestamp_attr(hba);
> -	schedule_delayed_work(&hba->ufs_rtc_update_work,
> -			      msecs_to_jiffies(UFS_RTC_UPDATE_INTERVAL_MS));
>   
>   	if (!hba->max_pwr_info.is_valid)
>   		return 0;

Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-HDK

Thanks!
Neil
Martin K. Petersen Nov. 5, 2024, 2:31 a.m. UTC | #5
Bart,

> The RTC update work involves runtime resuming the UFS controller. Hence,
> only start the RTC update work after runtime power management in the UFS
> driver has been fully initialized. This patch fixes the following kernel
> crash:

Applied to 6.12/scsi-fixes, thanks!
James Bottomley Nov. 5, 2024, 3:48 a.m. UTC | #6
On Mon, 2024-11-04 at 21:31 -0500, Martin K. Petersen wrote:
> 
> Bart,
> 
> > The RTC update work involves runtime resuming the UFS controller.
> > Hence, only start the RTC update work after runtime power
> > management in the UFS driver has been fully initialized. This patch
> > fixes the following kernel
> > crash:
> 
> Applied to 6.12/scsi-fixes, thanks!

Hey, this one causes a nasty merge conflict due to 3192d28ec660 scsi:
ufs: core: Introduce ufshcd_post_device_init() ... I fixed it up in my
for-next branch but conflicting with someone else's patches can be
considered unfortunate; conflicting with your own looks like
carelessness ...

Since ufshcd_post_device_init is now called twice can you just check
that the simple fixup of removing the schedule_delayed_work() from it
is actually correct.

Thanks,

James
Bart Van Assche Nov. 5, 2024, 6:54 p.m. UTC | #7
On 11/4/24 7:48 PM, James Bottomley wrote:
> Since ufshcd_post_device_init is now called twice can you just check
> that the simple fixup of removing the schedule_delayed_work() from it
> is actually correct.

Hi James,

The conflict resolution in 6abc45870e04 ("Merge branch 'fixes' into for-
next") looks fine to me. If this would make things easier for the SCSI
maintainers I can rebase and retest my patch on top of the SCSI for-next
branch. This will cause it to be integrated later in the upstream kernel
(next merge window instead of current rc cycle).

Thank you,

Bart.
diff mbox series

Patch

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 585557eaa9a2..ed82ff329314 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -8633,6 +8633,14 @@  static int ufshcd_add_lus(struct ufs_hba *hba)
 		ufshcd_init_clk_scaling_sysfs(hba);
 	}
 
+	/*
+	 * The RTC update code accesses the hba->ufs_device_wlun->sdev_gendev
+	 * pointer and hence must only be started after the WLUN pointer has
+	 * been initialized by ufshcd_scsi_add_wlus().
+	 */
+	schedule_delayed_work(&hba->ufs_rtc_update_work,
+			      msecs_to_jiffies(UFS_RTC_UPDATE_INTERVAL_MS));
+
 	ufs_bsg_probe(hba);
 	scsi_scan_host(hba->host);
 
@@ -8727,8 +8735,6 @@  static int ufshcd_post_device_init(struct ufs_hba *hba)
 	ufshcd_force_reset_auto_bkops(hba);
 
 	ufshcd_set_timestamp_attr(hba);
-	schedule_delayed_work(&hba->ufs_rtc_update_work,
-			      msecs_to_jiffies(UFS_RTC_UPDATE_INTERVAL_MS));
 
 	if (!hba->max_pwr_info.is_valid)
 		return 0;