
[v2,07/22] drm/msm: Do rpm get sooner in the submit path

Message ID 20201012020958.229288-8-robdclark@gmail.com
State Superseded
Series drm/msm: de-struct_mutex-ification

Commit Message

Rob Clark Oct. 12, 2020, 2:09 a.m. UTC
From: Rob Clark <robdclark@chromium.org>

Unfortunately, due to a dev_pm_opp locking interaction with
mm->mmap_sem, we need to do the pm get before acquiring obj locks,
otherwise we anger lockdep with the chain:

  opp_table_lock --> &mm->mmap_sem --> reservation_ww_class_mutex

For an explicit fencing userspace, the impact should be minimal
as we do all the fence waits before this point.  It could result
in some needless resumes in error cases, etc.

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/msm/msm_gem_submit.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

Comments

Daniel Vetter Oct. 12, 2020, 2:35 p.m. UTC | #1
On Sun, Oct 11, 2020 at 07:09:34PM -0700, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
> 
> Unfortunately, due to a dev_pm_opp locking interaction with
> mm->mmap_sem, we need to do the pm get before acquiring obj locks,
> otherwise we anger lockdep with the chain:

tbh this sounds like a bug in that subsystem, since it means we cannot use
said subsystem in mmap handlers either.

So if you have some remapping unit or need to wake up your gpu to blt the
buffer into system memory first, we're toast. That doesn't sound right. So
maybe Cc: pm folks and figure out how to fix this long term properly? Imo
not a good reason to hold up this patch set, since unwrangling mmap_sem
tends to be work ...
-Daniel

> 
>   opp_table_lock --> &mm->mmap_sem --> reservation_ww_class_mutex
> 
> For an explicit fencing userspace, the impact should be minimal
> as we do all the fence waits before this point.  It could result
> in some needless resumes in error cases, etc.
> 
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
>  drivers/gpu/drm/msm/msm_gem_submit.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> index 002130d826aa..a9422d043bfe 100644
> --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> @@ -744,11 +744,20 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
>  
>  	ret = submit_lookup_objects(submit, args, file);
>  	if (ret)
> -		goto out;
> +		goto out_pre_pm;
>  
>  	ret = submit_lookup_cmds(submit, args, file);
>  	if (ret)
> -		goto out;
> +		goto out_pre_pm;
> +
> +	/*
> +	 * Thanks to dev_pm_opp opp_table_lock interactions with mm->mmap_sem
> +	 * in the resume path, we need to do rpm get before we lock objs.
> +	 * Which unfortunately might involve powering up the GPU sooner than
> +	 * is necessary.  But at least in the explicit fencing case, we will
> +	 * have already done all the fence waiting.
> +	 */
> +	pm_runtime_get_sync(&gpu->pdev->dev);
>  
>  	/* copy_*_user while holding a ww ticket upsets lockdep */
>  	ww_acquire_init(&submit->ticket, &reservation_ww_class);
> @@ -825,6 +834,8 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
>  
>  
>  out:
> +	pm_runtime_put(&gpu->pdev->dev);
> +out_pre_pm:
>  	submit_cleanup(submit);
>  	if (has_ww_ticket)
>  		ww_acquire_fini(&submit->ticket);
> -- 
> 2.26.2
>
Rob Clark Oct. 12, 2020, 3:43 p.m. UTC | #2
On Mon, Oct 12, 2020 at 7:35 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Sun, Oct 11, 2020 at 07:09:34PM -0700, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > Unfortunately, due to a dev_pm_opp locking interaction with
> > mm->mmap_sem, we need to do the pm get before acquiring obj locks,
> > otherwise we anger lockdep with the chain:
>
> tbh this sounds like a bug in that subsystem, since it means we cannot use
> said subsystem in mmap handlers either.
>
> So if you have some remapping unit or need to wake up your gpu to blt the
> buffer into system memory first, we're toast. That doesn't sound right. So
> maybe Cc: pm folks and figure out how to fix this long term properly? Imo
> not a good reason to hold up this patch set, since unwrangling mmap_sem
> tends to be work ...

+ a couple of PM folks

Looks like it has been this way for quite some time, so I guess the
overlap between things using dev_pm_opp and mmap is low..

fwiw, example splat so folks can see the locking interaction I am
talking about.. I suspect the pm_opp interaction with mm->mmap_sem is
from the debugfs calls while opp_table_lock is held?

[   15.627855] ======================================================
[   15.634202] WARNING: possible circular locking dependency detected
[   15.640550] 5.4.70 #41 Not tainted
[   15.644050] ------------------------------------------------------
[   15.650397] chrome/1805 is trying to acquire lock:
[   15.655314] ffffffed90720738 (opp_table_lock){+.+.}, at: _find_opp_table+0x34/0x74
[   15.663092]
[   15.663092] but task is already holding lock:
[   15.669082] ffffff80ff3911a8 (reservation_ww_class_mutex){+.+.}, at: submit_lock_objects+0x70/0x1ec
[   15.678369]
[   15.678369] which lock already depends on the new lock.
[   15.678369]
[   15.686764]
[   15.686764] the existing dependency chain (in reverse order) is:
[   15.694438]
[   15.694438] -> #3 (reservation_ww_class_mutex){+.+.}:
[   15.701146]        __mutex_lock_common+0xec/0xc0c
[   15.705978]        ww_mutex_lock_interruptible+0x5c/0xc4
[   15.711432]        msm_gem_fault+0x2c/0x124
[   15.715731]        __do_fault+0x40/0x16c
[   15.719766]        handle_mm_fault+0x7cc/0xd98
[   15.724337]        do_page_fault+0x230/0x3b4
[   15.728721]        do_translation_fault+0x5c/0x78
[   15.733558]        do_mem_abort+0x4c/0xb4
[   15.737680]        el0_da+0x1c/0x20
[   15.741266]
[   15.741266] -> #2 (&mm->mmap_sem){++++}:
[   15.746809]        __might_fault+0x70/0x98
[   15.751022]        compat_filldir+0xf8/0x48c
[   15.755412]        dcache_readdir+0x70/0x1dc
[   15.759808]        iterate_dir+0xd4/0x180
[   15.763931]        __arm64_compat_sys_getdents+0xa0/0x19c
[   15.769476]        el0_svc_common+0xa8/0x178
[   15.773861]        el0_svc_compat_handler+0x2c/0x40
[   15.778868]        el0_svc_compat+0x8/0x10
[   15.783075]
[   15.783075] -> #1 (&sb->s_type->i_mutex_key#3){++++}:
[   15.789788]        down_write+0x54/0x16c
[   15.793826]        debugfs_remove_recursive+0x50/0x158
[   15.799108]        opp_debug_unregister+0x34/0x114
[   15.804028]        dev_pm_opp_put_opp_table+0xd0/0x14c
[   15.809308]        dev_pm_opp_put_clkname+0x3c/0x50
[   15.814318]        msm_dsi_host_destroy+0xb0/0xcc
[   15.819149]        dsi_destroy+0x40/0x58
[   15.823184]        dsi_bind+0x90/0x170
[   15.827041]        component_bind_all+0xf0/0x208
[   15.831787]        msm_drm_init+0x188/0x60c
[   15.836084]        msm_drm_bind+0x24/0x30
[   15.840205]        try_to_bring_up_master+0x15c/0x1a4
[   15.845396]        __component_add+0x98/0x14c
[   15.849878]        component_add+0x28/0x34
[   15.854086]        dp_display_probe+0x324/0x370
[   15.858744]        platform_drv_probe+0x90/0xb0
[   15.863400]        really_probe+0x134/0x2ec
[   15.867699]        driver_probe_device+0x64/0xfc
[   15.872443]        __device_attach_driver+0x8c/0xa4
[   15.877459]        bus_for_each_drv+0x90/0xd8
[   15.881939]        __device_attach+0xc0/0x148
[   15.886420]        device_initial_probe+0x20/0x2c
[   15.891254]        bus_probe_device+0x34/0x94
[   15.895726]        deferred_probe_work_func+0x78/0xb4
[   15.900914]        process_one_work+0x30c/0x5d0
[   15.905573]        worker_thread+0x240/0x3f0
[   15.909959]        kthread+0x144/0x154
[   15.913809]        ret_from_fork+0x10/0x18
[   15.918016]
[   15.918016] -> #0 (opp_table_lock){+.+.}:
[   15.923660]        __lock_acquire+0xee4/0x2450
[   15.928230]        lock_acquire+0x1cc/0x210
[   15.932527]        __mutex_lock_common+0xec/0xc0c
[   15.937359]        mutex_lock_nested+0x40/0x50
[   15.941928]        _find_opp_table+0x34/0x74
[   15.946312]        dev_pm_opp_find_freq_exact+0x2c/0xdc
[   15.951680]        a6xx_gmu_resume+0xc8/0xecc
[   15.952812] fscrypt: AES-256-CTS-CBC using implementation "cts-cbc-aes-ce"
[   15.956161]        a6xx_pm_resume+0x148/0x200
[   15.956166]        adreno_resume+0x28/0x34
[   15.956171]        pm_generic_runtime_resume+0x34/0x48
[   15.956174]        __rpm_callback+0x70/0x10c
[   15.956176]        rpm_callback+0x34/0x8c
[   15.956179]        rpm_resume+0x414/0x550
[   15.956182]        __pm_runtime_resume+0x7c/0xa0
[   15.956185]        msm_gpu_submit+0x60/0x1c0
[   15.956190]        msm_ioctl_gem_submit+0xadc/0xb60
[   16.003961]        drm_ioctl_kernel+0x9c/0x118
[   16.008532]        drm_ioctl+0x27c/0x408
[   16.012562]        drm_compat_ioctl+0xcc/0xdc
[   16.017038]        __se_compat_sys_ioctl+0x100/0x206c
[   16.022224]        __arm64_compat_sys_ioctl+0x20/0x2c
[   16.027412]        el0_svc_common+0xa8/0x178
[   16.031800]        el0_svc_compat_handler+0x2c/0x40
[   16.036810]        el0_svc_compat+0x8/0x10
[   16.041021]
[   16.041021] other info that might help us debug this:
[   16.041021]
[   16.049235] Chain exists of:
[   16.049235]   opp_table_lock --> &mm->mmap_sem --> reservation_ww_class_mutex
[   16.049235]
[   16.061014]  Possible unsafe locking scenario:
[   16.061014]
[   16.067091]        CPU0                    CPU1
[   16.071750]        ----                    ----
[   16.076399]   lock(reservation_ww_class_mutex);
[   16.081059]                                lock(&mm->mmap_sem);
[   16.087134]                                lock(reservation_ww_class_mutex);
[   16.094369]   lock(opp_table_lock);
[   16.097961]
[   16.097961]  *** DEADLOCK ***
[   16.097961]
[   16.104038] 3 locks held by chrome/1805:
[   16.108068]  #0: ffffff80fb20c0d8 (&dev->struct_mutex){+.+.}, at: msm_ioctl_gem_submit+0x264/0xb60
[   16.117264]  #1: ffffff80dd712c70 (reservation_ww_class_acquire){+.+.}, at: msm_ioctl_gem_submit+0x8e8/0xb60
[   16.127357]  #2: ffffff80ff3911a8 (reservation_ww_class_mutex){+.+.}, at: submit_lock_objects+0x70/0x1ec
[   16.137089]
[   16.137089] stack backtrace:
[   16.141567] CPU: 4 PID: 1805 Comm: chrome Not tainted 5.4.70 #41
[   16.147733] Hardware name: Google Lazor (rev1+) with LTE (DT)
[   16.153632] Call trace:
[   16.156154]  dump_backtrace+0x0/0x158
[   16.159924]  show_stack+0x20/0x2c
[   16.163340]  dump_stack+0xc8/0x160
[   16.166840]  print_circular_bug+0x2c4/0x2c8
[   16.171144]  check_noncircular+0x1a8/0x1b0
[   16.175351]  __lock_acquire+0xee4/0x2450
[   16.179382]  lock_acquire+0x1cc/0x210
[   16.183146]  __mutex_lock_common+0xec/0xc0c
[   16.187450]  mutex_lock_nested+0x40/0x50
[   16.191481]  _find_opp_table+0x34/0x74
[   16.195344]  dev_pm_opp_find_freq_exact+0x2c/0xdc
[   16.200178]  a6xx_gmu_resume+0xc8/0xecc
[   16.204120]  a6xx_pm_resume+0x148/0x200
[   16.208064]  adreno_resume+0x28/0x34
[   16.211743]  pm_generic_runtime_resume+0x34/0x48
[   16.216488]  __rpm_callback+0x70/0x10c
[   16.220342]  rpm_callback+0x34/0x8c
[   16.223933]  rpm_resume+0x414/0x550
[   16.227524]  __pm_runtime_resume+0x7c/0xa0
[   16.231731]  msm_gpu_submit+0x60/0x1c0
[   16.235586]  msm_ioctl_gem_submit+0xadc/0xb60
[   16.240066]  drm_ioctl_kernel+0x9c/0x118
[   16.244097]  drm_ioctl+0x27c/0x408
[   16.247602]  drm_compat_ioctl+0xcc/0xdc
[   16.251546]  __se_compat_sys_ioctl+0x100/0x206c
[   16.256204]  __arm64_compat_sys_ioctl+0x20/0x2c
[   16.260861]  el0_svc_common+0xa8/0x178
[   16.264716]  el0_svc_compat_handler+0x2c/0x40
[   16.269196]  el0_svc_compat+0x8/0x10

BR,
-R

> -Daniel
> [snip quoted patch]
Viresh Kumar Oct. 20, 2020, 9:07 a.m. UTC | #3
On 12-10-20, 08:43, Rob Clark wrote:
> On Mon, Oct 12, 2020 at 7:35 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Sun, Oct 11, 2020 at 07:09:34PM -0700, Rob Clark wrote:
> > > From: Rob Clark <robdclark@chromium.org>
> > >
> > > Unfortunately, due to a dev_pm_opp locking interaction with
> > > mm->mmap_sem, we need to do the pm get before acquiring obj locks,
> > > otherwise we anger lockdep with the chain:
> >
> > tbh this sounds like a bug in that subsystem, since it means we cannot use
> > said subsystem in mmap handlers either.
> >
> > So if you have some remapping unit or need to wake up your gpu to blt the
> > buffer into system memory first, we're toast. That doesn't sound right. So
> > maybe Cc: pm folks and figure out how to fix this long term properly? Imo
> > not a good reason to hold up this patch set, since unwrangling mmap_sem
> > tends to be work ...
> 
> + a couple of PM folks
> 
> Looks like it has been this way for quite some time, so I guess the
> overlap between things using dev_pm_opp and mmap is low..
> 
> fwiw, example splat so folks can see the locking interaction I am
> talking about.. I suspect the pm_opp interaction with mm->mmap_sem is
> from the debugfs calls while opp_table_lock is held?

I am not very sure why this circular locking dependency is
happening here or how exactly we can fix it. The OPP core works under
the opp_table_lock, from within which it creates/removes the debugfs
stuff as well.

> [snip lockdep splat + quoted patch]
Daniel Vetter Oct. 20, 2020, 10:56 a.m. UTC | #4
On Tue, Oct 20, 2020 at 11:07 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 12-10-20, 08:43, Rob Clark wrote:
> > On Mon, Oct 12, 2020 at 7:35 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > >
> > > On Sun, Oct 11, 2020 at 07:09:34PM -0700, Rob Clark wrote:
> > > > From: Rob Clark <robdclark@chromium.org>
> > > >
> > > > Unfortunately, due to a dev_pm_opp locking interaction with
> > > > mm->mmap_sem, we need to do the pm get before acquiring obj locks,
> > > > otherwise we anger lockdep with the chain:
> > >
> > > tbh this sounds like a bug in that subsystem, since it means we cannot use
> > > said subsystem in mmap handlers either.
> > >
> > > So if you have some remapping unit or need to wake up your gpu to blt the
> > > buffer into system memory first, we're toast. That doesn't sound right. So
> > > maybe Cc: pm folks and figure out how to fix this long term properly? Imo
> > > not a good reason to hold up this patch set, since unwrangling mmap_sem
> > > tends to be work ...
> >
> > + a couple of PM folks
> >
> > Looks like it has been this way for quite some time, so I guess the
> > overlap between things using dev_pm_opp and mmap is low..
> >
> > fwiw, example splat so folks can see the locking interaction I am
> > talking about.. I suspect the pm_opp interaction with mm->mmap_sem is
> > from the debugfs calls while opp_table_lock is held?
>
> I am not very sure why this circular locking dependency is
> happening here or how exactly we can fix it. The OPP core works under
> the opp_table_lock, from within which it creates/removes the debugfs
> stuff as well.

Yeah that's bad practice. Generally you shouldn't need to hold locks
in setup/teardown code, since there's no other thread which can
possibly hold a reference to anything you're touching anymore. Ofc
excluding quickly grabbing/dropping a lock to insert/remove objects
into lists and stuff.

The other reason is that especially with anything related to sysfs or
debugfs, the locking dependencies you're pulling in are enormous: vfs
locks pull in mm locks (due to mmap) and at that point there's pretty
much nothing left you're allowed to hold while acquiring such a lock.
For simple drivers this is no issue, but for fancy drivers (like gpu
drivers, which need to interact with core mm) this means your
subsystem is a major pain to use.

Usually the correct fix is to only hold your subsystem locks in
setup/teardown when absolutely required, and fix any data
inconsistency issues by reordering your setup/teardown code: When you
register as the last step and unregister as the first step, there's no
need for any additional locking. And hence no need to call debugfs
functions while holding your subsystem locks.

The catch phrase I use for this is "don't solve object lifetime issues
with locking". Instead use refcounting and careful ordering in
setup/teardown code.

I think Rob has found some duct-tape, so this isn't an immediate
issue, but would be really nice to get fixed.

Cheers, Daniel

> > [snip lockdep splat]
> > [   16.163340]  dump_stack+0xc8/0x160
> > [   16.166840]  print_circular_bug+0x2c4/0x2c8
> > [   16.171144]  check_noncircular+0x1a8/0x1b0
> > [   16.175351]  __lock_acquire+0xee4/0x2450
> > [   16.179382]  lock_acquire+0x1cc/0x210
> > [   16.183146]  __mutex_lock_common+0xec/0xc0c
> > [   16.187450]  mutex_lock_nested+0x40/0x50
> > [   16.191481]  _find_opp_table+0x34/0x74
> > [   16.195344]  dev_pm_opp_find_freq_exact+0x2c/0xdc
> > [   16.200178]  a6xx_gmu_resume+0xc8/0xecc
> > [   16.204120]  a6xx_pm_resume+0x148/0x200
> > [   16.208064]  adreno_resume+0x28/0x34
> > [   16.211743]  pm_generic_runtime_resume+0x34/0x48
> > [   16.216488]  __rpm_callback+0x70/0x10c
> > [   16.220342]  rpm_callback+0x34/0x8c
> > [   16.223933]  rpm_resume+0x414/0x550
> > [   16.227524]  __pm_runtime_resume+0x7c/0xa0
> > [   16.231731]  msm_gpu_submit+0x60/0x1c0
> > [   16.235586]  msm_ioctl_gem_submit+0xadc/0xb60
> > [   16.240066]  drm_ioctl_kernel+0x9c/0x118
> > [   16.244097]  drm_ioctl+0x27c/0x408
> > [   16.247602]  drm_compat_ioctl+0xcc/0xdc
> > [   16.251546]  __se_compat_sys_ioctl+0x100/0x206c
> > [   16.256204]  __arm64_compat_sys_ioctl+0x20/0x2c
> > [   16.260861]  el0_svc_common+0xa8/0x178
> > [   16.264716]  el0_svc_compat_handler+0x2c/0x40
> > [   16.269196]  el0_svc_compat+0x8/0x10
> >
> > BR,
> > -R
> >
> > > -Daniel
> > >
> > > >
> > > >   opp_table_lock --> &mm->mmap_sem --> reservation_ww_class_mutex
> > > >
> > > > For an explicit fencing userspace, the impact should be minimal
> > > > as we do all the fence waits before this point.  It could result
> > > > in some needless resumes in error cases, etc.
> > > >
> > > > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > > > ---
> > > >  drivers/gpu/drm/msm/msm_gem_submit.c | 15 +++++++++++++--
> > > >  1 file changed, 13 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> > > > index 002130d826aa..a9422d043bfe 100644
> > > > --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> > > > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> > > > @@ -744,11 +744,20 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
> > > >
> > > >       ret = submit_lookup_objects(submit, args, file);
> > > >       if (ret)
> > > > -             goto out;
> > > > +             goto out_pre_pm;
> > > >
> > > >       ret = submit_lookup_cmds(submit, args, file);
> > > >       if (ret)
> > > > -             goto out;
> > > > +             goto out_pre_pm;
> > > > +
> > > > +     /*
> > > > +      * Thanks to dev_pm_opp opp_table_lock interactions with mm->mmap_sem
> > > > +      * in the resume path, we need to do rpm get before we lock objs.
> > > > +      * Which unfortunately might involve powering up the GPU sooner than
> > > > +      * is necessary.  But at least in the explicit fencing case, we will
> > > > +      * have already done all the fence waiting.
> > > > +      */
> > > > +     pm_runtime_get_sync(&gpu->pdev->dev);
> > > >
> > > >       /* copy_*_user while holding a ww ticket upsets lockdep */
> > > >       ww_acquire_init(&submit->ticket, &reservation_ww_class);
> > > > @@ -825,6 +834,8 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
> > > >
> > > >
> > > >  out:
> > > > +     pm_runtime_put(&gpu->pdev->dev);
> > > > +out_pre_pm:
> > > >       submit_cleanup(submit);
> > > >       if (has_ww_ticket)
> > > >               ww_acquire_fini(&submit->ticket);
>
> --
> viresh
Viresh Kumar Oct. 20, 2020, 11:24 a.m. UTC | #5
On 20-10-20, 12:56, Daniel Vetter wrote:
> Yeah that's bad practice. Generally you shouldn't need to hold locks
> in setup/teardown code, since there's no other thread which can
> possibly hold a reference to anything you're touching anymore. Ofc
> excluding quickly grabbing/dropping a lock to insert/remove objects
> into lists and stuff.
> 
> The other reason is that especially with anything related to sysfs or
> debugfs, the locking dependencies you're pulling in are enormous: vfs
> locks pull in mm locks (due to mmap) and at that point there's pretty
> much nothing left you're allowed to hold while acquiring such a lock.
> For simple drivers this is no issue, but for fancy drivers (like gpu
> drivers, which need to interact with core mm) this means your
> subsystem is a major pain to use.
> 
> Usually the correct fix is to only hold your subsystem locks in
> setup/teardown when absolutely required, and fix any data
> inconsistency issues by reordering your setup/teardown code: When you
> register as the last step and unregister as the first step, there's no
> need for any additional locking. And hence no need to call debugfs
> functions while holding your subsystem locks.
> 
> The catch phrase I use for this is "don't solve object lifetime issues
> with locking". Instead use refcounting and careful ordering in
> setup/teardown code.

This is exactly what I have done in the OPP core, the locks were taken
only when really necessary, though as we have seen now I have missed
that at a single place and that should be fixed as well. Will do that,
thanks.
Daniel Vetter Oct. 20, 2020, 11:42 a.m. UTC | #6
On Tue, Oct 20, 2020 at 1:24 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 20-10-20, 12:56, Daniel Vetter wrote:
> > Yeah that's bad practice. Generally you shouldn't need to hold locks
> > in setup/teardown code, since there's no other thread which can
> > possibly hold a reference to anything you're touching anymore. Ofc
> > excluding quickly grabbing/dropping a lock to insert/remove objects
> > into lists and stuff.
> >
> > The other reason is that especially with anything related to sysfs or
> > debugfs, the locking dependencies you're pulling in are enormous: vfs
> > locks pull in mm locks (due to mmap) and at that point there's pretty
> > much nothing left you're allowed to hold while acquiring such a lock.
> > For simple drivers this is no issue, but for fancy drivers (like gpu
> > drivers, which need to interact with core mm) this means your
> > subsystem is a major pain to use.
> >
> > Usually the correct fix is to only hold your subsystem locks in
> > setup/teardown when absolutely required, and fix any data
> > inconsistency issues by reordering your setup/teardown code: When you
> > register as the last step and unregister as the first step, there's no
> > need for any additional locking. And hence no need to call debugfs
> > functions while holding your subsystem locks.
> >
> > The catch phrase I use for this is "don't solve object lifetime issues
> > with locking". Instead use refcounting and careful ordering in
> > setup/teardown code.
>
> This is exactly what I have done in the OPP core, the locks were taken
> only when really necessary, though as we have seen now I have missed
> that at a single place and that should be fixed as well. Will do that,
> thanks.

Excellent. If the fix is small enough can you push it into 5.10? That
way drm/msm doesn't have to carry the temporary solution for 5.11 (the
issue only pops up with the locking rework, which teaches lockdep a
few more things about what's going on as a side effect).
-Daniel
Rob Clark Oct. 20, 2020, 2:13 p.m. UTC | #7
On Tue, Oct 20, 2020 at 4:24 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 20-10-20, 12:56, Daniel Vetter wrote:
> > Yeah that's bad practice. Generally you shouldn't need to hold locks
> > in setup/teardown code, since there's no other thread which can
> > possibly hold a reference to anything you're touching anymore. Ofc
> > excluding quickly grabbing/dropping a lock to insert/remove objects
> > into lists and stuff.
> >
> > The other reason is that especially with anything related to sysfs or
> > debugfs, the locking dependencies you're pulling in are enormous: vfs
> > locks pull in mm locks (due to mmap) and at that point there's pretty
> > much nothing left you're allowed to hold while acquiring such a lock.
> > For simple drivers this is no issue, but for fancy drivers (like gpu
> > drivers, which need to interact with core mm) this means your
> > subsystem is a major pain to use.
> >
> > Usually the correct fix is to only hold your subsystem locks in
> > setup/teardown when absolutely required, and fix any data
> > inconsistency issues by reordering your setup/teardown code: When you
> > register as the last step and unregister as the first step, there's no
> > need for any additional locking. And hence no need to call debugfs
> > functions while holding your subsystem locks.
> >
> > The catch phrase I use for this is "don't solve object lifetime issues
> > with locking". Instead use refcounting and careful ordering in
> > setup/teardown code.
>
> This is exactly what I have done in the OPP core, the locks were taken
> only when really necessary, though as we have seen now I have missed
> that at a single place and that should be fixed as well. Will do that,
> thanks.

I do have an easy enough way to repro the issue, so if you have a
patch I can certainly test it.

BR,
-R
Viresh Kumar Oct. 22, 2020, 8:06 a.m. UTC | #8
On 20-10-20, 07:13, Rob Clark wrote:
> On Tue, Oct 20, 2020 at 4:24 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > On 20-10-20, 12:56, Daniel Vetter wrote:
> > > Yeah that's bad practice. Generally you shouldn't need to hold locks
> > > in setup/teardown code, since there's no other thread which can
> > > possibly hold a reference to anything you're touching anymore. Ofc
> > > excluding quickly grabbing/dropping a lock to insert/remove objects
> > > into lists and stuff.
> > >
> > > The other reason is that especially with anything related to sysfs or
> > > debugfs, the locking dependencies you're pulling in are enormous: vfs
> > > locks pull in mm locks (due to mmap) and at that point there's pretty
> > > much nothing left you're allowed to hold while acquiring such a lock.
> > > For simple drivers this is no issue, but for fancy drivers (like gpu
> > > drivers, which need to interact with core mm) this means your
> > > subsystem is a major pain to use.
> > >
> > > Usually the correct fix is to only hold your subsystem locks in
> > > setup/teardown when absolutely required, and fix any data
> > > inconsistency issues by reordering your setup/teardown code: When you
> > > register as the last step and unregister as the first step, there's no
> > > need for any additional locking. And hence no need to call debugfs
> > > functions while holding your subsystem locks.
> > >
> > > The catch phrase I use for this is "don't solve object lifetime issues
> > > with locking". Instead use refcounting and careful ordering in
> > > setup/teardown code.
> >
> > This is exactly what I have done in the OPP core, the locks were taken
> > only when really necessary, though as we have seen now I have missed
> > that at a single place and that should be fixed as well. Will do that,
> > thanks.
> 
> I do have an easy enough way to repro the issue, so if you have a
> patch I can certainly test it.

Does this fix it for you? There is one more place left where we take
the opp_table_lock while adding stuff to debugfs, and that one isn't
that straightforward to fix. But I didn't see that path in your
circular dependency trace, so who knows :)

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 2483e765318a..4cc0fb716381 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -1181,6 +1181,10 @@ static void _opp_table_kref_release(struct kref *kref)
        struct opp_device *opp_dev, *temp;
        int i;
 
+       /* Drop the lock as soon as we can */
+       list_del(&opp_table->node);
+       mutex_unlock(&opp_table_lock);
+
        _of_clear_opp_table(opp_table);
 
        /* Release clk */
@@ -1208,10 +1212,7 @@ static void _opp_table_kref_release(struct kref *kref)
 
        mutex_destroy(&opp_table->genpd_virt_dev_lock);
        mutex_destroy(&opp_table->lock);
-       list_del(&opp_table->node);
        kfree(opp_table);
-
-       mutex_unlock(&opp_table_lock);
 }
 
 void dev_pm_opp_put_opp_table(struct opp_table *opp_table)
Rob Clark Oct. 25, 2020, 5:39 p.m. UTC | #9
On Thu, Oct 22, 2020 at 1:06 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 20-10-20, 07:13, Rob Clark wrote:
> > On Tue, Oct 20, 2020 at 4:24 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > On 20-10-20, 12:56, Daniel Vetter wrote:
> > > > Yeah that's bad practice. Generally you shouldn't need to hold locks
> > > > in setup/teardown code, since there's no other thread which can
> > > > possibly hold a reference to anything you're touching anymore. Ofc
> > > > excluding quickly grabbing/dropping a lock to insert/remove objects
> > > > into lists and stuff.
> > > >
> > > > The other reason is that especially with anything related to sysfs or
> > > > debugfs, the locking dependencies you're pulling in are enormous: vfs
> > > > locks pull in mm locks (due to mmap) and at that point there's pretty
> > > > much nothing left you're allowed to hold while acquiring such a lock.
> > > > For simple drivers this is no issue, but for fancy drivers (like gpu
> > > > drivers, which need to interact with core mm) this means your
> > > > subsystem is a major pain to use.
> > > >
> > > > Usually the correct fix is to only hold your subsystem locks in
> > > > setup/teardown when absolutely required, and fix any data
> > > > inconsistency issues by reordering your setup/teardown code: When you
> > > > register as the last step and unregister as the first step, there's no
> > > > need for any additional locking. And hence no need to call debugfs
> > > > functions while holding your subsystem locks.
> > > >
> > > > The catch phrase I use for this is "don't solve object lifetime issues
> > > > with locking". Instead use refcounting and careful ordering in
> > > > setup/teardown code.
> > >
> > > This is exactly what I have done in the OPP core, the locks were taken
> > > only when really necessary, though as we have seen now I have missed
> > > that at a single place and that should be fixed as well. Will do that,
> > > thanks.
> >
> > I do have an easy enough way to repro the issue, so if you have a
> > patch I can certainly test it.
>
> Does this fix it for you ? There is one more place still left where we
> are taking the opp_table_lock while adding stuff to debugfs and that's
> not that straight forward to fix. But I didn't see that path in your
> circular dependency trace, so who knows :)

Nope, I suspect any creation of debugfs files will be problematic.

(btw, _add_opp_dev_unlocked() looks like it should be called
_add_opp_dev_locked()?)

It does look like 'struct opp_table' is already refcnt'd, so I suspect
you could replace holding opp_table_lock while calling into debugfs
with holding a reference to the opp_table instead?

BR,
-R

[  +0.074543] ======================================================
[  +0.006347] WARNING: possible circular locking dependency detected
[  +0.006349] 5.4.72 #14 Not tainted
[  +0.003501] ------------------------------------------------------
[  +0.006350] chrome/1865 is trying to acquire lock:
[  +0.004922] ffffffdd34921750 (opp_table_lock){+.+.}, at:
_find_opp_table+0x34/0x74
[  +0.007779]
              but task is already holding lock:
[  +0.005989] ffffff81f0fc71a8 (reservation_ww_class_mutex){+.+.}, at:
submit_lock_objects+0x70/0x1ec
[  +0.001132] fscrypt: AES-256-CTS-CBC using implementation "cts-cbc-aes-ce"
[  +0.008156]
              which lock already depends on the new lock.

[  +0.000002]
              the existing dependency chain (in reverse order) is:
[  +0.000002]
              -> #4 (reservation_ww_class_mutex){+.+.}:
[  +0.000009]        __mutex_lock_common+0xec/0xc0c
[  +0.000004]        ww_mutex_lock_interruptible+0x5c/0xc4
[  +0.000003]        msm_gem_fault+0x2c/0x124
[  +0.000005]        __do_fault+0x40/0x16c
[  +0.000003]        handle_mm_fault+0x7cc/0xd98
[  +0.000005]        do_page_fault+0x230/0x3b4
[  +0.000003]        do_translation_fault+0x5c/0x78
[  +0.000004]        do_mem_abort+0x4c/0xb4
[  +0.000006]        el0_da+0x1c/0x20
[  +0.069917]
              -> #3 (&mm->mmap_sem){++++}:
[  +0.005548]        __might_fault+0x70/0x98
[  +0.004209]        compat_filldir+0xf8/0x48c
[  +0.004394]        dcache_readdir+0x70/0x1dc
[  +0.004394]        iterate_dir+0xd4/0x180
[  +0.004119]        __arm64_compat_sys_getdents+0xa0/0x19c
[  +0.005549]        el0_svc_common+0xa8/0x178
[  +0.004394]        el0_svc_compat_handler+0x2c/0x40
[  +0.005007]        el0_svc_compat+0x8/0x10
[  +0.004205]
              -> #2 (&sb->s_type->i_mutex_key#3){++++}:
[  +0.006708]        down_write+0x54/0x16c
[  +0.004034]        start_creating+0x68/0x128
[  +0.004392]        debugfs_create_dir+0x28/0x114
[  +0.004747]        opp_debug_register+0x8c/0xc0
[  +0.004657]        _add_opp_dev_unlocked+0x5c/0x70
[  +0.004920]        _add_opp_dev+0x38/0x58
[  +0.004118]        _opp_get_opp_table+0xdc/0x1ac
[  +0.004745]        dev_pm_opp_get_opp_table_indexed+0x24/0x30
[  +0.005899]        dev_pm_opp_of_add_table_indexed+0x48/0x84
[  +0.005813]        of_genpd_add_provider_onecell+0xc0/0x1b8
[  +0.005724]        rpmhpd_probe+0x240/0x268
[  +0.004307]        platform_drv_probe+0x90/0xb0
[  +0.004654]        really_probe+0x134/0x2ec
[  +0.004304]        driver_probe_device+0x64/0xfc
[  +0.004746]        __device_attach_driver+0x8c/0xa4
[  +0.005008]        bus_for_each_drv+0x90/0xd8
[  +0.004481]        __device_attach+0xc0/0x148
[  +0.004480]        device_initial_probe+0x20/0x2c
[  +0.004832]        bus_probe_device+0x34/0x94
[  +0.004482]        device_add+0x1fc/0x3b0
[  +0.004121]        of_device_add+0x3c/0x4c
[  +0.004206]        of_platform_device_create_pdata+0xb8/0xfc
[  +0.005811]        of_platform_bus_create+0x1e4/0x368
[  +0.005185]        of_platform_populate+0x70/0xbc
[  +0.004833]        devm_of_platform_populate+0x58/0xa0
[  +0.005283]        rpmh_rsc_probe+0x36c/0x3cc
[  +0.004481]        platform_drv_probe+0x90/0xb0
[  +0.004657]        really_probe+0x134/0x2ec
[  +0.004305]        driver_probe_device+0x64/0xfc
[  +0.004745]        __device_attach_driver+0x8c/0xa4
[  +0.005007]        bus_for_each_drv+0x90/0xd8
[  +0.004480]        __device_attach+0xc0/0x148
[  +0.004481]        device_initial_probe+0x20/0x2c
[  +0.004833]        bus_probe_device+0x34/0x94
[  +0.004481]        device_add+0x1fc/0x3b0
[  +0.004119]        of_device_add+0x3c/0x4c
[  +0.004206]        of_platform_device_create_pdata+0xb8/0xfc
[  +0.005811]        of_platform_bus_create+0x1e4/0x368
[  +0.005185]        of_platform_bus_create+0x230/0x368
[  +0.005185]        of_platform_populate+0x70/0xbc
[  +0.004836]        of_platform_default_populate_init+0xa8/0xc0
[  +0.005986]        do_one_initcall+0x1c8/0x3fc
[  +0.004572]        do_initcall_level+0xb4/0x10c
[  +0.004657]        do_basic_setup+0x30/0x48
[  +0.004304]        kernel_init_freeable+0x124/0x1a4
[  +0.005009]        kernel_init+0x14/0x104
[  +0.004119]        ret_from_fork+0x10/0x18
[  +0.004205]
              -> #1 (&opp_table->lock){+.+.}:
[  +0.005815]        __mutex_lock_common+0xec/0xc0c
[  +0.004832]        mutex_lock_nested+0x40/0x50
[  +0.004570]        _add_opp_dev+0x2c/0x58
[  +0.004119]        _opp_get_opp_table+0xdc/0x1ac
[  +0.004745]        dev_pm_opp_get_opp_table_indexed+0x24/0x30
[  +0.005899]        dev_pm_opp_of_add_table_indexed+0x48/0x84
[  +0.005814]        of_genpd_add_provider_onecell+0xc0/0x1b8
[  +0.005721]        rpmhpd_probe+0x240/0x268
[  +0.004306]        platform_drv_probe+0x90/0xb0
[  +0.004656]        really_probe+0x134/0x2ec
[  +0.004305]        driver_probe_device+0x64/0xfc
[  +0.004745]        __device_attach_driver+0x8c/0xa4
[  +0.005007]        bus_for_each_drv+0x90/0xd8
[  +0.004481]        __device_attach+0xc0/0x148
[  +0.004481]        device_initial_probe+0x20/0x2c
[  +0.004832]        bus_probe_device+0x34/0x94
[  +0.004481]        device_add+0x1fc/0x3b0
[  +0.004119]        of_device_add+0x3c/0x4c
[  +0.004206]        of_platform_device_create_pdata+0xb8/0xfc
[  +0.005810]        of_platform_bus_create+0x1e4/0x368
[  +0.005197]        of_platform_populate+0x70/0xbc
[  +0.004832]        devm_of_platform_populate+0x58/0xa0
[  +0.005284]        rpmh_rsc_probe+0x36c/0x3cc
[  +0.004480]        platform_drv_probe+0x90/0xb0
[  +0.004658]        really_probe+0x134/0x2ec
[  +0.004301]        driver_probe_device+0x64/0xfc
[  +0.004745]        __device_attach_driver+0x8c/0xa4
[  +0.005007]        bus_for_each_drv+0x90/0xd8
[  +0.004480]        __device_attach+0xc0/0x148
[  +0.004481]        device_initial_probe+0x20/0x2c
[  +0.004831]        bus_probe_device+0x34/0x94
[  +0.004482]        device_add+0x1fc/0x3b0
[  +0.004118]        of_device_add+0x3c/0x4c
[  +0.004214]        of_platform_device_create_pdata+0xb8/0xfc
[  +0.005817]        of_platform_bus_create+0x1e4/0x368
[  +0.005186]        of_platform_bus_create+0x230/0x368
[  +0.005188]        of_platform_populate+0x70/0xbc
[  +0.004832]        of_platform_default_populate_init+0xa8/0xc0
[  +0.005987]        do_one_initcall+0x1c8/0x3fc
[  +0.004569]        do_initcall_level+0xb4/0x10c
[  +0.004657]        do_basic_setup+0x30/0x48
[  +0.004305]        kernel_init_freeable+0x124/0x1a4
[  +0.005008]        kernel_init+0x14/0x104
[  +0.004119]        ret_from_fork+0x10/0x18
[  +0.004206]
              -> #0 (opp_table_lock){+.+.}:
[  +0.005640]        __lock_acquire+0xee4/0x2450
[  +0.004570]        lock_acquire+0x1cc/0x210
[  +0.004305]        __mutex_lock_common+0xec/0xc0c
[  +0.004833]        mutex_lock_nested+0x40/0x50
[  +0.004570]        _find_opp_table+0x34/0x74
[  +0.004393]        dev_pm_opp_find_freq_exact+0x2c/0xdc
[  +0.005372]        a6xx_gmu_resume+0xc8/0xecc
[  +0.004480]        a6xx_pm_resume+0x148/0x200
[  +0.004482]        adreno_resume+0x28/0x34
[  +0.004209]        pm_generic_runtime_resume+0x34/0x48
[  +0.005283]        __rpm_callback+0x70/0x10c
[  +0.004393]        rpm_callback+0x34/0x8c
[  +0.004119]        rpm_resume+0x414/0x550
[  +0.004119]        __pm_runtime_resume+0x7c/0xa0
[  +0.004746]        msm_gpu_submit+0x60/0x1c0
[  +0.004394]        msm_ioctl_gem_submit+0xadc/0xb60
[  +0.005010]        drm_ioctl_kernel+0x9c/0x118
[  +0.004569]        drm_ioctl+0x27c/0x408
[  +0.004034]        drm_compat_ioctl+0xcc/0xdc
[  +0.004483]        __se_compat_sys_ioctl+0x100/0x206c
[  +0.005186]        __arm64_compat_sys_ioctl+0x20/0x2c
[  +0.005187]        el0_svc_common+0xa8/0x178
[  +0.004393]        el0_svc_compat_handler+0x2c/0x40
[  +0.005009]        el0_svc_compat+0x8/0x10
[  +0.004205]
              other info that might help us debug this:

[  +0.008213] Chain exists of:
                opp_table_lock --> &mm->mmap_sem --> reservation_ww_class_mutex

[  +0.011780]  Possible unsafe locking scenario:

[  +0.006082]        CPU0                    CPU1
[  +0.004660]        ----                    ----
[  +0.004656]   lock(reservation_ww_class_mutex);
[  +0.004657]                                lock(&mm->mmap_sem);
[  +0.006079]                                lock(reservation_ww_class_mutex);
[  +0.007237]   lock(opp_table_lock);
[  +0.003592]
               *** DEADLOCK ***

[  +0.006084] 3 locks held by chrome/1865:
[  +0.004031]  #0: ffffff81edecc0d8 (&dev->struct_mutex){+.+.}, at:
msm_ioctl_gem_submit+0x264/0xb60
[  +0.009198]  #1: ffffff81d0000870
(reservation_ww_class_acquire){+.+.}, at:
msm_ioctl_gem_submit+0x8e8/0xb60
[  +0.010086]  #2: ffffff81f0fc71a8
(reservation_ww_class_mutex){+.+.}, at: submit_lock_objects+0x70/0x1ec
[  +0.009735]
              stack backtrace:
[  +0.004482] CPU: 0 PID: 1865 Comm: chrome Not tainted 5.4.72 #14
[  +0.006173] Hardware name: Google Lazor (rev1+) with LTE (DT)
[  +0.005899] Call trace:
[  +0.002515]  dump_backtrace+0x0/0x158
[  +0.003768]  show_stack+0x20/0x2c
[  +0.003407]  dump_stack+0xc8/0x160
[  +0.003506]  print_circular_bug+0x2c4/0x2c8
[  +0.004305]  check_noncircular+0x1a8/0x1b0
[  +0.004206]  __lock_acquire+0xee4/0x2450
[  +0.004032]  lock_acquire+0x1cc/0x210
[  +0.003768]  __mutex_lock_common+0xec/0xc0c
[  +0.004305]  mutex_lock_nested+0x40/0x50
[  +0.004033]  _find_opp_table+0x34/0x74
[  +0.003855]  dev_pm_opp_find_freq_exact+0x2c/0xdc
[  +0.004833]  a6xx_gmu_resume+0xc8/0xecc
[  +0.003943]  a6xx_pm_resume+0x148/0x200
[  +0.003944]  adreno_resume+0x28/0x34
[  +0.003681]  pm_generic_runtime_resume+0x34/0x48
[  +0.004745]  __rpm_callback+0x70/0x10c
[  +0.003854]  rpm_callback+0x34/0x8c
[  +0.003592]  rpm_resume+0x414/0x550
[  +0.003592]  __pm_runtime_resume+0x7c/0xa0
[  +0.004207]  msm_gpu_submit+0x60/0x1c0
[  +0.003855]  msm_ioctl_gem_submit+0xadc/0xb60
[  +0.004481]  drm_ioctl_kernel+0x9c/0x118
[  +0.004031]  drm_ioctl+0x27c/0x408
[  +0.003504]  drm_compat_ioctl+0xcc/0xdc
[  +0.003945]  __se_compat_sys_ioctl+0x100/0x206c
[  +0.004658]  __arm64_compat_sys_ioctl+0x20/0x2c
[  +0.004659]  el0_svc_common+0xa8/0x178
[  +0.003855]  el0_svc_compat_handler+0x2c/0x40
[  +0.004480]  el0_svc_compat+0x8/0x10

> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index 2483e765318a..4cc0fb716381 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -1181,6 +1181,10 @@ static void _opp_table_kref_release(struct kref *kref)
>         struct opp_device *opp_dev, *temp;
>         int i;
>
> +       /* Drop the lock as soon as we can */
> +       list_del(&opp_table->node);
> +       mutex_unlock(&opp_table_lock);
> +
>         _of_clear_opp_table(opp_table);
>
>         /* Release clk */
> @@ -1208,10 +1212,7 @@ static void _opp_table_kref_release(struct kref *kref)
>
>         mutex_destroy(&opp_table->genpd_virt_dev_lock);
>         mutex_destroy(&opp_table->lock);
> -       list_del(&opp_table->node);
>         kfree(opp_table);
> -
> -       mutex_unlock(&opp_table_lock);
>  }
>
>  void dev_pm_opp_put_opp_table(struct opp_table *opp_table)
>
> --
> viresh
Viresh Kumar Oct. 27, 2020, 11:35 a.m. UTC | #10
On 25-10-20, 10:39, Rob Clark wrote:
> Nope, I suspect any creation of debugfs files will be problematic.

Yeah, so it only fixed part of the problem.

> (btw, _add_opp_dev_unlocked() looks like it should be called
> _add_opp_dev_locked()?)
> 
> It does look like 'struct opp_table' is already refcnt'd, so I suspect
> you could replace holding opp_table_lock while calling into debugfs
> with holding a reference to the opp_table instead?

It isn't that straightforward, unfortunately: we need to make sure the
table doesn't get allocated twice for the same device, so
find+allocate needs to happen within a locked region.

I have taken a not-so-straightforward approach to fixing this issue;
let's see if it fixes it or not.

-------------------------8<-------------------------

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 4ac4e7ce6b8b..6f4a73a6391f 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -29,6 +29,8 @@
 LIST_HEAD(opp_tables);
 /* Lock to allow exclusive modification to the device and opp lists */
 DEFINE_MUTEX(opp_table_lock);
+/* Flag indicating that opp_tables list is being updated at the moment */
+static bool opp_tables_busy;
 
 static struct opp_device *_find_opp_dev(const struct device *dev,
 					struct opp_table *opp_table)
@@ -1036,8 +1038,8 @@ static void _remove_opp_dev(struct opp_device *opp_dev,
 	kfree(opp_dev);
 }
 
-static struct opp_device *_add_opp_dev_unlocked(const struct device *dev,
-						struct opp_table *opp_table)
+struct opp_device *_add_opp_dev(const struct device *dev,
+				struct opp_table *opp_table)
 {
 	struct opp_device *opp_dev;
 
@@ -1048,7 +1050,9 @@ static struct opp_device *_add_opp_dev_unlocked(const struct device *dev,
 	/* Initialize opp-dev */
 	opp_dev->dev = dev;
 
+	mutex_lock(&opp_table->lock);
 	list_add(&opp_dev->node, &opp_table->dev_list);
+	mutex_unlock(&opp_table->lock);
 
 	/* Create debugfs entries for the opp_table */
 	opp_debug_register(opp_dev, opp_table);
@@ -1056,18 +1060,6 @@ static struct opp_device *_add_opp_dev_unlocked(const struct device *dev,
 	return opp_dev;
 }
 
-struct opp_device *_add_opp_dev(const struct device *dev,
-				struct opp_table *opp_table)
-{
-	struct opp_device *opp_dev;
-
-	mutex_lock(&opp_table->lock);
-	opp_dev = _add_opp_dev_unlocked(dev, opp_table);
-	mutex_unlock(&opp_table->lock);
-
-	return opp_dev;
-}
-
 static struct opp_table *_allocate_opp_table(struct device *dev, int index)
 {
 	struct opp_table *opp_table;
@@ -1121,8 +1113,6 @@ static struct opp_table *_allocate_opp_table(struct device *dev, int index)
 	INIT_LIST_HEAD(&opp_table->opp_list);
 	kref_init(&opp_table->kref);
 
-	/* Secure the device table modification */
-	list_add(&opp_table->node, &opp_tables);
 	return opp_table;
 
 err:
@@ -1135,27 +1125,64 @@ void _get_opp_table_kref(struct opp_table *opp_table)
 	kref_get(&opp_table->kref);
 }
 
+/*
+ * We need to make sure that the OPP table for a device doesn't get added twice,
+ * if this routine gets called in parallel with the same device pointer.
+ *
+ * The simplest way to enforce that is to perform everything (find existing
+ * table and if not found, create a new one) under the opp_table_lock, so only
+ * one creator gets access to the same. But that expands the critical section
+ * under the lock and may end up causing circular dependencies with frameworks
+ * like debugfs, interconnect or clock framework as they may be direct or
+ * indirect users of OPP core.
+ *
+ * And for that reason we have to go for a bit tricky implementation here, which
+ * uses the opp_tables_busy flag to indicate if another creator is in the middle
+ * of adding an OPP table and others should wait for it to finish.
+ */
 static struct opp_table *_opp_get_opp_table(struct device *dev, int index)
 {
 	struct opp_table *opp_table;
 
-	/* Hold our table modification lock here */
+again:
 	mutex_lock(&opp_table_lock);
 
 	opp_table = _find_opp_table_unlocked(dev);
 	if (!IS_ERR(opp_table))
 		goto unlock;
 
+	/*
+	 * The opp_tables list or an OPP table's dev_list is getting updated by
+	 * another user, wait for it to finish.
+	 */
+	if (unlikely(opp_tables_busy)) {
+		mutex_unlock(&opp_table_lock);
+		cpu_relax();
+		goto again;
+	}
+
+	opp_tables_busy = true;
 	opp_table = _managed_opp(dev, index);
+
+	/* Drop the lock to reduce the size of critical section */
+	mutex_unlock(&opp_table_lock);
+
 	if (opp_table) {
-		if (!_add_opp_dev_unlocked(dev, opp_table)) {
+		if (!_add_opp_dev(dev, opp_table)) {
 			dev_pm_opp_put_opp_table(opp_table);
 			opp_table = ERR_PTR(-ENOMEM);
 		}
-		goto unlock;
+
+		mutex_lock(&opp_table_lock);
+	} else {
+		opp_table = _allocate_opp_table(dev, index);
+
+		mutex_lock(&opp_table_lock);
+		if (!IS_ERR(opp_table))
+			list_add(&opp_table->node, &opp_tables);
 	}
 
-	opp_table = _allocate_opp_table(dev, index);
+	opp_tables_busy = false;
 
 unlock:
 	mutex_unlock(&opp_table_lock);
@@ -1181,6 +1208,10 @@ static void _opp_table_kref_release(struct kref *kref)
 	struct opp_device *opp_dev, *temp;
 	int i;
 
+	/* Drop the lock as soon as we can */
+	list_del(&opp_table->node);
+	mutex_unlock(&opp_table_lock);
+
 	_of_clear_opp_table(opp_table);
 
 	/* Release clk */
@@ -1208,10 +1239,7 @@ static void _opp_table_kref_release(struct kref *kref)
 
 	mutex_destroy(&opp_table->genpd_virt_dev_lock);
 	mutex_destroy(&opp_table->lock);
-	list_del(&opp_table->node);
 	kfree(opp_table);
-
-	mutex_unlock(&opp_table_lock);
 }
 
 void dev_pm_opp_put_opp_table(struct opp_table *opp_table)
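The find-or-create flow in the hunks above (check under the lock, mark the list busy, drop the lock for the slow allocation, retake it to publish) can be sketched in isolation. The following is a rough userspace analogue using pthreads; the names get_table and table_count are hypothetical stand-ins for the OPP internals, not kernel API:

```c
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Stand-ins for opp_table_lock / opp_tables_busy / the opp_tables list. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static bool tables_busy;
static int table_count;

/* Find-or-create: the slow allocation step runs with the lock dropped, and
 * the busy flag keeps a second creator from racing in and allocating twice. */
static int get_table(void)
{
again:
	pthread_mutex_lock(&table_lock);
	if (table_count > 0) {			/* found an existing table */
		pthread_mutex_unlock(&table_lock);
		return table_count;
	}
	if (tables_busy) {			/* another creator is mid-flight */
		pthread_mutex_unlock(&table_lock);
		sched_yield();			/* cpu_relax() analogue */
		goto again;
	}
	tables_busy = true;
	pthread_mutex_unlock(&table_lock);	/* shrink the critical section */

	/* ... slow allocation would happen here, outside the lock ... */

	pthread_mutex_lock(&table_lock);
	table_count++;				/* publish, like list_add() */
	tables_busy = false;
	pthread_mutex_unlock(&table_lock);
	return table_count;
}
```

Two back-to-back callers end up sharing one table: the first takes the create path, the second hits the fast find path (or spins briefly on the busy flag if it races the first).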
Viresh Kumar Nov. 3, 2020, 5:47 a.m. UTC | #11
On 27-10-20, 17:05, Viresh Kumar wrote:
> It isn't that straight forward unfortunately, we need to make sure the
> table doesn't get allocated for the same device twice, so
> find+allocate needs to happen within a locked region.
> 
> I have taken, not so straight forward, approach to fixing this issue,
> lets see if this fixes it or not.
> 
> -------------------------8<-------------------------
> 
> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index 4ac4e7ce6b8b..6f4a73a6391f 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -29,6 +29,8 @@
>  LIST_HEAD(opp_tables);
>  /* Lock to allow exclusive modification to the device and opp lists */
>  DEFINE_MUTEX(opp_table_lock);
> +/* Flag indicating that opp_tables list is being updated at the moment */
> +static bool opp_tables_busy;
>  
>  static struct opp_device *_find_opp_dev(const struct device *dev,
>  					struct opp_table *opp_table)
> @@ -1036,8 +1038,8 @@ static void _remove_opp_dev(struct opp_device *opp_dev,
>  	kfree(opp_dev);
>  }
>  
> -static struct opp_device *_add_opp_dev_unlocked(const struct device *dev,
> -						struct opp_table *opp_table)
> +struct opp_device *_add_opp_dev(const struct device *dev,
> +				struct opp_table *opp_table)
>  {
>  	struct opp_device *opp_dev;
>  
> @@ -1048,7 +1050,9 @@ static struct opp_device *_add_opp_dev_unlocked(const struct device *dev,
>  	/* Initialize opp-dev */
>  	opp_dev->dev = dev;
>  
> +	mutex_lock(&opp_table->lock);
>  	list_add(&opp_dev->node, &opp_table->dev_list);
> +	mutex_unlock(&opp_table->lock);
>  
>  	/* Create debugfs entries for the opp_table */
>  	opp_debug_register(opp_dev, opp_table);
> @@ -1056,18 +1060,6 @@ static struct opp_device *_add_opp_dev_unlocked(const struct device *dev,
>  	return opp_dev;
>  }
>  
> -struct opp_device *_add_opp_dev(const struct device *dev,
> -				struct opp_table *opp_table)
> -{
> -	struct opp_device *opp_dev;
> -
> -	mutex_lock(&opp_table->lock);
> -	opp_dev = _add_opp_dev_unlocked(dev, opp_table);
> -	mutex_unlock(&opp_table->lock);
> -
> -	return opp_dev;
> -}
> -
>  static struct opp_table *_allocate_opp_table(struct device *dev, int index)
>  {
>  	struct opp_table *opp_table;
> @@ -1121,8 +1113,6 @@ static struct opp_table *_allocate_opp_table(struct device *dev, int index)
>  	INIT_LIST_HEAD(&opp_table->opp_list);
>  	kref_init(&opp_table->kref);
>  
> -	/* Secure the device table modification */
> -	list_add(&opp_table->node, &opp_tables);
>  	return opp_table;
>  
>  err:
> @@ -1135,27 +1125,64 @@ void _get_opp_table_kref(struct opp_table *opp_table)
>  	kref_get(&opp_table->kref);
>  }
>  
> +/*
> + * We need to make sure that the OPP table for a device doesn't get added twice,
> + * if this routine gets called in parallel with the same device pointer.
> + *
> + * The simplest way to enforce that is to perform everything (find existing
> + * table and if not found, create a new one) under the opp_table_lock, so only
> + * one creator gets access to the same. But that expands the critical section
> + * under the lock and may end up causing circular dependencies with frameworks
> + * like debugfs, interconnect or clock framework as they may be direct or
> + * indirect users of OPP core.
> + *
> + * And for that reason we have to go for a bit tricky implementation here, which
> + * uses the opp_tables_busy flag to indicate if another creator is in the middle
> + * of adding an OPP table and others should wait for it to finish.
> + */
>  static struct opp_table *_opp_get_opp_table(struct device *dev, int index)
>  {
>  	struct opp_table *opp_table;
>  
> -	/* Hold our table modification lock here */
> +again:
>  	mutex_lock(&opp_table_lock);
>  
>  	opp_table = _find_opp_table_unlocked(dev);
>  	if (!IS_ERR(opp_table))
>  		goto unlock;
>  
> +	/*
> +	 * The opp_tables list or an OPP table's dev_list is getting updated by
> +	 * another user, wait for it to finish.
> +	 */
> +	if (unlikely(opp_tables_busy)) {
> +		mutex_unlock(&opp_table_lock);
> +		cpu_relax();
> +		goto again;
> +	}
> +
> +	opp_tables_busy = true;
>  	opp_table = _managed_opp(dev, index);
> +
> +	/* Drop the lock to reduce the size of critical section */
> +	mutex_unlock(&opp_table_lock);
> +
>  	if (opp_table) {
> -		if (!_add_opp_dev_unlocked(dev, opp_table)) {
> +		if (!_add_opp_dev(dev, opp_table)) {
>  			dev_pm_opp_put_opp_table(opp_table);
>  			opp_table = ERR_PTR(-ENOMEM);
>  		}
> -		goto unlock;
> +
> +		mutex_lock(&opp_table_lock);
> +	} else {
> +		opp_table = _allocate_opp_table(dev, index);
> +
> +		mutex_lock(&opp_table_lock);
> +		if (!IS_ERR(opp_table))
> +			list_add(&opp_table->node, &opp_tables);
>  	}
>  
> -	opp_table = _allocate_opp_table(dev, index);
> +	opp_tables_busy = false;
>  
>  unlock:
>  	mutex_unlock(&opp_table_lock);
> @@ -1181,6 +1208,10 @@ static void _opp_table_kref_release(struct kref *kref)
>  	struct opp_device *opp_dev, *temp;
>  	int i;
>  
> +	/* Drop the lock as soon as we can */
> +	list_del(&opp_table->node);
> +	mutex_unlock(&opp_table_lock);
> +
>  	_of_clear_opp_table(opp_table);
>  
>  	/* Release clk */
> @@ -1208,10 +1239,7 @@ static void _opp_table_kref_release(struct kref *kref)
>  
>  	mutex_destroy(&opp_table->genpd_virt_dev_lock);
>  	mutex_destroy(&opp_table->lock);
> -	list_del(&opp_table->node);
>  	kfree(opp_table);
> -
> -	mutex_unlock(&opp_table_lock);
>  }
>  
>  void dev_pm_opp_put_opp_table(struct opp_table *opp_table)

Rob, Ping.

-- 
viresh
Rob Clark Nov. 3, 2020, 4:50 p.m. UTC | #12
On Mon, Nov 2, 2020 at 9:47 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 27-10-20, 17:05, Viresh Kumar wrote:
> > It isn't that straight forward unfortunately, we need to make sure the
> > table doesn't get allocated for the same device twice, so
> > find+allocate needs to happen within a locked region.
> >
> > I have taken, not so straight forward, approach to fixing this issue,
> > lets see if this fixes it or not.
> >
> > -------------------------8<-------------------------
> >
> > diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> > index 4ac4e7ce6b8b..6f4a73a6391f 100644
> > --- a/drivers/opp/core.c
> > +++ b/drivers/opp/core.c
> > @@ -29,6 +29,8 @@
> >  LIST_HEAD(opp_tables);
> >  /* Lock to allow exclusive modification to the device and opp lists */
> >  DEFINE_MUTEX(opp_table_lock);
> > +/* Flag indicating that opp_tables list is being updated at the moment */
> > +static bool opp_tables_busy;
> >
> >  static struct opp_device *_find_opp_dev(const struct device *dev,
> >                                       struct opp_table *opp_table)
> > @@ -1036,8 +1038,8 @@ static void _remove_opp_dev(struct opp_device *opp_dev,
> >       kfree(opp_dev);
> >  }
> >
> > -static struct opp_device *_add_opp_dev_unlocked(const struct device *dev,
> > -                                             struct opp_table *opp_table)
> > +struct opp_device *_add_opp_dev(const struct device *dev,
> > +                             struct opp_table *opp_table)
> >  {
> >       struct opp_device *opp_dev;
> >
> > @@ -1048,7 +1050,9 @@ static struct opp_device *_add_opp_dev_unlocked(const struct device *dev,
> >       /* Initialize opp-dev */
> >       opp_dev->dev = dev;
> >
> > +     mutex_lock(&opp_table->lock);
> >       list_add(&opp_dev->node, &opp_table->dev_list);
> > +     mutex_unlock(&opp_table->lock);
> >
> >       /* Create debugfs entries for the opp_table */
> >       opp_debug_register(opp_dev, opp_table);
> > @@ -1056,18 +1060,6 @@ static struct opp_device *_add_opp_dev_unlocked(const struct device *dev,
> >       return opp_dev;
> >  }
> >
> > -struct opp_device *_add_opp_dev(const struct device *dev,
> > -                             struct opp_table *opp_table)
> > -{
> > -     struct opp_device *opp_dev;
> > -
> > -     mutex_lock(&opp_table->lock);
> > -     opp_dev = _add_opp_dev_unlocked(dev, opp_table);
> > -     mutex_unlock(&opp_table->lock);
> > -
> > -     return opp_dev;
> > -}
> > -
> >  static struct opp_table *_allocate_opp_table(struct device *dev, int index)
> >  {
> >       struct opp_table *opp_table;
> > @@ -1121,8 +1113,6 @@ static struct opp_table *_allocate_opp_table(struct device *dev, int index)
> >       INIT_LIST_HEAD(&opp_table->opp_list);
> >       kref_init(&opp_table->kref);
> >
> > -     /* Secure the device table modification */
> > -     list_add(&opp_table->node, &opp_tables);
> >       return opp_table;
> >
> >  err:
> > @@ -1135,27 +1125,64 @@ void _get_opp_table_kref(struct opp_table *opp_table)
> >       kref_get(&opp_table->kref);
> >  }
> >
> > +/*
> > + * We need to make sure that the OPP table for a device doesn't get added twice,
> > + * if this routine gets called in parallel with the same device pointer.
> > + *
> > + * The simplest way to enforce that is to perform everything (find existing
> > + * table and if not found, create a new one) under the opp_table_lock, so only
> > + * one creator gets access to the same. But that expands the critical section
> > + * under the lock and may end up causing circular dependencies with frameworks
> > + * like debugfs, interconnect or clock framework as they may be direct or
> > + * indirect users of OPP core.
> > + *
> > + * And for that reason we have to go for a bit tricky implementation here, which
> > + * uses the opp_tables_busy flag to indicate if another creator is in the middle
> > + * of adding an OPP table and others should wait for it to finish.
> > + */
> >  static struct opp_table *_opp_get_opp_table(struct device *dev, int index)
> >  {
> >       struct opp_table *opp_table;
> >
> > -     /* Hold our table modification lock here */
> > +again:
> >       mutex_lock(&opp_table_lock);
> >
> >       opp_table = _find_opp_table_unlocked(dev);
> >       if (!IS_ERR(opp_table))
> >               goto unlock;
> >
> > +     /*
> > +      * The opp_tables list or an OPP table's dev_list is getting updated by
> > +      * another user, wait for it to finish.
> > +      */
> > +     if (unlikely(opp_tables_busy)) {
> > +             mutex_unlock(&opp_table_lock);
> > +             cpu_relax();
> > +             goto again;
> > +     }
> > +
> > +     opp_tables_busy = true;
> >       opp_table = _managed_opp(dev, index);
> > +
> > +     /* Drop the lock to reduce the size of critical section */
> > +     mutex_unlock(&opp_table_lock);
> > +
> >       if (opp_table) {
> > -             if (!_add_opp_dev_unlocked(dev, opp_table)) {
> > +             if (!_add_opp_dev(dev, opp_table)) {
> >                       dev_pm_opp_put_opp_table(opp_table);
> >                       opp_table = ERR_PTR(-ENOMEM);
> >               }
> > -             goto unlock;
> > +
> > +             mutex_lock(&opp_table_lock);
> > +     } else {
> > +             opp_table = _allocate_opp_table(dev, index);
> > +
> > +             mutex_lock(&opp_table_lock);
> > +             if (!IS_ERR(opp_table))
> > +                     list_add(&opp_table->node, &opp_tables);
> >       }
> >
> > -     opp_table = _allocate_opp_table(dev, index);
> > +     opp_tables_busy = false;
> >
> >  unlock:
> >       mutex_unlock(&opp_table_lock);
> > @@ -1181,6 +1208,10 @@ static void _opp_table_kref_release(struct kref *kref)
> >       struct opp_device *opp_dev, *temp;
> >       int i;
> >
> > +     /* Drop the lock as soon as we can */
> > +     list_del(&opp_table->node);
> > +     mutex_unlock(&opp_table_lock);
> > +
> >       _of_clear_opp_table(opp_table);
> >
> >       /* Release clk */
> > @@ -1208,10 +1239,7 @@ static void _opp_table_kref_release(struct kref *kref)
> >
> >       mutex_destroy(&opp_table->genpd_virt_dev_lock);
> >       mutex_destroy(&opp_table->lock);
> > -     list_del(&opp_table->node);
> >       kfree(opp_table);
> > -
> > -     mutex_unlock(&opp_table_lock);
> >  }
> >
> >  void dev_pm_opp_put_opp_table(struct opp_table *opp_table)
>
> Rob, Ping.
>

sorry, it didn't apply cleanly (which I guess is due to some other
dependencies that need to be picked back to v5.4 product kernel), and
due to some other things I'm in middle of debugging I didn't have time
yet to switch to v5.10-rc or look at what else needs to
cherry-picked..

If you could, pushing a branch with this patch somewhere would be a
bit easier to work with (ie. fetch && cherry-pick is easier to deal
with than picking things from list)

BR,
-R
Viresh Kumar Nov. 4, 2020, 3:03 a.m. UTC | #13
On 03-11-20, 08:50, Rob Clark wrote:
> sorry, it didn't apply cleanly (which I guess is due to some other
> dependencies that need to be picked back to v5.4 product kernel), and
> due to some other things I'm in middle of debugging I didn't have time
> yet to switch to v5.10-rc or look at what else needs to
> cherry-picked..
> 
> If you could, pushing a branch with this patch somewhere would be a
> bit easier to work with (ie. fetch && cherry-pick is easier to deal
> with than picking things from list)

It has been in linux-next for a few days. Here is the HEAD to pick
from. There are few patches there since rc1.

commit 203e29749cc0 ("opp: Allocate the OPP table outside of opp_table_lock")
Rob Clark Nov. 5, 2020, 7:24 p.m. UTC | #14
On Tue, Nov 3, 2020 at 7:04 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 03-11-20, 08:50, Rob Clark wrote:
> > sorry, it didn't apply cleanly (which I guess is due to some other
> > dependencies that need to be picked back to v5.4 product kernel), and
> > due to some other things I'm in middle of debugging I didn't have time
> > yet to switch to v5.10-rc or look at what else needs to
> > cherry-picked..
> >
> > If you could, pushing a branch with this patch somewhere would be a
> > bit easier to work with (ie. fetch && cherry-pick is easier to deal
> > with than picking things from list)
>
> It has been in linux-next for a few days. Here is the HEAD to pick
> from. There are few patches there since rc1.
>
> commit 203e29749cc0 ("opp: Allocate the OPP table outside of opp_table_lock")
>

sorry for the delay, with that cherry-picked, I'm getting a whole lot of:

[   10.191497] WARNING: CPU: 7 PID: 52 at drivers/opp/of.c:115
_find_table_of_opp_np+0x8c/0x94
[   10.191502] Modules linked in:
[   10.191517] CPU: 7 PID: 52 Comm: kworker/7:1 Tainted: G        W
     5.10.0-rc2+ #2
[   10.191522] Hardware name: Google Lazor (rev1+) with LTE (DT)
[   10.191537] Workqueue: events deferred_probe_work_func
[   10.191551] pstate: 60c00009 (nZCv daif +PAN +UAO -TCO BTYPE=--)
[   10.202819] mmc0: CQHCI version 5.10
[   10.206038] pc : _find_table_of_opp_np+0x8c/0x94
[   10.206045] lr : _find_table_of_opp_np+0x34/0x94
[   10.206050] sp : ffffffc010373810
[   10.206054] x29: ffffffc010373810 x28: ffffff94c5a3d170
[   10.206070] x27: ffffff94c5a3d168
[   10.249098] mmc0: SDHCI controller on 7c4000.sdhci [7c4000.sdhci]
using ADMA 64-bit
[   10.251366] x26: ffffff94c580c000
[   10.251374] x25: 0000000000000001 x24: ffffff963f02c750
[   10.251385] x23: 0000000000000000 x22: ffffff94c5aabc80
[   10.251397] x21: ffffff963f021c78 x20: ffffff94c5a75800
[   10.256963] sdhci_msm 7c4000.sdhci: mmc0: CQE init: success
[   10.260376]
[   10.260380] x19: ffffff963f02c750 x18: 0000000000000004
[   10.260392] x17: 000000000000002c x16: ffffffe2468e1e78
[   10.260404] x15: ffffffe246df3eb8 x14: ffffffff52f45308
[   10.311816] x13: 0000000000000000 x12: ffffffe24541aef0
[   10.317298] x11: ffffffe246df3eb8 x10: fffffffefe60e678
[   10.322776] x9 : 0000000000000000 x8 : ffffffb3f89a7000
[   10.328258] x7 : ffffffe245c5d9d0 x6 : 0000000000000000
[   10.333730] x5 : 0000000000000080 x4 : 0000000000000001
[   10.339206] x3 : 0000000000000000 x2 : 0000000000000006
[   10.344684] x1 : ffffffe24684aa88 x0 : 0000000000000000
[   10.350158] Call trace:
[   10.352695]  _find_table_of_opp_np+0x8c/0x94
[   10.353507] mmc0: Command Queue Engine enabled
[   10.357095]  _of_init_opp_table+0x15c/0x1e4
[   10.357103]  _opp_get_opp_table+0x168/0x280
[   10.357110]  dev_pm_opp_set_clkname+0x28/0xcc
[   10.357119]  dpu_bind+0x50/0x1a4
[   10.357128]  component_bind_all+0xf4/0x20c
[   10.357138]  msm_drm_init+0x180/0x588
[   10.361815] mmc0: new HS400 Enhanced strobe MMC card at address 0001
[   10.366050]  msm_drm_bind+0x1c/0x24
[   10.366057]  try_to_bring_up_master+0x160/0x1a8
[   10.366065]  component_master_add_with_match+0xc4/0x108
[   10.366072]  msm_pdev_probe+0x214/0x2a4
[   10.366081]  platform_drv_probe+0x94/0xb4
[   10.374415] mmcblk0: mmc0:0001 DA4064 58.2 GiB
[   10.374871]  really_probe+0x138/0x348
[   10.374881]  driver_probe_device+0x80/0xb8
[   10.379483] mmcblk0boot0: mmc0:0001 DA4064 partition 1 4.00 MiB
[   10.382446]  __device_attach_driver+0x90/0xa8
[   10.382453]  bus_for_each_drv+0x84/0xcc
[   10.382459]  __device_attach+0xc0/0x148
[   10.382466]  device_initial_probe+0x18/0x20
[   10.382473]  bus_probe_device+0x38/0x98
[   10.382483]  deferred_probe_work_func+0x7c/0xb8
[   10.387402] mmcblk0boot1: mmc0:0001 DA4064 partition 2 4.00 MiB
[   10.392780]  process_one_work+0x314/0x60c
[   10.392786]  worker_thread+0x238/0x3e8
[   10.392793]  kthread+0x148/0x158
[   10.392800]  ret_from_fork+0x10/0x18
[   10.392809] CPU: 7 PID: 52 Comm: kworker/7:1 Tainted: G        W
     5.10.0-rc2+ #2
[   10.397683] mmcblk0rpmb: mmc0:0001 DA4064 partition 3 16.0 MiB,
chardev (241:0)
[   10.401051] Hardware name: Google Lazor (rev1+) with LTE (DT)
[   10.401062] Workqueue: events deferred_probe_work_func
[   10.401069] Call trace:
[   10.401077]  dump_backtrace+0x0/0x1b4
[   10.401087]  show_stack+0x1c/0x24
[   10.427111]  mmcblk0: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12
[   10.427156]  dump_stack+0xdc/0x158
[   10.427165]  __warn+0xd8/0x16c
[   10.427173]  report_bug+0x88/0xe0
[   10.427179]  bug_handler+0x24/0x6c
[   10.535574]  brk_handler+0x78/0xb4
[   10.539090]  do_debug_exception+0x1a4/0x208
[   10.543395]  el1_sync_handler+0x8c/0x110
[   10.547434]  el1_sync+0x7c/0x100
[   10.550762]  _find_table_of_opp_np+0x8c/0x94
[   10.555166]  _of_init_opp_table+0x15c/0x1e4
[   10.559472]  _opp_get_opp_table+0x168/0x280
[   10.563779]  dev_pm_opp_set_clkname+0x28/0xcc
[   10.568270]  dpu_bind+0x50/0x1a4
[   10.571607]  component_bind_all+0xf4/0x20c
[   10.575826]  msm_drm_init+0x180/0x588
[   10.579603]  msm_drm_bind+0x1c/0x24
[   10.583205]  try_to_bring_up_master+0x160/0x1a8
[   10.587877]  component_master_add_with_match+0xc4/0x108
[   10.593251]  msm_pdev_probe+0x214/0x2a4
[   10.597203]  platform_drv_probe+0x94/0xb4
[   10.601334]  really_probe+0x138/0x348
[   10.605110]  driver_probe_device+0x80/0xb8
[   10.609329]  __device_attach_driver+0x90/0xa8
[   10.613821]  bus_for_each_drv+0x84/0xcc
[   10.617774]  __device_attach+0xc0/0x148
[   10.621729]  device_initial_probe+0x18/0x20
[   10.626044]  bus_probe_device+0x38/0x98
[   10.629998]  deferred_probe_work_func+0x7c/0xb8
[   10.634668]  process_one_work+0x314/0x60c
[   10.638797]  worker_thread+0x238/0x3e8
[   10.642661]  kthread+0x148/0x158
[   10.645997]  ret_from_fork+0x10/0x18
[   10.649683] irq event stamp: 117274
[   10.653290] hardirqs last  enabled at (117273):
[<ffffffe245ed8430>] _raw_spin_unlock_irqrestore+0x60/0x94
[   10.663213] hardirqs last disabled at (117274):
[<ffffffe245420ea0>] do_debug_exception+0x60/0x208
[   10.672420] softirqs last  enabled at (116976):
[<ffffffe245400eec>] __do_softirq+0x4bc/0x540
[   10.681184] softirqs last disabled at (116971):
[<ffffffe24547dd10>] __irq_exit_rcu+0x118/0x138
[   10.690123] ---[ end trace 00b127c206a99072 ]---
Viresh Kumar Nov. 6, 2020, 7:16 a.m. UTC | #15
On 05-11-20, 11:24, Rob Clark wrote:
> On Tue, Nov 3, 2020 at 7:04 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > On 03-11-20, 08:50, Rob Clark wrote:
> > > sorry, it didn't apply cleanly (which I guess is due to some other
> > > dependencies that need to be picked back to v5.4 product kernel), and
> > > due to some other things I'm in middle of debugging I didn't have time
> > > yet to switch to v5.10-rc or look at what else needs to
> > > cherry-picked..
> > >
> > > If you could, pushing a branch with this patch somewhere would be a
> > > bit easier to work with (ie. fetch && cherry-pick is easier to deal
> > > with than picking things from list)
> >
> > It has been in linux-next for a few days. Here is the HEAD to pick
> > from. There are few patches there since rc1.
> >
> > commit 203e29749cc0 ("opp: Allocate the OPP table outside of opp_table_lock")
> >
> 
> sorry for the delay, with that cherry-picked, I'm getting a whole lot of:

Ahh, sorry about that and thanks for reporting it. Here is the fix:

diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index c718092757d9..6b7f0066942d 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -112,8 +112,6 @@ static struct opp_table *_find_table_of_opp_np(struct device_node *opp_np)
        struct opp_table *opp_table;
        struct device_node *opp_table_np;
 
-       lockdep_assert_held(&opp_table_lock);
-
        opp_table_np = of_get_parent(opp_np);
        if (!opp_table_np)
                goto err;
@@ -121,12 +119,15 @@ static struct opp_table *_find_table_of_opp_np(struct device_node *opp_np)
        /* It is safe to put the node now as all we need now is its address */
        of_node_put(opp_table_np);
 
+       mutex_lock(&opp_table_lock);
        list_for_each_entry(opp_table, &opp_tables, node) {
                if (opp_table_np == opp_table->np) {
                        _get_opp_table_kref(opp_table);
+                       mutex_unlock(&opp_table_lock);
                        return opp_table;
                }
        }
+       mutex_unlock(&opp_table_lock);
 
 err:
        return ERR_PTR(-ENODEV);

-- 
viresh
Viresh Kumar Nov. 17, 2020, 10:03 a.m. UTC | #16
On Fri, 6 Nov 2020 at 12:46, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 05-11-20, 11:24, Rob Clark wrote:
> > On Tue, Nov 3, 2020 at 7:04 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > On 03-11-20, 08:50, Rob Clark wrote:
> > > > sorry, it didn't apply cleanly (which I guess is due to some other
> > > > dependencies that need to be picked back to v5.4 product kernel), and
> > > > due to some other things I'm in middle of debugging I didn't have time
> > > > yet to switch to v5.10-rc or look at what else needs to
> > > > cherry-picked..
> > > >
> > > > If you could, pushing a branch with this patch somewhere would be a
> > > > bit easier to work with (ie. fetch && cherry-pick is easier to deal
> > > > with than picking things from list)
> > >
> > > It has been in linux-next for a few days. Here is the HEAD to pick
> > > from. There are few patches there since rc1.
> > >
> > > commit 203e29749cc0 ("opp: Allocate the OPP table outside of opp_table_lock")
> > >
> >
> > sorry for the delay, with that cherry-picked, I'm getting a whole lot of:
>
> Ahh, sorry about that and thanks for reporting it. Here is the fix:
>
> diff --git a/drivers/opp/of.c b/drivers/opp/of.c
> index c718092757d9..6b7f0066942d 100644
> --- a/drivers/opp/of.c
> +++ b/drivers/opp/of.c
> @@ -112,8 +112,6 @@ static struct opp_table *_find_table_of_opp_np(struct device_node *opp_np)
>         struct opp_table *opp_table;
>         struct device_node *opp_table_np;
>
> -       lockdep_assert_held(&opp_table_lock);
> -
>         opp_table_np = of_get_parent(opp_np);
>         if (!opp_table_np)
>                 goto err;
> @@ -121,12 +119,15 @@ static struct opp_table *_find_table_of_opp_np(struct device_node *opp_np)
>         /* It is safe to put the node now as all we need now is its address */
>         of_node_put(opp_table_np);
>
> +       mutex_lock(&opp_table_lock);
>         list_for_each_entry(opp_table, &opp_tables, node) {
>                 if (opp_table_np == opp_table->np) {
>                         _get_opp_table_kref(opp_table);
> +                       mutex_unlock(&opp_table_lock);
>                         return opp_table;
>                 }
>         }
> +       mutex_unlock(&opp_table_lock);
>
>  err:
>         return ERR_PTR(-ENODEV);

Ping.
Rob Clark Nov. 17, 2020, 5:02 p.m. UTC | #17
On Thu, Nov 5, 2020 at 11:16 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 05-11-20, 11:24, Rob Clark wrote:
> > On Tue, Nov 3, 2020 at 7:04 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > On 03-11-20, 08:50, Rob Clark wrote:
> > > > sorry, it didn't apply cleanly (which I guess is due to some other
> > > > dependencies that need to be picked back to v5.4 product kernel), and
> > > > due to some other things I'm in middle of debugging I didn't have time
> > > > yet to switch to v5.10-rc or look at what else needs to
> > > > cherry-picked..
> > > >
> > > > If you could, pushing a branch with this patch somewhere would be a
> > > > bit easier to work with (ie. fetch && cherry-pick is easier to deal
> > > > with than picking things from list)
> > >
> > > It has been in linux-next for a few days. Here is the HEAD to pick
> > > from. There are few patches there since rc1.
> > >
> > > commit 203e29749cc0 ("opp: Allocate the OPP table outside of opp_table_lock")
> > >
> >
> > sorry for the delay, with that cherry-picked, I'm getting a whole lot of:
>
> Ahh, sorry about that and thanks for reporting it. Here is the fix:
>
> diff --git a/drivers/opp/of.c b/drivers/opp/of.c
> index c718092757d9..6b7f0066942d 100644
> --- a/drivers/opp/of.c
> +++ b/drivers/opp/of.c
> @@ -112,8 +112,6 @@ static struct opp_table *_find_table_of_opp_np(struct device_node *opp_np)
>         struct opp_table *opp_table;
>         struct device_node *opp_table_np;
>
> -       lockdep_assert_held(&opp_table_lock);
> -
>         opp_table_np = of_get_parent(opp_np);
>         if (!opp_table_np)
>                 goto err;
> @@ -121,12 +119,15 @@ static struct opp_table *_find_table_of_opp_np(struct device_node *opp_np)
>         /* It is safe to put the node now as all we need now is its address */
>         of_node_put(opp_table_np);
>
> +       mutex_lock(&opp_table_lock);
>         list_for_each_entry(opp_table, &opp_tables, node) {
>                 if (opp_table_np == opp_table->np) {
>                         _get_opp_table_kref(opp_table);
> +                       mutex_unlock(&opp_table_lock);
>                         return opp_table;
>                 }
>         }
> +       mutex_unlock(&opp_table_lock);
>
>  err:
>         return ERR_PTR(-ENODEV);
>

With that on top of the previous patch,

[   26.378245] ======================================================
[   26.384595] WARNING: possible circular locking dependency detected
[   26.390947] 5.10.0-rc2+ #6 Not tainted
[   26.394804] ------------------------------------------------------
[   26.401155] chrome/1886 is trying to acquire lock:
[   26.406087] ffffffe5e264aa88 (opp_table_lock){+.+.}-{3:3}, at:
_find_opp_table+0x38/0x78
[   26.414436]
[   26.414436] but task is already holding lock:
[   26.420423] ffffffb0283935b0
(reservation_ww_class_mutex){+.+.}-{3:3}, at:
submit_lock_objects+0x70/0x1ec
[   26.430268]
[   26.430268] which lock already depends on the new lock.
[   26.430268]
[   26.438661]
[   26.438661] the existing dependency chain (in reverse order) is:
[   26.446343]
[   26.446343] -> #3 (reservation_ww_class_mutex){+.+.}-{3:3}:
[   26.453603]        lock_acquire+0x23c/0x30c
[   26.457910]        __mutex_lock_common+0xdc/0xbc4
[   26.462743]        ww_mutex_lock_interruptible+0x84/0xec
[   26.468203]        msm_gem_fault+0x30/0x138
[   26.472507]        __do_fault+0x44/0x184
[   26.476541]        handle_mm_fault+0x754/0xc50
[   26.481117]        do_page_fault+0x230/0x354
[   26.485507]        do_translation_fault+0x40/0x54
[   26.490338]        do_mem_abort+0x44/0xac
[   26.494469]        el0_sync_compat_handler+0x15c/0x190
[   26.499756]        el0_sync_compat+0x144/0x180
[   26.504328]
[   26.504328] -> #2 (&mm->mmap_lock){++++}-{3:3}:
[   26.510511]        lock_acquire+0x23c/0x30c
[   26.514813]        __might_fault+0x60/0x80
[   26.519034]        compat_filldir+0x118/0x4d0
[   26.523519]        dcache_readdir+0x74/0x1e0
[   26.527907]        iterate_dir+0xd4/0x198
[   26.532037]        __arm64_compat_sys_getdents+0x6c/0x168
[   26.537583]        el0_svc_common+0xa4/0x174
[   26.541970]        do_el0_svc_compat+0x20/0x30
[   26.546543]        el0_sync_compat_handler+0x124/0x190
[   26.551828]        el0_sync_compat+0x144/0x180
[   26.556399]
[   26.556399] -> #1 (&sb->s_type->i_mutex_key#2){++++}-{3:3}:
[   26.563660]        lock_acquire+0x23c/0x30c
[   26.567963]        down_write+0x80/0x1dc
[   26.571997]        simple_recursive_removal+0x48/0x238
[   26.577288]        debugfs_remove+0x5c/0x78
[   26.581592]        opp_debug_unregister+0x34/0x118
[   26.586521]        dev_pm_opp_put_opp_table+0xd0/0x14c
[   26.591802]        dev_pm_opp_put_clkname+0x40/0x54
[   26.596820]        msm_dsi_host_destroy+0xe0/0x108
[   26.601748]        dsi_destroy+0x40/0x58
[   26.605789]        dsi_bind+0x8c/0x16c
[   26.609648]        component_bind_all+0xf4/0x20c
[   26.614399]        msm_drm_init+0x180/0x588
[   26.618696]        msm_drm_bind+0x1c/0x24
[   26.622822]        try_to_bring_up_master+0x160/0x1a8
[   26.628014]        component_master_add_with_match+0xc4/0x108
[   26.633918]        msm_pdev_probe+0x214/0x2a4
[   26.638405]        platform_drv_probe+0x94/0xb4
[   26.643066]        really_probe+0x138/0x348
[   26.647365]        driver_probe_device+0x80/0xb8
[   26.652111]        device_driver_attach+0x50/0x70
[   26.656951]        __driver_attach+0xb4/0xc8
[   26.661340]        bus_for_each_dev+0x80/0xc8
[   26.665825]        driver_attach+0x28/0x30
[   26.670038]        bus_add_driver+0x100/0x1d4
[   26.674524]        driver_register+0x68/0xfc
[   26.678910]        __platform_driver_register+0x48/0x50
[   26.684284]        msm_drm_register+0x64/0x68
[   26.688766]        do_one_initcall+0x1ac/0x3e4
[   26.693341]        do_initcall_level+0xa0/0xb8
[   26.697910]        do_initcalls+0x58/0x94
[   26.702039]        do_basic_setup+0x28/0x30
[   26.706338]        kernel_init_freeable+0x190/0x1d0
[   26.711352]        kernel_init+0x18/0x10c
[   26.715481]        ret_from_fork+0x10/0x18
[   26.719692]
[   26.719692] -> #0 (opp_table_lock){+.+.}-{3:3}:
[   26.725883]        check_noncircular+0x12c/0x134
[   26.730627]        __lock_acquire+0x2288/0x2b2c
[   26.735284]        lock_acquire+0x23c/0x30c
[   26.739584]        __mutex_lock_common+0xdc/0xbc4
[   26.744426]        mutex_lock_nested+0x50/0x58
[   26.748998]        _find_opp_table+0x38/0x78
[   26.753395]        dev_pm_opp_find_freq_exact+0x2c/0xdc
[   26.758771]        a6xx_gmu_resume+0xcc/0xed0
[   26.763255]        a6xx_pm_resume+0x140/0x174
[   26.767741]        adreno_resume+0x24/0x2c
[   26.771956]        pm_generic_runtime_resume+0x2c/0x3c
[   26.777244]        __rpm_callback+0x74/0x114
[   26.781633]        rpm_callback+0x30/0x84
[   26.785759]        rpm_resume+0x3c8/0x4f0
[   26.789884]        __pm_runtime_resume+0x80/0xa4
[   26.794631]        msm_gpu_submit+0x60/0x228
[   26.799019]        msm_ioctl_gem_submit+0xba0/0xc1c
[   26.804038]        drm_ioctl_kernel+0xa0/0x11c
[   26.808608]        drm_ioctl+0x240/0x3dc
[   26.812653]        drm_compat_ioctl+0xd4/0xe4
[   26.817141]        __arm64_compat_sys_ioctl+0xc4/0xf8
[   26.822331]        el0_svc_common+0xa4/0x174
[   26.826718]        do_el0_svc_compat+0x20/0x30
[   26.831291]        el0_sync_compat_handler+0x124/0x190
[   26.836577]        el0_sync_compat+0x144/0x180
[   26.841148]
[   26.841148] other info that might help us debug this:
[   26.841148]
[   26.849361] Chain exists of:
[   26.849361]   opp_table_lock --> &mm->mmap_lock --> reservation_ww_class_mutex
[   26.849361]
[   26.861249]  Possible unsafe locking scenario:
[   26.861249]
[   26.867334]        CPU0                    CPU1
[   26.871990]        ----                    ----
[   26.876647]   lock(reservation_ww_class_mutex);
[   26.881309]                                lock(&mm->mmap_lock);
[   26.887487]                                lock(reservation_ww_class_mutex);
[   26.894730]   lock(opp_table_lock);
[   26.898327]
[   26.898327]  *** DEADLOCK ***
[   26.898327]
[   26.904410] 3 locks held by chrome/1886:
[   26.908447]  #0: ffffffb005bd9138 (&dev->struct_mutex){+.+.}-{3:3}, at: msm_ioctl_gem_submit+0x238/0xc1c
[   26.918199]  #1: ffffffb02251fa70 (reservation_ww_class_acquire){+.+.}-{0:0}, at: msm_ioctl_gem_submit+0x978/0xc1c
[   26.928843]  #2: ffffffb0283935b0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: submit_lock_objects+0x70/0x1ec
[   26.939126]
[   26.939126] stack backtrace:
[   26.943612] CPU: 5 PID: 1886 Comm: chrome Not tainted 5.10.0-rc2+ #6
[   26.950137] Hardware name: Google Lazor (rev1+) with LTE (DT)
[   26.956039] Call trace:
[   26.958566]  dump_backtrace+0x0/0x1b4
[   26.962333]  show_stack+0x1c/0x24
[   26.965754]  dump_stack+0xdc/0x158
[   26.969251]  print_circular_bug+0x308/0x338
[   26.973550]  check_noncircular+0x12c/0x134
[   26.977762]  __lock_acquire+0x2288/0x2b2c
[   26.981889]  lock_acquire+0x23c/0x30c
[   26.985658]  __mutex_lock_common+0xdc/0xbc4
[   26.989961]  mutex_lock_nested+0x50/0x58
[   26.993994]  _find_opp_table+0x38/0x78
[   26.997852]  dev_pm_opp_find_freq_exact+0x2c/0xdc
[   27.002690]  a6xx_gmu_resume+0xcc/0xed0
[   27.006635]  a6xx_pm_resume+0x140/0x174
[   27.010580]  adreno_resume+0x24/0x2c
[   27.014259]  pm_generic_runtime_resume+0x2c/0x3c
[   27.019008]  __rpm_callback+0x74/0x114
[   27.022868]  rpm_callback+0x30/0x84
[   27.026455]  rpm_resume+0x3c8/0x4f0
[   27.030042]  __pm_runtime_resume+0x80/0xa4
[   27.034259]  msm_gpu_submit+0x60/0x228
[   27.038117]  msm_ioctl_gem_submit+0xba0/0xc1c
[   27.042604]  drm_ioctl_kernel+0xa0/0x11c
[   27.046637]  drm_ioctl+0x240/0x3dc
[   27.050139]  drm_compat_ioctl+0xd4/0xe4
[   27.054088]  __arm64_compat_sys_ioctl+0xc4/0xf8
[   27.058748]  el0_svc_common+0xa4/0x174
[   27.062606]  do_el0_svc_compat+0x20/0x30
[   27.066647]  el0_sync_compat_handler+0x124/0x190
[   27.071393]  el0_sync_compat+0x144/0x180

BR,
-R
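
As an illustrative aside (not kernel code — the lock names mirror the splat above, but the detector is a toy), what lockdep is reporting amounts to cycle detection over "acquired B while holding A" edges. A minimal sketch:

```python
# Toy lockdep-style check: build a directed graph of lock-ordering edges
# and look for a cycle. Names match the report above; logic is illustrative.

def has_cycle(edges):
    """Return True if the directed lock-order graph contains a cycle."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)

    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def dfs(node):
        color[node] = GRAY
        for nxt in graph.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:            # back edge: nxt is on the DFS stack
                return True
            if state == WHITE and dfs(nxt):
                return True
        color[node] = BLACK
        return False

    return any(dfs(n) for n in list(graph) if color.get(n, WHITE) == WHITE)

# Ordering edges lockdep had already recorded:
edges = [
    ("opp_table_lock", "mmap_lock"),
    ("mmap_lock", "reservation_ww_class_mutex"),
]
assert not has_cycle(edges)

# Plus the submit-path ordering: GPU resume (which takes opp_table_lock)
# while holding an object's reservation_ww_class_mutex.
edges.append(("reservation_ww_class_mutex", "opp_table_lock"))
assert has_cycle(edges)  # the circular dependency reported above
```

The fix discussed in this thread removes edges from that graph rather than the locks themselves: either debugfs work stops happening under opp_table_lock, or the resume stops happening under the reservation lock.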
Viresh Kumar Nov. 18, 2020, 5:28 a.m. UTC | #18
On 17-11-20, 09:02, Rob Clark wrote:
> With that on top of the previous patch,


Don't you still have this ? Which fixed the lockdep in the remove path.

https://lore.kernel.org/lkml/20201022080644.2ck4okrxygmkuatn@vireshk-i7/

To make it clear you need these patches to fix the OPP stuff:

//From 5.10-rc3 (the one from the above link).
commit e0df59de670b ("opp: Reduce the size of critical section in _opp_table_kref_release()")

//Below two from linux-next
commit ef43f01ac069 ("opp: Always add entries in dev_list with opp_table->lock held")
commit 27c09484dd3d ("opp: Allocate the OPP table outside of opp_table_lock")

This matches the diff I gave you earlier.

-- 
viresh
Viresh Kumar Nov. 19, 2020, 6:05 a.m. UTC | #19
On 18-11-20, 08:53, Rob Clark wrote:
> On Tue, Nov 17, 2020 at 9:28 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > On 17-11-20, 09:02, Rob Clark wrote:
> > > With that on top of the previous patch,
> >
> > Don't you still have this ? Which fixed the lockdep in the remove path.
> >
> > https://lore.kernel.org/lkml/20201022080644.2ck4okrxygmkuatn@vireshk-i7/
> >
> > To make it clear you need these patches to fix the OPP stuff:
> >
> > //From 5.10-rc3 (the one from the above link).
> > commit e0df59de670b ("opp: Reduce the size of critical section in _opp_table_kref_release()")

This fixes debugfs stuff while the OPP table is removed.

> > //Below two from linux-next
> > commit ef43f01ac069 ("opp: Always add entries in dev_list with opp_table->lock held")
> > commit 27c09484dd3d ("opp: Allocate the OPP table outside of opp_table_lock")

This fixes debugfs stuff while the OPP table is added.

> > This matches the diff I gave you earlier.
>
> no, I did not have all three, only "opp: Allocate the OPP table
> outside of opp_table_lock" plus the fixup.  But with all three:

And looking at the lockdep you gave now, it looks like we have a
problem with OPP table's internal lock (opp_table->lock) as well apart
from the global opp_table_lock.

I wish there was a way for me to reproduce the lockdep :(

I know this is exhausting for both of us and I really want to be over
with it as soon as possible, this really should be the last patch
here, please try this along with other two. This fixes the debugfs
thing while the OPPs in the OPP table are removed (they are already
added without a lock around debugfs stuff).

AFAIU, there is no further debugfs stuff that happens from within the
locks and so this really should be the last patch unless I missed
something.

--
viresh

-------------------------8<-------------------------
From: Viresh Kumar <viresh.kumar@linaro.org>
Date: Thu, 19 Nov 2020 11:24:32 +0530
Subject: [PATCH] opp: Reduce the size of critical section in
 _opp_kref_release()

There is a lot of work here which can be done outside of the
opp_table->lock, so do that. This helps avoid circular dependency
lockdep reports around debugfs.

Reported-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/opp/core.c | 94 +++++++++++++++++++++++-----------------------
 1 file changed, 47 insertions(+), 47 deletions(-)

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 9d145bb99a59..4268eb359915 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -1251,9 +1251,14 @@ void _opp_free(struct dev_pm_opp *opp)
 	kfree(opp);
 }
 
-static void _opp_kref_release(struct dev_pm_opp *opp,
-			      struct opp_table *opp_table)
+static void _opp_kref_release(struct kref *kref)
 {
+	struct dev_pm_opp *opp = container_of(kref, struct dev_pm_opp, kref);
+	struct opp_table *opp_table = opp->opp_table;
+
+	list_del(&opp->node);
+	mutex_unlock(&opp_table->lock);
+
 	/*
 	 * Notify the changes in the availability of the operable
 	 * frequency/voltage list.
@@ -1261,27 +1266,9 @@ static void _opp_kref_release(struct dev_pm_opp *opp,
 	blocking_notifier_call_chain(&opp_table->head, OPP_EVENT_REMOVE, opp);
 	_of_opp_free_required_opps(opp_table, opp);
 	opp_debug_remove_one(opp);
-	list_del(&opp->node);
 	kfree(opp);
 }
 
-static void _opp_kref_release_unlocked(struct kref *kref)
-{
-	struct dev_pm_opp *opp = container_of(kref, struct dev_pm_opp, kref);
-	struct opp_table *opp_table = opp->opp_table;
-
-	_opp_kref_release(opp, opp_table);
-}
-
-static void _opp_kref_release_locked(struct kref *kref)
-{
-	struct dev_pm_opp *opp = container_of(kref, struct dev_pm_opp, kref);
-	struct opp_table *opp_table = opp->opp_table;
-
-	_opp_kref_release(opp, opp_table);
-	mutex_unlock(&opp_table->lock);
-}
-
 void dev_pm_opp_get(struct dev_pm_opp *opp)
 {
 	kref_get(&opp->kref);
@@ -1289,16 +1276,10 @@ void dev_pm_opp_get(struct dev_pm_opp *opp)
 
 void dev_pm_opp_put(struct dev_pm_opp *opp)
 {
-	kref_put_mutex(&opp->kref, _opp_kref_release_locked,
-		       &opp->opp_table->lock);
+	kref_put_mutex(&opp->kref, _opp_kref_release, &opp->opp_table->lock);
 }
 EXPORT_SYMBOL_GPL(dev_pm_opp_put);
 
-static void dev_pm_opp_put_unlocked(struct dev_pm_opp *opp)
-{
-	kref_put(&opp->kref, _opp_kref_release_unlocked);
-}
-
 /**
  * dev_pm_opp_remove()  - Remove an OPP from OPP table
  * @dev:	device for which we do this operation
@@ -1342,30 +1323,49 @@ void dev_pm_opp_remove(struct device *dev, unsigned long freq)
 }
 EXPORT_SYMBOL_GPL(dev_pm_opp_remove);
 
+static struct dev_pm_opp *_opp_get_next(struct opp_table *opp_table,
+					bool dynamic)
+{
+	struct dev_pm_opp *opp = NULL, *temp;
+
+	mutex_lock(&opp_table->lock);
+	list_for_each_entry(temp, &opp_table->opp_list, node) {
+		if (dynamic == temp->dynamic) {
+			opp = temp;
+			break;
+		}
+	}
+
+	mutex_unlock(&opp_table->lock);
+	return opp;
+}
+
 bool _opp_remove_all_static(struct opp_table *opp_table)
 {
-	struct dev_pm_opp *opp, *tmp;
-	bool ret = true;
+	struct dev_pm_opp *opp;
 
 	mutex_lock(&opp_table->lock);
 
 	if (!opp_table->parsed_static_opps) {
-		ret = false;
-		goto unlock;
+		mutex_unlock(&opp_table->lock);
+		return false;
 	}
 
-	if (--opp_table->parsed_static_opps)
-		goto unlock;
-
-	list_for_each_entry_safe(opp, tmp, &opp_table->opp_list, node) {
-		if (!opp->dynamic)
-			dev_pm_opp_put_unlocked(opp);
+	if (--opp_table->parsed_static_opps) {
+		mutex_unlock(&opp_table->lock);
+		return true;
 	}
 
-unlock:
 	mutex_unlock(&opp_table->lock);
 
-	return ret;
+	/*
+	 * Can't remove the OPP from under the lock, debugfs removal needs to
+	 * happen lock less to avoid circular dependency issues.
+	 */
+	while ((opp = _opp_get_next(opp_table, false)))
+		dev_pm_opp_put(opp);
+
+	return true;
 }
 
 /**
@@ -1377,21 +1377,21 @@ bool _opp_remove_all_static(struct opp_table *opp_table)
 void dev_pm_opp_remove_all_dynamic(struct device *dev)
 {
 	struct opp_table *opp_table;
-	struct dev_pm_opp *opp, *temp;
+	struct dev_pm_opp *opp;
 	int count = 0;
 
 	opp_table = _find_opp_table(dev);
 	if (IS_ERR(opp_table))
 		return;
 
-	mutex_lock(&opp_table->lock);
-	list_for_each_entry_safe(opp, temp, &opp_table->opp_list, node) {
-		if (opp->dynamic) {
-			dev_pm_opp_put_unlocked(opp);
-			count++;
-		}
+	/*
+	 * Can't remove the OPP from under the lock, debugfs removal needs to
+	 * happen lock less to avoid circular dependency issues.
+	 */
+	while ((opp = _opp_get_next(opp_table, true))) {
+		dev_pm_opp_put(opp);
+		count++;
 	}
-	mutex_unlock(&opp_table->lock);
 
 	/* Drop the references taken by dev_pm_opp_add() */
 	while (count--)
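
The patch above applies one idea throughout: drop the last reference and unlink the OPP under opp_table->lock, but run the heavyweight teardown (debugfs removal) only after unlocking, so that teardown can take other locks freely. A toy, simplified Python model of that kref_put_mutex()-style pattern — the class and function names here are hypothetical stand-ins, not the real OPP API:

```python
import threading

class OppTable:
    """Stand-in for struct opp_table: a lock plus a list of OPPs."""
    def __init__(self):
        self.lock = threading.Lock()
        self.opps = []

class Opp:
    """Stand-in for struct dev_pm_opp with a kref-like refcount."""
    def __init__(self, table, name):
        self.table = table
        self.name = name
        self.refcount = 1

    def get(self):
        with self.table.lock:
            self.refcount += 1

    def put(self, released):
        # Simplified mirror of kref_put_mutex() + the reworked
        # _opp_kref_release(): decrement and unlink under the table lock,
        # then drop the lock BEFORE the heavyweight teardown (debugfs
        # removal in the real code).
        self.table.lock.acquire()
        self.refcount -= 1
        if self.refcount:
            self.table.lock.release()
            return
        self.table.opps.remove(self)
        self.table.lock.release()
        released.append(self.name)  # "opp_debug_remove_one()" happens here

def remove_all(table, released):
    # _opp_get_next() style: peek under the lock, put() outside it.
    while True:
        with table.lock:
            opp = table.opps[0] if table.opps else None
        if opp is None:
            break
        opp.put(released)

table, released = OppTable(), []
table.opps += [Opp(table, "opp-300MHz"), Opp(table, "opp-500MHz")]
remove_all(table, released)
assert released == ["opp-300MHz", "opp-500MHz"]
assert not table.lock.locked()
```

The design point is the same one the commit message makes: the lock protects only the list and the refcount, never the teardown, so no lock-ordering edge from opp_table->lock into debugfs locks is ever created.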
Viresh Kumar Dec. 7, 2020, 6:16 a.m. UTC | #20
On 19-11-20, 11:35, Viresh Kumar wrote:
> On 18-11-20, 08:53, Rob Clark wrote:
> > On Tue, Nov 17, 2020 at 9:28 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > On 17-11-20, 09:02, Rob Clark wrote:
> > > > With that on top of the previous patch,
> > >
> > > Don't you still have this ? Which fixed the lockdep in the remove path.
> > >
> > > https://lore.kernel.org/lkml/20201022080644.2ck4okrxygmkuatn@vireshk-i7/
> > >
> > > To make it clear you need these patches to fix the OPP stuff:
> > >
> > > //From 5.10-rc3 (the one from the above link).
> > > commit e0df59de670b ("opp: Reduce the size of critical section in _opp_table_kref_release()")
>
> This fixes debugfs stuff while the OPP table is removed.
>
> > > //Below two from linux-next
> > > commit ef43f01ac069 ("opp: Always add entries in dev_list with opp_table->lock held")
> > > commit 27c09484dd3d ("opp: Allocate the OPP table outside of opp_table_lock")
>
> This fixes debugfs stuff while the OPP table is added.
>
> > > This matches the diff I gave you earlier.
> >
> > no, I did not have all three, only "opp: Allocate the OPP table
> > outside of opp_table_lock" plus the fixup.  But with all three:
>
> And looking at the lockdep you gave now, it looks like we have a
> problem with OPP table's internal lock (opp_table->lock) as well apart
> from the global opp_table_lock.
>
> I wish there was a way for me to reproduce the lockdep :(
>
> I know this is exhausting for both of us and I really want to be over
> with it as soon as possible, this really should be the last patch
> here, please try this along with other two. This fixes the debugfs
> thing while the OPPs in the OPP table are removed (they are already
> added without a lock around debugfs stuff).
>
> AFAIU, there is no further debugfs stuff that happens from within the
> locks and so this really should be the last patch unless I missed
> something.

Rob, were you able to test this patch ?

--
viresh
Viresh Kumar Dec. 16, 2020, 5:22 a.m. UTC | #21
On 07-12-20, 11:46, Viresh Kumar wrote:
> On 19-11-20, 11:35, Viresh Kumar wrote:
> > On 18-11-20, 08:53, Rob Clark wrote:
> > > On Tue, Nov 17, 2020 at 9:28 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > > >
> > > > On 17-11-20, 09:02, Rob Clark wrote:
> > > > > With that on top of the previous patch,
> > > >
> > > > Don't you still have this ? Which fixed the lockdep in the remove path.
> > > >
> > > > https://lore.kernel.org/lkml/20201022080644.2ck4okrxygmkuatn@vireshk-i7/
> > > >
> > > > To make it clear you need these patches to fix the OPP stuff:
> > > >
> > > > //From 5.10-rc3 (the one from the above link).
> > > > commit e0df59de670b ("opp: Reduce the size of critical section in _opp_table_kref_release()")
> >
> > This fixes debugfs stuff while the OPP table is removed.
> >
> > > > //Below two from linux-next
> > > > commit ef43f01ac069 ("opp: Always add entries in dev_list with opp_table->lock held")
> > > > commit 27c09484dd3d ("opp: Allocate the OPP table outside of opp_table_lock")
> >
> > This fixes debugfs stuff while the OPP table is added.
> >
> > > > This matches the diff I gave you earlier.
> > >
> > > no, I did not have all three, only "opp: Allocate the OPP table
> > > outside of opp_table_lock" plus the fixup.  But with all three:
> >
> > And looking at the lockdep you gave now, it looks like we have a
> > problem with OPP table's internal lock (opp_table->lock) as well apart
> > from the global opp_table_lock.
> >
> > I wish there was a way for me to reproduce the lockdep :(
> >
> > I know this is exhausting for both of us and I really want to be over
> > with it as soon as possible, this really should be the last patch
> > here, please try this along with other two. This fixes the debugfs
> > thing while the OPPs in the OPP table are removed (they are already
> > added without a lock around debugfs stuff).
> >
> > AFAIU, there is no further debugfs stuff that happens from within the
> > locks and so this really should be the last patch unless I missed
> > something.
>
> Rob, were you able to test this patch ?

FWIW, this patch and everything else I had is merged into Linus's
master. You can test 5.11-rc1 to see if you still see a lockdep or
not.

--
viresh
diff mbox series

Patch

diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index 002130d826aa..a9422d043bfe 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -744,11 +744,20 @@  int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 
 	ret = submit_lookup_objects(submit, args, file);
 	if (ret)
-		goto out;
+		goto out_pre_pm;
 
 	ret = submit_lookup_cmds(submit, args, file);
 	if (ret)
-		goto out;
+		goto out_pre_pm;
+
+	/*
+	 * Thanks to dev_pm_opp opp_table_lock interactions with mm->mmap_sem
+	 * in the resume path, we need to do the rpm get before we lock objs.
+	 * Which unfortunately might involve powering up the GPU sooner than
+	 * is necessary.  But at least in the explicit fencing case, we will
+	 * have already done all the fence waiting.
+	 */
+	pm_runtime_get_sync(&gpu->pdev->dev);
 
 	/* copy_*_user while holding a ww ticket upsets lockdep */
 	ww_acquire_init(&submit->ticket, &reservation_ww_class);
@@ -825,6 +834,8 @@  int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 
 
 out:
+	pm_runtime_put(&gpu->pdev->dev);
+out_pre_pm:
 	submit_cleanup(submit);
 	if (has_ww_ticket)
 		ww_acquire_fini(&submit->ticket);
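
The patch boils down to a lock-ordering change: the runtime-PM get (whose resume path takes opp_table_lock) now happens before submit_lock_objects() takes any reservation_ww_class_mutex. A small sketch, illustrative only, that records "held lock -> newly acquired lock" edges the way lockdep builds its ordering graph — function names here model the flow, not the real API:

```python
# Toy held-set recorder in the spirit of lockdep: every acquisition adds an
# ordering edge from each currently-held lock to the newly taken one.
class Order:
    def __init__(self):
        self.held = []
        self.edges = set()

    def lock(self, name):
        for h in self.held:
            self.edges.add((h, name))
        self.held.append(name)

    def unlock(self, name):
        self.held.remove(name)

def submit(o, rpm_first):
    """Model of the submit ioctl's lock behaviour, before and after."""
    if rpm_first:
        # Patched flow: pm_runtime_get_sync() before any object lock; the
        # resume path may take opp_table_lock, but nothing else is held.
        o.lock("opp_table_lock"); o.unlock("opp_table_lock")
    o.lock("reservation_ww_class_mutex")      # submit_lock_objects()
    if not rpm_first:
        # Old flow: the GPU was resumed (taking opp_table_lock) while the
        # object locks were still held.
        o.lock("opp_table_lock"); o.unlock("opp_table_lock")
    o.unlock("reservation_ww_class_mutex")

old, new = Order(), Order()
submit(old, rpm_first=False)
submit(new, rpm_first=True)

# Only the old ordering creates the edge that closes lockdep's cycle:
assert ("reservation_ww_class_mutex", "opp_table_lock") in old.edges
assert not new.edges  # rpm get done sooner: no nesting at all in submit
```

This matches the commit message's trade-off: the ordering edge disappears, at the cost of possibly powering up the GPU before an error path bails out.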