diff mbox series

[v1] PM: sleep: core: Synchronize runtime PM status of parents and children

Message ID 12619233.O9o76ZdvQC@rjwysocki.net
State New
Headers show
Series [v1] PM: sleep: core: Synchronize runtime PM status of parents and children | expand

Commit Message

Rafael J. Wysocki Jan. 28, 2025, 7:24 p.m. UTC
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Commit 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the
resume phase") overlooked the case in which the parent of a device with
DPM_FLAG_SMART_SUSPEND set did not use that flag and could be runtime-
suspended before a transition into a system-wide sleep state.  In that
case, if the child is resumed during the subsequent transition from
that state into the working state, its runtime PM status will be set to
RPM_ACTIVE, but the runtime PM status of the parent will not be updated
accordingly, even though the parent will be resumed too, because of the
dev_pm_skip_suspend() check in device_resume_noirq().

Address this problem by tracking the need to set the runtime PM status
to RPM_ACTIVE during system-wide resume transitions for devices with
DPM_FLAG_SMART_SUSPEND set and all of the devices depended on by them.

Fixes: 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the resume phase")
Closes: https://lore.kernel.org/linux-pm/Z30p2Etwf3F2AUvD@hovoldconsulting.com/
Reported-by: Johan Hovold <johan@kernel.org>
Tested-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/base/power/main.c |   29 ++++++++++++++++++++---------
 include/linux/pm.h        |    1 +
 2 files changed, 21 insertions(+), 9 deletions(-)

Comments

Johan Hovold Jan. 29, 2025, 8:31 a.m. UTC | #1
On Tue, Jan 28, 2025 at 08:24:41PM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Commit 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the
> resume phase") overlooked the case in which the parent of a device with
> DPM_FLAG_SMART_SUSPEND set did not use that flag and could be runtime-
> suspended before a transition into a system-wide sleep state.  In that
> case, if the child is resumed during the subsequent transition from
> that state into the working state, its runtime PM status will be set to
> RPM_ACTIVE, but the runtime PM status of the parent will not be updated
> accordingly, even though the parent will be resumed too, because of the
> dev_pm_skip_suspend() check in device_resume_noirq().
> 
> Address this problem by tracking the need to set the runtime PM status
> to RPM_ACTIVE during system-wide resume transitions for devices with
> DPM_FLAG_SMART_SUSPEND set and all of the devices depended on by them.
> 
> Fixes: 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the resume phase")
> Closes: https://lore.kernel.org/linux-pm/Z30p2Etwf3F2AUvD@hovoldconsulting.com/
> Reported-by: Johan Hovold <johan@kernel.org>
> Tested-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Thanks for tracking this down Mani, and thanks Rafael for the quick fix.

As expected this makes the warning go away also in my setup, and the
patch itself looks correct to me:

Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
Tested-by: Johan Hovold <johan+linaro@kernel.org>

Johan
Ulf Hansson Jan. 29, 2025, 11:52 a.m. UTC | #2
On Tue, 28 Jan 2025 at 20:24, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Commit 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the
> resume phase") overlooked the case in which the parent of a device with
> DPM_FLAG_SMART_SUSPEND set did not use that flag and could be runtime-
> suspended before a transition into a system-wide sleep state.  In that
> case, if the child is resumed during the subsequent transition from
> that state into the working state, its runtime PM status will be set to
> RPM_ACTIVE, but the runtime PM status of the parent will not be updated
> accordingly, even though the parent will be resumed too, because of the
> dev_pm_skip_suspend() check in device_resume_noirq().
>
> Address this problem by tracking the need to set the runtime PM status
> to RPM_ACTIVE during system-wide resume transitions for devices with
> DPM_FLAG_SMART_SUSPEND set and all of the devices depended on by them.
>
> Fixes: 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the resume phase")
> Closes: https://lore.kernel.org/linux-pm/Z30p2Etwf3F2AUvD@hovoldconsulting.com/
> Reported-by: Johan Hovold <johan@kernel.org>
> Tested-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/base/power/main.c |   29 ++++++++++++++++++++---------
>  include/linux/pm.h        |    1 +
>  2 files changed, 21 insertions(+), 9 deletions(-)
>
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -656,13 +656,15 @@
>          * so change its status accordingly.
>          *
>          * Otherwise, the device is going to be resumed, so set its PM-runtime
> -        * status to "active", but do that only if DPM_FLAG_SMART_SUSPEND is set
> -        * to avoid confusing drivers that don't use it.
> +        * status to "active" unless its power.set_active flag is clear, in
> +        * which case it is not necessary to update its PM-runtime status.
>          */
> -       if (skip_resume)
> +       if (skip_resume) {
>                 pm_runtime_set_suspended(dev);
> -       else if (dev_pm_skip_suspend(dev))
> +       } else if (dev->power.set_active) {
>                 pm_runtime_set_active(dev);
> +               dev->power.set_active = false;
> +       }
>
>         if (dev->pm_domain) {
>                 info = "noirq power domain ";
> @@ -1189,18 +1191,24 @@
>         return PMSG_ON;
>  }
>
> -static void dpm_superior_set_must_resume(struct device *dev)
> +static void dpm_superior_set_must_resume(struct device *dev, bool set_active)
>  {
>         struct device_link *link;
>         int idx;
>
> -       if (dev->parent)
> +       if (dev->parent) {
>                 dev->parent->power.must_resume = true;
> +               if (set_active)
> +                       dev->parent->power.set_active = true;
> +       }
>
>         idx = device_links_read_lock();
>
> -       list_for_each_entry_rcu_locked(link, &dev->links.suppliers, c_node)
> +       list_for_each_entry_rcu_locked(link, &dev->links.suppliers, c_node) {
>                 link->supplier->power.must_resume = true;
> +               if (set_active)
> +                       link->supplier->power.set_active = true;

If I understand correctly, the suppliers are already handled when the
pm_runtime_set_active() is called for consumers, so the above should
not be needed.

That said, maybe we instead allow parent/child to work in the similar
way as for consumer/suppliers, when pm_runtime_set_active() is called
for the child. In other words, when pm_runtime_set_active() is called
for a child and the parent is runtime PM enabled, let's runtime resume
it too, as we do for suppliers. Would that work, you think?

> +       }
>
>         device_links_read_unlock(idx);
>  }
> @@ -1278,8 +1286,11 @@
>               dev->power.may_skip_resume))
>                 dev->power.must_resume = true;
>
> -       if (dev->power.must_resume)
> -               dpm_superior_set_must_resume(dev);
> +       if (dev->power.must_resume) {
> +               dev->power.set_active = dev->power.set_active ||
> +                       dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND);
> +               dpm_superior_set_must_resume(dev, dev->power.set_active);
> +       }
>
>  Complete:
>         complete_all(&dev->power.completion);
> --- a/include/linux/pm.h
> +++ b/include/linux/pm.h
> @@ -683,6 +683,7 @@
>         bool                    no_pm_callbacks:1;      /* Owned by the PM core */
>         bool                    async_in_progress:1;    /* Owned by the PM core */
>         bool                    must_resume:1;          /* Owned by the PM core */
> +       bool                    set_active:1;           /* Owned by the PM core */
>         bool                    may_skip_resume:1;      /* Set by subsystems */
>  #else
>         bool                    should_wakeup:1;
>
>
>

Kind regards
Uffe
Rafael J. Wysocki Jan. 29, 2025, 3:55 p.m. UTC | #3
On Wed, Jan 29, 2025 at 12:53 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Tue, 28 Jan 2025 at 20:24, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Commit 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the
> > resume phase") overlooked the case in which the parent of a device with
> > DPM_FLAG_SMART_SUSPEND set did not use that flag and could be runtime-
> > suspended before a transition into a system-wide sleep state.  In that
> > case, if the child is resumed during the subsequent transition from
> > that state into the working state, its runtime PM status will be set to
> > RPM_ACTIVE, but the runtime PM status of the parent will not be updated
> > accordingly, even though the parent will be resumed too, because of the
> > dev_pm_skip_suspend() check in device_resume_noirq().
> >
> > Address this problem by tracking the need to set the runtime PM status
> > to RPM_ACTIVE during system-wide resume transitions for devices with
> > DPM_FLAG_SMART_SUSPEND set and all of the devices depended on by them.
> >
> > Fixes: 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the resume phase")
> > Closes: https://lore.kernel.org/linux-pm/Z30p2Etwf3F2AUvD@hovoldconsulting.com/
> > Reported-by: Johan Hovold <johan@kernel.org>
> > Tested-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >  drivers/base/power/main.c |   29 ++++++++++++++++++++---------
> >  include/linux/pm.h        |    1 +
> >  2 files changed, 21 insertions(+), 9 deletions(-)
> >
> > --- a/drivers/base/power/main.c
> > +++ b/drivers/base/power/main.c
> > @@ -656,13 +656,15 @@
> >          * so change its status accordingly.
> >          *
> >          * Otherwise, the device is going to be resumed, so set its PM-runtime
> > -        * status to "active", but do that only if DPM_FLAG_SMART_SUSPEND is set
> > -        * to avoid confusing drivers that don't use it.
> > +        * status to "active" unless its power.set_active flag is clear, in
> > +        * which case it is not necessary to update its PM-runtime status.
> >          */
> > -       if (skip_resume)
> > +       if (skip_resume) {
> >                 pm_runtime_set_suspended(dev);
> > -       else if (dev_pm_skip_suspend(dev))
> > +       } else if (dev->power.set_active) {
> >                 pm_runtime_set_active(dev);
> > +               dev->power.set_active = false;
> > +       }
> >
> >         if (dev->pm_domain) {
> >                 info = "noirq power domain ";
> > @@ -1189,18 +1191,24 @@
> >         return PMSG_ON;
> >  }
> >
> > -static void dpm_superior_set_must_resume(struct device *dev)
> > +static void dpm_superior_set_must_resume(struct device *dev, bool set_active)
> >  {
> >         struct device_link *link;
> >         int idx;
> >
> > -       if (dev->parent)
> > +       if (dev->parent) {
> >                 dev->parent->power.must_resume = true;
> > +               if (set_active)
> > +                       dev->parent->power.set_active = true;
> > +       }
> >
> >         idx = device_links_read_lock();
> >
> > -       list_for_each_entry_rcu_locked(link, &dev->links.suppliers, c_node)
> > +       list_for_each_entry_rcu_locked(link, &dev->links.suppliers, c_node) {
> >                 link->supplier->power.must_resume = true;
> > +               if (set_active)
> > +                       link->supplier->power.set_active = true;
>
> If I understand correctly, the suppliers are already handled when the
> pm_runtime_set_active() is called for consumers, so the above should
> not be needed.

It is needed because pm_runtime_set_active() doesn't cause the setting
to propagate to the parent's/suppliers of the suppliers AFAICS.

> That said, maybe we instead allow parent/child to work in the similar
> way as for consumer/suppliers, when pm_runtime_set_active() is called
> for the child. In other words, when pm_runtime_set_active() is called
> for a child and the parent is runtime PM enabled, let's runtime resume
> it too, as we do for suppliers. Would that work, you think?

The parent is not runtime-PM enabled when this happens.
Ulf Hansson Jan. 29, 2025, 4:42 p.m. UTC | #4
On Wed, 29 Jan 2025 at 16:55, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Wed, Jan 29, 2025 at 12:53 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > On Tue, 28 Jan 2025 at 20:24, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > >
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > Commit 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the
> > > resume phase") overlooked the case in which the parent of a device with
> > > DPM_FLAG_SMART_SUSPEND set did not use that flag and could be runtime-
> > > suspended before a transition into a system-wide sleep state.  In that
> > > case, if the child is resumed during the subsequent transition from
> > > that state into the working state, its runtime PM status will be set to
> > > RPM_ACTIVE, but the runtime PM status of the parent will not be updated
> > > accordingly, even though the parent will be resumed too, because of the
> > > dev_pm_skip_suspend() check in device_resume_noirq().
> > >
> > > Address this problem by tracking the need to set the runtime PM status
> > > to RPM_ACTIVE during system-wide resume transitions for devices with
> > > DPM_FLAG_SMART_SUSPEND set and all of the devices depended on by them.
> > >
> > > Fixes: 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the resume phase")
> > > Closes: https://lore.kernel.org/linux-pm/Z30p2Etwf3F2AUvD@hovoldconsulting.com/
> > > Reported-by: Johan Hovold <johan@kernel.org>
> > > Tested-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > ---
> > >  drivers/base/power/main.c |   29 ++++++++++++++++++++---------
> > >  include/linux/pm.h        |    1 +
> > >  2 files changed, 21 insertions(+), 9 deletions(-)
> > >
> > > --- a/drivers/base/power/main.c
> > > +++ b/drivers/base/power/main.c
> > > @@ -656,13 +656,15 @@
> > >          * so change its status accordingly.
> > >          *
> > >          * Otherwise, the device is going to be resumed, so set its PM-runtime
> > > -        * status to "active", but do that only if DPM_FLAG_SMART_SUSPEND is set
> > > -        * to avoid confusing drivers that don't use it.
> > > +        * status to "active" unless its power.set_active flag is clear, in
> > > +        * which case it is not necessary to update its PM-runtime status.
> > >          */
> > > -       if (skip_resume)
> > > +       if (skip_resume) {
> > >                 pm_runtime_set_suspended(dev);
> > > -       else if (dev_pm_skip_suspend(dev))
> > > +       } else if (dev->power.set_active) {
> > >                 pm_runtime_set_active(dev);
> > > +               dev->power.set_active = false;
> > > +       }
> > >
> > >         if (dev->pm_domain) {
> > >                 info = "noirq power domain ";
> > > @@ -1189,18 +1191,24 @@
> > >         return PMSG_ON;
> > >  }
> > >
> > > -static void dpm_superior_set_must_resume(struct device *dev)
> > > +static void dpm_superior_set_must_resume(struct device *dev, bool set_active)
> > >  {
> > >         struct device_link *link;
> > >         int idx;
> > >
> > > -       if (dev->parent)
> > > +       if (dev->parent) {
> > >                 dev->parent->power.must_resume = true;
> > > +               if (set_active)
> > > +                       dev->parent->power.set_active = true;
> > > +       }
> > >
> > >         idx = device_links_read_lock();
> > >
> > > -       list_for_each_entry_rcu_locked(link, &dev->links.suppliers, c_node)
> > > +       list_for_each_entry_rcu_locked(link, &dev->links.suppliers, c_node) {
> > >                 link->supplier->power.must_resume = true;
> > > +               if (set_active)
> > > +                       link->supplier->power.set_active = true;
> >
> > If I understand correctly, the suppliers are already handled when the
> > pm_runtime_set_active() is called for consumers, so the above should
> > not be needed.
>
> It is needed because pm_runtime_set_active() doesn't cause the setting
> to propagate to the parent's/suppliers of the suppliers AFAICS.

Hmm, even if that sounds reasonable, I don't think it's a good idea as
it may introduce interesting propagation problems between drivers.

For example, consider that Saravana is trying to enable runtime PM for
fw_devlinks. It would mean synchronization issues for the runtime PM
status, all over the place.

That said, is even consumer/suppliers part of the problem we are
trying to solve?

>
> > That said, maybe we instead allow parent/child to work in the similar
> > way as for consumer/suppliers, when pm_runtime_set_active() is called
> > for the child. In other words, when pm_runtime_set_active() is called
> > for a child and the parent is runtime PM enabled, let's runtime resume
> > it too, as we do for suppliers. Would that work, you think?
>
> The parent is not runtime-PM enabled when this happens.

That sounds really weird to me.

Does that mean that the parent has not been system resumed either? If
so, isn't that really the root cause for this problem, or what am I
missing?

Kind regards
Uffe
Rafael J. Wysocki Jan. 29, 2025, 4:58 p.m. UTC | #5
On Wed, Jan 29, 2025 at 5:42 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Wed, 29 Jan 2025 at 16:55, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Wed, Jan 29, 2025 at 12:53 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > >
> > > On Tue, 28 Jan 2025 at 20:24, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > >
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > >
> > > > Commit 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the
> > > > resume phase") overlooked the case in which the parent of a device with
> > > > DPM_FLAG_SMART_SUSPEND set did not use that flag and could be runtime-
> > > > suspended before a transition into a system-wide sleep state.  In that
> > > > case, if the child is resumed during the subsequent transition from
> > > > that state into the working state, its runtime PM status will be set to
> > > > RPM_ACTIVE, but the runtime PM status of the parent will not be updated
> > > > accordingly, even though the parent will be resumed too, because of the
> > > > dev_pm_skip_suspend() check in device_resume_noirq().
> > > >
> > > > Address this problem by tracking the need to set the runtime PM status
> > > > to RPM_ACTIVE during system-wide resume transitions for devices with
> > > > DPM_FLAG_SMART_SUSPEND set and all of the devices depended on by them.
> > > >
> > > > Fixes: 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the resume phase")
> > > > Closes: https://lore.kernel.org/linux-pm/Z30p2Etwf3F2AUvD@hovoldconsulting.com/
> > > > Reported-by: Johan Hovold <johan@kernel.org>
> > > > Tested-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > ---
> > > >  drivers/base/power/main.c |   29 ++++++++++++++++++++---------
> > > >  include/linux/pm.h        |    1 +
> > > >  2 files changed, 21 insertions(+), 9 deletions(-)
> > > >
> > > > --- a/drivers/base/power/main.c
> > > > +++ b/drivers/base/power/main.c
> > > > @@ -656,13 +656,15 @@
> > > >          * so change its status accordingly.
> > > >          *
> > > >          * Otherwise, the device is going to be resumed, so set its PM-runtime
> > > > -        * status to "active", but do that only if DPM_FLAG_SMART_SUSPEND is set
> > > > -        * to avoid confusing drivers that don't use it.
> > > > +        * status to "active" unless its power.set_active flag is clear, in
> > > > +        * which case it is not necessary to update its PM-runtime status.
> > > >          */
> > > > -       if (skip_resume)
> > > > +       if (skip_resume) {
> > > >                 pm_runtime_set_suspended(dev);
> > > > -       else if (dev_pm_skip_suspend(dev))
> > > > +       } else if (dev->power.set_active) {
> > > >                 pm_runtime_set_active(dev);
> > > > +               dev->power.set_active = false;
> > > > +       }
> > > >
> > > >         if (dev->pm_domain) {
> > > >                 info = "noirq power domain ";
> > > > @@ -1189,18 +1191,24 @@
> > > >         return PMSG_ON;
> > > >  }
> > > >
> > > > -static void dpm_superior_set_must_resume(struct device *dev)
> > > > +static void dpm_superior_set_must_resume(struct device *dev, bool set_active)
> > > >  {
> > > >         struct device_link *link;
> > > >         int idx;
> > > >
> > > > -       if (dev->parent)
> > > > +       if (dev->parent) {
> > > >                 dev->parent->power.must_resume = true;
> > > > +               if (set_active)
> > > > +                       dev->parent->power.set_active = true;
> > > > +       }
> > > >
> > > >         idx = device_links_read_lock();
> > > >
> > > > -       list_for_each_entry_rcu_locked(link, &dev->links.suppliers, c_node)
> > > > +       list_for_each_entry_rcu_locked(link, &dev->links.suppliers, c_node) {
> > > >                 link->supplier->power.must_resume = true;
> > > > +               if (set_active)
> > > > +                       link->supplier->power.set_active = true;
> > >
> > > If I understand correctly, the suppliers are already handled when the
> > > pm_runtime_set_active() is called for consumers, so the above should
> > > not be needed.
> >
> > It is needed because pm_runtime_set_active() doesn't cause the setting
> > to propagate to the parent's/suppliers of the suppliers AFAICS.
>
> Hmm, even if that sounds reasonable, I don't think it's a good idea as
> it may introduce interesting propagation problems between drivers.
>
> For example, consider that Saravana is trying to enable runtime PM for
> fw_devlinks. It would mean synchronization issues for the runtime PM
> status, all over the place.

What synchronization issues?

> That said, is even consumer/suppliers part of the problem we are
> trying to solve?

They are in general.

It's just that stuff that was runtime-suspended prior to a system-wide
suspend may need to be resumed and marked as RPM_ACTIVE during
system-wide resume because one of the devices wants/needs to be
resumed then.

If this turns out to be problematic, the problem will need to be
addressed, but for now I'm not seeing why there would be a problem.

> >
> > > That said, maybe we instead allow parent/child to work in the similar
> > > way as for consumer/suppliers, when pm_runtime_set_active() is called
> > > for the child. In other words, when pm_runtime_set_active() is called
> > > for a child and the parent is runtime PM enabled, let's runtime resume
> > > it too, as we do for suppliers. Would that work, you think?
> >
> > The parent is not runtime-PM enabled when this happens.
>
> That sounds really weird to me.
>
> Does that mean that the parent has not been system resumed either?

Yes.

It hasn't been resumed yet, but it is known that it will be resumed.

> If so, isn't that really the root cause for this problem,

No, it is not.

> or what am I missing?

Essentially, what I said above.

If a device that was suspended prior to a system-wide suspend
wants/needs to be resumed during the subsequent system-wide resume,
and it was runtime-PM-enabled before the suspend transition, it needs
to (a) be runtime-PM-enabled during the subsequent system-wide resume
transition and (b) it also needs to be marked as RPM_ACTIVE because in
fact it is not suspended any more.  The existing code before the patch
takes care of this for the device itself, but not for the devices it
depends on which also need to be resumed (which happens) and marked as
RPM_ACTIVE (which doesn't happen) and the patch just makes sure that
the latter will happen.

Actually, what happens now is that the actual state of the parent
during the system-wide resume, right before re-enabling runtime PM for
it, does not match its runtime PM status which is still RPM_SUSPENDED.
That's what is fixed here and it applies to the parent as well as to
all of the other devices depended on by the child and the parent.
Ulf Hansson Jan. 30, 2025, 11:11 a.m. UTC | #6
On Wed, 29 Jan 2025 at 17:58, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Wed, Jan 29, 2025 at 5:42 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > On Wed, 29 Jan 2025 at 16:55, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Wed, Jan 29, 2025 at 12:53 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > >
> > > > On Tue, 28 Jan 2025 at 20:24, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > > >
> > > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > >
> > > > > Commit 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the
> > > > > resume phase") overlooked the case in which the parent of a device with
> > > > > DPM_FLAG_SMART_SUSPEND set did not use that flag and could be runtime-
> > > > > suspended before a transition into a system-wide sleep state.  In that
> > > > > case, if the child is resumed during the subsequent transition from
> > > > > that state into the working state, its runtime PM status will be set to
> > > > > RPM_ACTIVE, but the runtime PM status of the parent will not be updated
> > > > > accordingly, even though the parent will be resumed too, because of the
> > > > > dev_pm_skip_suspend() check in device_resume_noirq().
> > > > >
> > > > > Address this problem by tracking the need to set the runtime PM status
> > > > > to RPM_ACTIVE during system-wide resume transitions for devices with
> > > > > DPM_FLAG_SMART_SUSPEND set and all of the devices depended on by them.
> > > > >
> > > > > Fixes: 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the resume phase")
> > > > > Closes: https://lore.kernel.org/linux-pm/Z30p2Etwf3F2AUvD@hovoldconsulting.com/
> > > > > Reported-by: Johan Hovold <johan@kernel.org>
> > > > > Tested-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > > ---
> > > > >  drivers/base/power/main.c |   29 ++++++++++++++++++++---------
> > > > >  include/linux/pm.h        |    1 +
> > > > >  2 files changed, 21 insertions(+), 9 deletions(-)
> > > > >
> > > > > --- a/drivers/base/power/main.c
> > > > > +++ b/drivers/base/power/main.c
> > > > > @@ -656,13 +656,15 @@
> > > > >          * so change its status accordingly.
> > > > >          *
> > > > >          * Otherwise, the device is going to be resumed, so set its PM-runtime
> > > > > -        * status to "active", but do that only if DPM_FLAG_SMART_SUSPEND is set
> > > > > -        * to avoid confusing drivers that don't use it.
> > > > > +        * status to "active" unless its power.set_active flag is clear, in
> > > > > +        * which case it is not necessary to update its PM-runtime status.
> > > > >          */
> > > > > -       if (skip_resume)
> > > > > +       if (skip_resume) {
> > > > >                 pm_runtime_set_suspended(dev);
> > > > > -       else if (dev_pm_skip_suspend(dev))
> > > > > +       } else if (dev->power.set_active) {
> > > > >                 pm_runtime_set_active(dev);
> > > > > +               dev->power.set_active = false;
> > > > > +       }
> > > > >
> > > > >         if (dev->pm_domain) {
> > > > >                 info = "noirq power domain ";
> > > > > @@ -1189,18 +1191,24 @@
> > > > >         return PMSG_ON;
> > > > >  }
> > > > >
> > > > > -static void dpm_superior_set_must_resume(struct device *dev)
> > > > > +static void dpm_superior_set_must_resume(struct device *dev, bool set_active)
> > > > >  {
> > > > >         struct device_link *link;
> > > > >         int idx;
> > > > >
> > > > > -       if (dev->parent)
> > > > > +       if (dev->parent) {
> > > > >                 dev->parent->power.must_resume = true;
> > > > > +               if (set_active)
> > > > > +                       dev->parent->power.set_active = true;
> > > > > +       }
> > > > >
> > > > >         idx = device_links_read_lock();
> > > > >
> > > > > -       list_for_each_entry_rcu_locked(link, &dev->links.suppliers, c_node)
> > > > > +       list_for_each_entry_rcu_locked(link, &dev->links.suppliers, c_node) {
> > > > >                 link->supplier->power.must_resume = true;
> > > > > +               if (set_active)
> > > > > +                       link->supplier->power.set_active = true;
> > > >
> > > > If I understand correctly, the suppliers are already handled when the
> > > > pm_runtime_set_active() is called for consumers, so the above should
> > > > not be needed.
> > >
> > > It is needed because pm_runtime_set_active() doesn't cause the setting
> > > to propagate to the parent's/suppliers of the suppliers AFAICS.
> >
> > Hmm, even if that sounds reasonable, I don't think it's a good idea as
> > it may introduce interesting propagation problems between drivers.
> >
> > For example, consider that Saravana is trying to enable runtime PM for
> > fw_devlinks. It would mean synchronization issues for the runtime PM
> > status, all over the place.
>
> What synchronization issues?

Changing the runtime PM status for a parent/supplier that doesn't have
DPM_FLAG_SMART_SUSPEND, is likely to confuse their drivers.

You also removed that part of the comment a few lines above, in
device_resume_noirq(). I am not sure I understand why?

>
> > That said, is even consumer/suppliers part of the problem we are
> > trying to solve?
>
> They are in general.
>
> It's just that stuff that was runtime-suspended prior to a system-wide
> suspend may need to be resumed and marked as RPM_ACTIVE during
> system-wide resume because one of the devices wants/needs to be
> resumed then.
>
> If this turns out to be problematic, the problem will need to be
> addressed, but for now I'm not seeing why there would be a problem.
>
> > >
> > > > That said, maybe we instead allow parent/child to work in the similar
> > > > way as for consumer/suppliers, when pm_runtime_set_active() is called
> > > > for the child. In other words, when pm_runtime_set_active() is called
> > > > for a child and the parent is runtime PM enabled, let's runtime resume
> > > > it too, as we do for suppliers. Would that work, you think?
> > >
> > > The parent is not runtime-PM enabled when this happens.
> >
> > That sounds really weird to me.
> >
> > Does that mean that the parent has not been system resumed either?
>
> Yes.
>
> It hasn't been resumed yet, but it is known that it will be resumed.
>
> > If so, isn't that really the root cause for this problem,
>
> No, it is not.
>
> > or what am I missing?
>
> Essentially, what I said above.
>
> If a device that was suspended prior to a system-wide suspend
> wants/needs to be resumed during the subsequent system-wide resume,
> and it was runtime-PM-enabled before the suspend transition, it needs
> to (a) be runtime-PM-enabled during the subsequent system-wide resume
> transition and (b) it also needs to be marked as RPM_ACTIVE because in
> fact it is not suspended any more.  The existing code before the patch
> takes care of this for the device itself, but not for the devices it
> depends on which also need to be resumed (which happens) and marked as
> RPM_ACTIVE (which doesn't happen) and the patch just makes sure that
> the latter will happen.

Thanks for clarifying!

>
> Actually, what happens now is that the actual state of the parent
> during the system-wide resume, right before re-enabling runtime PM for
> it, does not match its runtime PM status which is still RPM_SUSPENDED.
> That's what is fixed here and it applies to the parent as well as to
> all of the other devices depended on by the child and the parent.

Well, unfortunately I don't think it will work to call
pm_runtime_set_active() for parents/suppliers like this.

I think we need the drivers for the parents/suppliers to be in
agreement with the behaviour of DPM_FLAG_SMART_SUSPEND to allow the
propagation. Not sure how to best achieve this though.

Kind regards
Uffe
Rafael J. Wysocki Jan. 30, 2025, 1:19 p.m. UTC | #7
On Thu, Jan 30, 2025 at 12:11 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Wed, 29 Jan 2025 at 17:58, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Wed, Jan 29, 2025 at 5:42 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > >
> > > On Wed, 29 Jan 2025 at 16:55, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > >
> > > > On Wed, Jan 29, 2025 at 12:53 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > >
> > > > > On Tue, 28 Jan 2025 at 20:24, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > > > >
> > > > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > > >
> > > > > > Commit 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the
> > > > > > resume phase") overlooked the case in which the parent of a device with
> > > > > > DPM_FLAG_SMART_SUSPEND set did not use that flag and could be runtime-
> > > > > > suspended before a transition into a system-wide sleep state.  In that
> > > > > > case, if the child is resumed during the subsequent transition from
> > > > > > that state into the working state, its runtime PM status will be set to
> > > > > > RPM_ACTIVE, but the runtime PM status of the parent will not be updated
> > > > > > accordingly, even though the parent will be resumed too, because of the
> > > > > > dev_pm_skip_suspend() check in device_resume_noirq().
> > > > > >
> > > > > > Address this problem by tracking the need to set the runtime PM status
> > > > > > to RPM_ACTIVE during system-wide resume transitions for devices with
> > > > > > DPM_FLAG_SMART_SUSPEND set and all of the devices depended on by them.
> > > > > >
> > > > > > Fixes: 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the resume phase")
> > > > > > Closes: https://lore.kernel.org/linux-pm/Z30p2Etwf3F2AUvD@hovoldconsulting.com/
> > > > > > Reported-by: Johan Hovold <johan@kernel.org>
> > > > > > Tested-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > > > ---
> > > > > >  drivers/base/power/main.c |   29 ++++++++++++++++++++---------
> > > > > >  include/linux/pm.h        |    1 +
> > > > > >  2 files changed, 21 insertions(+), 9 deletions(-)
> > > > > >
> > > > > > --- a/drivers/base/power/main.c
> > > > > > +++ b/drivers/base/power/main.c
> > > > > > @@ -656,13 +656,15 @@
> > > > > >          * so change its status accordingly.
> > > > > >          *
> > > > > >          * Otherwise, the device is going to be resumed, so set its PM-runtime
> > > > > > -        * status to "active", but do that only if DPM_FLAG_SMART_SUSPEND is set
> > > > > > -        * to avoid confusing drivers that don't use it.
> > > > > > +        * status to "active" unless its power.set_active flag is clear, in
> > > > > > +        * which case it is not necessary to update its PM-runtime status.
> > > > > >          */
> > > > > > -       if (skip_resume)
> > > > > > +       if (skip_resume) {
> > > > > >                 pm_runtime_set_suspended(dev);
> > > > > > -       else if (dev_pm_skip_suspend(dev))
> > > > > > +       } else if (dev->power.set_active) {
> > > > > >                 pm_runtime_set_active(dev);
> > > > > > +               dev->power.set_active = false;
> > > > > > +       }
> > > > > >
> > > > > >         if (dev->pm_domain) {
> > > > > >                 info = "noirq power domain ";
> > > > > > @@ -1189,18 +1191,24 @@
> > > > > >         return PMSG_ON;
> > > > > >  }
> > > > > >
> > > > > > -static void dpm_superior_set_must_resume(struct device *dev)
> > > > > > +static void dpm_superior_set_must_resume(struct device *dev, bool set_active)
> > > > > >  {
> > > > > >         struct device_link *link;
> > > > > >         int idx;
> > > > > >
> > > > > > -       if (dev->parent)
> > > > > > +       if (dev->parent) {
> > > > > >                 dev->parent->power.must_resume = true;
> > > > > > +               if (set_active)
> > > > > > +                       dev->parent->power.set_active = true;
> > > > > > +       }
> > > > > >
> > > > > >         idx = device_links_read_lock();
> > > > > >
> > > > > > -       list_for_each_entry_rcu_locked(link, &dev->links.suppliers, c_node)
> > > > > > +       list_for_each_entry_rcu_locked(link, &dev->links.suppliers, c_node) {
> > > > > >                 link->supplier->power.must_resume = true;
> > > > > > +               if (set_active)
> > > > > > +                       link->supplier->power.set_active = true;
> > > > >
> > > > > If I understand correctly, the suppliers are already handled when the
> > > > > pm_runtime_set_active() is called for consumers, so the above should
> > > > > not be needed.
> > > >
> > > > It is needed because pm_runtime_set_active() doesn't cause the setting
> > > > to propagate to the parent's/suppliers of the suppliers AFAICS.
> > >
> > > Hmm, even if that sounds reasonable, I don't think it's a good idea as
> > > it may introduce interesting propagation problems between drivers.
> > >
> > > For example, consider that Saravana is trying to enable runtime PM for
> > > fw_devlinks. It would mean synchronization issues for the runtime PM
> > > status, all over the place.
> >
> > What synchronization issues?
>
> Changing the runtime PM status for a parent/supplier that doesn't have
> DPM_FLAG_SMART_SUSPEND, is likely to confuse their drivers.

I'm not sure why though.

> You also removed that part of the comment a few lines above, in
> device_resume_noirq(). I am not sure I understand why?

Not removed, but replaced.

The set_active flag is only set for devices with
DPM_FLAG_SMART_SUSPEND set and devices depended on by them.  Also, it
is only set for devices whose must_resume is set, which for devices
with DPM_FLAG_SMART_SUSPEND means that they literally must be resumed.
Consequently, the devices depended on by them also must be resumed.

> >
> > > That said, is even consumer/suppliers part of the problem we are
> > > trying to solve?
> >
> > They are in general.
> >
> > It's just that stuff that was runtime-suspended prior to a system-wide
> > suspend may need to be resumed and marked as RPM_ACTIVE during
> > system-wide resume because one of the devices wants/needs to be
> > resumed then.
> >
> > If this turns out to be problematic, the problem will need to be
> > addressed, but for now I'm not seeing why there would be a problem.
> >
> > > >
> > > > > That said, maybe we instead allow parent/child to work in the similar
> > > > > way as for consumer/suppliers, when pm_runtime_set_active() is called
> > > > > for the child. In other words, when pm_runtime_set_active() is called
> > > > > for a child and the parent is runtime PM enabled, let's runtime resume
> > > > > it too, as we do for suppliers. Would that work, you think?
> > > >
> > > > The parent is not runtime-PM enabled when this happens.
> > >
> > > That sounds really weird to me.
> > >
> > > Does that mean that the parent has not been system resumed either?
> >
> > Yes.
> >
> > It hasn't been resumed yet, but it is known that it will be resumed.
> >
> > > If so, isn't that really the root cause for this problem,
> >
> > No, it is not.
> >
> > > or what am I missing?
> >
> > Essentially, what I said above.
> >
> > If a device that was suspended prior to a system-wide suspend
> > wants/needs to be resumed during the subsequent system-wide resume,
> > and it was runtime-PM-enabled before the suspend transition, it needs
> > to (a) be runtime-PM-enabled during the subsequent system-wide resume
> > transition and (b) it also needs to be marked as RPM_ACTIVE because in
> > fact it is not suspended any more.  The existing code before the patch
> > takes care of this for the device itself, but not for the devices it
> > depends on which also need to be resumed (which happens) and marked as
> > RPM_ACTIVE (which doesn't happen) and the patch just makes sure that
> > the latter will happen.
>
> Thanks for clarifying!
>
> >
> > Actually, what happens now is that the actual state of the parent
> > during the system-wide resume, right before re-enabling runtime PM for
> > it, does not match its runtime PM status which is still RPM_SUSPENDED.
> > That's what is fixed here and it applies to the parent as well as to
> > all of the other devices depended on by the child and the parent.
>
> Well, unfortunately I don't think it will work to call
> pm_runtime_set_active() for parents/suppliers like this.

As stated above, if a device with DPM_FLAG_SMART_SUSPEND has
power.must_resume set, it must be resumed.  Therefore, all of the
devices depended on by it must be resumed (literally, they need to be
powered up and configured to work).  This is already a rule without
the patch.

Accordingly, they all effectively will be "active" and so their
runtime PM status must reflect this.

I don't see anything that cannot work, but I see why it is broken
without this patch.

> I think we need the drivers for the parents/suppliers to be in
> agreement with the behaviour of DPM_FLAG_SMART_SUSPEND to allow the
> propagation. Not sure how to best achieve this though.

It would be good to actually identify the cases in which it may not
work and they can be fixed on top of this patch.
Ulf Hansson Feb. 3, 2025, 12:12 p.m. UTC | #8
[...]

> >
> > The problem with $subject patch is that drivers/buses are required to
> > check the runtime PM status, with pm_runtime_suspended(),
> > pm_runtime_status_suspended() or pm_runtime_active() in one of its
> > system suspend/resume callbacks , to synchronize it with the HW state
> > for its device (turn on/off clocks for example).
>
> Well, I'm kind of unaware of this requirement.
>
> It clearly is not even followed by the code without the $subject patch.
>
> The real requirement is that the runtime PM status at the point when
> runtime PM is re-enabled, that is in device_resume_early(), must
> reflect the current actual HW state.

Right. Seems like we are in agreement, just that there seems to be
multiple ways to describe the similar problem.

>
> > Certainly, we can not rely on drivers to conform to this behaviour and
> > there are plenty of cases where they really don't. For example, we
> > have drivers that only implements runtime PM support or simply don't
> > care about the runtime PM status during system resume, but just leaves
> > the device in the state it is already in.
>
> Drivers that only support runtime PM are broken with respect to system
> sleep ATM.  They need to be made to support system sleep or they
> cannot be used on systems that use system sleep.  There may be a way
> around this for system suspend/resume (see below), but not for
> hibernation.

I think calling them broken may be to take this a step too far.

While I certainly agree that these drivers have room for some
improvements, it looks to me that these drivers work today, but may
not with $subject patch.

>
> > Moreover, we have the users of pm_runtime_force_suspend|resume(),
> > which we also know *not* to work with DPM_FLAG_SMART_SUSPEND and thus
> > $subject patch too. I am less worried about these cases though, as I
> > believe we should be able to fix them, by taking into account the
> > suggested "->power.set_active flag", or something along those lines.
>
> Yes, and that's what I'm going to do.
>
> > That said, it seems like we need another way forward.
>
> I still don't see why, sorry.
>
> I guess the concern is that if a device suddenly needs to be resumed
> during system resume even though it was runtime-suspended before the
> preceding system suspend, there is no way to tell its driver/bus
> type/etc that this is the case if they all decide to leave the device
> as is, but as I have said for multiple times in this thread, leaving a
> device as is during system resume may not be an option unless it is a
> leaf device without any subordinates.  This has always been the case.
>
> We'll see if there is any damage resulting from the $subject change
> and we'll fix it if so.
>
> In the future, though, I'd like to integrate system resume with
> runtime PM more than it is now.  Namely, it should be possible to move
> the re-enabling of runtime PM to the front of device_resume_early()
> and then use pm_runtime_resume() for resuming devices whose drivers
> support runtime PM (I don't see any fundamental obstacles to this).
> One benefit of doing this would be that pm_runtime_resume() would
> automatically trigger a runtime resume for all of the "superior"
> devices, so they could be left in whatever state they had been in
> before the preceding system suspend regardless of what happens to
> their "subordinates".  It is likely that some kind of driver opt-in
> will be needed for this or maybe the core can figure it out by itself.

Right, I am certainly interested to discuss this topic too. It's not
an easy thing and the biggest problem is that we can't really just
change the behaviour without a big risk of breaking things.

Some kind of opt-in behaviour is the only thing that can work, in my
opinion. Anyway, I am here to review and discuss. :-)

>
> It can look at what callbacks are implemented etc.  For example, if a
> driver only implements :runtime_suspend() and :runtime_resume() and no
> other PM callbacks, it is reasonable to assume that the devices
> handled by it should be suspended and resumed with the help of the
> runtime PM infrastructure even during system-wide suspend/resume (that
> doesn't apply to hibernation, though).

Maybe this can work in some opt-in/out way, but certainly not as a
solution for all.

For example, we have subsystems that deal with system suspend/resume
quite differently, where drivers instead implement some subsystem
specific callbacks, rather than the common system suspend/resume ops.

Kind regards
Uffe
Jon Hunter Feb. 7, 2025, 1:38 p.m. UTC | #9
Hi Rafael,

On 28/01/2025 19:24, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Commit 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the
> resume phase") overlooked the case in which the parent of a device with
> DPM_FLAG_SMART_SUSPEND set did not use that flag and could be runtime-
> suspended before a transition into a system-wide sleep state.  In that
> case, if the child is resumed during the subsequent transition from
> that state into the working state, its runtime PM status will be set to
> RPM_ACTIVE, but the runtime PM status of the parent will not be updated
> accordingly, even though the parent will be resumed too, because of the
> dev_pm_skip_suspend() check in device_resume_noirq().
> 
> Address this problem by tracking the need to set the runtime PM status
> to RPM_ACTIVE during system-wide resume transitions for devices with
> DPM_FLAG_SMART_SUSPEND set and all of the devices depended on by them.
> 
> Fixes: 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the resume phase")
> Closes: https://lore.kernel.org/linux-pm/Z30p2Etwf3F2AUvD@hovoldconsulting.com/
> Reported-by: Johan Hovold <johan@kernel.org>
> Tested-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>   drivers/base/power/main.c |   29 ++++++++++++++++++++---------
>   include/linux/pm.h        |    1 +
>   2 files changed, 21 insertions(+), 9 deletions(-)
> 
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -656,13 +656,15 @@
>   	 * so change its status accordingly.
>   	 *
>   	 * Otherwise, the device is going to be resumed, so set its PM-runtime
> -	 * status to "active", but do that only if DPM_FLAG_SMART_SUSPEND is set
> -	 * to avoid confusing drivers that don't use it.
> +	 * status to "active" unless its power.set_active flag is clear, in
> +	 * which case it is not necessary to update its PM-runtime status.
>   	 */
> -	if (skip_resume)
> +	if (skip_resume) {
>   		pm_runtime_set_suspended(dev);
> -	else if (dev_pm_skip_suspend(dev))
> +	} else if (dev->power.set_active) {
>   		pm_runtime_set_active(dev);
> +		dev->power.set_active = false;
> +	}
>   
>   	if (dev->pm_domain) {
>   		info = "noirq power domain ";
> @@ -1189,18 +1191,24 @@
>   	return PMSG_ON;
>   }
>   
> -static void dpm_superior_set_must_resume(struct device *dev)
> +static void dpm_superior_set_must_resume(struct device *dev, bool set_active)
>   {
>   	struct device_link *link;
>   	int idx;
>   
> -	if (dev->parent)
> +	if (dev->parent) {
>   		dev->parent->power.must_resume = true;
> +		if (set_active)
> +			dev->parent->power.set_active = true;
> +	}
>   
>   	idx = device_links_read_lock();
>   
> -	list_for_each_entry_rcu_locked(link, &dev->links.suppliers, c_node)
> +	list_for_each_entry_rcu_locked(link, &dev->links.suppliers, c_node) {
>   		link->supplier->power.must_resume = true;
> +		if (set_active)
> +			link->supplier->power.set_active = true;
> +	}
>   
>   	device_links_read_unlock(idx);
>   }
> @@ -1278,8 +1286,11 @@
>   	      dev->power.may_skip_resume))
>   		dev->power.must_resume = true;
>   
> -	if (dev->power.must_resume)
> -		dpm_superior_set_must_resume(dev);
> +	if (dev->power.must_resume) {
> +		dev->power.set_active = dev->power.set_active ||
> +			dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND);
> +		dpm_superior_set_must_resume(dev, dev->power.set_active);
> +	}
>   
>   Complete:
>   	complete_all(&dev->power.completion);
> --- a/include/linux/pm.h
> +++ b/include/linux/pm.h
> @@ -683,6 +683,7 @@
>   	bool			no_pm_callbacks:1;	/* Owned by the PM core */
>   	bool			async_in_progress:1;	/* Owned by the PM core */
>   	bool			must_resume:1;		/* Owned by the PM core */
> +	bool			set_active:1;		/* Owned by the PM core */
>   	bool			may_skip_resume:1;	/* Set by subsystems */
>   #else
>   	bool			should_wakeup:1;


I am seeing the following crash during suspend on a couple of our boards (with mainline/next) and bisect is pointing to this commit ...

[  216.311009] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[  216.311229] Mem abort info:
[  216.311298]   ESR = 0x0000000096000004
[  216.311393]   EC = 0x25: DABT (current EL), IL = 32 bits
[  216.311515]   SET = 0, FnV = 0
[  216.311593]   EA = 0, S1PTW = 0
[  216.311668]   FSC = 0x04: level 0 translation fault
[  216.311770] Data abort info:
[  216.311855]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[  216.311972]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  216.312081]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  216.312267] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010b905000
[  216.312411] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[  216.312620] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[  216.312747] Modules linked in: snd_soc_tegra210_admaif snd_soc_tegra186_asrc snd_soc_tegra_pcm snd_soc_tegra210_mixer snd_soc_tegra210_adx snd_soc_tegra210_mvc snd_soc_tegra210_ope snd_soc_tegra210_dmic snd_soc_tegra210_amx snd_soc_tegra210_i2s snd_soc_tegra210_sfc tegra_drm drm_dp_aux_bus cec drm_display_helper drm_client_lib drm_kms_helper snd_soc_tegra210_ahub tegra210_adma drm snd_soc_tegra_audio_graph_card snd_soc_audio_graph_card snd_soc_rt5659 backlight snd_soc_simple_card_utils snd_soc_rl6231 ucsi_ccg typec_ucsi typec pwm_tegra ina3221 tegra_aconnect pwm_fan snd_hda_codec_hdmi snd_hda_tegra snd_hda_codec snd_hda_core phy_tegra194_p2u tegra_xudc at24 lm90 tegra_bpmp_thermal host1x pcie_tegra194 ip_tables x_tables ipv6
[  216.354364] CPU: 1 UID: 0 PID: 14450 Comm: rtcwake Tainted: G        W          6.14.0-rc1-next-20250206-g808eb958781e #1
[  216.365388] Tainted: [W]=WARN
[  216.368279] Hardware name: NVIDIA Jetson AGX Xavier Developer Kit (DT)
[  216.375099] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  216.381928] pc : simple_pm_bus_runtime_suspend+0x14/0x48
[  216.387698] lr : pm_generic_runtime_suspend+0x2c/0x44
[  216.392962] sp : ffff80009003baf0
[  216.396103] x29: ffff80009003baf0 x28: ffff000088043480 x27: 0000000000000000
[  216.403453] x26: ffff800082c11350 x25: ffff800082e0e000 x24: ffff80008093d634
[  216.410803] x23: 0000000000000000 x22: 0000000000000002 x21: ffff800082e0e000
[  216.418411] x20: ffff800080938af8 x19: ffff0000808ec410 x18: ffffffffffff4ac8
[  216.425328] x17: 2033303a35353a36 x16: 312031312d39302d x15: 000047edca5d49da
[  216.432937] x14: ffff000088043500 x13: 0000000000000001 x12: 0000000000000000
[  216.440110] x11: 000000321a7856a0 x10: 0000000000000aa0 x9 : ffff80009003b8e0
[  216.447219] x8 : ffff000088043f80 x7 : ffff0003fdf08a40 x6 : 0000000000000000
[  216.454291] x5 : 0000000000000000 x4 : ffff800081ffe8c0 x3 : ffff0000808ec410
[  216.461723] x2 : ffff8000805d133c x1 : 0000000000000000 x0 : 0000000000000000
[  216.468986] Call trace:
[  216.471179]  simple_pm_bus_runtime_suspend+0x14/0x48 (P)
[  216.476775]  pm_generic_runtime_suspend+0x2c/0x44
[  216.481499]  pm_runtime_force_suspend+0x54/0x14c
[  216.486049]  device_suspend_noirq+0x6c/0x278
[  216.490253]  dpm_suspend_noirq+0xc0/0x198
[  216.494278]  suspend_devices_and_enter+0x210/0x4c0
[  216.499348]  pm_suspend+0x164/0x1c8
[  216.503023]  state_store+0x8c/0xfc
[  216.506260]  kobj_attr_store+0x18/0x2c
[  216.509940]  sysfs_kf_write+0x44/0x54
[  216.513699]  kernfs_fop_write_iter+0x118/0x1a8
[  216.518163]  vfs_write+0x2b0/0x35c
[  216.521399]  ksys_write+0x68/0xfc
[  216.524810]  __arm64_sys_write+0x1c/0x28
[  216.528574]  invoke_syscall+0x48/0x110
[  216.532253]  el0_svc_common.constprop.0+0x40/0xe8
[  216.536628]  do_el0_svc+0x20/0x2c
[  216.540299]  el0_svc+0x30/0xd0
[  216.543016]  el0t_64_sync_handler+0x144/0x168
[  216.547736]  el0t_64_sync+0x198/0x19c
[  216.551327] Code: a9be7bfd 910003fd a90153f3 f9403c00 (f9400014)
[  216.557197] ---[ end trace 0000000000000000 ]---

I have not looked any further, but if you have any thoughts, let me know.

Thanks
Jon
Johan Hovold Feb. 7, 2025, 1:50 p.m. UTC | #10
On Fri, Feb 07, 2025 at 01:38:58PM +0000, Jon Hunter wrote:
> On 28/01/2025 19:24, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > 
> > Commit 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the
> > resume phase") overlooked the case in which the parent of a device with
> > DPM_FLAG_SMART_SUSPEND set did not use that flag and could be runtime-
> > suspended before a transition into a system-wide sleep state.  In that
> > case, if the child is resumed during the subsequent transition from
> > that state into the working state, its runtime PM status will be set to
> > RPM_ACTIVE, but the runtime PM status of the parent will not be updated
> > accordingly, even though the parent will be resumed too, because of the
> > dev_pm_skip_suspend() check in device_resume_noirq().
> > 
> > Address this problem by tracking the need to set the runtime PM status
> > to RPM_ACTIVE during system-wide resume transitions for devices with
> > DPM_FLAG_SMART_SUSPEND set and all of the devices depended on by them.
> > 
> > Fixes: 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the resume phase")
> > Closes: https://lore.kernel.org/linux-pm/Z30p2Etwf3F2AUvD@hovoldconsulting.com/
> > Reported-by: Johan Hovold <johan@kernel.org>
> > Tested-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> I am seeing the following crash during suspend on a couple of our boards (with mainline/next) and bisect is pointing to this commit ...
> 
> [  216.311009] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000

> [  216.468986] Call trace:
> [  216.471179]  simple_pm_bus_runtime_suspend+0x14/0x48 (P)
> [  216.476775]  pm_generic_runtime_suspend+0x2c/0x44
> [  216.481499]  pm_runtime_force_suspend+0x54/0x14c
> [  216.486049]  device_suspend_noirq+0x6c/0x278
> [  216.490253]  dpm_suspend_noirq+0xc0/0x198
> [  216.494278]  suspend_devices_and_enter+0x210/0x4c0
> [  216.499348]  pm_suspend+0x164/0x1c8
> [  216.503023]  state_store+0x8c/0xfc
> [  216.506260]  kobj_attr_store+0x18/0x2c
> [  216.509940]  sysfs_kf_write+0x44/0x54
> [  216.513699]  kernfs_fop_write_iter+0x118/0x1a8
> [  216.518163]  vfs_write+0x2b0/0x35c
> [  216.521399]  ksys_write+0x68/0xfc
> [  216.524810]  __arm64_sys_write+0x1c/0x28
> [  216.528574]  invoke_syscall+0x48/0x110
> [  216.532253]  el0_svc_common.constprop.0+0x40/0xe8
> [  216.536628]  do_el0_svc+0x20/0x2c
> [  216.540299]  el0_svc+0x30/0xd0
> [  216.543016]  el0t_64_sync_handler+0x144/0x168
> [  216.547736]  el0t_64_sync+0x198/0x19c
> [  216.551327] Code: a9be7bfd 910003fd a90153f3 f9403c00 (f9400014)
> [  216.557197] ---[ end trace 0000000000000000 ]---
> 
> I have not looked any further, but if you have any thoughts, let me know.

Yeah, I hit something like this yesterday as well and did confirm that
reverting this commit makes the problem go away. Haven't had time to dig
much further.

[  110.522368] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[  110.531751] Mem abort info:
[  110.534785]   ESR = 0x0000000096000004
[  110.538799]   EC = 0x25: DABT (current EL), IL = 32 bits
[  110.544421]   SET = 0, FnV = 0
[  110.547716]   EA = 0, S1PTW = 0
[  110.551097]   FSC = 0x04: level 0 translation fault
[  110.556274] Data abort info:
[  110.559385]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[  110.565188]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  110.570536]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  110.576157] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000892256000
[  110.582946] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[  110.590348] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
...
[  110.742358] CPU: 2 UID: 0 PID: 420 Comm: suspend-test.sh Not tainted 6.13.0 #118
[  110.750067] Hardware name: Qualcomm CRD, BIOS 6.0.231221.BOOT.MXF.2.4-00348.1-HAMOA-1 12/21/2023
[  110.759198] pstate: 81400005 (Nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[  110.766462] pc : simple_pm_bus_runtime_suspend+0x14/0x48
[  110.772048] lr : pm_generic_runtime_suspend+0x2c/0x44
[  110.777352] sp : ffff8000839d3a20
[  110.780866] x29: ffff8000839d3a20 x28: ffff0edf80fff810 x27: ffffa2dc2d1f1d30
[  110.788303] x26: ffffa2dc2dd82000 x25: 0000000000000002 x24: ffffa2dc2cc7aca0
[  110.795741] x23: ffffa2dc2dd1e000 x22: 0000000000000000 x21: ffffa2dc2d090e50
[  110.803177] x20: ffffa2dc2c612498 x19: ffff0edf80fff810 x18: 0000000000000030
[  110.810615] x17: 0005002c00000000 x16: ffffa2dc2c614ce4 x15: ffffffffffffffff
[  110.818052] x14: 0000000000000000 x13: ffff0edf80fff980 x12: 705f64706e65675f
[  110.825490] x11: ffffa2dc2d9c5890 x10: 0000000000000000 x9 : 0000000000000000
[  110.832927] x8 : ffffa2dc2d2af000 x7 : ffff8000839d3a10 x6 : ffff8000839d39b0
[  110.840364] x5 : ffff8000839d4000 x4 : 0000000000000004 x3 : ffff0edf953e0000
[  110.847801] x2 : ffffa2dc2c4e5784 x1 : 0000000000000000 x0 : 0000000000000000
[  110.855238] Call trace:
[  110.857861]  simple_pm_bus_runtime_suspend+0x14/0x48 (P)
[  110.863425]  pm_generic_runtime_suspend+0x2c/0x44
[  110.868362]  pm_runtime_force_suspend+0x54/0x100
[  110.873217]  dpm_run_callback+0xb4/0x228
[  110.877347]  device_suspend_noirq+0x70/0x2a8
[  110.881844]  dpm_noirq_suspend_devices+0xe0/0x230
[  110.886778]  dpm_suspend_noirq+0x24/0x98
[  110.890904]  suspend_devices_and_enter+0x368/0x678
[  110.895941]  pm_suspend+0x1b4/0x348
[  110.899627]  state_store+0x8c/0xfc
[  110.903228]  kobj_attr_store+0x18/0x2c
[  110.907195]  sysfs_kf_write+0x4c/0x78
[  110.911074]  kernfs_fop_write_iter+0x120/0x1b4
[  110.915735]  vfs_write+0x2ac/0x358
[  110.919352]  ksys_write+0x68/0xfc
[  110.922873]  __arm64_sys_write+0x1c/0x28
[  110.927002]  invoke_syscall+0x48/0x110
[  110.930969]  el0_svc_common.constprop.0+0x40/0xe0
[  110.935907]  do_el0_svc+0x1c/0x28
[  110.939427]  el0_svc+0x48/0x114
[  110.942769]  el0t_64_sync_handler+0xc8/0xcc
[  110.947180]  el0t_64_sync+0x198/0x19c
[  110.951059] Code: a9be7bfd 910003fd a90153f3 f9403c00 (f9400014)
[  110.957428] ---[ end trace 0000000000000000 ]---

Johan
Johan Hovold Feb. 7, 2025, 2:45 p.m. UTC | #11
On Fri, Feb 07, 2025 at 02:50:29PM +0100, Johan Hovold wrote:

> Yeah, I hit something like this yesterday as well and did confirm that
> reverting this commit makes the problem go away. Haven't had time to dig
> much further.
> 
> [  110.522368] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000

> [  110.855238] Call trace:
> [  110.857861]  simple_pm_bus_runtime_suspend+0x14/0x48 (P)
> [  110.863425]  pm_generic_runtime_suspend+0x2c/0x44
> [  110.868362]  pm_runtime_force_suspend+0x54/0x100
> [  110.873217]  dpm_run_callback+0xb4/0x228
> [  110.877347]  device_suspend_noirq+0x70/0x2a8
> [  110.881844]  dpm_noirq_suspend_devices+0xe0/0x230
> [  110.886778]  dpm_suspend_noirq+0x24/0x98
> [  110.890904]  suspend_devices_and_enter+0x368/0x678
> [  110.895941]  pm_suspend+0x1b4/0x348
> [  110.899627]  state_store+0x8c/0xfc
> [  110.903228]  kobj_attr_store+0x18/0x2c
> [  110.907195]  sysfs_kf_write+0x4c/0x78
> [  110.911074]  kernfs_fop_write_iter+0x120/0x1b4
> [  110.915735]  vfs_write+0x2ac/0x358
> [  110.919352]  ksys_write+0x68/0xfc
> [  110.922873]  __arm64_sys_write+0x1c/0x28
> [  110.927002]  invoke_syscall+0x48/0x110
> [  110.930969]  el0_svc_common.constprop.0+0x40/0xe0
> [  110.935907]  do_el0_svc+0x1c/0x28
> [  110.939427]  el0_svc+0x48/0x114
> [  110.942769]  el0t_64_sync_handler+0xc8/0xcc
> [  110.947180]  el0t_64_sync+0x198/0x19c
> [  110.951059] Code: a9be7bfd 910003fd a90153f3 f9403c00 (f9400014)
> [  110.957428] ---[ end trace 0000000000000000 ]---

Ok, so the driver data is never set and runtime PM is never enabled for
this simple bus device, which uses pm_runtime_force_suspend() for system
sleep.

This used to work as the runtime PM state was left at 'suspended', which
makes pm_runtime_force_suspend() return early, but now we can end up
with a call to the driver runtime PM ops that dereference the NULL
driver data.

Johan
Rafael J. Wysocki Feb. 7, 2025, 3:41 p.m. UTC | #12
On Fri, Feb 7, 2025 at 3:45 PM Johan Hovold <johan@kernel.org> wrote:
>
> On Fri, Feb 07, 2025 at 02:50:29PM +0100, Johan Hovold wrote:
>
> > Yeah, I hit something like this yesterday as well and did confirm that
> > reverting this commit makes the problem go away. Haven't had time to dig
> > much further.
> >
> > [  110.522368] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
>
> > [  110.855238] Call trace:
> > [  110.857861]  simple_pm_bus_runtime_suspend+0x14/0x48 (P)
> > [  110.863425]  pm_generic_runtime_suspend+0x2c/0x44
> > [  110.868362]  pm_runtime_force_suspend+0x54/0x100
> > [  110.873217]  dpm_run_callback+0xb4/0x228
> > [  110.877347]  device_suspend_noirq+0x70/0x2a8
> > [  110.881844]  dpm_noirq_suspend_devices+0xe0/0x230
> > [  110.886778]  dpm_suspend_noirq+0x24/0x98
> > [  110.890904]  suspend_devices_and_enter+0x368/0x678
> > [  110.895941]  pm_suspend+0x1b4/0x348
> > [  110.899627]  state_store+0x8c/0xfc
> > [  110.903228]  kobj_attr_store+0x18/0x2c
> > [  110.907195]  sysfs_kf_write+0x4c/0x78
> > [  110.911074]  kernfs_fop_write_iter+0x120/0x1b4
> > [  110.915735]  vfs_write+0x2ac/0x358
> > [  110.919352]  ksys_write+0x68/0xfc
> > [  110.922873]  __arm64_sys_write+0x1c/0x28
> > [  110.927002]  invoke_syscall+0x48/0x110
> > [  110.930969]  el0_svc_common.constprop.0+0x40/0xe0
> > [  110.935907]  do_el0_svc+0x1c/0x28
> > [  110.939427]  el0_svc+0x48/0x114
> > [  110.942769]  el0t_64_sync_handler+0xc8/0xcc
> > [  110.947180]  el0t_64_sync+0x198/0x19c
> > [  110.951059] Code: a9be7bfd 910003fd a90153f3 f9403c00 (f9400014)
> > [  110.957428] ---[ end trace 0000000000000000 ]---
>
> Ok, so the driver data is never set and runtime PM is never enabled for
> this simple bus device, which uses pm_runtime_force_suspend() for system
> sleep.

This is kind of confusing.  Why use pm_runtime_force_suspend() if
runtime PM is never enabled and cannot really work?

> This used to work as the runtime PM state was left at 'suspended', which
> makes pm_runtime_force_suspend() return early, but now we can end up
> with a call to the driver runtime PM ops that dereference the NULL
> driver data.

Thanks for the info!

pm_runtime_force_suspend() is a known weak point, but I had assumed
that it wouldn't be involved in dependency chains starting at devices
with DPM_FLAG_SMART_SUSPEND set.

Well, more work ...
Johan Hovold Feb. 7, 2025, 4:06 p.m. UTC | #13
On Fri, Feb 07, 2025 at 04:41:18PM +0100, Rafael J. Wysocki wrote:
> On Fri, Feb 7, 2025 at 3:45 PM Johan Hovold <johan@kernel.org> wrote:
> > On Fri, Feb 07, 2025 at 02:50:29PM +0100, Johan Hovold wrote:

> > Ok, so the driver data is never set and runtime PM is never enabled for
> > this simple bus device, which uses pm_runtime_force_suspend() for system
> > sleep.
> 
> This is kind of confusing.  Why use pm_runtime_force_suspend() if
> runtime PM is never enabled and cannot really work?

It's only done for some buses that this driver handles. The driver is
buggy; I'm preparing a fix for it regardless of the correctness of the
commit that now triggered this.

Johan
Johan Hovold Feb. 7, 2025, 4:26 p.m. UTC | #14
On Fri, Feb 07, 2025 at 05:06:32PM +0100, Johan Hovold wrote:
> On Fri, Feb 07, 2025 at 04:41:18PM +0100, Rafael J. Wysocki wrote:
> > On Fri, Feb 7, 2025 at 3:45 PM Johan Hovold <johan@kernel.org> wrote:
> > > On Fri, Feb 07, 2025 at 02:50:29PM +0100, Johan Hovold wrote:
> 
> > > Ok, so the driver data is never set and runtime PM is never enabled for
> > > this simple bus device, which uses pm_runtime_force_suspend() for system
> > > sleep.
> > 
> > This is kind of confusing.  Why use pm_runtime_force_suspend() if
> > runtime PM is never enabled and cannot really work?
> 
> It's only done for some buses that this driver handles. The driver is
> buggy; I'm preparing a fix for it regardless of the correctness of the
> commit that now triggered this.

Hmm. The driver implementation is highly odd, but actually works as long
as the runtime PM status is left at 'suspended' (as
pm_runtime_force_resume() won't enable runtime PM unless it was enabled
before suspend).

So we'd strictly only need something like the below if we are going to
keep the set_active propagation.

Johan


diff --git a/drivers/bus/simple-pm-bus.c b/drivers/bus/simple-pm-bus.c
index 5dea31769f9a..d8e029e7e53f 100644
--- a/drivers/bus/simple-pm-bus.c
+++ b/drivers/bus/simple-pm-bus.c
@@ -109,9 +109,29 @@ static int simple_pm_bus_runtime_resume(struct device *dev)
 	return 0;
 }
 
+static int simple_pm_bus_suspend(struct device *dev)
+{
+	struct simple_pm_bus *bus = dev_get_drvdata(dev);
+
+	if (!bus)
+		return 0;
+
+	return pm_runtime_force_suspend(dev);
+}
+
+static int simple_pm_bus_resume(struct device *dev)
+{
+	struct simple_pm_bus *bus = dev_get_drvdata(dev);
+
+	if (!bus)
+		return 0;
+
+	return pm_runtime_force_resume(dev);
+}
+
 static const struct dev_pm_ops simple_pm_bus_pm_ops = {
 	RUNTIME_PM_OPS(simple_pm_bus_runtime_suspend, simple_pm_bus_runtime_resume, NULL)
-	NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, pm_runtime_force_resume)
+	NOIRQ_SYSTEM_SLEEP_PM_OPS(simple_pm_bus_suspend, simple_pm_bus_resume)
 };
 
 #define ONLY_BUS	((void *) 1) /* Match if the device is only a bus. */
Rafael J. Wysocki Feb. 7, 2025, 6:14 p.m. UTC | #15
On Fri, Feb 7, 2025 at 5:26 PM Johan Hovold <johan@kernel.org> wrote:
>
> On Fri, Feb 07, 2025 at 05:06:32PM +0100, Johan Hovold wrote:
> > On Fri, Feb 07, 2025 at 04:41:18PM +0100, Rafael J. Wysocki wrote:
> > > On Fri, Feb 7, 2025 at 3:45 PM Johan Hovold <johan@kernel.org> wrote:
> > > > On Fri, Feb 07, 2025 at 02:50:29PM +0100, Johan Hovold wrote:
> >
> > > > Ok, so the driver data is never set and runtime PM is never enabled for
> > > > this simple bus device, which uses pm_runtime_force_suspend() for system
> > > > sleep.
> > >
> > > This is kind of confusing.  Why use pm_runtime_force_suspend() if
> > > runtime PM is never enabled and cannot really work?
> >
> > It's only done for some buses that this driver handles. The driver is
> > buggy; I'm preparing a fix for it regardless of the correctness of the
> > commit that now triggered this.
>
> Hmm. The driver implementation is highly odd, but actually works as long
> as the runtime PM status is left at 'suspended' (as
> pm_runtime_force_resume() won't enable runtime PM unless it was enabled
> before suspend).
>
> So we'd strictly only need something like the below if we are going to
> keep the set_active propagation.

I think you are right.

> diff --git a/drivers/bus/simple-pm-bus.c b/drivers/bus/simple-pm-bus.c
> index 5dea31769f9a..d8e029e7e53f 100644
> --- a/drivers/bus/simple-pm-bus.c
> +++ b/drivers/bus/simple-pm-bus.c
> @@ -109,9 +109,29 @@ static int simple_pm_bus_runtime_resume(struct device *dev)
>         return 0;
>  }
>
> +static int simple_pm_bus_suspend(struct device *dev)
> +{
> +       struct simple_pm_bus *bus = dev_get_drvdata(dev);
> +
> +       if (!bus)
> +               return 0;
> +
> +       return pm_runtime_force_suspend(dev);
> +}
> +
> +static int simple_pm_bus_resume(struct device *dev)
> +{
> +       struct simple_pm_bus *bus = dev_get_drvdata(dev);
> +
> +       if (!bus)
> +               return 0;
> +
> +       return pm_runtime_force_resume(dev);
> +}
> +
>  static const struct dev_pm_ops simple_pm_bus_pm_ops = {
>         RUNTIME_PM_OPS(simple_pm_bus_runtime_suspend, simple_pm_bus_runtime_resume, NULL)
> -       NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, pm_runtime_force_resume)
> +       NOIRQ_SYSTEM_SLEEP_PM_OPS(simple_pm_bus_suspend, simple_pm_bus_resume)
>  };
>
>  #define ONLY_BUS       ((void *) 1) /* Match if the device is only a bus. */

In the meantime, I've cut the attached (untested) patch that should
take care of the pm_runtime_force_suspend() issue altogether.

It changes the code to only set the device's runtime PM status to
"active" if runtime PM is going to be enabled for it by the first
pm_runtime_enable() call, which never happens for devices where
runtime PM has never been enabled (because it is disabled for them
once again in device_suspend_late()) and for devices subject to
pm_runtime_force_suspend() during system suspend (because it disables
runtime PM for them once again).
Rafael J. Wysocki Feb. 8, 2025, 12:10 p.m. UTC | #16
On Fri, Feb 7, 2025 at 7:14 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Fri, Feb 7, 2025 at 5:26 PM Johan Hovold <johan@kernel.org> wrote:
> >
> > On Fri, Feb 07, 2025 at 05:06:32PM +0100, Johan Hovold wrote:
> > > On Fri, Feb 07, 2025 at 04:41:18PM +0100, Rafael J. Wysocki wrote:
> > > > On Fri, Feb 7, 2025 at 3:45 PM Johan Hovold <johan@kernel.org> wrote:
> > > > > On Fri, Feb 07, 2025 at 02:50:29PM +0100, Johan Hovold wrote:
> > >
> > > > > Ok, so the driver data is never set and runtime PM is never enabled for
> > > > > this simple bus device, which uses pm_runtime_force_suspend() for system
> > > > > sleep.
> > > >
> > > > This is kind of confusing.  Why use pm_runtime_force_suspend() if
> > > > runtime PM is never enabled and cannot really work?
> > >
> > > It's only done for some buses that this driver handles. The driver is
> > > buggy; I'm preparing a fix for it regardless of the correctness of the
> > > commit that now triggered this.
> >
> > Hmm. The driver implementation is highly odd, but actually works as long
> > as the runtime PM status is left at 'suspended' (as
> > pm_runtime_force_resume() won't enable runtime PM unless it was enabled
> > before suspend).
> >
> > So we'd strictly only need something like the below if we are going to
> > keep the set_active propagation.
>
> I think you are right.
>
> > diff --git a/drivers/bus/simple-pm-bus.c b/drivers/bus/simple-pm-bus.c
> > index 5dea31769f9a..d8e029e7e53f 100644
> > --- a/drivers/bus/simple-pm-bus.c
> > +++ b/drivers/bus/simple-pm-bus.c
> > @@ -109,9 +109,29 @@ static int simple_pm_bus_runtime_resume(struct device *dev)
> >         return 0;
> >  }
> >
> > +static int simple_pm_bus_suspend(struct device *dev)
> > +{
> > +       struct simple_pm_bus *bus = dev_get_drvdata(dev);
> > +
> > +       if (!bus)
> > +               return 0;
> > +
> > +       return pm_runtime_force_suspend(dev);
> > +}
> > +
> > +static int simple_pm_bus_resume(struct device *dev)
> > +{
> > +       struct simple_pm_bus *bus = dev_get_drvdata(dev);
> > +
> > +       if (!bus)
> > +               return 0;
> > +
> > +       return pm_runtime_force_resume(dev);
> > +}
> > +
> >  static const struct dev_pm_ops simple_pm_bus_pm_ops = {
> >         RUNTIME_PM_OPS(simple_pm_bus_runtime_suspend, simple_pm_bus_runtime_resume, NULL)
> > -       NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, pm_runtime_force_resume)
> > +       NOIRQ_SYSTEM_SLEEP_PM_OPS(simple_pm_bus_suspend, simple_pm_bus_resume)
> >  };
> >
> >  #define ONLY_BUS       ((void *) 1) /* Match if the device is only a bus. */
>
> In the meantime, I've cut the attached (untested) patch that should
> take care of the pm_runtime_force_suspend() issue altogether.
>
> It changes the code to only set the device's runtime PM status to
> "active" if runtime PM is going to be enabled for it by the first
> pm_runtime_enable() call, which never happens for devices where
> runtime PM has never been enabled (because it is disabled for them
> once again in device_suspend_late()) and for devices subject to
> pm_runtime_force_suspend() during system suspend (because it disables
> runtime PM for them once again).

All right, this is not going to work.

I see two problems and both are related to pm_runtime_force_suspend/resume().

The first problem is that pm_runtime_force_suspend() doesn't
distinguish devices for which runtime PM has never been enabled from
devices that have been runtime-suspended.  This clearly is a mistake
as demonstrated by the breakage at hand.

The second problem is that pm_runtime_force_resume() expects all
devices to be resumed to have both runtime PM status set to
RPM_SUSPENDED and power.needs_force_resume set.  AFAICS, checking
power.needs_force_resume alone should be sufficient there, but even
that is problematic because something needs to set the flag for
devices that are expected to be resumed.

Unfortunately, they both are not straightforward to address.  I have
ideas, but nothing clear yet.

For now, the power.set_active propagation can be restricted to the
parent of the device with DPM_FLAG_SMART_SUSPEND set that needs to be
resumed and that will just be sufficient ATM, so attached is a patch
doing this.

In the future, though, all of this needs to be addressed properly
because if one device in a dependency chain needs to be resumed,
whatever the reason, all of the devices depended on by it need to be
resumed as well.
Johan Hovold Feb. 8, 2025, 4:42 p.m. UTC | #17
On Sat, Feb 08, 2025 at 01:10:19PM +0100, Rafael J. Wysocki wrote:
> On Fri, Feb 7, 2025 at 7:14 PM Rafael J. Wysocki <rafael@kernel.org> wrote:

> > > > > > On Fri, Feb 07, 2025 at 02:50:29PM +0100, Johan Hovold wrote:

> > > > > > Ok, so the driver data is never set and runtime PM is never enabled for
> > > > > > this simple bus device, which uses pm_runtime_force_suspend() for system
> > > > > > sleep.

> For now, the power.set_active propagation can be restricted to the
> parent of the device with DPM_FLAG_SMART_SUSPEND set that needs to be
> resumed and that will just be sufficient ATM, so attached is a patch
> doing this.

I can confirm that this fixes the simple-pm-bus regression (without
reintroducing the pci warning) in case you want to get this to Linus
for rc2:

Tested-by: Johan Hovold <johan+linaro@kernel.org>

Johan
Rafael J. Wysocki Feb. 8, 2025, 5:43 p.m. UTC | #18
On Sat, Feb 8, 2025 at 5:42 PM Johan Hovold <johan@kernel.org> wrote:
>
> On Sat, Feb 08, 2025 at 01:10:19PM +0100, Rafael J. Wysocki wrote:
> > On Fri, Feb 7, 2025 at 7:14 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> > > > > > > On Fri, Feb 07, 2025 at 02:50:29PM +0100, Johan Hovold wrote:
>
> > > > > > > Ok, so the driver data is never set and runtime PM is never enabled for
> > > > > > > this simple bus device, which uses pm_runtime_force_suspend() for system
> > > > > > > sleep.
>
> > For now, the power.set_active propagation can be restricted to the
> > parent of the device with DPM_FLAG_SMART_SUSPEND set that needs to be
> > resumed and that will just be sufficient ATM, so attached is a patch
> > doing this.
>
> I can confirm that this fixes the simple-pm-bus regression (without
> reintroducing the pci warning) in case you want to get this to Linus
> for rc2:
>
> Tested-by: Johan Hovold <johan+linaro@kernel.org>

-rc2 may be a bit too close, and I think I should add a check for
ignore_children to it, but yeah.

I'll send a full patch shortly, thanks!
diff mbox series

Patch

--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -656,13 +656,15 @@ 
 	 * so change its status accordingly.
 	 *
 	 * Otherwise, the device is going to be resumed, so set its PM-runtime
-	 * status to "active", but do that only if DPM_FLAG_SMART_SUSPEND is set
-	 * to avoid confusing drivers that don't use it.
+	 * status to "active" unless its power.set_active flag is clear, in
+	 * which case it is not necessary to update its PM-runtime status.
 	 */
-	if (skip_resume)
+	if (skip_resume) {
 		pm_runtime_set_suspended(dev);
-	else if (dev_pm_skip_suspend(dev))
+	} else if (dev->power.set_active) {
 		pm_runtime_set_active(dev);
+		dev->power.set_active = false;
+	}
 
 	if (dev->pm_domain) {
 		info = "noirq power domain ";
@@ -1189,18 +1191,24 @@ 
 	return PMSG_ON;
 }
 
-static void dpm_superior_set_must_resume(struct device *dev)
+static void dpm_superior_set_must_resume(struct device *dev, bool set_active)
 {
 	struct device_link *link;
 	int idx;
 
-	if (dev->parent)
+	if (dev->parent) {
 		dev->parent->power.must_resume = true;
+		if (set_active)
+			dev->parent->power.set_active = true;
+	}
 
 	idx = device_links_read_lock();
 
-	list_for_each_entry_rcu_locked(link, &dev->links.suppliers, c_node)
+	list_for_each_entry_rcu_locked(link, &dev->links.suppliers, c_node) {
 		link->supplier->power.must_resume = true;
+		if (set_active)
+			link->supplier->power.set_active = true;
+	}
 
 	device_links_read_unlock(idx);
 }
@@ -1278,8 +1286,11 @@ 
 	      dev->power.may_skip_resume))
 		dev->power.must_resume = true;
 
-	if (dev->power.must_resume)
-		dpm_superior_set_must_resume(dev);
+	if (dev->power.must_resume) {
+		dev->power.set_active = dev->power.set_active ||
+			dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND);
+		dpm_superior_set_must_resume(dev, dev->power.set_active);
+	}
 
 Complete:
 	complete_all(&dev->power.completion);
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -683,6 +683,7 @@ 
 	bool			no_pm_callbacks:1;	/* Owned by the PM core */
 	bool			async_in_progress:1;	/* Owned by the PM core */
 	bool			must_resume:1;		/* Owned by the PM core */
+	bool			set_active:1;		/* Owned by the PM core */
 	bool			may_skip_resume:1;	/* Set by subsystems */
 #else
 	bool			should_wakeup:1;