[v1,0/5] Optimize async device suspend/resume

Message ID	20241114220921.2529905-1-saravanak@google.com
Headers	show Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 200F71A3AA9 for <linux-pm@vger.kernel.org>; Thu, 14 Nov 2024 22:09:28 +0000 (UTC) Date: Thu, 14 Nov 2024 14:09:14 -0800 Message-Id: <20241114220921.2529905-1-saravanak@google.com> Precedence: bulk Mime-Version: 1.0 Subject: [PATCH v1 0/5] Optimize async device suspend/resume From: Saravana Kannan <saravanak@google.com> To: "Rafael J. Wysocki" <rafael@kernel.org>, Pavel Machek <pavel@ucw.cz>, Len Brown <len.brown@intel.com>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Ingo Molnar <mingo@redhat.com>, Peter Zijlstra <peterz@infradead.org>, Juri Lelli <juri.lelli@redhat.com>, Vincent Guittot <vincent.guittot@linaro.org>, Dietmar Eggemann <dietmar.eggemann@arm.com>, Steven Rostedt <rostedt@goodmis.org>, Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>, Valentin Schneider <vschneid@redhat.com> Cc: Saravana Kannan <saravanak@google.com>, Geert Uytterhoeven <geert@linux-m68k.org>, Marek Vasut <marex@denx.de>, Bird@google.com, Tim <Tim.Bird@sony.com>, kernel-team@android.com, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8"
Series	Optimize async device suspend/resume \| expand [v1,0/5] Optimize async device suspend/resume [v1,1/5] PM: sleep: Fix runtime PM issue in dpm_resume() [v1,2/5] PM: sleep: Remove unnecessary mutex lock when waiting on parent [v1,3/5] PM: sleep: Add helper functions to loop through superior/subordinate devs [v1,4/5] PM: sleep: Do breadth first suspend/resume for async suspend/resume [v1,5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases

Saravana Kannan Nov. 14, 2024, 10:09 p.m. UTC

A lot of the details are in patch 4/5 and 5/5. The summary is that
there's a lot of overhead and wasted work in how async device
suspend/resume is handled today. I talked about this and otther
suspend/resume issues at LPC 2024[1].

You can remove a lot of the overhead by doing a breadth first queuing of
async suspend/resumes. That's what this patch series does. I also
noticed that during resume, because of EAS, we don't use the bigger CPUs
as quickly. This was leading to a lot of scheduling latency and
preemption of runnable threads and increasing the resume latency. So, we
also disable EAS for that tiny period of resume where we know there'll
be a lot of parallelism.

On a Pixel 6, averaging over 100 suspend/resume cycles, this patch
series yields significant improvements:
+---------------------------+-----------+----------------+------------+-------+
| Phase			    | Old full sync | Old full async | New full async |
|			    |		    | 		     | + EAS disabled |
+---------------------------+-----------+----------------+------------+-------+
| Total dpm_suspend*() time |        107 ms |          72 ms |          62 ms |
+---------------------------+-----------+----------------+------------+-------+
| Total dpm_resume*() time  |         75 ms |          90 ms |          61 ms |
+---------------------------+-----------+----------------+------------+-------+
| Sum			    |        182 ms |         162 ms |         123 ms |
+---------------------------+-----------+----------------+------------+-------+

There might be room for some more optimizations in the future, but I'm
keep this patch series simple enough so that it's easier to review and
check that it's not breaking anything. If this series lands and is
stable and no bug reports for a few months, I can work on optimizing
this a bit further.

Thanks,
Saravana
P.S: Cc-ing some usual suspects you might be interested in testing this
out.

[1] - https://lpc.events/event/18/contributions/1845/

Saravana Kannan (5):
  PM: sleep: Fix runtime PM issue in dpm_resume()
  PM: sleep: Remove unnecessary mutex lock when waiting on parent
  PM: sleep: Add helper functions to loop through superior/subordinate
    devs
  PM: sleep: Do breadth first suspend/resume for async suspend/resume
  PM: sleep: Spread out async kworker threads during dpm_resume*()
    phases

 drivers/base/power/main.c | 325 +++++++++++++++++++++++++++++---------
 kernel/power/suspend.c    |  16 ++
 kernel/sched/topology.c   |  13 ++
 3 files changed, 276 insertions(+), 78 deletions(-)

Saravana Kannan Nov. 15, 2024, 5:25 a.m. UTC | #1

On Thu, Nov 14, 2024 at 2:09 PM Saravana Kannan <saravanak@google.com> wrote:
>
> As of today, the scheduler doesn't spread out all the kworker threads
> across all the available CPUs during suspend/resume. This causes
> significant resume latency during the dpm_resume*() phases.
>
> System resume latency is a very user-visible event. Reducing the
> latency is more important than trying to be energy aware during that
> period.
>
> Since there are no userspace processes running during this time and
> this is a very short time window, we can simply disable EAS during
> resume so that the parallel resume of the devices is spread across all
> the CPUs.
>
> On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic
> plus disabling EAS for resume yields significant improvements:
> +---------------------------+-----------+------------+------------------+
> | Phase                     | Old full sync | New full async | % change |
> |                           |               | + EAS disabled |          |
> +---------------------------+-----------+------------+------------------+
> | Total dpm_suspend*() time |        107 ms |          62 ms |     -42% |
> +---------------------------+-----------+------------+------------------+
> | Total dpm_resume*() time  |         75 ms |          61 ms |     -19% |
> +---------------------------+-----------+------------+------------------+
> | Sum                       |        182 ms |         123 ms |     -32% |
> +---------------------------+-----------+------------+------------------+
>
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  kernel/power/suspend.c  | 16 ++++++++++++++++
>  kernel/sched/topology.c | 13 +++++++++++++
>  2 files changed, 29 insertions(+)
>
> diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> index 09f8397bae15..7304dc39958f 100644
> --- a/kernel/power/suspend.c
> +++ b/kernel/power/suspend.c
> @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void)
>         local_irq_enable();
>  }
>
> +/*
> + * Intentionally not part of a header file to avoid risk of abuse by other
> + * drivers.
> + */
> +void sched_set_energy_aware(unsigned int enable);
> +
>  /**
>   * suspend_enter - Make the system enter the given sleep state.
>   * @state: System sleep state to enter.
> @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
>
>   Platform_wake:
>         platform_resume_noirq(state);
> +       /*
> +        * We do this only for resume instead of suspend and resume for these
> +        * reasons:
> +        * - Performance is more important than power for resume.
> +        * - Power spent entering suspend is more important for suspend. Also,
> +        *   stangely, disabling EAS was making suspent a few milliseconds
> +        *   slower in my testing.
> +        */
> +       sched_set_energy_aware(0);
>         dpm_resume_noirq(PMSG_RESUME);
>
>   Platform_early_resume:
> @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state)
>   Resume_devices:
>         suspend_test_start();
>         dpm_resume_end(PMSG_RESUME);
> +       sched_set_energy_aware(1);
>         suspend_test_finish("resume devices");
>         trace_suspend_resume(TPS("resume_console"), state, true);
>         resume_console();
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 9748a4c8d668..c069c0b17cbf 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void)
>         mutex_unlock(&sched_energy_mutex);
>  }
>
> +void sched_set_energy_aware(unsigned int enable)

  CC      kernel/sched/build_utility.o
In file included from kernel/sched/build_utility.c:88:
kernel/sched/topology.c:287:6: warning: no previous prototype for
‘sched_set_energy_aware’ [-Wmissing-prototypes]
  287 | void sched_set_energy_aware(unsigned int enable)
      |      ^~~~~~~~~~~~~~~~~~~~~~

Peter/Vincent,

I noticed that I'm getting a warning for this line. But I'm not sure
what to do about it. I intentionally didn't put this in a header file
because I'm guessing we don't want to make this available to
drivers/frameworks in general.

Let me know how you want me to handle this.

-Saravana

> +{
> +       int state;
> +
> +       if (!sched_is_eas_possible(cpu_active_mask))
> +               return;
> +
> +       sysctl_sched_energy_aware = enable;
> +       state = static_branch_unlikely(&sched_energy_present);
> +       if (state != sysctl_sched_energy_aware)
> +               rebuild_sched_domains_energy();
> +}
> +
>  #ifdef CONFIG_PROC_SYSCTL
>  static int sched_energy_aware_handler(const struct ctl_table *table, int write,
>                 void *buffer, size_t *lenp, loff_t *ppos)
> --
> 2.47.0.338.g60cca15819-goog
>

Geert Uytterhoeven Nov. 15, 2024, 9:25 a.m. UTC | #2

Hi Saravana,

On Fri, Nov 15, 2024 at 6:25 AM Saravana Kannan <saravanak@google.com> wrote:
> On Thu, Nov 14, 2024 at 2:09 PM Saravana Kannan <saravanak@google.com> wrote:
> > As of today, the scheduler doesn't spread out all the kworker threads
> > across all the available CPUs during suspend/resume. This causes
> > significant resume latency during the dpm_resume*() phases.
> >
> > System resume latency is a very user-visible event. Reducing the
> > latency is more important than trying to be energy aware during that
> > period.
> >
> > Since there are no userspace processes running during this time and
> > this is a very short time window, we can simply disable EAS during
> > resume so that the parallel resume of the devices is spread across all
> > the CPUs.
> >
> > On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic
> > plus disabling EAS for resume yields significant improvements:
> > +---------------------------+-----------+------------+------------------+
> > | Phase                     | Old full sync | New full async | % change |
> > |                           |               | + EAS disabled |          |
> > +---------------------------+-----------+------------+------------------+
> > | Total dpm_suspend*() time |        107 ms |          62 ms |     -42% |
> > +---------------------------+-----------+------------+------------------+
> > | Total dpm_resume*() time  |         75 ms |          61 ms |     -19% |
> > +---------------------------+-----------+------------+------------------+
> > | Sum                       |        182 ms |         123 ms |     -32% |
> > +---------------------------+-----------+------------+------------------+
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  kernel/power/suspend.c  | 16 ++++++++++++++++
> >  kernel/sched/topology.c | 13 +++++++++++++
> >  2 files changed, 29 insertions(+)
> >
> > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> > index 09f8397bae15..7304dc39958f 100644
> > --- a/kernel/power/suspend.c
> > +++ b/kernel/power/suspend.c
> > @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void)
> >         local_irq_enable();
> >  }
> >
> > +/*
> > + * Intentionally not part of a header file to avoid risk of abuse by other
> > + * drivers.
> > + */
> > +void sched_set_energy_aware(unsigned int enable);
> > +
> >  /**
> >   * suspend_enter - Make the system enter the given sleep state.
> >   * @state: System sleep state to enter.
> > @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
> >
> >   Platform_wake:
> >         platform_resume_noirq(state);
> > +       /*
> > +        * We do this only for resume instead of suspend and resume for these
> > +        * reasons:
> > +        * - Performance is more important than power for resume.
> > +        * - Power spent entering suspend is more important for suspend. Also,
> > +        *   stangely, disabling EAS was making suspent a few milliseconds
> > +        *   slower in my testing.
> > +        */
> > +       sched_set_energy_aware(0);
> >         dpm_resume_noirq(PMSG_RESUME);
> >
> >   Platform_early_resume:
> > @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state)
> >   Resume_devices:
> >         suspend_test_start();
> >         dpm_resume_end(PMSG_RESUME);
> > +       sched_set_energy_aware(1);
> >         suspend_test_finish("resume devices");
> >         trace_suspend_resume(TPS("resume_console"), state, true);
> >         resume_console();
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 9748a4c8d668..c069c0b17cbf 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void)
> >         mutex_unlock(&sched_energy_mutex);
> >  }
> >
> > +void sched_set_energy_aware(unsigned int enable)
>
>   CC      kernel/sched/build_utility.o
> In file included from kernel/sched/build_utility.c:88:
> kernel/sched/topology.c:287:6: warning: no previous prototype for
> ‘sched_set_energy_aware’ [-Wmissing-prototypes]
>   287 | void sched_set_energy_aware(unsigned int enable)
>       |      ^~~~~~~~~~~~~~~~~~~~~~
>
> Peter/Vincent,
>
> I noticed that I'm getting a warning for this line. But I'm not sure
> what to do about it. I intentionally didn't put this in a header file
> because I'm guessing we don't want to make this available to
> drivers/frameworks in general.
>
> Let me know how you want me to handle this.

Put the prototype in kernel/sched/sched.h, and include
../sched/sched.h from kernel/power/suspend.c?

Gr{oetje,eeting}s,

                        Geert

Vincent Guittot Nov. 15, 2024, 3:30 p.m. UTC | #3

On Fri, 15 Nov 2024 at 06:25, Saravana Kannan <saravanak@google.com> wrote:
>
> On Thu, Nov 14, 2024 at 2:09 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > As of today, the scheduler doesn't spread out all the kworker threads
> > across all the available CPUs during suspend/resume. This causes
> > significant resume latency during the dpm_resume*() phases.
> >
> > System resume latency is a very user-visible event. Reducing the
> > latency is more important than trying to be energy aware during that
> > period.
> >
> > Since there are no userspace processes running during this time and
> > this is a very short time window, we can simply disable EAS during
> > resume so that the parallel resume of the devices is spread across all
> > the CPUs.
> >
> > On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic
> > plus disabling EAS for resume yields significant improvements:
> > +---------------------------+-----------+------------+------------------+
> > | Phase                     | Old full sync | New full async | % change |
> > |                           |               | + EAS disabled |          |
> > +---------------------------+-----------+------------+------------------+
> > | Total dpm_suspend*() time |        107 ms |          62 ms |     -42% |
> > +---------------------------+-----------+------------+------------------+
> > | Total dpm_resume*() time  |         75 ms |          61 ms |     -19% |
> > +---------------------------+-----------+------------+------------------+
> > | Sum                       |        182 ms |         123 ms |     -32% |
> > +---------------------------+-----------+------------+------------------+
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  kernel/power/suspend.c  | 16 ++++++++++++++++
> >  kernel/sched/topology.c | 13 +++++++++++++
> >  2 files changed, 29 insertions(+)
> >
> > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> > index 09f8397bae15..7304dc39958f 100644
> > --- a/kernel/power/suspend.c
> > +++ b/kernel/power/suspend.c
> > @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void)
> >         local_irq_enable();
> >  }
> >
> > +/*
> > + * Intentionally not part of a header file to avoid risk of abuse by other
> > + * drivers.
> > + */
> > +void sched_set_energy_aware(unsigned int enable);

extern void sched_set_energy_aware(unsigned int enable);

clear the warning

> > +
> >  /**
> >   * suspend_enter - Make the system enter the given sleep state.
> >   * @state: System sleep state to enter.
> > @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
> >
> >   Platform_wake:
> >         platform_resume_noirq(state);
> > +       /*
> > +        * We do this only for resume instead of suspend and resume for these
> > +        * reasons:
> > +        * - Performance is more important than power for resume.
> > +        * - Power spent entering suspend is more important for suspend. Also,
> > +        *   stangely, disabling EAS was making suspent a few milliseconds
> > +        *   slower in my testing.
> > +        */
> > +       sched_set_energy_aware(0);
> >         dpm_resume_noirq(PMSG_RESUME);
> >
> >   Platform_early_resume:
> > @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state)
> >   Resume_devices:
> >         suspend_test_start();
> >         dpm_resume_end(PMSG_RESUME);
> > +       sched_set_energy_aware(1);
> >         suspend_test_finish("resume devices");
> >         trace_suspend_resume(TPS("resume_console"), state, true);
> >         resume_console();
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 9748a4c8d668..c069c0b17cbf 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void)
> >         mutex_unlock(&sched_energy_mutex);
> >  }
> >
> > +void sched_set_energy_aware(unsigned int enable)
>
>   CC      kernel/sched/build_utility.o
> In file included from kernel/sched/build_utility.c:88:
> kernel/sched/topology.c:287:6: warning: no previous prototype for
> ‘sched_set_energy_aware’ [-Wmissing-prototypes]
>   287 | void sched_set_energy_aware(unsigned int enable)
>       |      ^~~~~~~~~~~~~~~~~~~~~~
>
> Peter/Vincent,
>
> I noticed that I'm getting a warning for this line. But I'm not sure
> what to do about it. I intentionally didn't put this in a header file
> because I'm guessing we don't want to make this available to
> drivers/frameworks in general.
>
> Let me know how you want me to handle this.
>
> -Saravana
>
> > +{
> > +       int state;
> > +
> > +       if (!sched_is_eas_possible(cpu_active_mask))
> > +               return;
> > +
> > +       sysctl_sched_energy_aware = enable;
> > +       state = static_branch_unlikely(&sched_energy_present);
> > +       if (state != sysctl_sched_energy_aware)
> > +               rebuild_sched_domains_energy();
> > +}
> > +
> >  #ifdef CONFIG_PROC_SYSCTL
> >  static int sched_energy_aware_handler(const struct ctl_table *table, int write,
> >                 void *buffer, size_t *lenp, loff_t *ppos)
> > --
> > 2.47.0.338.g60cca15819-goog
> >

Vincent Guittot Nov. 15, 2024, 4:12 p.m. UTC | #4

On Thu, 14 Nov 2024 at 23:09, Saravana Kannan <saravanak@google.com> wrote:
>
> As of today, the scheduler doesn't spread out all the kworker threads
> across all the available CPUs during suspend/resume. This causes
> significant resume latency during the dpm_resume*() phases.
>
> System resume latency is a very user-visible event. Reducing the
> latency is more important than trying to be energy aware during that
> period.
>
> Since there are no userspace processes running during this time and
> this is a very short time window, we can simply disable EAS during
> resume so that the parallel resume of the devices is spread across all
> the CPUs.
>
> On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic
> plus disabling EAS for resume yields significant improvements:
> +---------------------------+-----------+------------+------------------+
> | Phase                     | Old full sync | New full async | % change |
> |                           |               | + EAS disabled |          |
> +---------------------------+-----------+------------+------------------+
> | Total dpm_suspend*() time |        107 ms |          62 ms |     -42% |
> +---------------------------+-----------+------------+------------------+
> | Total dpm_resume*() time  |         75 ms |          61 ms |     -19% |
> +---------------------------+-----------+------------+------------------+
> | Sum                       |        182 ms |         123 ms |     -32% |
> +---------------------------+-----------+------------+------------------+

in cover letter you have figures for
 - Old full sync
 - New full async
 - New full async  + EAS disabled

you should better use the figures for  New full async vs New full
async  + EAS disabled to show EAS disabled impact

I would be interested to get figures about the impact of disabling it
during full suspend sequence as I'm not convince that it's worth the
complexity especially with fix OPP during suspend

>
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  kernel/power/suspend.c  | 16 ++++++++++++++++
>  kernel/sched/topology.c | 13 +++++++++++++
>  2 files changed, 29 insertions(+)
>
> diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> index 09f8397bae15..7304dc39958f 100644
> --- a/kernel/power/suspend.c
> +++ b/kernel/power/suspend.c
> @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void)
>         local_irq_enable();
>  }
>
> +/*
> + * Intentionally not part of a header file to avoid risk of abuse by other
> + * drivers.
> + */
> +void sched_set_energy_aware(unsigned int enable);
> +
>  /**
>   * suspend_enter - Make the system enter the given sleep state.
>   * @state: System sleep state to enter.
> @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
>
>   Platform_wake:
>         platform_resume_noirq(state);
> +       /*
> +        * We do this only for resume instead of suspend and resume for these
> +        * reasons:
> +        * - Performance is more important than power for resume.
> +        * - Power spent entering suspend is more important for suspend. Also,
> +        *   stangely, disabling EAS was making suspent a few milliseconds
> +        *   slower in my testing.
> +        */
> +       sched_set_energy_aware(0);
>         dpm_resume_noirq(PMSG_RESUME);
>
>   Platform_early_resume:
> @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state)
>   Resume_devices:
>         suspend_test_start();
>         dpm_resume_end(PMSG_RESUME);
> +       sched_set_energy_aware(1);

If we end up having a special scheduling mode during suspend, we
should make the function more generic and not only EAS/ smartphone
specific

Like a sched_suspend and sched_resume

>         suspend_test_finish("resume devices");
>         trace_suspend_resume(TPS("resume_console"), state, true);
>         resume_console();
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 9748a4c8d668..c069c0b17cbf 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void)
>         mutex_unlock(&sched_energy_mutex);
>  }
>
> +void sched_set_energy_aware(unsigned int enable)

This is a copy/paste of sched_energy_aware_handler() below, we should
have 1 helper for both

> +{
> +       int state;
> +
> +       if (!sched_is_eas_possible(cpu_active_mask))
> +               return;
> +
> +       sysctl_sched_energy_aware = enable;
> +       state = static_branch_unlikely(&sched_energy_present);
> +       if (state != sysctl_sched_energy_aware)
> +               rebuild_sched_domains_energy();
> +}
> +
>  #ifdef CONFIG_PROC_SYSCTL
>  static int sched_energy_aware_handler(const struct ctl_table *table, int write,
>                 void *buffer, size_t *lenp, loff_t *ppos)
> --
> 2.47.0.338.g60cca15819-goog
>

Saravana Kannan Nov. 15, 2024, 6:33 p.m. UTC | #5

On Fri, Nov 15, 2024 at 8:13 AM Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> On Thu, 14 Nov 2024 at 23:09, Saravana Kannan <saravanak@google.com> wrote:
> >
> > As of today, the scheduler doesn't spread out all the kworker threads
> > across all the available CPUs during suspend/resume. This causes
> > significant resume latency during the dpm_resume*() phases.
> >
> > System resume latency is a very user-visible event. Reducing the
> > latency is more important than trying to be energy aware during that
> > period.
> >
> > Since there are no userspace processes running during this time and
> > this is a very short time window, we can simply disable EAS during
> > resume so that the parallel resume of the devices is spread across all
> > the CPUs.
> >
> > On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic
> > plus disabling EAS for resume yields significant improvements:
> > +---------------------------+-----------+------------+------------------+
> > | Phase                     | Old full sync | New full async | % change |
> > |                           |               | + EAS disabled |          |
> > +---------------------------+-----------+------------+------------------+
> > | Total dpm_suspend*() time |        107 ms |          62 ms |     -42% |
> > +---------------------------+-----------+------------+------------------+
> > | Total dpm_resume*() time  |         75 ms |          61 ms |     -19% |
> > +---------------------------+-----------+------------+------------------+
> > | Sum                       |        182 ms |         123 ms |     -32% |
> > +---------------------------+-----------+------------+------------------+
>
> in cover letter you have figures for
>  - Old full sync
>  - New full async
>  - New full async  + EAS disabled
>
> you should better use the figures for  New full async vs New full
> async  + EAS disabled to show EAS disabled impact

I do give those numbers in the commit text of each patch making the changes.

Patch 4 commit text shows how it's improving things compared to the
older logic full sync (this is the baseline) - resume is 1% faster.
Patch 5 commit text shows you how disabling EAS is improving numbers
compared to baseline - resume 19% faster.

So, yeah, all the numbers are there in one of these emails. Patch 5
(which is the only one touching EAS) is the one that has the
comparison you are asking for.

> I would be interested to get figures about the impact of disabling it
> during full suspend sequence as I'm not convince that it's worth the
> complexity especially with fix OPP during suspend

1. Device suspend actually got worse by 5ms or so. I already provided that.

2. As I said in the Patch 5, suspend is more about reducing the energy
going into suspend. It's a balance of how quick you can be to how much
power you use to be quick. So, disabling EAS across all of
suspend/resume will have a huge impact on power because userspace is
still running, there are a ton of threads and userspace could get
preempted between disabling suspend and kicking off suspend. Lots of
obvious power concerns overall.

Thanks,
Saravana

>
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  kernel/power/suspend.c  | 16 ++++++++++++++++
> >  kernel/sched/topology.c | 13 +++++++++++++
> >  2 files changed, 29 insertions(+)
> >
> > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> > index 09f8397bae15..7304dc39958f 100644
> > --- a/kernel/power/suspend.c
> > +++ b/kernel/power/suspend.c
> > @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void)
> >         local_irq_enable();
> >  }
> >
> > +/*
> > + * Intentionally not part of a header file to avoid risk of abuse by other
> > + * drivers.
> > + */
> > +void sched_set_energy_aware(unsigned int enable);
> > +
> >  /**
> >   * suspend_enter - Make the system enter the given sleep state.
> >   * @state: System sleep state to enter.
> > @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
> >
> >   Platform_wake:
> >         platform_resume_noirq(state);
> > +       /*
> > +        * We do this only for resume instead of suspend and resume for these
> > +        * reasons:
> > +        * - Performance is more important than power for resume.
> > +        * - Power spent entering suspend is more important for suspend. Also,
> > +        *   stangely, disabling EAS was making suspent a few milliseconds
> > +        *   slower in my testing.
> > +        */
> > +       sched_set_energy_aware(0);
> >         dpm_resume_noirq(PMSG_RESUME);
> >
> >   Platform_early_resume:
> > @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state)
> >   Resume_devices:
> >         suspend_test_start();
> >         dpm_resume_end(PMSG_RESUME);
> > +       sched_set_energy_aware(1);
>
> If we end up having a special scheduling mode during suspend, we
> should make the function more generic and not only EAS/ smartphone
> specific
>
> Like a sched_suspend and sched_resume
>
> >         suspend_test_finish("resume devices");
> >         trace_suspend_resume(TPS("resume_console"), state, true);
> >         resume_console();
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 9748a4c8d668..c069c0b17cbf 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void)
> >         mutex_unlock(&sched_energy_mutex);
> >  }
> >
> > +void sched_set_energy_aware(unsigned int enable)
>
> This is a copy/paste of sched_energy_aware_handler() below, we should
> have 1 helper for both
>
> > +{
> > +       int state;
> > +
> > +       if (!sched_is_eas_possible(cpu_active_mask))
> > +               return;
> > +
> > +       sysctl_sched_energy_aware = enable;
> > +       state = static_branch_unlikely(&sched_energy_present);
> > +       if (state != sysctl_sched_energy_aware)
> > +               rebuild_sched_domains_energy();
> > +}
> > +
> >  #ifdef CONFIG_PROC_SYSCTL
> >  static int sched_energy_aware_handler(const struct ctl_table *table, int write,
> >                 void *buffer, size_t *lenp, loff_t *ppos)
> > --
> > 2.47.0.338.g60cca15819-goog
> >

Saravana Kannan Nov. 19, 2024, 4:04 a.m. UTC | #6

On Thu, Nov 14, 2024 at 2:09 PM Saravana Kannan <saravanak@google.com> wrote:
>
> A lot of the details are in patch 4/5 and 5/5. The summary is that
> there's a lot of overhead and wasted work in how async device
> suspend/resume is handled today. I talked about this and otther
> suspend/resume issues at LPC 2024[1].
>
> You can remove a lot of the overhead by doing a breadth first queuing of
> async suspend/resumes. That's what this patch series does. I also
> noticed that during resume, because of EAS, we don't use the bigger CPUs
> as quickly. This was leading to a lot of scheduling latency and
> preemption of runnable threads and increasing the resume latency. So, we
> also disable EAS for that tiny period of resume where we know there'll
> be a lot of parallelism.
>
> On a Pixel 6, averaging over 100 suspend/resume cycles, this patch
> series yields significant improvements:
> +---------------------------+-----------+----------------+------------+-------+
> | Phase                     | Old full sync | Old full async | New full async |
> |                           |               |                | + EAS disabled |
> +---------------------------+-----------+----------------+------------+-------+
> | Total dpm_suspend*() time |        107 ms |          72 ms |          62 ms |
> +---------------------------+-----------+----------------+------------+-------+
> | Total dpm_resume*() time  |         75 ms |          90 ms |          61 ms |
> +---------------------------+-----------+----------------+------------+-------+
> | Sum                       |        182 ms |         162 ms |         123 ms |
> +---------------------------+-----------+----------------+------------+-------+
>
> There might be room for some more optimizations in the future, but I'm
> keep this patch series simple enough so that it's easier to review and
> check that it's not breaking anything. If this series lands and is
> stable and no bug reports for a few months, I can work on optimizing
> this a bit further.
>
> Thanks,
> Saravana
> P.S: Cc-ing some usual suspects you might be interested in testing this
> out.
>
> [1] - https://lpc.events/event/18/contributions/1845/
>
> Saravana Kannan (5):
>   PM: sleep: Fix runtime PM issue in dpm_resume()
>   PM: sleep: Remove unnecessary mutex lock when waiting on parent
>   PM: sleep: Add helper functions to loop through superior/subordinate
>     devs
>   PM: sleep: Do breadth first suspend/resume for async suspend/resume
>   PM: sleep: Spread out async kworker threads during dpm_resume*()
>     phases
>
>  drivers/base/power/main.c | 325 +++++++++++++++++++++++++++++---------

Hi Rafael/Greg,

I'm waiting for one of your reviews before I send out the next version.

-Saravana

>  kernel/power/suspend.c    |  16 ++
>  kernel/sched/topology.c   |  13 ++
>  3 files changed, 276 insertions(+), 78 deletions(-)
>
> --
> 2.47.0.338.g60cca15819-goog
>

Greg Kroah-Hartman Nov. 19, 2024, 9:51 a.m. UTC | #7

On Mon, Nov 18, 2024 at 08:04:26PM -0800, Saravana Kannan wrote:
> On Thu, Nov 14, 2024 at 2:09 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > A lot of the details are in patch 4/5 and 5/5. The summary is that
> > there's a lot of overhead and wasted work in how async device
> > suspend/resume is handled today. I talked about this and otther
> > suspend/resume issues at LPC 2024[1].
> >
> > You can remove a lot of the overhead by doing a breadth first queuing of
> > async suspend/resumes. That's what this patch series does. I also
> > noticed that during resume, because of EAS, we don't use the bigger CPUs
> > as quickly. This was leading to a lot of scheduling latency and
> > preemption of runnable threads and increasing the resume latency. So, we
> > also disable EAS for that tiny period of resume where we know there'll
> > be a lot of parallelism.
> >
> > On a Pixel 6, averaging over 100 suspend/resume cycles, this patch
> > series yields significant improvements:
> > +---------------------------+-----------+----------------+------------+-------+
> > | Phase                     | Old full sync | Old full async | New full async |
> > |                           |               |                | + EAS disabled |
> > +---------------------------+-----------+----------------+------------+-------+
> > | Total dpm_suspend*() time |        107 ms |          72 ms |          62 ms |
> > +---------------------------+-----------+----------------+------------+-------+
> > | Total dpm_resume*() time  |         75 ms |          90 ms |          61 ms |
> > +---------------------------+-----------+----------------+------------+-------+
> > | Sum                       |        182 ms |         162 ms |         123 ms |
> > +---------------------------+-----------+----------------+------------+-------+
> >
> > There might be room for some more optimizations in the future, but I'm
> > keep this patch series simple enough so that it's easier to review and
> > check that it's not breaking anything. If this series lands and is
> > stable and no bug reports for a few months, I can work on optimizing
> > this a bit further.
> >
> > Thanks,
> > Saravana
> > P.S: Cc-ing some usual suspects you might be interested in testing this
> > out.
> >
> > [1] - https://lpc.events/event/18/contributions/1845/
> >
> > Saravana Kannan (5):
> >   PM: sleep: Fix runtime PM issue in dpm_resume()
> >   PM: sleep: Remove unnecessary mutex lock when waiting on parent
> >   PM: sleep: Add helper functions to loop through superior/subordinate
> >     devs
> >   PM: sleep: Do breadth first suspend/resume for async suspend/resume
> >   PM: sleep: Spread out async kworker threads during dpm_resume*()
> >     phases
> >
> >  drivers/base/power/main.c | 325 +++++++++++++++++++++++++++++---------
> 
> Hi Rafael/Greg,
> 
> I'm waiting for one of your reviews before I send out the next version.

Please feel free to send, it's the middle of the merge window now, and
I'm busy with that for the next 2 weeks, so I can't do anything until
after that.

thanks,

greg k-h

[v1,0/5] Optimize async device suspend/resume

Message

Comments