mbox series

[v1,0/2] x86/smp: Fix power regression introduced by commit 96040f7273e2

Message ID 2006806.PYKUYFuaPT@rjwysocki.net
Headers show
Series x86/smp: Fix power regression introduced by commit 96040f7273e2 | expand

Message

Rafael J. Wysocki May 28, 2025, 12:53 p.m. UTC
Hi Everyone,

Commit 96040f7273e2 ("x86/smp: Eliminate mwait_play_dead_cpuid_hint()")
that shipped in 6.15 introduced a nasty power regression on systems that
start with "nosmt" in the kernel command line which prevents it from entering
deep package idle states (for instance, PC10) later on.  Idle power, including
power in suspend-to-idle, goes up significantly on those systems as a result.

Address this by reverting commit 96040f7273e2 (patch [1/2]) and using a
different approach, which is to retain mwait_play_dead_cpuid_hint() and
still prefer it to hlt_play_dead() in case it is needed when cpuidle is
not available, but prefer cpuidle_play_dead() to it by default (patch [2/2]).

Thanks!

Comments

Peter Zijlstra May 30, 2025, 8:07 a.m. UTC | #1
On Thu, May 29, 2025 at 11:38:05AM +0200, Rafael J. Wysocki wrote:

> First off, I'm not sure if all of the requisite things are ready then
> (sysfs etc.).

Pretty much everything is already running at early_initcall(). Sysfs
certainly is.

> We may end up doing this eventually, but it may not be straightforward.
> 
> More importantly, this is not a change for 6.15.y (y > 0).

Seriously, have you even tried? 

AFAICT the below is all that is needed.  That boots just fine on the one
randon system I tried, and seems to still work.

And this is plenty small enough to go into 6.15.y


diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 0835da449db8..0f25de8081af 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -814,4 +814,4 @@ static int __init cpuidle_init(void)
 
 module_param(off, int, 0444);
 module_param_string(governor, param_governor, CPUIDLE_NAME_LEN, 0444);
-core_initcall(cpuidle_init);
+early_initcall(cpuidle_init);
Rafael J. Wysocki May 30, 2025, 9:27 a.m. UTC | #2
On Fri, May 30, 2025 at 11:18 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Fri, May 30, 2025 at 10:07 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, May 29, 2025 at 11:38:05AM +0200, Rafael J. Wysocki wrote:
> >
> > > First off, I'm not sure if all of the requisite things are ready then
> > > (sysfs etc.).
> >
> > Pretty much everything is already running at early_initcall(). Sysfs
> > certainly is.
> >
> > > We may end up doing this eventually, but it may not be straightforward.
> > >
> > > More importantly, this is not a change for 6.15.y (y > 0).
> >
> > Seriously, have you even tried?
> >
> > AFAICT the below is all that is needed.  That boots just fine on the one
> > randon system I tried, and seems to still work.
> >
> > And this is plenty small enough to go into 6.15.y
>
> But there is still intel_idle_init() which is device_initcall() ATM
> and this needs to be tested on other arches too.
>
> So not really there, sorry.

BTW, I do think that this is the way to go, but not in a hurry, and if
6.15.y does not include a proper fix for the original issue, this is
not the end of the world.

If it turns out to be unproblematic and all goes well, we may as well
try to put it into 6.16.

> > diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> > index 0835da449db8..0f25de8081af 100644
> > --- a/drivers/cpuidle/cpuidle.c
> > +++ b/drivers/cpuidle/cpuidle.c
> > @@ -814,4 +814,4 @@ static int __init cpuidle_init(void)
> >
> >  module_param(off, int, 0444);
> >  module_param_string(governor, param_governor, CPUIDLE_NAME_LEN, 0444);
> > -core_initcall(cpuidle_init);
> > +early_initcall(cpuidle_init);
Rafael J. Wysocki May 30, 2025, 4:59 p.m. UTC | #3
On Fri, May 30, 2025 at 11:18 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Fri, May 30, 2025 at 10:07 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, May 29, 2025 at 11:38:05AM +0200, Rafael J. Wysocki wrote:
> >
> > > First off, I'm not sure if all of the requisite things are ready then
> > > (sysfs etc.).
> >
> > Pretty much everything is already running at early_initcall(). Sysfs
> > certainly is.
> >
> > > We may end up doing this eventually, but it may not be straightforward.
> > >
> > > More importantly, this is not a change for 6.15.y (y > 0).
> >
> > Seriously, have you even tried?
> >
> > AFAICT the below is all that is needed.  That boots just fine on the one
> > randon system I tried, and seems to still work.
> >
> > And this is plenty small enough to go into 6.15.y
>
> But there is still intel_idle_init() which is device_initcall() ATM
> and this needs to be tested on other arches too.

intel_idle_init() depends on ACPI which initializes via a
subsys_initcall() and doing that earlier would mean a major redesign.

Pretty much same for reordering APs initialization past ACPI initialization.

One more option is to kick the "dead" SMT siblings after the idle
driver initializes and let them do "play dead" again, but who'd be
responsible for doing that?