[v1,2/2] x86/smp: Prefer cpuidle_play_dead() to mwait_play_dead_cpuid_hint()

Message ID 3633769.iIbC2pHGDl@rjwysocki.net
State New
Series x86/smp: Fix power regression introduced by commit 96040f7273e2

Commit Message

Rafael J. Wysocki May 28, 2025, 12:54 p.m. UTC
From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>

Currently, mwait_play_dead_cpuid_hint() looks up the MWAIT hint of the
deepest idle state by inspecting CPUID leaf 0x05 with the assumption
that, if the number of sub-states for a given major C-state is nonzero,
those sub-states are always represented by consecutive numbers starting
from 0. This assumption is not based on the documented platform behavior
and in fact it is not met on recent Intel platforms (e.g. Sierra Forest).
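
To make the assumption concrete, here is a minimal userspace sketch of the
kind of lookup described above. It is not the kernel function itself:
guess_deepest_mwait_hint() is a hypothetical name, the EDX value is passed in
rather than read with CPUID, and returning 0 when no sub-states are reported
is a choice made only for this sketch. The constant names are assumed to
mirror the kernel's 4-bit sub-state fields.

```c
#include <stdint.h>

#define MWAIT_SUBSTATE_SIZE 4     /* one 4-bit sub-state count per C-state */
#define MWAIT_SUBSTATE_MASK 0xfu

/*
 * Hypothetical helper: given EDX from CPUID leaf 0x05, pick the deepest
 * major C-state with a nonzero sub-state count and ASSUME its sub-states
 * are numbered 0..count-1, so the deepest one is "count - 1".
 */
static uint32_t guess_deepest_mwait_hint(uint32_t edx)
{
	uint32_t highest_cstate = 0, highest_subcstate = 0;

	edx >>= MWAIT_SUBSTATE_SIZE;	/* skip the C0 field */
	for (uint32_t i = 0; i < 7 && edx; i++, edx >>= MWAIT_SUBSTATE_SIZE) {
		uint32_t count = edx & MWAIT_SUBSTATE_MASK;

		if (count) {
			highest_cstate = i;	/* hint encodes C1 as 0 */
			highest_subcstate = count;
		}
	}

	if (!highest_subcstate)
		return 0;	/* sketch-only fallback, not kernel behavior */

	/* This "count - 1" step is where the consecutive-numbering guess lives. */
	return (highest_cstate << MWAIT_SUBSTATE_SIZE) | (highest_subcstate - 1);
}
```

With EDX = 0x1120 (two C1 sub-states, one each for the next two major
states), the sketch yields hint 0x20. If the platform numbers that single
sub-state anything other than 0, the guessed hint does not name a valid
state, which is the failure mode the commit message points at.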

For this reason, it is better to let the cpuidle driver for the given
platform put CPUs going offline into an appropriate idle state and only
if that fails, fall back to mwait_play_dead_cpuid_hint(), which may
still be the next best "play dead" variant if cpuidle is not available.

For example, when "nosmt" is passed to the kernel in the command line,
SMT siblings are disabled early, before cpuidle gets ready, but they
need to be put into sufficiently deep idle states to allow the whole
processor to reach deep package idle states, like PC10, later on.
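
The resulting ordering can be modeled in userspace as a chain of attempts.
This is only an illustrative sketch: in the kernel each "play dead" call
returns only on failure, whereas here every method is a hypothetical callback
that returns true when it would have parked the CPU. All names below are
invented for the illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one "play dead" method: true means the CPU was parked. */
typedef bool (*play_dead_fn)(void);

/* Stand-ins for a method that is unavailable and one that works. */
static bool model_fail(void) { return false; }
static bool model_park(void) { return true; }

/*
 * Try each method in order and report the index of the first one that
 * parks the CPU, or -1 if all of them fail.
 */
static int first_parking_method(const play_dead_fn *methods, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (methods[i]())
			return (int)i;
	}
	return -1;
}
```

In the "nosmt" case described above, the first entry (standing in for
cpuidle) would fail because cpuidle is not ready yet, so the second entry
(standing in for the CPUID-hint MWAIT variant) gets its turn before the HLT
fallback.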

Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 arch/x86/kernel/smpboot.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Patch

--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1377,9 +1377,10 @@ 
 	play_dead_common();
 	tboot_shutdown(TB_SHUTDOWN_WFS);
 
+	/* Each call in the following sequence returns only on errors. */
+	cpuidle_play_dead();
 	mwait_play_dead_cpuid_hint();
-	if (cpuidle_play_dead())
-		hlt_play_dead();
+	hlt_play_dead();
 }
 
 #else /* ... !CONFIG_HOTPLUG_CPU */