
[v1,1/1] x86/fred: Fix system hang during S4 resume with FRED enabled

Message ID 20250326062540.820556-1-xin@zytor.com
State New
Series [v1,1/1] x86/fred: Fix system hang during S4 resume with FRED enabled

Commit Message

Xin Li March 26, 2025, 6:25 a.m. UTC
During an S4 resume, the system first performs a cold power-on.  The
kernel image is initially loaded to a random linear address, and the
FRED MSRs are initialized.  Subsequently, the S4 image is loaded,
and the kernel image is relocated to its original address from before
the S4 suspend.  Due to changes in the kernel text and data mappings,
the FRED MSRs must be reinitialized.

Reported-by: Xi Pardee <xi.pardee@intel.com>
Reported-and-Tested-by: Todd Brandt <todd.e.brandt@intel.com>
Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Cc: stable@kernel.org # 6.9+
---
 arch/x86/power/cpu.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

Comments

Rafael J. Wysocki March 31, 2025, 3:30 p.m. UTC | #1
On Wed, Mar 26, 2025 at 7:26 AM Xin Li (Intel) <xin@zytor.com> wrote:
>
> During an S4 resume, the system first performs a cold power-on.  The
> kernel image is initially loaded to a random linear address, and the
> FRED MSRs are initialized.  Subsequently, the S4 image is loaded,
> and the kernel image is relocated to its original address from before
> the S4 suspend.  Due to changes in the kernel text and data mappings,
> the FRED MSRs must be reinitialized.

To be precise, the above description of the hibernation control flow
doesn't exactly match the code.

Yes, a new kernel is booted upon a wakeup from S4, but this is not "a
cold power-on", strictly speaking.  This kernel is often referred to
as the restore kernel and yes, it initializes the FRED MSRs as
appropriate from its perspective.

Yes, it loads a hibernation image, including the kernel that was
running before hibernation, often referred to as the image kernel, but
it does its best to load image pages directly into the page frames
occupied by them before hibernation unless those page frames are
currently in use.  In that case, the given image pages are loaded into
currently free page frames, but they may or may not be part of the
image kernel (they may as well belong to user space processes that
were running before hibernation).  Yes, all of these pages need to be
moved to their original locations before the last step of restore,
which is a jump into a "trampoline" page in the image kernel, but this
is sort of irrelevant to the issue at hand.
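The page placement described above can be sketched as a toy model. Everything here is hypothetical (tiny fixed-size "pages", a fixed frame count, a linear scan for temporary frames) and is not how the kernel's hibernation code is structured. The model assumes that a temporary frame must be neither busy in the restore kernel nor the original home of another image page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_NFRAMES 8
#define TOY_PAGE    4   /* tiny "pages" keep the model small */

struct toy_image_page {
	size_t orig_pfn;            /* frame it occupied before hibernation (< TOY_NFRAMES) */
	char data[TOY_PAGE];
};

struct toy_memory {
	char frame[TOY_NFRAMES][TOY_PAGE];
	bool in_use[TOY_NFRAMES];   /* frames currently used by the restore kernel */
};

/* A page parked in a temporary frame that must be moved home later. */
struct toy_pending {
	size_t tmp_pfn;
	size_t orig_pfn;
};

/*
 * Load image pages straight into their original frames where possible;
 * otherwise park them in a free frame and record the move.
 * Returns the number of pending moves, or -1 if no usable frame is left.
 */
static int toy_load_image(struct toy_memory *m,
			  const struct toy_image_page *img, size_t n,
			  struct toy_pending *pending)
{
	bool reserved[TOY_NFRAMES];
	int npending = 0;

	/* Busy frames and frames some image page must end up in are off limits. */
	for (size_t f = 0; f < TOY_NFRAMES; f++)
		reserved[f] = m->in_use[f];
	for (size_t i = 0; i < n; i++)
		reserved[img[i].orig_pfn] = true;

	for (size_t i = 0; i < n; i++) {
		size_t dst = img[i].orig_pfn;

		if (m->in_use[dst]) {
			size_t f = 0;

			while (f < TOY_NFRAMES && reserved[f])
				f++;
			if (f == TOY_NFRAMES)
				return -1;
			reserved[f] = true;
			pending[npending].tmp_pfn = f;
			pending[npending].orig_pfn = dst;
			npending++;
			dst = f;
		}
		memcpy(m->frame[dst], img[i].data, TOY_PAGE);
	}
	return npending;
}

/* Last step before jumping into the image kernel: move parked pages home. */
static void toy_relocate(struct toy_memory *m,
			 const struct toy_pending *pending, int npending)
{
	for (int i = 0; i < npending; i++)
		memcpy(m->frame[pending[i].orig_pfn],
		       m->frame[pending[i].tmp_pfn], TOY_PAGE);
}
```

As the message says, this relocation step does not by itself touch any CPU state such as the FRED MSRs, which is why it is beside the point for this bug.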

At this point, the image kernel has control, but the FRED MSRs still
contain values written to them by the restore kernel and there is no
guarantee that those values are the same as the ones written into them
by the image kernel before hibernation.  Thus the image kernel must
ensure that the values of the FRED MSRs will be the same as they were
before hibernation, and because they only depend on the location of
the kernel text and data, they may as well be recomputed from scratch.
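The control flow above implies three successive FRED states on the boot CPU: the image kernel's values before hibernation, the restore kernel's values, and the image kernel's values again after resume. A toy C model of this follows. All names are hypothetical, and FRED state is reduced to a single value derived from the kernel's entry-code address. This is a simplification, not the real MSR layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel's layout. */
struct toy_kernel {
	uint64_t entry_addr;   /* where this kernel's FRED entry code lives */
};

static uint64_t toy_fred_msr;   /* stand-in for the boot CPU's FRED MSR state */

/* Recompute from scratch: the value depends only on the running kernel. */
static void toy_init_fred(const struct toy_kernel *k)
{
	toy_fred_msr = k->entry_addr;
}

/*
 * One hibernation cycle. The image kernel programs the MSR, the restore
 * kernel boots and programs its own value, and control then jumps back
 * into the image kernel. Returns the value the image kernel runs with.
 */
static uint64_t toy_hibernate_and_restore(const struct toy_kernel *image,
					  const struct toy_kernel *restore,
					  int reinit_on_resume)
{
	toy_init_fred(image);        /* pre-S4 state */
	toy_init_fred(restore);      /* restore kernel's state */
	if (reinit_on_resume)
		toy_init_fred(image);    /* the reinitialization the patch adds */
	return toy_fred_msr;         /* post-S4 state */
}
```

With reinit_on_resume set to 0, the image kernel keeps running with the restore kernel's value whenever the two kernels' layouts differ. Because the value is derived purely from the layout, recomputing it yields the pre-hibernation value without saving anything.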

> Reported-by: Xi Pardee <xi.pardee@intel.com>
> Reported-and-Tested-by: Todd Brandt <todd.e.brandt@intel.com>
> Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com>
> Signed-off-by: Xin Li (Intel) <xin@zytor.com>
> Cc: stable@kernel.org # 6.9+
> ---
>  arch/x86/power/cpu.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
> index 63230ff8cf4f..ef3c152c319c 100644
> --- a/arch/x86/power/cpu.c
> +++ b/arch/x86/power/cpu.c
> @@ -27,6 +27,7 @@
>  #include <asm/mmu_context.h>
>  #include <asm/cpu_device_id.h>
>  #include <asm/microcode.h>
> +#include <asm/fred.h>
>
>  #ifdef CONFIG_X86_32
>  __visible unsigned long saved_context_ebx;
> @@ -231,6 +232,21 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
>          */
>  #ifdef CONFIG_X86_64
>         wrmsrl(MSR_GS_BASE, ctxt->kernelmode_gs_base);
> +
> +       /*
> +        * Restore FRED configs.
> +        *
> +        * FRED configs are completely derived from current kernel text and
> +        * data mappings, thus nothing needs to be saved and restored.
> +        *
> +        * As such, simply re-initialize FRED to restore FRED configs.

Instead of the above, I would just say "Reinitialize FRED to ensure
that the FRED registers contain the same values as before
hibernation."

> +        *
> +        * Note, FRED RSPs setup needs to access percpu data structures.

And I'm not sure what you wanted to say here?  Does this refer to the
ordering of the code below or to something else?

> +        */
> +       if (ctxt->cr4 & X86_CR4_FRED) {
> +               cpu_init_fred_exceptions();
> +               cpu_init_fred_rsps();
> +       }
>  #else
>         loadsegment(fs, __KERNEL_PERCPU);
>  #endif
> --
Ingo Molnar March 31, 2025, 4:19 p.m. UTC | #2
* Rafael J. Wysocki <rafael@kernel.org> wrote:

> On Wed, Mar 26, 2025 at 7:26 AM Xin Li (Intel) <xin@zytor.com> wrote:
> >
> > During an S4 resume, the system first performs a cold power-on.  The
> > kernel image is initially loaded to a random linear address, and the
> > FRED MSRs are initialized.  Subsequently, the S4 image is loaded,
> > and the kernel image is relocated to its original address from before
> > the S4 suspend.  Due to changes in the kernel text and data mappings,
> > the FRED MSRs must be reinitialized.
> 
> To be precise, the above description of the hibernation control flow
> doesn't exactly match the code.
> 
> Yes, a new kernel is booted upon a wakeup from S4, but this is not "a
> cold power-on", strictly speaking.  This kernel is often referred to
> as the restore kernel and yes, it initializes the FRED MSRs as
> appropriate from its perspective.
> 
> Yes, it loads a hibernation image, including the kernel that was
> running before hibernation, often referred to as the image kernel, but
> it does its best to load image pages directly into the page frames
> occupied by them before hibernation unless those page frames are
> currently in use.  In that case, the given image pages are loaded into
> currently free page frames, but they may or may not be part of the
> image kernel (they may as well belong to user space processes that
> were running before hibernation).  Yes, all of these pages need to be
> moved to their original locations before the last step of restore,
> which is a jump into a "trampoline" page in the image kernel, but this
> is sort of irrelevant to the issue at hand.
> 
> At this point, the image kernel has control, but the FRED MSRs still
> contain values written to them by the restore kernel and there is no
> guarantee that those values are the same as the ones written into them
> by the image kernel before hibernation.  Thus the image kernel must
> ensure that the values of the FRED MSRs will be the same as they were
> before hibernation, and because they only depend on the location of
> the kernel text and data, they may as well be recomputed from scratch.

That's a rather critical difference... I zapped the commit from 
tip:x86/urgent, awaiting -v2 with a better changelog and better
in-code comments.

Thanks,

	Ingo
H. Peter Anvin March 31, 2025, 8:03 p.m. UTC | #3
On March 31, 2025 8:30:49 AM PDT, "Rafael J. Wysocki" <rafael@kernel.org> wrote:

Just to make it clear: the patch is correct; the shortcoming is in the description.

I would say that Xin's description, although perhaps excessively brief, is correct from the *hardware* point of view, whereas Rafael adds the much needed *software* perspective. 

As far as hardware is concerned, Linux S4 is just a power-on (we don't use any BIOS support for S4 even if it exists, which it rarely does anymore, and for very good reasons). From a software point of view, it is more like a kexec into the frozen kernel image, which then has to re-establish its runtime execution environment (including the FRED state, which is what this patch does).

For the APs, this is done through the normal AP bringup mechanism; only the BSP needs special treatment.
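The point that only the BSP needs special treatment can be illustrated with a toy model. All names are hypothetical. The model assumes that the restore kernel leaves only the BSP (CPU 0) running when it jumps into the image kernel, and that the image kernel then brings the APs up through its normal path.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 4

/* Stand-in for each CPU's FRED MSR state (hypothetical, not a kernel API). */
static uint64_t toy_fred_msr[TOY_NR_CPUS];

/* Per-CPU FRED init: derived only from the kernel that runs it. */
static void toy_init_fred(int cpu, uint64_t kernel_entry)
{
	toy_fred_msr[cpu] = kernel_entry;
}

/* The restore kernel boots and initializes FRED on every CPU. */
static void toy_restore_kernel_boot(uint64_t restore_entry)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		toy_init_fred(cpu, restore_entry);
}

/*
 * The image kernel resumes on the BSP and brings the APs back up through
 * its normal bringup path, which programs FRED for each AP. The BSP never
 * goes through that path, so it needs the explicit reinit (fix_bsp).
 */
static void toy_resume_image_kernel(uint64_t image_entry, int fix_bsp)
{
	if (fix_bsp)
		toy_init_fred(0, image_entry);
	for (int cpu = 1; cpu < TOY_NR_CPUS; cpu++)
		toy_init_fred(cpu, image_entry);   /* normal AP bringup */
}
```

Without fix_bsp, every AP ends up correct while the BSP keeps the restore kernel's value. That asymmetry is why the fix lives in the BSP's restore path.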
Rafael J. Wysocki March 31, 2025, 8:25 p.m. UTC | #4
On Mon, Mar 31, 2025 at 10:04 PM H. Peter Anvin <hpa@zytor.com> wrote:
>
> Just to make it clear: the patch is correct; the shortcoming is in the description.

Yes, the code changes in the patch are technically correct.

> I would say that Xin's description, although perhaps excessively brief, is correct from the *hardware* point of view, whereas Rafael adds the much needed *software* perspective.
>
> As far as hardware is concerned, Linux S4 is just a power-on (we don't use any BIOS support for S4 even if it exists, which it rarely does anymore, and for very good reasons). From a software point of view, it is more like a kexec into the frozen kernel image, which then has to re-establish its runtime execution environment (including the FRED state, which is what this patch does).
>
> For the APs, this is done through the normal AP bringup mechanism; only the BSP needs special treatment.

That's correct.
Ingo Molnar April 1, 2025, 8:22 a.m. UTC | #5
* H. Peter Anvin <hpa@zytor.com> wrote:

> Just to make it clear: the patch is correct; the shortcoming is in 
> the description.
> 
> I would say that Xin's description, although perhaps excessively 
> brief, is correct from the *hardware* point of view, whereas Rafael 
> adds the much needed *software* perspective.

This part of the -v1 patch was a bit misleading to me:

>> Due to changes in the kernel text and data mappings, the FRED MSRs 
>> must be reinitialized.

... as it suggests that the FRED MSRs will change from before the 
suspend - while they don't.

What this sentence meant is that FRED MSRs set up by the intermediate 
*kexec kernel* are incorrect and must be reinitialized to 
reconstruct the pre-hibernation state. I.e., there are 3 FRED setup states: 
pre-S4, kexec and post-S4, where pre-S4 == post-S4. Right?

I think the description and comments in the -v2 patch are better in 
this regard.

Thanks,

	Ingo

Patch

diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 63230ff8cf4f..ef3c152c319c 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -27,6 +27,7 @@ 
 #include <asm/mmu_context.h>
 #include <asm/cpu_device_id.h>
 #include <asm/microcode.h>
+#include <asm/fred.h>
 
 #ifdef CONFIG_X86_32
 __visible unsigned long saved_context_ebx;
@@ -231,6 +232,21 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	 */
 #ifdef CONFIG_X86_64
 	wrmsrl(MSR_GS_BASE, ctxt->kernelmode_gs_base);
+
+	/*
+	 * Restore FRED configs.
+	 *
+	 * FRED configs are completely derived from current kernel text and
+	 * data mappings, thus nothing needs to be saved and restored.
+	 *
+	 * As such, simply re-initialize FRED to restore FRED configs.
+	 *
+	 * Note, FRED RSPs setup needs to access percpu data structures.
+	 */
+	if (ctxt->cr4 & X86_CR4_FRED) {
+		cpu_init_fred_exceptions();
+		cpu_init_fred_rsps();
+	}
 #else
 	loadsegment(fs, __KERNEL_PERCPU);
 #endif
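The patch comment's note that FRED RSP setup needs per-CPU data, which Rafael asked about, is presumably about ordering. On x86-64, per-CPU accesses go through the GS base, so the FRED stack pointers can only be recomputed after MSR_GS_BASE holds the image kernel's value. This reading is our assumption. The toy model below only illustrates it: all names are hypothetical, the GS base is modelled as a plain pointer to a per-CPU area, and one field stands in for the per-CPU stack tops.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-CPU area holding the top of an exception stack. */
struct toy_percpu {
	uint64_t stack_top;
};

static struct toy_percpu *toy_gs_base;   /* stand-in for MSR_GS_BASE */
static uint64_t toy_fred_rsp;            /* stand-in for a FRED stack-pointer MSR */

/* Analogue of the RSP setup: reads per-CPU data through the GS base. */
static void toy_init_fred_rsps(void)
{
	toy_fred_rsp = toy_gs_base->stack_top;
}

/* Mirrors the patch's order: restore the GS base first, then FRED. */
static void toy_restore_processor_state(struct toy_percpu *saved_gs_base,
					int fred_enabled)
{
	toy_gs_base = saved_gs_base;   /* like wrmsrl(MSR_GS_BASE, ...) */
	if (fred_enabled)
		toy_init_fred_rsps();      /* per-CPU lookups now hit the image kernel's data */
}
```

Calling the RSP setup before the GS base is restored would read the restore kernel's per-CPU data instead, producing exactly the kind of stale value the patch is meant to eliminate.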