diff mbox series

[v3] x86/boot/sev: Avoid shared GHCB page for early memory acceptance

Message ID 20250410132850.3708703-2-ardb+git@google.com
State New
Headers show
Series [v3] x86/boot/sev: Avoid shared GHCB page for early memory acceptance | expand

Commit Message

Ard Biesheuvel April 10, 2025, 1:28 p.m. UTC
From: Ard Biesheuvel <ardb@kernel.org>

Communicating with the hypervisor using the shared GHCB page requires
clearing the C bit in the mapping of that page. When executing in the
context of the EFI boot services, the page tables are owned by the
firmware, and this manipulation is not possible.

So switch to a different API for accepting memory in SEV-SNP guests, one
which is actually supported at the point during boot where the EFI stub
may need to accept memory, but the SEV-SNP init code has not executed
yet.

For simplicity, also switch the memory acceptance carried out by the
decompressor when not booting via EFI - this only involves the
allocation for the decompressed kernel, and is generally only called
after kexec, as normal boot will jump straight into the kernel from the
EFI stub.

Cc: Tom Lendacky <thomas.lendacky@amd.com>,
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>,
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>,
Cc: Kevin Loughlin <kevinloughlin@google.com>
Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
Changes since v2 [0]:
- avoid two separate acceptance APIs; instead, use MSR based page-by-page
  acceptance for the decompressor as well

[0] https://lore.kernel.org/all/20250404082921.2767593-8-ardb+git@google.com/T/#m38389f607accd0cfa83c41c3bd0d410514b023c6

 arch/x86/boot/compressed/sev.c | 67 +++++---------------
 1 file changed, 15 insertions(+), 52 deletions(-)

Comments

Ard Biesheuvel April 17, 2025, 4:14 p.m. UTC | #1
On Thu, 17 Apr 2025 at 18:08, Tom Lendacky <thomas.lendacky@amd.com> wrote:
>
> On 4/11/25 14:00, Ard Biesheuvel wrote:
> > On Fri, 11 Apr 2025 at 20:40, Borislav Petkov <bp@alien8.de> wrote:
> >>
> >> On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote:
> >>> From: Ard Biesheuvel <ardb@kernel.org>
> >>>
> >>> Communicating with the hypervisor using the shared GHCB page requires
> >>> clearing the C bit in the mapping of that page. When executing in the
> >>> context of the EFI boot services, the page tables are owned by the
> >>> firmware, and this manipulation is not possible.
> >>>
> >>> So switch to a different API for accepting memory in SEV-SNP guests, one
> >>
> >> That being the GHCB MSR protocol, it seems.
> >>
> >
> > Yes.
> >
> >> And since Tom co-developed, I guess we wanna do that.
> >>
> >> But then how much slower do we become?
> >>
> >
> > Non-EFI stub boot will become slower if the memory that is used to
> > decompress the kernel has not been accepted yet. But given how heavily
> > SEV-SNP depends on EFI boot, this typically only happens on kexec, as
> > that is the only boot path that goes through the traditional
> > decompressor.
>
> Some quick testing showed no significant differences in kexec booting
> and testing shows everything seems to be good.
>

Thanks.

> But, in testing with non-2M sized memory (e.g. a guest with 4097M of
> memory) and without the change to how SNP is detected before
> sev_enable() is called, we hit the error path in arch_accept_memory() in
> arch/x86/boot/compressed/mem.c and the boot crashes.
>

Right. So this is because sev_snp_enabled() is based on sev_status,
which has not been set yet at this point, right?

And for the record, could you please indicate whether you are ok with
the co-developed-by/signed-off-by credits on this patch (and
subsequent revisions)?
Ard Biesheuvel April 17, 2025, 4:38 p.m. UTC | #2
On Thu, 17 Apr 2025 at 18:21, Tom Lendacky <thomas.lendacky@amd.com> wrote:
>
> On 4/17/25 11:14, Ard Biesheuvel wrote:
> > On Thu, 17 Apr 2025 at 18:08, Tom Lendacky <thomas.lendacky@amd.com> wrote:
> >>
> >> On 4/11/25 14:00, Ard Biesheuvel wrote:
> >>> On Fri, 11 Apr 2025 at 20:40, Borislav Petkov <bp@alien8.de> wrote:
> >>>>
> >>>> On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote:
> >>>>> From: Ard Biesheuvel <ardb@kernel.org>
> >>>>>
> >>>>> Communicating with the hypervisor using the shared GHCB page requires
> >>>>> clearing the C bit in the mapping of that page. When executing in the
> >>>>> context of the EFI boot services, the page tables are owned by the
> >>>>> firmware, and this manipulation is not possible.
> >>>>>
> >>>>> So switch to a different API for accepting memory in SEV-SNP guests, one
> >>>>
> >>>> That being the GHCB MSR protocol, it seems.
> >>>>
> >>>
> >>> Yes.
> >>>
> >>>> And since Tom co-developed, I guess we wanna do that.
> >>>>
> >>>> But then how much slower do we become?
> >>>>
> >>>
> >>> Non-EFI stub boot will become slower if the memory that is used to
> >>> decompress the kernel has not been accepted yet. But given how heavily
> >>> SEV-SNP depends on EFI boot, this typically only happens on kexec, as
> >>> that is the only boot path that goes through the traditional
> >>> decompressor.
> >>
> >> Some quick testing showed no significant differences in kexec booting
> >> and testing shows everything seems to be good.
> >>
> >
> > Thanks.
> >
> >> But, in testing with non-2M sized memory (e.g. a guest with 4097M of
> >> memory) and without the change to how SNP is detected before
> >> sev_enable() is called, we hit the error path in arch_accept_memory() in
> >> arch/x86/boot/compressed/mem.c and the boot crashes.
> >>
> >
> > Right. So this is because sev_snp_enabled() is based on sev_status,
> > which has not been set yet at this point, right?
>
> Correct.
>

OK. Would this do the trick? (with asm/sev.h added to the #includes)

--- a/arch/x86/boot/compressed/mem.c
+++ b/arch/x86/boot/compressed/mem.c
@@ -34,11 +34,14 @@ static bool early_is_tdx_guest(void)

 void arch_accept_memory(phys_addr_t start, phys_addr_t end)
 {
+       static bool sevsnp;
+
        /* Platform-specific memory-acceptance call goes here */
        if (early_is_tdx_guest()) {
                if (!tdx_accept_memory(start, end))
                        panic("TDX: Failed to accept memory\n");
-       } else if (sev_snp_enabled()) {
+       } else if (sevsnp || (sev_get_status() & MSR_AMD64_SEV_SNP_ENABLED)) {
+               sevsnp = true;
                snp_accept_memory(start, end);
        } else {
                error("Cannot accept memory: unknown platform\n");

> >
> > And for the record, could you please indicate whether you are ok with
> > the co-developed-by/signed-off-by credits on this patch (and
> > subsequent revisions)?
>
> Yep, I'm fine with that.
>

Thanks.
Tom Lendacky April 17, 2025, 8:01 p.m. UTC | #3
On 4/17/25 12:26, Tom Lendacky wrote:
> On 4/17/25 11:38, Ard Biesheuvel wrote:
>> On Thu, 17 Apr 2025 at 18:21, Tom Lendacky <thomas.lendacky@amd.com> wrote:
>>>
>>> On 4/17/25 11:14, Ard Biesheuvel wrote:
>>>> On Thu, 17 Apr 2025 at 18:08, Tom Lendacky <thomas.lendacky@amd.com> wrote:
>>>>>
>>>>> On 4/11/25 14:00, Ard Biesheuvel wrote:
>>>>>> On Fri, 11 Apr 2025 at 20:40, Borislav Petkov <bp@alien8.de> wrote:
>>>>>>>
>>>>>>> On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote:
>>>>>>>> From: Ard Biesheuvel <ardb@kernel.org>
>>>>>>>>
>>>>>>>> Communicating with the hypervisor using the shared GHCB page requires
>>>>>>>> clearing the C bit in the mapping of that page. When executing in the
>>>>>>>> context of the EFI boot services, the page tables are owned by the
>>>>>>>> firmware, and this manipulation is not possible.
>>>>>>>>
>>>>>>>> So switch to a different API for accepting memory in SEV-SNP guests, one
>>>>>>>
>>>>>>> That being the GHCB MSR protocol, it seems.
>>>>>>>
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>>> And since Tom co-developed, I guess we wanna do that.
>>>>>>>
>>>>>>> But then how much slower do we become?
>>>>>>>
>>>>>>
>>>>>> Non-EFI stub boot will become slower if the memory that is used to
>>>>>> decompress the kernel has not been accepted yet. But given how heavily
>>>>>> SEV-SNP depends on EFI boot, this typically only happens on kexec, as
>>>>>> that is the only boot path that goes through the traditional
>>>>>> decompressor.
>>>>>
>>>>> Some quick testing showed no significant differences in kexec booting
>>>>> and testing shows everything seems to be good.
>>>>>
>>>>
>>>> Thanks.
>>>>
>>>>> But, in testing with non-2M sized memory (e.g. a guest with 4097M of
>>>>> memory) and without the change to how SNP is detected before
>>>>> sev_enable() is called, we hit the error path in arch_accept_memory() in
>>>>> arch/x86/boot/compressed/mem.c and the boot crashes.
>>>>>
>>>>
>>>> Right. So this is because sev_snp_enabled() is based on sev_status,
>>>> which has not been set yet at this point, right?
>>>
>>> Correct.
>>>
>>
>> OK. Would this do the trick? (with asm/sev.h added to the #includes)
> 
> Yes, that works for booting. Let me do some kexec testing and get back
> to you. Sorry, that might not be until tomorrow, though.

Ok, found some time... looks good with kexec, too.

Thanks,
Tom

> 
> Thanks,
> Tom
> 
>>
>> --- a/arch/x86/boot/compressed/mem.c
>> +++ b/arch/x86/boot/compressed/mem.c
>> @@ -34,11 +34,14 @@ static bool early_is_tdx_guest(void)
>>
>>  void arch_accept_memory(phys_addr_t start, phys_addr_t end)
>>  {
>> +       static bool sevsnp;
>> +
>>         /* Platform-specific memory-acceptance call goes here */
>>         if (early_is_tdx_guest()) {
>>                 if (!tdx_accept_memory(start, end))
>>                         panic("TDX: Failed to accept memory\n");
>> -       } else if (sev_snp_enabled()) {
>> +       } else if (sevsnp || (sev_get_status() & MSR_AMD64_SEV_SNP_ENABLED)) {
>> +               sevsnp = true;
>>                 snp_accept_memory(start, end);
>>         } else {
>>                 error("Cannot accept memory: unknown platform\n");
>>
>>>>
>>>> And for the record, could you please indicate whether you are ok with
>>>> the co-developed-by/signed-off-by credits on this patch (and
>>>> subsequent revisions)?
>>>
>>> Yep, I'm fine with that.
>>>
>>
>> Thanks.
diff mbox series

Patch

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index bb55934c1cee..89ba168f4f0f 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -164,10 +164,7 @@  bool sev_snp_enabled(void)
 
 static void __page_state_change(unsigned long paddr, enum psc_op op)
 {
-	u64 val;
-
-	if (!sev_snp_enabled())
-		return;
+	u64 val, msr;
 
 	/*
 	 * If private -> shared then invalidate the page before requesting the
@@ -176,6 +173,9 @@  static void __page_state_change(unsigned long paddr, enum psc_op op)
 	if (op == SNP_PAGE_STATE_SHARED)
 		pvalidate_4k_page(paddr, paddr, false);
 
+	/* Save the current GHCB MSR value */
+	msr = sev_es_rd_ghcb_msr();
+
 	/* Issue VMGEXIT to change the page state in RMP table. */
 	sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
 	VMGEXIT();
@@ -185,6 +185,9 @@  static void __page_state_change(unsigned long paddr, enum psc_op op)
 	if ((GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP) || GHCB_MSR_PSC_RESP_VAL(val))
 		sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
 
+	/* Restore the GHCB MSR value */
+	sev_es_wr_ghcb_msr(msr);
+
 	/*
 	 * Now that page state is changed in the RMP table, validate it so that it is
 	 * consistent with the RMP entry.
@@ -195,11 +198,17 @@  static void __page_state_change(unsigned long paddr, enum psc_op op)
 
 void snp_set_page_private(unsigned long paddr)
 {
+	if (!sev_snp_enabled())
+		return;
+
 	__page_state_change(paddr, SNP_PAGE_STATE_PRIVATE);
 }
 
 void snp_set_page_shared(unsigned long paddr)
 {
+	if (!sev_snp_enabled())
+		return;
+
 	__page_state_change(paddr, SNP_PAGE_STATE_SHARED);
 }
 
@@ -223,56 +232,10 @@  static bool early_setup_ghcb(void)
 	return true;
 }
 
-static phys_addr_t __snp_accept_memory(struct snp_psc_desc *desc,
-				       phys_addr_t pa, phys_addr_t pa_end)
-{
-	struct psc_hdr *hdr;
-	struct psc_entry *e;
-	unsigned int i;
-
-	hdr = &desc->hdr;
-	memset(hdr, 0, sizeof(*hdr));
-
-	e = desc->entries;
-
-	i = 0;
-	while (pa < pa_end && i < VMGEXIT_PSC_MAX_ENTRY) {
-		hdr->end_entry = i;
-
-		e->gfn = pa >> PAGE_SHIFT;
-		e->operation = SNP_PAGE_STATE_PRIVATE;
-		if (IS_ALIGNED(pa, PMD_SIZE) && (pa_end - pa) >= PMD_SIZE) {
-			e->pagesize = RMP_PG_SIZE_2M;
-			pa += PMD_SIZE;
-		} else {
-			e->pagesize = RMP_PG_SIZE_4K;
-			pa += PAGE_SIZE;
-		}
-
-		e++;
-		i++;
-	}
-
-	if (vmgexit_psc(boot_ghcb, desc))
-		sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
-
-	pvalidate_pages(desc);
-
-	return pa;
-}
-
 void snp_accept_memory(phys_addr_t start, phys_addr_t end)
 {
-	struct snp_psc_desc desc = {};
-	unsigned int i;
-	phys_addr_t pa;
-
-	if (!boot_ghcb && !early_setup_ghcb())
-		sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
-
-	pa = start;
-	while (pa < end)
-		pa = __snp_accept_memory(&desc, pa, end);
+	for (phys_addr_t pa = start; pa < end; pa += PAGE_SIZE)
+		__page_state_change(pa, SNP_PAGE_STATE_PRIVATE);
 }
 
 void sev_es_shutdown_ghcb(void)