[RFT,v3,00/21] x86: strict separation of startup code

Message ID	20250512190834.332684-23-ardb+git@google.com
Headers	show Received: from mail-wm1-f73.google.com (mail-wm1-f73.google.com [209.85.128.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1C826297A59 for <linux-efi@vger.kernel.org>; Mon, 12 May 2025 19:11:33 +0000 (UTC) Date: Mon, 12 May 2025 21:08:35 +0200 Precedence: bulk Mime-Version: 1.0 Message-ID: <20250512190834.332684-23-ardb+git@google.com> Subject: [RFT PATCH v3 00/21] x86: strict separation of startup code From: Ard Biesheuvel <ardb+git@google.com> To: linux-kernel@vger.kernel.org Cc: linux-efi@vger.kernel.org, x86@kernel.org, Ard Biesheuvel <ardb@kernel.org>, Borislav Petkov <bp@alien8.de>, Ingo Molnar <mingo@kernel.org>, Dionna Amalie Glaze <dionnaglaze@google.com>, Kevin Loughlin <kevinloughlin@google.com>, Tom Lendacky <thomas.lendacky@amd.com> Content-Type: text/plain; charset="UTF-8"
Series	x86: strict separation of startup code \| expand [RFT,v3,00/21] x86: strict separation of startup code [RFT,v3,01/21] x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback [RFT,v3,02/21] x86/sev: Use MSR protocol for remapping SVSM calling area [RFT,v3,03/21] x86/sev: Use MSR protocol only for early SVSM PVALIDATE call [RFT,v3,04/21] x86/sev: Run RMPADJUST on SVSM calling area page to test VMPL [RFT,v3,05/21] x86/sev: Move GHCB page based HV communication out of startup code [RFT,v3,06/21] x86/sev: Avoid global variable to store virtual address of SVSM area [RFT,v3,07/21] x86/sev: Move MSR save/restore out of early page state change helper [RFT,v3,08/21] x86/sev: Share implementation of MSR-based page state change [RFT,v3,09/21] x86/sev: Pass SVSM calling area down to early page state change API [RFT,v3,10/21] x86/sev: Use boot SVSM CA for all startup and init code [RFT,v3,11/21] x86/boot: Drop redundant RMPADJUST in SEV SVSM presence check [RFT,v3,12/21] x86/sev: Unify SEV-SNP hypervisor feature check [RFT,v3,13/21] x86/sev: Provide PIC aliases for SEV related data objects [RFT,v3,14/21] x86/boot: Provide PIC aliases for 5-level paging related constants [RFT,v3,15/21] x86/sev: Move __sev_[get\|put]_ghcb() into separate noinstr object [RFT,v3,16/21] x86/sev: Export startup routines for later use [RFT,v3,17/21] x86/boot: Create a confined code area for startup code [RFT,v3,18/21] x86/boot: Move startup code out of __head section [RFT,v3,19/21] x86/boot: Disallow absolute symbol references in startup code [RFT,v3,20/21] x86/boot: Revert "Reject absolute references in .head.text" [RFT,v3,21/21] x86/boot: Get rid of the .head.text section

Ard Biesheuvel May 12, 2025, 7:08 p.m. UTC

From: Ard Biesheuvel <ardb@kernel.org>

!!! Boot tested on non-SEV guest ONLY !!!!

This RFT series implements a strict separation between startup code and
ordinary code, where startup code is built in a way that tolerates being
invoked from the initial 1:1 mapping of memory.

The existsing approach of emitting this code into .head.text and
checking for absolute relocations in that section is not 100% safe, and
produces diagnostics that are sometimes difficult to interpret. [0]

Instead, rely on symbol prefixes, similar to how this is implemented for
the EFI stub and for the startup code in the arm64 port. This ensures
that startup code can only call other startup code, unless a special
symbol alias is emitted that exposes a non-startup routine to the
startup code.

This is somewhat intrusive, as there are many data objects that are
referenced both by startup code and by ordinary code, and an alias needs
to be emitted for each of those. If startup code references anything
that has not been made available to it explicitly, a build time link
error will occur.

This ultimately allows the .head.text section to be dropped entirely, as
it no longer has a special significance. Instead, code that only
executes at boot is emitted into .init.text as it should.

The majority of changes is around early SEV code. The main issue is that
the use of GHCB pages and SVSM calling areas in code that may run from
both the 1:1 mapping and the kernel virtual mapping is problematic as it
relies on __pa() to perform VA to PA translations, which are ambiguous
in this context. Also, __pa() pulls in non-trivial instrumented code
when CONFIG_DEBUG_VIRTUAL=y and so it is better to avoid VA to PA
translations altogether in the startup code.

Change since RFT/v2:
- Rebase onto tip/x86/boot and drop the patches from the previous
  revision that have been applied in the meantime.
- Omit the pgtable_l5_enabled() changes for now, and just expose PIC
  aliases for the variables in question - this can be sorted later.
- Don't use the boot SVSM calling area in snp_kexec_finish(), but pass
  down the correct per-CPU one to the early page state API.
- Rename arch/x86/coco/sev/sev-noinstr.o to arch/x86/coco/sev/noinstr.o
- Further reduce the amount of SEV code that needs to be constructed in
  a special way.

Change since RFC/v1:
- Include a major disentanglement/refactor of the SEV-SNP startup code,
  so that only code that really needs to run from the 1:1 mapping is
  included in the startup/ code

- Incorporate some early notes from Ingo

Build tested defconfig and allmodconfig

!!! Boot tested on non-SEV guest ONLY !!!!

Again, I will need to lean on Tom to determine whether this breaks
SEV-SNP guest boot. As I mentioned before, I am still waiting for
SEV-SNP capable hardware to be delivered.

Cc: Borislav Petkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>

[0] https://lore.kernel.org/all/CAHk-=wj7k9nvJn6cpa3-5Ciwn2RGyE605BMkjWE4MqnvC9E92A@mail.gmail.com/

Ard Biesheuvel (21):
  x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback
  x86/sev: Use MSR protocol for remapping SVSM calling area
  x86/sev: Use MSR protocol only for early SVSM PVALIDATE call
  x86/sev: Run RMPADJUST on SVSM calling area page to test VMPL
  x86/sev: Move GHCB page based HV communication out of startup code
  x86/sev: Avoid global variable to store virtual address of SVSM area
  x86/sev: Move MSR save/restore out of early page state change helper
  x86/sev: Share implementation of MSR-based page state change
  x86/sev: Pass SVSM calling area down to early page state change API
  x86/sev: Use boot SVSM CA for all startup and init code
  x86/boot: Drop redundant RMPADJUST in SEV SVSM presence check
  x86/sev: Unify SEV-SNP hypervisor feature check
  x86/sev: Provide PIC aliases for SEV related data objects
  x86/boot: Provide PIC aliases for 5-level paging related constants
  x86/sev: Move __sev_[get|put]_ghcb() into separate noinstr object
  x86/sev: Export startup routines for later use
  x86/boot: Create a confined code area for startup code
  x86/boot: Move startup code out of __head section
  x86/boot: Disallow absolute symbol references in startup code
  x86/boot: Revert "Reject absolute references in .head.text"
  x86/boot: Get rid of the .head.text section

 arch/x86/boot/compressed/sev-handle-vc.c   |   3 +
 arch/x86/boot/compressed/sev.c             | 132 ++-----
 arch/x86/boot/startup/Makefile             |  21 ++
 arch/x86/boot/startup/exports.h            |  12 +
 arch/x86/boot/startup/gdt_idt.c            |   4 +-
 arch/x86/boot/startup/map_kernel.c         |   4 +-
 arch/x86/boot/startup/sev-shared.c         | 361 +++++---------------
 arch/x86/boot/startup/sev-startup.c        | 190 ++---------
 arch/x86/boot/startup/sme.c                |  29 +-
 arch/x86/coco/sev/Makefile                 |   6 +-
 arch/x86/coco/sev/core.c                   | 179 +++++++---
 arch/x86/coco/sev/{sev-nmi.c => noinstr.c} |  74 ++++
 arch/x86/coco/sev/vc-handle.c              |   3 +-
 arch/x86/coco/sev/vc-shared.c              | 143 +++++++-
 arch/x86/include/asm/init.h                |   6 -
 arch/x86/include/asm/setup.h               |   1 +
 arch/x86/include/asm/sev-internal.h        |  29 +-
 arch/x86/include/asm/sev.h                 |  67 +++-
 arch/x86/kernel/head64.c                   |   5 +-
 arch/x86/kernel/head_32.S                  |   2 +-
 arch/x86/kernel/head_64.S                  |  10 +-
 arch/x86/kernel/vmlinux.lds.S              |   7 +-
 arch/x86/mm/mem_encrypt_amd.c              |   6 -
 arch/x86/mm/mem_encrypt_boot.S             |   6 +-
 arch/x86/platform/pvh/head.S               |   2 +-
 arch/x86/tools/relocs.c                    |   8 +-
 26 files changed, 621 insertions(+), 689 deletions(-)
 create mode 100644 arch/x86/boot/startup/exports.h
 rename arch/x86/coco/sev/{sev-nmi.c => noinstr.c} (61%)


base-commit: ed4d95d033e359f9445e85bf5a768a5859a5830b

Borislav Petkov May 12, 2025, 7:17 p.m. UTC | #1

On Mon, May 12, 2025 at 09:08:35PM +0200, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> !!! Boot tested on non-SEV guest ONLY !!!!

...

> !!! Boot tested on non-SEV guest ONLY !!!!
> 
> Again, I will need to lean on Tom to determine whether this breaks
> SEV-SNP guest boot. As I mentioned before, I am still waiting for
> SEV-SNP capable hardware to be delivered.

Ingo, please do not rush this stuff in before Tom and I have tested it
successfully with SEV* guests.

Thanks!

Ingo Molnar May 13, 2025, 10:02 a.m. UTC | #2

* Borislav Petkov <bp@alien8.de> wrote:

> On Mon, May 12, 2025 at 09:08:35PM +0200, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> > 
> > !!! Boot tested on non-SEV guest ONLY !!!!
> 
> ...
> 
> > !!! Boot tested on non-SEV guest ONLY !!!!
> > 
> > Again, I will need to lean on Tom to determine whether this breaks
> > SEV-SNP guest boot. As I mentioned before, I am still waiting for
> > SEV-SNP capable hardware to be delivered.
> 
> Ingo, please do not rush this stuff in before Tom and I have tested it
> successfully with SEV* guests.
> 
> Thanks!

I don't intend to rush it, but note that AMD's SEV-SNP testing is 
lagging a *lot* at the moment: Ard asked for testing the -v2 series on 
May 4:

    https://lore.kernel.org/r/20250504095230.2932860-25-ardb+git@google.com

That request for testing was ignored AFAICS. It's May 13 and still 
crickets.

We also had SEV-SNP boot bugs pending since August 2024, that nobody 
but (eventually) AMD triggered. Ie. very few people outside of the 
vendor are testing SEV-SNP AFAICS, and even vendor testing is sporadic 
...

Please ask AMD internally to get SEV-SNP tested more reliably. Testing 
this -v3 series would be a good start. Hint, hint. ;-)

Thanks!

	Ingo

Borislav Petkov May 13, 2025, 10:12 a.m. UTC | #3

On Tue, May 13, 2025 at 12:02:22PM +0200, Ingo Molnar wrote:
> I don't intend to rush it,

Thanks.

> That request for testing was ignored AFAICS. It's May 13 and still 
> crickets.

Not ignored - Tom and I are testing but we're busy as hell too.

> We also had SEV-SNP boot bugs pending since August 2024, that nobody 
> but (eventually) AMD triggered.

Where?

> Ie. very few people outside of the vendor are testing SEV-SNP AFAICS, and
> even vendor testing is sporadic ...

Not true - SEV* testing happens on a daily basis.

> Please ask AMD internally to get SEV-SNP tested more reliably. Testing 
> this -v3 series would be a good start. Hint, hint. ;-)

We test everything that goes into linux-next. We haven't started testing
unreviewed patchsets yet because we don't do that - that stuff is moving.

So if you want to merge something, just ping me or Tom and we'll test it. But
you have to give us ample time to do so - you can't merge something which Ard
sent *on the same day*.

If you can't test it, you don't merge it but ask people to test it. You know
that.

Thx.

Ingo Molnar May 13, 2025, 11:22 a.m. UTC | #4

* Borislav Petkov <bp@alien8.de> wrote:

> On Tue, May 13, 2025 at 12:02:22PM +0200, Ingo Molnar wrote:
> > I don't intend to rush it,
> 
> Thanks.
> 
> > That request for testing was ignored AFAICS. It's May 13 and still 
> > crickets.
> 
> Not ignored - Tom and I are testing but we're busy as hell too.

Yeah, so the problem is that SEV* is hardware that basically no active 
tester outside of the vendor (AMD) owns and is testing against 
development trees AFAICS.

> > We also had SEV-SNP boot bugs pending since August 2024, that 
> > nobody but (eventually) AMD triggered.
> 
> Where?

I did a quick Git search, and here are a few examples:

For example, this commit from last summer:

  6c3211796326 ("x86/sev: Add SNP-specific unaccepted memory support")

... was only fixed recently:

  d54d610243a4 ("x86/boot/sev: Avoid shared GHCB page for early memory acceptance")

Or this commit from June 2024:

  34ff65901735 ("x86/sev: Use kernel provided SVSM Calling Areas")

... was only fixed a few days ago:

  f7387eff4bad ("x86/sev: Fix operator precedence in GHCB_MSR_VMPL_REQ_LEVEL macro")

Or this commit from June 2024:

  fcd042e86422 ("x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0")

... was fixed a few weeks ago:

  8ed12ab1319b ("x86/boot/sev: Support memory acceptance in the EFI stub under SVSM")

Ie. bugfix latencies here were 10+ months.

Note that two of those fixes were from Ard who is working on further 
robustifying the startup code - a much needed change.

Ie. when Ard is asking for SEV-SNP testing for WIP series, which he did 
10+ days ago, you should not ignore it ... or if you do ignore his 
request for testing, you should not complain about the changes being 
merged eventually, once they pass review & testing on non-SEV 
platforms.

> > Ie. very few people outside of the vendor are testing SEV-SNP 
> > AFAICS, and even vendor testing is sporadic ...
> 
> Not true - SEV* testing happens on a daily basis.

If you didn't have time to personally test Ard's -v2 series since May 
2, that's OK: I can merge these proposed changes in an RFT branch so 
that it gets tested in the daily testing flow. [see further below for 
the Git link]

In other words: please no "gatekeeping". Please don't force Ard into a 
catch-22 situation where he cannot test the patches on SEV-SNP, but you 
are blocking these x86 startup code changes on the grounds that they 
weren't tested on SEV-SNP ...

> > Please ask AMD internally to get SEV-SNP tested more reliably. 
> > Testing this -v3 series would be a good start. Hint, hint. ;-)
> 
> We test everything that goes into linux-next. We haven't started 
> testing unreviewed patchsets yet because we don't do that - that 
> stuff is moving.
>
> So if you want to merge something, just ping me or Tom and we'll test 
> it.

Here's Ard's request from May 2:

    https://lore.kernel.org/r/20250504095230.2932860-25-ardb+git@google.com

    "Again, I will need to lean on Tom to determine whether this breaks 
     SEV-SNP guest boot. As I mentioned before, I am still waiting for 
     SEV-SNP capable hardware to be delivered."

This request for testing was ignored AFAICS.

> But you have to give us ample time to do so - you can't merge 
> something which Ard sent *on the same day*.

Sure: -v2 was sent more than 10 days ago, and the testing request was 
ignored AFAICS. Do 10 days count as 'ample time'?

Anyway, to make it even lower-overhead to test these changes, I've put 
the -v3 series into the WIP.x86/boot tree:

  git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git WIP.x86/boot

Only lightly tested. Just a "boots/doesn't boot" kind of quick feedback 
would be much appreciated.

Note that naturally this tree is still subject to rebasing, as review 
feedback is incorporated.

Thanks!

	Ingo

Ard Biesheuvel May 13, 2025, 1:55 p.m. UTC | #5

On Mon, 12 May 2025 at 20:11, Ard Biesheuvel <ardb+git@google.com> wrote:
>
> From: Ard Biesheuvel <ardb@kernel.org>
>
> The early page state change API is mostly only used very early, when
> only the boot time SVSM calling area is in use. However, this API is
> also called by the kexec finishing code, which runs very late, and
> potentially from a different CPU (which uses a different calling area).
>
> To avoid pulling the per-CPU SVSM calling area pointers and related SEV
> state into the startup code, refactor the page state change API so the
> SVSM calling area virtual and physical addresses can be provided by the
> caller.
>
> No functional change intended.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

This patch is broken - I'll send a followup fix asap.



> ---
>  arch/x86/boot/compressed/sev.c      | 12 +++++++++---
>  arch/x86/boot/startup/sev-shared.c  | 18 ++++++++++--------
>  arch/x86/boot/startup/sev-startup.c | 11 +++++++----
>  arch/x86/coco/sev/core.c            |  3 ++-
>  arch/x86/include/asm/sev-internal.h |  3 ++-
>  5 files changed, 30 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
> index 7a01eef9ae01..04bc39d065ff 100644
> --- a/arch/x86/boot/compressed/sev.c
> +++ b/arch/x86/boot/compressed/sev.c
> @@ -68,7 +68,9 @@ void snp_set_page_private(unsigned long paddr)
>                 return;
>
>         msr = sev_es_rd_ghcb_msr();
> -       __page_state_change(paddr, SNP_PAGE_STATE_PRIVATE);
> +       __page_state_change(paddr, SNP_PAGE_STATE_PRIVATE,
> +                           (struct svsm_ca *)boot_svsm_caa_pa,
> +                           boot_svsm_caa_pa);
>         sev_es_wr_ghcb_msr(msr);
>  }
>
> @@ -80,7 +82,9 @@ void snp_set_page_shared(unsigned long paddr)
>                 return;
>
>         msr = sev_es_rd_ghcb_msr();
> -       __page_state_change(paddr, SNP_PAGE_STATE_SHARED);
> +       __page_state_change(paddr, SNP_PAGE_STATE_SHARED,
> +                           (struct svsm_ca *)boot_svsm_caa_pa,
> +                           boot_svsm_caa_pa);
>         sev_es_wr_ghcb_msr(msr);
>  }
>
> @@ -109,7 +113,9 @@ void snp_accept_memory(phys_addr_t start, phys_addr_t end)
>         u64 msr = sev_es_rd_ghcb_msr();
>
>         for (phys_addr_t pa = start; pa < end; pa += PAGE_SIZE)
> -               __page_state_change(pa, SNP_PAGE_STATE_PRIVATE);
> +               __page_state_change(pa, SNP_PAGE_STATE_PRIVATE,
> +                                   (struct svsm_ca *)boot_svsm_caa_pa,
> +                                   boot_svsm_caa_pa);
>         sev_es_wr_ghcb_msr(msr);
>  }
>
> diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
> index dae770327b50..70ad9a0aa023 100644
> --- a/arch/x86/boot/startup/sev-shared.c
> +++ b/arch/x86/boot/startup/sev-shared.c
> @@ -538,7 +538,8 @@ static void __head setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
>         }
>  }
>
> -static void __head svsm_pval_4k_page(unsigned long paddr, bool validate)
> +static void __head svsm_pval_4k_page(unsigned long paddr, bool validate,
> +                                    struct svsm_ca *caa, u64 caa_pa)
>  {
>         struct svsm_pvalidate_call *pc;
>         struct svsm_call call = {};
> @@ -552,10 +553,10 @@ static void __head svsm_pval_4k_page(unsigned long paddr, bool validate)
>          */
>         flags = native_local_irq_save();
>
> -       call.caa = svsm_get_caa();
> +       call.caa = caa;
>
>         pc = (struct svsm_pvalidate_call *)call.caa->svsm_buffer;
> -       pc_pa = svsm_get_caa_pa() + offsetof(struct svsm_ca, svsm_buffer);
> +       pc_pa = caa_pa + offsetof(struct svsm_ca, svsm_buffer);
>
>         pc->num_entries = 1;
>         pc->cur_index   = 0;
> @@ -578,12 +579,12 @@ static void __head svsm_pval_4k_page(unsigned long paddr, bool validate)
>  }
>
>  static void __head pvalidate_4k_page(unsigned long vaddr, unsigned long paddr,
> -                                    bool validate)
> +                                    bool validate, struct svsm_ca *caa, u64 caa_pa)
>  {
>         int ret;
>
>         if (snp_vmpl) {
> -               svsm_pval_4k_page(paddr, validate);
> +               svsm_pval_4k_page(paddr, validate, caa, caa_pa);
>         } else {
>                 ret = pvalidate(vaddr, RMP_PG_SIZE_4K, validate);
>                 if (ret)
> @@ -591,7 +592,8 @@ static void __head pvalidate_4k_page(unsigned long vaddr, unsigned long paddr,
>         }
>  }
>
> -static void __head __page_state_change(unsigned long paddr, enum psc_op op)
> +static void __head __page_state_change(unsigned long paddr, enum psc_op op,
> +                                      struct svsm_ca *caa, u64 caa_pa)
>  {
>         u64 val;
>
> @@ -600,7 +602,7 @@ static void __head __page_state_change(unsigned long paddr, enum psc_op op)
>          * state change in the RMP table.
>          */
>         if (op == SNP_PAGE_STATE_SHARED)
> -               pvalidate_4k_page(paddr, paddr, false);
> +               pvalidate_4k_page(paddr, paddr, false, caa, caa_pa);
>
>         /* Issue VMGEXIT to change the page state in RMP table. */
>         sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
> @@ -616,7 +618,7 @@ static void __head __page_state_change(unsigned long paddr, enum psc_op op)
>          * consistent with the RMP entry.
>          */
>         if (op == SNP_PAGE_STATE_PRIVATE)
> -               pvalidate_4k_page(paddr, paddr, true);
> +               pvalidate_4k_page(paddr, paddr, true, caa, caa_pa);
>  }
>
>  /*
> diff --git a/arch/x86/boot/startup/sev-startup.c b/arch/x86/boot/startup/sev-startup.c
> index 28bf68753580..7a3ad17d06f6 100644
> --- a/arch/x86/boot/startup/sev-startup.c
> +++ b/arch/x86/boot/startup/sev-startup.c
> @@ -132,7 +132,8 @@ noinstr void __sev_put_ghcb(struct ghcb_state *state)
>
>  void __head
>  early_set_pages_state(unsigned long vaddr, unsigned long paddr,
> -                     unsigned long npages, enum psc_op op)
> +                     unsigned long npages, enum psc_op op,
> +                     struct svsm_ca *caa, u64 caa_pa)
>  {
>         unsigned long paddr_end;
>
> @@ -142,7 +143,7 @@ early_set_pages_state(unsigned long vaddr, unsigned long paddr,
>         paddr_end = paddr + (npages << PAGE_SHIFT);
>
>         while (paddr < paddr_end) {
> -               __page_state_change(paddr, op);
> +               __page_state_change(paddr, op, caa, caa_pa);
>
>                 vaddr += PAGE_SIZE;
>                 paddr += PAGE_SIZE;
> @@ -165,7 +166,8 @@ void __head early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
>           * Ask the hypervisor to mark the memory pages as private in the RMP
>           * table.
>           */
> -       early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_PRIVATE);
> +       early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_PRIVATE,
> +                             svsm_get_caa(), svsm_get_caa_pa());
>  }
>
>  void __head early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
> @@ -181,7 +183,8 @@ void __head early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
>                 return;
>
>          /* Ask hypervisor to mark the memory pages shared in the RMP table. */
> -       early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_SHARED);
> +       early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_SHARED,
> +                             svsm_get_caa(), svsm_get_caa_pa());
>  }
>
>  /*
> diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
> index 0e0ddf4c92aa..39bbbea09c24 100644
> --- a/arch/x86/coco/sev/core.c
> +++ b/arch/x86/coco/sev/core.c
> @@ -584,7 +584,8 @@ static void set_pages_state(unsigned long vaddr, unsigned long npages, int op)
>
>         /* Use the MSR protocol when a GHCB is not available. */
>         if (!boot_ghcb)
> -               return early_set_pages_state(vaddr, __pa(vaddr), npages, op);
> +               return early_set_pages_state(vaddr, __pa(vaddr), npages, op,
> +                                            svsm_get_caa(), svsm_get_caa_pa());
>
>         vaddr = vaddr & PAGE_MASK;
>         vaddr_end = vaddr + (npages << PAGE_SHIFT);
> diff --git a/arch/x86/include/asm/sev-internal.h b/arch/x86/include/asm/sev-internal.h
> index e3b203c280aa..08e2cfdef512 100644
> --- a/arch/x86/include/asm/sev-internal.h
> +++ b/arch/x86/include/asm/sev-internal.h
> @@ -55,7 +55,8 @@ DECLARE_PER_CPU(struct sev_es_runtime_data*, runtime_data);
>  DECLARE_PER_CPU(struct sev_es_save_area *, sev_vmsa);
>
>  void early_set_pages_state(unsigned long vaddr, unsigned long paddr,
> -                          unsigned long npages, enum psc_op op);
> +                          unsigned long npages, enum psc_op op,
> +                          struct svsm_ca *ca, u64 caa_pa);
>
>  DECLARE_PER_CPU(struct svsm_ca *, svsm_caa);
>  DECLARE_PER_CPU(u64, svsm_caa_pa);
> --
> 2.49.0.1045.g170613ef41-goog
>

Ard Biesheuvel May 13, 2025, 1:58 p.m. UTC | #6

On Tue, 13 May 2025 at 14:55, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Mon, 12 May 2025 at 20:11, Ard Biesheuvel <ardb+git@google.com> wrote:
> >
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > The early page state change API is mostly only used very early, when
> > only the boot time SVSM calling area is in use. However, this API is
> > also called by the kexec finishing code, which runs very late, and
> > potentially from a different CPU (which uses a different calling area).
> >
> > To avoid pulling the per-CPU SVSM calling area pointers and related SEV
> > state into the startup code, refactor the page state change API so the
> > SVSM calling area virtual and physical addresses can be provided by the
> > caller.
> >
> > No functional change intended.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>
> This patch is broken - I'll send a followup fix asap.

... or actually, the one before it.

Borislav Petkov May 13, 2025, 2:16 p.m. UTC | #7

On Tue, May 13, 2025 at 01:22:16PM +0200, Ingo Molnar wrote:
> Yeah, so the problem is that SEV* is hardware that basically no active 
> tester outside of the vendor (AMD) owns and is testing against 
> development trees AFAICS.

I don't think you even know what you're talking about. Hell, I only recently
gave you the 101 on SEV because you asked me on IRC.

So no, not even close. If you had bothered to ask a search engine of your
choosing, you would've known better.

All the cloud vendors have it and we are getting bug reports and fixes from
them and everyone else that's using it. You could've seen that by doing a git
log on the SEV files in the kernel even.

> I did a quick Git search, and here are a few examples:
> 
> For example, this commit from last summer:
> 
>   6c3211796326 ("x86/sev: Add SNP-specific unaccepted memory support")
> 
> ... was only fixed recently:
> 
>   d54d610243a4 ("x86/boot/sev: Avoid shared GHCB page for early memory acceptance")

Because we pretty much test with huge-page aligned memory sizes...  memory
acceptance tracks pages at the 2mb level. It will accept memory if there is an
unaccepted memory EFI entry that isn't 2mb aligned at the start or end.

So when you have a 4G guest or 16G guest you don't have that. If you specify
4095M for the guest memory, then it will trigger. And since that was done
before SEV was initialized (at least in the EFI stub path, not the
decompressor path) things just didn't work.

> Or this commit from June 2024:
> 
>   34ff65901735 ("x86/sev: Use kernel provided SVSM Calling Areas")
> 
> ... was only fixed a few days ago:
> 
>   f7387eff4bad ("x86/sev: Fix operator precedence in GHCB_MSR_VMPL_REQ_LEVEL
>   macro")

That was "fixed" because we never had to run a multi-VMPL level setup in Linux
yet as we run Linux guests differently with that respect. So we couldn't have
hit it even if we wanted to.

And even in the SVSM testing, Linux never requests a non-zero VMPL, and so it
wasn't caught during testing. Linux will always request VMPL0.

There is a lot of testing of the guest code with Coconut-SVSM and it is
a scenario that doesn't exist.

> Or this commit from June 2024:
> 
>   fcd042e86422 ("x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0")
> 
> ... was fixed a few weeks ago:
> 
>   8ed12ab1319b ("x86/boot/sev: Support memory acceptance in the EFI stub under SVSM")

That's a fix for the above fix d54d610243a4 which relates to the multiple VMPL
thing which we don't do (yet) in Linux.

> Ie. bugfix latencies here were 10+ months.

While doing your git search, did you check my reaction time when fixes are
sent too?

Or you decided to find some random patches with Fixes: tags pointing to SEV
code?

Or are you saying our crystal ball of what's broken is not working fast
enough?

Lemme see:

https://lore.kernel.org/r/c0af2efa-aea4-43aa-b1da-46ac4c50314b@amd.com

This is only the latest test report I requested.

So no, we test the hell of it and as much as possible. What you claim here is
dumbfounded and completely false.

> Note that two of those fixes were from Ard who is working on further 
> robustifying the startup code - a much needed change.

Really? Much needed huh?

Please do explain why is it much needed?

Because the reason Ard is doing it is a different one but maybe
I misunderstood him...

> Ie. when Ard is asking for SEV-SNP testing for WIP series, which he did 
> 10+ days ago, you should not ignore it

More unfounded claims. Here's Tom and me ignoring it:

https://lore.kernel.org/r/836eb6be-926b-dfb4-2c67-f55cba4a072b@amd.com
https://lore.kernel.org/r/20250507095801.GNaBsuqd7m15z0kHji@fat_crate.local
https://lore.kernel.org/r/20250508110800.GBaByQkJwmZlihk6Xp@fat_crate.local
https://lore.kernel.org/r/f4750413-a2e6-15c4-7fa5-2595b509500b@amd.com
https://lore.kernel.org/r/20250505160346.GJaBjhYp09sLZ5AyyJ@fat_crate.local
https://lore.kernel.org/r/20250505164759.GKaBjrv5SI4MX_NiX-@fat_crate.local

Nah, this is not ignoring - this is Tom and me rushing to see whether
something broke because *you* applied stuff on the same day without waiting
for any review!

This is basically you doing whatever the hell you like and not even asking
people working on that code.

And you completely ignored my requests on IRC to wait with that code a bit so
that we can take a look.

> ... or if you do ignore his request for testing, you should not complain
> about the changes being merged eventually, once they pass review & testing
> on non-SEV platforms.

What review?

Show me.

Commit-ID:     bd4a58beaaf1f4aff025282c6e8b130bdb4a29e4
Gitweb:        https://git.kernel.org/tip/bd4a58beaaf1f4aff025282c6e8b130bdb4a29e4
Author:        Ard Biesheuvel <ardb@kernel.org>
AuthorDate:    Sun, 04 May 2025 11:52:31 +02:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Sun, 04 May 2025 15:27:23 +02:00

What review do you expect to see in 3 hours on a Sunday?!?!

> If you didn't have time to personally test Ard's -v2 series since May 
> 2, that's OK

It was May 4th, it was a Sunday. And you can see my replies on the next
days.

> I can merge these proposed changes in an RFT branch so that it gets tested

How about you wait first for those patches to be reviewed like every other
patchset on lkml and then take them?

I mean, normal review process. Remember?

The thing we all are supposed to do...

> In other words: please no "gatekeeping".

Sorry, zero gatekeeping here.

> Please don't force Ard into a catch-22 situation where he cannot test the
> patches on SEV-SNP, but you are blocking these x86 startup code changes on
> the grounds that they weren't tested on SEV-SNP ...

I'm helping Ard to get and setup a SEV machine. And I'm testing too.

If you had asked, you would've learned all that but you haven't.

> This request for testing was ignored AFAICS.

Review, then test. As always. You know that.

I keep telling you that but I don't think you're hearing me - you just do
whatever the hell you like.

> Sure: -v2 was sent more than 10 days ago, and the testing request was 
> ignored AFAICS. Do 10 days count as 'ample time'?

I am reviewing. I can't just drop everything and concentrate only on this.

Hell, that set is not even ready yet:

https://lore.kernel.org/r/CAMj1kXH5C6FzMyrki_23TTk_Yma5NJdHTo-nv4DmZoz_qaGbVQ@mail.gmail.com

Now you tell me: why are *YOU* in such a hurry with that set?

Ard Biesheuvel May 13, 2025, 3:01 p.m. UTC | #8

On Tue, 13 May 2025 at 15:17, Borislav Petkov <bp@alien8.de> wrote:
>
> On Tue, May 13, 2025 at 01:22:16PM +0200, Ingo Molnar wrote:
...
> > Note that two of those fixes were from Ard who is working on further
> > robustifying the startup code - a much needed change.
>
> Really? Much needed huh?
>
> Please do explain why is it much needed?
>
> Because the reason Ard is doing it is a different one but maybe
> I misunderstood him...
>

I will refrain from inserting myself into the intra-tip review and
testing policy debate, but let me at least provide a quick recap of
what I am doing here and why.

Since commit

   c88d71508e36 x86/boot/64: Rewrite startup_64() in C

dated Jun 6 2017, we have been using C code on the boot path in a way
that is not supported by the toolchain, i.e., to execute non-PIC C
code from a mapping of memory that is different from the one provided
to the linker. It should have been obvious at the time that this was a
bad idea, given the need to sprinkle fixup_pointer() calls left and
right to manipulate global variables (including non-pointer variables)
without crashing.

This C startup code has been expanding, and in particular, the SEV-SNP
startup code has been expanding over the past couple of years, and
grown many of these warts, where the C code needs to use special
annotations or helpers to access global objects.

Google uses Clang internally, and as expected, it does not behave
quite like GCC in this regard either. The result is that the SEV-SNP
boot tended to break in cryptic ways with Clang built kernels, due to
absolute references in the startup code that runs before those
absolute references are mapped.

I've done a preliminary pass upstream with RIP_REL_REF() and
rip_rel_ptr() and the use of the .head.text section for startup code
to ensure that we detect such issues at build time, and it has already
resulted in a number of true positives where the code in question
would have failed at boot time. At this point, I'm not aware of any
issues caused by absolute references going undetected.

However, Linus kindly indicated that the resulting diagnostics
produced by the relocs tool do not meet his high standards, and so I
proposed another approach, which I am implementing now (see cover
letter for details). Note that this approach is also much more robust,
as annotating things as __head by hand to get it emitted into the
section to which the diagnostics are applied is obviously not
foolproof.

Fixing the existing 5-level paging and kernel mapping code was rather
straight-forward. However, splitting up the SEV-SNP code has been
rather challenging due to the way it was put together, i.e., as a
single source file used everywhere, and to which additional
functionality has been added piecemeal (i.e., the SVSM support).

It is obvious that these changes should be tested before being merged,
hence the RFT in the subject. And I have been struggling a bit to get
access to usable hardware. (I do have access to internal development
systems, but those do not fit the 'usable' description by any measure,
given that I have to go through the cloud VM orchestration APIs to
test boot a simple kernel image).

What Boris might allude to is the fact that some of these changes also
form a prerequisite for being able to construct a generic EFI zboot
image for x86, which is a long-term objective that I am working on in
the background. But this is not the main reason.

In any case, there is no urgency wrt these changes as far as I am
concerned, and given that I already found an issue myself with v3,
perhaps it is better if we disregard it for the time being, and we can
come back to it for the next cycle. In the mean time, I can compare
notes with Boris and Tom directly to ensure that this is in the right
shape, and perhaps we could at least fix the pgtable_l5_enabled() mess
as well (for which I sent out a RFC/v3 today).

Borislav Petkov May 13, 2025, 4:44 p.m. UTC | #9

On Tue, May 13, 2025 at 04:01:08PM +0100, Ard Biesheuvel wrote:
> I will refrain from inserting myself into the intra-tip review and
> testing policy debate, but let me at least provide a quick recap of
> what I am doing here and why.

Thanks for taking the time - much appreciated!

> Since commit
> 
>    c88d71508e36 x86/boot/64: Rewrite startup_64() in C
> 
> dated Jun 6 2017, we have been using C code on the boot path in a way
> that is not supported by the toolchain, i.e., to execute non-PIC C
> code from a mapping of memory that is different from the one provided
> to the linker. It should have been obvious at the time that this was a
> bad idea, given the need to sprinkle fixup_pointer() calls left and
> right to manipulate global variables (including non-pointer variables)
> without crashing.
> 
> This C startup code has been expanding, and in particular, the SEV-SNP
> startup code has been expanding over the past couple of years, and
> grown many of these warts, where the C code needs to use special
> annotations or helpers to access global objects.
> 
> Google uses Clang internally, and as expected, it does not behave
> quite like GCC in this regard either. The result is that the SEV-SNP
> boot tended to break in cryptic ways with Clang built kernels, due to
> absolute references in the startup code that runs before those
> absolute references are mapped.

Oh look, Google is using SEV-SNP too. Non! Si! Oh!

https://www.youtube.com/watch?v=pFjSDM6D500

> I've done a preliminary pass upstream with RIP_REL_REF() and
> rip_rel_ptr() and the use of the .head.text section for startup code
> to ensure that we detect such issues at build time, and it has already
> resulted in a number of true positives where the code in question
> would have failed at boot time. At this point, I'm not aware of any
> issues caused by absolute references going undetected.
> 
> However, Linus kindly indicated that the resulting diagnostics
> produced by the relocs tool do not meet his high standards, and so I
> proposed another approach, which I am implementing now (see cover
> letter for details). Note that this approach is also much more robust,
> as annotating things as __head by hand to get it emitted into the
> section to which the diagnostics are applied is obviously not
> foolproof.

AFAIR, this is what you've done for arm64 too, right?

> Fixing the existing 5-level paging and kernel mapping code was rather
> straight-forward. However, splitting up the SEV-SNP code has been
> rather challenging due to the way it was put together, i.e., as a
> single source file used everywhere, and to which additional
> functionality has been added piecemeal (i.e., the SVSM support).
> 
> It is obvious that these changes should be tested before being merged,

Preach!

> hence the RFT in the subject. And I have been struggling a bit to get
> access to usable hardware. (I do have access to internal development
> systems, but those do not fit the 'usable' description by any measure,
> given that I have to go through the cloud VM orchestration APIs to
> test boot a simple kernel image).
> 
> What Boris might allude to is the fact that some of these changes also
> form a prerequisite for being able to construct a generic EFI zboot
> image for x86, which is a long-term objective that I am working on in
> the background. But this is not the main reason.
> 
> In any case, there is no urgency wrt these changes as far as I am

THANK YOU!

So basically we can all slow down and do normal review and testing. Without
any of that hurried patching back'n'forth. And we didn't need any of that
grief!

Basically something which I've been trying to say from the very beginning but
no one listens to me!

Geez.

> concerned, and given that I already found an issue myself with v3,
> perhaps it is better if we disregard it for the time being, and we can
> come back to it for the next cycle. In the mean time, I can compare
> notes with Boris and Tom directly to ensure that this is in the right
> shape, and perhaps we could at least fix the pgtable_l5_enabled() mess
> as well (for which I sent out a RFC/v3 today).

Thanks man, let's also sync on IRC.

We have enough time to run the set through all our testing pile and shake out
any outstanding issues.

Ard Biesheuvel May 13, 2025, 9:31 p.m. UTC | #10

On Tue, 13 May 2025 at 17:44, Borislav Petkov <bp@alien8.de> wrote:
>
...
> > I've done a preliminary pass upstream with RIP_REL_REF() and
> > rip_rel_ptr() and the use of the .head.text section for startup code
> > to ensure that we detect such issues at build time, and it has already
> > resulted in a number of true positives where the code in question
> > would have failed at boot time. At this point, I'm not aware of any
> > issues caused by absolute references going undetected.
> >
> > However, Linus kindly indicated that the resulting diagnostics
> > produced by the relocs tool do not meet his high standards, and so I
> > proposed another approach, which I am implementing now (see cover
> > letter for details). Note that this approach is also much more robust,
> > as annotating things as __head by hand to get it emitted into the
> > section to which the diagnostics are applied is obviously not
> > foolproof.
>
> AFAIR, this is what you've done for arm64 too, right?
>

The new approach proposed in this series is what I implemented before
for arm64, yes.

Ingo Molnar May 14, 2025, 6:20 a.m. UTC | #11

* Borislav Petkov <bp@alien8.de> wrote:

> > Note that two of those fixes were from Ard who is working on 
> > further robustifying the startup code - a much needed change.
> 
> Please do explain why is it much needed?

See Ard's excellent description:

    https://lore.kernel.org/r/CAMj1kXEzKEuePEiHB+HxvfQbFz0sTiHdn4B++zVBJ2mhkPkQ4Q@mail.gmail.com

    This C startup code has been expanding, and in particular, the SEV-SNP
    startup code has been expanding over the past couple of years, and
    grown many of these warts, where the C code needs to use special
    ^^^^^^^^^^^^^^^^^^^^^^^^^^
    annotations or helpers to access global objects.

    Google uses Clang internally, and as expected, it does not behave
    quite like GCC in this regard either. The result is that the SEV-SNP
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    boot tended to break in cryptic ways with Clang built kernels, due to
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    absolute references in the startup code that runs before those
    absolute references are mapped.

    I've done a preliminary pass upstream with RIP_REL_REF() and
    rip_rel_ptr() and the use of the .head.text section for startup code
    to ensure that we detect such issues at build time, and it has already
    resulted in a number of true positives where the code in question
    would have failed at boot time. At this point, I'm not aware of any
    issues caused by absolute references going undetected.

    However, Linus kindly indicated that the resulting diagnostics
    produced by the relocs tool do not meet his high standards, and so I
    proposed another approach, which I am implementing now (see cover
    letter for details). Note that this approach is also much more robust,
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    as annotating things as __head by hand to get it emitted into the
    section to which the diagnostics are applied is obviously not
    foolproof.

    Fixing the existing 5-level paging and kernel mapping code was rather
    straight-forward. However, splitting up the SEV-SNP code has been
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    rather challenging due to the way it was put together, i.e., as a
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    single source file used everywhere, and to which additional
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    functionality has been added piecemeal (i.e., the SVSM support).
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I've highlighted all the robustness problems and design flaws of the 
existing SEV* startup code...

> Really? Much needed huh?

Your failure to follow and understand this series is not an excuse for 
the flippant and condescending tone you are using in your replies ...

Ard is being very generous in his responses, and I'll defer to him as 
to the timing of this series: v6.17 is perfectly fine too.

Thanks,

	Ingo

Ingo Molnar May 14, 2025, 6:32 a.m. UTC | #12

* Ard Biesheuvel <ardb@kernel.org> wrote:

> On Tue, 13 May 2025 at 15:17, Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Tue, May 13, 2025 at 01:22:16PM +0200, Ingo Molnar wrote:
> ...
> > > Note that two of those fixes were from Ard who is working on further
> > > robustifying the startup code - a much needed change.
> >
> > Really? Much needed huh?
> >
> > Please do explain why is it much needed?
> >
> > Because the reason Ard is doing it is a different one but maybe
> > I misunderstood him...
> >
> 
> I will refrain from inserting myself into the intra-tip review and
> testing policy debate, but let me at least provide a quick recap of
> what I am doing here and why.
> 
> Since commit
> 
>    c88d71508e36 x86/boot/64: Rewrite startup_64() in C
> 
> dated Jun 6 2017, we have been using C code on the boot path in a way
> that is not supported by the toolchain, i.e., to execute non-PIC C
> code from a mapping of memory that is different from the one provided
> to the linker. It should have been obvious at the time that this was a
> bad idea, given the need to sprinkle fixup_pointer() calls left and
> right to manipulate global variables (including non-pointer variables)
> without crashing.
> 
> This C startup code has been expanding, and in particular, the SEV-SNP
> startup code has been expanding over the past couple of years, and
> grown many of these warts, where the C code needs to use special
> annotations or helpers to access global objects.
> 
> Google uses Clang internally, and as expected, it does not behave
> quite like GCC in this regard either. The result is that the SEV-SNP
> boot tended to break in cryptic ways with Clang built kernels, due to
> absolute references in the startup code that runs before those
> absolute references are mapped.
> 
> I've done a preliminary pass upstream with RIP_REL_REF() and
> rip_rel_ptr() and the use of the .head.text section for startup code
> to ensure that we detect such issues at build time, and it has already
> resulted in a number of true positives where the code in question
> would have failed at boot time. At this point, I'm not aware of any
> issues caused by absolute references going undetected.
> 
> However, Linus kindly indicated that the resulting diagnostics
> produced by the relocs tool do not meet his high standards, and so I
> proposed another approach, which I am implementing now (see cover
> letter for details). Note that this approach is also much more robust,
> as annotating things as __head by hand to get it emitted into the
> section to which the diagnostics are applied is obviously not
> foolproof.

Exactly.

> Fixing the existing 5-level paging and kernel mapping code was rather 
> straight-forward. However, splitting up the SEV-SNP code has been 
> rather challenging due to the way it was put together, i.e., as a 
> single source file used everywhere, and to which additional 
> functionality has been added piecemeal (i.e., the SVSM support).

Yeah.

> It is obvious that these changes should be tested before being merged,
> hence the RFT in the subject. And I have been struggling a bit to get
> access to usable hardware. (I do have access to internal development
> systems, but those do not fit the 'usable' description by any measure,
> given that I have to go through the cloud VM orchestration APIs to
> test boot a simple kernel image).

:-/ This is one of the reasons why bugs have such long latencies here.

For example it appears nobody has run kdump on SEV-SNP since last 
August:

  d2062cc1b1c3 x86/sev: Do not touch VMSA pages during SNP guest memory kdump

  ...
    It then results in unrecoverable #NPF/RMP faults as the VMSA page is
    marked busy/in-use when the vCPU is running and subsequently a causes
    guest softlockup/hang.
  ...

  1 file changed, 158 insertions(+), 86 deletions(-)

Yet lack of testability of your SEV-SNP series is still somehow 
blocking ongoing development work.

> What Boris might allude to is the fact that some of these changes also
> form a prerequisite for being able to construct a generic EFI zboot
> image for x86, which is a long-term objective that I am working on in
> the background. But this is not the main reason.
> 
> In any case, there is no urgency wrt these changes as far as I am
> concerned, and given that I already found an issue myself with v3,
> perhaps it is better if we disregard it for the time being, and we can
> come back to it for the next cycle. In the mean time, I can compare
> notes with Boris and Tom directly to ensure that this is in the right
> shape, and perhaps we could at least fix the pgtable_l5_enabled() mess
> as well (for which I sent out a RFC/v3 today).

You are being exceedingly generous here, but obviously boot code 
changes need quite a bit of testing, and v6.17 (or later) is perfectly 
fine too.

We could perhaps do the mechanical code movement to 
arch/x86/boot/startup/ alone, without any of the followup functional 
changes. This would reduce the cross section of the riskiest part of 
your series substantially. If that sounds good to you, please send a 
series for review.

Thanks,

	Ingo

Ard Biesheuvel May 14, 2025, 7:41 a.m. UTC | #13

On Wed, 14 May 2025 at 07:32, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Ard Biesheuvel <ardb@kernel.org> wrote:
>
...
> > In any case, there is no urgency wrt these changes as far as I am
> > concerned, and given that I already found an issue myself with v3,
> > perhaps it is better if we disregard it for the time being, and we can
> > come back to it for the next cycle. In the mean time, I can compare
> > notes with Boris and Tom directly to ensure that this is in the right
> > shape, and perhaps we could at least fix the pgtable_l5_enabled() mess
> > as well (for which I sent out a RFC/v3 today).
>
...
> We could perhaps do the mechanical code movement to
> arch/x86/boot/startup/ alone, without any of the followup functional
> changes. This would reduce the cross section of the riskiest part of
> your series substantially.

The first phase of this work, which is already queued up, was to move
all of the source files that were using RIP_REL_REF() into
arch/x86/boot/startup to be built with -fPIC so that RIP_REL_REF()
could be removed.

The current phase is to separate code that really needs to live under
startup/ from code that doesn't. This is the bit that was
straight-forward for mapping the kernel (including the SME encryption
pieces) because they were already in dedicated source files, but not
so straight-forward for SEV-SNP.

In reality, the mechanical code movement in this phase is mostly in
the opposite direction, where things are separated into startup and
non-startup code at a high level of detail, and the latter is moved
out again.

> If that sounds good to you, please send a
> series for review.
>

Not sure what happened to the tip/x86/boot branch in the meantime, but
assuming that what was already queued up is still scheduled for the
next cycle, I don't think there are any parts of this series that
could be meaningfully rearranged. IOW, the SEV-SNP refactoring needs
to be completed first, which accounts for most of the code movement.
Then, implementing the confined symbol space is just a couple of
patches on top.

Borislav Petkov May 14, 2025, 8:17 a.m. UTC | #14

On Wed, May 14, 2025 at 08:20:31AM +0200, Ingo Molnar wrote:
> I've highlighted all the robustness problems and design flaws of the 
> existing SEV* startup code...

You can highlight all you want - nothing warrants rushing those in without
testing. Period. No matter how much you try to spin it and flip it.

> Your failure to follow and understand this series is not an excuse for 
> the flippant and condescending tone you are using in your replies ...

You must be confusing me with someone else to constantly give me that
condescending schoolteacher tone.

If you do shit like you do, you will get called on it until you behave
according to the rules we all have agreed upon.

Borislav Petkov May 14, 2025, 8:21 a.m. UTC | #15

On Wed, May 14, 2025 at 08:20:31AM +0200, Ingo Molnar wrote:
> See Ard's excellent description:

Btw, you somehow conveniently ignored the patch Ard mentioned started all
this: a patch you committed without any review.

Sounds familiar?

Thomas Gleixner May 14, 2025, 9:54 a.m. UTC | #16

Ingo!

On Wed, May 14 2025 at 08:20, Ingo Molnar wrote:
> * Borislav Petkov <bp@alien8.de> wrote:
> Your failure to follow and understand this series is not an excuse for 
> the flippant and condescending tone you are using in your replies ...

Can you please stop right there?

I've been observing this for a while now and to be blatantly honest,
_you_ are the one who is constantly failing to work in a team and _you_
are the one who acts like a 19'th century school master.

Your outbreaks of showing up and applying random patches fresh from the
press, your absolute refusal to slow down when asked for, is annoying as
hell to everyone working in and against the tip tree.

You can do whatever you want on your private mingo/*.git branches, but
this frenzy of shoving crap into tip has to stop once and forever.

Borislav, Dave and Peter are doing a great job in taming the influx and
reviewing the patch firehose, but there are limitations on bandwidth and
you cannot demand that everyone drops everything just because you
declare that a particular patch series is the most important thing since
the invention of sliced bread.

You are not the one setting the rules of how the tip team works together
and if you can't agree with the rest of the team, then either we all sit
together and hash it out or you step back.

Continuing this sh*tshow, which is going on for way too long now (hint:
years), is not an option at all.

Thanks,

        Thomas

Borislav Petkov May 14, 2025, 5:21 p.m. UTC | #17

On Mon, May 12, 2025 at 09:08:35PM +0200, Ard Biesheuvel wrote:
> This is somewhat intrusive, as there are many data objects that are
> referenced both by startup code and by ordinary code, and an alias needs
> to be emitted for each of those. If startup code references anything
> that has not been made available to it explicitly, a build time link
> error will occur.

Makes me wonder: what will be our rule for what should be made available to
startup code, what should be moved to startup code etc....

I guess as less as possible but past experience teaches us differently.
I guess applying the usual skeptical approach should be sane enough...

We'll see.

Thx.

Ard Biesheuvel May 14, 2025, 5:37 p.m. UTC | #18

On Wed, 14 May 2025 at 18:21, Borislav Petkov <bp@alien8.de> wrote:
>
> On Mon, May 12, 2025 at 09:08:35PM +0200, Ard Biesheuvel wrote:
> > This is somewhat intrusive, as there are many data objects that are
> > referenced both by startup code and by ordinary code, and an alias needs
> > to be emitted for each of those. If startup code references anything
> > that has not been made available to it explicitly, a build time link
> > error will occur.
>
> Makes me wonder: what will be our rule for what should be made available to
> startup code, what should be moved to startup code etc....
>

The rule is really that you can no longer randomly call other code
without being forced to consider carefully what else gets pulled in,
and whether or not that code is guaranteed to behave correctly when
being called via the 1:1 mapping.

> I guess as less as possible but past experience teaches us differently.
> I guess applying the usual skeptical approach should be sane enough...
>

Basically, the first order of business when calling the kernel via the
1:1 mapping is to create the kernel virtual mapping in the page
tables. It is just really unfortunate that SEV-SNP requires so much
prep work before we can even map the kernel.

I don't anticipate that this code will grow a lot after I'm done with it.

Borislav Petkov May 14, 2025, 6:53 p.m. UTC | #19

On Wed, May 14, 2025 at 06:37:14PM +0100, Ard Biesheuvel wrote:
> The rule is really that you can no longer randomly call other code
> without being forced to consider carefully what else gets pulled in,
> and whether or not that code is guaranteed to behave correctly when
> being called via the 1:1 mapping.

I like "consider carefully". Right now, the decompressor is a mess and
untangling it is a losing game.

> Basically, the first order of business when calling the kernel via the
> 1:1 mapping is to create the kernel virtual mapping in the page
> tables. It is just really unfortunate that SEV-SNP requires so much
> prep work before we can even map the kernel.
> 
> I don't anticipate that this code will grow a lot after I'm done with it.

Right, ok.

Lemme keep looking.

Thx.

Ingo Molnar May 15, 2025, 7:17 a.m. UTC | #20

* Ard Biesheuvel <ardb@kernel.org> wrote:

> On Wed, 14 May 2025 at 07:32, Ingo Molnar <mingo@kernel.org> wrote:
> >
> >
> > * Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> ...
> > > In any case, there is no urgency wrt these changes as far as I am
> > > concerned, and given that I already found an issue myself with v3,
> > > perhaps it is better if we disregard it for the time being, and we can
> > > come back to it for the next cycle. In the mean time, I can compare
> > > notes with Boris and Tom directly to ensure that this is in the right
> > > shape, and perhaps we could at least fix the pgtable_l5_enabled() mess
> > > as well (for which I sent out a RFC/v3 today).
> >
> ...
> > We could perhaps do the mechanical code movement to
> > arch/x86/boot/startup/ alone, without any of the followup functional
> > changes. This would reduce the cross section of the riskiest part of
> > your series substantially.
> 
> The first phase of this work, which is already queued up, was to move
> all of the source files that were using RIP_REL_REF() into
> arch/x86/boot/startup to be built with -fPIC so that RIP_REL_REF()
> could be removed.
> 
> The current phase is to separate code that really needs to live under
> startup/ from code that doesn't. This is the bit that was
> straight-forward for mapping the kernel (including the SME encryption
> pieces) because they were already in dedicated source files, but not
> so straight-forward for SEV-SNP.
> 
> In reality, the mechanical code movement in this phase is mostly in
> the opposite direction, where things are separated into startup and
> non-startup code at a high level of detail, and the latter is moved
> out again.
> 
> > If that sounds good to you, please send a
> > series for review.
> >
> 
> Not sure what happened to the tip/x86/boot branch in the meantime,

It got merged into tip:x86/core. I wrote you an email about it 
yesterday, should be somewhere in your inbox. :)

> [...] but assuming that what was already queued up is still scheduled 
> for the next cycle, I don't think there are any parts of this series 
> that could be meaningfully rearranged. IOW, the SEV-SNP refactoring 
> needs to be completed first, which accounts for most of the code 
> movement.

Understood.

Thanks,

	Ingo

[RFT,v3,00/21] x86: strict separation of startup code

Message

Comments