
[v10,00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

Message ID 20231016132819.1002933-1-michael.roth@amd.com

Message

Michael Roth Oct. 16, 2023, 1:27 p.m. UTC
This patchset is also available at:

  https://github.com/amdese/linux/commits/snp-host-v10

and is based on top of the following series:

  "[PATCH RFC gmem v1 0/8] KVM: gmem hooks/changes needed for x86 (other archs?)"
  https://lore.kernel.org/kvm/20231016115028.996656-1-michael.roth@amd.com/

which in turn is based on the KVM-x86 staging tree for guest_memfd:

  https://github.com/kvm-x86/linux/commits/guest_memfd


== OVERVIEW ==

This patchset implements SEV-SNP hypervisor support for Linux. It
relies on the gmem changes noted above, which are still in an RFC
state, but other than those aspects, the series is being targeted for
inclusion in the KVM x86 tree to support running SEV-SNP guests on AMD
EPYC systems utilizing Zen 3 and newer microarchitectures.

More details on what SEV-SNP is and how it works are available below
under "BACKGROUND".


== PATCH LAYOUT ==

PATCH 01-02: Dependencies for patch #3 that are already upstream but not in
             current guest_memfd staging tree
PATCH 03   : General SEV-ES fix for MSR_IA32_XSS interception that fixes a
             minor bug for SEV-ES, but a more severe one for SNP guests.
             Planning to also submit this separately as an SEV-ES fix.
PATCH 04-19: Host SNP initialization code and CCP driver prep for handling
             SNP cmds
PATCH 20-43: general SNP enablement for KVM and CCP driver
PATCH 47-50: misc handling for IOMMU support, guest request handling, debug
             infrastructure, and kdump-related handling.


== TESTING ==

For testing this via QEMU, use the following tree:

  https://github.com/amdese/qemu/commits/snp-latest-gmem-v12

SEV-SNP with gmem enabled:

  # set discard=none to disable discarding memory post-conversion (faster
  # boot times, but increased memory usage)
  qemu-system-x86_64 -cpu EPYC-Milan-v2 \
    -object memory-backend-memfd-private,id=ram1,size=2G,share=true \
    -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,discard=both \
    -machine q35,confidential-guest-support=sev0,memory-backend=ram1,kvm-type=protected \
    ...

KVM selftests for UPM:

  cd $kernel_src_dir
  make -C tools/testing/selftests TARGETS="kvm" EXTRA_CFLAGS="-DDEBUG -I<path to kernel headers>"
  sudo tools/testing/selftests/kvm/x86_64/private_mem_conversions_test


== BACKGROUND (SEV-SNP) ==

This part of the Secure Nested Paging (SEV-SNP) series focuses on the
changes required in a host OS for SEV-SNP support. The series builds upon
the SEV-SNP guest support that is now part of mainline.

This series provides the basic building blocks to support booting SEV-SNP
VMs; it does not cover all the security enhancements introduced by SEV-SNP,
such as interrupt protection.

The CCP driver is enhanced to provide new APIs that use the SEV-SNP
specific commands defined in the SEV-SNP firmware specification. The KVM
driver uses those APIs to create and manage SEV-SNP guests.

The GHCB specification version 2 introduces a new set of NAE (Non-Automatic
Exit) events that are used by the SEV-SNP guest to communicate with the
hypervisor. The series provides support to handle the following new NAE
events:

- Register GHCB GPA
- Page State Change Request
- Hypervisor feature support
- Guest message request
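Register GHCB GPA, the Hypervisor feature query and an MSR-based form of
Page State Change can also be issued through the GHCB MSR protocol, where
the low 12 bits of the GHCB MSR hold a request code and the upper bits carry
data. The minimal C sketch below decodes two of those requests. The request
codes and bit positions reflect our reading of the GHCB specification v2
(the GHCB_MSR_* names mirror arch/x86/include/asm/sev-common.h; verify
against the tree you build), while the struct and helper names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* GHCBInfo request codes (GHCB spec v2 MSR protocol, as we understand it). */
#define GHCB_MSR_INFO_MASK     0xfffULL
#define GHCB_MSR_REG_GPA_REQ   0x012
#define GHCB_MSR_PSC_REQ       0x014
#define GHCB_MSR_HV_FT_REQ     0x080

/* Page State Change operations carried in bits 55:52 of an MSR PSC request. */
#define SNP_PAGE_STATE_PRIVATE 1
#define SNP_PAGE_STATE_SHARED  2

/* Hypothetical decoded form of an MSR-based Page State Change request. */
struct msr_psc_req {
	uint64_t gfn;  /* bits 51:12 */
	unsigned op;   /* bits 55:52 */
};

/* Request code: always the low 12 bits of the GHCB MSR value. */
static unsigned ghcb_msr_info(uint64_t val)
{
	return (unsigned)(val & GHCB_MSR_INFO_MASK);
}

/* Register GHCB GPA: the GFN of the guest's GHCB occupies bits 63:12. */
static uint64_t ghcb_msr_reg_gpa_gfn(uint64_t val)
{
	return val >> 12;
}

/* MSR-based PSC: 40-bit GFN in bits 51:12, operation in bits 55:52. */
static struct msr_psc_req ghcb_msr_psc_decode(uint64_t val)
{
	struct msr_psc_req req = {
		.gfn = (val >> 12) & ((1ULL << 40) - 1),
		.op  = (unsigned)((val >> 52) & 0xf),
	};
	return req;
}
```

The full-GHCB variants (Page State Change with many entries, Guest message
requests) use a shared GHCB page instead and are not covered by this sketch.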

When pages are marked as guest-owned in the RMP table, they are assigned
to a specific guest/ASID, as well as a specific GFN within the guest. Any
attempt to access such a page as a different guest/ASID, or through a
different GFN within a guest/ASID, will result in an RMP nested page fault.
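As a rough illustration of that ownership check, the sketch below models an
RMP entry with only its ownership fields. The struct and function names are
hypothetical, and the real RMP entry layout (defined by the AMD APM) has
more fields, such as page size and the immutable bit; this is a
simplification, not the hardware algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified RMP entry: ownership fields only. */
struct rmp_entry_sketch {
	bool assigned;   /* guest-owned (true) vs hypervisor-owned/shared */
	uint32_t asid;   /* owning guest's ASID when assigned */
	uint64_t gpa;    /* page-aligned guest-physical address it is bound to */
};

/*
 * Returns true if a private guest access to this page through (asid, gpa)
 * would pass the RMP check, false if it would raise an RMP nested page fault.
 */
static bool rmp_check_private_access(const struct rmp_entry_sketch *e,
                                     uint32_t asid, uint64_t gpa)
{
	if (!e->assigned)
		return false;                 /* page is not guest-owned */
	if (e->asid != asid)
		return false;                 /* page belongs to another guest */
	return e->gpa == (gpa & ~0xfffULL);   /* must match the bound GFN */
}
```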

Prior to accessing a guest-owned page, the guest must validate it with the
special PVALIDATE instruction, which sets the validated bit in the page's
RMP entry. This is the only way to set the validated bit outside of the
initial pre-encrypted guest payload/image; any attempt outside the guest to
modify the RMP entry from that point forward will result in the validated
bit being cleared, at which point the guest will trigger an exception if it
attempts to access that page, so it can be made aware of possible tampering.
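The validated-bit lifecycle can be modeled as a tiny state machine. This is
a hypothetical sketch: it assumes any host-side RMPUPDATE clears the bit, as
described above, and reports a failed guest access as a boolean rather than
the exception real hardware raises (a #VC, per our reading of the APM).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the relevant bits of one RMP entry. */
struct rmp_page_sketch {
	bool assigned;
	bool validated;
};

/* Guest side: PVALIDATE on a guest-owned page sets the validated bit. */
static bool guest_pvalidate(struct rmp_page_sketch *p)
{
	if (!p->assigned)
		return false;       /* sketch: treat as a failed PVALIDATE */
	p->validated = true;
	return true;
}

/* Host side: any RMPUPDATE of the entry clears the validated bit. */
static void host_rmpupdate(struct rmp_page_sketch *p, bool assigned)
{
	p->assigned = assigned;
	p->validated = false;
}

/* Guest side: a private access only succeeds on a validated page. */
static bool guest_private_access_ok(const struct rmp_page_sketch *p)
{
	return p->assigned && p->validated;
}
```

After a host RMPUPDATE, the guest's next access fails until it re-validates
the page, which is what lets it notice tampering.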

One exception to this is the initial guest payload, which is pre-validated
by the firmware prior to launching. The guest can use Guest Message requests
to fetch an attestation report, which includes the measurement of the
initial image, so that the guest can verify it was booted with the expected
image/environment.

After boot, guests can use Page State Change requests to switch pages
between shared/hypervisor-owned and private/guest-owned to share data for
things like DMA, virtio buffers, and other GHCB requests.

In this implementation of SEV-SNP, private guest memory is managed by a new
kernel framework called guest_memfd (gmem). With gmem, a new
KVM_SET_MEMORY_ATTRIBUTES KVM ioctl has been added to tell the KVM
MMU whether a particular GFN should be backed by shared (normal) memory or
private (gmem-allocated) memory. To tie into this, Page State Change
requests are forwarded to userspace via KVM_EXIT_VMGEXIT exits, and
userspace then issues the corresponding KVM_SET_MEMORY_ATTRIBUTES call to
set the private/shared state in the KVM MMU.
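A minimal sketch of the userspace side of that flow, assuming the VMM has
already decoded a Page State Change entry from the KVM_EXIT_VMGEXIT exit
into (gfn, npages, op); that decoding step and the helper name
psc_to_mem_attrs() are hypothetical. The struct below mirrors the fields of
upstream struct kvm_memory_attributes and KVM_MEMORY_ATTRIBUTE_PRIVATE so
the example compiles without new kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of upstream struct kvm_memory_attributes (uapi linux/kvm.h). */
struct kvm_memory_attributes_mirror {
	uint64_t address;
	uint64_t size;
	uint64_t attributes;
	uint64_t flags;
};
#define ATTR_PRIVATE (1ULL << 3)   /* mirrors KVM_MEMORY_ATTRIBUTE_PRIVATE */

#define SNP_PAGE_STATE_PRIVATE 1
#define SNP_PAGE_STATE_SHARED  2
#define PAGE_SHIFT_4K 12

/*
 * Hypothetical VMM helper: turn one PSC entry into the argument for
 * KVM_SET_MEMORY_ATTRIBUTES. Returns 0 on success, -1 for an unknown op.
 */
static int psc_to_mem_attrs(uint64_t gfn, uint64_t npages, unsigned op,
                            struct kvm_memory_attributes_mirror *attrs)
{
	attrs->address = gfn << PAGE_SHIFT_4K;
	attrs->size = npages << PAGE_SHIFT_4K;
	attrs->flags = 0;
	if (op == SNP_PAGE_STATE_PRIVATE)
		attrs->attributes = ATTR_PRIVATE;
	else if (op == SNP_PAGE_STATE_SHARED)
		attrs->attributes = 0;     /* PRIVATE cleared means shared */
	else
		return -1;
	return 0;
}
```

With the real definitions from <linux/kvm.h>, a VMM would then issue
ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs) on the VM file descriptor.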

The gmem / KVM MMU hooks implemented in this series will then update the RMP
table entries for the backing PFNs to set them to guest-owned/private when
mapping private pages into the guest via KVM MMU, or use the normal KVM MMU
handling in the case of shared pages where the corresponding RMP table
entries are left in the default shared/hypervisor-owned state.
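The following hypothetical sketch models what the gmem hooks for
initializing and invalidating private pages do to the RMP table. The table,
struct and function names are made up for illustration, and the real hooks
issue RMPUPDATE instructions rather than writing to an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_PFNS 16

/* Hypothetical, simplified RMP table indexed by PFN. */
struct rmp_sketch {
	bool assigned;
	uint32_t asid;
	uint64_t gfn;
};

static struct rmp_sketch rmp_table[NR_PFNS];

/*
 * Fault-path sketch: private mappings bind the backing PFN to (asid, gfn);
 * shared mappings take the normal KVM MMU path and leave the RMP entry
 * hypervisor-owned. Returns 0 on success, -1 on a conflicting binding.
 */
static int map_fault_sketch(uint64_t pfn, uint64_t gfn, uint32_t asid,
                            bool is_private)
{
	if (pfn >= NR_PFNS)
		return -1;
	if (!is_private)
		return 0;                 /* RMP entry left untouched */
	if (rmp_table[pfn].assigned) {
		/* Already prepared: only an identical binding is acceptable. */
		if (rmp_table[pfn].asid == asid && rmp_table[pfn].gfn == gfn)
			return 0;
		return -1;
	}
	rmp_table[pfn] = (struct rmp_sketch){ .assigned = true,
					      .asid = asid, .gfn = gfn };
	return 0;
}

/* Invalidation sketch: return the PFN to the default shared state. */
static void invalidate_sketch(uint64_t pfn)
{
	if (pfn < NR_PFNS)
		rmp_table[pfn] = (struct rmp_sketch){ 0 };
}
```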

Feedback/review is very much appreciated!

-Mike


Changes since v9:

 * Split off gmem changes to separate RFC series, drop RFC tag from this series
 * Use 2M RMPUPDATE instructions whenever possible when invalidating/releasing
   gmem pages
 * Tighten up RMP #NPF handling to better differentiate spurious cases from
   unexpected behavior
 * Simplify/optimize logic for determining when 2M NPT private mappings are
   possible
 * Be more consistent with PFN data types and stub return values (Dave)
 * Reduce potential flooding from frequently-printed pr_debug()'s (Dave)
 * Use existing #PF handling paths to catch illegal userspace-generated RMP
   faults (Dave)
 * Improve host kexec/kdump support (Ashish)
 * Reduce overhead from unnecessary WBINVD via MMU notifiers (Ashish)
 * Avoid host crashes during CCP module probe if SNP_INIT* is issued while
   guests are running (Tom L.)
 * Simplify AutoIBRS disablement (Kim, Dave)
 * Avoid unnecessary zeroing in extended guest requests (Alexey)
 * Fix padding in struct sev_user_data_ext_snp_config (Alexey)
 * Report AP creation failures via GHCB error codes rather than inducing #GP in
   guest (Peter)
 * Disallow multiple allocations of snp_context via userspace (Peter)
 * Error out on unsupported SNP policy bits (Tom)
 * Fix snp_leak_pages() stub (Jeremi)
 * Use C99 flexible arrays where appropriate
 * Use helper to handle HVA->PFN conversions prior to dumping RMP entries (Dave)
 * Don't potentially print out all 512 entries when dumping 2MB RMP range (Dave)
 * Don't use a union to dump raw RMP entries, just cast at dump-site (Dave)
 * Don't use helpers to access RMP entry bitfields, use them directly (Dave) 
 * Simplify logic and improve comments for AutoIBRS disablement (Dave)

 # Changes that were split off to separate gmem series
 * Use KVM_X86_SNP_VM to implement SNP-specific checks on whether a fault was
   shared/private and drop the duplicate memslot lookup (Isaku, Sean)
 * Use Isaku's version of patch to plumb 64-bit #NPF error code (Isaku)
 * Fix up stub for kvm_arch_gmem_invalidate() (Boris)
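To illustrate "use 2M RMPUPDATE instructions whenever possible", the sketch
below counts how many updates are needed to release a PFN range when every
fully covered, 2M-aligned chunk is handled with one 2M operation and the
rest with 4K operations. The helper is hypothetical and ignores other
conditions a real implementation must check, such as whether the backing
memory and the existing RMP entries are actually 2M.

```c
#include <assert.h>
#include <stdint.h>

#define PAGES_PER_2M 512ULL   /* 4K pages per 2M region */

struct rmp_op_counts {
	uint64_t ops_2m;
	uint64_t ops_4k;
};

/* Count RMP updates for PFNs [start, start + npages) (hypothetical helper). */
static struct rmp_op_counts count_rmp_ops(uint64_t start, uint64_t npages)
{
	struct rmp_op_counts c = { 0, 0 };
	uint64_t pfn = start, end = start + npages;

	while (pfn < end) {
		/* Use a 2M op only for an aligned chunk fully inside the range. */
		if (pfn % PAGES_PER_2M == 0 && end - pfn >= PAGES_PER_2M) {
			c.ops_2m++;
			pfn += PAGES_PER_2M;
		} else {
			c.ops_4k++;
			pfn++;
		}
	}
	return c;
}
```

For example, PFNs 510 through 1025 split into two 4K updates, one 2M update
for PFNs 512 to 1023, and two more 4K updates.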

Changes since v8:

 * Rework gmem/UPM hooks based on Sean's latest gmem/UPM tree
 * Move SEV lazy-pinning support out to a separate series which uses this
   series as a prereq instead of the other way around.
 * Re-organize extended guest request patches into 3 patches encompassing
   SEV FD ioctls for host-wide certs, KVM ioctls for per-instance certs,
   and the guest request handling that consumes them. Also move them to
   the top of the series to better separate them from the core SNP patches
   (Alexey, Zhi, Ashish, Dov, Dionna, others)
 * Various other changes/fixups for extended guest request handling (Dov,
   Alexey, Dionna)
 * Use helper to calculate max RMP entry size and improve readability (Dave)
 * Use architecture-independent GPA value for initial VMSA pages
 * Ensure SEV_CMD_SNP_GUEST_REQUEST failures are indicated to guest (Alex)
 * Allocate per-instance certs on-demand (Alex)
 * comment fixup for RMP fault handling (Zhi)
 * commit msg rewording for MSR-based PSCs (Zhi)
 * update SNP command/struct definitions based on 1.54 ABI (Saban)
 * use sev_deactivate_lock around SEV_CMD_SNP_DECOMMISSION (Saban)
 * Various comment/commit fixups (Zhi, Alex, Kim, Vlastimil, Dave, 
 * kexec fixes for newer SNP firmwares (Ashish)
 * Various other fixups and re-ordering of patches.

----------------------------------------------------------------
Ashish Kalra (4):
      x86/sev: Introduce snp leaked pages list
      KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP
      iommu/amd: Add IOMMU_SNP_SHUTDOWN support
      crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump

Brijesh Singh (29):
      x86/cpufeatures: Add SEV-SNP CPU feature
      x86/sev: Add the host SEV-SNP initialization support
      x86/sev: Add RMP entry lookup helpers
      x86/fault: Add helper for dumping RMP entries
      x86/traps: Define RMP violation #PF error code
      x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
      x86/sev: Invalidate pages from the direct map when adding them to the RMP table
      crypto: ccp: Define the SEV-SNP commands
      crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
      crypto: ccp: Provide API to issue SEV and SNP commands
      crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
      crypto: ccp: Handle the legacy SEV command when SNP is enabled
      crypto: ccp: Add the SNP_PLATFORM_STATUS command
      KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests
      KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
      KVM: SEV: Add initial SEV-SNP support
      KVM: SEV: Add KVM_SNP_INIT command
      KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command
      KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
      KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command
      KVM: SEV: Add support to handle GHCB GPA register VMGEXIT
      KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT
      KVM: SEV: Add support to handle Page State Change VMGEXIT
      KVM: x86: Export the kvm_zap_gfn_range() for the SNP use
      KVM: SEV: Add support to handle RMP nested page faults
      KVM: SVM: Add module parameter to enable the SEV-SNP
      crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command
      KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
      crypto: ccp: Add debug support for decrypting pages

Dionna Glaze (1):
      x86/sev: Add KVM commands for per-instance certs

Kim Phillips (1):
      x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled

Michael Roth (9):
      KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests
      x86/fault: Report RMP page faults for kernel addresses
      KVM: SEV: Select CONFIG_KVM_SW_PROTECTED_VM when CONFIG_KVM_AMD_SEV=y
      KVM: SEV: Add KVM_EXIT_VMGEXIT
      KVM: SEV: Add support for GHCB-based termination requests
      KVM: SEV: Implement gmem hook for initializing private pages
      KVM: SEV: Implement gmem hook for invalidating private pages
      KVM: x86: Add gmem hook for determining max NPT mapping level
      iommu/amd: Report all cases inhibiting SNP enablement

Paolo Bonzini (1):
      KVM: SVM: INTERCEPT_RDTSCP is never intercepted anyway

Tom Lendacky (4):
      KVM: SVM: Fix TSC_AUX virtualization setup
      KVM: SEV: Add support to handle AP reset MSR protocol
      KVM: SEV: Use a VMSA physical address variable for populating VMCB
      KVM: SEV: Support SEV-SNP AP Creation NAE event

Vishal Annapurve (1):
      KVM: Add HVA range operator

 Documentation/virt/coco/sev-guest.rst              |   54 +
 Documentation/virt/kvm/api.rst                     |   34 +
 .../virt/kvm/x86/amd-memory-encryption.rst         |  147 ++
 arch/x86/Kbuild                                    |    2 +
 arch/x86/include/asm/cpufeatures.h                 |    1 +
 arch/x86/include/asm/disabled-features.h           |    8 +-
 arch/x86/include/asm/kvm-x86-ops.h                 |    2 +
 arch/x86/include/asm/kvm_host.h                    |    5 +
 arch/x86/include/asm/msr-index.h                   |   11 +-
 arch/x86/include/asm/sev-common.h                  |   33 +
 arch/x86/include/asm/sev-host.h                    |   37 +
 arch/x86/include/asm/sev.h                         |    6 +
 arch/x86/include/asm/svm.h                         |    6 +
 arch/x86/include/asm/trap_pf.h                     |    4 +
 arch/x86/kernel/cpu/amd.c                          |   24 +-
 arch/x86/kernel/cpu/common.c                       |    7 +-
 arch/x86/kernel/crash.c                            |    7 +
 arch/x86/kvm/Kconfig                               |    3 +
 arch/x86/kvm/lapic.c                               |    5 +-
 arch/x86/kvm/mmu.h                                 |    2 -
 arch/x86/kvm/mmu/mmu.c                             |   13 +-
 arch/x86/kvm/svm/nested.c                          |    2 +-
 arch/x86/kvm/svm/sev.c                             | 1903 +++++++++++++++++---
 arch/x86/kvm/svm/svm.c                             |   64 +-
 arch/x86/kvm/svm/svm.h                             |   41 +-
 arch/x86/kvm/x86.c                                 |   11 +
 arch/x86/mm/fault.c                                |    5 +
 arch/x86/virt/svm/Makefile                         |    3 +
 arch/x86/virt/svm/sev.c                            |  548 ++++++
 drivers/crypto/ccp/sev-dev.c                       | 1253 ++++++++++++-
 drivers/crypto/ccp/sev-dev.h                       |   16 +
 drivers/iommu/amd/init.c                           |   65 +-
 include/linux/amd-iommu.h                          |    5 +-
 include/linux/kvm_host.h                           |    6 +
 include/linux/psp-sev.h                            |  304 +++-
 include/uapi/linux/kvm.h                           |   74 +
 include/uapi/linux/psp-sev.h                       |   71 +
 tools/arch/x86/include/asm/cpufeatures.h           |    1 +
 virt/kvm/kvm_main.c                                |   49 +
 39 files changed, 4497 insertions(+), 335 deletions(-)
 create mode 100644 arch/x86/include/asm/sev-host.h
 create mode 100644 arch/x86/virt/svm/Makefile
 create mode 100644 arch/x86/virt/svm/sev.c

Comments

Paolo Bonzini Oct. 16, 2023, 3:14 p.m. UTC | #1
On 10/16/23 17:12, Greg KH wrote:
> On Mon, Oct 16, 2023 at 08:27:30AM -0500, Michael Roth wrote:
>> From: Paolo Bonzini <pbonzini@redhat.com>
>>
>> svm_recalc_instruction_intercepts() is always called at least once
>> before the vCPU is started, so the setting or clearing of the RDTSCP
>> intercept can be dropped from the TSC_AUX virtualization support.
>>
>> Extracted from a patch by Tom Lendacky.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 296d5a17e793 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts")
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> (cherry picked from commit e8d93d5d93f85949e7299be289c6e7e1154b2f78)
>> Signed-off-by: Michael Roth <michael.roth@amd.com>
>> ---
>>   arch/x86/kvm/svm/sev.c | 5 +----
>>   1 file changed, 1 insertion(+), 4 deletions(-)
> 
> What stable tree(s) are you wanting this applied to (same for the others
> in this series)?  It's already in the 6.1.56 release, and the Fixes tag
> is for 5.19, so I don't see where it could be missing from?

I think it's missing in the (destined for 6.7) tree that Michael is 
basing this series on, so he's cherry picking it from Linus's tree.

Paolo
Dionna Glaze Oct. 16, 2023, 11:11 p.m. UTC | #2
> +/**
> + * struct sev_data_snp_ext_config - system wide configuration value for SNP.
> + *
> + * @config_address: address of the struct sev_user_data_snp_config or 0 when
> + *             reported_tcb does not need to be updated.
> + * @certs_address: address of extended guest request certificate chain or
> + *              0 when previous certificate should be removed on SNP_SET_EXT_CONFIG.
> + * @certs_len: length of the certs
> + */
> +struct sev_user_data_ext_snp_config {
> +       __u64 config_address;           /* In */
> +       __u64 certs_address;            /* In */
> +       __u32 certs_len;                /* In */
> +} __packed;
> +

Can we add a generation number to this? Whenever user space sets the
certs blob it will invalidate the instance-specific certificates that
are settable in KVM.
The VMM will need to weave the instance-specific data with the new
certs installed at the machine level since we're not adding
interpretation of the cert blob to KVM.
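One way the suggested generation number could work, as a hypothetical
sketch: the host-wide config carries a counter that is bumped on every
SNP_SET_EXT_CONFIG that replaces the certs blob, and per-instance certs
record the generation they were composed against. None of these names exist
in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-wide cert state with a generation counter. */
struct host_certs_sketch {
	uint64_t generation;
};

/* Hypothetical per-instance certs tagged with the host generation used. */
struct instance_certs_sketch {
	uint64_t host_generation;
	bool present;
};

/* Replacing the host-wide certs blob bumps the generation. */
static void host_set_certs(struct host_certs_sketch *h)
{
	h->generation++;
}

/* The VMM records which host generation it wove the instance certs with. */
static void instance_set_certs(struct instance_certs_sketch *i,
                               const struct host_certs_sketch *h)
{
	i->host_generation = h->generation;
	i->present = true;
}

/* Stale instance certs must be re-woven with the new host-wide certs. */
static bool instance_certs_current(const struct instance_certs_sketch *i,
                                   const struct host_certs_sketch *h)
{
	return i->present && i->host_generation == h->generation;
}
```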