[RFC,v1,00/15] MSR refactor with new MSR instructions support

Message ID 20250331082251.3171276-1-xin@zytor.com

Message

Xin Li March 31, 2025, 8:22 a.m. UTC
Obviously the existing MSR code and the pv_ops MSR access APIs need some
love: https://lore.kernel.org/lkml/87y1h81ht4.ffs@tglx/

hpa started a discussion last October about how to refactor it:
https://lore.kernel.org/lkml/7a4de623-ecda-4369-a7ae-0c43ef328177@zytor.com/

The consensus so far is to utilize the alternatives mechanism to eliminate
the Xen MSR access overhead on native systems and enable new MSR instructions
based on their availability.
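
As a rough illustration (not code from this series; the helper name below is
made up, and ASM_WRMSRNS/X86_FEATURE_WRMSRNS come from later patches in this
thread), an alternatives-patched native MSR write boils down to something
like:

	/*
	 * Sketch only: assumes <asm/alternative.h>, <asm/cpufeatures.h> and
	 * the ASM_WRMSRNS byte sequence introduced later in this series.
	 */
	static __always_inline void wrmsrl_alternative_sketch(u32 msr, u64 val)
	{
		asm volatile(ALTERNATIVE("ds wrmsr",     /* legacy WRMSR      */
					 ASM_WRMSRNS,    /* non-serializing   */
					 X86_FEATURE_WRMSRNS)
			     : /* no outputs */
			     : "c" (msr), "a" ((u32)val), "d" ((u32)(val >> 32))
			     : "memory");
	}

On native systems the selected instruction then runs inline, instead of going
through a pv_ops indirect call.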

To achieve this, a code refactor is necessary.  Initially, the MSR API usage
needs to be unified and simplified, which is addressed by patches 1 through 7.

Patches 8 and 9 introduce basic support for immediate form MSR instructions,
while patch 10 employs the immediate form WRMSRNS in VMX.

Finally, patches 11 to 15 leverage the alternatives mechanism to read and
write MSRs.


H. Peter Anvin (Intel) (1):
  x86/extable: Implement EX_TYPE_FUNC_REWIND

Xin Li (Intel) (14):
  x86/msr: Replace __wrmsr() with native_wrmsrl()
  x86/msr: Replace __rdmsr() with native_rdmsrl()
  x86/msr: Simplify pmu_msr_{read,write}()
  x86/msr: Let pv_cpu_ops.write_msr{_safe}() take an u64 instead of two
    u32
  x86/msr: Replace wrmsr(msr, low, 0) with wrmsrl(msr, value)
  x86/msr: Remove MSR write APIs that take the MSR value in two u32
    arguments
  x86/msr: Remove pmu_msr_{read,write}()
  x86/cpufeatures: Add a CPU feature bit for MSR immediate form
    instructions
  x86/opcode: Add immediate form MSR instructions to x86-opcode-map
  KVM: VMX: Use WRMSRNS or its immediate form when available
  x86/msr: Use the alternatives mechanism to write MSR
  x86/msr: Use the alternatives mechanism to read MSR
  x86/extable: Add support for the immediate form MSR instructions
  x86/msr: Move the ARGS macros after the MSR read/write APIs

 arch/x86/coco/sev/core.c                   |   2 +-
 arch/x86/events/amd/brs.c                  |   4 +-
 arch/x86/hyperv/hv_apic.c                  |   6 +-
 arch/x86/hyperv/hv_vtl.c                   |   4 +-
 arch/x86/hyperv/ivm.c                      |   2 +-
 arch/x86/include/asm/apic.h                |   4 +-
 arch/x86/include/asm/asm.h                 |   6 +
 arch/x86/include/asm/cpufeatures.h         |  19 +-
 arch/x86/include/asm/extable_fixup_types.h |   1 +
 arch/x86/include/asm/fred.h                |   2 +-
 arch/x86/include/asm/mshyperv.h            |   2 +-
 arch/x86/include/asm/msr-index.h           |   6 +
 arch/x86/include/asm/msr.h                 | 664 ++++++++++++++++-----
 arch/x86/include/asm/paravirt.h            |  64 --
 arch/x86/include/asm/paravirt_types.h      |  11 -
 arch/x86/include/asm/switch_to.h           |   2 +-
 arch/x86/kernel/cpu/amd.c                  |   2 +-
 arch/x86/kernel/cpu/common.c               |  10 +-
 arch/x86/kernel/cpu/mce/core.c             |   6 +-
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c  |  12 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c     |   2 +-
 arch/x86/kernel/cpu/scattered.c            |   1 +
 arch/x86/kernel/cpu/umwait.c               |   4 +-
 arch/x86/kernel/kvm.c                      |   2 +-
 arch/x86/kernel/kvmclock.c                 |   2 +-
 arch/x86/kernel/paravirt.c                 |   4 -
 arch/x86/kvm/svm/svm.c                     |  15 +-
 arch/x86/kvm/vmx/vmenter.S                 |  28 +-
 arch/x86/kvm/vmx/vmx.c                     |   4 +-
 arch/x86/lib/x86-opcode-map.txt            |   5 +-
 arch/x86/mm/extable.c                      | 186 ++++--
 arch/x86/mm/mem_encrypt_identity.c         |   4 +-
 arch/x86/xen/enlighten_pv.c                | 110 ++--
 arch/x86/xen/pmu.c                         |  43 +-
 arch/x86/xen/xen-asm.S                     |  89 +++
 arch/x86/xen/xen-ops.h                     |  12 +-
 drivers/ata/pata_cs5535.c                  |  12 +-
 drivers/ata/pata_cs5536.c                  |   6 +-
 drivers/cpufreq/acpi-cpufreq.c             |   2 +-
 drivers/cpufreq/e_powersaver.c             |   2 +-
 drivers/cpufreq/powernow-k6.c              |   8 +-
 tools/arch/x86/lib/x86-opcode-map.txt      |   5 +-
 42 files changed, 896 insertions(+), 479 deletions(-)


base-commit: 8fc8ae1aeed6dc895bf35a4797c6e770574f4612

Comments

Ingo Molnar March 31, 2025, 10:26 a.m. UTC | #1
* Xin Li (Intel) <xin@zytor.com> wrote:

> __rdmsr() is the lowest level primitive MSR read API, and its direct 
> use is NOT preferred.  Use its wrapper function native_rdmsrl() 
> instead.

This description is very misleading. As of today, native_rdmsrl() 
doesn't exist in-tree, so it cannot be 'preferred' in any fashion.

We have native_read_msr(), a confusingly similar function name, and 
this changelog doesn't make it clear, at all, why the extra 
native_rdmsrl() indirection is introduced.

Please split this into two changes and explain them properly:

 - x86/msr: Add the native_rdmsrl() helper
 - x86/msr: Convert __rdmsr() uses to native_rdmsrl() uses

For the first patch you should explain why you want an extra layer of 
indirection within these APIs and how it relates to native_read_msr() 
and why there is a _read_msr() and a _rdmsr() variant...
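
For illustration, the kind of thin wrapper being asked about could look
roughly like this (a hypothetical sketch, not the actual patch; __rdmsr() is
the existing in-tree primitive):

	/*
	 * Sketch: a u64-returning wrapper around the lowest-level primitive,
	 * bypassing tracing and paravirt, named to parallel native_read_msr().
	 */
	static __always_inline u64 native_rdmsrl(const u32 msr)
	{
		return __rdmsr(msr);
	}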

Thanks,

	Ingo
Andrew Cooper March 31, 2025, 8:41 p.m. UTC | #2
On 31/03/2025 9:27 pm, Konrad Rzeszutek Wilk wrote:
> On Mon, Mar 31, 2025 at 01:22:46AM -0700, Xin Li (Intel) wrote:
>> Signed-off-by: Xin Li (Intel) <xin@zytor.com>
>> ---
>>  arch/x86/include/asm/msr-index.h |  6 ++++++
>>  arch/x86/kvm/vmx/vmenter.S       | 28 ++++++++++++++++++++++++----
>>  2 files changed, 30 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
>> index e6134ef2263d..04244c3ba374 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -1226,4 +1226,10 @@
>>  						* a #GP
>>  						*/
>>  
>> +/* Instruction opcode for WRMSRNS supported in binutils >= 2.40 */
>> +#define ASM_WRMSRNS		_ASM_BYTES(0x0f,0x01,0xc6)
>> +
>> +/* Instruction opcode for the immediate form RDMSR/WRMSRNS */
>> +#define ASM_WRMSRNS_RAX		_ASM_BYTES(0xc4,0xe7,0x7a,0xf6,0xc0)
>> +
>>  #endif /* _ASM_X86_MSR_INDEX_H */
>> diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
>> index f6986dee6f8c..9fae43723c44 100644
>> --- a/arch/x86/kvm/vmx/vmenter.S
>> +++ b/arch/x86/kvm/vmx/vmenter.S
>> @@ -64,6 +64,29 @@
>>  	RET
>>  .endm
>>  
>> +/*
>> + * Write EAX to MSR_IA32_SPEC_CTRL.
>> + *
>> + * Choose the best WRMSR instruction based on availability.
>> + *
>> + * Replace with 'wrmsrns' and 'wrmsrns %rax, $MSR_IA32_SPEC_CTRL' once binutils support them.
>> + */
>> +.macro WRITE_EAX_TO_MSR_IA32_SPEC_CTRL
>> +	ALTERNATIVE_2 __stringify(mov $MSR_IA32_SPEC_CTRL, %ecx;		\
>> +				  xor %edx, %edx;				\
>> +				  mov %edi, %eax;				\
>> +				  ds wrmsr),					\
>> +		      __stringify(mov $MSR_IA32_SPEC_CTRL, %ecx;		\
>> +				  xor %edx, %edx;				\
>> +				  mov %edi, %eax;				\
>> +				  ASM_WRMSRNS),					\
>> +		      X86_FEATURE_WRMSRNS,					\
>> +		      __stringify(xor %_ASM_AX, %_ASM_AX;			\
>> +				  mov %edi, %eax;				\
>> +				  ASM_WRMSRNS_RAX; .long MSR_IA32_SPEC_CTRL),	\
>> +		      X86_FEATURE_MSR_IMM
>> +.endm
>> +
>>  .section .noinstr.text, "ax"
>>  
>>  /**
>> @@ -123,10 +146,7 @@ SYM_FUNC_START(__vmx_vcpu_run)
>>  	movl PER_CPU_VAR(x86_spec_ctrl_current), %esi
>>  	cmp %edi, %esi
>>  	je .Lspec_ctrl_done
>> -	mov $MSR_IA32_SPEC_CTRL, %ecx
>> -	xor %edx, %edx
>> -	mov %edi, %eax
>> -	wrmsr
> Is that the right path forward?
>
> That is replace the MSR write to disable speculative execution with a
> non-serialized WRMSR? Doesn't that mean the WRMSRNS is speculative?

MSR_SPEC_CTRL is explicitly non-serialising, even with a plain WRMSR.

non-serialising != non-speculative.

Although WRMSRNS's precise statement on the matter of
non-speculativeness is woolly, given an intent to optimise it some more
in the future.

~Andrew
H. Peter Anvin March 31, 2025, 8:55 p.m. UTC | #3
On 3/31/25 13:41, Andrew Cooper wrote:
>>
>> That is replace the MSR write to disable speculative execution with a
>> non-serialized WRMSR? Doesn't that mean the WRMSRNS is speculative?
> 
> MSR_SPEC_CTRL is explicitly non-serialising, even with a plain WRMSR.
> 
> non-serialising != non-speculative.
> 
> Although WRMSRNS's precise statement on the matter of
> non-speculativeness is woolly, given an intent to optimise it some more
> in the future.
> 

To be specific, "serializing" is a much harder statement than 
"non-speculative."

For architecturally non-serializing MSRs, WRMSRNS and WRMSR are 
equivalent (or to put it differently, WRMSR acts like WRMSRNS.)

The advantage of making them explicitly WRMSRNS is that it allows for the 
substitution of the upcoming immediate forms.

	-hpa