Message ID: 20240528041926.3989-1-manali.shukla@amd.com
Series: Add support for the Idle HLT intercept feature
On Tue, May 28, 2024, Paolo Bonzini wrote:
> On Tue, May 28, 2024 at 6:19 AM Manali Shukla <manali.shukla@amd.com> wrote:
> >
> > The upcoming new Idle HLT Intercept feature allows for the HLT
> > instruction execution by a vCPU to be intercepted by the hypervisor
> > only if there are no pending V_INTR and V_NMI events for the vCPU.
> > When the vCPU is expected to service the pending V_INTR and V_NMI
> > events, the Idle HLT intercept won’t trigger. The feature allows the
> > hypervisor to determine if the vCPU is actually idle and reduces
> > wasteful VMEXITs.
>
> Does this have an effect on the number of vmexits for KVM, unless AVIC
> is enabled? Can you write a testcase for kvm-unit-tests' vmexit.flat
> that shows an improvement?
>
> The reason I am wondering is because KVM does not really use V_INTR
> injection. The "idle HLT" intercept basically differs from the basic
> HLT trigger only in how it handles an STI;HLT sequence, as in that
> case the interrupt can be injected directly and the HLT vmexit is
> suppressed. But in that circumstance KVM would anyway use a V_INTR
> intercept to detect the opening of the interrupt injection window (and
> then the interrupt uses event injection rather than V_INTR). Again,
> this is only true if AVIC is disabled, but that is the default.
>
> So unless I'm wrong in my analysis above, I'm not sure this series,
> albeit small, is really worth it.

But aren't we hoping to enable x2AVIC by default sooner than later?

> As things stand, it would be more interesting to enable this for nested VMs,
> especially Hyper-V which does use V_INTR and V_TPR; even better, _emulating_
> it on older processors would reduce the L2->L0->L1->L0->L2 path to a
> less-expensive L2->L0->L2 vmexit.
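To make the distinction under discussion concrete: a legacy HLT intercept exits unconditionally, while the idle HLT intercept exits only if no virtual event is pending when HLT executes. A minimal sketch of those semantics, in illustrative C paraphrasing the cover letter's description; struct vcpu_state and its fields are hypothetical names, not KVM or hardware identifiers:

	#include <stdbool.h>

	struct vcpu_state {
		bool v_intr_pending;	/* virtual interrupt (V_INTR) pending */
		bool v_nmi_pending;	/* virtual NMI (V_NMI) pending */
	};

	/*
	 * Idle HLT intercept semantics, roughly: the HLT exit is taken only
	 * when no virtual event is pending for the vCPU at the time HLT
	 * executes. A legacy HLT intercept would exit even when an event
	 * sitting in the STI shadow could be delivered to the guest
	 * immediately.
	 */
	static bool idle_hlt_causes_vmexit(const struct vcpu_state *v)
	{
		if (v->v_intr_pending || v->v_nmi_pending)
			return false;	/* guest services the event, no VMEXIT */
		return true;		/* vCPU is genuinely idle, exit to host */
	}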
Hi Paolo,

Thank you for reviewing my patches.

On 5/28/2024 3:52 PM, Paolo Bonzini wrote:
> On Tue, May 28, 2024 at 6:19 AM Manali Shukla <manali.shukla@amd.com> wrote:
>>
>> The upcoming new Idle HLT Intercept feature allows for the HLT
>> instruction execution by a vCPU to be intercepted by the hypervisor
>> only if there are no pending V_INTR and V_NMI events for the vCPU.
>> When the vCPU is expected to service the pending V_INTR and V_NMI
>> events, the Idle HLT intercept won’t trigger. The feature allows the
>> hypervisor to determine if the vCPU is actually idle and reduces
>> wasteful VMEXITs.
>
> Does this have an effect on the number of vmexits for KVM, unless AVIC
> is enabled? Can you write a testcase for kvm-unit-tests' vmexit.flat
> that shows an improvement?

I have measured the total number of vmexits (using perf kvm stat report)
while running the test case I have written for the idle halt intercept in
vmexit.flat.

Without idle halt:
------------------------------------------------------------------
| Event name | Samples | Sample% | Time (ns)  | Time%  |
------------------------------------------------------------------
| msr        | 524213  | 49.00%  | 592573933  | 64.00% |
------------------------------------------------------------------
| hlt        | 262080  | 24.00%  | 154429476  | 16.00% |
------------------------------------------------------------------
| vintr      | 262080  | 24.00%  | 163724208  | 16.00% |
------------------------------------------------------------------

With idle halt:
------------------------------------------------------------------
| Event name | Samples | Sample% | Time (ns)  | Time%  |
------------------------------------------------------------------
| msr        | 524213  | 66.00%  | 502011088  | 75.00% |
------------------------------------------------------------------
| vintr      | 262080  | 33.00%  | 147515295  | 22.00% |
------------------------------------------------------------------
| io         | 1879    | 0.00%   | 8784729    | 1.00%  |
------------------------------------------------------------------

Below is the data for the average of 10 runs of the idle halt test case
in vmexit.flat:
------------------------------------------------------------------
| idle halt (on/off) | full test run | ipi test run | eoi test run |
------------------------------------------------------------------
| on                 | 5048.4        | 1289.2       | 1140.6       |
------------------------------------------------------------------
| off                | 4806.1        | 1318.6       | 1165.8       |
------------------------------------------------------------------

The "ipi test run" with idle halt enabled takes less time (~2.28%) to
finish than with idle halt disabled. The "eoi test run" with idle halt
enabled takes less time (~2.20%) to finish than with idle halt disabled.
The "full test run" with idle halt enabled takes more time (~5.4%) than
with idle halt disabled.
(Seems a bit odd; I have not yet begun to investigate this behavior.)

Snippet of the test case:

+static void idle_hlt_test(void)
+{
+	x = 0;
+	cli();
+	apic_self_ipi(IPI_TEST_VECTOR);
+	safe_halt();
+	if (x != 1)
+		printf("%d", x);
+}
+

> The reason I am wondering is because KVM does not really use V_INTR
> injection. The "idle HLT" intercept basically differs from the basic
> HLT trigger only in how it handles an STI;HLT sequence, as in that
> case the interrupt can be injected directly and the HLT vmexit is
> suppressed. But in that circumstance KVM would anyway use a V_INTR
> intercept to detect the opening of the interrupt injection window (and
> then the interrupt uses event injection rather than V_INTR). Again,
> this is only true if AVIC is disabled, but that is the default.

I have taken traces to analyze it further.

With idle halt enabled:

220.696238: kvm_apic_ipi: dst b0 vec 176 (Fixed|physical|assert|edge|self)
220.696238: kvm_apic_accept_irq: apicid 0 vec 176 (Fixed|edge)
220.696238: kvm_apic: apic_write APIC_ICR = 0xb0000440b0
220.696238: kvm_msr: msr_write 830 = 0xb0000440b0
220.696238: kvm_entry: vcpu 0, rip 0x406a89
220.696239: kvm_exit: vcpu 0 reason vintr rip 0x4004ae info1 0x0000000000000000 info2 0x0000000000000000 intr_info 0x00000000 error_code 0x00000000
220.696239: kvm_inj_virq: IRQ 0xb0
220.696240: kvm_entry: vcpu 0, rip 0x4004ae
220.696240: kvm_exit: vcpu 0 reason msr rip 0x406a74 info1 0x0000000000000001 info2 0x0000000000000000 intr_info 0x00000000 error_code 0x00000000
220.696240: kvm_apic: apic_write APIC_EOI = 0x0
220.696240: kvm_eoi: apicid 0 vector 176

Without idle halt enabled:

6204.951631: kvm_apic_ipi: dst b0 vec 176 (Fixed|physical|assert|edge|self)
6204.951631: kvm_apic_accept_irq: apicid 0 vec 176 (Fixed|edge)
6204.951631: kvm_apic: apic_write APIC_ICR = 0xb0000440b0
6204.951631: kvm_msr: msr_write 830 = 0xb0000440b0
6204.951631: kvm_entry: vcpu 0, rip 0x406a89
6204.951632: kvm_exit: vcpu 0 reason hlt rip 0x4004ad info1 0x0000000000000000 info2 0x0000000000000000 intr_info 0x00000000 error_code 0x00000000
6204.951632: kvm_entry: vcpu 0, rip 0x4004ae
6204.951632: kvm_exit: vcpu 0 reason vintr rip 0x4004ae info1 0x0000000000000000 info2 0x0000000000000000 intr_info 0x00000000 error_code 0x00000000
6204.951632: kvm_inj_virq: IRQ 0xb0
6204.951632: kvm_entry: vcpu 0, rip 0x4004ae
6204.951633: kvm_exit: vcpu 0 reason msr rip 0x406a74 info1 0x0000000000000001 info2 0x0000000000000000 intr_info 0x00000000 error_code 0x00000000
6204.951633: kvm_apic: apic_write APIC_EOI = 0x0
6204.951633: kvm_eoi: apicid 0 vector 176
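For context when reading the test and the traces: safe_halt() in kvm-unit-tests is, to my understanding, a thin STI;HLT wrapper along the lines of the sketch below (see lib/x86/processor.h for the authoritative definition). Because the STI interrupt shadow defers delivery by one instruction, the self-IPI pended under cli() is still outstanding when HLT executes, which is exactly the case where the idle HLT intercept can elide the "hlt" exit that appears in the second trace.

	/*
	 * Sketch of the kvm-unit-tests helper: the STI shadow delays
	 * interrupt delivery by one instruction, so an IPI pended while
	 * interrupts were masked is still pending at the HLT itself.
	 */
	static inline void safe_halt(void)
	{
		asm volatile("sti; hlt");
	}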
On 6/4/2024 6:17 AM, Sean Christopherson wrote:
> On Tue, May 28, 2024, Paolo Bonzini wrote:
>> On Tue, May 28, 2024 at 6:19 AM Manali Shukla <manali.shukla@amd.com> wrote:
>>>
>>> The upcoming new Idle HLT Intercept feature allows for the HLT
>>> instruction execution by a vCPU to be intercepted by the hypervisor
>>> only if there are no pending V_INTR and V_NMI events for the vCPU.
>>> When the vCPU is expected to service the pending V_INTR and V_NMI
>>> events, the Idle HLT intercept won’t trigger. The feature allows the
>>> hypervisor to determine if the vCPU is actually idle and reduces
>>> wasteful VMEXITs.
>>
>> Does this have an effect on the number of vmexits for KVM, unless AVIC
>> is enabled? Can you write a testcase for kvm-unit-tests' vmexit.flat
>> that shows an improvement?
>>
>> The reason I am wondering is because KVM does not really use V_INTR
>> injection. The "idle HLT" intercept basically differs from the basic
>> HLT trigger only in how it handles an STI;HLT sequence, as in that
>> case the interrupt can be injected directly and the HLT vmexit is
>> suppressed. But in that circumstance KVM would anyway use a V_INTR
>> intercept to detect the opening of the interrupt injection window (and
>> then the interrupt uses event injection rather than V_INTR). Again,
>> this is only true if AVIC is disabled, but that is the default.
>>
>> So unless I'm wrong in my analysis above, I'm not sure this series,
>> albeit small, is really worth it.
>
> But aren't we hoping to enable x2AVIC by default sooner than later?

The idle halt intercept feature suppresses the HLT exit not only when a
V_INTR event is pending at the time the HLT instruction executes, but
also when a V_NMI event is pending. This capability will be advantageous
for IBS virtualization and PMC virtualization, as both rely on V_NMI to
deliver virtualized interrupts from the IBS and PMC hardware.

>> As things stand, it would be more interesting to enable this for nested VMs,
>> especially Hyper-V which does use V_INTR and V_TPR; even better, _emulating_
>> it on older processors would reduce the L2->L0->L1->L0->L2 path to a
>> less-expensive L2->L0->L2 vmexit.
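For a sense of how small the guest-facing enabling is: assuming the X86_FEATURE_IDLE_HLT flag and INTERCEPT_IDLE_HLT bit introduced by the posted series (names taken from the series, which was not upstream at the time of this thread), the VMCB setup plausibly reduces to choosing one intercept over the other. A sketch only; see the series itself for the actual patch:

	/*
	 * Sketch of the VMCB setup, e.g. in SVM's init_vmcb(): prefer the
	 * idle HLT intercept when the CPU supports it, otherwise fall back
	 * to the legacy unconditional HLT intercept.
	 */
	if (cpu_feature_enabled(X86_FEATURE_IDLE_HLT))
		svm_set_intercept(svm, INTERCEPT_IDLE_HLT);
	else
		svm_set_intercept(svm, INTERCEPT_HLT);

Either way the guest runs the same STI;HLT sequence; only the conditions under which HLT traps to the host change.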
On Tue, Jun 04, 2024, Manali Shukla wrote:
> On 5/28/2024 3:52 PM, Paolo Bonzini wrote:
> > Does this have an effect on the number of vmexits for KVM, unless AVIC
> > is enabled?

Ah, I suspect it will (as Manali's trace shows), because KVM will pend a V_INTR
(V_IRQ in KVM's world) in order to detect the interrupt window. And while KVM
will still exit on the V_INTR, it'll avoid an exit on HLT.

Of course, we could (should?) address that in KVM by clearing the V_INTR (and
its intercept) when there are no pending, injectable IRQs at the end of
kvm_check_and_inject_events(). VMX would benefit from that change as well.

I think it's just this? Because enabling an IRQ window for userspace happens
after this.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index af6c8cf6a37a..373c850cc325 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10556,9 +10556,11 @@ static int kvm_check_and_inject_events(struct kvm_vcpu *vcpu,
 				WARN_ON(kvm_x86_call(interrupt_allowed)(vcpu, true) < 0);
 			}
 		}
-		if (kvm_cpu_has_injectable_intr(vcpu))
-			kvm_x86_call(enable_irq_window)(vcpu);
 	}
+	if (kvm_cpu_has_injectable_intr(vcpu))
+		kvm_x86_call(enable_irq_window)(vcpu);
+	else
+		kvm_x86_call(disable_irq_window)(vcpu);
 
 	if (is_guest_mode(vcpu) &&
 	    kvm_x86_ops.nested_ops->has_events &&

> Snippet of the test case:
> +static void idle_hlt_test(void)
> +{
> +	x = 0;
> +	cli();
> +	apic_self_ipi(IPI_TEST_VECTOR);
> +	safe_halt();
> +	if (x != 1)
> +		printf("%d", x);
> +}

This isn't very representative of real world behavior. In practice, the window
for a wake event to arrive between CLI and STI;HLT is quite small, i.e. having a
V_INTR (or V_NMI) pending when HLT is executed is fairly uncommon.

A more compelling benchmark would be something like a netperf latency test.

I honestly don't know how high of a bar we should set for this feature. On one
hand, it's a tiny amount of enabling. On the other hand, it would be extremely
unfortunate if this somehow caused latency/throughput regressions, which seems
highly improbable, but never say never...
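One detail worth flagging: the disable_irq_window() callback the diff relies on did not exist in kvm_x86_ops at the time. For SVM, a hypothetical implementation would simply undo the synthetic V_IRQ used to detect the interrupt window, roughly:

	/*
	 * Hypothetical kvm_x86_ops callback for SVM (name assumed): drop the
	 * synthetic V_IRQ, and with it the VINTR intercept, that was pended
	 * only to detect an interrupt window, so that a subsequent HLT is
	 * not treated as "non-idle" by the idle HLT intercept.
	 */
	static void svm_disable_irq_window(struct kvm_vcpu *vcpu)
	{
		svm_clear_vintr(to_svm(vcpu));
	}

svm_clear_vintr() already exists in arch/x86/kvm/svm/svm.c; wiring a new callback into kvm_x86_ops (and a VMX counterpart) is what this sketch omits.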
Hi Sean,

Thank you for reviewing my patches. Sorry for the delay in response.

On 8/13/2024 9:49 PM, Sean Christopherson wrote:
> On Tue, Jun 04, 2024, Manali Shukla wrote:
>> On 5/28/2024 3:52 PM, Paolo Bonzini wrote:
>>> Does this have an effect on the number of vmexits for KVM, unless AVIC
>>> is enabled?
>
> Ah, I suspect it will (as Manali's trace shows), because KVM will pend a V_INTR
> (V_IRQ in KVM's world) in order to detect the interrupt window. And while KVM
> will still exit on the V_INTR, it'll avoid an exit on HLT.
>
> Of course, we could (should?) address that in KVM by clearing the V_INTR (and
> its intercept) when there are no pending, injectable IRQs at the end of
> kvm_check_and_inject_events(). VMX would benefit from that change as well.
>
> I think it's just this? Because enabling an IRQ window for userspace happens
> after this.
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index af6c8cf6a37a..373c850cc325 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10556,9 +10556,11 @@ static int kvm_check_and_inject_events(struct kvm_vcpu *vcpu,
>  				WARN_ON(kvm_x86_call(interrupt_allowed)(vcpu, true) < 0);
>  			}
>  		}
> -		if (kvm_cpu_has_injectable_intr(vcpu))
> -			kvm_x86_call(enable_irq_window)(vcpu);
>  	}
> +	if (kvm_cpu_has_injectable_intr(vcpu))
> +		kvm_x86_call(enable_irq_window)(vcpu);
> +	else
> +		kvm_x86_call(disable_irq_window)(vcpu);
>
>  	if (is_guest_mode(vcpu) &&
>  	    kvm_x86_ops.nested_ops->has_events &&

IIUC, this is already addressed in [2].

>> Snippet of the test case:
>> +static void idle_hlt_test(void)
>> +{
>> +	x = 0;
>> +	cli();
>> +	apic_self_ipi(IPI_TEST_VECTOR);
>> +	safe_halt();
>> +	if (x != 1)
>> +		printf("%d", x);
>> +}
>
> This isn't very representative of real world behavior. In practice, the window
> for a wake event to arrive between CLI and STI;HLT is quite small, i.e. having a
> V_INTR (or V_NMI) pending when HLT is executed is fairly uncommon.
>
> A more compelling benchmark would be something like a netperf latency test.
>
> I honestly don't know how high of a bar we should set for this feature. On one
> hand, it's a tiny amount of enabling. On the other hand, it would be extremely
> unfortunate if this somehow caused latency/throughput regressions, which seems
> highly improbable, but never say never...

I have added netperf data for normal guest and nested guest in V4 [1].

[1]: https://lore.kernel.org/kvm/20241022054810.23369-1-manali.shukla@amd.com/T/#m2e755334c327bb1b479fb65e293bfe3f476d2852
[2]: https://lore.kernel.org/all/20240802195120.325560-1-seanjc@google.com/

- Manali