Message ID: 20240528041926.3989-1-manali.shukla@amd.com
Series: Add support for the Idle HLT intercept feature
On Tue, May 28, 2024, Paolo Bonzini wrote:
> On Tue, May 28, 2024 at 6:19 AM Manali Shukla <manali.shukla@amd.com> wrote:
> >
> > The upcoming new Idle HLT Intercept feature allows for the HLT
> > instruction execution by a vCPU to be intercepted by the hypervisor
> > only if there are no pending V_INTR and V_NMI events for the vCPU.
> > When the vCPU is expected to service the pending V_INTR and V_NMI
> > events, the Idle HLT intercept won’t trigger. The feature allows the
> > hypervisor to determine if the vCPU is actually idle and reduces
> > wasteful VMEXITs.
>
> Does this have an effect on the number of vmexits for KVM, unless AVIC
> is enabled? Can you write a testcase for kvm-unit-tests' vmexit.flat
> that shows an improvement?
>
> The reason I am wondering is because KVM does not really use V_INTR
> injection. The "idle HLT" intercept basically differs from the basic
> HLT trigger only in how it handles an STI;HLT sequence, as in that
> case the interrupt can be injected directly and the HLT vmexit is
> suppressed. But in that circumstance KVM would anyway use a V_INTR
> intercept to detect the opening of the interrupt injection window (and
> then the interrupt uses event injection rather than V_INTR). Again,
> this is only true if AVIC is disabled, but that is the default.
>
> So unless I'm wrong in my analysis above, I'm not sure this series,
> albeit small, is really worth it.

But aren't we hoping to enable x2AVIC by default sooner than later?

> As things stand, it would be more interesting to enable this for nested VMs,
> especially Hyper-V which does use V_INTR and V_TPR; even better, _emulating_
> it on older processors would reduce the L2->L0->L1->L0->L2 path to a
> less-expensive L2->L0->L2 vmexit.
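To make the distinction under discussion concrete: a legacy HLT intercept exits unconditionally, while the idle HLT intercept exits only if no virtual event is pending when HLT executes. A minimal sketch of those semantics, in illustrative C paraphrasing the cover letter's description; struct vcpu_state and its fields are hypothetical names, not KVM or hardware identifiers:

	#include <stdbool.h>

	struct vcpu_state {
		bool v_intr_pending;	/* virtual interrupt (V_INTR) pending */
		bool v_nmi_pending;	/* virtual NMI (V_NMI) pending */
	};

	/*
	 * Idle HLT intercept semantics, roughly: the HLT exit is taken only
	 * when no virtual event is pending for the vCPU at the time HLT
	 * executes. A legacy HLT intercept would exit even when an event
	 * sitting in the STI shadow could be delivered to the guest
	 * immediately.
	 */
	static bool idle_hlt_causes_vmexit(const struct vcpu_state *v)
	{
		if (v->v_intr_pending || v->v_nmi_pending)
			return false;	/* guest services the event, no VMEXIT */
		return true;		/* vCPU is genuinely idle, exit to host */
	}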
Hi Paolo,

Thank you for reviewing my patches.

On 5/28/2024 3:52 PM, Paolo Bonzini wrote:
> On Tue, May 28, 2024 at 6:19 AM Manali Shukla <manali.shukla@amd.com> wrote:
>>
>> The upcoming new Idle HLT Intercept feature allows for the HLT
>> instruction execution by a vCPU to be intercepted by the hypervisor
>> only if there are no pending V_INTR and V_NMI events for the vCPU.
>> When the vCPU is expected to service the pending V_INTR and V_NMI
>> events, the Idle HLT intercept won’t trigger. The feature allows the
>> hypervisor to determine if the vCPU is actually idle and reduces
>> wasteful VMEXITs.
>
> Does this have an effect on the number of vmexits for KVM, unless AVIC
> is enabled? Can you write a testcase for kvm-unit-tests' vmexit.flat
> that shows an improvement?

I have measured the total number of vmexits (using perf kvm stat report)
while running the test case I have written for the idle halt intercept in
vmexit.flat.

Without idle halt:
------------------------------------------------------------------
| Event name | Samples | Sample% | Time (ns)  | Time%  |
------------------------------------------------------------------
| msr        | 524213  | 49.00%  | 592573933  | 64.00% |
------------------------------------------------------------------
| hlt        | 262080  | 24.00%  | 154429476  | 16.00% |
------------------------------------------------------------------
| vintr      | 262080  | 24.00%  | 163724208  | 16.00% |
------------------------------------------------------------------

With idle halt:
------------------------------------------------------------------
| Event name | Samples | Sample% | Time (ns)  | Time%  |
------------------------------------------------------------------
| msr        | 524213  | 66.00%  | 502011088  | 75.00% |
------------------------------------------------------------------
| vintr      | 262080  | 33.00%  | 147515295  | 22.00% |
------------------------------------------------------------------
| io         | 1879    | 0.00%   | 8784729    | 1.00%  |
------------------------------------------------------------------

Below is the data for the average of 10 runs of the idle halt test case
in vmexit.flat:
------------------------------------------------------------------
| idle halt (on/off) | full test run | ipi test run | eoi test run |
------------------------------------------------------------------
| on                 | 5048.4        | 1289.2       | 1140.6       |
------------------------------------------------------------------
| off                | 4806.1        | 1318.6       | 1165.8       |
------------------------------------------------------------------

The "ipi test run" with idle halt enabled takes less time (~2.28%) to
finish than with idle halt disabled. The "eoi test run" with idle halt
enabled takes less time (~2.20%) to finish than with idle halt disabled.
The "full test run" with idle halt enabled takes more time (~5.4%) than
with idle halt disabled.
(Seems a bit odd; I have not yet begun to investigate this behavior.)

Snippet of the test case:

+static void idle_hlt_test(void)
+{
+	x = 0;
+	cli();
+	apic_self_ipi(IPI_TEST_VECTOR);
+	safe_halt();
+	if (x != 1)
+		printf("%d", x);
+}
+

> The reason I am wondering is because KVM does not really use V_INTR
> injection. The "idle HLT" intercept basically differs from the basic
> HLT trigger only in how it handles an STI;HLT sequence, as in that
> case the interrupt can be injected directly and the HLT vmexit is
> suppressed. But in that circumstance KVM would anyway use a V_INTR
> intercept to detect the opening of the interrupt injection window (and
> then the interrupt uses event injection rather than V_INTR). Again,
> this is only true if AVIC is disabled, but that is the default.

I have taken traces to analyze it further.

With idle halt enabled:

220.696238: kvm_apic_ipi: dst b0 vec 176 (Fixed|physical|assert|edge|self)
220.696238: kvm_apic_accept_irq: apicid 0 vec 176 (Fixed|edge)
220.696238: kvm_apic: apic_write APIC_ICR = 0xb0000440b0
220.696238: kvm_msr: msr_write 830 = 0xb0000440b0
220.696238: kvm_entry: vcpu 0, rip 0x406a89
220.696239: kvm_exit: vcpu 0 reason vintr rip 0x4004ae info1 0x0000000000000000 info2 0x0000000000000000 intr_info 0x00000000 error_code 0x00000000
220.696239: kvm_inj_virq: IRQ 0xb0
220.696240: kvm_entry: vcpu 0, rip 0x4004ae
220.696240: kvm_exit: vcpu 0 reason msr rip 0x406a74 info1 0x0000000000000001 info2 0x0000000000000000 intr_info 0x00000000 error_code 0x00000000
220.696240: kvm_apic: apic_write APIC_EOI = 0x0
220.696240: kvm_eoi: apicid 0 vector 176

Without idle halt enabled:

6204.951631: kvm_apic_ipi: dst b0 vec 176 (Fixed|physical|assert|edge|self)
6204.951631: kvm_apic_accept_irq: apicid 0 vec 176 (Fixed|edge)
6204.951631: kvm_apic: apic_write APIC_ICR = 0xb0000440b0
6204.951631: kvm_msr: msr_write 830 = 0xb0000440b0
6204.951631: kvm_entry: vcpu 0, rip 0x406a89
6204.951632: kvm_exit: vcpu 0 reason hlt rip 0x4004ad info1 0x0000000000000000 info2 0x0000000000000000 intr_info 0x00000000 error_code 0x00000000
6204.951632: kvm_entry: vcpu 0, rip 0x4004ae
6204.951632: kvm_exit: vcpu 0 reason vintr rip 0x4004ae info1 0x0000000000000000 info2 0x0000000000000000 intr_info 0x00000000 error_code 0x00000000
6204.951632: kvm_inj_virq: IRQ 0xb0
6204.951632: kvm_entry: vcpu 0, rip 0x4004ae
6204.951633: kvm_exit: vcpu 0 reason msr rip 0x406a74 info1 0x0000000000000001 info2 0x0000000000000000 intr_info 0x00000000 error_code 0x00000000
6204.951633: kvm_apic: apic_write APIC_EOI = 0x0
6204.951633: kvm_eoi: apicid 0 vector 176
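For context when reading the test and the traces: safe_halt() in kvm-unit-tests is, to my understanding, a thin STI;HLT wrapper along the lines of the sketch below (see lib/x86/processor.h for the authoritative definition). Because the STI interrupt shadow defers delivery by one instruction, the self-IPI pended under cli() is still outstanding when HLT executes, which is exactly the case where the idle HLT intercept can elide the "hlt" exit that appears in the second trace.

	/*
	 * Sketch of the kvm-unit-tests helper: the STI shadow delays
	 * interrupt delivery by one instruction, so an IPI pended while
	 * interrupts were masked is still pending at the HLT itself.
	 */
	static inline void safe_halt(void)
	{
		asm volatile("sti; hlt");
	}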
On 6/4/2024 6:17 AM, Sean Christopherson wrote:
> On Tue, May 28, 2024, Paolo Bonzini wrote:
>> On Tue, May 28, 2024 at 6:19 AM Manali Shukla <manali.shukla@amd.com> wrote:
>>>
>>> The upcoming new Idle HLT Intercept feature allows for the HLT
>>> instruction execution by a vCPU to be intercepted by the hypervisor
>>> only if there are no pending V_INTR and V_NMI events for the vCPU.
>>> When the vCPU is expected to service the pending V_INTR and V_NMI
>>> events, the Idle HLT intercept won’t trigger. The feature allows the
>>> hypervisor to determine if the vCPU is actually idle and reduces
>>> wasteful VMEXITs.
>>
>> Does this have an effect on the number of vmexits for KVM, unless AVIC
>> is enabled? Can you write a testcase for kvm-unit-tests' vmexit.flat
>> that shows an improvement?
>>
>> The reason I am wondering is because KVM does not really use V_INTR
>> injection. The "idle HLT" intercept basically differs from the basic
>> HLT trigger only in how it handles an STI;HLT sequence, as in that
>> case the interrupt can be injected directly and the HLT vmexit is
>> suppressed. But in that circumstance KVM would anyway use a V_INTR
>> intercept to detect the opening of the interrupt injection window (and
>> then the interrupt uses event injection rather than V_INTR). Again,
>> this is only true if AVIC is disabled, but that is the default.
>>
>> So unless I'm wrong in my analysis above, I'm not sure this series,
>> albeit small, is really worth it.
>
> But aren't we hoping to enable x2AVIC by default sooner than later?

The idle halt intercept feature suppresses the HLT exit not only when a
V_INTR event is pending at the time the HLT instruction executes, but
also when a V_NMI event is pending. This capability will be advantageous
for IBS virtualization and PMC virtualization, as both rely on V_NMI to
deliver virtualized interrupts from the IBS and PMC hardware.

>> As things stand, it would be more interesting to enable this for nested VMs,
>> especially Hyper-V which does use V_INTR and V_TPR; even better, _emulating_
>> it on older processors would reduce the L2->L0->L1->L0->L2 path to a
>> less-expensive L2->L0->L2 vmexit.
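For a sense of how small the guest-facing enabling is: assuming the X86_FEATURE_IDLE_HLT flag and INTERCEPT_IDLE_HLT bit introduced by the posted series (names taken from the series, which was not upstream at the time of this thread), the VMCB setup plausibly reduces to choosing one intercept over the other. A sketch only; see the series itself for the actual patch:

	/*
	 * Sketch of the VMCB setup, e.g. in SVM's init_vmcb(): prefer the
	 * idle HLT intercept when the CPU supports it, otherwise fall back
	 * to the legacy unconditional HLT intercept.
	 */
	if (cpu_feature_enabled(X86_FEATURE_IDLE_HLT))
		svm_set_intercept(svm, INTERCEPT_IDLE_HLT);
	else
		svm_set_intercept(svm, INTERCEPT_HLT);

Either way the guest runs the same STI;HLT sequence; only the conditions under which HLT traps to the host change.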
On Tue, Jun 04, 2024, Manali Shukla wrote:
> On 5/28/2024 3:52 PM, Paolo Bonzini wrote:
> > Does this have an effect on the number of vmexits for KVM, unless AVIC
> > is enabled?

Ah, I suspect it will (as Manali's trace shows), because KVM will pend a V_INTR
(V_IRQ in KVM's world) in order to detect the interrupt window. And while KVM
will still exit on the V_INTR, it'll avoid an exit on HLT.

Of course, we could (should?) address that in KVM by clearing the V_INTR (and
its intercept) when there are no pending, injectable IRQs at the end of
kvm_check_and_inject_events(). VMX would benefit from that change as well.

I think it's just this? Because enabling an IRQ window for userspace happens
after this.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index af6c8cf6a37a..373c850cc325 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10556,9 +10556,11 @@ static int kvm_check_and_inject_events(struct kvm_vcpu *vcpu,
 				WARN_ON(kvm_x86_call(interrupt_allowed)(vcpu, true) < 0);
 			}
 		}
-		if (kvm_cpu_has_injectable_intr(vcpu))
-			kvm_x86_call(enable_irq_window)(vcpu);
 	}
+	if (kvm_cpu_has_injectable_intr(vcpu))
+		kvm_x86_call(enable_irq_window)(vcpu);
+	else
+		kvm_x86_call(disable_irq_window)(vcpu);
 
 	if (is_guest_mode(vcpu) &&
 	    kvm_x86_ops.nested_ops->has_events &&

> Snippet of the test case:
> +static void idle_hlt_test(void)
> +{
> +	x = 0;
> +	cli();
> +	apic_self_ipi(IPI_TEST_VECTOR);
> +	safe_halt();
> +	if (x != 1)
> +		printf("%d", x);
> +}

This isn't very representative of real world behavior. In practice, the window
for a wake event to arrive between CLI and STI;HLT is quite small, i.e. having a
V_INTR (or V_NMI) pending when HLT is executed is fairly uncommon.

A more compelling benchmark would be something like a netperf latency test.

I honestly don't know how high of a bar we should set for this feature. On one
hand, it's a tiny amount of enabling. On the other hand, it would be extremely
unfortunate if this somehow caused latency/throughput regressions, which seems
highly improbable, but never say never...
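One detail worth flagging: the disable_irq_window() callback the diff relies on did not exist in kvm_x86_ops at the time. For SVM, a hypothetical implementation would simply undo the synthetic V_IRQ used to detect the interrupt window, roughly:

	/*
	 * Hypothetical kvm_x86_ops callback for SVM (name assumed): drop the
	 * synthetic V_IRQ, and with it the VINTR intercept, that was pended
	 * only to detect an interrupt window, so that a subsequent HLT is
	 * not treated as "non-idle" by the idle HLT intercept.
	 */
	static void svm_disable_irq_window(struct kvm_vcpu *vcpu)
	{
		svm_clear_vintr(to_svm(vcpu));
	}

svm_clear_vintr() already exists in arch/x86/kvm/svm/svm.c; wiring a new callback into kvm_x86_ops (and a VMX counterpart) is what this sketch omits.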
Hi Sean,

Thank you for reviewing my patches. Sorry for the delay in response.

On 8/13/2024 9:49 PM, Sean Christopherson wrote:
> On Tue, Jun 04, 2024, Manali Shukla wrote:
>> On 5/28/2024 3:52 PM, Paolo Bonzini wrote:
>>> Does this have an effect on the number of vmexits for KVM, unless AVIC
>>> is enabled?
>
> Ah, I suspect it will (as Manali's trace shows), because KVM will pend a V_INTR
> (V_IRQ in KVM's world) in order to detect the interrupt window. And while KVM
> will still exit on the V_INTR, it'll avoid an exit on HLT.
>
> Of course, we could (should?) address that in KVM by clearing the V_INTR (and
> its intercept) when there are no pending, injectable IRQs at the end of
> kvm_check_and_inject_events(). VMX would benefit from that change as well.
>
> I think it's just this? Because enabling an IRQ window for userspace happens
> after this.
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index af6c8cf6a37a..373c850cc325 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10556,9 +10556,11 @@ static int kvm_check_and_inject_events(struct kvm_vcpu *vcpu,
>  				WARN_ON(kvm_x86_call(interrupt_allowed)(vcpu, true) < 0);
>  			}
>  		}
> -		if (kvm_cpu_has_injectable_intr(vcpu))
> -			kvm_x86_call(enable_irq_window)(vcpu);
>  	}
> +	if (kvm_cpu_has_injectable_intr(vcpu))
> +		kvm_x86_call(enable_irq_window)(vcpu);
> +	else
> +		kvm_x86_call(disable_irq_window)(vcpu);
>
>  	if (is_guest_mode(vcpu) &&
>  	    kvm_x86_ops.nested_ops->has_events &&

IIUC, this is already addressed in [2].

>> Snippet of the test case:
>> +static void idle_hlt_test(void)
>> +{
>> +	x = 0;
>> +	cli();
>> +	apic_self_ipi(IPI_TEST_VECTOR);
>> +	safe_halt();
>> +	if (x != 1)
>> +		printf("%d", x);
>> +}
>
> This isn't very representative of real world behavior. In practice, the window
> for a wake event to arrive between CLI and STI;HLT is quite small, i.e. having a
> V_INTR (or V_NMI) pending when HLT is executed is fairly uncommon.
>
> A more compelling benchmark would be something like a netperf latency test.
>
> I honestly don't know how high of a bar we should set for this feature. On one
> hand, it's a tiny amount of enabling. On the other hand, it would be extremely
> unfortunate if this somehow caused latency/throughput regressions, which seems
> highly improbable, but never say never...

I have added netperf data for normal guest and nested guest in V4 [1].

[1]: https://lore.kernel.org/kvm/20241022054810.23369-1-manali.shukla@amd.com/T/#m2e755334c327bb1b479fb65e293bfe3f476d2852
[2]: https://lore.kernel.org/all/20240802195120.325560-1-seanjc@google.com/

- Manali