Message ID | 20181023181709.11883-2-julien.grall@arm.com
---|---
State | Accepted
Commit | 177afec4556c676e5a1a958d1626226fbca2a696
Series | xen/arm: GIC fixes and improvement
Hello Julien,

This patch is very interesting to me.

On 23.10.18 21:17, Julien Grall wrote:
> Devices that expose their interrupt status registers via system registers (e.g. Statistical profiling, CPU PMU, DynamIQ PMU, arch timer, vgic (although unused by Linux), ...)

I guess by vgic you mean the GICH registers, is that correct? I would speculate that GICv2 registers are memory mapped, not exposed via system registers.

> rely on a context synchronising operation on the CPU to ensure that the updated status register is visible to the CPU when handling the interrupt. This usually happens as a result of taking the IRQ exception in the first place, but there are two race scenarios where this isn't the case.
>
> For example, let's say we have two peripherals (X and Y), where Y uses a system register for its interrupt status.
>
> Case 1:
> 1. CPU takes an IRQ exception as a result of X raising an interrupt
> 2. Y then raises its interrupt line, but the update to its system register is not yet visible to the CPU
> 3. The GIC decides to expose Y's interrupt number first in the Ack register
> 4. The CPU runs the IRQ handler for Y, but the status register is stale

But this scenario somehow explains a strange thing I saw during IRQ latency investigation (optimization attempts) over the last weeks. The strange sequence is as follows:

1. CPU takes an IRQ exception from the guest context
2. In the `enter_hypervisor_head()` function (i.e. in `gic_clear_lrs()` as for mainline), interrupt statuses are read from some LR registers as PENDING
3. Executing further code, without returning to the guest (i.e. inside vgic_vcpu_inject_irq()), it happens that we read status INVALID from the LR we read as PENDING before, in step 2.

Please note that we are tailoring xen based on RELEASE-4.10.0.

> Case 2:
> 1. CPU takes an IRQ exception as a result of X raising an interrupt
> 2. CPU reads the interrupt number for X from the Ack register and runs its IRQ handler
> 3. Y raises its interrupt line and the Ack register is updated, but again, the update to its system register is not yet visible to the CPU.
> 4. Since the GIC drivers poll the Ack register, we read Y's interrupt number and run its handler without a context synchronisation operation, therefore seeing the stale register value.
>
> In either case, we run the risk of missing an IRQ. This patch solves the problem by ensuring that we execute an ISB in the GIC drivers prior to invoking the interrupt handler.
>
> Based on Linux commit 39a06b67c2c1256bcf2361a1f67d2529f70ab206 "irqchip/gic: Ensure we have an ISB between ack and ->handle_irq".
>
> Signed-off-by: Julien Grall <julien.grall@arm.com>
>
> ---
> This patch is a candidate for backporting up to Xen 4.9.
> ---
>  xen/arch/arm/gic.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index 8d7e491060..305fbd66dd 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -388,12 +388,14 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
>          if ( likely(irq >= 16 && irq < 1020) )
>          {
>              local_irq_enable();
> +            isb();

Taking into account that the first GICH accesses are from `gic_clear_lrs()`, called from `enter_hypervisor_head()`, I would suggest moving isb() there.

>              do_IRQ(regs, irq, is_fiq);
>              local_irq_disable();
>          }
>          else if ( is_lpi(irq) )
>          {
>              local_irq_enable();
> +            isb();
>              gic_hw_ops->do_LPI(irq);
>              local_irq_disable();
>          }

--
Andrii Anisov
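To make the races concrete: the arch timer is one of the listed devices whose interrupt status lives in a system register. A minimal, hypothetical sketch of a handler that could be bitten by this, assuming Xen's READ_SYSREG32() and CNTx_CTL_PENDING definitions (this is not the actual Xen vtimer handler):

static void timer_handler_sketch(int irq, void *dev_id, struct cpu_user_regs *regs)
{
    /*
     * Without a context synchronising operation between acknowledging the
     * IRQ and running this handler, this system register read can still
     * return the pre-interrupt value (Case 1/Case 2 above).
     */
    uint32_t ctl = READ_SYSREG32(CNTHP_CTL_EL2);

    if ( !(ctl & CNTx_CTL_PENDING) )
        return; /* looks spurious, although the timer has actually fired */

    /* ... mask the timer and do the real work ... */
}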
On 10/24/18 10:38 AM, Andrii Anisov wrote:
> Hello Julien,

Hi Andrii,

Thank you for the review.

> On 23.10.18 21:17, Julien Grall wrote:
>> Devices that expose their interrupt status registers via system registers (e.g. Statistical profiling, CPU PMU, DynamIQ PMU, arch timer, vgic (although unused by Linux), ...)
> I guess by vgic you mean the GICH registers, is that correct? I would speculate that GICv2 registers are memory mapped, not exposed via system registers.

vGIC is not only about GICv2. It also covers GICv3, which is accessible via system registers.

>> rely on a context synchronising operation on the CPU to ensure that the updated status register is visible to the CPU when handling the interrupt. This usually happens as a result of taking the IRQ exception in the first place, but there are two race scenarios where this isn't the case.
>>
>> For example, let's say we have two peripherals (X and Y), where Y uses a system register for its interrupt status.
>>
>> Case 1:
>> 1. CPU takes an IRQ exception as a result of X raising an interrupt
>> 2. Y then raises its interrupt line, but the update to its system register is not yet visible to the CPU
>> 3. The GIC decides to expose Y's interrupt number first in the Ack register
>> 4. The CPU runs the IRQ handler for Y, but the status register is stale
> But this scenario somehow explains a strange thing I saw during IRQ latency investigation (optimization attempts) over the last weeks. The strange sequence is as follows:
> 1. CPU takes an IRQ exception from the guest context
> 2. In the `enter_hypervisor_head()` function (i.e. in `gic_clear_lrs()` as for mainline), interrupt statuses are read from some LR registers as PENDING
> 3. Executing further code, without returning to the guest (i.e. inside vgic_vcpu_inject_irq()), it happens that we read status INVALID from the LR we read as PENDING before, in step 2.
>
> Please note that we are tailoring xen based on RELEASE-4.10.0.

As you tailor Xen, I need more details on your modifications to be able to provide feedback on the oddness you encounter. But you are using GICv2, right? If so, you are not using system registers for the vGIC. This is not covered by this patch.

>> Case 2:
>> 1. CPU takes an IRQ exception as a result of X raising an interrupt
>> 2. CPU reads the interrupt number for X from the Ack register and runs its IRQ handler
>> 3. Y raises its interrupt line and the Ack register is updated, but again, the update to its system register is not yet visible to the CPU.
>> 4. Since the GIC drivers poll the Ack register, we read Y's interrupt number and run its handler without a context synchronisation operation, therefore seeing the stale register value.
>>
>> In either case, we run the risk of missing an IRQ. This patch solves the problem by ensuring that we execute an ISB in the GIC drivers prior to invoking the interrupt handler.
>>
>> Based on Linux commit 39a06b67c2c1256bcf2361a1f67d2529f70ab206 "irqchip/gic: Ensure we have an ISB between ack and ->handle_irq".
>>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>>
>> ---
>> This patch is a candidate for backporting up to Xen 4.9.
>> ---
>>  xen/arch/arm/gic.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> index 8d7e491060..305fbd66dd 100644
>> --- a/xen/arch/arm/gic.c
>> +++ b/xen/arch/arm/gic.c
>> @@ -388,12 +388,14 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
>>          if ( likely(irq >= 16 && irq < 1020) )
>>          {
>>              local_irq_enable();
>> +            isb();
> Taking into account that the first GICH accesses are from `gic_clear_lrs()`, called from `enter_hypervisor_head()`, I would suggest moving isb() there.
>
>>              do_IRQ(regs, irq, is_fiq);
>>              local_irq_disable();
>>          }
>>          else if ( is_lpi(irq) )
>>          {
>>              local_irq_enable();
>> +            isb();
>>              gic_hw_ops->do_LPI(irq);
>>              local_irq_disable();
>>          }

Cheers,
Hello Julien,

On 24.10.18 17:41, Julien Grall wrote:
> vGIC is not only about GICv2. It also covers GICv3, which is accessible via system registers.

Yes, I know. But, as you state below, for GICv2 based systems those barriers are not needed.

> [...]
>
> As you tailor Xen, I need more details on your modifications to be able to provide feedback on the oddness you encounter.

I guess I should make a dedicated patch applicable to mainline to reveal the issue. I hope to do this in the nearest days.

> But you are using GICv2, right? If so, you are not using system registers for the vGIC. This is not covered by this patch.

Yep, our SoC has GICv2.

>>> @@ -388,12 +388,14 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
>>>          if ( likely(irq >= 16 && irq < 1020) )
>>>          {
>>>              local_irq_enable();
>>> +            isb();
>> Taking into account that the first GICH accesses are from `gic_clear_lrs()`, called from `enter_hypervisor_head()`, I would suggest moving isb() there.

Any comments here?

>>> [...]
>
> Cheers,

--
Andrii Anisov
On 10/25/18 3:11 PM, Andrii Anisov wrote:
> Hello Julien,

Hi,

> On 24.10.18 17:41, Julien Grall wrote:
>> vGIC is not only about GICv2. It also covers GICv3, which is accessible via system registers.
> Yes, I know. But, as you state below, for GICv2 based systems those barriers are not needed.

You misread what I wrote. They are not needed to cover the vGIC for a GICv2 platform. However, they are still necessary, for instance, for the timer, as it is accessible via system registers.

> [...]
>
>>>> @@ -388,12 +388,14 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
>>>>          if ( likely(irq >= 16 && irq < 1020) )
>>>>          {
>>>>              local_irq_enable();
>>>> +            isb();
>>> Taking into account that the first GICH accesses are from `gic_clear_lrs()`, called from `enter_hypervisor_head()`, I would suggest moving isb() there.
> Any comments here?

I haven't answered this bit because I thought this was related to your issue. If this is not related to your issue, then the placement would be wrong.

Here the isb() sits between ack() and do_IRQ(). This ensures the system registers are synchronized after you acknowledge an interrupt.

Cheers,
On 25.10.18 17:22, Julien Grall wrote:
> You misread what I wrote. They are not needed to cover the vGIC for a GICv2 platform. However, they are still necessary, for instance, for the timer, as it is accessible via system registers.

Ah, OK.

> Here the isb() sits between ack() and do_IRQ().
>
> This ensures the system registers are synchronized after you acknowledge an interrupt.

I see.
On 23.10.18 21:17, Julien Grall wrote:
> Devices that expose their interrupt status registers via system registers (e.g. Statistical profiling, CPU PMU, DynamIQ PMU, arch timer, vgic (although unused by Linux), ...) rely on a context synchronising operation on the CPU to ensure that the updated status register is visible to the CPU when handling the interrupt. This usually happens as a result of taking the IRQ exception in the first place, but there are two race scenarios where this isn't the case.
>
> For example, let's say we have two peripherals (X and Y), where Y uses a system register for its interrupt status.
>
> Case 1:
> 1. CPU takes an IRQ exception as a result of X raising an interrupt
> 2. Y then raises its interrupt line, but the update to its system register is not yet visible to the CPU
> 3. The GIC decides to expose Y's interrupt number first in the Ack register
> 4. The CPU runs the IRQ handler for Y, but the status register is stale
>
> Case 2:
> 1. CPU takes an IRQ exception as a result of X raising an interrupt
> 2. CPU reads the interrupt number for X from the Ack register and runs its IRQ handler
> 3. Y raises its interrupt line and the Ack register is updated, but again, the update to its system register is not yet visible to the CPU.
> 4. Since the GIC drivers poll the Ack register, we read Y's interrupt number and run its handler without a context synchronisation operation, therefore seeing the stale register value.
>
> In either case, we run the risk of missing an IRQ. This patch solves the problem by ensuring that we execute an ISB in the GIC drivers prior to invoking the interrupt handler.
>
> Based on Linux commit 39a06b67c2c1256bcf2361a1f67d2529f70ab206 "irqchip/gic: Ensure we have an ISB between ack and ->handle_irq".
>
> Signed-off-by: Julien Grall <julien.grall@arm.com>
>
> ---

Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Hello Julien,

On 25.10.18 17:11, Andrii Anisov wrote:
> I guess I should make a dedicated patch applicable to mainline to reveal the issue. I hope to do this in the nearest days.

Please find below the diff applicable to the current xenbits/smoke which exposes the issue.

With that diff I see (on my board) outputs like:

(XEN) Stored LR is stale: stored 0xAA025C97 read 0x8A025C97
(XEN) Stored LR is stale: stored 0x9A025C97 read 0x8A025C97
(XEN) Stored LR is stale: stored 0x9A025C97 read 0x8A025C97
(XEN) Stored LR is stale: stored 0x9A025C97 read 0x8A025C97
(XEN) Stored LR is stale: stored 0x1A00001B read 0xA00001B
(XEN) Stored LR is stale: stored 0x9A025C97 read 0x8A025C97
(XEN) Stored LR is stale: stored 0xAA025C97 read 0x8A025C97

I treat those prints as LR content changes (state PENDING->INVALID) without exiting from hyp to the guest. Unfortunately, I did not find any explanation of this effect in the ARM IHI 0048B.b document. I have GICv2 on the die.

I would appreciate it if you could find some time to look at this and express your opinion.

---

diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index 511c8d7..0b9aa2d 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -171,6 +171,14 @@ static unsigned int gicv2_cpu_mask(const cpumask_t *cpumask)
     return mask;
 }

+static void gicv2_store_lrs(void)
+{
+    int i;
+
+    for ( i = 0; i < gicv2_info.nr_lrs; i++ )
+        current->arch.gic.v2.lr[i] = readl_gich(GICH_LR + i * 4);
+}
+
 static void gicv2_save_state(struct vcpu *v)
 {
     int i;
@@ -446,11 +454,13 @@ static void gicv2_update_lr(int lr, const struct pending_irq *p,
               << GICH_V2_LR_PHYSICAL_SHIFT);

     writel_gich(lr_reg, GICH_LR + lr * 4);
+    current->arch.gic.v2.lr[lr] = lr_reg;
 }

 static void gicv2_clear_lr(int lr)
 {
     writel_gich(0, GICH_LR + lr * 4);
+    current->arch.gic.v2.lr[lr] = 0;
 }

 static void gicv2_read_lr(int lr, struct gic_lr *lr_reg)
@@ -458,6 +468,9 @@ static void gicv2_read_lr(int lr, struct gic_lr *lr_reg)
     uint32_t lrv;

     lrv = readl_gich(GICH_LR + lr * 4);
+    if ( lrv != current->arch.gic.v2.lr[lr] )
+        printk("Stored LR is stale: stored 0x%"PRIX32" read 0x%"PRIX32"\n",
+               current->arch.gic.v2.lr[lr], lrv);
     lr_reg->pirq = (lrv >> GICH_V2_LR_PHYSICAL_SHIFT) & GICH_V2_LR_PHYSICAL_MASK;
     lr_reg->virq = (lrv >> GICH_V2_LR_VIRTUAL_SHIFT) & GICH_V2_LR_VIRTUAL_MASK;
     lr_reg->priority = (lrv >> GICH_V2_LR_PRIORITY_SHIFT) & GICH_V2_LR_PRIORITY_MASK;
@@ -481,6 +494,7 @@ static void gicv2_write_lr(int lr, const struct gic_lr *lr_reg)
            ((uint32_t)(lr_reg->grp & GICH_V2_LR_GRP_MASK) << GICH_V2_LR_GRP_SHIFT) );

     writel_gich(lrv, GICH_LR + lr * 4);
+    current->arch.gic.v2.lr[lr] = lrv;
 }

 static void gicv2_hcr_status(uint32_t flag, bool status)
@@ -1232,6 +1246,7 @@ const static struct gic_hw_operations gicv2_ops = {
     .info                = &gicv2_info,
     .init                = gicv2_init,
     .secondary_init      = gicv2_secondary_cpu_init,
+    .store_lrs           = gicv2_store_lrs,
     .save_state          = gicv2_save_state,
     .restore_state       = gicv2_restore_state,
     .dump_state          = gicv2_dump_state,
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index ed363f6..a05c518 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -594,6 +594,12 @@ static void gic_update_one_lr(struct vcpu *v, int i)
     }
 }

+void gic_store_lrs(void)
+{
+    if ( gic_hw_ops->store_lrs )
+        gic_hw_ops->store_lrs();
+}
+
 void gic_clear_lrs(struct vcpu *v)
 {
     int i = 0;
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index f6f6de3..985192b 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -2095,6 +2095,7 @@ static void enter_hypervisor_head(struct cpu_user_regs *regs)
         if ( current->arch.hcr_el2 & HCR_VA )
             current->arch.hcr_el2 = READ_SYSREG(HCR_EL2);

+        gic_store_lrs();
         gic_clear_lrs(current);
     }
 }
diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h
index d3d7bda..6fe3fdb 100644
--- a/xen/include/asm-arm/gic.h
+++ b/xen/include/asm-arm/gic.h
@@ -293,6 +293,7 @@ extern unsigned int gic_number_lines(void);
 /* IRQ translation function for the device tree */
 int gic_irq_xlate(const u32 *intspec, unsigned int intsize,
                   unsigned int *out_hwirq, unsigned int *out_type);
+void gic_store_lrs(void);
 void gic_clear_lrs(struct vcpu *v);

 struct gic_info {
@@ -314,6 +315,8 @@ struct gic_hw_operations {
     /* Initialize the GIC and the boot CPU */
     int (*init)(void);
     /* Save GIC registers */
+    void (*store_lrs)(void);
+    /* Save GIC registers */
     void (*save_state)(struct vcpu *);
     /* Restore GIC registers */
     void (*restore_state)(const struct vcpu *);
On 10/26/18 1:49 PM, Andrii Anisov wrote:
> Hello Julien,

Hi,

> On 25.10.18 17:11, Andrii Anisov wrote:
>> I guess I should make a dedicated patch applicable to mainline to reveal the issue. I hope to do this in the nearest days.
>
> Please find below the diff applicable to the current xenbits/smoke which exposes the issue.

The functions below do not exist in the latest Xen. So are you sure this is based on mainline?

> With that diff I see (on my board) outputs like:
>
> (XEN) Stored LR is stale: stored 0xAA025C97 read 0x8A025C97
> (XEN) Stored LR is stale: stored 0x9A025C97 read 0x8A025C97
> (XEN) Stored LR is stale: stored 0x9A025C97 read 0x8A025C97
> (XEN) Stored LR is stale: stored 0x9A025C97 read 0x8A025C97
> (XEN) Stored LR is stale: stored 0x1A00001B read 0xA00001B
> (XEN) Stored LR is stale: stored 0x9A025C97 read 0x8A025C97
> (XEN) Stored LR is stale: stored 0xAA025C97 read 0x8A025C97

How many domains did you have running at that time? Under which conditions is this happening? Also, what is your Xen command line?

I assume you didn't disable serrors, right? So we should have plenty of dsb(sy) and isb() on the path there. So I would rule out a missing barrier here.

> I treat those prints as LR content changes (state PENDING->INVALID) without exiting from hyp to the guest. Unfortunately, I did not find any explanation of this effect in the ARM IHI 0048B.b document. I have GICv2 on the die.
>
> I would appreciate it if you could find some time to look at this and express your opinion.
>
> [...]
>
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index f6f6de3..985192b 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -2095,6 +2095,7 @@ static void enter_hypervisor_head(struct cpu_user_regs *regs)
>          if ( current->arch.hcr_el2 & HCR_VA )
>              current->arch.hcr_el2 = READ_SYSREG(HCR_EL2);
>
> +        gic_store_lrs();

gic_clear_lrs(...) may rewrite the LRs. To confirm these stale LRs are not caused by this function, can you move that after gic_clear_lrs()?

>          gic_clear_lrs(current);
>      }
>  }
>
> [...]

Cheers,
Hello Julien,

On 10/27/2018 12:55 AM, Julien Grall wrote:
> The functions below do not exist in the latest Xen. So are you sure this is based on mainline?

Yep, I put a wrong diff into the email; it is for Xen 4.10. Please find the patch for Xen 4.12-unstable on my github [1].

> How many domains did you have running at that time?

Only one domain. In my setup I'm running a BSP release under Xen as Dom0.

> Under which conditions is this happening?

Under no specific conditions. During the system boot, during console operations, during glmark2 [2] running.

> Also, what is your Xen command line?

`dom0_mem=1G console=dtuart dtuart=serial0 dom0_max_vcpus=4 bootscrub=0 loglvl=all cpufreq=none`

> I assume you didn't disable serrors, right? So we should have plenty of dsb(sy) and isb() on the path there.

I didn't disable serrors.

> So I would rule out a missing barrier here.
>
> [...]
>
>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
>> index f6f6de3..985192b 100644
>> --- a/xen/arch/arm/traps.c
>> +++ b/xen/arch/arm/traps.c
>> @@ -2095,6 +2095,7 @@ static void enter_hypervisor_head(struct cpu_user_regs *regs)
>>          if ( current->arch.hcr_el2 & HCR_VA )
>>              current->arch.hcr_el2 = READ_SYSREG(HCR_EL2);
>>
>> +        gic_store_lrs();
>
> gic_clear_lrs(...) may rewrite the LRs. To confirm these stale LRs are not caused by this function, can you move that after gic_clear_lrs()?

Right you are: gic_clear_lrs(), in the 4.10 codebase, rewrites LRs. But, on one hand, it does not change PENDING to INVALID under any circumstances. On the other hand, all changes to the LRs are made through the GIC-specific operations gic_hw_ops->..., which I track. You can see, in the code above, that I copy all updates to the physical LRs issued by the hypervisor into the stored LRs. So that is not the case. But I'll check on Monday.

>> [...]
>
> Cheers,

[1] https://github.com/aanisov/xen/commit/b1c28be38d30eb460ed0339526c4987b2cf1708c
[2] https://github.com/glmark2/glmark2
On 10/27/18 1:14 PM, Andrii Anisov wrote:
> Hello Julien,
>
> On 10/27/2018 12:55 AM, Julien Grall wrote:
>> The functions below do not exist in the latest Xen. So are you sure this is based on mainline?
>
> Yep, I put a wrong diff into the email; it is for Xen 4.10. Please find the patch for Xen 4.12-unstable on my github [1].
>
>> How many domains did you have running at that time?
> Only one domain. In my setup I'm running a BSP release under Xen as Dom0.
>
>> Under which conditions is this happening?
> Under no specific conditions. During the system boot, during console operations, during glmark2 [2] running.

So glmark2 is running in Dom0, am I correct?

> [...]
>
>>> +        gic_store_lrs();
>>
>> gic_clear_lrs(...) may rewrite the LRs. To confirm these stale LRs are not caused by this function, can you move that after gic_clear_lrs()?
> Right you are: gic_clear_lrs(), in the 4.10 codebase, rewrites LRs. But, on one hand, it does not change PENDING to INVALID under any circumstances. On the other hand, all changes to the LRs are made through the GIC-specific operations gic_hw_ops->..., which I track. You can see, in the code above, that I copy all updates to the physical LRs issued by the hypervisor into the stored LRs. So that is not the case. But I'll check on Monday.

Ah, I missed the part where you updated the LRs. While the interrupt exception path does not re-enable interrupts early, other traps may do it. On those traps, you would have the following path:

1) Save GP registers
2) Enable interrupts
3) Store LRs

It is possible to receive an interrupt right after 2) and before 3). Because the current vGIC touches the LRs directly (not the case in the new one), you may then end up with the information of the previous store.

Could you investigate how the guest has exited (i.e. sync vs interrupt)?

Cheers,
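As an illustration of that path (a sketch under the assumption of a synchronous guest trap, e.g. an MMIO access; not a verbatim trace):

/*
 * guest_sync (entry.S):
 *     entry hyp=0 ...           -- 1) save GP registers
 *     msr   daifclr, #6         -- 2) unmask SError + IRQ
 *
 *     <-- a physical IRQ can be taken right here: gic_interrupt() and the
 *         vGIC may acknowledge it and rewrite the LRs before the original
 *         trap ever reaches the C code
 *
 * enter_hypervisor_head():
 *     gic_store_lrs();          -- 3) snapshots LRs that no longer match
 *                                     what was last written for this exit
 */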
Hello Julien,

Sorry for the previous email sent as html.

On 27.10.18 15:14, Andrii Anisov wrote:
>>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
>>> index f6f6de3..985192b 100644
>>> --- a/xen/arch/arm/traps.c
>>> +++ b/xen/arch/arm/traps.c
>>> @@ -2095,6 +2095,7 @@ static void enter_hypervisor_head(struct cpu_user_regs *regs)
>>>          if ( current->arch.hcr_el2 & HCR_VA )
>>>              current->arch.hcr_el2 = READ_SYSREG(HCR_EL2);
>>>
>>> +        gic_store_lrs();
>>
>> gic_clear_lrs(...) may rewrite the LRs. To confirm these stale LRs are not caused by this function, can you move that after gic_clear_lrs()?
> Right you are: gic_clear_lrs(), in the 4.10 codebase, rewrites LRs. But, on one hand, it does not change PENDING to INVALID under any circumstances. On the other hand, all changes to the LRs are made through the GIC-specific operations gic_hw_ops->..., which I track. You can see, in the code above, that I copy all updates to the physical LRs issued by the hypervisor into the stored LRs. So that is not the case. But I'll check on Monday.

In the 4.12-unstable codebase I moved gic_store_lrs() after vgic_sync_from_lrs() and see significant changes. Now stale lines are printed at a very high rate, and that is the proper behavior, because the corresponding check (performed when vgic_sync_from_lrs() reads the LRs) detects that the VM has processed interrupts and the LR values have changed compared to those the hypervisor set lately.

So now the question is: why could I detect spurious changes in the LRs without exiting to the VM?
On 29/10/2018 10:06, Andrii Anisov wrote:
> Hello Julien,

Hi,

> Sorry for the previous email sent as html.

Don't worry. I didn't notice it :).

> On 27.10.18 15:14, Andrii Anisov wrote:
> [...]
>
> In the 4.12-unstable codebase I moved gic_store_lrs() after vgic_sync_from_lrs() and see significant changes. Now stale lines are printed at a very high rate, and that is the proper behavior, because the corresponding check (performed when vgic_sync_from_lrs() reads the LRs) detects that the VM has processed interrupts and the LR values have changed compared to those the hypervisor set lately.
>
> So now the question is: why could I detect spurious changes in the LRs without exiting to the VM?

I wrote down an answer yesterday (sent it today) to your previous answer. You may be using the LR information from the previous guest trap, as interrupts are re-enabled before storing the LRs.

Can you try the patch below?
commit 11e360b93be81a58a41832d714f33f797ad312a9
Author: Julien Grall <julien.grall@arm.com>
Date:   Mon Oct 29 13:32:56 2018 +0000

    xen/arm: Re-enable interrupt later in the trap path

    Signed-off-by: Julien Grall <julien.grall@arm.com>

diff --git a/xen/arch/arm/arm64/entry.S b/xen/arch/arm/arm64/entry.S
index 97b05f53ea..8f287891b6 100644
--- a/xen/arch/arm/arm64/entry.S
+++ b/xen/arch/arm/arm64/entry.S
@@ -195,7 +195,6 @@ hyp_error_invalid:

 hyp_error:
         entry   hyp=1
-        msr     daifclr, #2
         mov     x0, sp
         bl      do_trap_hyp_serror
         exit    hyp=1
@@ -203,7 +202,7 @@ hyp_error:
 /* Traps taken in Current EL with SP_ELx */
 hyp_sync:
         entry   hyp=1
-        msr     daifclr, #6
+        msr     daifclr, #4
         mov     x0, sp
         bl      do_trap_hyp_sync
         exit    hyp=1
@@ -304,7 +303,7 @@ guest_sync_slowpath:
         ALTERNATIVE("bl check_pending_vserror; cbnz x0, 1f",
                     "nop; nop",
                     SKIP_SYNCHRONIZE_SERROR_ENTRY_EXIT)
-        msr     daifclr, #6
+        msr     daifclr, #4
         mov     x0, sp
         bl      do_trap_guest_sync
1:
@@ -332,7 +331,7 @@ guest_fiq_invalid:

 guest_error:
         entry   hyp=0, compat=0
-        msr     daifclr, #6
+        msr     daifclr, #4
         mov     x0, sp
         bl      do_trap_guest_serror
         exit    hyp=0, compat=0
@@ -347,7 +346,7 @@ guest_sync_compat:
         ALTERNATIVE("bl check_pending_vserror; cbnz x0, 1f",
                     "nop; nop",
                     SKIP_SYNCHRONIZE_SERROR_ENTRY_EXIT)
-        msr     daifclr, #6
+        msr     daifclr, #4
         mov     x0, sp
         bl      do_trap_guest_sync
1:
@@ -375,7 +374,7 @@ guest_fiq_invalid_compat:

 guest_error_compat:
         entry   hyp=0, compat=1
-        msr     daifclr, #6
+        msr     daifclr, #4
         mov     x0, sp
         bl      do_trap_guest_serror
         exit    hyp=0, compat=1
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 51d2e42c77..c18f89b41f 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -2039,6 +2039,8 @@ static void enter_hypervisor_head(struct cpu_user_regs *regs)
     {
         struct vcpu *v = current;

+        ASSERT(!local_irq_is_enabled());
+
         /* If the guest has disabled the workaround, bring it back on. */
         if ( needs_ssbd_flip(v) )
             arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2_FID, 1, NULL);
@@ -2073,6 +2075,7 @@ void do_trap_guest_sync(struct cpu_user_regs *regs)
     const union hsr hsr = { .bits = regs->hsr };

     enter_hypervisor_head(regs);
+    local_irq_enable();

     switch ( hsr.ec )
     {
@@ -2208,6 +2211,7 @@ void do_trap_hyp_sync(struct cpu_user_regs *regs)
     const union hsr hsr = { .bits = regs->hsr };

     enter_hypervisor_head(regs);
+    local_irq_enable();

     switch ( hsr.ec )
     {
@@ -2246,6 +2250,7 @@ void do_trap_hyp_sync(struct cpu_user_regs *regs)
 void do_trap_hyp_serror(struct cpu_user_regs *regs)
 {
     enter_hypervisor_head(regs);
+    local_irq_enable();

     __do_trap_serror(regs, VABORT_GEN_BY_GUEST(regs));
 }
@@ -2253,6 +2258,7 @@ void do_trap_hyp_serror(struct cpu_user_regs *regs)
 void do_trap_guest_serror(struct cpu_user_regs *regs)
 {
     enter_hypervisor_head(regs);
+    local_irq_enable();

     __do_trap_serror(regs, true);
 }
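For reference, the immediate in `msr daifclr, #imm` selects which AArch64 DAIF exception mask bits to clear:

/*
 * msr daifclr, #imm  -- clears (unmasks) the selected DAIF bits:
 *     bit 0 (0x1): F, FIQ
 *     bit 1 (0x2): I, IRQ
 *     bit 2 (0x4): A, SError
 *     bit 3 (0x8): D, Debug
 *
 * So "#6" unmasked SError and IRQ straight after the entry macro, while
 * "#4" unmasks SError only: IRQs stay masked until the trap handler calls
 * local_irq_enable() after enter_hypervisor_head(), so the LR handling
 * can no longer be interrupted by a newly taken IRQ.
 */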
Hello Julien,

On 29.10.18 15:36, Julien Grall wrote:
> I wrote an answer yesterday (sent today) to your previous e-mail. You
> may be using the LRs information from the previous guest trap, as
> interrupts are re-enabled before storing the LRs.

Yes, it is the case. I had overlooked that, for some exceptions,
interrupts are enabled before enter_hypervisor_head().

> Can you try the patch below?

I tried it, and it works like a charm for the issue.

The only note here is that, I suppose, the changes are only needed for
exceptions taken from guest mode. For the exceptions taken from hyp mode,
we do not touch the GIC in enter_hypervisor_head().
On 29/10/2018 16:16, Andrii Anisov wrote:
> Hello Julien

Hi,

> On 29.10.18 15:36, Julien Grall wrote:
>> I wrote an answer yesterday (sent today) to your previous e-mail. You
>> may be using the LRs information from the previous guest trap, as
>> interrupts are re-enabled before storing the LRs.
>
> Yes, it is the case. I had overlooked that, for some exceptions,
> interrupts are enabled before enter_hypervisor_head().
>
>> Can you try the patch below?
>
> I tried it, and it works like a charm for the issue.

What is the issue? Is it just your print, or is there a latent bug in the
current vGIC?

> The only note here is that, I suppose, the changes are only needed for
> exceptions taken from guest mode. For the exceptions taken from hyp
> mode, we do not touch the GIC in enter_hypervisor_head().

I would actually prefer to re-enable interrupts after
enter_hypervisor_head(). This makes the code work the same way
everywhere, so there is less trouble figuring out problems.

Cheers,
Hi,

On 29.10.18 18:22, Julien Grall wrote:
> What is the issue? Is it just your print, or is there a latent bug in
> the current vGIC?

Ah, OK, that was the issue for me.

> I would actually prefer to re-enable interrupts after
> enter_hypervisor_head(). This makes the code work the same way
> everywhere, so there is less trouble figuring out problems.

Good point. Actually, I was confused by the reported behavior because I
overlooked those msr's with different arguments in entry.S and assumed
enter_hypervisor_head() always runs with IRQs disabled.
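[Editor's note for readers following the msr's in question: the patch changes which PSTATE mask bits are cleared on trap entry. Below is a minimal sketch of the architectural DAIF immediate encoding used by "msr daifclr, #imm"; this is not Xen code, and the macro names are illustrative only.]

    /*
     * "msr daifclr, #imm" clears the selected PSTATE mask bits, i.e. it
     * unmasks the corresponding exception classes. The 4-bit immediate
     * is encoded as follows (architectural fact; names are made up):
     */
    #define PSTATE_DBG    (1u << 3)  /* #8: debug exceptions            */
    #define PSTATE_SERROR (1u << 2)  /* #4: SError (asynchronous abort) */
    #define PSTATE_IRQ    (1u << 1)  /* #2: IRQ                         */
    #define PSTATE_FIQ    (1u << 0)  /* #1: FIQ                         */

    /*
     * So the original "msr daifclr, #6" unmasks SError + IRQ on entry,
     * while the patched "msr daifclr, #4" unmasks only SError, leaving
     * IRQs masked until the C code calls local_irq_enable() after
     * enter_hypervisor_head().
     */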
Hello Julien,

I just wonder, do you plan to upstream the patch below?

Andrii Anisov
commit 11e360b93be81a58a41832d714f33f797ad312a9
Author: Julien Grall <julien.grall@arm.com>
Date:   Mon Oct 29 13:32:56 2018 +0000

    xen/arm: Re-enable interrupt later in the trap path

    Signed-off-by: Julien Grall <julien.grall@arm.com>

diff --git a/xen/arch/arm/arm64/entry.S b/xen/arch/arm/arm64/entry.S
index 97b05f53ea..8f287891b6 100644
--- a/xen/arch/arm/arm64/entry.S
+++ b/xen/arch/arm/arm64/entry.S
@@ -195,7 +195,6 @@ hyp_error_invalid:

 hyp_error:
         entry   hyp=1
-        msr     daifclr, #2
         mov     x0, sp
         bl      do_trap_hyp_serror
         exit    hyp=1
@@ -203,7 +202,7 @@ hyp_error:
 /* Traps taken in Current EL with SP_ELx */
 hyp_sync:
         entry   hyp=1
-        msr     daifclr, #6
+        msr     daifclr, #4
         mov     x0, sp
         bl      do_trap_hyp_sync
         exit    hyp=1
@@ -304,7 +303,7 @@ guest_sync_slowpath:
         ALTERNATIVE("bl check_pending_vserror; cbnz x0, 1f",
                     "nop; nop",
                     SKIP_SYNCHRONIZE_SERROR_ENTRY_EXIT)
-        msr     daifclr, #6
+        msr     daifclr, #4
         mov     x0, sp
         bl      do_trap_guest_sync
 1:
@@ -332,7 +331,7 @@ guest_fiq_invalid:

 guest_error:
         entry   hyp=0, compat=0
-        msr     daifclr, #6
+        msr     daifclr, #4
         mov     x0, sp
         bl      do_trap_guest_serror
         exit    hyp=0, compat=0
@@ -347,7 +346,7 @@ guest_sync_compat:
         ALTERNATIVE("bl check_pending_vserror; cbnz x0, 1f",
                     "nop; nop",
                     SKIP_SYNCHRONIZE_SERROR_ENTRY_EXIT)
-        msr     daifclr, #6
+        msr     daifclr, #4
         mov     x0, sp
         bl      do_trap_guest_sync
 1:
@@ -375,7 +374,7 @@ guest_fiq_invalid_compat:

 guest_error_compat:
         entry   hyp=0, compat=1
-        msr     daifclr, #6
+        msr     daifclr, #4
         mov     x0, sp
         bl      do_trap_guest_serror
         exit    hyp=0, compat=1
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 51d2e42c77..c18f89b41f 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -2039,6 +2039,8 @@ static void enter_hypervisor_head(struct cpu_user_regs *regs)
 {
     struct vcpu *v = current;

+    ASSERT(!local_irq_is_enabled());
+
     /* If the guest has disabled the workaround, bring it back on. */
     if ( needs_ssbd_flip(v) )
         arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2_FID, 1, NULL);
@@ -2073,6 +2075,7 @@ void do_trap_guest_sync(struct cpu_user_regs *regs)
     const union hsr hsr = { .bits = regs->hsr };

     enter_hypervisor_head(regs);
+    local_irq_enable();

     switch ( hsr.ec )
     {
@@ -2208,6 +2211,7 @@ void do_trap_hyp_sync(struct cpu_user_regs *regs)
     const union hsr hsr = { .bits = regs->hsr };

     enter_hypervisor_head(regs);
+    local_irq_enable();

     switch ( hsr.ec )
     {
@@ -2246,6 +2250,7 @@ void do_trap_hyp_sync(struct cpu_user_regs *regs)
 void do_trap_hyp_serror(struct cpu_user_regs *regs)
 {
     enter_hypervisor_head(regs);
+    local_irq_enable();

     __do_trap_serror(regs, VABORT_GEN_BY_GUEST(regs));
 }
@@ -2253,6 +2258,7 @@ void do_trap_hyp_serror(struct cpu_user_regs *regs)
 void do_trap_guest_serror(struct cpu_user_regs *regs)
 {
     enter_hypervisor_head(regs);
+    local_irq_enable();

     __do_trap_serror(regs, true);
 }
On 09/11/2018 14:42, Andrii Anisov wrote:
> Hello Julien,

Hi Andrii,

> I just wonder, do you plan to upstream the patch below?

I don't plan to upstream the patch below. Andre and I discussed it
extensively and haven't found a potential issue with the two vGIC
implementations we have in Xen.

Cheers,
On Tue, 23 Oct 2018, Julien Grall wrote: > Devices that expose their interrupt status registers via system > registers (e.g. Statistical profiling, CPU PMU, DynamIQ PMU, arch timer, > vgic (although unused by Linux), ...) rely on a context synchronising > operation on the CPU to ensure that the updated status register is > visible to the CPU when handling the interrupt. This usually happens as > a result of taking the IRQ exception in the first place, but there are > two race scenarios where this isn't the case. > > For example, let's say we have two peripherals (X and Y), where Y uses a > system register for its interrupt status. > > Case 1: > 1. CPU takes an IRQ exception as a result of X raising an interrupt > 2. Y then raises its interrupt line, but the update to its system > register is not yet visible to the CPU > 3. The GIC decides to expose Y's interrupt number first in the Ack > register > 4. The CPU runs the IRQ handler for Y, but the status register is stale > > Case 2: > 1. CPU takes an IRQ exception as a result of X raising an interrupt > 2. CPU reads the interrupt number for X from the Ack register and runs > its IRQ handler > 3. Y raises its interrupt line and the Ack register is updated, but > again, the update to its system register is not yet visible to the > CPU. > 4. Since the GIC drivers poll the Ack register, we read Y's interrupt > number and run its handler without a context synchronisation > operation, therefore seeing the stale register value. > > In either case, we run the risk of missing an IRQ. This patch solves the > problem by ensuring that we execute an ISB in the GIC drivers prior > to invoking the interrupt handler. > > Based on Linux commit 39a06b67c2c1256bcf2361a1f67d2529f70ab206 > "irqchip/gic: Ensure we have an ISB between ack and ->handle_irq". > > Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org> > --- > This patch is a candidate for backporting up to Xen 4.9. > --- > xen/arch/arm/gic.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > index 8d7e491060..305fbd66dd 100644 > --- a/xen/arch/arm/gic.c > +++ b/xen/arch/arm/gic.c > @@ -388,12 +388,14 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq) > if ( likely(irq >= 16 && irq < 1020) ) > { > local_irq_enable(); > + isb(); > do_IRQ(regs, irq, is_fiq); > local_irq_disable(); > } > else if ( is_lpi(irq) ) > { > local_irq_enable(); > + isb(); > gic_hw_ops->do_LPI(irq); > local_irq_disable(); > } > -- > 2.11.0 >
Julien,

It's me again about your patch :)

I've found this patch useful, and I can even give a motivation to have it
in the mainline: the patch ensures that vgic_sync_from_lrs is performed
on the guest-to-hyp switch prior to any IRQ processing.

So, do you plan to push it for review? Could I do that on your behalf?

On 09.11.18 16:42, Andrii Anisov wrote:
> Hello Julien,
>
> I just wonder, do you plan to upstream the patch below?
>
> Andrii Anisov
On Mon, 19 Nov 2018, 15:57 Andrii Anisov, <andrii.anisov@gmail.com> wrote:
> Julien,
>
> It's me again about your patch :)
>
> I've found this patch useful, and I can even give a motivation to have
> it in the mainline: the patch ensures that vgic_sync_from_lrs is
> performed on the guest-to-hyp switch prior to any IRQ processing.

There is no issue with processing IRQs before the sync. It is the same as
if an IRQ was raised from a different pCPU. So why do you need that?

> So, do you plan to push it for review? Could I do that on your behalf?

Unless there is a bug in the current code, I don't plan to merge it.

Cheers,
Hello Julien,

On 19.11.18 18:42, Julien Grall wrote:
> There is no issue with processing IRQs before the sync. It is the same
> as if an IRQ was raised from a different pCPU.
> So why do you need that?

From my understanding of the gic-vgic code (the old vgic), for IRQs
targeting the `current` vcpu it leads to faster processing under
interrupt-storm conditions. If all the LRs were filled on the previous
switch to the guest, the IRQ will have a chance to go directly into an LR
instead of sitting on the lr_pending queue. Also, the inflight_irqs queue
has a chance to be shorter when inserting.

Moreover, maybe you can explain to me: what is the point of enabling
interrupts before the `do_IRQ()` call? Those interrupts would be grabbed
and processed anyway during the run through the loop in `gic_interrupt()`.
So the only outcome I see from enabling interrupts is spending more time
on context switches.
On 20/11/2018 18:10, Andrii Anisov wrote:
> Hello Julien,
>
> On 19.11.18 18:42, Julien Grall wrote:
>> There is no issue with processing IRQs before the sync. It is the same
>> as if an IRQ was raised from a different pCPU.
>> So why do you need that?
>
> From my understanding of the gic-vgic code (the old vgic), for IRQs
> targeting the `current` vcpu it leads to faster processing under
> interrupt-storm conditions. If all the LRs were filled on the previous
> switch to the guest, the IRQ will have a chance to go directly into an
> LR instead of sitting on the lr_pending queue. Also, the inflight_irqs
> queue has a chance to be shorter when inserting.

Do you have actual numbers? Also, to be on the same page, what is your
definition of an interrupt storm?

Bear in mind that the old vGIC will be phased out soon. If you are
worried about performance, then I would recommend trying the new vGIC and
seeing whether it improves.

> Moreover, maybe you can explain to me: what is the point of enabling
> interrupts before the `do_IRQ()` call? Those interrupts would be
> grabbed and processed anyway during the run through the loop in
> `gic_interrupt()`. So the only outcome I see from enabling interrupts
> is spending more time on context switches.

Well, if you re-enable interrupts, you give a chance for higher priority
interrupts to come up. This will not happen if you have interrupts
disabled.

But you seem to base your assumption on an interrupt storm (yet to be
defined). If you have an interrupt storm, then you are already doomed, as
your guest/Xen will not have time to do any other work.

In any case, you need to provide numbers to support your optimization.

Cheers,
Hello Julien,

On 20.11.18 20:47, Julien Grall wrote:
> On 20/11/2018 18:10, Andrii Anisov wrote:
>> From my understanding of the gic-vgic code (the old vgic), for IRQs
>> targeting the `current` vcpu it leads to faster processing under
>> interrupt-storm conditions. If all the LRs were filled on the previous
>> switch to the guest, the IRQ will have a chance to go directly into an
>> LR instead of sitting on the lr_pending queue. Also, the inflight_irqs
>> queue has a chance to be shorter when inserting.
>
> Do you have actual numbers?

Unfortunately, my numbers are pretty indirect. I'm referring to glmark2
benchmark results. With this and the rest of my changes (not yet
published), I can cut out another percent or two of the performance drop
caused by Xen's presence in the system. BTW, that's why I recently asked
Stefano about his approach to interrupt latency measurement.

On my board that benchmark causes at least 4 different HW interrupts to
be issued at different frequencies. Adding the reschedule IRQ makes the
system tend not to fit all IRQs into the 4 LRs available in my GIC.
Moreover, the benchmark does not emit network traffic or disk usage
during the run, so real-life cases will add more concurrent IRQs.

> Also, to be on the same page, what is your definition of an interrupt
> storm?

I mean the system takes different interrupts (more IRQ sources than LRs
available) at a relatively high rate. Let's say more than 7000 interrupts
per second. It's not a very big number, but close to what I see on my
desk.

> Bear in mind that the old vGIC will be phased out soon.

As I remember, the new vgic is still experimental and does not support
GICv3 yet.

> If you are worried about performance, then I would recommend trying the
> new vGIC and seeing whether it improves.

You know, we are based on Xen 4.10. Initially, when a customer complained
about the performance drop in the benchmark caused by Xen's presence, I
tried 4.12-unstable, with both the old and the new VGIC. Performance with
4.12-unstable and the old VGIC was worse than 4.10, and the new VGIC made
things much worse still. I can't remember the exact numbers or
proportions, but that was the reason we did not offer upgrading Xen yet.

> Well, if you re-enable interrupts, you give a chance for higher
> priority interrupts to come up. This will not happen if you have
> interrupts disabled.

I understand the theory, but cannot match it with the current Xen code.
Guest interrupt prioritization within do_IRQ is pretty meaningless: they
all go through the same path, and an effect would not be seen before
exiting to a guest.
The PPI interrupts are reflected into the processing of soft IRQs or the
injection of an IRQ into the queues. So it does not matter much when
exactly we read the IRQ from IAR in the gic_interrupt loop. I suppose it
should be faster to loop through gic_interrupt at once, collecting all
interrupts without going through the exception path, and then switch to
soft-IRQ processing in leave_hypervisor_tail.
The only thing which might have a noticeable effect here is serving
GIC_SGI_CALL_FUNCTION, which is executed right away from `gic_interrupt`.

> But you seem to base your assumption on an interrupt storm (yet to be
> defined). If you have an interrupt storm, then you are already doomed,
> as your guest/Xen will not have time to do any other work.
>
> In any case, you need to provide numbers to support your optimization.

I'm moving all my patches to the current staging and will send them as an
RFC with a description of why it is done and how I measured the results.
On 11/22/18 4:51 PM, Andrii Anisov wrote:
> Hello Julien,

Hi Andrii,

> Unfortunately, my numbers are pretty indirect. I'm referring to glmark2
> benchmark results. With this and the rest of my changes (not yet
> published), I can cut out another percent or two of the performance
> drop caused by Xen's presence in the system. BTW, that's why I recently
> asked Stefano about his approach to interrupt latency measurement.

My biggest worry is that you are doing optimization on a vGIC that is not
fully compliant with how a GIC should behave (e.g. edge vs level) and
that has very fragile locking. If you are interested, Andre can provide
more details.

> On my board that benchmark causes at least 4 different HW interrupts to
> be issued at different frequencies. Adding the reschedule IRQ makes the
> system tend not to fit all IRQs into the 4 LRs available in my GIC.
> Moreover, the benchmark does not emit network traffic or disk usage
> during the run, so real-life cases will add more concurrent IRQs.
>
>> Also, to be on the same page, what is your definition of an interrupt
>> storm?
> I mean the system takes different interrupts (more IRQ sources than LRs
> available) at a relatively high rate. Let's say more than 7000
> interrupts per second. It's not a very big number, but close to what I
> see on my desk.
>
>> Bear in mind that the old vGIC will be phased out soon.
> As I remember, the new vgic is still experimental and does not support
> GICv3 yet.
>
>> If you are worried about performance, then I would recommend trying
>> the new vGIC and seeing whether it improves.
> You know, we are based on Xen 4.10. Initially, when a customer
> complained about the performance drop in the benchmark caused by Xen's
> presence, I tried 4.12-unstable, with both the old and the new VGIC.
> Performance with 4.12-unstable and the old VGIC was worse than 4.10,
> and the new VGIC made things much worse still. I can't remember the
> exact numbers or proportions, but that was the reason we did not offer
> upgrading Xen yet.

I can't comment without any numbers here. Bear in mind that we fixed bugs
in Xen 4.12 (including spectre/meltdown and missing barriers) that
weren't backported to Xen 4.10. It is entirely possible that this
introduced slowness, but it also ensures the code behaves correctly.

Anyway, if there are performance regressions, we should investigate them
and discuss how we can address/limit them. Similarly for the new vGIC: if
you think it is too slow, then we need to know why before we get rid of
the old vGIC.

> I understand the theory, but cannot match it with the current Xen code.
> Guest interrupt prioritization within do_IRQ is pretty meaningless.

There is no guest prioritization at the moment. However, we may want to
introduce it to give priority to one guest over another.

Cheers,
On Thu, 22 Nov 2018 18:51:13 +0200
Andrii Anisov <andrii.anisov@gmail.com> wrote:

Hi,

>> Do you have actual numbers?
> Unfortunately, my numbers are pretty indirect. I'm referring to glmark2
> benchmark results. With this and the rest of my changes (not yet
> published), I can cut out another percent or two of the performance
> drop caused by Xen's presence in the system. BTW, that's why I recently
> asked Stefano about his approach to interrupt latency measurement.
>
> On my board that benchmark causes at least 4 different HW interrupts to
> be issued at different frequencies.

Is that benchmark chosen to put some interrupt load on the system? Or is
that what the customer actually uses, and she is really suffering from
Xen's interrupt handling and emulation?

If you chose this benchmark because it tends to trigger a lot of
interrupts and you hope to estimate some other interrupt property with
this (for instance interrupt latency or typical LR utilisation), then you
might be disappointed. This seems to go into the direction of an
interrupt storm, where we really don't care about performance, but just
want to make sure we keep running and ideally don't penalise other
guests.

> Adding the reschedule IRQ makes the system tend not to fit all IRQs
> into the 4 LRs available in my GIC. Moreover, the benchmark does not
> emit network traffic or disk usage during the run, so real-life cases
> will add more concurrent IRQs.

That's rather odd. Are you sure you actually have all LRs used? What is
your guest system? Linux? Can you check whether you use EOI mode 1 in the
guest ("GIC: Using split EOI/Deactivate mode" in dmesg, right after
"NR_IRQS: xx, nr_irqs: xx, preallocated irqs: 0")? Typically Linux EOIs
an IRQ very quickly, and with EOI mode 0 you just have *one* active IRQ.
So the other interrupts would be pending, which means your guest is busy
with handling interrupts. Which points to other problems and might not be
a valid benchmark.

Also: what is the exact problem with exceeding the number of hardware
LRs? If you have a high interrupt load (more than three or four
interrupts pending at the same time), chances are you exit anyway quite
often, in which case the LRs get updated. I don't see the huge advantage
of being able to present 8 pending IRQs to the guest.

>> Also, to be on the same page, what is your definition of an interrupt
>> storm?
> I mean the system takes different interrupts (more IRQ sources than LRs
> available) at a relatively high rate. Let's say more than 7000
> interrupts per second. It's not a very big number, but close to what I
> see on my desk.

The frequency of the interrupt (n per second) should be unrelated to the
number of IRQs being presented simultaneously to the guest.

Typically I would assume you have zero interrupts normally, because your
guest is doing actual work (running the OS or userland program). Then you
handle the occasional interrupt (1 LR in active state), and the timer IRQ
fires during this. This lets the guest exit, and the second LR gets used
with the injected pending timer IRQ. Now every now and then an IPI might
also be injected from another VCPU at the same time, which brings the
count up to 3. But all of the time the guest still handles this first
interrupt. And chances are the interrupt handler sooner or later triggers
an (MMIO) exit, at which point the number of LRs becomes irrelevant. A
high number of LRs would only be interesting if you are able to process
all those interrupts without a single exit; typically this is only true
for the virtual arch timer IRQ.

Also keep in mind that some instructions here and there in the interrupt
handling path in Xen might not be relevant if you exit the guest
frequently (due to interrupts, for instance). The cost of an exit will
probably dwarf the cost of adding a struct pending_irq to a linked list
or so.

Cheers,
Andre.
Hello Julien,

On 22.11.18 19:22, Julien Grall wrote:
> My biggest worry is that you are doing optimization on a vGIC that is
> not fully compliant with how a GIC should behave (e.g. edge vs level)
> and that has very fragile locking.

Yep, the old VGIC locking looks pretty terrible.

> If you are interested, Andre can provide more details.

Being honest, I don't fully understand the edge vs level problem there.
It would be good to get a better view on it.

> I can't comment without any numbers here. Bear in mind that we fixed
> bugs in Xen 4.12 (including spectre/meltdown and missing barriers) that
> weren't backported to Xen 4.10. It is entirely possible that this
> introduced slowness, but it also ensures the code behaves correctly.

For sure, I know that. It was rather a political decision.

> Anyway, if there are performance regressions, we should investigate
> them and discuss how we can address/limit them. Similarly for the new
> vGIC: if you think it is too slow, then we need to know why before we
> get rid of the old vGIC.

Yep.

> There is no guest prioritization at the moment. However, we may want to
> introduce it to give priority to one guest over another.

But still, processing of IRQs by the hypervisor (e.g. moving guest IRQs
from the gic to the vgic) has a higher priority than any guest execution.
On Fri, 23 Nov 2018 12:09:41 +0200
Andrii Anisov <andrii.anisov@gmail.com> wrote:

Hi,

> On 22.11.18 19:22, Julien Grall wrote:
>> My biggest worry is that you are doing optimization on a vGIC that is
>> not fully compliant with how a GIC should behave (e.g. edge vs level)
>> and that has very fragile locking.
> Yep, the old VGIC locking looks pretty terrible.
>
>> If you are interested, Andre can provide more details.
> Being honest, I don't fully understand the edge vs level problem there.
> It would be good to get a better view on it.

Fundamentally there is a semantic difference between edge and level
triggered IRQs:
When the guest has handled an *edge* IRQ (EOIed so the LR's state goes to
0), this is done and dusted, and Xen doesn't need to care about this
anymore until the next IRQ occurs.

For level triggered IRQs, even though the guest has handled it, we need
to resample the (potentially virtual) IRQ line, as it may come up or down
at the *device*'s discretion: the interrupt reason might have gone away
(GPIO condition no longer true), even before we were able to inject it,
or there might be another interrupt reason not yet handled (incoming
serial character while serving a transmit interrupt). Also typically it's
up to the interrupt handler to confirm handling the interrupt, either
explicitly by clearing an interrupt bit in some status register or
implicitly, for instance by draining a FIFO, say on a serial device. So
even though from the (V)GIC's point of view the interrupt has been
processed (EOIed), it might still be pending.

My intimate "old Xen VGIC" knowledge has been swapped out from my brain
meanwhile, but IIRC Xen treats every IRQ as if it were an edge IRQ. Which
works if the guest's interrupt handler behaves correctly. Most IRQ
handlers have a loop to iterate over all possible interrupt reasons and
process them, so the line goes indeed down before they EOI the IRQ.

>> I can't comment without any numbers here. Bear in mind that we fixed
>> bugs in Xen 4.12 (including spectre/meltdown and missing barriers)
>> that weren't backported to Xen 4.10. It is entirely possible that this
>> introduced slowness, but it also ensures the code behaves correctly.
> For sure, I know that. It was rather a political decision.

Translating "applying a fix for a serious security issue" to "political
decision" is a rather, ummh, interesting way of seeing things ;-)

Cheers,
Andre.
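[Editor's note: Andre's distinction can be made concrete with a minimal sketch. This is not Xen code; the type and helper names below are made up for illustration, and read_line_level() stands in for whatever query of the device/distributor a vGIC implementation uses.]

    #include <stdbool.h>

    enum trigger { TRIGGER_EDGE, TRIGGER_LEVEL };

    /* Hypothetical query of the device's current line state. */
    static bool read_line_level(unsigned int intid) { (void)intid; return false; }

    /* Decide whether a virtual IRQ stays pending once the guest EOIs it. */
    bool still_pending_after_eoi(enum trigger t, unsigned int intid)
    {
        if ( t == TRIGGER_EDGE )
            return false;   /* consumed: nothing more until the next edge */

        /*
         * Level: the pending state follows the device's line, so it must
         * be resampled; the device may still assert the line (e.g. the
         * FIFO is not yet drained or the status bit is not yet cleared).
         */
        return read_line_level(intid);
    }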
Hello Andre,

On 22.11.18 20:04, Andre Przywara wrote:
> Is that benchmark chosen to put some interrupt load on the system? Or
> is that what the customer actually uses, and she is really suffering
> from Xen's interrupt handling and emulation?

That is the 3D benchmark used by the customer to compare virtualization
solutions from different providers.

> If you chose this benchmark because it tends to trigger a lot of
> interrupts and you hope to estimate some other interrupt property with
> this (for instance interrupt latency or typical LR utilisation), then
> you might be disappointed. This seems to go into the direction of an
> interrupt storm, where we really don't care about performance, but just
> want to make sure we keep running and ideally don't penalise other
> guests.

Well, the benchmark itself is rather interrupt-oriented (on our HW). It
emits GPU load, so it causes very low CPU load but a lot of interrupts
from the GPU, the video subsystem and the display subsystem. I know about
the WFI/WFE problem and `vwfi=native`. But we can't use it, because our
system is overcommitted.

> That's rather odd. Are you sure you actually have all LRs used?

I have to recheck.

> What is your guest system? Linux?

Yep, LK 4.14.35.

> Can you check whether you use EOI mode 1 in the guest ("GIC: Using
> split EOI/Deactivate mode" in dmesg, right after "NR_IRQS: xx, nr_irqs:
> xx, preallocated irqs: 0")?

I didn't find such a print in dmesg:

[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] arch_timer: cp15 timer(s) running at 8.33MHz (virt).

> The frequency of the interrupt (n per second) should be unrelated to
> the number of IRQs being presented simultaneously to the guest.

Actually, yes. What could matter here is that we have 4 concurrent HW
interrupt sources involved in 3D processing.

> Typically I would assume you have zero interrupts normally, because
> your guest is doing actual work (running the OS or userland program).
> Then you handle the occasional interrupt (1 LR in active state), and
> the timer IRQ fires during this. This lets the guest exit, and the
> second LR gets used with the injected pending timer IRQ. Now every now
> and then an IPI might also be injected from another VCPU at the same
> time, which brings the count up to 3. But all of the time the guest
> still handles this first interrupt. And chances are the interrupt
> handler sooner or later triggers an (MMIO) exit, at which point the
> number of LRs becomes irrelevant. A high number of LRs would only be
> interesting if you are able to process all those interrupts without a
> single exit; typically this is only true for the virtual arch timer
> IRQ.

I need some time to sort it out.

> Also keep in mind that some instructions here and there in the
> interrupt handling path in Xen might not be relevant if you exit the
> guest frequently (due to interrupts, for instance). The cost of an exit
> will probably dwarf the cost of adding a struct pending_irq to a linked
> list or so.

It is clear. As we discussed internally, even when making the IRQ path
shorter, we may see the benchmark results drop because we are doing more
context switches from the guest instead of collecting those interrupts
directly in hyp mode.
On 23.11.18 14:18, Andre Przywara wrote:
> Fundamentally there is a semantic difference between edge and level
> triggered IRQs:
> When the guest has handled an *edge* IRQ (EOIed so the LR's state goes
> to 0), this is done and dusted, and Xen doesn't need to care about this
> anymore until the next IRQ occurs.
>
> For level triggered IRQs, even though the guest has handled it, we need
> to resample the (potentially virtual) IRQ line, as it may come up or
> down at the *device*'s discretion: the interrupt reason might have gone
> away (GPIO condition no longer true), even before we were able to
> inject it, or there might be another interrupt reason not yet handled
> (incoming serial character while serving a transmit interrupt). Also
> typically it's up to the interrupt handler to confirm handling the
> interrupt, either explicitly by clearing an interrupt bit in some
> status register or implicitly, for instance by draining a FIFO, say on
> a serial device. So even though from the (V)GIC's point of view the
> interrupt has been processed (EOIed), it might still be pending.
>
> My intimate "old Xen VGIC" knowledge has been swapped out from my brain
> meanwhile, but IIRC Xen treats every IRQ as if it were an edge IRQ.
> Which works if the guest's interrupt handler behaves correctly. Most
> IRQ handlers have a loop to iterate over all possible interrupt reasons
> and process them, so the line goes indeed down before they EOI the IRQ.

Thank you for the explanation. I'll read it carefully.
On 23/11/2018 12:58, Andrii Anisov wrote:
> Hello Andre,
>
> On 22.11.18 20:04, Andre Przywara wrote:
>> Is that benchmark chosen to put some interrupt load on the system? Or
>> is that what the customer actually uses, and she is really suffering
>> from Xen's interrupt handling and emulation?
> That is the 3D benchmark used by the customer to compare virtualization
> solutions from different providers.
>
>> If you chose this benchmark because it tends to trigger a lot of
>> interrupts and you hope to estimate some other interrupt property with
>> this (for instance interrupt latency or typical LR utilisation), then
>> you might be disappointed. This seems to go into the direction of an
>> interrupt storm, where we really don't care about performance, but
>> just want to make sure we keep running and ideally don't penalise
>> other guests.
> Well, the benchmark itself is rather interrupt-oriented (on our HW). It
> emits GPU load, so it causes very low CPU load but a lot of interrupts
> from the GPU, the video subsystem and the display subsystem. I know
> about the WFI/WFE problem and `vwfi=native`.

The WFI/WFE problem is mostly because of the context switch. It can
likely be optimized to reduce the overhead.

> But we can't use it, because our system is overcommitted.

Can you describe how overcommitted your system is?

>> Also keep in mind that some instructions here and there in the
>> interrupt handling path in Xen might not be relevant if you exit the
>> guest frequently (due to interrupts, for instance). The cost of an
>> exit will probably dwarf the cost of adding a struct pending_irq to a
>> linked list or so.
> It is clear. As we discussed internally, even when making the IRQ path
> shorter, we may see the benchmark results drop because we are doing
> more context switches from the guest instead of collecting those
> interrupts directly in hyp mode.

I don't understand what you mean. Can you detail it?

Cheers,
Hello Julien,

On 23.11.18 15:27, Julien Grall wrote:
>> But we can't use it, because our system is overcommitted.
>
> Can you describe how overcommitted your system is?

I meant that we have a requirement not to limit the number of VCPUs to
the number of PCPUs. Also, we should not use cpu pinning. I guess I used
the wrong word; it should be "oversubscribed".

> I don't understand what you mean. Can you detail it?

I hope you don't mind that I drew a picture for this; I failed to find a
simple wording for it :(

At some rate of IRQs from different sources, a slight reduction of the
IRQ processing time in the hypervisor might result in overall slower IRQ
processing.

In the picture attached, in case 2 the IRQ processing path in Xen is made
shorter. But when IRQ1 and IRQ2 are asserted at a certain rate, we end up
with two context switches and two runs through the IRQ processing path,
which takes longer than catching IRQ2 right away together with IRQ1 in
Xen itself.
We came to this idea after observing a strange effect: dropping a bit of
code from the IRQ processing path (i.e. several if's) led to the
benchmark showing slightly smaller numbers.

--
Sincerely,
Andrii Anisov.
On 11/27/18 1:30 PM, Andrii Anisov wrote:
> Hello Julien,

Hi Andrii,

> On 23.11.18 15:27, Julien Grall wrote:
>>> But we can't use it, because our system is overcommitted.
>>
>> Can you describe how overcommitted your system is?
> I meant that we have a requirement not to limit the number of VCPUs to
> the number of PCPUs. Also, we should not use cpu pinning. I guess I
> used the wrong word; it should be "oversubscribed".

Oversubscribing is usually a pretty bad idea if you want to have good
latency. There is no promise when the vCPU will get scheduled.
Furthermore, if you want to have more vCPUs than pCPUs, then you still
need to make sure that the total amount of vCPU usage is always lower
than (or equal to) the total capacity of your pCPUs.

So can you describe how oversubscribed your platform is when doing the
benchmark?

>> I don't understand what you mean. Can you detail it?
> I hope you don't mind that I drew a picture for this; I failed to find
> a simple wording for it :(
> At some rate of IRQs from different sources, a slight reduction of the
> IRQ processing time in the hypervisor might result in overall slower
> IRQ processing.
>
> In the picture attached, in case 2 the IRQ processing path in Xen is
> made shorter. But when IRQ1 and IRQ2 are asserted at a certain rate, we
> end up with two context switches and two runs through the IRQ
> processing path, which takes longer than catching IRQ2 right away
> together with IRQ1 in Xen itself.
> We came to this idea after observing a strange effect: dropping a bit
> of code from the IRQ processing path (i.e. several if's) led to the
> benchmark showing slightly smaller numbers.

I think I now understand your problem. The problem is not because of
re-enabling the interrupts. Instead, it is because the GIC CPU priority
is dropped by EOIing early (via desc->handler->end()). As soon as you
drop the priority, another interrupt with the same (or lower) priority
can fire.

Looking at do_IRQ, we don't handle guest IRQs and Xen IRQs the same way.

The steps for a Xen IRQ are roughly:
-> read_irq
-> local_irq_enable
-> do_IRQ
-> local_irq_enable (via spin_unlock_irq)
-> call handlers
-> local_irq_disable (via spin_lock_irq)
-> EOI
-> DIR
-> local_irq_disable

The steps for a guest IRQ are roughly:
-> read_irq
-> local_irq_enable
-> do_IRQ
-> EOI
-> vgic_inject_irq
-> local_irq_disable (via spin_lock_irqsave)
-> local_irq_enable (via spin_unlock_irqrestore)
-> local_irq_disable

So vgic_inject_irq is pretty much running with interrupts disabled. The
Xen IRQ path seems to contain pointless enable/disable pairs.

So I think the following steps should suit you.

Xen IRQ:
-> read_irq
-> do_IRQ
-> local_irq_enable (via spin_unlock_irq)
-> call handlers
-> local_irq_disable (via spin_lock_irq)
-> EOI
-> DIR

Guest IRQ:
-> read_irq
-> local_irq_enable
-> do_IRQ
-> EOI
-> vgic_inject_irq

SGIs seem to be handled with IRQs disabled, so no change there. For LPIs,
we might want to do the same (needs some investigation).

Cheers,
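[Editor's note: the two proposed step lists above condense into the sketch below. This is illustrative C only, under the assumptions stated in the thread; names such as xen_irq_path(), run_handlers() and vgic_inject() do not exist in Xen, where the real logic lives in do_IRQ(), the irq_desc lock and the gic_hw_ops hooks.]

    /* Stand-ins for the Xen primitives named above (illustrative only). */
    static void local_irq_enable(void)  { /* unmask IRQs on this CPU */ }
    static void local_irq_disable(void) { /* mask IRQs on this CPU   */ }
    static void eoi(unsigned int irq)        { (void)irq; /* priority drop (EOI) */ }
    static void deactivate(unsigned int irq) { (void)irq; /* deactivate (DIR)    */ }
    static void run_handlers(unsigned int irq) { (void)irq; /* irqaction list    */ }
    static void vgic_inject(unsigned int virq) { (void)virq; /* mark pending     */ }

    /* Xen-owned IRQ: IRQs are enabled only around the handlers; the EOI
     * and deactivate happen with IRQs masked again, so the priority drop
     * cannot let another IRQ preempt the tail of the path. */
    void xen_irq_path(unsigned int irq)
    {
        local_irq_enable();   /* via spin_unlock_irq in the real do_IRQ */
        run_handlers(irq);
        local_irq_disable();  /* via spin_lock_irq */
        eoi(irq);
        deactivate(irq);
    }

    /* Guest-owned IRQ: drop the priority early, then inject; the
     * injection itself runs with IRQs masked inside the vgic locking. */
    void guest_irq_path(unsigned int irq)
    {
        local_irq_enable();
        eoi(irq);
        vgic_inject(irq);
    }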
On Thu, 22 Nov 2018, Julien Grall wrote:
>>> If you are worried about performance, then I would recommend trying
>>> the new vGIC and seeing whether it improves.
>> You know, we are based on Xen 4.10. Initially, when a customer
>> complained about the performance drop in the benchmark caused by Xen's
>> presence, I tried 4.12-unstable, with both the old and the new VGIC.
>> Performance with 4.12-unstable and the old VGIC was worse than 4.10,
>> and the new VGIC made things much worse still. I can't remember the
>> exact numbers or proportions, but that was the reason we did not offer
>> upgrading Xen yet.
>
> I can't comment without any numbers here. Bear in mind that we fixed
> bugs in Xen 4.12 (including spectre/meltdown and missing barriers) that
> weren't backported to Xen 4.10. It is entirely possible that this
> introduced slowness, but it also ensures the code behaves correctly.
>
> Anyway, if there are performance regressions, we should investigate
> them and discuss how we can address/limit them. Similarly for the new
> vGIC: if you think it is too slow, then we need to know why before we
> get rid of the old vGIC.

Well said! We care about interrupt performance very much, and we
definitely need to address any regressions with either the old or the new
driver. But to do that, we need reliable numbers and to figure out
exactly what the problem is so that we can fix it. I sent the way I
measure performance in a separate email.
Hello Andre,

Please see my comments below.

On 23.11.18 14:18, Andre Przywara wrote:
> Fundamentally there is a semantic difference between edge and level
> triggered IRQs: When the guest has handled an *edge* IRQ (EOIed so the
> LR's state goes to 0), this is done and dusted, and Xen doesn't need to
> care about this anymore until the next IRQ occurs.
> For level triggered IRQs, even though the guest has handled it, we need
> to resample the (potentially virtual) IRQ line, as it may come up or
> down at the *device*'s discretion: the interrupt reason might have gone
> away (GPIO condition no longer true), even before we were able to
> inject it, or there might be another interrupt reason not yet handled
> (incoming serial character while serving a transmit interrupt). Also
> typically it's up to the interrupt handler to confirm handling the
> interrupt, either explicitly by clearing an interrupt bit in some
> status register or implicitly, for instance by draining a FIFO, say on
> a serial device. So even though from the (V)GIC's point of view the
> interrupt has been processed (EOIed), it might still be pending.

So, as I understand it, the intended behavior of a vGIC for level
interrupts covers the following cases:
1. in case the interrupt line is still asserted by the HW but the
   interrupt handler in the VM EOIs the interrupt, it should be signaled
   to the vCPU by the vGIC again
2. in case a peripheral deasserted the interrupt line but the VM has not
   acknowledged it yet, this interrupt should be removed from pending for
   the VM

IMO, case 1 is indirectly supported by the old vgic. For HW interrupts it
is pretty natural: deactivation by the VM in the VGIC leads to
deactivation in the GIC, so the interrupt priority is restored and the
GIC will trap the PCPU to reinsert it. This will be seen by the VM as an
immediate IRQ trap after EOI.
Case 2 is not implemented in the old vgic. It is somewhat supported by
the new vgic, though it also relies on the trap to the hypervisor to
commit the update to the LRs. But that is rather a problem of the GIC
arch/implementation, which neither signals the CPU nor updates the
associated LR on level interrupt deassertion.

> My intimate "old Xen VGIC" knowledge has been swapped out from my brain
> meanwhile, but IIRC Xen treats every IRQ as if it were an edge IRQ.
> Which works if the guest's interrupt handler behaves correctly. Most
> IRQ handlers have a loop to iterate over all possible interrupt reasons
> and process them, so the line goes indeed down before they EOI the IRQ.

I've spent some time looking through the new vgic implementation, and I
have a note about it:
It's not clear why you are probing level interrupts on the guest->hyp
transition. While this targets case 2 described above, it seems more
relevant to probe the level interrupts right before the hyp->guest
transition, because the vcpu might be descheduled, and while it sits on
the scheduler queues the interrupt line level has more chances to be
changed by the peripheral itself.
Also, I'm pretty scared of the new vgic locking scheme, with per-irq
locks and locking logic like that in the vgic_queue_irq_unlock()
function. Sorting the list in vgic_flush_lr_state() with vgic_irq_cmp()
also looks very expensive. But, for sure, the performance of all that
stuff should be properly evaluated and measured.
Hello Julien,

On 27.11.18 17:13, Julien Grall wrote:
> Oversubscribing is usually a pretty bad idea if you want to have good
> latency. There is no promise of when the vCPU will get scheduled.
Yes, I know and clearly understand that. But we still have requirements.

> So can you describe how oversubscribed your platform is when doing the
> benchmark?
On the setup a customer runs, it's about 8 vCPUs per 4 pCPUs. Under the
benchmark conditions, the system is quite idle.

> I think I now understand your problem. The problem is not because of
> re-enabling the interrupt. Instead, it is because the GIC CPU priority
> has been dropped by using EOI early (via desc->handler->end()). As soon
> as you drop the priority, another interrupt with the same (or lower)
> priority can fire.
I understand that.

> Looking at do_IRQ, we don't handle guest IRQs and Xen IRQs the same way.
Yep.

> The steps for a Xen IRQ are roughly:
> -> read_irq
> -> local_irq_enable
> -> do_IRQ
>    -> local_irq_enable (via spin_unlock_irq)
>    -> call handlers
>    -> local_irq_disable (via spin_lock_irq)
>    -> EOI
>    -> DIR
> -> local_irq_disable
>
> The steps for a guest IRQ are roughly:
> -> read_irq
> -> local_irq_enable
> -> do_IRQ
>    -> EOI
So here we might have a lower priority interrupt fire, getting us back to
`hyp_irq()` to run `do_trap_irq()` again.

>    -> vgic_inject_irq
>       -> local_irq_disable (via spin_lock_irqsave)
>       -> local_irq_enable (via spin_unlock_irqrestore)
> -> local_irq_disable
>
> All of vgic_inject_irq pretty much runs with interrupts disabled. The
> Xen IRQ path seems to contain pointless enable/disable.
>
> So I think the following steps should suit you.
Well, are you saying they do not suit mainline?

> Xen IRQ:
> -> read_irq
> -> do_IRQ
>    -> local_irq_enable (via spin_unlock_irq)
I suppose isb() should be moved here from `do_trap_irq()` as well.
>    -> call handlers
>    -> local_irq_disable (via spin_lock_irq)
>    -> EOI
>    -> DIR
>
> Guest IRQ:
> -> read_irq
> -> local_irq_enable
As I understand it, the line above should not be there.
> -> do_IRQ
>    -> EOI
>    -> vgic_inject_irq
>
> SGIs seem to be handled with IRQs disabled, so no change there. For
> LPIs, we might want to do the same (needs some investigation).
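For clarity, the reordering proposed above would amount to something like
the following sketch of gic_interrupt(). It is an illustration against
the code of that era, not a tested patch; in particular, the
spurious-interrupt exit and the exact do_sgi() invocation are simplified.

/*
 * Sketch of the proposed flow: interrupts stay masked at this level.
 * For Xen IRQs, do_IRQ() re-enables interrupts itself around the
 * handler calls (via spin_unlock_irq); for guest IRQs, the EOI and
 * vgic_inject_irq() then run entirely with interrupts disabled.
 */
void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
{
    unsigned int irq;

    do {
        /* Reading the IRQ acknowledges it; no local_irq_enable() here. */
        irq = gic_hw_ops->read_irq();

        if ( likely(irq >= 16 && irq < 1020) )
        {
            isb();                  /* see the isb() discussion below */
            do_IRQ(regs, irq, is_fiq);
        }
        else if ( is_lpi(irq) )
        {
            isb();
            gic_hw_ops->do_LPI(irq);
        }
        else if ( unlikely(irq < 16) )
            do_sgi(regs, irq);
        else
            break;                  /* spurious (1020..1023): done */
    } while ( 1 );
}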
Hi Andrii,

On 03/12/2018 12:08, Andrii Anisov wrote:
> On 27.11.18 17:13, Julien Grall wrote:
>> The steps for a Xen IRQ are roughly:
>> -> read_irq
>> -> local_irq_enable
>> -> do_IRQ
>>    -> local_irq_enable (via spin_unlock_irq)
>>    -> call handlers
>>    -> local_irq_disable (via spin_lock_irq)
>>    -> EOI
>>    -> DIR
>> -> local_irq_disable
>>
>> The steps for a guest IRQ are roughly:
>> -> read_irq
>> -> local_irq_enable
>> -> do_IRQ
>>    -> EOI
> So here we might have a lower priority interrupt fire, getting us back
> to `hyp_irq()` to run `do_trap_irq()` again.
>
>>    -> vgic_inject_irq
>>       -> local_irq_disable (via spin_lock_irqsave)
>>       -> local_irq_enable (via spin_unlock_irqrestore)
>> -> local_irq_disable
>>
>> All of vgic_inject_irq pretty much runs with interrupts disabled. The
>> Xen IRQ path seems to contain pointless enable/disable.
>>
>> So I think the following steps should suit you.
> Well, are you saying they do not suit mainline?

No. I meant that I would be happy with that, and I think it should also
suit you.

>> Xen IRQ:
>> -> read_irq
>> -> do_IRQ
>>    -> local_irq_enable (via spin_unlock_irq)
> I suppose isb() should be moved here from `do_trap_irq()` as well.

There is no isb() in do_trap_irq(). So did you mean gic_interrupt()? But
then, I am not sure why you want to avoid the isb() in the guest path.

Cheers,
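(For readers following the barrier argument: in Xen's ARM code, isb()
expands to essentially nothing more than the bare instruction plus a
compiler barrier, so the debate is about where the cost of a single
context-synchronising instruction is paid. The spelling below is from
memory of asm-arm/system.h and may differ slightly in the tree:)

#define isb()           asm volatile("isb" : : : "memory")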
On 03.12.18 14:17, Julien Grall wrote:
> No. I meant that I would be happy with that, and I think it should also
> suit you.
Great!

> There is no isb() in do_trap_irq(). So did you mean gic_interrupt()?
Right you are.

> But then, I am not sure why you want to avoid the isb() in the guest path.
Well, as I remember, and as the commit message says, it is needed to
ensure the peripheral registers are updated before the interrupt handler
reads them :)
As for guest IRQs, we do not actually handle them in Xen; we just raise a
notification to the guest that handling is needed. Thus that
synchronization is not required in the guest interrupt processing path.
Hi Andrii,

On 03/12/2018 12:58, Andrii Anisov wrote:
> On 03.12.18 14:17, Julien Grall wrote:
>> No. I meant that I would be happy with that, and I think it should
>> also suit you.
> Great!
>
>> There is no isb() in do_trap_irq(). So did you mean gic_interrupt()?
> Right you are.
>
>> But then, I am not sure why you want to avoid the isb() in the guest path.
> Well, as I remember, and as the commit message says, it is needed to
> ensure the peripheral registers are updated before the interrupt
> handler reads them :)
> As for guest IRQs, we do not actually handle them in Xen; we just raise
> a notification to the guest that handling is needed. Thus that
> synchronization is not required in the guest interrupt processing path.

Possibly, but I would prefer to keep the isb() in its current position.
It covers all the handlers, so there is less risk of missing an isb() in
the future. Do you see any performance drop?

Cheers,
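To make the hazard concrete: a handler for a device whose interrupt
status lives in a system register reads that status right after the ack,
along the lines of the sketch below. It is modelled loosely on Xen's arch
timer handling; the function name is invented, while READ_SYSREG32,
CNTP_CTL_EL0 and CNTx_CTL_PENDING are meant to match the existing
accessors and defines.

/*
 * Sketch: without a context synchronisation event (the ISB) between
 * acknowledging the INTID and running this handler, the status read
 * below may still observe a stale value, even though the GIC has
 * already presented the timer's interrupt.
 */
static void timer_status_example(void)
{
    uint32_t ctl = READ_SYSREG32(CNTP_CTL_EL0);

    if ( !(ctl & CNTx_CTL_PENDING) )
        return; /* looks spurious -- possibly just a stale sysreg view */

    /* ... actual servicing of the timer interrupt would go here ... */
}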
On 30/11/2018 19:52, Andrii Anisov wrote:
> Hello Andre,
>
> Please see my comments below:
>
> On 23.11.18 14:18, Andre Przywara wrote:
>> Fundamentally there is a semantic difference between edge and level
>> triggered IRQs: When the guest has handled an *edge* IRQ (EOIed so the
>> LR's state goes to 0), this is done and dusted, and Xen doesn't need
>> to care about this anymore until the next IRQ occurs.
>> For level triggered IRQs, even though the guest has handled it, we
>> need to resample the (potentially virtual) IRQ line, as it may come up
>> or down at the *device*'s discretion: the interrupt reason might have
>> gone away (GPIO condition no longer true), even before we were able to
>> inject it, or there might be another interrupt reason not yet handled
>> (incoming serial character while serving a transmit interrupt). Also
>> it is typically up to the interrupt handler to confirm handling the
>> interrupt, either explicitly by clearing an interrupt bit in some
>> status register or implicitly, for instance by draining a FIFO, say on
>> a serial device. So even though from the (V)GIC's point of view the
>> interrupt has been processed (EOIed), it might still be pending.
> So, as I understand it, the intended behavior of a vGIC for level
> interrupts covers the following cases:
> 1. in case the interrupt line is still active on the HW side, but the
> interrupt handler in the VM EOIs the interrupt, it should be signaled
> to the vCPU by the vGIC again

yes

> 2. in case a peripheral deasserts the interrupt line, but the VM has
> not activated it yet, this interrupt should be removed from the VM's
> pending state

yes

> IMO, case 1 is indirectly supported by the old vgic. For HW interrupts
> it's pretty natural: deactivation by the VM in the VGIC leads to
> deactivation in the GIC, so the interrupt priority is restored and the
> GIC will trap the pCPU to reinsert it. This will be seen by the VM as
> an immediate IRQ trap right after EOI.

Yes, this is true for hardware interrupts, and this lets the old VGIC get
away with it.
Virtual devices with level interrupt semantics are a different story; the
SBSA UART has some hacks in it to support this properly.

> Case 2, however, is not implemented in the old vgic. It is somehow
> supported by the new vgic, though it also relies on a trap to the
> hypervisor to commit the update to the LRs.

Yes, and there is not much we can do about it. But that's not a real
problem, as you have this problem on bare metal, too.

> But that is rather a problem of the GIC architecture/implementation,
> which neither signals the CPU nor updates the associated LR on level
> interrupt deassertion.
>
>> My intimate "old Xen VGIC" knowledge has been swapped out from my
>> brain meanwhile, but IIRC Xen treats every IRQ as if it were an edge
>> IRQ. Which works if the guest's interrupt handler behaves correctly.
>> Most IRQ handlers have a loop to iterate over all possible interrupt
>> reasons and process them, so the line does indeed go down before they
>> EOI the IRQ.
>
> I've spent some time looking through the new vgic implementation and I
> have a note about it:
> It's not clear why you are probing level interrupts on the guest->hyp
> transition. While this targets case 2 described above, it seems more
> relevant to probe the level interrupts right before the hyp->guest
> transition, because the vCPU might be descheduled, and while it sits on
> the scheduler queues the interrupt line level has more chance of being
> changed by the peripheral itself.
> Also, I'm pretty scared of the new vgic locking scheme, with per-IRQ
> locks and locking logic such as in the vgic_queue_irq_unlock() function.

Well, you should be scared of the old VGIC locking scheme instead ;-)
Apart from the vgic_queue_irq_unlock() function, the rest of the new
locking scheme is much clearer. And it scales much better, as we have
per-IRQ locks, which should virtually never be contended, and per-CPU
locks, which are only very rarely contended, as each is mostly taken by
its own VCPU during entry and exit. There is no per-domain lock for the
emulation anymore.
I see that it's tempting to have an "easy" locking scheme, but that
typically is either one-lock-to-rule-them-all, which doesn't scale, or
doesn't cover corner cases.
You should always prefer correctness over performance, otherwise you just
fail faster ;-)

> And sorting the list in vgic_flush_lr_state() with vgic_irq_cmp() looks
> very expensive.

Yes, but in practice this virtually never happens, as you rarely have
more than four pending IRQs at the same time.
I had patches lying around to improve this part, but never got around to
posting them, especially with only little rationale.
If you are interested, I can dig them out, though I am not sure how
relevant this is.

> But, for sure, the performance of all that should be properly evaluated
> and measured.

Indeed. Can you hack something into Xen to get some statistics on those
cases? I am not sure if Xen has something to trace lock contention, but
you could easily add some counters to track how many LRs we actually use,
to see if this is actually a problem in your case.
I think PERFCOUNTER is your friend.

Cheers,
Andre.
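As a concrete illustration of the PERFCOUNTER suggestion, a counter is
roughly a one-liner per site. The counter name and the guard condition
below are made up; only the PERFCOUNTER/perfc_incr machinery itself is
taken from the tree.

/* In xen/include/asm-arm/perfc_defn.h (illustrative name): */
PERFCOUNTER(vgic_ap_list_sorts, "vGIC: ap_list sorted during flush")

/* In vgic_flush_lr_state(), right before the sort (sketch): */
if ( pending_irqs > nr_lrs )        /* hypothetical local variables */
{
    perfc_incr(vgic_ap_list_sorts);
    /* ... existing sort of the ap_list goes here ... */
}

The counters can then be dumped with the xenperf tool, giving cheap
statistics on how often the expensive path is actually taken.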
On 03/12/2018 14:46, Andre Przywara wrote:
[...]
>> But, for sure, the performance of all that should be properly
>> evaluated and measured.
>
> Indeed. Can you hack something into Xen to get some statistics on those
> cases? I am not sure if Xen has something to trace lock contention, but
> you could easily add some counters to track how many LRs we actually
> use, to see if this is actually a problem in your case.
> I think PERFCOUNTER is your friend.

CONFIG_LOCK_PROFILE and xen-lockprof on the tools side?

Not sure it is still working, though. It's about 9 years since I wrote
and used it.

Juergen
Hello Juergen,

On 03.12.18 15:53, Juergen Gross wrote:
> On 03/12/2018 14:46, Andre Przywara wrote:
>> I think PERFCOUNTER is your friend.
>
> CONFIG_LOCK_PROFILE and xen-lockprof on the tools side?
>
> Not sure it is still working, though. It's about 9 years since I wrote
> and used it.
It does work. I've used it recently for this investigation.
But it gave me unclear results; I could not match them with what I got
from xentrace and rough Lauterbach traces.
Hello Andre,

On 03.12.18 15:46, Andre Przywara wrote:
> Well, you should be scared of the old VGIC locking scheme instead ;-)
The old VGIC locking is more tangled, indeed.

> Apart from the vgic_queue_irq_unlock() function, the rest of the new
> locking scheme is much clearer.
I agree.

> Yes, but in practice this virtually never happens, as you rarely have
> more than four pending IRQs at the same time.
I've checked that: I just put a perfcounter there. Surprisingly, it
happens a few times per run even under my simplified conditions. I should
check this with more complex multimedia use-cases on a different setup.

> I had patches lying around to improve this part, but never got around
> to posting them, especially with only little rationale.
> If you are interested, I can dig them out, though I am not sure how
> relevant this is.
I'm interested, for sure. I'm pretty sure we will need them when we have
multimedia scenarios on the table.
Hi Andrii,

On 12/3/18 2:36 PM, Andrii Anisov wrote:
> On 03.12.18 15:53, Juergen Gross wrote:
>> On 03/12/2018 14:46, Andre Przywara wrote:
>>> I think PERFCOUNTER is your friend.
>>
>> CONFIG_LOCK_PROFILE and xen-lockprof on the tools side?
>>
>> Not sure it is still working, though. It's about 9 years since I wrote
>> and used it.
> It does work. I've used it recently for this investigation.
> But it gave me unclear results; I could not match them with what I got
> from xentrace and rough Lauterbach traces.

Can you expand on that? What are the differences between the three?

Cheers,
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 8d7e491060..305fbd66dd 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -388,12 +388,14 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
         if ( likely(irq >= 16 && irq < 1020) )
         {
             local_irq_enable();
+            isb();
             do_IRQ(regs, irq, is_fiq);
             local_irq_disable();
         }
         else if ( is_lpi(irq) )
         {
             local_irq_enable();
+            isb();
             gic_hw_ops->do_LPI(irq);
             local_irq_disable();
         }
Devices that expose their interrupt status registers via system
registers (e.g. Statistical profiling, CPU PMU, DynamIQ PMU, arch timer,
vgic (although unused by Linux), ...) rely on a context synchronising
operation on the CPU to ensure that the updated status register is
visible to the CPU when handling the interrupt. This usually happens as
a result of taking the IRQ exception in the first place, but there are
two race scenarios where this isn't the case.

For example, let's say we have two peripherals (X and Y), where Y uses a
system register for its interrupt status.

Case 1:
1. CPU takes an IRQ exception as a result of X raising an interrupt
2. Y then raises its interrupt line, but the update to its system
   register is not yet visible to the CPU
3. The GIC decides to expose Y's interrupt number first in the Ack
   register
4. The CPU runs the IRQ handler for Y, but the status register is stale

Case 2:
1. CPU takes an IRQ exception as a result of X raising an interrupt
2. CPU reads the interrupt number for X from the Ack register and runs
   its IRQ handler
3. Y raises its interrupt line and the Ack register is updated, but
   again, the update to its system register is not yet visible to the
   CPU.
4. Since the GIC drivers poll the Ack register, we read Y's interrupt
   number and run its handler without a context synchronisation
   operation, therefore seeing the stale register value.

In either case, we run the risk of missing an IRQ. This patch solves the
problem by ensuring that we execute an ISB in the GIC drivers prior to
invoking the interrupt handler.

Based on Linux commit 39a06b67c2c1256bcf2361a1f67d2529f70ab206
"irqchip/gic: Ensure we have an ISB between ack and ->handle_irq".

Signed-off-by: Julien Grall <julien.grall@arm.com>

---
This patch is a candidate for backporting up to Xen 4.9.
---
 xen/arch/arm/gic.c | 2 ++
 1 file changed, 2 insertions(+)