Message ID | alpine.DEB.2.02.1401281526510.4373@kaball.uk.xensource.com |
---|---|
State | New |
Headers | show |
On Tue, 2014-01-28 at 16:02 +0000, Stefano Stabellini wrote: > On Tue, 28 Jan 2014, Ian Campbell wrote: > > On Tue, 2014-01-28 at 14:00 +0000, Stefano Stabellini wrote: > > > On Tue, 28 Jan 2014, Ian Campbell wrote: > > > > On Mon, 2014-01-27 at 19:00 +0000, Stefano Stabellini wrote: > > > > > Alternatively, as Ian suggested, we could increase the priotiry of SGIs > > > > > but I am a bit wary of making that change at RC2. > > > > > > > > I'm leaning the other way -- I'm wary of open coding magic locking > > > > primitives to work around this issue on a case by case basis. It's just > > > > too subtle IMHO. > > > > > > > > The IPI and cross CPU calling primitives are basically predicated on > > > > those IPIs interrupting normal interrupt handlers. > > > > > > The problem is that we don't know if we can context switch properly > > > nested interrupts. > > > > What do you mean? We don't have to context switch an IPI. > > Sorry, I meant save/restore registers, stack pointer, processor mode, > etc, for nested interrupts. Right, we do handle that actually since it is the same code as handles an IRQ during a hypercall (since a hypercall is just another type of trap). > > > Also I would need to think harder whether everything > > > would work correctly without hitches with multiple SGIs happening > > > simultaneously (with more than 2 cpus involved). > > > > Since all IPIs would be at the same higher priority only one will be > > active on each CPU at a time. If you are worried about multiple CPUs > > then that is already an issue today, just at a lower priority. > > That is correct, but if we moved the on_selected_cpus call out of the > interrupt handler I don't think the problem could occur. Sure, I'm not opposed to fixing this issue by making an architectural improvement to the code which happens to not use on_selected_cpus. > > AIUI this issue only occurs with "proto device assignment" patches added > > to 4.4, n which case I think the solution can wait until 4.5 and can be > > done properly via the IPI priority fix. > > I think this is a pretty significant problem, even if we don't commit a > fix, we should post a proper patch that we deem acceptable and link it to > the wiki so that anybody that needs it can find it and be sure that it > works correctly. FWIW I have an almost fix to the IPI priority thing which I would intend to fix in 4.5 regardless of whether this issue needs it or not. > In my opinion if we go to this length then we might as well commit it > (if it doesn't touch common code of course), but I'll leave the decision > up to you and George. > > Given the constraints, the solution I would feel more comfortable with at > this time is something like the following patch (lightly tested): I don't think this buys you anything over just fixing on_selected_cpus to work as it should, unless there is some reason why deferring this work to a tasklet is the logically correct thing to do and/or a better designh? (IOW if you are only doing this to avoid calling on_selected_cpus in interrupt context then I think this is the wrong fix). Ian. > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > index e6257a7..b00ca73 100644 > --- a/xen/arch/arm/gic.c > +++ b/xen/arch/arm/gic.c > @@ -58,6 +58,25 @@ static DEFINE_PER_CPU(uint64_t, lr_mask); > > static unsigned nr_lrs; > > +static void gic_irq_eoi(void *info) > +{ > + int virq = (uintptr_t) info; > + GICC[GICC_DIR] = virq; > +} > + > +static DEFINE_PER_CPU(struct pending_irq*, eoi_irq); > +static void eoi_action(unsigned long unused) > +{ > + struct pending_irq *p = this_cpu(eoi_irq); > + ASSERT(p != NULL); > + > + on_selected_cpus(cpumask_of(p->desc->arch.eoi_cpu), > + gic_irq_eoi, (void*)(uintptr_t)p->desc->irq, 0); > + > + this_cpu(eoi_irq) = NULL; > +} > +static DECLARE_TASKLET(eoi_tasklet, eoi_action, 0); > + > /* The GIC mapping of CPU interfaces does not necessarily match the > * logical CPU numbering. Let's use mapping as returned by the GIC > * itself > @@ -897,12 +916,6 @@ int gicv_setup(struct domain *d) > > } > > -static void gic_irq_eoi(void *info) > -{ > - int virq = (uintptr_t) info; > - GICC[GICC_DIR] = virq; > -} > - > static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs) > { > int i = 0, virq, pirq = -1; > @@ -962,8 +975,11 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r > if ( cpu == smp_processor_id() ) > gic_irq_eoi((void*)(uintptr_t)pirq); > else > - on_selected_cpus(cpumask_of(cpu), > - gic_irq_eoi, (void*)(uintptr_t)pirq, 0); > + { > + ASSERT(this_cpu(eoi_irq) == NULL); > + this_cpu(eoi_irq) = p; > + tasklet_schedule(&eoi_tasklet); > + } > } > > i++;
On Tue, 28 Jan 2014, Ian Campbell wrote: > On Tue, 2014-01-28 at 16:02 +0000, Stefano Stabellini wrote: > > On Tue, 28 Jan 2014, Ian Campbell wrote: > > > On Tue, 2014-01-28 at 14:00 +0000, Stefano Stabellini wrote: > > > > On Tue, 28 Jan 2014, Ian Campbell wrote: > > > > > On Mon, 2014-01-27 at 19:00 +0000, Stefano Stabellini wrote: > > > > > > Alternatively, as Ian suggested, we could increase the priotiry of SGIs > > > > > > but I am a bit wary of making that change at RC2. > > > > > > > > > > I'm leaning the other way -- I'm wary of open coding magic locking > > > > > primitives to work around this issue on a case by case basis. It's just > > > > > too subtle IMHO. > > > > > > > > > > The IPI and cross CPU calling primitives are basically predicated on > > > > > those IPIs interrupting normal interrupt handlers. > > > > > > > > The problem is that we don't know if we can context switch properly > > > > nested interrupts. > > > > > > What do you mean? We don't have to context switch an IPI. > > > > Sorry, I meant save/restore registers, stack pointer, processor mode, > > etc, for nested interrupts. > > Right, we do handle that actually since it is the same code as handles > an IRQ during a hypercall (since a hypercall is just another type of > trap). Ah right, good. > > > > Also I would need to think harder whether everything > > > > would work correctly without hitches with multiple SGIs happening > > > > simultaneously (with more than 2 cpus involved). > > > > > > Since all IPIs would be at the same higher priority only one will be > > > active on each CPU at a time. If you are worried about multiple CPUs > > > then that is already an issue today, just at a lower priority. > > > > That is correct, but if we moved the on_selected_cpus call out of the > > interrupt handler I don't think the problem could occur. > > Sure, I'm not opposed to fixing this issue by making an architectural > improvement to the code which happens to not use on_selected_cpus. > > > > AIUI this issue only occurs with "proto device assignment" patches added > > > to 4.4, n which case I think the solution can wait until 4.5 and can be > > > done properly via the IPI priority fix. > > > > I think this is a pretty significant problem, even if we don't commit a > > fix, we should post a proper patch that we deem acceptable and link it to > > the wiki so that anybody that needs it can find it and be sure that it > > works correctly. > > FWIW I have an almost fix to the IPI priority thing which I would intend > to fix in 4.5 regardless of whether this issue needs it or not. Well, in that case if you can test the patch in the specific case where an SGI interrupts a lower priority interrupt, it might be worth sending out the patch now. > > In my opinion if we go to this length then we might as well commit it > > (if it doesn't touch common code of course), but I'll leave the decision > > up to you and George. > > > > Given the constraints, the solution I would feel more comfortable with at > > this time is something like the following patch (lightly tested): > > I don't think this buys you anything over just fixing on_selected_cpus > to work as it should, unless there is some reason why deferring this > work to a tasklet is the logically correct thing to do and/or a better > designh? > > (IOW if you are only doing this to avoid calling on_selected_cpus in > interrupt context then I think this is the wrong fix). I feel that on_selected_cpus is the kind of work that belongs to the "bottom half". I was never very happy to have the call where it is now in the first place. This approach would have the benefit of allowing us to receive regular interrupts while waiting for other cpus to handle the SGI. Also I am sure that it would work even in cases where you have more than one SGIs targeting the same cpu simultaneously. > Ian. > > > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > > index e6257a7..b00ca73 100644 > > --- a/xen/arch/arm/gic.c > > +++ b/xen/arch/arm/gic.c > > @@ -58,6 +58,25 @@ static DEFINE_PER_CPU(uint64_t, lr_mask); > > > > static unsigned nr_lrs; > > > > +static void gic_irq_eoi(void *info) > > +{ > > + int virq = (uintptr_t) info; > > + GICC[GICC_DIR] = virq; > > +} > > + > > +static DEFINE_PER_CPU(struct pending_irq*, eoi_irq); > > +static void eoi_action(unsigned long unused) > > +{ > > + struct pending_irq *p = this_cpu(eoi_irq); > > + ASSERT(p != NULL); > > + > > + on_selected_cpus(cpumask_of(p->desc->arch.eoi_cpu), > > + gic_irq_eoi, (void*)(uintptr_t)p->desc->irq, 0); > > + > > + this_cpu(eoi_irq) = NULL; > > +} > > +static DECLARE_TASKLET(eoi_tasklet, eoi_action, 0); > > + > > /* The GIC mapping of CPU interfaces does not necessarily match the > > * logical CPU numbering. Let's use mapping as returned by the GIC > > * itself > > @@ -897,12 +916,6 @@ int gicv_setup(struct domain *d) > > > > } > > > > -static void gic_irq_eoi(void *info) > > -{ > > - int virq = (uintptr_t) info; > > - GICC[GICC_DIR] = virq; > > -} > > - > > static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs) > > { > > int i = 0, virq, pirq = -1; > > @@ -962,8 +975,11 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r > > if ( cpu == smp_processor_id() ) > > gic_irq_eoi((void*)(uintptr_t)pirq); > > else > > - on_selected_cpus(cpumask_of(cpu), > > - gic_irq_eoi, (void*)(uintptr_t)pirq, 0); > > + { > > + ASSERT(this_cpu(eoi_irq) == NULL); > > + this_cpu(eoi_irq) = p; > > + tasklet_schedule(&eoi_tasklet); > > + } > > } > > > > i++; > >
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index e6257a7..b00ca73 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -58,6 +58,25 @@ static DEFINE_PER_CPU(uint64_t, lr_mask); static unsigned nr_lrs; +static void gic_irq_eoi(void *info) +{ + int virq = (uintptr_t) info; + GICC[GICC_DIR] = virq; +} + +static DEFINE_PER_CPU(struct pending_irq*, eoi_irq); +static void eoi_action(unsigned long unused) +{ + struct pending_irq *p = this_cpu(eoi_irq); + ASSERT(p != NULL); + + on_selected_cpus(cpumask_of(p->desc->arch.eoi_cpu), + gic_irq_eoi, (void*)(uintptr_t)p->desc->irq, 0); + + this_cpu(eoi_irq) = NULL; +} +static DECLARE_TASKLET(eoi_tasklet, eoi_action, 0); + /* The GIC mapping of CPU interfaces does not necessarily match the * logical CPU numbering. Let's use mapping as returned by the GIC * itself @@ -897,12 +916,6 @@ int gicv_setup(struct domain *d) } -static void gic_irq_eoi(void *info) -{ - int virq = (uintptr_t) info; - GICC[GICC_DIR] = virq; -} - static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs) { int i = 0, virq, pirq = -1; @@ -962,8 +975,11 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r if ( cpu == smp_processor_id() ) gic_irq_eoi((void*)(uintptr_t)pirq); else - on_selected_cpus(cpumask_of(cpu), - gic_irq_eoi, (void*)(uintptr_t)pirq, 0); + { + ASSERT(this_cpu(eoi_irq) == NULL); + this_cpu(eoi_irq) = p; + tasklet_schedule(&eoi_tasklet); + } } i++;