Message ID: 20210206104932.29064-1-jgross@suse.com
Series: xen/events: bug fixes and some diagnostic aids
On 08/02/2021 09:41, Jürgen Groß wrote:
> On 08.02.21 10:11, Julien Grall wrote:
>> Hi Juergen,
>>
>> On 07/02/2021 12:58, Jürgen Groß wrote:
>>> On 06.02.21 19:46, Julien Grall wrote:
>>>> Hi Juergen,
>>>>
>>>> On 06/02/2021 10:49, Juergen Gross wrote:
>>>>> The first three patches are fixes for XSA-332. They avoid WARN splats
>>>>> and a performance issue with interdomain events.
>>>>
>>>> Thanks for helping to figure out the problem. Unfortunately, I still
>>>> reliably see the WARN splat with the latest Linux master
>>>> (1e0d27fce010) + your first 3 patches.
>>>>
>>>> I am using Xen 4.11 (1c7d984645f9) and dom0 is forced to use the 2L
>>>> events ABI.
>>>>
>>>> After some debugging, I think I have an idea of what went wrong. The
>>>> problem happens when the event is initially bound from vCPU0 to a
>>>> different vCPU.
>>>>
>>>> From the comment in xen_rebind_evtchn_to_cpu(), we are masking the
>>>> event to prevent it being delivered on an unexpected vCPU. However,
>>>> I believe the following can happen:
>>>>
>>>> vCPU0                 | vCPU1
>>>>                       |
>>>>                       | Call xen_rebind_evtchn_to_cpu()
>>>> receive event X       |
>>>>                       | mask event X
>>>>                       | bind to vCPU1
>>>> <vCPU descheduled>    | unmask event X
>>>>                       |
>>>>                       | receive event X
>>>>                       |
>>>>                       | handle_edge_irq(X)
>>>> handle_edge_irq(X)    |  -> handle_irq_event()
>>>>                       |   -> set IRQD_IN_PROGRESS
>>>>  -> set IRQS_PENDING  |
>>>>                       |   -> evtchn_interrupt()
>>>>                       |   -> clear IRQD_IN_PROGRESS
>>>>                       |  -> IRQS_PENDING is set
>>>>                       |  -> handle_irq_event()
>>>>                       |   -> evtchn_interrupt()
>>>>                       |    -> WARN()
>>>>                       |
>>>>
>>>> All the lateeoi handlers expect a ONESHOT semantic and
>>>> evtchn_interrupt() doesn't tolerate any deviation.
>>>>
>>>> I think the problem was introduced by 7f874a0447a9 ("xen/events: fix
>>>> lateeoi irq acknowledgment") because the interrupt was disabled
>>>> previously. Therefore we wouldn't do another iteration in
>>>> handle_edge_irq().
>>>
>>> I think you picked the wrong commit for blaming, as this is just
>>> the last patch of the three patches you were testing.
>>
>> I actually found the right commit for blaming but I copied the
>> information from the wrong shell :/. The bug was introduced by:
>>
>> c44b849cee8c ("xen/events: switch user event channels to lateeoi model")
>>
>>>
>>>> Aside from the handlers, I think it may impact the defer EOI mitigation
>>>> because in theory a 3rd vCPU may join the party (let's say vCPU A
>>>> migrates the event from vCPU B to vCPU C). So info->{eoi_cpu,
>>>> irq_epoch, eoi_time} could possibly get mangled?
>>>>
>>>> For a fix, we may want to consider holding evtchn_rwlock with the
>>>> write permission. Although, I am not 100% sure this is going to
>>>> prevent everything.
>>>
>>> It will make things worse, as it would violate the locking hierarchy
>>> (xen_rebind_evtchn_to_cpu() is called with the IRQ-desc lock held).
>>
>> Ah, right.
>>
>>>
>>> On a first glance I think we'll need a 3rd masking state ("temporarily
>>> masked") in the second patch in order to avoid a race with lateeoi.
>>>
>>> In order to avoid the race you outlined above we need an "event is being
>>> handled" indicator checked via test_and_set() semantics in
>>> handle_irq_for_port() and reset only when calling clear_evtchn().
>>
>> It feels like we are trying to work around the IRQ flow we are using
>> (i.e. handle_edge_irq()).
>
> I'm not really sure this is the main problem here. According to your
> analysis the main problem is occurring when handling the event, not when
> handling the IRQ: the event is being received on two vcpus.

I don't think we can easily divide the two because we rely on the IRQ
framework to handle the lifecycle of the event. So...

>
> Our problem isn't due to the IRQ still being pending, but due to it being
> raised again, which should happen for a one shot IRQ the same way.

... I don't really see how the difference matters here. The idea is to
re-use what's already existing rather than trying to re-invent the wheel
with an extra lock (or whatever we can come up with).

Cheers,
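[For reference, the "event is being handled" indicator proposed above can be
sketched in a few lines of plain C11. This is a userspace illustration only:
the names handle_irq_for_port() and clear_evtchn() are borrowed from the
discussion, everything else is an assumption and not the actual Linux code.]

/*
 * Illustrative sketch only; NOT the Linux/Xen implementation.
 * Models the proposal: an "event is being handled" bit taken with
 * test_and_set() semantics on delivery and cleared only at
 * clear_evtchn() time, so a second delivery of the same event on
 * another vCPU is dropped instead of re-entering the handler.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct evtchn_state {
    atomic_bool busy;   /* "event is being handled" indicator */
    int handled;        /* how many times the handler really ran */
};

/* Called when the event fires on some vCPU. */
static void handle_irq_for_port(struct evtchn_state *ch)
{
    /* test_and_set(): only one vCPU may win and run the handler. */
    if (atomic_exchange(&ch->busy, true))
        return;         /* already in flight elsewhere: drop it */
    ch->handled++;      /* stand-in for evtchn_interrupt() */
}

/* Called by the driver once it has consumed the event. */
static void clear_evtchn(struct evtchn_state *ch)
{
    atomic_store(&ch->busy, false);   /* allow the next delivery */
}

int main(void)
{
    struct evtchn_state ch = { .busy = false, .handled = 0 };

    handle_irq_for_port(&ch);   /* delivery on vCPU0 */
    handle_irq_for_port(&ch);   /* racing delivery on vCPU1: dropped */
    clear_evtchn(&ch);
    handle_irq_for_port(&ch);   /* next event after EOI: handled again */

    printf("handler ran %d times (expected 2)\n", ch.handled);
    return 0;
}

[In this toy model the racing delivery is dropped and the handler runs twice
instead of three times, which is the ONESHOT semantic the lateeoi handlers
expect.]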
On 08.02.21 10:54, Julien Grall wrote:
> On 08/02/2021 09:41, Jürgen Groß wrote:
>> On 08.02.21 10:11, Julien Grall wrote:
>>> Hi Juergen,
>>>
>>> [...]
>>>
>>> It feels like we are trying to work around the IRQ flow we are using
>>> (i.e. handle_edge_irq()).
>>
>> I'm not really sure this is the main problem here. According to your
>> analysis the main problem is occurring when handling the event, not when
>> handling the IRQ: the event is being received on two vcpus.
>
> I don't think we can easily divide the two because we rely on the IRQ
> framework to handle the lifecycle of the event. So...
>
>> Our problem isn't due to the IRQ still being pending, but due to it being
>> raised again, which should happen for a one shot IRQ the same way.
>
> ... I don't really see how the difference matters here. The idea is to
> re-use what's already existing rather than trying to re-invent the wheel
> with an extra lock (or whatever we can come up with).

The difference is that the race is occurring _before_ any IRQ is
involved. So I don't see how modification of IRQ handling would help.


Juergen
Hi Juergen,

On 08/02/2021 10:22, Jürgen Groß wrote:
> On 08.02.21 10:54, Julien Grall wrote:
>> ... I don't really see how the difference matters here. The idea is to
>> re-use what's already existing rather than trying to re-invent the
>> wheel with an extra lock (or whatever we can come up with).
>
> The difference is that the race is occurring _before_ any IRQ is
> involved. So I don't see how modification of IRQ handling would help.

Roughly our current IRQ handling flow (handle_edge_irq()) looks like:

if ( irq in progress )
{
    set IRQS_PENDING
    return;
}

do
{
    clear IRQS_PENDING
    handle_irq()
} while (IRQS_PENDING is set)

An IRQ handling flow like handle_fasteoi_irq() looks like:

if ( irq in progress )
    return;

handle_irq()

The latter flow would catch "spurious" interrupts and ignore them. So it
would handle nicely the race when changing the event affinity.

Cheers,
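[To make the difference between the two flows concrete, here is a small
self-contained model (plain C, invented names, not kernel code) of how each
entry path disposes of a delivery that races with one already in progress.]

#include <stdbool.h>
#include <stdio.h>

struct irq_state { bool in_progress; bool pending; };

/* handle_edge_irq()-like entry: if the IRQ is already being handled on
 * another CPU, remember the new delivery so the first CPU's loop runs
 * the handler once more (that extra run is what trips the WARN()). */
static bool edge_entry(struct irq_state *s)
{
    if (s->in_progress) {
        s->pending = true;   /* handler will be invoked again */
        return false;
    }
    return true;             /* caller would run the handler loop */
}

/* handle_fasteoi_irq()-like entry: a racing delivery is dropped. */
static bool fasteoi_entry(struct irq_state *s)
{
    return !s->in_progress;  /* false: delivery silently ignored */
}

int main(void)
{
    /* Both flows see a delivery racing with an in-progress one. */
    struct irq_state edge = { .in_progress = true, .pending = false };
    struct irq_state fast = { .in_progress = true, .pending = false };

    edge_entry(&edge);
    fasteoi_entry(&fast);

    printf("edge flow:    extra handler run queued = %s\n",
           edge.pending ? "yes" : "no");
    printf("fasteoi flow: extra handler run queued = %s\n",
           fast.pending ? "yes" : "no");
    return 0;
}

[With the edge-style entry the racing delivery queues an extra handler run;
with the fasteoi-style entry it is simply ignored.]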
On 08.02.21 11:40, Julien Grall wrote:
> Hi Juergen,
>
> On 08/02/2021 10:22, Jürgen Groß wrote:
>> On 08.02.21 10:54, Julien Grall wrote:
>>> ... I don't really see how the difference matters here. The idea is to
>>> re-use what's already existing rather than trying to re-invent the
>>> wheel with an extra lock (or whatever we can come up with).
>>
>> The difference is that the race is occurring _before_ any IRQ is
>> involved. So I don't see how modification of IRQ handling would help.
>
> Roughly our current IRQ handling flow (handle_edge_irq()) looks like:
>
> if ( irq in progress )
> {
>     set IRQS_PENDING
>     return;
> }
>
> do
> {
>     clear IRQS_PENDING
>     handle_irq()
> } while (IRQS_PENDING is set)
>
> An IRQ handling flow like handle_fasteoi_irq() looks like:
>
> if ( irq in progress )
>     return;
>
> handle_irq()
>
> The latter flow would catch "spurious" interrupts and ignore them. So it
> would handle nicely the race when changing the event affinity.

Sure? Isn't "irq in progress" being reset way before our "lateeoi" is
issued, thus having the same problem again?

And I think we want to keep the lateeoi behavior in order to be able to
control event storms.


Juergen
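[As background for the event-storm point, a deferred-EOI throttle can be
pictured roughly as follows. The threshold and delay values are invented for
illustration and do not reflect the actual Linux lateeoi heuristics.]

/*
 * Generic illustration of deferred-EOI throttling; NOT the actual
 * Linux heuristic. If a source fires "too often", the EOI (which
 * re-enables delivery) is issued after a delay instead of immediately,
 * capping the rate at which a misbehaving source can raise events.
 */
#include <stdio.h>

#define STORM_THRESHOLD 1000u   /* events per epoch considered a storm */
#define EOI_DELAY_MS    10u     /* throttle delay, invented value */

struct channel { unsigned events_this_epoch; };

/* Decide how long to wait before issuing the EOI (0 = immediately). */
static unsigned eoi_delay_ms(struct channel *ch)
{
    ch->events_this_epoch++;
    return (ch->events_this_epoch > STORM_THRESHOLD) ? EOI_DELAY_MS : 0;
}

int main(void)
{
    struct channel ch = { .events_this_epoch = 0 };

    for (unsigned i = 0; i < 1005; i++) {
        unsigned delay = eoi_delay_ms(&ch);
        if (delay)
            printf("event %u: deferring EOI by %u ms\n", i, delay);
        /* a real driver would schedule the EOI 'delay' ms in the future */
    }
    return 0;
}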
On 08.02.21 14:09, Julien Grall wrote:
> Hi Juergen,
>
> On 08/02/2021 12:31, Jürgen Groß wrote:
>> On 08.02.21 13:16, Julien Grall wrote:
>>> On 08/02/2021 12:14, Jürgen Groß wrote:
>>>> On 08.02.21 11:40, Julien Grall wrote:
>>>>> Hi Juergen,
>>>>>
>>>>> [...]
>>>>>
>>>>> The latter flow would catch "spurious" interrupts and ignore them.
>>>>> So it would handle nicely the race when changing the event affinity.
>>>>
>>>> Sure? Isn't "irq in progress" being reset way before our "lateeoi" is
>>>> issued, thus having the same problem again?
>>>
>>> Sorry I can't parse this.
>>
>> handle_fasteoi_irq() will do nothing "if ( irq in progress )". When is
>> this condition being reset again in order to be able to process another
>> IRQ?
>
> It is reset after the handler has been called. See handle_irq_event().

Right. And for us this is too early, as we want the next IRQ being
handled only after we have called xen_irq_lateeoi().

>
>> I believe this will be the case before our "lateeoi" handling is
>> becoming active (more precisely: when our IRQ handler is returning to
>> handle_fasteoi_irq()), resulting in the possibility of the same race we
>> are experiencing now.
>
> I am a bit confused what you mean by "lateeoi" handling is becoming
> active. Can you clarify?

See above: the next call of the handler should be allowed only after
xen_irq_lateeoi() for the IRQ has been called.

If the handler is being called earlier we have the race resulting in
the WARN() splats.

> Note that there are other IRQ flows existing. We should have a look at
> them before trying to fix things ourselves.

Fine with me, but it either needs to fit all use cases (interdomain,
IPI, real interrupts) or we need to have a per-type IRQ flow.

I think we should fix the issue locally first, then we can start a
thorough rework planning. It's not as if the needed changes with the
current flow would be so huge, and I'd really like to have a solution
rather sooner than later. Changing the IRQ flow might have other side
effects which need to be excluded by thorough testing.

> Although, the other issue I can see so far is handle_irq_for_port() will
> update info->{eoi_cpu, irq_epoch, eoi_time} without any locking. But it
> is not clear this is what you mean by "becoming active".

As long as a single event can't be handled on multiple cpus at the same
time, there is no locking needed.


Juergen
On 08/02/2021 14:20, Julien Grall wrote:
>>>> I believe this will be the case before our "lateeoi" handling is
>>>> becoming active (more precisely: when our IRQ handler is returning to
>>>> handle_fasteoi_irq()), resulting in the possibility of the same race we
>>>> are experiencing now.
>>>
>>> I am a bit confused what you mean by "lateeoi" handling is becoming
>>> active. Can you clarify?
>>
>> See above: the next call of the handler should be allowed only after
>> xen_irq_lateeoi() for the IRQ has been called.
>>
>> If the handler is being called earlier we have the race resulting
>> in the WARN() splats.
>
> I feel it is dislike to understand race with just words. Can you provide

Sorry I meant difficult rather than dislike.

Cheers,