
[v2] usb: dwc3: gadget: check drained isoc ep

Message ID 20240307-dwc3-gadget-complete-irq-v2-1-8c5e9b35f7b9@pengutronix.de
State New
Series [v2] usb: dwc3: gadget: check drained isoc ep

Commit Message

Michael Grzeschik April 2, 2024, 9:51 p.m. UTC
To avoid a potential underrun of a currently drained transfer, add a
check for that scenario to dwc3_gadget_endpoint_trbs_complete(). In the
case of an empty TRB ring, call dwc3_stop_active_transfer() so that the
underrun does not occur.

Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
---
Changes in v2:
- dropped patch "usb: dwc3: gadget: reclaim the whole started list when request was missed"
- fixed patch "usb: dwc3: gadget: check drained isoc ep"
- dropped patch "usb: dwc3: gadget: check the whole started queue for missed requests in complete"
- Link to v1: https://lore.kernel.org/r/20240307-dwc3-gadget-complete-irq-v1-0-4fe9ac0ba2b7@pengutronix.de
---
 drivers/usb/dwc3/gadget.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)


---
base-commit: 5bab5dc780c9ed0c69fc2f828015532acf4a7848
change-id: 20240307-dwc3-gadget-complete-irq-1a8ffa347fd1

Best regards,

Comments

Thinh Nguyen April 4, 2024, 12:29 a.m. UTC | #1
On Tue, Apr 02, 2024, Thinh Nguyen wrote:
> On Tue, Apr 02, 2024, Thinh Nguyen wrote:
> > My concern here is for the case where transfer_in_flight == true and
> 
> I mean transfer_in_flight == false
> 
> > list_empty(started_list) == false. That means that the requests in the
> > started_list are completed but are not given back to the gadget driver.
> > 
> > Since they remained in the started_list, they will be resubmitted again
> > on the next usb_ep_queue. We may send duplicate transfers right?

Actually, since the requests are completed, the HWO bits are cleared,
so nothing is resubmitted and there is no duplicate. But since the
requests have not yet been given back from the started_list, the next
Start_Transfer command will begin at the TRB address of the completed
request (HWO=0), and the controller may not process the next TRBs. Have
you tested this scenario?

> > 
> > You can try to cleanup requests in the started_list, but you need to be
> > careful to make sure you're not out of sync with the transfer completion
> > events and new requests from gadget driver.
> > 

Was the problem you encountered due to the no_interrupt setting, where
it was set on the last request of the UVC data pump?

If that's the case, can the UVC function driver make sure not to set
no_interrupt on the last request of its data pump?

Thanks,
Thinh
Michael Grzeschik April 23, 2024, 1:37 p.m. UTC | #2
Hi Thinh,

On Thu, Apr 04, 2024 at 12:29:14AM +0000, Thinh Nguyen wrote:
>On Tue, Apr 02, 2024, Thinh Nguyen wrote:
>> On Tue, Apr 02, 2024, Thinh Nguyen wrote:
>> > My concern here is for the case where transfer_in_flight == true and
>>
>> I mean transfer_in_flight == false
>>
>> > list_empty(started_list) == false. That means that the requests in the
>> > started_list are completed but are not given back to the gadget driver.
>> >
>> > Since they remained in the started_list, they will be resubmitted again
>> > on the next usb_ep_queue. We may send duplicate transfers right?
>
>Actually, since the requests are completed, the HWO bits are cleared,
>nothing is submitted and no duplicate. But since the requests are not
>given back yet from the started_list, then the next Start_Transfer
>command will begin with the TRB address of the completed request
>(HWO=0), the controller may not process the next TRBs. Have you tested
>this scenario?
>
>> >
>> > You can try to cleanup requests in the started_list, but you need to be
>> > careful to make sure you're not out of sync with the transfer completion
>> > events and new requests from gadget driver.
>> >
>
>Was the problem you encounter due to no_interrupt settings where the
>it was set to the last request of the uvc data pump?
>
>if that's the case, can UVC function driver make sure to not set
>no_interrupt to the last request of the data pump from the UVC?

Actually, no. What I want to do is ensure that the dwc3 stream is
stopped when the hardware has drained, which is a valid case for us,
since we are potentially enqueueing new requests in the complete
handler, be they zero-length or real transfers.

Calling kick_transfer on a drained HW will absolutely run into missed
isocs if the irq thread is called late. We saw this on real hardware,
where another irq thread was scheduled with the same priority as the
dwc3 irq thread but ran so long that the HW ran dry between the
hardware irq and the actual dwc3 irq thread run.

Michael
Michael Grzeschik May 4, 2024, 11:44 p.m. UTC | #3
On Wed, Apr 24, 2024 at 01:51:01AM +0000, Thinh Nguyen wrote:
>On Tue, Apr 23, 2024, Michael Grzeschik wrote:
>> Hi Thinh,
>>
>> On Thu, Apr 04, 2024 at 12:29:14AM +0000, Thinh Nguyen wrote:
>> > On Tue, Apr 02, 2024, Thinh Nguyen wrote:
>> > > On Tue, Apr 02, 2024, Thinh Nguyen wrote:
>> > > > My concern here is for the case where transfer_in_flight == true and
>> > >
>> > > I mean transfer_in_flight == false
>> > >
>> > > > list_empty(started_list) == false. That means that the requests in the
>> > > > started_list are completed but are not given back to the gadget driver.
>> > > >
>> > > > Since they remained in the started_list, they will be resubmitted again
>> > > > on the next usb_ep_queue. We may send duplicate transfers right?
>> >
>> > Actually, since the requests are completed, the HWO bits are cleared,
>> > nothing is submitted and no duplicate. But since the requests are not
>> > given back yet from the started_list, then the next Start_Transfer
>> > command will begin with the TRB address of the completed request
>> > (HWO=0), the controller may not process the next TRBs. Have you tested
>> > this scenario?
>> >
>> > > >
>> > > > You can try to cleanup requests in the started_list, but you need to be
>> > > > careful to make sure you're not out of sync with the transfer completion
>> > > > events and new requests from gadget driver.
>> > > >
>> >
>> > Was the problem you encounter due to no_interrupt settings where the
>> > it was set to the last request of the uvc data pump?
>> >
>> > if that's the case, can UVC function driver make sure to not set
>> > no_interrupt to the last request of the data pump from the UVC?
>>
>> Actually no. What I want to do is to ensure that the dwc3 stream
>> is stopped when the hardware was drained. Which is a valid point
>> in my case. Since we are actually potentially enqueueing new request
>> in the complete handler, be it zero length or real transfers.
>>
>> Calling kick_transfer on an drained hw will absolutely run into
>> missed isocs if the irq thread was called late. We saw this on real hardware,
>> where another irq_thread was scheduled with the same priority as the
>> dwc3 irq_thread but was running so long that the HW was running dry in
>> between the hw irq and the actual dwc3_irq_thread run.
>>
>
>Right. Unfortunately, dwc3 can only "guess" when UVC function stops
>pumping more request or whether it's due to some large latency. The
>logic to workaround this underrun issue will not be foolproof. Perhaps
>we can improve upon it, but the solution is better implement at the UVC
>function driver.

Yes, the best way to solve this is in the uvc driver.

>I thought we have the mechanism in UVC function now to ensure queuing
>enough zero-length requests to account for underrun/latency issue?
>What's the issue now?

This is actually only partially true. Even with the zero-length
packets it is possible that we run into underruns. This is why we
implemented this patch. It happened because another interrupt thread
with the same priority on the same CPU was keeping the CPU busy. By the
time the dwc3 interrupt thread got to run, the interval was already
over and the HW had already drained, although the started list was not
yet empty, which caused the next requests (zero-length or not) to be
queued too late.

Yes, this needed to be solved at the upper level first, by moving the
long-running work of the other interrupt thread to another thread or
even into userspace.

However, I thought it would be great if we could somehow detect this
in the dwc3 core and make the pump mechanism more robust against such
late enqueues.

This all started with that series.

https://lore.kernel.org/all/20240307-dwc3-gadget-complete-irq-v1-0-4fe9ac0ba2b7@pengutronix.de/

Patch 2 of this series has worked well so far. The next step was this
patch.

During last week's debugging we found out that it has other issues.
It is not always safe to read the HWO bit from the driver.

It turns out that after a new TRB has been prepared with the HWO bit
set, it is not safe to immediately read that value back, as the HW will
be doing some operations on exactly that newly prepared TRB.

We ran into this problem when applying this patch. The TRB buffer list
was actually filled, but we hit a false positive where the latest HWO
bit was 0 (probably due to the HW action in the background) and
therefore went into end transfer.

Michael
Thinh Nguyen May 8, 2024, 11:03 p.m. UTC | #4
Hi Michael,

On Sun, May 05, 2024, Michael Grzeschik wrote:
> On Wed, Apr 24, 2024 at 01:51:01AM +0000, Thinh Nguyen wrote:
> > > 
> > 
> > Right. Unfortunately, dwc3 can only "guess" when UVC function stops
> > pumping more request or whether it's due to some large latency. The
> > logic to workaround this underrun issue will not be foolproof. Perhaps
> > we can improve upon it, but the solution is better implement at the UVC
> > function driver.
> 
> Yes, the best way to solve this is in the uvc driver.
> 
> > I thought we have the mechanism in UVC function now to ensure queuing
> > enough zero-length requests to account for underrun/latency issue?
> > What's the issue now?
> 
> This is actually only partially true. Even with the zero-length packages
> it is possible that we run into underruns. This is why we implemented
> this patch. This has happened because another interrupt thread with the
> same prio on the same CPU as this interrupt thread was keeping the CPU

Do you have the data on the worst latency?

Can this other interrupt thread lower its priority relative to UVC?
For an isoc endpoint, data is time critical.

Currently dwc3 can have up to 255 TRBs per endpoint, potentially 255
zero-length requests. That's 255 uframes, or roughly ~30ms. Is your
worst latency more than 30ms, i.e. no handling of transfer completion
and ep_queue for all 255 intervals or 30ms? If that's the case, we have
problems that cannot just be solved in dwc3.
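
For reference, a rough back-of-the-envelope version of that budget (my
numbers, assuming 125us per high-speed uframe and the 255-TRB limit
mentioned above, not values read from the driver):

	/* Standalone sketch of the isoc latency budget mentioned above. */
	#include <stdio.h>

	int main(void)
	{
		unsigned int uframe_us = 125;	/* one high-speed uframe */
		unsigned int max_trbs  = 255;	/* TRBs per endpoint     */

		/* 255 * 125us = 31875us, i.e. roughly ~30ms of cover */
		printf("max cover: %u us\n", max_trbs * uframe_us);
		return 0;
	}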

> busy. As the dwc3 interrupt thread get to its call, the time was already
> over and the hw was already drained, although the started list was not
> yet empty, which was causing the next queued requests to be queued to
> late. (zero length or not)
> 
> Yes, this needed to be solved on the upper level first, by moving the
> long running work of the other interrupt thread to another thread or
> even into the userspace.

Right.

> 
> However I thought it would be great if we could somehow find out in
> the dwc3 core and make the pump mechanism more robust against such
> late enqueues.

The dwc3 core handling of events and ep_queue is relatively quick. I'm
all for any optimization in the dwc3 core for performance. However, I
don't want to just keep looking for workarounds in the dwc3 core
without trying to solve the issue where it belongs. I don't want to
sacrifice complexity and/or performance of other applications just for
UVC.

> 
> This all started with that series.
> 
> https://lore.kernel.org/all/20240307-dwc3-gadget-complete-irq-v1-0-4fe9ac0ba2b7@pengutronix.de/
> 
> And patch 2 of this series did work well so far. The next move was this
> patch.
> 
> Since the last week debugging we found out that it got other issues.
> It is not allways save to read the HWO bit, from the driver.
> 
> Turns out that after an new TRB was prepared with the HWO bit set
> it is not save to read immideatly back from that value as the hw
> will be doing some operations on that exactly new prepared TRB.
> 
> We ran into this problem when applying this patch. The trb buffer list
> was actually filled but we hit a false positive where the latest HWO bit
> was 0 (probably due to the hw action in the background) and therefor
> went into end transfer.
> 

Thanks for the update.

BR,
Thinh
Michael Grzeschik May 8, 2024, 11:53 p.m. UTC | #5
On Wed, May 08, 2024 at 11:03:00PM +0000, Thinh Nguyen wrote:
>On Sun, May 05, 2024, Michael Grzeschik wrote:
>> On Wed, Apr 24, 2024 at 01:51:01AM +0000, Thinh Nguyen wrote:
>> > >
>> >
>> > Right. Unfortunately, dwc3 can only "guess" when UVC function stops
>> > pumping more request or whether it's due to some large latency. The
>> > logic to workaround this underrun issue will not be foolproof. Perhaps
>> > we can improve upon it, but the solution is better implement at the UVC
>> > function driver.
>>
>> Yes, the best way to solve this is in the uvc driver.
>>
>> > I thought we have the mechanism in UVC function now to ensure queuing
>> > enough zero-length requests to account for underrun/latency issue?
>> > What's the issue now?
>>
>> This is actually only partially true. Even with the zero-length packages
>> it is possible that we run into underruns. This is why we implemented
>> this patch. This has happened because another interrupt thread with the
>> same prio on the same CPU as this interrupt thread was keeping the CPU
>
>Do you have the data on the worst latency?

It was a bit more than around 2ms AFAIR, since with one frame enqueued
we only trigger the interrupt every 16 requests (16 * 125us = 2ms).

So with at least 2ms of latency we hit the sweet spot in several cases.

>Can this other interrupt thread lower its priority relative to UVC? For
>isoc endpoint, data is time critical.

The details are not that important. Sure, there is a bug that needed
to be solved. But all I wanted was to improve the overall dwc3 driver.

>Currently dwc3 can have up to 255 TRBs per endpoint, potentially 255
>zero-length requests. That's 255 uframe, or roughly ~30ms. Is your worst
>latency more than 30ms? ie. no handling of transfer completion and
>ep_queue for the whole 255 intervals or 30ms. If that's the case, we
>have problems that cannot just be solved in dwc3.

Yes. But as mentioned above, this was not the case. Speaking of which,
there is currently a bug in the uvc_video driver: it does not take into
account that every zero-length request should, without exception,
trigger an interrupt. Currently we also only scatter them over the 16ms
period, like the actual payload. But since we feed the video stream
from the interrupts, we lose 2ms of potential ep_queue calls with
actual payload in the worst case.

My patch is already in the stack and will be sent today.

>> busy. As the dwc3 interrupt thread get to its call, the time was already
>> over and the hw was already drained, although the started list was not
>> yet empty, which was causing the next queued requests to be queued to
>> late. (zero length or not)
>>
>> Yes, this needed to be solved on the upper level first, by moving the
>> long running work of the other interrupt thread to another thread or
>> even into the userspace.
>
>Right.
>
>>
>> However I thought it would be great if we could somehow find out in
>> the dwc3 core and make the pump mechanism more robust against such
>> late enqueues.
>
>The dwc3 core handling of events and ep_queue is relatively quick. I'm
>all for any optimization in the dwc3 core for performance. However, I
>don't want to just continue looking for workaround in the dwc3 core
>without looking to solve the issue where it should be. I don't want to
>sacrifice complexity and/or performance to other applications for just
>UVC.

I totally understand this. And as we find out more and more about the
underlying complexity of the dwc3 driver, I see more and more clearly
how we have to handle the corner cases. I just started this
conversation with Avichal and you in the other thread.

https://lore.kernel.org/all/17192e0f-7f18-49ae-96fc-71054d46f74a@google.com/

I think there is some work to come. To be sure that everybody is on the
same page, I will prepare a roadmap on how to proceed and what to
discuss. There are many cases interfering with each other, which makes
the solution pretty complex.

>> This all started with that series.
>>
>> https://lore.kernel.org/all/20240307-dwc3-gadget-complete-irq-v1-0-4fe9ac0ba2b7@pengutronix.de/
>>
>> And patch 2 of this series did work well so far. The next move was this
>> patch.
>>
>> Since the last week debugging we found out that it got other issues.
>> It is not allways save to read the HWO bit, from the driver.
>>
>> Turns out that after an new TRB was prepared with the HWO bit set
>> it is not save to read immideatly back from that value as the hw
>> will be doing some operations on that exactly new prepared TRB.
>>
>> We ran into this problem when applying this patch. The trb buffer list
>> was actually filled but we hit a false positive where the latest HWO bit
>> was 0 (probably due to the hw action in the background) and therefor
>> went into end transfer.
>>
>
>Thanks for the update.

Thanks,
Michael
Thinh Nguyen May 11, 2024, 12:51 a.m. UTC | #6
On Thu, May 09, 2024, Michael Grzeschik wrote:
> On Wed, May 08, 2024 at 11:03:00PM +0000, Thinh Nguyen wrote:
> > On Sun, May 05, 2024, Michael Grzeschik wrote:
> > > On Wed, Apr 24, 2024 at 01:51:01AM +0000, Thinh Nguyen wrote:
> > > > >
> > > >
> > > > Right. Unfortunately, dwc3 can only "guess" when UVC function stops
> > > > pumping more request or whether it's due to some large latency. The
> > > > logic to workaround this underrun issue will not be foolproof. Perhaps
> > > > we can improve upon it, but the solution is better implement at the UVC
> > > > function driver.
> > > 
> > > Yes, the best way to solve this is in the uvc driver.
> > > 
> > > > I thought we have the mechanism in UVC function now to ensure queuing
> > > > enough zero-length requests to account for underrun/latency issue?
> > > > What's the issue now?
> > > 
> > > This is actually only partially true. Even with the zero-length packages
> > > it is possible that we run into underruns. This is why we implemented
> > > this patch. This has happened because another interrupt thread with the
> > > same prio on the same CPU as this interrupt thread was keeping the CPU
> > 
> > Do you have the data on the worst latency?
> 
> It was something a bit more then around 2ms AFAIR. Since with one frame
> enqueued we only trigger the interrupt every 16 requests (16*125us = 2ms)
> 
> So with at least 2ms latency we did hit the sweet spot in several cases.

For 2ms, we should be able to handle this with the zero-length requests.

> 
> > Can this other interrupt thread lower its priority relative to UVC? For
> > isoc endpoint, data is time critical.
> 
> The details are not that important. Sure the is a bug, that needed to be
> solved. But all I wanted is to improve the overal dwc3 driver.
> 
> > Currently dwc3 can have up to 255 TRBs per endpoint, potentially 255
> > zero-length requests. That's 255 uframe, or roughly ~30ms. Is your worst
> > latency more than 30ms? ie. no handling of transfer completion and
> > ep_queue for the whole 255 intervals or 30ms. If that's the case, we
> > have problems that cannot just be solved in dwc3.
> 
> Yes. But as mentioned above, this was not the case. Speaking of, there
> is currently a bug in the uvc_video driver, that is not taking into
> acount that actually every zero-length request should without exception
> need to trigger an interrupt.

Not necessarily; you can send multiple sets of zero-length requests
with, for example, IOC set only on the last request of each set.
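
Something along these lines, for example (a minimal sketch against the
generic gadget API; queue_zlp_set() and the pre-allocated request array
are made up for illustration, this is not actual f_uvc code):

	#include <linux/usb/gadget.h>

	/*
	 * Queue a set of zero-length requests on an isoc endpoint and ask
	 * for a completion interrupt (IOC) only on the last one of the set.
	 * 'reqs' is assumed to be an array of already-allocated requests.
	 */
	static int queue_zlp_set(struct usb_ep *ep, struct usb_request **reqs,
				 unsigned int count)
	{
		unsigned int i;
		int ret;

		for (i = 0; i < count; i++) {
			reqs[i]->length = 0;			/* zero-length filler */
			reqs[i]->zero = 0;
			reqs[i]->no_interrupt = (i + 1 < count);	/* IOC on last only */

			ret = usb_ep_queue(ep, reqs[i], GFP_ATOMIC);
			if (ret)
				return ret;
		}

		return 0;
	}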

> Currently we also only scatter them over
> the 16ms period, like with the actual payload. But since we feed the
> video stream with the interrupts, we loose 2ms of potential ep_queue
> calls with actual payload in the worst case.
> 
> My patch is already in the stack and will be send today.
> 
> > > busy. As the dwc3 interrupt thread get to its call, the time was already
> > > over and the hw was already drained, although the started list was not
> > > yet empty, which was causing the next queued requests to be queued to
> > > late. (zero length or not)
> > > 
> > > Yes, this needed to be solved on the upper level first, by moving the
> > > long running work of the other interrupt thread to another thread or
> > > even into the userspace.
> > 
> > Right.
> > 
> > > 
> > > However I thought it would be great if we could somehow find out in
> > > the dwc3 core and make the pump mechanism more robust against such
> > > late enqueues.
> > 
> > The dwc3 core handling of events and ep_queue is relatively quick. I'm
> > all for any optimization in the dwc3 core for performance. However, I
> > don't want to just continue looking for workaround in the dwc3 core
> > without looking to solve the issue where it should be. I don't want to
> > sacrifice complexity and/or performance to other applications for just
> > UVC.
> 
> I totally understand this. And as we already found out more and more
> about the underlying complexity of the dwc3 driver I see more and more
> clearer how we have to handle the corner cases. I just started this
> conversation with Avichal and you in the other thread.
> 
> https://lore.kernel.org/all/17192e0f-7f18-49ae-96fc-71054d46f74a@google.com/
> 
> I think there is some work to come. As to be sure that everybody is on
> the same page I will prepare a roadmap on how to proceed and what to
> discuss. There are many cases interfering with each other which make the
> solution pretty complex.

That's great. Let's do that so we can resolve this issue.

> 
> > > This all started with that series.
> > > 
> > > https://lore.kernel.org/all/20240307-dwc3-gadget-complete-irq-v1-0-4fe9ac0ba2b7@pengutronix.de/
> > > 
> > > And patch 2 of this series did work well so far. The next move was this
> > > patch.
> > > 
> > > Since the last week debugging we found out that it got other issues.
> > > It is not allways save to read the HWO bit, from the driver.
> > > 
> > > Turns out that after an new TRB was prepared with the HWO bit set
> > > it is not save to read immideatly back from that value as the hw
> > > will be doing some operations on that exactly new prepared TRB.
> > > 
> > > We ran into this problem when applying this patch. The trb buffer list
> > > was actually filled but we hit a false positive where the latest HWO bit
> > > was 0 (probably due to the hw action in the background) and therefor
> > > went into end transfer.
> > > 
> > 
> > Thanks for the update.
> 

Thanks,
Thinh
Michael Grzeschik May 12, 2024, 9:33 p.m. UTC | #7
On Sat, May 11, 2024 at 12:51:57AM +0000, Thinh Nguyen wrote:
>On Thu, May 09, 2024, Michael Grzeschik wrote:
>> On Wed, May 08, 2024 at 11:03:00PM +0000, Thinh Nguyen wrote:
>> > On Sun, May 05, 2024, Michael Grzeschik wrote:
>> > > On Wed, Apr 24, 2024 at 01:51:01AM +0000, Thinh Nguyen wrote:
>> > > > >
>> > > >
>> > > > Right. Unfortunately, dwc3 can only "guess" when UVC function stops
>> > > > pumping more request or whether it's due to some large latency. The
>> > > > logic to workaround this underrun issue will not be foolproof. Perhaps
>> > > > we can improve upon it, but the solution is better implement at the UVC
>> > > > function driver.
>> > >
>> > > Yes, the best way to solve this is in the uvc driver.
>> > >
>> > > > I thought we have the mechanism in UVC function now to ensure queuing
>> > > > enough zero-length requests to account for underrun/latency issue?
>> > > > What's the issue now?
>> > >
>> > > This is actually only partially true. Even with the zero-length packages
>> > > it is possible that we run into underruns. This is why we implemented
>> > > this patch. This has happened because another interrupt thread with the
>> > > same prio on the same CPU as this interrupt thread was keeping the CPU
>> >
>> > Do you have the data on the worst latency?
>>
>> It was something a bit more then around 2ms AFAIR. Since with one frame
>> enqueued we only trigger the interrupt every 16 requests (16*125us = 2ms)
>>
>> So with at least 2ms latency we did hit the sweet spot in several cases.
>
>For 2ms, we should be able to handle this with the zero-length requests.

When the interrupt thread is also the one enqueueing the zero-length
requests (as the uvc_video gadget does now), we won't be able to do
that.

>>
>> > Can this other interrupt thread lower its priority relative to UVC? For
>> > isoc endpoint, data is time critical.
>>
>> The details are not that important. Sure the is a bug, that needed to be
>> solved. But all I wanted is to improve the overal dwc3 driver.
>>
>> > Currently dwc3 can have up to 255 TRBs per endpoint, potentially 255
>> > zero-length requests. That's 255 uframe, or roughly ~30ms. Is your worst
>> > latency more than 30ms? ie. no handling of transfer completion and
>> > ep_queue for the whole 255 intervals or 30ms. If that's the case, we
>> > have problems that cannot just be solved in dwc3.
>>
>> Yes. But as mentioned above, this was not the case. Speaking of, there
>> is currently a bug in the uvc_video driver, that is not taking into
>> acount that actually every zero-length request should without exception
>> need to trigger an interrupt.
>
>Not necessarily, you can send multiple set of zero-length requests
>where, for example, IOC on the last request of the set.

Right. But for this we have to know whether the last request that will
be enqueued will be followed by an actual data request. As the
uvc_video gadget is implemented now, we cannot do this.

It only checks whether the prepared list is empty and then enqueues
zero-length or data requests from the complete handler, depending on
the outcome. It does not know whether the prepared list will have some
requests ready on the next enqueue.

>> Currently we also only scatter them over
>> the 16ms period, like with the actual payload. But since we feed the
>> video stream with the interrupts, we loose 2ms of potential ep_queue
>> calls with actual payload in the worst case.
>>
>> My patch is already in the stack and will be send today.
>>
>> > > busy. As the dwc3 interrupt thread get to its call, the time was already
>> > > over and the hw was already drained, although the started list was not
>> > > yet empty, which was causing the next queued requests to be queued to
>> > > late. (zero length or not)
>> > >
>> > > Yes, this needed to be solved on the upper level first, by moving the
>> > > long running work of the other interrupt thread to another thread or
>> > > even into the userspace.
>> >
>> > Right.
>> >
>> > >
>> > > However I thought it would be great if we could somehow find out in
>> > > the dwc3 core and make the pump mechanism more robust against such
>> > > late enqueues.
>> >
>> > The dwc3 core handling of events and ep_queue is relatively quick. I'm
>> > all for any optimization in the dwc3 core for performance. However, I
>> > don't want to just continue looking for workaround in the dwc3 core
>> > without looking to solve the issue where it should be. I don't want to
>> > sacrifice complexity and/or performance to other applications for just
>> > UVC.
>>
>> I totally understand this. And as we already found out more and more
>> about the underlying complexity of the dwc3 driver I see more and more
>> clearer how we have to handle the corner cases. I just started this
>> conversation with Avichal and you in the other thread.
>>
>> https://lore.kernel.org/all/17192e0f-7f18-49ae-96fc-71054d46f74a@google.com/
>>
>> I think there is some work to come. As to be sure that everybody is on
>> the same page I will prepare a roadmap on how to proceed and what to
>> discuss. There are many cases interfering with each other which make the
>> solution pretty complex.
>
>That's great. Let's do that so we can resolve this issue.

Good

>> > > This all started with that series.
>> > >
>> > > https://lore.kernel.org/all/20240307-dwc3-gadget-complete-irq-v1-0-4fe9ac0ba2b7@pengutronix.de/
>> > >
>> > > And patch 2 of this series did work well so far. The next move was this
>> > > patch.
>> > >
>> > > Since the last week debugging we found out that it got other issues.
>> > > It is not allways save to read the HWO bit, from the driver.
>> > >
>> > > Turns out that after an new TRB was prepared with the HWO bit set
>> > > it is not save to read immideatly back from that value as the hw
>> > > will be doing some operations on that exactly new prepared TRB.
>> > >
>> > > We ran into this problem when applying this patch. The trb buffer list
>> > > was actually filled but we hit a false positive where the latest HWO bit
>> > > was 0 (probably due to the hw action in the background) and therefor
>> > > went into end transfer.
>> > >
>> >
>> > Thanks for the update.
>>
>
Thinh Nguyen May 17, 2024, 1:29 a.m. UTC | #8
On Sun, May 12, 2024, Michael Grzeschik wrote:
> On Sat, May 11, 2024 at 12:51:57AM +0000, Thinh Nguyen wrote:
> > On Thu, May 09, 2024, Michael Grzeschik wrote:
> > > On Wed, May 08, 2024 at 11:03:00PM +0000, Thinh Nguyen wrote:
> > > > On Sun, May 05, 2024, Michael Grzeschik wrote:
> > > > > On Wed, Apr 24, 2024 at 01:51:01AM +0000, Thinh Nguyen wrote:
> > > > > > >
> > > > > >
> > > > > > Right. Unfortunately, dwc3 can only "guess" when UVC function stops
> > > > > > pumping more request or whether it's due to some large latency. The
> > > > > > logic to workaround this underrun issue will not be foolproof. Perhaps
> > > > > > we can improve upon it, but the solution is better implement at the UVC
> > > > > > function driver.
> > > > >
> > > > > Yes, the best way to solve this is in the uvc driver.
> > > > >
> > > > > > I thought we have the mechanism in UVC function now to ensure queuing
> > > > > > enough zero-length requests to account for underrun/latency issue?
> > > > > > What's the issue now?
> > > > >
> > > > > This is actually only partially true. Even with the zero-length packages
> > > > > it is possible that we run into underruns. This is why we implemented
> > > > > this patch. This has happened because another interrupt thread with the
> > > > > same prio on the same CPU as this interrupt thread was keeping the CPU
> > > >
> > > > Do you have the data on the worst latency?
> > > 
> > > It was something a bit more then around 2ms AFAIR. Since with one frame
> > > enqueued we only trigger the interrupt every 16 requests (16*125us = 2ms)
> > > 
> > > So with at least 2ms latency we did hit the sweet spot in several cases.
> > 
> > For 2ms, we should be able to handle this with the zero-length requests.
> 
> When the interrupt thread is the one that is enqueuing also the
> zero-requests (like the uvc_video gadget) is doing now, we won't be able
> to do that.

How long does enqueuing take? Does it take longer than the time
covered by the intervals that it enqueues?

> 
> > > 
> > > > Can this other interrupt thread lower its priority relative to UVC? For
> > > > isoc endpoint, data is time critical.
> > > 
> > > The details are not that important. Sure the is a bug, that needed to be
> > > solved. But all I wanted is to improve the overal dwc3 driver.
> > > 
> > > > Currently dwc3 can have up to 255 TRBs per endpoint, potentially 255
> > > > zero-length requests. That's 255 uframe, or roughly ~30ms. Is your worst
> > > > latency more than 30ms? ie. no handling of transfer completion and
> > > > ep_queue for the whole 255 intervals or 30ms. If that's the case, we
> > > > have problems that cannot just be solved in dwc3.
> > > 
> > > Yes. But as mentioned above, this was not the case. Speaking of, there
> > > is currently a bug in the uvc_video driver, that is not taking into
> > > acount that actually every zero-length request should without exception
> > > need to trigger an interrupt.
> > 
> > Not necessarily, you can send multiple set of zero-length requests
> > where, for example, IOC on the last request of the set.
> 
> Right. But for this we have to know if the last request that will be
> enqueued will be followed by an actual data request. As the uvc_video
> gadget is implemented now, we can not do this.
> 
> It is only checking if the prepared list is empty and then it is
> enqueueing zero or data requests from the complete handler depending
> from the outcome. It does not know if on the next enqueue the prepared
> list will have some requests ready.

Can we check that the prepared list always has X requests, instead of
checking whether it is empty? If not, fill it up to X with zero-length
requests.
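
Roughly like this, perhaps (only a sketch of the "keep X requests in
flight" idea; stream_stub, queued and get_free_zlp() are placeholders
for whatever bookkeeping the function driver actually has, not real
uvc_video code; the requests' complete() callback is assumed to drop
'queued' and call stream_top_up() again):

	#include <linux/usb/gadget.h>

	struct stream_stub {
		struct usb_ep *ep;
		unsigned int queued;	/* requests currently owned by the UDC */
	};

	/* Placeholder: hand out a pre-allocated, currently unused request. */
	struct usb_request *get_free_zlp(struct stream_stub *s);

	/* Top the endpoint up to 'target' queued requests with zero-length
	 * requests so the controller never runs dry between completions.
	 */
	static void stream_top_up(struct stream_stub *s, unsigned int target)
	{
		while (s->queued < target) {
			struct usb_request *req = get_free_zlp(s);

			if (!req)
				break;

			req->length = 0;
			req->no_interrupt = 0;	/* interrupt so the list is refilled */

			if (usb_ep_queue(s->ep, req, GFP_ATOMIC))
				break;

			s->queued++;
		}
	}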

BR,
Thinh

> 
> > > Currently we also only scatter them over
> > > the 16ms period, like with the actual payload. But since we feed the
> > > video stream with the interrupts, we loose 2ms of potential ep_queue
> > > calls with actual payload in the worst case.
> > > 
> > > My patch is already in the stack and will be send today.
> > > 
> > > > > busy. As the dwc3 interrupt thread get to its call, the time was already
> > > > > over and the hw was already drained, although the started list was not
> > > > > yet empty, which was causing the next queued requests to be queued to
> > > > > late. (zero length or not)
> > > > >
> > > > > Yes, this needed to be solved on the upper level first, by moving the
> > > > > long running work of the other interrupt thread to another thread or
> > > > > even into the userspace.
> > > >
> > > > Right.
> > > >
> > > > >
> > > > > However I thought it would be great if we could somehow find out in
> > > > > the dwc3 core and make the pump mechanism more robust against such
> > > > > late enqueues.
> > > >
> > > > The dwc3 core handling of events and ep_queue is relatively quick. I'm
> > > > all for any optimization in the dwc3 core for performance. However, I
> > > > don't want to just continue looking for workaround in the dwc3 core
> > > > without looking to solve the issue where it should be. I don't want to
> > > > sacrifice complexity and/or performance to other applications for just
> > > > UVC.
> > > 
> > > I totally understand this. And as we already found out more and more
> > > about the underlying complexity of the dwc3 driver I see more and more
> > > clearer how we have to handle the corner cases. I just started this
> > > conversation with Avichal and you in the other thread.
> > > 
> > > https://lore.kernel.org/all/17192e0f-7f18-49ae-96fc-71054d46f74a@google.com/
> > > 
> > > I think there is some work to come. As to be sure that everybody is on
> > > the same page I will prepare a roadmap on how to proceed and what to
> > > discuss. There are many cases interfering with each other which make the
> > > solution pretty complex.
> > 
> > That's great. Let's do that so we can resolve this issue.
> 
> Good
> 
> > > > > This all started with that series.
> > > > >
> > > > > https://lore.kernel.org/all/20240307-dwc3-gadget-complete-irq-v1-0-4fe9ac0ba2b7@pengutronix.de/
> > > > >
> > > > > And patch 2 of this series did work well so far. The next move was this
> > > > > patch.
> > > > >
> > > > > Since the last week debugging we found out that it got other issues.
> > > > > It is not allways save to read the HWO bit, from the driver.
> > > > >
> > > > > Turns out that after an new TRB was prepared with the HWO bit set
> > > > > it is not save to read immideatly back from that value as the hw
> > > > > will be doing some operations on that exactly new prepared TRB.
> > > > >
> > > > > We ran into this problem when applying this patch. The trb buffer list
> > > > > was actually filled but we hit a false positive where the latest HWO bit
> > > > > was 0 (probably due to the hw action in the background) and therefor
> > > > > went into end transfer.
> > > > >
> > > >
> > > > Thanks for the update.
> > > 
> > 
> 
> -- 
> Pengutronix e.K.                           |                             |
> Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
> 31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
> Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

Patch

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 4df2661f66751..3e9c67799259a 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -3582,13 +3582,26 @@  static bool dwc3_gadget_endpoint_trbs_complete(struct dwc3_ep *dep,
 	if (!dep->endpoint.desc)
 		return no_started_trb;
 
-	if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
-		list_empty(&dep->started_list) &&
-		(list_empty(&dep->pending_list) || status == -EXDEV))
-		dwc3_stop_active_transfer(dep, true, true);
-	else if (dwc3_gadget_ep_should_continue(dep))
+	if (usb_endpoint_xfer_isoc(dep->endpoint.desc)) {
+		/* It is possible that the interrupt thread was delayed
+		 * by scheduling in the system, and therefore the HW has
+		 * already run dry. In that case the last TRB in the
+		 * queue has already been handled by the HW. By checking
+		 * the HWO bit we know to restart the whole transfer.
+		 * This condition is more likely to appear if not every
+		 * request has the IOC bit set and therefore does not
+		 * trigger the interrupt thread frequently.
+		 */
+		struct dwc3_trb *trb = dwc3_ep_prev_trb(dep, dep->trb_enqueue);
+		unsigned int transfer_in_flight = trb->ctrl & DWC3_TRB_CTRL_HWO;
+
+		if ((list_empty(&dep->started_list) || !transfer_in_flight) &&
+		    (list_empty(&dep->pending_list) || status == -EXDEV))
+			dwc3_stop_active_transfer(dep, true, true);
+	} else if (dwc3_gadget_ep_should_continue(dep)) {
 		if (__dwc3_gadget_kick_transfer(dep) == 0)
 			no_started_trb = false;
+	}
 
 out:
 	/*