diff mbox series

[v1,1/2] usb: xhci: Skip xhci_reset in xhci_resume if xhci is being removed

Message ID 20250522190912.457583-2-royluo@google.com
State New
Headers show
Series Revert commit 6ccb83d6c497 to fix DWC3 dual-role regression | expand

Commit Message

Roy Luo May 22, 2025, 7:09 p.m. UTC
xhci_reset() currently returns -ENODEV if XHCI_STATE_REMOVING is
set, without completing the xhci handshake, unless the reset completes
exceptionally quickly. This behavior causes a regression on Synopsys
DWC3 USB controllers with dual-role capabilities.

Specifically, when a DWC3 controller exits host mode and removes xhci
while a reset is still in progress, and then attempts to configure its
hardware for device mode, the ongoing, incomplete reset leads to
critical register access issues. All register reads return zero, not
just within the xHCI register space (which might be expected during a
reset), but across the entire DWC3 IP block.

This patch addresses the issue by preventing xhci_reset() from being
called in xhci_resume() and bailing out early in the reinit flow when
XHCI_STATE_REMOVING is set.

Cc: stable@vger.kernel.org
Fixes: 6ccb83d6c497 ("usb: xhci: Implement xhci_handshake_check_state() helper")
Suggested-by: Mathias Nyman <mathias.nyman@intel.com>
Signed-off-by: Roy Luo <royluo@google.com>
---
 drivers/usb/host/xhci.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Thinh Nguyen May 23, 2025, 11:06 p.m. UTC | #1
Hi Mathias, Roy,

On Thu, May 22, 2025, Roy Luo wrote:
> xhci_reset() currently returns -ENODEV if XHCI_STATE_REMOVING is
> set, without completing the xhci handshake, unless the reset completes
> exceptionally quickly. This behavior causes a regression on Synopsys
> DWC3 USB controllers with dual-role capabilities.
> 
> Specifically, when a DWC3 controller exits host mode and removes xhci
> while a reset is still in progress, and then attempts to configure its
> hardware for device mode, the ongoing, incomplete reset leads to
> critical register access issues. All register reads return zero, not
> just within the xHCI register space (which might be expected during a
> reset), but across the entire DWC3 IP block.
> 
> This patch addresses the issue by preventing xhci_reset() from being
> called in xhci_resume() and bailing out early in the reinit flow when
> XHCI_STATE_REMOVING is set.
> 
> Cc: stable@vger.kernel.org
> Fixes: 6ccb83d6c497 ("usb: xhci: Implement xhci_handshake_check_state() helper")
> Suggested-by: Mathias Nyman <mathias.nyman@intel.com>
> Signed-off-by: Roy Luo <royluo@google.com>
> ---
>  drivers/usb/host/xhci.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
> index 90eb491267b5..244b12eafd95 100644
> --- a/drivers/usb/host/xhci.c
> +++ b/drivers/usb/host/xhci.c
> @@ -1084,7 +1084,10 @@ int xhci_resume(struct xhci_hcd *xhci, bool power_lost, bool is_auto_resume)
>  		xhci_dbg(xhci, "Stop HCD\n");
>  		xhci_halt(xhci);
>  		xhci_zero_64b_regs(xhci);
> -		retval = xhci_reset(xhci, XHCI_RESET_LONG_USEC);
> +		if (xhci->xhc_state & XHCI_STATE_REMOVING)
> +			retval = -ENODEV;
> +		else
> +			retval = xhci_reset(xhci, XHCI_RESET_LONG_USEC);

How can this prevent the xhc_state from changing while in reset? There's
no locking in xhci-plat.

I would suggest to simply revert the commit 6ccb83d6c497 that causes
regression first. We can investigate and look into a solution to the
specific Qcom issue afterward.

Note that this commit may impact role-switching flow for all DRD dwc3
(and perhaps others), which may also impact other Qcom DRD platforms.

Thanks,
Thinh

>  		spin_unlock_irq(&xhci->lock);
>  		if (retval)
>  			return retval;
> -- 
> 2.49.0.1204.g71687c7c1d-goog
>
Mathias Nyman May 26, 2025, 7:39 a.m. UTC | #2
On 24.5.2025 2.06, Thinh Nguyen wrote:
> Hi Mathias, Roy,
> 
> On Thu, May 22, 2025, Roy Luo wrote:
>> xhci_reset() currently returns -ENODEV if XHCI_STATE_REMOVING is
>> set, without completing the xhci handshake, unless the reset completes
>> exceptionally quickly. This behavior causes a regression on Synopsys
>> DWC3 USB controllers with dual-role capabilities.
>>
>> Specifically, when a DWC3 controller exits host mode and removes xhci
>> while a reset is still in progress, and then attempts to configure its
>> hardware for device mode, the ongoing, incomplete reset leads to
>> critical register access issues. All register reads return zero, not
>> just within the xHCI register space (which might be expected during a
>> reset), but across the entire DWC3 IP block.
>>
>> This patch addresses the issue by preventing xhci_reset() from being
>> called in xhci_resume() and bailing out early in the reinit flow when
>> XHCI_STATE_REMOVING is set.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 6ccb83d6c497 ("usb: xhci: Implement xhci_handshake_check_state() helper")
>> Suggested-by: Mathias Nyman <mathias.nyman@intel.com>
>> Signed-off-by: Roy Luo <royluo@google.com>
>> ---
>>   drivers/usb/host/xhci.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
>> index 90eb491267b5..244b12eafd95 100644
>> --- a/drivers/usb/host/xhci.c
>> +++ b/drivers/usb/host/xhci.c
>> @@ -1084,7 +1084,10 @@ int xhci_resume(struct xhci_hcd *xhci, bool power_lost, bool is_auto_resume)
>>   		xhci_dbg(xhci, "Stop HCD\n");
>>   		xhci_halt(xhci);
>>   		xhci_zero_64b_regs(xhci);
>> -		retval = xhci_reset(xhci, XHCI_RESET_LONG_USEC);
>> +		if (xhci->xhc_state & XHCI_STATE_REMOVING)
>> +			retval = -ENODEV;
>> +		else
>> +			retval = xhci_reset(xhci, XHCI_RESET_LONG_USEC);
> 
> How can this prevent the xhc_state from changing while in reset? There's
> no locking in xhci-plat.

Patch 2/2, which is the revert of 6ccb83d6c497 prevents xhci_reset() from
aborting due to xhc_state flags change.

This patch makes sure xHC is not reset twice if xhci is resuming due to
remove being called. (XHCI_STATE_REMOVING is set).
The Qcom platform has watchdog issues with the 10 second XHCI_RESET_LONG_USEC
timeout reset during resume at remove.

> 
> I would suggest to simply revert the commit 6ccb83d6c497 that causes
> regression first. We can investigate and look into a solution to the
> specific Qcom issue afterward.

Why intentionally bring back the Qcom watchdog issue by only reverting
6ccb83d6c497 ?. Can't we solve both in one go?

> 
> Note that this commit may impact role-switching flow for all DRD dwc3
> (and perhaps others), which may also impact other Qcom DRD platforms.

Could you expand on this, I'm not sure I follow.

xHC will be reset later in remove path:

xhci_plat_remove()
   usb_remove_hcd()
     hcd->driver->stop(hcd) -> xhci_stop()
       xhci_reset(xhci, XHCI_RESET_SHORT_USEC);

Thanks
Mathias
Thinh Nguyen May 29, 2025, 1:17 a.m. UTC | #3
On Mon, May 26, 2025, Mathias Nyman wrote:
> On 24.5.2025 2.06, Thinh Nguyen wrote:
> > Hi Mathias, Roy,
> > 
> > On Thu, May 22, 2025, Roy Luo wrote:
> > > xhci_reset() currently returns -ENODEV if XHCI_STATE_REMOVING is
> > > set, without completing the xhci handshake, unless the reset completes
> > > exceptionally quickly. This behavior causes a regression on Synopsys
> > > DWC3 USB controllers with dual-role capabilities.
> > > 
> > > Specifically, when a DWC3 controller exits host mode and removes xhci
> > > while a reset is still in progress, and then attempts to configure its
> > > hardware for device mode, the ongoing, incomplete reset leads to
> > > critical register access issues. All register reads return zero, not
> > > just within the xHCI register space (which might be expected during a
> > > reset), but across the entire DWC3 IP block.
> > > 
> > > This patch addresses the issue by preventing xhci_reset() from being
> > > called in xhci_resume() and bailing out early in the reinit flow when
> > > XHCI_STATE_REMOVING is set.
> > > 
> > > Cc: stable@vger.kernel.org
> > > Fixes: 6ccb83d6c497 ("usb: xhci: Implement xhci_handshake_check_state() helper")
> > > Suggested-by: Mathias Nyman <mathias.nyman@intel.com>
> > > Signed-off-by: Roy Luo <royluo@google.com>
> > > ---
> > >   drivers/usb/host/xhci.c | 5 ++++-
> > >   1 file changed, 4 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
> > > index 90eb491267b5..244b12eafd95 100644
> > > --- a/drivers/usb/host/xhci.c
> > > +++ b/drivers/usb/host/xhci.c
> > > @@ -1084,7 +1084,10 @@ int xhci_resume(struct xhci_hcd *xhci, bool power_lost, bool is_auto_resume)
> > >   		xhci_dbg(xhci, "Stop HCD\n");
> > >   		xhci_halt(xhci);
> > >   		xhci_zero_64b_regs(xhci);
> > > -		retval = xhci_reset(xhci, XHCI_RESET_LONG_USEC);
> > > +		if (xhci->xhc_state & XHCI_STATE_REMOVING)
> > > +			retval = -ENODEV;
> > > +		else
> > > +			retval = xhci_reset(xhci, XHCI_RESET_LONG_USEC);
> > 
> > How can this prevent the xhc_state from changing while in reset? There's
> > no locking in xhci-plat.
> 
> Patch 2/2, which is the revert of 6ccb83d6c497 prevents xhci_reset() from
> aborting due to xhc_state flags change.
> 
> This patch makes sure xHC is not reset twice if xhci is resuming due to
> remove being called. (XHCI_STATE_REMOVING is set).

Wouldn't it still be possible for xhci to be removed in the middle of
reset on resume? The watchdog may still timeout afterward if there's an
issue with reset right?

> The Qcom platform has watchdog issues with the 10 second XHCI_RESET_LONG_USEC
> timeout reset during resume at remove.

> 
> > 
> > I would suggest to simply revert the commit 6ccb83d6c497 that causes
> > regression first. We can investigate and look into a solution to the
> > specific Qcom issue afterward.
> 
> Why intentionally bring back the Qcom watchdog issue by only reverting
> 6ccb83d6c497 ?. Can't we solve both in one go?

I feel that the fix is doesn't cover all the scenarios, that's why I
suggest the revert for now and wait until the fix is properly tested
before applying it to stable? 

> 
> > 
> > Note that this commit may impact role-switching flow for all DRD dwc3
> > (and perhaps others), which may also impact other Qcom DRD platforms.
> 
> Could you expand on this, I'm not sure I follow.
> 
> xHC will be reset later in remove path:
> 
> xhci_plat_remove()
>   usb_remove_hcd()
>     hcd->driver->stop(hcd) -> xhci_stop()
>       xhci_reset(xhci, XHCI_RESET_SHORT_USEC);
> 

I'm referring to the offending commit 6ccb83d6c497, that it should be
prioritized and pushed out first.

BR,
Thinh
diff mbox series

Patch

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 90eb491267b5..244b12eafd95 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -1084,7 +1084,10 @@  int xhci_resume(struct xhci_hcd *xhci, bool power_lost, bool is_auto_resume)
 		xhci_dbg(xhci, "Stop HCD\n");
 		xhci_halt(xhci);
 		xhci_zero_64b_regs(xhci);
-		retval = xhci_reset(xhci, XHCI_RESET_LONG_USEC);
+		if (xhci->xhc_state & XHCI_STATE_REMOVING)
+			retval = -ENODEV;
+		else
+			retval = xhci_reset(xhci, XHCI_RESET_LONG_USEC);
 		spin_unlock_irq(&xhci->lock);
 		if (retval)
 			return retval;