diff mbox series

[23/24] usb: xhci: add warning message specifying which Set TR Deq failed

Message ID 20250515135621.335595-24-mathias.nyman@linux.intel.com
State New
Headers show
Series xhci features for usb-next | expand

Commit Message

Mathias Nyman May 15, 2025, 1:56 p.m. UTC
From: Niklas Neronin <niklas.neronin@linux.intel.com>

Currently, errors from the Set TR Deq command are not handled.
Add a warning message to specify the slot, endpoint, and TRB address when
a Set TR Deq command fails. This additional information will aid in
debugging such failures.

Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci-ring.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Michał Pecio May 16, 2025, 12:43 p.m. UTC | #1
On Thu, 15 May 2025 16:56:20 +0300, Mathias Nyman wrote:
> From: Niklas Neronin <niklas.neronin@linux.intel.com>
> 
> Currently, errors from the Set TR Deq command are not handled.
> Add a warning message to specify the slot, endpoint, and TRB address
> when a Set TR Deq command fails. This additional information will aid
> in debugging such failures.
> 
> Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
> ---
>  drivers/usb/host/xhci-ring.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/usb/host/xhci-ring.c
> b/drivers/usb/host/xhci-ring.c index e3c823e1caee..eff6b84dc915 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -1448,6 +1448,11 @@ static void xhci_handle_cmd_set_deq(struct
> xhci_hcd *xhci, int slot_id, unsigned int ep_state;
>  		unsigned int slot_state;
>  
> +		xhci_warn(xhci, "Set TR Deq error for TRB 0x%llx in slot %d ep %u\n",
> +			  (unsigned long long)xhci_trb_virt_to_dma(ep->queued_deq_seg,
> +						ep->queued_deq_ptr),

Is printing this pointer actually useful? It doesn't tell why
the error happened. It will not relate to other messages - the
command failed, so dequeue stays at its old position.

> +			  slot_id, ep_index);
> +
>  		switch (cmd_comp_code) {
>  		case COMP_TRB_ERROR:
>  			xhci_warn(xhci, "WARN Set TR Deq Ptr cmd invalid because of stream ID configuration\n");

This will now become a multi-line log message, even though actual
information it contains could fit in one line.

How about replacing this new line and the whole switch with:

+               ep_state = GET_EP_CTX_STATE(ep_ctx);
+               slot_state = GET_SLOT_STATE(le32_to_cpu(slot_ctx->dev_state));
+
+               xhci_warn(xhci, "Set TR Dequeue %s for slot %d ep %d (%s/%s)\n",
+                               xhci_trb_comp_code_string(cmd_comp_code), slot_id, ep_index,
+                               xhci_slot_state_string(slot_state), xhci_ep_state_string(ep_state));

Example output:
xhci_hcd 0000:02:00.0: Set TR Dequeue TRB Error for slot 1 ep 6 (configured/stopped)

IMO this has the further benefit that "TRB Error" is something I can
search in the spec, while "because of stream ID configuration" is not.
And it isn't even the true treason in every case of TRB Error.

Thanks,
Michal
Neronin, Niklas May 16, 2025, 2:32 p.m. UTC | #2
On 16/05/2025 15.43, Michał Pecio wrote:
> On Thu, 15 May 2025 16:56:20 +0300, Mathias Nyman wrote:
>> From: Niklas Neronin <niklas.neronin@linux.intel.com>
>>
>> Currently, errors from the Set TR Deq command are not handled.
>> Add a warning message to specify the slot, endpoint, and TRB address
>> when a Set TR Deq command fails. This additional information will aid
>> in debugging such failures.
>>
>> Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
>> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
>> ---
>>  drivers/usb/host/xhci-ring.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/usb/host/xhci-ring.c
>> b/drivers/usb/host/xhci-ring.c index e3c823e1caee..eff6b84dc915 100644
>> --- a/drivers/usb/host/xhci-ring.c
>> +++ b/drivers/usb/host/xhci-ring.c
>> @@ -1448,6 +1448,11 @@ static void xhci_handle_cmd_set_deq(struct
>> xhci_hcd *xhci, int slot_id, unsigned int ep_state;
>>  		unsigned int slot_state;
>>  
>> +		xhci_warn(xhci, "Set TR Deq error for TRB 0x%llx in slot %d ep %u\n",
>> +			  (unsigned long long)xhci_trb_virt_to_dma(ep->queued_deq_seg,
>> +						ep->queued_deq_ptr),
> 
> Is printing this pointer actually useful? It doesn't tell why
> the error happened. It will not relate to other messages - the
> command failed, so dequeue stays at its old position.
> 

Apologies, I should have retracted this patch as part of the Set TR Deq series.

I still plan on trying to add the "Set TR Deq Error Handling" series in the future,
with a few tweaks (infinite re-try loop). Then each case will have its own
tailored warning, while this warning will print general error info.

Best Regards,
Niklas
Mathias Nyman May 19, 2025, 2:33 p.m. UTC | #3
On 16.5.2025 15.43, Michał Pecio wrote:
> On Thu, 15 May 2025 16:56:20 +0300, Mathias Nyman wrote:
>> From: Niklas Neronin <niklas.neronin@linux.intel.com>
>>
>> Currently, errors from the Set TR Deq command are not handled.
>> Add a warning message to specify the slot, endpoint, and TRB address
>> when a Set TR Deq command fails. This additional information will aid
>> in debugging such failures.
>>
>> Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
>> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
>> ---
>>   drivers/usb/host/xhci-ring.c | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/usb/host/xhci-ring.c
>> b/drivers/usb/host/xhci-ring.c index e3c823e1caee..eff6b84dc915 100644
>> --- a/drivers/usb/host/xhci-ring.c
>> +++ b/drivers/usb/host/xhci-ring.c
>> @@ -1448,6 +1448,11 @@ static void xhci_handle_cmd_set_deq(struct
>> xhci_hcd *xhci, int slot_id, unsigned int ep_state;
>>   		unsigned int slot_state;
>>   
>> +		xhci_warn(xhci, "Set TR Deq error for TRB 0x%llx in slot %d ep %u\n",
>> +			  (unsigned long long)xhci_trb_virt_to_dma(ep->queued_deq_seg,
>> +						ep->queued_deq_ptr),
> 
> Is printing this pointer actually useful? It doesn't tell why
> the error happened. It will not relate to other messages - the
> command failed, so dequeue stays at its old position.
  
Printing the DMA address has turned out to be quite useful, quickly shows corner
cases like end or beginning of ring segment, or address missing upper 32 bits.

Comparable with the "trb not in TD" error messages


> 
>> +			  slot_id, ep_index);
>> +
>>   		switch (cmd_comp_code) {
>>   		case COMP_TRB_ERROR:
>>   			xhci_warn(xhci, "WARN Set TR Deq Ptr cmd invalid because of stream ID configuration\n");
> 
> This will now become a multi-line log message, even though actual
> information it contains could fit in one line.
> 
> How about replacing this new line and the whole switch with:
> 
> +               ep_state = GET_EP_CTX_STATE(ep_ctx);
> +               slot_state = GET_SLOT_STATE(le32_to_cpu(slot_ctx->dev_state));
> +
> +               xhci_warn(xhci, "Set TR Dequeue %s for slot %d ep %d (%s/%s)\n",
> +                               xhci_trb_comp_code_string(cmd_comp_code), slot_id, ep_index,
> +                               xhci_slot_state_string(slot_state), xhci_ep_state_string(ep_state));
> 
> Example output:
> xhci_hcd 0000:02:00.0: Set TR Dequeue TRB Error for slot 1 ep 6 (configured/stopped)
> 
> IMO this has the further benefit that "TRB Error" is something I can
> search in the spec, while "because of stream ID configuration" is not.
> And it isn't even the true treason in every case of TRB Error.

This works as well, but tuning this message is something we can do in a later fixup.
I don't think it's worth resending the series due to this

Thanks
Mathias
Greg Kroah-Hartman May 21, 2025, 10:34 a.m. UTC | #4
On Mon, May 19, 2025 at 05:33:41PM +0300, Mathias Nyman wrote:
> On 16.5.2025 15.43, Michał Pecio wrote:
> > On Thu, 15 May 2025 16:56:20 +0300, Mathias Nyman wrote:
> > > From: Niklas Neronin <niklas.neronin@linux.intel.com>
> > > 
> > > Currently, errors from the Set TR Deq command are not handled.
> > > Add a warning message to specify the slot, endpoint, and TRB address
> > > when a Set TR Deq command fails. This additional information will aid
> > > in debugging such failures.
> > > 
> > > Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
> > > Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
> > > ---
> > >   drivers/usb/host/xhci-ring.c | 5 +++++
> > >   1 file changed, 5 insertions(+)
> > > 
> > > diff --git a/drivers/usb/host/xhci-ring.c
> > > b/drivers/usb/host/xhci-ring.c index e3c823e1caee..eff6b84dc915 100644
> > > --- a/drivers/usb/host/xhci-ring.c
> > > +++ b/drivers/usb/host/xhci-ring.c
> > > @@ -1448,6 +1448,11 @@ static void xhci_handle_cmd_set_deq(struct
> > > xhci_hcd *xhci, int slot_id, unsigned int ep_state;
> > >   		unsigned int slot_state;
> > > +		xhci_warn(xhci, "Set TR Deq error for TRB 0x%llx in slot %d ep %u\n",
> > > +			  (unsigned long long)xhci_trb_virt_to_dma(ep->queued_deq_seg,
> > > +						ep->queued_deq_ptr),
> > 
> > Is printing this pointer actually useful? It doesn't tell why
> > the error happened. It will not relate to other messages - the
> > command failed, so dequeue stays at its old position.
> Printing the DMA address has turned out to be quite useful, quickly shows corner
> cases like end or beginning of ring segment, or address missing upper 32 bits.

Printing kernel addresses is not allowed.  If you really want to do
this, use the proper printk flag for it (%p I think).

> Comparable with the "trb not in TD" error messages

Should that be fixed too?  Again, don't print kernel addresses please,
people consider that an information leak.

thanks,

greg k-h
Mathias Nyman May 21, 2025, 1:22 p.m. UTC | #5
>>>
>>> Is printing this pointer actually useful? It doesn't tell why
>>> the error happened. It will not relate to other messages - the
>>> command failed, so dequeue stays at its old position.
>> Printing the DMA address has turned out to be quite useful, quickly shows corner
>> cases like end or beginning of ring segment, or address missing upper 32 bits.
> 
> Printing kernel addresses is not allowed.  If you really want to do
> this, use the proper printk flag for it (%p I think).

These are all DMA bus addresses used by the device.
Wasn't aware those are bad as well

> 
>> Comparable with the "trb not in TD" error messages
> 
> Should that be fixed too?  Again, don't print kernel addresses please,
> people consider that an information leak.

Yes, there are several places that print DMA addresses using  %llx in the xhci driver
We can change those to %pad, or some index

Thanks
Mathias
diff mbox series

Patch

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index e3c823e1caee..eff6b84dc915 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1448,6 +1448,11 @@  static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id,
 		unsigned int ep_state;
 		unsigned int slot_state;
 
+		xhci_warn(xhci, "Set TR Deq error for TRB 0x%llx in slot %d ep %u\n",
+			  (unsigned long long)xhci_trb_virt_to_dma(ep->queued_deq_seg,
+								   ep->queued_deq_ptr),
+			  slot_id, ep_index);
+
 		switch (cmd_comp_code) {
 		case COMP_TRB_ERROR:
 			xhci_warn(xhci, "WARN Set TR Deq Ptr cmd invalid because of stream ID configuration\n");