diff mbox series

[RFC] USB: core/xhci: add a buffer in urb for host controller private data

Message ID 20250512150724.4560-1-00107082@163.com
State New
Headers show
Series [RFC] USB: core/xhci: add a buffer in urb for host controller private data | expand

Commit Message

David Wang May 12, 2025, 3:07 p.m. UTC
---
I was checking memory allocation behaviors (via memory profiling[1]),
when I notice a high frequent memory allocation in xhci_urb_enqueue, about
250/s when using a USB webcam. If those alloced buffer could be kept and
reused, lots of memory allocations could be avoid over time.

This patch is just a POC, about 0/s memory allocation in xhci with this
patch, when I use my USB devices, webcam/keyboard/mouse. 

A dynamic cached memory would be better: URB keep host controller's
private data, if larger size buffer needed for private data, old buffer
released and a larger buffer alloced.

I did not observe any nagative impact with xhci's 250/s allocations
when using my system, hence no measurement of how useful this changes
can make to user. Just want to collect feedbacks before putting more
effort.


[1] https://lore.kernel.org/all/20240221194052.927623-1-surenb@google.com/
---
xhci keeps allocing new memory when enque a urb for private data,
and enque frequency could be high, about 250/s when using a usb
webcam, about 30/s for high pace USB keyboard/mouse usage.
Using a cache/buffer for those private data could avoid
lots memory allocations.

Signed-off-by: David Wang <00107082@163.com>
---
 drivers/usb/host/xhci-ring.c |  3 ++-
 drivers/usb/host/xhci.c      | 14 +++++++++++---
 include/linux/usb.h          |  1 +
 3 files changed, 14 insertions(+), 4 deletions(-)

Comments

Oliver Neukum May 13, 2025, 8:11 a.m. UTC | #1
Hi,

On 13.05.25 07:54, David Wang wrote:
> URB objects have long lifecycle, an urb can be reused between
> enqueue-dequeue loops; The private data needed by some host controller
> has very short lifecycle, the memory is alloced when enqueue, and
> released when dequeue. For example, on a system with xhci, several
> minutes of usage of webcam/keyboard/mouse have memory alloc counts:
>    drivers/usb/core/urb.c:75 [usbcore] func:usb_alloc_urb 661
>    drivers/usb/host/xhci.c:1555 [xhci_hcd] func:xhci_urb_enqueue 424863
> Memory allocation frequency for host-controller private data can reach
> ~1k/s.

First of all, thank you for trying to tackle this long running issue.

> @@ -77,6 +78,7 @@ struct urb *usb_alloc_urb(int iso_packets, gfp_t mem_flags)
>   	if (!urb)
>   		return NULL;
>   	usb_init_urb(urb);
> +	urb->hcpriv_mempool_flags = mem_flags;

No. You cannot do this. The flags you pass to usb_alloc_urb()
depend on the context you call it in. For example, if you are
allocating it while holding a spinlock, ou need to use GFP_ATOMIC

But that may or may not be the same context you submit the URB in.
Recording mem_flags here makes no sense.

	Regards
		Oliver
David Wang May 13, 2025, 8:23 a.m. UTC | #2
At 2025-05-13 16:11:20, "Oliver Neukum" <oneukum@suse.com> wrote:
>Hi,
>
>On 13.05.25 07:54, David Wang wrote:
>> URB objects have long lifecycle, an urb can be reused between
>> enqueue-dequeue loops; The private data needed by some host controller
>> has very short lifecycle, the memory is alloced when enqueue, and
>> released when dequeue. For example, on a system with xhci, several
>> minutes of usage of webcam/keyboard/mouse have memory alloc counts:
>>    drivers/usb/core/urb.c:75 [usbcore] func:usb_alloc_urb 661
>>    drivers/usb/host/xhci.c:1555 [xhci_hcd] func:xhci_urb_enqueue 424863
>> Memory allocation frequency for host-controller private data can reach
>> ~1k/s.
>
>First of all, thank you for trying to tackle this long running issue.
>
>> @@ -77,6 +78,7 @@ struct urb *usb_alloc_urb(int iso_packets, gfp_t mem_flags)
>>   	if (!urb)
>>   		return NULL;
>>   	usb_init_urb(urb);
>> +	urb->hcpriv_mempool_flags = mem_flags;
>
>No. You cannot do this. The flags you pass to usb_alloc_urb()
>depend on the context you call it in. For example, if you are
>allocating it while holding a spinlock, ou need to use GFP_ATOMIC
>
>But that may or may not be the same context you submit the URB in.
>Recording mem_flags here makes no sense.
>
>	Regards
>		Oliver

Thanks for reviewing this.  The memory flag thing do raise concern. 
I think I can make adjustment:  realloc the memory if flag changed.


Thanks
David
Oliver Neukum May 13, 2025, 8:46 a.m. UTC | #3
On 13.05.25 10:23, David Wang wrote:

Hi,
  > Thanks for reviewing this.  The memory flag thing do raise concern.
> I think I can make adjustment:  realloc the memory if flag changed.

I am sorry. I have been unclear. Here comes a detailed explanation:

What we call "gfp_t" is a combination of flags. They describe

A - the type of memory (always valid)
B - the way the memory can be allocated (valid only at a specific time)

The URB is a generic data structure to be processed by the CPU, _not_
the HC. It is always generic kernel memory. Flags of type A make no sense
to pass.
In fact you may not know for which device an URB will be used when you
allocate it. The only valid mem_flags you can pass to usb_alloc_urb()
are GFP_KERNEL, GFP_NOIO or GFP_ATOMIC.

If you need to reallocate memory for private data you _must_ use
the flags passed with usb_submit_urb(). A HCD can modify them by adding
flags of type A, but you cannot change flags of type B.
For example, if usb_alloc_urb() used GFP_KERNEL to allocate the URB,
but uses GFP_ATOMIC in usb_submit_urb(), you will deadlock if you save
and reuse the GFP_KERNEL.

	HTH
		Oliver
Mathias Nyman May 13, 2025, 9:27 a.m. UTC | #4
Hi David

On 12.5.2025 18.07, David Wang wrote:
> ---
> I was checking memory allocation behaviors (via memory profiling[1]),
> when I notice a high frequent memory allocation in xhci_urb_enqueue, about
> 250/s when using a USB webcam. If those alloced buffer could be kept and
> reused, lots of memory allocations could be avoid over time.
> 
> This patch is just a POC, about 0/s memory allocation in xhci with this
> patch, when I use my USB devices, webcam/keyboard/mouse.

Thanks for looking at this, this is something that obviously needs more
attention

> 
> A dynamic cached memory would be better: URB keep host controller's
> private data, if larger size buffer needed for private data, old buffer
> released and a larger buffer alloced.
> 
> I did not observe any nagative impact with xhci's 250/s allocations
> when using my system, hence no measurement of how useful this changes
> can make to user. Just want to collect feedbacks before putting more
> effort.
> 

I think we can make a xhci internal solution that doesn't affect other hosts
or usb core.

How about adding a list of struct urb_priv nodes for every usb device, or maybe
even per endpoint? i.e. to struct xhci_virt_device or  struct xhci_virt_ep.

Add size and list_head entries to struct urb_priv.

In xhci_urb_enqueue() pick the first urb_priv node from list if it exists and is
large enough, otherwise just allocate a new one and set the size.

When giving back the URB zero everything in the struct urb_priv except the size,
and add it to the list.

When the device is freed there shouldn't be any nodes left in the list, but if
there are then warn and free them.

Isoc transfers are the ones with odd urb_priv sizes. If we have a per device or
per endpoint urb_priv list then sizes will match better.

Thanks
Mathias
David Wang May 13, 2025, 9:49 a.m. UTC | #5
At 2025-05-13 16:46:39, "Oliver Neukum" <oneukum@suse.com> wrote:
>On 13.05.25 10:23, David Wang wrote:
>
>Hi,
>  > Thanks for reviewing this.  The memory flag thing do raise concern.
>> I think I can make adjustment:  realloc the memory if flag changed.
>
>I am sorry. I have been unclear. Here comes a detailed explanation:
>
>What we call "gfp_t" is a combination of flags. They describe
>
>A - the type of memory (always valid)
>B - the way the memory can be allocated (valid only at a specific time)
>
>The URB is a generic data structure to be processed by the CPU, _not_
>the HC. It is always generic kernel memory. Flags of type A make no sense
>to pass.
>In fact you may not know for which device an URB will be used when you
>allocate it. The only valid mem_flags you can pass to usb_alloc_urb()
>are GFP_KERNEL, GFP_NOIO or GFP_ATOMIC.
>
>If you need to reallocate memory for private data you _must_ use
>the flags passed with usb_submit_urb(). A HCD can modify them by adding
>flags of type A, but you cannot change flags of type B.
>For example, if usb_alloc_urb() used GFP_KERNEL to allocate the URB,
>but uses GFP_ATOMIC in usb_submit_urb(), you will deadlock if you save
>and reuse the GFP_KERNEL.
>
>	HTH
>		Oliver
>  

Hi, I have one question about mem flags.
If usb_submit_urb wants a memory in context of flags A, say GFP_ATOMIC, but I already have a memory alloc with  flags B  and its size
is big enough,  is it safe to return this memory  to usb_submit_urb  which is in the context of flags A? 


Thanks
David
Oliver Neukum May 13, 2025, 11:02 a.m. UTC | #6
On 13.05.25 11:49, David Wang wrote:

> Hi, I have one question about mem flags.
> If usb_submit_urb wants a memory in context of flags A, say GFP_ATOMIC, but I already have a memory alloc with  flags B  and its size
> is big enough,  is it safe to return this memory  to usb_submit_urb  which is in the context of flags A?

Yes, that is perfectly safe.

	HTH
		Oliver
diff mbox series

Patch

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 423bf3649570..bc350b307758 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -831,7 +831,8 @@  static void xhci_giveback_urb_in_irq(struct xhci_hcd *xhci,
 				usb_amd_quirk_pll_enable();
 		}
 	}
-	xhci_urb_free_priv(urb_priv);
+	if (urb_priv != (void *)urb->hcpriv_buffer)
+		xhci_urb_free_priv(urb_priv);
 	usb_hcd_unlink_urb_from_ep(hcd, urb);
 	trace_xhci_urb_giveback(urb);
 	usb_hcd_giveback_urb(hcd, urb, status);
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 90eb491267b5..85aa5fe526c8 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -1539,6 +1539,7 @@  static int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag
 	unsigned int *ep_state;
 	struct urb_priv	*urb_priv;
 	int num_tds;
+	size_t private_size;
 
 	ep_index = xhci_get_endpoint_index(&urb->ep->desc);
 
@@ -1552,7 +1553,13 @@  static int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag
 	else
 		num_tds = 1;
 
-	urb_priv = kzalloc(struct_size(urb_priv, td, num_tds), mem_flags);
+	private_size = struct_size(urb_priv, td, num_tds);
+	if (private_size <= sizeof(urb->hcpriv_buffer)) {
+		memset(urb->hcpriv_buffer, 0, sizeof(urb->hcpriv_buffer));
+		urb_priv = (struct urb_priv *)urb->hcpriv_buffer;
+	} else {
+		urb_priv = kzalloc(private_size, mem_flags);
+	}
 	if (!urb_priv)
 		return -ENOMEM;
 
@@ -1626,7 +1633,8 @@  static int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag
 
 	if (ret) {
 free_priv:
-		xhci_urb_free_priv(urb_priv);
+		if (urb_priv != (void *)urb->hcpriv_buffer)
+			xhci_urb_free_priv(urb_priv);
 		urb->hcpriv = NULL;
 	}
 	spin_unlock_irqrestore(&xhci->lock, flags);
@@ -1789,7 +1797,7 @@  static int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
 	return ret;
 
 err_giveback:
-	if (urb_priv)
+	if (urb_priv &&  urb_priv != (void *)urb->hcpriv_buffer)
 		xhci_urb_free_priv(urb_priv);
 	usb_hcd_unlink_urb_from_ep(hcd, urb);
 	spin_unlock_irqrestore(&xhci->lock, flags);
diff --git a/include/linux/usb.h b/include/linux/usb.h
index b46738701f8d..4f82bb69081c 100644
--- a/include/linux/usb.h
+++ b/include/linux/usb.h
@@ -1602,6 +1602,7 @@  struct urb {
 	struct kref kref;		/* reference count of the URB */
 	int unlinked;			/* unlink error code */
 	void *hcpriv;			/* private data for host controller */
+	u8 hcpriv_buffer[4096];         /* small buffer if private data can fit */
 	atomic_t use_count;		/* concurrent submissions counter */
 	atomic_t reject;		/* submissions will fail */