Message ID | 20250512150724.4560-1-00107082@163.com |
---|---|
State | New |
Headers | show |
Series | [RFC] USB: core/xhci: add a buffer in urb for host controller private data | expand |
Hi, On 13.05.25 07:54, David Wang wrote: > URB objects have long lifecycle, an urb can be reused between > enqueue-dequeue loops; The private data needed by some host controller > has very short lifecycle, the memory is alloced when enqueue, and > released when dequeue. For example, on a system with xhci, several > minutes of usage of webcam/keyboard/mouse have memory alloc counts: > drivers/usb/core/urb.c:75 [usbcore] func:usb_alloc_urb 661 > drivers/usb/host/xhci.c:1555 [xhci_hcd] func:xhci_urb_enqueue 424863 > Memory allocation frequency for host-controller private data can reach > ~1k/s. First of all, thank you for trying to tackle this long running issue. > @@ -77,6 +78,7 @@ struct urb *usb_alloc_urb(int iso_packets, gfp_t mem_flags) > if (!urb) > return NULL; > usb_init_urb(urb); > + urb->hcpriv_mempool_flags = mem_flags; No. You cannot do this. The flags you pass to usb_alloc_urb() depend on the context you call it in. For example, if you are allocating it while holding a spinlock, ou need to use GFP_ATOMIC But that may or may not be the same context you submit the URB in. Recording mem_flags here makes no sense. Regards Oliver
At 2025-05-13 16:11:20, "Oliver Neukum" <oneukum@suse.com> wrote: >Hi, > >On 13.05.25 07:54, David Wang wrote: >> URB objects have long lifecycle, an urb can be reused between >> enqueue-dequeue loops; The private data needed by some host controller >> has very short lifecycle, the memory is alloced when enqueue, and >> released when dequeue. For example, on a system with xhci, several >> minutes of usage of webcam/keyboard/mouse have memory alloc counts: >> drivers/usb/core/urb.c:75 [usbcore] func:usb_alloc_urb 661 >> drivers/usb/host/xhci.c:1555 [xhci_hcd] func:xhci_urb_enqueue 424863 >> Memory allocation frequency for host-controller private data can reach >> ~1k/s. > >First of all, thank you for trying to tackle this long running issue. > >> @@ -77,6 +78,7 @@ struct urb *usb_alloc_urb(int iso_packets, gfp_t mem_flags) >> if (!urb) >> return NULL; >> usb_init_urb(urb); >> + urb->hcpriv_mempool_flags = mem_flags; > >No. You cannot do this. The flags you pass to usb_alloc_urb() >depend on the context you call it in. For example, if you are >allocating it while holding a spinlock, ou need to use GFP_ATOMIC > >But that may or may not be the same context you submit the URB in. >Recording mem_flags here makes no sense. > > Regards > Oliver Thanks for reviewing this. The memory flag thing do raise concern. I think I can make adjustment: realloc the memory if flag changed. Thanks David
On 13.05.25 10:23, David Wang wrote: Hi, > Thanks for reviewing this. The memory flag thing do raise concern. > I think I can make adjustment: realloc the memory if flag changed. I am sorry. I have been unclear. Here comes a detailed explanation: What we call "gfp_t" is a combination of flags. They describe A - the type of memory (always valid) B - the way the memory can be allocated (valid only at a specific time) The URB is a generic data structure to be processed by the CPU, _not_ the HC. It is always generic kernel memory. Flags of type A make no sense to pass. In fact you may not know for which device an URB will be used when you allocate it. The only valid mem_flags you can pass to usb_alloc_urb() are GFP_KERNEL, GFP_NOIO or GFP_ATOMIC. If you need to reallocate memory for private data you _must_ use the flags passed with usb_submit_urb(). A HCD can modify them by adding flags of type A, but you cannot change flags of type B. For example, if usb_alloc_urb() used GFP_KERNEL to allocate the URB, but uses GFP_ATOMIC in usb_submit_urb(), you will deadlock if you save and reuse the GFP_KERNEL. HTH Oliver
Hi David On 12.5.2025 18.07, David Wang wrote: > --- > I was checking memory allocation behaviors (via memory profiling[1]), > when I notice a high frequent memory allocation in xhci_urb_enqueue, about > 250/s when using a USB webcam. If those alloced buffer could be kept and > reused, lots of memory allocations could be avoid over time. > > This patch is just a POC, about 0/s memory allocation in xhci with this > patch, when I use my USB devices, webcam/keyboard/mouse. Thanks for looking at this, this is something that obviously needs more attention > > A dynamic cached memory would be better: URB keep host controller's > private data, if larger size buffer needed for private data, old buffer > released and a larger buffer alloced. > > I did not observe any nagative impact with xhci's 250/s allocations > when using my system, hence no measurement of how useful this changes > can make to user. Just want to collect feedbacks before putting more > effort. > I think we can make a xhci internal solution that doesn't affect other hosts or usb core. How about adding a list of struct urb_priv nodes for every usb device, or maybe even per endpoint? i.e. to struct xhci_virt_device or struct xhci_virt_ep. Add size and list_head entries to struct urb_priv. In xhci_urb_enqueue() pick the first urb_priv node from list if it exists and is large enough, otherwise just allocate a new one and set the size. When giving back the URB zero everything in the struct urb_priv except the size, and add it to the list. When the device is freed there shouldn't be any nodes left in the list, but if there are then warn and free them. Isoc transfers are the ones with odd urb_priv sizes. If we have a per device or per endpoint urb_priv list then sizes will match better. Thanks Mathias
At 2025-05-13 16:46:39, "Oliver Neukum" <oneukum@suse.com> wrote: >On 13.05.25 10:23, David Wang wrote: > >Hi, > > Thanks for reviewing this. The memory flag thing do raise concern. >> I think I can make adjustment: realloc the memory if flag changed. > >I am sorry. I have been unclear. Here comes a detailed explanation: > >What we call "gfp_t" is a combination of flags. They describe > >A - the type of memory (always valid) >B - the way the memory can be allocated (valid only at a specific time) > >The URB is a generic data structure to be processed by the CPU, _not_ >the HC. It is always generic kernel memory. Flags of type A make no sense >to pass. >In fact you may not know for which device an URB will be used when you >allocate it. The only valid mem_flags you can pass to usb_alloc_urb() >are GFP_KERNEL, GFP_NOIO or GFP_ATOMIC. > >If you need to reallocate memory for private data you _must_ use >the flags passed with usb_submit_urb(). A HCD can modify them by adding >flags of type A, but you cannot change flags of type B. >For example, if usb_alloc_urb() used GFP_KERNEL to allocate the URB, >but uses GFP_ATOMIC in usb_submit_urb(), you will deadlock if you save >and reuse the GFP_KERNEL. > > HTH > Oliver > Hi, I have one question about mem flags. If usb_submit_urb wants a memory in context of flags A, say GFP_ATOMIC, but I already have a memory alloc with flags B and its size is big enough, is it safe to return this memory to usb_submit_urb which is in the context of flags A? Thanks David
On 13.05.25 11:49, David Wang wrote: > Hi, I have one question about mem flags. > If usb_submit_urb wants a memory in context of flags A, say GFP_ATOMIC, but I already have a memory alloc with flags B and its size > is big enough, is it safe to return this memory to usb_submit_urb which is in the context of flags A? Yes, that is perfectly safe. HTH Oliver
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 423bf3649570..bc350b307758 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -831,7 +831,8 @@ static void xhci_giveback_urb_in_irq(struct xhci_hcd *xhci, usb_amd_quirk_pll_enable(); } } - xhci_urb_free_priv(urb_priv); + if (urb_priv != (void *)urb->hcpriv_buffer) + xhci_urb_free_priv(urb_priv); usb_hcd_unlink_urb_from_ep(hcd, urb); trace_xhci_urb_giveback(urb); usb_hcd_giveback_urb(hcd, urb, status); diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index 90eb491267b5..85aa5fe526c8 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -1539,6 +1539,7 @@ static int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag unsigned int *ep_state; struct urb_priv *urb_priv; int num_tds; + size_t private_size; ep_index = xhci_get_endpoint_index(&urb->ep->desc); @@ -1552,7 +1553,13 @@ static int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag else num_tds = 1; - urb_priv = kzalloc(struct_size(urb_priv, td, num_tds), mem_flags); + private_size = struct_size(urb_priv, td, num_tds); + if (private_size <= sizeof(urb->hcpriv_buffer)) { + memset(urb->hcpriv_buffer, 0, sizeof(urb->hcpriv_buffer)); + urb_priv = (struct urb_priv *)urb->hcpriv_buffer; + } else { + urb_priv = kzalloc(private_size, mem_flags); + } if (!urb_priv) return -ENOMEM; @@ -1626,7 +1633,8 @@ static int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag if (ret) { free_priv: - xhci_urb_free_priv(urb_priv); + if (urb_priv != (void *)urb->hcpriv_buffer) + xhci_urb_free_priv(urb_priv); urb->hcpriv = NULL; } spin_unlock_irqrestore(&xhci->lock, flags); @@ -1789,7 +1797,7 @@ static int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status) return ret; err_giveback: - if (urb_priv) + if (urb_priv && urb_priv != (void *)urb->hcpriv_buffer) xhci_urb_free_priv(urb_priv); usb_hcd_unlink_urb_from_ep(hcd, urb); spin_unlock_irqrestore(&xhci->lock, flags); diff --git a/include/linux/usb.h b/include/linux/usb.h index b46738701f8d..4f82bb69081c 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -1602,6 +1602,7 @@ struct urb { struct kref kref; /* reference count of the URB */ int unlinked; /* unlink error code */ void *hcpriv; /* private data for host controller */ + u8 hcpriv_buffer[4096]; /* small buffer if private data can fit */ atomic_t use_count; /* concurrent submissions counter */ atomic_t reject; /* submissions will fail */
--- I was checking memory allocation behaviors (via memory profiling[1]), when I notice a high frequent memory allocation in xhci_urb_enqueue, about 250/s when using a USB webcam. If those alloced buffer could be kept and reused, lots of memory allocations could be avoid over time. This patch is just a POC, about 0/s memory allocation in xhci with this patch, when I use my USB devices, webcam/keyboard/mouse. A dynamic cached memory would be better: URB keep host controller's private data, if larger size buffer needed for private data, old buffer released and a larger buffer alloced. I did not observe any nagative impact with xhci's 250/s allocations when using my system, hence no measurement of how useful this changes can make to user. Just want to collect feedbacks before putting more effort. [1] https://lore.kernel.org/all/20240221194052.927623-1-surenb@google.com/ --- xhci keeps allocing new memory when enque a urb for private data, and enque frequency could be high, about 250/s when using a usb webcam, about 30/s for high pace USB keyboard/mouse usage. Using a cache/buffer for those private data could avoid lots memory allocations. Signed-off-by: David Wang <00107082@163.com> --- drivers/usb/host/xhci-ring.c | 3 ++- drivers/usb/host/xhci.c | 14 +++++++++++--- include/linux/usb.h | 1 + 3 files changed, 14 insertions(+), 4 deletions(-)