
[v2,1/2] USB: core: add a memory pool to urb for host-controller private data

Message ID 20250513113817.11962-1-00107082@163.com
State New
Series [v2,1/2] USB: core: add a memory pool to urb for host-controller private data

Commit Message

David Wang May 13, 2025, 11:38 a.m. UTC
---
Changes:
1. Use caller's mem_flags if a larger memory is needed.
Thanks to Oliver Neukum <oneukum@suse.com>'s review.
---
URB objects have a long lifecycle; an urb can be reused across
enqueue-dequeue loops. The private data needed by some host controllers
has a very short lifecycle: the memory is allocated on enqueue and
released on dequeue. For example, on a system with xhci, several
minutes of webcam/keyboard/mouse usage show these memory alloc counts:
  drivers/usb/core/urb.c:75 [usbcore] func:usb_alloc_urb 661
  drivers/usb/host/xhci.c:1555 [xhci_hcd] func:xhci_urb_enqueue 424863
The allocation frequency for host-controller private data can reach
~1k/s and above.

These frequent allocations for host-controller private data can be
avoided if the urb takes over ownership of the memory, so that the
memory shares the longer lifecycle of the urb object.

Add a mempool to urb for hcpriv usage; the mempool holds only one block
of memory and grows when a larger size is requested.

Signed-off-by: David Wang <00107082@163.com>
---
 drivers/usb/core/urb.c | 23 +++++++++++++++++++++++
 include/linux/usb.h    |  3 +++
 2 files changed, 26 insertions(+)
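For reference, the single-slot, grow-on-demand scheme described above can be modeled in userspace C as follows. This is only a sketch: the function name follows urb_hcpriv_mempool_zalloc from the footprint data later in the thread, but the struct fields and the use of malloc/free in place of kzalloc/kfree are illustrative stand-ins, not the patch itself.

```c
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/* Userspace model of a urb that owns at most one cached hcpriv block.
 * Field names are illustrative stand-ins for what the patch adds to
 * struct urb. */
struct urb_model {
	void   *hcpriv_mem;	/* cached block, reused across enqueues */
	size_t  hcpriv_size;	/* capacity of the cached block */
};

/* Return a zeroed area of at least `size` bytes, reusing the cached
 * block when it is large enough and growing it otherwise. */
void *urb_hcpriv_mempool_zalloc(struct urb_model *urb, size_t size)
{
	if (urb->hcpriv_size < size) {
		free(urb->hcpriv_mem);
		urb->hcpriv_mem = malloc(size);
		if (!urb->hcpriv_mem) {
			urb->hcpriv_size = 0;
			return NULL;
		}
		urb->hcpriv_size = size;
	}
	memset(urb->hcpriv_mem, 0, size);
	return urb->hcpriv_mem;
}

/* The block is only released when the urb itself is destroyed. */
void urb_hcpriv_mempool_free(struct urb_model *urb)
{
	free(urb->hcpriv_mem);
	urb->hcpriv_mem = NULL;
	urb->hcpriv_size = 0;
}
```

Enqueue/dequeue cycles whose size fits the cached block cost no allocator calls at all; only a request for a larger size triggers a free-plus-alloc.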

Comments

Alan Stern May 13, 2025, 2:25 p.m. UTC | #1
On Tue, May 13, 2025 at 07:38:17PM +0800, David Wang wrote:
> ---
> Changes:
> 1. Use caller's mem_flags if a larger memory is needed.
> Thanks to Oliver Neukum <oneukum@suse.com>'s review.
> ---
> URB objects have long lifecycle, an urb can be reused between
> enqueue-dequeue loops; The private data needed by some host controller
> has very short lifecycle, the memory is alloced when enqueue, and
> released when dequeue. For example, on a system with xhci, several
> minutes of usage of webcam/keyboard/mouse have memory alloc counts:
>   drivers/usb/core/urb.c:75 [usbcore] func:usb_alloc_urb 661
>   drivers/usb/host/xhci.c:1555 [xhci_hcd] func:xhci_urb_enqueue 424863
> Memory allocation frequency for host-controller private data can reach
> ~1k/s and above.
> 
> High frequent allocations for host-controller private data can be
> avoided if urb take over the ownership of memory, the memory then shares
> the longer lifecycle with urb objects.
> 
> Add a mempool to urb for hcpriv usage, the mempool only holds one block
> of memory and grows when larger size is requested.
> 
> Signed-off-by: David Wang <00107082@163.com>

It should be possible to do what you want without touching the USB core 
code at all, changing only xhci-hcd.  That's what Mathias is suggesting.

Instead of having an URB keep ownership of its extra memory after it 
completes, xhci-hcd can put the memory area onto a free list.  Then 
memory areas on the free list can be reused with almost no overhead when 
URBs are enqueued later on.

This idea can become more elaborate if you maintain separate free lists 
for different devices, or even for different endpoints, or sort the free 
list by the size of the memory areas.  But the basic idea is always the 
same: Don't change usbcore (including struct urb), and make getting and 
releasing the extra memory areas have extremely low overhead.
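As a sketch of the free-list idea, here is a userspace model with hypothetical names; in xhci-hcd the list would sit behind the lock the driver already takes when enqueuing and dequeuing, so no extra locking is shown.

```c
#include <stdlib.h>
#include <stddef.h>
#include <assert.h>

/* Released areas are pushed onto a list and popped on the next enqueue,
 * so both operations touch only a couple of pointers. */
struct area {
	struct area *next;
	size_t size;		/* capacity of the payload that follows */
};

static struct area *free_list;

void *area_get(size_t size)
{
	struct area *a = free_list;

	if (a && a->size >= size) {	/* reuse the most recent area */
		free_list = a->next;
		return a + 1;
	}
	a = malloc(sizeof(*a) + size);	/* miss: fall back to the allocator */
	if (!a)
		return NULL;
	a->size = size;
	return a + 1;
}

void area_put(void *p)
{
	struct area *a = (struct area *)p - 1;

	a->next = free_list;
	free_list = a;
}
```

Because the pool is shared, areas released by one URB can satisfy later enqueues of any other URB on the same host controller.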

Alan Stern
David Wang May 13, 2025, 2:41 p.m. UTC | #2
At 2025-05-13 22:25:50, "Alan Stern" <stern@rowland.harvard.edu> wrote:
>On Tue, May 13, 2025 at 07:38:17PM +0800, David Wang wrote:
>> ---
>> Changes:
>> 1. Use caller's mem_flags if a larger memory is needed.
>> Thanks to Oliver Neukum <oneukum@suse.com>'s review.
>> ---
>> URB objects have long lifecycle, an urb can be reused between
>> enqueue-dequeue loops; The private data needed by some host controller
>> has very short lifecycle, the memory is alloced when enqueue, and
>> released when dequeue. For example, on a system with xhci, several
>> minutes of usage of webcam/keyboard/mouse have memory alloc counts:
>>   drivers/usb/core/urb.c:75 [usbcore] func:usb_alloc_urb 661
>>   drivers/usb/host/xhci.c:1555 [xhci_hcd] func:xhci_urb_enqueue 424863
>> Memory allocation frequency for host-controller private data can reach
>> ~1k/s and above.
>> 
>> High frequent allocations for host-controller private data can be
>> avoided if urb take over the ownership of memory, the memory then shares
>> the longer lifecycle with urb objects.
>> 
>> Add a mempool to urb for hcpriv usage, the mempool only holds one block
>> of memory and grows when larger size is requested.
>> 
>> Signed-off-by: David Wang <00107082@163.com>
>
>It should be possible to do what you want without touching the USB core 
>code at all, changing only xhci-hcd.  That's what Mathias is suggesting.
>
>Instead of having an URB keep ownership of its extra memory after it 
>completes, xhci-hcd can put the memory area onto a free list.  Then 
>memory areas on the free list can be reused with almost no overhead when 
>URBs are enqueued later on.

I have to disagree; your suggestion is not much different from requesting memory from
the system: locks and memory pool management are still needed, so why bother?

The reason I chose the URB is that the URB's lifecycle contains the private data's lifecycle,
and no two HCDs can take over the same URB at the same time.

I think the memory pool here is not actually a pool, or I should say the mempool consists of
the pool of URBs, with each URB holding only a single slot of the mem pool.

>
>This idea can become more elaborate if you maintain separate free lists 
>for different devices, or even for different endpoints, or sort the free 
>list by the size of the memory areas.  But the basic idea is always the 
>same: Don't change usbcore (including struct urb), and make getting and 
>releasing the extra memory areas have extremely low overhead.

Why would a device-level memory pool have extremely low overhead?
And why is making changes to the USB core bad? The idea in this thread is meant for all kinds of
host controllers; xhci is just what I have in my system. I think similar changes would benefit other
HCs.


>
>Alan Stern
Alan Stern May 13, 2025, 3:37 p.m. UTC | #3
On Tue, May 13, 2025 at 10:41:45PM +0800, David Wang wrote:
> 
> 
> At 2025-05-13 22:25:50, "Alan Stern" <stern@rowland.harvard.edu> wrote:
> >On Tue, May 13, 2025 at 07:38:17PM +0800, David Wang wrote:
> >> ---
> >> Changes:
> >> 1. Use caller's mem_flags if a larger memory is needed.
> >> Thanks to Oliver Neukum <oneukum@suse.com>'s review.
> >> ---
> >> URB objects have long lifecycle, an urb can be reused between
> >> enqueue-dequeue loops; The private data needed by some host controller
> >> has very short lifecycle, the memory is alloced when enqueue, and
> >> released when dequeue. For example, on a system with xhci, several
> >> minutes of usage of webcam/keyboard/mouse have memory alloc counts:
> >>   drivers/usb/core/urb.c:75 [usbcore] func:usb_alloc_urb 661
> >>   drivers/usb/host/xhci.c:1555 [xhci_hcd] func:xhci_urb_enqueue 424863
> >> Memory allocation frequency for host-controller private data can reach
> >> ~1k/s and above.
> >> 
> >> High frequent allocations for host-controller private data can be
> >> avoided if urb take over the ownership of memory, the memory then shares
> >> the longer lifecycle with urb objects.
> >> 
> >> Add a mempool to urb for hcpriv usage, the mempool only holds one block
> >> of memory and grows when larger size is requested.
> >> 
> >> Signed-off-by: David Wang <00107082@163.com>
> >
> >It should be possible to do what you want without touching the USB core 
> >code at all, changing only xhci-hcd.  That's what Mathias is suggesting.
> >
> >Instead of having an URB keep ownership of its extra memory after it 
> >completes, xhci-hcd can put the memory area onto a free list.  Then 
> >memory areas on the free list can be reused with almost no overhead when 
> >URBs are enqueued later on.
> 
> I have to disagree,  your suggestion has no much difference from requesting memory from
> system, locks and memory pool managements, all the same are needed, why bother?

There are two differences.  First, xhci-hcd already has its own lock 
that it acquires when enqueuing or dequeuing URBs, so no additional 
locks would be needed.  Second, complicated memory pool management isn't 
necessary; the management can be extremely simple.  (For example, 
Mathias suggested just reusing the most recently released memory area 
unless it is too small.)
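A minimal userspace sketch of that "reuse the most recently released area" variant (hypothetical names; the kernel version would use kzalloc/kfree under the xhci lock, and here the caller is assumed to track each area's size):

```c
#include <stdlib.h>
#include <stddef.h>
#include <assert.h>

/* Only the most recently released area is kept; it is handed back on
 * the next request unless it is too small. */
static void *cached_area;
static size_t cached_size;

void *hcpriv_get(size_t size)
{
	if (cached_area && cached_size >= size) {
		void *p = cached_area;	/* hit: hand out the cached area */
		cached_area = NULL;
		return p;
	}
	return malloc(size);		/* miss: fresh allocation */
}

void hcpriv_put(void *p, size_t size)
{
	free(cached_area);		/* keep only the newest area */
	cached_area = p;
	cached_size = size;
}
```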

> The reason I choose URB is that  URB life cycle contains the private data's lifecycle, 
> and no two HCD can take over the same URB as the same time.
> 
> I think the memory pool here is not actually a pool,  or I should say the mempool consists of
> pool of URBs, and each URB have only "single one" slot of mem pool in it.
> 
> 
> >
> >This idea can become more elaborate if you maintain separate free lists 
> >for different devices, or even for different endpoints, or sort the free 
> >list by the size of the memory areas.  But the basic idea is always the 
> >same: Don't change usbcore (including struct urb), and make getting and 
> >releasing the extra memory areas have extremely low overhead.
> 
> Why implements a device level memory pool would have extremely low overhead?

Because then getting or releasing memory areas from the pool could be 
carried out just by manipulating a couple of pointers.

Some fraction of the time, URBs are created on demand and destroyed upon 
completion.  Your approach would not save any time for these URBs, 
whereas my suggested approach would.  (This fraction probably isn't very 
large, although I don't know how big it is.)

> Why  making changes to usb core is bad? The idea in this thread is meant for all kinds of
> host controllers, xhci is what I have in my system; i think similar changes would benefit other
> HCs

Those other HC drivers would still require changes.  They could be 
changed to support my scheme as easily as your scheme.

If I were redesigning Linux's entire USB stack from the beginning, I 
would change it so that URBs would be dedicated to particular host 
controllers and endpoint types -- maybe even to particular endpoints.  
They would contain all the additional memory required for the HCD to use 
them, all pre-allocated, so that dynamic allocation would not be needed 
during normal use.  (The gadget subsystem behaves this way already.)

Such a drastic change isn't feasible at this point, although what you 
are suggesting is a step in that direction.  In the end it comes down 
to a time/space tradeoff, and it's very difficult to know what the best 
balance is.

I'm not saying that your approach is bad or wrong, just that there are 
other possibilities to consider.

Alan Stern
David Wang May 13, 2025, 4:35 p.m. UTC | #4
At 2025-05-13 23:37:20, "Alan Stern" <stern@rowland.harvard.edu> wrote:
>On Tue, May 13, 2025 at 10:41:45PM +0800, David Wang wrote:
>> 
>> 
>> At 2025-05-13 22:25:50, "Alan Stern" <stern@rowland.harvard.edu> wrote:
>> >On Tue, May 13, 2025 at 07:38:17PM +0800, David Wang wrote:
>> >> ---
>> >> Changes:
>> >> 1. Use caller's mem_flags if a larger memory is needed.
>> >> Thanks to Oliver Neukum <oneukum@suse.com>'s review.
>> >> ---
>> >> URB objects have long lifecycle, an urb can be reused between
>> >> enqueue-dequeue loops; The private data needed by some host controller
>> >> has very short lifecycle, the memory is alloced when enqueue, and
>> >> released when dequeue. For example, on a system with xhci, several
>> >> minutes of usage of webcam/keyboard/mouse have memory alloc counts:
>> >>   drivers/usb/core/urb.c:75 [usbcore] func:usb_alloc_urb 661
>> >>   drivers/usb/host/xhci.c:1555 [xhci_hcd] func:xhci_urb_enqueue 424863
>> >> Memory allocation frequency for host-controller private data can reach
>> >> ~1k/s and above.
>> >> 
>> >> High frequent allocations for host-controller private data can be
>> >> avoided if urb take over the ownership of memory, the memory then shares
>> >> the longer lifecycle with urb objects.
>> >> 
>> >> Add a mempool to urb for hcpriv usage, the mempool only holds one block
>> >> of memory and grows when larger size is requested.
>> >> 
>> >> Signed-off-by: David Wang <00107082@163.com>
>> >
>> >It should be possible to do what you want without touching the USB core 
>> >code at all, changing only xhci-hcd.  That's what Mathias is suggesting.
>> >
>> >Instead of having an URB keep ownership of its extra memory after it 
>> >completes, xhci-hcd can put the memory area onto a free list.  Then 
>> >memory areas on the free list can be reused with almost no overhead when 
>> >URBs are enqueued later on.
>> 
>> I have to disagree,  your suggestion has no much difference from requesting memory from
>> system, locks and memory pool managements, all the same are needed, why bother?
>
>There are two differences.  First, xhci-hcd already has its own lock 
>that it acquires when enqueuing or dequeuing URBs, so no additional 
>locks would be needed.  Second, complicated memory pool management isn't 
>necessary; the management can be extremely simple.  (For example, 
>Mathias suggested just reusing the most recently released memory area 
>unless it is too small.)

My patch also just reuses memory, and in a simpler way, I would argue.

>
>> The reason I choose URB is that  URB life cycle contains the private data's lifecycle, 
>> and no two HCD can take over the same URB as the same time.
>> 
>> I think the memory pool here is not actually a pool,  or I should say the mempool consists of
>> pool of URBs, and each URB have only "single one" slot of mem pool in it.
>> 
>> 
>> >
>> >This idea can become more elaborate if you maintain separate free lists 
>> >for different devices, or even for different endpoints, or sort the free 
>> >list by the size of the memory areas.  But the basic idea is always the 
>> >same: Don't change usbcore (including struct urb), and make getting and 
>> >releasing the extra memory areas have extremely low overhead.
>> 
>> Why implements a device level memory pool would have extremely low overhead?
>
>Because then getting or releasing memory areas from the pool could be 
>carried out just by manipulating a couple of pointers.

Manipulating a couple of pointers? And that would be simpler than a reusable buffer in the URB? I doubt that.
There would be lots of details to consider; the devil is in the details, which is why we prefer the simpler solution.
I just don't understand: you seem to imply that my patch is not simple. Could you elaborate on that,
or is it just that, in your mind, making changes to the "usb core" is a big no-no?

>
>Some fraction of the time, URBs are created on demand and destroyed upon 
>completion.  Your approach would not save any time for these URBs, 
>whereas my suggested approach would.  (This fraction probably isn't very 
>large, although I don't know how big it is.)

I am aiming to avoid tons of extra allocations for the "private data", not URB allocations.
When I use my USB webcam, I observe 1k+ allocations per second for private data, while
the urb allocation frequency is already very low; people have already done that optimization,
I guess. The memory profiling feature introduced in 6.12 is a very good start for identifying
where improvements could be made to memory behavior.

>
>> Why  making changes to usb core is bad? The idea in this thread is meant for all kinds of
>> host controllers, xhci is what I have in my system; i think similar changes would benefit other
>> HCs
>
>Those other HC drivers would still require changes.  They could be 
>changed to support my scheme as easily as your scheme.

>
>If I were redesigning Linux's entire USB stack from the beginning, I 
>would change it so that URBs would be dedicated to particular host 
>controllers and endpoint types -- maybe even to particular endpoints.  
>They would contain all the additional memory required for the HCD to use 
>them, all pre-allocated, so that dynamic allocation would not be needed 
>during normal use.  (The gadget subsystem behaves this way already.)
>
>Such a drastic change isn't feasible at this point, although what you 
>are suggesting is a step in that direction.  In the end it comes down 
>to a time/space tradeoff, and it's very difficult to know what the best 
>balance is.
Drastic change? Do you mean making changes to the USB core? Because counting by
lines of changes, I feel my patch is quite simple and efficient.
I also don't get your time/space tradeoff here: are you talking about my patch,
or about a solution in your mind?
This patch only needs a pointer and a size in the URB, and URB objects are allocated at a slow pace.

>
>I'm not saying that your approach is bad or wrong, just that there are 
>other possibilities to consider.
>
>Alan Stern


David
Alan Stern May 13, 2025, 6:21 p.m. UTC | #5
On Wed, May 14, 2025 at 12:35:29AM +0800, David Wang wrote:
> 
> At 2025-05-13 23:37:20, "Alan Stern" <stern@rowland.harvard.edu> wrote:
> >> I have to disagree,  your suggestion has no much difference from requesting memory from
> >> system, locks and memory pool managements, all the same are needed, why bother?
> >
> >There are two differences.  First, xhci-hcd already has its own lock 
> >that it acquires when enqueuing or dequeuing URBs, so no additional 
> >locks would be needed.  Second, complicated memory pool management isn't 
> >necessary; the management can be extremely simple.  (For example, 
> >Mathias suggested just reusing the most recently released memory area 
> >unless it is too small.)
> 
> My patch also just reuse memory,  in a simpler way I would argue....

I didn't say your approach wasn't simple.  I said that my approach has 
very low overhead, a lot lower than the existing code, whereas you said 
my approach was not much different from the existing code.

> >> >This idea can become more elaborate if you maintain separate free lists 
> >> >for different devices, or even for different endpoints, or sort the free 
> >> >list by the size of the memory areas.  But the basic idea is always the 
> >> >same: Don't change usbcore (including struct urb), and make getting and 
> >> >releasing the extra memory areas have extremely low overhead.
> >> 
> >> Why implements a device level memory pool would have extremely low overhead?
> >
> >Because then getting or releasing memory areas from the pool could be 
> >carried out just by manipulating a couple of pointers.
> 
> A couple of pointers manipulation? and it would be simpler than a reusable buffer in URB?
>  I doubt that.

I didn't say pointer manipulation was simpler than a reusable buffer.  I 
said that it has very low overhead, in order to answer your question 
about why a device-level memory pool would have extremely low overhead.

> There would be lots of details needs to consider,  detail is devil and that why we prefer simpler solution,
> I just don't understand, you seems imply that my patch is not simple, could you elaborate more on it,
> or it is just that in my mind, make changes to "usb core" is a big no-no!

Neither one.
  
However, you have forgotten about one thing: With your patch, each URB 
maintains ownership of a memory area even when the URB is not in use.  
With the existing code, that memory is freed when the URB is not in use, 
and with my approach the memory is shared among URBs.

In this way, your patch will use more memory than the existing code or 
my approach.  The question to answer is which is better: Using more 
memory (your patch) or using more time (the allocations and 
deallocations in the existing code or my approach)?

> >If I were redesigning Linux's entire USB stack from the beginning, I 
> >would change it so that URBs would be dedicated to particular host 
> >controllers and endpoint types -- maybe even to particular endpoints.  
> >They would contain all the additional memory required for the HCD to use 
> >them, all pre-allocated, so that dynamic allocation would not be needed 
> >during normal use.  (The gadget subsystem behaves this way already.)
> >
> >Such a drastic change isn't feasible at this point, although what you 
> >are suggesting is a step in that direction.  In the end it comes down 
> >to a time/space tradeoff, and it's very difficult to know what the best 
> >balance is.
> Drastic change? Do you mean make change to USB core?

No, I meant redesigning the entire USB stack.  It would require changing 
all the USB drivers to allocate URBs differently from the way they do 
now.  And changing every HC driver to preallocate the memory it needs to 
perform transfers.  I think you can agree that would be a pretty drastic 
change.

>  Because counting by
> lines of changes in this patch, I feel my patch is quite simple and efficient.
> I also don't get your time/space tradeoff here,  are you talking about my patch?
> or you were just talking a solution in your mind....

Both.  See my comment above.

> This patch only needs a pointer and a size in URB, and URB object allocated in a slow pace...

It also needs an extra memory area that is allocated the first time the 
URB is used, and is not deallocated until the URB is destroyed.  You 
seem to be ignoring this fact.

Alan Stern
David Wang May 13, 2025, 6:48 p.m. UTC | #6
At 2025-05-14 02:21:15, "Alan Stern" <stern@rowland.harvard.edu> wrote:
>On Wed, May 14, 2025 at 12:35:29AM +0800, David Wang wrote:
>> 
>> At 2025-05-13 23:37:20, "Alan Stern" <stern@rowland.harvard.edu> wrote:
>> >> I have to disagree,  your suggestion has no much difference from requesting memory from
>> >> system, locks and memory pool managements, all the same are needed, why bother?
>> >
>> >There are two differences.  First, xhci-hcd already has its own lock 
>> >that it acquires when enqueuing or dequeuing URBs, so no additional 
>> >locks would be needed.  Second, complicated memory pool management isn't 
>> >necessary; the management can be extremely simple.  (For example, 
>> >Mathias suggested just reusing the most recently released memory area 
>> >unless it is too small.)
>> 
>> My patch also just reuse memory,  in a simpler way I would argue....
>
>I didn't say your approach wasn't simple.  I said that my approach has 
>very low overhead, a lot lower than the existing code, whereas you said 
>my approach "has no much difference" from the existing code.
>
>> >> >This idea can become more elaborate if you maintain separate free lists 
>> >> >for different devices, or even for different endpoints, or sort the free 
>> >> >list by the size of the memory areas.  But the basic idea is always the 
>> >> >same: Don't change usbcore (including struct urb), and make getting and 
>> >> >releasing the extra memory areas have extremely low overhead.
>> >> 
>> >> Why implements a device level memory pool would have extremely low overhead?
>> >
>> >Because then getting or releasing memory areas from the pool could be 
>> >carried out just by manipulating a couple of pointers.
>> 
>> A couple of pointers manipulation? and it would be simpler than a reusable buffer in URB?
>>  I doubt that.
>
>I didn't say pointer manipulation was simpler than a reusable buffer.  I 
>said that it was very low overhead, in order to answer your question: 
>"Why implements a device level memory pool would have extremely low 
>overhead?"

Now I feel this is becoming a wording game...
>
>> There would be lots of details needs to consider,  detail is devil and that why we prefer simpler solution,
>> I just don't understand, you seems imply that my patch is not simple, could you elaborate more on it,
>> or it is just that in my mind, make changes to "usb core" is a big no-no!
>
>Neither one.
>  
>However, you have forgotten about one thing: With your patch, each URB 
>maintains ownership of a memory area even when the URB is not in use.  
>With the existing code, that memory is freed when the URB is not in use, 
>and with my approach the memory is shared among URBs.
>
>In this way, your patch will use more memory than the existing code or 
>my approach.  The question to answer is which is better: Using more 
>memory (your patch) or using more time (the allocations and 
>deallocations in the existing code or my approach)?
>
>> >If I were redesigning Linux's entire USB stack from the beginning, I 
>> >would change it so that URBs would be dedicated to particular host 
>> >controllers and endpoint types -- maybe even to particular endpoints.  
>> >They would contain all the additional memory required for the HCD to use 
>> >them, all pre-allocated, so that dynamic allocation would not be needed 
>> >during normal use.  (The gadget subsystem behaves this way already.)
>> >
>> >Such a drastic change isn't feasible at this point, although what you 
>> >are suggesting is a step in that direction.  In the end it comes down 
>> >to a time/space tradeoff, and it's very difficult to know what the best 
>> >balance is.
>> Drastic change? Do you mean make change to USB core?
>
>No, I meant redesigning the entire USB stack.  It would require changing 
>all the USB drivers to allocate URBs differently from the way they do 
>now.  And changing every HC driver to preallocate the memory it needs to 
>perform transfers.  I think you can agree that would be a pretty drastic 
>change.
>
>>  Because counting by
>> lines of changes in this patch, I feel my patch is quite simple and efficient.
>> I also don't get your time/space tradeoff here,  are you talking about my patch?
>> or you were just talking a solution in your mind....
>
>Both.  See my comment above.
>
>> This patch only needs a pointer and a size in URB, and URB object allocated in a slow pace...
>
>It also needs an extra memory area that is allocated the first time the 
>URB is used, and is not deallocated until the URB is destroyed.  You 
>seem to be ignoring this fact.
>
>Alan Stern

It is not an "extra" memory area; the memory is needed by the HC anyway, the memory pool just caches it.
And about not freeing the memory until the URB is released: you seem to forget that we are talking
about a "memory pool". A URB only used once could be considered a memory pool that is never reused.

If your memory pool approach would not "waste" memory, I would be rather happy to learn how.

I want to mention the purpose of this patch again:
A lot of "private data" allocations could be avoided if we use a "mempool" to cache and reuse that memory,
and using the URB as the holder is a very simple way to implement this.

To add: based on my memory profiling, URB usage is very efficient. I think it is a very good candidate to hold
the private data cache for HCs.


David
Alan Stern May 13, 2025, 7:46 p.m. UTC | #7
On Wed, May 14, 2025 at 02:48:21AM +0800, David Wang wrote:
> It is not an "extra" memory area,  the memory is needed by HC anyway, the memory pool just cache it.
> And about not freeing memory until URB released,  you seems forgot that we are talking 
> about "memory pool" .  A URB only used once could be considered a memory pool never used.
> 
> If your memory pool approach would not  "waste" memory, I would  rather happy to learn.

Here's a simple example to illustrate the point.  Suppose a driver uses 
two URBs, call them A and B, but it never has more than one URB active 
at a time.  Thus, when A completes B is submitted, and when B completes 
A is submitted.

With your approach A and B each have their own memory area.  With my 
approach, a single memory area is shared between A and B.  Therefore my 
approach uses less total memory.

Now, I admit this pattern is probably not at all common.  Usually if a 
driver is going to reuse an URB, it resubmits the URB as soon as the URB 
completes rather than waiting for some other URB to complete.  Drivers 
generally don't keep many unused URBs just sitting around -- although 
there may be exceptions, like a driver for a media device when the 
device isn't running.

> I want to mention the purpose of this patch again:  
> A lot of "private data" allocation could be avoided if  we use a "mempool" to cache and reuse those memory.
> And use URB as the holder is a very simple way to implement this,. 
> 
> And to add , base on my memory profiling, URB usage is very efficient. I think it is a very good candidate to hold
> private data cache for HCs.

All right.  I withdraw any objection to your patches.

Alan Stern
David Wang May 14, 2025, 6:44 a.m. UTC | #8
Hi, 

Updated memory footprints after hours of USB device usage
on my system:
(I have webcam/mic/keyboard/mouse/harddisk connected via USB,
a full picture of memory footprints is attached below)
+----------------------+----------------+-------------------------------------------+-----------------------+
| active memory(bytes) | active objects |               alloc location              | total objects created |
+----------------------+----------------+-------------------------------------------+-----------------------+
|        22912         |       24       | core/urb.c:1054:urb_hcpriv_mempool_zalloc |         10523         |
|        11776         |       31       |        core/urb.c:76:usb_alloc_urb        |         11027         |
+----------------------+----------------+-------------------------------------------+-----------------------+

The count of active URB objects remains at a low level;
their memory peaks at about 12KB when I copied a 10G file to my hard disk.
The memory pool in this patch takes about 22KB; its peak is 23KB.
The patch is meant to reuse memory via a mempool; the memory kept in the pool is indeed
the "tradeoff" when the system is idle. (Well, we are talking about a mempool anyway.)
How balanced the tradeoff is depends on how well the mempool is managed.
This patch takes an easy approach: put faith in URB object management and put
a single mempool slot in the URB on demand. And the changes, counting lines
in this patch, are very simple.
Based on the profiling, the number of active URB objects is kept at a very low scale;
only several have a very long lifecycle.
I think the URB is a good candidate for caching the memory needed for private data.
But I could be very wrong, due simply to a lack of knowledge.

Before, without the patch, 10 minutes of webcam usage and copying a 10G file to the hard disk
would yield a high rate of memory allocation for private data in xhci_urb_enqueue:
+----------------------+----------------+-----------------------------------+-----------------------+
| active memory(bytes) | active objects |           alloc location          | total objects created |
+----------------------+----------------+-----------------------------------+-----------------------+
|        22784         |       23       | host/xhci.c:1555:xhci_urb_enqueue |         894281        | << growing very quickly
|        10880         |       31       |    core/urb.c:75:usb_alloc_urb    |          4028         |
+----------------------+----------------+-----------------------------------+-----------------------+
I observed a peak allocation rate of 1.5K/s in xhci_urb_enqueue
when I was copying a 10G file and had my webcam open at the same time.

And again, to be honest, I did not observe any performance improvement from
an end user's point of view with this patch. The only significant improvement is in the memory footprint
_numbers_.
I guess memory allocation is indeed "_really damn fast_", but I still have the mindset of
"the less allocation, the better".


Full drivers/usb memory footprint on my system with this patch:
(The data was collected via memory profiling: https://docs.kernel.org/mm/allocation-profiling.html)
+----------------------+----------------+------------------------------------------------------+-----------------------+
| active memory(bytes) | active objects |                    alloc location                    | total objects created |
+----------------------+----------------+------------------------------------------------------+-----------------------+
|        40960         |       5        |      host/xhci-mem.c:951:xhci_alloc_virt_device      |           10          |
|        26624         |       26       |      core/port.c:742:usb_hub_create_port_device      |           26          |
|        24576         |       2        |     host/xhci-mem.c:2170:xhci_setup_port_arrays      |           3           |
|        22912         |       24       |      core/urb.c:1054:urb_hcpriv_mempool_zalloc       |         10523         |
|        21504         |       21       |        core/endpoint.c:157:usb_create_ep_devs        |           31          |
|        18432         |       9        |             core/usb.c:650:usb_alloc_dev             |           10          |
|        16384         |       4        |           core/hcd.c:2553:__usb_create_hcd           |           4           |
|        13312         |       13       |      core/message.c:2037:usb_set_configuration       |           14          |
|        11776         |       31       |             core/urb.c:76:usb_alloc_urb              |         11027         |
|         9216         |       9        |       core/config.c:930:usb_get_configuration        |           10          |
|         5120         |       5        |              core/hub.c:1938:hub_probe               |           5           |
|         5120         |       2        |     host/xhci-mem.c:2156:xhci_setup_port_arrays      |           3           |
|         2560         |       5        |   host/xhci-debugfs.c:578:xhci_debugfs_create_slot   |           10          |
|         2416         |       16       |        host/xhci-mem.c:49:xhci_segment_alloc         |           44          |
|         2176         |       17       |         host/xhci-mem.c:377:xhci_ring_alloc          |           38          |
|         2112         |       22       |      core/port.c:746:usb_hub_create_port_device      |           26          |
|         2048         |       32       |        host/xhci-mem.c:38:xhci_segment_alloc         |           73          |
|         1728         |       18       |   host/xhci-debugfs.c:90:xhci_debugfs_alloc_regset   |           26          |
|         1632         |       17       |        core/config.c:619:usb_parse_interface         |           18          |
|         1504         |       9        |       core/config.c:967:usb_get_configuration        |           10          |
|         1312         |       13       |      core/config.c:820:usb_parse_configuration       |           14          |
|         1024         |       1        |        host/xhci-mem.c:812:xhci_alloc_tt_info        |           2           |
|         704          |       24       |         core/message.c:1032:usb_cache_string         |           27          |
|         512          |       8        | host/xhci-debugfs.c:438:xhci_debugfs_create_endpoint |           22          |
|         320          |       10       |     host/xhci-mem.c:461:xhci_alloc_container_ctx     |           30          |
|         272          |       2        |        host/xhci-mem.c:1635:scratchpad_alloc         |           3           |
|         224          |       5        |            core/hub.c:1502:hub_configure             |           5           |
|         192          |       4        |   host/xhci-mem.c:2121:xhci_create_rhub_port_array   |           6           |
|         192          |       2        |     host/xhci-mem.c:2263:xhci_alloc_interrupter      |           3           |
|         160          |       2        |     host/xhci-mem.c:2198:xhci_setup_port_arrays      |           3           |
|         128          |       2        |          host/xhci-mem.c:2506:xhci_mem_init          |           3           |
|         128          |       2        |      core/config.c:1063:usb_get_bos_descriptor       |           3           |
|         128          |       2        |      core/config.c:1058:usb_get_bos_descriptor       |           3           |
|          80          |       5        |            core/hub.c:1454:hub_configure             |           5           |
|          72          |       9        |       core/config.c:935:usb_get_configuration        |           10          |
|          64          |       2        |        host/xhci-mem.c:1624:scratchpad_alloc         |           3           |
|          64          |       2        |           core/hcd.c:2565:__usb_create_hcd           |           2           |
|          64          |       2        |           core/hcd.c:2557:__usb_create_hcd           |           2           |
|          40          |       5        |        host/xhci-mem.c:2039:xhci_add_in_port         |           6           |
|          40          |       5        |            core/hub.c:1447:hub_configure             |           5           |
|          40          |       5        |            core/hub.c:1441:hub_configure             |           5           |
|          0           |       0        |            core/message.c:529:usb_sg_init            |         10137         |
|          0           |       0        |          core/message.c:144:usb_control_msg          |          793          |
|          0           |       0        |            core/hcd.c:491:rh_call_control            |          449          |
|          0           |       0        |       host/xhci-mem.c:1705:xhci_alloc_command        |          234          |
|          0           |       0        |       host/xhci-mem.c:1711:xhci_alloc_command        |           97          |
|          0           |       0        |            core/message.c:980:usb_string             |           31          |
|          0           |       0        |         core/message.c:1028:usb_cache_string         |           27          |
|          0           |       0        |          core/message.c:1145:usb_get_status          |           19          |
|          0           |       0        |    core/message.c:1061:usb_get_device_descriptor     |           14          |
|          0           |       0        |      core/message.c:2031:usb_set_configuration       |           10          |
|          0           |       0        |            core/hub.c:4883:hub_port_init             |           10          |
|          0           |       0        |       core/config.c:939:usb_get_configuration        |           10          |
|          0           |       0        |         core/hub.c:5311:descriptors_changed          |           4           |
|          0           |       0        |      core/config.c:1037:usb_get_bos_descriptor       |           3           |
|          0           |       0        |           storage/usb.c:540:associate_dev            |           1           |
+----------------------+----------------+------------------------------------------------------+-----------------------+

FYI
David
Greg KH May 14, 2025, 7:29 a.m. UTC | #9
On Wed, May 14, 2025 at 02:44:55PM +0800, David Wang wrote:
> Hi, 
> 
> Update memory footprints after hours of USB devices usage
> on my system:
> (I have webcam/mic/keyboard/mouse/harddisk connected via USB,
> a full picture of memory footprints is attached below)
> +----------------------+----------------+-------------------------------------------+-----------------------+
> | active memory(bytes) | active objects |               alloc location              | total objects created |
> +----------------------+----------------+-------------------------------------------+-----------------------+
> |        22912         |       24       | core/urb.c:1054:urb_hcpriv_mempool_zalloc |         10523         |
> |        11776         |       31       |        core/urb.c:76:usb_alloc_urb        |         11027         |
> +----------------------+----------------+-------------------------------------------+-----------------------+
> 
> The count for active URB objects remain at low level,
> its peak is about 12KB when I copied 10G file to my harddisk.
> The memory pool in this patch takes about 22KB, its peak is 23KB.
> The patch meant to reuse memory via a mempool, the memory kept in pool is indeed
> the "tradeoff" when the system is idle. (Well, we are talking about mempool anyway.)
> How balance the tradeoff is depends on how well the mempool is managed.
> This patch takes a easy approach: put faith in URB objects management and put
> a single slot of mempool in URB on demands. And the changes, by counting lines
> in this patch, are very simple.
> Base on the profiling, the number of active URB objects are kept at a very low scale,
> only several could have a very long lifecycle.
> I think URB is a good candidate for caching those memory needed for private data.
> But I could be very wrong, due simply to the lack of knowledge.
> 
> And before, without the patch, a 10 minutes webcam usage and copying 10G file to harddisk
> would yield high rate of memory allocation for priviate data in xhci_urb_enqueue:
> +----------------------+----------------+-----------------------------------+-----------------------+
> | active memory(bytes) | active objects |           alloc location          | total objects created |
> +----------------------+----------------+-----------------------------------+-----------------------+
> |        22784         |       23       | host/xhci.c:1555:xhci_urb_enqueue |         894281 << grow|ing very quick
> |        10880         |       31       |    core/urb.c:75:usb_alloc_urb    |          4028         |
> +----------------------+----------------+-----------------------------------+-----------------------+
> I observe a highest allocation rate of 1.5K/s in xhci_urb_enqueue
> when I was copying 10G file, and had my webcam opened at the same time.
> 
> And again, to be honest, I did not observe any observable performance improvement from
> an enduser's point of view with this patch. The only significant improvement is memory footprint
> _numbers_.
> I guess memory allocation is indeed "_really damn fast_", but I still have the mindset of
> "the less allocation the better".

No, this isn't necessarily true at all.  Allocations are fast, and if we
free/allocate things quickly, it's even faster.  USB is limited by the
hardware throughput, which is _very_ slow compared to memory accesses of
the allocator.

So unless you can show that we are using less CPU time, or something
else "real" that is measurable in a real way in userspace, that would
justify the extra complexity, it's going to be hard to get me to agree
that this is something that needs to be addressed at all.

Also, I'm totally confused as to what the "latest" version of this
patchset is...

thanks,

greg k-h
David Wang May 14, 2025, 8:50 a.m. UTC | #10
At 2025-05-14 15:29:42, "Greg KH" <gregkh@linuxfoundation.org> wrote:
>On Wed, May 14, 2025 at 02:44:55PM +0800, David Wang wrote:
>> Hi, 
>> 
>> Update memory footprints after hours of USB devices usage
>> on my system:
>> (I have webcam/mic/keyboard/mouse/harddisk connected via USB,
>> a full picture of memory footprints is attached below)
>> +----------------------+----------------+-------------------------------------------+-----------------------+
>> | active memory(bytes) | active objects |               alloc location              | total objects created |
>> +----------------------+----------------+-------------------------------------------+-----------------------+
>> |        22912         |       24       | core/urb.c:1054:urb_hcpriv_mempool_zalloc |         10523         |
>> |        11776         |       31       |        core/urb.c:76:usb_alloc_urb        |         11027         |
>> +----------------------+----------------+-------------------------------------------+-----------------------+
>> 
>> The count for active URB objects remain at low level,
>> its peak is about 12KB when I copied 10G file to my harddisk.
>> The memory pool in this patch takes about 22KB, its peak is 23KB.
>> The patch meant to reuse memory via a mempool, the memory kept in pool is indeed
>> the "tradeoff" when the system is idle. (Well, we are talking about mempool anyway.)
>> How balance the tradeoff is depends on how well the mempool is managed.
>> This patch takes a easy approach: put faith in URB objects management and put
>> a single slot of mempool in URB on demands. And the changes, by counting lines
>> in this patch, are very simple.
>> Base on the profiling, the number of active URB objects are kept at a very low scale,
>> only several could have a very long lifecycle.
>> I think URB is a good candidate for caching those memory needed for private data.
>> But I could be very wrong, due simply to the lack of knowledge.
>> 
>> And before, without the patch, a 10 minutes webcam usage and copying 10G file to harddisk
>> would yield high rate of memory allocation for priviate data in xhci_urb_enqueue:
>> +----------------------+----------------+-----------------------------------+-----------------------+
>> | active memory(bytes) | active objects |           alloc location          | total objects created |
>> +----------------------+----------------+-----------------------------------+-----------------------+
>> |        22784         |       23       | host/xhci.c:1555:xhci_urb_enqueue |         894281 << grow|ing very quick
>> |        10880         |       31       |    core/urb.c:75:usb_alloc_urb    |          4028         |
>> +----------------------+----------------+-----------------------------------+-----------------------+
>> I observe a highest allocation rate of 1.5K/s in xhci_urb_enqueue
>> when I was copying 10G file, and had my webcam opened at the same time.
>> 
>> And again, to be honest, I did not observe any observable performance improvement from
>> an enduser's point of view with this patch. The only significant improvement is memory footprint
>> _numbers_.
>> I guess memory allocation is indeed "_really damn fast_", but I still have the mindset of
>> "the less allocation the better".
>
>No, this isn't necessarily true at all.  Allocations are fast, and if we
>free/allocate things quickly, it's even faster.  USB is limited by the
>hardware throughput, which is _very_ slow compared to memory accesses of
>the allocator.
>
>So unless you can show that we are using less CPU time, or something
>else "real" that is measurable in a real way in userspace, that would
>justify the extra complexity, it's going to be hard to get me to agree
>that this is something that needs to be addressed at all.

Thanks for the feedback~!
That's very reasonable to me, and I have been pondering how
to profile USB performance, but still have no clue.

I will keep thinking about it; hopefully this 1k+/s allocation rate will show up somewhere, or
I will conclude that it really has no significant impact at all.


Thanks
David

>
>Also, I'm totally confused as to what the "latest" version of this
>patchset is...
>
sorry, I think I messed up the mails when I added a "reply-to" header to newer patches

>thanks,
>
>greg k-h
Oliver Neukum May 14, 2025, 9:34 a.m. UTC | #11
On 14.05.25 09:29, Greg KH wrote:
  
> No, this isn't necessarily true at all.  Allocations are fast, and if we
> free/allocate things quickly, it's even faster.  USB is limited by the
> hardware throughput, which is _very_ slow compared to memory accesses of
> the allocator.

If and only if we do not trigger disk IO. If you really want to give this patch
a good performance testing you'd have to do it under memory pressure.

	Regards
		Oliver
Oliver Neukum May 14, 2025, 11:23 a.m. UTC | #12
On 13.05.25 13:38, David Wang wrote:
> ---

Hi,

still an issue after a second review.
I should have noticed earlier.

> --- a/drivers/usb/core/urb.c
> +++ b/drivers/usb/core/urb.c
> @@ -23,6 +23,7 @@ static void urb_destroy(struct kref *kref)
>   
>   	if (urb->transfer_flags & URB_FREE_BUFFER)
>   		kfree(urb->transfer_buffer);
> +	kfree(urb->hcpriv_mempool);

What if somebody uses usb_init_urb()?
  
>   	kfree(urb);
>   }
> @@ -1037,3 +1038,25 @@ int usb_anchor_empty(struct usb_anchor *anchor)
>   
>   EXPORT_SYMBOL_GPL(usb_anchor_empty);
>   
> +/**
> + * urb_hcpriv_mempool_zalloc - alloc memory from mempool for hcpriv
> + * @urb: pointer to URB being used
> + * @size: memory size requested by current host controller
> + * @mem_flags: the type of memory to allocate
> + *
> + * Return: NULL if out of memory, otherwise memory are zeroed
> + */
> +void *urb_hcpriv_mempool_zalloc(struct urb *urb, size_t size, gfp_t mem_flags)
> +{
> +	if (urb->hcpriv_mempool_size < size) {
> +		kfree(urb->hcpriv_mempool);
> +		urb->hcpriv_mempool_size = size;
> +		urb->hcpriv_mempool = kmalloc(size, mem_flags);

That could use kzalloc().

	Regards
		Oliver
Oliver Neukum May 14, 2025, 11:27 a.m. UTC | #13
On 13.05.25 16:25, Alan Stern wrote:
  
> Instead of having an URB keep ownership of its extra memory after it
> completes, xhci-hcd can put the memory area onto a free list.  Then
> memory areas on the free list can be reused with almost no overhead when
> URBs are enqueued later on.

The number of URBs in flight is basically unlimited.
When would you shrink the number of cached privates?
That would seem to be the basic question if you do
not link this with URBs.

	Regards
		Oliver
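For what it's worth, a bounded free list is one hypothetical answer to the shrink question. This is purely a userspace sketch under the assumption that all private-data blocks share one size class; nothing like this exists in xhci-hcd today:

```c
#include <stdlib.h>

#define PRIV_CACHE_MAX 8		/* arbitrary cap; a real HCD would tune this */

struct priv_cache {
	void  *slots[PRIV_CACHE_MAX];	/* cached blocks, all of one size class */
	size_t block_size;		/* size every cached block was allocated with */
	int    depth;			/* current number of cached blocks */
};

/* Enqueue path: reuse a cached block when one is available. */
static void *priv_cache_get(struct priv_cache *c)
{
	if (c->depth > 0)
		return c->slots[--c->depth];
	return malloc(c->block_size);
}

/* Completion path: keep the block only while the cache is below the cap,
 * so the cached footprint stays bounded even with many URBs in flight. */
static void priv_cache_put(struct priv_cache *c, void *p)
{
	if (c->depth < PRIV_CACHE_MAX)
		c->slots[c->depth++] = p;
	else
		free(p);
}
```

Anything returned while the list is full is freed immediately, so the cached footprint never exceeds PRIV_CACHE_MAX blocks no matter how many URBs were in flight at the peak.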
David Wang May 14, 2025, 11:51 a.m. UTC | #14
At 2025-05-14 19:23:02, "Oliver Neukum" <oneukum@suse.com> wrote:
>
>
>On 13.05.25 13:38, David Wang wrote:
>> ---
>
>Hi,
>
>still an issue after a second review.
>I should have noticed earlier.
>
>> --- a/drivers/usb/core/urb.c
>> +++ b/drivers/usb/core/urb.c
>> @@ -23,6 +23,7 @@ static void urb_destroy(struct kref *kref)
>>   
>>   	if (urb->transfer_flags & URB_FREE_BUFFER)
>>   		kfree(urb->transfer_buffer);
>> +	kfree(urb->hcpriv_mempool);
>
>What if somebody uses usb_init_urb()?

I am not quite sure about the concern here: do you mean somebody creates an urb,
calls usb_init_urb() on it, and never uses urb_destroy() to release it?

That would cause a memory leak if urb_destroy() is not called... But is this really possible?

>  
>>   	kfree(urb);
>>   }
>> @@ -1037,3 +1038,25 @@ int usb_anchor_empty(struct usb_anchor *anchor)
>>   
>>   EXPORT_SYMBOL_GPL(usb_anchor_empty);
>>   
>> +/**
>> + * urb_hcpriv_mempool_zalloc - alloc memory from mempool for hcpriv
>> + * @urb: pointer to URB being used
>> + * @size: memory size requested by current host controller
>> + * @mem_flags: the type of memory to allocate
>> + *
>> + * Return: NULL if out of memory, otherwise memory are zeroed
>> + */
>> +void *urb_hcpriv_mempool_zalloc(struct urb *urb, size_t size, gfp_t mem_flags)
>> +{
>> +	if (urb->hcpriv_mempool_size < size) {
>> +		kfree(urb->hcpriv_mempool);
>> +		urb->hcpriv_mempool_size = size;
>> +		urb->hcpriv_mempool = kmalloc(size, mem_flags);
>
>That could use kzalloc().

The memory is set to 0 via memset() before being returned to the caller, no matter whether
it was newly allocated or reused. I think using kmalloc() is fine here.


Thanks
David

>
>	Regards
>		Oliver
Oliver Neukum May 14, 2025, 12:03 p.m. UTC | #15
On 14.05.25 13:51, David Wang wrote:
  
> I am not quite sure about the concern here, do you mean somebody create a urb,
> and then usb_init_urb() here, and never use urb_destroy to release it?

Yes.

> 
> That would cause memory leak if urb_destroy is not called......But is this really possible?.

I think a few drivers under drivers/media do so.

	Regards
		Oliver
David Wang May 14, 2025, 12:14 p.m. UTC | #16
At 2025-05-14 20:03:02, "Oliver Neukum" <oneukum@suse.com> wrote:
>
>
>On 14.05.25 13:51, David Wang wrote:
>  
>> I am not quite sure about the concern here, do you mean somebody create a urb,
>> and then usb_init_urb() here, and never use urb_destroy to release it?
>
>Yes.
>
>> 
>> That would cause memory leak if urb_destroy is not called......But is this really possible?.
>
>I think a few drivers under drivers/media do so.

That would cause a real problem here... I will keep this in mind...
(It is really a bad pattern to have only an init function; it should be paired
with a corresponding "release" that undoes its side effects...)


Thanks~
David


>
>	Regards
>		Oliver
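To make the concern concrete, here is a userspace sketch of the leak path (hypothetical names; malloc/free stand in for the kernel allocators): a driver embeds the urb in its own structure, initializes it in place, and frees the container directly, so urb_destroy() never runs and the cached block is orphaned:

```c
#include <stdlib.h>

static int allocs, frees;		/* track cached-block lifetime */

struct urb_model {
	void  *hcpriv_mempool;		/* single-slot cache, as in the patch */
	size_t hcpriv_mempool_size;
};

struct driver_urb {			/* urb embedded in a driver structure */
	struct urb_model urb;
	int driver_state;
};

/* ~ usb_init_urb(): initializes in place, allocates nothing */
static void urb_model_init(struct urb_model *urb)
{
	urb->hcpriv_mempool = NULL;
	urb->hcpriv_mempool_size = 0;
}

/* ~ urb_hcpriv_mempool_zalloc() on the enqueue path */
static void *fake_enqueue(struct urb_model *urb, size_t size)
{
	if (urb->hcpriv_mempool_size < size) {
		if (urb->hcpriv_mempool) {
			free(urb->hcpriv_mempool);
			frees++;
		}
		urb->hcpriv_mempool = malloc(size);
		urb->hcpriv_mempool_size = size;
		allocs++;
	}
	return urb->hcpriv_mempool;
}

/* ~ the kfree() the patch adds to urb_destroy(); a driver that frees its
 * container directly never reaches this, leaking the cached block */
static void urb_model_destroy(struct urb_model *urb)
{
	if (urb->hcpriv_mempool) {
		free(urb->hcpriv_mempool);
		frees++;
	}
}
```

Freeing the container without the destroy step leaves allocs ahead of frees by one, which is exactly the leak Oliver is pointing at.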
David Wang May 16, 2025, 5:13 p.m. UTC | #17
At 2025-05-14 20:03:02, "Oliver Neukum" <oneukum@suse.com> wrote:
>
>
>On 14.05.25 13:51, David Wang wrote:
>  
>> I am not quite sure about the concern here, do you mean somebody create a urb,
>> and then usb_init_urb() here, and never use urb_destroy to release it?
>
>Yes.
>
>> 
>> That would cause memory leak if urb_destroy is not called......But is this really possible?.
>
>I think a few drivers under drivers/media do so.


I searched through the code: some drivers use usb_free_urb(), which invokes urb_destroy();
but there are indeed several drivers that embed the urb as a struct member, which is not directly
kmalloced and should not be kfreed via usb_free_urb()... Fixing that would involve lots of changes...

On the bright side, while making that code check I noticed something off:
in drivers/net/wireless/realtek/rtl8xxxu/core.c


5168                 usb_free_urb(&tx_urb->urb);
5868                 usb_free_urb(&rx_urb->urb);
5890                 usb_free_urb(&rx_urb->urb);
5938                         usb_free_urb(&rx_urb->urb);

usb_free_urb() would kfree the urb pointer, which would be wrong here, since rx_urb and tx_urb are defined
in drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h:
1944 struct rtl8xxxu_rx_urb {
1945         struct urb urb;
1946         struct ieee80211_hw *hw;
1947         struct list_head list;
1948 };
1949 
1950 struct rtl8xxxu_tx_urb {
1951         struct urb urb;
1952         struct ieee80211_hw *hw;
1953         struct list_head list;
1954 };
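For illustration only (plain userspace C, not the driver code): because the urb is the first member, a pointer to it is numerically equal to a pointer to the container, so kfree(&rx_urb->urb) would free the whole container purely by accident of layout, and it is still wrong whenever the container was not itself kmalloced or has its own free path:

```c
#include <stddef.h>
#include <stdlib.h>

struct urb_model { int dummy; };	/* stand-in for struct urb */

struct rx_urb_model {			/* shaped like rtl8xxxu_rx_urb */
	struct urb_model urb;		/* first member: offset 0 */
	void *hw;
	void *list;
};
```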


Hi, Jes

Would you take a look? I believe usb_free_urb() needs a pointer that was allocated directly, but I could be wrong...


Thanks
David
>
>	Regards
>		Oliver
David Wang May 17, 2025, 9:09 a.m. UTC | #18
At 2025-05-14 17:34:21, "Oliver Neukum" <oneukum@suse.com> wrote:
>On 14.05.25 09:29, Greg KH wrote:
>  
>> No, this isn't necessarily true at all.  Allocations are fast, and if we
>> free/allocate things quickly, it's even faster.  USB is limited by the
>> hardware throughput, which is _very_ slow compared to memory accesses of
>> the allocator.
>
>If and only if we do not trigger disk IO. If you really want to give this patch
>a good performance testing you'd have to do it under memory pressure.
>
>	Regards
>		Oliver


Hi, I made some tests:

I used FPS for the webcam and bitrate for the audio mic as measurements.
When the system is under no memory pressure, no significant difference can be observed with or without this patch.
When the system is under heavy memory pressure, the bitrate drops from ~760.3kbits/s to ~524.3kbits/s,
but this patch does not make any significant difference; the bitrate drops are almost the same either way.
Under heavy memory pressure my whole system gets slow...

But I think that somewhere between no memory pressure and heavy memory pressure there is a point where
an extra 1k/s allocation rate would kick-start a chain of effects ending in very bad performance; it is just
very hard to pinpoint.

Using my webcam causes a ~250/s memory allocation rate, and my mic ~1k/s. I am imagining a system with
several USB webcams/mics connected: there would be an x*1k/s allocation rate if those devices were used
at the same time. (I am not sure whether all allocations could be avoided under heavy usage of USB devices,
but I think a good part of them can be reused.)

I still think this change is beneficial, even without solid evidence yet.
(I have sent out another version addressing Oliver's comments about urbs managed by drivers)


Thanks
David

Patch

diff --git a/drivers/usb/core/urb.c b/drivers/usb/core/urb.c
index 5e52a35486af..51bf25125aeb 100644
--- a/drivers/usb/core/urb.c
+++ b/drivers/usb/core/urb.c
@@ -23,6 +23,7 @@  static void urb_destroy(struct kref *kref)
 
 	if (urb->transfer_flags & URB_FREE_BUFFER)
 		kfree(urb->transfer_buffer);
+	kfree(urb->hcpriv_mempool);
 
 	kfree(urb);
 }
@@ -1037,3 +1038,25 @@  int usb_anchor_empty(struct usb_anchor *anchor)
 
 EXPORT_SYMBOL_GPL(usb_anchor_empty);
 
+/**
+ * urb_hcpriv_mempool_zalloc - alloc memory from mempool for hcpriv
+ * @urb: pointer to URB being used
+ * @size: memory size requested by current host controller
+ * @mem_flags: the type of memory to allocate
+ *
+ * Return: NULL if out of memory, otherwise memory are zeroed
+ */
+void *urb_hcpriv_mempool_zalloc(struct urb *urb, size_t size, gfp_t mem_flags)
+{
+	if (urb->hcpriv_mempool_size < size) {
+		kfree(urb->hcpriv_mempool);
+		urb->hcpriv_mempool_size = size;
+		urb->hcpriv_mempool = kmalloc(size, mem_flags);
+	}
+	if (urb->hcpriv_mempool)
+		memset(urb->hcpriv_mempool, 0, size);
+	else
+		urb->hcpriv_mempool_size = 0;
+	return urb->hcpriv_mempool;
+}
+EXPORT_SYMBOL_GPL(urb_hcpriv_mempool_zalloc);
diff --git a/include/linux/usb.h b/include/linux/usb.h
index b46738701f8d..4400e41271bc 100644
--- a/include/linux/usb.h
+++ b/include/linux/usb.h
@@ -1602,6 +1602,8 @@  struct urb {
 	struct kref kref;		/* reference count of the URB */
 	int unlinked;			/* unlink error code */
 	void *hcpriv;			/* private data for host controller */
+	void *hcpriv_mempool;           /* a single slot of cache for HCD's private data */
+	size_t hcpriv_mempool_size;     /* current size of the memory pool */
 	atomic_t use_count;		/* concurrent submissions counter */
 	atomic_t reject;		/* submissions will fail */
 
@@ -1790,6 +1792,7 @@  extern int usb_wait_anchor_empty_timeout(struct usb_anchor *anchor,
 extern struct urb *usb_get_from_anchor(struct usb_anchor *anchor);
 extern void usb_scuttle_anchored_urbs(struct usb_anchor *anchor);
 extern int usb_anchor_empty(struct usb_anchor *anchor);
+extern void *urb_hcpriv_mempool_zalloc(struct urb *urb, size_t size, gfp_t mem_flags);
 
 #define usb_unblock_urb	usb_unpoison_urb