diff mbox series

[v3] usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null

Message ID TYUPR06MB621733B5AC690DBDF80A0DCCD2042@TYUPR06MB6217.apcprd06.prod.outlook.com
State New
Headers show
Series [v3] usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null | expand

Commit Message

胡连勤 Dec. 17, 2024, 7:58 a.m. UTC
From: Lianqin Hu <hulianqin@vivo.com>

Considering that in some extreme cases, when performing the
unbinding operation, gserial_disconnect has cleared gser->ioport,
which triggers gadget reconfiguration, and then calls gs_read_complete,
resulting in access to a null pointer. Therefore, ep is disabled before
gserial_disconnect sets port to null to prevent this from happening.

Call trace:
 gs_read_complete+0x58/0x240
 usb_gadget_giveback_request+0x40/0x160
 dwc3_remove_requests+0x170/0x484
 dwc3_ep0_out_start+0xb0/0x1d4
 __dwc3_gadget_start+0x25c/0x720
 kretprobe_trampoline.cfi_jt+0x0/0x8
 kretprobe_trampoline.cfi_jt+0x0/0x8
 udc_bind_to_driver+0x1d8/0x300
 usb_gadget_probe_driver+0xa8/0x1dc
 gadget_dev_desc_UDC_store+0x13c/0x188
 configfs_write_iter+0x160/0x1f4
 vfs_write+0x2d0/0x40c
 ksys_write+0x7c/0xf0
 __arm64_sys_write+0x20/0x30
 invoke_syscall+0x60/0x150
 el0_svc_common+0x8c/0xf8
 do_el0_svc+0x28/0xa0
 el0_svc+0x24/0x84

Fixes: c1dca562be8a ("usb gadget: split out serial core")
Cc: stable@vger.kernel.org
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lianqin Hu <hulianqin@vivo.com>
---

Changes in v3:
 - Add --- line above the version tag information
 - Remove extra blank lines in commit messages
 - Version tag information from v2 to changes in v2
 - Link to v2: https://lore.kernel.org/all/TYUPR06MB6217DAA095A9863D4B58D57CD23B2@TYUPR06MB6217.apcprd06.prod.outlook.com/

Changes in v2:
 - Remove some address information from patch descriptions
 - Link to v1: https://lore.kernel.org/all/TYUPR06MB621763AB815989161F4033AFD2762@TYUPR06MB6217.apcprd06.prod.outlook.com/
 - Link to suggestions: https://lore.kernel.org/all/TYUPR06MB6217DE28012FFEC5E808DD64D2962@TYUPR06MB6217.apcprd06.prod.outlook.com/

 drivers/usb/gadget/function/u_serial.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Prashanth K Dec. 24, 2024, 4:53 a.m. UTC | #1
On 17-12-24 01:28 pm, 胡连勤 wrote:
> From: Lianqin Hu <hulianqin@vivo.com>
> 
> Considering that in some extreme cases, when performing the
> unbinding operation, gserial_disconnect has cleared gser->ioport,
> which triggers gadget reconfiguration, and then calls gs_read_complete,
> resulting in access to a null pointer. Therefore, ep is disabled before
> gserial_disconnect sets port to null to prevent this from happening.

gs_read_complete() gets port from ep->driver_data, and it shouldn't get
affected if gser->ioport is null as long as port[port_num].port is
valid. Am I missing something here?
> 
> Call trace:
>  gs_read_complete+0x58/0x240
>  usb_gadget_giveback_request+0x40/0x160
>  dwc3_remove_requests+0x170/0x484
>  dwc3_ep0_out_start+0xb0/0x1d4
>  __dwc3_gadget_start+0x25c/0x720
>  kretprobe_trampoline.cfi_jt+0x0/0x8
>  kretprobe_trampoline.cfi_jt+0x0/0x8
>  udc_bind_to_driver+0x1d8/0x300
>  usb_gadget_probe_driver+0xa8/0x1dc
>  gadget_dev_desc_UDC_store+0x13c/0x188
>  configfs_write_iter+0x160/0x1f4
>  vfs_write+0x2d0/0x40c
>  ksys_write+0x7c/0xf0
>  __arm64_sys_write+0x20/0x30
>  invoke_syscall+0x60/0x150
>  el0_svc_common+0x8c/0xf8
>  do_el0_svc+0x28/0xa0
>  el0_svc+0x24/0x84
>
Regards,
Prashanth K
胡连勤 Dec. 25, 2024, 7:03 a.m. UTC | #2
Hello  Prashanth K:

> > Considering that in some extreme cases, when performing the unbinding
> > operation, gserial_disconnect has cleared gser->ioport, which triggers
> > gadget reconfiguration, and then calls gs_read_complete, resulting in
> > access to a null pointer. Therefore, ep is disabled before
> > gserial_disconnect sets port to null to prevent this from happening.
> 
> gs_read_complete() gets port from ep->driver_data, and it shouldn't get
> affected if gser->ioport is null as long as port[port_num].port is valid. Am I
> missing something here?

This description is not very appropriate, thank you very much for your suggestion, I will modify it.

> >
> > Call trace:
> >  gs_read_complete+0x58/0x240
> >  usb_gadget_giveback_request+0x40/0x160
> >  dwc3_remove_requests+0x170/0x484
> >  dwc3_ep0_out_start+0xb0/0x1d4
> >  __dwc3_gadget_start+0x25c/0x720
> >  kretprobe_trampoline.cfi_jt+0x0/0x8
> >  kretprobe_trampoline.cfi_jt+0x0/0x8
> >  udc_bind_to_driver+0x1d8/0x300
> >  usb_gadget_probe_driver+0xa8/0x1dc
> >  gadget_dev_desc_UDC_store+0x13c/0x188
> >  configfs_write_iter+0x160/0x1f4
> >  vfs_write+0x2d0/0x40c
> >  ksys_write+0x7c/0xf0
> >  __arm64_sys_write+0x20/0x30
> >  invoke_syscall+0x60/0x150
> >  el0_svc_common+0x8c/0xf8
> >  do_el0_svc+0x28/0xa0
> >  el0_svc+0x24/0x84
> >

Thanks
Prashanth K Dec. 26, 2024, 4:59 a.m. UTC | #3
On 25-12-24 12:33 pm, 胡连勤 wrote:
> Hello  Prashanth K:
> 
>>> Considering that in some extreme cases, when performing the unbinding
>>> operation, gserial_disconnect has cleared gser->ioport, which triggers
>>> gadget reconfiguration, and then calls gs_read_complete, resulting in
>>> access to a null pointer. Therefore, ep is disabled before
>>> gserial_disconnect sets port to null to prevent this from happening.
>>
>> gs_read_complete() gets port from ep->driver_data, and it shouldn't get
>> affected if gser->ioport is null as long as port[port_num].port is valid. Am I
>> missing something here?
> 
> This description is not very appropriate, thank you very much for your suggestion, I will modify it.

Seems like this patch has already accepted applied to Gregs tree. So I
think its fine for now, just update new description here.

> 
>>>
>>> Call trace:
>>>  gs_read_complete+0x58/0x240
>>>  usb_gadget_giveback_request+0x40/0x160
>>>  dwc3_remove_requests+0x170/0x484
>>>  dwc3_ep0_out_start+0xb0/0x1d4
>>>  __dwc3_gadget_start+0x25c/0x720
>>>  kretprobe_trampoline.cfi_jt+0x0/0x8
>>>  kretprobe_trampoline.cfi_jt+0x0/0x8
>>>  udc_bind_to_driver+0x1d8/0x300
>>>  usb_gadget_probe_driver+0xa8/0x1dc
>>>  gadget_dev_desc_UDC_store+0x13c/0x188
>>>  configfs_write_iter+0x160/0x1f4
>>>  vfs_write+0x2d0/0x40c
>>>  ksys_write+0x7c/0xf0
>>>  __arm64_sys_write+0x20/0x30
>>>  invoke_syscall+0x60/0x150
>>>  el0_svc_common+0x8c/0xf8
>>>  do_el0_svc+0x28/0xa0
>>>  el0_svc+0x24/0x84
>>>
> 
> Thanks

Regards,
Prashanth K
Greg Kroah-Hartman Jan. 16, 2025, 1:28 p.m. UTC | #4
On Thu, Jan 16, 2025 at 01:11:36PM +0000, Jon Hunter wrote:
> Hi Greg, Lianqin,
> 
> On 17/12/2024 07:58, 胡连勤 wrote:
> > From: Lianqin Hu <hulianqin@vivo.com>
> > 
> > Considering that in some extreme cases, when performing the
> > unbinding operation, gserial_disconnect has cleared gser->ioport,
> > which triggers gadget reconfiguration, and then calls gs_read_complete,
> > resulting in access to a null pointer. Therefore, ep is disabled before
> > gserial_disconnect sets port to null to prevent this from happening.
> > 
> > Call trace:
> >   gs_read_complete+0x58/0x240
> >   usb_gadget_giveback_request+0x40/0x160
> >   dwc3_remove_requests+0x170/0x484
> >   dwc3_ep0_out_start+0xb0/0x1d4
> >   __dwc3_gadget_start+0x25c/0x720
> >   kretprobe_trampoline.cfi_jt+0x0/0x8
> >   kretprobe_trampoline.cfi_jt+0x0/0x8
> >   udc_bind_to_driver+0x1d8/0x300
> >   usb_gadget_probe_driver+0xa8/0x1dc
> >   gadget_dev_desc_UDC_store+0x13c/0x188
> >   configfs_write_iter+0x160/0x1f4
> >   vfs_write+0x2d0/0x40c
> >   ksys_write+0x7c/0xf0
> >   __arm64_sys_write+0x20/0x30
> >   invoke_syscall+0x60/0x150
> >   el0_svc_common+0x8c/0xf8
> >   do_el0_svc+0x28/0xa0
> >   el0_svc+0x24/0x84
> > 
> > Fixes: c1dca562be8a ("usb gadget: split out serial core")
> > Cc: stable@vger.kernel.org
> > Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Signed-off-by: Lianqin Hu <hulianqin@vivo.com>
> > ---
> > 
> > Changes in v3:
> >   - Add --- line above the version tag information
> >   - Remove extra blank lines in commit messages
> >   - Version tag information from v2 to changes in v2
> >   - Link to v2: https://lore.kernel.org/all/TYUPR06MB6217DAA095A9863D4B58D57CD23B2@TYUPR06MB6217.apcprd06.prod.outlook.com/
> > 
> > Changes in v2:
> >   - Remove some address information from patch descriptions
> >   - Link to v1: https://lore.kernel.org/all/TYUPR06MB621763AB815989161F4033AFD2762@TYUPR06MB6217.apcprd06.prod.outlook.com/
> >   - Link to suggestions: https://lore.kernel.org/all/TYUPR06MB6217DE28012FFEC5E808DD64D2962@TYUPR06MB6217.apcprd06.prod.outlook.com/
> > 
> >   drivers/usb/gadget/function/u_serial.c | 8 ++++----
> >   1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/usb/gadget/function/u_serial.c b/drivers/usb/gadget/function/u_serial.c
> > index 53d9fc41acc5..bc143a86c2dd 100644
> > --- a/drivers/usb/gadget/function/u_serial.c
> > +++ b/drivers/usb/gadget/function/u_serial.c
> > @@ -1420,6 +1420,10 @@ void gserial_disconnect(struct gserial *gser)
> >   	/* REVISIT as above: how best to track this? */
> >   	port->port_line_coding = gser->port_line_coding;
> > +	/* disable endpoints, aborting down any active I/O */
> > +	usb_ep_disable(gser->out);
> > +	usb_ep_disable(gser->in);
> > +
> >   	port->port_usb = NULL;
> >   	gser->ioport = NULL;
> >   	if (port->port.count > 0) {
> > @@ -1431,10 +1435,6 @@ void gserial_disconnect(struct gserial *gser)
> >   	spin_unlock(&port->port_lock);
> >   	spin_unlock_irqrestore(&serial_port_lock, flags);
> > -	/* disable endpoints, aborting down any active I/O */
> > -	usb_ep_disable(gser->out);
> > -	usb_ep_disable(gser->in);
> > -
> >   	/* finally, free any unused/unusable I/O buffers */
> >   	spin_lock_irqsave(&port->port_lock, flags);
> >   	if (port->port.count == 0)
> 
> 
> We have observed a reboot regression on Tegra234 (I have not tried other
> boards) and bisect is pointing to this commit. Reverting this on top of
> mainline is fixing the problem.
> 
> With this change, when the board reboots we see ...
> 
> [   59.918177] tegra-xudc 3550000.usb: ep 3 disabled
> [   59.923097] tegra-xudc 3550000.usb: ep 2 disabled
> [   59.927955] tegra-xudc 3550000.usb: ep 5 disabled
> [   80.911432] rcu: INFO: rcu_preempt self-detected stall on CPU
> [   80.917354] rcu:     6-....: (5248 ticks this GP) idle=ec24/1/0x4000000000000000 softirq=1213/1213 fqs=2623
> [   80.927146] rcu:     (t=5253 jiffies g=3781 q=1490 ncpus=12)
> [   80.932704] Sending NMI from CPU 6 to CPUs 2:
> [   90.981555] CPU: 6 UID: 0 PID: 18 Comm: rcu_exp_gp_kthr Not tainted 6.13.0-rc7-00043-g619f0b6fad52 #1
> [   90.981558] Hardware name: NVIDIA NVIDIA Jetson AGX Orin Developer Kit/Jetson, BIOS 00.0.0-dev-main_92e5ae_88fd1_296de 12/16/2024
> [   90.981559] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   90.981562] pc : smp_call_function_single+0xdc/0x1a0
> [   90.981574] lr : __sync_rcu_exp_select_node_cpus+0x228/0x3c0
> [   90.981578] sp : ffff800082eb3cd0
> [   90.981579] x29: ffff800082eb3cd0 x28: 0000000000000010 x27: ffff0000802933c0
> [   90.981582] x26: ffff0007a8a1d700 x25: ffff800082895500 x24: ffff800080132018
> [   90.981584] x23: 0000000000000014 x22: ffff800081fb7700 x21: ffff80008280d970
> [   90.981586] x20: 0000000000000feb x19: ffff800082eb3d00 x18: 0000000000000000
> [   90.981588] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> [   90.981590] x14: ffff000080293440 x13: 0000000000000001 x12: 0000000000000000
> [   90.981591] x11: ffff800081fb2388 x10: ffff0000802933c0 x9 : 0000000000000001
> [   90.981593] x8 : 0000000000000040 x7 : 0000000000017068 x6 : ffff800080132018
> [   90.981595] x5 : 0000000000000000 x4 : ffff0007a8a4f9c8 x3 : 0000000000000001
> [   90.981597] x2 : 0000000000000000 x1 : ffff0007a8a4f9c0 x0 : 0000000000000004
> [   90.981599] Call trace:
> [   90.981601]  smp_call_function_single+0xdc/0x1a0 (P)
> [   90.981605]  __sync_rcu_exp_select_node_cpus+0x228/0x3c0
> [   90.981607]  sync_rcu_exp_select_cpus+0x13c/0x2a0
> [   90.981609]  wait_rcu_exp_gp+0x18/0x30
> [   90.981611]  kthread_worker_fn+0xd0/0x188
> [   90.981614]  kthread+0x118/0x11c
> [   90.981619]  ret_from_fork+0x10/0x20
> [  101.416347] sched: DL replenish lagged too much
> 

Odd, you have a usb-serial gadget device in this system that is
disconnecting somehow?  That oops doesn't point to anything in the usb
gadget codebase, "all" we have done is move the call to shutdown the
endpoints to earlier in the disconnect function.

I'm glad to revert this, but it feels really odd that this is causing
you an rcu stall issue.

confused,

greg k-h
Jon Hunter Jan. 16, 2025, 3:01 p.m. UTC | #5
On 16/01/2025 13:28, gregkh@linuxfoundation.org wrote:
> On Thu, Jan 16, 2025 at 01:11:36PM +0000, Jon Hunter wrote:
>> Hi Greg, Lianqin,
>>
>> On 17/12/2024 07:58, 胡连勤 wrote:
>>> From: Lianqin Hu <hulianqin@vivo.com>
>>>
>>> Considering that in some extreme cases, when performing the
>>> unbinding operation, gserial_disconnect has cleared gser->ioport,
>>> which triggers gadget reconfiguration, and then calls gs_read_complete,
>>> resulting in access to a null pointer. Therefore, ep is disabled before
>>> gserial_disconnect sets port to null to prevent this from happening.
>>>
>>> Call trace:
>>>    gs_read_complete+0x58/0x240
>>>    usb_gadget_giveback_request+0x40/0x160
>>>    dwc3_remove_requests+0x170/0x484
>>>    dwc3_ep0_out_start+0xb0/0x1d4
>>>    __dwc3_gadget_start+0x25c/0x720
>>>    kretprobe_trampoline.cfi_jt+0x0/0x8
>>>    kretprobe_trampoline.cfi_jt+0x0/0x8
>>>    udc_bind_to_driver+0x1d8/0x300
>>>    usb_gadget_probe_driver+0xa8/0x1dc
>>>    gadget_dev_desc_UDC_store+0x13c/0x188
>>>    configfs_write_iter+0x160/0x1f4
>>>    vfs_write+0x2d0/0x40c
>>>    ksys_write+0x7c/0xf0
>>>    __arm64_sys_write+0x20/0x30
>>>    invoke_syscall+0x60/0x150
>>>    el0_svc_common+0x8c/0xf8
>>>    do_el0_svc+0x28/0xa0
>>>    el0_svc+0x24/0x84
>>>
>>> Fixes: c1dca562be8a ("usb gadget: split out serial core")
>>> Cc: stable@vger.kernel.org
>>> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>> Signed-off-by: Lianqin Hu <hulianqin@vivo.com>
>>> ---
>>>
>>> Changes in v3:
>>>    - Add --- line above the version tag information
>>>    - Remove extra blank lines in commit messages
>>>    - Version tag information from v2 to changes in v2
>>>    - Link to v2: https://lore.kernel.org/all/TYUPR06MB6217DAA095A9863D4B58D57CD23B2@TYUPR06MB6217.apcprd06.prod.outlook.com/
>>>
>>> Changes in v2:
>>>    - Remove some address information from patch descriptions
>>>    - Link to v1: https://lore.kernel.org/all/TYUPR06MB621763AB815989161F4033AFD2762@TYUPR06MB6217.apcprd06.prod.outlook.com/
>>>    - Link to suggestions: https://lore.kernel.org/all/TYUPR06MB6217DE28012FFEC5E808DD64D2962@TYUPR06MB6217.apcprd06.prod.outlook.com/
>>>
>>>    drivers/usb/gadget/function/u_serial.c | 8 ++++----
>>>    1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/usb/gadget/function/u_serial.c b/drivers/usb/gadget/function/u_serial.c
>>> index 53d9fc41acc5..bc143a86c2dd 100644
>>> --- a/drivers/usb/gadget/function/u_serial.c
>>> +++ b/drivers/usb/gadget/function/u_serial.c
>>> @@ -1420,6 +1420,10 @@ void gserial_disconnect(struct gserial *gser)
>>>    	/* REVISIT as above: how best to track this? */
>>>    	port->port_line_coding = gser->port_line_coding;
>>> +	/* disable endpoints, aborting down any active I/O */
>>> +	usb_ep_disable(gser->out);
>>> +	usb_ep_disable(gser->in);
>>> +
>>>    	port->port_usb = NULL;
>>>    	gser->ioport = NULL;
>>>    	if (port->port.count > 0) {
>>> @@ -1431,10 +1435,6 @@ void gserial_disconnect(struct gserial *gser)
>>>    	spin_unlock(&port->port_lock);
>>>    	spin_unlock_irqrestore(&serial_port_lock, flags);
>>> -	/* disable endpoints, aborting down any active I/O */
>>> -	usb_ep_disable(gser->out);
>>> -	usb_ep_disable(gser->in);
>>> -
>>>    	/* finally, free any unused/unusable I/O buffers */
>>>    	spin_lock_irqsave(&port->port_lock, flags);
>>>    	if (port->port.count == 0)
>>
>>
>> We have observed a reboot regression on Tegra234 (I have not tried other
>> boards) and bisect is pointing to this commit. Reverting this on top of
>> mainline is fixing the problem.
>>
>> With this change, when the board reboots we see ...
>>
>> [   59.918177] tegra-xudc 3550000.usb: ep 3 disabled
>> [   59.923097] tegra-xudc 3550000.usb: ep 2 disabled
>> [   59.927955] tegra-xudc 3550000.usb: ep 5 disabled
>> [   80.911432] rcu: INFO: rcu_preempt self-detected stall on CPU
>> [   80.917354] rcu:     6-....: (5248 ticks this GP) idle=ec24/1/0x4000000000000000 softirq=1213/1213 fqs=2623
>> [   80.927146] rcu:     (t=5253 jiffies g=3781 q=1490 ncpus=12)
>> [   80.932704] Sending NMI from CPU 6 to CPUs 2:
>> [   90.981555] CPU: 6 UID: 0 PID: 18 Comm: rcu_exp_gp_kthr Not tainted 6.13.0-rc7-00043-g619f0b6fad52 #1
>> [   90.981558] Hardware name: NVIDIA NVIDIA Jetson AGX Orin Developer Kit/Jetson, BIOS 00.0.0-dev-main_92e5ae_88fd1_296de 12/16/2024
>> [   90.981559] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [   90.981562] pc : smp_call_function_single+0xdc/0x1a0
>> [   90.981574] lr : __sync_rcu_exp_select_node_cpus+0x228/0x3c0
>> [   90.981578] sp : ffff800082eb3cd0
>> [   90.981579] x29: ffff800082eb3cd0 x28: 0000000000000010 x27: ffff0000802933c0
>> [   90.981582] x26: ffff0007a8a1d700 x25: ffff800082895500 x24: ffff800080132018
>> [   90.981584] x23: 0000000000000014 x22: ffff800081fb7700 x21: ffff80008280d970
>> [   90.981586] x20: 0000000000000feb x19: ffff800082eb3d00 x18: 0000000000000000
>> [   90.981588] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>> [   90.981590] x14: ffff000080293440 x13: 0000000000000001 x12: 0000000000000000
>> [   90.981591] x11: ffff800081fb2388 x10: ffff0000802933c0 x9 : 0000000000000001
>> [   90.981593] x8 : 0000000000000040 x7 : 0000000000017068 x6 : ffff800080132018
>> [   90.981595] x5 : 0000000000000000 x4 : ffff0007a8a4f9c8 x3 : 0000000000000001
>> [   90.981597] x2 : 0000000000000000 x1 : ffff0007a8a4f9c0 x0 : 0000000000000004
>> [   90.981599] Call trace:
>> [   90.981601]  smp_call_function_single+0xdc/0x1a0 (P)
>> [   90.981605]  __sync_rcu_exp_select_node_cpus+0x228/0x3c0
>> [   90.981607]  sync_rcu_exp_select_cpus+0x13c/0x2a0
>> [   90.981609]  wait_rcu_exp_gp+0x18/0x30
>> [   90.981611]  kthread_worker_fn+0xd0/0x188
>> [   90.981614]  kthread+0x118/0x11c
>> [   90.981619]  ret_from_fork+0x10/0x20
>> [  101.416347] sched: DL replenish lagged too much
>>
> 
> Odd, you have a usb-serial gadget device in this system that is
> disconnecting somehow?  That oops doesn't point to anything in the usb
> gadget codebase, "all" we have done is move the call to shutdown the
> endpoints to earlier in the disconnect function.

Yes the board starts usb-serial and usb-ethernet gadget and on reboot 
when tearing it down I am seeing the above. As soon as it disables the 
tegra-xudc endpoints (as seen above) the board appears to stall.

> I'm glad to revert this, but it feels really odd that this is causing
> you an rcu stall issue.

Thanks. I can't say I understand it either, but I am certain it is 
caused by this change.

Happy to run any tests to narrow this down a bit.

Jon
Jon Hunter Jan. 21, 2025, 12:19 p.m. UTC | #6
On 16/01/2025 15:01, Jon Hunter wrote:
> 
> On 16/01/2025 13:28, gregkh@linuxfoundation.org wrote:
>> On Thu, Jan 16, 2025 at 01:11:36PM +0000, Jon Hunter wrote:
>>> Hi Greg, Lianqin,
>>>
>>> On 17/12/2024 07:58, 胡连勤 wrote:
>>>> From: Lianqin Hu <hulianqin@vivo.com>
>>>>
>>>> Considering that in some extreme cases, when performing the
>>>> unbinding operation, gserial_disconnect has cleared gser->ioport,
>>>> which triggers gadget reconfiguration, and then calls gs_read_complete,
>>>> resulting in access to a null pointer. Therefore, ep is disabled before
>>>> gserial_disconnect sets port to null to prevent this from happening.
>>>>
>>>> Call trace:
>>>>    gs_read_complete+0x58/0x240
>>>>    usb_gadget_giveback_request+0x40/0x160
>>>>    dwc3_remove_requests+0x170/0x484
>>>>    dwc3_ep0_out_start+0xb0/0x1d4
>>>>    __dwc3_gadget_start+0x25c/0x720
>>>>    kretprobe_trampoline.cfi_jt+0x0/0x8
>>>>    kretprobe_trampoline.cfi_jt+0x0/0x8
>>>>    udc_bind_to_driver+0x1d8/0x300
>>>>    usb_gadget_probe_driver+0xa8/0x1dc
>>>>    gadget_dev_desc_UDC_store+0x13c/0x188
>>>>    configfs_write_iter+0x160/0x1f4
>>>>    vfs_write+0x2d0/0x40c
>>>>    ksys_write+0x7c/0xf0
>>>>    __arm64_sys_write+0x20/0x30
>>>>    invoke_syscall+0x60/0x150
>>>>    el0_svc_common+0x8c/0xf8
>>>>    do_el0_svc+0x28/0xa0
>>>>    el0_svc+0x24/0x84
>>>>
>>>> Fixes: c1dca562be8a ("usb gadget: split out serial core")
>>>> Cc: stable@vger.kernel.org
>>>> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>> Signed-off-by: Lianqin Hu <hulianqin@vivo.com>
>>>> ---
>>>>
>>>> Changes in v3:
>>>>    - Add --- line above the version tag information
>>>>    - Remove extra blank lines in commit messages
>>>>    - Version tag information from v2 to changes in v2
>>>>    - Link to v2: https://lore.kernel.org/all/ 
>>>> TYUPR06MB6217DAA095A9863D4B58D57CD23B2@TYUPR06MB6217.apcprd06.prod.outlook.com/
>>>>
>>>> Changes in v2:
>>>>    - Remove some address information from patch descriptions
>>>>    - Link to v1: https://lore.kernel.org/all/ 
>>>> TYUPR06MB621763AB815989161F4033AFD2762@TYUPR06MB6217.apcprd06.prod.outlook.com/
>>>>    - Link to suggestions: https://lore.kernel.org/all/ 
>>>> TYUPR06MB6217DE28012FFEC5E808DD64D2962@TYUPR06MB6217.apcprd06.prod.outlook.com/
>>>>
>>>>    drivers/usb/gadget/function/u_serial.c | 8 ++++----
>>>>    1 file changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/usb/gadget/function/u_serial.c b/drivers/usb/ 
>>>> gadget/function/u_serial.c
>>>> index 53d9fc41acc5..bc143a86c2dd 100644
>>>> --- a/drivers/usb/gadget/function/u_serial.c
>>>> +++ b/drivers/usb/gadget/function/u_serial.c
>>>> @@ -1420,6 +1420,10 @@ void gserial_disconnect(struct gserial *gser)
>>>>        /* REVISIT as above: how best to track this? */
>>>>        port->port_line_coding = gser->port_line_coding;
>>>> +    /* disable endpoints, aborting down any active I/O */
>>>> +    usb_ep_disable(gser->out);
>>>> +    usb_ep_disable(gser->in);
>>>> +
>>>>        port->port_usb = NULL;
>>>>        gser->ioport = NULL;
>>>>        if (port->port.count > 0) {
>>>> @@ -1431,10 +1435,6 @@ void gserial_disconnect(struct gserial *gser)
>>>>        spin_unlock(&port->port_lock);
>>>>        spin_unlock_irqrestore(&serial_port_lock, flags);
>>>> -    /* disable endpoints, aborting down any active I/O */
>>>> -    usb_ep_disable(gser->out);
>>>> -    usb_ep_disable(gser->in);
>>>> -
>>>>        /* finally, free any unused/unusable I/O buffers */
>>>>        spin_lock_irqsave(&port->port_lock, flags);
>>>>        if (port->port.count == 0)
>>>
>>>
>>> We have observed a reboot regression on Tegra234 (I have not tried other
>>> boards) and bisect is pointing to this commit. Reverting this on top of
>>> mainline is fixing the problem.
>>>
>>> With this change, when the board reboots we see ...
>>>
>>> [   59.918177] tegra-xudc 3550000.usb: ep 3 disabled
>>> [   59.923097] tegra-xudc 3550000.usb: ep 2 disabled
>>> [   59.927955] tegra-xudc 3550000.usb: ep 5 disabled
>>> [   80.911432] rcu: INFO: rcu_preempt self-detected stall on CPU
>>> [   80.917354] rcu:     6-....: (5248 ticks this GP) 
>>> idle=ec24/1/0x4000000000000000 softirq=1213/1213 fqs=2623
>>> [   80.927146] rcu:     (t=5253 jiffies g=3781 q=1490 ncpus=12)
>>> [   80.932704] Sending NMI from CPU 6 to CPUs 2:
>>> [   90.981555] CPU: 6 UID: 0 PID: 18 Comm: rcu_exp_gp_kthr Not 
>>> tainted 6.13.0-rc7-00043-g619f0b6fad52 #1
>>> [   90.981558] Hardware name: NVIDIA NVIDIA Jetson AGX Orin Developer 
>>> Kit/Jetson, BIOS 00.0.0-dev-main_92e5ae_88fd1_296de 12/16/2024
>>> [   90.981559] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS 
>>> BTYPE=--)
>>> [   90.981562] pc : smp_call_function_single+0xdc/0x1a0
>>> [   90.981574] lr : __sync_rcu_exp_select_node_cpus+0x228/0x3c0
>>> [   90.981578] sp : ffff800082eb3cd0
>>> [   90.981579] x29: ffff800082eb3cd0 x28: 0000000000000010 x27: 
>>> ffff0000802933c0
>>> [   90.981582] x26: ffff0007a8a1d700 x25: ffff800082895500 x24: 
>>> ffff800080132018
>>> [   90.981584] x23: 0000000000000014 x22: ffff800081fb7700 x21: 
>>> ffff80008280d970
>>> [   90.981586] x20: 0000000000000feb x19: ffff800082eb3d00 x18: 
>>> 0000000000000000
>>> [   90.981588] x17: 0000000000000000 x16: 0000000000000000 x15: 
>>> 0000000000000000
>>> [   90.981590] x14: ffff000080293440 x13: 0000000000000001 x12: 
>>> 0000000000000000
>>> [   90.981591] x11: ffff800081fb2388 x10: ffff0000802933c0 x9 : 
>>> 0000000000000001
>>> [   90.981593] x8 : 0000000000000040 x7 : 0000000000017068 x6 : 
>>> ffff800080132018
>>> [   90.981595] x5 : 0000000000000000 x4 : ffff0007a8a4f9c8 x3 : 
>>> 0000000000000001
>>> [   90.981597] x2 : 0000000000000000 x1 : ffff0007a8a4f9c0 x0 : 
>>> 0000000000000004
>>> [   90.981599] Call trace:
>>> [   90.981601]  smp_call_function_single+0xdc/0x1a0 (P)
>>> [   90.981605]  __sync_rcu_exp_select_node_cpus+0x228/0x3c0
>>> [   90.981607]  sync_rcu_exp_select_cpus+0x13c/0x2a0
>>> [   90.981609]  wait_rcu_exp_gp+0x18/0x30
>>> [   90.981611]  kthread_worker_fn+0xd0/0x188
>>> [   90.981614]  kthread+0x118/0x11c
>>> [   90.981619]  ret_from_fork+0x10/0x20
>>> [  101.416347] sched: DL replenish lagged too much
>>>
>>
>> Odd, you have a usb-serial gadget device in this system that is
>> disconnecting somehow?  That oops doesn't point to anything in the usb
>> gadget codebase, "all" we have done is move the call to shutdown the
>> endpoints to earlier in the disconnect function.
> 
> Yes the board starts usb-serial and usb-ethernet gadget and on reboot 
> when tearing it down I am seeing the above. As soon as it disables the 
> tegra-xudc endpoints (as seen above) the board appears to stall.
> 
>> I'm glad to revert this, but it feels really odd that this is causing
>> you an rcu stall issue.
> 
> Thanks. I can't say I understand it either, but I am certain it is 
> caused by this change.
> 
> Happy to run any tests to narrow this down a bit.


I did a bit more looking at this and I see that we setup a USB gadget 
device via the configfs as described in this doc [0]. The RCU stall 
occurs when we attempt to disable the gadget on shutdown by ...

  $ echo "" > /path/to/UDC

Jon

[0] https://docs.kernel.org/usb/gadget_configfs.html
diff mbox series

Patch

diff --git a/drivers/usb/gadget/function/u_serial.c b/drivers/usb/gadget/function/u_serial.c
index 53d9fc41acc5..bc143a86c2dd 100644
--- a/drivers/usb/gadget/function/u_serial.c
+++ b/drivers/usb/gadget/function/u_serial.c
@@ -1420,6 +1420,10 @@  void gserial_disconnect(struct gserial *gser)
 	/* REVISIT as above: how best to track this? */
 	port->port_line_coding = gser->port_line_coding;
 
+	/* disable endpoints, aborting down any active I/O */
+	usb_ep_disable(gser->out);
+	usb_ep_disable(gser->in);
+
 	port->port_usb = NULL;
 	gser->ioport = NULL;
 	if (port->port.count > 0) {
@@ -1431,10 +1435,6 @@  void gserial_disconnect(struct gserial *gser)
 	spin_unlock(&port->port_lock);
 	spin_unlock_irqrestore(&serial_port_lock, flags);
 
-	/* disable endpoints, aborting down any active I/O */
-	usb_ep_disable(gser->out);
-	usb_ep_disable(gser->in);
-
 	/* finally, free any unused/unusable I/O buffers */
 	spin_lock_irqsave(&port->port_lock, flags);
 	if (port->port.count == 0)