diff mbox series

USB: Fix the issue of task recovery failure caused by USB status when S4 wakes up

Message ID 20240906030548.845115-1-duanchenghao@kylinos.cn
State Superseded
Headers show
Series USB: Fix the issue of task recovery failure caused by USB status when S4 wakes up | expand

Commit Message

Duan Chenghao Sept. 6, 2024, 3:05 a.m. UTC
When a device is inserted into the USB port and an S4 wakeup is initiated,
after the USB-hub initialization is completed, it will automatically enter suspend mode.
Upon detecting a device on the USB port, it will proceed with resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state.
During the S4 wakeup process, peripherals are put into suspend mode, followed by task recovery.
However, upon detecting that the hcd is in the HCD_FLAG_WAKEUP_PENDING state,
it will return an EBUSY status, causing the S4 suspend to fail and subsequent task recovery to not proceed.

This patch makes two modifications in total:
1. The set_bit and clean_bit operations for the HCD_FLAG_WAKEUP_PENDING flag of Hcd,
which were previously split between the top half and bottom half of the interrupt,
are now unified and executed solely in the bottom half of the interrupt.
This prevents the bottom half tasks from being frozen during the S4 process,
ensuring that the clean_bit process can proceed without interruption.

2. Add a condition to the set_bit operation for the hcd status HCD_FLAG_WAKEUP_PENDING.
When the hcd status is HC_STATE_SUSPENDED, perform the setting of the aforementioned status bit.
This prevents a subsequent set_bit from occurring after the clean_bit if the hcd is in the resuming process.

Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn>
---
 drivers/usb/core/hcd.c | 1 -
 drivers/usb/core/hub.c | 3 +++
 2 files changed, 3 insertions(+), 1 deletion(-)

Comments

Duan Chenghao Sept. 10, 2024, 9:34 a.m. UTC | #1
> [Please make sure that the lines in your email message don't extend 
> beyond 76 columns or so.]
> 

OK. Later, I will modify the patch format. V2 patch will be released
later

> Lots of things here seem to be wrong.
> 
> On Fri, Sep 06, 2024 at 11:05:48AM +0800, Duan Chenghao wrote:
> > When a device is inserted into the USB port and an S4 wakeup is
> > initiated,
> 
> There is no such thing as an S4 wakeup.  Do you mean wakeup from an
> S4 
> suspend state?

Yes, waking up from the S4 suspend state.

> 
> > after the USB-hub initialization is completed, it will
> > automatically enter suspend mode.
> 
> What will enter suspend mode?  The hub that the device was plugged
> into?
> That should not happen.  The hub initialization code should detect
> that 
> a new device was plugged in and prevent the hub from suspending.
> 

Yes, the current issue is that the hub detects a new device during the
resuming process. However, the S4 wakeup is attempting to put the hub
into suspend mode, and during the suspend process, it detects that the
HCD_FLAG_WAKEUP_PENDING flag has already been set, resulting in the
return of an EBUSY status.

> > Upon detecting a device on the USB port, it will proceed with
> > resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state.
> 
> HCD_FLAG_WAKEUP_PENDING is not a state.  It is a flag.
> 
> > During the S4 wakeup process, peripherals are put into suspend
> > mode, followed by task recovery.
> 
> What do you mean by "task recovery"?  We don't need to recover any 
> tasks.
> 

S4 wakeup restores the image that was saved before the system entered
the S4 sleep state.

    S4 waking up from hibernation
    =============================
    kernel initialization
    |   
    v   
    freeze user task and kernel thread
    |   
    v   
    load saved image
    |    
    v   
    freeze the peripheral device and controller
    (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it is set,
     return to EBUSY and do not perform the following restore image.)
    |
    v
    restore image(task recovery)


> What do you mean by "peripherals are put into suspend mode"?  That's
> not 
> what happens.  Peripherals are set back to full power.
> 
> > However, upon detecting that the hcd is in the
> > HCD_FLAG_WAKEUP_PENDING state,
> > it will return an EBUSY status, causing the S4 suspend to fail and
> > subsequent task recovery to not proceed.
> 
> What will return an EBUSY status?

if HCD_FLAG_WAKEUP_PENDING flag is set_bit, will return EBUSY.

> 
> Why do you say that S4 suspend will fail?  Aren't you talking about
> S4 
> wakeup?

After returning EBUSY, the subsequent restore image operation will not
be executed.

> 
> Can you provide a kernel log that explains these points and shows
> what 
> problem you are trying to solve?

[    9.009166][ 2] [  T403] PM: Image signature found, resuming
[    9.009167][ 2] [  T403] PM: resume from hibernation
[    9.009243][ 2] [  T403] inno-codec inno-codec.16.auto:
[inno_vpu][vpu_notifier:1540]vpu_notifier: untested action 5...
[    9.009244][ 2] [  T403] Freezing user space processes ... (elapsed
0.001 seconds) done.
[    9.010355][ 2] [  T403] OOM killer disabled.
[    9.010355][ 2] [  T403] Freezing remaining freezable tasks ...
(elapsed 0.000 seconds) done.
[    9.012152][ 2] [  T403] PM: Basic memory bitmaps created
[    9.073333][ 2] [  T403] PM: Using 3 thread(s) for decompression
[    9.073334][ 2] [  T403] PM: Loading and decompressing image data
(486874 pages)...
[    9.073335][ 2] [  T403] hibernate: Hibernated on CPU 0 [mpidr:0x0]
[    9.095928][ 2] [  T403] PM: Image loading progress:   0%
[    9.664803][ 2] [  T403] PM: Image loading progress:  10%
[    9.794156][ 2] [  T403] PM: Image loading progress:  20%
[    9.913001][ 2] [  T403] PM: Image loading progress:  30%
[   10.034331][ 2] [  T403] PM: Image loading progress:  40%
[   10.154070][ 2] [  T403] PM: Image loading progress:  50%
[   10.277096][ 2] [  T403] PM: Image loading progress:  60%
[   10.398860][ 2] [  T403] PM: Image loading progress:  70%
[   10.533760][ 2] [  T403] PM: Image loading progress:  80%
[   10.659874][ 2] [  T403] PM: Image loading progress:  90%
[   10.760681][ 2] [  T403] PM: Image loading progress: 100%
[   10.760693][ 2] [  T403] PM: Image loading done
[   10.760718][ 2] [  T403] PM: Read 1947496 kbytes in 1.68 seconds
(1159.22 MB/s)
[   10.761982][ 2] [  T403] PM: Image successfully loaded
[   10.761988][ 2] [  T403] printk: Suspending console(s) (use
no_console_suspend to debug)
[   10.864973][ 2] [  T403] innovpu_freeze:1782
[   10.864974][ 2] [  T403] innovpu_suspend:1759
[   11.168871][ 2] [  T189] PM: pci_pm_freeze():
hcd_pci_suspend+0x0/0x38 returns -16
[   11.168875][ 2] [  T189] PM: dpm_run_callback():
pci_pm_freeze+0x0/0x108 returns -16
[   11.168876][ 2] [  T189] PM: Device 0000:05:00.0 failed to quiesce
async: error -16
[   12.270452][ 2] [  T403] innovpu_thaw:1792
[   12.405296][ 2] [  T403] PM: Failed to load hibernation image,
recovering.
[   12.486859][ 2] [  T403] PM: Basic memory bitmaps freed
[   12.486860][ 2] [  T403] OOM killer enabled.
[   12.486861][ 2] [  T403] Restarting tasks ... 

> 
> > This patch makes two modifications in total:
> > 1. The set_bit and clean_bit operations for the
> > HCD_FLAG_WAKEUP_PENDING flag of Hcd,
> > which were previously split between the top half and bottom half of
> > the interrupt,
> > are now unified and executed solely in the bottom half of the
> > interrupt.
> > This prevents the bottom half tasks from being frozen during the S4
> > process,
> > ensuring that the clean_bit process can proceed without
> > interruption.
> 
> The name is "clear_bit" (with an 'r'), not "clean_bit".
> 
> > 2. Add a condition to the set_bit operation for the hcd status
> > HCD_FLAG_WAKEUP_PENDING.
> > When the hcd status is HC_STATE_SUSPENDED, perform the setting of
> > the aforementioned status bit.
> > This prevents a subsequent set_bit from occurring after the
> > clean_bit if the hcd is in the resuming process.
> 
> hcd_bus_resume() clears that HCD_FLAG_WAKEUP_PENDING bit after
> calling 
> hcd->driver->bus_resume().  After that point,
> usb_hcd_resume_root_hub() 
> won't be called, so how can HCD_FLAG_WAKEUP_PENDING get set again?
> 
> Alan Stern
> 
> > Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn>
> > ---
> >  drivers/usb/core/hcd.c | 1 -
> >  drivers/usb/core/hub.c | 3 +++
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
> > index 1ff7d901fede..a6bd0fbd82f4 100644
> > --- a/drivers/usb/core/hcd.c
> > +++ b/drivers/usb/core/hcd.c
> > @@ -2389,7 +2389,6 @@ void usb_hcd_resume_root_hub (struct usb_hcd
> > *hcd)
> >         spin_lock_irqsave (&hcd_root_hub_lock, flags);
> >         if (hcd->rh_registered) {
> >                 pm_wakeup_event(&hcd->self.root_hub->dev, 0);
> > -               set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags);
> >                 queue_work(pm_wq, &hcd->wakeup_work);
> >         }
> >         spin_unlock_irqrestore (&hcd_root_hub_lock, flags);
> > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> > index 4b93c0bd1d4b..7f847c4afc0d 100644
> > --- a/drivers/usb/core/hub.c
> > +++ b/drivers/usb/core/hub.c
> > @@ -3835,11 +3835,14 @@ int usb_port_resume(struct usb_device
> > *udev, pm_message_t msg)
> >  
> >  int usb_remote_wakeup(struct usb_device *udev)
> >  {
> > +       struct usb_hcd  *hcd = bus_to_hcd(udev->bus);
> >         int     status = 0;
> >  
> >         usb_lock_device(udev);
> >         if (udev->state == USB_STATE_SUSPENDED) {
> >                 dev_dbg(&udev->dev, "usb %sresume\n", "wakeup-");
> > +               if (hcd->state == HC_STATE_SUSPENDED)
> > +                       set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd-
> > > flags);
> >                 status = usb_autoresume_device(udev);
> >                 if (status == 0) {
> >                         /* Let the drivers do their thing, then...
> > */
> > -- 
> > 2.34.1
> > 
> >
Duan Chenghao Sept. 10, 2024, 9:36 a.m. UTC | #2
> [Please make sure that the lines in your email message don't extend 
> beyond 76 columns or so.]
> 

OK. Later, I will modify the patch format. V2 patch will be released
later

> Lots of things here seem to be wrong.
> 
> On Fri, Sep 06, 2024 at 11:05:48AM +0800, Duan Chenghao wrote:
> > When a device is inserted into the USB port and an S4 wakeup is
> > initiated,
> 
> There is no such thing as an S4 wakeup.  Do you mean wakeup from an
> S4 
> suspend state?

Yes, waking up from the S4 suspend state.

> 
> > after the USB-hub initialization is completed, it will
> > automatically enter suspend mode.
> 
> What will enter suspend mode?  The hub that the device was plugged
> into?
> That should not happen.  The hub initialization code should detect
> that 
> a new device was plugged in and prevent the hub from suspending.
> 

Yes, the current issue is that the hub detects a new device during the
resuming process. However, the S4 wakeup is attempting to put the hub
into suspend mode, and during the suspend process, it detects that the
HCD_FLAG_WAKEUP_PENDING flag has already been set, resulting in the
return of an EBUSY status.

> > Upon detecting a device on the USB port, it will proceed with
> > resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state.
> 
> HCD_FLAG_WAKEUP_PENDING is not a state.  It is a flag.
> 
> > During the S4 wakeup process, peripherals are put into suspend
> > mode, followed by task recovery.
> 
> What do you mean by "task recovery"?  We don't need to recover any 
> tasks.
> 

S4 wakeup restores the image that was saved before the system entered
the S4 sleep state.

    S4 waking up from hibernation
    =============================
    kernel initialization
    |   
    v   
    freeze user task and kernel thread
    |   
    v   
    load saved image
    |    
    v   
    freeze the peripheral device and controller
    (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it is set,
     return to EBUSY and do not perform the following restore image.)
    |
    v
    restore image(task recovery)


> What do you mean by "peripherals are put into suspend mode"?  That's
> not 
> what happens.  Peripherals are set back to full power.
> 
> > However, upon detecting that the hcd is in the
> > HCD_FLAG_WAKEUP_PENDING state,
> > it will return an EBUSY status, causing the S4 suspend to fail and
> > subsequent task recovery to not proceed.
> 
> What will return an EBUSY status?

if HCD_FLAG_WAKEUP_PENDING flag is set_bit, will return EBUSY.

> 
> Why do you say that S4 suspend will fail?  Aren't you talking about
> S4 
> wakeup?

After returning EBUSY, the subsequent restore image operation will not
be executed.

> 
> Can you provide a kernel log that explains these points and shows
> what 
> problem you are trying to solve?

[    9.009166][ 2] [  T403] PM: Image signature found, resuming
[    9.009167][ 2] [  T403] PM: resume from hibernation
[    9.009243][ 2] [  T403] inno-codec inno-codec.16.auto:
[inno_vpu][vpu_notifier:1540]vpu_notifier: untested action 5...
[    9.009244][ 2] [  T403] Freezing user space processes ... (elapsed
0.001 seconds) done.
[    9.010355][ 2] [  T403] OOM killer disabled.
[    9.010355][ 2] [  T403] Freezing remaining freezable tasks ...
(elapsed 0.000 seconds) done.
[    9.012152][ 2] [  T403] PM: Basic memory bitmaps created
[    9.073333][ 2] [  T403] PM: Using 3 thread(s) for decompression
[    9.073334][ 2] [  T403] PM: Loading and decompressing image data
(486874 pages)...
[    9.073335][ 2] [  T403] hibernate: Hibernated on CPU 0 [mpidr:0x0]
[    9.095928][ 2] [  T403] PM: Image loading progress:   0%
[    9.664803][ 2] [  T403] PM: Image loading progress:  10%
[    9.794156][ 2] [  T403] PM: Image loading progress:  20%
[    9.913001][ 2] [  T403] PM: Image loading progress:  30%
[   10.034331][ 2] [  T403] PM: Image loading progress:  40%
[   10.154070][ 2] [  T403] PM: Image loading progress:  50%
[   10.277096][ 2] [  T403] PM: Image loading progress:  60%
[   10.398860][ 2] [  T403] PM: Image loading progress:  70%
[   10.533760][ 2] [  T403] PM: Image loading progress:  80%
[   10.659874][ 2] [  T403] PM: Image loading progress:  90%
[   10.760681][ 2] [  T403] PM: Image loading progress: 100%
[   10.760693][ 2] [  T403] PM: Image loading done
[   10.760718][ 2] [  T403] PM: Read 1947496 kbytes in 1.68 seconds
(1159.22 MB/s)
[   10.761982][ 2] [  T403] PM: Image successfully loaded
[   10.761988][ 2] [  T403] printk: Suspending console(s) (use
no_console_suspend to debug)
[   10.864973][ 2] [  T403] innovpu_freeze:1782
[   10.864974][ 2] [  T403] innovpu_suspend:1759
[   11.168871][ 2] [  T189] PM: pci_pm_freeze():
hcd_pci_suspend+0x0/0x38 returns -16
[   11.168875][ 2] [  T189] PM: dpm_run_callback():
pci_pm_freeze+0x0/0x108 returns -16
[   11.168876][ 2] [  T189] PM: Device 0000:05:00.0 failed to quiesce
async: error -16
[   12.270452][ 2] [  T403] innovpu_thaw:1792
[   12.405296][ 2] [  T403] PM: Failed to load hibernation image,
recovering.
[   12.486859][ 2] [  T403] PM: Basic memory bitmaps freed
[   12.486860][ 2] [  T403] OOM killer enabled.
[   12.486861][ 2] [  T403] Restarting tasks ... 

> 
> > This patch makes two modifications in total:
> > 1. The set_bit and clean_bit operations for the
> > HCD_FLAG_WAKEUP_PENDING flag of Hcd,
> > which were previously split between the top half and bottom half of
> > the interrupt,
> > are now unified and executed solely in the bottom half of the
> > interrupt.
> > This prevents the bottom half tasks from being frozen during the S4
> > process,
> > ensuring that the clean_bit process can proceed without
> > interruption.
> 
> The name is "clear_bit" (with an 'r'), not "clean_bit".
> 
> > 2. Add a condition to the set_bit operation for the hcd status
> > HCD_FLAG_WAKEUP_PENDING.
> > When the hcd status is HC_STATE_SUSPENDED, perform the setting of
> > the aforementioned status bit.
> > This prevents a subsequent set_bit from occurring after the
> > clean_bit if the hcd is in the resuming process.
> 
> hcd_bus_resume() clears that HCD_FLAG_WAKEUP_PENDING bit after
> calling 
> hcd->driver->bus_resume().  After that point,
> usb_hcd_resume_root_hub() 
> won't be called, so how can HCD_FLAG_WAKEUP_PENDING get set again?
> 
> Alan Stern
> 
> > Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn>
> > ---
> >  drivers/usb/core/hcd.c | 1 -
> >  drivers/usb/core/hub.c | 3 +++
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
> > index 1ff7d901fede..a6bd0fbd82f4 100644
> > --- a/drivers/usb/core/hcd.c
> > +++ b/drivers/usb/core/hcd.c
> > @@ -2389,7 +2389,6 @@ void usb_hcd_resume_root_hub (struct usb_hcd
> > *hcd)
> >         spin_lock_irqsave (&hcd_root_hub_lock, flags);
> >         if (hcd->rh_registered) {
> >                 pm_wakeup_event(&hcd->self.root_hub->dev, 0);
> > -               set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags);
> >                 queue_work(pm_wq, &hcd->wakeup_work);
> >         }
> >         spin_unlock_irqrestore (&hcd_root_hub_lock, flags);
> > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> > index 4b93c0bd1d4b..7f847c4afc0d 100644
> > --- a/drivers/usb/core/hub.c
> > +++ b/drivers/usb/core/hub.c
> > @@ -3835,11 +3835,14 @@ int usb_port_resume(struct usb_device
> > *udev, pm_message_t msg)
> >  
> >  int usb_remote_wakeup(struct usb_device *udev)
> >  {
> > +       struct usb_hcd  *hcd = bus_to_hcd(udev->bus);
> >         int     status = 0;
> >  
> >         usb_lock_device(udev);
> >         if (udev->state == USB_STATE_SUSPENDED) {
> >                 dev_dbg(&udev->dev, "usb %sresume\n", "wakeup-");
> > +               if (hcd->state == HC_STATE_SUSPENDED)
> > +                       set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd-
> > > flags);
> >                 status = usb_autoresume_device(udev);
> >                 if (status == 0) {
> >                         /* Let the drivers do their thing, then...
> > */
> > -- 
> > 2.34.1
> > 
> >
Duan Chenghao Sept. 12, 2024, 3:21 a.m. UTC | #3
在 2024-09-11星期三的 10:40 -0400,Alan Stern写道:
> On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote:
> > S4 wakeup restores the image that was saved before the system
> > entered
> > the S4 sleep state.
> > 
> >     S4 waking up from hibernation
> >     =============================
> >     kernel initialization
> >     |   
> >     v   
> >     freeze user task and kernel thread
> >     |   
> >     v   
> >     load saved image
> >     |    
> >     v   
> >     freeze the peripheral device and controller
> >     (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it is
> > set,
> >      return to EBUSY and do not perform the following restore
> > image.)
> 
> Why is the flag set at this point?  It should not be; the device and 
> controller should have been frozen with wakeup disabled.
> 
This is check point, not set point. When the USB goes into a suspend
state, the HCD_FLAG_WAKEUP_PENDING flag is checked, and if it is found
that the USB is in the process of resuming, then an EBUSY error is
returned.

> >     |
> >     v
> >     restore image(task recovery)
> 
> > > > However, upon detecting that the hcd is in the
> > > > HCD_FLAG_WAKEUP_PENDING state,
> > > > it will return an EBUSY status, causing the S4 suspend to fail
> > > > and
> > > > subsequent task recovery to not proceed.
> > > 
> > > What will return an EBUSY status?
> > 
> > if HCD_FLAG_WAKEUP_PENDING flag is set_bit, will return EBUSY.
> 
> I meant: Which function will return EBUSY status?  The answer is in
> the 
> log below; hcd_pci_suspend() does this.
> 
> > > Why do you say that S4 suspend will fail?  Aren't you talking
> > > about
> > > S4 
> > > wakeup?
> > 
> > After returning EBUSY, the subsequent restore image operation will
> > not
> > be executed.
> > 
> > > 
> > > Can you provide a kernel log that explains these points and shows
> > > what 
> > > problem you are trying to solve?
> > 
> > [    9.009166][ 2] [  T403] PM: Image signature found, resuming
> > [    9.009167][ 2] [  T403] PM: resume from hibernation
> > [    9.009243][ 2] [  T403] inno-codec inno-codec.16.auto:
> > [inno_vpu][vpu_notifier:1540]vpu_notifier: untested action 5...
> > [    9.009244][ 2] [  T403] Freezing user space processes ...
> > (elapsed
> > 0.001 seconds) done.
> > [    9.010355][ 2] [  T403] OOM killer disabled.
> > [    9.010355][ 2] [  T403] Freezing remaining freezable tasks ...
> > (elapsed 0.000 seconds) done.
> > [    9.012152][ 2] [  T403] PM: Basic memory bitmaps created
> > [    9.073333][ 2] [  T403] PM: Using 3 thread(s) for decompression
> > [    9.073334][ 2] [  T403] PM: Loading and decompressing image
> > data
> > (486874 pages)...
> > [    9.073335][ 2] [  T403] hibernate: Hibernated on CPU 0
> > [mpidr:0x0]
> > [    9.095928][ 2] [  T403] PM: Image loading progress:   0%
> > [    9.664803][ 2] [  T403] PM: Image loading progress:  10%
> > [    9.794156][ 2] [  T403] PM: Image loading progress:  20%
> > [    9.913001][ 2] [  T403] PM: Image loading progress:  30%
> > [   10.034331][ 2] [  T403] PM: Image loading progress:  40%
> > [   10.154070][ 2] [  T403] PM: Image loading progress:  50%
> > [   10.277096][ 2] [  T403] PM: Image loading progress:  60%
> > [   10.398860][ 2] [  T403] PM: Image loading progress:  70%
> > [   10.533760][ 2] [  T403] PM: Image loading progress:  80%
> > [   10.659874][ 2] [  T403] PM: Image loading progress:  90%
> > [   10.760681][ 2] [  T403] PM: Image loading progress: 100%
> > [   10.760693][ 2] [  T403] PM: Image loading done
> > [   10.760718][ 2] [  T403] PM: Read 1947496 kbytes in 1.68 seconds
> > (1159.22 MB/s)
> > [   10.761982][ 2] [  T403] PM: Image successfully loaded
> > [   10.761988][ 2] [  T403] printk: Suspending console(s) (use
> > no_console_suspend to debug)
> > [   10.864973][ 2] [  T403] innovpu_freeze:1782
> > [   10.864974][ 2] [  T403] innovpu_suspend:1759
> > [   11.168871][ 2] [  T189] PM: pci_pm_freeze():
> > hcd_pci_suspend+0x0/0x38 returns -16
> 
> This should not be allowed to happen.  Freezing is mandatory and not 
> subject to wakeup requests.
> 
> Is your problem related to the one discussed in this email thread?
> 
> https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/
> 
> Would the suggestion I made there -- i.e., have the xhci-hcd 
> interrupt handler skip calling usb_hcd_resume_root_hub() if the root
> hub 
> was suspended with wakeup = 0 -- fix your problem?

Skipping usb_hcd_resume_root_hub() should generally be possible, but
it's important to ensure that normal remote wakeup functionality is not
compromised. Is it HUB_SUSPEND that the hub you are referring to is in
a suspended state?

v2 patch:
https://lore.kernel.org/all/20240910105714.148976-1-duanchenghao@kylinos.cn/
> 
> Alan Stern
Alan Stern Sept. 12, 2024, 3 p.m. UTC | #4
On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote:
> 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道:
> > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote:
> > > S4 wakeup restores the image that was saved before the system
> > > entered
> > > the S4 sleep state.
> > > 
> > >     S4 waking up from hibernation
> > >     =============================
> > >     kernel initialization
> > >     |   
> > >     v   
> > >     freeze user task and kernel thread
> > >     |   
> > >     v   
> > >     load saved image
> > >     |    
> > >     v   
> > >     freeze the peripheral device and controller
> > >     (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it is
> > > set,
> > >      return to EBUSY and do not perform the following restore
> > > image.)
> > 
> > Why is the flag set at this point?  It should not be; the device and 
> > controller should have been frozen with wakeup disabled.
> > 
> This is check point, not set point.

Yes, I know that.  But when the flag was checked, why did the code find 
that it was set?  The flag should have been clear.

> > Is your problem related to the one discussed in this email thread?
> > 
> > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/
> > 
> > Would the suggestion I made there -- i.e., have the xhci-hcd 
> > interrupt handler skip calling usb_hcd_resume_root_hub() if the root
> > hub 
> > was suspended with wakeup = 0 -- fix your problem?
> 
> Skipping usb_hcd_resume_root_hub() should generally be possible, but
> it's important to ensure that normal remote wakeup functionality is not
> compromised. Is it HUB_SUSPEND that the hub you are referring to is in
> a suspended state?

I don't understand this question.  hub_quiesce() gets called with 
HUB_SUSPEND when the hub enters a suspended state.

You are correct about the need for normal remote wakeup to work 
properly.  The interrupt handler should skip calling 
usb_hcd_resume_root_hub() for port connect or disconnect changes and for 
port overcurrent changes (when the root hub is suspended with wakeup = 
0).  But it should _not_ skip calling usb_hcd_resume_root_hub() for port 
resume events.

Alan Stern
Duan Chenghao Sept. 13, 2024, 1:51 a.m. UTC | #5
在 2024-09-12星期四的 11:00 -0400,Alan Stern写道:
> On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote:
> > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道:
> > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote:
> > > > S4 wakeup restores the image that was saved before the system
> > > > entered
> > > > the S4 sleep state.
> > > > 
> > > >     S4 waking up from hibernation
> > > >     =============================
> > > >     kernel initialization
> > > >     |   
> > > >     v   
> > > >     freeze user task and kernel thread
> > > >     |   
> > > >     v   
> > > >     load saved image
> > > >     |    
> > > >     v   
> > > >     freeze the peripheral device and controller
> > > >     (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it
> > > > is
> > > > set,
> > > >      return to EBUSY and do not perform the following restore
> > > > image.)
> > > 
> > > Why is the flag set at this point?  It should not be; the device
> > > and 
> > > controller should have been frozen with wakeup disabled.
> > > 
> > This is check point, not set point.
> 
> Yes, I know that.  But when the flag was checked, why did the code
> find 
> that it was set?  The flag should have been clear.

Yes, the current issue is that during S4 testing, there is a
probabilistic scenario where clear_bit is not called after set_bit, or
clear_bit is called but does not execute after set_bit. Please refer to
the two modification points in the v2 patch for details, as both of
them can cause the current issue.

> 
> > > Is your problem related to the one discussed in this email
> > > thread?
> > > 
> > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/
> > > 
> > > Would the suggestion I made there -- i.e., have the xhci-hcd 
> > > interrupt handler skip calling usb_hcd_resume_root_hub() if the
> > > root
> > > hub 
> > > was suspended with wakeup = 0 -- fix your problem?
> > 
> > Skipping usb_hcd_resume_root_hub() should generally be possible,
> > but
> > it's important to ensure that normal remote wakeup functionality is
> > not
> > compromised. Is it HUB_SUSPEND that the hub you are referring to is
> > in
> > a suspended state?
> 
> I don't understand this question.  hub_quiesce() gets called with 
> HUB_SUSPEND when the hub enters a suspended state.
> 
> You are correct about the need for normal remote wakeup to work 
> properly.  The interrupt handler should skip calling 
> usb_hcd_resume_root_hub() for port connect or disconnect changes and
> for 
> port overcurrent changes (when the root hub is suspended with wakeup
> = 
> 0).  But it should _not_ skip calling usb_hcd_resume_root_hub() for
> port 
> resume events.

The current issue arises when rh_state is detected as RH_SUSPEND and
usb_hcd_resume_root_hub() is called to resume the root hub. However,
there is no mutual exclusion between the suspend flag, set_bit, and
clear_bit, which can lead to two scenarios:

    1. After set_bit is called, the state of the USB device is modified
by another process to !USB_STATE_SUSPEND, preventing the hub's resume
from being executed, and consequently, clear_bit is not called again.

    2. In another scenario, during the hub resume process, after
HCD_FLAG_WAKEUP_PENDING is cleared by clear_bit, rh_state has not yet
been set to !RH_SUSPENDED. At this point, set_bit is executed, but
since the hub has already entered the running state, the clear_bit
associated with the resume operation is not executed.

Please review the v2 patch, where I have described both the logical
flow before the modification and the revised logical flow after the
modification.

> 
> Alan Stern
Duan Chenghao Sept. 13, 2024, 9:38 a.m. UTC | #6
在 2024-09-13星期五的 09:51 +0800,duanchenghao写道:
> 在 2024-09-12星期四的 11:00 -0400,Alan Stern写道:
> > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote:
> > > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道:
> > > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote:
> > > > > S4 wakeup restores the image that was saved before the system
> > > > > entered
> > > > > the S4 sleep state.
> > > > > 
> > > > >     S4 waking up from hibernation
> > > > >     =============================
> > > > >     kernel initialization
> > > > >     |   
> > > > >     v   
> > > > >     freeze user task and kernel thread
> > > > >     |   
> > > > >     v   
> > > > >     load saved image
> > > > >     |    
> > > > >     v   
> > > > >     freeze the peripheral device and controller
> > > > >     (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If
> > > > > it
> > > > > is
> > > > > set,
> > > > >      return to EBUSY and do not perform the following restore
> > > > > image.)
> > > > 
> > > > Why is the flag set at this point?  It should not be; the
> > > > device
> > > > and 
> > > > controller should have been frozen with wakeup disabled.
> > > > 
> > > This is check point, not set point.
> > 
> > Yes, I know that.  But when the flag was checked, why did the code
> > find 
> > that it was set?  The flag should have been clear.
> 
> Yes, the current issue is that during S4 testing, there is a
> probabilistic scenario where clear_bit is not called after set_bit,
> or
> clear_bit is called but does not execute after set_bit. Please refer
> to
> the two modification points in the v2 patch for details, as both of
> them can cause the current issue.
> 
> > 
> > > > Is your problem related to the one discussed in this email
> > > > thread?
> > > > 
> > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/
> > > > 
> > > > Would the suggestion I made there -- i.e., have the xhci-hcd 
> > > > interrupt handler skip calling usb_hcd_resume_root_hub() if the
> > > > root
> > > > hub 
> > > > was suspended with wakeup = 0 -- fix your problem?
> > > 
> > > Skipping usb_hcd_resume_root_hub() should generally be possible,
> > > but
> > > it's important to ensure that normal remote wakeup functionality
> > > is
> > > not
> > > compromised. Is it HUB_SUSPEND that the hub you are referring to
> > > is
> > > in
> > > a suspended state?
> > 
> > I don't understand this question.  hub_quiesce() gets called with 
> > HUB_SUSPEND when the hub enters a suspended state.
> > 
> > You are correct about the need for normal remote wakeup to work 
> > properly.  The interrupt handler should skip calling 
> > usb_hcd_resume_root_hub() for port connect or disconnect changes
> > and
> > for 
> > port overcurrent changes (when the root hub is suspended with
> > wakeup
> > = 
> > 0).  But it should _not_ skip calling usb_hcd_resume_root_hub() for
> > port 
> > resume events.
> 
> The current issue arises when rh_state is detected as RH_SUSPEND and
> usb_hcd_resume_root_hub() is called to resume the root hub. However,
> there is no mutual exclusion between the suspend flag, set_bit, and
> clear_bit, which can lead to two scenarios:
> 
>     1. After set_bit is called, the state of the USB device is
> modified
> by another process to !USB_STATE_SUSPEND, preventing the hub's resume
> from being executed, and consequently, clear_bit is not called again.
> 
>     2. In another scenario, during the hub resume process, after
> HCD_FLAG_WAKEUP_PENDING is cleared by clear_bit, rh_state has not yet
> been set to !RH_SUSPENDED. At this point, set_bit is executed, but
> since the hub has already entered the running state, the clear_bit
> associated with the resume operation is not executed.
> 
> Please review the v2 patch, where I have described both the logical
> flow before the modification and the revised logical flow after the
> modification.
> 

In fact, issue point 2 in the patch is introduced by issue point 1, and
issue point 2 represents a further improvement. The main issue lies in
point 1, where after the execution of the top half of the interrupt is
completed, the bottom half is frozen by S4. As a result, the USB resume
is not executed during this S4 process, and clear_bit is not called as
well. This further leads to a situation where during the process of S4
putting the USB controller into suspend, the check for
HCD_FLAG_WAKEUP_PENDING being set returns -EBUSY.

> > 
> > Alan Stern
>
Hongyu Xie Sept. 14, 2024, 2:43 a.m. UTC | #7
From: Hongyu Xie <xiehongyu1@kylinos.cn>


Hi Alan,
On 2024/9/12 23:00, Alan Stern wrote:
> On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote:
>> 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道:
>>> On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote:
>>>> S4 wakeup restores the image that was saved before the system
>>>> entered
>>>> the S4 sleep state.
>>>>
>>>>      S4 waking up from hibernation
>>>>      =============================
>>>>      kernel initialization
>>>>      |
>>>>      v
>>>>      freeze user task and kernel thread
>>>>      |
>>>>      v
>>>>      load saved image
>>>>      |
>>>>      v
>>>>      freeze the peripheral device and controller
>>>>      (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it is
>>>> set,
>>>>       return to EBUSY and do not perform the following restore
>>>> image.)
>>>
>>> Why is the flag set at this point?  It should not be; the device and
>>> controller should have been frozen with wakeup disabled.
>>>
>> This is check point, not set point.
> 
> Yes, I know that.  But when the flag was checked, why did the code find
> that it was set?  The flag should have been clear.
Maybe duanchenghao means this,
freeze_kernel_threads
load_image_and_restore
   suspend roothub
                                      interrupt occurred
                                        usb_hcd_resume_root_hub
                                          set HCD_FLAG_WAKEUP_PENDING
                                          queue_work // freezed
   suspend pci
     return -EBUSY  because HCD_FLAG_WAKEUP_PENDING

So s4 resume failed.
> 
>>> Is your problem related to the one discussed in this email thread?
>>>
>>> https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/
>>>
>>> Would the suggestion I made there -- i.e., have the xhci-hcd
>>> interrupt handler skip calling usb_hcd_resume_root_hub() if the root
>>> hub
>>> was suspended with wakeup = 0 -- fix your problem?
>>
>> Skipping usb_hcd_resume_root_hub() should generally be possible, but
>> it's important to ensure that normal remote wakeup functionality is not
>> compromised. Is it HUB_SUSPEND that the hub you are referring to is in
>> a suspended state?
> 
> I don't understand this question.  hub_quiesce() gets called with
> HUB_SUSPEND when the hub enters a suspended state.
> 
> You are correct about the need for normal remote wakeup to work
> properly.  The interrupt handler should skip calling
> usb_hcd_resume_root_hub() for port connect or disconnect changes and for
> port overcurrent changes (when the root hub is suspended with wakeup =
> 0).  But it should _not_ skip calling usb_hcd_resume_root_hub() for port
> resume events.
> 
> Alan Stern
> 

Hongyu Xie
Duan Chenghao Sept. 23, 2024, 8 a.m. UTC | #8
Hi Alan,

Do you think this plan is feasible, or is there any unclear part in my
description that needs to be supplemented?

duanchenghao


在 2024-09-14星期六的 10:43 +0800,Hongyu Xie写道:
> From: Hongyu Xie <xiehongyu1@kylinos.cn>
> 
> 
> Hi Alan,
> On 2024/9/12 23:00, Alan Stern wrote:
> > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote:
> > > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道:
> > > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote:
> > > > > S4 wakeup restores the image that was saved before the system
> > > > > entered
> > > > > the S4 sleep state.
> > > > > 
> > > > >      S4 waking up from hibernation
> > > > >      =============================
> > > > >      kernel initialization
> > > > >      |
> > > > >      v
> > > > >      freeze user task and kernel thread
> > > > >      |
> > > > >      v
> > > > >      load saved image
> > > > >      |
> > > > >      v
> > > > >      freeze the peripheral device and controller
> > > > >      (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If
> > > > > it is
> > > > > set,
> > > > >       return to EBUSY and do not perform the following
> > > > > restore
> > > > > image.)
> > > > 
> > > > Why is the flag set at this point?  It should not be; the
> > > > device and
> > > > controller should have been frozen with wakeup disabled.
> > > > 
> > > This is check point, not set point.
> > 
> > Yes, I know that.  But when the flag was checked, why did the code
> > find
> > that it was set?  The flag should have been clear.
> Maybe duanchenghao means this,
> freeze_kernel_threads
> load_image_and_restore
>    suspend roothub
>                                       interrupt occurred
>                                         usb_hcd_resume_root_hub
>                                           set HCD_FLAG_WAKEUP_PENDING
>                                           queue_work // freezed
>    suspend pci
>      return -EBUSY  because HCD_FLAG_WAKEUP_PENDING
> 
> So s4 resume failed.
> > 
> > > > Is your problem related to the one discussed in this email
> > > > thread?
> > > > 
> > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/
> > > > 
> > > > Would the suggestion I made there -- i.e., have the xhci-hcd
> > > > interrupt handler skip calling usb_hcd_resume_root_hub() if the
> > > > root
> > > > hub
> > > > was suspended with wakeup = 0 -- fix your problem?
> > > 
> > > Skipping usb_hcd_resume_root_hub() should generally be possible,
> > > but
> > > it's important to ensure that normal remote wakeup functionality
> > > is not
> > > compromised. Is it HUB_SUSPEND that the hub you are referring to
> > > is in
> > > a suspended state?
> > 
> > I don't understand this question.  hub_quiesce() gets called with
> > HUB_SUSPEND when the hub enters a suspended state.
> > 
> > You are correct about the need for normal remote wakeup to work
> > properly.  The interrupt handler should skip calling
> > usb_hcd_resume_root_hub() for port connect or disconnect changes
> > and for
> > port overcurrent changes (when the root hub is suspended with
> > wakeup =
> > 0).  But it should _not_ skip calling usb_hcd_resume_root_hub() for
> > port
> > resume events.
> > 
> > Alan Stern
> > 
> 
> Hongyu Xie
Alan Stern Sept. 24, 2024, 1:38 p.m. UTC | #9
On Mon, Sep 23, 2024 at 04:00:35PM +0800, duanchenghao wrote:
> Hi Alan,
> 
> Do you think this plan is feasible, or is there any unclear part in my
> description that needs to be supplemented?

I apologize for not getting back to you earlier -- I've been incredibly 
busy during the last few weeks.

I still haven't had time to go over this throroughly.  If I don't 
respond by the end of this week, remind me again.

Alan Stern

> duanchenghao
> 
> 
> 在 2024-09-14星期六的 10:43 +0800,Hongyu Xie写道:
> > From: Hongyu Xie <xiehongyu1@kylinos.cn>
> > 
> > 
> > Hi Alan,
> > On 2024/9/12 23:00, Alan Stern wrote:
> > > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote:
> > > > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道:
> > > > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote:
> > > > > > S4 wakeup restores the image that was saved before the system
> > > > > > entered
> > > > > > the S4 sleep state.
> > > > > > 
> > > > > >      S4 waking up from hibernation
> > > > > >      =============================
> > > > > >      kernel initialization
> > > > > >      |
> > > > > >      v
> > > > > >      freeze user task and kernel thread
> > > > > >      |
> > > > > >      v
> > > > > >      load saved image
> > > > > >      |
> > > > > >      v
> > > > > >      freeze the peripheral device and controller
> > > > > >      (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If
> > > > > > it is
> > > > > > set,
> > > > > >       return to EBUSY and do not perform the following
> > > > > > restore
> > > > > > image.)
> > > > > 
> > > > > Why is the flag set at this point?  It should not be; the
> > > > > device and
> > > > > controller should have been frozen with wakeup disabled.
> > > > > 
> > > > This is check point, not set point.
> > > 
> > > Yes, I know that.  But when the flag was checked, why did the code
> > > find
> > > that it was set?  The flag should have been clear.
> > Maybe duanchenghao means this,
> > freeze_kernel_threads
> > load_image_and_restore
> >    suspend roothub
> >                                       interrupt occurred
> >                                         usb_hcd_resume_root_hub
> >                                           set HCD_FLAG_WAKEUP_PENDING
> >                                           queue_work // freezed
> >    suspend pci
> >      return -EBUSY  because HCD_FLAG_WAKEUP_PENDING
> > 
> > So s4 resume failed.
> > > 
> > > > > Is your problem related to the one discussed in this email
> > > > > thread?
> > > > > 
> > > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/
> > > > > 
> > > > > Would the suggestion I made there -- i.e., have the xhci-hcd
> > > > > interrupt handler skip calling usb_hcd_resume_root_hub() if the
> > > > > root
> > > > > hub
> > > > > was suspended with wakeup = 0 -- fix your problem?
> > > > 
> > > > Skipping usb_hcd_resume_root_hub() should generally be possible,
> > > > but
> > > > it's important to ensure that normal remote wakeup functionality
> > > > is not
> > > > compromised. Is it HUB_SUSPEND that the hub you are referring to
> > > > is in
> > > > a suspended state?
> > > 
> > > I don't understand this question.  hub_quiesce() gets called with
> > > HUB_SUSPEND when the hub enters a suspended state.
> > > 
> > > You are correct about the need for normal remote wakeup to work
> > > properly.  The interrupt handler should skip calling
> > > usb_hcd_resume_root_hub() for port connect or disconnect changes
> > > and for
> > > port overcurrent changes (when the root hub is suspended with
> > > wakeup =
> > > 0).  But it should _not_ skip calling usb_hcd_resume_root_hub() for
> > > port
> > > resume events.
> > > 
> > > Alan Stern
> > > 
> > 
> > Hongyu Xie
>
Alan Stern Oct. 9, 2024, 1:54 a.m. UTC | #10
On Wed, Oct 09, 2024 at 09:19:39AM +0800, duanchenghao wrote:
> Hi Alan,
> 
> I haven't received a reply from you since my last email. Could you
> please confirm if you have received this one?

I have.

> I'm worried that there might be an issue with the email system and you
> might not be receiving them.

I sent a message 9 days ago.  See

https://lore.kernel.org/all/85105e45-3553-4a8c-b132-3875c4657c4b@rowland.harvard.edu/

You were CC'ed on that message; maybe you didn't receive it.

Maybe the topic of that thread isn't exactly the same as the topic of 
your thread; I tend to get the two of them confused.

Alan Stern
Alan Stern Oct. 9, 2024, 3:50 p.m. UTC | #11
On Wed, Oct 09, 2024 at 10:35:05AM +0800, duanchenghao wrote:
> Hi Alan,
> 
> These are two patches, each addressing the same issue. The current
> patch is a direct solution to the problem of the interrupt bottom half
> being frozen. The patch you replied with is another, alternative
> solution to the same problem. Please review which solution is more
> suitable, or if there are any other revised proposals.
> 
> 
> Please review the patch I mentioned:
> https://lore.kernel.org/all/0a4dc46ae767c28dd207ae29511ede747f05539a.camel@kylinos.cn/

I still think your whole approach is wrong.  There is no need to change 
the way the HCD_FLAG_WAKEUP_PENDING flag gets set or cleared; that's not 
the reason for the problem you're seeing.

The problem occurs because when suspend_common() in 
drivers/usb/core/hcd-pci.c sets do_wakeup, it does so by calling 
device_may_wakeup(), and the value that function returns is not what we 
want.  The value is based on whether the controller's power/wakeup 
attribute is set, but we also have to take into account the type of 
suspend that's happening.

Basically, if msg is one of the suspend types for which wakeups should 
always be disabled, then do_wakeup should be set to false.  There isn't 
a macro to test for these things, but there should be.  Something like 
PMSG_IS_AUTO() in include/linux/pm.h; maybe call it PMSG_NO_WAKEUP().  
This macro should return true if the PM_EVENT_FREEZE or PM_EVENT_QUIESCE 
bits are set in msg.event.

Rafael, please correct me if I got any of this wrong.

So the right way to fix the problem is to add to pm.h:

#define PMSG_NO_WAKEUP(msg)	(((msg.event) & \
		(PM_EVENT_FREEZE | PM_EVENT_QUIESCE)) != 0)

and in suspend_common():

	if (PMSG_IS_AUTO(msg))
		do_wakeup = true;
	else if (PMSG_NO_WAKEUP(msg))
		do_wakeup = false;
	else
		do_wakeup = device_may_wakeup(dev);

Then with do_wakeup set to false, none of the HCD_WAKEUP_PENDING() tests 
in the following code will be executed, so the routine won't fail during 
the freeze or restore phase with -EBUSY.

You'll also have to define an hcd_pci_freeze() routine, just 
like hcd_pci_suspend() except that it uses PMSG_FREEZE instead of 
PMSG_SUSPEND.  And the .freeze callback in usb_hcd_pci_pm_ops should be 
changed to hcd_pci_freeze.

In fact, it looks like choose_wakeup() in drivers/usb/core/driver.c 
could also use the new macro, instead of checking for FREEZE or QUIESCE 
directly the way it does now.

Alan Stern
Alan Stern Oct. 12, 2024, 3:01 p.m. UTC | #12
On Sat, Oct 12, 2024 at 05:51:41PM +0800, duanchenghao wrote:
> 
> Hi Alan,
> 
> The V3 patch has been sent. Please review it to check if it aligns with
> the solution you described.

Yes, that is what I meant.

Have you and all the other people at kylinos.cn tested the patch to make 
sure that it fixes the problem?

Alan Stern
Duan Chenghao Oct. 14, 2024, 1:41 a.m. UTC | #13
在 2024-10-12星期六的 11:01 -0400,Alan Stern写道:
Hi Saranya,

> On Sat, Oct 12, 2024 at 05:51:41PM +0800, duanchenghao wrote:
> > 
> > Hi Alan,
> > 
> > The V3 patch has been sent. Please review it to check if it aligns
> > with
> > the solution you described.
> 
> Yes, that is what I meant.
> 
> Have you and all the other people at kylinos.cn tested the patch to
> make 
> sure that it fixes the problem?
> 
> Alan Stern

If you have time, you can arrange to test your issue using the V3
version. This way, we can jointly verify whether the problem has been
resolved.

Thanks
Duan Chenghao
Duan Chenghao Oct. 16, 2024, 9:06 a.m. UTC | #14
Hi Saranya,

在 2024-10-16星期三的 08:57 +0000,Gopal, Saranya写道:
> Hi Duan Chenghao,
> 
> > -----Original Message-----
> > From: duanchenghao <duanchenghao@kylinos.cn>
> > Sent: Monday, October 14, 2024 7:11 AM
> > To: Alan Stern <stern@rowland.harvard.edu>; Gopal, Saranya
> > <saranya.gopal@intel.com>
> > Cc: Rafael J. Wysocki <rafael@kernel.org>; Hongyu Xie
> > <xy521521@gmail.com>; gregkh@linuxfoundation.org; linux-
> > kernel@vger.kernel.org; linux-pm@vger.kernel.org; linux-
> > usb@vger.kernel.org; niko.mauno@vaisala.com; pavel@ucw.cz;
> > stanley_chang@realtek.com; tj@kernel.org; Hongyu Xie
> > <xiehongyu1@kylinos.cn>
> > Subject: Re: [PATCH] USB: Fix the issue of task recovery failure
> > caused
> > by USB status when S4 wakes up
> > 
> > 在 2024-10-12星期六的 11:01 -0400,Alan Stern写道:
> > Hi Saranya,
> > 
> > > On Sat, Oct 12, 2024 at 05:51:41PM +0800, duanchenghao wrote:
> > > > 
> > > > Hi Alan,
> > > > 
> > > > The V3 patch has been sent. Please review it to check if it
> > > > aligns
> > > > with
> > > > the solution you described.
> > > 
> > > Yes, that is what I meant.
> > > 
> > > Have you and all the other people at kylinos.cn tested the patch
> > > to
> > > make
> > > sure that it fixes the problem?
> > > 
> > > Alan Stern
> > 
> > If you have time, you can arrange to test your issue using the V3
> > version. This way, we can jointly verify whether the problem has
> > been
> > resolved.
> > 
> My testing completed today.
> I couldn't reproduce the issue when I ran 1500 S4 cycles with v3
> version of your patch.
> The issue was anyway rare before with an occurrence rate of 20/1500
> cycles in our system.
> So, please conclude after verifying from your side also.
> 

Thank you very much for your verification. I am currently conducting
parallel testing on multiple machines, and the results should be
available soon as well.

Thanks,
Duan Chenghao

> Thanks,
> Saranya
> 
> > Thanks
> > Duan Chenghao
> > 
> > 
>
Duan Chenghao Nov. 6, 2024, 1:29 a.m. UTC | #15
Hi Greg k-h & Alan,

Excuse me, both of you. I've noticed that you haven't replied to the
emails for quite some time, which prompted me to send this one. I'd
like to inquire if the proposal in the current email is feasible and if
it can be integrated into the community.

Thanks,

Duan Chenghao

在 2024-10-29星期二的 16:01 +0800,duanchenghao写道:
> hi greg k-h,
> 
> 在 2024-10-29星期二的 04:27 +0100,Greg KH写道:
> > On Thu, Oct 24, 2024 at 04:46:48PM +0800, duanchenghao wrote:
> > > hi greg k-h,
> > > 
> > > 在 2024-10-24星期四的 09:05 +0200,Greg KH写道:
> > > > On Thu, Oct 24, 2024 at 10:40:38AM +0800, Duan Chenghao wrote:
> > > > > When a device is inserted into the USB port and an S4 wakeup
> > > > > is
> > > > > initiated,
> > > > > after the USB-hub initialization is completed, it will
> > > > > automatically enter
> > > > > suspend mode. Upon detecting a device on the USB port, it
> > > > > will
> > > > > proceed with
> > > > > resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state.
> > > > > During
> > > > > the S4
> > > > > wakeup process, peripherals are put into suspend mode,
> > > > > followed
> > > > > by
> > > > > task
> > > > > recovery. However, upon detecting that the hcd is in the
> > > > > HCD_FLAG_WAKEUP_PENDING state, it will return an EBUSY
> > > > > status,
> > > > > causing the
> > > > > S4 suspend to fail and subsequent task recovery to not
> > > > > proceed.
> > > > > -
> > > > > [   27.594598][ 1]  PM: pci_pm_freeze():
> > > > > hcd_pci_suspend+0x0/0x28
> > > > > returns -16
> > > > > [   27.594601][ 1]  PM: dpm_run_callback():
> > > > > pci_pm_freeze+0x0/0x100
> > > > > returns -16
> > > > > [   27.603420][ 1]  ehci-pci 0000:00:04.1:
> > > > > pci_pm_freeze+0x0/0x100
> > > > > returned 0 after 3 usecs
> > > > > [   27.612233][ 1]  ehci-pci 0000:00:05.1:
> > > > > pci_pm_freeze+0x0/0x100
> > > > > returned -16 after 17223 usecs
> > > > > [   27.810067][ 1]  PM: Device 0000:00:05.1 failed to quiesce
> > > > > async: error -16
> > > > > [   27.816988][ 1]  PM: quiesce of devices aborted after
> > > > > 1833.282
> > > > > msecs
> > > > > [   27.823302][ 1]  PM: start quiesce of devices aborted
> > > > > after
> > > > > 1839.975 msecs
> > > > > ......
> > > > > [   31.303172][ 1]  PM: recover of devices complete after
> > > > > 3473.039
> > > > > msecs
> > > > > [   31.309818][ 1]  PM: Failed to load hibernation image,
> > > > > recovering.
> > > > > [   31.348188][ 1]  PM: Basic memory bitmaps freed
> > > > > [   31.352686][ 1]  OOM killer enabled.
> > > > > [   31.356232][ 1]  Restarting tasks ... done.
> > > > > [   31.360609][ 1]  PM: resume from hibernation failed (0)
> > > > > [   31.365800][ 1]  PM: Hibernation image not present or
> > > > > could
> > > > > not
> > > > > be loaded.
> > > > > 
> > > > > The "do_wakeup" is determined based on whether the
> > > > > controller's
> > > > > power/wakeup attribute is set. The current issue necessitates
> > > > > considering
> > > > > the type of suspend that is occurring. If the suspend type is
> > > > > either
> > > > > PM_EVENT_FREEZE or PM_EVENT_QUIESCE, then "do_wakeup" should
> > > > > be
> > > > > set
> > > > > to
> > > > > false.
> > > > > 
> > > > > Reported-by: kernel test robot <lkp@intel.com>
> > > > > Closes:
> > > > > https://lore.kernel.org/oe-kbuild-all/202410151722.rfjtknRz-lkp@intel.com/
> > > > > Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
> > > > > Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn>
> > > > 
> > > > What commit id does this fix?
> > > 
> > > The current patch is not intended to fix an issue with a specific
> > > commit, but rather to address a long-standing problem in the USB
> > > core.
> > 
> > So should it be backported to older stable kernels?  If so, how far
> > back?
> 
> yes, It needs to be backported. The stable branches such as 6.6.y,
> 6.10.y, and 6.11.y can be considered for the backport.
> 
> Should we backport to these versions?
> 
> Thanks 
> 
> Duan Chenghao
> > 
> > > > And I missed where Alan provided a signed-off-by, where was
> > > > that?
> > > 
> > > In the following email, Alan proposed using "Signed-off-by" for
> > > signing.
> > > https://lore.kernel.org/all/489805e7-c19c-4b57-9cd7-713e075261cd@rowland.harvard.edu/
> > 
> > Ah, missed that, sorry.
> > 
> > thanks,
> > 
> > greg k-h
>
Duan Chenghao Nov. 20, 2024, 2:14 a.m. UTC | #16
Hi Greg KH/Alan,

I haven't received a reply from both of you for a long time. Sorry for
the disturbance. May I ask if the current patch can be merged into the
mainline or stable branch? I need this conclusion as it is very
important to me. Thank you very much.

Thanks
Duan Chenghao 

在 2024-10-29星期二的 04:27 +0100,Greg KH写道:
> On Thu, Oct 24, 2024 at 04:46:48PM +0800, duanchenghao wrote:
> > hi greg k-h,
> > 
> > 在 2024-10-24星期四的 09:05 +0200,Greg KH写道:
> > > On Thu, Oct 24, 2024 at 10:40:38AM +0800, Duan Chenghao wrote:
> > > > When a device is inserted into the USB port and an S4 wakeup is
> > > > initiated,
> > > > after the USB-hub initialization is completed, it will
> > > > automatically enter
> > > > suspend mode. Upon detecting a device on the USB port, it will
> > > > proceed with
> > > > resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state.
> > > > During
> > > > the S4
> > > > wakeup process, peripherals are put into suspend mode, followed
> > > > by
> > > > task
> > > > recovery. However, upon detecting that the hcd is in the
> > > > HCD_FLAG_WAKEUP_PENDING state, it will return an EBUSY status,
> > > > causing the
> > > > S4 suspend to fail and subsequent task recovery to not proceed.
> > > > -
> > > > [   27.594598][ 1]  PM: pci_pm_freeze():
> > > > hcd_pci_suspend+0x0/0x28
> > > > returns -16
> > > > [   27.594601][ 1]  PM: dpm_run_callback():
> > > > pci_pm_freeze+0x0/0x100
> > > > returns -16
> > > > [   27.603420][ 1]  ehci-pci 0000:00:04.1:
> > > > pci_pm_freeze+0x0/0x100
> > > > returned 0 after 3 usecs
> > > > [   27.612233][ 1]  ehci-pci 0000:00:05.1:
> > > > pci_pm_freeze+0x0/0x100
> > > > returned -16 after 17223 usecs
> > > > [   27.810067][ 1]  PM: Device 0000:00:05.1 failed to quiesce
> > > > async: error -16
> > > > [   27.816988][ 1]  PM: quiesce of devices aborted after
> > > > 1833.282
> > > > msecs
> > > > [   27.823302][ 1]  PM: start quiesce of devices aborted after
> > > > 1839.975 msecs
> > > > ......
> > > > [   31.303172][ 1]  PM: recover of devices complete after
> > > > 3473.039
> > > > msecs
> > > > [   31.309818][ 1]  PM: Failed to load hibernation image,
> > > > recovering.
> > > > [   31.348188][ 1]  PM: Basic memory bitmaps freed
> > > > [   31.352686][ 1]  OOM killer enabled.
> > > > [   31.356232][ 1]  Restarting tasks ... done.
> > > > [   31.360609][ 1]  PM: resume from hibernation failed (0)
> > > > [   31.365800][ 1]  PM: Hibernation image not present or could
> > > > not
> > > > be loaded.
> > > > 
> > > > The "do_wakeup" is determined based on whether the controller's
> > > > power/wakeup attribute is set. The current issue necessitates
> > > > considering
> > > > the type of suspend that is occurring. If the suspend type is
> > > > either
> > > > PM_EVENT_FREEZE or PM_EVENT_QUIESCE, then "do_wakeup" should be
> > > > set
> > > > to
> > > > false.
> > > > 
> > > > Reported-by: kernel test robot <
> > > > lkp@intel.com
> > > > >
> > > > Closes:
> > > > https://lore.kernel.org/oe-kbuild-all/202410151722.rfjtknRz-lkp@intel.com/
> > > > 
> > > > Signed-off-by: Alan Stern <
> > > > stern@rowland.harvard.edu
> > > > >
> > > > Signed-off-by: Duan Chenghao <
> > > > duanchenghao@kylinos.cn
> > > > >
> > > 
> > > What commit id does this fix?
> > 
> > The current patch is not intended to fix an issue with a specific
> > commit, but rather to address a long-standing problem in the USB
> > core.
> 
> So should it be backported to older stable kernels?  If so, how far
> back?
> 
> > > And I missed where Alan provided a signed-off-by, where was that?
> > 
> > In the following email, Alan proposed using "Signed-off-by" for
> > signing.
> > https://lore.kernel.org/all/489805e7-c19c-4b57-9cd7-713e075261cd@rowland.harvard.edu/
> > 
> 
> Ah, missed that, sorry.
> 
> thanks,
> 
> greg k-h
Greg KH Nov. 21, 2024, 7:09 p.m. UTC | #17
On Wed, Nov 06, 2024 at 09:29:15AM +0800, duanchenghao wrote:
> Hi Greg k-h & Alan,
> 
> Excuse me, both of you. I've noticed that you haven't replied to the
> emails for quite some time, which prompted me to send this one. I'd
> like to inquire if the proposal in the current email is feasible and if
> it can be integrated into the community.

Sorry, I missed this in the last round of patch reviews.  I'll queue it
up after 6.13-rc1 is out as the merge window is closed for adding new
stuff to my trees.

thanks,

greg k-h
diff mbox series

Patch

diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index 1ff7d901fede..a6bd0fbd82f4 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -2389,7 +2389,6 @@  void usb_hcd_resume_root_hub (struct usb_hcd *hcd)
 	spin_lock_irqsave (&hcd_root_hub_lock, flags);
 	if (hcd->rh_registered) {
 		pm_wakeup_event(&hcd->self.root_hub->dev, 0);
-		set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags);
 		queue_work(pm_wq, &hcd->wakeup_work);
 	}
 	spin_unlock_irqrestore (&hcd_root_hub_lock, flags);
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 4b93c0bd1d4b..7f847c4afc0d 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -3835,11 +3835,14 @@  int usb_port_resume(struct usb_device *udev, pm_message_t msg)
 
 int usb_remote_wakeup(struct usb_device *udev)
 {
+	struct usb_hcd  *hcd = bus_to_hcd(udev->bus);
 	int	status = 0;
 
 	usb_lock_device(udev);
 	if (udev->state == USB_STATE_SUSPENDED) {
 		dev_dbg(&udev->dev, "usb %sresume\n", "wakeup-");
+		if (hcd->state == HC_STATE_SUSPENDED)
+			set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags);
 		status = usb_autoresume_device(udev);
 		if (status == 0) {
 			/* Let the drivers do their thing, then... */