diff mbox series

[03/12] PCI/ACPI: Add aux power grant notifier

Message ID 20250401153225.96379-4-anshuman.gupta@intel.com
State New
Headers show
Series VRAM Self Refresh | expand

Commit Message

Gupta, Anshuman April 1, 2025, 3:32 p.m. UTC
Adding a notifier to notify all PCIe child devices about the
block main power removal. It is needed because theoretically
multiple PCIe Endpoint devices on same Root Port
can request for AUX power and _DSM method can return with
80000000h signifies that the hierarchy connected via
the slot cannot support main power removal when in D3Cold.
This may disrupt functionality of other child device.

Let's notify all PCIe devices requested Aux power resource
and Let PCIe End Point driver handle it in its callback.

Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
---
 drivers/pci/pci-acpi.c   | 34 +++++++++++++++++++++++++++++++---
 include/linux/pci-acpi.h | 13 +++++++++++++
 2 files changed, 44 insertions(+), 3 deletions(-)

Comments

Gupta, Anshuman April 3, 2025, 7:56 a.m. UTC | #1
> -----Original Message-----
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Wednesday, April 2, 2025 1:44 AM
> To: Gupta, Anshuman <anshuman.gupta@intel.com>
> Cc: intel-xe@lists.freedesktop.org; linux-acpi@vger.kernel.org; linux-
> pci@vger.kernel.org; rafael@kernel.org; lenb@kernel.org;
> bhelgaas@google.com; ilpo.jarvinen@linux.intel.com; De Marchi, Lucas
> <lucas.demarchi@intel.com>; Vivi, Rodrigo <rodrigo.vivi@intel.com>; Nilawar,
> Badal <badal.nilawar@intel.com>; Gupta, Varun <varun.gupta@intel.com>;
> ville.syrjala@linux.intel.com; Shankar, Uma <uma.shankar@intel.com>
> Subject: Re: [PATCH 03/12] PCI/ACPI: Add aux power grant notifier
> 
> On Tue, Apr 01, 2025 at 09:02:16PM +0530, Anshuman Gupta wrote:
> > Adding a notifier to notify all PCIe child devices about the block
> > main power removal. It is needed because theoretically multiple PCIe
> > Endpoint devices on same Root Port can request for AUX power and _DSM
> > method can return with 80000000h signifies that the hierarchy
> > connected via the slot cannot support main power removal when in
> > D3Cold.
> 
> I wish the spec used different language here.  "D3cold" *means* "main power
> is removed" (PCIe r6.0, sec 5.3.1.4.2), so it doesn't make sense to say that
> "the slot cannot support main power removal when in D3cold".  If a device is
> in D3cold, its main power has been removed by definition.
> 
> I suppose the spec is trying to say if the driver has called this _DSM with
> 80000000h, it means the platform must not remove main power from the
> device while the system is in S0?  Is that what you think it means?
Thanks for review.
what I understand here, S0 term essentially means here PCIe Runtime PM or s2idle 
(both refers system is S0 state). AFAIU during both Runtime PM and s2ilde path 
Root Port can transition  to D3Cold if it support _PR3. 
I observed Root Port either have D0 or D3Cold state.
/sys/bus/pci0/devices/$root_port_bdf/firmware_node/real_power_state

Driver can disable the D3cold by pci_d3cold_disable(), so I wonder there is no use
case driver calling _DSM 0xa with 80000000h.  
PCIe specs shall be simplified by removal of value 80000000h from _DSM 0xa Arg.
> 
> The 2h return value description says it "indicates that the platform will not
> remove main power from the slot while the system is in S0,"
> so I guess that must be it.
> 
> In this series, pci_acpi_aux_power_setup() only supplies a 16-bit
> aux_pwr_limit value, so the driver cannot *request* that the platform not
> remove main power.
Yeah but that value Xe driver received from GPU firmware.
> 
> So I guess the only scenario where the _DSM returns 80000000h is when the
> platform itself or other devices prevent the removal of main power.  And the
> point of the notifier is to tell devices that their main power will never be
> removed while the system is in S0.  Is that right?
Yes the notifier will safeguard against the use case if some other PCIe device
calls _DSM 0xa with Arg3 Value of 80000000h or its request return with 0x2h.
> 
> > This may disrupt functionality of other child device.
> 
> What sort of disruption could happen?  I would think that if the _DSM returns
> 80000000h, it just means the device will not have main power removed, so it
> won't see that power state transition.  The only "disruption" would be that
> we're using more power.
Let's say during Xe Driver initialization BIOS firmware grants the Xe driver 
Aux power request successfully.
Xe driver will prepare to transition D3Cold state with VRAM Self Refresh.
VRAM Self Refresh relies on vram shall derive power from Aux power.
But later at any point if some other PCIe device under same root port 
Calls _DSM either with 080000000h or _DSM returns with 02h, will
Block the main power removal. But Xe driver is unaware of it can still
program the VRAM Self Refresh and that make VRAM to derive power 
from Aux and that will disrupt the GPU VRAM as there is no switch to Aux
Power.
Thanks,
Anshuman
> 
> > Let's notify all PCIe devices requested Aux power resource and Let
> > PCIe End Point driver handle it in its callback.
> 
> Wrap to fill 75 columns.
> 
> s/Adding/Add/
> s/Let's notify/Notify/
> s/and Let/and let/
> s/End Point/Endpoint/ (several places here and below)
> 
> > Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
> > ---
> >  drivers/pci/pci-acpi.c   | 34 +++++++++++++++++++++++++++++++---
> >  include/linux/pci-acpi.h | 13 +++++++++++++
> >  2 files changed, 44 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c index
> > 04149f037664..d1ca1649e6e8 100644
> > --- a/drivers/pci/pci-acpi.c
> > +++ b/drivers/pci/pci-acpi.c
> > @@ -1421,6 +1421,32 @@ static void pci_acpi_optimize_delay(struct
> pci_dev *pdev,
> >  	ACPI_FREE(obj);
> >  }
> >
> > +static BLOCKING_NOTIFIER_HEAD(pci_acpi_aux_power_notify_list);
> > +
> > +/**
> > + * pci_acpi_register_aux_power_notifier - Register driver notifier
> > + * @nb: notifier block
> > + *
> > + * This function shall be called by PCIe End Point device requested
> > +the Aux
> > + * power resource in order to handle the any scenario gracefully when
> > +other
> > + * child PCIe devices Aux power request returns with No main power
> removal.
> > + * PCIe devices which register this notifier shall handle No main
> > +power
> > + * removal scenario accordingly.
> 
> This would actually be called by the *driver* (not by the device).
> 
> Reword in imperative mood if possible.
> 
> > + *
> > + * Return: Returns 0 on success and errno on failure.
> > + */
> > +int pci_acpi_register_aux_power_notifier(struct notifier_block *nb) {
> > +	return
> > +blocking_notifier_chain_register(&pci_acpi_aux_power_notify_list,
> > +nb); } EXPORT_SYMBOL_GPL(pci_acpi_register_aux_power_notifier);
> > +
> > +void pci_acpi_unregister_aux_power_notifier(struct notifier_block
> > +*nb) {
> > +	blocking_notifier_chain_unregister(&pci_acpi_aux_power_notify_list,
> > +nb); } EXPORT_SYMBOL_GPL(pci_acpi_unregister_aux_power_notifier);
> > +
> >  /**
> >   * pci_acpi_request_d3cold_aux_power - Request D3cold aux power via
> ACPI DSM
> >   * @dev: PCI device instance
> > @@ -1466,17 +1492,19 @@ int
> pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32
> requested_power)
> >  	result = out_obj->integer.value;
> >
> >  	switch (result) {
> > -	case 0x0:
> > +	case ACPI_AUX_PW_DENIED:
> 
> Add these constants in the patch that adds the _DSM.  Then this patch will be
> just notifier-related code.
> 
> >  		dev_dbg(&dev->dev, "D3cold Aux Power %umW request
> denied\n",
> >  			requested_power);
> >  		break;
> > -	case 0x1:
> > +	case ACPI_AUX_PW_GRANTED:
> >  		dev_info(&dev->dev, "D3cold Aux Power request granted:
> %umW\n",
> >  			 requested_power);
> >  		ret = 0;
> >  		break;
> > -	case 0x2:
> > +	case ACPI_NO_MAIN_PW_REMOVAL:
> >  		dev_info(&dev->dev, "D3cold Aux Power: Main power won't
> be
> > removed\n");
> > +		blocking_notifier_call_chain(&pci_acpi_aux_power_notify_list,
> > +					     ACPI_NO_MAIN_PW_REMOVAL,
> dev);
> >  		ret = -EBUSY;
> >  		break;
> >  	default:
> > diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h index
> > 4b7373f91a9a..793b979af98b 100644
> > --- a/include/linux/pci-acpi.h
> > +++ b/include/linux/pci-acpi.h
> > @@ -124,6 +124,10 @@ extern const guid_t pci_acpi_dsm_guid;
> >  #define DSM_PCI_D3COLD_AUX_POWER_LIMIT		0x0A
> >  #define DSM_PCI_PERST_ASSERTION_DELAY		0x0B
> >
> > +#define ACPI_AUX_PW_DENIED			0x0
> > +#define ACPI_AUX_PW_GRANTED			0x1
> > +#define ACPI_NO_MAIN_PW_REMOVAL			0x2
> > +
> >  #ifdef CONFIG_PCIE_EDR
> >  void pci_acpi_add_edr_notifier(struct pci_dev *pdev);  void
> > pci_acpi_remove_edr_notifier(struct pci_dev *pdev); @@ -134,12 +138,21
> > @@ static inline void pci_acpi_remove_edr_notifier(struct pci_dev
> > *pdev) { }
> >
> >  int pci_acpi_set_companion_lookup_hook(struct acpi_device
> > *(*func)(struct pci_dev *));  void
> > pci_acpi_clear_companion_lookup_hook(void);
> > +int pci_acpi_register_aux_power_notifier(struct notifier_block *nb);
> > +void pci_acpi_unregister_aux_power_notifier(struct notifier_block
> > +*nb);
> >  int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32
> > requested_power);  int pci_acpi_add_perst_assertion_delay(struct
> > pci_dev *dev, u32 delay_us);
> >
> >  #else	/* CONFIG_ACPI */
> >  static inline void acpi_pci_add_bus(struct pci_bus *bus) { }  static
> > inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
> > +int pci_acpi_register_aux_power_notifier(struct notifier_block *nb) {
> > +	return -EOPNOTSUPP;
> > +}
> > +
> > +void pci_acpi_unregister_aux_power_notifier(struct notifier_block
> > +*nb) { }
> > +
> >  int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32
> > requested_power)  {
> >  	return -EOPNOTSUPP;
> > --
> > 2.43.0
> >
Gupta, Anshuman April 3, 2025, 11:30 a.m. UTC | #2
> -----Original Message-----
> From: Rafael J. Wysocki <rafael@kernel.org>
> Sent: Wednesday, April 2, 2025 4:54 PM
> To: Bjorn Helgaas <helgaas@kernel.org>; Gupta, Anshuman
> <anshuman.gupta@intel.com>
> Cc: intel-xe@lists.freedesktop.org; linux-acpi@vger.kernel.org; linux-
> pci@vger.kernel.org; rafael@kernel.org; lenb@kernel.org;
> bhelgaas@google.com; ilpo.jarvinen@linux.intel.com; De Marchi, Lucas
> <lucas.demarchi@intel.com>; Vivi, Rodrigo <rodrigo.vivi@intel.com>; Nilawar,
> Badal <badal.nilawar@intel.com>; Gupta, Varun <varun.gupta@intel.com>;
> ville.syrjala@linux.intel.com; Shankar, Uma <uma.shankar@intel.com>
> Subject: Re: [PATCH 03/12] PCI/ACPI: Add aux power grant notifier
> 
> On Tue, Apr 1, 2025 at 10:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Tue, Apr 01, 2025 at 09:02:16PM +0530, Anshuman Gupta wrote:
> > > Adding a notifier to notify all PCIe child devices about the block
> > > main power removal. It is needed because theoretically multiple PCIe
> > > Endpoint devices on same Root Port can request for AUX power and
> > > _DSM method can return with 80000000h signifies that the hierarchy
> > > connected via the slot cannot support main power removal when in
> > > D3Cold.
> >
> > I wish the spec used different language here.  "D3cold" *means* "main
> > power is removed" (PCIe r6.0, sec 5.3.1.4.2), so it doesn't make sense
> > to say that "the slot cannot support main power removal when in
> > D3cold".  If a device is in D3cold, its main power has been removed by
> > definition.
> >
> > I suppose the spec is trying to say if the driver has called this _DSM
> > with 80000000h, it means the platform must not remove main power from
> > the device while the system is in S0?  Is that what you think it
> > means?
> >
> > The 2h return value description says it "indicates that the platform
> > will not remove main power from the slot while the system is in S0,"
> > so I guess that must be it.
> >
> > In this series, pci_acpi_aux_power_setup() only supplies a 16-bit
> > aux_pwr_limit value, so the driver cannot *request* that the platform
> > not remove main power.
> >
> > So I guess the only scenario where the _DSM returns 80000000h is when
> > the platform itself or other devices prevent the removal of main
> > power.  And the point of the notifier is to tell devices that their
> > main power will never be removed while the system is in S0.  Is that
> > right?
> >
> > > This may disrupt functionality of other child device.
> >
> > What sort of disruption could happen?  I would think that if the _DSM
> > returns 80000000h, it just means the device will not have main power
> > removed, so it won't see that power state transition.  The only
> > "disruption" would be that we're using more power.
> >
> > > Let's notify all PCIe devices requested Aux power resource and Let
> > > PCIe End Point driver handle it in its callback.
> >
> > Wrap to fill 75 columns.
> >
> > s/Adding/Add/
> > s/Let's notify/Notify/
> > s/and Let/and let/
> > s/End Point/Endpoint/ (several places here and below)
> >
> > > Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
> > > ---
> > >  drivers/pci/pci-acpi.c   | 34 +++++++++++++++++++++++++++++++---
> > >  include/linux/pci-acpi.h | 13 +++++++++++++
> > >  2 files changed, 44 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c index
> > > 04149f037664..d1ca1649e6e8 100644
> > > --- a/drivers/pci/pci-acpi.c
> > > +++ b/drivers/pci/pci-acpi.c
> > > @@ -1421,6 +1421,32 @@ static void pci_acpi_optimize_delay(struct
> pci_dev *pdev,
> > >       ACPI_FREE(obj);
> > >  }
> > >
> > > +static BLOCKING_NOTIFIER_HEAD(pci_acpi_aux_power_notify_list);
> > > +
> > > +/**
> > > + * pci_acpi_register_aux_power_notifier - Register driver notifier
> > > + * @nb: notifier block
> > > + *
> > > + * This function shall be called by PCIe End Point device requested
> > > +the Aux
> > > + * power resource in order to handle the any scenario gracefully
> > > +when other
> > > + * child PCIe devices Aux power request returns with No main power
> removal.
> > > + * PCIe devices which register this notifier shall handle No main
> > > +power
> > > + * removal scenario accordingly.
> >
> > This would actually be called by the *driver* (not by the device).
> 
Hi Rafael,
Thanks for review.
> Apart from this, there seems to be a design issue here because it won't notify
> every driver that has requested the Aux power, just the ones that have also
> registered notifiers.
IMHO that is what intended, if any device has functional impact in our case it is
INTEL GPU has functional impact if other PCIe device's (on same root port) Aux
Power request returns with 0x2. We need to handle such case to handle it gracefully.
> 
> So this appears to be an opt-in from getting notifications on Aux power
> request rejection responses to requests from other drivers requesting Aux
> power for the same Root Port, but the changelog of the first patch already
> claimed that the aggregation of requests was not supported.  So if only one
> driver will be allowed to request the Aux power for the given Root Port, why
> would the notifiers be necessary after all?
Please guide us, if we remove the above limitation from the commit log.
Then do we have any design issue with notifier approach ?
Thanks,
Anshuman.
Rafael J. Wysocki April 3, 2025, 1:34 p.m. UTC | #3
On Thu, Apr 3, 2025 at 1:30 PM Gupta, Anshuman <anshuman.gupta@intel.com> wrote:
>
> > -----Original Message-----
> > From: Rafael J. Wysocki <rafael@kernel.org>
> > Sent: Wednesday, April 2, 2025 4:54 PM
> > To: Bjorn Helgaas <helgaas@kernel.org>; Gupta, Anshuman
> > <anshuman.gupta@intel.com>
> > Cc: intel-xe@lists.freedesktop.org; linux-acpi@vger.kernel.org; linux-
> > pci@vger.kernel.org; rafael@kernel.org; lenb@kernel.org;
> > bhelgaas@google.com; ilpo.jarvinen@linux.intel.com; De Marchi, Lucas
> > <lucas.demarchi@intel.com>; Vivi, Rodrigo <rodrigo.vivi@intel.com>; Nilawar,
> > Badal <badal.nilawar@intel.com>; Gupta, Varun <varun.gupta@intel.com>;
> > ville.syrjala@linux.intel.com; Shankar, Uma <uma.shankar@intel.com>
> > Subject: Re: [PATCH 03/12] PCI/ACPI: Add aux power grant notifier
> >
> > On Tue, Apr 1, 2025 at 10:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >
> > > On Tue, Apr 01, 2025 at 09:02:16PM +0530, Anshuman Gupta wrote:
> > > > Adding a notifier to notify all PCIe child devices about the block
> > > > main power removal. It is needed because theoretically multiple PCIe
> > > > Endpoint devices on same Root Port can request for AUX power and
> > > > _DSM method can return with 80000000h signifies that the hierarchy
> > > > connected via the slot cannot support main power removal when in
> > > > D3Cold.
> > >
> > > I wish the spec used different language here.  "D3cold" *means* "main
> > > power is removed" (PCIe r6.0, sec 5.3.1.4.2), so it doesn't make sense
> > > to say that "the slot cannot support main power removal when in
> > > D3cold".  If a device is in D3cold, its main power has been removed by
> > > definition.
> > >
> > > I suppose the spec is trying to say if the driver has called this _DSM
> > > with 80000000h, it means the platform must not remove main power from
> > > the device while the system is in S0?  Is that what you think it
> > > means?
> > >
> > > The 2h return value description says it "indicates that the platform
> > > will not remove main power from the slot while the system is in S0,"
> > > so I guess that must be it.
> > >
> > > In this series, pci_acpi_aux_power_setup() only supplies a 16-bit
> > > aux_pwr_limit value, so the driver cannot *request* that the platform
> > > not remove main power.
> > >
> > > So I guess the only scenario where the _DSM returns 80000000h is when
> > > the platform itself or other devices prevent the removal of main
> > > power.  And the point of the notifier is to tell devices that their
> > > main power will never be removed while the system is in S0.  Is that
> > > right?
> > >
> > > > This may disrupt functionality of other child device.
> > >
> > > What sort of disruption could happen?  I would think that if the _DSM
> > > returns 80000000h, it just means the device will not have main power
> > > removed, so it won't see that power state transition.  The only
> > > "disruption" would be that we're using more power.
> > >
> > > > Let's notify all PCIe devices requested Aux power resource and Let
> > > > PCIe End Point driver handle it in its callback.
> > >
> > > Wrap to fill 75 columns.
> > >
> > > s/Adding/Add/
> > > s/Let's notify/Notify/
> > > s/and Let/and let/
> > > s/End Point/Endpoint/ (several places here and below)
> > >
> > > > Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
> > > > ---
> > > >  drivers/pci/pci-acpi.c   | 34 +++++++++++++++++++++++++++++++---
> > > >  include/linux/pci-acpi.h | 13 +++++++++++++
> > > >  2 files changed, 44 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c index
> > > > 04149f037664..d1ca1649e6e8 100644
> > > > --- a/drivers/pci/pci-acpi.c
> > > > +++ b/drivers/pci/pci-acpi.c
> > > > @@ -1421,6 +1421,32 @@ static void pci_acpi_optimize_delay(struct
> > pci_dev *pdev,
> > > >       ACPI_FREE(obj);
> > > >  }
> > > >
> > > > +static BLOCKING_NOTIFIER_HEAD(pci_acpi_aux_power_notify_list);
> > > > +
> > > > +/**
> > > > + * pci_acpi_register_aux_power_notifier - Register driver notifier
> > > > + * @nb: notifier block
> > > > + *
> > > > + * This function shall be called by PCIe End Point device requested
> > > > +the Aux
> > > > + * power resource in order to handle the any scenario gracefully
> > > > +when other
> > > > + * child PCIe devices Aux power request returns with No main power
> > removal.
> > > > + * PCIe devices which register this notifier shall handle No main
> > > > +power
> > > > + * removal scenario accordingly.
> > >
> > > This would actually be called by the *driver* (not by the device).
> >
> Hi Rafael,
> Thanks for review.
> > Apart from this, there seems to be a design issue here because it won't notify
> > every driver that has requested the Aux power, just the ones that have also
> > registered notifiers.
> IMHO that is what intended, if any device has functional impact in our case it is
> INTEL GPU has functional impact if other PCIe device's (on same root port) Aux
> Power request returns with 0x2. We need to handle such case to handle it gracefully.
> >
> > So this appears to be an opt-in from getting notifications on Aux power
> > request rejection responses to requests from other drivers requesting Aux
> > power for the same Root Port, but the changelog of the first patch already
> > claimed that the aggregation of requests was not supported.  So if only one
> > driver will be allowed to request the Aux power for the given Root Port, why
> > would the notifiers be necessary after all?
> >
> Please guide us, if we remove the above limitation from the commit log.
> Then do we have any design issue with notifier approach ?

Exactly like I said: If you only allow one driver to use the _DSM to
request the Aux power from a given Root Port, it will have all of the
information and does not need to be notified about any changes.  Since
no one else is allowed to use this interface for that Root Port, no
one else will need a notifier either.  For this to work, you need some
mechanism allowing drivers to claim the interface so no one else can
use it (on a per Root Port basis) which is currently missing AFAICS.
I think that this is the option you want to pursue.

OTOH, if you want to allow multiple drivers to use this interface for
the same Root Port, you need to aggregate the requests (as required by
the spec IIUC) in the first place, or else the users can override each
other's request.  In that case you'll likely need a notification
mechanism of some sort to let the requesters know whether or not the
Aux power will be provided, but maybe that can be done through a
callback associated with a request.

The second option is genuinely more complicated because all of the
devices requesting the Aux power from the same Root Port cannot be
suspended independently in that case, so until there is a clear
real-world use case for it, my recommendation would be to go for the
first option.

But the first option means exclusive access and so no need for
notifiers and such.
Gupta, Anshuman April 3, 2025, 4:08 p.m. UTC | #4
> -----Original Message-----
> From: Rafael J. Wysocki <rafael@kernel.org>
> Sent: Thursday, April 3, 2025 7:04 PM
> To: Gupta, Anshuman <anshuman.gupta@intel.com>
> Cc: Rafael J. Wysocki <rafael@kernel.org>; Bjorn Helgaas
> <helgaas@kernel.org>; intel-xe@lists.freedesktop.org; linux-
> acpi@vger.kernel.org; linux-pci@vger.kernel.org; lenb@kernel.org;
> bhelgaas@google.com; ilpo.jarvinen@linux.intel.com; De Marchi, Lucas
> <lucas.demarchi@intel.com>; Vivi, Rodrigo <rodrigo.vivi@intel.com>; Nilawar,
> Badal <badal.nilawar@intel.com>; Gupta, Varun <varun.gupta@intel.com>;
> ville.syrjala@linux.intel.com; Shankar, Uma <uma.shankar@intel.com>
> Subject: Re: [PATCH 03/12] PCI/ACPI: Add aux power grant notifier
> 
> On Thu, Apr 3, 2025 at 1:30 PM Gupta, Anshuman
> <anshuman.gupta@intel.com> wrote:
> >
> > > -----Original Message-----
> > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > Sent: Wednesday, April 2, 2025 4:54 PM
> > > To: Bjorn Helgaas <helgaas@kernel.org>; Gupta, Anshuman
> > > <anshuman.gupta@intel.com>
> > > Cc: intel-xe@lists.freedesktop.org; linux-acpi@vger.kernel.org;
> > > linux- pci@vger.kernel.org; rafael@kernel.org; lenb@kernel.org;
> > > bhelgaas@google.com; ilpo.jarvinen@linux.intel.com; De Marchi, Lucas
> > > <lucas.demarchi@intel.com>; Vivi, Rodrigo <rodrigo.vivi@intel.com>;
> > > Nilawar, Badal <badal.nilawar@intel.com>; Gupta, Varun
> > > <varun.gupta@intel.com>; ville.syrjala@linux.intel.com; Shankar, Uma
> > > <uma.shankar@intel.com>
> > > Subject: Re: [PATCH 03/12] PCI/ACPI: Add aux power grant notifier
> > >
> > > On Tue, Apr 1, 2025 at 10:13 PM Bjorn Helgaas <helgaas@kernel.org>
> wrote:
> > > >
> > > > On Tue, Apr 01, 2025 at 09:02:16PM +0530, Anshuman Gupta wrote:
> > > > > Adding a notifier to notify all PCIe child devices about the
> > > > > block main power removal. It is needed because theoretically
> > > > > multiple PCIe Endpoint devices on same Root Port can request for
> > > > > AUX power and _DSM method can return with 80000000h signifies
> > > > > that the hierarchy connected via the slot cannot support main
> > > > > power removal when in D3Cold.
> > > >
> > > > I wish the spec used different language here.  "D3cold" *means*
> > > > "main power is removed" (PCIe r6.0, sec 5.3.1.4.2), so it doesn't
> > > > make sense to say that "the slot cannot support main power removal
> > > > when in D3cold".  If a device is in D3cold, its main power has
> > > > been removed by definition.
> > > >
> > > > I suppose the spec is trying to say if the driver has called this
> > > > _DSM with 80000000h, it means the platform must not remove main
> > > > power from the device while the system is in S0?  Is that what you
> > > > think it means?
> > > >
> > > > The 2h return value description says it "indicates that the
> > > > platform will not remove main power from the slot while the system is in
> S0,"
> > > > so I guess that must be it.
> > > >
> > > > In this series, pci_acpi_aux_power_setup() only supplies a 16-bit
> > > > aux_pwr_limit value, so the driver cannot *request* that the
> > > > platform not remove main power.
> > > >
> > > > So I guess the only scenario where the _DSM returns 80000000h is
> > > > when the platform itself or other devices prevent the removal of
> > > > main power.  And the point of the notifier is to tell devices that
> > > > their main power will never be removed while the system is in S0.
> > > > Is that right?
> > > >
> > > > > This may disrupt functionality of other child device.
> > > >
> > > > What sort of disruption could happen?  I would think that if the
> > > > _DSM returns 80000000h, it just means the device will not have
> > > > main power removed, so it won't see that power state transition.
> > > > The only "disruption" would be that we're using more power.
> > > >
> > > > > Let's notify all PCIe devices requested Aux power resource and
> > > > > Let PCIe End Point driver handle it in its callback.
> > > >
> > > > Wrap to fill 75 columns.
> > > >
> > > > s/Adding/Add/
> > > > s/Let's notify/Notify/
> > > > s/and Let/and let/
> > > > s/End Point/Endpoint/ (several places here and below)
> > > >
> > > > > Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
> > > > > ---
> > > > >  drivers/pci/pci-acpi.c   | 34 +++++++++++++++++++++++++++++++---
> > > > >  include/linux/pci-acpi.h | 13 +++++++++++++
> > > > >  2 files changed, 44 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> > > > > index
> > > > > 04149f037664..d1ca1649e6e8 100644
> > > > > --- a/drivers/pci/pci-acpi.c
> > > > > +++ b/drivers/pci/pci-acpi.c
> > > > > @@ -1421,6 +1421,32 @@ static void
> > > > > pci_acpi_optimize_delay(struct
> > > pci_dev *pdev,
> > > > >       ACPI_FREE(obj);
> > > > >  }
> > > > >
> > > > > +static BLOCKING_NOTIFIER_HEAD(pci_acpi_aux_power_notify_list);
> > > > > +
> > > > > +/**
> > > > > + * pci_acpi_register_aux_power_notifier - Register driver
> > > > > +notifier
> > > > > + * @nb: notifier block
> > > > > + *
> > > > > + * This function shall be called by PCIe End Point device
> > > > > +requested the Aux
> > > > > + * power resource in order to handle the any scenario
> > > > > +gracefully when other
> > > > > + * child PCIe devices Aux power request returns with No main
> > > > > +power
> > > removal.
> > > > > + * PCIe devices which register this notifier shall handle No
> > > > > +main power
> > > > > + * removal scenario accordingly.
> > > >
> > > > This would actually be called by the *driver* (not by the device).
> > >
> > Hi Rafael,
> > Thanks for review.
> > > Apart from this, there seems to be a design issue here because it
> > > won't notify every driver that has requested the Aux power, just the
> > > ones that have also registered notifiers.
> > IMHO that is what intended, if any device has functional impact in our
> > case it is INTEL GPU has functional impact if other PCIe device's (on
> > same root port) Aux Power request returns with 0x2. We need to handle
> such case to handle it gracefully.
> > >
> > > So this appears to be an opt-in from getting notifications on Aux
> > > power request rejection responses to requests from other drivers
> > > requesting Aux power for the same Root Port, but the changelog of
> > > the first patch already claimed that the aggregation of requests was
> > > not supported.  So if only one driver will be allowed to request the
> > > Aux power for the given Root Port, why would the notifiers be necessary
> after all?
> > >
> > Please guide us, if we remove the above limitation from the commit log.
> > Then do we have any design issue with notifier approach ?
> 
> Exactly like I said: If you only allow one driver to use the _DSM to request the
> Aux power from a given Root Port, it will have all of the information and does
> not need to be notified about any changes.  Since no one else is allowed to use
> this interface for that Root Port, no one else will need a notifier either.  For this
> to work, you need some mechanism allowing drivers to claim the interface so
> no one else can use it (on a per Root Port basis) which is currently missing
> AFAICS.
IMHO such kind of mechanism will require to add Root Port specific data structure to claim 
the interface. But real problem is the criteria  to claim the interface.
Is first PCIe Non-Bridge Endpoint Function 0 driver can be criteria  to claim the
Interface. Or first come and first serve approach ?
> I think that this is the option you want to pursue.
> 
We will prefer to stick with this option, if it can guaranty that XeKMD will the only driver
allowed to request Aux power to enable VRSR.
> OTOH, if you want to allow multiple drivers to use this interface for the same
> Root Port, you need to aggregate the requests (as required by the spec IIUC) in
> the first place, or else the users can override each other's request.  In that case
INHO how Linux Kernel will aggregate the Aux Power request, without knowing the
total Aux power budget. AFAIU aggregated Aux power request without knowledge of total 
power budget can also be denied by BIOS[1].  
Therefore how aggregation can be useful ?
  
[1]As per r3.3 Section 4.6.10 
Platform Firmware Budgeting of Aux Power Availability:
Platform firmware must not grant more power than what is available within the system. For 
example, a board may be designed with four CEM slots (one x16 slot, one x4 slot, and two x1 
slots). The board may implement a power delivery circuit capable of supplying 2 W of power for 
the 3.3 Vaux rail supplying all 4 slots. The 3.3 Vaux pins on each CEM slot can supply 1 W each. 
Platform firmware may use the retry mechanism to prioritize requests from devices in preferred 
slots in the following manner:

-> Requests from a device in the highest priority slot (e.g., x16) are granted immediately, if 
available. 

-> Requests from devices in lower priority slots (e.g., x2, x1) are initially rejected, with a retry 
interval inversely proportional to the slot priority. For instance, if the x2 slot is higher priority 
than the x1 slots, so the retry interval for the x2 slot may be 4 seconds, and the x1 slots may be 
8 and 10 seconds.

->As requests are granted, the granted values are subtracted from the available budget.
 Retried requests are granted based on the remaining power budget or denied if insufficient 
power budget is available. Another retry is not permitted.

-> When there is insufficient power budget for a request, firmware may choose to keep main 
power on and return no power removal (2h).

Thanks,
Anshuman.
> you'll likely need a notification mechanism of some sort to let the requesters
> know whether or not the Aux power will be provided, but maybe that can be
> done through a callback associated with a request.
> 
> The second option is genuinely more complicated because all of the devices
> requesting the Aux power from the same Root Port cannot be suspended
> independently in that case, so until there is a clear real-world use case for it,
> my recommendation would be to go for the first option.
Multiple devices under same root ports requesting Aux Power is a theoretical 
use case.   
> 
> But the first option means exclusive access and so no need for notifiers and
> such.
diff mbox series

Patch

diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index 04149f037664..d1ca1649e6e8 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -1421,6 +1421,32 @@  static void pci_acpi_optimize_delay(struct pci_dev *pdev,
 	ACPI_FREE(obj);
 }
 
+static BLOCKING_NOTIFIER_HEAD(pci_acpi_aux_power_notify_list);
+
+/**
+ * pci_acpi_register_aux_power_notifier - Register driver notifier
+ * @nb: notifier block
+ *
+ * This function shall be called by PCIe End Point device requested the Aux
+ * power resource in order to handle the any scenario gracefully when other
+ * child PCIe devices Aux power request returns with No main power removal.
+ * PCIe devices which register this notifier shall handle No main power
+ * removal scenario accordingly.
+ *
+ * Return: Returns 0 on success and errno on failure.
+ */
+int pci_acpi_register_aux_power_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&pci_acpi_aux_power_notify_list, nb);
+}
+EXPORT_SYMBOL_GPL(pci_acpi_register_aux_power_notifier);
+
+void pci_acpi_unregister_aux_power_notifier(struct notifier_block *nb)
+{
+	blocking_notifier_chain_unregister(&pci_acpi_aux_power_notify_list, nb);
+}
+EXPORT_SYMBOL_GPL(pci_acpi_unregister_aux_power_notifier);
+
 /**
  * pci_acpi_request_d3cold_aux_power - Request D3cold aux power via ACPI DSM
  * @dev: PCI device instance
@@ -1466,17 +1492,19 @@  int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32 requested_power)
 	result = out_obj->integer.value;
 
 	switch (result) {
-	case 0x0:
+	case ACPI_AUX_PW_DENIED:
 		dev_dbg(&dev->dev, "D3cold Aux Power %umW request denied\n",
 			requested_power);
 		break;
-	case 0x1:
+	case ACPI_AUX_PW_GRANTED:
 		dev_info(&dev->dev, "D3cold Aux Power request granted: %umW\n",
 			 requested_power);
 		ret = 0;
 		break;
-	case 0x2:
+	case ACPI_NO_MAIN_PW_REMOVAL:
 		dev_info(&dev->dev, "D3cold Aux Power: Main power won't be removed\n");
+		blocking_notifier_call_chain(&pci_acpi_aux_power_notify_list,
+					     ACPI_NO_MAIN_PW_REMOVAL, dev);
 		ret = -EBUSY;
 		break;
 	default:
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 4b7373f91a9a..793b979af98b 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -124,6 +124,10 @@  extern const guid_t pci_acpi_dsm_guid;
 #define DSM_PCI_D3COLD_AUX_POWER_LIMIT		0x0A
 #define DSM_PCI_PERST_ASSERTION_DELAY		0x0B
 
+#define ACPI_AUX_PW_DENIED			0x0
+#define ACPI_AUX_PW_GRANTED			0x1
+#define ACPI_NO_MAIN_PW_REMOVAL			0x2
+
 #ifdef CONFIG_PCIE_EDR
 void pci_acpi_add_edr_notifier(struct pci_dev *pdev);
 void pci_acpi_remove_edr_notifier(struct pci_dev *pdev);
@@ -134,12 +138,21 @@  static inline void pci_acpi_remove_edr_notifier(struct pci_dev *pdev) { }
 
 int pci_acpi_set_companion_lookup_hook(struct acpi_device *(*func)(struct pci_dev *));
 void pci_acpi_clear_companion_lookup_hook(void);
+int pci_acpi_register_aux_power_notifier(struct notifier_block *nb);
+void pci_acpi_unregister_aux_power_notifier(struct notifier_block *nb);
 int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32 requested_power);
 int pci_acpi_add_perst_assertion_delay(struct pci_dev *dev, u32 delay_us);
 
 #else	/* CONFIG_ACPI */
 static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
 static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
+int pci_acpi_register_aux_power_notifier(struct notifier_block *nb)
+{
+	return -EOPNOTSUPP;
+}
+
+void pci_acpi_unregister_aux_power_notifier(struct notifier_block *nb) { }
+
 int pci_acpi_request_d3cold_aux_power(struct pci_dev *dev, u32 requested_power)
 {
 	return -EOPNOTSUPP;