[v4,0/8] Add PCIe bandwidth controller

Message ID 20240105112547.7301-1-ilpo.jarvinen@linux.intel.com

Message

Ilpo Järvinen Jan. 5, 2024, 11:25 a.m. UTC
Hi all,

This series adds a PCIe bandwidth controller (bwctrl) and an associated
PCIe cooling driver on the thermal core side for limiting PCIe Link
Speed for thermal reasons. The PCIe bandwidth controller is a PCI
Express bus port service driver. A cooling device is created for each
port the service driver finds, provided the port supports changing
speeds.

This series only adds support for controlling PCIe Link Speed.
Controlling PCIe Link Width might also be useful but AFAIK, there is no
mechanism for that until PCIe 6.0 (L0p) so Link Width throttling is not
added by this series.

bwctrl is built on top of the revert that re-adds BW notifications.

I tried to look into using cached link speed values more in code
unrelated to what bwctrl needs, but every case I looked at was
non-trivial, so I left such attempts as further work.

v4:
- Merge Port's and Endpoint's Supported Link Speeds Vectors into
  supported_speeds in the struct pci_bus
- Reuse pcie_get_speed_cap()'s code for pcie_get_supported_speeds()
- Setup supported_speeds with PCI_EXP_LNKCAP2_SLS_2_5GB when no
  Endpoint exists
- Squash revert + add bwctrl patches into one
- Change to use threaded IRQ + IRQF_ONESHOT
- Also enable LABIE / LABS
- Convert Link Speed selection to use bit logic instead of loop
- Allocate before requesting IRQ during probe
- Use devm_*()
- Use u8 for speed_conv array instead of u16
- Removed READ_ONCE()
- Improve changelogs, comments, and Kconfig
- Name functions slightly more consistently
- Use bullet list for RMW protected registers in docs
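
To illustrate the "bit logic instead of loop" item above, here is a minimal
userspace sketch, not the code from the patch. It assumes the Supported Link
Speeds Vector layout from include/uapi/linux/pci_regs.h, where bit N
corresponds to Target Link Speed encoding N. The helper names
select_target_speed() and highest_bit() are hypothetical, and
__builtin_clz() is a GCC/Clang builtin.

```c
#include <stdint.h>

/* Supported Link Speeds Vector bits (values as in pci_regs.h). */
#define PCI_EXP_LNKCAP2_SLS_2_5GB	0x02
#define PCI_EXP_LNKCAP2_SLS_5_0GB	0x04
#define PCI_EXP_LNKCAP2_SLS_8_0GB	0x08
#define PCI_EXP_LNKCAP2_SLS_16_0GB	0x10
#define PCI_EXP_LNKCAP2_SLS_32_0GB	0x20

/* Target Link Speed encoding for 2.5 GT/s, used as the fallback. */
#define PCI_EXP_LNKCTL2_TLS_2_5GT	0x1

/* Hypothetical helper: index of the highest set bit; v must be non-zero. */
static unsigned int highest_bit(uint8_t v)
{
	return (unsigned int)(sizeof(unsigned int) * 8 - 1) -
	       (unsigned int)__builtin_clz((unsigned int)v);
}

/*
 * Hypothetical helper: pick the highest supported speed that does not
 * exceed the requested Target Link Speed encoding (tls_req, 1..6).
 */
static unsigned int select_target_speed(uint8_t supported_speeds,
					unsigned int tls_req)
{
	/* Mask covering bits 1..tls_req, i.e. every speed up to the request. */
	uint8_t desired = (uint8_t)(((1u << (tls_req + 1)) - 1) & ~1u);
	uint8_t usable = supported_speeds & desired;

	/* Nothing usable: fall back to 2.5 GT/s, which every link supports. */
	if (!usable)
		return PCI_EXP_LNKCTL2_TLS_2_5GT;

	return highest_bit(usable);
}
```

With a vector advertising 2.5 to 16 GT/s, a request for 32 GT/s (encoding 5)
is clamped to encoding 4 without iterating over the individual speeds.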

v3:
- Correct hfi1 shortlog prefix
- Improve error prints in hfi1
- Add L: linux-pci to the MAINTAINERS entry

v2:
- Add LNKCTL2 to RMW safe list in Documentation/PCI/pciebus-howto.rst
- Renamed cooling devices from PCIe_Port_* to PCIe_Port_Link_Speed_* in
  order to plan for possibility of adding Link Width cooling devices
  later on
- Moved struct thermal_cooling_device declaration to the correct patch
- Small tweaks to Kconfig texts
- Series rebased to resolve conflict (in the selftest list)


Ilpo Järvinen (8):
  PCI: Protect Link Control 2 Register with RMW locking
  drm/radeon: Use RMW accessors for changing LNKCTL2
  drm/amdgpu: Use RMW accessors for changing LNKCTL2
  RDMA/hfi1: Use RMW accessors for changing LNKCTL2
  PCI: Store all PCIe Supported Link Speeds
  PCI/link: Re-add BW notification portdrv as PCIe BW controller
  thermal: Add PCIe cooling driver
  selftests/pcie_bwctrl: Create selftests

 Documentation/PCI/pciebus-howto.rst           |  14 +-
 MAINTAINERS                                   |   9 +
 drivers/gpu/drm/amd/amdgpu/cik.c              |  41 +--
 drivers/gpu/drm/amd/amdgpu/si.c               |  41 +--
 drivers/gpu/drm/radeon/cik.c                  |  40 +--
 drivers/gpu/drm/radeon/si.c                   |  40 +--
 drivers/infiniband/hw/hfi1/pcie.c             |  30 +-
 drivers/pci/pci.c                             |  59 ++--
 drivers/pci/pcie/Kconfig                      |  12 +
 drivers/pci/pcie/Makefile                     |   1 +
 drivers/pci/pcie/bwctrl.c                     | 269 ++++++++++++++++++
 drivers/pci/pcie/portdrv.c                    |   9 +-
 drivers/pci/pcie/portdrv.h                    |  10 +-
 drivers/pci/probe.c                           |   8 +
 drivers/pci/remove.c                          |   3 +
 drivers/thermal/Kconfig                       |  10 +
 drivers/thermal/Makefile                      |   2 +
 drivers/thermal/pcie_cooling.c                | 107 +++++++
 include/linux/pci-bwctrl.h                    |  33 +++
 include/linux/pci.h                           |  11 +
 include/uapi/linux/pci_regs.h                 |   1 +
 tools/testing/selftests/Makefile              |   1 +
 tools/testing/selftests/pcie_bwctrl/Makefile  |   2 +
 .../pcie_bwctrl/set_pcie_cooling_state.sh     | 122 ++++++++
 .../selftests/pcie_bwctrl/set_pcie_speed.sh   |  67 +++++
 25 files changed, 789 insertions(+), 153 deletions(-)
 create mode 100644 drivers/pci/pcie/bwctrl.c
 create mode 100644 drivers/thermal/pcie_cooling.c
 create mode 100644 include/linux/pci-bwctrl.h
 create mode 100644 tools/testing/selftests/pcie_bwctrl/Makefile
 create mode 100755 tools/testing/selftests/pcie_bwctrl/set_pcie_cooling_state.sh
 create mode 100755 tools/testing/selftests/pcie_bwctrl/set_pcie_speed.sh

Comments

Christophe JAILLET Jan. 5, 2024, 11:56 a.m. UTC | #1
On 05/01/2024 at 12:25, Ilpo Järvinen wrote:
> Add a thermal cooling driver to provide a path to access the PCIe
> bandwidth controller using the usual thermal interfaces.
> 
> A cooling device is instantiated for controllable PCIe Ports from the
> bwctrl service driver.
> 
> The thermal side state 0 means no throttling, i.e., maximum supported
> PCIe Link Speed.
> 
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> Acked-by: Rafael J. Wysocki <rafael@kernel.org> # From the cooling device interface perspective
> ---

...

> +struct thermal_cooling_device *pcie_cooling_device_register(struct pci_dev *port,
> +							    struct pcie_device *pdev)
> +{
> +	struct pcie_cooling_device *pcie_cdev;
> +	struct thermal_cooling_device *cdev;
> +	size_t name_len;
> +	char *name;
> +
> +	pcie_cdev = kzalloc(sizeof(*pcie_cdev), GFP_KERNEL);
> +	if (!pcie_cdev)
> +		return ERR_PTR(-ENOMEM);
> +
> +	pcie_cdev->port = port;
> +	pcie_cdev->pdev = pdev;
> +
> +	name_len = strlen(COOLING_DEV_TYPE_PREFIX) + strlen(pci_name(port)) + 1;
> +	name = kzalloc(name_len, GFP_KERNEL);
> +	if (!name) {
> +		kfree(pcie_cdev);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	snprintf(name, name_len, COOLING_DEV_TYPE_PREFIX "%s", pci_name(port));

Nit: kasprintf() ?

> +	cdev = thermal_cooling_device_register(name, pcie_cdev, &pcie_cooling_ops);
> +	kfree(name);
> +
> +	return cdev;
> +}
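
For reference, the kasprintf() nit above would presumably collapse the
strlen() + kzalloc() + snprintf() sequence into a single call along the
lines of kasprintf(GFP_KERNEL, COOLING_DEV_TYPE_PREFIX "%s", pci_name(port)),
leaving only one NULL check. Below is a minimal userspace sketch of the same
pattern. sketch_asprintf() is a hypothetical stand-in for kasprintf() without
the gfp argument, cooling_dev_name() is a hypothetical wrapper, and the prefix
value is assumed from the v2 changelog.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed value, taken from the naming described in the v2 changelog. */
#define COOLING_DEV_TYPE_PREFIX "PCIe_Port_Link_Speed_"

/*
 * Hypothetical userspace stand-in for the kernel's kasprintf():
 * measure the formatted length, allocate, then format into the buffer.
 * Returns NULL on failure; the caller frees the result.
 */
static char *sketch_asprintf(const char *fmt, ...)
{
	va_list ap, ap2;
	char *buf;
	int len;

	va_start(ap, fmt);
	va_copy(ap2, ap);
	len = vsnprintf(NULL, 0, fmt, ap);	/* length without the NUL */
	va_end(ap);

	if (len < 0) {
		va_end(ap2);
		return NULL;
	}

	buf = malloc((size_t)len + 1);
	if (buf)
		vsnprintf(buf, (size_t)len + 1, fmt, ap2);
	va_end(ap2);

	return buf;
}

/* Hypothetical wrapper: build the cooling device name in one step. */
static char *cooling_dev_name(const char *pci_name)
{
	return sketch_asprintf(COOLING_DEV_TYPE_PREFIX "%s", pci_name);
}
```

The manual length computation goes away, so the buffer size can no longer
drift out of sync with the format string.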