[v4,0/7] Add managed SOFT RESERVE resource handling

Message ID 20250603221949.53272-1-Smita.KoralahalliChannabasappa@amd.com

Message

Koralahalli Channabasappa, Smita June 3, 2025, 10:19 p.m. UTC
Add the ability to manage SOFT RESERVE iomem resources prior to them being
added to the iomem resource tree. This allows drivers, such as CXL, to
remove any pieces of the SOFT RESERVE resource that intersect with created
CXL regions.

The current approach of leaving the SOFT RESERVE resources as is can cause
failures during hotplug of devices, such as CXL, because the resource is
not available for reuse after teardown of the device.

The approach is to add SOFT RESERVE resources to a separate tree during
boot. This allows any drivers to update the SOFT RESERVE resources before
they are merged into the iomem resource tree. In addition a notifier chain
is added so that drivers can be notified when these SOFT RESERVE resources
are added to the iomem resource tree.

The CXL driver is modified to use a worker thread that waits for the CXL
PCI and CXL mem drivers to be loaded and for their probe routine to
complete. Then the driver walks through any created CXL regions to trim any
intersections with SOFT RESERVE resources in the iomem tree.

The dax driver uses the new soft reserve notifier chain so it can consume
any remaining SOFT RESERVES once they're added to the iomem tree.
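
For reference, a condensed sketch of the CXL-side flow described above, drawn
from patches 4 and 5 of this series (not the exact patch code):

/* cxl_acpi schedules a worker from its probe after re-scanning the bus */
static void cxl_softreserv_mem_work_fn(struct work_struct *work)
{
	cxl_wait_for_pci_mem();		/* wait for cxl_pci and cxl_mem to load */
	wait_for_device_probe();	/* let their probe routines complete */
	cxl_region_softreserv_update();	/* trim Soft Reserved vs. CXL regions */
}
static DECLARE_WORK(cxl_sr_work, cxl_softreserv_mem_work_fn);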

The following scenarios have been tested:

Example 1: Exact alignment, soft reserved is a child of the region

|---------- "Soft Reserved" -----------|
|-------------- "Region #" ------------|

Before:
  1050000000-304fffffff : CXL Window 0
    1050000000-304fffffff : region0
      1050000000-304fffffff : Soft Reserved
        1080000000-2fffffffff : dax0.0
          1080000000-2fffffffff : System RAM (kmem)

After:
  1050000000-304fffffff : CXL Window 0
    1050000000-304fffffff : region1
      1080000000-2fffffffff : dax0.0
        1080000000-2fffffffff : System RAM (kmem)

Example 2: Start and/or end aligned and soft reserved spans multiple
regions

|----------- "Soft Reserved" -----------|
|-------- "Region #" -------|
or
|----------- "Soft Reserved" -----------|
|-------- "Region #" -------|

Before:
  850000000-684fffffff : Soft Reserved
    850000000-284fffffff : CXL Window 0
      850000000-284fffffff : region3
        850000000-284fffffff : dax0.0
          850000000-284fffffff : System RAM (kmem)
    2850000000-484fffffff : CXL Window 1
      2850000000-484fffffff : region4
        2850000000-484fffffff : dax1.0
          2850000000-484fffffff : System RAM (kmem)
    4850000000-684fffffff : CXL Window 2
      4850000000-684fffffff : region5
        4850000000-684fffffff : dax2.0
          4850000000-684fffffff : System RAM (kmem)

After:
  850000000-284fffffff : CXL Window 0
    850000000-284fffffff : region3
      850000000-284fffffff : dax0.0
        850000000-284fffffff : System RAM (kmem)
  2850000000-484fffffff : CXL Window 1
    2850000000-484fffffff : region4
      2850000000-484fffffff : dax1.0
        2850000000-484fffffff : System RAM (kmem)
  4850000000-684fffffff : CXL Window 2
    4850000000-684fffffff : region5
      4850000000-684fffffff : dax2.0
        4850000000-684fffffff : System RAM (kmem)

Example 3: No alignment
|---------- "Soft Reserved" ----------|
	|---- "Region #" ----|

Before:
  00000000-3050000ffd : Soft Reserved
    ..
    ..
    1050000000-304fffffff : CXL Window 0
      1050000000-304fffffff : region1
        1080000000-2fffffffff : dax0.0
          1080000000-2fffffffff : System RAM (kmem)

After:
  00000000-104fffffff : Soft Reserved
    ..
    ..
  1050000000-304fffffff : CXL Window 0
    1050000000-304fffffff : region1
      1080000000-2fffffffff : dax0.0
        1080000000-2fffffffff : System RAM (kmem)
  3050000000-3050000ffd : Soft Reserved

v4 updates:
 - Split first patch into 4 smaller patches.
 - Correct the logic for cxl_pci_loaded() and cxl_mem_active() to return
   false by default instead of true.
 - Clean up cxl_wait_for_pci_mem() to remove config checks for cxl_pci
   and cxl_mem.
 - Fix multiple bugs and build issues, including correcting
   walk_iomem_res_desc() and the alignment calculations.
 
v3 updates:
 - Remove the srmem resource tree from kernel/resource.c; it is no longer
   needed in the current implementation. All SOFT RESERVE resources are now
   placed on the iomem resource tree.
 - Remove the no longer needed SOFT_RESERVED_MANAGED kernel config option.
 - Add the 'nid' parameter back to hmem_register_resource();
 - Remove the no longer used soft reserve notification chain (introduced
   in v2). The dax driver is now notified of SOFT RESERVED resources by
   the CXL driver.

v2 updates:
 - Add config option SOFT_RESERVE_MANAGED to control use of the
   separate srmem resource tree at boot.
 - Only add SOFT RESERVE resources to the soft reserve tree during
   boot; after boot they go to the iomem resource tree.
 - Remove the resource trimming code from the previous patch to re-use
   the existing code in kernel/resource.c.
 - Add functionality for the cxl_acpi driver to wait for the cxl_pci
   and cxl_mem drivers to load.

Smita Koralahalli (7):
  cxl/region: Avoid null pointer dereference in is_cxl_region()
  cxl/core: Remove CONFIG_CXL_SUSPEND and always build suspend.o
  cxl/pci: Add pci_loaded tracking to mark PCI driver readiness
  cxl/acpi: Add background worker to wait for cxl_pci and cxl_mem probe
  cxl/region: Introduce SOFT RESERVED resource removal on region
    teardown
  dax/hmem: Save the DAX HMEM platform device pointer
  cxl/dax: Defer DAX consumption of SOFT RESERVED resources until after
    CXL region creation

 drivers/cxl/Kconfig        |   4 -
 drivers/cxl/acpi.c         |  25 ++++++
 drivers/cxl/core/Makefile  |   2 +-
 drivers/cxl/core/region.c  | 163 ++++++++++++++++++++++++++++++++++++-
 drivers/cxl/core/suspend.c |  34 +++++++-
 drivers/cxl/cxl.h          |   7 ++
 drivers/cxl/cxlmem.h       |   9 --
 drivers/cxl/cxlpci.h       |   1 +
 drivers/cxl/pci.c          |   2 +
 drivers/dax/hmem/device.c  |  47 +++++------
 drivers/dax/hmem/hmem.c    |  10 ++-
 include/linux/dax.h        |  11 ++-
 include/linux/pm.h         |   7 --
 13 files changed, 270 insertions(+), 52 deletions(-)

Comments

Zhijian Li (Fujitsu) June 4, 2025, 8:43 a.m. UTC | #1
Smita,

Thanks for your awesome work. I just tested the scenarios you listed, and they work as expected. Thanks again.
(Minor comments inlined)

Tested-by: Li Zhijian <lizhijian@fujitsu.com>


To the CXL community,

The scenarios mentioned here essentially cover what a correct firmware may provide. However,
I would like to discuss one more scenario that I can simulate with a modified QEMU:
The E820 exposes a SOFT RESERVED region which is the same as a CFMW, but the HDM decoders are not committed. This means no region will be auto-created during boot.

As an example, after boot, the iomem tree is as follows:
1050000000-304fffffff : CXL Window 0
   1050000000-304fffffff : Soft Reserved
     <No region>

In this case, the SOFT RESERVED resource is not trimmed, so the end-user cannot create a new region.
My question is: Is this scenario a problem? If it is, should we fix it in this patchset or create a new patch?




On 04/06/2025 06:19, Smita Koralahalli wrote:
> Add the ability to manage SOFT RESERVE iomem resources prior to them being
> added to the iomem resource tree. This allows drivers, such as CXL, to
> remove any pieces of the SOFT RESERVE resource that intersect with created
> CXL regions.
> 
> The current approach of leaving the SOFT RESERVE resources as is can cause
> failures during hotplug of devices, such as CXL, because the resource is
> not available for reuse after teardown of the device.
> 
> The approach is to add SOFT RESERVE resources to a separate tree during
> boot.

No special tree at all since V3


> This allows any drivers to update the SOFT RESERVE resources before
> they are merged into the iomem resource tree. In addition a notifier chain
> is added so that drivers can be notified when these SOFT RESERVE resources
> are added to the iomem resource tree.
> 
> The CXL driver is modified to use a worker thread that waits for the CXL
> PCI and CXL mem drivers to be loaded and for their probe routine to
> complete. Then the driver walks through any created CXL regions to trim any
> intersections with SOFT RESERVE resources in the iomem tree.
> 
> The dax driver uses the new soft reserve notifier chain so it can consume
> any remaining SOFT RESERVES once they're added to the iomem tree.
> 
> The following scenarios have been tested:
> 
> Example 1: Exact alignment, soft reserved is a child of the region
> 
> |---------- "Soft Reserved" -----------|
> |-------------- "Region #" ------------|
> 
> Before:
>    1050000000-304fffffff : CXL Window 0
>      1050000000-304fffffff : region0
>        1050000000-304fffffff : Soft Reserved
>          1080000000-2fffffffff : dax0.0

BTW, I'm curious how to set up a dax with an address range different from its corresponding region.


>            1080000000-2fffffffff : System RAM (kmem)
> 
> After:
>    1050000000-304fffffff : CXL Window 0
>      1050000000-304fffffff : region1
>        1080000000-2fffffffff : dax0.0
>          1080000000-2fffffffff : System RAM (kmem)
> 
> Example 2: Start and/or end aligned and soft reserved spans multiple
> regions

Tested

> 
> |----------- "Soft Reserved" -----------|
> |-------- "Region #" -------|
> or
> |----------- "Soft Reserved" -----------|
> |-------- "Region #" -------|

Typo? should be:
|----------- "Soft Reserved" -----------|
             |-------- "Region #" -------|

> 
> Example 3: No alignment
> |---------- "Soft Reserved" ----------|
> 	|---- "Region #" ----|

Tested.


Thanks
Zhijian
Zhijian Li (Fujitsu) June 4, 2025, 9:29 a.m. UTC | #2
On 04/06/2025 06:19, Smita Koralahalli wrote:
> Introduce a pci_loaded flag similar to mem_active, and define
> mark_cxl_pci_loaded() to indicate when the PCI driver has initialized.
> 
> This will be used by other CXL components, such as cxl_acpi and the soft
> reserved resource handling logic, to coordinate initialization and ensure
> that dependent operations proceed only after both cxl_pci and cxl_mem
> drivers are ready.
> 
> Co-developed-by: Nathan Fontenot <Nathan.Fontenot@amd.com>
> Signed-off-by: Nathan Fontenot <Nathan.Fontenot@amd.com>
> Co-developed-by: Terry Bowman <terry.bowman@amd.com>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> ---
>   drivers/cxl/core/suspend.c | 8 ++++++++
>   drivers/cxl/cxlpci.h       | 1 +
>   drivers/cxl/pci.c          | 2 ++
>   3 files changed, 11 insertions(+)
> 
> diff --git a/drivers/cxl/core/suspend.c b/drivers/cxl/core/suspend.c
> index 5ba4b4de0e33..72818a2c8ec8 100644
> --- a/drivers/cxl/core/suspend.c
> +++ b/drivers/cxl/core/suspend.c
> @@ -3,8 +3,10 @@
>   #include <linux/atomic.h>
>   #include <linux/export.h>
>   #include "cxlmem.h"
> +#include "cxlpci.h"
>   
>   static atomic_t mem_active;
> +static atomic_t pci_loaded;


I find it odd to place these changes in suspend.c. Also, I noticed that in a
subsequent patch, DJ has mentioned (and I agree) that this patch is unnecessary.


Thanks
Zhijian


>   
>   bool cxl_mem_active(void)
>   {
> @@ -25,3 +27,9 @@ void cxl_mem_active_dec(void)
>   	atomic_dec(&mem_active);
>   }
>   EXPORT_SYMBOL_NS_GPL(cxl_mem_active_dec, "CXL");
> +
> +void mark_cxl_pci_loaded(void)
> +{
> +	atomic_inc(&pci_loaded);
> +}
> +EXPORT_SYMBOL_NS_GPL(mark_cxl_pci_loaded, "CXL");
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index 54e219b0049e..5a811ac63fcf 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -135,4 +135,5 @@ void read_cdat_data(struct cxl_port *port);
>   void cxl_cor_error_detected(struct pci_dev *pdev);
>   pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
>   				    pci_channel_state_t state);
> +void mark_cxl_pci_loaded(void);
>   #endif /* __CXL_PCI_H__ */
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 785aa2af5eaa..b019bd324dba 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -1189,6 +1189,8 @@ static int __init cxl_pci_driver_init(void)
>   	if (rc)
>   		pci_unregister_driver(&cxl_pci_driver);
>   
> +	mark_cxl_pci_loaded();
> +
>   	return rc;
>   }
>
Zhijian Li (Fujitsu) June 4, 2025, 9:40 a.m. UTC | #3
On 04/06/2025 06:19, Smita Koralahalli wrote:
>   drivers/cxl/acpi.c         | 23 +++++++++++++++++++++++
>   drivers/cxl/core/suspend.c | 21 +++++++++++++++++++++
>   drivers/cxl/cxl.h          |  2 ++
>   3 files changed, 46 insertions(+)
> 
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index cb14829bb9be..978f63b32b41 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -813,6 +813,24 @@ static int pair_cxl_resource(struct device *dev, void *data)
>   	return 0;
>   }
>   
> +static void cxl_softreserv_mem_work_fn(struct work_struct *work)
> +{
> +	/* Wait for cxl_pci and cxl_mem drivers to load */
> +	cxl_wait_for_pci_mem();
> +
> +	/*
> +	 * Wait for the driver probe routines to complete after cxl_pci
> +	 * and cxl_mem drivers are loaded.
> +	 */
> +	wait_for_device_probe();
> +}
> +static DECLARE_WORK(cxl_sr_work, cxl_softreserv_mem_work_fn);
> +
> +static void cxl_softreserv_mem_update(void)
> +{
> +	schedule_work(&cxl_sr_work);
> +}
> +
>   static int cxl_acpi_probe(struct platform_device *pdev)
>   {
>   	int rc;
> @@ -887,6 +905,10 @@ static int cxl_acpi_probe(struct platform_device *pdev)
>   
>   	/* In case PCI is scanned before ACPI re-trigger memdev attach */
>   	cxl_bus_rescan();
> +
> +	/* Update SOFT RESERVE resources that intersect with CXL regions */
> +	cxl_softreserv_mem_update();
> +
>   	return 0;
>   }
>   
> @@ -918,6 +940,7 @@ static int __init cxl_acpi_init(void)
>   
>   static void __exit cxl_acpi_exit(void)
>   {
> +	cancel_work_sync(&cxl_sr_work);
>   	platform_driver_unregister(&cxl_acpi_driver);
>   	cxl_bus_drain();
>   }
> diff --git a/drivers/cxl/core/suspend.c b/drivers/cxl/core/suspend.c
> index 72818a2c8ec8..c0d8f70aed56 100644
> --- a/drivers/cxl/core/suspend.c
> +++ b/drivers/cxl/core/suspend.c
> @@ -2,12 +2,15 @@
>   /* Copyright(c) 2022 Intel Corporation. All rights reserved. */
>   #include <linux/atomic.h>
>   #include <linux/export.h>
> +#include <linux/wait.h>
>   #include "cxlmem.h"
>   #include "cxlpci.h"
>   
>   static atomic_t mem_active;
>   static atomic_t pci_loaded;
>   
> +static DECLARE_WAIT_QUEUE_HEAD(cxl_wait_queue);

Given that this file (suspend.c) focuses on power management functions,
it might be more appropriate to move the wait queue declaration and its
related changes to acpi.c, where its caller is.
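
For illustration only, the suggested placement might look like this
(hypothetical and untested):

/* drivers/cxl/acpi.c: declare the wait queue next to its waiter (hypothetical) */
static DECLARE_WAIT_QUEUE_HEAD(cxl_wait_queue);

The wake_up() calls would still come from the cxl core
(cxl_mem_active_inc(), mark_cxl_pci_loaded()), so the core would still
need some way to reach this queue.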


Thanks
Zhijian

> +
>   bool cxl_mem_active(void)
>   {
>   	if (IS_ENABLED(CONFIG_CXL_MEM))
> @@ -19,6 +22,7 @@ bool cxl_mem_active(void)
>   void cxl_mem_active_inc(void)
>   {
>   	atomic_inc(&mem_active);
> +	wake_up(&cxl_wait_queue);
>   }
>   EXPORT_SYMBOL_NS_GPL(cxl_mem_active_inc, "CXL");
>   
> @@ -28,8 +32,25 @@ void cxl_mem_active_dec(void)
>   }
>   EXPORT_SYMBOL_NS_GPL(cxl_mem_active_dec, "CXL");
>   
> +static bool cxl_pci_loaded(void)
> +{
> +	if (IS_ENABLED(CONFIG_CXL_PCI))
> +		return atomic_read(&pci_loaded) != 0;
> +
> +	return false;
> +}
> +
>   void mark_cxl_pci_loaded(void)
>   {
>   	atomic_inc(&pci_loaded);
> +	wake_up(&cxl_wait_queue);
>   }
>   EXPORT_SYMBOL_NS_GPL(mark_cxl_pci_loaded, "CXL");
> +
> +void cxl_wait_for_pci_mem(void)
> +{
> +	if (!wait_event_timeout(cxl_wait_queue, cxl_pci_loaded() &&
> +				cxl_mem_active(), 30 * HZ))
> +		pr_debug("Timeout waiting for cxl_pci or cxl_mem probing\n");
> +}
Dave Jiang June 4, 2025, 2:35 p.m. UTC | #4
On 6/4/25 2:40 AM, Zhijian Li (Fujitsu) wrote:
> 
> 
> On 04/06/2025 06:19, Smita Koralahalli wrote:
>>   drivers/cxl/acpi.c         | 23 +++++++++++++++++++++++
>>   drivers/cxl/core/suspend.c | 21 +++++++++++++++++++++
>>   drivers/cxl/cxl.h          |  2 ++
>>   3 files changed, 46 insertions(+)
>>
>> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
>> index cb14829bb9be..978f63b32b41 100644
>> --- a/drivers/cxl/acpi.c
>> +++ b/drivers/cxl/acpi.c
>> @@ -813,6 +813,24 @@ static int pair_cxl_resource(struct device *dev, void *data)
>>   	return 0;
>>   }
>>   
>> +static void cxl_softreserv_mem_work_fn(struct work_struct *work)
>> +{
>> +	/* Wait for cxl_pci and cxl_mem drivers to load */
>> +	cxl_wait_for_pci_mem();
>> +
>> +	/*
>> +	 * Wait for the driver probe routines to complete after cxl_pci
>> +	 * and cxl_mem drivers are loaded.
>> +	 */
>> +	wait_for_device_probe();
>> +}
>> +static DECLARE_WORK(cxl_sr_work, cxl_softreserv_mem_work_fn);
>> +
>> +static void cxl_softreserv_mem_update(void)
>> +{
>> +	schedule_work(&cxl_sr_work);
>> +}
>> +
>>   static int cxl_acpi_probe(struct platform_device *pdev)
>>   {
>>   	int rc;
>> @@ -887,6 +905,10 @@ static int cxl_acpi_probe(struct platform_device *pdev)
>>   
>>   	/* In case PCI is scanned before ACPI re-trigger memdev attach */
>>   	cxl_bus_rescan();
>> +
>> +	/* Update SOFT RESERVE resources that intersect with CXL regions */
>> +	cxl_softreserv_mem_update();
>> +
>>   	return 0;
>>   }
>>   
>> @@ -918,6 +940,7 @@ static int __init cxl_acpi_init(void)
>>   
>>   static void __exit cxl_acpi_exit(void)
>>   {
>> +	cancel_work_sync(&cxl_sr_work);
>>   	platform_driver_unregister(&cxl_acpi_driver);
>>   	cxl_bus_drain();
>>   }
>> diff --git a/drivers/cxl/core/suspend.c b/drivers/cxl/core/suspend.c
>> index 72818a2c8ec8..c0d8f70aed56 100644
>> --- a/drivers/cxl/core/suspend.c
>> +++ b/drivers/cxl/core/suspend.c
>> @@ -2,12 +2,15 @@
>>   /* Copyright(c) 2022 Intel Corporation. All rights reserved. */
>>   #include <linux/atomic.h>
>>   #include <linux/export.h>
>> +#include <linux/wait.h>
>>   #include "cxlmem.h"
>>   #include "cxlpci.h"
>>   
>>   static atomic_t mem_active;
>>   static atomic_t pci_loaded;
>>   
>> +static DECLARE_WAIT_QUEUE_HEAD(cxl_wait_queue);
> 
> Given that this file (suspend.c) focuses on power management functions,
> it might be more appropriate to move the wait queue declaration and its
> related changes to acpi.c, where its caller is.

You mean drivers/cxl/acpi.c and not drivers/cxl/core/acpi.c right? The core one is my mistake and I'm going to create a patch to remove that.

DJ

> 
> 
> Thanks
> Zhijian
> 
>> +
>>   bool cxl_mem_active(void)
>>   {
>>   	if (IS_ENABLED(CONFIG_CXL_MEM))
>> @@ -19,6 +22,7 @@ bool cxl_mem_active(void)
>>   void cxl_mem_active_inc(void)
>>   {
>>   	atomic_inc(&mem_active);
>> +	wake_up(&cxl_wait_queue);
>>   }
>>   EXPORT_SYMBOL_NS_GPL(cxl_mem_active_inc, "CXL");
>>   
>> @@ -28,8 +32,25 @@ void cxl_mem_active_dec(void)
>>   }
>>   EXPORT_SYMBOL_NS_GPL(cxl_mem_active_dec, "CXL");
>>   
>> +static bool cxl_pci_loaded(void)
>> +{
>> +	if (IS_ENABLED(CONFIG_CXL_PCI))
>> +		return atomic_read(&pci_loaded) != 0;
>> +
>> +	return false;
>> +}
>> +
>>   void mark_cxl_pci_loaded(void)
>>   {
>>   	atomic_inc(&pci_loaded);
>> +	wake_up(&cxl_wait_queue);
>>   }
>>   EXPORT_SYMBOL_NS_GPL(mark_cxl_pci_loaded, "CXL");
>> +
>> +void cxl_wait_for_pci_mem(void)
>> +{
>> +	if (!wait_event_timeout(cxl_wait_queue, cxl_pci_loaded() &&
>> +				cxl_mem_active(), 30 * HZ))
>> +		pr_debug("Timeout waiting for cxl_pci or cxl_mem probing\n");
>> +}
Koralahalli Channabasappa, Smita June 4, 2025, 6:31 p.m. UTC | #5
On 6/3/2025 4:45 PM, Dave Jiang wrote:
> 
> 
> On 6/3/25 3:19 PM, Smita Koralahalli wrote:
>> Introduce a waitqueue mechanism to coordinate initialization between the
>> cxl_pci and cxl_mem drivers.
>>
>> Launch a background worker from cxl_acpi_probe() that waits for both
>> drivers to complete initialization before invoking wait_for_device_probe().
>> Without this, the probe completion wait could begin prematurely, before
>> the drivers are present, leading to missed updates.
>>
>> Co-developed-by: Nathan Fontenot <Nathan.Fontenot@amd.com>
>> Signed-off-by: Nathan Fontenot <Nathan.Fontenot@amd.com>
>> Co-developed-by: Terry Bowman <terry.bowman@amd.com>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
>> ---
>>   drivers/cxl/acpi.c         | 23 +++++++++++++++++++++++
>>   drivers/cxl/core/suspend.c | 21 +++++++++++++++++++++
>>   drivers/cxl/cxl.h          |  2 ++
>>   3 files changed, 46 insertions(+)
>>
>> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
>> index cb14829bb9be..978f63b32b41 100644
>> --- a/drivers/cxl/acpi.c
>> +++ b/drivers/cxl/acpi.c
>> @@ -813,6 +813,24 @@ static int pair_cxl_resource(struct device *dev, void *data)
>>   	return 0;
>>   }
>>   
>> +static void cxl_softreserv_mem_work_fn(struct work_struct *work)
>> +{
>> +	/* Wait for cxl_pci and cxl_mem drivers to load */
>> +	cxl_wait_for_pci_mem();
>> +
>> +	/*
>> +	 * Wait for the driver probe routines to complete after cxl_pci
>> +	 * and cxl_mem drivers are loaded.
>> +	 */
>> +	wait_for_device_probe();
>> +}
>> +static DECLARE_WORK(cxl_sr_work, cxl_softreserv_mem_work_fn);
>> +
>> +static void cxl_softreserv_mem_update(void)
>> +{
>> +	schedule_work(&cxl_sr_work);
>> +}
>> +
>>   static int cxl_acpi_probe(struct platform_device *pdev)
>>   {
>>   	int rc;
>> @@ -887,6 +905,10 @@ static int cxl_acpi_probe(struct platform_device *pdev)
>>   
>>   	/* In case PCI is scanned before ACPI re-trigger memdev attach */
>>   	cxl_bus_rescan();
>> +
>> +	/* Update SOFT RESERVE resources that intersect with CXL regions */
>> +	cxl_softreserv_mem_update();
>> +
>>   	return 0;
>>   }
>>   
>> @@ -918,6 +940,7 @@ static int __init cxl_acpi_init(void)
>>   
>>   static void __exit cxl_acpi_exit(void)
>>   {
>> +	cancel_work_sync(&cxl_sr_work);
>>   	platform_driver_unregister(&cxl_acpi_driver);
>>   	cxl_bus_drain();
>>   }
>> diff --git a/drivers/cxl/core/suspend.c b/drivers/cxl/core/suspend.c
>> index 72818a2c8ec8..c0d8f70aed56 100644
>> --- a/drivers/cxl/core/suspend.c
>> +++ b/drivers/cxl/core/suspend.c
>> @@ -2,12 +2,15 @@
>>   /* Copyright(c) 2022 Intel Corporation. All rights reserved. */
>>   #include <linux/atomic.h>
>>   #include <linux/export.h>
>> +#include <linux/wait.h>
>>   #include "cxlmem.h"
>>   #include "cxlpci.h"
>>   
>>   static atomic_t mem_active;
>>   static atomic_t pci_loaded;
>>   
>> +static DECLARE_WAIT_QUEUE_HEAD(cxl_wait_queue);
>> +
>>   bool cxl_mem_active(void)
>>   {
>>   	if (IS_ENABLED(CONFIG_CXL_MEM))
>> @@ -19,6 +22,7 @@ bool cxl_mem_active(void)
>>   void cxl_mem_active_inc(void)
>>   {
>>   	atomic_inc(&mem_active);
>> +	wake_up(&cxl_wait_queue);
>>   }
>>   EXPORT_SYMBOL_NS_GPL(cxl_mem_active_inc, "CXL");
>>   
>> @@ -28,8 +32,25 @@ void cxl_mem_active_dec(void)
>>   }
>>   EXPORT_SYMBOL_NS_GPL(cxl_mem_active_dec, "CXL");
>>   
>> +static bool cxl_pci_loaded(void)
>> +{
>> +	if (IS_ENABLED(CONFIG_CXL_PCI))
>> +		return atomic_read(&pci_loaded) != 0;
>> +
>> +	return false;
>> +}
>> +
>>   void mark_cxl_pci_loaded(void)
>>   {
>>   	atomic_inc(&pci_loaded);
>> +	wake_up(&cxl_wait_queue);
>>   }
>>   EXPORT_SYMBOL_NS_GPL(mark_cxl_pci_loaded, "CXL");
>> +
>> +void cxl_wait_for_pci_mem(void)
>> +{
>> +	if (!wait_event_timeout(cxl_wait_queue, cxl_pci_loaded() &&
>> +				cxl_mem_active(), 30 * HZ))
> 
> I'm trying to understand why cxl_pci_loaded() is needed. cxl_mem_active() goes above 0 when a cxl_mem_probe() instance succeeds. cxl_mem_probe() being triggered implies that an instance of cxl_pci_probe() has been called since cxl_mem_probe() is triggered from devm_cxl_add_memdev() with memdev being added and cxl_mem driver also have been loaded. So does cxl_mem_active() not imply cxl_pci_loaded() and makes it unnecessary?

Yeah you are right. I will remove this check.
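
For reference, a minimal sketch of the simplified wait, assuming the
cxl_pci_loaded() condition is dropped as suggested (sketch only, not the
posted patch):

void cxl_wait_for_pci_mem(void)
{
	/* cxl_mem_active() implies a cxl_pci device has already probed */
	if (!wait_event_timeout(cxl_wait_queue, cxl_mem_active(), 30 * HZ))
		pr_debug("Timeout waiting for cxl_mem probing\n");
}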

Thanks
Smita
> 
> DJ
> 
> 
>> +		pr_debug("Timeout waiting for cxl_pci or cxl_mem probing\n");
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_wait_for_pci_mem, "CXL");
>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
>> index a9ab46eb0610..1ba7d39c2991 100644
>> --- a/drivers/cxl/cxl.h
>> +++ b/drivers/cxl/cxl.h
>> @@ -902,6 +902,8 @@ void cxl_coordinates_combine(struct access_coordinate *out,
>>   
>>   bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port);
>>   
>> +void cxl_wait_for_pci_mem(void);
>> +
>>   /*
>>    * Unit test builds overrides this to __weak, find the 'strong' version
>>    * of these symbols in tools/testing/cxl/.
>
Koralahalli Channabasappa, Smita June 4, 2025, 6:59 p.m. UTC | #6
Hi Zhijian,

Thanks for testing my patches.

On 6/4/2025 1:43 AM, Zhijian Li (Fujitsu) wrote:
> Smita,
> 
> Thanks for your awesome work. I just tested the scenarios you listed, and they work as expected. Thanks again.
> (Minor comments inlined)
> 
> Tested-by: Li Zhijian <lizhijian@fujitsu.com>
> 
> 
> To the CXL community,
> 
> The scenarios mentioned here essentially cover what a correct firmware may provide. However,
> I would like to discuss one more scenario that I can simulate with a modified QEMU:
> The E820 exposes a SOFT RESERVED region which is the same as a CFMW, but the HDM decoders are not committed. This means no region will be auto-created during boot.
> 
> As an example, after boot, the iomem tree is as follows:
> 1050000000-304fffffff : CXL Window 0
>     1050000000-304fffffff : Soft Reserved
>       <No region>
> 
> In this case, the SOFT RESERVED resource is not trimmed, so the end-user cannot create a new region.
> My question is: Is this scenario a problem? If it is, should we fix it in this patchset or create a new patch?
> 

I believe firmware should handle this correctly by ensuring that any 
exposed SOFT RESERVED ranges correspond to committed HDM decoders and 
result in region creation.

That said, I’d be interested in hearing what the rest of the community 
thinks.

> 
> 
> 
> On 04/06/2025 06:19, Smita Koralahalli wrote:
>> Add the ability to manage SOFT RESERVE iomem resources prior to them being
>> added to the iomem resource tree. This allows drivers, such as CXL, to
>> remove any pieces of the SOFT RESERVE resource that intersect with created
>> CXL regions.
>>
>> The current approach of leaving the SOFT RESERVE resources as is can cause
>> failures during hotplug of devices, such as CXL, because the resource is
>> not available for reuse after teardown of the device.
>>
>> The approach is to add SOFT RESERVE resources to a separate tree during
>> boot.
> 
> No special tree at all since V3

Will make changes. I overlooked the cover letter.

> 
> 
>> This allows any drivers to update the SOFT RESERVE resources before
>> they are merged into the iomem resource tree. In addition a notifier chain
>> is added so that drivers can be notified when these SOFT RESERVE resources
>> are added to the iomem resource tree.
>>
>> The CXL driver is modified to use a worker thread that waits for the CXL
>> PCI and CXL mem drivers to be loaded and for their probe routine to
>> complete. Then the driver walks through any created CXL regions to trim any
>> intersections with SOFT RESERVE resources in the iomem tree.
>>
>> The dax driver uses the new soft reserve notifier chain so it can consume
>> any remaining SOFT RESERVES once they're added to the iomem tree.
>>
>> The following scenarios have been tested:
>>
>> Example 1: Exact alignment, soft reserved is a child of the region
>>
>> |---------- "Soft Reserved" -----------|
>> |-------------- "Region #" ------------|
>>
>> Before:
>>     1050000000-304fffffff : CXL Window 0
>>       1050000000-304fffffff : region0
>>         1050000000-304fffffff : Soft Reserved
>>           1080000000-2fffffffff : dax0.0
> 
> BTW, I'm curious how to set up a dax with an address range different from its corresponding region.

Hmm, this configuration was provided directly by our BIOS. The DAX 
device was mapped to a subset of the region's address space as part of 
the platform's firmware setup, so I did not explicitly configure it.

> 
> 
>>             1080000000-2fffffffff : System RAM (kmem)
>>
>> After:
>>     1050000000-304fffffff : CXL Window 0
>>       1050000000-304fffffff : region1
>>         1080000000-2fffffffff : dax0.0
>>           1080000000-2fffffffff : System RAM (kmem)
>>
>> Example 2: Start and/or end aligned and soft reserved spans multiple
>> regions
> 
> Tested
> 
>>
>> |----------- "Soft Reserved" -----------|
>> |-------- "Region #" -------|
>> or
>> |----------- "Soft Reserved" -----------|
>> |-------- "Region #" -------|
> 
> Typo? should be:
> |----------- "Soft Reserved" -----------|
>               |-------- "Region #" -------|

Yeah, will fix.

> 
>>
>> Example 3: No alignment
>> |---------- "Soft Reserved" ----------|
>> 	|---- "Region #" ----|
> 
> Tested.
> 
> 
> Thanks
> Zhijian

Thanks
Smita
Zhijian Li (Fujitsu) June 5, 2025, 2:18 a.m. UTC | #7
On 04/06/2025 22:35, Dave Jiang wrote:
>>>    
>>> +static DECLARE_WAIT_QUEUE_HEAD(cxl_wait_queue);
>> Given that this file (suspend.c) focuses on power management functions,
>> it might be more appropriate to move the wait queue declaration and its
>> related changes to acpi.c, where its caller is.
> You mean drivers/cxl/acpi.c and not drivers/cxl/core/acpi.c right? 

Yes


> The core one is my mistake and I'm going to create a patch to remove that.


Completely missed its existence - thank you for catching and cleaning that up!

Thanks
Zhijian


> 
> DJ
Zhijian Li (Fujitsu) June 6, 2025, 4:16 a.m. UTC | #8
The term "region teardown" in the subject line appears inaccurate.


As I understand, cxl_region_softreserv_update() should work only
when the region is ready, but the current logic only guarantees that the *memdev* is ready.

Is there truly no timing gap between region readiness and memdev readiness?
If this assumption is true, could we document this relationship in both the commit log and code comments?


Thanks
Zhijian


On 04/06/2025 06:19, Smita Koralahalli wrote:
> Reworked from a patch by Alison Schofield <alison.schofield@intel.com>
> 
> Previously, when CXL regions were created through autodiscovery and their
> resources overlapped with SOFT RESERVED ranges, the soft reserved resource
> remained in place after region teardown. This left the HPA range
> unavailable for reuse even after the region was destroyed.
> 
> Enhance the logic to reliably remove SOFT RESERVED resources associated
> with a region, regardless of alignment or hierarchy in the iomem tree.
> 
> Link: https://lore.kernel.org/linux-cxl/29312c0765224ae76862d59a17748c8188fb95f1.1692638817.git.alison.schofield@intel.com/
> Co-developed-by: Alison Schofield <alison.schofield@intel.com>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Co-developed-by: Terry Bowman <terry.bowman@amd.com>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> ---
>   drivers/cxl/acpi.c        |   2 +
>   drivers/cxl/core/region.c | 151 ++++++++++++++++++++++++++++++++++++++
>   drivers/cxl/cxl.h         |   5 ++
>   3 files changed, 158 insertions(+)
> 
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 978f63b32b41..1b1388feb36d 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -823,6 +823,8 @@ static void cxl_softreserv_mem_work_fn(struct work_struct *work)
>   	 * and cxl_mem drivers are loaded.
>   	 */
>   	wait_for_device_probe();
> +
> +	cxl_region_softreserv_update();
>   }
>   static DECLARE_WORK(cxl_sr_work, cxl_softreserv_mem_work_fn);
>   
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 109b8a98c4c7..3a5ca44d65f3 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -3443,6 +3443,157 @@ int cxl_add_to_region(struct cxl_port *root, struct cxl_endpoint_decoder *cxled)
>   }
>   EXPORT_SYMBOL_NS_GPL(cxl_add_to_region, "CXL");
>   
> +static int add_soft_reserved(resource_size_t start, resource_size_t len,
> +			     unsigned long flags)
> +{
> +	struct resource *res = kmalloc(sizeof(*res), GFP_KERNEL);
> +	int rc;
> +
> +	if (!res)
> +		return -ENOMEM;
> +
> +	*res = DEFINE_RES_MEM_NAMED(start, len, "Soft Reserved");
> +
> +	res->desc = IORES_DESC_SOFT_RESERVED;
> +	res->flags = flags;
> +	rc = insert_resource(&iomem_resource, res);
> +	if (rc) {
> +		kfree(res);
> +		return rc;
> +	}
> +
> +	return 0;
> +}
> +
> +static void remove_soft_reserved(struct cxl_region *cxlr, struct resource *soft,
> +				 resource_size_t start, resource_size_t end)
> +{
> +	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
> +	resource_size_t new_start, new_end;
> +	int rc;
> +
> +	/* Prevent new usage while removing or adjusting the resource */
> +	guard(mutex)(&cxlrd->range_lock);
> +
> +	/* Aligns at both resource start and end */
> +	if (soft->start == start && soft->end == end)
> +		goto remove;
> +
> +	/* Aligns at either resource start or end */
> +	if (soft->start == start || soft->end == end) {
> +		if (soft->start == start) {
> +			new_start = end + 1;
> +			new_end = soft->end;
> +		} else {
> +			new_start = soft->start;
> +			new_end = start - 1;
> +		}
> +
> +		rc = add_soft_reserved(new_start, new_end - new_start + 1,
> +				       soft->flags);
> +		if (rc)
> +			dev_warn(&cxlr->dev, "cannot add new soft reserved resource at %pa\n",
> +				 &new_start);
> +
> +		/* Remove the original Soft Reserved resource */
> +		goto remove;
> +	}
> +
> +	/*
> +	 * No alignment. Attempt a 3-way split that removes the part of
> +	 * the resource the region occupied, and then creates new soft
> +	 * reserved resources for the leading and trailing addr space.
> +	 */
> +	new_start = soft->start;
> +	new_end = soft->end;
> +
> +	rc = add_soft_reserved(new_start, start - new_start, soft->flags);
> +	if (rc)
> +		dev_warn(&cxlr->dev, "cannot add new soft reserved resource at %pa\n",
> +			 &new_start);
> +
> +	rc = add_soft_reserved(end + 1, new_end - end, soft->flags);
> +	if (rc)
> +		dev_warn(&cxlr->dev, "cannot add new soft reserved resource at %pa + 1\n",
> +			 &end);
> +
> +remove:
> +	rc = remove_resource(soft);
> +	if (rc)
> +		dev_warn(&cxlr->dev, "cannot remove soft reserved resource %pr\n",
> +			 soft);
> +}
> +
> +/*
> + * normalize_resource
> + *
> + * The walk_iomem_res_desc() returns a copy of a resource, not a reference
> + * to the actual resource in the iomem_resource tree. As a result,
> + * __release_resource() which relies on pointer equality will fail.
> + *
> + * This helper walks the children of the resource's parent to find and
> + * return the original resource pointer that matches the given resource's
> + * start and end addresses.
> + *
> + * Return: Pointer to the matching original resource in iomem_resource, or
> + *         NULL if not found or invalid input.
> + */
> +static struct resource *normalize_resource(struct resource *res)
> +{
> +	if (!res || !res->parent)
> +		return NULL;
> +
> +	for (struct resource *res_iter = res->parent->child;
> +	     res_iter != NULL; res_iter = res_iter->sibling) {
> +		if ((res_iter->start == res->start) &&
> +		    (res_iter->end == res->end))
> +			return res_iter;
> +	}
> +
> +	return NULL;
> +}
> +
> +static int __cxl_region_softreserv_update(struct resource *soft,
> +					  void *_cxlr)
> +{
> +	struct cxl_region *cxlr = _cxlr;
> +	struct resource *res = cxlr->params.res;
> +
> +	/* Skip non-intersecting soft-reserved regions */
> +	if (soft->end < res->start || soft->start > res->end)
> +		return 0;
> +
> +	soft = normalize_resource(soft);
> +	if (!soft)
> +		return -EINVAL;
> +
> +	remove_soft_reserved(cxlr, soft, res->start, res->end);
> +
> +	return 0;
> +}
> +
> +int cxl_region_softreserv_update(void)
> +{
> +	struct device *dev = NULL;
> +
> +	while ((dev = bus_find_next_device(&cxl_bus_type, dev))) {
> +		struct device *put_dev __free(put_device) = dev;
> +		struct cxl_region *cxlr;
> +
> +		if (!is_cxl_region(dev))
> +			continue;
> +
> +		cxlr = to_cxl_region(dev);
> +
> +		walk_iomem_res_desc(IORES_DESC_SOFT_RESERVED,
> +				    IORESOURCE_MEM, 0, -1, cxlr,
> +				    __cxl_region_softreserv_update);
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_region_softreserv_update, "CXL");
> +
>   u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint, u64 spa)
>   {
>   	struct cxl_region_ref *iter;
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 1ba7d39c2991..fc39c4b24745 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -859,6 +859,7 @@ struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev);
>   int cxl_add_to_region(struct cxl_port *root,
>   		      struct cxl_endpoint_decoder *cxled);
>   struct cxl_dax_region *to_cxl_dax_region(struct device *dev);
> +int cxl_region_softreserv_update(void);
>   u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint, u64 spa);
>   #else
>   static inline bool is_cxl_pmem_region(struct device *dev)
> @@ -878,6 +879,10 @@ static inline struct cxl_dax_region *to_cxl_dax_region(struct device *dev)
>   {
>   	return NULL;
>   }
> +static inline int cxl_region_softreserv_update(void)
> +{
> +	return 0;
> +}
>   static inline u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint,
>   					       u64 spa)
>   {
Koralahalli Channabasappa, Smita June 6, 2025, 5:53 p.m. UTC | #9
On 6/5/2025 9:16 PM, Zhijian Li (Fujitsu) wrote:
> 
> The term "region teardown" in the subject line appears inaccurate.
> 
> 
> As I understand, cxl_region_softreserv_update() should work only
> when the region is ready, but the current logic only guarantees that the *memdev* is ready.
> 
> Is there truly no timing gap between region readiness and memdev readiness?
> If this assumption is true, could we document this relationship in both the commit log and code comments?
> 
> 
> Thanks
> Zhijian

I think you're right in pointing this out.

We cannot guarantee region readiness unless there's an explicit ordering
or dependency ensuring that cxl_region_probe() always completes after or 
synchronously with memdev probing and wait_for_device_probe().

I don't believe we currently have such a guarantee.

I was considering adding a null check for cxlr->params.res in 
__cxl_region_softreserv_update(), along with a comment like below:

static int __cxl_region_softreserv_update(struct resource *soft,
					  void *_cxlr)
{
	struct cxl_region *cxlr = _cxlr;
	struct resource *res = cxlr->params.res;

	/*
	 * Skip if the region has not yet been fully initialized.
	 *
	 * This code assumes all cxl_region devices have completed
	 * probing before this function runs, and that params.res is
	 * valid.
	 */

	if (!res)
		return 0;

..
..

}

This would prevent a null dereference and make the assumption around 
region readiness more explicit.

That said, I'd appreciate input from Terry, Nathan, or others on whether 
there's a better way to guarantee that region creation has completed by 
the time this function is invoked.

Thanks
Smita

> 
> 
> On 04/06/2025 06:19, Smita Koralahalli wrote:
>> Reworked from a patch by Alison Schofield <alison.schofield@intel.com>
>>
>> Previously, when CXL regions were created through autodiscovery and their
>> resources overlapped with SOFT RESERVED ranges, the soft reserved resource
>> remained in place after region teardown. This left the HPA range
>> unavailable for reuse even after the region was destroyed.
>>
>> Enhance the logic to reliably remove SOFT RESERVED resources associated
>> with a region, regardless of alignment or hierarchy in the iomem tree.
>>
>> Link: https://lore.kernel.org/linux-cxl/29312c0765224ae76862d59a17748c8188fb95f1.1692638817.git.alison.schofield@intel.com/
>> Co-developed-by: Alison Schofield <alison.schofield@intel.com>
>> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
>> Co-developed-by: Terry Bowman <terry.bowman@amd.com>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
>> ---
>>    drivers/cxl/acpi.c        |   2 +
>>    drivers/cxl/core/region.c | 151 ++++++++++++++++++++++++++++++++++++++
>>    drivers/cxl/cxl.h         |   5 ++
>>    3 files changed, 158 insertions(+)
>>
>> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
>> index 978f63b32b41..1b1388feb36d 100644
>> --- a/drivers/cxl/acpi.c
>> +++ b/drivers/cxl/acpi.c
>> @@ -823,6 +823,8 @@ static void cxl_softreserv_mem_work_fn(struct work_struct *work)
>>    	 * and cxl_mem drivers are loaded.
>>    	 */
>>    	wait_for_device_probe();
>> +
>> +	cxl_region_softreserv_update();
>>    }
>>    static DECLARE_WORK(cxl_sr_work, cxl_softreserv_mem_work_fn);
>>    
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index 109b8a98c4c7..3a5ca44d65f3 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -3443,6 +3443,157 @@ int cxl_add_to_region(struct cxl_port *root, struct cxl_endpoint_decoder *cxled)
>>    }
>>    EXPORT_SYMBOL_NS_GPL(cxl_add_to_region, "CXL");
>>    
>> +static int add_soft_reserved(resource_size_t start, resource_size_t len,
>> +			     unsigned long flags)
>> +{
>> +	struct resource *res = kmalloc(sizeof(*res), GFP_KERNEL);
>> +	int rc;
>> +
>> +	if (!res)
>> +		return -ENOMEM;
>> +
>> +	*res = DEFINE_RES_MEM_NAMED(start, len, "Soft Reserved");
>> +
>> +	res->desc = IORES_DESC_SOFT_RESERVED;
>> +	res->flags = flags;
>> +	rc = insert_resource(&iomem_resource, res);
>> +	if (rc) {
>> +		kfree(res);
>> +		return rc;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static void remove_soft_reserved(struct cxl_region *cxlr, struct resource *soft,
>> +				 resource_size_t start, resource_size_t end)
>> +{
>> +	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
>> +	resource_size_t new_start, new_end;
>> +	int rc;
>> +
>> +	/* Prevent new usage while removing or adjusting the resource */
>> +	guard(mutex)(&cxlrd->range_lock);
>> +
>> +	/* Aligns at both resource start and end */
>> +	if (soft->start == start && soft->end == end)
>> +		goto remove;
>> +
>> +	/* Aligns at either resource start or end */
>> +	if (soft->start == start || soft->end == end) {
>> +		if (soft->start == start) {
>> +			new_start = end + 1;
>> +			new_end = soft->end;
>> +		} else {
>> +			new_start = soft->start;
>> +			new_end = start - 1;
>> +		}
>> +
>> +		rc = add_soft_reserved(new_start, new_end - new_start + 1,
>> +				       soft->flags);
>> +		if (rc)
>> +			dev_warn(&cxlr->dev, "cannot add new soft reserved resource at %pa\n",
>> +				 &new_start);
>> +
>> +		/* Remove the original Soft Reserved resource */
>> +		goto remove;
>> +	}
>> +
>> +	/*
>> +	 * No alignment. Attempt a 3-way split that removes the part of
>> +	 * the resource the region occupied, and then creates new soft
>> +	 * reserved resources for the leading and trailing addr space.
>> +	 */
>> +	new_start = soft->start;
>> +	new_end = soft->end;
>> +
>> +	rc = add_soft_reserved(new_start, start - new_start, soft->flags);
>> +	if (rc)
>> +		dev_warn(&cxlr->dev, "cannot add new soft reserved resource at %pa\n",
>> +			 &new_start);
>> +
>> +	rc = add_soft_reserved(end + 1, new_end - end, soft->flags);
>> +	if (rc)
>> +		dev_warn(&cxlr->dev, "cannot add new soft reserved resource at %pa + 1\n",
>> +			 &end);
>> +
>> +remove:
>> +	rc = remove_resource(soft);
>> +	if (rc)
>> +		dev_warn(&cxlr->dev, "cannot remove soft reserved resource %pr\n",
>> +			 soft);
>> +}
>> +
>> +/*
>> + * normalize_resource
>> + *
>> + * The walk_iomem_res_desc() returns a copy of a resource, not a reference
>> + * to the actual resource in the iomem_resource tree. As a result,
>> + * __release_resource() which relies on pointer equality will fail.
>> + *
>> + * This helper walks the children of the resource's parent to find and
>> + * return the original resource pointer that matches the given resource's
>> + * start and end addresses.
>> + *
>> + * Return: Pointer to the matching original resource in iomem_resource, or
>> + *         NULL if not found or invalid input.
>> + */
>> +static struct resource *normalize_resource(struct resource *res)
>> +{
>> +	if (!res || !res->parent)
>> +		return NULL;
>> +
>> +	for (struct resource *res_iter = res->parent->child;
>> +	     res_iter != NULL; res_iter = res_iter->sibling) {
>> +		if ((res_iter->start == res->start) &&
>> +		    (res_iter->end == res->end))
>> +			return res_iter;
>> +	}
>> +
>> +	return NULL;
>> +}
>> +
>> +static int __cxl_region_softreserv_update(struct resource *soft,
>> +					  void *_cxlr)
>> +{
>> +	struct cxl_region *cxlr = _cxlr;
>> +	struct resource *res = cxlr->params.res;
>> +
>> +	/* Skip non-intersecting soft-reserved regions */
>> +	if (soft->end < res->start || soft->start > res->end)
>> +		return 0;
>> +
>> +	soft = normalize_resource(soft);
>> +	if (!soft)
>> +		return -EINVAL;
>> +
>> +	remove_soft_reserved(cxlr, soft, res->start, res->end);
>> +
>> +	return 0;
>> +}
>> +
>> +int cxl_region_softreserv_update(void)
>> +{
>> +	struct device *dev = NULL;
>> +
>> +	while ((dev = bus_find_next_device(&cxl_bus_type, dev))) {
>> +		struct device *put_dev __free(put_device) = dev;
>> +		struct cxl_region *cxlr;
>> +
>> +		if (!is_cxl_region(dev))
>> +			continue;
>> +
>> +		cxlr = to_cxl_region(dev);
>> +
>> +		walk_iomem_res_desc(IORES_DESC_SOFT_RESERVED,
>> +				    IORESOURCE_MEM, 0, -1, cxlr,
>> +				    __cxl_region_softreserv_update);
>> +	}
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_region_softreserv_update, "CXL");
>> +
>>    u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint, u64 spa)
>>    {
>>    	struct cxl_region_ref *iter;
>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
>> index 1ba7d39c2991..fc39c4b24745 100644
>> --- a/drivers/cxl/cxl.h
>> +++ b/drivers/cxl/cxl.h
>> @@ -859,6 +859,7 @@ struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev);
>>    int cxl_add_to_region(struct cxl_port *root,
>>    		      struct cxl_endpoint_decoder *cxled);
>>    struct cxl_dax_region *to_cxl_dax_region(struct device *dev);
>> +int cxl_region_softreserv_update(void);
>>    u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint, u64 spa);
>>    #else
>>    static inline bool is_cxl_pmem_region(struct device *dev)
>> @@ -878,6 +879,10 @@ static inline struct cxl_dax_region *to_cxl_dax_region(struct device *dev)
>>    {
>>    	return NULL;
>>    }
>> +static inline int cxl_region_softreserv_update(void)
>> +{
>> +	return 0;
>> +}
>>    static inline u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint,
>>    					       u64 spa)
>>    {
Jonathan Cameron June 9, 2025, 11:02 a.m. UTC | #10
On Tue, 3 Jun 2025 22:19:44 +0000
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:

> In preparation for soft-reserved resource handling, make the suspend
> infrastructure always available by removing the CONFIG_CXL_SUSPEND
> Kconfig option.
> 
> This ensures cxl_mem_active_inc()/dec() and cxl_mem_active() are
> unconditionally available, enabling coordination between cxl_pci and
> cxl_mem drivers during region setup and hotplug operations.

If these are no longer just being used for suspend, given there
is nothing else in the file, maybe move them to somewhere else?


> 
> Co-developed-by: Nathan Fontenot <Nathan.Fontenot@amd.com>
> Signed-off-by: Nathan Fontenot <Nathan.Fontenot@amd.com>
> Co-developed-by: Terry Bowman <terry.bowman@amd.com>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> ---
>  drivers/cxl/Kconfig        | 4 ----
>  drivers/cxl/core/Makefile  | 2 +-
>  drivers/cxl/core/suspend.c | 5 ++++-
>  drivers/cxl/cxlmem.h       | 9 ---------
>  include/linux/pm.h         | 7 -------
>  5 files changed, 5 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index cf1ba673b8c2..d09144c2002e 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -118,10 +118,6 @@ config CXL_PORT
>  	default CXL_BUS
>  	tristate
>  
> -config CXL_SUSPEND
> -	def_bool y
> -	depends on SUSPEND && CXL_MEM
> -
>  config CXL_REGION
>  	bool "CXL: Region Support"
>  	default CXL_BUS
> diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
> index 086df97a0fcf..035864db8a32 100644
> --- a/drivers/cxl/core/Makefile
> +++ b/drivers/cxl/core/Makefile
> @@ -1,6 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0
>  obj-$(CONFIG_CXL_BUS) += cxl_core.o
> -obj-$(CONFIG_CXL_SUSPEND) += suspend.o
> +obj-y += suspend.o
>  
>  ccflags-y += -I$(srctree)/drivers/cxl
>  CFLAGS_trace.o = -DTRACE_INCLUDE_PATH=. -I$(src)
> diff --git a/drivers/cxl/core/suspend.c b/drivers/cxl/core/suspend.c
> index 29aa5cc5e565..5ba4b4de0e33 100644
> --- a/drivers/cxl/core/suspend.c
> +++ b/drivers/cxl/core/suspend.c
> @@ -8,7 +8,10 @@ static atomic_t mem_active;
>  
>  bool cxl_mem_active(void)
>  {
> -	return atomic_read(&mem_active) != 0;
> +	if (IS_ENABLED(CONFIG_CXL_MEM))
> +		return atomic_read(&mem_active) != 0;
> +
> +	return false;
>  }
>  
>  void cxl_mem_active_inc(void)
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 3ec6b906371b..1bd1e88c4cc0 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -853,17 +853,8 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd);
>  int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa);
>  int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa);
>  
> -#ifdef CONFIG_CXL_SUSPEND
>  void cxl_mem_active_inc(void);
>  void cxl_mem_active_dec(void);
> -#else
> -static inline void cxl_mem_active_inc(void)
> -{
> -}
> -static inline void cxl_mem_active_dec(void)
> -{
> -}
> -#endif
>  
>  int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd);
>  
> diff --git a/include/linux/pm.h b/include/linux/pm.h
> index f0bd8fbae4f2..415928e0b6ca 100644
> --- a/include/linux/pm.h
> +++ b/include/linux/pm.h
> @@ -35,14 +35,7 @@ static inline void pm_vt_switch_unregister(struct device *dev)
>  }
>  #endif /* CONFIG_VT_CONSOLE_SLEEP */
>  
> -#ifdef CONFIG_CXL_SUSPEND
>  bool cxl_mem_active(void);
> -#else
> -static inline bool cxl_mem_active(void)
> -{
> -	return false;
> -}
> -#endif
>  
>  /*
>   * Device power management
Koralahalli Channabasappa, Smita June 9, 2025, 11:25 p.m. UTC | #11
On 6/9/2025 4:02 AM, Jonathan Cameron wrote:
> On Tue, 3 Jun 2025 22:19:44 +0000
> Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:
> 
>> In preparation for soft-reserved resource handling, make the suspend
>> infrastructure always available by removing the CONFIG_CXL_SUSPEND
>> Kconfig option.
>>
>> This ensures cxl_mem_active_inc()/dec() and cxl_mem_active() are
>> unconditionally available, enabling coordination between cxl_pci and
>> cxl_mem drivers during region setup and hotplug operations.
> 
> If these are no longer just being used for suspend, given there
> is nothing else in the file, maybe move them to somewhere else?

There was a recommendation to move the wait queue declaration and its
related changes to acpi.c. I was considering that. Let me know if there
is a better place for this.

Thanks
Smita
> 
> 
>>
>> Co-developed-by: Nathan Fontenot <Nathan.Fontenot@amd.com>
>> Signed-off-by: Nathan Fontenot <Nathan.Fontenot@amd.com>
>> Co-developed-by: Terry Bowman <terry.bowman@amd.com>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
>> ---
>>   drivers/cxl/Kconfig        | 4 ----
>>   drivers/cxl/core/Makefile  | 2 +-
>>   drivers/cxl/core/suspend.c | 5 ++++-
>>   drivers/cxl/cxlmem.h       | 9 ---------
>>   include/linux/pm.h         | 7 -------
>>   5 files changed, 5 insertions(+), 22 deletions(-)
>>
>> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
>> index cf1ba673b8c2..d09144c2002e 100644
>> --- a/drivers/cxl/Kconfig
>> +++ b/drivers/cxl/Kconfig
>> @@ -118,10 +118,6 @@ config CXL_PORT
>>   	default CXL_BUS
>>   	tristate
>>   
>> -config CXL_SUSPEND
>> -	def_bool y
>> -	depends on SUSPEND && CXL_MEM
>> -
>>   config CXL_REGION
>>   	bool "CXL: Region Support"
>>   	default CXL_BUS
>> diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
>> index 086df97a0fcf..035864db8a32 100644
>> --- a/drivers/cxl/core/Makefile
>> +++ b/drivers/cxl/core/Makefile
>> @@ -1,6 +1,6 @@
>>   # SPDX-License-Identifier: GPL-2.0
>>   obj-$(CONFIG_CXL_BUS) += cxl_core.o
>> -obj-$(CONFIG_CXL_SUSPEND) += suspend.o
>> +obj-y += suspend.o
>>   
>>   ccflags-y += -I$(srctree)/drivers/cxl
>>   CFLAGS_trace.o = -DTRACE_INCLUDE_PATH=. -I$(src)
>> diff --git a/drivers/cxl/core/suspend.c b/drivers/cxl/core/suspend.c
>> index 29aa5cc5e565..5ba4b4de0e33 100644
>> --- a/drivers/cxl/core/suspend.c
>> +++ b/drivers/cxl/core/suspend.c
>> @@ -8,7 +8,10 @@ static atomic_t mem_active;
>>   
>>   bool cxl_mem_active(void)
>>   {
>> -	return atomic_read(&mem_active) != 0;
>> +	if (IS_ENABLED(CONFIG_CXL_MEM))
>> +		return atomic_read(&mem_active) != 0;
>> +
>> +	return false;
>>   }
>>   
>>   void cxl_mem_active_inc(void)
>> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
>> index 3ec6b906371b..1bd1e88c4cc0 100644
>> --- a/drivers/cxl/cxlmem.h
>> +++ b/drivers/cxl/cxlmem.h
>> @@ -853,17 +853,8 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd);
>>   int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa);
>>   int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa);
>>   
>> -#ifdef CONFIG_CXL_SUSPEND
>>   void cxl_mem_active_inc(void);
>>   void cxl_mem_active_dec(void);
>> -#else
>> -static inline void cxl_mem_active_inc(void)
>> -{
>> -}
>> -static inline void cxl_mem_active_dec(void)
>> -{
>> -}
>> -#endif
>>   
>>   int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd);
>>   
>> diff --git a/include/linux/pm.h b/include/linux/pm.h
>> index f0bd8fbae4f2..415928e0b6ca 100644
>> --- a/include/linux/pm.h
>> +++ b/include/linux/pm.h
>> @@ -35,14 +35,7 @@ static inline void pm_vt_switch_unregister(struct device *dev)
>>   }
>>   #endif /* CONFIG_VT_CONSOLE_SLEEP */
>>   
>> -#ifdef CONFIG_CXL_SUSPEND
>>   bool cxl_mem_active(void);
>> -#else
>> -static inline bool cxl_mem_active(void)
>> -{
>> -	return false;
>> -}
>> -#endif
>>   
>>   /*
>>    * Device power management
>