mbox series

[v5,0/5] acpi/ghes, cper, cxl: Process CXL CPER Protocol errors

Message ID 20250114120427.149260-1-Smita.KoralahalliChannabasappa@amd.com
Headers show
Series acpi/ghes, cper, cxl: Process CXL CPER Protocol errors | expand

Message

Koralahalli Channabasappa, Smita Jan. 14, 2025, 12:04 p.m. UTC
This patchset adds logging support for CXL CPER endpoint and port protocol
errors.

The first 3 patches update the existing codebase to support CXL CPER
Protocol error reporting.

The last 2 patches introduce recognizing and reporting CXL CPER Protocol
errors.

Should be based on top of:
https://lore.kernel.org/linux-cxl/20250107143852.3692571-1-terry.bowman@amd.com

Link to v4:
https://lore.kernel.org/linux-cxl/20241217022032.87298-1-Smita.KoralahalliChannabasappa@amd.com

Changes in v4 -> v5:
[Dave]: Reviewed-by tags.
[Jonathan]: Remove blank line.
[Jonathan, Ira]: Change CXL -> "CXL".
[Ira]: Fix build error for CONFIG_ACPI_APEI_PCIEAER.

Changes in v3 -> v4:
[Ira]: Use memcpy() for RAS Cap struct.
[Jonathan]: Commit description edits.
[Jonathan]: Use separate work registration functions for protocol and
component errors.
[Jonathan, Ira]: Replace flags with separate functions for port and
device errors.
[Jonathan]: Use goto for register and unregister calls.

Changes in v2 -> v3:
[Dan]: Define a new workqueue for CXL CPER Protocol errors and avoid
reusing existing workqueue which handles CXL CPER events.
[Dan] Update function and struct names.
[Ira] Don't define common function get_cxl_devstate().
[Dan] Use switch cases rather than defining array of structures.
[Dan] Pass the entire cxl_cper_prot_err struct for CXL subsystem.
[Dan] Use pr_err_ratelimited().
[Dan] Use AER_ severities directly. Don't define CXL_ severities.
[Dan] Limit either to Device ID or Agent Info check.
[Dan] Validate size of RAS field matches expectations.

Changes in v2 -> v1:
[Jonathan] Refactor code for trace support. Rename get_cxl_dev()
to get_cxl_devstate().
[Jonathan] Cleanups for get_cxl_devstate().
[Alison, Jonathan]: Define array of structures for Device ID and Serial
number comparison.
[Dave] p_err -> rec/p_rec.
[Jonathan] Remove pr_warn.

Smita Koralahalli (5):
  efi/cper, cxl: Prefix protocol error struct and function names with
    cxl_
  efi/cper, cxl: Make definitions and structures global
  efi/cper, cxl: Remove cper_cxl.h
  acpi/ghes, cper: Recognize and cache CXL Protocol errors
  acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors

 drivers/acpi/apei/ghes.c        | 103 ++++++++++++++++++++++++++++++++
 drivers/cxl/core/pci.c          |  62 +++++++++++++++++++
 drivers/cxl/cxlpci.h            |   9 +++
 drivers/cxl/pci.c               |  59 +++++++++++++++++-
 drivers/firmware/efi/cper.c     |   6 +-
 drivers/firmware/efi/cper_cxl.c |  39 +-----------
 drivers/firmware/efi/cper_cxl.h |  66 --------------------
 include/cxl/event.h             | 101 +++++++++++++++++++++++++++++++
 include/linux/cper.h            |   8 +++
 9 files changed, 347 insertions(+), 106 deletions(-)
 delete mode 100644 drivers/firmware/efi/cper_cxl.h

Comments

Dave Jiang Jan. 14, 2025, 3:42 p.m. UTC | #1
On 1/14/25 5:04 AM, Smita Koralahalli wrote:
> When PCIe AER is in FW-First, OS should process CXL Protocol errors from
> CPER records. Introduce support for handling and logging CXL Protocol
> errors.
> 
> The defined trace events cxl_aer_uncorrectable_error and
> cxl_aer_correctable_error trace native CXL AER endpoint errors, while
> cxl_cper_trace_corr_prot_err and cxl_cper_trace_uncorr_prot_err
> trace native CXL AER port errors. Reuse both sets to trace FW-First
> protocol errors.
> 
> Since the CXL code is required to be called from process context and
> GHES is in interrupt context, use workqueues for processing.
> 
> Similar to CXL CPER event handling, use kfifo to handle errors as it
> simplifies queue processing by providing lock free fifo operations.
> 
> Add the ability for the CXL sub-system to register a workqueue to
> process CXL CPER protocol errors.
> 
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>

Just a small nit below, otherwise
Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>  drivers/acpi/apei/ghes.c | 49 +++++++++++++++++++++++++++++++
>  drivers/cxl/core/pci.c   | 62 ++++++++++++++++++++++++++++++++++++++++
>  drivers/cxl/cxlpci.h     |  9 ++++++
>  drivers/cxl/pci.c        | 59 +++++++++++++++++++++++++++++++++++++-
>  include/cxl/event.h      | 15 ++++++++++
>  5 files changed, 193 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 4ab3c8ae1360..8dc693e9b2d0 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -676,6 +676,15 @@ static void ghes_defer_non_standard_event(struct acpi_hest_generic_data *gdata,
>  	schedule_work(&entry->work);
>  }
>  
> +/* Room for 8 entries */
> +#define CXL_CPER_PROT_ERR_FIFO_DEPTH 8
> +static DEFINE_KFIFO(cxl_cper_prot_err_fifo, struct cxl_cper_prot_err_work_data,
> +		    CXL_CPER_PROT_ERR_FIFO_DEPTH);
> +
> +/* Synchronize schedule_work() with cxl_cper_prot_err_work changes */
> +static DEFINE_SPINLOCK(cxl_cper_prot_err_work_lock);
> +struct work_struct *cxl_cper_prot_err_work;
> +
>  static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
>  				   int severity)
>  {
> @@ -702,6 +711,11 @@ static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
>  	if (!(prot_err->valid_bits & PROT_ERR_VALID_SERIAL_NUMBER))
>  		pr_warn(FW_WARN "CXL CPER no device serial number\n");
>  
> +	guard(spinlock_irqsave)(&cxl_cper_prot_err_work_lock);
> +
> +	if (!cxl_cper_prot_err_work)
> +		return;
> +
>  	switch (prot_err->agent_type) {
>  	case RCD:
>  	case DEVICE:
> @@ -723,9 +737,44 @@ static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
>  				   prot_err->agent_type);
>  		return;
>  	}
> +
> +	if (!kfifo_put(&cxl_cper_prot_err_fifo, wd)) {
> +		pr_err_ratelimited("CXL CPER kfifo overflow\n");
> +		return;
> +	}
> +
> +	schedule_work(cxl_cper_prot_err_work);
>  #endif
>  }
>  
> +int cxl_cper_register_prot_err_work(struct work_struct *work)
> +{
> +	if (cxl_cper_prot_err_work)
> +		return -EINVAL;
> +
> +	guard(spinlock)(&cxl_cper_prot_err_work_lock);
> +	cxl_cper_prot_err_work = work;
> +	return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_cper_register_prot_err_work, "CXL");
> +
> +int cxl_cper_unregister_prot_err_work(struct work_struct *work)
> +{
> +	if (cxl_cper_prot_err_work != work)
> +		return -EINVAL;
> +
> +	guard(spinlock)(&cxl_cper_prot_err_work_lock);
> +	cxl_cper_prot_err_work = NULL;
> +	return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_prot_err_work, "CXL");
> +
> +int cxl_cper_prot_err_kfifo_get(struct cxl_cper_prot_err_work_data *wd)
> +{
> +	return kfifo_get(&cxl_cper_prot_err_fifo, wd);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_cper_prot_err_kfifo_get, "CXL");
> +
>  /* Room for 8 entries for each of the 4 event log queues */
>  #define CXL_CPER_FIFO_DEPTH 32
>  DEFINE_KFIFO(cxl_cper_fifo, struct cxl_cper_work_data, CXL_CPER_FIFO_DEPTH);
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 6c10eab82eff..4ec060032ae8 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -650,6 +650,68 @@ void read_cdat_data(struct cxl_port *port)
>  }
>  EXPORT_SYMBOL_NS_GPL(read_cdat_data, "CXL");
>  
> +void cxl_cper_trace_corr_prot_err(struct pci_dev *pdev,
> +				  struct cxl_ras_capability_regs ras_cap)
> +{
> +	u32 status = ras_cap.cor_status & ~ras_cap.cor_mask;
> +	struct cxl_dev_state *cxlds;
> +
> +	cxlds = pci_get_drvdata(pdev);
> +	if (!cxlds)
> +		return;
> +
> +	trace_cxl_aer_correctable_error(cxlds->cxlmd, status);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_cper_trace_corr_prot_err, "CXL");
> +
> +void cxl_cper_trace_uncorr_prot_err(struct pci_dev *pdev,
> +				    struct cxl_ras_capability_regs ras_cap)
> +{
> +	u32 status = ras_cap.uncor_status & ~ras_cap.uncor_mask;
> +	struct cxl_dev_state *cxlds;
> +	u32 fe;
> +
> +	if (hweight32(status) > 1)
> +		fe = BIT(FIELD_GET(CXL_RAS_CAP_CONTROL_FE_MASK,
> +				   ras_cap.cap_control));
> +	else
> +		fe = status;
> +
> +	cxlds = pci_get_drvdata(pdev);
> +	if (!cxlds)
> +		return;

Maybe do this check first in the function so it doesn't waste cycles retrieving 'fe' earlier for the no driver case.

DJ


> +
> +	trace_cxl_aer_uncorrectable_error(cxlds->cxlmd, status, fe,
> +					  ras_cap.header_log);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_cper_trace_uncorr_prot_err, "CXL");
> +
> +void cxl_cper_trace_corr_port_prot_err(struct pci_dev *pdev,
> +				       struct cxl_ras_capability_regs ras_cap)
> +{
> +	u32 status = ras_cap.cor_status & ~ras_cap.cor_mask;
> +
> +	trace_cxl_port_aer_correctable_error(&pdev->dev, status);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_cper_trace_corr_port_prot_err, "CXL");
> +
> +void cxl_cper_trace_uncorr_port_prot_err(struct pci_dev *pdev,
> +					 struct cxl_ras_capability_regs ras_cap)
> +{
> +	u32 status = ras_cap.uncor_status & ~ras_cap.uncor_mask;
> +	u32 fe;
> +
> +	if (hweight32(status) > 1)
> +		fe = BIT(FIELD_GET(CXL_RAS_CAP_CONTROL_FE_MASK,
> +				   ras_cap.cap_control));
> +	else
> +		fe = status;
> +
> +	trace_cxl_port_aer_uncorrectable_error(&pdev->dev, status, fe,
> +					       ras_cap.header_log);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_cper_trace_uncorr_port_prot_err, "CXL");
> +
>  static void __cxl_handle_cor_ras(struct device *dev,
>  				 void __iomem *ras_base)
>  {
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index 4da07727ab9c..23f2b1c9bd13 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -129,4 +129,13 @@ void read_cdat_data(struct cxl_port *port);
>  void cxl_cor_error_detected(struct pci_dev *pdev);
>  pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
>  				    pci_channel_state_t state);
> +struct cxl_ras_capability_regs;
> +void cxl_cper_trace_corr_prot_err(struct pci_dev *pdev,
> +				  struct cxl_ras_capability_regs ras_cap);
> +void cxl_cper_trace_uncorr_prot_err(struct pci_dev *pdev,
> +				    struct cxl_ras_capability_regs ras_cap);
> +void cxl_cper_trace_corr_port_prot_err(struct pci_dev *pdev,
> +				       struct cxl_ras_capability_regs ras_cap);
> +void cxl_cper_trace_uncorr_port_prot_err(struct pci_dev *pdev,
> +					 struct cxl_ras_capability_regs ras_cap);
>  #endif /* __CXL_PCI_H__ */
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 6d94ff4a4f1a..766447c169c8 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -1160,6 +1160,50 @@ static void cxl_cper_work_fn(struct work_struct *work)
>  }
>  static DECLARE_WORK(cxl_cper_work, cxl_cper_work_fn);
>  
> +static void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data *data)
> +{
> +	unsigned int devfn = PCI_DEVFN(data->prot_err.agent_addr.device,
> +				       data->prot_err.agent_addr.function);
> +	struct pci_dev *pdev __free(pci_dev_put) =
> +		pci_get_domain_bus_and_slot(data->prot_err.agent_addr.segment,
> +					    data->prot_err.agent_addr.bus,
> +					    devfn);
> +	int port_type;
> +
> +	if (!pdev)
> +		return;
> +
> +	guard(device)(&pdev->dev);
> +	if (pdev->driver != &cxl_pci_driver)
> +		return;
> +
> +	port_type = pci_pcie_type(pdev);
> +	if (port_type == PCI_EXP_TYPE_ROOT_PORT ||
> +	    port_type == PCI_EXP_TYPE_DOWNSTREAM ||
> +	    port_type == PCI_EXP_TYPE_UPSTREAM) {
> +		if (data->severity == AER_CORRECTABLE)
> +			cxl_cper_trace_corr_port_prot_err(pdev, data->ras_cap);
> +		else
> +			cxl_cper_trace_uncorr_port_prot_err(pdev, data->ras_cap);
> +
> +		return;
> +	}
> +
> +	if (data->severity == AER_CORRECTABLE)
> +		cxl_cper_trace_corr_prot_err(pdev, data->ras_cap);
> +	else
> +		cxl_cper_trace_uncorr_prot_err(pdev, data->ras_cap);
> +}
> +
> +static void cxl_cper_prot_err_work_fn(struct work_struct *work)
> +{
> +	struct cxl_cper_prot_err_work_data wd;
> +
> +	while (cxl_cper_prot_err_kfifo_get(&wd))
> +		cxl_cper_handle_prot_err(&wd);
> +}
> +static DECLARE_WORK(cxl_cper_prot_err_work, cxl_cper_prot_err_work_fn);
> +
>  static int __init cxl_pci_driver_init(void)
>  {
>  	int rc;
> @@ -1170,7 +1214,18 @@ static int __init cxl_pci_driver_init(void)
>  
>  	rc = cxl_cper_register_work(&cxl_cper_work);
>  	if (rc)
> -		pci_unregister_driver(&cxl_pci_driver);
> +		goto err_unreg;
> +
> +	rc = cxl_cper_register_prot_err_work(&cxl_cper_prot_err_work);
> +	if (rc)
> +		goto err_unregister_work;
> +
> +	return 0;
> +
> +err_unregister_work:
> +	cxl_cper_unregister_work(&cxl_cper_work);
> +err_unreg:
> +	pci_unregister_driver(&cxl_pci_driver);
>  
>  	return rc;
>  }
> @@ -1178,7 +1233,9 @@ static int __init cxl_pci_driver_init(void)
>  static void __exit cxl_pci_driver_exit(void)
>  {
>  	cxl_cper_unregister_work(&cxl_cper_work);
> +	cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work);
>  	cancel_work_sync(&cxl_cper_work);
> +	cancel_work_sync(&cxl_cper_prot_err_work);
>  	pci_unregister_driver(&cxl_pci_driver);
>  }
>  
> diff --git a/include/cxl/event.h b/include/cxl/event.h
> index ee1c3dec62fa..359a8f44a2e0 100644
> --- a/include/cxl/event.h
> +++ b/include/cxl/event.h
> @@ -242,6 +242,9 @@ struct cxl_cper_prot_err_work_data {
>  int cxl_cper_register_work(struct work_struct *work);
>  int cxl_cper_unregister_work(struct work_struct *work);
>  int cxl_cper_kfifo_get(struct cxl_cper_work_data *wd);
> +int cxl_cper_register_prot_err_work(struct work_struct *work);
> +int cxl_cper_unregister_prot_err_work(struct work_struct *work);
> +int cxl_cper_prot_err_kfifo_get(struct cxl_cper_prot_err_work_data *wd);
>  #else
>  static inline int cxl_cper_register_work(struct work_struct *work)
>  {
> @@ -256,6 +259,18 @@ static inline int cxl_cper_kfifo_get(struct cxl_cper_work_data *wd)
>  {
>  	return 0;
>  }
> +static inline int cxl_cper_register_prot_err_work(struct work_struct *work)
> +{
> +	return 0;
> +}
> +static inline int cxl_cper_unregister_prot_err_work(struct work_struct *work)
> +{
> +	return 0;
> +}
> +static inline int cxl_cper_prot_err_kfifo_get(struct cxl_cper_prot_err_work_data *wd)
> +{
> +	return 0;
> +}
>  #endif
>  
>  #endif /* _LINUX_CXL_EVENT_H */
Jonathan Cameron Jan. 15, 2025, 1:29 p.m. UTC | #2
On Tue, 14 Jan 2025 12:04:27 +0000
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:

> When PCIe AER is in FW-First, OS should process CXL Protocol errors from
> CPER records. Introduce support for handling and logging CXL Protocol
> errors.
> 
> The defined trace events cxl_aer_uncorrectable_error and
> cxl_aer_correctable_error trace native CXL AER endpoint errors, while
> cxl_cper_trace_corr_prot_err and cxl_cper_trace_uncorr_prot_err
> trace native CXL AER port errors. Reuse both sets to trace FW-First
> protocol errors.
> 
> Since the CXL code is required to be called from process context and
> GHES is in interrupt context, use workqueues for processing.
> 
> Similar to CXL CPER event handling, use kfifo to handle errors as it
> simplifies queue processing by providing lock free fifo operations.
> 
> Add the ability for the CXL sub-system to register a workqueue to
> process CXL CPER protocol errors.
> 
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Dave Jiang Jan. 21, 2025, 8:32 p.m. UTC | #3
On 1/14/25 5:04 AM, Smita Koralahalli wrote:
> This patchset adds logging support for CXL CPER endpoint and port protocol
> errors.

Ard,
Do you mind giving an ACK for the series if it looks ok to you? I can pick up the series through the CXL tree with your ACKs. Thanks!

Smita,
Can you please respin a v6 based on top of latest upstream instead of Terry's changes? I think we are ready to pick this up before Terry's series. Thank you. 

> 
> The first 3 patches update the existing codebase to support CXL CPER
> Protocol error reporting.
> 
> The last 2 patches introduce recognizing and reporting CXL CPER Protocol
> errors.
> 
> Should be based on top of:
> https://lore.kernel.org/linux-cxl/20250107143852.3692571-1-terry.bowman@amd.com
> 
> Link to v4:
> https://lore.kernel.org/linux-cxl/20241217022032.87298-1-Smita.KoralahalliChannabasappa@amd.com
> 
> Changes in v4 -> v5:
> [Dave]: Reviewed-by tags.
> [Jonathan]: Remove blank line.
> [Jonathan, Ira]: Change CXL -> "CXL".
> [Ira]: Fix build error for CONFIG_ACPI_APEI_PCIEAER.
> 
> Changes in v3 -> v4:
> [Ira]: Use memcpy() for RAS Cap struct.
> [Jonathan]: Commit description edits.
> [Jonathan]: Use separate work registration functions for protocol and
> component errors.
> [Jonathan, Ira]: Replace flags with separate functions for port and
> device errors.
> [Jonathan]: Use goto for register and unregister calls.
> 
> Changes in v2 -> v3:
> [Dan]: Define a new workqueue for CXL CPER Protocol errors and avoid
> reusing existing workqueue which handles CXL CPER events.
> [Dan] Update function and struct names.
> [Ira] Don't define common function get_cxl_devstate().
> [Dan] Use switch cases rather than defining array of structures.
> [Dan] Pass the entire cxl_cper_prot_err struct for CXL subsystem.
> [Dan] Use pr_err_ratelimited().
> [Dan] Use AER_ severities directly. Don't define CXL_ severities.
> [Dan] Limit either to Device ID or Agent Info check.
> [Dan] Validate size of RAS field matches expectations.
> 
> Changes in v2 -> v1:
> [Jonathan] Refactor code for trace support. Rename get_cxl_dev()
> to get_cxl_devstate().
> [Jonathan] Cleanups for get_cxl_devstate().
> [Alison, Jonathan]: Define array of structures for Device ID and Serial
> number comparison.
> [Dave] p_err -> rec/p_rec.
> [Jonathan] Remove pr_warn.
> 
> Smita Koralahalli (5):
>   efi/cper, cxl: Prefix protocol error struct and function names with
>     cxl_
>   efi/cper, cxl: Make definitions and structures global
>   efi/cper, cxl: Remove cper_cxl.h
>   acpi/ghes, cper: Recognize and cache CXL Protocol errors
>   acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors
> 
>  drivers/acpi/apei/ghes.c        | 103 ++++++++++++++++++++++++++++++++
>  drivers/cxl/core/pci.c          |  62 +++++++++++++++++++
>  drivers/cxl/cxlpci.h            |   9 +++
>  drivers/cxl/pci.c               |  59 +++++++++++++++++-
>  drivers/firmware/efi/cper.c     |   6 +-
>  drivers/firmware/efi/cper_cxl.c |  39 +-----------
>  drivers/firmware/efi/cper_cxl.h |  66 --------------------
>  include/cxl/event.h             | 101 +++++++++++++++++++++++++++++++
>  include/linux/cper.h            |   8 +++
>  9 files changed, 347 insertions(+), 106 deletions(-)
>  delete mode 100644 drivers/firmware/efi/cper_cxl.h
>
Koralahalli Channabasappa, Smita Jan. 22, 2025, 7:35 p.m. UTC | #4
Hi Dave,

On 1/21/2025 12:32 PM, Dave Jiang wrote:
> 
> 
> On 1/14/25 5:04 AM, Smita Koralahalli wrote:
>> This patchset adds logging support for CXL CPER endpoint and port protocol
>> errors.
> 
> Ard,
> Do you mind giving an ACK for the series if it looks ok to you? I can pick up the series through the CXL tree with your ACKs. Thanks!
> 
> Smita,
> Can you please respin a v6 based on top of latest upstream instead of Terry's changes? I think we are ready to pick this up before Terry's series. Thank you.

Sure. These patches depend on trace logging functions written by Terry 
trace_cxl_port_aer_uncorrectable_error() and 
trace_cxl_port_aer_correctable_error(). Let me work with him to just 
merge the trace logging functions and I will post v6 on latest upstream.

Thanks
Smita
> 
>>
>> The first 3 patches update the existing codebase to support CXL CPER
>> Protocol error reporting.
>>
>> The last 2 patches introduce recognizing and reporting CXL CPER Protocol
>> errors.
>>
>> Should be based on top of:
>> https://lore.kernel.org/linux-cxl/20250107143852.3692571-1-terry.bowman@amd.com
>>
>> Link to v4:
>> https://lore.kernel.org/linux-cxl/20241217022032.87298-1-Smita.KoralahalliChannabasappa@amd.com
>>
>> Changes in v4 -> v5:
>> [Dave]: Reviewed-by tags.
>> [Jonathan]: Remove blank line.
>> [Jonathan, Ira]: Change CXL -> "CXL".
>> [Ira]: Fix build error for CONFIG_ACPI_APEI_PCIEAER.
>>
>> Changes in v3 -> v4:
>> [Ira]: Use memcpy() for RAS Cap struct.
>> [Jonathan]: Commit description edits.
>> [Jonathan]: Use separate work registration functions for protocol and
>> component errors.
>> [Jonathan, Ira]: Replace flags with separate functions for port and
>> device errors.
>> [Jonathan]: Use goto for register and unregister calls.
>>
>> Changes in v2 -> v3:
>> [Dan]: Define a new workqueue for CXL CPER Protocol errors and avoid
>> reusing existing workqueue which handles CXL CPER events.
>> [Dan] Update function and struct names.
>> [Ira] Don't define common function get_cxl_devstate().
>> [Dan] Use switch cases rather than defining array of structures.
>> [Dan] Pass the entire cxl_cper_prot_err struct for CXL subsystem.
>> [Dan] Use pr_err_ratelimited().
>> [Dan] Use AER_ severities directly. Don't define CXL_ severities.
>> [Dan] Limit either to Device ID or Agent Info check.
>> [Dan] Validate size of RAS field matches expectations.
>>
>> Changes in v2 -> v1:
>> [Jonathan] Refactor code for trace support. Rename get_cxl_dev()
>> to get_cxl_devstate().
>> [Jonathan] Cleanups for get_cxl_devstate().
>> [Alison, Jonathan]: Define array of structures for Device ID and Serial
>> number comparison.
>> [Dave] p_err -> rec/p_rec.
>> [Jonathan] Remove pr_warn.
>>
>> Smita Koralahalli (5):
>>    efi/cper, cxl: Prefix protocol error struct and function names with
>>      cxl_
>>    efi/cper, cxl: Make definitions and structures global
>>    efi/cper, cxl: Remove cper_cxl.h
>>    acpi/ghes, cper: Recognize and cache CXL Protocol errors
>>    acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors
>>
>>   drivers/acpi/apei/ghes.c        | 103 ++++++++++++++++++++++++++++++++
>>   drivers/cxl/core/pci.c          |  62 +++++++++++++++++++
>>   drivers/cxl/cxlpci.h            |   9 +++
>>   drivers/cxl/pci.c               |  59 +++++++++++++++++-
>>   drivers/firmware/efi/cper.c     |   6 +-
>>   drivers/firmware/efi/cper_cxl.c |  39 +-----------
>>   drivers/firmware/efi/cper_cxl.h |  66 --------------------
>>   include/cxl/event.h             | 101 +++++++++++++++++++++++++++++++
>>   include/linux/cper.h            |   8 +++
>>   9 files changed, 347 insertions(+), 106 deletions(-)
>>   delete mode 100644 drivers/firmware/efi/cper_cxl.h
>>
>