Message ID | 20240228-cxl-cper3-v1-0-6aa3f1343c6c@intel.com |
---|---|
Headers | show |
Series | efi/cxl-cper: Report CXL CPER events through tracing | expand |
Ira Weiny wrote: > Additional event testing using the cxl-event.h header revealed that it > did not include the correct headers for the types used. Compile errors > such as: > > include/linux/cxl-event.h|11 col 9| error: unknown type name ‘u8’ > > ... were seen. Were seen where? Should this have the trio of Reported-by: Closes: and Fixes tags?
Ira Weiny wrote: > If the firmware has configured CXL event support to be firmware first > the OS can process those events through CPER records. The CXL layer has > unique DPA to HPA knowledge and standard event trace parsing in place. > > CPER records contain Bus, Device, Function information which can be used > to identify the PCI device which is sending the event. > > Add a CXL CPER callback to process events through the CXL trace > subsystem. > > Future patches will provide additional region information such as HPA. > > Signed-off-by: Ira Weiny <ira.weiny@intel.com> > > --- > Changes: > [iweiny: Add back in after the revert in 6.8] > --- > drivers/cxl/pci.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 68 insertions(+), 1 deletion(-) > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c > index 2ff361e756d6..6cf8336d1b33 100644 > --- a/drivers/cxl/pci.c > +++ b/drivers/cxl/pci.c > @@ -974,6 +974,73 @@ static struct pci_driver cxl_pci_driver = { > }, > }; > > -module_pci_driver(cxl_pci_driver); > +#define CXL_EVENT_HDR_FLAGS_REC_SEVERITY GENMASK(1, 0) > +static void cxl_cper_event_call(enum cxl_event_type ev_type, > + struct cxl_cper_event_rec *rec) > +{ > + struct cper_cxl_event_devid *device_id = &rec->hdr.device_id; > + struct pci_dev *pdev __free(pci_dev_put) = NULL; > + enum cxl_event_log_type log_type; > + struct cxl_dev_state *cxlds; > + unsigned int devfn; > + u32 hdr_flags; > + > + pr_debug("CPER event for device %u:%u:%u.%u\n", > + device_id->segment_num, device_id->bus_num, > + device_id->device_num, device_id->func_num); > + > + devfn = PCI_DEVFN(device_id->device_num, device_id->func_num); > + pdev = pci_get_domain_bus_and_slot(device_id->segment_num, > + device_id->bus_num, devfn); > + if (!pdev) { > + pr_err("CPER event device %u:%u:%u.%u not found\n", > + device_id->segment_num, device_id->bus_num, > + device_id->device_num, device_id->func_num); > + return; > + } > + > + dev_dbg(&pdev->dev, "Found device %u:%u.%u\n", device_id->bus_num, > + device_id->device_num, device_id->func_num); These print statements are excessive. The dev_dbg() already encodes the device BDF into the device name. The pr_err() is not actionable and somewhat redundant with the default cper_estatus_print_section() print. I would just delete all of them.
Dan Williams wrote: > Ira Weiny wrote: > > Additional event testing using the cxl-event.h header revealed that it > > did not include the correct headers for the types used. Compile errors > > such as: > > > > include/linux/cxl-event.h|11 col 9| error: unknown type name ‘u8’ > > > > ... were seen. > > Were seen where? Should this have the trio of Reported-by: Closes: and > Fixes tags? As I said after this in the commit message: "Omit the fixes tag because this does not cause any issues until the header is used again in other code." I'll clarify it was seen when I used cxl-event.h in the testing code and this happened. Ira
CXL Component Events, as defined by EFI 2.10 Section N.2.14, wrap a mostly CXL event payload in an EFI Common Platform Error Record (CPER) record. If a device is configured for firmware first CXL event records are not sent directly to the host. The CXL sub-system uniquely has DPA to HPA translation information. It also already has event decoding/tracing. Such translations are very useful for users to determine which system issues may correspond to specific hardware events. The restructuring of the event data structures in 6.8 made sharing the data between CPER/event logs more efficient. Now re-wire the sending of CPER records to the CXL sub-system. In addition provide a default RAS event should the CXL module not be loaded [ie callback not registered]. Series status/background ======================== Smita and Jonathan have been a great help with this series. Once again thank you. Unfortunately, with all the churn surrounding the bug which Dan Carpenter found the maintainers were force to revert this work. Therefore, this is a whole new series based on what is in 6.8. Testing ======= I've hacked up a quick debugfs patch to facilitate easier testing.[1] With this I have verified that the bug Dan Carpenter found is fixed. However, the tp_printk bug Jonathan found remains. The taking of the device lock in the callback is required and the tp_printk issue is unlikely to be fixed. Fortunately, tp_printk is not widely used so it is anticipated this will not be an issue. No other locking issues were found with this test and locking debug turned on. [1] https://github.com/weiny2/linux-kernel/commit/6c540a23cb1194d67a9dcfefb702774a99afc3b1 Signed-off-by: Ira Weiny <ira.weiny@intel.com> --- Ira Weiny (4): cxl/event: Add missing include files acpi/ghes: Process CXL Component Events cxl/pci: Register for and process CPER events ras/events: Trace CXL CPER events even without the CXL stack loaded drivers/acpi/apei/ghes.c | 130 ++++++++++++++++++++++++++++++++++++++++++++++ drivers/cxl/pci.c | 69 +++++++++++++++++++++++- include/linux/cxl-event.h | 21 ++++++++ include/ras/ras_event.h | 90 ++++++++++++++++++++++++++++++++ 4 files changed, 309 insertions(+), 1 deletion(-) --- base-commit: daeacfa75d08954e1a5b71c36a8fbfcdd0b3fec9 change-id: 20240220-cxl-cper3-30e55279f936 Best regards,