Message ID | 20240527144356.246220-1-fabio.m.de.francesco@linux.intel.com |
---|---|
Headers | show |
Series | Make ELOG log and trace consistently with GHES | expand |
On Monday, May 27, 2024 4:43:39 PM GMT+2 Fabio M. De Francesco wrote: > When Firmware First is enabled, BIOS handles errors first and then it > makes them available to the kernel via the Common Platform Error Record > (CPER) sections (UEFI 2.10 Appendix N). Linux parses the CPER sections > via one of two similar paths, either ELOG or GHES. > > Currently, ELOG and GHES show some inconsistencies in how they print to > the kernel log as well as in how they report to userspace via trace > events. > > Make the two mentioned paths act similarly for what relates to logging > and tracing. Gentle ping. Thanks, Fabio > --- Changes for v1 --- > > - Drop the RFC prefix and restart from PATCH v1 > - Drop patch 3/3 because a discussion on it has not yet been > settled > - Drop namespacing in export of pci_print_aer while() (Dan) > - Don't use '#ifdef' in *.c files (Dan) > - Drop a reference on pdev after operation is complete (Dan) > - Don't log an error message if pdev is NULL (Dan) > > --- Changes for RFC v2 --- > > - 0/3: rework the subject line and the letter. > - 1/3: no changes. > - 2/3: trace CPER PCIe Section only if CONFIG_ACPI_APEI_PCIEAER > is defined; the kernel test robot reported the use of two > undefined symbols because the test for the config option was > missing; rewrite the subject line and part of commit message. > - 3/3: no changes. > > Fabio M. De Francesco (2): > ACPI: extlog: Trace CPER Non-standard Section Body > ACPI: extlog: Trace CPER PCI Express Error Section > > drivers/acpi/acpi_extlog.c | 35 +++++++++++++++++++++++++++++++++++ > drivers/pci/pcie/aer.c | 2 +- > include/linux/aer.h | 9 ++++++--- > 3 files changed, 42 insertions(+), 4 deletions(-) > >
Fabio M. De Francesco wrote: > In extlog_print(), trace "Non-standard Section Body" reported by firmware > to the OS via Common Platform Error Record (CPER) (UEFI v2.10 Appendix N > 2.3) to add further debug information and so to make ELOG log > consistently with ghes_do_proc() (GHES). I think this description could be clearer, how about: --- ghes_do_proc() has a catch-all for unknown or unhandled CPER formats (UEFI v2.10 Appendix N 2.3), extlog_print() does not. This gap was noticed by a RAS test that injected CXL protocol errors which were notified to extlog_print() via the IOMCA (I/O Machine Check Architecture) mechanism. Bring parity to the extlog_print() path by including a similar trace_non_standard_event(). --- > > Cc: Dan Williams <dan.j.williams@intel.com> > Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com> > --- > drivers/acpi/acpi_extlog.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c > index f055609d4b64..e025ae390737 100644 > --- a/drivers/acpi/acpi_extlog.c > +++ b/drivers/acpi/acpi_extlog.c > @@ -179,6 +179,12 @@ static int extlog_print(struct notifier_block *nb, unsigned long val, > if (gdata->error_data_length >= sizeof(*mem)) > trace_extlog_mem_event(mem, err_seq, fru_id, fru_text, > (u8)gdata->error_severity); > + } else { > + void *err = acpi_hest_get_payload(gdata); > + > + trace_non_standard_event(sec_type, fru_id, fru_text, > + gdata->error_severity, err, > + gdata->error_data_length); > } ...with the above changelog update the code change looks good to me, you can add: Reviewed-by: Dan Williams <dan.j.williams@intel.com>