Message ID | 20230525204422.4754-2-Avadhut.Naik@amd.com |
---|---|
State | New |
Series | Add support for Vendor Defined Error Types in Einj Module |
On 5/25/23 4:44 PM, Avadhut Naik wrote:
> OSPM can discover the error injection capabilities of the platform by
> executing GET_ERROR_TYPE error injection action.[1] The action returns
> a DWORD representing a bitmap of platform supported error injections.[2]
>
> The available_error_type_show() function determines the bits set within
> this DWORD and provides a verbose output, from einj_error_type_string
> array, through /sys/kernel/debug/apei/einj/available_error_type file.
>
> The function however, assumes one to one correspondence between an error's
> position in the bitmap and its array entry offset. Consequently, some
> errors like Vendor Defined Error Type fail this assumption and will
> incorrectly be shown as not supported, even if their corresponding bit is
> set in the bitmap and they have an entry in the array.
>
> Navigate around the issue by converting einj_error_type_string into an
> array of structures with a predetermined mask for all error types
> corresponding to their bit position in the DWORD returned by GET_ERROR_TYPE
> action. The same breaks the aforementioned assumption resulting in all
> supported error types by a platform being outputted through the above
> available_error_type file.
>
> [1] ACPI specification 6.5, Table 18.25
> [2] ACPI specification 6.5, Table 18.30
>
> Suggested-by: Alexey Kardashevskiy <alexey.kardashevskiy@amd.com>
> Signed-off-by: Avadhut Naik <Avadhut.Naik@amd.com>
> ---
>  drivers/acpi/apei/einj.c | 43 ++++++++++++++++++++--------------------
>  1 file changed, 22 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
> index 013eb621dc92..d5f8dc4df7a5 100644
> --- a/drivers/acpi/apei/einj.c
> +++ b/drivers/acpi/apei/einj.c
> @@ -577,25 +577,25 @@ static u64 error_param2;
>  static u64 error_param3;
>  static u64 error_param4;
>  static struct dentry *einj_debug_dir;
> -static const char * const einj_error_type_string[] = {
> -	"0x00000001\tProcessor Correctable\n",
> -	"0x00000002\tProcessor Uncorrectable non-fatal\n",
> -	"0x00000004\tProcessor Uncorrectable fatal\n",
> -	"0x00000008\tMemory Correctable\n",
> -	"0x00000010\tMemory Uncorrectable non-fatal\n",
> -	"0x00000020\tMemory Uncorrectable fatal\n",
> -	"0x00000040\tPCI Express Correctable\n",
> -	"0x00000080\tPCI Express Uncorrectable non-fatal\n",
> -	"0x00000100\tPCI Express Uncorrectable fatal\n",
> -	"0x00000200\tPlatform Correctable\n",
> -	"0x00000400\tPlatform Uncorrectable non-fatal\n",
> -	"0x00000800\tPlatform Uncorrectable fatal\n",
> -	"0x00001000\tCXL.cache Protocol Correctable\n",
> -	"0x00002000\tCXL.cache Protocol Uncorrectable non-fatal\n",
> -	"0x00004000\tCXL.cache Protocol Uncorrectable fatal\n",
> -	"0x00008000\tCXL.mem Protocol Correctable\n",
> -	"0x00010000\tCXL.mem Protocol Uncorrectable non-fatal\n",
> -	"0x00020000\tCXL.mem Protocol Uncorrectable fatal\n",
> +static struct { u32 mask; const char *str; } const einj_error_type_string[] = {
> +	{0x00000001, "Processor Correctable"},
> +	{0x00000002, "Processor Uncorrectable non-fatal"},
> +	{0x00000004, "Processor Uncorrectable fatal"},
> +	{0x00000008, "Memory Correctable"},
> +	{0x00000010, "Memory Uncorrectable non-fatal"},
> +	{0x00000020, "Memory Uncorrectable fatal"},
> +	{0x00000040, "PCI Express Correctable"},
> +	{0x00000080, "PCI Express Uncorrectable non-fatal"},
> +	{0x00000100, "PCI Express Uncorrectable fatal"},
> +	{0x00000200, "Platform Correctable"},
> +	{0x00000400, "Platform Uncorrectable non-fatal"},
> +	{0x00000800, "Platform Uncorrectable fatal"},
> +	{0x00001000, "CXL.cache Protocol Correctable"},
> +	{0x00002000, "CXL.cache Protocol Uncorrectable non-fatal"},
> +	{0x00004000, "CXL.cache Protocol Uncorrectable fatal"},
> +	{0x00008000, "CXL.mem Protocol Correctable"},
> +	{0x00010000, "CXL.mem Protocol Uncorrectable non-fatal"},
> +	{0x00020000, "CXL.mem Protocol Uncorrectable fatal"},
>  };
>

I think it'd be easier to read if the masks used the BIT() macro rather than a hex value.

Thanks,
Yazen
On 8/6/23 00:20, Yazen Ghannam wrote:
> On 5/25/23 4:44 PM, Avadhut Naik wrote:
>> OSPM can discover the error injection capabilities of the platform by
>> executing GET_ERROR_TYPE error injection action.[1] The action returns
>> a DWORD representing a bitmap of platform supported error injections.[2]
>>
>> The available_error_type_show() function determines the bits set within
>> this DWORD and provides a verbose output, from einj_error_type_string
>> array, through /sys/kernel/debug/apei/einj/available_error_type file.
>>
>> The function however, assumes one to one correspondence between an error's
>> position in the bitmap and its array entry offset. Consequently, some
>> errors like Vendor Defined Error Type fail this assumption and will
>> incorrectly be shown as not supported, even if their corresponding bit is
>> set in the bitmap and they have an entry in the array.
>>
>> Navigate around the issue by converting einj_error_type_string into an
>> array of structures with a predetermined mask for all error types
>> corresponding to their bit position in the DWORD returned by GET_ERROR_TYPE
>> action. The same breaks the aforementioned assumption resulting in all
>> supported error types by a platform being outputted through the above
>> available_error_type file.
>>
>> [1] ACPI specification 6.5, Table 18.25
>> [2] ACPI specification 6.5, Table 18.30
>>
>> Suggested-by: Alexey Kardashevskiy <alexey.kardashevskiy@amd.com>
>> Signed-off-by: Avadhut Naik <Avadhut.Naik@amd.com>
>> ---
>>  drivers/acpi/apei/einj.c | 43 ++++++++++++++++++++--------------------
>>  1 file changed, 22 insertions(+), 21 deletions(-)
>>
>> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
>> index 013eb621dc92..d5f8dc4df7a5 100644
>> --- a/drivers/acpi/apei/einj.c
>> +++ b/drivers/acpi/apei/einj.c
>> @@ -577,25 +577,25 @@ static u64 error_param2;
>>  static u64 error_param3;
>>  static u64 error_param4;
>>  static struct dentry *einj_debug_dir;
>> -static const char * const einj_error_type_string[] = {
>> -	"0x00000001\tProcessor Correctable\n",
>> -	"0x00000002\tProcessor Uncorrectable non-fatal\n",
>> -	"0x00000004\tProcessor Uncorrectable fatal\n",
>> -	"0x00000008\tMemory Correctable\n",
>> -	"0x00000010\tMemory Uncorrectable non-fatal\n",
>> -	"0x00000020\tMemory Uncorrectable fatal\n",
>> -	"0x00000040\tPCI Express Correctable\n",
>> -	"0x00000080\tPCI Express Uncorrectable non-fatal\n",
>> -	"0x00000100\tPCI Express Uncorrectable fatal\n",
>> -	"0x00000200\tPlatform Correctable\n",
>> -	"0x00000400\tPlatform Uncorrectable non-fatal\n",
>> -	"0x00000800\tPlatform Uncorrectable fatal\n",
>> -	"0x00001000\tCXL.cache Protocol Correctable\n",
>> -	"0x00002000\tCXL.cache Protocol Uncorrectable non-fatal\n",
>> -	"0x00004000\tCXL.cache Protocol Uncorrectable fatal\n",
>> -	"0x00008000\tCXL.mem Protocol Correctable\n",
>> -	"0x00010000\tCXL.mem Protocol Uncorrectable non-fatal\n",
>> -	"0x00020000\tCXL.mem Protocol Uncorrectable fatal\n",
>> +static struct { u32 mask; const char *str; } const einj_error_type_string[] = {
>> +	{0x00000001, "Processor Correctable"},
>> +	{0x00000002, "Processor Uncorrectable non-fatal"},
>> +	{0x00000004, "Processor Uncorrectable fatal"},
>> +	{0x00000008, "Memory Correctable"},
>> +	{0x00000010, "Memory Uncorrectable non-fatal"},
>> +	{0x00000020, "Memory Uncorrectable fatal"},
>> +	{0x00000040, "PCI Express Correctable"},
>> +	{0x00000080, "PCI Express Uncorrectable non-fatal"},
>> +	{0x00000100, "PCI Express Uncorrectable fatal"},
>> +	{0x00000200, "Platform Correctable"},
>> +	{0x00000400, "Platform Uncorrectable non-fatal"},
>> +	{0x00000800, "Platform Uncorrectable fatal"},
>> +	{0x00001000, "CXL.cache Protocol Correctable"},
>> +	{0x00002000, "CXL.cache Protocol Uncorrectable non-fatal"},
>> +	{0x00004000, "CXL.cache Protocol Uncorrectable fatal"},
>> +	{0x00008000, "CXL.mem Protocol Correctable"},
>> +	{0x00010000, "CXL.mem Protocol Uncorrectable non-fatal"},
>> +	{0x00020000, "CXL.mem Protocol Uncorrectable fatal"},
>>  };
>>
>
> I think it'd be easier to read if the masks used the BIT() macro rather
> than a hex value.

Makes sense but I'd say because it is easier to match the ACPI spec which uses the bit numbers, not easier to read (which is arguable).

>
> Thanks,
> Yazen
Hi,

Thanks for reviewing.

On 6/7/2023 22:48, Alexey Kardashevskiy wrote:
>
>
> On 8/6/23 00:20, Yazen Ghannam wrote:
>> On 5/25/23 4:44 PM, Avadhut Naik wrote:
>>> OSPM can discover the error injection capabilities of the platform by
>>> executing GET_ERROR_TYPE error injection action.[1] The action returns
>>> a DWORD representing a bitmap of platform supported error injections.[2]
>>>
>>> The available_error_type_show() function determines the bits set within
>>> this DWORD and provides a verbose output, from einj_error_type_string
>>> array, through /sys/kernel/debug/apei/einj/available_error_type file.
>>>
>>> The function however, assumes one to one correspondence between an error's
>>> position in the bitmap and its array entry offset. Consequently, some
>>> errors like Vendor Defined Error Type fail this assumption and will
>>> incorrectly be shown as not supported, even if their corresponding bit is
>>> set in the bitmap and they have an entry in the array.
>>>
>>> Navigate around the issue by converting einj_error_type_string into an
>>> array of structures with a predetermined mask for all error types
>>> corresponding to their bit position in the DWORD returned by GET_ERROR_TYPE
>>> action. The same breaks the aforementioned assumption resulting in all
>>> supported error types by a platform being outputted through the above
>>> available_error_type file.
>>>
>>> [1] ACPI specification 6.5, Table 18.25
>>> [2] ACPI specification 6.5, Table 18.30
>>>
>>> Suggested-by: Alexey Kardashevskiy <alexey.kardashevskiy@amd.com>
>>> Signed-off-by: Avadhut Naik <Avadhut.Naik@amd.com>
>>> ---
>>>  drivers/acpi/apei/einj.c | 43 ++++++++++++++++++++--------------------
>>>  1 file changed, 22 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
>>> index 013eb621dc92..d5f8dc4df7a5 100644
>>> --- a/drivers/acpi/apei/einj.c
>>> +++ b/drivers/acpi/apei/einj.c
>>> @@ -577,25 +577,25 @@ static u64 error_param2;
>>>  static u64 error_param3;
>>>  static u64 error_param4;
>>>  static struct dentry *einj_debug_dir;
>>> -static const char * const einj_error_type_string[] = {
>>> -	"0x00000001\tProcessor Correctable\n",
>>> -	"0x00000002\tProcessor Uncorrectable non-fatal\n",
>>> -	"0x00000004\tProcessor Uncorrectable fatal\n",
>>> -	"0x00000008\tMemory Correctable\n",
>>> -	"0x00000010\tMemory Uncorrectable non-fatal\n",
>>> -	"0x00000020\tMemory Uncorrectable fatal\n",
>>> -	"0x00000040\tPCI Express Correctable\n",
>>> -	"0x00000080\tPCI Express Uncorrectable non-fatal\n",
>>> -	"0x00000100\tPCI Express Uncorrectable fatal\n",
>>> -	"0x00000200\tPlatform Correctable\n",
>>> -	"0x00000400\tPlatform Uncorrectable non-fatal\n",
>>> -	"0x00000800\tPlatform Uncorrectable fatal\n",
>>> -	"0x00001000\tCXL.cache Protocol Correctable\n",
>>> -	"0x00002000\tCXL.cache Protocol Uncorrectable non-fatal\n",
>>> -	"0x00004000\tCXL.cache Protocol Uncorrectable fatal\n",
>>> -	"0x00008000\tCXL.mem Protocol Correctable\n",
>>> -	"0x00010000\tCXL.mem Protocol Uncorrectable non-fatal\n",
>>> -	"0x00020000\tCXL.mem Protocol Uncorrectable fatal\n",
>>> +static struct { u32 mask; const char *str; } const einj_error_type_string[] = {
>>> +	{0x00000001, "Processor Correctable"},
>>> +	{0x00000002, "Processor Uncorrectable non-fatal"},
>>> +	{0x00000004, "Processor Uncorrectable fatal"},
>>> +	{0x00000008, "Memory Correctable"},
>>> +	{0x00000010, "Memory Uncorrectable non-fatal"},
>>> +	{0x00000020, "Memory Uncorrectable fatal"},
>>> +	{0x00000040, "PCI Express Correctable"},
>>> +	{0x00000080, "PCI Express Uncorrectable non-fatal"},
>>> +	{0x00000100, "PCI Express Uncorrectable fatal"},
>>> +	{0x00000200, "Platform Correctable"},
>>> +	{0x00000400, "Platform Uncorrectable non-fatal"},
>>> +	{0x00000800, "Platform Uncorrectable fatal"},
>>> +	{0x00001000, "CXL.cache Protocol Correctable"},
>>> +	{0x00002000, "CXL.cache Protocol Uncorrectable non-fatal"},
>>> +	{0x00004000, "CXL.cache Protocol Uncorrectable fatal"},
>>> +	{0x00008000, "CXL.mem Protocol Correctable"},
>>> +	{0x00010000, "CXL.mem Protocol Uncorrectable non-fatal"},
>>> +	{0x00020000, "CXL.mem Protocol Uncorrectable fatal"},
>>>  };
>>>
>>
>> I think it'd be easier to read if the masks used the BIT() macro rather
>> than a hex value.
>
> Makes sense but I'd say because it is easier to match the ACPI spec which uses the bit numbers, not easier to read (which is arguable).
>

Agreed, will replace the hex values with BIT() macro.

Thanks,
Avadhut Naik

>
>>
>> Thanks,
>> Yazen
> --
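For illustration, the table with the BIT() change agreed on in the replies might look roughly like the sketch below. The revised patch is not part of this thread, so this is only an assumption about its shape; `BIT()` here is a userspace stand-in for the kernel macro, and `uint32_t` stands in for `u32`.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's BIT() macro (assumption: 32-bit masks). */
#define BIT(n) (UINT32_C(1) << (n))

/* Same entries as the posted patch, keyed by bit number instead of hex mask. */
static const struct {
	uint32_t mask;
	const char *str;
} einj_error_type_string[] = {
	{BIT(0), "Processor Correctable"},
	{BIT(1), "Processor Uncorrectable non-fatal"},
	{BIT(2), "Processor Uncorrectable fatal"},
	{BIT(3), "Memory Correctable"},
	{BIT(4), "Memory Uncorrectable non-fatal"},
	{BIT(5), "Memory Uncorrectable fatal"},
	{BIT(6), "PCI Express Correctable"},
	{BIT(7), "PCI Express Uncorrectable non-fatal"},
	{BIT(8), "PCI Express Uncorrectable fatal"},
	{BIT(9), "Platform Correctable"},
	{BIT(10), "Platform Uncorrectable non-fatal"},
	{BIT(11), "Platform Uncorrectable fatal"},
	{BIT(12), "CXL.cache Protocol Correctable"},
	{BIT(13), "CXL.cache Protocol Uncorrectable non-fatal"},
	{BIT(14), "CXL.cache Protocol Uncorrectable fatal"},
	{BIT(15), "CXL.mem Protocol Correctable"},
	{BIT(16), "CXL.mem Protocol Uncorrectable non-fatal"},
	{BIT(17), "CXL.mem Protocol Uncorrectable fatal"},
};

/* The bit-number form yields the same values as the hex masks in the patch. */
static_assert(BIT(0) == 0x00000001, "bit 0 matches the first hex mask");
static_assert(BIT(17) == 0x00020000, "bit 17 matches the last hex mask");
```

With this form each entry can be checked against the bit numbers in the specification's error type table without converting from hex, which is the argument made in the replies.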
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 013eb621dc92..d5f8dc4df7a5 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -577,25 +577,25 @@ static u64 error_param2;
 static u64 error_param3;
 static u64 error_param4;
 static struct dentry *einj_debug_dir;
-static const char * const einj_error_type_string[] = {
-	"0x00000001\tProcessor Correctable\n",
-	"0x00000002\tProcessor Uncorrectable non-fatal\n",
-	"0x00000004\tProcessor Uncorrectable fatal\n",
-	"0x00000008\tMemory Correctable\n",
-	"0x00000010\tMemory Uncorrectable non-fatal\n",
-	"0x00000020\tMemory Uncorrectable fatal\n",
-	"0x00000040\tPCI Express Correctable\n",
-	"0x00000080\tPCI Express Uncorrectable non-fatal\n",
-	"0x00000100\tPCI Express Uncorrectable fatal\n",
-	"0x00000200\tPlatform Correctable\n",
-	"0x00000400\tPlatform Uncorrectable non-fatal\n",
-	"0x00000800\tPlatform Uncorrectable fatal\n",
-	"0x00001000\tCXL.cache Protocol Correctable\n",
-	"0x00002000\tCXL.cache Protocol Uncorrectable non-fatal\n",
-	"0x00004000\tCXL.cache Protocol Uncorrectable fatal\n",
-	"0x00008000\tCXL.mem Protocol Correctable\n",
-	"0x00010000\tCXL.mem Protocol Uncorrectable non-fatal\n",
-	"0x00020000\tCXL.mem Protocol Uncorrectable fatal\n",
+static struct { u32 mask; const char *str; } const einj_error_type_string[] = {
+	{0x00000001, "Processor Correctable"},
+	{0x00000002, "Processor Uncorrectable non-fatal"},
+	{0x00000004, "Processor Uncorrectable fatal"},
+	{0x00000008, "Memory Correctable"},
+	{0x00000010, "Memory Uncorrectable non-fatal"},
+	{0x00000020, "Memory Uncorrectable fatal"},
+	{0x00000040, "PCI Express Correctable"},
+	{0x00000080, "PCI Express Uncorrectable non-fatal"},
+	{0x00000100, "PCI Express Uncorrectable fatal"},
+	{0x00000200, "Platform Correctable"},
+	{0x00000400, "Platform Uncorrectable non-fatal"},
+	{0x00000800, "Platform Uncorrectable fatal"},
+	{0x00001000, "CXL.cache Protocol Correctable"},
+	{0x00002000, "CXL.cache Protocol Uncorrectable non-fatal"},
+	{0x00004000, "CXL.cache Protocol Uncorrectable fatal"},
+	{0x00008000, "CXL.mem Protocol Correctable"},
+	{0x00010000, "CXL.mem Protocol Uncorrectable non-fatal"},
+	{0x00020000, "CXL.mem Protocol Uncorrectable fatal"},
 };
 
 static int available_error_type_show(struct seq_file *m, void *v)
@@ -607,8 +607,9 @@ static int available_error_type_show(struct seq_file *m, void *v)
 	if (rc)
 		return rc;
 	for (int pos = 0; pos < ARRAY_SIZE(einj_error_type_string); pos++)
-		if (available_error_type & BIT(pos))
-			seq_puts(m, einj_error_type_string[pos]);
+		if (available_error_type & einj_error_type_string[pos].mask)
+			seq_printf(m, "0x%08x\t%s\n", einj_error_type_string[pos].mask,
+				   einj_error_type_string[pos].str);
 
 	return 0;
 }
OSPM can discover the error injection capabilities of the platform by
executing GET_ERROR_TYPE error injection action.[1] The action returns
a DWORD representing a bitmap of platform supported error injections.[2]

The available_error_type_show() function determines the bits set within
this DWORD and provides a verbose output, from einj_error_type_string
array, through /sys/kernel/debug/apei/einj/available_error_type file.

The function however, assumes one to one correspondence between an error's
position in the bitmap and its array entry offset. Consequently, some
errors like Vendor Defined Error Type fail this assumption and will
incorrectly be shown as not supported, even if their corresponding bit is
set in the bitmap and they have an entry in the array.

Navigate around the issue by converting einj_error_type_string into an
array of structures with a predetermined mask for all error types
corresponding to their bit position in the DWORD returned by GET_ERROR_TYPE
action. The same breaks the aforementioned assumption resulting in all
supported error types by a platform being outputted through the above
available_error_type file.

[1] ACPI specification 6.5, Table 18.25
[2] ACPI specification 6.5, Table 18.30

Suggested-by: Alexey Kardashevskiy <alexey.kardashevskiy@amd.com>
Signed-off-by: Avadhut Naik <Avadhut.Naik@amd.com>
---
 drivers/acpi/apei/einj.c | 43 ++++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 21 deletions(-)
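
The mismatch described in the commit message can be reproduced outside the kernel with a small sketch. It is a userspace illustration, not the driver code: the table is trimmed to three real entries plus one hypothetical Vendor Defined entry, assumed here to sit at bit 31 of the GET_ERROR_TYPE DWORD, and `show_by_index()` / `show_by_mask()` are made-up names standing in for the old and new loop bodies of available_error_type_show().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Trimmed copy of the table from the patch. The last entry is an
 * assumption added for illustration: Vendor Defined at bit 31.
 */
static const struct {
	uint32_t mask;
	const char *str;
} einj_error_type_string[] = {
	{0x00000001, "Processor Correctable"},
	{0x00000002, "Processor Uncorrectable non-fatal"},
	{0x00000004, "Processor Uncorrectable fatal"},
	{0x80000000, "Vendor Defined Error Types"},	/* array index 3, bit 31 */
};

static_assert(ARRAY_SIZE(einj_error_type_string) == 4,
	      "the trimmed table has four entries");

/* Old lookup: assumes the entry at array index pos describes bit pos. */
static int show_by_index(FILE *out, uint32_t available)
{
	int shown = 0;

	for (size_t pos = 0; pos < ARRAY_SIZE(einj_error_type_string); pos++) {
		if (available & (UINT32_C(1) << pos)) {
			fprintf(out, "0x%08x\t%s\n",
				(unsigned int)einj_error_type_string[pos].mask,
				einj_error_type_string[pos].str);
			shown++;
		}
	}
	return shown;
}

/* New lookup: each entry carries its own mask, so array position is irrelevant. */
static int show_by_mask(FILE *out, uint32_t available)
{
	int shown = 0;

	for (size_t pos = 0; pos < ARRAY_SIZE(einj_error_type_string); pos++) {
		if (available & einj_error_type_string[pos].mask) {
			fprintf(out, "0x%08x\t%s\n",
				(unsigned int)einj_error_type_string[pos].mask,
				einj_error_type_string[pos].str);
			shown++;
		}
	}
	return shown;
}
```

Called with a bitmap of 0x80000001 (Processor Correctable plus the assumed Vendor Defined bit), show_by_index() reports one entry because it tests bit 3 for the fourth array slot, while show_by_mask() reports both.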