Message ID | 20230317073310.4237-1-xueshuai@linux.alibaba.com |
---|---|
State | Superseded |
Headers | show |
Series | ACPI: APEI: EINJ: warn on invalid argument when explicitly indicated by platform | expand |
- if (val != EINJ_STATUS_SUCCESS) + if (val == EINJ_STATUS_FAIL) return -EBUSY; + else if (val == EINJ_STATUS_INVAL) + return -EINVAL; The ACPI Specification is really vague here. Documented error codes are 0 = Success (Linux #define EINJ_STATUS_SUCCESS) 1 = Unknown failure (Linux #define EINJ_STATUS_FAIL) 2 = Invalid Access (Linux #define EINJ_STATUS_INVAL) I don't see how reporting -EBUSY for the "Unknown Failure" case is actually better. -Tony
On 2023/3/18 AM5:24, Luck, Tony wrote: > - if (val != EINJ_STATUS_SUCCESS) > + if (val == EINJ_STATUS_FAIL) > return -EBUSY; > + else if (val == EINJ_STATUS_INVAL) > + return -EINVAL; > > The ACPI Specification is really vague here. Documented error codes are > > 0 = Success (Linux #define EINJ_STATUS_SUCCESS) > 1 = Unknown failure (Linux #define EINJ_STATUS_FAIL) > 2 = Invalid Access (Linux #define EINJ_STATUS_INVAL) Absolutely right. > > I don't see how reporting -EBUSY for the "Unknown Failure" case is > actually better. Tony, did you misunderstand this patch? The original code report -EBUSY for both "Unknown Failure" and "Invalid Access" cases. This patch intends to report -EINVAL for "Invalid Access" case and keeps reporting -EBUSY for "Unknown Failure" case unchanged. Although -EBUSY for "Unknown Failure" case is not a good choice. Will -EIO for "Unknown failure" case be better? By the way, do you think -EIO for time out case is suitable. for (;;) { rc = apei_exec_run(&ctx, ACPI_EINJ_CHECK_BUSY_STATUS); if (rc) return rc; val = apei_exec_ctx_get_output(&ctx); if (!(val & EINJ_OP_BUSY)) break; if (einj_timedout(&timeout)) return -EIO; For example, the OSPM will may warn: Firmware does not respond in time. And a message is printed on the console: echo: write error: Input/output error Will -EBUSY or -ETIME for timeout be better? > > -Tony Thank you for comments. Best Regards. Shuai
>> I don't see how reporting -EBUSY for the "Unknown Failure" case is >> actually better. > > Tony, did you misunderstand this patch? > > The original code report -EBUSY for both "Unknown Failure" and > "Invalid Access" cases. I mixed up what was already in the kernel with what the patch was changing. > This patch intends to report -EINVAL for "Invalid Access" case > and keeps reporting -EBUSY for "Unknown Failure" case unchanged. > Although -EBUSY for "Unknown Failure" case is not a good choice. > Will -EIO for "Unknown failure" case be better? Is this for some real use case? Do you have a BIOS EINJ implementation that is returning these different codes? What will the user do differently if they see these different error strings? # echo 1 > error_inject ... different error messages here ... -Tony
On 2023/3/21 AM12:32, Luck, Tony wrote: >>> I don't see how reporting -EBUSY for the "Unknown Failure" case is >>> actually better. >> >> Tony, did you misunderstand this patch? >> >> The original code report -EBUSY for both "Unknown Failure" and >> "Invalid Access" cases. > > I mixed up what was already in the kernel with what the patch was changing. > >> This patch intends to report -EINVAL for "Invalid Access" case >> and keeps reporting -EBUSY for "Unknown Failure" case unchanged. >> Although -EBUSY for "Unknown Failure" case is not a good choice. >> Will -EIO for "Unknown failure" case be better? > > Is this for some real use case? > > Do you have a BIOS EINJ implementation that is returning these different codes? Yes, our BIOS tester complains that EINJ always reports EBUSY, and he has no idea about it. It can not help him determine whether it is a BIOS bug or an injection operation error. > What will the user do differently if they see these different error strings? > > # echo 1 > error_inject > ... different error messages here ... For example, with original code: # select a invalid core or device to inject # echo 1 > error_inject echo: write error: Device or resource busy When tester sees that, he will submit a bug to BIOS developer. Actually, firmware will do some platform dependent sanity checks and returns different error codes. In this case, user injects to a invalid device, platform returns "Invalid Access". And user is expected to see: # select a invalid core or device to inject # echo 1 > error_inject echo: write error: Invalid argument Then user is expected to check his injection argument first. > > -Tony > Best Regards, Shuai
> Actually, firmware will do some platform dependent sanity checks and returns > different error codes. In this case, user injects to a invalid device, platform > returns "Invalid Access". And user is expected to see: > > # select a invalid core or device to inject > # echo 1 > error_inject > echo: write error: Invalid argument > > Then user is expected to check his injection argument first. Thanks. This makes sense. You want EINVAL when the user chose bad arguments, and some other code for problem in BIOS. If the BIOS has an issue, is it possible, or likely, that it is a temporary problem? If so, EBUSY may be OK. The message " Device or resource busy" might encourage the user to wait and try again. If it is not going to get better by itself, then one of: #define EIO 5 /* I/O error */ #define ENXIO 6 /* No such device or address */ might be a better choice. -Tony
On 2023/3/22 AM12:09, Luck, Tony wrote: >> Actually, firmware will do some platform dependent sanity checks and returns >> different error codes. In this case, user injects to a invalid device, platform >> returns "Invalid Access". And user is expected to see: >> >> # select a invalid core or device to inject >> # echo 1 > error_inject >> echo: write error: Invalid argument >> >> Then user is expected to check his injection argument first. > > Thanks. This makes sense. You want EINVAL when the user chose > bad arguments, and some other code for problem in BIOS. Yes, exactly. > > If the BIOS has an issue, is it possible, or likely, that it is a temporary > problem? If so, EBUSY may be OK. The message " Device or resource busy" > might encourage the user to wait and try again. > > If it is not going to get better by itself, then one of: > > #define EIO 5 /* I/O error */ > #define ENXIO 6 /* No such device or address */ > > might be a better choice. Yes, BIOS may temporarily not complete error injection (ACPI_EINJ_EXECUTE_OPERATION) on time, in which case, kernel return EIO. for (;;) { rc = apei_exec_run(&ctx, ACPI_EINJ_CHECK_BUSY_STATUS); if (rc) return rc; val = apei_exec_ctx_get_output(&ctx); if (!(val & EINJ_OP_BUSY)) break; if (einj_timedout(&timeout)) return -EIO; In summary, you are asking that: - report -EINVAL for "Invalid Access" case - keeps reporting -EBUSY for "Unknown Failure" case unchanged. - and keeps EIO for temporarily time out unchanged. right? (This patch is doing so) > > -Tony Thanks. Best Regards, Shuai
> Fix to return -EINVAL in the __einj_error_inject() error handling case > instead of -EBUSY, when explicitly indicated by the platform in the status > of the completed operation. Needs a bit longer description on the use case based on follow up discussion. Key information is the EINVAL is an indicator to the user that the parameters they supplied cannot be used for injection. But for the code: Reviewed-by: Tony Luck <tony.luck@intel.com> -Tony
On Wed, Mar 22, 2023 at 5:13 PM Luck, Tony <tony.luck@intel.com> wrote: > > > Fix to return -EINVAL in the __einj_error_inject() error handling case > > instead of -EBUSY, when explicitly indicated by the platform in the status > > of the completed operation. > > Needs a bit longer description on the use case based on follow up discussion. > Key information is the EINVAL is an indicator to the user that the parameters they > supplied cannot be used for injection. Right. So Shuai, please resend the patch with a more elaborate changelog. > But for the code: > > Reviewed-by: Tony Luck <tony.luck@intel.com> And add the above tag to it when resending.
On 2023/3/23 AM12:07, Luck, Tony wrote: >> Fix to return -EINVAL in the __einj_error_inject() error handling case >> instead of -EBUSY, when explicitly indicated by the platform in the status >> of the completed operation. > > Needs a bit longer description on the use case based on follow up discussion. > Key information is the EINVAL is an indicator to the user that the parameters they > supplied cannot be used for injection. > > But for the code: > > Reviewed-by: Tony Luck <tony.luck@intel.com> > > -Tony Got, I will rephrase the commit log based on follow up discussion. Thank you for reviews. Cheers, Shuai
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c index b4373e575660..fa0b4320312e 100644 --- a/drivers/acpi/apei/einj.c +++ b/drivers/acpi/apei/einj.c @@ -489,9 +489,15 @@ static int __einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2, if (rc) return rc; val = apei_exec_ctx_get_output(&ctx); - if (val != EINJ_STATUS_SUCCESS) + if (val == EINJ_STATUS_FAIL) return -EBUSY; + else if (val == EINJ_STATUS_INVAL) + return -EINVAL; + /* + * The error is injected into the platform successfully, then it needs + * to trigger the error. + */ rc = apei_exec_run(&ctx, ACPI_EINJ_GET_TRIGGER_TABLE); if (rc) return rc;
Fix to return -EINVAL in the __einj_error_inject() error handling case instead of -EBUSY, when explicitly indicated by the platform in the status of the completed operation. Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> --- drivers/acpi/apei/einj.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)