Message ID | 20250117022957.25227-1-tanxiaofei@huawei.com |
---|---|
State | New |
Series | [v3] acpi: Fix HED module initialization order when it is built-in |
On Fri, 17 Jan 2025 10:29:57 +0800
Xiaofei Tan <tanxiaofei@huawei.com> wrote:

> When the module HED is built-in, the module HED init is behind EVGED
> as the driver are in the same initcall level, then the order is determined
> by Makefile order. That order violates expectations. Because RAS records
> can't be handled in the special time window that EVGED has initialized
> while HED not.
>
> If the number of such RAS records is more than the APEI HEST error source
> number, the HEST resources could be occupied all, and then could affect
> subsequent RAS error reporting.
>
> Change the initcall level of HED to subsys_init to fix the issue. If build
> HED as a module, the problem remains. To solve this problem completely,
> change the ACPI_HED from tristate to bool.
>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Given the change in approach (even though I reviewed this internally)
should probably have dropped my RB. Anyhow, consider this me
giving it again on list.

Thanks,

Jonathan

> Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
> ---
>  drivers/acpi/Kconfig | 2 +-
>  drivers/acpi/hed.c   | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> index d81b55f5068c..7f10aa38269d 100644
> --- a/drivers/acpi/Kconfig
> +++ b/drivers/acpi/Kconfig
> @@ -452,7 +452,7 @@ config ACPI_SBS
>  	  the modules will be called sbs and sbshc.
>
>  config ACPI_HED
> -	tristate "Hardware Error Device"
> +	bool "Hardware Error Device"
>  	help
>  	  This driver supports the Hardware Error Device (PNP0C33),
>  	  which is used to report some hardware errors notified via
> diff --git a/drivers/acpi/hed.c b/drivers/acpi/hed.c
> index 7652515a6be1..677dfcce2990 100644
> --- a/drivers/acpi/hed.c
> +++ b/drivers/acpi/hed.c
> @@ -81,6 +81,7 @@ static struct acpi_driver acpi_hed_driver = {
>  	},
>  };
>  module_acpi_driver(acpi_hed_driver);
> +subsys_initcall(acpi_hed_driver_init);
>
>  MODULE_AUTHOR("Huang Ying");
>  MODULE_DESCRIPTION("ACPI Hardware Error Device Driver");
On 2025/1/20 19:04, Jonathan Cameron wrote:
> On Fri, 17 Jan 2025 10:29:57 +0800
> Xiaofei Tan <tanxiaofei@huawei.com> wrote:
>
>> When the module HED is built-in, the module HED init is behind EVGED
>> as the driver are in the same initcall level, then the order is determined
>> by Makefile order. That order violates expectations. Because RAS records
>> can't be handled in the special time window that EVGED has initialized
>> while HED not.
>>
>> If the number of such RAS records is more than the APEI HEST error source
>> number, the HEST resources could be occupied all, and then could affect
>> subsequent RAS error reporting.
>>
>> Change the initcall level of HED to subsys_init to fix the issue. If build
>> HED as a module, the problem remains. To solve this problem completely,
>> change the ACPI_HED from tristate to bool.
>>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Given the change in approach (even though I reviewed this internally)
> should probably have dropped my RB. Anyhow, consider this me
> giving it again on list.

OK. thanks.

> Thanks,
>
> Jonathan
>
>> Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
>> ---
>>  drivers/acpi/Kconfig | 2 +-
>>  drivers/acpi/hed.c   | 1 +
>>  2 files changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
>> index d81b55f5068c..7f10aa38269d 100644
>> --- a/drivers/acpi/Kconfig
>> +++ b/drivers/acpi/Kconfig
>> @@ -452,7 +452,7 @@ config ACPI_SBS
>>  	  the modules will be called sbs and sbshc.
>>
>>  config ACPI_HED
>> -	tristate "Hardware Error Device"
>> +	bool "Hardware Error Device"
>>  	help
>>  	  This driver supports the Hardware Error Device (PNP0C33),
>>  	  which is used to report some hardware errors notified via
>> diff --git a/drivers/acpi/hed.c b/drivers/acpi/hed.c
>> index 7652515a6be1..677dfcce2990 100644
>> --- a/drivers/acpi/hed.c
>> +++ b/drivers/acpi/hed.c
>> @@ -81,6 +81,7 @@ static struct acpi_driver acpi_hed_driver = {
>>  	},
>>  };
>>  module_acpi_driver(acpi_hed_driver);
>> +subsys_initcall(acpi_hed_driver_init);
>>
>>  MODULE_AUTHOR("Huang Ying");
>>  MODULE_DESCRIPTION("ACPI Hardware Error Device Driver");
> .
On Tue, Jan 21, 2025 at 3:23 AM Xiaofei Tan <tanxiaofei@huawei.com> wrote:
>
>
> On 2025/1/20 19:04, Jonathan Cameron wrote:
> > On Fri, 17 Jan 2025 10:29:57 +0800
> > Xiaofei Tan <tanxiaofei@huawei.com> wrote:
> >
> >> When the module HED is built-in, the module HED init is behind EVGED
> >> as the driver are in the same initcall level, then the order is determined
> >> by Makefile order. That order violates expectations. Because RAS records
> >> can't be handled in the special time window that EVGED has initialized
> >> while HED not.
> >>
> >> If the number of such RAS records is more than the APEI HEST error source
> >> number, the HEST resources could be occupied all, and then could affect
> >> subsequent RAS error reporting.
> >>
> >> Change the initcall level of HED to subsys_init to fix the issue. If build
> >> HED as a module, the problem remains. To solve this problem completely,
> >> change the ACPI_HED from tristate to bool.
> >>
> >> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Given the change in approach (even though I reviewed this internally)
> > should probably have dropped my RB. Anyhow, consider this me
> > giving it again on list.
> OK. thanks.

Applied as 6.14-rc material with a rewritten changelog and under a new
subject: "ACPI: HED: Always initialize before evged".

Thanks!
On Thu, Jan 23, 2025 at 08:35:51PM +0100, Rafael J. Wysocki wrote:
> On Tue, Jan 21, 2025 at 3:23 AM Xiaofei Tan <tanxiaofei@huawei.com> wrote:
> >
> >
> > On 2025/1/20 19:04, Jonathan Cameron wrote:
> > > On Fri, 17 Jan 2025 10:29:57 +0800
> > > Xiaofei Tan <tanxiaofei@huawei.com> wrote:
> > >
> > >> When the module HED is built-in, the module HED init is behind EVGED
> > >> as the driver are in the same initcall level, then the order is determined
> > >> by Makefile order. That order violates expectations. Because RAS records
> > >> can't be handled in the special time window that EVGED has initialized
> > >> while HED not.
> > >>
> > >> If the number of such RAS records is more than the APEI HEST error source
> > >> number, the HEST resources could be occupied all, and then could affect
> > >> subsequent RAS error reporting.
> > >>
> > >> Change the initcall level of HED to subsys_init to fix the issue. If build
> > >> HED as a module, the problem remains. To solve this problem completely,
> > >> change the ACPI_HED from tristate to bool.
> > >>
> > >> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > Given the change in approach (even though I reviewed this internally)
> > > should probably have dropped my RB. Anyhow, consider this me
> > > giving it again on list.
> > OK. thanks.
>
> Applied as 6.14-rc material with a rewritten changelog and under a new
> subject: "ACPI: HED: Always initialize before evged".
>
> Thanks!

For what it's worth, I just bisected a new error message that I see when
booting several x86_64 distribution configurations in QEMU to this
change in -next as commit 19badc4e57c6 ("ACPI: HED: Always initialize
before evged"):

  $ curl -LSso .config https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/raw/main/config

  $ make -skj"$(nproc)" ARCH=x86_64 CROSS_COMPILE=x86_64-linux- olddefconfig bzImage

  $ qemu-system-x86_64 \
      -display none \
      -nodefaults \
      -M q35 \
      -d unimp,guest_errors \
      -append 'console=ttyS0 earlycon=uart8250,io,0x3f8' \
      -kernel arch/x86/boot/bzImage \
      -initrd rootfs.cpio \
      -cpu host \
      -enable-kvm \
      -m 512m \
      -smp 8 \
      -serial mon:stdio
  ...
  [ 0.535126] Error: Driver 'hardware_error_device' is already registered, aborting...
  ...

If there is any additional information I can provide or patches I can
test, I am more than happy to do so. Apologies if this has already been
reported or resolved, I did a search on the mailing list and I did not
see anything.

Cheers,
Nathan

# bad: [9a87ce288fe30f268b3a598422fe76af9bb2c2d2] Add linux-next specific files for 20250128
# good: [805ba04cb7ccfc7d72e834ebd796e043142156ba] Merge tag 'mips_6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
git bisect start '9a87ce288fe30f268b3a598422fe76af9bb2c2d2' '805ba04cb7ccfc7d72e834ebd796e043142156ba'
# bad: [e317bfe93c72e10dc02caac8b8b4b064291c352e] Merge branch 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm.git
git bisect bad e317bfe93c72e10dc02caac8b8b4b064291c352e
# good: [31d2d7cb6ca6c6159e22bf708c089b7af11f1585] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux.git
git bisect good 31d2d7cb6ca6c6159e22bf708c089b7af11f1585
# good: [bd19f1e807f92f27654e0bb2790fe31b6af58daf] Merge branch 'docs-next' of git://git.lwn.net/linux.git
git bisect good bd19f1e807f92f27654e0bb2790fe31b6af58daf
# bad: [1df2cdd95b5ca1e2d520e3df2b2b1a12bd31cb79] Merge branch 'for-next' of git://git.kernel.dk/linux-block.git
git bisect bad 1df2cdd95b5ca1e2d520e3df2b2b1a12bd31cb79
# bad: [f572a6cf38985ca5d4df4cae1e3e74464774d033] Merge branch 'drm-next' of https://gitlab.freedesktop.org/drm/kernel.git
git bisect bad f572a6cf38985ca5d4df4cae1e3e74464774d033
# bad: [e30de3809c23cc17c49c139e11e7180248381017] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git
git bisect bad e30de3809c23cc17c49c139e11e7180248381017
# good: [56ca981eec373ae4779e3114a3807cc15ad230f9] Merge branch 'pm-cpufreq' into linux-next
git bisect good 56ca981eec373ae4779e3114a3807cc15ad230f9
# bad: [ed752cc25bbd8ee26498a91ac7a63bcf50ea16f3] Merge branch 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm.git
git bisect bad ed752cc25bbd8ee26498a91ac7a63bcf50ea16f3
# bad: [39b6b3b09b32c7b4ed6b0e8e87f359c588fe19d6] Merge branches 'acpi-x86' and 'acpi-apei' into linux-next
git bisect bad 39b6b3b09b32c7b4ed6b0e8e87f359c588fe19d6
# good: [c881e66eb84a4f944df317294eedf6b2b2882214] Merge branches 'pm-sleep' and 'pm-powercap' into linux-next
git bisect good c881e66eb84a4f944df317294eedf6b2b2882214
# good: [8f62ca9c338aae4f73e9ce0221c3d4668359ddd8] ACPI: x86: Add skip i2c clients quirk for Vexia EDU ATLA 10 tablet 5V
git bisect good 8f62ca9c338aae4f73e9ce0221c3d4668359ddd8
# bad: [19badc4e57c6a5b87d3ce6eb6ec24bed62ec57ac] ACPI: HED: Always initialize before evged
git bisect bad 19badc4e57c6a5b87d3ce6eb6ec24bed62ec57ac
# first bad commit: [19badc4e57c6a5b87d3ce6eb6ec24bed62ec57ac] ACPI: HED: Always initialize before evged
On Wed, Jan 29, 2025 at 5:33 AM Nathan Chancellor <nathan@kernel.org> wrote:
>
> On Thu, Jan 23, 2025 at 08:35:51PM +0100, Rafael J. Wysocki wrote:
> > On Tue, Jan 21, 2025 at 3:23 AM Xiaofei Tan <tanxiaofei@huawei.com> wrote:
> > >
> > >
> > > On 2025/1/20 19:04, Jonathan Cameron wrote:
> > > > On Fri, 17 Jan 2025 10:29:57 +0800
> > > > Xiaofei Tan <tanxiaofei@huawei.com> wrote:
> > > >
> > > >> When the module HED is built-in, the module HED init is behind EVGED
> > > >> as the driver are in the same initcall level, then the order is determined
> > > >> by Makefile order. That order violates expectations. Because RAS records
> > > >> can't be handled in the special time window that EVGED has initialized
> > > >> while HED not.
> > > >>
> > > >> If the number of such RAS records is more than the APEI HEST error source
> > > >> number, the HEST resources could be occupied all, and then could affect
> > > >> subsequent RAS error reporting.
> > > >>
> > > >> Change the initcall level of HED to subsys_init to fix the issue. If build
> > > >> HED as a module, the problem remains. To solve this problem completely,
> > > >> change the ACPI_HED from tristate to bool.
> > > >>
> > > >> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > > Given the change in approach (even though I reviewed this internally)
> > > > should probably have dropped my RB. Anyhow, consider this me
> > > > giving it again on list.
> > > OK. thanks.
> >
> > Applied as 6.14-rc material with a rewritten changelog and under a new
> > subject: "ACPI: HED: Always initialize before evged".
> >
> > Thanks!
>
> For what it's worth, I just bisected a new error message that I see when
> booting several x86_64 distribution configurations in QEMU to this
> change in -next as commit 19badc4e57c6 ("ACPI: HED: Always initialize
> before evged"):
>
>   $ curl -LSso .config https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/raw/main/config
>
>   $ make -skj"$(nproc)" ARCH=x86_64 CROSS_COMPILE=x86_64-linux- olddefconfig bzImage
>
>   $ qemu-system-x86_64 \
>       -display none \
>       -nodefaults \
>       -M q35 \
>       -d unimp,guest_errors \
>       -append 'console=ttyS0 earlycon=uart8250,io,0x3f8' \
>       -kernel arch/x86/boot/bzImage \
>       -initrd rootfs.cpio \
>       -cpu host \
>       -enable-kvm \
>       -m 512m \
>       -smp 8 \
>       -serial mon:stdio
>   ...
>   [ 0.535126] Error: Driver 'hardware_error_device' is already registered, aborting...
>   ...
>
> If there is any additional information I can provide or patches I can
> test, I am more than happy to do so. Apologies if this has already been
> reported or resolved, I did a search on the mailing list and I did not
> see anything.

No, it hasn't.

So AFAICS the commit in question needs to do more to switch over hed
to non-modular.

I'll drop it for now, thanks!
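One plausible reading of the "already registered" message, offered here as an editorial sketch rather than as anything stated in the thread: when hed.c is built in, module_acpi_driver() already arranges for acpi_hed_driver_init() to run through module_init(), which for built-in code is a device-level initcall, so the added subsys_initcall() line makes the same init function run a second time. The user-space model below only imitates that situation. toy_driver_register(), toy_hed_init() and toy_boot() are hypothetical names, and the duplicate check is a simplified stand-in for what the driver core does.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MAX_DRIVERS 8

/* Names of the drivers registered so far in this toy model. */
static const char *registered[MAX_DRIVERS];
static int nr_registered;

/* Hypothetical stand-in for driver registration: rejects duplicate names. */
static int toy_driver_register(const char *name)
{
    for (int i = 0; i < nr_registered; i++) {
        if (strcmp(registered[i], name) == 0) {
            fprintf(stderr,
                    "Error: Driver '%s' is already registered, aborting...\n",
                    name);
            return -EBUSY;
        }
    }
    if (nr_registered == MAX_DRIVERS)
        return -ENOMEM;
    registered[nr_registered++] = name;
    return 0;
}

/* Plays the role of the init function generated for the HED driver. */
static int toy_hed_init(void)
{
    return toy_driver_register("hardware_error_device");
}

typedef int (*initcall_t)(void);

/* The same init function ends up listed at two levels (assumed model). */
static const initcall_t subsys_level[] = { toy_hed_init };
static const initcall_t device_level[] = { toy_hed_init };

/* Run the earlier level first, then the later one, and keep both results. */
static void toy_boot(int results[2])
{
    nr_registered = 0;
    results[0] = subsys_level[0]();
    results[1] = device_level[0]();
}
```

In this model the first call succeeds and the second returns -EBUSY after printing the error, which matches the symptom in the report and is consistent with the remark that more work is needed to make hed non-modular.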
Hi Nathan,

On 2025/1/29 12:33, Nathan Chancellor wrote:
> On Thu, Jan 23, 2025 at 08:35:51PM +0100, Rafael J. Wysocki wrote:
>> On Tue, Jan 21, 2025 at 3:23 AM Xiaofei Tan <tanxiaofei@huawei.com> wrote:
>>>
>>> On 2025/1/20 19:04, Jonathan Cameron wrote:
>>>> On Fri, 17 Jan 2025 10:29:57 +0800
>>>> Xiaofei Tan <tanxiaofei@huawei.com> wrote:
>>>>
>>>>> When the module HED is built-in, the module HED init is behind EVGED
>>>>> as the driver are in the same initcall level, then the order is determined
>>>>> by Makefile order. That order violates expectations. Because RAS records
>>>>> can't be handled in the special time window that EVGED has initialized
>>>>> while HED not.
>>>>>
>>>>> If the number of such RAS records is more than the APEI HEST error source
>>>>> number, the HEST resources could be occupied all, and then could affect
>>>>> subsequent RAS error reporting.
>>>>>
>>>>> Change the initcall level of HED to subsys_init to fix the issue. If build
>>>>> HED as a module, the problem remains. To solve this problem completely,
>>>>> change the ACPI_HED from tristate to bool.
>>>>>
>>>>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>>> Given the change in approach (even though I reviewed this internally)
>>>> should probably have dropped my RB. Anyhow, consider this me
>>>> giving it again on list.
>>> OK. thanks.
>> Applied as 6.14-rc material with a rewritten changelog and under a new
>> subject: "ACPI: HED: Always initialize before evged".
>>
>> Thanks!
> For what it's worth, I just bisected a new error message that I see when
> booting several x86_64 distribution configurations in QEMU to this
> change in -next as commit 19badc4e57c6 ("ACPI: HED: Always initialize
> before evged"):
>
>   $ curl -LSso .config https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/raw/main/config
>
>   $ make -skj"$(nproc)" ARCH=x86_64 CROSS_COMPILE=x86_64-linux- olddefconfig bzImage
>
>   $ qemu-system-x86_64 \
>       -display none \
>       -nodefaults \
>       -M q35 \
>       -d unimp,guest_errors \
>       -append 'console=ttyS0 earlycon=uart8250,io,0x3f8' \
>       -kernel arch/x86/boot/bzImage \
>       -initrd rootfs.cpio \
>       -cpu host \
>       -enable-kvm \
>       -m 512m \
>       -smp 8 \
>       -serial mon:stdio
>   ...
>   [ 0.535126] Error: Driver 'hardware_error_device' is already registered, aborting...

It seems that the startup script of the test case loads some fixed ko, even
when the bzImage changed.

If so, we could update the rootfs.cpio. This may introduce some adaptation
work, if you want to roll this patch to some distribution. I think we could
do this when release new iso.

>   ...
>
> If there is any additional information I can provide or patches I can
> test, I am more than happy to do so. Apologies if this has already been
> reported or resolved, I did a search on the mailing list and I did not
> see anything.
Hi Xiaofei,

On Thu, Feb 06, 2025 at 10:44:13AM +0800, Xiaofei Tan wrote:
> On 2025/1/29 12:33, Nathan Chancellor wrote:
> > On Thu, Jan 23, 2025 at 08:35:51PM +0100, Rafael J. Wysocki wrote:
> > > On Tue, Jan 21, 2025 at 3:23 AM Xiaofei Tan <tanxiaofei@huawei.com> wrote:
> > > >
> > > > On 2025/1/20 19:04, Jonathan Cameron wrote:
> > > > > On Fri, 17 Jan 2025 10:29:57 +0800
> > > > > Xiaofei Tan <tanxiaofei@huawei.com> wrote:
> > > > >
> > > > > > When the module HED is built-in, the module HED init is behind EVGED
> > > > > > as the driver are in the same initcall level, then the order is determined
> > > > > > by Makefile order. That order violates expectations. Because RAS records
> > > > > > can't be handled in the special time window that EVGED has initialized
> > > > > > while HED not.
> > > > > >
> > > > > > If the number of such RAS records is more than the APEI HEST error source
> > > > > > number, the HEST resources could be occupied all, and then could affect
> > > > > > subsequent RAS error reporting.
> > > > > >
> > > > > > Change the initcall level of HED to subsys_init to fix the issue. If build
> > > > > > HED as a module, the problem remains. To solve this problem completely,
> > > > > > change the ACPI_HED from tristate to bool.
> > > > > >
> > > > > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > > > Given the change in approach (even though I reviewed this internally)
> > > > > should probably have dropped my RB. Anyhow, consider this me
> > > > > giving it again on list.
> > > > OK. thanks.
> > > Applied as 6.14-rc material with a rewritten changelog and under a new
> > > subject: "ACPI: HED: Always initialize before evged".
> > >
> > > Thanks!
> > For what it's worth, I just bisected a new error message that I see when
> > booting several x86_64 distribution configurations in QEMU to this
> > change in -next as commit 19badc4e57c6 ("ACPI: HED: Always initialize
> > before evged"):
> >
> >   $ curl -LSso .config https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/raw/main/config
> >
> >   $ make -skj"$(nproc)" ARCH=x86_64 CROSS_COMPILE=x86_64-linux- olddefconfig bzImage
> >
> >   $ qemu-system-x86_64 \
> >       -display none \
> >       -nodefaults \
> >       -M q35 \
> >       -d unimp,guest_errors \
> >       -append 'console=ttyS0 earlycon=uart8250,io,0x3f8' \
> >       -kernel arch/x86/boot/bzImage \
> >       -initrd rootfs.cpio \
> >       -cpu host \
> >       -enable-kvm \
> >       -m 512m \
> >       -smp 8 \
> >       -serial mon:stdio
> >   ...
> >   [ 0.535126] Error: Driver 'hardware_error_device' is already registered, aborting...
>
> It seems that the startup script of the test case loads some fixed ko, even when the bzImage changed.
>
> If so, we could update the rootfs.cpio. This may introduce some adaptation work, if you want to roll
>
> this patch to some distribution. I think we could do this when release new iso.

I am confused, this issue is technically reproducible without a rootfs?
Removing the '-initrd rootfs.cpio' from the QEMU command I provided
above does not hide the error.

> > If there is any additional information I can provide or patches I can
> > test, I am more than happy to do so. Apologies if this has already been
> > reported or resolved, I did a search on the mailing list and I did not
> > see anything.
On 2025/2/7 0:49, Nathan Chancellor wrote:
> Hi Xiaofei,
>
> On Thu, Feb 06, 2025 at 10:44:13AM +0800, Xiaofei Tan wrote:
>> On 2025/1/29 12:33, Nathan Chancellor wrote:
>>> On Thu, Jan 23, 2025 at 08:35:51PM +0100, Rafael J. Wysocki wrote:
>>>> On Tue, Jan 21, 2025 at 3:23 AM Xiaofei Tan <tanxiaofei@huawei.com> wrote:
>>>>> On 2025/1/20 19:04, Jonathan Cameron wrote:
>>>>>> On Fri, 17 Jan 2025 10:29:57 +0800
>>>>>> Xiaofei Tan <tanxiaofei@huawei.com> wrote:
>>>>>>
>>>>>>> When the module HED is built-in, the module HED init is behind EVGED
>>>>>>> as the driver are in the same initcall level, then the order is determined
>>>>>>> by Makefile order. That order violates expectations. Because RAS records
>>>>>>> can't be handled in the special time window that EVGED has initialized
>>>>>>> while HED not.
>>>>>>>
>>>>>>> If the number of such RAS records is more than the APEI HEST error source
>>>>>>> number, the HEST resources could be occupied all, and then could affect
>>>>>>> subsequent RAS error reporting.
>>>>>>>
>>>>>>> Change the initcall level of HED to subsys_init to fix the issue. If build
>>>>>>> HED as a module, the problem remains. To solve this problem completely,
>>>>>>> change the ACPI_HED from tristate to bool.
>>>>>>>
>>>>>>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>>>>> Given the change in approach (even though I reviewed this internally)
>>>>>> should probably have dropped my RB. Anyhow, consider this me
>>>>>> giving it again on list.
>>>>> OK. thanks.
>>>> Applied as 6.14-rc material with a rewritten changelog and under a new
>>>> subject: "ACPI: HED: Always initialize before evged".
>>>>
>>>> Thanks!
>>> For what it's worth, I just bisected a new error message that I see when
>>> booting several x86_64 distribution configurations in QEMU to this
>>> change in -next as commit 19badc4e57c6 ("ACPI: HED: Always initialize
>>> before evged"):
>>>
>>>   $ curl -LSso .config https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/raw/main/config
>>>
>>>   $ make -skj"$(nproc)" ARCH=x86_64 CROSS_COMPILE=x86_64-linux- olddefconfig bzImage
>>>
>>>   $ qemu-system-x86_64 \
>>>       -display none \
>>>       -nodefaults \
>>>       -M q35 \
>>>       -d unimp,guest_errors \
>>>       -append 'console=ttyS0 earlycon=uart8250,io,0x3f8' \
>>>       -kernel arch/x86/boot/bzImage \
>>>       -initrd rootfs.cpio \
>>>       -cpu host \
>>>       -enable-kvm \
>>>       -m 512m \
>>>       -smp 8 \
>>>       -serial mon:stdio
>>>   ...
>>>   [ 0.535126] Error: Driver 'hardware_error_device' is already registered, aborting...
>> It seems that the startup script of the test case loads some fixed ko, even when the bzImage changed.
>>
>> If so, we could update the rootfs.cpio. This may introduce some adaptation work, if you want to roll
>>
>> this patch to some distribution. I think we could do this when release new iso.
> I am confused, this issue is technically reproducible without a rootfs?
> Removing the '-initrd rootfs.cpio' from the QEMU command I provided
> above does not hide the error.

Hi Nathan,

I have reproduced this issue, and nothing to do with rootfs.cpio, thanks
for this info.

>
>>> If there is any additional information I can provide or patches I can
>>> test, I am more than happy to do so. Apologies if this has already been
>>> reported or resolved, I did a search on the mailing list and I did not
>>> see anything.
diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index d81b55f5068c..7f10aa38269d 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -452,7 +452,7 @@ config ACPI_SBS
 	  the modules will be called sbs and sbshc.
 
 config ACPI_HED
-	tristate "Hardware Error Device"
+	bool "Hardware Error Device"
 	help
 	  This driver supports the Hardware Error Device (PNP0C33),
 	  which is used to report some hardware errors notified via
diff --git a/drivers/acpi/hed.c b/drivers/acpi/hed.c
index 7652515a6be1..677dfcce2990 100644
--- a/drivers/acpi/hed.c
+++ b/drivers/acpi/hed.c
@@ -81,6 +81,7 @@ static struct acpi_driver acpi_hed_driver = {
 	},
 };
 module_acpi_driver(acpi_hed_driver);
+subsys_initcall(acpi_hed_driver_init);
 
 MODULE_AUTHOR("Huang Ying");
 MODULE_DESCRIPTION("ACPI Hardware Error Device Driver");
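
The ordering argument in the changelog can be illustrated with a small user-space model. This is only a sketch under stated assumptions: it models two initcall levels, uses array order as a stand-in for link (Makefile) order, and assumes evged is linked before hed as the changelog describes. The names run_initcalls() and boot_order() are hypothetical and are not kernel APIs.

```c
#include <stddef.h>
#include <string.h>

/* Simplified model: only two of the kernel's initcall levels. */
enum level { LEVEL_SUBSYS, LEVEL_DEVICE, LEVEL_COUNT };

struct initcall {
    enum level level;
    const char *name;
};

/* Records the order in which the simulated initcalls run. */
static char boot_log[64];

/* Run every level in turn; inside a level, keep the array (link) order. */
static void run_initcalls(const struct initcall *calls, size_t n)
{
    boot_log[0] = '\0';
    for (int lvl = 0; lvl < LEVEL_COUNT; lvl++) {
        for (size_t i = 0; i < n; i++) {
            if (calls[i].level != (enum level)lvl)
                continue;
            strcat(boot_log, calls[i].name);
            strcat(boot_log, " ");
        }
    }
}

/* Both drivers at the same level: link order decides, evged runs first. */
static const struct initcall before_patch[] = {
    { LEVEL_DEVICE, "evged" },
    { LEVEL_DEVICE, "hed" },
};

/* hed moved to the earlier level: it runs first whatever the link order. */
static const struct initcall after_patch[] = {
    { LEVEL_DEVICE, "evged" },
    { LEVEL_SUBSYS, "hed" },
};

/* Return the simulated boot order with or without the level change. */
static const char *boot_order(int patched)
{
    if (patched)
        run_initcalls(after_patch, 2);
    else
        run_initcalls(before_patch, 2);
    return boot_log;
}
```

Called from a test main(), boot_order(0) yields "evged hed " and boot_order(1) yields "hed evged ". The model covers only the built-in case; a loadable module is initialized when it is loaded, which is why the patch also changes ACPI_HED from tristate to bool.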