Message ID | 20231124145338.3112416-1-o.rempel@pengutronix.de |
---|---|
Headers | show |
Series | introduce priority-based shutdown support | expand |
On 25/11/2023 08:50, Oleksij Rempel wrote: > On Sat, Nov 25, 2023 at 06:51:55AM +0000, Greg Kroah-Hartman wrote: >> On Fri, Nov 24, 2023 at 07:57:25PM +0100, Oleksij Rempel wrote: >>> On Fri, Nov 24, 2023 at 05:26:30PM +0000, Greg Kroah-Hartman wrote: >>>> On Fri, Nov 24, 2023 at 05:32:34PM +0100, Oleksij Rempel wrote: >>>>> On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote: >>>>>> On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote: >>>>>>> On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote: >>>>>>>> On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote: >>>>>>> >>>>>>>>> This came out of some discussions about trying to handle emergency power >>>>>>>>> failure notifications. >>>>>>> >>>>>>>> I'm sorry, but I don't know what that means. Are you saying that the >>>>>>>> kernel is now going to try to provide a hard guarantee that some devices >>>>>>>> are going to be shut down in X number of seconds when asked? If so, why >>>>>>>> not do this in userspace? >>>>>>> >>>>>>> No, it was initially (or when I initially saw it anyway) handling of >>>>>>> notifications from regulators that they're in trouble and we have some >>>>>>> small amount of time to do anything we might want to do about it before >>>>>>> we expire. >>>>>> >>>>>> So we are going to guarantee a "time" in which we are going to do >>>>>> something? Again, if that's required, why not do it in userspace using >>>>>> a RT kernel? >>>>> >>>>> For the HW in question I have only 100ms time before power loss. By >>>>> doing it over use space some we will have even less time to react. >>>> >>>> Why can't userspace react that fast? Why will the kernel be somehow >>>> faster? Speed should be the same, just get the "power is cut" signal >>>> and have userspace flush and unmount the disk before power is gone. Why >>>> can the kernel do this any differently? >>>> >>>>> In fact, this is not a new requirement. It exist on different flavors of >>>>> automotive Linux for about 10 years. Linux in cars should be able to >>>>> handle voltage drops for example on ignition and so on. The only new thing is >>>>> the attempt to mainline it. >>>> >>>> But your patch is not guaranteeing anything, it's just doing a "I want >>>> this done before the other devices are handled", that's it. There is no >>>> chance that 100ms is going to be a requirement, or that some other >>>> device type is not going to come along and demand to be ahead of your >>>> device in the list. >>>> >>>> So you are going to have a constant fight among device types over the >>>> years, and people complaining that the kernel is now somehow going to >>>> guarantee that a device is shutdown in a set amount of time, which >>>> again, the kernel can not guarantee here. >>>> >>>> This might work as a one-off for a specific hardware platform, which is >>>> odd, but not anything you really should be adding for anyone else to use >>>> here as your reasoning for it does not reflect what the code does. >>> >>> I see. Good point. >>> >>> In my case umount is not needed, there is not enough time to write down >>> the data. We should send a shutdown command to the eMMC ASAP. >> >> If you don't care about the data, why is a shutdown command to the >> hardware needed? What does that do that makes anything "safe" if your >> data is lost. > > It prevents HW damage. In a typical automotive under-voltage labor it is > usually possible to reproduce X amount of bricked eMMCs or NANDs on Y > amount of under-voltage cycles (I do not have exact numbers right now). > Even if the numbers not so high in the labor tests (sometimes something > like one bricked device in a month of tests), the field returns are > significant enough to care about software solution for this problem. > > Same problem was seen not only in automotive devices, but also in > industrial or agricultural. With other words, it is important enough to bring > some kind of solution mainline. > IMO that is a serious problem with the used storage / eMMC in that case and it is not suitable for industrial/automotive uses? Any industrial/automotive-suitable storage device should detect under-voltage and just treat it as a power-down/loss, and while that isn't nice for the storage device, it really shouldn't be able to brick a device (within <1M cycles anyway). What does the storage module vendor say about this? BR, Christian
pe 24. marrask. 2023 klo 19.26 Greg Kroah-Hartman (gregkh@linuxfoundation.org) kirjoitti: > > On Fri, Nov 24, 2023 at 05:32:34PM +0100, Oleksij Rempel wrote: > > On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote: > > > On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote: > > > > On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote: > > > > > On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote: > > > > > > > > > > This came out of some discussions about trying to handle emergency power > > > > > > failure notifications. > > > > > > > > > I'm sorry, but I don't know what that means. Are you saying that the > > > > > kernel is now going to try to provide a hard guarantee that some devices > > > > > are going to be shut down in X number of seconds when asked? If so, why > > > > > not do this in userspace? > > > > > > > > No, it was initially (or when I initially saw it anyway) handling of > > > > notifications from regulators that they're in trouble and we have some > > > > small amount of time to do anything we might want to do about it before > > > > we expire. > > > > > > So we are going to guarantee a "time" in which we are going to do > > > something? Again, if that's required, why not do it in userspace using > > > a RT kernel? > > > > For the HW in question I have only 100ms time before power loss. By > > doing it over use space some we will have even less time to react. > > Why can't userspace react that fast? Why will the kernel be somehow > faster? Speed should be the same, just get the "power is cut" signal > and have userspace flush and unmount the disk before power is gone. Why > can the kernel do this any differently? > > > In fact, this is not a new requirement. It exist on different flavors of > > automotive Linux for about 10 years. Linux in cars should be able to > > handle voltage drops for example on ignition and so on. The only new thing is > > the attempt to mainline it. > > But your patch is not guaranteeing anything, it's just doing a "I want > this done before the other devices are handled", that's it. There is no > chance that 100ms is going to be a requirement, or that some other > device type is not going to come along and demand to be ahead of your > device in the list. > > So you are going to have a constant fight among device types over the > years, and people complaining that the kernel is now somehow going to > guarantee that a device is shutdown in a set amount of time, which > again, the kernel can not guarantee here. > > This might work as a one-off for a specific hardware platform, which is > odd, but not anything you really should be adding for anyone else to use > here as your reasoning for it does not reflect what the code does. I was (am) interested in knowing how/where the regulator error notifications are utilized - hence I asked this in ELCE last summer. Replies indeed mostly pointed to automotive and handling the under voltage events. As to what has changed (I think this was asked in another mail on this topic) - I understood from the discussions that the demand of running systems with as low power as possible is even more important/desirable. Hence, the under-voltage events are more usual than they were when cars used to be working by burning flammable liquids :) Anyways, what I thought I'd comment on is that the severity of the regulator error notifications can be given from device-tree. Rationale behind this is that figuring out whether a certain detected problem is fatal or not (in embedded systems) should be done by the board designers, per board. Maybe the understanding which hardware should react first is also a property of hardware and could come from the device-tree? Eg, instead of having a "DEVICE_SHUTDOWN_PRIO_STORAGE" set unconditionally for EMMC, systems could set shutdown priority per board and per device explicitly using device-tree? Yours, -- Matti
On Mon, Nov 27, 2023 at 02:54:21PM +0200, Matti Vaittinen wrote: > pe 24. marrask. 2023 klo 19.26 Greg Kroah-Hartman > (gregkh@linuxfoundation.org) kirjoitti: > > > > On Fri, Nov 24, 2023 at 05:32:34PM +0100, Oleksij Rempel wrote: > > > On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote: > > > > On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote: > > > > > On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote: > > > > > > On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote: > > > > > > > > > > > > This came out of some discussions about trying to handle emergency power > > > > > > > failure notifications. > > > > > > > > > > > I'm sorry, but I don't know what that means. Are you saying that the > > > > > > kernel is now going to try to provide a hard guarantee that some devices > > > > > > are going to be shut down in X number of seconds when asked? If so, why > > > > > > not do this in userspace? > > > > > > > > > > No, it was initially (or when I initially saw it anyway) handling of > > > > > notifications from regulators that they're in trouble and we have some > > > > > small amount of time to do anything we might want to do about it before > > > > > we expire. > > > > > > > > So we are going to guarantee a "time" in which we are going to do > > > > something? Again, if that's required, why not do it in userspace using > > > > a RT kernel? > > > > > > For the HW in question I have only 100ms time before power loss. By > > > doing it over use space some we will have even less time to react. > > > > Why can't userspace react that fast? Why will the kernel be somehow > > faster? Speed should be the same, just get the "power is cut" signal > > and have userspace flush and unmount the disk before power is gone. Why > > can the kernel do this any differently? > > > > > In fact, this is not a new requirement. It exist on different flavors of > > > automotive Linux for about 10 years. Linux in cars should be able to > > > handle voltage drops for example on ignition and so on. The only new thing is > > > the attempt to mainline it. > > > > But your patch is not guaranteeing anything, it's just doing a "I want > > this done before the other devices are handled", that's it. There is no > > chance that 100ms is going to be a requirement, or that some other > > device type is not going to come along and demand to be ahead of your > > device in the list. > > > > So you are going to have a constant fight among device types over the > > years, and people complaining that the kernel is now somehow going to > > guarantee that a device is shutdown in a set amount of time, which > > again, the kernel can not guarantee here. > > > > This might work as a one-off for a specific hardware platform, which is > > odd, but not anything you really should be adding for anyone else to use > > here as your reasoning for it does not reflect what the code does. > > I was (am) interested in knowing how/where the regulator error > notifications are utilized - hence I asked this in ELCE last summer. > Replies indeed mostly pointed to automotive and handling the under > voltage events. > > As to what has changed (I think this was asked in another mail on this > topic) - I understood from the discussions that the demand of running > systems with as low power as possible is even more > important/desirable. Hence, the under-voltage events are more usual > than they were when cars used to be working by burning flammable > liquids :) > > Anyways, what I thought I'd comment on is that the severity of the > regulator error notifications can be given from device-tree. Rationale > behind this is that figuring out whether a certain detected problem is > fatal or not (in embedded systems) should be done by the board > designers, per board. Maybe the understanding which hardware should > react first is also a property of hardware and could come from the > device-tree? Eg, instead of having a "DEVICE_SHUTDOWN_PRIO_STORAGE" > set unconditionally for EMMC, systems could set shutdown priority per > board and per device explicitly using device-tree? Yes, using device tree would be good, but now you have created something that is device-tree-specific and not all the world is device tree :( Also, many devices are finally moving out to non-device-tree busses, like PCI and USB, so how would you handle them in this type of scheme? thanks, greg k-h
On Mon, Nov 27, 2023 at 01:08:24PM +0000, Greg Kroah-Hartman wrote: > Yes, using device tree would be good, but now you have created something > that is device-tree-specific and not all the world is device tree :( AFAICT the idiomatic thing for ACPI would be platform quirks based on DMI information. Yay ACPI. If the system is more Linux targetted then you can use _DSD properties to store DT properties, these can then be parsed out in a firmware interface neutral way via the fwnode API. I'm not sure there's any avoiding dealing with firmware interface specifics at some point if we need platform description. > Also, many devices are finally moving out to non-device-tree busses, > like PCI and USB, so how would you handle them in this type of scheme? DT does have bindings for devices on discoverable buses like PCI - I think the original thing was for vendors cheaping out on EEPROMs though it's also useful when things are soldered down in embedded systems.
On 11/27/23 15:08, Greg Kroah-Hartman wrote: > On Mon, Nov 27, 2023 at 02:54:21PM +0200, Matti Vaittinen wrote: >> pe 24. marrask. 2023 klo 19.26 Greg Kroah-Hartman >> (gregkh@linuxfoundation.org) kirjoitti: >>> >>> On Fri, Nov 24, 2023 at 05:32:34PM +0100, Oleksij Rempel wrote: >>>> On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote: >>>>> On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote: >>>>>> On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote: >>>>>>> On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote: >>>>>> >>>>>>>> This came out of some discussions about trying to handle emergency power >>>>>>>> failure notifications. >>>>>> >>>>>>> I'm sorry, but I don't know what that means. Are you saying that the >>>>>>> kernel is now going to try to provide a hard guarantee that some devices >>>>>>> are going to be shut down in X number of seconds when asked? If so, why >>>>>>> not do this in userspace? >>>>>> >>>>>> No, it was initially (or when I initially saw it anyway) handling of >>>>>> notifications from regulators that they're in trouble and we have some >>>>>> small amount of time to do anything we might want to do about it before >>>>>> we expire. >>>>> >>>>> So we are going to guarantee a "time" in which we are going to do >>>>> something? Again, if that's required, why not do it in userspace using >>>>> a RT kernel? >>>> >>>> For the HW in question I have only 100ms time before power loss. By >>>> doing it over use space some we will have even less time to react. >>> >>> Why can't userspace react that fast? Why will the kernel be somehow >>> faster? Speed should be the same, just get the "power is cut" signal >>> and have userspace flush and unmount the disk before power is gone. Why >>> can the kernel do this any differently? >>> >>>> In fact, this is not a new requirement. It exist on different flavors of >>>> automotive Linux for about 10 years. Linux in cars should be able to >>>> handle voltage drops for example on ignition and so on. The only new thing is >>>> the attempt to mainline it. >>> >>> But your patch is not guaranteeing anything, it's just doing a "I want >>> this done before the other devices are handled", that's it. There is no >>> chance that 100ms is going to be a requirement, or that some other >>> device type is not going to come along and demand to be ahead of your >>> device in the list. >>> >>> So you are going to have a constant fight among device types over the >>> years, and people complaining that the kernel is now somehow going to >>> guarantee that a device is shutdown in a set amount of time, which >>> again, the kernel can not guarantee here. >>> >>> This might work as a one-off for a specific hardware platform, which is >>> odd, but not anything you really should be adding for anyone else to use >>> here as your reasoning for it does not reflect what the code does. >> >> I was (am) interested in knowing how/where the regulator error >> notifications are utilized - hence I asked this in ELCE last summer. >> Replies indeed mostly pointed to automotive and handling the under >> voltage events. >> >> As to what has changed (I think this was asked in another mail on this >> topic) - I understood from the discussions that the demand of running >> systems with as low power as possible is even more >> important/desirable. Hence, the under-voltage events are more usual >> than they were when cars used to be working by burning flammable >> liquids :) >> >> Anyways, what I thought I'd comment on is that the severity of the >> regulator error notifications can be given from device-tree. Rationale >> behind this is that figuring out whether a certain detected problem is >> fatal or not (in embedded systems) should be done by the board >> designers, per board. Maybe the understanding which hardware should >> react first is also a property of hardware and could come from the >> device-tree? Eg, instead of having a "DEVICE_SHUTDOWN_PRIO_STORAGE" >> set unconditionally for EMMC, systems could set shutdown priority per >> board and per device explicitly using device-tree? > > Yes, using device tree would be good, but now you have created something > that is device-tree-specific and not all the world is device tree :( True. However, my understanding is that the regulator subsystem is largely written to work with DT-based systems. Hence supporting the DT-based solution would probably fit to this specific use-case as source of problem notifications is the regulator subsystem. > Also, many devices are finally moving out to non-device-tree busses, > like PCI and USB, so how would you handle them in this type of scheme? I do readily admit I don't have [all ;) ] the answers. I also think that if we add support for prioritized shutdown on device-tree-based systems, people may eventually want to use this on non device-tree setups too. There may also be other use-cases for prioritized shutdown (Don't know what they would be though). For now I would leave that to be the problem of the folks who need non device-tree systems when (if) this needs realizes. Assuming there was the handling of priorities in place, the missing piece would then be to find out the place to store this hardware specific priority information. If this is solved for the non DT cases, then the DT-based and non DT-based solutions can co-exist. Just a suggestion though. I am not working on under-voltage "stuff" right now. Yours, -- Matti
On Fri, 24 Nov 2023 at 15:53, Oleksij Rempel <o.rempel@pengutronix.de> wrote: > > Hi, > > This patch series introduces support for prioritized device shutdown. > The main goal is to enable prioritization for shutting down specific > devices, particularly crucial in scenarios like power loss where > hardware damage can occur if not handled properly. > > Oleksij Rempel (3): > driver core: move core part of device_shutdown() to a separate > function > driver core: introduce prioritized device shutdown sequence > mmc: core: increase shutdown priority for MMC devices > > drivers/base/core.c | 157 +++++++++++++++++++++++++++-------------- > drivers/mmc/core/bus.c | 2 + > include/linux/device.h | 51 ++++++++++++- > kernel/reboot.c | 4 +- > 4 files changed, 157 insertions(+), 57 deletions(-) > Sorry for joining the discussions a bit late! Besides the valuable feedback that you already received from others (which indicates that we have quite some work to do in the commit messages to better explain and justify these changes), I wanted to share my overall thoughts around this. So, I fully understand the reason behind the $subject series, as we unfortunately can't rely on flash-based (NAND/NOR) storage devices being 100% tolerant to sudden-power failures. Besides for the reasons already discussed in the thread, the robustness simply depends on the "quality" of the FTL (flash translation layer) and the NAND/NOR/etc device it runs. For example, back in the days when Android showed up, we were testing YAFFS and UBIFS on rawNAND, which failed miserably after just a few thousands of power-cycles. It was even worse with ext3/4 on the early variants of eMMC devices, as those survived only a few hundreds of power-cycles. Now, I assume this has improved a lot over the years, but I haven't really verified this myself. That said, for eMMC and other flash-based storage devices, industrial or not, I think it would make sense to try to notify the device about the power-failure, if possible. This would add another level of mitigation, I think.
Hello all, On Mon, Nov 27, 2023 at 12:36:11PM +0100, Oleksij Rempel wrote: > On Mon, Nov 27, 2023 at 10:13:49AM +0000, Christian Loehle wrote: > > > Same problem was seen not only in automotive devices, but also in > > > industrial or agricultural. With other words, it is important enough to bring > > > some kind of solution mainline. > > > > > > > IMO that is a serious problem with the used storage / eMMC in that case and it > > is not suitable for industrial/automotive uses? > > Any industrial/automotive-suitable storage device should detect under-voltage and > > just treat it as a power-down/loss, and while that isn't nice for the storage device, > > it really shouldn't be able to brick a device (within <1M cycles anyway). > > What does the storage module vendor say about this? > > Good question. I do not have insights ATM. I'll forward it.