Message ID | 1360429077-14616-1-git-send-email-haojian.zhuang@linaro.org |
---|---|
State | Changes Requested |
Headers | show |
On Sun, 10 Feb 2013 00:57:57 +0800 Haojian Zhuang <haojian.zhuang@linaro.org> wrote: > do_initcalls() could call all driver initialization code in kernel_init > thread. It means that probe() function will be also called from that > time. After this, kernel could access console & release __init section > in the same thread. > > But if device probe fails and moves into deferred probe list, a new > thread is created to reinvoke probe. If the device is serial console, > kernel has to open console failure & release __init section before > reinvoking failure. Because there's no synchronization mechanism. > Now add wait event to synchronize after do_initcalls(). It sounds like this: static int __ref kernel_init(void *unused) { kernel_init_freeable(); /* need to finish all async __init code before freeing the memory */ async_synchronize_full(); is designed to prevent the problem you describe? > --- a/init/main.c > +++ b/init/main.c > @@ -786,6 +786,7 @@ static void __init do_basic_setup(void) > do_ctors(); > usermodehelper_enable(); > do_initcalls(); > + wait_for_device_probe(); > } Needs a nice comment here explaining what's going on.
On 12 February 2013 07:10, Andrew Morton <akpm@linux-foundation.org> wrote: > On Sun, 10 Feb 2013 00:57:57 +0800 > Haojian Zhuang <haojian.zhuang@linaro.org> wrote: > >> do_initcalls() could call all driver initialization code in kernel_init >> thread. It means that probe() function will be also called from that >> time. After this, kernel could access console & release __init section >> in the same thread. >> >> But if device probe fails and moves into deferred probe list, a new >> thread is created to reinvoke probe. If the device is serial console, >> kernel has to open console failure & release __init section before >> reinvoking failure. Because there's no synchronization mechanism. >> Now add wait event to synchronize after do_initcalls(). > > It sounds like this: > > static int __ref kernel_init(void *unused) > { > kernel_init_freeable(); > /* need to finish all async __init code before freeing the memory */ > async_synchronize_full(); > > is designed to prevent the problem you describe? > It can't prevent the problem that I described. Because deferred_probe() is introduced recently. All synchronization should be finished just after do_initcalls(). Since load_default_modules() is also called in the end of kernel_init_freeable(), I'm not sure that whether I could remove async_synchronize_full() here. So I didn't touch it. >> --- a/init/main.c >> +++ b/init/main.c >> @@ -786,6 +786,7 @@ static void __init do_basic_setup(void) >> do_ctors(); >> usermodehelper_enable(); >> do_initcalls(); >> + wait_for_device_probe(); >> } > > Needs a nice comment here explaining what's going on. No problem. I'll add comment here. Regards Haojian
On Tue, 12 Feb 2013 10:52:10 +0800, Haojian Zhuang <haojian.zhuang@linaro.org> wrote: > On 12 February 2013 07:10, Andrew Morton <akpm@linux-foundation.org> wrote: > > On Sun, 10 Feb 2013 00:57:57 +0800 > > Haojian Zhuang <haojian.zhuang@linaro.org> wrote: > > > >> do_initcalls() could call all driver initialization code in kernel_init > >> thread. It means that probe() function will be also called from that > >> time. After this, kernel could access console & release __init section > >> in the same thread. > >> > >> But if device probe fails and moves into deferred probe list, a new > >> thread is created to reinvoke probe. If the device is serial console, > >> kernel has to open console failure & release __init section before > >> reinvoking failure. Because there's no synchronization mechanism. > >> Now add wait event to synchronize after do_initcalls(). > > > > It sounds like this: > > > > static int __ref kernel_init(void *unused) > > { > > kernel_init_freeable(); > > /* need to finish all async __init code before freeing the memory */ > > async_synchronize_full(); > > > > is designed to prevent the problem you describe? > > > It can't prevent the problem that I described. Because deferred_probe() > is introduced recently. > > All synchronization should be finished just after do_initcalls(). Since > load_default_modules() is also called in the end of kernel_init_freeable(), > I'm not sure that whether I could remove async_synchronize_full() > here. So I didn't touch it. > > >> --- a/init/main.c > >> +++ b/init/main.c > >> @@ -786,6 +786,7 @@ static void __init do_basic_setup(void) > >> do_ctors(); > >> usermodehelper_enable(); > >> do_initcalls(); > >> + wait_for_device_probe(); > >> } > > > > Needs a nice comment here explaining what's going on. > > No problem. I'll add comment here. Actually, this approach will create new problems. There is no guarantee that a given device will be able to initialize before exiting do_basic_setup(). If, for instance, a device depends on a resource provided by a module, then it will just keep deferring. In that case you've got a hung kernel. I think what you really want is the following: static int deferred_probe_initcall(void) { deferred_wq = create_singlethread_workqueue("deferwq"); if (WARN_ON(!deferred_wq)) return -ENOMEM; driver_deferred_probe_enable = true; + deferred_probe_work_func(NULL); - driver_deferred_probe_trigger(); return 0; } late_initcall(deferred_probe_initcall); Or something similar. That would guarantee that as many passes as are needed (which in practical terms only means a couple) for device probing to settle down before exiting the initcall processing. That should achieve the effect you desire. It still masks the __init section issue by making it a lot less likely, but it does ensure that all of the built-in driver dependency order issues are processed before continuing on to userspace. g.
On Thu, Feb 14, 2013 at 3:06 AM, Grant Likely <grant.likely@secretlab.ca> wrote: > On Tue, 12 Feb 2013 10:52:10 +0800, Haojian Zhuang <haojian.zhuang@linaro.org> wrote: >> On 12 February 2013 07:10, Andrew Morton <akpm@linux-foundation.org> wrote: >> > On Sun, 10 Feb 2013 00:57:57 +0800 >> > Haojian Zhuang <haojian.zhuang@linaro.org> wrote: >> > >> >> do_initcalls() could call all driver initialization code in kernel_init >> >> thread. It means that probe() function will be also called from that >> >> time. After this, kernel could access console & release __init section >> >> in the same thread. >> >> >> >> But if device probe fails and moves into deferred probe list, a new >> >> thread is created to reinvoke probe. If the device is serial console, >> >> kernel has to open console failure & release __init section before >> >> reinvoking failure. Because there's no synchronization mechanism. >> >> Now add wait event to synchronize after do_initcalls(). >> > >> > It sounds like this: >> > >> > static int __ref kernel_init(void *unused) >> > { >> > kernel_init_freeable(); >> > /* need to finish all async __init code before freeing the memory */ >> > async_synchronize_full(); >> > >> > is designed to prevent the problem you describe? >> > >> It can't prevent the problem that I described. Because deferred_probe() >> is introduced recently. >> >> All synchronization should be finished just after do_initcalls(). Since >> load_default_modules() is also called in the end of kernel_init_freeable(), >> I'm not sure that whether I could remove async_synchronize_full() >> here. So I didn't touch it. >> >> >> --- a/init/main.c >> >> +++ b/init/main.c >> >> @@ -786,6 +786,7 @@ static void __init do_basic_setup(void) >> >> do_ctors(); >> >> usermodehelper_enable(); >> >> do_initcalls(); >> >> + wait_for_device_probe(); >> >> } >> > >> > Needs a nice comment here explaining what's going on. >> >> No problem. I'll add comment here. > > Actually, this approach will create new problems. There is no guarantee > that a given device will be able to initialize before exiting > do_basic_setup(). If, for instance, a device depends on a resource > provided by a module, then it will just keep deferring. In that case > you've got a hung kernel. > > I think what you really want is the following: > > static int deferred_probe_initcall(void) > { > deferred_wq = create_singlethread_workqueue("deferwq"); > if (WARN_ON(!deferred_wq)) > return -ENOMEM; > > driver_deferred_probe_enable = true; > + deferred_probe_work_func(NULL); > - driver_deferred_probe_trigger(); > return 0; > } > late_initcall(deferred_probe_initcall); > > Or something similar. That would guarantee that as many passes as are needed > (which in practical terms only means a couple) for device probing to > settle down before exiting the initcall processing. That should achieve > the effect you desire. > > It still masks the __init section issue by making it a lot less likely, Grant, Can you please explain me this problem?My understanding is below: If all the detection of devices with there respective driver is done before __init section is freed then we will not have the problem mentioned. However if the driver requests the probing to be deferred then __init section of the deferred driver will not be freed right? I am afraid but the patch description is bit cryptic for me specially this line "kernel has to open console failure & release __init section before reinvoking failure". > but it does ensure that all of the built-in driver dependency order > issues are processed before continuing on to userspace. > > g. > > -- > Grant Likely, B.Sc, P.Eng. > Secret Lab Technologies, Ltd. > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/
On Thursday 14 February 2013, anish singh wrote: > Grant, Can you please explain me this problem?My understanding is below: > If all the detection of devices with there respective driver is done before > __init section is freed then we will not have the problem mentioned. > However if the driver requests the probing to be deferred then __init section > of the deferred driver will not be freed right? The kernel has no idea which drivers are deferred at the time when all the __init sections are freed. > I am afraid but the patch description is bit cryptic for me specially > this line "kernel has to open console failure & release __init section before > reinvoking failure". I would put it this way: With the introduction of deferred probing, the rules for the use of __init sections have changed slightly for some corner cases. While normal device drivers can, as before, not call __init functions from their .probe() callbacks, we could do that in drivers as long as they were built-in and did not support hotplug, and that exception was used in console drivers. This exception has now become more specific, and those drivers also must not use deferred probing that depends on other loadable modules or hotpluggable devices. Grant's patch fixes the corner case where you have a device whose .probe() callback gets deferred and waits for some object that is provided by a different built-in driver for a non-hotpluggable device, by making sure that this particular deferred probe has completed before the __init section is freed. Unlike Haojian's patch, it allows other deferred device probe functions that do not need to call __init functions to be delayed until much later, when a driver module is loaded or a device is hotplugged. Arnd
On Thu, Feb 14, 2013 at 09:56:36AM +0000, Arnd Bergmann wrote: > I would put it this way: With the introduction of deferred probing, the > rules for the use of __init sections have changed slightly for some > corner cases. While normal device drivers can, as before, not call > __init functions from their .probe() callbacks, we could do that in > drivers as long as they were built-in and did not support hotplug, > and that exception was used in console drivers. This exception has > now become more specific, and those drivers also must not use > deferred probing that depends on other loadable modules or hotpluggable > devices. In the general case, that remains true, but it's still _not_ true for console drivers. The console _should_ be initialised before it is attempted to be opened before passing control to userspace, which happens before the .init section is freed. If the console is deferred past that point, then userspace has no console. The behaviour of userspace in that situation can be very interesting, and I'd suggest that such is not well tested; consider the effect of not having fd 0,1,2 connected to something like a console but your filesystem and something doing a printf(). You can hope that userspace will take care of that condition, but I personally would not put much faith in it. With the plethora of 'init' daemon solutions we now have, I have less faith than I used to that such a condition would be correctly handled by all of them.
On Thursday 14 February 2013, Russell King - ARM Linux wrote: > In the general case, that remains true, but it's still not true for > console drivers. > > The console should be initialised before it is attempted to be opened > before passing control to userspace, which happens before the .init > section is freed. Yes, I forgot about that. This is indeed an additional requirement besides what I listed. The late_initcall in which Grant was adding the serialization however is executed just before the /dev/console gets opened, which seems like an appropriate place. What might get into the way is that other late_initcalls get executed after this one and are required for the console. In that case, we would have to move the deferred_probe_initcall from a late_initcall to the end of do_basic_setup, after all the other initcalls. Arnd
On 14 February 2013 05:36, Grant Likely <grant.likely@secretlab.ca> wrote: > On Tue, 12 Feb 2013 10:52:10 +0800, Haojian Zhuang <haojian.zhuang@linaro.org> wrote: >> On 12 February 2013 07:10, Andrew Morton <akpm@linux-foundation.org> wrote: >> > On Sun, 10 Feb 2013 00:57:57 +0800 >> > Haojian Zhuang <haojian.zhuang@linaro.org> wrote: >> > >> >> do_initcalls() could call all driver initialization code in kernel_init >> >> thread. It means that probe() function will be also called from that >> >> time. After this, kernel could access console & release __init section >> >> in the same thread. >> >> >> >> But if device probe fails and moves into deferred probe list, a new >> >> thread is created to reinvoke probe. If the device is serial console, >> >> kernel has to open console failure & release __init section before >> >> reinvoking failure. Because there's no synchronization mechanism. >> >> Now add wait event to synchronize after do_initcalls(). >> > >> > It sounds like this: >> > >> > static int __ref kernel_init(void *unused) >> > { >> > kernel_init_freeable(); >> > /* need to finish all async __init code before freeing the memory */ >> > async_synchronize_full(); >> > >> > is designed to prevent the problem you describe? >> > >> It can't prevent the problem that I described. Because deferred_probe() >> is introduced recently. >> >> All synchronization should be finished just after do_initcalls(). Since >> load_default_modules() is also called in the end of kernel_init_freeable(), >> I'm not sure that whether I could remove async_synchronize_full() >> here. So I didn't touch it. >> >> >> --- a/init/main.c >> >> +++ b/init/main.c >> >> @@ -786,6 +786,7 @@ static void __init do_basic_setup(void) >> >> do_ctors(); >> >> usermodehelper_enable(); >> >> do_initcalls(); >> >> + wait_for_device_probe(); >> >> } >> > >> > Needs a nice comment here explaining what's going on. >> >> No problem. I'll add comment here. > > Actually, this approach will create new problems. There is no guarantee > that a given device will be able to initialize before exiting > do_basic_setup(). If, for instance, a device depends on a resource > provided by a module, then it will just keep deferring. In that case > you've got a hung kernel. > > I think what you really want is the following: > > static int deferred_probe_initcall(void) > { > deferred_wq = create_singlethread_workqueue("deferwq"); > if (WARN_ON(!deferred_wq)) > return -ENOMEM; > > driver_deferred_probe_enable = true; > + deferred_probe_work_func(NULL); > - driver_deferred_probe_trigger(); If you can change it into code in below, it could work. Otherwise, it always fails. driver_deferred_probe_enable = true; driver_deferred_probe_trigger(); + deferred_probe_work_func(NULL); return 0; Because deferred_probe_work_func() depends on that deferred_probe is added into deferred_probe_active_list. If driver_deferred_probe_trigger() isn't called first, the deferred uart probe can't be added into active list. So even you call work_func at here, it doesn't help. Would you send it again? If so, you can add tested-by with my signature. > return 0; > } > late_initcall(deferred_probe_initcall); > > Or something similar. That would guarantee that as many passes as are needed > (which in practical terms only means a couple) for device probing to > settle down before exiting the initcall processing. That should achieve > the effect you desire. > > It still masks the __init section issue by making it a lot less likely, > but it does ensure that all of the built-in driver dependency order > issues are processed before continuing on to userspace. > > g. > > -- > Grant Likely, B.Sc, P.Eng. > Secret Lab Technologies, Ltd.
On Thursday 14 February 2013, Haojian Zhuang wrote: > If you can change it into code in below, it could work. Otherwise, it > always fails. > driver_deferred_probe_enable = true; > driver_deferred_probe_trigger(); > + deferred_probe_work_func(NULL); > return 0; > > Because deferred_probe_work_func() depends on that deferred_probe is added > into deferred_probe_active_list. If driver_deferred_probe_trigger() isn't called > first, the deferred uart probe can't be added into active list. So even you call > work_func at here, it doesn't help. > Would that not cause two instances of the work function to run at the same time? That sounds like a source for a lot of problems. Arnd
On 14 February 2013 23:57, Arnd Bergmann <arnd@arndb.de> wrote: > On Thursday 14 February 2013, Haojian Zhuang wrote: >> If you can change it into code in below, it could work. Otherwise, it >> always fails. >> driver_deferred_probe_enable = true; >> driver_deferred_probe_trigger(); >> + deferred_probe_work_func(NULL); >> return 0; >> >> Because deferred_probe_work_func() depends on that deferred_probe is added >> into deferred_probe_active_list. If driver_deferred_probe_trigger() isn't called >> first, the deferred uart probe can't be added into active list. So even you call >> work_func at here, it doesn't help. >> > > Would that not cause two instances of the work function to run at the same time? > That sounds like a source for a lot of problems. > > Arnd Two instances of the work function? I'm sorry that I don't understanding your meaning. Could you help explain your question? Best Regards Haojian
On Thu, 14 Feb 2013 08:57:17 +0530, anish singh <anish198519851985@gmail.com> wrote: > On Thu, Feb 14, 2013 at 3:06 AM, Grant Likely <grant.likely@secretlab.ca> wrote: > > static int deferred_probe_initcall(void) > > { > > deferred_wq = create_singlethread_workqueue("deferwq"); > > if (WARN_ON(!deferred_wq)) > > return -ENOMEM; > > > > driver_deferred_probe_enable = true; > > + deferred_probe_work_func(NULL); > > - driver_deferred_probe_trigger(); > > return 0; > > } > > late_initcall(deferred_probe_initcall); > > > > Or something similar. That would guarantee that as many passes as are needed > > (which in practical terms only means a couple) for device probing to > > settle down before exiting the initcall processing. That should achieve > > the effect you desire. > > > > It still masks the __init section issue by making it a lot less likely, > Grant, Can you please explain me this problem?My understanding is below: > If all the detection of devices with there respective driver is done before > __init section is freed then we will not have the problem mentioned. > However if the driver requests the probing to be deferred then __init section > of the deferred driver will not be freed right? > > I am afraid but the patch description is bit cryptic for me specially > this line "kernel has to open console failure & release __init section before > reinvoking failure". Yes I can, but first I'll briefly describe the Linux driver model to make sure we're talking about the same thing... drivercore in Linux is oriented around two data structures: 1) Devices (struct device), and 2) Drivers (struct device_driver) Hardware is modeled with instances of 'struct device'. For each device that Linux knows about there is one 'struct device'[1]. The devices are organized into a hierarchical tree, and you can see it by looking in /sys/devices. Device drivers are represended by struct device_driver. Each driver, whether built-in or a module, is responsible to register itself by embedding a struct device_driver, or a structure that contains a struct device_driver. For example, struct platform_driver has an embedded struct device_driver. The whole purpose of drivercore is to match up drivers to devices. Each bus_type has its own mechanism for deciding which devices and device_drivers go together, but it still results in trying to bind a struct device_driver to each struct device. An important detail here is that drivercore is entirely asynchronous. There are no requirements on what order devices and device_drivers are registered. When a device gets registered, drivercore attempts to bind it to any device_driver that it already knows about. Similarly, when a new device_driver gets registered, drivercore will see if there are any unbound devices that it can bind it to. It is even possible to trigger a bind attempt sometime after both device and device_driver have been registered.[2] This is the reason that deferred_probe is an option. As long as the kernel keeps track of which device_drivers requested deferred probe, we can nudge drivercore to reattempt probing. It really doesn't matter what order or when drivers get bound... ...except when it does. Here's where we get into the issues related to __init sections and deferred probe. Since the driver model can bind a driver at any time, including after userspace has started, the expectation is that none of the code paths associated with probing will be discarded. That is why .probe hooks cannot be __init annotated. The problem for consoles is that the console init hook gets called from the probe path and a lot of console init hooks are __init annotated. Before deferred probe, this was rarely (if ever) an actual problem. In general the order of operations during kernel init is: 1) (early boot) create and register a bunch of struct devices 2) (initcalls) register a bunch of struct device_drivers - a bunch of binds happen at this point as device drivers get registered - This is why device initialization order is primarily driven by driver link order. 3) discard __init sections 4) userspace. It's not that bind is guaranteed to occur before __init discard, but rather nothing prevented it from happening after. Deferred probe changes that. With deferred probe, Haojian correctly analyzed that driver bind is getting pushed after __init is discarded. However, the problem with his solution was that it assumed that *all* deferred drivers must be resolved before proceeding with init which cannot be guaranteed. It is absolutely possble for a built-in driver to depend on something provided by a module. As rmk pointed out, that is not okay for console devices, so it is still important to make sure everything needed for the console is built-in, but non-critical devices may not care. The difference between my fix and Haojian's is that my fix forces a single pass of the deferred list in the same context as the initcalls before the deferred probe workqueue takes over. That will ensure that all inter-driver dependencies get themselves sorted out before completing the initcalls, and therefore before __init gets discarded. Phew, that was a lot more long-winded that I indended. I hope that helps though. g. [1] It's not quite that simple. Some devices have more than one struct device, but for the purpose of this discussion, one per device is sufficient. [2] It is also possible to unbind drivers from devices without unregistering the device driver, but I don't want to get too complicated in this description.
On Thursday 14 February 2013, Haojian Zhuang wrote: > On 14 February 2013 23:57, Arnd Bergmann <arnd@arndb.de> wrote: > > On Thursday 14 February 2013, Haojian Zhuang wrote: > >> If you can change it into code in below, it could work. Otherwise, it > >> always fails. > >> driver_deferred_probe_enable = true; > >> driver_deferred_probe_trigger(); > >> + deferred_probe_work_func(NULL); > >> return 0; > >> > >> Because deferred_probe_work_func() depends on that deferred_probe is added > >> into deferred_probe_active_list. If driver_deferred_probe_trigger() isn't called > >> first, the deferred uart probe can't be added into active list. So even you call > >> work_func at here, it doesn't help. > >> > > > > Would that not cause two instances of the work function to run at the same time? > > That sounds like a source for a lot of problems. > > > > Arnd > > Two instances of the work function? I'm sorry that I don't > understanding your meaning. > Could you help explain your question? I mean you end up calling the work function directly, while it gets run as part of the work queue on a different CPU at the same time. I just noticed that there is actually locking in place in deferred_probe_work_func that prevents any actual bugs, but you are still adding extra overhead here. Maybe just add flush_workqueue(deferred_wq); here? Arnd
On 15 February 2013 00:50, Arnd Bergmann <arnd@arndb.de> wrote: > On Thursday 14 February 2013, Haojian Zhuang wrote: >> On 14 February 2013 23:57, Arnd Bergmann <arnd@arndb.de> wrote: >> > On Thursday 14 February 2013, Haojian Zhuang wrote: >> >> If you can change it into code in below, it could work. Otherwise, it >> >> always fails. >> >> driver_deferred_probe_enable = true; >> >> driver_deferred_probe_trigger(); >> >> + deferred_probe_work_func(NULL); >> >> return 0; >> >> >> >> Because deferred_probe_work_func() depends on that deferred_probe is added >> >> into deferred_probe_active_list. If driver_deferred_probe_trigger() isn't called >> >> first, the deferred uart probe can't be added into active list. So even you call >> >> work_func at here, it doesn't help. >> >> >> > >> > Would that not cause two instances of the work function to run at the same time? >> > That sounds like a source for a lot of problems. >> > >> > Arnd >> >> Two instances of the work function? I'm sorry that I don't >> understanding your meaning. >> Could you help explain your question? > > I mean you end up calling the work function directly, while it gets run as part > of the work queue on a different CPU at the same time. I just noticed that > there is actually locking in place in deferred_probe_work_func that prevents > any actual bugs, but you are still adding extra overhead here. > > Maybe just add > > flush_workqueue(deferred_wq); > > here? > > Arnd It's fine to me. Since both of them are flushing workqueue. Tested-by: <haojian.zhuang@linaro.org>
On Thu, 14 Feb 2013 23:52:14 +0800, Haojian Zhuang <haojian.zhuang@linaro.org> wrote: > On 14 February 2013 05:36, Grant Likely <grant.likely@secretlab.ca> wrote: > > On Tue, 12 Feb 2013 10:52:10 +0800, Haojian Zhuang <haojian.zhuang@linaro.org> wrote: > >> On 12 February 2013 07:10, Andrew Morton <akpm@linux-foundation.org> wrote: > >> > On Sun, 10 Feb 2013 00:57:57 +0800 > >> > Haojian Zhuang <haojian.zhuang@linaro.org> wrote: > >> > > >> >> do_initcalls() could call all driver initialization code in kernel_init > >> >> thread. It means that probe() function will be also called from that > >> >> time. After this, kernel could access console & release __init section > >> >> in the same thread. > >> >> > >> >> But if device probe fails and moves into deferred probe list, a new > >> >> thread is created to reinvoke probe. If the device is serial console, > >> >> kernel has to open console failure & release __init section before > >> >> reinvoking failure. Because there's no synchronization mechanism. > >> >> Now add wait event to synchronize after do_initcalls(). > >> > > >> > It sounds like this: > >> > > >> > static int __ref kernel_init(void *unused) > >> > { > >> > kernel_init_freeable(); > >> > /* need to finish all async __init code before freeing the memory */ > >> > async_synchronize_full(); > >> > > >> > is designed to prevent the problem you describe? > >> > > >> It can't prevent the problem that I described. Because deferred_probe() > >> is introduced recently. > >> > >> All synchronization should be finished just after do_initcalls(). Since > >> load_default_modules() is also called in the end of kernel_init_freeable(), > >> I'm not sure that whether I could remove async_synchronize_full() > >> here. So I didn't touch it. > >> > >> >> --- a/init/main.c > >> >> +++ b/init/main.c > >> >> @@ -786,6 +786,7 @@ static void __init do_basic_setup(void) > >> >> do_ctors(); > >> >> usermodehelper_enable(); > >> >> do_initcalls(); > >> >> + wait_for_device_probe(); > >> >> } > >> > > >> > Needs a nice comment here explaining what's going on. > >> > >> No problem. I'll add comment here. > > > > Actually, this approach will create new problems. There is no guarantee > > that a given device will be able to initialize before exiting > > do_basic_setup(). If, for instance, a device depends on a resource > > provided by a module, then it will just keep deferring. In that case > > you've got a hung kernel. > > > > I think what you really want is the following: > > > > static int deferred_probe_initcall(void) > > { > > deferred_wq = create_singlethread_workqueue("deferwq"); > > if (WARN_ON(!deferred_wq)) > > return -ENOMEM; > > > > driver_deferred_probe_enable = true; > > + deferred_probe_work_func(NULL); > > - driver_deferred_probe_trigger(); > > If you can change it into code in below, it could work. Otherwise, it > always fails. > driver_deferred_probe_enable = true; > driver_deferred_probe_trigger(); > + deferred_probe_work_func(NULL); > return 0; > > Because deferred_probe_work_func() depends on that deferred_probe is added > into deferred_probe_active_list. If driver_deferred_probe_trigger() isn't called > first, the deferred uart probe can't be added into active list. So even you call > work_func at here, it doesn't help. > > Would you send it again? If so, you can add tested-by with my signature. Hmmm, that works, but it isn't very elegant. With that solution both the workqueue and the initcall are processing the deferred devices at the same time. It quite possibly could result in missed devices when walking the deferred list. Consider the scenario: B depends on A, C and D depend on B 1) Inital conditions pending:A-B-C-D active:empty WQ:idle IC:idle 2) Trigger pending:empty active:A-B-C-D 3) WQ: pop device pending:empty active:B-C-D WQ:A IC:idle 4) IC: pop device pending:empty active:C-D WQ:A IC:B 5) WQ: A probed & pop pending:empty active:D WQ:C IC:B 6) IC: B defers & pop pending:B active:empty WQ:C IC:D /* B defers because A wasn't ready, but when A did complete B wasn't * onthe pending list - this is bad */ 7) WQ: C defers pending:B-C active:empty WQ:idle IC:D 8) IC: D defers pending:B-C-D active:empty WQ:idle IC:idle When A successfully probes, everthing in the pending list gets appended to the active one, but if there are multiple threads poping devices off the list, then there can be devices that /should/ be moved to the active list but don't because they've been popped off by a separate thread. With the current code there really should only be one processor of the deferred list at a time. Try this instead: driver_deferred_probe_enable = true; driver_deferred_probe_trigger(); + flush_workqueue(deferred_wq); return 0; I had tried to keep the first pass of the list within the initcall context, but it created a lot of special cases in the code that I wasn't happy with. This is a lot simpler and has exactly the same visible behaviour. Let me know if that works for you and I'll send a proper patch to Linus with your Tested-by. He may yell at me for sending something so incredibly late in the v3.8 stablization, but this really is a bug fix. If he won't take it then I'll ask Greg to put it into v3.8.1. g.
On Thu, 2013-02-14 at 16:33 +0000, Grant Likely wrote: > On Thu, 14 Feb 2013 08:57:17 +0530, anish singh <anish198519851985@gmail.com> wrote: > > On Thu, Feb 14, 2013 at 3:06 AM, Grant Likely <grant.likely@secretlab.ca> wrote: > > > static int deferred_probe_initcall(void) > > > { > > > deferred_wq = create_singlethread_workqueue("deferwq"); > > > if (WARN_ON(!deferred_wq)) > > > return -ENOMEM; > > > > > > driver_deferred_probe_enable = true; > > > + deferred_probe_work_func(NULL); > > > - driver_deferred_probe_trigger(); > > > return 0; > > > } > > > late_initcall(deferred_probe_initcall); > > > > > > Or something similar. That would guarantee that as many passes as are needed > > > (which in practical terms only means a couple) for device probing to > > > settle down before exiting the initcall processing. That should achieve > > > the effect you desire. > > > > > > It still masks the __init section issue by making it a lot less likely, > > Grant, Can you please explain me this problem?My understanding is below: > > If all the detection of devices with there respective driver is done before > > __init section is freed then we will not have the problem mentioned. > > However if the driver requests the probing to be deferred then __init section > > of the deferred driver will not be freed right? > > > > I am afraid but the patch description is bit cryptic for me specially > > this line "kernel has to open console failure & release __init section before > > reinvoking failure". > > Yes I can, but first I'll briefly describe the Linux driver model to make sure > we're talking about the same thing... > > drivercore in Linux is oriented around two data structures: > 1) Devices (struct device), and > 2) Drivers (struct device_driver) > > Hardware is modeled with instances of 'struct device'. For each device > that Linux knows about there is one 'struct device'[1]. The devices are > organized into a hierarchical tree, and you can see it by looking in > /sys/devices. > > Device drivers are represended by struct device_driver. Each driver, > whether built-in or a module, is responsible to register itself by > embedding a struct device_driver, or a structure that contains a struct > device_driver. For example, struct platform_driver has an embedded > struct device_driver. > > The whole purpose of drivercore is to match up drivers to devices. Each > bus_type has its own mechanism for deciding which devices and > device_drivers go together, but it still results in trying to bind a > struct device_driver to each struct device. > > An important detail here is that drivercore is entirely asynchronous. > There are no requirements on what order devices and device_drivers are > registered. When a device gets registered, drivercore attempts to bind > it to any device_driver that it already knows about. Similarly, when a > new device_driver gets registered, drivercore will see if there are any > unbound devices that it can bind it to. It is even possible to trigger a > bind attempt sometime after both device and device_driver have been > registered.[2] > > This is the reason that deferred_probe is an option. As long as the > kernel keeps track of which device_drivers requested deferred probe, we > can nudge drivercore to reattempt probing. It really doesn't matter what > order or when drivers get bound... > > ...except when it does. Here's where we get into the issues related to > __init sections and deferred probe. Since the driver model can bind a > driver at any time, including after userspace has started, the > expectation is that none of the code paths associated with probing will > be discarded. That is why .probe hooks cannot be __init annotated. The > problem for consoles is that the console init hook gets called from > the probe path and a lot of console init hooks are __init annotated. > > Before deferred probe, this was rarely (if ever) an actual problem. In > general the order of operations during kernel init is: > > 1) (early boot) create and register a bunch of struct devices > 2) (initcalls) register a bunch of struct device_drivers > - a bunch of binds happen at this point as device drivers get > registered > - This is why device initialization order is primarily driven by driver > link order. > 3) discard __init sections > 4) userspace. > > It's not that bind is guaranteed to occur before __init discard, but > rather nothing prevented it from happening after. Deferred probe changes > that. With deferred probe, Haojian correctly analyzed that driver bind > is getting pushed after __init is discarded. > > However, the problem with his solution was that it assumed that *all* > deferred drivers must be resolved before proceeding with init which > cannot be guaranteed. It is absolutely possble for a built-in driver to > depend on something provided by a module. As rmk pointed out, that is > not okay for console devices, so it is still important to make sure > everything needed for the console is built-in, but non-critical devices > may not care. > > The difference between my fix and Haojian's is that my fix forces a > single pass of the deferred list in the same context as the initcalls > before the deferred probe workqueue takes over. That will ensure that > all inter-driver dependencies get themselves sorted out before > completing the initcalls, and therefore before __init gets discarded. > > Phew, that was a lot more long-winded that I indended. I hope that helps > though. I hope part of this explanation goes into the patch description or at least a comment in the code added by you. I can't imagine a better explanation than this.Thanks to you, Arnd Bergmann and Russell King. > > g. > > [1] It's not quite that simple. Some devices have more than one struct > device, but for the purpose of this discussion, one per device is > sufficient. > [2] It is also possible to unbind drivers from devices without > unregistering the device driver, but I don't want to get too complicated > in this description.
On Thu, 14 Feb 2013 15:57:18 +0000, Arnd Bergmann <arnd@arndb.de> wrote: > On Thursday 14 February 2013, Haojian Zhuang wrote: > > If you can change it into code in below, it could work. Otherwise, it > > always fails. > > driver_deferred_probe_enable = true; > > driver_deferred_probe_trigger(); > > + deferred_probe_work_func(NULL); > > return 0; > > > > Because deferred_probe_work_func() depends on that deferred_probe is added > > into deferred_probe_active_list. If driver_deferred_probe_trigger() isn't called > > first, the deferred uart probe can't be added into active list. So even you call > > work_func at here, it doesn't help. > > > > Would that not cause two instances of the work function to run at the same time? > That sounds like a source for a lot of problems. Yes. Even ignoring the problems with device drivers not handling multithreaded probing well, the current deferred probe wouldn't handle it well. I could make it support multithreading, but I don't see a whole lot of value there. g.
On Fri, 15 Feb 2013 00:58:23 +0800, Haojian Zhuang <haojian.zhuang@linaro.org> wrote: > On 15 February 2013 00:50, Arnd Bergmann <arnd@arndb.de> wrote: > > On Thursday 14 February 2013, Haojian Zhuang wrote: > >> On 14 February 2013 23:57, Arnd Bergmann <arnd@arndb.de> wrote: > >> > On Thursday 14 February 2013, Haojian Zhuang wrote: > >> >> If you can change it into code in below, it could work. Otherwise, it > >> >> always fails. > >> >> driver_deferred_probe_enable = true; > >> >> driver_deferred_probe_trigger(); > >> >> + deferred_probe_work_func(NULL); > >> >> return 0; > >> >> > >> >> Because deferred_probe_work_func() depends on that deferred_probe is added > >> >> into deferred_probe_active_list. If driver_deferred_probe_trigger() isn't called > >> >> first, the deferred uart probe can't be added into active list. So even you call > >> >> work_func at here, it doesn't help. > >> >> > >> > > >> > Would that not cause two instances of the work function to run at the same time? > >> > That sounds like a source for a lot of problems. > >> > > >> > Arnd > >> > >> Two instances of the work function? I'm sorry that I don't > >> understanding your meaning. > >> Could you help explain your question? > > > > I mean you end up calling the work function directly, while it gets run as part > > of the work queue on a different CPU at the same time. I just noticed that > > there is actually locking in place in deferred_probe_work_func that prevents > > any actual bugs, but you are still adding extra overhead here. > > > > Maybe just add > > > > flush_workqueue(deferred_wq); > > > > here? > > > > Arnd > > It's fine to me. Since both of them are flushing workqueue. > > Tested-by: <haojian.zhuang@linaro.org> Hahaha. I just came to the same conclusion. I'll craft a proper patch and send it off. g.
diff --git a/drivers/base/dd.c b/drivers/base/dd.c index 65631015..23db672 100644 --- a/drivers/base/dd.c +++ b/drivers/base/dd.c @@ -53,6 +53,9 @@ static LIST_HEAD(deferred_probe_pending_list); static LIST_HEAD(deferred_probe_active_list); static struct workqueue_struct *deferred_wq; +static atomic_t probe_count = ATOMIC_INIT(0); +static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue); + /** * deferred_probe_work_func() - Retry probing devices in the active list. */ @@ -77,6 +80,7 @@ static void deferred_probe_work_func(struct work_struct *work) private = list_first_entry(&deferred_probe_active_list, typeof(*dev->p), deferred_probe); dev = private->device; + atomic_dec(&probe_count); list_del_init(&private->deferred_probe); get_device(dev); @@ -257,9 +261,6 @@ int device_bind_driver(struct device *dev) } EXPORT_SYMBOL_GPL(device_bind_driver); -static atomic_t probe_count = ATOMIC_INIT(0); -static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue); - static int really_probe(struct device *dev, struct device_driver *drv) { int ret = 0; @@ -308,6 +309,7 @@ probe_failed: /* Driver requested deferred probing */ dev_info(dev, "Driver %s requests probe deferral\n", drv->name); driver_deferred_probe_add(dev); + atomic_inc(&probe_count); } else if (ret != -ENODEV && ret != -ENXIO) { /* driver matched but the probe failed */ printk(KERN_WARNING diff --git a/init/main.c b/init/main.c index 63534a1..033bf5f 100644 --- a/init/main.c +++ b/init/main.c @@ -786,6 +786,7 @@ static void __init do_basic_setup(void) do_ctors(); usermodehelper_enable(); do_initcalls(); + wait_for_device_probe(); } static void __init do_pre_smp_initcalls(void)
do_initcalls() could call all driver initialization code in kernel_init thread. It means that probe() function will be also called from that time. After this, kernel could access console & release __init section in the same thread. But if device probe fails and moves into deferred probe list, a new thread is created to reinvoke probe. If the device is serial console, kernel has to open console failure & release __init section before reinvoking failure. Because there's no synchronization mechanism. Now add wait event to synchronize after do_initcalls(). Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org> --- drivers/base/dd.c | 8 +++++--- init/main.c | 1 + 2 files changed, 6 insertions(+), 3 deletions(-)