Message ID | 20240205093418.39755-1-brgl@bgdev.pl |
---|---|
Headers | show |
Series | gpio: rework locking and object life-time control | expand |
On Mon, Feb 5, 2024 at 3:06 PM Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote: > > On Mon, Feb 05, 2024 at 06:04:23AM -0800, Bartosz Golaszewski wrote: > > On Mon, 5 Feb 2024 14:57:45 +0100, Andy Shevchenko > > <andriy.shevchenko@linux.intel.com> said: > > > On Mon, Feb 05, 2024 at 02:54:08PM +0100, Bartosz Golaszewski wrote: > > >> On Mon, Feb 5, 2024 at 2:48 PM Andy Shevchenko > > >> <andriy.shevchenko@linux.intel.com> wrote: > > >> > On Mon, Feb 05, 2024 at 10:34:01AM +0100, Bartosz Golaszewski wrote: > > ... > > > >> > > + for (j = 0; j < i; j++) > > >> > > + cleanup_srcu_struct(&desc->srcu); > > >> > > > >> > What does this loop mean? > > >> > > >> I open-coded it because I want to store the value of i to go back and > > >> destroy the SRCU structs on failure. > > > > > > Where/how is j being used? > > > > > > > In this bit: > > I am sorry, but I don't see how... > > > for (i = 0; i < gc->ngpio; i++) { > > struct gpio_desc *desc = &gdev->descs[i]; > > > > ret = init_srcu_struct(&desc->srcu); > > if (ret) { > > for (j = 0; j < i; j++) > > cleanup_srcu_struct(&desc->srcu); > > So, you call the same several times, why? Ah, now I feel stupid. You're right of course, I'll fix it. Bart > > > goto err_remove_of_chip; > > } > > > >> > > + goto err_remove_of_chip; > > >> > > + } > > -- > With Best Regards, > Andy Shevchenko > >
On Mon, Feb 5, 2024 at 2:47 PM Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote: > > On Mon, Feb 05, 2024 at 10:34:12AM +0100, Bartosz Golaszewski wrote: > > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > > > Checking desc->gdev->chip for NULL without holding it in place with some > > serializing mechanism is pointless. Remove this check. Also don't check > > desc->gdev for NULL as it can never happen. We'll be protecting > > gdev->chip with SRCU soon but we will provide a dedicated, automatic > > class for that. > > ... > > > void gpiod_free(struct gpio_desc *desc) > > { > > - /* > > - * We must not use VALIDATE_DESC_VOID() as the underlying gdev->chip > > - * may already be NULL but we still want to put the references. > > - */ > > - if (!desc) > > - return; > > + VALIDATE_DESC_VOID(desc); > > IIRC we (used to) have two cases like this (you added one in some code like > last year). > None of the consumer-facing functions does it anymore. Not sure about this, maybe it was removed earlier. Bart
On Mon, Feb 5, 2024 at 1:31 PM Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote: > [snip] > > > + /* FIXME Cannot use gpio_chip_guard due to const desc. */ > > gpio_chip_guard() > Nope, it's the name of the class type, not a function. Bart
On Mon, Feb 5, 2024 at 1:31 PM Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote: > [snip] > > > int gpiod_get_direction(struct gpio_desc *desc) > > { > > - struct gpio_chip *gc; > > unsigned long flags; > > unsigned int offset; > > int ret; > > > > - gc = gpiod_to_chip(desc); > > + if (!desc) > > + /* Sane default is INPUT. */ > > + return 1; > > Hmm... I can't imagine how this value may anyhow be used / useful. > What else would you return for an optional (NULL) GPIO? Bart [snip]
On Mon, Feb 05, 2024 at 08:36:39PM +0100, Bartosz Golaszewski wrote: > On Mon, Feb 5, 2024 at 1:31 PM Andy Shevchenko > <andriy.shevchenko@linux.intel.com> wrote: ... > > > > > int gpiod_get_direction(struct gpio_desc *desc) > > > { > > > - struct gpio_chip *gc; > > > unsigned long flags; > > > unsigned int offset; > > > int ret; > > > > > > - gc = gpiod_to_chip(desc); > > > + if (!desc) > > > + /* Sane default is INPUT. */ > > > + return 1; > > > > Hmm... I can't imagine how this value may anyhow be used / useful. > > What else would you return for an optional (NULL) GPIO? An error. If somebody asks for direction of the non-existing GPIO, there is no (valid) answer for that.
On Mon, Feb 05, 2024 at 08:22:23PM +0100, Bartosz Golaszewski wrote: > On Mon, Feb 5, 2024 at 2:47 PM Andy Shevchenko > <andriy.shevchenko@linux.intel.com> wrote: > > On Mon, Feb 05, 2024 at 10:34:12AM +0100, Bartosz Golaszewski wrote: ... > > > void gpiod_free(struct gpio_desc *desc) > > > { > > > - /* > > > - * We must not use VALIDATE_DESC_VOID() as the underlying gdev->chip > > > - * may already be NULL but we still want to put the references. > > > - */ > > > - if (!desc) > > > - return; > > > + VALIDATE_DESC_VOID(desc); > > > > IIRC we (used to) have two cases like this (you added one in some code like > > last year). > > > > None of the consumer-facing functions does it anymore. Not sure about > this, maybe it was removed earlier. Okay, the only place that might be considered is gpiod_to_gpio_device(). But that API seems new, I don't know if VALIDATE_DESC_VOID() is okay to use there, maybe it should be commented if not. Also there is a typo in the kernel doc — 'the users already holds' --> 'the user already holds'.
On Tue, Feb 6, 2024 at 1:24 PM Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote: > > On Mon, Feb 05, 2024 at 08:36:39PM +0100, Bartosz Golaszewski wrote: > > On Mon, Feb 5, 2024 at 1:31 PM Andy Shevchenko > > <andriy.shevchenko@linux.intel.com> wrote: > > ... > > > > > > > > int gpiod_get_direction(struct gpio_desc *desc) > > > > { > > > > - struct gpio_chip *gc; > > > > unsigned long flags; > > > > unsigned int offset; > > > > int ret; > > > > > > > > - gc = gpiod_to_chip(desc); > > > > + if (!desc) > > > > + /* Sane default is INPUT. */ > > > > + return 1; > > > > > > Hmm... I can't imagine how this value may anyhow be used / useful. > > > > What else would you return for an optional (NULL) GPIO? > > An error. If somebody asks for direction of the non-existing GPIO, there is no > (valid) answer for that. > All other functions return 0 for desc == NULL to accommodate gpiod_get_optional(). I think we should stay consistent here. Bart
On Tue, Feb 06, 2024 at 01:57:39PM +0100, Bartosz Golaszewski wrote: > On Tue, Feb 6, 2024 at 1:24 PM Andy Shevchenko > <andriy.shevchenko@linux.intel.com> wrote: > > > > On Mon, Feb 05, 2024 at 08:36:39PM +0100, Bartosz Golaszewski wrote: > > > On Mon, Feb 5, 2024 at 1:31 PM Andy Shevchenko > > > <andriy.shevchenko@linux.intel.com> wrote: > > > > ... > > > > > > > > > > > int gpiod_get_direction(struct gpio_desc *desc) > > > > > { > > > > > - struct gpio_chip *gc; > > > > > unsigned long flags; > > > > > unsigned int offset; > > > > > int ret; > > > > > > > > > > - gc = gpiod_to_chip(desc); > > > > > + if (!desc) > > > > > + /* Sane default is INPUT. */ > > > > > + return 1; > > > > > > > > Hmm... I can't imagine how this value may anyhow be used / useful. > > > > > > What else would you return for an optional (NULL) GPIO? > > > > An error. If somebody asks for direction of the non-existing GPIO, there is no > > (valid) answer for that. > All other functions return 0 for desc == NULL to accommodate > gpiod_get_optional(). I think we should stay consistent here. The way you proposed is inconsistent, i.e. you may not return any direction for the unknown / non-existing GPIO. You speculate it will be 1, I may consider that in my (hypothetical for now) case it should be 0. Just don't make all bananas to be oranges. It won't work.
On Tue, Feb 6, 2024 at 2:13 PM Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote: > > On Tue, Feb 06, 2024 at 01:57:39PM +0100, Bartosz Golaszewski wrote: > > On Tue, Feb 6, 2024 at 1:24 PM Andy Shevchenko > > <andriy.shevchenko@linux.intel.com> wrote: > > > > > > On Mon, Feb 05, 2024 at 08:36:39PM +0100, Bartosz Golaszewski wrote: > > > > On Mon, Feb 5, 2024 at 1:31 PM Andy Shevchenko > > > > <andriy.shevchenko@linux.intel.com> wrote: > > > > > > ... > > > > > > > > > > > > > > int gpiod_get_direction(struct gpio_desc *desc) > > > > > > { > > > > > > - struct gpio_chip *gc; > > > > > > unsigned long flags; > > > > > > unsigned int offset; > > > > > > int ret; > > > > > > > > > > > > - gc = gpiod_to_chip(desc); > > > > > > + if (!desc) > > > > > > + /* Sane default is INPUT. */ > > > > > > + return 1; > > > > > > > > > > Hmm... I can't imagine how this value may anyhow be used / useful. > > > > > > > > What else would you return for an optional (NULL) GPIO? > > > > > > An error. If somebody asks for direction of the non-existing GPIO, there is no > > > (valid) answer for that. > > > All other functions return 0 for desc == NULL to accommodate > > gpiod_get_optional(). I think we should stay consistent here. > > The way you proposed is inconsistent, i.e. you may not return any direction > for the unknown / non-existing GPIO. You speculate it will be 1, I may consider > that in my (hypothetical for now) case it should be 0. > > Just don't make all bananas to be oranges. It won't work. > I don't have a strong conviction here. May make it an error as well. It's still inconsistent though - calling gpiod_direction_output(NULL); will return 0 and then you get an error when you do gpiod_get_direction(NULL). I don't have a good solution though. Bart
On Tue, Feb 06, 2024 at 02:23:35PM +0100, Bartosz Golaszewski wrote: > On Tue, Feb 6, 2024 at 2:13 PM Andy Shevchenko > <andriy.shevchenko@linux.intel.com> wrote: > > On Tue, Feb 06, 2024 at 01:57:39PM +0100, Bartosz Golaszewski wrote: > > > On Tue, Feb 6, 2024 at 1:24 PM Andy Shevchenko > > > <andriy.shevchenko@linux.intel.com> wrote: > > > > On Mon, Feb 05, 2024 at 08:36:39PM +0100, Bartosz Golaszewski wrote: > > > > > On Mon, Feb 5, 2024 at 1:31 PM Andy Shevchenko > > > > > <andriy.shevchenko@linux.intel.com> wrote: ... > > > > > > > int gpiod_get_direction(struct gpio_desc *desc) > > > > > > > { > > > > > > > - struct gpio_chip *gc; > > > > > > > unsigned long flags; > > > > > > > unsigned int offset; > > > > > > > int ret; > > > > > > > > > > > > > > - gc = gpiod_to_chip(desc); > > > > > > > + if (!desc) > > > > > > > + /* Sane default is INPUT. */ > > > > > > > + return 1; > > > > > > > > > > > > Hmm... I can't imagine how this value may anyhow be used / useful. > > > > > > > > > > What else would you return for an optional (NULL) GPIO? > > > > > > > > An error. If somebody asks for direction of the non-existing GPIO, there is no > > > > (valid) answer for that. > > > > > All other functions return 0 for desc == NULL to accommodate > > > gpiod_get_optional(). I think we should stay consistent here. > > > > The way you proposed is inconsistent, i.e. you may not return any direction > > for the unknown / non-existing GPIO. You speculate it will be 1, I may consider > > that in my (hypothetical for now) case it should be 0. > > > > Just don't make all bananas to be oranges. It won't work. > > I don't have a strong conviction here. May make it an error as well. > It's still inconsistent though - calling gpiod_direction_output(NULL); > will return 0 and then you get an error when you do > gpiod_get_direction(NULL). I don't have a good solution though. Yes, and this is the best what we can have. Because the real code may rely on the returned value and they should be really aware on the returned values in some cases.
From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> This is a big rework of locking in GPIOLIB. The current serialization is pretty much useless. There is one big spinlock (gpio_lock) that "protects" both the GPIO device list, GPIO descriptor access and who knows what else. I'm putting "protects" in quotes as in several places the lock is taken, released whenever a sleeping function is called and re-taken without regards for the "protected" state that may have changed. First a little background on what we're dealing with in GPIOLIB. We have consumer API functions that can be called from any context explicitly (get/set value, set direction) as well as many others which will get called in atomic context implicitly (e.g. set config called in certain situations from gpiod_direction_output()). On the other side: we have GPIO provider drivers whose callbacks may or may not sleep depending on the underlying protocol. This makes any attempts at serialization quite complex. We typically cannot use sleeping locks - we may be called from atomic - but we also often cannot use spinlocks - provider callbacks may sleep. Moreover: we have close ties with the interrupt and pinctrl subsystems, often either calling into them or getting called from them. They use their own locking schemes which are at odds with ours (pinctrl uses mutexes, the interrupt subsystem can call GPIO helpers with spinlock taken). There is also another significant issue: the GPIO device object contains a pointer to gpio_chip which is the implementation of the GPIO provider. This object can be removed at any point - as GPIOLIB officially supports hotplugging with all the dynamic expanders that we provide drivers for - and leave the GPIO API callbacks with a suddenly NULL pointer. This is a problem that allowed user-space processes to easily crash the kernel until we patched it with a read-write semaphore in the user-space facing code (but the problem still exists for in-kernel users). This was recognized before as evidenced by the implementation of validate_desc() but without proper serialization, simple checking for a NULL pointer is pointless and we do need a generic solution for that issue as well. If we want to get it right - the more lockless we go, the better. This is why SRCU seems to be the right candidate for the mechanism to use. In fact it's the only mechanism we can use our read-only critical sections to be called from atomic and protecc contexts as well as call driver callbacks that may sleep (for the latter case). We're going to use it in three places: to protect the global list of GPIO devices, to ensure consistency when dereferencing the chip pointer in GPIO device struct and finally to ensure that users can access GPIO descriptors and always see a consistent state. We do NOT serialize all API callbacks. This means that provider callbacks may be called simultaneously and GPIO drivers need to provide their own locking if needed. This is on purpose. First: we only support exclusive GPIO usage* so there's no risk of two drivers getting in each other's way over the same GPIO. Second: with this series, we ensure enough consistency to limit the chance of drivers or user-space users crashing the kernel. With additional improvements in handling the flags field in GPIO descriptors there's very little to gain, while bitbanging drivers may care about the increased performance of going lockless. This series brings in one somewhat significant functional change for in-kernel users, namely: GPIO API calls, for which the underlying GPIO chip is gone, will no longer return 0 and emit a log message but instead will return -ENODEV. I know this is a lot of code to go through but the more eyes we get on it the better. Thanks, Bartosz * - This is not technically true. We do provide the GPIOD_FLAGS_BIT_NONEXCLUSIVE flag. However this is just another piece of technical debt. This is a hack provided for a single use-case in the regulator framework which got out of control and is now used in many places that should have never touched it. It's utterly broken and doesn't even provide any contract as to what a "shared GPIO" is. I would argue that it's the next thing we should address by providing "reference counted GPIO enable", not just a flag allowing to request the same GPIO twice and then allow two drivers to fight over who toggles it as is the case now. For now, let's just treat users of GPIOD_FLAGS_BIT_NONEXCLUSIVE like they're consciously and deliberately choosing to risk undefined behavior. v1 -> v2: - fix jumping over variable initialization in sysfs code - fix RCU-related sparse warnings - fix a smatch complaint about uninitialized variables (even though it's a false positive coming from the fact that scoped_guard() is implemented as a for loop - fix a potential NULL-pointer dereference in debugfs callbacks - improve commit messages Bartosz Golaszewski (23): gpio: protect the list of GPIO devices with SRCU gpio: of: assign and read the hog pointer atomically gpio: remove unused logging helpers gpio: provide and use gpiod_get_label() gpio: don't set label from irq helpers gpio: add SRCU infrastructure to struct gpio_desc gpio: protect the descriptor label with SRCU gpio: sysfs: use gpio_device_find() to iterate over existing devices gpio: remove gpio_lock gpio: reinforce desc->flags handling gpio: remove unneeded code from gpio_device_get_desc() gpio: sysfs: extend the critical section for unregistering sysfs devices gpio: sysfs: pass the GPIO device - not chip - to sysfs callbacks gpio: cdev: replace gpiochip_get_desc() with gpio_device_get_desc() gpio: cdev: don't access gdev->chip if it's not needed gpio: don't dereference gdev->chip in gpiochip_setup_dev() gpio: reduce the functionality of validate_desc() gpio: remove unnecessary checks from gpiod_to_chip() gpio: add the can_sleep flag to struct gpio_device gpio: add SRCU infrastructure to struct gpio_device gpio: protect the pointer to gpio_chip in gpio_device with SRCU gpio: remove the RW semaphore from the GPIO device gpio: mark unsafe gpio_chip manipulators as deprecated drivers/gpio/gpiolib-cdev.c | 92 +++-- drivers/gpio/gpiolib-of.c | 4 +- drivers/gpio/gpiolib-sysfs.c | 167 +++++--- drivers/gpio/gpiolib.c | 760 +++++++++++++++++++---------------- drivers/gpio/gpiolib.h | 86 ++-- 5 files changed, 630 insertions(+), 479 deletions(-)