Message ID | 20240130124828.14678-1-brgl@bgdev.pl |
---|---|
Headers | show |
Series | gpio: rework locking and object life-time control | expand |
On Tue, Jan 30, 2024 at 1:48 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote: > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > The device nodes representing GPIO hogs cannot be deleted without > unregistering the GPIO chip so there's no need to serialize their access. > However we must ensure that users can get the right address so write and > read it atomically. > > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> So this would give barriers around desc->hog, so we are talking SMP where the same hogs are added and removed on different CPUs here I suppose? Seems a bit theoretical but OK. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Yours, Linus Walleij
On Wed, Jan 31, 2024 at 6:39 PM Linus Walleij <linus.walleij@linaro.org> wrote: > > On Tue, Jan 30, 2024 at 1:48 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote: > > > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > > > The general rule of the kernel is to not provide symbols that have no > > users upstream. Let's remove logging helpers that are not used anywhere. > > > > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > Seems unrelated to the rest of the patches but OK. > Reviewed-by: Linus Walleij <linus.walleij@linaro.org> > > Yours, > Linus Walleij It's kind of related because I'm later modifying the logging helpers so I'm saving myself some work. Bart
On Tue, Jan 30, 2024 at 1:48 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote: > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > In order to ensure that the label is not freed while it's being > accessed, let's protect it with SRCU and synchronize it everytime it's > changed. > > Let's modify desc_set_label() to manage the memory used for the label as > it can only be freed once synchronize_srcu() returns. > > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> And this conclude the previous patches by protecting the inevitable label with SRCU, very clever. (I wouldn't have been able to come up with it...) Reviewed-by: Linus Walleij <linus.walleij@linaro.org> > -#define gpiod_err(desc, fmt, ...) \ > - pr_err("gpio-%d (%s): " fmt, desc_to_gpio(desc), desc->label ? : "?", \ > - ##__VA_ARGS__) (...) Now it is clear why you were minimizing these helpers. (Maybe add to the commit message of that patch? "We do this because all functions need to be changed later".) Yours, Linus Walleij
On Tue, Jan 30, 2024 at 1:48 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote: > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > The "multi-function" gpio_lock is pretty much useless with how it's used > in GPIOLIB currently. Because many GPIO API calls can be called from all > contexts but may also call into sleeping driver callbacks, there are > many places with utterly broken workarounds like yielding the lock to > call a possibly sleeping function and then re-acquiring it again without > taking into account that the protected state may have changed. > > It was also used to protect several unrelated things: like individual > descriptors AND the GPIO device list. We now serialize access to these > two with SRCU and so can finally remove the spinlock. > > There is of course the question of consistency of lockless access to > GPIO descriptors. Because we only support exclusive access to GPIOs > (officially anyway, I'm looking at you broken > GPIOD_FLAGS_BIT_NONEXCLUSIVE bit...) and the API contract with providers > does not guarantee serialization, it's enough to ensure we cannot > accidentally dereference an invalid pointer and that the state we present > to both users and providers remains consistent. To achieve that: read the > flags field atomically except for a few special cases. Read their current > value before executing callback code and use this value for any subsequent > logic. Modifying the flags depends on the particular use-case and can > differ. For instance: when requesting a GPIO, we need to set the > REQUESTED bit immediately so that the next user trying to request the > same line sees -EBUSY. > > While at it: the allocations that used GFP_ATOMIC until this point can > now switch to GFP_KERNEL. > > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Neat! Reviewed-by: Linus Walleij <linus.walleij@linaro.org> (I'm sorry about NONEXCLUSIVE, let's see what we can do about it...) > @@ -578,6 +577,9 @@ int gpiod_export(struct gpio_desc *desc, bool direction_may_change) > return -EINVAL; > } > > + if (!test_and_set_bit(FLAG_EXPORT, &desc->flags)) > + return -EPERM; This exit early split off from the big if() below (which is good) is a new thing right? Maybe mention this in the commit message? > -/* gpio_lock prevents conflicts during gpio_desc[] table updates. > - * While any GPIO is requested, its gpio_chip is not removable; > - * each GPIO's "requested" flag serves as a lock and refcount. > - */ > -DEFINE_SPINLOCK(gpio_lock); GOOD RIDDANCE. > - /* FIXME: make this GFP_KERNEL once the spinlock is out. */ > - new = kstrdup_const(label, GFP_ATOMIC); > + new = kstrdup_const(label, GFP_KERNEL); And all of this is neat as well. Someone might complain about splitting that in a separate patch, but not me because I don't care so much about such processy things. (The patchset is already split enough as it is.) Yours, Linus Walleij
On Tue, Jan 30, 2024 at 1:48 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote: > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > The GPIO chip pointer is unused. Let's remove it. > > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Oups I think it was used at one point but we probably factored ourselves away from using it. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Yours, Linus Walleij
On Tue, Jan 30, 2024 at 1:48 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote: > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > The variable holding the number of GPIO lines is duplicated in GPIO > device so read it instead of unnecessarily dereferencing the chip > pointer. > > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Yours, Linus Walleij
On Wed, Jan 31, 2024 at 9:16 PM Linus Walleij <linus.walleij@linaro.org> wrote: > On Tue, Jan 30, 2024 at 1:48 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote: > > > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > > > Checking desc->gdev->chip for NULL without holding it in place with some > > serializing mechanism is pointless. Remove this check. Also don't check > > desc->gdev for NULL as it can never happen. > > > > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > I don't know if I agree that it is pointless. It will work on any single-CPU > system and 99.9% of other cases. > > On the other hand: what it is supposed to protect against is userspace > doing calls to a gpio_device through the character device, while the > backing struct gpio_chip is gone (e.g. a GPIO expander on USB, > and someone pulled the cable), i.e. it became NULL, and this is why the > error message says "backing device is gone". > > But I want to see where the series is going, maybe you fix this > problem in the end, so I can come back and ACK this. Aha, it is fixed in patches 19+20. Maybe mention that we add a new protection later in the series in the commit message? Anyway, I get it now! Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Yours, Linus Walleij
On Tue, Jan 30, 2024 at 1:48 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote: > We do NOT serialize all API callbacks. This means that provider callbacks > may be called simultaneously and GPIO drivers need to provide their own > locking if needed. This is on purpose. First: we only support exclusive > GPIO usage[1] so there's no risk of two drivers getting in each other's > way over the same GPIO. Second: with this series, we ensure enough > consistency to limit the chance of drivers or user-space users crashing > the kernel. With additional improvements in handling the flags field in > GPIO descriptors there's very little to gain, while bitbanging drivers > may care about the increased performance of going lockless. OK I read this before but didn't understand it, now I understand it. The series: Reviewed-by: Linus Walleij <linus.walleij@linaro.org> I think you should merge it all soon so we get some time to shake it out in linux-next, hopefully any remaining bugs and cleanups can be done in-tree. Excellent work, by the way. Yours, Linus Walleij
On Wed, Jan 31, 2024 at 7:08 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote: > On Wed, Jan 31, 2024 at 6:39 PM Linus Walleij <linus.walleij@linaro.org> wrote: > > On Tue, Jan 30, 2024 at 1:48 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote: > > > > > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > > > > > The general rule of the kernel is to not provide symbols that have no > > > users upstream. Let's remove logging helpers that are not used anywhere. > > > > > > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > > > Seems unrelated to the rest of the patches but OK. > > Reviewed-by: Linus Walleij <linus.walleij@linaro.org> > > > > Yours, > > Linus Walleij > > It's kind of related because I'm later modifying the logging helpers > so I'm saving myself some work. Yeah I realized that later. Maybe add to the commit message? No big deal anyway. Yours, Linus Walleij
On Wed, Jan 31, 2024 at 9:32 PM Linus Walleij <linus.walleij@linaro.org> wrote: > > On Tue, Jan 30, 2024 at 1:48 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote: > > > We do NOT serialize all API callbacks. This means that provider callbacks > > may be called simultaneously and GPIO drivers need to provide their own > > locking if needed. This is on purpose. First: we only support exclusive > > GPIO usage[1] so there's no risk of two drivers getting in each other's > > way over the same GPIO. Second: with this series, we ensure enough > > consistency to limit the chance of drivers or user-space users crashing > > the kernel. With additional improvements in handling the flags field in > > GPIO descriptors there's very little to gain, while bitbanging drivers > > may care about the increased performance of going lockless. > > OK I read this before but didn't understand it, now I understand it. > > The series: > Reviewed-by: Linus Walleij <linus.walleij@linaro.org> > > I think you should merge it all soon so we get some time to shake > it out in linux-next, hopefully any remaining bugs and cleanups > can be done in-tree. > > Excellent work, by the way. > Thanks. There are still a few issues here and there, so I'll be sending a v2 next week. Bart > Yours, > Linus Walleij
From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> This has been brewing for some time now but is finally ready to send out for review. This series relies on SRCU a lot so I'm Cc'ing Paul. This is a big rework of locking in GPIOLIB. The current serialization is pretty much useless. There is one big spinlock (gpio_lock) that "protects" both the GPIO device list, GPIO descriptor access and who knows what else. I'm putting "protects" in quotes as in several places the lock is taken, released whenever a sleeping function is called and re-taken without any regards for the "protected" state that may have changed. First a little background on what we're dealing with in GPIOLIB. We have consumer API functions that can be called from any context explicitly (get/set value, set direction) as well as many others which will get called in atomic context implicitly (e.g. set config called in certain situations from gpiod_direction_output()). On the other side: we have GPIO provider drivers whose callbacks may or may not sleep depending on the underlying protocol but they're called from the same code paths. This makes any attempts at serialization quite complex. We typically cannot use sleeping locks - we may be called from atomic - but we also often cannot use spinlocks - provider callbacks may sleep. Moreover: we have close ties with the interrupt and pinctrl subsystems, often either calling into them or getting called from them. They use their own locking schemes which are at odds with ours (pinctrl uses mutexes, the interrupt subsystem can call GPIO helpers with spinlock taken). There is also another significant issue: the GPIO device object contains a pointer to gpio_chip which is the implementation of the GPIO provider. This object can be removed at any point - as GPIOLIB officially supports hotplugging with all the dynamic expanders that we provide drivers for - and leave the GPIO API callbacks with a suddenly NULL pointer. This is a problem that allowed user-space processes to easily crash the kernel until we patched it with a read-write semaphore in the user-space facing code (but the problem still exists for in-kernel users). This was recognized before as evidenced by the implementation of validate_desc() but without proper serialization, simple checking for a NULL pointer is pointless and we do need a generic solution for that issue as well. If we want to get it right - the more lockless we go, the better. This is why SRCU seems to be the right candidate for the mechanism to use. In fact it's the only mechanism we can use for our read-only critical sections to be called from atomic and process contexts as well as be able to call driver callbacks that may sleep (for the latter case). We're going to use it in three places: to protect the global list of GPIO devices, to ensure consistency when dereferencing the chip pointer in GPIO device struct and finally to ensure that users can access GPIO descriptors and always see a consistent state. We do NOT serialize all API callbacks. This means that provider callbacks may be called simultaneously and GPIO drivers need to provide their own locking if needed. This is on purpose. First: we only support exclusive GPIO usage[1] so there's no risk of two drivers getting in each other's way over the same GPIO. Second: with this series, we ensure enough consistency to limit the chance of drivers or user-space users crashing the kernel. With additional improvements in handling the flags field in GPIO descriptors there's very little to gain, while bitbanging drivers may care about the increased performance of going lockless. This series brings in one somewhat significant functional change for in-kernel users, namely: GPIO API calls, for which the underlying GPIO chip is gone, will no longer return 0 and emit a log message but instead will return -ENODEV. I tested the series with libgpiod tests, ran it on some x86 and aarch64 boards and tested some corner cases with user-space command-line tools. Thanks, Bartosz [1] - This is not technically true. We do provide the GPIOD_FLAGS_BIT_NONEXCLUSIVE flag. However this is just another piece of technical debt. This is a hack provided for a single use-case in the regulator framework which got out of control and is now used in many places that should have never touched it. It's utterly broken and doesn't even provide any contract as to what a "shared GPIO" is. I would argue that it's the next thing we should address by providing "reference counted GPIO enable", not just a flag allowing to request the same GPIO twice and then allow two drivers to fight over who toggles it as is the case now. For now, let's just treat users of GPIOD_FLAGS_BIT_NONEXCLUSIVE like they're consciously and deliberately choosing to risk undefined behavior. Bartosz Golaszewski (22): gpio: protect the list of GPIO devices with SRCU gpio: of: assign and read the hog pointer atomically gpio: remove unused logging helpers gpio: provide and use gpiod_get_label() gpio: don't set label from irq helpers gpio: add SRCU infrastructure to struct gpio_desc gpio: protect the descriptor label with SRCU gpio: sysfs: use gpio_device_find() to iterate over existing devices gpio: remove gpio_lock gpio: reinforce desc->flags handling gpio: remove unneeded code from gpio_device_get_desc() gpio: sysfs: extend the critical section for unregistering sysfs devices gpio: sysfs: pass the GPIO device - not chip - to sysfs callbacks gpio: cdev: replace gpiochip_get_desc() with gpio_device_get_desc() gpio: cdev: don't access gdev->chip if it's not needed gpio: reduce the functionality of validate_desc() gpio: remove unnecessary checks from gpiod_to_chip() gpio: add the can_sleep flag to struct gpio_device gpio: add SRCU infrastructure to struct gpio_device gpio: protect the pointer to gpio_chip in gpio_device with SRCU gpio: remove the RW semaphore from the GPIO device gpio: mark unsafe gpio_chip manipulators as deprecated drivers/gpio/gpiolib-cdev.c | 82 ++-- drivers/gpio/gpiolib-of.c | 4 +- drivers/gpio/gpiolib-sysfs.c | 155 +++++--- drivers/gpio/gpiolib.c | 735 +++++++++++++++++++---------------- drivers/gpio/gpiolib.h | 84 ++-- 5 files changed, 595 insertions(+), 465 deletions(-)