Message ID | 20230811193034.59124-1-brgl@bgdev.pl |
---|---|
State | New |
Headers | show |
Series | [v3] gpiolib: fix reference leaks when removing GPIO chips still in use | expand |
On Fri, Aug 11, 2023 at 9:30 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote: > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > After we remove a GPIO chip that still has some requested descriptors, > gpiod_free_commit() will fail and we will never put the references to the > GPIO device and the owning module in gpiod_free(). > > Rework this function to: > - not warn on desc == NULL as this is a use-case on which most free > functions silently return > - put the references to desc->gdev and desc->gdev->owner unconditionally > so that the release callback actually gets called when the remaining > references are dropped by external GPIO users > > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > --- > v1 -> v2: > - add a comment about why we can't use VALIDATE_DESC_VOID() > > v2 -> v3: > - we must drop the reference to the owner module before we drop the one > to the gpio_device as the latter may be removed if this is the last > reference and we'll end up calling module_put() on freed memory v3 looks good to me. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Yours, Linus Walleij
On Fri, Aug 11, 2023 at 09:30:34PM +0200, Bartosz Golaszewski wrote: > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > After we remove a GPIO chip that still has some requested descriptors, > gpiod_free_commit() will fail and we will never put the references to the > GPIO device and the owning module in gpiod_free(). > > Rework this function to: > - not warn on desc == NULL as this is a use-case on which most free > functions silently return > - put the references to desc->gdev and desc->gdev->owner unconditionally > so that the release callback actually gets called when the remaining > references are dropped by external GPIO users ... > - if (desc && desc->gdev && gpiod_free_commit(desc)) { The commit message doesn't explain disappearing of gdev check. > - module_put(desc->gdev->owner); > - gpio_device_put(desc->gdev); > - } else { > + /* > + * We must not use VALIDATE_DESC_VOID() as the underlying gdev->chip > + * may already be NULL but we still want to put the references. > + */ > + if (!desc) > + return; > + > + if (!gpiod_free_commit(desc)) > WARN_ON(extra_checks); > - } > + > + module_put(desc->gdev->owner); > + gpio_device_put(desc->gdev); > } So, if gdev can be NULL, you will get an Oops with new code. To keep a status quo this needs to be rewritten.
On Tue, Aug 15, 2023 at 11:50 AM Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote: > On Fri, Aug 11, 2023 at 09:30:34PM +0200, Bartosz Golaszewski wrote: > > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > > > After we remove a GPIO chip that still has some requested descriptors, > > gpiod_free_commit() will fail and we will never put the references to the > > GPIO device and the owning module in gpiod_free(). > > > > Rework this function to: > > - not warn on desc == NULL as this is a use-case on which most free > > functions silently return > > - put the references to desc->gdev and desc->gdev->owner unconditionally > > so that the release callback actually gets called when the remaining > > references are dropped by external GPIO users > > ... > > > - if (desc && desc->gdev && gpiod_free_commit(desc)) { > > The commit message doesn't explain disappearing of gdev check. > > > - module_put(desc->gdev->owner); > > - gpio_device_put(desc->gdev); > > - } else { > > + /* > > + * We must not use VALIDATE_DESC_VOID() as the underlying gdev->chip > > + * may already be NULL but we still want to put the references. > > + */ > > + if (!desc) > > + return; > > + > > + if (!gpiod_free_commit(desc)) > > WARN_ON(extra_checks); > > - } > > + > > + module_put(desc->gdev->owner); > > + gpio_device_put(desc->gdev); > > } > > So, if gdev can be NULL, you will get an Oops with new code. I read it such that gdev->chip can be NULL, but not gdev, and desc->gdev->owner is fine to reference? Yours, Linus Walleij
On Tue, Aug 15, 2023 at 01:40:22PM +0200, Linus Walleij wrote: > On Tue, Aug 15, 2023 at 11:50 AM Andy Shevchenko > <andriy.shevchenko@linux.intel.com> wrote: > > On Fri, Aug 11, 2023 at 09:30:34PM +0200, Bartosz Golaszewski wrote: > > > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > > > > > After we remove a GPIO chip that still has some requested descriptors, > > > gpiod_free_commit() will fail and we will never put the references to the > > > GPIO device and the owning module in gpiod_free(). > > > > > > Rework this function to: > > > - not warn on desc == NULL as this is a use-case on which most free > > > functions silently return > > > - put the references to desc->gdev and desc->gdev->owner unconditionally > > > so that the release callback actually gets called when the remaining > > > references are dropped by external GPIO users ... > > > - if (desc && desc->gdev && gpiod_free_commit(desc)) { > > > > The commit message doesn't explain disappearing of gdev check. > > > > > - module_put(desc->gdev->owner); > > > - gpio_device_put(desc->gdev); > > > - } else { > > > + /* > > > + * We must not use VALIDATE_DESC_VOID() as the underlying gdev->chip > > > + * may already be NULL but we still want to put the references. > > > + */ > > > + if (!desc) > > > + return; > > > + > > > + if (!gpiod_free_commit(desc)) > > > WARN_ON(extra_checks); > > > - } > > > + > > > + module_put(desc->gdev->owner); > > > + gpio_device_put(desc->gdev); > > > } > > > > So, if gdev can be NULL, you will get an Oops with new code. > > I read it such that gdev->chip can be NULL, but not gdev, > and desc->gdev->owner is fine to reference? Basically the Q is "if desc is non-NULL, does it guarantee that gdev is non-NULL either?"
On Tue, Aug 15, 2023 at 2:57 PM Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote: > On Tue, Aug 15, 2023 at 01:40:22PM +0200, Linus Walleij wrote: > > On Tue, Aug 15, 2023 at 11:50 AM Andy Shevchenko > > <andriy.shevchenko@linux.intel.com> wrote: > > > On Fri, Aug 11, 2023 at 09:30:34PM +0200, Bartosz Golaszewski wrote: > > > > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > > > > > > > After we remove a GPIO chip that still has some requested descriptors, > > > > gpiod_free_commit() will fail and we will never put the references to the > > > > GPIO device and the owning module in gpiod_free(). > > > > > > > > Rework this function to: > > > > - not warn on desc == NULL as this is a use-case on which most free > > > > functions silently return > > > > - put the references to desc->gdev and desc->gdev->owner unconditionally > > > > so that the release callback actually gets called when the remaining > > > > references are dropped by external GPIO users > > ... > > > > > - if (desc && desc->gdev && gpiod_free_commit(desc)) { > > > > > > The commit message doesn't explain disappearing of gdev check. > > > > > > > - module_put(desc->gdev->owner); > > > > - gpio_device_put(desc->gdev); > > > > - } else { > > > > + /* > > > > + * We must not use VALIDATE_DESC_VOID() as the underlying gdev->chip > > > > + * may already be NULL but we still want to put the references. > > > > + */ > > > > + if (!desc) > > > > + return; > > > > + > > > > + if (!gpiod_free_commit(desc)) > > > > WARN_ON(extra_checks); > > > > - } > > > > + > > > > + module_put(desc->gdev->owner); > > > > + gpio_device_put(desc->gdev); > > > > } > > > > > > So, if gdev can be NULL, you will get an Oops with new code. > > > > I read it such that gdev->chip can be NULL, but not gdev, > > and desc->gdev->owner is fine to reference? > > Basically the Q is > "if desc is non-NULL, does it guarantee that gdev is non-NULL either?" gdev->desc is assigned in one single spot, which is in gpiochip_add_data_with_key(): for (i = 0; i < gc->ngpio; i++) gdev->descs[i].gdev = gdev; It is never assigned anywhere else, so I guess yes. We may also ask if it is ever invalid (i.e. if desc->gdev can point to junk). A gdev turns to junk when its reference count goes down to zero and gpiodev_release() is called effectively calling kfree() on the struct gpio_device *. But that can only happen as a result of module_put() getting called, pulling the references down to zero. Which is what we are discussing. The line after module_put(), desc->gdev *could* be NULL. But then we just call gpio_device_put(desc->gdev) which is just a call to device_put(), which is NULL-tolerant. Yours, Linus Walleij
On Tue, Aug 15, 2023 at 03:07:50PM +0200, Linus Walleij wrote: > On Tue, Aug 15, 2023 at 2:57 PM Andy Shevchenko > <andriy.shevchenko@linux.intel.com> wrote: > > On Tue, Aug 15, 2023 at 01:40:22PM +0200, Linus Walleij wrote: > > > On Tue, Aug 15, 2023 at 11:50 AM Andy Shevchenko > > > <andriy.shevchenko@linux.intel.com> wrote: > > > > On Fri, Aug 11, 2023 at 09:30:34PM +0200, Bartosz Golaszewski wrote: ... > > > > > + module_put(desc->gdev->owner); > > > > > + gpio_device_put(desc->gdev); > > > > > > > > So, if gdev can be NULL, you will get an Oops with new code. > > > > > > I read it such that gdev->chip can be NULL, but not gdev, > > > and desc->gdev->owner is fine to reference? > > > > Basically the Q is > > "if desc is non-NULL, does it guarantee that gdev is non-NULL either?" > > gdev->desc is assigned in one single spot, which is in > gpiochip_add_data_with_key(): > > for (i = 0; i < gc->ngpio; i++) > gdev->descs[i].gdev = gdev; > > It is never assigned anywhere else, so I guess yes. > > We may also ask if it is ever invalid (i.e. if desc->gdev can point to > junk). > > A gdev turns to junk when its reference count goes down to zero > and gpiodev_release() is called effectively calling kfree() on the > struct gpio_device *. > > But that can only happen as a result of module_put() getting > called, pulling the references down to zero. Which is what we > are discussing. The line after module_put(), desc->gdev > *could* be NULL. Yes. > But then we just call gpio_device_put(desc->gdev) which is > just a call to device_put(), which is NULL-tolerant. But gpio_device_put() does not NULL tolerant. So, oops in this line then.
On Tue, Aug 15, 2023 at 05:43:53PM +0300, Andy Shevchenko wrote: > On Tue, Aug 15, 2023 at 03:07:50PM +0200, Linus Walleij wrote: > > On Tue, Aug 15, 2023 at 2:57 PM Andy Shevchenko > > <andriy.shevchenko@linux.intel.com> wrote: > > > On Tue, Aug 15, 2023 at 01:40:22PM +0200, Linus Walleij wrote: > > > > On Tue, Aug 15, 2023 at 11:50 AM Andy Shevchenko > > > > <andriy.shevchenko@linux.intel.com> wrote: > > > > > On Fri, Aug 11, 2023 at 09:30:34PM +0200, Bartosz Golaszewski wrote: ... > > > > > > + module_put(desc->gdev->owner); > > > > > > + gpio_device_put(desc->gdev); > > > > > > > > > > So, if gdev can be NULL, you will get an Oops with new code. > > > > > > > > I read it such that gdev->chip can be NULL, but not gdev, > > > > and desc->gdev->owner is fine to reference? > > > > > > Basically the Q is > > > "if desc is non-NULL, does it guarantee that gdev is non-NULL either?" > > > > gdev->desc is assigned in one single spot, which is in > > gpiochip_add_data_with_key(): > > > > for (i = 0; i < gc->ngpio; i++) > > gdev->descs[i].gdev = gdev; > > > > It is never assigned anywhere else, so I guess yes. > > > > We may also ask if it is ever invalid (i.e. if desc->gdev can point to > > junk). > > > > A gdev turns to junk when its reference count goes down to zero > > and gpiodev_release() is called effectively calling kfree() on the > > struct gpio_device *. > > > > But that can only happen as a result of module_put() getting > > called, pulling the references down to zero. Which is what we > > are discussing. The line after module_put(), desc->gdev > > *could* be NULL. > > Yes. > > > But then we just call gpio_device_put(desc->gdev) which is > > just a call to device_put(), which is NULL-tolerant. > > But gpio_device_put() does not NULL tolerant. > So, oops in this line then. That said, this gpiod_free() function needs a lot of additional comments to explain all this.
On Tue, Aug 15, 2023 at 4:43 PM Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote: > > On Tue, Aug 15, 2023 at 03:07:50PM +0200, Linus Walleij wrote: > > On Tue, Aug 15, 2023 at 2:57 PM Andy Shevchenko > > <andriy.shevchenko@linux.intel.com> wrote: > > > On Tue, Aug 15, 2023 at 01:40:22PM +0200, Linus Walleij wrote: > > > > On Tue, Aug 15, 2023 at 11:50 AM Andy Shevchenko > > > > <andriy.shevchenko@linux.intel.com> wrote: > > > > > On Fri, Aug 11, 2023 at 09:30:34PM +0200, Bartosz Golaszewski wrote: > > ... > > > > > > > + module_put(desc->gdev->owner); > > > > > > + gpio_device_put(desc->gdev); > > > > > > > > > > So, if gdev can be NULL, you will get an Oops with new code. > > > > > > > > I read it such that gdev->chip can be NULL, but not gdev, > > > > and desc->gdev->owner is fine to reference? > > > > > > Basically the Q is > > > "if desc is non-NULL, does it guarantee that gdev is non-NULL either?" > > > > gdev->desc is assigned in one single spot, which is in > > gpiochip_add_data_with_key(): > > > > for (i = 0; i < gc->ngpio; i++) > > gdev->descs[i].gdev = gdev; > > > > It is never assigned anywhere else, so I guess yes. > > > > We may also ask if it is ever invalid (i.e. if desc->gdev can point to > > junk). > > > > A gdev turns to junk when its reference count goes down to zero > > and gpiodev_release() is called effectively calling kfree() on the > > struct gpio_device *. > > > > But that can only happen as a result of module_put() getting > > called, pulling the references down to zero. Which is what we > > are discussing. The line after module_put(), desc->gdev > > *could* be NULL. > > Yes. > > > But then we just call gpio_device_put(desc->gdev) which is > > just a call to device_put(), which is NULL-tolerant. > > But gpio_device_put() does not NULL tolerant. > So, oops in this line then. > No. struct gpio_device is reference counted and as long as we get the reference counting right - the descriptor is guaranteed to hold it and only put it when it itself is being destroyed. In other words: desc->gdev cannot be NULL here and cannot be junk. If it is, then it's a programming bug on our side and we do want to crash and burn so that we can catch and fix it. If you think more comments are needed here, feel free to add them or I'll do it at a later time. I don't want to delay this patch any longer as it's one of the issues we need to fix to make driver unbind great again. Unless you see an issue with its logic, I want to queue it tomorrow so that it gets built in next and send it upstream by the end of this week. Bart
On Fri, Aug 11, 2023 at 9:30 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote: > > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > After we remove a GPIO chip that still has some requested descriptors, > gpiod_free_commit() will fail and we will never put the references to the > GPIO device and the owning module in gpiod_free(). > > Rework this function to: > - not warn on desc == NULL as this is a use-case on which most free > functions silently return > - put the references to desc->gdev and desc->gdev->owner unconditionally > so that the release callback actually gets called when the remaining > references are dropped by external GPIO users > > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > --- > v1 -> v2: > - add a comment about why we can't use VALIDATE_DESC_VOID() > > v2 -> v3: > - we must drop the reference to the owner module before we drop the one > to the gpio_device as the latter may be removed if this is the last > reference and we'll end up calling module_put() on freed memory > > drivers/gpio/gpiolib.c | 16 +++++++++++----- > 1 file changed, 11 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c > index 251c875b5c34..76e0c38026c3 100644 > --- a/drivers/gpio/gpiolib.c > +++ b/drivers/gpio/gpiolib.c > @@ -2167,12 +2167,18 @@ static bool gpiod_free_commit(struct gpio_desc *desc) > > void gpiod_free(struct gpio_desc *desc) > { > - if (desc && desc->gdev && gpiod_free_commit(desc)) { > - module_put(desc->gdev->owner); > - gpio_device_put(desc->gdev); > - } else { > + /* > + * We must not use VALIDATE_DESC_VOID() as the underlying gdev->chip > + * may already be NULL but we still want to put the references. > + */ > + if (!desc) > + return; > + > + if (!gpiod_free_commit(desc)) > WARN_ON(extra_checks); > - } > + > + module_put(desc->gdev->owner); > + gpio_device_put(desc->gdev); > } > > /** > -- > 2.39.2 > Queued for fixes. Bart
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index 251c875b5c34..76e0c38026c3 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -2167,12 +2167,18 @@ static bool gpiod_free_commit(struct gpio_desc *desc) void gpiod_free(struct gpio_desc *desc) { - if (desc && desc->gdev && gpiod_free_commit(desc)) { - module_put(desc->gdev->owner); - gpio_device_put(desc->gdev); - } else { + /* + * We must not use VALIDATE_DESC_VOID() as the underlying gdev->chip + * may already be NULL but we still want to put the references. + */ + if (!desc) + return; + + if (!gpiod_free_commit(desc)) WARN_ON(extra_checks); - } + + module_put(desc->gdev->owner); + gpio_device_put(desc->gdev); } /**