Message ID | 6056838.lOV4Wx5bFT@kreacher |
---|---|
State | New |
Headers | show |
Series | [v2] thermal: core: Do not fail cdev registration because of invalid initial state | expand |
On 06/06/2024 18:08, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > It is reported that commit 31a0fa0019b0 ("thermal/debugfs: Pass cooling > device state to thermal_debug_cdev_add()") causes the ACPI fan driver > to fail probing on some systems which turns out to be due to the _FST > control method returning an invalid value until _FSL is first evaluated > for the given fan. If this happens, the .get_cur_state() cooling device > callback returns an error and __thermal_cooling_device_register() fails > as uses that callback after commit 31a0fa0019b0. > > Arguably, _FST should not return an invalid value even if it is > evaluated before _FSL, so this may be regarded as a platform firmware > issue, but at the same time it is not a good enough reason for failing > the cooling device registration where the initial cooling device state > is only needed to initialize a thermal debug facility. > > Accordingly, modify __thermal_cooling_device_register() to avoid calling > thermal_debug_cdev_add() instead of returning an error if the initial > .get_cur_state() callback invocation fails. > > Fixes: 31a0fa0019b0 ("thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()") > Closes: https://lore.kernel.org/linux-acpi/20240530153727.843378-1-laura.nao@collabora.com > Reported-by: Laura Nao <laura.nao@collabora.com> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > --- > > v1 -> v2: > * Instead of making the thermal debug code effectively ignore the invalid > initial cooling device state, simply don't register thermal debugfs for > a cooling device if its initial state returned by the driver's > .get_cur_state() is invalid (Daniel). > > Laura, please test this one even though I don't see why it wouldn't work for > you if the v1 did. > > --- > drivers/thermal/thermal_core.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > Index: linux-pm/drivers/thermal/thermal_core.c > =================================================================== > --- linux-pm.orig/drivers/thermal/thermal_core.c > +++ linux-pm/drivers/thermal/thermal_core.c > @@ -1001,7 +1001,7 @@ __thermal_cooling_device_register(struct > > ret = cdev->ops->get_cur_state(cdev, ¤t_state); > if (ret) > - goto out_cdev_type; > + current_state = ULONG_MAX; Why not move the section ? So we end up below. > > thermal_cooling_device_setup_sysfs(cdev); > > @@ -1016,7 +1016,8 @@ __thermal_cooling_device_register(struct > return ERR_PTR(ret); > } > > - thermal_debug_cdev_add(cdev, current_state); > + if (current_state <= cdev->max_state) > + thermal_debug_cdev_add(cdev, current_state); ret = cdev->ops->get_cur_state(cdev, ¤t_state); if (!ret) thermal_debug_cdev_add(cdev, current_state); Additionally a comment here to explain why get_cur_state can fail and telling it is up to the driver to fix its routine? > /* Add 'this' new cdev to the global cdev list */ > mutex_lock(&thermal_list_lock); > > >
On Thursday, June 6, 2024 6:39:02 PM CEST Daniel Lezcano wrote: > On 06/06/2024 18:08, Rafael J. Wysocki wrote: > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > It is reported that commit 31a0fa0019b0 ("thermal/debugfs: Pass cooling > > device state to thermal_debug_cdev_add()") causes the ACPI fan driver > > to fail probing on some systems which turns out to be due to the _FST > > control method returning an invalid value until _FSL is first evaluated > > for the given fan. If this happens, the .get_cur_state() cooling device > > callback returns an error and __thermal_cooling_device_register() fails > > as uses that callback after commit 31a0fa0019b0. > > > > Arguably, _FST should not return an invalid value even if it is > > evaluated before _FSL, so this may be regarded as a platform firmware > > issue, but at the same time it is not a good enough reason for failing > > the cooling device registration where the initial cooling device state > > is only needed to initialize a thermal debug facility. > > > > Accordingly, modify __thermal_cooling_device_register() to avoid calling > > thermal_debug_cdev_add() instead of returning an error if the initial > > .get_cur_state() callback invocation fails. > > > > Fixes: 31a0fa0019b0 ("thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()") > > Closes: https://lore.kernel.org/linux-acpi/20240530153727.843378-1-laura.nao@collabora.com > > Reported-by: Laura Nao <laura.nao@collabora.com> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > --- > > > > v1 -> v2: > > * Instead of making the thermal debug code effectively ignore the invalid > > initial cooling device state, simply don't register thermal debugfs for > > a cooling device if its initial state returned by the driver's > > .get_cur_state() is invalid (Daniel). > > > > Laura, please test this one even though I don't see why it wouldn't work for > > you if the v1 did. > > > > --- > > drivers/thermal/thermal_core.c | 5 +++-- > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > Index: linux-pm/drivers/thermal/thermal_core.c > > =================================================================== > > --- linux-pm.orig/drivers/thermal/thermal_core.c > > +++ linux-pm/drivers/thermal/thermal_core.c > > @@ -1001,7 +1001,7 @@ __thermal_cooling_device_register(struct > > > > ret = cdev->ops->get_cur_state(cdev, ¤t_state); > > if (ret) > > - goto out_cdev_type; > > + current_state = ULONG_MAX; > > Why not move the section ? So we end up below. Because it's fewer changes in this patch and it doesn't matter. > > > > thermal_cooling_device_setup_sysfs(cdev); > > > > @@ -1016,7 +1016,8 @@ __thermal_cooling_device_register(struct > > return ERR_PTR(ret); > > } > > > > - thermal_debug_cdev_add(cdev, current_state); > > + if (current_state <= cdev->max_state) > > + thermal_debug_cdev_add(cdev, current_state); > > ret = cdev->ops->get_cur_state(cdev, ¤t_state); > if (!ret) > thermal_debug_cdev_add(cdev, current_state); So the current_state <= cdev->max_state is actually useful regardless and consistent with the analogous check in thermal_cooling_device_update(). > Additionally a comment here to explain why get_cur_state can fail and > telling it is up to the driver to fix its routine? I've added a comment in v3. > > > /* Add 'this' new cdev to the global cdev list */ > > mutex_lock(&thermal_list_lock); > > > > > > > >
Index: linux-pm/drivers/thermal/thermal_core.c =================================================================== --- linux-pm.orig/drivers/thermal/thermal_core.c +++ linux-pm/drivers/thermal/thermal_core.c @@ -1001,7 +1001,7 @@ __thermal_cooling_device_register(struct ret = cdev->ops->get_cur_state(cdev, ¤t_state); if (ret) - goto out_cdev_type; + current_state = ULONG_MAX; thermal_cooling_device_setup_sysfs(cdev); @@ -1016,7 +1016,8 @@ __thermal_cooling_device_register(struct return ERR_PTR(ret); } - thermal_debug_cdev_add(cdev, current_state); + if (current_state <= cdev->max_state) + thermal_debug_cdev_add(cdev, current_state); /* Add 'this' new cdev to the global cdev list */ mutex_lock(&thermal_list_lock);