Message ID | 20221003092602.1323944-1-daniel.lezcano@linaro.org |
---|---|
Headers | show |
Series | Rework the trip points creation | expand |
Hi Marek, On 03/10/2022 23:18, Daniel Lezcano wrote: [ ... ] >> I've tested this v8 patchset after fixing the issue with Exynos TMU with >> https://lore.kernel.org/all/20221003132943.1383065-1-daniel.lezcano@linaro.org/ >> >> patch and I got the following lockdep warning on all Exynos-based boards: >> >> >> ====================================================== >> WARNING: possible circular locking dependency detected >> 6.0.0-rc1-00083-ge5c9d117223e #12945 Not tainted >> ------------------------------------------------------ >> swapper/0/1 is trying to acquire lock: >> c1ce66b0 (&data->lock#2){+.+.}-{3:3}, at: exynos_get_temp+0x3c/0xc8 >> >> but task is already holding lock: >> c2979b94 (&tz->lock){+.+.}-{3:3}, at: >> thermal_zone_device_update.part.0+0x3c/0x528 >> >> which lock already depends on the new lock. > > I'm wondering if the problem is not already there and related to > data->lock ... > > Doesn't the thermal zone lock already prevent racy access to the data > structure? > > Another question: if the sensor clock is disabled after reading it, how > does the hardware update the temperature and detect the programed > threshold is crossed? just a gentle ping, as the fix will depend on your answer ;) Thanks -- D. [ ... ]
On 05.10.2022 14:37, Daniel Lezcano wrote: > > Hi Marek, > > On 03/10/2022 23:18, Daniel Lezcano wrote: > > [ ... ] > >>> I've tested this v8 patchset after fixing the issue with Exynos TMU >>> with >>> https://lore.kernel.org/all/20221003132943.1383065-1-daniel.lezcano@linaro.org/ >>> >>> patch and I got the following lockdep warning on all Exynos-based >>> boards: >>> >>> >>> ====================================================== >>> WARNING: possible circular locking dependency detected >>> 6.0.0-rc1-00083-ge5c9d117223e #12945 Not tainted >>> ------------------------------------------------------ >>> swapper/0/1 is trying to acquire lock: >>> c1ce66b0 (&data->lock#2){+.+.}-{3:3}, at: exynos_get_temp+0x3c/0xc8 >>> >>> but task is already holding lock: >>> c2979b94 (&tz->lock){+.+.}-{3:3}, at: >>> thermal_zone_device_update.part.0+0x3c/0x528 >>> >>> which lock already depends on the new lock. >> >> I'm wondering if the problem is not already there and related to >> data->lock ... >> >> Doesn't the thermal zone lock already prevent racy access to the data >> structure? >> >> Another question: if the sensor clock is disabled after reading it, >> how does the hardware update the temperature and detect the programed >> threshold is crossed? > > just a gentle ping, as the fix will depend on your answer ;) > Sorry, I've been busy with other stuff. I thought I will fix this once I find a bit of spare time. IMHO the clock management is a bit over-engineered, as there is little (if any) benefit from such fine grade clock management. That clock is needed only for the AHB related part of the TMU (reading/writing the registers). The IRQ generation and temperature measurement is clocked from so called 'sclk' (special clock). I also briefly looked at the code and the internal lock doesn't look to be really necessary assuming that the thermal core already serializes all the calls. Best regards
Hi Marek, On 05/10/2022 15:05, Marek Szyprowski wrote: > > On 05.10.2022 14:37, Daniel Lezcano wrote: >> >> Hi Marek, >> >> On 03/10/2022 23:18, Daniel Lezcano wrote: >> >> [ ... ] >> >>>> I've tested this v8 patchset after fixing the issue with Exynos TMU >>>> with >>>> https://lore.kernel.org/all/20221003132943.1383065-1-daniel.lezcano@linaro.org/ >>>> >>>> patch and I got the following lockdep warning on all Exynos-based >>>> boards: >>>> >>>> >>>> ====================================================== >>>> WARNING: possible circular locking dependency detected >>>> 6.0.0-rc1-00083-ge5c9d117223e #12945 Not tainted >>>> ------------------------------------------------------ >>>> swapper/0/1 is trying to acquire lock: >>>> c1ce66b0 (&data->lock#2){+.+.}-{3:3}, at: exynos_get_temp+0x3c/0xc8 >>>> >>>> but task is already holding lock: >>>> c2979b94 (&tz->lock){+.+.}-{3:3}, at: >>>> thermal_zone_device_update.part.0+0x3c/0x528 >>>> >>>> which lock already depends on the new lock. >>> >>> I'm wondering if the problem is not already there and related to >>> data->lock ... >>> >>> Doesn't the thermal zone lock already prevent racy access to the data >>> structure? >>> >>> Another question: if the sensor clock is disabled after reading it, >>> how does the hardware update the temperature and detect the programed >>> threshold is crossed? >> >> just a gentle ping, as the fix will depend on your answer ;) >> > Sorry, I've been busy with other stuff. I thought I will fix this once I > find a bit of spare time. Ok, that is great if you can find time to fix it up because I've other drivers to convert to the generic thermal trips. > IMHO the clock management is a bit over-engineered, as there is little > (if any) benefit from such fine grade clock management. That clock is > needed only for the AHB related part of the TMU (reading/writing the > registers). The IRQ generation and temperature measurement is clocked > from so called 'sclk' (special clock). > > I also briefly looked at the code and the internal lock doesn't look to > be really necessary assuming that the thermal core already serializes > all the calls. I looked at the code and I think the driver can be simplified (fixed?) even more. IIUC, the sensor has multiple trip point interrupts, so if the device tree is describing more trip points than the sensor supports, there is a warning and the number of trip point is capped. IMO that can be simplified by using two trip point interrupt because the thermal_zone_device_update() will call the set_trips callback with the new boundaries. IOW, the thermal framework sets a new trip point interrupt when one is crossed. That should result in the simplification of the tmu_control as well as the tmu_probe function. As well as removing the limitation of the number of trip points. In order to have that correctly working, the 'set_trips' ops must be used to call the tmu_control callback instead of calling it in tmu_probe. The intialization workflow should be: probe->... ->thermal_zone_device_register() ->thermal_zone_device_update() ->update_trip_points() ->ops->set_trips() ->tmu_control() Also, replace the workqueue by a threaded interrupt. Does it make sense?
Hi Daniel, On 06.10.2022 08:55, Daniel Lezcano wrote: > > On 05/10/2022 15:05, Marek Szyprowski wrote: >> >> On 05.10.2022 14:37, Daniel Lezcano wrote: >>> >>> On 03/10/2022 23:18, Daniel Lezcano wrote: >>> >>> [ ... ] >>> >>>>> I've tested this v8 patchset after fixing the issue with Exynos TMU >>>>> with >>>>> https://lore.kernel.org/all/20221003132943.1383065-1-daniel.lezcano@linaro.org/ >>>>> >>>>> >>>>> patch and I got the following lockdep warning on all Exynos-based >>>>> boards: >>>>> >>>>> >>>>> ====================================================== >>>>> WARNING: possible circular locking dependency detected >>>>> 6.0.0-rc1-00083-ge5c9d117223e #12945 Not tainted >>>>> ------------------------------------------------------ >>>>> swapper/0/1 is trying to acquire lock: >>>>> c1ce66b0 (&data->lock#2){+.+.}-{3:3}, at: exynos_get_temp+0x3c/0xc8 >>>>> >>>>> but task is already holding lock: >>>>> c2979b94 (&tz->lock){+.+.}-{3:3}, at: >>>>> thermal_zone_device_update.part.0+0x3c/0x528 >>>>> >>>>> which lock already depends on the new lock. >>>> >>>> I'm wondering if the problem is not already there and related to >>>> data->lock ... >>>> >>>> Doesn't the thermal zone lock already prevent racy access to the data >>>> structure? >>>> >>>> Another question: if the sensor clock is disabled after reading it, >>>> how does the hardware update the temperature and detect the programed >>>> threshold is crossed? >>> >>> just a gentle ping, as the fix will depend on your answer ;) >>> >> Sorry, I've been busy with other stuff. I thought I will fix this once I >> find a bit of spare time. > > Ok, that is great if you can find time to fix it up because I've other > drivers to convert to the generic thermal trips. > > >> IMHO the clock management is a bit over-engineered, as there is little >> (if any) benefit from such fine grade clock management. That clock is >> needed only for the AHB related part of the TMU (reading/writing the >> registers). The IRQ generation and temperature measurement is clocked >> from so called 'sclk' (special clock). >> >> I also briefly looked at the code and the internal lock doesn't look to >> be really necessary assuming that the thermal core already serializes >> all the calls. > > I looked at the code and I think the driver can be simplified (fixed?) > even more. > > IIUC, the sensor has multiple trip point interrupts, so if the device > tree is describing more trip points than the sensor supports, there is > a warning and the number of trip point is capped. > > IMO that can be simplified by using two trip point interrupt because > the thermal_zone_device_update() will call the set_trips callback with > the new boundaries. IOW, the thermal framework sets a new trip point > interrupt when one is crossed. > > That should result in the simplification of the tmu_control as well as > the tmu_probe function. As well as removing the limitation of the > number of trip points. > > In order to have that correctly working, the 'set_trips' ops must be > used to call the tmu_control callback instead of calling it in tmu_probe. > > The intialization workflow should be: > > probe->... > ->thermal_zone_device_register() > ->thermal_zone_device_update() > ->update_trip_points() > ->ops->set_trips() > ->tmu_control() > > Also, replace the workqueue by a threaded interrupt. > > Does it make sense? Yes, definitely. Frankly speaking I've never looked into that code, so I was not aware that it needs some cleanup. Best regards