diff mbox series

[v14] thermal/drivers/mediatek/auxadc_thermal: expose all thermal sensors

Message ID 20241025-auxadc_thermal-v14-1-96ab5b60c02e@chromium.org
State New
Headers show
Series [v14] thermal/drivers/mediatek/auxadc_thermal: expose all thermal sensors | expand

Commit Message

Hsin-Te Yuan Oct. 25, 2024, 12:05 p.m. UTC
From: James Lo <james.lo@mediatek.com>

Previously, the driver only supported reading the temperature from all
sensors and returning the maximum value. This update adds another
get_temp ops to support reading the temperature from each sensor
separately.

Especially, some thermal zones registered by this patch are needed by
MT8183 since those thermal zones are necessary for mtk-svs driver.

Signed-off-by: Michael Kao <michael.kao@mediatek.com>
Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
Signed-off-by: Ben Tseng <ben.tseng@mediatek.com>
Signed-off-by: James Lo <james.lo@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Hsin-Te Yuan <yuanhsinte@chromium.org>
---
Changes in v14:
- Remove redundant error message.
- Link to v13: https://lore.kernel.org/r/20241025-auxadc_thermal-v13-1-a5231c52dccb@chromium.org

Changes in v13:
- Make subject and commit message more clear.
- Make error message more clear.
- Link to v12: https://lore.kernel.org/r/20241016-auxadc_thermal-v12-1-c0433e9f61af@chromium.org

Changes in v12:
- Remove unnecessary check and unused variable assignment in mtk_read_sensor_temp.
- Add more about what this patch achieves in the commit message.
- Link to v11: https://lore.kernel.org/r/20240809-auxadc_thermal-v11-1-af36cc74f3a3@chromium.org

Changes in V11:
    - Rebase on kernel v6.11-rc2
    - Use mtk_thermal_temp_is_valid in mtk_read_sensor_temp just like
      mtk_thermal_bank_temperature
    - Change the error handling of devm_thermal_of_zone_register return
      value
    - link to V10: https://lore.kernel.org/lkml/20220519101044.16765-1-james.lo@mediatek.com/

Changes in V10:
    - Rebase to kernel-v5.18-rc7
    - Resend

Changes in V9:
    - Rebase to kernel-v5.14-rc1
    - Bind raw_to_mcelsius_v1 or raw_to_mcelsius_v2 to compatible
      data of struct mtk_thermal_data
    - Remove duplicate struct 'mtk_thermal_bank'
    - Remove unnecessary if condition check
    - Return error if any thermal zone fail to register

Changes in V8:
    - Rebase to kernel-v5.13-rc1
    - Resend

Changes in v7:
    - Fix build error in v6.

Changes in v6:
    - Rebase to kernel-5.11-rc1.
    - [1/3]
        - add interrupts property.
    - [2/3]
        - add the Tested-by in the commit message.
    - [3/3]
        - use the mt->conf->msr[id] instead of conf->msr[id] in the
          _get_sensor_temp and mtk_thermal_bank_temperature.
        - remove the redundant space in _get_sensor_temp and
          mtk_read_sensor_temp.
        - change kmalloc to dev_kmalloc in mtk_thermal_probe.

Changes in v5:
    - Rebase to kernel-5.9-rc1.
    - Revise the title of cover letter.
    - Drop "[v4,7/7] thermal: mediatek: use spinlock to protect PTPCORESEL"
    - [2/2]
        -  Add the judgement to the version of raw_to_mcelsius.

Changes in v4:
    - Rebase to kernel-5.6-rc1.
    - [1/7]
        - Squash thermal zone settings in the dtsi from [v3,5/8]
          arm64: dts: mt8183: Increase polling frequency for CPU thermal zone.
        - Remove the property of interrupts and mediatek,hw-reset-temp.
    - [2/7]
        - Correct commit message.
    - [4/7]
        - Change the target temperature to the 80C and change the commit message.
    - [6/7]
        - Adjust newline alignment.
        - Fix the judgement on the return value of registering thermal zone.

Changes in v3:
    - Rebase to kernel-5.5-rc1.
    - [1/8]
        - Update sustainable power of cpu, tzts1~5 and tztsABB.
    - [7/8]
        - Bypass the failure that non cpu_thermal sensor is not find in thermal-zones
          in dts, which is normal for mt8173, so prompt a warning here instead of
          failing.

    Return -EAGAIN instead of -EACCESS on the first read of sensor that
        often are bogus values. This can avoid following warning on boot:

          thermal thermal_zone6: failed to read out thermal zone (-13)

Changes in v2:
    - [1/8]
        - Add the sustainable-power,trips,cooling-maps to the tzts1~tztsABB.
    - [4/8]
        - Add the min opp of cpu throttle.
---

---
 drivers/thermal/mediatek/auxadc_thermal.c | 70 +++++++++++++++++++++++++++----
 1 file changed, 62 insertions(+), 8 deletions(-)


---
base-commit: b589839414be04b2b37e4bf6f84af576c99faf64
change-id: 20240809-auxadc_thermal-9be338ec8b1c

Best regards,

Comments

Chen-Yu Tsai Nov. 1, 2024, 8:29 a.m. UTC | #1
On Fri, Oct 25, 2024 at 9:16 PM Hsin-Te Yuan <yuanhsinte@chromium.org> wrote:
>
> From: James Lo <james.lo@mediatek.com>
>
> Previously, the driver only supported reading the temperature from all
> sensors and returning the maximum value. This update adds another
> get_temp ops to support reading the temperature from each sensor
> separately.
>
> Especially, some thermal zones registered by this patch are needed by
> MT8183 since those thermal zones are necessary for mtk-svs driver.
>
> Signed-off-by: Michael Kao <michael.kao@mediatek.com>
> Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
> Signed-off-by: Ben Tseng <ben.tseng@mediatek.com>
> Signed-off-by: James Lo <james.lo@mediatek.com>
> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
> Signed-off-by: Hsin-Te Yuan <yuanhsinte@chromium.org>

Reviewed-by: Chen-Yu Tsai <wenst@chromium.org>

> ---
> Changes in v14:
> - Remove redundant error message.
> - Link to v13: https://lore.kernel.org/r/20241025-auxadc_thermal-v13-1-a5231c52dccb@chromium.org
>
> Changes in v13:
> - Make subject and commit message more clear.
> - Make error message more clear.
> - Link to v12: https://lore.kernel.org/r/20241016-auxadc_thermal-v12-1-c0433e9f61af@chromium.org
>
> Changes in v12:
> - Remove unnecessary check and unused variable assignment in mtk_read_sensor_temp.
> - Add more about what this patch achieves in the commit message.
> - Link to v11: https://lore.kernel.org/r/20240809-auxadc_thermal-v11-1-af36cc74f3a3@chromium.org
>
> Changes in V11:
>     - Rebase on kernel v6.11-rc2
>     - Use mtk_thermal_temp_is_valid in mtk_read_sensor_temp just like
>       mtk_thermal_bank_temperature
>     - Change the error handling of devm_thermal_of_zone_register return
>       value
>     - link to V10: https://lore.kernel.org/lkml/20220519101044.16765-1-james.lo@mediatek.com/
>
> Changes in V10:
>     - Rebase to kernel-v5.18-rc7
>     - Resend
>
> Changes in V9:
>     - Rebase to kernel-v5.14-rc1
>     - Bind raw_to_mcelsius_v1 or raw_to_mcelsius_v2 to compatible
>       data of struct mtk_thermal_data
>     - Remove duplicate struct 'mtk_thermal_bank'
>     - Remove unnecessary if condition check
>     - Return error if any thermal zone fail to register
>
> Changes in V8:
>     - Rebase to kernel-v5.13-rc1
>     - Resend
>
> Changes in v7:
>     - Fix build error in v6.
>
> Changes in v6:
>     - Rebase to kernel-5.11-rc1.
>     - [1/3]
>         - add interrupts property.
>     - [2/3]
>         - add the Tested-by in the commit message.
>     - [3/3]
>         - use the mt->conf->msr[id] instead of conf->msr[id] in the
>           _get_sensor_temp and mtk_thermal_bank_temperature.
>         - remove the redundant space in _get_sensor_temp and
>           mtk_read_sensor_temp.
>         - change kmalloc to dev_kmalloc in mtk_thermal_probe.
>
> Changes in v5:
>     - Rebase to kernel-5.9-rc1.
>     - Revise the title of cover letter.
>     - Drop "[v4,7/7] thermal: mediatek: use spinlock to protect PTPCORESEL"
>     - [2/2]
>         -  Add the judgement to the version of raw_to_mcelsius.
>
> Changes in v4:
>     - Rebase to kernel-5.6-rc1.
>     - [1/7]
>         - Squash thermal zone settings in the dtsi from [v3,5/8]
>           arm64: dts: mt8183: Increase polling frequency for CPU thermal zone.
>         - Remove the property of interrupts and mediatek,hw-reset-temp.
>     - [2/7]
>         - Correct commit message.
>     - [4/7]
>         - Change the target temperature to the 80C and change the commit message.
>     - [6/7]
>         - Adjust newline alignment.
>         - Fix the judgement on the return value of registering thermal zone.
>
> Changes in v3:
>     - Rebase to kernel-5.5-rc1.
>     - [1/8]
>         - Update sustainable power of cpu, tzts1~5 and tztsABB.
>     - [7/8]
>         - Bypass the failure that non cpu_thermal sensor is not find in thermal-zones
>           in dts, which is normal for mt8173, so prompt a warning here instead of
>           failing.
>
>     Return -EAGAIN instead of -EACCESS on the first read of sensor that
>         often are bogus values. This can avoid following warning on boot:
>
>           thermal thermal_zone6: failed to read out thermal zone (-13)
>
> Changes in v2:
>     - [1/8]
>         - Add the sustainable-power,trips,cooling-maps to the tzts1~tztsABB.
>     - [4/8]
>         - Add the min opp of cpu throttle.
> ---
>
> ---
>  drivers/thermal/mediatek/auxadc_thermal.c | 70 +++++++++++++++++++++++++++----
>  1 file changed, 62 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/thermal/mediatek/auxadc_thermal.c b/drivers/thermal/mediatek/auxadc_thermal.c
> index 9ee2e7283435acfcbb1a956303b6122a08affecc..9a9079d559a3abe9e3823f744d4c9a159a8666bd 100644
> --- a/drivers/thermal/mediatek/auxadc_thermal.c
> +++ b/drivers/thermal/mediatek/auxadc_thermal.c
> @@ -847,7 +847,8 @@ static int mtk_thermal_bank_temperature(struct mtk_thermal_bank *bank)
>
>  static int mtk_read_temp(struct thermal_zone_device *tz, int *temperature)
>  {
> -       struct mtk_thermal *mt = thermal_zone_device_priv(tz);
> +       struct mtk_thermal_bank *bank = thermal_zone_device_priv(tz);
> +       struct mtk_thermal *mt = bank->mt;
>         int i;
>         int tempmax = INT_MIN;
>
> @@ -866,10 +867,41 @@ static int mtk_read_temp(struct thermal_zone_device *tz, int *temperature)
>         return 0;
>  }
>
> +static int mtk_read_sensor_temp(struct thermal_zone_device *tz, int *temperature)
> +{
> +       struct mtk_thermal_bank *bank = thermal_zone_device_priv(tz);
> +       struct mtk_thermal *mt = bank->mt;
> +       const struct mtk_thermal_data *conf = mt->conf;
> +       int id = bank->id - 1;
> +       int temp = INT_MIN;
> +       u32 raw;
> +
> +       raw = readl(mt->thermal_base + conf->msr[id]);
> +
> +       temp = mt->raw_to_mcelsius(mt, id, raw);
> +
> +       /*
> +        * The first read of a sensor often contains very high bogus
> +        * temperature value. Filter these out so that the system does
> +        * not immediately shut down.
> +        */
> +
> +       if (unlikely(!mtk_thermal_temp_is_valid(temp)))
> +               return -EAGAIN;
> +
> +       *temperature = temp;
> +
> +       return 0;
> +}
> +
>  static const struct thermal_zone_device_ops mtk_thermal_ops = {
>         .get_temp = mtk_read_temp,
>  };
>
> +static const struct thermal_zone_device_ops mtk_thermal_sensor_ops = {
> +       .get_temp = mtk_read_sensor_temp,
> +};
> +
>  static void mtk_thermal_init_bank(struct mtk_thermal *mt, int num,
>                                   u32 apmixed_phys_base, u32 auxadc_phys_base,
>                                   int ctrl_id)
> @@ -1199,6 +1231,7 @@ static int mtk_thermal_probe(struct platform_device *pdev)
>         u64 auxadc_phys_base, apmixed_phys_base;
>         struct thermal_zone_device *tzdev;
>         void __iomem *apmixed_base, *auxadc_base;
> +       struct mtk_thermal_bank *tz;
>
>         mt = devm_kzalloc(&pdev->dev, sizeof(*mt), GFP_KERNEL);
>         if (!mt)
> @@ -1285,14 +1318,35 @@ static int mtk_thermal_probe(struct platform_device *pdev)
>                         mtk_thermal_init_bank(mt, i, apmixed_phys_base,
>                                               auxadc_phys_base, ctrl_id);
>
> -       tzdev = devm_thermal_of_zone_register(&pdev->dev, 0, mt,
> -                                             &mtk_thermal_ops);
> -       if (IS_ERR(tzdev))
> -               return PTR_ERR(tzdev);
> +       for (i = 0; i <= mt->conf->num_sensors; i++) {
> +               tz = devm_kmalloc(&pdev->dev, sizeof(*tz), GFP_KERNEL);
> +               if (!tz)
> +                       return -ENOMEM;
> +
> +               tz->mt = mt;
> +               tz->id = i;
> +
> +               tzdev = devm_thermal_of_zone_register(&pdev->dev, i,
> +                                                     tz, (i == 0) ?
> +                                                     &mtk_thermal_ops
> +                                                     : &mtk_thermal_sensor_ops);

Nit: I think the ':' should be at the end of the previous line?

> +
> +               if (IS_ERR(tzdev)) {
> +                       ret = PTR_ERR(tzdev);
> +                       if (ret == -ENODEV) {
> +                               dev_warn(&pdev->dev,
> +                                        "Can't find thermal zone for sensor %d; sensor skipped.\n", i);
> +                               continue;
> +                       }
> +                       return dev_err_probe(&pdev->dev, ret,
> +                                            "Failed to register thermal zone %d.\n", i);

This section could probably be slightly simplified to:

    if (PTR_ERR(tzdev) == -ENODEV) {
        dev_warn(...);
        continue;
    }

    return dev_err_cast_probe(&p-dev->dev, tzdev, ...);

I don't think these nitpicks warrant a new revision though.

ChenYu

> +               }
>
> -       ret = devm_thermal_add_hwmon_sysfs(&pdev->dev, tzdev);
> -       if (ret)
> -               dev_warn(&pdev->dev, "error in thermal_add_hwmon_sysfs");
> +               ret = devm_thermal_add_hwmon_sysfs(&pdev->dev, tzdev);
> +               if (ret)
> +                       dev_warn(&pdev->dev,
> +                                "Sensor %d: Error in thermal_add_hwmon_sysfs: %d\n", i, ret);
> +       }
>
>         return 0;
>  }
>
> ---
> base-commit: b589839414be04b2b37e4bf6f84af576c99faf64
> change-id: 20240809-auxadc_thermal-9be338ec8b1c
>
> Best regards,
> --
> Hsin-Te Yuan <yuanhsinte@chromium.org>
>
>
Daniel Lezcano Nov. 14, 2024, 4:48 p.m. UTC | #2
Hi,

On 25/10/2024 14:05, Hsin-Te Yuan wrote:
> From: James Lo <james.lo@mediatek.com>
> 
> Previously, the driver only supported reading the temperature from all
> sensors and returning the maximum value. This update adds another
> get_temp ops to support reading the temperature from each sensor
> separately.
> 
> Especially, some thermal zones registered by this patch are needed by
> MT8183 since those thermal zones are necessary for mtk-svs driver.

The DT for the mt8183 describes the sensor id = 0 as the CPU. On this, 
there is a cooling device with trip points.

The driver registers the id=0 as an aggregator for the sensors which 
overloads the CPU thermal zone above.

Why do you need to aggregate all the sensors to retrieve the max value ?

They are all contributing differently to the heat and they should be 
tied with their proper cooling device.

I don't think the thermal configuration is correct and I suggest to fix 
this aggregator by removing it.



> Signed-off-by: Michael Kao <michael.kao@mediatek.com>
> Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
> Signed-off-by: Ben Tseng <ben.tseng@mediatek.com>
> Signed-off-by: James Lo <james.lo@mediatek.com>
> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
> Signed-off-by: Hsin-Te Yuan <yuanhsinte@chromium.org>
> ---
> Changes in v14:
> - Remove redundant error message.
> - Link to v13: https://lore.kernel.org/r/20241025-auxadc_thermal-v13-1-a5231c52dccb@chromium.org
> 
> Changes in v13:
> - Make subject and commit message more clear.
> - Make error message more clear.
> - Link to v12: https://lore.kernel.org/r/20241016-auxadc_thermal-v12-1-c0433e9f61af@chromium.org
> 
> Changes in v12:
> - Remove unnecessary check and unused variable assignment in mtk_read_sensor_temp.
> - Add more about what this patch achieves in the commit message.
> - Link to v11: https://lore.kernel.org/r/20240809-auxadc_thermal-v11-1-af36cc74f3a3@chromium.org
> 
> Changes in V11:
>      - Rebase on kernel v6.11-rc2
>      - Use mtk_thermal_temp_is_valid in mtk_read_sensor_temp just like
>        mtk_thermal_bank_temperature
>      - Change the error handling of devm_thermal_of_zone_register return
>        value
>      - link to V10: https://lore.kernel.org/lkml/20220519101044.16765-1-james.lo@mediatek.com/
> 
> Changes in V10:
>      - Rebase to kernel-v5.18-rc7
>      - Resend
> 
> Changes in V9:
>      - Rebase to kernel-v5.14-rc1
>      - Bind raw_to_mcelsius_v1 or raw_to_mcelsius_v2 to compatible
>        data of struct mtk_thermal_data
>      - Remove duplicate struct 'mtk_thermal_bank'
>      - Remove unnecessary if condition check
>      - Return error if any thermal zone fail to register
> 
> Changes in V8:
>      - Rebase to kernel-v5.13-rc1
>      - Resend
> 
> Changes in v7:
>      - Fix build error in v6.
> 
> Changes in v6:
>      - Rebase to kernel-5.11-rc1.
>      - [1/3]
>          - add interrupts property.
>      - [2/3]
>          - add the Tested-by in the commit message.
>      - [3/3]
>          - use the mt->conf->msr[id] instead of conf->msr[id] in the
>            _get_sensor_temp and mtk_thermal_bank_temperature.
>          - remove the redundant space in _get_sensor_temp and
>            mtk_read_sensor_temp.
>          - change kmalloc to dev_kmalloc in mtk_thermal_probe.
> 
> Changes in v5:
>      - Rebase to kernel-5.9-rc1.
>      - Revise the title of cover letter.
>      - Drop "[v4,7/7] thermal: mediatek: use spinlock to protect PTPCORESEL"
>      - [2/2]
>          -  Add the judgement to the version of raw_to_mcelsius.
> 
> Changes in v4:
>      - Rebase to kernel-5.6-rc1.
>      - [1/7]
>          - Squash thermal zone settings in the dtsi from [v3,5/8]
>            arm64: dts: mt8183: Increase polling frequency for CPU thermal zone.
>          - Remove the property of interrupts and mediatek,hw-reset-temp.
>      - [2/7]
>          - Correct commit message.
>      - [4/7]
>          - Change the target temperature to the 80C and change the commit message.
>      - [6/7]
>          - Adjust newline alignment.
>          - Fix the judgement on the return value of registering thermal zone.
> 
> Changes in v3:
>      - Rebase to kernel-5.5-rc1.
>      - [1/8]
>          - Update sustainable power of cpu, tzts1~5 and tztsABB.
>      - [7/8]
>          - Bypass the failure that non cpu_thermal sensor is not find in thermal-zones
>            in dts, which is normal for mt8173, so prompt a warning here instead of
>            failing.
> 
>      Return -EAGAIN instead of -EACCESS on the first read of sensor that
>          often are bogus values. This can avoid following warning on boot:
> 
>            thermal thermal_zone6: failed to read out thermal zone (-13)
> 
> Changes in v2:
>      - [1/8]
>          - Add the sustainable-power,trips,cooling-maps to the tzts1~tztsABB.
>      - [4/8]
>          - Add the min opp of cpu throttle.
> ---
> 
> ---
>   drivers/thermal/mediatek/auxadc_thermal.c | 70 +++++++++++++++++++++++++++----
>   1 file changed, 62 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/thermal/mediatek/auxadc_thermal.c b/drivers/thermal/mediatek/auxadc_thermal.c
> index 9ee2e7283435acfcbb1a956303b6122a08affecc..9a9079d559a3abe9e3823f744d4c9a159a8666bd 100644
> --- a/drivers/thermal/mediatek/auxadc_thermal.c
> +++ b/drivers/thermal/mediatek/auxadc_thermal.c
> @@ -847,7 +847,8 @@ static int mtk_thermal_bank_temperature(struct mtk_thermal_bank *bank)
>   
>   static int mtk_read_temp(struct thermal_zone_device *tz, int *temperature)
>   {
> -	struct mtk_thermal *mt = thermal_zone_device_priv(tz);
> +	struct mtk_thermal_bank *bank = thermal_zone_device_priv(tz);
> +	struct mtk_thermal *mt = bank->mt;
>   	int i;
>   	int tempmax = INT_MIN;
>   
> @@ -866,10 +867,41 @@ static int mtk_read_temp(struct thermal_zone_device *tz, int *temperature)
>   	return 0;
>   }
>   
> +static int mtk_read_sensor_temp(struct thermal_zone_device *tz, int *temperature)
> +{
> +	struct mtk_thermal_bank *bank = thermal_zone_device_priv(tz);
> +	struct mtk_thermal *mt = bank->mt;
> +	const struct mtk_thermal_data *conf = mt->conf;
> +	int id = bank->id - 1;
> +	int temp = INT_MIN;
> +	u32 raw;
> +
> +	raw = readl(mt->thermal_base + conf->msr[id]);
> +
> +	temp = mt->raw_to_mcelsius(mt, id, raw);
> +
> +	/*
> +	 * The first read of a sensor often contains very high bogus
> +	 * temperature value. Filter these out so that the system does
> +	 * not immediately shut down.
> +	 */
> +
> +	if (unlikely(!mtk_thermal_temp_is_valid(temp)))
> +		return -EAGAIN;
> +
> +	*temperature = temp;
> +
> +	return 0;
> +}
> +
>   static const struct thermal_zone_device_ops mtk_thermal_ops = {
>   	.get_temp = mtk_read_temp,
>   };
>   
> +static const struct thermal_zone_device_ops mtk_thermal_sensor_ops = {
> +	.get_temp = mtk_read_sensor_temp,
> +};
> +
>   static void mtk_thermal_init_bank(struct mtk_thermal *mt, int num,
>   				  u32 apmixed_phys_base, u32 auxadc_phys_base,
>   				  int ctrl_id)
> @@ -1199,6 +1231,7 @@ static int mtk_thermal_probe(struct platform_device *pdev)
>   	u64 auxadc_phys_base, apmixed_phys_base;
>   	struct thermal_zone_device *tzdev;
>   	void __iomem *apmixed_base, *auxadc_base;
> +	struct mtk_thermal_bank *tz;
>   
>   	mt = devm_kzalloc(&pdev->dev, sizeof(*mt), GFP_KERNEL);
>   	if (!mt)
> @@ -1285,14 +1318,35 @@ static int mtk_thermal_probe(struct platform_device *pdev)
>   			mtk_thermal_init_bank(mt, i, apmixed_phys_base,
>   					      auxadc_phys_base, ctrl_id);
>   
> -	tzdev = devm_thermal_of_zone_register(&pdev->dev, 0, mt,
> -					      &mtk_thermal_ops);
> -	if (IS_ERR(tzdev))
> -		return PTR_ERR(tzdev);
> +	for (i = 0; i <= mt->conf->num_sensors; i++) {
> +		tz = devm_kmalloc(&pdev->dev, sizeof(*tz), GFP_KERNEL);
> +		if (!tz)
> +			return -ENOMEM;
> +
> +		tz->mt = mt;
> +		tz->id = i;
> +
> +		tzdev = devm_thermal_of_zone_register(&pdev->dev, i,
> +						      tz, (i == 0) ?
> +						      &mtk_thermal_ops
> +						      : &mtk_thermal_sensor_ops);
> +
> +		if (IS_ERR(tzdev)) {
> +			ret = PTR_ERR(tzdev);
> +			if (ret == -ENODEV) {
> +				dev_warn(&pdev->dev,
> +					 "Can't find thermal zone for sensor %d; sensor skipped.\n", i);
> +				continue;
> +			}
> +			return dev_err_probe(&pdev->dev, ret,
> +					     "Failed to register thermal zone %d.\n", i);
> +		}
>   
> -	ret = devm_thermal_add_hwmon_sysfs(&pdev->dev, tzdev);
> -	if (ret)
> -		dev_warn(&pdev->dev, "error in thermal_add_hwmon_sysfs");
> +		ret = devm_thermal_add_hwmon_sysfs(&pdev->dev, tzdev);
> +		if (ret)
> +			dev_warn(&pdev->dev,
> +				 "Sensor %d: Error in thermal_add_hwmon_sysfs: %d\n", i, ret);
> +	}
>   
>   	return 0;
>   }
> 
> ---
> base-commit: b589839414be04b2b37e4bf6f84af576c99faf64
> change-id: 20240809-auxadc_thermal-9be338ec8b1c
> 
> Best regards,
Hsin-Te Yuan Nov. 19, 2024, 7:38 a.m. UTC | #3
On Fri, Nov 15, 2024 at 12:48 AM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
>
> Hi,
>
> On 25/10/2024 14:05, Hsin-Te Yuan wrote:
> > From: James Lo <james.lo@mediatek.com>
> >
> > Previously, the driver only supported reading the temperature from all
> > sensors and returning the maximum value. This update adds another
> > get_temp ops to support reading the temperature from each sensor
> > separately.
> >
> > Especially, some thermal zones registered by this patch are needed by
> > MT8183 since those thermal zones are necessary for mtk-svs driver.
>
> The DT for the mt8183 describes the sensor id = 0 as the CPU. On this,
> there is a cooling device with trip points.
>
> The driver registers the id=0 as an aggregator for the sensors which
> overloads the CPU thermal zone above.
>
> Why do you need to aggregate all the sensors to retrieve the max value ?
>
> They are all contributing differently to the heat and they should be
> tied with their proper cooling device.
>
> I don't think the thermal configuration is correct and I suggest to fix
> this aggregator by removing it.
>
>
>
 As far as I know the thermal design of Mediatek's board is based on
the highest temperature of the whole board. Also, removing the
aggregator will break all the boards using this driver.
By the way, I heard that baylibre is working on multi-sensor
aggregation support, which can be the alternative solution for the
aggregator in this driver, but that should be another story and is
unrelated to this patch.

> > Signed-off-by: Michael Kao <michael.kao@mediatek.com>
> > Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
> > Signed-off-by: Ben Tseng <ben.tseng@mediatek.com>
> > Signed-off-by: James Lo <james.lo@mediatek.com>
> > Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
> > Signed-off-by: Hsin-Te Yuan <yuanhsinte@chromium.org>
> > ---
> > Changes in v14:
> > - Remove redundant error message.
> > - Link to v13: https://lore.kernel.org/r/20241025-auxadc_thermal-v13-1-a5231c52dccb@chromium.org
> >
> > Changes in v13:
> > - Make subject and commit message more clear.
> > - Make error message more clear.
> > - Link to v12: https://lore.kernel.org/r/20241016-auxadc_thermal-v12-1-c0433e9f61af@chromium.org
> >
> > Changes in v12:
> > - Remove unnecessary check and unused variable assignment in mtk_read_sensor_temp.
> > - Add more about what this patch achieves in the commit message.
> > - Link to v11: https://lore.kernel.org/r/20240809-auxadc_thermal-v11-1-af36cc74f3a3@chromium.org
> >
> > Changes in V11:
> >      - Rebase on kernel v6.11-rc2
> >      - Use mtk_thermal_temp_is_valid in mtk_read_sensor_temp just like
> >        mtk_thermal_bank_temperature
> >      - Change the error handling of devm_thermal_of_zone_register return
> >        value
> >      - link to V10: https://lore.kernel.org/lkml/20220519101044.16765-1-james.lo@mediatek.com/
> >
> > Changes in V10:
> >      - Rebase to kernel-v5.18-rc7
> >      - Resend
> >
> > Changes in V9:
> >      - Rebase to kernel-v5.14-rc1
> >      - Bind raw_to_mcelsius_v1 or raw_to_mcelsius_v2 to compatible
> >        data of struct mtk_thermal_data
> >      - Remove duplicate struct 'mtk_thermal_bank'
> >      - Remove unnecessary if condition check
> >      - Return error if any thermal zone fail to register
> >
> > Changes in V8:
> >      - Rebase to kernel-v5.13-rc1
> >      - Resend
> >
> > Changes in v7:
> >      - Fix build error in v6.
> >
> > Changes in v6:
> >      - Rebase to kernel-5.11-rc1.
> >      - [1/3]
> >          - add interrupts property.
> >      - [2/3]
> >          - add the Tested-by in the commit message.
> >      - [3/3]
> >          - use the mt->conf->msr[id] instead of conf->msr[id] in the
> >            _get_sensor_temp and mtk_thermal_bank_temperature.
> >          - remove the redundant space in _get_sensor_temp and
> >            mtk_read_sensor_temp.
> >          - change kmalloc to dev_kmalloc in mtk_thermal_probe.
> >
> > Changes in v5:
> >      - Rebase to kernel-5.9-rc1.
> >      - Revise the title of cover letter.
> >      - Drop "[v4,7/7] thermal: mediatek: use spinlock to protect PTPCORESEL"
> >      - [2/2]
> >          -  Add the judgement to the version of raw_to_mcelsius.
> >
> > Changes in v4:
> >      - Rebase to kernel-5.6-rc1.
> >      - [1/7]
> >          - Squash thermal zone settings in the dtsi from [v3,5/8]
> >            arm64: dts: mt8183: Increase polling frequency for CPU thermal zone.
> >          - Remove the property of interrupts and mediatek,hw-reset-temp.
> >      - [2/7]
> >          - Correct commit message.
> >      - [4/7]
> >          - Change the target temperature to the 80C and change the commit message.
> >      - [6/7]
> >          - Adjust newline alignment.
> >          - Fix the judgement on the return value of registering thermal zone.
> >
> > Changes in v3:
> >      - Rebase to kernel-5.5-rc1.
> >      - [1/8]
> >          - Update sustainable power of cpu, tzts1~5 and tztsABB.
> >      - [7/8]
> >          - Bypass the failure that non cpu_thermal sensor is not find in thermal-zones
> >            in dts, which is normal for mt8173, so prompt a warning here instead of
> >            failing.
> >
> >      Return -EAGAIN instead of -EACCESS on the first read of sensor that
> >          often are bogus values. This can avoid following warning on boot:
> >
> >            thermal thermal_zone6: failed to read out thermal zone (-13)
> >
> > Changes in v2:
> >      - [1/8]
> >          - Add the sustainable-power,trips,cooling-maps to the tzts1~tztsABB.
> >      - [4/8]
> >          - Add the min opp of cpu throttle.
> > ---
> >
> > ---
> >   drivers/thermal/mediatek/auxadc_thermal.c | 70 +++++++++++++++++++++++++++----
> >   1 file changed, 62 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/thermal/mediatek/auxadc_thermal.c b/drivers/thermal/mediatek/auxadc_thermal.c
> > index 9ee2e7283435acfcbb1a956303b6122a08affecc..9a9079d559a3abe9e3823f744d4c9a159a8666bd 100644
> > --- a/drivers/thermal/mediatek/auxadc_thermal.c
> > +++ b/drivers/thermal/mediatek/auxadc_thermal.c
> > @@ -847,7 +847,8 @@ static int mtk_thermal_bank_temperature(struct mtk_thermal_bank *bank)
> >
> >   static int mtk_read_temp(struct thermal_zone_device *tz, int *temperature)
> >   {
> > -     struct mtk_thermal *mt = thermal_zone_device_priv(tz);
> > +     struct mtk_thermal_bank *bank = thermal_zone_device_priv(tz);
> > +     struct mtk_thermal *mt = bank->mt;
> >       int i;
> >       int tempmax = INT_MIN;
> >
> > @@ -866,10 +867,41 @@ static int mtk_read_temp(struct thermal_zone_device *tz, int *temperature)
> >       return 0;
> >   }
> >
> > +static int mtk_read_sensor_temp(struct thermal_zone_device *tz, int *temperature)
> > +{
> > +     struct mtk_thermal_bank *bank = thermal_zone_device_priv(tz);
> > +     struct mtk_thermal *mt = bank->mt;
> > +     const struct mtk_thermal_data *conf = mt->conf;
> > +     int id = bank->id - 1;
> > +     int temp = INT_MIN;
> > +     u32 raw;
> > +
> > +     raw = readl(mt->thermal_base + conf->msr[id]);
> > +
> > +     temp = mt->raw_to_mcelsius(mt, id, raw);
> > +
> > +     /*
> > +      * The first read of a sensor often contains very high bogus
> > +      * temperature value. Filter these out so that the system does
> > +      * not immediately shut down.
> > +      */
> > +
> > +     if (unlikely(!mtk_thermal_temp_is_valid(temp)))
> > +             return -EAGAIN;
> > +
> > +     *temperature = temp;
> > +
> > +     return 0;
> > +}
> > +
> >   static const struct thermal_zone_device_ops mtk_thermal_ops = {
> >       .get_temp = mtk_read_temp,
> >   };
> >
> > +static const struct thermal_zone_device_ops mtk_thermal_sensor_ops = {
> > +     .get_temp = mtk_read_sensor_temp,
> > +};
> > +
> >   static void mtk_thermal_init_bank(struct mtk_thermal *mt, int num,
> >                                 u32 apmixed_phys_base, u32 auxadc_phys_base,
> >                                 int ctrl_id)
> > @@ -1199,6 +1231,7 @@ static int mtk_thermal_probe(struct platform_device *pdev)
> >       u64 auxadc_phys_base, apmixed_phys_base;
> >       struct thermal_zone_device *tzdev;
> >       void __iomem *apmixed_base, *auxadc_base;
> > +     struct mtk_thermal_bank *tz;
> >
> >       mt = devm_kzalloc(&pdev->dev, sizeof(*mt), GFP_KERNEL);
> >       if (!mt)
> > @@ -1285,14 +1318,35 @@ static int mtk_thermal_probe(struct platform_device *pdev)
> >                       mtk_thermal_init_bank(mt, i, apmixed_phys_base,
> >                                             auxadc_phys_base, ctrl_id);
> >
> > -     tzdev = devm_thermal_of_zone_register(&pdev->dev, 0, mt,
> > -                                           &mtk_thermal_ops);
> > -     if (IS_ERR(tzdev))
> > -             return PTR_ERR(tzdev);
> > +     for (i = 0; i <= mt->conf->num_sensors; i++) {
> > +             tz = devm_kmalloc(&pdev->dev, sizeof(*tz), GFP_KERNEL);
> > +             if (!tz)
> > +                     return -ENOMEM;
> > +
> > +             tz->mt = mt;
> > +             tz->id = i;
> > +
> > +             tzdev = devm_thermal_of_zone_register(&pdev->dev, i,
> > +                                                   tz, (i == 0) ?
> > +                                                   &mtk_thermal_ops
> > +                                                   : &mtk_thermal_sensor_ops);
> > +
> > +             if (IS_ERR(tzdev)) {
> > +                     ret = PTR_ERR(tzdev);
> > +                     if (ret == -ENODEV) {
> > +                             dev_warn(&pdev->dev,
> > +                                      "Can't find thermal zone for sensor %d; sensor skipped.\n", i);
> > +                             continue;
> > +                     }
> > +                     return dev_err_probe(&pdev->dev, ret,
> > +                                          "Failed to register thermal zone %d.\n", i);
> > +             }
> >
> > -     ret = devm_thermal_add_hwmon_sysfs(&pdev->dev, tzdev);
> > -     if (ret)
> > -             dev_warn(&pdev->dev, "error in thermal_add_hwmon_sysfs");
> > +             ret = devm_thermal_add_hwmon_sysfs(&pdev->dev, tzdev);
> > +             if (ret)
> > +                     dev_warn(&pdev->dev,
> > +                              "Sensor %d: Error in thermal_add_hwmon_sysfs: %d\n", i, ret);
> > +     }
> >
> >       return 0;
> >   }
> >
> > ---
> > base-commit: b589839414be04b2b37e4bf6f84af576c99faf64
> > change-id: 20240809-auxadc_thermal-9be338ec8b1c
> >
> > Best regards,
>
>
> --
> <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
>
> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
> <http://twitter.com/#!/linaroorg> Twitter |
> <http://www.linaro.org/linaro-blog/> Blog
Daniel Lezcano Nov. 19, 2024, 9:40 p.m. UTC | #4
On 19/11/2024 08:38, Hsin-Te Yuan wrote:
> On Fri, Nov 15, 2024 at 12:48 AM Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
>>
>>
>> Hi,
>>
>> On 25/10/2024 14:05, Hsin-Te Yuan wrote:
>>> From: James Lo <james.lo@mediatek.com>
>>>
>>> Previously, the driver only supported reading the temperature from all
>>> sensors and returning the maximum value. This update adds another
>>> get_temp ops to support reading the temperature from each sensor
>>> separately.
>>>
>>> Especially, some thermal zones registered by this patch are needed by
>>> MT8183 since those thermal zones are necessary for mtk-svs driver.
>>
>> The DT for the mt8183 describes the sensor id = 0 as the CPU. On this,
>> there is a cooling device with trip points.
>>
>> The driver registers the id=0 as an aggregator for the sensors which
>> overloads the CPU thermal zone above.
>>
>> Why do you need to aggregate all the sensors to retrieve the max value ?
>>
>> They are all contributing differently to the heat and they should be
>> tied with their proper cooling device.
>>
>> I don't think the thermal configuration is correct and I suggest to fix
>> this aggregator by removing it.
>>
>>
>>
>   As far as I know the thermal design of Mediatek's board is based on
> the highest temperature of the whole board. Also, removing the
> aggregator will break all the boards using this driver.

AFAICT, it is not a thermal design but a thermal configuration.

What is the rational of using power numbers related to the CPU but 
aggregate all temperatures as an input to the governor ?

And for example, the mt8173 has 4 banks and 4 sensors per banks, so 16 
sensors. And they are all grouped together under the thermal zone 
"cpu-thermal" with the cpu cooling device.

So if the GPU is getting hot, we cool down the CPU ?


> By the way, I heard that baylibre is working on multi-sensor
> aggregation support, which can be the alternative solution for the
> aggregator in this driver, but that should be another story and is
> unrelated to this patch.

Right.
diff mbox series

Patch

diff --git a/drivers/thermal/mediatek/auxadc_thermal.c b/drivers/thermal/mediatek/auxadc_thermal.c
index 9ee2e7283435acfcbb1a956303b6122a08affecc..9a9079d559a3abe9e3823f744d4c9a159a8666bd 100644
--- a/drivers/thermal/mediatek/auxadc_thermal.c
+++ b/drivers/thermal/mediatek/auxadc_thermal.c
@@ -847,7 +847,8 @@  static int mtk_thermal_bank_temperature(struct mtk_thermal_bank *bank)
 
 static int mtk_read_temp(struct thermal_zone_device *tz, int *temperature)
 {
-	struct mtk_thermal *mt = thermal_zone_device_priv(tz);
+	struct mtk_thermal_bank *bank = thermal_zone_device_priv(tz);
+	struct mtk_thermal *mt = bank->mt;
 	int i;
 	int tempmax = INT_MIN;
 
@@ -866,10 +867,41 @@  static int mtk_read_temp(struct thermal_zone_device *tz, int *temperature)
 	return 0;
 }
 
+static int mtk_read_sensor_temp(struct thermal_zone_device *tz, int *temperature)
+{
+	struct mtk_thermal_bank *bank = thermal_zone_device_priv(tz);
+	struct mtk_thermal *mt = bank->mt;
+	const struct mtk_thermal_data *conf = mt->conf;
+	int id = bank->id - 1;
+	int temp = INT_MIN;
+	u32 raw;
+
+	raw = readl(mt->thermal_base + conf->msr[id]);
+
+	temp = mt->raw_to_mcelsius(mt, id, raw);
+
+	/*
+	 * The first read of a sensor often contains very high bogus
+	 * temperature value. Filter these out so that the system does
+	 * not immediately shut down.
+	 */
+
+	if (unlikely(!mtk_thermal_temp_is_valid(temp)))
+		return -EAGAIN;
+
+	*temperature = temp;
+
+	return 0;
+}
+
 static const struct thermal_zone_device_ops mtk_thermal_ops = {
 	.get_temp = mtk_read_temp,
 };
 
+static const struct thermal_zone_device_ops mtk_thermal_sensor_ops = {
+	.get_temp = mtk_read_sensor_temp,
+};
+
 static void mtk_thermal_init_bank(struct mtk_thermal *mt, int num,
 				  u32 apmixed_phys_base, u32 auxadc_phys_base,
 				  int ctrl_id)
@@ -1199,6 +1231,7 @@  static int mtk_thermal_probe(struct platform_device *pdev)
 	u64 auxadc_phys_base, apmixed_phys_base;
 	struct thermal_zone_device *tzdev;
 	void __iomem *apmixed_base, *auxadc_base;
+	struct mtk_thermal_bank *tz;
 
 	mt = devm_kzalloc(&pdev->dev, sizeof(*mt), GFP_KERNEL);
 	if (!mt)
@@ -1285,14 +1318,35 @@  static int mtk_thermal_probe(struct platform_device *pdev)
 			mtk_thermal_init_bank(mt, i, apmixed_phys_base,
 					      auxadc_phys_base, ctrl_id);
 
-	tzdev = devm_thermal_of_zone_register(&pdev->dev, 0, mt,
-					      &mtk_thermal_ops);
-	if (IS_ERR(tzdev))
-		return PTR_ERR(tzdev);
+	for (i = 0; i <= mt->conf->num_sensors; i++) {
+		tz = devm_kmalloc(&pdev->dev, sizeof(*tz), GFP_KERNEL);
+		if (!tz)
+			return -ENOMEM;
+
+		tz->mt = mt;
+		tz->id = i;
+
+		tzdev = devm_thermal_of_zone_register(&pdev->dev, i,
+						      tz, (i == 0) ?
+						      &mtk_thermal_ops
+						      : &mtk_thermal_sensor_ops);
+
+		if (IS_ERR(tzdev)) {
+			ret = PTR_ERR(tzdev);
+			if (ret == -ENODEV) {
+				dev_warn(&pdev->dev,
+					 "Can't find thermal zone for sensor %d; sensor skipped.\n", i);
+				continue;
+			}
+			return dev_err_probe(&pdev->dev, ret,
+					     "Failed to register thermal zone %d.\n", i);
+		}
 
-	ret = devm_thermal_add_hwmon_sysfs(&pdev->dev, tzdev);
-	if (ret)
-		dev_warn(&pdev->dev, "error in thermal_add_hwmon_sysfs");
+		ret = devm_thermal_add_hwmon_sysfs(&pdev->dev, tzdev);
+		if (ret)
+			dev_warn(&pdev->dev,
+				 "Sensor %d: Error in thermal_add_hwmon_sysfs: %d\n", i, ret);
+	}
 
 	return 0;
 }