mbox series

[0/5] Support Autonomous Selection mode in cppc_cpufreq

Message ID 20250211103737.447704-1-sumitg@nvidia.com
Headers show
Series Support Autonomous Selection mode in cppc_cpufreq | expand

Message

Sumit Gupta Feb. 11, 2025, 10:37 a.m. UTC
This patchset supports the Autonomous Performance Level Selection mode
in the cppc_cpufreq driver. The feature is part of the existing CPPC
specification and already present in Intel and AMD specific pstate
cpufreq drivers. The patchset adds the support in generic acpi cppc
cpufreq driver.

It adds a new 'cppc_cpufreq_epp' instance of the 'cppc_cpufreq' driver
for supporting the Autonomous Performance Level Selection and Energy
Performance Preference (EPP).
Autonomous selection will get enabled during boot if 'cppc_auto_sel'
boot argument is passed or the 'Autonomous Selection Enable' register
is already set before kernel boot. When enabled, the hardware is
allowed to autonomously select the CPU frequency within the min and
max perf boundaries using the Engergy Performance Preference hints.
The EPP values range from '0x0'(performance preference) to '0xFF'
(energy efficiency preference).

It also exposes the acpi_cppc sysfs nodes to update the epp, auto_sel
and {min|max_perf} registers for changing the hints to hardware for
Autonomous selection.

In a followup patch, plan to add support to dynamically switch the
cpufreq driver instance from 'cppc_cpufreq_epp' to 'cppc_cpufreq' and
vice-versa without reboot.

The patches are divided into below groups:
- Patch [1-2]: Improvements. Can be applied independently.
- Patch [3-4]: sysfs store nodes for Auto mode. Depend on Patch [1-2].
- Patch [5]: Support for 'cppc_cpufreq_epp'. Uses a macro from [3].

Sumit Gupta (5):
  ACPI: CPPC: add read perf ctrls api and rename few existing
  ACPI: CPPC: expand macro to create store acpi_cppc sysfs node
  ACPI: CPPC: support updating epp, auto_sel and {min|max_perf} from
    sysfs
  Documentation: ACPI: add autonomous mode ctrls info in cppc_sysfs.txt
  cpufreq: CPPC: Add cppc_cpufreq_epp instance for Autonomous mode

 Documentation/admin-guide/acpi/cppc_sysfs.rst |  28 ++
 .../admin-guide/kernel-parameters.txt         |  11 +
 drivers/acpi/cppc_acpi.c                      | 311 ++++++++++++++++--
 drivers/cpufreq/cppc_cpufreq.c                | 260 ++++++++++++++-
 include/acpi/cppc_acpi.h                      |  19 +-
 5 files changed, 572 insertions(+), 57 deletions(-)

Comments

zhenglifeng (A) Feb. 12, 2025, 10:52 a.m. UTC | #1
On 2025/2/11 22:08, Sumit Gupta wrote:
> 
> 
>>
>> On 2025/2/11 18:44, Viresh Kumar wrote:
>>> On 11-02-25, 16:07, Sumit Gupta wrote:
>>>> This patchset supports the Autonomous Performance Level Selection mode
>>>> in the cppc_cpufreq driver. The feature is part of the existing CPPC
>>>> specification and already present in Intel and AMD specific pstate
>>>> cpufreq drivers. The patchset adds the support in generic acpi cppc
>>>> cpufreq driver.
>>>
>>> Is there an overlap with:
>>>
>>> https://lore.kernel.org/all/20250206131428.3261578-1-zhenglifeng1@huawei.com/
>>>
>>> ?
>>
>> Ha, it looks like we're doing something very similar.
>>
> 
> Hi Viresh,
> 
> Thank you for pointing to [1].
> 
> There seems to be some common points about updating the 'energy_perf'
> and 'auto_sel' registers for autonomous mode but the current patchset
> has more comprehensive changes to support Autonomous mode with the
> cppc_cpufreq driver.
> 
> The patches in [1]:
> 1) Make the cpc register read/write API’s generic and improves error
>    handling for 'CPC_IN_PCC'.
> 2) Expose sysfs under 'cppc_cpufreq_attr' to update 'auto_select',
>    'auto_act_window' and 'epp' registers.
> 
> The current patch series:
> 1) Exposes sysfs under 'cppc_attrs' to keep CPC registers together.
> 2) Updates existing API’s to use new registers and creates new API
>    with similar semantics to get all perf_ctrls.
> 3) Renames some existing API’s for clarity.
> 4) Use these existing API’s from acpi_cppc sysfs to update the CPC
>    registers used in Autonomous mode:
>    'auto_select', 'epp', 'min_perf', 'max_perf' registers.
> 5) Add separate 'cppc_cpufreq_epp' instance of the 'cppc_cpufreq'
>    driver to apply different limit and policy for Autonomous mode.
>    Having it separate will avoid confusion between SW and HW mode.
>    Also, it will be easy to scale and add new features in future
>    without interference. Similar approach is used in Intel and AMD
>    pstate drivers.
> 
> Please share inputs about the preferred approach.
> 
> Best Regards,
> Sumit Gupta
> 
> [1] https://lore.kernel.org/all/20250206131428.3261578-1-zhenglifeng1@huawei.com/
> 
> 

Hi Sumit,

Thanks for upstreaming this.

I think the changes to cppc_acpi in this patchset is inappropriate.

1) cppc_attrs are common sysfs for any system that supports CPPC. That
means, these interfaces would appear even if the cpufreq driver has already
managing it, e.g. amd-pstate and cppc_cpufreq. This would create multiple
interfaces to modify the same CPPC regs, which may probably introduce
concurrency and data consistency issues. Instead, exposing the interfaces
under cppc_cpufreq_attr decouples the write access to CPPC regs.

2) It's inappropriate to call cpufreq_cpu_get() in cppc_acpi. This file
currently provides interfaces for cpufreq drivers to use. It has no ABI
dependency on cpufreq at the moment.

Apart from the changes to cppc_acpi, I think the whole patchset in [1] can
be independent to this patchset. In other words, adding the
cppc_cpufreq_epp_driver could be standalone to discuss. I think combining
the use of ->setpolicy() and setting EPP could be a use case? Could you
explain more on the motivation of adding a new cppc_cpufreq_epp_driver?

[1] https://lore.kernel.org/all/20250206131428.3261578-1-zhenglifeng1@huawei.com/

Regards,
Lifeng

>>>
>>>> It adds a new 'cppc_cpufreq_epp' instance of the 'cppc_cpufreq' driver
>>>> for supporting the Autonomous Performance Level Selection and Energy
>>>> Performance Preference (EPP).
>>>> Autonomous selection will get enabled during boot if 'cppc_auto_sel'
>>>> boot argument is passed or the 'Autonomous Selection Enable' register
>>>> is already set before kernel boot. When enabled, the hardware is
>>>> allowed to autonomously select the CPU frequency within the min and
>>>> max perf boundaries using the Engergy Performance Preference hints.
>>>> The EPP values range from '0x0'(performance preference) to '0xFF'
>>>> (energy efficiency preference).
>>>>
>>>> It also exposes the acpi_cppc sysfs nodes to update the epp, auto_sel
>>>> and {min|max_perf} registers for changing the hints to hardware for
>>>> Autonomous selection.
>>>>
>>>> In a followup patch, plan to add support to dynamically switch the
>>>> cpufreq driver instance from 'cppc_cpufreq_epp' to 'cppc_cpufreq' and
>>>> vice-versa without reboot.
>>>>
>>>> The patches are divided into below groups:
>>>> - Patch [1-2]: Improvements. Can be applied independently.
>>>> - Patch [3-4]: sysfs store nodes for Auto mode. Depend on Patch [1-2].
>>>> - Patch [5]: Support for 'cppc_cpufreq_epp'. Uses a macro from [3].
>>>>
>>>> Sumit Gupta (5):
>>>>    ACPI: CPPC: add read perf ctrls api and rename few existing
>>>>    ACPI: CPPC: expand macro to create store acpi_cppc sysfs node
>>>>    ACPI: CPPC: support updating epp, auto_sel and {min|max_perf} from
>>>>      sysfs
>>>>    Documentation: ACPI: add autonomous mode ctrls info in cppc_sysfs.txt
>>>>    cpufreq: CPPC: Add cppc_cpufreq_epp instance for Autonomous mode
>>>>
>>>>   Documentation/admin-guide/acpi/cppc_sysfs.rst |  28 ++
>>>>   .../admin-guide/kernel-parameters.txt         |  11 +
>>>>   drivers/acpi/cppc_acpi.c                      | 311 ++++++++++++++++--
>>>>   drivers/cpufreq/cppc_cpufreq.c                | 260 ++++++++++++++-
>>>>   include/acpi/cppc_acpi.h                      |  19 +-
>>>>   5 files changed, 572 insertions(+), 57 deletions(-)
>>>
>>
Sumit Gupta Feb. 14, 2025, 7:08 a.m. UTC | #2
On 12/02/25 16:22, zhenglifeng (A) wrote:
> External email: Use caution opening links or attachments
> 
> 
> On 2025/2/11 22:08, Sumit Gupta wrote:
>>
>>
>>>
>>> On 2025/2/11 18:44, Viresh Kumar wrote:
>>>> On 11-02-25, 16:07, Sumit Gupta wrote:
>>>>> This patchset supports the Autonomous Performance Level Selection mode
>>>>> in the cppc_cpufreq driver. The feature is part of the existing CPPC
>>>>> specification and already present in Intel and AMD specific pstate
>>>>> cpufreq drivers. The patchset adds the support in generic acpi cppc
>>>>> cpufreq driver.
>>>>
>>>> Is there an overlap with:
>>>>
>>>> https://lore.kernel.org/all/20250206131428.3261578-1-zhenglifeng1@huawei.com/
>>>>
>>>> ?
>>>
>>> Ha, it looks like we're doing something very similar.
>>>
>>
>> Hi Viresh,
>>
>> Thank you for pointing to [1].
>>
>> There seems to be some common points about updating the 'energy_perf'
>> and 'auto_sel' registers for autonomous mode but the current patchset
>> has more comprehensive changes to support Autonomous mode with the
>> cppc_cpufreq driver.
>>
>> The patches in [1]:
>> 1) Make the cpc register read/write API’s generic and improves error
>>     handling for 'CPC_IN_PCC'.
>> 2) Expose sysfs under 'cppc_cpufreq_attr' to update 'auto_select',
>>     'auto_act_window' and 'epp' registers.
>>
>> The current patch series:
>> 1) Exposes sysfs under 'cppc_attrs' to keep CPC registers together.
>> 2) Updates existing API’s to use new registers and creates new API
>>     with similar semantics to get all perf_ctrls.
>> 3) Renames some existing API’s for clarity.
>> 4) Use these existing API’s from acpi_cppc sysfs to update the CPC
>>     registers used in Autonomous mode:
>>     'auto_select', 'epp', 'min_perf', 'max_perf' registers.
>> 5) Add separate 'cppc_cpufreq_epp' instance of the 'cppc_cpufreq'
>>     driver to apply different limit and policy for Autonomous mode.
>>     Having it separate will avoid confusion between SW and HW mode.
>>     Also, it will be easy to scale and add new features in future
>>     without interference. Similar approach is used in Intel and AMD
>>     pstate drivers.
>>
>> Please share inputs about the preferred approach.
>>
>> Best Regards,
>> Sumit Gupta
>>
>> [1] https://lore.kernel.org/all/20250206131428.3261578-1-zhenglifeng1@huawei.com/
>>
>>
> 
> Hi Sumit,
> 
> Thanks for upstreaming this.
> 
> I think the changes to cppc_acpi in this patchset is inappropriate.
> 
> 1) cppc_attrs are common sysfs for any system that supports CPPC. That
> means, these interfaces would appear even if the cpufreq driver has already
> managing it, e.g. amd-pstate and cppc_cpufreq. This would create multiple
> interfaces to modify the same CPPC regs, which may probably introduce
> concurrency and data consistency issues. Instead, exposing the interfaces
> under cppc_cpufreq_attr decouples the write access to CPPC regs.
> 

Hi Lifeng,

I think its more appropriate to keep all the CPC registers together
instead of splitting the read only registers to the acpi_cppc sysfs
and read/write registers to the cpufreq sysfs.

Only the EPP register is written from Intel and AMD.
  $ grep cpufreq_freq_attr_rw drivers/cpufreq/* | grep -v scaling
  drivers/cpufreq/acpi-cpufreq.c:cpufreq_freq_attr_rw(cpb);
 
drivers/cpufreq/amd-pstate.c:cpufreq_freq_attr_rw(energy_performance_preference);
 
drivers/cpufreq/intel_pstate.c:cpufreq_freq_attr_rw(energy_performance_preference);

We are currently updating four registers and there can be more in
future like 'auto_act_window' update attribute in [1].
Changed to make this conditional with 'ifdef CONFIG_ACPI_CPPC_CPUFREQ'
to not create attributes for Intel/AMD.

  +++ b/drivers/acpi/cppc_acpi.c
  @@ static struct attribute *cppc_attrs[] = {
          &lowest_freq.attr,
  +#ifdef CONFIG_ACPI_CPPC_CPUFREQ
          &max_perf.attr,
          &min_perf.attr,
          &perf_limited.attr,
          &auto_activity_window.attr,
          &energy_perf.attr,
  +#endif

> 2) It's inappropriate to call cpufreq_cpu_get() in cppc_acpi. This file
> currently provides interfaces for cpufreq drivers to use. It has no ABI
> dependency on cpufreq at the moment.
> 

cpufreq_cpu_get() is already used by multiple non-cpufreq drivers.
So, don't think its a problem.
  $ grep -inr "= cpufreq_cpu_get(.*;" drivers/*| grep -v "cpufreq/"|wc -l
  10

> Apart from the changes to cppc_acpi, I think the whole patchset in [1] can
> be independent to this patchset. In other words, adding the
> cppc_cpufreq_epp_driver could be standalone to discuss. I think combining
> the use of ->setpolicy() and setting EPP could be a use case? Could you
> explain more on the motivation of adding a new cppc_cpufreq_epp_driver?
> 

With 'cppc_cpufreq_epp_driver', we provide an easy option to boot all
CPU's in auto mode with right epp and policy min/max equivalent of
{min|max}_perf. The mode can be found clearly with scaling_driver node. 
Separating the HW and SW mode based on driver instance also
makes it easy to scale later.
Advanced users can program sysfs to switch individual CPU's in and out
of the HW mode. We can update policy min/max values accordingly.
In this case, there can be some CPU's in SW mode with epp driver 
instance. But a separate instance will be more convenient for the
users who want all CPU's either in HW mode or in SW mode than having
to explicitly set all the values correctly.

Regards,
Sumit Gupta

> [1] https://lore.kernel.org/all/20250206131428.3261578-1-zhenglifeng1@huawei.com/
> 
> Regards,
> Lifeng
> 
>>>>
>>>>> It adds a new 'cppc_cpufreq_epp' instance of the 'cppc_cpufreq' driver
>>>>> for supporting the Autonomous Performance Level Selection and Energy
>>>>> Performance Preference (EPP).
>>>>> Autonomous selection will get enabled during boot if 'cppc_auto_sel'
>>>>> boot argument is passed or the 'Autonomous Selection Enable' register
>>>>> is already set before kernel boot. When enabled, the hardware is
>>>>> allowed to autonomously select the CPU frequency within the min and
>>>>> max perf boundaries using the Engergy Performance Preference hints.
>>>>> The EPP values range from '0x0'(performance preference) to '0xFF'
>>>>> (energy efficiency preference).
>>>>>
>>>>> It also exposes the acpi_cppc sysfs nodes to update the epp, auto_sel
>>>>> and {min|max_perf} registers for changing the hints to hardware for
>>>>> Autonomous selection.
>>>>>
>>>>> In a followup patch, plan to add support to dynamically switch the
>>>>> cpufreq driver instance from 'cppc_cpufreq_epp' to 'cppc_cpufreq' and
>>>>> vice-versa without reboot.
>>>>>
>>>>> The patches are divided into below groups:
>>>>> - Patch [1-2]: Improvements. Can be applied independently.
>>>>> - Patch [3-4]: sysfs store nodes for Auto mode. Depend on Patch [1-2].
>>>>> - Patch [5]: Support for 'cppc_cpufreq_epp'. Uses a macro from [3].
>>>>>
>>>>> Sumit Gupta (5):
>>>>>     ACPI: CPPC: add read perf ctrls api and rename few existing
>>>>>     ACPI: CPPC: expand macro to create store acpi_cppc sysfs node
>>>>>     ACPI: CPPC: support updating epp, auto_sel and {min|max_perf} from
>>>>>       sysfs
>>>>>     Documentation: ACPI: add autonomous mode ctrls info in cppc_sysfs.txt
>>>>>     cpufreq: CPPC: Add cppc_cpufreq_epp instance for Autonomous mode
>>>>>
>>>>>    Documentation/admin-guide/acpi/cppc_sysfs.rst |  28 ++
>>>>>    .../admin-guide/kernel-parameters.txt         |  11 +
>>>>>    drivers/acpi/cppc_acpi.c                      | 311 ++++++++++++++++--
>>>>>    drivers/cpufreq/cppc_cpufreq.c                | 260 ++++++++++++++-
>>>>>    include/acpi/cppc_acpi.h                      |  19 +-
>>>>>    5 files changed, 572 insertions(+), 57 deletions(-)
>>>>
>>>
>
Rafael J. Wysocki Feb. 18, 2025, 7:23 p.m. UTC | #3
On Fri, Feb 14, 2025 at 8:09 AM Sumit Gupta <sumitg@nvidia.com> wrote:
>
>
>
> On 12/02/25 16:22, zhenglifeng (A) wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On 2025/2/11 22:08, Sumit Gupta wrote:
> >>
> >>
> >>>
> >>> On 2025/2/11 18:44, Viresh Kumar wrote:
> >>>> On 11-02-25, 16:07, Sumit Gupta wrote:
> >>>>> This patchset supports the Autonomous Performance Level Selection mode
> >>>>> in the cppc_cpufreq driver. The feature is part of the existing CPPC
> >>>>> specification and already present in Intel and AMD specific pstate
> >>>>> cpufreq drivers. The patchset adds the support in generic acpi cppc
> >>>>> cpufreq driver.
> >>>>
> >>>> Is there an overlap with:
> >>>>
> >>>> https://lore.kernel.org/all/20250206131428.3261578-1-zhenglifeng1@huawei.com/
> >>>>
> >>>> ?
> >>>
> >>> Ha, it looks like we're doing something very similar.
> >>>
> >>
> >> Hi Viresh,
> >>
> >> Thank you for pointing to [1].
> >>
> >> There seems to be some common points about updating the 'energy_perf'
> >> and 'auto_sel' registers for autonomous mode but the current patchset
> >> has more comprehensive changes to support Autonomous mode with the
> >> cppc_cpufreq driver.
> >>
> >> The patches in [1]:
> >> 1) Make the cpc register read/write API’s generic and improves error
> >>     handling for 'CPC_IN_PCC'.
> >> 2) Expose sysfs under 'cppc_cpufreq_attr' to update 'auto_select',
> >>     'auto_act_window' and 'epp' registers.
> >>
> >> The current patch series:
> >> 1) Exposes sysfs under 'cppc_attrs' to keep CPC registers together.
> >> 2) Updates existing API’s to use new registers and creates new API
> >>     with similar semantics to get all perf_ctrls.
> >> 3) Renames some existing API’s for clarity.
> >> 4) Use these existing API’s from acpi_cppc sysfs to update the CPC
> >>     registers used in Autonomous mode:
> >>     'auto_select', 'epp', 'min_perf', 'max_perf' registers.
> >> 5) Add separate 'cppc_cpufreq_epp' instance of the 'cppc_cpufreq'
> >>     driver to apply different limit and policy for Autonomous mode.
> >>     Having it separate will avoid confusion between SW and HW mode.
> >>     Also, it will be easy to scale and add new features in future
> >>     without interference. Similar approach is used in Intel and AMD
> >>     pstate drivers.
> >>
> >> Please share inputs about the preferred approach.
> >>
> >> Best Regards,
> >> Sumit Gupta
> >>
> >> [1] https://lore.kernel.org/all/20250206131428.3261578-1-zhenglifeng1@huawei.com/
> >>
> >>
> >
> > Hi Sumit,
> >
> > Thanks for upstreaming this.
> >
> > I think the changes to cppc_acpi in this patchset is inappropriate.
> >
> > 1) cppc_attrs are common sysfs for any system that supports CPPC. That
> > means, these interfaces would appear even if the cpufreq driver has already
> > managing it, e.g. amd-pstate and cppc_cpufreq. This would create multiple
> > interfaces to modify the same CPPC regs, which may probably introduce
> > concurrency and data consistency issues. Instead, exposing the interfaces
> > under cppc_cpufreq_attr decouples the write access to CPPC regs.
> >
>
> Hi Lifeng,
>
> I think its more appropriate to keep all the CPC registers together
> instead of splitting the read only registers to the acpi_cppc sysfs
> and read/write registers to the cpufreq sysfs.
>
> Only the EPP register is written from Intel and AMD.
>   $ grep cpufreq_freq_attr_rw drivers/cpufreq/* | grep -v scaling
>   drivers/cpufreq/acpi-cpufreq.c:cpufreq_freq_attr_rw(cpb);
>
> drivers/cpufreq/amd-pstate.c:cpufreq_freq_attr_rw(energy_performance_preference);
>
> drivers/cpufreq/intel_pstate.c:cpufreq_freq_attr_rw(energy_performance_preference);
>
> We are currently updating four registers and there can be more in
> future like 'auto_act_window' update attribute in [1].
> Changed to make this conditional with 'ifdef CONFIG_ACPI_CPPC_CPUFREQ'
> to not create attributes for Intel/AMD.
>
>   +++ b/drivers/acpi/cppc_acpi.c
>   @@ static struct attribute *cppc_attrs[] = {
>           &lowest_freq.attr,
>   +#ifdef CONFIG_ACPI_CPPC_CPUFREQ
>           &max_perf.attr,
>           &min_perf.attr,
>           &perf_limited.attr,
>           &auto_activity_window.attr,
>           &energy_perf.attr,
>   +#endif
>
> > 2) It's inappropriate to call cpufreq_cpu_get() in cppc_acpi. This file
> > currently provides interfaces for cpufreq drivers to use. It has no ABI
> > dependency on cpufreq at the moment.
> >
>
> cpufreq_cpu_get() is already used by multiple non-cpufreq drivers.
> So, don't think its a problem.
>   $ grep -inr "= cpufreq_cpu_get(.*;" drivers/*| grep -v "cpufreq/"|wc -l
>   10
>
> > Apart from the changes to cppc_acpi, I think the whole patchset in [1] can
> > be independent to this patchset. In other words, adding the
> > cppc_cpufreq_epp_driver could be standalone to discuss. I think combining
> > the use of ->setpolicy() and setting EPP could be a use case? Could you
> > explain more on the motivation of adding a new cppc_cpufreq_epp_driver?
> >
>
> With 'cppc_cpufreq_epp_driver', we provide an easy option to boot all
> CPU's in auto mode with right epp and policy min/max equivalent of
> {min|max}_perf. The mode can be found clearly with scaling_driver node.
> Separating the HW and SW mode based on driver instance also
> makes it easy to scale later.
> Advanced users can program sysfs to switch individual CPU's in and out
> of the HW mode. We can update policy min/max values accordingly.
> In this case, there can be some CPU's in SW mode with epp driver
> instance. But a separate instance will be more convenient for the
> users who want all CPU's either in HW mode or in SW mode than having
> to explicitly set all the values correctly.

There seems to be some quite fundamental disagreement on how this
should be done, so I'm afraid I cannot do much about it ATM.

Please agree on a common approach and come back to me when you are ready.

Sending two concurrent patchsets under confusingly similar names again
and again isn't particularly helpful.

Thanks!