diff mbox series

[1/3] cpufreq: Add a callback to update the min_freq_req from drivers

Message ID 20241003083952.3186-2-Dhananjay.Ugwekar@amd.com
State New
Headers show
Series [1/3] cpufreq: Add a callback to update the min_freq_req from drivers | expand

Commit Message

Dhananjay Ugwekar Oct. 3, 2024, 8:39 a.m. UTC
Currently, there is no proper way to update the initial lower frequency
limit from cpufreq drivers. Only way is to add a new min_freq qos
request from the driver side, but it leads to the issue explained below.

The QoS infrastructure collates the constraints from multiple
subsystems and saves them in a plist. The "current value" is defined to
be the highest value in the plist for min_freq constraint.

The cpufreq core adds a qos_request for min_freq to be 0 and the amd-pstate
driver today adds qos request for min_freq to be lowest_freq, where
lowest_freq corresponds to CPPC.lowest_perf.

Eg: Suppose WLOG considering amd-pstate driver, lowest_freq is 400000 KHz,
lowest_non_linear_freq is 1200000 KHz.

At this point of time, the min_freq QoS plist looks like:

head--> 400000 KHz (registered by amd-pstate) --> 0 KHz (registered by
cpufreq core)

When a user updates /sys/devices/system/cpu/cpuX/cpufreq/scaling_min_freq,
it only results in updating the cpufreq-core's node in the plist, where
say 0 becomes the newly echoed value.

Now, if the user echoes a value 1000000 KHz, to scaling_min_freq, then the
new list would be

head--> 1000000 KHz (registered by cpufreq core) --> 400000 KHz (registered
by amd-pstate)

and the new "current value" of the min_freq QoS constraint will be 1000000
KHz, this is the scenario where it works as expected.

Suppose we change the amd-pstate driver code's min_freq qos constraint
to lowest_non_linear_freq instead of lowest_freq, then the user will
never be able to request a value below that, due to the following:

At boot time, the min_freq QoS plist would be

head--> 1200000 KHz (registered by amd-pstate) --> 0 KHz (registered by 
cpufreq core)

When the user echoes a value of 1000000 KHz, to
/sys/devices/..../scaling_min_freq, then the new list would be

head--> 1200000 KHz (registered by amd-pstate) --> 1000000 KHz (registered 
by cpufreq core)

with the new "current value" of the min_freq QoS remaining 1200000 KHz.
Since the current value has not changed, there won't be any notifications
sent to the subsystems which have added their QoS constraints. In
particular, the amd-pstate driver will not get the notification, and thus,
the user's request to lower the scaling_min_freq will be ineffective.

Hence, it is advisable to have a single source of truth for the min and
max freq QoS constraints between the cpufreq and the cpufreq drivers.

So add a new callback get_init_min_freq() add in struct cpufreq_driver,
which allows amd-pstate (or any other cpufreq driver) to override the
default min_freq value being set in the policy->min_freq_req. Now
scaling_min_freq can be modified by the user to any value (lower or
higher than the init value) later on if desired.

Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
---
 drivers/cpufreq/cpufreq.c | 6 +++++-
 include/linux/cpufreq.h   | 6 ++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

Comments

Rafael J. Wysocki Oct. 7, 2024, 3:48 p.m. UTC | #1
On Mon, Oct 7, 2024 at 5:46 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> Hi,
>
> On Mon, Oct 7, 2024 at 6:40 AM Dhananjay Ugwekar
> <Dhananjay.Ugwekar@amd.com> wrote:
> >
> > Hello Rafael,
> >
> > On 10/4/2024 11:47 PM, Rafael J. Wysocki wrote:
> > > On Thu, Oct 3, 2024 at 10:44 AM Dhananjay Ugwekar
> > > <Dhananjay.Ugwekar@amd.com> wrote:
> > >>
> > >> Currently, there is no proper way to update the initial lower frequency
> > >> limit from cpufreq drivers.
> > >
> > > Why do you want to do it?
> >
> > We want to set the initial lower frequency limit at a more efficient level
> > (lowest_nonlinear_freq) than the lowest frequency, which helps save power in
> > some idle scenarios, and also improves benchmark results in some scenarios.
> > At the same time, we want to allow the user to set the lower limit back to
> > the inefficient lowest frequency.
>
> So you want the default value of scaling_min_freq to be greater than
> the total floor.
>
> I have to say that I'm not particularly fond of this approach because
> it is adding a new meaning to scaling_min_freq: Setting it below the
> default would not cause the driver to use inefficient frequencies

s/not/now/ (sorry)

I should have double checked this before sending.

> which user space may not be aware of.  Moreover, it would tell the
> driver how far it could go with that.
>
> IMV it would be bettwr to have a separate interface for this kind of tuning.
Dhananjay Ugwekar Oct. 8, 2024, 6:32 a.m. UTC | #2
Hello Rafael,

On 10/7/2024 9:18 PM, Rafael J. Wysocki wrote:
> On Mon, Oct 7, 2024 at 5:46 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>>
>> Hi,
>>
>> On Mon, Oct 7, 2024 at 6:40 AM Dhananjay Ugwekar
>> <Dhananjay.Ugwekar@amd.com> wrote:
>>>
>>> Hello Rafael,
>>>
>>> On 10/4/2024 11:47 PM, Rafael J. Wysocki wrote:
>>>> On Thu, Oct 3, 2024 at 10:44 AM Dhananjay Ugwekar
>>>> <Dhananjay.Ugwekar@amd.com> wrote:
>>>>>
>>>>> Currently, there is no proper way to update the initial lower frequency
>>>>> limit from cpufreq drivers.
>>>>
>>>> Why do you want to do it?
>>>
>>> We want to set the initial lower frequency limit at a more efficient level
>>> (lowest_nonlinear_freq) than the lowest frequency, which helps save power in
>>> some idle scenarios, and also improves benchmark results in some scenarios.
>>> At the same time, we want to allow the user to set the lower limit back to
>>> the inefficient lowest frequency.
>>
>> So you want the default value of scaling_min_freq to be greater than
>> the total floor.

Yes, we want to set the default min value to what we think is best for the platform.

>>
>> I have to say that I'm not particularly fond of this approach because
>> it is adding a new meaning to scaling_min_freq: Setting it below the
>> default would not cause the driver to use inefficient frequencies
> 
> s/not/now/ (sorry)

I believe we are not changing the meaning of the scaling_min_freq just setting it 
to the best value at boot and then allowing the user to have access to the entire 
frequency range for the platform. Also, we have cpuinfo_min_freq/max_freq to 
indicate to the user as to what the entire frequency range is for the platform 
(depending on boost enabled/disabled).

> 
> I should have double checked this before sending.
> 
>> which user space may not be aware of.

I guess, this part we can fix by documenting correctly ?

>> Moreover, it would tell the
>> driver how far it could go with that.

Sorry, I didnt understand this part.

>>
>> IMV it would be bettwr to have a separate interface for this kind of tuning.

I feel like we can incorporate this change cleanly enough into scaling_min_freq, 
without adding a new interface which might further confuse the user. But please 
let me know your concerns and thoughts.

Thanks,
Dhananjay
Viresh Kumar Oct. 10, 2024, 10:07 a.m. UTC | #3
On 10-10-24, 15:24, Dhananjay Ugwekar wrote:
> Sure, that also works for us, if it is okay with you and Rafael, I can add the 
> u64 field in policy struct.

Let Rafael continue the discussion on the whole idea first, if we are aligned,
then you can make this change.
Rafael J. Wysocki Oct. 10, 2024, 5:57 p.m. UTC | #4
Hi,

On Tue, Oct 8, 2024 at 8:32 AM Dhananjay Ugwekar
<Dhananjay.Ugwekar@amd.com> wrote:
>
> Hello Rafael,
>
> On 10/7/2024 9:18 PM, Rafael J. Wysocki wrote:
> > On Mon, Oct 7, 2024 at 5:46 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >>
> >> Hi,
> >>
> >> On Mon, Oct 7, 2024 at 6:40 AM Dhananjay Ugwekar
> >> <Dhananjay.Ugwekar@amd.com> wrote:
> >>>
> >>> Hello Rafael,
> >>>
> >>> On 10/4/2024 11:47 PM, Rafael J. Wysocki wrote:
> >>>> On Thu, Oct 3, 2024 at 10:44 AM Dhananjay Ugwekar
> >>>> <Dhananjay.Ugwekar@amd.com> wrote:
> >>>>>
> >>>>> Currently, there is no proper way to update the initial lower frequency
> >>>>> limit from cpufreq drivers.
> >>>>
> >>>> Why do you want to do it?
> >>>
> >>> We want to set the initial lower frequency limit at a more efficient level
> >>> (lowest_nonlinear_freq) than the lowest frequency, which helps save power in
> >>> some idle scenarios, and also improves benchmark results in some scenarios.
> >>> At the same time, we want to allow the user to set the lower limit back to
> >>> the inefficient lowest frequency.
> >>
> >> So you want the default value of scaling_min_freq to be greater than
> >> the total floor.
>
> Yes, we want to set the default min value to what we think is best for the platform.
>
> >>
> >> I have to say that I'm not particularly fond of this approach because
> >> it is adding a new meaning to scaling_min_freq: Setting it below the
> >> default would not cause the driver to use inefficient frequencies
> >
> > s/not/now/ (sorry)
>
> I believe we are not changing the meaning of the scaling_min_freq just setting it
> to the best value at boot and then allowing the user to have access to the entire
> frequency range for the platform. Also, we have cpuinfo_min_freq/max_freq to
> indicate to the user as to what the entire frequency range is for the platform
> (depending on boost enabled/disabled).

But the default you want to set is the lowest efficient frequency
which user space needs to be aware of.  Otherwise it may make
suboptimal decisions.

> >
> > I should have double checked this before sending.
> >
> >> which user space may not be aware of.
>
> I guess, this part we can fix by documenting correctly ?
>
> >> Moreover, it would tell the
> >> driver how far it could go with that.
>
> Sorry, I didnt understand this part.

Sorry, never mind.  I was just repeating myself.

> >>
> >> IMV it would be bettwr to have a separate interface for this kind of tuning.
>
> I feel like we can incorporate this change cleanly enough into scaling_min_freq,
> without adding a new interface which might further confuse the user. But please
> let me know your concerns and thoughts.

I'm not sure if you realize that the .show() operation for
scaling_min_freq prints policy->max and you can easily make your
driver's .verify() callback change it to whatever value is desired.
You may as well set it to the lowest efficient frequency if the one
passed to you is equal to FREQ_QOS_MIN_DEFAULT_VALUE.
diff mbox series

Patch

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index f98c9438760c..2923068cf5f4 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1361,6 +1361,7 @@  static int cpufreq_online(unsigned int cpu)
 	bool new_policy;
 	unsigned long flags;
 	unsigned int j;
+	u32 init_min_freq = FREQ_QOS_MIN_DEFAULT_VALUE;
 	int ret;
 
 	pr_debug("%s: bringing CPU%u online\n", __func__, cpu);
@@ -1445,9 +1446,12 @@  static int cpufreq_online(unsigned int cpu)
 			goto out_destroy_policy;
 		}
 
+		if (cpufreq_driver->get_init_min_freq)
+			init_min_freq = cpufreq_driver->get_init_min_freq(policy);
+
 		ret = freq_qos_add_request(&policy->constraints,
 					   policy->min_freq_req, FREQ_QOS_MIN,
-					   FREQ_QOS_MIN_DEFAULT_VALUE);
+					   init_min_freq);
 		if (ret < 0) {
 			/*
 			 * So we don't call freq_qos_remove_request() for an
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index e0e19d9c1323..b20488b55f6c 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -414,6 +414,12 @@  struct cpufreq_driver {
 	 * policy is properly initialized, but before the governor is started.
 	 */
 	void		(*register_em)(struct cpufreq_policy *policy);
+
+	/*
+	 * Set by drivers that want to initialize the policy->min_freq_req with
+	 * a value different from the default value (0) in cpufreq core.
+	 */
+	int		(*get_init_min_freq)(struct cpufreq_policy *policy);
 };
 
 /* flags */