Message ID | 006901d9be8c$f4439930$dccacb90$@telus.net |
---|---|
State | New |
Headers | show |
Series | x86/aperfmperf: Make stale CPU frequency response within limits. | expand |
Hi Doug, On Tue, Jul 25, 2023 at 9:12 PM Doug Smythies <dsmythies@telus.net> wrote: > > Hi Rafael, > > On Tue, Jul 25, 2023 at 11:31 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Tue, Jul 25, 2023 at 2:14 AM Doug Smythies <dsmythies@telus.net> wrote: > > > > > > Currently, when the CPU frequency is stale the nominal clock frequency > > > is returned by calls to arch_freq_get_on_cpu(). Some users are > > > confused by the high reported frequency when their system is idle > > > and/or it is above a reduced maximum they set. > > > > > > This patch will return the policy minimum as the stale frequency reply > > > from arch_freq_get_on_cpu(). > > > > > > Reported-by: Yang Jie <yang.jie@linux.intel.com> > > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=217597 > > > Signed-off-by: Doug Smythies <dsmythies@telus.net> > > > --- > > > arch/x86/kernel/cpu/aperfmperf.c | 13 +++++-------- > > > drivers/cpufreq/cpufreq.c | 18 ++++++++++++++++++ > > > include/linux/cpufreq.h | 5 +++++ > > > 3 files changed, 28 insertions(+), 8 deletions(-) > > > > > > diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c > > > index fdbb5f07448f..44cc96864d94 100644 > > > --- a/arch/x86/kernel/cpu/aperfmperf.c > > > +++ b/arch/x86/kernel/cpu/aperfmperf.c > > > @@ -418,9 +418,10 @@ unsigned int arch_freq_get_on_cpu(int cpu) > > > unsigned long last; > > > u64 acnt, mcnt; > > > > > > - if (!cpu_feature_enabled(X86_FEATURE_APERFMPERF)) > > > - goto fallback; > > > - > > > + if (!cpu_feature_enabled(X86_FEATURE_APERFMPERF)){ > > > + freq = cpufreq_quick_get(cpu); > > > + return freq ? freq : cpufreq_quick_get_min(cpu); > > > + } > > > do { > > > seq = raw_read_seqcount_begin(&s->seq); > > > last = s->last_update; > > > @@ -433,13 +434,9 @@ unsigned int arch_freq_get_on_cpu(int cpu) > > > * which covers idle and NOHZ full CPUs. > > > */ > > > if (!mcnt || (jiffies - last) > MAX_SAMPLE_AGE) > > > - goto fallback; > > > + return cpufreq_quick_get_min(cpu); > > > > > > return div64_u64((cpu_khz * acnt), mcnt); > > > - > > > -fallback: > > > - freq = cpufreq_quick_get(cpu); > > > - return freq ? freq : cpu_khz; > > > > It looks to me like modifying cpufreq_quick_get) to return policy->min > > if policy->cur is 0 would work in a similar way, wouldn't it? > > For the configuration of intel_cpufreq driver (intel_pstate in > passive mode), schedutil governor, HWP enabled, for > a stale frequency policy->cur is not 0 and will always > be whatever the min value was when the driver was initialized, > regardless of what has been set since. So I would prefer to address this in the intel_pstate driver than to work around it in the core. > The patch I submitted deals with that situation also. > > A complete list of driver/governor/HWP stale frequency > replies can be found on the bugzilla report at: > > https://bugzilla.kernel.org/attachment.cgi?id=304694 > > There might be push back on some of the performance > governor stale frequency replies. I could not figure out > a performance governor dependant reply. > > Also there are other callers to cpufreq_quick_get > and I was not sure I could mess with the function > response for them. For example > drivers/devfreq/tegra30-devfreq.c > and drivers/thermal/cpufreq_cooling.c > and drivers/powercap/dtpm_cpu.c IIUC, all of the above rely on policy->cur being nonzero. There are other users doing questionable things when cpufreq_quick_get() returns 0 that I think would be better off if the min is returned instead.
Hi Rafael, On Wed, Jul 26, 2023 at 7:43 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > On Tue, Jul 25, 2023 at 9:12 PM Doug Smythies <dsmythies@telus.net> wrote: > > On Tue, Jul 25, 2023 at 11:31 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > On Tue, Jul 25, 2023 at 2:14 AM Doug Smythies <dsmythies@telus.net> wrote: > > > > > > > > Currently, when the CPU frequency is stale the nominal clock frequency > > > > is returned by calls to arch_freq_get_on_cpu(). Some users are > > > > confused by the high reported frequency when their system is idle > > > > and/or it is above a reduced maximum they set. > > > > > > > > This patch will return the policy minimum as the stale frequency reply > > > > from arch_freq_get_on_cpu(). > > > > > > > > Reported-by: Yang Jie <yang.jie@linux.intel.com> > > > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=217597 > > > > Signed-off-by: Doug Smythies <dsmythies@telus.net> > > > > --- > > > > arch/x86/kernel/cpu/aperfmperf.c | 13 +++++-------- > > > > drivers/cpufreq/cpufreq.c | 18 ++++++++++++++++++ > > > > include/linux/cpufreq.h | 5 +++++ > > > > 3 files changed, 28 insertions(+), 8 deletions(-) > > > > > > > > diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c > > > > index fdbb5f07448f..44cc96864d94 100644 > > > > --- a/arch/x86/kernel/cpu/aperfmperf.c > > > > +++ b/arch/x86/kernel/cpu/aperfmperf.c > > > > @@ -418,9 +418,10 @@ unsigned int arch_freq_get_on_cpu(int cpu) > > > > unsigned long last; > > > > u64 acnt, mcnt; > > > > > > > > - if (!cpu_feature_enabled(X86_FEATURE_APERFMPERF)) > > > > - goto fallback; > > > > - > > > > + if (!cpu_feature_enabled(X86_FEATURE_APERFMPERF)){ > > > > + freq = cpufreq_quick_get(cpu); > > > > + return freq ? freq : cpufreq_quick_get_min(cpu); > > > > + } > > > > do { > > > > seq = raw_read_seqcount_begin(&s->seq); > > > > last = s->last_update; > > > > @@ -433,13 +434,9 @@ unsigned int arch_freq_get_on_cpu(int cpu) > > > > * which covers idle and NOHZ full CPUs. > > > > */ > > > > if (!mcnt || (jiffies - last) > MAX_SAMPLE_AGE) > > > > - goto fallback; > > > > + return cpufreq_quick_get_min(cpu); > > > > > > > > return div64_u64((cpu_khz * acnt), mcnt); > > > > - > > > > -fallback: > > > > - freq = cpufreq_quick_get(cpu); > > > > - return freq ? freq : cpu_khz; > > > > > > It looks to me like modifying cpufreq_quick_get) to return policy->min > > > if policy->cur is 0 would work in a similar way, wouldn't it? > > > > For the configuration of intel_cpufreq driver (intel_pstate in > > passive mode), schedutil governor, HWP enabled, for > > a stale frequency policy->cur is not 0 and will always > > be whatever the min value was when the driver was initialized, > > regardless of what has been set since. > > So I would prefer to address this in the intel_pstate driver than to > work around it in the core. Okay, but I would need some help with it. I already tried to figure out a fix before starting this thread, and have tried again since your comment. I haven't been able to figure it out. An example of the issue: Use the ondemand governor and set some minimum and also put a load on CPU 5 such that the governor asks for a non-min and non-max pstate. Then switch to the schedutil governor, and terminate the load on CPU 5, and look at CPU frequencies: $ grep . /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq /sys/devices/system/cpu/cpufreq/policy0/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy10/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy11/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy1/scaling_cur_freq:4799871 /sys/devices/system/cpu/cpufreq/policy2/scaling_cur_freq:4800027 /sys/devices/system/cpu/cpufreq/policy3/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy4/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy5/scaling_cur_freq:1300000 /sys/devices/system/cpu/cpufreq/policy6/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy7/scaling_cur_freq:4800736 /sys/devices/system/cpu/cpufreq/policy8/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy9/scaling_cur_freq:1000000 non stale frequencies are identified by non round numbers. observe that CPU 5 still indicates pstate 13. observe the other stale frequencies are the pstate 10 min that I set when the governor was ondemand. Now change the minimum to 1.1 GHz and check it: $ grep . /sys/devices/system/cpu/cpufreq/policy*/scaling_min_freq /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq:1100000 /sys/devices/system/cpu/cpufreq/policy10/scaling_min_freq:1100000 /sys/devices/system/cpu/cpufreq/policy11/scaling_min_freq:1100000 /sys/devices/system/cpu/cpufreq/policy1/scaling_min_freq:1100000 /sys/devices/system/cpu/cpufreq/policy2/scaling_min_freq:1100000 /sys/devices/system/cpu/cpufreq/policy3/scaling_min_freq:1100000 /sys/devices/system/cpu/cpufreq/policy4/scaling_min_freq:1100000 /sys/devices/system/cpu/cpufreq/policy5/scaling_min_freq:1100000 /sys/devices/system/cpu/cpufreq/policy6/scaling_min_freq:1100000 /sys/devices/system/cpu/cpufreq/policy7/scaling_min_freq:1100000 /sys/devices/system/cpu/cpufreq/policy8/scaling_min_freq:1100000 /sys/devices/system/cpu/cpufreq/policy9/scaling_min_freq:1100000 and look at current again: $ grep . /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq /sys/devices/system/cpu/cpufreq/policy0/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy10/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy11/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy1/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy2/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy3/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy4/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy5/scaling_cur_freq:1300000 /sys/devices/system/cpu/cpufreq/policy6/scaling_cur_freq:4800585 /sys/devices/system/cpu/cpufreq/policy7/scaling_cur_freq:4800177 /sys/devices/system/cpu/cpufreq/policy8/scaling_cur_freq:4799992 /sys/devices/system/cpu/cpufreq/policy9/scaling_cur_freq:4800015 Observe the stale frequencies are unchanged and outside of the range limits. > > The patch I submitted deals with that situation also. > > > > A complete list of driver/governor/HWP stale frequency > > replies can be found on the bugzilla report at: > > > > https://bugzilla.kernel.org/attachment.cgi?id=304694 > > > > There might be push back on some of the performance > > governor stale frequency replies. I could not figure out > > a performance governor dependant reply. > > > > Also there are other callers to cpufreq_quick_get > > and I was not sure I could mess with the function > > response for them. For example > > drivers/devfreq/tegra30-devfreq.c > > and drivers/thermal/cpufreq_cooling.c > > and drivers/powercap/dtpm_cpu.c > > IIUC, all of the above rely on policy->cur being nonzero. > > There are other users doing questionable things when > cpufreq_quick_get() returns 0 that I think would be better off if the > min is returned instead. Okay, I'll submit a new patch shortly, with this: diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 50bbc969ffe5..4e91169a83f5 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1768,7 +1768,7 @@ unsigned int cpufreq_quick_get(unsigned int cpu) policy = cpufreq_cpu_get(cpu); if (policy) { - ret_freq = policy->cur; + ret_freq = policy->cur ? policy->cur : policy->min; cpufreq_cpu_put(policy); } The testing results are in the bugzilla report here: https://bugzilla.kernel.org/attachment.cgi?id=304734 ... Doug
diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c index fdbb5f07448f..44cc96864d94 100644 --- a/arch/x86/kernel/cpu/aperfmperf.c +++ b/arch/x86/kernel/cpu/aperfmperf.c @@ -418,9 +418,10 @@ unsigned int arch_freq_get_on_cpu(int cpu) unsigned long last; u64 acnt, mcnt; - if (!cpu_feature_enabled(X86_FEATURE_APERFMPERF)) - goto fallback; - + if (!cpu_feature_enabled(X86_FEATURE_APERFMPERF)){ + freq = cpufreq_quick_get(cpu); + return freq ? freq : cpufreq_quick_get_min(cpu); + } do { seq = raw_read_seqcount_begin(&s->seq); last = s->last_update; @@ -433,13 +434,9 @@ unsigned int arch_freq_get_on_cpu(int cpu) * which covers idle and NOHZ full CPUs. */ if (!mcnt || (jiffies - last) > MAX_SAMPLE_AGE) - goto fallback; + return cpufreq_quick_get_min(cpu); return div64_u64((cpu_khz * acnt), mcnt); - -fallback: - freq = cpufreq_quick_get(cpu); - return freq ? freq : cpu_khz; } static int __init bp_init_aperfmperf(void) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 50bbc969ffe5..a0b24f61a5b0 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1796,6 +1796,24 @@ unsigned int cpufreq_quick_get_max(unsigned int cpu) } EXPORT_SYMBOL(cpufreq_quick_get_max); +/** + * cpufreq_quick_get_min - return the min frequency for a given CPU + * @cpu: CPU number + */ +unsigned int cpufreq_quick_get_min(unsigned int cpu) +{ + struct cpufreq_policy *policy = cpufreq_cpu_get(cpu); + unsigned int ret_freq = 0; + + if (policy) { + ret_freq = policy->min; + cpufreq_cpu_put(policy); + } + + return ret_freq; +} +EXPORT_SYMBOL(cpufreq_quick_get_min); + /** * cpufreq_get_hw_max_freq - get the max hardware frequency of the CPU * @cpu: CPU number diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 172ff51c1b2a..c74dcb5f9263 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -220,6 +220,7 @@ static inline bool policy_is_shared(struct cpufreq_policy *policy) unsigned int cpufreq_get(unsigned int cpu); unsigned int cpufreq_quick_get(unsigned int cpu); unsigned int cpufreq_quick_get_max(unsigned int cpu); +unsigned int cpufreq_quick_get_min(unsigned int cpu); unsigned int cpufreq_get_hw_max_freq(unsigned int cpu); void disable_cpufreq(void); @@ -250,6 +251,10 @@ static inline unsigned int cpufreq_quick_get_max(unsigned int cpu) { return 0; } +static inline unsigned int cpufreq_quick_get_min(unsigned int cpu) +{ + return 0; +} static inline unsigned int cpufreq_get_hw_max_freq(unsigned int cpu) { return 0;
Currently, when the CPU frequency is stale the nominal clock frequency is returned by calls to arch_freq_get_on_cpu(). Some users are confused by the high reported frequency when their system is idle and/or it is above a reduced maximum they set. This patch will return the policy minimum as the stale frequency reply from arch_freq_get_on_cpu(). Reported-by: Yang Jie <yang.jie@linux.intel.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=217597 Signed-off-by: Doug Smythies <dsmythies@telus.net> --- arch/x86/kernel/cpu/aperfmperf.c | 13 +++++-------- drivers/cpufreq/cpufreq.c | 18 ++++++++++++++++++ include/linux/cpufreq.h | 5 +++++ 3 files changed, 28 insertions(+), 8 deletions(-)