Message ID | 13690581.X0sz4iL7V8@kreacher |
---|---|
Headers | show |
Series | cpufreq: ACPI: Address performance regression related to scale-invariance | expand |
On Thu, 2021-02-04 at 18:25 +0100, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > A severe performance regression on AMD EPYC processors when using > the schedutil scaling governor was discovered by Phoronix.com and > attributed to the following commits: > > 41ea667227ba ("x86, sched: Calculate frequency invariance for > AMD systems") > > 976df7e5730e ("x86, sched: Use midpoint of max_boost and max_P > for frequency invariance on AMD EPYC") > > [snip] > > Fixes: 41ea667227ba ("x86, sched: Calculate frequency invariance for AMD systems") > Fixes: 976df7e5730e ("x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC") > Fixes: db865272d9c4 ("cpufreq: Avoid configuring old governors as default with intel_pstate") > Link: https://www.phoronix.com/scan.php?page=article&item=linux511-amd-schedutil&num=1 > Link: https://lore.kernel.org/linux-pm/20210203135321.12253-2-ggherdovich@suse.cz/ > Reported-by: Michael Larabel <Michael@phoronix.com> > Diagnosed-by: Giovanni Gherdovich <ggherdovich@suse.cz> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > --- > drivers/cpufreq/acpi-cpufreq.c | 107 ++++++++++++++++++++++++++++++++++++----- > 1 file changed, 95 insertions(+), 12 deletions(-) > > Index: linux-pm/drivers/cpufreq/acpi-cpufreq.c > =================================================================== > --- linux-pm.orig/drivers/cpufreq/acpi-cpufreq.c > +++ linux-pm/drivers/cpufreq/acpi-cpufreq.c > [...] Tested-by: Giovanni Gherdovich <ggherdovich@suse.cz> Reviewed-by: Giovanni Gherdovich <ggherdovich@suse.cz> Note there is also the Tested-by: Michael, from the other thread https://lore.kernel.org/lkml/5ea06dbe-255c-3d22-b9bd-6e627c5f94af@phoronix.com/ I tested this patch with the "NASA Parallel Benchmarks" from [link below], the results confirms that the 5.10 performance is recovered: Ratios of completion times, lower is better (5.10 is the baseline) 5.10 5.11-rc6 5.11-rc6-ggherdov 5.11-rc6-rafael ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Integer Sort 1.00 1.21 0.91 0.93 Embarrassingly Parallel 1.00 1.60 1.00 1.00 Discrete FFT 1.00 1.68 0.67 0.67 CPU : MODEL : 2x AMD EPYC 7742 FREQUENCY TABLE : P2: 1.50 GHz P1: 2.00 GHz P0: 2.25 GHz MAX BOOST : 3.40 GHz [link] https://www.nas.nasa.gov/publications/npb.html Thanks, Giovanni
On Thu, 2021-02-04 at 18:34 +0100, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > If the maximum performance level taken for computing the > arch_max_freq_ratio value used in the x86 scale-invariance code is > higher than the one corresponding to the cpuinfo.max_freq value > coming from the acpi_cpufreq driver, the scale-invariant utilization > falls below 100% even if the CPU runs at cpuinfo.max_freq or slightly > faster, which causes the schedutil governor to select a frequency > below cpuinfo.max_freq. That frequency corresponds to a frequency > table entry below the maximum performance level necessary to get to > the "boost" range of CPU frequencies which prevents "boost" > frequencies from being used in some workloads. > > While this issue is related to scale-invariance, it may be amplified > by commit db865272d9c4 ("cpufreq: Avoid configuring old governors as > default with intel_pstate") from the 5.10 development cycle which > made it extremely easy to default to schedutil even if the preferred > driver is acpi_cpufreq as long as intel_pstate is built too, because > the mere presence of the latter effectively removes the ondemand > governor from the defaults. Distro kernels are likely to include > both intel_pstate and acpi_cpufreq on x86, so their users who cannot > use intel_pstate or choose to use acpi_cpufreq may easily be > affectecd by this issue. > > If CPPC is available, it can be used to address this issue by > extending the frequency tables created by acpi_cpufreq to cover the > entire available frequency range (including "boost" frequencies) for > each CPU, but if CPPC is not there, acpi_cpufreq has no idea what > the maximum "boost" frequency is and the frequency tables created by > it cannot be extended in a meaningful way, so in that case make it > ask the arch scale-invariance code to to use the "nominal" performance > level for CPU utilization scaling in order to avoid the issue at hand. > > Fixes: db865272d9c4 ("cpufreq: Avoid configuring old governors as default with intel_pstate") > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > --- > arch/x86/kernel/smpboot.c | 1 + > drivers/cpufreq/acpi-cpufreq.c | 8 ++++++++ > 2 files changed, 9 insertions(+) > > Index: linux-pm/drivers/cpufreq/acpi-cpufreq.c > =================================================================== > --- linux-pm.orig/drivers/cpufreq/acpi-cpufreq.c > +++ linux-pm/drivers/cpufreq/acpi-cpufreq.c > @@ -806,6 +806,14 @@ static int acpi_cpufreq_cpu_init(struct > state_count++; > valid_states++; > data->first_perf_state = valid_states; > + } else { > + /* > + * If the maximum "boost" frequency is unknown, ask the arch > + * scale-invariance code to use the "nominal" performance for > + * CPU utilization scaling so as to prevent the schedutil > + * governor from selecting inadequate CPU frequencies. > + */ > + arch_set_max_freq_ratio(true); > } > > freq_table = kcalloc(state_count, sizeof(*freq_table), GFP_KERNEL); > Index: linux-pm/arch/x86/kernel/smpboot.c > =================================================================== > --- linux-pm.orig/arch/x86/kernel/smpboot.c > +++ linux-pm/arch/x86/kernel/smpboot.c > @@ -1833,6 +1833,7 @@ void arch_set_max_freq_ratio(bool turbo_ > arch_max_freq_ratio = turbo_disabled ? SCHED_CAPACITY_SCALE : > arch_turbo_freq_ratio; > } > +EXPORT_SYMBOL_GPL(arch_set_max_freq_ratio); > > static bool turbo_disabled(void) > { Reviewed-by: Giovanni Gherdovich <ggherdovich@suse.cz> Thanks, Giovanni
On Thu, Feb 04, 2021 at 06:34:32PM +0100, Rafael J. Wysocki wrote: > arch/x86/kernel/smpboot.c | 1 + > drivers/cpufreq/acpi-cpufreq.c | 8 ++++++++ > 2 files changed, 9 insertions(+) > > Index: linux-pm/drivers/cpufreq/acpi-cpufreq.c > =================================================================== > --- linux-pm.orig/drivers/cpufreq/acpi-cpufreq.c > +++ linux-pm/drivers/cpufreq/acpi-cpufreq.c > @@ -806,6 +806,14 @@ static int acpi_cpufreq_cpu_init(struct > state_count++; > valid_states++; > data->first_perf_state = valid_states; > + } else { > + /* > + * If the maximum "boost" frequency is unknown, ask the arch > + * scale-invariance code to use the "nominal" performance for > + * CPU utilization scaling so as to prevent the schedutil > + * governor from selecting inadequate CPU frequencies. > + */ > + arch_set_max_freq_ratio(true); > } > > freq_table = kcalloc(state_count, sizeof(*freq_table), GFP_KERNEL); > Index: linux-pm/arch/x86/kernel/smpboot.c > =================================================================== > --- linux-pm.orig/arch/x86/kernel/smpboot.c > +++ linux-pm/arch/x86/kernel/smpboot.c > @@ -1833,6 +1833,7 @@ void arch_set_max_freq_ratio(bool turbo_ > arch_max_freq_ratio = turbo_disabled ? SCHED_CAPACITY_SCALE : > arch_turbo_freq_ratio; > } > +EXPORT_SYMBOL_GPL(arch_set_max_freq_ratio); Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
On Fri, Feb 5, 2021 at 12:59 PM Peter Zijlstra <peterz@infradead.org> wrote: > > On Thu, Feb 04, 2021 at 06:34:32PM +0100, Rafael J. Wysocki wrote: > > > arch/x86/kernel/smpboot.c | 1 + > > drivers/cpufreq/acpi-cpufreq.c | 8 ++++++++ > > 2 files changed, 9 insertions(+) > > > > Index: linux-pm/drivers/cpufreq/acpi-cpufreq.c > > =================================================================== > > --- linux-pm.orig/drivers/cpufreq/acpi-cpufreq.c > > +++ linux-pm/drivers/cpufreq/acpi-cpufreq.c > > @@ -806,6 +806,14 @@ static int acpi_cpufreq_cpu_init(struct > > state_count++; > > valid_states++; > > data->first_perf_state = valid_states; > > + } else { > > + /* > > + * If the maximum "boost" frequency is unknown, ask the arch > > + * scale-invariance code to use the "nominal" performance for > > + * CPU utilization scaling so as to prevent the schedutil > > + * governor from selecting inadequate CPU frequencies. > > + */ > > + arch_set_max_freq_ratio(true); > > } > > > > freq_table = kcalloc(state_count, sizeof(*freq_table), GFP_KERNEL); > > Index: linux-pm/arch/x86/kernel/smpboot.c > > =================================================================== > > --- linux-pm.orig/arch/x86/kernel/smpboot.c > > +++ linux-pm/arch/x86/kernel/smpboot.c > > @@ -1833,6 +1833,7 @@ void arch_set_max_freq_ratio(bool turbo_ > > arch_max_freq_ratio = turbo_disabled ? SCHED_CAPACITY_SCALE : > > arch_turbo_freq_ratio; > > } > > +EXPORT_SYMBOL_GPL(arch_set_max_freq_ratio); > > Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Thanks, I'm queuing up this lot for a post-rc7 late push.