Message ID: 3827230.0GnL3RTcl1@kreacher
Series: cpufreq: Allow drivers to receive more information from the governor
On 2020.12.14 12:02 Rafael J. Wysocki wrote:
> Hi,
Hi Rafael,
V2 test results below are new; other results are partially re-stated.

For readers that do not want to read on: I didn't find anything different
from the other versions. This was mostly just due diligence.
Legend:
hwp: Kernel 5.10-rc6, HWP enabled; intel_cpufreq
rfc (or rjw): Kernel 5.10-rc6 + this patch set, HWP enabled; intel_cpufreq; schedutil
no-hwp: Kernel 5.10-rc6, HWP disabled; intel_cpufreq
acpi (or acpi-cpufreq): Kernel 5.10-rc6, HWP disabled; acpi-cpufreq; schedutil
patch: Kernel 5.10-rc7 + V1 patch set, HWP enabled; intel_cpufreq; schedutil
v2: Kernel 5.10-rc7 + V2 patch set, HWP enabled; intel_cpufreq; schedutil
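For reference, the passive-mode schedutil configurations in the legend map
onto the standard intel_pstate/cpufreq sysfs knobs. A minimal sketch (the
paths are the usual ones; DRY_RUN=1, the default, only prints the commands,
since the writes need root):

```shell
# Sketch: put intel_pstate into passive mode (intel_cpufreq) and select the
# schedutil governor on every CPU, as in the "rfc"/"patch"/"v2" legend
# entries. DRY_RUN=1 (default) prints the commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else eval "$*"; fi; }

run 'echo passive > /sys/devices/system/cpu/intel_pstate/status'
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    run "echo schedutil > $g"
done
# The "no-hwp"/"acpi" rows additionally require booting with
# intel_pstate=no_hwp or intel_pstate=disable on the kernel command line.
```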
Fixed work packet, fixed period, periodic workflow, load sweep up/down:
load work/sleep frequency: 73 Hertz:
hwp: Average: 12.00822 watts
rjw: Average: 10.18089 watts
no-hwp: Average: 10.21947 watts
acpi-cpufreq: Average: 9.06585 watts
patch: Average: 10.26060 watts
v2: Average: 10.50444 watts
load work/sleep frequency: 113 Hertz:
hwp: Average: 12.01056
rjw: Average: 10.12303
no-hwp: Average: 10.08228
acpi-cpufreq: Average: 9.02215
patch: Average: 10.27055
v2: Average: 10.31097
load work/sleep frequency: 211 Hertz:
hwp: Average: 12.16067
rjw: Average: 10.24413
no-hwp: Average: 10.12463
acpi-cpufreq: Average: 9.19175
patch: Average: 10.33000
v2: Average: 10.39811
load work/sleep frequency: 347 Hertz:
hwp: Average: 12.34169
rjw: Average: 10.79980
no-hwp: Average: 10.57296
acpi-cpufreq: Average: 9.84709
patch: Average: 10.67029
v2: Average: 10.93143
load work/sleep frequency: 401 Hertz:
hwp: Average: 12.42562
rjw: Average: 11.12465
no-hwp: Average: 11.24203
acpi-cpufreq: Average: 10.78670
patch: Average: 10.94514
v2: Average: 11.50324
Serialized single threaded via PIDs per second method:
A.K.A fixed work packet, variable period
Results:
Execution times (seconds. Less is better):
no-hwp:
performance: Samples: 382 ; Average: 10.54450 ; Stand Deviation: 0.01564 ; Maximum: 10.61000 ; Minimum: 10.50000
schedutil: Samples: 293 ; Average: 13.73416 ; Stand Deviation: 0.73395 ; Maximum: 15.46000 ; Minimum: 11.68000
acpi: Samples: 253 ; Average: 15.94889 ; Stand Deviation: 1.28219 ; Maximum: 18.66000 ; Minimum: 12.04000
hwp:
schedutil: Samples: 380 ; Average: 10.58287 ; Stand Deviation: 0.01864 ; Maximum: 10.64000 ; Minimum: 10.54000
patch: Samples: 276 ; Average: 14.57029 ; Stand Deviation: 0.89771 ; Maximum: 16.04000 ; Minimum: 11.68000
rfc: Samples: 271 ; Average: 14.86037 ; Stand Deviation: 0.84164 ; Maximum: 16.04000 ; Minimum: 12.21000
v2: Samples: 274 ; Average: 14.67978 ; Stand Deviation: 1.03378 ; Maximum: 16.07000 ; Minimum: 11.43000
Power (watts. More indicates higher CPU frequency and better performance. Sample time = 1 second.):
no-hwp:
performance: Samples: 4000 ; Average: 25.41355 ; Stand Deviation: 0.22156 ; Maximum: 26.01996 ; Minimum: 24.08807
schedutil: Samples: 4000 ; Average: 12.58863 ; Stand Deviation: 5.48600 ; Maximum: 25.50934 ; Minimum: 7.54559
acpi: Samples: 4000 ; Average: 9.57924 ; Stand Deviation: 5.41157 ; Maximum: 25.06366 ; Minimum: 5.51129
hwp:
schedutil: Samples: 4000 ; Average: 25.24245 ; Stand Deviation: 0.19539 ; Maximum: 25.93671 ; Minimum: 24.14746
patch: Samples: 4000 ; Average: 11.07225 ; Stand Deviation: 5.63142 ; Maximum: 24.99493 ; Minimum: 3.67548
rfc: Samples: 4000 ; Average: 10.35842 ; Stand Deviation: 4.77915 ; Maximum: 24.95953 ; Minimum: 7.26202
v2: Samples: 4000 ; Average: 10.98284 ; Stand Deviation: 5.48859 ; Maximum: 25.76331 ; Minimum: 7.53790
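The Samples/Average/Stand Deviation summaries above can be reproduced from a
plain log of 1-second power readings. A small awk sketch (the log format,
one watts value per line, and the file name are assumptions for
illustration; Doug's actual tooling is not shown here):

```shell
# Build a stand-in log (one watts sample per line) and summarize it in the
# same format as the tables above. Sample values are illustrative only.
printf '10.2\n10.4\n9.8\n10.0\n' > /tmp/watts.log
awk '{ s += $1; ss += $1 * $1
       if (n == 0 || $1 > max) max = $1
       if (n == 0 || $1 < min) min = $1
       n++ }
     END { m = s / n
           printf "Samples: %d ; Average: %.5f ; Stand Deviation: %.5f ; Maximum: %.5f ; Minimum: %.5f\n",
                  n, m, sqrt(ss / n - m * m), max, min }' /tmp/watts.log
```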
On Mon, 2020-12-14 at 21:01 +0100, Rafael J. Wysocki wrote:
> Hi,
>
> The timing of this is not perfect (sorry about that), but here's a refresh
> of this series.
>
> The majority of the previous cover letter still applies:
> [...]

Hello,

the series is tested using

-> tbench (packets processing with loopback networking, measures throughput)
-> dbench (filesystem operations, measures average latency)
-> kernbench (kernel compilation, elapsed time)
-> and gitsource (long-running shell script, elapsed time)

These are chosen because none of them is bound by compute and all are
sensitive to freq scaling decisions. The machines are a Cascade Lake based
server, a client Skylake and a Coffee Lake laptop.

What's being compared:

sugov-HWP.desired : the present series; intel_pstate=passive, governor=schedutil
sugov-HWP.min     : mainline; intel_pstate=passive, governor=schedutil
powersave-HWP     : mainline; intel_pstate=active, governor=powersave
perfgov-HWP       : mainline; intel_pstate=active, governor=performance
sugov-no-HWP      : HWP disabled; intel_pstate=passive, governor=schedutil

Dbench and kernbench have neutral results, but tbench has sugov-HWP.desired
losing in both performance and performance-per-watt, while gitsource shows
the series as faster in raw performance but again worse than the competition
in efficiency.

1. SUMMARY BY BENCHMARK
   1.1. TBENCH
   1.2. DBENCH
   1.3. KERNBENCH
   1.4. GITSOURCE
2. SUMMARY BY USER PROFILE
   2.1. PERFORMANCE USER: what if I switch perfgov -> schedutil?
   2.2. DEFAULT USER: what if I switch powersave -> schedutil?
   2.3. DEVELOPER: what if I switch sugov-HWP.min -> sugov-HWP.desired?
3. RESULTS TABLES
   PERFORMANCE RATIOS
   PERFORMANCE-PER-WATT RATIOS


1. SUMMARY BY BENCHMARK
~~~~~~~~~~~~~~~~~~~~~~~

Tbench: sugov-HWP.desired has the worst performance on all three
machines. sugov-HWP.min is between 20% and 90% better. The baseline
sugov-HWP.desired offers a lower throughput, but does it increase
efficiency?
It actually doesn't: on two out of three machines the incumbent code
(current sugov, or intel_pstate=active) has 10% to 35% better efficiency.
In other words, the status quo is both faster and more efficient than the
proposed series on this benchmark. The absolute power consumption is lower,
but the delivered performance drops even further, and that's why
performance-per-watt shows a net loss.

Dbench: generally neutral, in both performance and efficiency. powersave is
occasionally behind the pack in performance, 5% to 15%. A 15% performance
loss on the Coffee Lake is compensated by an 80% improved efficiency. To be
noted that on the same Coffee Lake sugov-no-HWP is 20% ahead of the pack in
efficiency.

Kernbench: neutral, in both performance and efficiency. powersave loses 14%
to the pack in performance on the Cascade Lake.

Gitsource: this test shows the most compelling case against the
sugov-HWP.desired series: on the Cascade Lake sugov-HWP.desired is 10%
faster than sugov-HWP.min (it was expected to be slower!) and 35% less
efficient (we expected more performance-per-watt, not less).


2. SUMMARY BY USER PROFILE
~~~~~~~~~~~~~~~~~~~~~~~~~~

If I was a perfgov-HWP user, I would be 20%-90% faster than with other
governors on tbench and gitsource. This speed gap comes with an unexpected
efficiency bonus on both tests. Since dbench and kernbench have a flat
profile across the board, there is no incentive to try another governor.

If I was a powersave-HWP user, I'd be the slowest of the bunch. The lost
performance is not, in general, balanced by better efficiency. This only
happens on Coffee Lake, which is a CPU for the mobile market and possibly
HWP has efficiency-oriented tuning there. Any flavor of schedutil would be
an improvement.
From a developer perspective, the obstacles to moving from HWP.min to
HWP.desired are tbench, where HWP.desired is worse than having no HWP
support at all, and gitsource, where HWP.desired has the opposite
properties than those advertised (it's actually faster but less efficient).


3. RESULTS TABLES
~~~~~~~~~~~~~~~~~

Tilde (~) means the result is the same as baseline (or, the ratio is close to 1).
The double asterisk (**) is a visual aid and means the result is better than
baseline (higher or lower depending on the case).


| 80x_CASCADELAKE_NUMA: Intel Cascade Lake, 40 cores / 80 threads, NUMA, SATA SSD storage
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|            sugov-HWP.des  sugov-HWP.min  powersave-HWP  perfgov-HWP  sugov-no-HWP  better if
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| PERFORMANCE RATIOS
| tbench     1.00           1.89**         1.88**         1.89**       1.17**        higher
| dbench     1.00           ~              1.06           ~            ~             lower
| kernbench  1.00           ~              1.14           ~            ~             lower
| gitsource  1.00           1.11           2.70           0.80**       ~             lower
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| PERFORMANCE-PER-WATT RATIOS
| tbench     1.00           1.36**         1.38**         1.33**       1.04**        higher
| dbench     1.00           ~              ~              ~            ~             higher
| kernbench  1.00           ~              ~              ~            ~             higher
| gitsource  1.00           1.36**         0.63           1.22**       1.02**        higher


| 8x_COFFEELAKE_UMA: Intel Coffee Lake, 4 cores / 8 threads, UMA, NVMe SSD storage
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|            sugov-HWP.des  sugov-HWP.min  powersave-HWP  perfgov-HWP  sugov-no-HWP  better if
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| PERFORMANCE RATIOS
| tbench     1.00           1.27**         1.30**         1.30**       1.31**        higher
| dbench     1.00           ~              1.15           ~            ~             lower
| kernbench  1.00           ~              ~              ~            ~             lower
| gitsource  1.00           ~              2.09           ~            ~             lower
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| PERFORMANCE-PER-WATT RATIOS
| tbench     1.00           ~              ~              ~            ~             higher
| dbench     1.00           ~              1.82**         ~            1.22**        higher
| kernbench  1.00           ~              ~              ~            ~             higher
| gitsource  1.00           ~              1.56**         ~            1.17**        higher


| 8x_SKYLAKE_UMA: Intel Skylake (client), 4 cores / 8 threads, UMA, SATA SSD storage
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|            sugov-HWP.des  sugov-HWP.min  powersave-HWP  perfgov-HWP  sugov-no-HWP  better if
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| PERFORMANCE RATIOS
| tbench     1.00           1.21**         1.22**         1.20**       1.06**        higher
| dbench     1.00           ~              ~              ~            ~             lower
| kernbench  1.00           ~              ~              ~            ~             lower
| gitsource  1.00           ~              1.71           0.96**       ~             lower
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| PERFORMANCE-PER-WATT RATIOS
| tbench     1.00           1.11**         1.12**         1.10**       1.03**        higher
| dbench     1.00           ~              ~              ~            ~             higher
| kernbench  1.00           ~              ~              ~            ~             higher
| gitsource  1.00           ~              0.75           ~            ~             higher


Giovanni
On Thu, Dec 17, 2020 at 4:27 PM Doug Smythies <dsmythies@telus.net> wrote:
>
> On 2020.12.14 12:02 Rafael J. Wysocki wrote:
> > Hi,
>
> Hi Rafael,
>
> V2 test results below are new, other results are partially re-stated:
>
> For readers that do not want to read on, I didn't find anything different
> than with the other versions. This was more just due diligence.

Thanks a lot for the data, much appreciated as always!

> [snip]
Hi,

On Fri, Dec 18, 2020 at 5:22 PM Giovanni Gherdovich <ggherdovich@suse.com> wrote:
>
> On Mon, 2020-12-14 at 21:01 +0100, Rafael J. Wysocki wrote:
> > Hi,
> >
> > The timing of this is not perfect (sorry about that), but here's a refresh
> > of this series.
> >
> > The majority of the previous cover letter still applies:
> > [...]
>
> Hello,
>
> the series is tested using
>
> -> tbench (packets processing with loopback networking, measures throughput)
> -> dbench (filesystem operations, measures average latency)
> -> kernbench (kernel compilation, elapsed time)
> -> and gitsource (long-running shell script, elapsed time)
>
> These are chosen because none of them is bound by compute and all are
> sensitive to freq scaling decisions. The machines are a Cascade Lake based
> server, a client Skylake and a Coffee Lake laptop.

First of all, many thanks for the results! Any test results input is always
much appreciated for all of the changes under consideration.

> What's being compared:
>
> sugov-HWP.desired : the present series; intel_pstate=passive, governor=schedutil
> sugov-HWP.min : mainline; intel_pstate=passive, governor=schedutil
> powersave-HWP : mainline; intel_pstate=active, governor=powersave
> perfgov-HWP : mainline; intel_pstate=active, governor=performance
> sugov-no-HWP : HWP disabled; intel_pstate=passive, governor=schedutil
>
> Dbench and Kernbench have neutral results, but Tbench has sugov-HWP.desired
> lose in both performance and performance-per-watt, while Gitsource show the
> series as faster in raw performance but again worse than the competition in
> efficiency.

Well, AFAICS tbench "likes" high turbo and is sensitive to the response time
(as indicated by the fact that it is also sensitive to the polling limit
value in cpuidle).

Using the target perf to set HWP_REQ.DESIRED (instead of using it to set
HWP_REQ.MIN) generally causes the turbo to be less aggressive and the
response time to go up, so the tbench result is not a surprise at all. This
case represents the tradeoff being made here (as noted by Doug in one of
his previous messages).

The gitsource result is a bit counter-intuitive, but my conclusions drawn
from it are quite different from yours (more on that below).

> [snip]
>
> 1. SUMMARY BY BENCHMARK
> ~~~~~~~~~~~~~~~~~~~~~~~
>
> Tbench: sugov-HWP.desired is the worst performance on all three
> machines. sugov-HWP.min is between 20% and 90% better. The baseline
> sugov-HWP.desired offers a lower throughput, but does it increase
> efficiency? It actually doesn't: on two out of three machines the
> incumbent code (current sugov, or intel_pstate=active) has 10% to 35%
> better efficiency. In other word, the status quo is both faster and more
> efficient than the proposed series on this benchmark.
> The absolute power consumption is lower, but the delivered performance is
> "even more lower", and that's why performance-per-watt shows a net loss.

This benchmark is best off when run under the performance governor and the
observation that sugov-HWP.min is almost as good as the performance governor
for it is a consequence of a bias towards performance in the former (which
need not be regarded as a good thing).

The drop in energy-efficiency is somewhat disappointing, but not entirely
unexpected too.

> Dbench: generally neutral, in both performance and efficiency. Powersave is
> occasionally behind the pack in performance, 5% to 15%. A 15% performance
> loss on the Coffe Lake is compensated by an 80% improved efficiency. To be
> noted that on the same Coffee Lake sugov-no-HWP is 20% ahead of the pack
> in efficiency.
>
> Kernbench: neutral, in both performance and efficiency. powersave looses 14%
> to the pack in performance on the Cascade Lake.
>
> Gitsource: this test show the most compelling case against the
> sugov-HWP.desired series: on the Cascade Lake sugov-HWP.desired is 10%
> faster than sugov-HWP.min (it was expected to be slower!) and 35% less
> efficient (we expected more performance-per-watt, not less).

This is a bit counter-intuitive, so it is good to try to understand what's
going on instead of drawing conclusions right away from pure numbers.

My interpretation of the available data is that gitsource benefits from the
"race-to-idle" effect in terms of energy-efficiency which also causes it to
suffer in terms of performance. Namely, completing the given piece of work
faster causes some CPU idle time to become available and that effectively
reduces power, but it also increases the response time (by the idle state
exit latency) which causes performance to drop. Whether or not this effect
can be present depends on what CPU idle states are available etc. and it may
be a pure coincidence.

What sugov-HWP.desired really does is to bias the frequency towards whatever
is perceived by schedutil as sufficient to run the workload (which is a key
property of it - see below) and it appears to do the job here quite well,
but it eliminates the "race-to-idle" effect that the workload benefited from
originally and, like it or not, schedutil cannot know about that effect.

That effect can only be present if the frequencies used for running the
workload are too high and by a considerable margin (sufficient for a deep
enough idle state to be entered). In some cases running the workload too
fast helps (like in this one, although this time it happens to hurt
performance), but in some cases it really hurts energy-efficiency and the
question is whether or not this should be always done.

There is a whole broad category of workloads involving periodic tasks that
do the same amount of work in every period regardless of the frequency they
run at (as long as the frequency is sufficient to avoid "overrunning" the
period) and they almost never benefit from "race-to-idle". There is zero
benefit from running them too fast and the energy-efficiency goes down the
sink when that happens.

Now the problem is that with sugov-HWP.min the users who care about these
workloads don't even have an option to use the task utilization history
recorded by the scheduler to bias the frequency towards the "sufficient"
level, because sugov-HWP.min only sets a lower bound on the frequency
selection to improve the situation, so the choice between it and
sugov-HWP.desired boils down to whether or not to give that option to them
and my clear preference is for that option to exist. Sorry about that.
[Note that it really is an option, though, because "pure" HWP is still the
default for HWP-enabled systems.]

It may be possible to restore some "race-to-idle" benefits by tweaking
HWP_REQ.EPP in the future, but that needs to be investigated.

BTW, what EPP value was there on the system where you saw better performance
under sugov-HWP.desired? If it was greater than zero, it would be useful to
decrease EPP (by adjusting the energy_performance_preference attributes in
sysfs for all CPUs) and see what happens to the performance difference then.

> [snip]
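The periodic-task class described above (a fixed work packet per period,
with nothing to gain from running faster than "sufficient") is essentially
what Doug's load-sweep test exercises. A toy shell sketch of such a load
(the work-packet size, the ~20 Hz period, and the 5-period run length are
arbitrary illustrative choices, not Doug's actual test program):

```shell
# Toy periodic load: a fixed work packet per period, independent of CPU
# frequency; only the idle fraction of each period changes with frequency.
periods=5
n=0
while [ "$n" -lt "$periods" ]; do
    i=0
    while [ "$i" -lt 20000 ]; do i=$((i + 1)); done   # fixed work packet
    sleep 0.05                                        # ~20 Hz period (coarse)
    n=$((n + 1))
done
echo "completed $n periods"
```

Running such a load at a higher frequency only lengthens the sleep phase of
each period; the work per period, and hence throughput, is unchanged.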
On Mon, 2020-12-21 at 17:11 +0100, Rafael J. Wysocki wrote:
> Hi,
>
> On Fri, Dec 18, 2020 at 5:22 PM Giovanni Gherdovich wrote:
> >
> > Gitsource: this test show the most compelling case against the
> > sugov-HWP.desired series: on the Cascade Lake sugov-HWP.desired is 10%
> > faster than sugov-HWP.min (it was expected to be slower!) and 35% less
> > efficient (we expected more performance-per-watt, not less).
>
> This is a bit counter-intuitive, so it is good to try to understand
> what's going on instead of drawing conclusions right away from pure
> numbers.
>
> My interpretation of the available data is that gitsource benefits
> from the "race-to-idle" effect in terms of energy-efficiency which
> also causes it to suffer in terms of performance. Namely, completing
> the given piece of work faster causes some CPU idle time to become
> available and that effectively reduces power, but it also increases
> the response time (by the idle state exit latency) which causes
> performance to drop. Whether or not this effect can be present depends
> on what CPU idle states are available etc. and it may be a pure
> coincidence.
>
> [snip]

Right, race-to-idle might explain the increased efficiency of HWP.MIN.
As you note, increased exit latencies from idle can also explain the overall
performance difference.

> There is a whole broad category of workloads involving periodic tasks
> that do the same amount of work in every period regardless of the
> frequency they run at (as long as the frequency is sufficient to avoid
> "overrunning" the period) and they almost never benefit from
> "race-to-idle". There is zero benefit from running them too fast and
> the energy-efficiency goes down the sink when that happens.
>
> Now the problem is that with sugov-HWP.min the users who care about
> these workloads don't even have an option to use the task utilization
> history recorded by the scheduler to bias the frequency towards the
> "sufficient" level, because sugov-HWP.min only sets a lower bound on
> the frequency selection to improve the situation, so the choice
> between it and sugov-HWP.desired boils down to whether or not to give
> that option to them and my clear preference is for that option to
> exist. Sorry about that. [Note that it really is an option, though,
> because "pure" HWP is still the default for HWP-enabled systems.]

Sure, the periodic workloads benefit from this patch, Doug's test shows that.

I guess I'm still confused by the difference between setting HWP.DESIRED and
disabling HWP completely. The Intel manual says that a non-zero HWP.DESIRED
"effectively disabl[es] HW autonomous selection", but then continues with
"The Desired_Performance input is non-constraining in terms of Performance
and Energy optimizations, which are independently controlled". The first
statement sounds like HWP is out of the picture (no more autonomous
frequency selections) but the latter part implies there are other
optimizations still available. I'm not sure how to reason about that.

> It may be possible to restore some "race-to-idle" benefits by tweaking
> HWP_REQ.EPP in the future, but that needs to be investigated.
>
> BTW, what EPP value was there on the system where you saw better
> performance under sugov-HWP.desired? If it was greater than zero, it
> would be useful to decrease EPP (by adjusting the
> energy_performance_preference attributes in sysfs for all CPUs) and
> see what happens to the performance difference then.

For sugov-HWP.desired the EPP was 0x80 (the default value).


Giovanni
On Wed, Dec 23, 2020 at 2:08 PM Giovanni Gherdovich <ggherdovich@suse.com> wrote:
>
> On Mon, 2020-12-21 at 17:11 +0100, Rafael J. Wysocki wrote:
> > [snip]
>
> Sure, the periodic workloads benefit from this patch, Doug's test shows that.
>
> I guess I'm still confused by the difference between setting HWP.DESIRED and
> disabling HWP completely. The Intel manual says that a non-zero HWP.DESIRED
> "effectively disabl[es] HW autonomous selection", but then continues with
> "The Desired_Performance input is non-constraining in terms of Performance
> and Energy optimizations, which are independently controlled". The first
> statement sounds like HWP is out of the picture (no more autonomous
> frequency selections) but the latter part implies there are other
> optimizations still available. I'm not sure how to reason about that.

For example, if HWP_REQ.DESIRED is set below the point of maximum
energy-efficiency that is known to the processor, it is allowed to go for
the max energy-efficiency instead of following the hint.

Likewise, if the hint is above the P-state corresponding to the max
performance in the given conditions (i.e. increasing the frequency is not
likely to result in better performance due to some limitations known to the
processor), the processor is allowed to set that P-state instead of
following the hint.

Generally speaking, the processor may not follow the hint if better results
can be achieved by putting the given CPU into a P-state different from the
requested one.

> > It may be possible to restore some "race-to-idle" benefits by tweaking
> > HWP_REQ.EPP in the future, but that needs to be investigated.
> >
> > BTW, what EPP value was there on the system where you saw better
> > performance under sugov-HWP.desired? If it was greater than zero, it
> > would be useful to decrease EPP (by adjusting the
> > energy_performance_preference attributes in sysfs for all CPUs) and
> > see what happens to the performance difference then.
>
> For sugov-HWP.desired the EPP was 0x80 (the default value).

So it would be worth testing with EPP=0x20 (or even lower). Lowering the EPP
should cause the processor to ramp up turbo frequencies faster and it may
also allow higher turbo frequencies to be used.
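The suggested EPP experiment can be done from sysfs. A sketch (raw numeric
EPP values such as 32 = 0x20 are accepted by the intel_pstate
energy_performance_preference attribute on HWP systems; DRY_RUN=1, the
default here, only prints the writes, since applying them needs root):

```shell
# Sketch: set EPP to 0x20 (32) on all CPUs via the cpufreq sysfs attributes,
# per the suggestion above. DRY_RUN=1 (default) prints the commands; run as
# root with DRY_RUN=0 to apply them for real.
DRY_RUN=${DRY_RUN:-1}
for f in /sys/devices/system/cpu/cpu*/cpufreq/energy_performance_preference; do
    [ -e "$f" ] || continue     # attribute absent (no HWP / no cpufreq)
    if [ "$DRY_RUN" = 1 ]; then
        echo "echo 32 > $f"
    else
        echo 32 > "$f"
    fi
done
echo "EPP sweep prepared (DRY_RUN=$DRY_RUN)"
```

Reading the attribute back afterwards shows the effective preference, and
the original value (0x80 = balance_performance) can be restored the same way.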