Message ID | 12630185.O9o76ZdvQC@rjwysocki.net
State      | New
Series     | [RFT,v1] cpuidle: teo: Avoid selecting deepest idle state over-eagerly
On Tue, Feb 4, 2025 at 9:58 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> It has been observed that the recent teo governor update which concluded
> with commit 16c8d7586c19 ("cpuidle: teo: Skip sleep length computation
> for low latency constraints") caused the max-jOPS score of the SPECjbb
> 2015 benchmark [1] on Intel Granite Rapids to decrease by around 1.4%.
> While it may be argued that this is not a significant decrease, the
> previous score can be restored by tweaking the inequality used by teo
> to decide whether or not to preselect the deepest enabled idle state.
> That change also causes the critical-jOPS score of SPECjbb to increase
> by around 2%.
>
> Namely, the likelihood of selecting the deepest enabled idle state in
> teo on the platform in question has increased after commit 13ed5c4a6d9c
> ("cpuidle: teo: Skip getting the sleep length if wakeups are very
> frequent") because some timer wakeups were previously counted as non-
> timer ones and they were effectively added to the left-hand side of the
> inequality deciding whether or not to preselect the deepest idle state.
>
> Many of them are now (accurately) counted as timer wakeups, so the left-
> hand side of that inequality is now effectively smaller in some cases,
> especially when timer wakeups often occur in the range below the target
> residency of the deepest enabled idle state and idle states with target
> residencies below CPUIDLE_FLAG_POLLING are often selected, but the
> majority of recent idle intervals are still above that value most of
> the time. As a result, the deepest enabled idle state may be selected
> more often than it used to be selected in some cases.
>
> To counter that effect, add the sum of the hits metric for all of the
> idle states below the candidate one (which is the deepest enabled idle
> state at that point) to the left-hand side of the inequality mentioned
> above. This will cause it to be more balanced because, in principle,
> putting both timer and non-timer wakeups on both sides of it is more
> consistent than only taking into account the timer wakeups in the range
> above the target residency of the deepest enabled idle state.
>
> Link: https://www.spec.org/jbb2015/
> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/cpuidle/governors/teo.c |    6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> --- a/drivers/cpuidle/governors/teo.c
> +++ b/drivers/cpuidle/governors/teo.c
> @@ -349,13 +349,13 @@
>  	}
>
>  	/*
> -	 * If the sum of the intercepts metric for all of the idle states
> -	 * shallower than the current candidate one (idx) is greater than the
> +	 * If the sum of the intercepts and hits metric for all of the idle
> +	 * states below the current candidate one (idx) is greater than the
>  	 * sum of the intercepts and hits metrics for the candidate state and
>  	 * all of the deeper states, a shallower idle state is likely to be a
>  	 * better choice.
>  	 */
> -	if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
> +	if (2 * (idx_intercept_sum + idx_hit_sum) > cpu_data->total) {
>  		int first_suitable_idx = idx;
>
>  		/*

For easier reference/testing this has been exposed in the git branch at

https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/log/?h=experimental/teo-tweak

on top of the cpuidle material that went into 6.14-rc1.
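To see what the tweak does to the decision, the two comparisons can be pulled out into a standalone sketch. Only the two inequalities come from the patch; the helper names and the counter values below are made up for illustration. Algebraically, the old test is 2*intercepts + hits > total and the new one is 2*intercepts + 2*hits > total, so timer wakeups ("hits") recorded for states below the candidate now weigh twice as much toward preferring a shallower state:

#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical counter values, not taken from a real trace:
 * intercepts: non-timer wakeups attributed to states shallower than the
 *             candidate (deepest enabled) state,
 * hits:       timer wakeups attributed to those same states,
 * total:      sum of the intercepts and hits metrics over all states.
 */
static bool prefer_shallower_old(unsigned int intercepts, unsigned int hits,
				 unsigned int total)
{
	/* Old check: only the intercepts appear on the left-hand side. */
	return 2 * intercepts > total - hits;
}

static bool prefer_shallower_new(unsigned int intercepts, unsigned int hits,
				 unsigned int total)
{
	/* New check: intercepts and hits below the candidate on the left. */
	return 2 * (intercepts + hits) > total;
}

int main(void)
{
	unsigned int intercepts = 30, hits = 25, total = 100;

	/* Old: 60 > 75 is false; new: 110 > 100 is true, so the tweaked
	 * inequality switches to looking for a shallower state here. */
	printf("old: %d, new: %d\n",
	       prefer_shallower_old(intercepts, hits, total),
	       prefer_shallower_new(intercepts, hits, total));
	return 0;
}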
Hi Rafael,

On 2025.02.04 12:58 Rafael J. Wysocki wrote:
>
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> It has been observed that the recent teo governor update which concluded
> with commit 16c8d7586c19 ("cpuidle: teo: Skip sleep length computation
> for low latency constraints") caused the max-jOPS score of the SPECjbb
> 2015 benchmark [1] on Intel Granite Rapids to decrease by around 1.4%.

... snip ...

> -	if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
> +	if (2 * (idx_intercept_sum + idx_hit_sum) > cpu_data->total) {

I have only just started testing the recent idle governor changes,
and have not gotten very far yet.

There is a significant increase in processor package power during idle
with this patch, about a 5 times increase (400%).

My processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
Distro: Ubuntu 24.04.1, server, no desktop GUI.
CPU frequency scaling driver: intel_pstate
HWP: disabled.
CPU frequency scaling governor: performance

Idle states:
$ grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
/sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL
/sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1_ACPI
/sys/devices/system/cpu/cpu0/cpuidle/state2/name:C2_ACPI
/sys/devices/system/cpu/cpu0/cpuidle/state3/name:C3_ACPI

Test durations were >= 45 minutes each.

Kernel 6.14-rc1: includes "cpuidle: teo: Cleanups and very frequent wakeups handling update"
Average Idle Power: teo governor:  2.199 watts (+25.51%)
Average Idle Power: menu governor: 1.873 watts (+6.91%)

Kernel 6.14-rc1-p: added this patch for teo and "cpuidle: menu: Avoid discarding
useful information when processing recent idle intervals" for menu
Average Idle Power: teo governor:  9.401 watts (+436.6%)
  Only 69.61% of idle time is in the deepest idle state. More typically it would be 98% to 99%.
  28.6531% of idle time is in state 1. More typically it would be 0.03%.
Average Idle Power: menu governor: 1.820 watts (+3.9%)

Kernel 6.13: before "cpuidle: teo: Cleanups and very frequent wakeups handling update"
Average Idle Power: teo governor:  1.752 watts (reference: 0.0%)
Average Idle Power: menu governor: 1.909 watts (+9.0%)

... Doug
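As an aside on how per-state residency percentages like the 69.61% / 28.6531% figures above can be obtained: the kernel exposes a cumulative microsecond residency counter per idle state in sysfs, so a rough sampling tool might look like the sketch below. This is my own illustration under that assumption, not Doug's actual tooling, and it is hard-coded to cpu0 and the four states listed in the test setup.

/*
 * Rough sketch, not Doug's actual tooling: sample the cumulative per-state
 * residency counters for cpu0 twice and print each state's share of the idle
 * time accumulated in between.  Assumes the standard
 * /sys/devices/system/cpu/cpu0/cpuidle/stateN/time files (microseconds).
 */
#include <stdio.h>
#include <unistd.h>

#define NR_STATES 4	/* POLL, C1_ACPI, C2_ACPI, C3_ACPI on the test box */

static long long read_state_time_us(int state)
{
	char path[128];
	long long us = 0;
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu0/cpuidle/state%d/time", state);
	f = fopen(path, "r");
	if (!f)
		return 0;	/* treat a missing state as zero residency */
	if (fscanf(f, "%lld", &us) != 1)
		us = 0;
	fclose(f);
	return us;
}

int main(void)
{
	long long before[NR_STATES], delta[NR_STATES], total = 0;
	int i;

	for (i = 0; i < NR_STATES; i++)
		before[i] = read_state_time_us(i);

	sleep(60);	/* measurement interval while the system sits idle */

	for (i = 0; i < NR_STATES; i++) {
		delta[i] = read_state_time_us(i) - before[i];
		total += delta[i];
	}

	for (i = 0; i < NR_STATES; i++)
		printf("state%d: %.2f%% of cpu0 idle time\n", i,
		       total ? 100.0 * delta[i] / total : 0.0);

	return 0;
}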
Hi Doug,

On Sat, Feb 8, 2025 at 12:40 AM Doug Smythies <dsmythies@telus.net> wrote:
>
> Hi Rafael,
>
> On 2025.02.04 12:58 Rafael J. Wysocki wrote:
> >
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

... snip ...

> I have only just started testing the recent idle governor changes,
> and have not gotten very far yet.
>
> There is a significant increase in processor package power during idle
> with this patch, about a 5 times increase (400%).

Thanks for this data point!

The restoration of the 1.4% benchmark score is not worth this cost IMV.

> My processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
> Distro: Ubuntu 24.04.1, server, no desktop GUI.
> CPU frequency scaling driver: intel_pstate
> HWP: disabled.
> CPU frequency scaling governor: performance
>
> Idle states:
> $ grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
> /sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL
> /sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1_ACPI
> /sys/devices/system/cpu/cpu0/cpuidle/state2/name:C2_ACPI
> /sys/devices/system/cpu/cpu0/cpuidle/state3/name:C3_ACPI
>
> Test durations were >= 45 minutes each.
>
> Kernel 6.14-rc1: includes "cpuidle: teo: Cleanups and very frequent wakeups handling update"
> Average Idle Power: teo governor:  2.199 watts (+25.51%)
> Average Idle Power: menu governor: 1.873 watts (+6.91%)

menu hasn't changed in 6.14-rc1, so this would be variation between
runs I suppose.

> Kernel 6.14-rc1-p: added this patch for teo and "cpuidle: menu: Avoid discarding
> useful information when processing recent idle intervals" for menu
> Average Idle Power: teo governor:  9.401 watts (+436.6%)
>   Only 69.61% of idle time is in the deepest idle state. More typically it would be 98% to 99%.

Ah, not good.

OK, this clearly doesn't go in the right direction.

>   28.6531% of idle time is in state 1. More typically it would be 0.03%.
> Average Idle Power: menu governor: 1.820 watts (+3.9%)
>
> Kernel 6.13: before "cpuidle: teo: Cleanups and very frequent wakeups handling update"
> Average Idle Power: teo governor:  1.752 watts (reference: 0.0%)
> Average Idle Power: menu governor: 1.909 watts (+9.0%)

Thanks, I'm just going to drop this patch then.

If you don't mind, I'll have a couple more teo updates for testing.
On Fri, 2025-02-07 at 15:40 -0800, Doug Smythies wrote:
> I have only just started testing the recent idle governor changes,
> and have not gotten very far yet.

Hi Doug,

there is the menu governor patch too. While it helps the server, I did not test
it on a client system. Maybe you would like to check it too? The subject is:

[RFT][PATCH v1 0/5] cpuidle: menu: Avoid discarding useful information when processing recent idle intervals

Thanks!
Hi Artem,

On 2025.02.09 01:15 Artem Bityutskiy wrote:
> On Fri, 2025-02-07 at 15:40 -0800, Doug Smythies wrote:
>> I have only just started testing the recent idle governor changes,
>> and have not gotten very far yet.
>
> Hi Doug,
>
> there is the menu governor patch too. While it helps the server, I did not test
> it on a client system. Maybe you would like to check it too? The subject is:
>
> [RFT][PATCH v1 0/5] cpuidle: menu: Avoid discarding useful information when processing recent idle intervals

Yes, I have included that patch set in my kernel 6.14-rc1 plus patches under test.
Other than the idle data, I don't have any other data to present yet, as I have
been acquiring baseline data.
On 2025.02.08 03:25 Rafael J. Wysocki wrote:
> On Sat, Feb 8, 2025 at 12:40 AM Doug Smythies <dsmythies@telus.net> wrote:
>> On 2025.02.04 12:58 Rafael J. Wysocki wrote:
>>>
>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

... snip ...

>>
>> Test durations were >= 45 minutes each.
>>
>> Kernel 6.14-rc1: includes "cpuidle: teo: Cleanups and very frequent wakeups handling update"
>> Average Idle Power: teo governor:  2.199 watts (+25.51%)
>> Average Idle Power: menu governor: 1.873 watts (+6.91%)
>
> menu hasn't changed in 6.14-rc1, so this would be variation between
> runs I suppose.

Perhaps the way I presented my data wasn't the best method. All the percentages
were relative to kernel 6.13 and the teo idle governor. Relative to the kernel
6.13 menu governor that would be -1.9%, certainly within the noise floor.

>> Kernel 6.14-rc1-p: added this patch for teo and "cpuidle: menu: Avoid discarding
>> useful information when processing recent idle intervals" for menu
>> Average Idle Power: teo governor:  9.401 watts (+436.6%)
>>   Only 69.61% of idle time is in the deepest idle state. More typically it would be 98% to 99%.
>
> Ah, not good.
>
> OK, this clearly doesn't go in the right direction.
>
>>   28.6531% of idle time is in state 1. More typically it would be 0.03%.
>> Average Idle Power: menu governor: 1.820 watts (+3.9%)
>>
>> Kernel 6.13: before "cpuidle: teo: Cleanups and very frequent wakeups handling update"
>> Average Idle Power: teo governor:  1.752 watts (reference: 0.0%)
>> Average Idle Power: menu governor: 1.909 watts (+9.0%)
>
> Thanks, I'm just going to drop this patch then.
>
> If you don't mind, I'll have a couple more teo updates for testing.

O.K. I'll watch for the patches. Plenty to do in the meantime, acquiring baseline
data and continuing with the "cpuidle: menu: Avoid discarding useful information
when processing recent idle intervals" patch set portion of kernel 6.14-rc1-p.
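(For reference, the -1.9% figure follows directly from the numbers quoted above: menu on 6.14-rc1 averaged 1.873 W versus the 6.13 menu baseline of 1.909 W, and (1.873 - 1.909) / 1.909 is roughly -1.9%.)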
On 2/4/25 20:58, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> It has been observed that the recent teo governor update which concluded
> with commit 16c8d7586c19 ("cpuidle: teo: Skip sleep length computation
> for low latency constraints") caused the max-jOPS score of the SPECjbb
> 2015 benchmark [1] on Intel Granite Rapids to decrease by around 1.4%.

... snip ...

> -	if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
> +	if (2 * (idx_intercept_sum + idx_hit_sum) > cpu_data->total) {
>  		int first_suitable_idx = idx;

I'm curious, are Doug's numbers reproducible?
Or could you share the idle state usage numbers? Is that explainable?
Seems like a lot and it does worry me that I can't reproduce anything
as drastic.

I did a bit of x86 as well and got the following for Raptor Lake (I won't post
the non-x86 numbers now, but teo-tweak performs comparably to teo mainline).

Idle 5 min:
gov        iter  Joules  idles  idle_misses  idle_miss_ratio  belows  aboves
teo        0     170.02  12690  646          0.051            371     275
teo        1     123.17   8361  517          0.062            281     236
teo        2     122.76   7741  347          0.045            262      85
teo        3     118.5    8699  668          0.077            307     361
teo        4      80.46   8113  443          0.055            264     179
teo-tweak  0     115.05  10223  803          0.079            323     480
teo-tweak  1     164.41   8523  631          0.074            263     368
teo-tweak  2     163.91   8409  711          0.085            256     455
teo-tweak  3     137.22   8581  721          0.084            261     460
teo-tweak  4     174.95   8703  675          0.078            261     414
teo        0     164.34   8443  516          0.061            303     213
teo        1     167.85   8767  492          0.056            256     236
teo        2     166.25   7835  406          0.052            263     143
teo        3     189.77   8865  493          0.056            276     217
teo        4     136.97   9185  467          0.051            286     181

At least in the idle case you can see an increase in 'above' idle_misses.

Firefox Youtube 4K video playback 2 min:
gov        iter  Joules  idles  idle_misses  idle_miss_ratio  belows  aboves
teo        0     260.09  67404  11189        0.166            1899    9290
teo        1     273.71  76649  12155        0.159            2233    9922
teo        2     231.45  59559  10344        0.174            1747    8597
teo        3     202.61  58223  10641        0.183            1748    8893
teo        4     217.56  61411  10744        0.175            1809    8935
teo-tweak  0     227.99  61209  11251        0.184            2110    9141
teo-tweak  1     222.44  61959  10323        0.167            1474    8849
teo-tweak  2     218.1   64380  11080        0.172            1845    9235
teo-tweak  3     207.4   60183  11267        0.187            1929    9338
teo-tweak  4     217.46  61253  10381        0.169            1620    8761
menu       0     225.72  87871  26032        0.296            25412    620
menu       1     200.36  86577  24712        0.285            24486    226
menu       2     214.79  84885  24750        0.292            24556    194
menu       3     206.07  88007  25938        0.295            25683    255
menu       4     216.48  88700  26504        0.299            26302    202

(Idle numbers aren't really reflective in energy used -> dominated by
active power.)
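(A note on reading those columns, inferred from the numbers themselves rather than from a definition in the test harness: in every row 'belows' + 'aboves' = 'idle_misses', e.g. 371 + 275 = 646 for the first teo row, and 'idle_miss_ratio' = idle_misses / idles, e.g. 646 / 12690 is roughly 0.051, i.e. the fraction of idle-state selections that turned out to be either too deep or too shallow.)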
Hi Christian,

Thank you for trying to repeat my idle test results.

On 2025.02.13 06:07 Christian Loehle wrote:
> I'm curious, are Doug's numbers reproducible?
> Or could you share the idle state usage numbers? Is that explainable?
> Seems like a lot and it does worry me that I can't reproduce anything
> as drastic.

While I am having some severe repeatability issues with my testing in general,
not for this test. Please recall my test conditions, because the CPU frequency
scaling governor does matter: I was using "performance". The power comes from
the high amount of time in idle state 1. I verified the idle state 1 power use
by disabling all other idle states. I also have HWP disabled, but do not know
if it matters. If I use the "powersave" governor (driver is intel_pstate, not
intel_cpufreq), then idle power is < 2 watts.

Anyway, my data:
http://smythies.com/~doug/linux/idle/teo-6.14/idle/perf/

> (Idle numbers aren't really reflective in energy used -> dominated by
> active power.)

Well, it depends on idle time versus active time on the computer. I also
measured the difference in the mains power: about 20%, from 43.2 watts to
51.4 watts.

I am about to send a long email with all of my test results. Since I can not
seem to function without making graphs, it has a lot of links to graphs.
On Thu, Feb 13, 2025 at 3:08 PM Christian Loehle <christian.loehle@arm.com> wrote:
>
> On 2/4/25 20:58, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

... snip ...

> I'm curious, are Doug's numbers reproducible?
> Or could you share the idle state usage numbers? Is that explainable?
> Seems like a lot and it does worry me that I can't reproduce anything
> as drastic.

Well, it may not be drastic, but the results below pretty much confirm
that this particular change isn't going in the right direction IMV.

> I did a bit of x86 as well and got the following for Raptor Lake (I won't post
> the non-x86 numbers now, but teo-tweak performs comparably to teo mainline).
>
> Idle 5 min:
> gov        iter  Joules  idles  idle_misses  idle_miss_ratio  belows  aboves
> teo        0     170.02  12690  646          0.051            371     275
> teo        1     123.17   8361  517          0.062            281     236
> teo        2     122.76   7741  347          0.045            262      85
> teo        3     118.5    8699  668          0.077            307     361
> teo        4      80.46   8113  443          0.055            264     179
> teo-tweak  0     115.05  10223  803          0.079            323     480
> teo-tweak  1     164.41   8523  631          0.074            263     368
> teo-tweak  2     163.91   8409  711          0.085            256     455
> teo-tweak  3     137.22   8581  721          0.084            261     460
> teo-tweak  4     174.95   8703  675          0.078            261     414

So basically the energy usage goes up, idle misses go up, the idle miss
ratio goes up and the "above" misses go way up. Not so good as far as
I'm concerned.

> teo        0     164.34   8443  516          0.061            303     213
> teo        1     167.85   8767  492          0.056            256     236
> teo        2     166.25   7835  406          0.052            263     143
> teo        3     189.77   8865  493          0.056            276     217
> teo        4     136.97   9185  467          0.051            286     181

The above is menu I think?

> At least in the idle case you can see an increase in 'above' idle_misses.
>
> Firefox Youtube 4K video playback 2 min:
> gov        iter  Joules  idles  idle_misses  idle_miss_ratio  belows  aboves
> teo        0     260.09  67404  11189        0.166            1899    9290
> teo        1     273.71  76649  12155        0.159            2233    9922
> teo        2     231.45  59559  10344        0.174            1747    8597
> teo        3     202.61  58223  10641        0.183            1748    8893
> teo        4     217.56  61411  10744        0.175            1809    8935
> teo-tweak  0     227.99  61209  11251        0.184            2110    9141
> teo-tweak  1     222.44  61959  10323        0.167            1474    8849
> teo-tweak  2     218.1   64380  11080        0.172            1845    9235
> teo-tweak  3     207.4   60183  11267        0.187            1929    9338
> teo-tweak  4     217.46  61253  10381        0.169            1620    8761

And it doesn't improve things drastically here, although on average it
does reduce energy usage.

> menu       0     225.72  87871  26032        0.296            25412    620
> menu       1     200.36  86577  24712        0.285            24486    226
> menu       2     214.79  84885  24750        0.292            24556    194
> menu       3     206.07  88007  25938        0.295            25683    255
> menu       4     216.48  88700  26504        0.299            26302    202
>
> (Idle numbers aren't really reflective in energy used -> dominated by
> active power.)
On 2/14/25 21:34, Rafael J. Wysocki wrote:
> On Thu, Feb 13, 2025 at 3:08 PM Christian Loehle
> <christian.loehle@arm.com> wrote:
>>
>> On 2/4/25 20:58, Rafael J. Wysocki wrote:
>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

... snip ...

>>
>> I'm curious, are Doug's numbers reproducible?
>> Or could you share the idle state usage numbers? Is that explainable?
>> Seems like a lot and it does worry me that I can't reproduce anything
>> as drastic.
>
> Well, it may not be drastic, but the results below pretty much confirm
> that this particular change isn't going in the right direction IMV.

Agreed, I'd still be eager to pick up something like Doug reported with
my tests too :(

>
>> I did a bit of x86 as well and got the following for Raptor Lake (I won't post
>> the non-x86 numbers now, but teo-tweak performs comparably to teo mainline).
>>
>> Idle 5 min:
>> gov        iter  Joules  idles  idle_misses  idle_miss_ratio  belows  aboves
>> teo        0     170.02  12690  646          0.051            371     275

... snip ...

>> teo-tweak  4     174.95   8703  675          0.078            261     414
>
> So basically the energy usage goes up, idle misses go up, the idle miss
> ratio goes up and the "above" misses go way up. Not so good as far as
> I'm concerned.
>
>> teo        0     164.34   8443  516          0.061            303     213
>> teo        1     167.85   8767  492          0.056            256     236
>> teo        2     166.25   7835  406          0.052            263     143
>> teo        3     189.77   8865  493          0.056            276     217
>> teo        4     136.97   9185  467          0.051            286     181
>
> The above is menu I think?

No, this is teo again; I just wanted to include it because the variance is
quite large (not unusual for idle). The full table (with mainline menu):

gov        iter  Joules  idles  idle_misses  idle_miss_ratio  belows  aboves
teo        0     170.02  12690  646          0.051            371     275
teo        1     123.17   8361  517          0.062            281     236
teo        2     122.76   7741  347          0.045            262      85
teo        3     118.5    8699  668          0.077            307     361
teo        4      80.46   8113  443          0.055            264     179
teo-tweak  0     115.05  10223  803          0.079            323     480
teo-tweak  1     164.41   8523  631          0.074            263     368
teo-tweak  2     163.91   8409  711          0.085            256     455
teo-tweak  3     137.22   8581  721          0.084            261     460
teo-tweak  4     174.95   8703  675          0.078            261     414
teo        0     164.34   8443  516          0.061            303     213
teo        1     167.85   8767  492          0.056            256     236
teo        2     166.25   7835  406          0.052            263     143
teo        3     189.77   8865  493          0.056            276     217
teo        4     136.97   9185  467          0.051            286     181
menu       0     180.13   8925  343          0.038            303      40
menu       1     208.49   8717  345          0.040            312      33
menu       2     168.38   8451  321          0.038            274      47
menu       3     139.48   7853  310          0.039            289      21
menu       4     166.61   7769  339          0.044            322      17

>
>> At least in the idle case you can see an increase in 'above' idle_misses.

Agreed, idle_misses clearly go up as mentioned.

>>
>> Firefox Youtube 4K video playback 2 min:

... snip ...

> And it doesn't improve things drastically here, although on average it
> does reduce energy usage.
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -349,13 +349,13 @@
 	}
 
 	/*
-	 * If the sum of the intercepts metric for all of the idle states
-	 * shallower than the current candidate one (idx) is greater than the
+	 * If the sum of the intercepts and hits metric for all of the idle
+	 * states below the current candidate one (idx) is greater than the
 	 * sum of the intercepts and hits metrics for the candidate state and
 	 * all of the deeper states, a shallower idle state is likely to be a
 	 * better choice.
 	 */
-	if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
+	if (2 * (idx_intercept_sum + idx_hit_sum) > cpu_data->total) {
 		int first_suitable_idx = idx;
 
 		/*