
[RFC/RFT] cpufreq: intel_pstate: Accept passive mode with HWP enabled

Message ID 2931539.RsFqoHxarq@kreacher
State New
Series [RFC/RFT] cpufreq: intel_pstate: Accept passive mode with HWP enabled

Commit Message

Rafael J. Wysocki May 26, 2020, 6:20 p.m. UTC
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Allow intel_pstate to work in the passive mode with HWP enabled and
make it set the HWP minimum performance limit to 75% of the P-state
value corresponding to the target frequency supplied by the cpufreq
governor, so as to prevent the HWP algorithm and the CPU scheduler
from working against each other at least when the schedutil governor
is in use.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

This is a replacement for https://patchwork.kernel.org/patch/11563615/ that
uses the HWP floor (minimum performance limit) as the feedback to the HWP
algorithm (instead of the EPP).

The INTEL_CPUFREQ_TRANSITION_DELAY_HWP is still 5000 and the previous comments
still apply to it.

In addition to that, the 75% fraction used in intel_cpufreq_adjust_hwp() can be
adjusted too, but I would like to use a value with a power-of-2 denominator for
that (so the next candidate would be 7/8).
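
To make the arithmetic concrete, here is a minimal Python sketch of a floor computed with a power-of-2 denominator. It is not the code from the patch (the diff body is not reproduced on this page); the helper name `hwp_floor` and the integer rounding are assumptions made only for illustration.

```python
def hwp_floor(target_pstate: int, shift: int = 2) -> int:
    """Return target_pstate * (1 - 1/2**shift) using a shift and a subtraction.

    shift=2 yields the 3/4 (75%) fraction, shift=3 yields 7/8.
    Hypothetical helper: the rounding behaviour is an illustrative choice,
    not something taken from intel_pstate.
    """
    return target_pstate - (target_pstate >> shift)


if __name__ == "__main__":
    # Print the 3/4 and 7/8 floors for a few example P-state values.
    for target in (8, 23, 37, 46):
        print(target, hwp_floor(target, 2), hwp_floor(target, 3))
```

With a power-of-2 denominator the fraction needs no division, which is presumably why 7/8 is the natural next candidate after 3/4.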

Everyone who is able to test this is kindly requested to do so and let me know
the outcome.

Of course, the documentation still needs to be updated.  Also, the EPP can be
handled in analogy with the active mode now, but that part can be added in a
separate patch on top of this one.

Thanks!

---
 drivers/cpufreq/intel_pstate.c |  119 ++++++++++++++++++++++++++++++-----------
 1 file changed, 88 insertions(+), 31 deletions(-)

Comments

Doug Smythies July 8, 2020, 2:41 p.m. UTC | #1
On 2020.06.30 11:41 Doug Smythies wrote:
>
> Hi Srinivas,
>
> O.K. let's try this again, starting a new thread, with address list similar to a few weeks ago.
> I believe I have untangled my multiple issues, such that this e-mail should be only about
> the single issue of HWP capable processors incorrectly deciding to lower the CPU frequency
> under some conditions. Also, my previous assertion as to the issue was indeed incorrect.
>
> I now:
> . never use x86_energy_perf_policy.
> . For HWP disabled: never change from active to passive or vice versa, but rather do it via boot.
> . after boot always check and reset the various power limit log bits that are set.
> . never compile the kernel (well, until after any tests), which will set those bits again.
> . never run prime95 high heat torture test, which will set those bits again.
> . Note that the tests done for this e-mail never ever set those bits again.
> . Invented an entirely new way to manifest, demonstrate, and exploit the issue (also mentioned June 6th).
> . All tests were repeated on another HWP capable computer, so an i5-9600K and an i5-6200U.
>
> New method (old was periodic workflow):
>
> Long busy, short gap, busy but taking loop time samples so as to estimate CPU frequency.
> I am calling it an inverse impulse response test.
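
The shape of the test described above can be sketched in a few lines of Python. This is only an illustration and not the actual test program: interpreter overhead and sleep granularity make it far cruder than the real measurement, the function names and busy durations are assumptions, and only the 842 microsecond gap comes from this thread. Each sample is the time taken by a fixed amount of work, so a jump in the sample values after the gap would suggest a lower CPU frequency.

```python
import time


def busy_samples(duration_s: float, chunk_iters: int = 20_000) -> list[float]:
    """Spin for about duration_s seconds; return the time taken by each fixed-size chunk of work."""
    samples = []
    end = time.perf_counter() + duration_s
    while time.perf_counter() < end:
        start = time.perf_counter()
        x = 0
        for i in range(chunk_iters):  # fixed work, so elapsed time tracks CPU speed
            x += i
        samples.append(time.perf_counter() - start)
    return samples


def inverse_impulse_test(busy_s: float = 0.05, gap_s: float = 842e-6,
                         measure_s: float = 0.05) -> list[float]:
    """Long busy, short gap, then busy again while sampling loop times."""
    busy_samples(busy_s)             # long busy period (samples discarded)
    time.sleep(gap_s)                # short gap
    return busy_samples(measure_s)   # busy again, keeping loop time samples


if __name__ == "__main__":
    samples = inverse_impulse_test()
    print(f"{len(samples)} samples, min {min(samples):.6f} s, max {max(samples):.6f} s")
```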

>
> Assertion:
>
> If the short sleep is somehow simultaneous with some sort of 5.0 millisecond (200 Hertz)
> periodic event (either in HWP itself, or via the driver, I am unable to determine which,
> but think it is inside the black box that is HWP),

I have been attempting to characterise the "black box" that is HWP.
In terms of system response versus EPP, I only observe the HWP loop time as the
response variable.

0 <= EPP <= 1 : My test cannot measure loop time.
2 <= EPP <= 39 : HWP servo loop time 2 milliseconds
40 <= EPP <= 55 : HWP servo loop time 3 milliseconds
56 <= EPP <= 79 : HWP servo loop time 4 milliseconds
80 <= EPP <= 133 : HWP servo loop time 5 milliseconds
134 <= EPP <= 143 : HWP servo loop time 6 milliseconds
144 <= EPP <= 154 : HWP servo loop time 7 milliseconds
155 <= EPP <= 175 : HWP servo loop time 8 milliseconds
176 <= EPP <= 255 : HWP servo loop time 9 milliseconds

If there are other system response differences within
those groups, I haven't been able to detect them,
but would be grateful for any further insight.

Otherwise, in future, I do not see a need to test anything
other than 9 values of EPP, one from each group.
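
The grouping above can be written as a small lookup. The following is only a sketch: the function name `hwp_loop_time_ms` is hypothetical, and the boundaries are simply the ones measured on the two processors tested, so they may not hold for other CPU models.

```python
import bisect

# Upper EPP bound of each group and the observed loop time in milliseconds,
# copied from the table above. None marks the group that could not be measured.
_EPP_UPPER = [1, 39, 55, 79, 133, 143, 154, 175, 255]
_LOOP_MS = [None, 2, 3, 4, 5, 6, 7, 8, 9]


def hwp_loop_time_ms(epp: int):
    """Return the observed HWP servo loop time (ms) for an EPP value, or None if unmeasured."""
    if not 0 <= epp <= 255:
        raise ValueError("EPP must be in 0..255")
    # bisect_left finds the first group whose upper bound is >= epp.
    return _LOOP_MS[bisect.bisect_left(_EPP_UPPER, epp)]


if __name__ == "__main__":
    for epp in (0, 32, 64, 128, 196):
        print(epp, hwp_loop_time_ms(epp))
```

Testing one EPP value per group then amounts to picking any value that maps to each distinct result.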

> then there is a possibility that the
> CPU frequency will drop significantly and will take an excessive amount of time to recover.
> Frequency step ups are exactly on 5.0 millisecond boundaries +/- the short gap time.
>
> . The probability is somewhat inconsistent and a function of whatever else the computer is doing.
> . The time to recover is a function of EPP, and if EPP is low enough my test never fails.
> . These tests were all done with default settings.
> . The "5.0" mSec is only for those default settings, it actually depends on EPP.
>   . Crude step boundaries, mSec: EPP=32, 2; EPP=64, 4; EPP=128, 5.00; EPP=196, 9


Now fully understood, as listed above.

> . High level: i5-9600K: 2453 tests, 60 failures, 2.45% fail rate. (HWP - powersave)
> . High level: i5-6200U: 4134 tests, 128 failures, 3.1% fail rate. (HWP - powersave)
> . Low level (capture waveforms): i5-9600K: 1842 captured failure waveforms. See graph.
> . Low level (capture waveforms): i5-6200U: 458 captured failure waveforms. See graph.
> . Verify acpi-cpufreq/ondemand works fine: i5-9600K: 8975 tests. 0 failures.
> . Verify acpi-cpufreq/ondemand works fine: i5-6200U: 8575 tests. 0 failures.


The tests were all done using the teo idle governor.
While the menu governor does not fail for this particular test, it fails
in other scenarios.

I have yet to find a failure scenario when idle state 2 is disabled.
I have captured and analyzed about 400 megabytes of trace data,
and have not been able to isolate an exact correlation.

>
> The short gap was 842 uSeconds for all these tests, and for no particular reason.
>
> While I have not re-done the bounds investigation, I have no reason to doubt
> my previous work, re-stated below:
>
> > Gap definition:
> > lower limit not known, but < 747 uSeconds.
> > Upper limit is between 952 and 955 uSeconds (there will be some overhead uncertainties).


The only new information I have is that the upper bound is bigger.

> > Must be preceded by busy time spanning a couple of HWP sampling boundaries
> > or jiffy boundaries or something (I don't actually know how HWP does stuff).
>
> Rather than point to graphs, which nobody seems to look at, they are attached,
> and so might get stripped for some of you.
>
> ... Doug
>
> Addendum: Some of the MSRs you have requested in the past:
>
> i5-9600K (HWP - powersave after test):
>
> root@s18:/home/doug# /home/doug/c/msr-decoder
> 8.) 0x198: IA32_PERF_STATUS     : CPU 0-5 :   8 :   8 :   8 :   8 :   8 :   8 :
> B.) 0x770: IA32_PM_ENABLE: 1 : HWP enable
> 1.) 0x19C: IA32_THERM_STATUS: 88480000
> 2.) 0x1AA: MSR_MISC_PWR_MGMT: 401CC0 EIST enabled Coordination enabled OOB Bit 8 reset OOB Bit 18 reset
> 3.) 0x1B1: IA32_PACKAGE_THERM_STATUS: 88460000
> 4.) 0x64F: MSR_CORE_PERF_LIMIT_REASONS: 0
> A.) 0x1FC: MSR_POWER_CTL: 3C005D : C1E disable : EEO disable : RHO disable
> 5.) 0x771: IA32_HWP_CAPABILITIES (performance): 108252E : high 46 : guaranteed 37 : efficient 8 : lowest 1
> 6.) 0x774: IA32_HWP_REQUEST:    CPU 0-5 :
>     raw: 80002E08 : 80002E08 : 80002E08 : 80002E08 : 80002E08 : 80002E08 :
>     min:        8 :        8 :        8 :        8 :        8 :        8 :
>     max:       46 :       46 :       46 :       46 :       46 :       46 :
>     des:        0 :        0 :        0 :        0 :        0 :        0 :
>     epp:      128 :      128 :      128 :      128 :      128 :      128 :
>     act:        0 :        0 :        0 :        0 :        0 :        0 :
> 7.) 0x777: IA32_HWP_STATUS: 0 : high 0 : guaranteed 0 : efficient 0 : lowest 0
>
> i5-9600K (no HWP - acpi-cpufreq/ondemand after test):
>
> root@s18:/home/doug/c# /home/doug/c/msr-decoder
> 8.) 0x198: IA32_PERF_STATUS     : CPU 0-5 :   8 :   8 :   8 :   8 :   8 :   8 :
> B.) 0x770: IA32_PM_ENABLE: 0 : HWP disable
> 9.) 0x199: IA32_PERF_CTL        : CPU 0-5 :   8 :   8 :   8 :   8 :   8 :   8 :
> C.) 0x1B0: IA32_ENERGY_PERF_BIAS: CPU 0-5 :   6 :   6 :   6 :   6 :   6 :   6 :
> 1.) 0x19C: IA32_THERM_STATUS: 88480000
> 2.) 0x1AA: MSR_MISC_PWR_MGMT: 401CC0 EIST enabled Coordination enabled OOB Bit 8 reset OOB Bit 18 reset
> 3.) 0x1B1: IA32_PACKAGE_THERM_STATUS: 88460000
> 4.) 0x64F: MSR_CORE_PERF_LIMIT_REASONS: 0
> A.) 0x1FC: MSR_POWER_CTL: 3C005D : C1E disable : EEO disable : RHO disable
>
> i5-6200U (HWP - powersave after test):
>
> 8.) 0x198: IA32_PERF_STATUS : CPU 0-3 : 19 : 19 : 19 : 19 :
> B.) 0x770: IA32_PM_ENABLE: 1 : HWP enable
> 1.) 0x19C: IA32_THERM_STATUS: 88430000
> 2.) 0x1AA: MSR_MISC_PWR_MGMT: 4018C0 EIST enabled Coordination enabled OOB Bit 8 reset OOB Bit 18 reset
> 3.) 0x1B1: IA32_PACKAGE_THERM_STATUS: 88420000
> 4.) 0x64F: MSR_CORE_PERF_LIMIT_REASONS: 0
> A.) 0x1FC: MSR_POWER_CTL: 24005D : C1E disable : EEO enable : RHO enable
> 5.) 0x771: IA32_HWP_CAPABILITIES (performance): 105171C : high 28 : guaranteed 23 : efficient 5 : lowest 1
> 6.) 0x774: IA32_HWP_REQUEST: CPU 0-3 :
>     raw: 80001B04 : 80001B04 : 80001B04 : 80001B04 :
>     min:        4 :        4 :        4 :        4 :
>     max:       27 :       27 :       27 :       27 :
>     des:        0 :        0 :        0 :        0 :
>     epp:      128 :      128 :      128 :      128 :
>     act:        0 :        0 :        0 :        0 :
> 7.) 0x777: IA32_HWP_STATUS: 4 : high 4 : guaranteed 0 : efficient 0 : lowest 0
>
> i5-6200U (no HWP - acpi-cpufreq/ondemand after test):
>
> 8.) 0x198: IA32_PERF_STATUS     : CPU 0-3 :  23 :  23 :  23 :  23 :
> B.) 0x770: IA32_PM_ENABLE: 0 : HWP disable
> 9.) 0x199: IA32_PERF_CTL        : CPU 0-3 :  11 :   5 :   5 :   5 :
> C.) 0x1B0: IA32_ENERGY_PERF_BIAS: CPU 0-3 :   6 :   6 :   6 :   6 :
> 1.) 0x19C: IA32_THERM_STATUS: 88440000
> 2.) 0x1AA: MSR_MISC_PWR_MGMT: 4018C0 EIST enabled Coordination enabled OOB Bit 8 reset OOB Bit 18 reset
> 3.) 0x1B1: IA32_PACKAGE_THERM_STATUS: 88430000
> 4.) 0x64F: MSR_CORE_PERF_LIMIT_REASONS: 0
> A.) 0x1FC: MSR_POWER_CTL: 24005D : C1E disable : EEO enable : RHO enable
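
For readers who want to check the decoded fields against the raw values in the dump above, here is a minimal Python sketch. It is not the msr-decoder program used to produce the dump; the function names are hypothetical, and only the low 32 bits of each register are handled, assuming the standard 8-bit field layout of these two MSRs.

```python
def decode_hwp_request(raw: int) -> dict:
    """Split the low 32 bits of IA32_HWP_REQUEST (MSR 0x774) into its 8-bit fields."""
    return {
        "min": raw & 0xFF,           # bits 7:0, minimum performance
        "max": (raw >> 8) & 0xFF,    # bits 15:8, maximum performance
        "des": (raw >> 16) & 0xFF,   # bits 23:16, desired performance
        "epp": (raw >> 24) & 0xFF,   # bits 31:24, energy performance preference
    }


def decode_hwp_capabilities(raw: int) -> dict:
    """Split the low 32 bits of IA32_HWP_CAPABILITIES (MSR 0x771) into its 8-bit fields."""
    return {
        "high": raw & 0xFF,                # bits 7:0, highest performance
        "guaranteed": (raw >> 8) & 0xFF,   # bits 15:8
        "efficient": (raw >> 16) & 0xFF,   # bits 23:16
        "lowest": (raw >> 24) & 0xFF,      # bits 31:24
    }


if __name__ == "__main__":
    print(decode_hwp_request(0x80002E08))       # i5-9600K raw value from the dump
    print(decode_hwp_capabilities(0x108252E))   # i5-9600K raw value from the dump
```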
Srinivas Pandruvada July 8, 2020, 2:54 p.m. UTC | #2
On Wed, 2020-07-08 at 07:41 -0700, Doug Smythies wrote:
> On 2020.06.30 11:41 Doug Smythies wrote:
> > Hi Srinivas,
> >
> > O.K. let's try this again, starting a new thread, with address list similar to a few weeks ago.
> > I believe I have untangled my multiple issues, such that this e-mail should be only about
> > the single issue of HWP capable processors incorrectly deciding to lower the CPU frequency
> > under some conditions. Also, my previous assertion as to the issue was indeed incorrect.
> >
> > I now:
> > . never use x86_energy_perf_policy.
> > . For HWP disabled: never change from active to passive or vice versa, but rather do it via boot.
> > . after boot always check and reset the various power limit log bits that are set.
> > . never compile the kernel (well, until after any tests), which will set those bits again.
> > . never run prime95 high heat torture test, which will set those bits again.
> > . Note that the tests done for this e-mail never ever set those bits again.
> > . Invented an entirely new way to manifest, demonstrate, and exploit the issue (also mentioned June 6th).
> > . All tests were repeated on another HWP capable computer, so an i5-9600K and an i5-6200U.
> >
> > New method (old was periodic workflow):
> >
> > Long busy, short gap, busy but taking loop time samples so as to estimate CPU frequency.
> > I am calling it an inverse impulse response test.
> >
> > Assertion:
> >
> > If the short sleep is somehow simultaneous with some sort of 5.0 millisecond (200 Hertz)
> > periodic event (either in HWP itself, or via the driver, I am unable to determine which,
> > but think it is inside the black box that is HWP),
>
> I have been attempting to characterise the "black box" that is HWP.
> In terms of system response versus EPP, I only observe the HWP loop time as the
> response variable.
>
> 0 <= EPP <= 1 : My test cannot measure loop time.
> 2 <= EPP <= 39 : HWP servo loop time 2 milliseconds
> 40 <= EPP <= 55 : HWP servo loop time 3 milliseconds
> 56 <= EPP <= 79 : HWP servo loop time 4 milliseconds
> 80 <= EPP <= 133 : HWP servo loop time 5 milliseconds
> 134 <= EPP <= 143 : HWP servo loop time 6 milliseconds
> 144 <= EPP <= 154 : HWP servo loop time 7 milliseconds
> 155 <= EPP <= 175 : HWP servo loop time 8 milliseconds
> 176 <= EPP <= 255 : HWP servo loop time 9 milliseconds
>
> If there are other system response differences within
> those groups, I haven't been able to detect them,
> but would be grateful for any further insight.
>
> Otherwise, in future, I do not see a need to test anything
> other than 9 values of EPP, one from each group.
>

Thanks Doug,
I think they are enough. But there is no guarantee that every CPU model
will have the same results, as the power curve will be different.

Thanks,
Srinivas

Doug Smythies July 8, 2020, 3:39 p.m. UTC | #3
On 2020.07.08 07:54 srinivas pandruvada wrote:
> On Wed, 2020-07-08 at 07:41 -0700, Doug Smythies wrote:
> > On 2020.06.30 11:41 Doug Smythies wrote:

...

> > > If the short sleep is somehow simultaneous with some sort of 5.0 millisecond (200 Hertz)
> > > periodic event (either in HWP itself, or via the driver, I am unable to determine which,
> > > but think it is inside the black box that is HWP),
> >
> > I have been attempting to characterise the "black box" that is HWP.
> > In terms of system response versus EPP, I only observe the HWP loop time as the
> > response variable.
> >
> > 0 <= EPP <= 1 : My test cannot measure loop time.
> > 2 <= EPP <= 39 : HWP servo loop time 2 milliseconds
> > 40 <= EPP <= 55 : HWP servo loop time 3 milliseconds
> > 56 <= EPP <= 79 : HWP servo loop time 4 milliseconds
> > 80 <= EPP <= 133 : HWP servo loop time 5 milliseconds
> > 134 <= EPP <= 143 : HWP servo loop time 6 milliseconds
> > 144 <= EPP <= 154 : HWP servo loop time 7 milliseconds
> > 155 <= EPP <= 175 : HWP servo loop time 8 milliseconds
> > 176 <= EPP <= 255 : HWP servo loop time 9 milliseconds
> >
> > If there are other system response differences within
> > those groups, I haven't been able to detect them,
> > but would be grateful for any further insight.
> >
> > Otherwise, in future, I do not see a need to test anything
> > other than 9 values of EPP, one from each group.
> >
> Thanks Doug,
> I think they are enough. But there is no guarantee that every CPU model
> will have the same results, as the power curve will be different.


Yes, of course the response curve is different between CPU models.

However, the basic loop times seem to be the same,
although I admit to having limited data from other CPU models.

... Doug
Doug Smythies Aug. 2, 2020, 2:36 p.m. UTC | #4
Hi Srinivas, or anybody at Intel,

Is there any chance of you looking into this issue?
I first raised it over 2 months ago.

Srinivas Pandruvada Aug. 2, 2020, 6:18 p.m. UTC | #5
On Sun, 2020-08-02 at 07:36 -0700, Doug Smythies wrote:
> Hi Srinivas, or anybody at Intel,
>
> Is there any chance of you looking into this issue?
> I first raised it over 2 months ago.

Hi Doug,

Unfortunately, I have not gotten to this yet.

Thanks,
Srinivas


> 

> On 2020.07.08 07:41 Doug Smythies wrote:

> > On 2020.06.30 11:41 Doug Smythies wrote:

> > > Hi Srinivas,

> > > 

> > > O.K. let's try this again, starting a new thread, with address

> > > list similar to a few weeks ago.

> > > I believe I have untangled my multiple issues, such that this e-

> > > mail should be only about

> > > the single issue of HWP capable processors incorrectly deciding

> > > to lower the CPU frequency

> > > under some conditions. Also, my previous assertion as to the

> > > issue was indeed incorrect.

> > > 

> > > I now:

> > > . never use x86_energy_perf_policy.

> > > . For HWP disabled: never change from active to passive or via

> > > versa, but rather do it via boot.

> > > . after boot always check and reset the various power limit log

> > > bits that are set.

> > > . never compile the kernel (well, until after any tests), which

> > > will set those bits again.

> > > . never run prime95 high heat torture test, which will set those

> > > bits again.

> > > . Note that the tests done for this e-mail never ever set those

> > > bits again.

> > > . Invented an entirely new way to manifest, demonstrate, and

> > > exploit the issue (also mentioned June

> > > 6th).

> > > . All tests were repeated on another HWP capable computer, so a

> > > i5-9600K and a i5-6200U.

> > > 

> > > New method (old was periodic workflow):

> > > 

> > > Long busy, short gap, busy but taking loop time samples so as to

> > > estimate CPU frequency.

> > > I am calling it an inverse impulse response test.

> > > 

> > > Assertion:

> > > 

> > > If the short sleep is somehow simultaneous with some sort of 5.0

> > > millisecond (200 Hertz)

> > > periodic event (either in HWP itself, or via the driver, I am

> > > unable to determine which,

> > > but think it is inside the black box that is HWP),

> > 

> > I have been attempting to characterise the "black box" that is HWP.

> > In terms of system response versus EPP, I only observe the HWP loop

> > time as the

> > response variable.

> > 

> > 0 <= EPP <= 1 : My test can not measure loop time.

> > 2 <= EPP <= 39 : HWP servo loop time 2 milliseconds

> > 40 <= EPP <= 55 : HWP servo loop time 3 milliseconds

> > 56 <= EPP <= 79 : HWP servo loop time 4 milliseconds

> > 80 <= EPP <= 133 : HWP servo loop time 5 milliseconds

> > 134 <= EPP <= 143 : HWP servo loop time 6 milliseconds

> > 144 <= EPP <= 154 : HWP servo loop time 7 milliseconds

> > 155 <= EPP <= 175 : HWP servo loop time 8 milliseconds

> > 176 <= EPP <= 255 : HWP servo loop time 9 milliseconds

> > 

> > If there are other system response differences within

> > those groups, I haven't been able to detect them,

> > but would be grateful for any further insight.

> > 

> > Otherwise, in future, I do not see a need to test anything

> > other than 9 values of EPP, one from each group.

> > 

> > > then there is a possibility that the

> > > CPU frequency will drop significantly and will take an excessive

> > > amount of time to recover.

> > > Frequency step ups are exactly on 5.0 millisecond boundaries +/-

> > > the short gap time.

> > > 

> > > . The probability is somewhat inconsistent and a function of

> > > whatever else the computer is doing.

> > > . The time to recover is a function of EPP, and if EPP is low

> > > enough my test never fails.

> > > . These tests were all done with default settings.

> > > . The "5.0" mSec is only for those default settings, it actually

> > > depends on EPP.

> > >   . Crude step boundaries, mSec: EPP=32, 2; EPP=64, 4; EPP=128,

> > > 5.00; EPP=196, 9

> > 

> > Now fully understood, as listed above.

> > 

> > > . High level: i5-9600K: 2453 tests, 60 failures, 2.45% fail rate.

> > > (HWP - powersave)

> > > . High level: i5-6200U: 4134 tests, 128 failures, 3.1% fail rate.

> > > (HWP - powersave)

> > > . Low level (capture waveforms): i5-9600K: 1842 captured failure

> > > waveforms. See graph.

> > > . Low level (capture waveforms): i5-6200U: 458 captured failure

> > > waveforms. See graph.

> > > . Verify acpi-cpufreq/ondemand works fine: i5-9600K: 8975 tests.

> > > 0 failures.

> > > . Verify acpi-cpufreq/ondemand works fine: i5-6200U: 8575 tests.

> > > 0 failures.

> > 

> > The tests were all done using the teo idle governor.

> > While the menu governor does not fail for this particular test, it

> > fails

> > in other scenarios.

> > 

> > I have yet to find a failure scenario when idle state 2 is

> > disabled.

> > I have captured and analyzed about 400 megabytes of trace data,

> > and have not been able to isolate an exact correlation.

> > 

> > > The short gap was 842 uSeconds for all these tests, and for no

> > > particular reason.

> > > 

> > > While I have not re-done the bounds investigation, I have no

> > > reason to doubt

> > > my previous work, re-stated below:

> > > 

> > > > Gap definition:

> > > > lower limit not known, but < 747 uSeconds.

> > > > Upper limit is between 952 and 955 uSeconds (there will be some

> > > > overhead uncertainties).

> > 

> > The only new information I have is that the upper bound is bigger.

> > 

> > > > Must be preceded by busy time spanning a couple of HWP sampling

> > > > boundaries

> > > > or jiffy boundaries or something (I don't actually know how HWP

> > > > does stuff).

> > > 

> > > Rather than point to graphs, which nobody seems to look at, they

> > > are attached,

> > > and so might get stripped for some of you.

> > > 

> > > ... Doug

> > > 

> > > Addendum: Some of the MSRs you have requested in the past:

> > > 

> > > i5-9600K (HWP - powersave after test):

> > > 

> > > root@s18:/home/doug# /home/doug/c/msr-decoder

> > > 8.) 0x198: IA32_PERF_STATUS     : CPU 0-5 :   8 :   8 :   8 :   8

> > > :   8 :   8 :

> > > B.) 0x770: IA32_PM_ENABLE: 1 : HWP enable

> > > 1.) 0x19C: IA32_THERM_STATUS: 88480000

> > > 2.) 0x1AA: MSR_MISC_PWR_MGMT: 401CC0 EIST enabled Coordination

> > > enabled OOB Bit 8 reset OOB Bit 18

> > > reset

> > > 3.) 0x1B1: IA32_PACKAGE_THERM_STATUS: 88460000

> > > 4.) 0x64F: MSR_CORE_PERF_LIMIT_REASONS: 0

> > > A.) 0x1FC: MSR_POWER_CTL: 3C005D : C1E disable : EEO disable :

> > > RHO disable

> > > 5.) 0x771: IA32_HWP_CAPABILITIES (performance): 108252E : high 46

> > > : guaranteed 37 : efficient 8 :

> > > lowest 1

> > > 6.) 0x774: IA32_HWP_REQUEST:    CPU 0-5 :

> > >     raw: 80002E08 : 80002E08 : 80002E08 : 80002E08 : 80002E08 :

> > > 80002E08 :

> > >     min:        8 :        8 :        8 :        8 :        8

> > > :        8 :

> > >     max:       46 :       46 :       46 :       46 :       46

> > > :       46 :

> > >     des:        0 :        0 :        0 :        0 :        0

> > > :        0 :

> > >     epp:      128 :      128 :      128 :      128 :      128

> > > :      128 :

> > >     act:        0 :        0 :        0 :        0 :        0

> > > :        0 :

> > > 7.) 0x777: IA32_HWP_STATUS: 0 : high 0 : guaranteed 0 : efficient

> > > 0 : lowest 0

> > > 

> > > i5-9600K (no HWP - acpi-cpufreq/ondemand after test):

> > > 

> > > root@s18:/home/doug/c# /home/doug/c/msr-decoder

> > > 8.) 0x198: IA32_PERF_STATUS     : CPU 0-5 :   8 :   8 :   8 :   8

> > > :   8 :   8 :

> > > B.) 0x770: IA32_PM_ENABLE: 0 : HWP disable

> > > 9.) 0x199: IA32_PERF_CTL        : CPU 0-5 :   8 :   8 :   8 :   8

> > > :   8 :   8 :

> > > C.) 0x1B0: IA32_ENERGY_PERF_BIAS: CPU 0-5 :   6 :   6 :   6 :   6

> > > :   6 :   6 :

> > > 1.) 0x19C: IA32_THERM_STATUS: 88480000

> > > 2.) 0x1AA: MSR_MISC_PWR_MGMT: 401CC0 EIST enabled Coordination

> > > enabled OOB Bit 8 reset OOB Bit 18

> > > reset

> > > 3.) 0x1B1: IA32_PACKAGE_THERM_STATUS: 88460000

> > > 4.) 0x64F: MSR_CORE_PERF_LIMIT_REASONS: 0

> > > A.) 0x1FC: MSR_POWER_CTL: 3C005D : C1E disable : EEO disable :

> > > RHO disable

> > > 

> > > i5-6200U (HWP - powersave after test):

> > > 

> > > 8.) 0x198: IA32_PERF_STATUS : CPU 0-3 : 19 : 19 : 19 : 19 :

> > > B.) 0x770: IA32_PM_ENABLE: 1 : HWP enable

> > > 1.) 0x19C: IA32_THERM_STATUS: 88430000

> > > 2.) 0x1AA: MSR_MISC_PWR_MGMT: 4018C0 EIST enabled Coordination

> > > enabled OOB Bit 8 reset OOB Bit 18

> > > reset

> > > 3.) 0x1B1: IA32_PACKAGE_THERM_STATUS: 88420000

> > > 4.) 0x64F: MSR_CORE_PERF_LIMIT_REASONS: 0

> > > A.) 0x1FC: MSR_POWER_CTL: 24005D : C1E disable : EEO enable : RHO

> > > enable

> > > 5.) 0x771: IA32_HWP_CAPABILITIES (performance): 105171C : high 28

> > > : guaranteed 23 : efficient 5 :

> > > lowest 1

> > > 6.) 0x774: IA32_HWP_REQUEST: CPU 0-3 :

> > >     raw: 80001B04 : 80001B04 : 80001B04 : 80001B04 :

> > >     min:        4 :        4 :        4 :        4 :

> > >     max:       27 :       27 :       27 :       27 :

> > >     des:        0 :        0 :        0 :        0 :

> > >     epp:      128 :      128 :      128 :      128 :

> > >     act:        0 :        0 :        0 :        0 :

> > > 7.) 0x777: IA32_HWP_STATUS: 4 : high 4 : guaranteed 0 : efficient

> > > 0 : lowest 0

> > > 

> > > i5-6200U (no HWP - acpi-cpufreq/ondemand after test):

> > > 

> > > 8.) 0x198: IA32_PERF_STATUS     : CPU 0-3 :  23 :  23 :  23 :  23

> > > :

> > > B.) 0x770: IA32_PM_ENABLE: 0 : HWP disable

> > > 9.) 0x199: IA32_PERF_CTL        : CPU 0-3 :  11 :   5 :   5 :   5

> > > :

> > > C.) 0x1B0: IA32_ENERGY_PERF_BIAS: CPU 0-3 :   6 :   6 :   6 :   6

> > > :

> > > 1.) 0x19C: IA32_THERM_STATUS: 88440000

> > > 2.) 0x1AA: MSR_MISC_PWR_MGMT: 4018C0 EIST enabled Coordination

> > > enabled OOB Bit 8 reset OOB Bit 18

> > > reset

> > > 3.) 0x1B1: IA32_PACKAGE_THERM_STATUS: 88430000

> > > 4.) 0x64F: MSR_CORE_PERF_LIMIT_REASONS: 0

> > > A.) 0x1FC: MSR_POWER_CTL: 24005D : C1E disable : EEO enable : RHO

> > > enable

> 

>
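
The EPP groups quoted above can be restated as a small lookup. This is only a sketch of Doug's reported measurements, not of any documented HWP behaviour: hwp_loop_time_ms() is a hypothetical helper name, the group boundaries are copied from his table, and other processors may well behave differently.

```c
#include <stdint.h>

/*
 * Observed HWP servo loop time, in milliseconds, for a given EPP value,
 * per the measurements quoted in this thread. Returns 0 for EPP 0..1,
 * where the test could not measure a loop time.
 *
 * Hypothetical helper for illustration; not a kernel function.
 */
static unsigned int hwp_loop_time_ms(uint8_t epp)
{
	/* Inclusive upper EPP bound of each observed group. */
	static const struct {
		uint8_t epp_max;
		unsigned int loop_ms;
	} groups[] = {
		{   1, 0 }, {  39, 2 }, {  55, 3 }, {  79, 4 }, { 133, 5 },
		{ 143, 6 }, { 154, 7 }, { 175, 8 }, { 255, 9 },
	};
	unsigned int i;

	for (i = 0; i < sizeof(groups) / sizeof(groups[0]); i++) {
		if (epp <= groups[i].epp_max)
			return groups[i].loop_ms;
	}

	return 0; /* not reached: the last group covers EPP 255 */
}
```

With the default EPP of 128, hwp_loop_time_ms(128) gives 5, which matches the 5.0 millisecond period in the original assertion.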
Doug Smythies Dec. 17, 2020, 4:58 p.m. UTC | #6
On 2020.08.02 Srinivas Pandruvada wrote:
> On Sun, 2020-08-02 at 07:36 -0700, Doug Smythies wrote:

> > Hi Srinivas, or anybody at Intel,

> >

> > Any chance of you looking into this issue?

> > I first raised it over 2 months ago.

> 

> Hi Doug,

> 

> Unfortunately, I haven't gotten to this yet.


O.K., I created a bug report:

https://bugzilla.kernel.org/show_bug.cgi?id=210741

> 

> Thanks,

> Srinivas

> 

> 

> >

> > On 2020.07.08 07:41 Doug Smythies wrote:

> > > On 2020.06.30 11:41 Doug Smythies wrote:

> > > > Hi Srinivas,

> > > >

> > > > O.K. let's try this again, starting a new thread, with address

> > > > list similar to a few weeks ago.

> > > > I believe I have untangled my multiple issues, such that this e-

> > > > mail should be only about

> > > > the single issue of HWP capable processors incorrectly deciding

> > > > to lower the CPU frequency

> > > > under some conditions. Also, my previous assertion as to the

> > > > issue was indeed incorrect.

> > > >

> > > > I now:

> > > > . never use x86_energy_perf_policy.

> > > > . For HWP disabled: never change from active to passive or vice

> > > > versa, but rather do it via boot.

> > > > . after boot always check and reset the various power limit log

> > > > bits that are set.

> > > > . never compile the kernel (well, until after any tests), which

> > > > will set those bits again.

> > > > . never run prime95 high heat torture test, which will set those

> > > > bits again.

> > > > . Note that the tests done for this e-mail never ever set those

> > > > bits again.

> > > > . Invented an entirely new way to manifest, demonstrate, and

> > > > exploit the issue (also mentioned June

> > > > 6th).

> > > > . All tests were repeated on another HWP capable computer, so a

> > > > i5-9600K and a i5-6200U.

> > > >

> > > > New method (old was periodic workflow):

> > > >

> > > > Long busy, short gap, busy but taking loop time samples so as to

> > > > estimate CPU frequency.

> > > > I am calling it an inverse impulse response test.

> > > >

> > > > Assertion:

> > > >

> > > > If the short sleep is somehow simultaneous with some sort of 5.0

> > > > millisecond (200 Hertz)

> > > > periodic event (either in HWP itself, or via the driver, I am

> > > > unable to determine which,

> > > > but think it is inside the black box that is HWP),

> > >

> > > I have been attempting to characterise the "black box" that is HWP.

> > > In terms of system response versus EPP, I only observe the HWP loop

> > > time as the

> > > response variable.

> > >

> > > 0 <= EPP <= 1 : My test can not measure loop time.

> > > 2 <= EPP <= 39 : HWP servo loop time 2 milliseconds

> > > 40 <= EPP <= 55 : HWP servo loop time 3 milliseconds

> > > 56 <= EPP <= 79 : HWP servo loop time 4 milliseconds

> > > 80 <= EPP <= 133 : HWP servo loop time 5 milliseconds

> > > 134 <= EPP <= 143 : HWP servo loop time 6 milliseconds

> > > 144 <= EPP <= 154 : HWP servo loop time 7 milliseconds

> > > 155 <= EPP <= 175 : HWP servo loop time 8 milliseconds

> > > 176 <= EPP <= 255 : HWP servo loop time 9 milliseconds

> > >

> > > If there are other system response differences within

> > > those groups, I haven't been able to detect them,

> > > but would be grateful for any further insight.

> > >

> > > Otherwise, in future, I do not see a need to test anything

> > > other than 9 values of EPP, one from each group.

> > >

> > > > then there is a possibility that the

> > > > CPU frequency will drop significantly and will take an excessive

> > > > amount of time to recover.

> > > > Frequency step ups are exactly on 5.0 millisecond boundaries +/-

> > > > the short gap time.

> > > >

> > > > . The probability is somewhat inconsistent and a function of

> > > > whatever else the computer is doing.

> > > > . The time to recover is a function of EPP, and if EPP is low

> > > > enough my test never fails.

> > > > . These tests were all done with default settings.

> > > > . The "5.0" mSec is only for those default settings, it actually

> > > > depends on EPP.

> > > >   . Crude step boundaries, mSec: EPP=32, 2; EPP=64, 4; EPP=128,

> > > > 5.00; EPP=196, 9

> > >

> > > Now fully understood, as listed above.

> > >

> > > > . High level: i5-9600K: 2453 tests, 60 failures, 2.45% fail rate.

> > > > (HWP - powersave)

> > > > . High level: i5-6200U: 4134 tests, 128 failures, 3.1% fail rate.

> > > > (HWP - powersave)

> > > > . Low level (capture waveforms): i5-9600K: 1842 captured failure

> > > > waveforms. See graph.

> > > > . Low level (capture waveforms): i5-6200U: 458 captured failure

> > > > waveforms. See graph.

> > > > . Verify acpi-cpufreq/ondemand works fine: i5-9600K: 8975 tests.

> > > > 0 failures.

> > > > . Verify acpi-cpufreq/ondemand works fine: i5-6200U: 8575 tests.

> > > > 0 failures.

> > >

> > > The tests were all done using the teo idle governor.

> > > While the menu governor does not fail for this particular test, it

> > > fails

> > > in other scenarios.

> > >

> > > I have yet to find a failure scenario when idle state 2 is

> > > disabled.

> > > I have captured and analyzed about 400 megabytes of trace data,

> > > and have not been able to isolate an exact correlation.

> > >

> > > > The short gap was 842 uSeconds for all these tests, and for no

> > > > particular reason.

> > > >

> > > > While I have not re-done the bounds investigation, I have no

> > > > reason to doubt

> > > > my previous work, re-stated below:

> > > >

> > > > > Gap definition:

> > > > > lower limit not known, but < 747 uSeconds.

> > > > > Upper limit is between 952 and 955 uSeconds (there will be some

> > > > > overhead uncertainties).

> > >

> > > The only new information I have is that the upper bound is bigger.

> > >

> > > > > Must be preceded by busy time spanning a couple of HWP sampling

> > > > > boundaries

> > > > > or jiffy boundaries or something (I don't actually know how HWP

> > > > > does stuff).

> > > >

> > > > Rather than point to graphs, which nobody seems to look at, they

> > > > are attached,

> > > > and so might get stripped for some of you.

> > > >

> > > > ... Doug

> > > >

> > > > Addendum: Some of the MSRs you have requested in the past:

> > > >

> > > > i5-9600K (HWP - powersave after test):

> > > >

> > > > root@s18:/home/doug# /home/doug/c/msr-decoder

> > > > 8.) 0x198: IA32_PERF_STATUS     : CPU 0-5 :   8 :   8 :   8 :   8

> > > > :   8 :   8 :

> > > > B.) 0x770: IA32_PM_ENABLE: 1 : HWP enable

> > > > 1.) 0x19C: IA32_THERM_STATUS: 88480000

> > > > 2.) 0x1AA: MSR_MISC_PWR_MGMT: 401CC0 EIST enabled Coordination

> > > > enabled OOB Bit 8 reset OOB Bit 18

> > > > reset

> > > > 3.) 0x1B1: IA32_PACKAGE_THERM_STATUS: 88460000

> > > > 4.) 0x64F: MSR_CORE_PERF_LIMIT_REASONS: 0

> > > > A.) 0x1FC: MSR_POWER_CTL: 3C005D : C1E disable : EEO disable :

> > > > RHO disable

> > > > 5.) 0x771: IA32_HWP_CAPABILITIES (performance): 108252E : high 46

> > > > : guaranteed 37 : efficient 8 :

> > > > lowest 1

> > > > 6.) 0x774: IA32_HWP_REQUEST:    CPU 0-5 :

> > > >     raw: 80002E08 : 80002E08 : 80002E08 : 80002E08 : 80002E08 :

> > > > 80002E08 :

> > > >     min:        8 :        8 :        8 :        8 :        8

> > > > :        8 :

> > > >     max:       46 :       46 :       46 :       46 :       46

> > > > :       46 :

> > > >     des:        0 :        0 :        0 :        0 :        0

> > > > :        0 :

> > > >     epp:      128 :      128 :      128 :      128 :      128

> > > > :      128 :

> > > >     act:        0 :        0 :        0 :        0 :        0

> > > > :        0 :

> > > > 7.) 0x777: IA32_HWP_STATUS: 0 : high 0 : guaranteed 0 : efficient

> > > > 0 : lowest 0

> > > >

> > > > i5-9600K (no HWP - acpi-cpufreq/ondemand after test):

> > > >

> > > > root@s18:/home/doug/c# /home/doug/c/msr-decoder

> > > > 8.) 0x198: IA32_PERF_STATUS     : CPU 0-5 :   8 :   8 :   8 :   8

> > > > :   8 :   8 :

> > > > B.) 0x770: IA32_PM_ENABLE: 0 : HWP disable

> > > > 9.) 0x199: IA32_PERF_CTL        : CPU 0-5 :   8 :   8 :   8 :   8

> > > > :   8 :   8 :

> > > > C.) 0x1B0: IA32_ENERGY_PERF_BIAS: CPU 0-5 :   6 :   6 :   6 :   6

> > > > :   6 :   6 :

> > > > 1.) 0x19C: IA32_THERM_STATUS: 88480000

> > > > 2.) 0x1AA: MSR_MISC_PWR_MGMT: 401CC0 EIST enabled Coordination

> > > > enabled OOB Bit 8 reset OOB Bit 18

> > > > reset

> > > > 3.) 0x1B1: IA32_PACKAGE_THERM_STATUS: 88460000

> > > > 4.) 0x64F: MSR_CORE_PERF_LIMIT_REASONS: 0

> > > > A.) 0x1FC: MSR_POWER_CTL: 3C005D : C1E disable : EEO disable :

> > > > RHO disable

> > > >

> > > > i5-6200U (HWP - powersave after test):

> > > >

> > > > 8.) 0x198: IA32_PERF_STATUS : CPU 0-3 : 19 : 19 : 19 : 19 :

> > > > B.) 0x770: IA32_PM_ENABLE: 1 : HWP enable

> > > > 1.) 0x19C: IA32_THERM_STATUS: 88430000

> > > > 2.) 0x1AA: MSR_MISC_PWR_MGMT: 4018C0 EIST enabled Coordination

> > > > enabled OOB Bit 8 reset OOB Bit 18

> > > > reset

> > > > 3.) 0x1B1: IA32_PACKAGE_THERM_STATUS: 88420000

> > > > 4.) 0x64F: MSR_CORE_PERF_LIMIT_REASONS: 0

> > > > A.) 0x1FC: MSR_POWER_CTL: 24005D : C1E disable : EEO enable : RHO

> > > > enable

> > > > 5.) 0x771: IA32_HWP_CAPABILITIES (performance): 105171C : high 28

> > > > : guaranteed 23 : efficient 5 :

> > > > lowest 1

> > > > 6.) 0x774: IA32_HWP_REQUEST: CPU 0-3 :

> > > >     raw: 80001B04 : 80001B04 : 80001B04 : 80001B04 :

> > > >     min:        4 :        4 :        4 :        4 :

> > > >     max:       27 :       27 :       27 :       27 :

> > > >     des:        0 :        0 :        0 :        0 :

> > > >     epp:      128 :      128 :      128 :      128 :

> > > >     act:        0 :        0 :        0 :        0 :

> > > > 7.) 0x777: IA32_HWP_STATUS: 4 : high 4 : guaranteed 0 : efficient

> > > > 0 : lowest 0

> > > >

> > > > i5-6200U (no HWP - acpi-cpufreq/ondemand after test):

> > > >

> > > > 8.) 0x198: IA32_PERF_STATUS     : CPU 0-3 :  23 :  23 :  23 :  23

> > > > :

> > > > B.) 0x770: IA32_PM_ENABLE: 0 : HWP disable

> > > > 9.) 0x199: IA32_PERF_CTL        : CPU 0-3 :  11 :   5 :   5 :   5

> > > > :

> > > > C.) 0x1B0: IA32_ENERGY_PERF_BIAS: CPU 0-3 :   6 :   6 :   6 :   6

> > > > :

> > > > 1.) 0x19C: IA32_THERM_STATUS: 88440000

> > > > 2.) 0x1AA: MSR_MISC_PWR_MGMT: 4018C0 EIST enabled Coordination

> > > > enabled OOB Bit 8 reset OOB Bit 18

> > > > reset

> > > > 3.) 0x1B1: IA32_PACKAGE_THERM_STATUS: 88430000

> > > > 4.) 0x64F: MSR_CORE_PERF_LIMIT_REASONS: 0

> > > > A.) 0x1FC: MSR_POWER_CTL: 24005D : C1E disable : EEO enable : RHO

> > > > enable

> >

> >
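
For anyone cross-checking the MSR dumps quoted above, the raw values split into byte-wide fields. The sketch below assumes the field layout given in the Intel SDM for IA32_HWP_REQUEST (0x774) and IA32_HWP_CAPABILITIES (0x771); the struct and function names are made up for illustration and are not taken from the msr-decoder tool or the kernel.

```c
#include <stdint.h>

/* Low 32 bits of IA32_HWP_REQUEST (0x774). */
struct hwp_request {
	uint8_t min_perf;     /* bits  7:0  */
	uint8_t max_perf;     /* bits 15:8  */
	uint8_t desired_perf; /* bits 23:16 */
	uint8_t epp;          /* bits 31:24 */
};

/* Low 32 bits of IA32_HWP_CAPABILITIES (0x771). */
struct hwp_caps {
	uint8_t highest;    /* bits  7:0  */
	uint8_t guaranteed; /* bits 15:8  */
	uint8_t efficient;  /* bits 23:16 */
	uint8_t lowest;     /* bits 31:24 */
};

/* Hypothetical helper: split a raw HWP request value into its fields. */
static struct hwp_request decode_hwp_request(uint64_t raw)
{
	struct hwp_request r;

	r.min_perf     = raw & 0xff;
	r.max_perf     = (raw >> 8) & 0xff;
	r.desired_perf = (raw >> 16) & 0xff;
	r.epp          = (raw >> 24) & 0xff;
	return r;
}

/* Hypothetical helper: split a raw HWP capabilities value into its fields. */
static struct hwp_caps decode_hwp_capabilities(uint64_t raw)
{
	struct hwp_caps c;

	c.highest    = raw & 0xff;
	c.guaranteed = (raw >> 8) & 0xff;
	c.efficient  = (raw >> 16) & 0xff;
	c.lowest     = (raw >> 24) & 0xff;
	return c;
}
```

Applied to the i5-9600K dump, decode_hwp_request(0x80002E08) yields min 8, max 46, des 0, epp 128, and decode_hwp_capabilities(0x108252E) yields high 46, guaranteed 37, efficient 8, lowest 1, in agreement with the decoded lines quoted above.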

Patch

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -36,6 +36,7 @@ 
 #define INTEL_PSTATE_SAMPLING_INTERVAL	(10 * NSEC_PER_MSEC)
 
 #define INTEL_CPUFREQ_TRANSITION_LATENCY	20000
+#define INTEL_CPUFREQ_TRANSITION_DELAY_HWP	5000
 #define INTEL_CPUFREQ_TRANSITION_DELAY		500
 
 #ifdef CONFIG_ACPI
@@ -2175,7 +2176,10 @@  static int intel_pstate_verify_policy(st
 
 static void intel_cpufreq_stop_cpu(struct cpufreq_policy *policy)
 {
-	intel_pstate_set_min_pstate(all_cpu_data[policy->cpu]);
+	if (hwp_active)
+		intel_pstate_hwp_force_min_perf(policy->cpu);
+	else
+		intel_pstate_set_min_pstate(all_cpu_data[policy->cpu]);
 }
 
 static void intel_pstate_stop_cpu(struct cpufreq_policy *policy)
@@ -2183,12 +2187,10 @@  static void intel_pstate_stop_cpu(struct
 	pr_debug("CPU %d exiting\n", policy->cpu);
 
 	intel_pstate_clear_update_util_hook(policy->cpu);
-	if (hwp_active) {
+	if (hwp_active)
 		intel_pstate_hwp_save_state(policy);
-		intel_pstate_hwp_force_min_perf(policy->cpu);
-	} else {
-		intel_cpufreq_stop_cpu(policy);
-	}
+
+	intel_cpufreq_stop_cpu(policy);
 }
 
 static int intel_pstate_cpu_exit(struct cpufreq_policy *policy)
@@ -2318,13 +2320,58 @@  static void intel_cpufreq_trace(struct c
 		fp_toint(cpu->iowait_boost * 100));
 }
 
+static void intel_cpufreq_update_hwp_request(struct cpudata *cpu, u32 min_perf)
+{
+	u64 value, prev;
+
+	rdmsrl_on_cpu(cpu->cpu, MSR_HWP_REQUEST, &prev);
+	value = prev;
+
+	value &= ~HWP_MIN_PERF(~0L);
+	value |= HWP_MIN_PERF(min_perf);
+
+	/*
+	 * The entire MSR needs to be updated in order to update the HWP min
+	 * field in it, so opportunistically update the max too if needed.
+	 */
+	value &= ~HWP_MAX_PERF(~0L);
+	value |= HWP_MAX_PERF(cpu->max_perf_ratio);
+
+	if (value != prev)
+		wrmsrl_on_cpu(cpu->cpu, MSR_HWP_REQUEST, value);
+}
+
+/**
+ * intel_cpufreq_adjust_hwp - Adjust the HWP request register.
+ * @cpu: Target CPU.
+ * @target_pstate: P-state corresponding to the target frequency.
+ *
+ * Set the HWP minimum performance limit to 75% of @target_pstate taking the
+ * global min and max policy limits into account.
+ *
+ * The purpose of this is to avoid situations in which the kernel and the HWP
+ * algorithm work against each other by giving a hint about the expectations of
+ * the former to the latter.
+ */
+static void intel_cpufreq_adjust_hwp(struct cpudata *cpu, u32 target_pstate)
+{
+	u32 min_perf;
+
+	min_perf = max_t(u32, (3 * target_pstate) / 4, cpu->min_perf_ratio);
+	min_perf = min_t(u32, min_perf, cpu->max_perf_ratio);
+	if (min_perf != cpu->pstate.current_pstate) {
+		cpu->pstate.current_pstate = min_perf;
+		intel_cpufreq_update_hwp_request(cpu, min_perf);
+	}
+}
+
 static int intel_cpufreq_target(struct cpufreq_policy *policy,
 				unsigned int target_freq,
 				unsigned int relation)
 {
 	struct cpudata *cpu = all_cpu_data[policy->cpu];
+	int target_pstate, old_pstate = cpu->pstate.current_pstate;
 	struct cpufreq_freqs freqs;
-	int target_pstate, old_pstate;
 
 	update_turbo_state();
 
@@ -2332,26 +2379,33 @@  static int intel_cpufreq_target(struct c
 	freqs.new = target_freq;
 
 	cpufreq_freq_transition_begin(policy, &freqs);
+
 	switch (relation) {
 	case CPUFREQ_RELATION_L:
-		target_pstate = DIV_ROUND_UP(freqs.new, cpu->pstate.scaling);
+		target_pstate = DIV_ROUND_UP(target_freq, cpu->pstate.scaling);
 		break;
 	case CPUFREQ_RELATION_H:
-		target_pstate = freqs.new / cpu->pstate.scaling;
+		target_pstate = target_freq / cpu->pstate.scaling;
 		break;
 	default:
-		target_pstate = DIV_ROUND_CLOSEST(freqs.new, cpu->pstate.scaling);
+		target_pstate = DIV_ROUND_CLOSEST(target_freq, cpu->pstate.scaling);
 		break;
 	}
-	target_pstate = intel_pstate_prepare_request(cpu, target_pstate);
-	old_pstate = cpu->pstate.current_pstate;
-	if (target_pstate != cpu->pstate.current_pstate) {
-		cpu->pstate.current_pstate = target_pstate;
-		wrmsrl_on_cpu(policy->cpu, MSR_IA32_PERF_CTL,
-			      pstate_funcs.get_val(cpu, target_pstate));
+
+	if (hwp_active) {
+		intel_cpufreq_adjust_hwp(cpu, target_pstate);
+	} else {
+		target_pstate = intel_pstate_prepare_request(cpu, target_pstate);
+		if (target_pstate != old_pstate) {
+			cpu->pstate.current_pstate = target_pstate;
+			wrmsrl_on_cpu(cpu->cpu, MSR_IA32_PERF_CTL,
+				      pstate_funcs.get_val(cpu, target_pstate));
+		}
 	}
-	freqs.new = target_pstate * cpu->pstate.scaling;
 	intel_cpufreq_trace(cpu, INTEL_PSTATE_TRACE_TARGET, old_pstate);
+
+	freqs.new = target_pstate * cpu->pstate.scaling;
+
 	cpufreq_freq_transition_end(policy, &freqs, false);
 
 	return 0;
@@ -2361,14 +2415,19 @@  static unsigned int intel_cpufreq_fast_s
 					      unsigned int target_freq)
 {
 	struct cpudata *cpu = all_cpu_data[policy->cpu];
-	int target_pstate, old_pstate;
+	int target_pstate, old_pstate = cpu->pstate.current_pstate;
 
 	update_turbo_state();
 
 	target_pstate = DIV_ROUND_UP(target_freq, cpu->pstate.scaling);
-	target_pstate = intel_pstate_prepare_request(cpu, target_pstate);
-	old_pstate = cpu->pstate.current_pstate;
-	intel_pstate_update_pstate(cpu, target_pstate);
+
+	if (hwp_active) {
+		intel_cpufreq_adjust_hwp(cpu, target_pstate);
+	} else {
+		target_pstate = intel_pstate_prepare_request(cpu, target_pstate);
+		intel_pstate_update_pstate(cpu, target_pstate);
+	}
+
 	intel_cpufreq_trace(cpu, INTEL_PSTATE_TRACE_FAST_SWITCH, old_pstate);
 	return target_pstate * cpu->pstate.scaling;
 }
@@ -2389,7 +2448,6 @@  static int intel_cpufreq_cpu_init(struct
 		return ret;
 
 	policy->cpuinfo.transition_latency = INTEL_CPUFREQ_TRANSITION_LATENCY;
-	policy->transition_delay_us = INTEL_CPUFREQ_TRANSITION_DELAY;
 	/* This reflects the intel_pstate_get_cpu_pstates() setting. */
 	policy->cur = policy->cpuinfo.min_freq;
 
@@ -2401,10 +2459,13 @@  static int intel_cpufreq_cpu_init(struct
 
 	cpu = all_cpu_data[policy->cpu];
 
-	if (hwp_active)
+	if (hwp_active) {
 		intel_pstate_get_hwp_max(policy->cpu, &turbo_max, &max_state);
-	else
+		policy->transition_delay_us = INTEL_CPUFREQ_TRANSITION_DELAY_HWP;
+	} else {
 		turbo_max = cpu->pstate.turbo_pstate;
+		policy->transition_delay_us = INTEL_CPUFREQ_TRANSITION_DELAY;
+	}
 
 	min_freq = DIV_ROUND_UP(turbo_max * global.min_perf_pct, 100);
 	min_freq *= cpu->pstate.scaling;
@@ -2505,9 +2566,6 @@  static int intel_pstate_register_driver(
 
 static int intel_pstate_unregister_driver(void)
 {
-	if (hwp_active)
-		return -EBUSY;
-
 	cpufreq_unregister_driver(intel_pstate_driver);
 	intel_pstate_driver_cleanup();
 
@@ -2815,12 +2873,11 @@  static int __init intel_pstate_setup(cha
 	if (!str)
 		return -EINVAL;
 
-	if (!strcmp(str, "disable")) {
+	if (!strcmp(str, "disable"))
 		no_load = 1;
-	} else if (!strcmp(str, "passive")) {
+	else if (!strcmp(str, "passive"))
 		default_driver = &intel_cpufreq;
-		no_hwp = 1;
-	}
+
 	if (!strcmp(str, "no_hwp")) {
 		pr_info("HWP disabled\n");
 		no_hwp = 1;
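
The arithmetic in intel_cpufreq_adjust_hwp() and intel_cpufreq_update_hwp_request() can be tried outside the kernel. The following is a user-space sketch only: the function names are hypothetical, the 8-bit field positions are assumed to match the kernel's HWP_MIN_PERF() and HWP_MAX_PERF() macros, and the MSR read and write are replaced by a plain 64-bit value.

```c
#include <stdint.h>

/*
 * HWP floor for a given target P-state: 3/4 of the target (integer
 * division), clamped to the [min_perf_ratio, max_perf_ratio] limits.
 * Hypothetical stand-in for the computation in intel_cpufreq_adjust_hwp().
 */
static uint32_t hwp_min_perf_floor(uint32_t target_pstate,
				   uint32_t min_perf_ratio,
				   uint32_t max_perf_ratio)
{
	uint32_t min_perf = (3 * target_pstate) / 4;

	if (min_perf < min_perf_ratio)
		min_perf = min_perf_ratio;
	if (min_perf > max_perf_ratio)
		min_perf = max_perf_ratio;

	return min_perf;
}

/*
 * Replace the min (bits 7:0) and max (bits 15:8) fields of a raw HWP
 * request value and leave every other bit untouched.
 */
static uint64_t hwp_request_set_limits(uint64_t value, uint32_t min_perf,
				       uint32_t max_perf)
{
	value &= ~(uint64_t)0xffff;
	value |= (uint64_t)(min_perf & 0xff);
	value |= (uint64_t)(max_perf & 0xff) << 8;

	return value;
}
```

Using the i5-9600K limits from the thread (min 8, max 46), a target P-state of 37 gives a floor of 27, and hwp_request_set_limits(0x80002E08, 27, 46) gives 0x80002E1B. A target of 8 would give 6 before clamping, so the policy minimum of 8 is used instead.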