Message ID: cover.1500373914.git.viresh.kumar@linaro.org
Series: cpufreq: transition-latency cleanups
On Wednesday, July 19, 2017 03:42:40 PM Viresh Kumar wrote:
> Hi Rafael,
>
> This series tries to clean up the code around transition-latency and
> its users. Some of the old legacy code, which may not make much sense
> now, is dropped as well. And some code consolidation is also done
> across governors.
>
> Based on: v4.13-rc1
> Tested on: ARM64 HiKey board.

From a first quick look this version is fine by me. Unless I find
anything of concern later, this will be queued up for 4.14.

Thanks,
Rafael
On Wed, 2017-07-26 at 11:36 +0530, Viresh Kumar wrote:
> On 25-07-17, 14:54, Leonard Crestez wrote:
> > This patch made its way into linux-next and it seems to cause imx SoCs
> > to almost always hang around their max frequency with the ondemand
> > governor, even when almost completely idle. The lowest frequency is
> > never reached. This seems wrong?
> >
> > This driver calculates transition_latency at probe time, the value is
> > not terribly accurate but it reaches values like latency = 109 us, so
>
> So this is the value that is stored in the global variable
> "transition_latency" in the imx6q-cpufreq.c file? i.e.
> transition_latency = 109000 (ns) to be exact?

Yes.

> - Don't use this patch and try to change ondemand's sampling rate from
>   sysfs. Try setting it to 10000 and see if the behavior is identical
>   to after this patch.

Yes, it seems to be. Also, setting 100000 explicitly fixes this. I also
tried to switch from HZ=100 to HZ=1000, but that did not make a
difference.

> - Find out how much time it really takes to change the frequency of
>   the CPU. I don't really think 109 us is the right transition
>   latency. Use the attached patch for that and look for the print
>   message.

Your patch measures latencies of around 2.5 ms, but it can vary between
1.6 ms and 3 ms from boot to boot. This is a lot more than what the
driver reports. Most transitions seem to be faster.

I did a little digging and it seems that the majority of the time is
always spent inside clk_pllv3_wait_lock, which spins on a HW bit while
doing usleep_range(50, 500). I originally thought it was because of
regulators, but the delays involved in that are smaller.

Measuring wall time on a process that can sleep seems dubious; isn't
this vulnerable to random delays because of other tasks?

> Without this patch the sampling rate of the ondemand governor will be
> 109 ms. And after this patch it would be capped at 10 ms. Why would
> that screw up anyone's setup?

I don't have an answer to that right now.
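For reference, the arithmetic behind those two numbers (109 us of reported latency becoming a 109 ms sampling period, and the patch capping it at 10 ms) can be sketched in a few lines. The LATENCY_MULTIPLIER value of 1000 matches the constant in the cpufreq core; the helper itself is only an illustration, not kernel code:

```python
LATENCY_MULTIPLIER = 1000  # same constant the cpufreq core uses

def ondemand_sampling_rate_us(transition_latency_ns, cap_us=None):
    """Default ondemand sampling period derived from the transition latency."""
    rate_us = (transition_latency_ns // 1000) * LATENCY_MULTIPLIER
    if cap_us is not None:
        rate_us = min(rate_us, cap_us)
    return rate_us

# imx6q-cpufreq reports transition_latency = 109000 ns (109 us):
print(ondemand_sampling_rate_us(109_000))                 # 109000 us = 109 ms
print(ondemand_sampling_rate_us(109_000, cap_us=10_000))  # 10000 us = 10 ms
```

So a driver overstating (or understating) its transition latency by a small absolute amount moves the governor's sampling period by a factor of 1000.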
On a closer look it seems that most of the time is actually spent at
low cpufreq though (90%+).

Your change makes it so that even something like "sleep 1; cat
scaling_cur_freq" raises the frequency to the maximum. This happens
enough that even if you do it in a loop you will never see the minimum
frequency. It seems there is enough internal bookkeeping on such a
wakeup that it takes more than 10 ms, enough for a reevaluation of
cpufreq before cat returns the value?!

I found this by enabling the power:cpu_frequency tracepoint event and
checking for deltas with a script. Enabling CPU_FREQ_STAT shows this:

time_in_state:

396000 1609
792000 71
996000 54

trans_table:

   From  :    To
         :    396000    792000    996000
   396000:         0        10         7
   792000:        16         0        12
   996000:         1        18         0

This is very unexpected but not necessarily wrong.

--
Regards,
Leonard
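The delta-checking script itself was not posted; a minimal sketch of the idea, assuming the default ftrace text format for power:cpu_frequency events, might look like this (the sample lines below are fabricated, not Leonard's data):

```python
import re

# Matches the tail of an ftrace line such as:
#   " cat-123 [000] 100.950000: cpu_frequency: state=996000 cpu_id=0"
EVENT_RE = re.compile(r"\s(\d+\.\d+): cpu_frequency: state=(\d+) cpu_id=(\d+)")

def frequency_deltas(trace_lines, cpu=0):
    """Yield (seconds_spent, freq_khz) pairs for one CPU from a trace dump."""
    prev_ts, prev_freq = None, None
    for line in trace_lines:
        m = EVENT_RE.search(line)
        if not m or int(m.group(3)) != cpu:
            continue
        ts, freq = float(m.group(1)), int(m.group(2))
        if prev_ts is not None:
            yield (ts - prev_ts, prev_freq)
        prev_ts, prev_freq = ts, freq

sample = [
    " <idle>-0 [000] 100.000000: cpu_frequency: state=396000 cpu_id=0",
    " cat-123 [000] 100.950000: cpu_frequency: state=996000 cpu_id=0",
    " <idle>-0 [000] 101.000000: cpu_frequency: state=396000 cpu_id=0",
]
for dt, freq in frequency_deltas(sample):
    print("%.3f s at %d kHz" % (dt, freq))
```

A sum of the per-frequency deltas from such a script should roughly agree with the time_in_state numbers above.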
+ IMX maintainers.

On 27-07-17, 19:54, Leonard Crestez wrote:
> On Wed, 2017-07-26 at 11:36 +0530, Viresh Kumar wrote:
> > - Find out how much time it really takes to change the frequency of
> >   the CPU. I don't really think 109 us is the right transition
> >   latency. Use the attached patch for that and look for the print
> >   message.
>
> Your patch measures latencies of around 2.5 ms, but it can vary between
> 1.6 ms and 3 ms from boot to boot. This is a lot more than what the
> driver reports. Most transitions seem to be faster.

Wow !! I was pretty sure all these figures were just made up by some
coder :)

> I did a little digging and it seems that the majority of the time is
> always spent inside clk_pllv3_wait_lock, which spins on a HW bit while
> doing usleep_range(50, 500). I originally thought it was because of
> regulators, but the delays involved in that are smaller.
>
> Measuring wall time on a process that can sleep seems dubious; isn't
> this vulnerable to random delays because of other tasks?

I am not sure I understood that, sorry.

> > Without this patch the sampling rate of the ondemand governor will be
> > 109 ms. And after this patch it would be capped at 10 ms. Why would
> > that screw up anyone's setup?
>
> I don't have an answer to that right now.
>
> On a closer look it seems that most of the time is actually spent at
> low cpufreq though (90%+).
>
> Your change makes it so that even something like "sleep 1; cat
> scaling_cur_freq" raises the frequency to the maximum.

Why?

> This happens enough that even if you do it in a loop you will never
> see the minimum frequency. It seems there is enough internal
> bookkeeping on such a wakeup that it takes more than 10 ms, enough
> for a reevaluation of cpufreq before cat returns the value?!

At this point I really feel that this is a hardware specific problem
and it was working by chance until now. And I am not sure we should
be stopping this patch from getting merged just because of that.
At least you can teach your distribution to increase the sampling
rate from userspace to make it all work.

Can you try one more thing? Try using the schedutil governor and see
how it behaves?

> I found this by enabling the power:cpu_frequency tracepoint event and
> checking for deltas with a script. Enabling CPU_FREQ_STAT shows this:
>
> time_in_state:
>
> 396000 1609

So we still stay at the lowest frequency most of the time.

> 792000 71
> 996000 54
>
> trans_table:
>
>    From  :    To
>          :    396000    792000    996000
>    396000:         0        10         7
>    792000:        16         0        12
>    996000:         1        18         0

What is it that you are trying to point out here? I still see that we
are coming back to 396 MHz quite often. Maybe you can compare these
values with and without this patch to let us know?

> This is very unexpected but not necessarily wrong.

I haven't understood the problem completely yet :(

--
viresh
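For what it's worth, the time_in_state figures quoted in this exchange do back the earlier "90%+" estimate; a quick share calculation (reusing the numbers from the thread, nothing else):

```python
# time_in_state figures from the CPU_FREQ_STAT dump quoted above.
# The absolute unit of the counters does not matter for the shares.
time_in_state = {396000: 1609, 792000: 71, 996000: 54}

total = sum(time_in_state.values())
for freq, ticks in sorted(time_in_state.items()):
    print("%6d kHz: %5.1f%%" % (freq, 100.0 * ticks / total))
```

This puts 396 MHz at about 92.8% of the accounted time, so "staying at the lowest frequency most of the time" and "never seeing the minimum from a sampling loop" are both consistent with the stats: the excursions to higher frequencies are frequent but short.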
On Fri, 2017-07-28 at 10:58 +0530, Viresh Kumar wrote:
> On 27-07-17, 19:54, Leonard Crestez wrote:
> > On Wed, 2017-07-26 at 11:36 +0530, Viresh Kumar wrote:
> > > Without this patch the sampling rate of the ondemand governor will
> > > be 109 ms. And after this patch it would be capped at 10 ms. Why
> > > would that screw up anyone's setup?
> >
> > I don't have an answer to that right now.
> >
> > On a closer look it seems that most of the time is actually spent at
> > low cpufreq though (90%+).
> >
> > Your change makes it so that even something like "sleep 1; cat
> > scaling_cur_freq" raises the frequency to the maximum.
>
> Why?
>
> > This happens enough that even if you do it in a loop you will never
> > see the minimum frequency. It seems there is enough internal
> > bookkeeping on such a wakeup that it takes more than 10 ms, enough
> > for a reevaluation of cpufreq before cat returns the value?!
>
> At this point I really feel that this is a hardware specific problem
> and it was working by chance until now. And I am not sure we should
> be stopping this patch from getting merged just because of that.

Yes, I agree. Something is fishy here, but most likely your patch just
exposes the problem.

> At least you can teach your distribution to increase the sampling
> rate from userspace to make it all work.
>
> Can you try one more thing? Try using the schedutil governor and see
> how it behaves?

I don't have the time to investigate this properly right now.

> > I found this by enabling the power:cpu_frequency tracepoint event and
> > checking for deltas with a script. Enabling CPU_FREQ_STAT shows this:
> >
> > time_in_state:
> >
> > 396000 1609
>
> So we still stay at the lowest frequency most of the time.

Yes.

> Maybe you can compare these values with and without this patch to let
> us know?

Without the patch it is always at the low freq. Sampling at a lower
rate means spikes get ignored.
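That last point can be illustrated with a toy model (deliberately not governor code): a short busy spike dominates one 10 ms sampling window but nearly disappears when averaged over a 100 ms window, so a slow-sampling ondemand simply never sees a reason to raise the frequency:

```python
def window_loads(busy_intervals_ms, period_ms, total_ms):
    """Toy model: fraction of each sampling window spent busy."""
    loads = []
    for start in range(0, total_ms, period_ms):
        end = start + period_ms
        busy = sum(max(0, min(end, b_end) - max(start, b_start))
                   for b_start, b_end in busy_intervals_ms)
        loads.append(busy / period_ms)
    return loads

# A single 5 ms spike of activity within one second:
spike = [(100, 105)]
print(max(window_loads(spike, 10, 1000)))   # one 10 ms window sees 50% load
print(max(window_loads(spike, 100, 1000)))  # a 100 ms window sees only 5%
```

Under this model, capping the sampling period at 10 ms makes even the wakeup caused by `cat scaling_cur_freq` look like significant load, which matches the behavior Leonard observed.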
On Wed, 2017-08-16 at 12:04 +0530, Viresh Kumar wrote:
> On 28-07-17, 10:58, Viresh Kumar wrote:
> > At this point I really feel that this is a hardware specific problem
> > and it was working by chance until now. And I am not sure we should
> > be stopping this patch from getting merged just because of that.
> >
> > At least you can teach your distribution to increase the sampling
> > rate from userspace to make it all work.
>
> It's been 3 weeks since my last email on this thread and no reply yet
> from any of the IMX maintainers. Can someone please help here?
>
> @Shawn: Can you help debug a bit here, to see what gets screwed up
> due to this commit? It's just that your platform isn't able to change
> freq at a 10 ms rate.
>
> @Rafael: I am not sure, but should we be stopping this patch because
> some hardware isn't able to change freq at a 10 ms interval and is
> just faking the transition delay to start with?
>
> Maybe we get this merged again and the IMX guys can figure out what's
> wrong on their platform and how to fix it?

I reported the initial issue but did not have the time to do a more
thorough investigation; this is more complicated than it seems. I said
this before, but maybe it got lost: I don't think the odd behavior I
noticed justifies keeping the patch from merging.

--
Regards,
Leonard