Message ID: 1415370687-18688-2-git-send-email-daniel.lezcano@linaro.org
State: New
On 11/07/2014 08:01 PM, Daniel Lezcano wrote:
> The poll function is called when a timer expired or if we force to poll when
> the cpu_idle_force_poll option is set.
>
> The poll function does:
>
>     local_irq_enable();
>     while (!tif_need_resched())
>         cpu_relax();
>
> This default poll function suits for the x86 arch because its rep; nop;
> hardware power optimization. But on other archs, this optimization does not
> exists and we are not saving power. The arch specific bits may want to
> optimize this loop by adding their own optimization.
>
> Give an opportunity to the different platform to specify their own polling
> loop by adding a weak cpu_idle_poll_loop function.
>
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> ---
>  kernel/sched/idle.c | 29 +++++++++++++++++------------
>  1 file changed, 17 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index c47fce7..ea65bbae 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -42,18 +42,6 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
>  __setup("hlt", cpu_idle_nopoll_setup);
>  #endif
>
> -static inline int cpu_idle_poll(void)
> -{
> -	rcu_idle_enter();
> -	trace_cpu_idle_rcuidle(0, smp_processor_id());
> -	local_irq_enable();
> -	while (!tif_need_resched())
> -		cpu_relax();
> -	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
> -	rcu_idle_exit();
> -	return 1;
> -}
> -
>  /* Weak implementations for optional arch specific functions */
>  void __weak arch_cpu_idle_prepare(void) { }
>  void __weak arch_cpu_idle_enter(void) { }
> @@ -65,6 +53,23 @@ void __weak arch_cpu_idle(void)
>  	local_irq_enable();
>  }
>
> +void __weak arch_cpu_idle_poll(void)
> +{
> +	local_irq_enable();
> +	while (!tif_need_resched())
> +		cpu_relax();
> +}
> +
> +static inline int cpu_idle_poll(void)
> +{
> +	rcu_idle_enter();
> +	trace_cpu_idle_rcuidle(0, smp_processor_id());
> +	arch_cpu_idle_poll();
> +	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
> +	rcu_idle_exit();
> +	return 1;
> +}
> +
>  /**
>   * cpuidle_idle_call - the main idle function
>   *

Thanks Daniel!

Reviewed-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
On Fri, Nov 07, 2014 at 03:31:22PM +0100, Daniel Lezcano wrote:
> The poll function is called when a timer expired or if we force to poll when
> the cpu_idle_force_poll option is set.
>
> The poll function does:
>
>     local_irq_enable();
>     while (!tif_need_resched())
>         cpu_relax();
>
> This default poll function suits for the x86 arch because its rep; nop;
> hardware power optimization. But on other archs, this optimization does not
> exists and we are not saving power. The arch specific bits may want to
> optimize this loop by adding their own optimization.

This doesn't make sense to me; should an arch not either implement an
actual idle driver or implement cpu_relax() properly, why allow for a
third weird option?
Hi Peter,

On 11/10/2014 05:59 PM, Peter Zijlstra wrote:
> On Fri, Nov 07, 2014 at 03:31:22PM +0100, Daniel Lezcano wrote:
>> The poll function is called when a timer expired or if we force to poll when
>> the cpu_idle_force_poll option is set.
>>
>> [...]
>>
>> This default poll function suits for the x86 arch because its rep; nop;
>> hardware power optimization. But on other archs, this optimization does not
>> exists and we are not saving power. The arch specific bits may want to
>> optimize this loop by adding their own optimization.
>
> This doesn't make sense to me; should an arch not either implement an
> actual idle driver or implement cpu_relax() properly, why allow for a
> third weird option?

The previous version of this patch simply invoked cpu_idle_loop() for
cases where latency_req was 0. This would have changed the behavior
on PowerPC wherein earlier the 0th idle index was returned which is also
a polling loop but differs from cpu_idle_loop() in two ways:

a. It polls at a relatively lower power state than cpu_relax().
b. We set certain registers to indicate that the cpu is idle.

Hence for all such cases wherein the cpu is required to poll while idle
(only for cases such as force poll, broadcast ipi to arrive soon and
latency_req = 0), we should be able to call into cpuidle_idle_loop()
only if the cpuidle driver's 0th idle state has an exit_latency > 0.
(The 0th idle state is expected to be a polling loop with
exit_latency = 0.) If otherwise, it would mean the driver has an
optimized polling loop when idle. But instead of adding in the logic of
checking the exit_latency, we thought it would be simpler to call into
an arch defined polling idle loop under the above circumstances. If that
is no better we could fall back to cpuidle_idle_loop().

Regards
Preeti U Murthy
On Mon, Nov 10, 2014 at 07:50:22PM +0530, Preeti U Murthy wrote:
> Hi Peter,
>
> On 11/10/2014 05:59 PM, Peter Zijlstra wrote:
>> This doesn't make sense to me; should an arch not either implement an
>> actual idle driver or implement cpu_relax() properly, why allow for a
>> third weird option?
>
> The previous version of this patch simply invoked cpu_idle_loop() for
> cases where latency_req was 0. This would have changed the behavior
> on PowerPC wherein earlier the 0th idle index was returned which is also
> a polling loop but differs from cpu_idle_loop() in two ways:
>
> a. It polls at a relatively lower power state than cpu_relax().
> b. We set certain registers to indicate that the cpu is idle.

So I'm confused; the current code runs the generic cpu_relax idle poll
loop for the broadcast case. I suppose you want to retain this because
not doing your a-b above will indeed give you a lower latency.

Therefore one could argue that latency_req==0 should indeed use this,
and your a-b idle state should be latency_req==1 or higher.

Thus yes it changes behaviour, but I think it actually fixes something.
You cannot have a latency_req==0 state which has higher latency than the
actual polling loop, as you appear to have.

> Hence for all such cases wherein the cpu is required to poll while idle
> (only for cases such as force poll, broadcast ipi to arrive soon and
> latency_req = 0), we should be able to call into cpuidle_idle_loop()
> only if the cpuidle driver's 0th idle state has an exit_latency > 0.
> (The 0th idle state is expected to be a polling loop with
> exit_latency = 0.)
>
> If otherwise, it would mean the driver has an optimized polling loop
> when idle. But instead of adding in the logic of checking the
> exit_latency, we thought it would be simpler to call into an arch
> defined polling idle loop under the above circumstances. If that is no
> better we could fall back to cpuidle_idle_loop().

That still doesn't make sense to me; suppose the implementation of this
special poll state differs on different uarchs for the same arch, then
we'll end up with another registration and selection interface parallel
to cpuidle.
On 11/10/2014 08:47 PM, Peter Zijlstra wrote:
> On Mon, Nov 10, 2014 at 07:50:22PM +0530, Preeti U Murthy wrote:
>> [...]
>>
>> The previous version of this patch simply invoked cpu_idle_loop() for
>> cases where latency_req was 0. This would have changed the behavior
>> on PowerPC wherein earlier the 0th idle index was returned which is also
>> a polling loop but differs from cpu_idle_loop() in two ways:
>>
>> a. It polls at a relatively lower power state than cpu_relax().
>> b. We set certain registers to indicate that the cpu is idle.
>
> So I'm confused; the current code runs the generic cpu_relax idle poll
> loop for the broadcast case. I suppose you want to retain this because
> not doing your a-b above will indeed give you a lower latency.
>
> Therefore one could argue that latency_req==0 should indeed use this,
> and your a-b idle state should be latency_req==1 or higher.
>
> Thus yes it changes behaviour, but I think it actually fixes something.
> You cannot have a latency_req==0 state which has higher latency than the
> actual polling loop, as you appear to have.

Yes you are right. This is fixing the current behavior. So we should be
good to call cpuidle_idle_loop() when latency_req=0.

>> If otherwise, it would mean the driver has an optimized polling loop
>> when idle. But instead of adding in the logic of checking the
>> exit_latency, we thought it would be simpler to call into an arch
>> defined polling idle loop under the above circumstances. If that is no
>> better we could fall back to cpuidle_idle_loop().
>
> That still doesn't make sense to me; suppose the implementation of this
> special poll state differs on different uarchs for the same arch, then
> we'll end up with another registration and selection interface parallel
> to cpuidle.

Yeah this will only get complicated. I was trying to see how to keep the
current behavior unchanged but looks like it is not worth it.

Thanks

Regards
Preeti U Murthy
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index c47fce7..ea65bbae 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -42,18 +42,6 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
 __setup("hlt", cpu_idle_nopoll_setup);
 #endif
 
-static inline int cpu_idle_poll(void)
-{
-	rcu_idle_enter();
-	trace_cpu_idle_rcuidle(0, smp_processor_id());
-	local_irq_enable();
-	while (!tif_need_resched())
-		cpu_relax();
-	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
-	rcu_idle_exit();
-	return 1;
-}
-
 /* Weak implementations for optional arch specific functions */
 void __weak arch_cpu_idle_prepare(void) { }
 void __weak arch_cpu_idle_enter(void) { }
@@ -65,6 +53,23 @@ void __weak arch_cpu_idle(void)
 	local_irq_enable();
 }
 
+void __weak arch_cpu_idle_poll(void)
+{
+	local_irq_enable();
+	while (!tif_need_resched())
+		cpu_relax();
+}
+
+static inline int cpu_idle_poll(void)
+{
+	rcu_idle_enter();
+	trace_cpu_idle_rcuidle(0, smp_processor_id());
+	arch_cpu_idle_poll();
+	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
+	rcu_idle_exit();
+	return 1;
+}
+
 /**
  * cpuidle_idle_call - the main idle function
  *
The poll function is called when a timer expired or if we force to poll
when the cpu_idle_force_poll option is set.

The poll function does:

	local_irq_enable();
	while (!tif_need_resched())
		cpu_relax();

This default poll function suits the x86 arch because of its rep; nop;
hardware power optimization. But on other archs this optimization does
not exist and we are not saving power. The arch specific bits may want
to optimize this loop by adding their own optimization.

Give the different platforms an opportunity to specify their own polling
loop by adding a weak arch_cpu_idle_poll function.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 kernel/sched/idle.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)