Message ID: 6b165676325a47d67e667582a7b78da85c5c118a.1549536337.git.viresh.kumar@linaro.org
State: New
Series: [1/2] sched/fair: Don't pass sd to select_idle_smt()
On Thu, Feb 07, 2019 at 04:16:06PM +0530, Viresh Kumar wrote:
> @@ -6081,10 +6082,14 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
> 	for_each_cpu_wrap(core, cpus, target) {
> 		bool idle = true;
>
> -		for_each_cpu(cpu, cpu_smt_mask(core)) {
> -			cpumask_clear_cpu(cpu, cpus);
> -			if (!available_idle_cpu(cpu))
> +		smt = cpu_smt_mask(core);
> +		cpumask_andnot(cpus, cpus, smt);

So where the previous code was like 1-2 stores, you just added 16.
(assuming 64bit and NR_CPUS=1024)

And we still do the iteration anyway:

> +		for_each_cpu(cpu, smt) {
> +			if (!available_idle_cpu(cpu)) {
> 				idle = false;
> +				break;
> +			}
> 		}

An actual improvement would've been:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 38d4669aa2ef..2d352d6d15c7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6082,7 +6082,7 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
 		bool idle = true;
 
 		for_each_cpu(cpu, cpu_smt_mask(core)) {
-			cpumask_clear_cpu(cpu, cpus);
+			__cpumask_clear_cpu(cpu, cpus);
 			if (!available_idle_cpu(cpu))
 				idle = false;
 		}
On 11-02-19, 10:30, Peter Zijlstra wrote:
> On Thu, Feb 07, 2019 at 04:16:06PM +0530, Viresh Kumar wrote:
> > @@ -6081,10 +6082,14 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
> > 	for_each_cpu_wrap(core, cpus, target) {
> > 		bool idle = true;
> >
> > -		for_each_cpu(cpu, cpu_smt_mask(core)) {
> > -			cpumask_clear_cpu(cpu, cpus);
> > -			if (!available_idle_cpu(cpu))
> > +		smt = cpu_smt_mask(core);
> > +		cpumask_andnot(cpus, cpus, smt);
>
> So where the previous code was like 1-2 stores, you just added 16.

Is the max number of possible threads per core just 2? That's what I
read just now and I wasn't aware of that earlier. This commit doesn't
improve anything then. Sorry for the noise.

-- 
viresh
On Mon, Feb 11, 2019 at 03:56:59PM +0530, Viresh Kumar wrote:
> On 11-02-19, 10:30, Peter Zijlstra wrote:
> > On Thu, Feb 07, 2019 at 04:16:06PM +0530, Viresh Kumar wrote:
> > > @@ -6081,10 +6082,14 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
> > > 	for_each_cpu_wrap(core, cpus, target) {
> > > 		bool idle = true;
> > >
> > > -		for_each_cpu(cpu, cpu_smt_mask(core)) {
> > > -			cpumask_clear_cpu(cpu, cpus);
> > > -			if (!available_idle_cpu(cpu))
> > > +		smt = cpu_smt_mask(core);
> > > +		cpumask_andnot(cpus, cpus, smt);
> >
> > So where the previous code was like 1-2 stores, you just added 16.
>
> Is the max number of possible threads per core just 2? That's what I
> read just now and I wasn't aware of that earlier. This commit doesn't
> improve anything then. Sorry for the noise.

We've got up to SMT8 in the tree (Sparc64, Power8 and some MIPS IIRC),
but that's still less than having to touch the entire bitmap.

Also, Power9 went back to SMT4 and I think the majority of SMT
deployments is that or less.
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8d5c82342a36..ccd0ae9878a2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6068,6 +6068,7 @@ void __update_idle_core(struct rq *rq)
 static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
 {
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
+	const struct cpumask *smt;
 	int core, cpu;
 
 	if (!static_branch_likely(&sched_smt_present))
@@ -6081,10 +6082,14 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
 	for_each_cpu_wrap(core, cpus, target) {
 		bool idle = true;
 
-		for_each_cpu(cpu, cpu_smt_mask(core)) {
-			cpumask_clear_cpu(cpu, cpus);
-			if (!available_idle_cpu(cpu))
+		smt = cpu_smt_mask(core);
+		cpumask_andnot(cpus, cpus, smt);
+
+		for_each_cpu(cpu, smt) {
+			if (!available_idle_cpu(cpu)) {
 				idle = false;
+				break;
+			}
 		}
 
 		if (idle)
Once a non-idle thread is found for a core, there is no point in
traversing the rest of that core's threads. We currently continue the
traversal only to clear those threads from the "cpus" mask. Clear all
the threads with a single call to cpumask_andnot() instead, which also
lets us exit the loop earlier.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/sched/fair.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

-- 
2.20.1.321.g9e740568ce00