| Message ID | 1568878421-12301-10-git-send-email-vincent.guittot@linaro.org |
|---|---|
| State | New |
| Series | sched/fair: rework the CFS load balance |
On Thu, 2019-09-19 at 09:33 +0200, Vincent Guittot wrote:
> Runnable load has been introduced to take into account the case where
> blocked load biases the wake up path, which may end up selecting an
> overloaded CPU with a large number of runnable tasks instead of an
> underutilized CPU with a huge blocked load.
>
> The wake up path now starts by looking for idle CPUs before comparing
> runnable load, so it is worth aligning the wake up path with
> load_balance.
>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

On a single socket system, patches 9 & 10 have the
result of driving a woken up task (when wake_wide is
true) to the CPU core with the lowest blocked load,
even when there is an idle core the task could run on
right now.

With the whole series applied, I see a 1-2% regression
in CPU use due to that issue.

With only patches 1-8 applied, I see a 1% improvement in
CPU use for that same workload.

Given that it looks like select_idle_sibling and
find_idlest_group_cpu do roughly the same thing, I
wonder if it is enough to simply add an additional
test to find_idlest_group to have it return the
LLC sg, if it is called on the LLC sd on a single
socket system.

That way find_idlest_group_cpu can still find an
idle core like it does today.

Does that seem like a reasonable thing?

I can run tests with that :)

--
All Rights Reversed.
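For illustration only, the shortcut described above could take roughly the shape below. The use of sd->parent and SD_SHARE_PKG_RESOURCES as a stand-in for "this domain does not extend past the LLC on a single socket system" is an assumption, and llc_sched_group() is a hypothetical helper: no single sched_group conveniently spans the whole LLC at this domain level today, so a real patch would have to provide one.

/*
 * Sketch of the idea only, not an actual patch: when find_idlest_group()
 * is invoked on a domain that does not reach past the LLC (the single
 * socket case), skip the blocked-load comparison between per-core groups
 * and hand one group spanning the whole LLC to find_idlest_group_cpu(),
 * which already prefers idle CPUs.
 */
static struct sched_group *
find_idlest_group_sketch(struct sched_domain *sd, struct task_struct *p,
                         int this_cpu, int sd_flag)
{
        /*
         * !sd->parent approximates "single socket"; SD_SHARE_PKG_RESOURCES
         * marks a domain whose CPUs share the LLC.
         */
        if (!sd->parent && (sd->flags & SD_SHARE_PKG_RESOURCES))
                return llc_sched_group(this_cpu); /* hypothetical helper */

        /* ... existing statistics gathering and group comparison ... */
        return NULL;
}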
On Mon, 7 Oct 2019 at 17:14, Rik van Riel <riel@surriel.com> wrote:
>
> On Thu, 2019-09-19 at 09:33 +0200, Vincent Guittot wrote:
> > Runnable load has been introduced to take into account the case where
> > blocked load biases the wake up path, which may end up selecting an
> > overloaded CPU with a large number of runnable tasks instead of an
> > underutilized CPU with a huge blocked load.
> >
> > The wake up path now starts by looking for idle CPUs before comparing
> > runnable load, so it is worth aligning the wake up path with
> > load_balance.
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>
> On a single socket system, patches 9 & 10 have the
> result of driving a woken up task (when wake_wide is
> true) to the CPU core with the lowest blocked load,
> even when there is an idle core the task could run on
> right now.
>
> With the whole series applied, I see a 1-2% regression
> in CPU use due to that issue.
>
> With only patches 1-8 applied, I see a 1% improvement in
> CPU use for that same workload.

Thanks for testing.

Patches 8-9 have just replaced runnable load by blocked load and then
removed the duplicated metrics in find_idlest_group.

I'm preparing an additional patch that reworks find_idlest_group() to
behave similarly to find_busiest_group(): it gathers statistics as it
already does, then classifies the groups and finally selects the
idlest one. This should fix the problem you mentioned above, where it
selects the group with the lowest blocked load even though there are
idle CPUs in another group with a high blocked load.

> Given that it looks like select_idle_sibling and
> find_idlest_group_cpu do roughly the same thing, I
> wonder if it is enough to simply add an additional
> test to find_idlest_group to have it return the
> LLC sg, if it is called on the LLC sd on a single
> socket system.

That makes sense to me.

> That way find_idlest_group_cpu can still find an
> idle core like it does today.
>
> Does that seem like a reasonable thing?

That's worth testing.

> I can run tests with that :)
>
> --
> All Rights Reversed.
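A rough sketch of the structure Vincent describes, mirroring find_busiest_group(): gather per-group statistics, classify each group, then pick the idlest candidate. All names below (sg_wake_stats, update_sg_wake_stats(), group_classify_wake()) are placeholders for illustration, not the actual patch.

/* Placeholder statistics gathered per candidate group (step 1). */
struct sg_wake_stats {
        unsigned long   group_load;     /* sum of the CPUs' load in the group */
        unsigned int    idle_cpus;      /* number of idle CPUs in the group */
        unsigned int    group_type;     /* e.g. has_spare < fully_busy < overloaded */
};

static struct sched_group *
find_idlest_group_rework_sketch(struct sched_domain *sd,
                                struct task_struct *p, int this_cpu)
{
        struct sched_group *group = sd->groups, *idlest = NULL;
        struct sg_wake_stats sgs, idlest_sgs = { .group_type = UINT_MAX };

        do {
                /* Step 1: gather statistics, as the code already does. */
                update_sg_wake_stats(group, p, &sgs);

                /* Step 2: classify the group from those statistics. */
                sgs.group_type = group_classify_wake(&sgs);

                /*
                 * Step 3: pick the better candidate. A group with spare
                 * capacity / idle CPUs wins over a busy group that merely
                 * carries less blocked load, which is the regression
                 * reported in this thread.
                 */
                if (sgs.group_type < idlest_sgs.group_type) {
                        idlest_sgs = sgs;
                        idlest = group;
                } else if (sgs.group_type == idlest_sgs.group_type &&
                           sgs.group_load < idlest_sgs.group_load) {
                        idlest_sgs = sgs;
                        idlest = group;
                }

                group = group->next;
        } while (group != sd->groups);

        return idlest;
}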
On Mon, 2019-10-07 at 17:27 +0200, Vincent Guittot wrote:
> On Mon, 7 Oct 2019 at 17:14, Rik van Riel <riel@surriel.com> wrote:
> > On Thu, 2019-09-19 at 09:33 +0200, Vincent Guittot wrote:
> > > Runnable load has been introduced to take into account the case
> > > where blocked load biases the wake up path, which may end up
> > > selecting an overloaded CPU with a large number of runnable tasks
> > > instead of an underutilized CPU with a huge blocked load.
> > >
> > > The wake up path now starts by looking for idle CPUs before
> > > comparing runnable load, so it is worth aligning the wake up path
> > > with load_balance.
> > >
> > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> >
> > On a single socket system, patches 9 & 10 have the
> > result of driving a woken up task (when wake_wide is
> > true) to the CPU core with the lowest blocked load,
> > even when there is an idle core the task could run on
> > right now.
> >
> > With the whole series applied, I see a 1-2% regression
> > in CPU use due to that issue.
> >
> > With only patches 1-8 applied, I see a 1% improvement in
> > CPU use for that same workload.
>
> Thanks for testing.
>
> Patches 8-9 have just replaced runnable load by blocked load and then
> removed the duplicated metrics in find_idlest_group.
>
> I'm preparing an additional patch that reworks find_idlest_group() to
> behave similarly to find_busiest_group(): it gathers statistics as it
> already does, then classifies the groups and finally selects the
> idlest one. This should fix the problem you mentioned above, where it
> selects the group with the lowest blocked load even though there are
> idle CPUs in another group with a high blocked load.

That should do the trick!

> > Given that it looks like select_idle_sibling and
> > find_idlest_group_cpu do roughly the same thing, I
> > wonder if it is enough to simply add an additional
> > test to find_idlest_group to have it return the
> > LLC sg, if it is called on the LLC sd on a single
> > socket system.
>
> That makes sense to me.
>
> > That way find_idlest_group_cpu can still find an
> > idle core like it does today.
> >
> > Does that seem like a reasonable thing?
>
> That's worth testing.

I'll give it a try. Doing the full find_idlest_group
heuristic inside an LLC seems like it would be
overkill, anyway.

--
All Rights Reversed.
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index acca869..39a37ae 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5485,7 +5485,7 @@ wake_affine_weight(struct sched_domain *sd, struct task_struct *p,
 	s64 this_eff_load, prev_eff_load;
 	unsigned long task_load;
 
-	this_eff_load = cpu_runnable_load(cpu_rq(this_cpu));
+	this_eff_load = cpu_load(cpu_rq(this_cpu));
 
 	if (sync) {
 		unsigned long current_load = task_h_load(current);
@@ -5503,7 +5503,7 @@ wake_affine_weight(struct sched_domain *sd, struct task_struct *p,
 	this_eff_load *= 100;
 	this_eff_load *= capacity_of(prev_cpu);
 
-	prev_eff_load = cpu_runnable_load(cpu_rq(prev_cpu));
+	prev_eff_load = cpu_load(cpu_rq(prev_cpu));
 	prev_eff_load -= task_load;
 	if (sched_feat(WA_BIAS))
 		prev_eff_load *= 100 + (sd->imbalance_pct - 100) / 2;
@@ -5591,7 +5591,7 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 		max_spare_cap = 0;
 
 		for_each_cpu(i, sched_group_span(group)) {
-			load = cpu_runnable_load(cpu_rq(i));
+			load = cpu_load(cpu_rq(i));
 			runnable_load += load;
 
 			avg_load += cfs_rq_load_avg(&cpu_rq(i)->cfs);
@@ -5732,7 +5732,7 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
 				continue;
 			}
 
-			load = cpu_runnable_load(cpu_rq(i));
+			load = cpu_load(cpu_rq(i));
 			if (load < min_load) {
 				min_load = load;
 				least_loaded_cpu = i;
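For context, the two helpers swapped by this patch differ only in which PELT sum they read. The definitions below are paraphrased from the kernel around this series (cpu_load() is introduced by an earlier patch in the series); they are an approximation for the reader, not copied from the patch set.

static unsigned long cpu_runnable_load(struct rq *rq)
{
        /* PELT load of the tasks currently runnable on this runqueue */
        return cfs_rq_runnable_load_avg(&rq->cfs);
}

static unsigned long cpu_load(struct rq *rq)
{
        /*
         * PELT load of all tasks attached to this runqueue, including
         * the contribution of blocked (sleeping) tasks.
         */
        return cfs_rq_load_avg(&rq->cfs);
}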
Runnable load has been introduced to take into account the case where
blocked load biases the wake up path, which may end up selecting an
overloaded CPU with a large number of runnable tasks instead of an
underutilized CPU with a huge blocked load.

The wake up path now starts by looking for idle CPUs before comparing
runnable load, so it is worth aligning the wake up path with
load_balance.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--
2.7.4