| Message ID | 1461958364-675-5-git-send-email-dietmar.eggemann@arm.com |
|---|---|
| State | New |
On 03/05/16 11:12, Peter Zijlstra wrote:
> On Fri, Apr 29, 2016 at 08:32:41PM +0100, Dietmar Eggemann wrote:
>> Avoid the need to add scaled_busy_load_per_task on both sides of the if
>> condition to determine whether imbalance has to be set to
>> busiest->load_per_task or not.
>>
>> The imbn variable was introduced with commit 2dd73a4f09be ("[PATCH]
>> sched: implement smpnice") and the original if condition was
>>
>>   if (max_load - this_load >= busiest_load_per_task * imbn)
>>
>> which over time changed into the current version where
>> scaled_busy_load_per_task is to be found on both sides of
>> the if condition.
>
> This appears to have started with:
>
>   dd41f596cda0 ("sched: cfs core code")
>
> which for unexplained reasons does:
>
>   - if (max_load - this_load >= busiest_load_per_task * imbn) {
>   + if (max_load - this_load + SCHED_LOAD_SCALE_FUZZ >=
>   +			busiest_load_per_task * imbn) {
>
> And later patches (by me) change that FUZZ into a variable metric,
> because a fixed fuzz like that didn't at all work for the small loads
> that result from cgroup tasks.
>
> Now fix_small_imbalance() always hurt my head; it originated in the
> original sched_domain balancer from Nick which wasn't smpnice aware; and
> lives on until today.

I see, all this code is already in the history.git kernel.

> Its purpose is to determine if moving one task over is beneficial.
> However over time -- and smpnice started this -- the idea of _one_ task
> became quite muddled.
>
> With the fine grained load accounting of today; does it even make sense
> to ask this question? IOW, what does fix_small_imbalance() really gain
> us -- other than a head-ache?

So task priority breaks the assumption that one task is equivalent to
SCHED_LOAD_SCALE, and so does fine-grained load accounting.
fix_small_imbalance() is called twice from calculate_imbalance(). If we
got rid of it, I don't know whether we should bail out of load balancing
in case the avg load values don't align nicely (busiest > sd avg >
local) or just continue with load balancing. In the second call site,
where the imbalance value is raised (to busiest->load_per_task), we can
probably just continue with load balancing, hoping that there is a task
on the src rq which fits the smaller imbalance value.
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c066574cff04..dc4828bbe50d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6915,7 +6915,7 @@ static inline
 void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
 {
 	unsigned long tmp, capa_now = 0, capa_move = 0;
-	unsigned int imbn = 2;
+	unsigned int imbn = 1;
 	unsigned long scaled_busy_load_per_task;
 	struct sg_lb_stats *local, *busiest;
 
@@ -6925,13 +6925,13 @@ void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
 	if (!local->sum_nr_running)
 		local->load_per_task = cpu_avg_load_per_task(env->dst_cpu);
 	else if (busiest->load_per_task > local->load_per_task)
-		imbn = 1;
+		imbn = 0;
 
 	scaled_busy_load_per_task =
 		(busiest->load_per_task * SCHED_CAPACITY_SCALE) /
 		busiest->group_capacity;
 
-	if (busiest->avg_load + scaled_busy_load_per_task >=
+	if (busiest->avg_load >=
 	    local->avg_load + (scaled_busy_load_per_task * imbn)) {
 		env->imbalance = busiest->load_per_task;
 		return;
Avoid the need to add scaled_busy_load_per_task on both sides of the if
condition to determine whether imbalance has to be set to
busiest->load_per_task or not.

The imbn variable was introduced with commit 2dd73a4f09be ("[PATCH]
sched: implement smpnice") and the original if condition was

  if (max_load - this_load >= busiest_load_per_task * imbn)

which over time changed into the current version where
scaled_busy_load_per_task is to be found on both sides of the if
condition.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
The original smpnice implementation sets imbalance to the avg task load
of the busiest sched group (sg) if the difference between the avg load
of the busiest sg and the local sg is greater than or equal to 2 times
(imbn) the avg task load of the busiest sg. imbn is 2 if there is no
task running in the local sg or the avg task load of the busiest sg is
smaller than or equal to the avg task load of the local sg; otherwise
the imbn factor is lowered to 1.

imbn set to 2 makes sense when all the tasks have the same priority, so
in case there are n tasks on the local sg there have to be n+2 tasks on
the busiest sg to give load balancing the chance to pull over one task.

imbn set to 1 makes sense when the avg task load of the busiest sg is
greater than the one of the local sg, since there could be at least one
task with a smaller load on the busiest sg which can be pulled over to
the local sg.

The current version effectively lowered imbn by one, so we use 1 instead
of 2 in case the avg task load of sg_busiest isn't greater than the one
of sg_local, and 0 instead of 1 in the other case. This behaviour is not
in sync with the explanation above of why we set imbn to certain values.

 kernel/sched/fair.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-- 
1.9.1