From patchwork Wed Sep 21 10:14:48 2016
X-Patchwork-Submitter: Dietmar Eggemann
X-Patchwork-Id: 76680
Subject: Re: [PATCH 2/7 v3] sched: fix hierarchical order in rq->leaf_cfs_rq_list
To: Vincent Guittot , peterz@infradead.org, mingo@kernel.org,
 linux-kernel@vger.kernel.org, yuyang.du@intel.com, Morten.Rasmussen@arm.com
Cc: linaro-kernel@lists.linaro.org, pjt@google.com, bsegall@google.com
From: Dietmar Eggemann
Date: Wed, 21 Sep 2016 11:14:48 +0100
References: <1473666472-13749-1-git-send-email-vincent.guittot@linaro.org>
 <1473666472-13749-3-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1473666472-13749-3-git-send-email-vincent.guittot@linaro.org>

Hi Vincent,

On 12/09/16 08:47, Vincent Guittot wrote:
> Fix the insertion of cfs_rq in rq->leaf_cfs_rq_list to ensure that
> a child will always be called before its parent.
>
> The hierarchical order in the shares update list was introduced by
> commit 67e86250f8ea ("sched: Introduce hierarchal order on shares update list").
>
> With the current implementation a child can still be put after its parent.
>
> Let's take the example of
>        root
>         \
>          b
>          /\
>         c  d*
>            |
>            e*
>
> with root -> b -> c already enqueued but not d -> e, so the leaf_cfs_rq_list
> looks like: head -> c -> b -> root -> tail
>
> The branch d -> e will be added the first time they are enqueued,
> starting with e then d.
>
> When e is added, its parent is not yet on the list, so e is put at the
> tail: head -> c -> b -> root -> e -> tail
>
> Then d is added at the head because its parent is already on the list:
> head -> d -> c -> b -> root -> e -> tail
>
> e is not placed at the right position and will be called last, whereas
> it should be called first.
>
> Because enqueueing follows the bottom-up sequence, we are sure that we
> will finish by adding either a cfs_rq without a parent or a cfs_rq whose
> parent is already on the list. We can use this event to detect when we
> have finished adding a new branch. For the others, whose parents are not
> yet on the list, we have to ensure that they are added after the children
> inserted in the previous steps, and after any potential parents that are
> already on the list. The easiest way is to put the cfs_rq just after the
> last inserted one and to keep track of it until the branch is fully added.

[...]
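For reference (this is not part of the quoted patch, just a rough sketch of the
pre-patch code in kernel/sched/fair.c): so far list_add_leaf_cfs_rq() only
distinguishes "parent already on the list" from "everything else", and the
everything-else case is what lets e end up behind root in the example above:

static inline void list_add_leaf_cfs_rq(struct cfs_rq *cfs_rq)
{
	if (!cfs_rq->on_list) {
		/*
		 * If our parent is already on the per-rq leaf list, insert
		 * at the head so we stay in front of it; otherwise fall
		 * back to the tail, which is the case that breaks the
		 * child-before-parent ordering described above.
		 */
		if (cfs_rq->tg->parent &&
		    cfs_rq->tg->parent->cfs_rq[cpu_of(rq_of(cfs_rq))]->on_list) {
			list_add_rcu(&cfs_rq->leaf_cfs_rq_list,
				     &rq_of(cfs_rq)->leaf_cfs_rq_list);
		} else {
			list_add_tail_rcu(&cfs_rq->leaf_cfs_rq_list,
					  &rq_of(cfs_rq)->leaf_cfs_rq_list);
		}

		cfs_rq->on_list = 1;
	}
}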
I tested it with a multi-level task group hierarchy (tg_css_id=2,4,6) and it
does the right thing for this test case (task t runs alternately on CPU0 and
CPU1 in the tg with tg_css_id=6).

I wonder if this patch is related to the comment "TODO: fix up out-of-order
children on enqueue." in update_shares_cpu() of commit 82958366cfea ("sched:
Replace update_shares weight distribution with per-entity computation")?

I guess in the meantime we lost the functionality to remove a cfs_rq from the
leaf_cfs_rq list once there are no se's enqueued on it anymore. If e.g. t
migrates away from CPU1, all the cfs_rq's of the task group hierarchy (for
tg_css_id=2,4,6) owned by CPU1 stay in the leaf_cfs_rq list.

Shouldn't we reintegrate it? The following patch goes on top of this patch:

-- >8 --

From: Dietmar Eggemann
Date: Tue, 20 Sep 2016 17:30:09 +0100
Subject: [PATCH] Re-integrate list_del_leaf_cfs_rq() into
 update_blocked_averages()

There is no functionality anymore to delete a cfs_rq from the leaf_cfs_rq
list in case there are no se's enqueued on it. The functionality had
initially been introduced by commit 82958366cfea ("sched: Replace
update_shares weight distribution with per-entity computation") but was
taken out by commit 9d89c257dfb9 ("sched/fair: Rewrite runnable load and
utilization average tracking").

Signed-off-by: Dietmar Eggemann
---
 kernel/sched/fair.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

-- 
1.9.1

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index dd530359ef84..951c83337e2b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6584,8 +6584,12 @@ static void update_blocked_averages(int cpu)
 			update_tg_load_avg(cfs_rq, 0);
 
 		/* Propagate pending load changes to the parent */
-		if (cfs_rq->tg->se[cpu])
+		if (cfs_rq->tg->se[cpu]) {
 			update_load_avg(cfs_rq->tg->se[cpu], 0, 0);
+
+			if (!cfs_rq->nr_running)
+				list_del_leaf_cfs_rq(cfs_rq);
+		}
 	}
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
 }
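For what it's worth, list_del_leaf_cfs_rq() is already guarded by
cfs_rq->on_list (roughly the following, from kernel/sched/fair.c), so calling
it from update_blocked_averages() for a cfs_rq that is not (or no longer) on
the leaf list should be harmless:

static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
{
	if (cfs_rq->on_list) {
		/* Only unlink cfs_rqs that are actually on the per-rq leaf list */
		list_del_rcu(&cfs_rq->leaf_cfs_rq_list);
		cfs_rq->on_list = 0;
	}
}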