| Message ID | 20171024122556.15872-2-brendan.jackman@arm.com |
|---|---|
| State | New |
| Series | [1/2] sched: force update of blocked load of idle cpus |
> @@ -8683,6 +8692,10 @@ static void nohz_balancer_kick(void)
>
>          if (test_and_set_bit(NOHZ_BALANCE_KICK, nohz_flags(ilb_cpu)))
>                  return;
> +
> +        if (only_update)
> +                set_bit(NOHZ_STATS_KICK, nohz_flags(ilb_cpu));

Should there be an "else clear_bit(NOHZ_STATS_KICK, nohz_flags(ilb_cpu));"?

Seems like any time this is called as !only_update, we should stop
inhibiting rebalance. In fact, we should perhaps go a little further so
that an only_update never inhibits rebalance from a concurrent
!only_update.
Hi Todd,

On Thu, Nov 09 2017 at 19:56, Todd Kjos wrote:
>> @@ -8683,6 +8692,10 @@ static void nohz_balancer_kick(void)
>>
>>          if (test_and_set_bit(NOHZ_BALANCE_KICK, nohz_flags(ilb_cpu)))
>>                  return;
>> +
>> +        if (only_update)
>> +                set_bit(NOHZ_STATS_KICK, nohz_flags(ilb_cpu));
>
> Should there be an "else clear_bit(NOHZ_STATS_KICK, nohz_flags(ilb_cpu));"?
>
> Seems like any time this is called as !only_update, we should stop
> inhibiting rebalance. In fact, we should perhaps go a little further
> so that an only_update never inhibits rebalance from a concurrent
> !only_update.

Yes, I think you are essentially right. To make sure I understand you:
I guess you are saying that where two CPUs are concurrently in
nohz_balancer_kick, one with only_update=1 and one with only_update=0,
the former should not prevent the latter from triggering a full load
balance. (I'm assuming they both get the same value for ilb_cpu.)

The exact solution you described won't work: only one of those CPUs can
get to this line of code, because of the test_and_set_bit above. So I
think we need something like:

        if (test_and_set_bit(NOHZ_BALANCE_KICK, nohz_flags(ilb_cpu))) {
                if (!only_update) {
                        /*
                         * There's a pending stats kick or an ongoing
                         * nohz_idle balance that's just for stats.
                         * Convert it to a proper nohz balance.
                         */
                        clear_bit(NOHZ_STATS_KICK, nohz_flags(ilb_cpu));
                }
                return;
        }

        if (only_update)
                set_bit(NOHZ_STATS_KICK, nohz_flags(ilb_cpu));

There's still scope for some lost rebalance_domains calls, but as
Vincent pointed out in a recent chat, because the rq->next_balance
fields won't be changed for the CPUs that get missed out, those calls
will only be delayed until the next scheduler_tick.

I'm on holiday ATM; I'll get to testing that when I get back.

Thanks for the review,
Brendan
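As an aside for readers who want to poke at the flag protocol being
discussed here, below is a minimal userspace C sketch of it. This is
not kernel code: GCC atomic builtins stand in for the kernel's
test_and_set_bit/set_bit/clear_bit, the flags are plain bit masks
rather than bit numbers, and the IPI is reduced to a comment. It shows
why only one kicker can reach the set_bit line, and how the clear_bit
in the slow path of the snippet above lets a full-balance kick upgrade
an already-pending stats-only kick.

#include <stdbool.h>
#include <stdio.h>

/* Flag masks for this model (the kernel uses bit numbers + bitops). */
#define NOHZ_BALANCE_KICK       0x1UL
#define NOHZ_STATS_KICK         0x2UL

/* Stand-in for the chosen ilb_cpu's rq->nohz_flags word. */
static unsigned long ilb_flags;

static bool test_and_set(unsigned long mask)
{
        return __atomic_fetch_or(&ilb_flags, mask, __ATOMIC_SEQ_CST) & mask;
}

static void set_flag(unsigned long mask)
{
        __atomic_fetch_or(&ilb_flags, mask, __ATOMIC_SEQ_CST);
}

static void clear_flag(unsigned long mask)
{
        __atomic_fetch_and(&ilb_flags, ~mask, __ATOMIC_SEQ_CST);
}

/* Sketch of nohz_balancer_kick() with the fix proposed above. */
static void kick(bool only_update)
{
        if (test_and_set(NOHZ_BALANCE_KICK)) {
                /* Someone already kicked the ILB CPU ... */
                if (!only_update)
                        /* ... but we want a full balance: upgrade the kick. */
                        clear_flag(NOHZ_STATS_KICK);
                return;
        }

        if (only_update)
                set_flag(NOHZ_STATS_KICK);

        /* Here the kernel would send the reschedule IPI to ilb_cpu. */
}

int main(void)
{
        kick(true);     /* stats-only kick: BALANCE and STATS both get set */
        kick(false);    /* "concurrent" full kick: STATS gets cleared again */

        printf("balance_kick=%d stats_kick=%d\n",
               !!(ilb_flags & NOHZ_BALANCE_KICK),
               !!(ilb_flags & NOHZ_STATS_KICK));
        return 0;
}

Running it prints balance_kick=1 stats_kick=0: the second, full-balance
caller did not get to send another kick, but it did strip the
stats-only restriction from the one already pending.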
On 24 October 2017 at 14:25, Brendan Jackman <brendan.jackman@arm.com> wrote:
> From: Vincent Guittot <vincent.guittot@linaro.org>
>
> When idle, the blocked load of CPUs will be updated only when an idle
> load balance is triggered, which may never happen. Because of this
> uncertainty about when the idle load balance runs, the utilization,
> the load and the shares of idle cfs_rqs can stay artificially high and
> steal shares and running time from the busy cfs_rqs of the task group.
> Add a new light idle load balance state which ensures that blocked
> loads are periodically updated and decayed but does not perform any
> task migration.
>
> The remote load updates are rate-limited, so that they are not
> performed with a shorter period than LOAD_AVG_PERIOD (i.e. the PELT
> half-life). This is the period after which we have a known 50% error
> in stale load.
>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Morten Rasmussen <morten.rasmussen@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> [Switched remote update interval to use PELT half life]
> [Moved update_blocked_averages call outside rebalance_domains
> to simplify code]
> Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
> ---
>  kernel/sched/fair.c  | 71 +++++++++++++++++++++++++++++++++++++++++++++-------
>  kernel/sched/sched.h |  1 +
>  2 files changed, 63 insertions(+), 9 deletions(-)
>
> [...]
>
> @@ -9062,7 +9109,12 @@ static __latent_entropy void run_rebalance_domains(struct softirq_action *h)
>           * and abort nohz_idle_balance altogether if we pull some load.
>           */
>          nohz_idle_balance(this_rq, idle);
> -        rebalance_domains(this_rq, idle);
> +        update_blocked_averages(this_rq->cpu);
> +        if (!test_bit(NOHZ_STATS_KICK, nohz_flags(this_rq->cpu)))

The NOHZ_STATS_KICK field is only defined with CONFIG_NO_HZ_COMMON.

> +                rebalance_domains(this_rq, idle);
> +#ifdef CONFIG_NO_HZ_COMMON
> +        clear_bit(NOHZ_STATS_KICK, nohz_flags(this_rq->cpu));
> +#endif
>  }
>
> [...]
Hi Vincent,

On Mon, Nov 20 2017 at 09:04, Vincent Guittot wrote:
> On 24 October 2017 at 14:25, Brendan Jackman <brendan.jackman@arm.com> wrote:
>> @@ -9062,7 +9109,12 @@ static __latent_entropy void run_rebalance_domains(struct softirq_action *h)
>>           * and abort nohz_idle_balance altogether if we pull some load.
>>           */
>>          nohz_idle_balance(this_rq, idle);
>> -        rebalance_domains(this_rq, idle);
>> +        update_blocked_averages(this_rq->cpu);
>> +        if (!test_bit(NOHZ_STATS_KICK, nohz_flags(this_rq->cpu)))
>
> The NOHZ_STATS_KICK field is only defined with CONFIG_NO_HZ_COMMON.

Damn, sorry. Will fix this and the similar issue you pointed out on
patch 2/2 and send a v2.

Thanks for reviewing,
Brendan
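For readers following along, one possible shape of the fix promised for
v2 (purely illustrative; the helper name nohz_stats_kick_pending is
made up and not taken from the actual series) is to keep every
NOHZ_STATS_KICK access behind CONFIG_NO_HZ_COMMON: either wrap the
test_bit call in the same #ifdef as the clear_bit below it, or hide it
in a stub helper so the call site needs no #ifdef at all. The
standalone sketch below builds both with and without
-DCONFIG_NO_HZ_COMMON.

#include <stdbool.h>
#include <stdio.h>

/*
 * Userspace illustration only: a helper that is stubbed out when the
 * "config" macro is absent, so the caller can test the stats-kick flag
 * without any #ifdef of its own. The helper name is hypothetical.
 */
#ifdef CONFIG_NO_HZ_COMMON
#define NOHZ_STATS_KICK 0x2UL
static unsigned long nohz_flags_word;  /* stand-in for rq->nohz_flags */

static bool nohz_stats_kick_pending(void)
{
        return nohz_flags_word & NOHZ_STATS_KICK;
}
#else
static bool nohz_stats_kick_pending(void)
{
        return false;   /* no nohz flags exist in this configuration */
}
#endif

/* Rough analogue of the run_rebalance_domains() call site above. */
static void run_rebalance(void)
{
        if (!nohz_stats_kick_pending())
                printf("full rebalance_domains() pass\n");
        else
                printf("stats-only pass, skipping rebalance_domains()\n");
}

int main(void)
{
        run_rebalance();        /* builds with or without -DCONFIG_NO_HZ_COMMON */
        return 0;
}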
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 85d1ec1c3b39..9085caf49c76 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5976,6 +5976,9 @@ static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
         return min_cap * 1024 < task_util(p) * capacity_margin;
 }
 
+static inline bool nohz_kick_needed(struct rq *rq, bool only_update);
+static void nohz_balancer_kick(bool only_update);
+
 /*
  * select_task_rq_fair: Select target runqueue for the waking task in domains
  * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,
@@ -6074,6 +6077,11 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
         }
         rcu_read_unlock();
 
+#ifdef CONFIG_NO_HZ_COMMON
+        if (nohz_kick_needed(cpu_rq(new_cpu), true))
+                nohz_balancer_kick(true);
+#endif
+
         return new_cpu;
 }
 
@@ -8653,6 +8661,7 @@ static struct {
         cpumask_var_t idle_cpus_mask;
         atomic_t nr_cpus;
         unsigned long next_balance;     /* in jiffy units */
+        unsigned long next_update;      /* in jiffy units */
 } nohz ____cacheline_aligned;
 
 static inline int find_new_ilb(void)
@@ -8670,7 +8679,7 @@ static inline int find_new_ilb(void)
  * nohz_load_balancer CPU (if there is one) otherwise fallback to any idle
  * CPU (if there is one).
  */
-static void nohz_balancer_kick(void)
+static void nohz_balancer_kick(bool only_update)
 {
         int ilb_cpu;
 
@@ -8683,6 +8692,10 @@ static void nohz_balancer_kick(void)
 
         if (test_and_set_bit(NOHZ_BALANCE_KICK, nohz_flags(ilb_cpu)))
                 return;
+
+        if (only_update)
+                set_bit(NOHZ_STATS_KICK, nohz_flags(ilb_cpu));
+
         /*
          * Use smp_send_reschedule() instead of resched_cpu().
          * This way we generate a sched IPI on the target cpu which
@@ -8770,6 +8783,8 @@ void nohz_balance_enter_idle(int cpu)
         atomic_inc(&nohz.nr_cpus);
         set_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu));
 }
+#else
+static inline void nohz_balancer_kick(bool only_update) {}
 #endif
 
 static DEFINE_SPINLOCK(balancing);
@@ -8801,8 +8816,6 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
         int need_serialize, need_decay = 0;
         u64 max_cost = 0;
 
-        update_blocked_averages(cpu);
-
         rcu_read_lock();
         for_each_domain(cpu, sd) {
                 /*
@@ -8901,6 +8914,7 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 {
         int this_cpu = this_rq->cpu;
         struct rq *rq;
+        struct sched_domain *sd;
         int balance_cpu;
         /* Earliest time when we have to do rebalance again */
         unsigned long next_balance = jiffies + 60*HZ;
@@ -8910,6 +8924,23 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
             !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)))
                 goto end;
 
+        /*
+         * This cpu is going to update the blocked load of idle CPUs either
+         * before doing a rebalancing or just to keep metrics up to date. we
+         * can safely update the next update timestamp
+         */
+        rcu_read_lock();
+        sd = rcu_dereference(this_rq->sd);
+        /*
+         * Check whether there is a sched_domain available for this cpu.
+         * The last other cpu can have been unplugged since the ILB has been
+         * triggered and the sched_domain can now be null. The idle balance
+         * sequence will quickly be aborted as there is no more idle CPUs
+         */
+        if (sd)
+                nohz.next_update = jiffies + msecs_to_jiffies(LOAD_AVG_PERIOD);
+        rcu_read_unlock();
+
         for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
                 if (balance_cpu == this_cpu || !idle_cpu(balance_cpu))
                         continue;
@@ -8936,7 +8967,15 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
                         cpu_load_update_idle(rq);
                         rq_unlock_irq(rq, &rf);
 
-                        rebalance_domains(rq, CPU_IDLE);
+                        update_blocked_averages(balance_cpu);
+                        /*
+                         * This idle load balance softirq may have been
+                         * triggered only to update the blocked load and shares
+                         * of idle CPUs (which we have just done for
+                         * balance_cpu). In that case skip the actual balance.
+                         */
+                        if (!test_bit(NOHZ_STATS_KICK, nohz_flags(this_cpu)))
+                                rebalance_domains(rq, idle);
                 }
 
                 if (time_after(next_balance, rq->next_balance)) {
@@ -8967,7 +9006,7 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
  * - For SD_ASYM_PACKING, if the lower numbered cpu's in the scheduler
  *   domain span are idle.
  */
-static inline bool nohz_kick_needed(struct rq *rq)
+static inline bool nohz_kick_needed(struct rq *rq, bool only_update)
 {
         unsigned long now = jiffies;
         struct sched_domain_shared *sds;
@@ -8975,7 +9014,7 @@ static inline bool nohz_kick_needed(struct rq *rq)
         int nr_busy, i, cpu = rq->cpu;
         bool kick = false;
 
-        if (unlikely(rq->idle_balance))
+        if (unlikely(rq->idle_balance) && !only_update)
                 return false;
 
         /*
@@ -8992,6 +9031,13 @@ static inline bool nohz_kick_needed(struct rq *rq)
         if (likely(!atomic_read(&nohz.nr_cpus)))
                 return false;
 
+        if (only_update) {
+                if (time_before(now, nohz.next_update))
+                        return false;
+                else
+                        return true;
+        }
+
         if (time_before(now, nohz.next_balance))
                 return false;
 
@@ -9041,6 +9087,7 @@ static inline bool nohz_kick_needed(struct rq *rq)
 }
 #else
 static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle) { }
+static inline bool nohz_kick_needed(struct rq *rq, bool only_update) { return false; }
 #endif
 
 /*
@@ -9062,7 +9109,12 @@ static __latent_entropy void run_rebalance_domains(struct softirq_action *h)
          * and abort nohz_idle_balance altogether if we pull some load.
          */
         nohz_idle_balance(this_rq, idle);
-        rebalance_domains(this_rq, idle);
+        update_blocked_averages(this_rq->cpu);
+        if (!test_bit(NOHZ_STATS_KICK, nohz_flags(this_rq->cpu)))
+                rebalance_domains(this_rq, idle);
+#ifdef CONFIG_NO_HZ_COMMON
+        clear_bit(NOHZ_STATS_KICK, nohz_flags(this_rq->cpu));
+#endif
 }
 
 /*
@@ -9077,8 +9129,8 @@ void trigger_load_balance(struct rq *rq)
         if (time_after_eq(jiffies, rq->next_balance))
                 raise_softirq(SCHED_SOFTIRQ);
 #ifdef CONFIG_NO_HZ_COMMON
-        if (nohz_kick_needed(rq))
-                nohz_balancer_kick();
+        if (nohz_kick_needed(rq, false))
+                nohz_balancer_kick(false);
 #endif
 }
 
@@ -9657,6 +9709,7 @@ __init void init_sched_fair_class(void)
 
 #ifdef CONFIG_NO_HZ_COMMON
         nohz.next_balance = jiffies;
+        nohz.next_update = jiffies;
         zalloc_cpumask_var(&nohz.idle_cpus_mask, GFP_NOWAIT);
 #endif
 #endif /* SMP */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 14db76cd496f..6f95ef653f73 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1978,6 +1978,7 @@ extern void cfs_bandwidth_usage_dec(void);
 enum rq_nohz_flag_bits {
         NOHZ_TICK_STOPPED,
         NOHZ_BALANCE_KICK,
+        NOHZ_STATS_KICK
 };
 
 #define nohz_flags(cpu) (&cpu_rq(cpu)->nohz_flags)
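A closing note on the numbers in the changelog: LOAD_AVG_PERIOD is 32,
and the patch feeds it to msecs_to_jiffies(), i.e. treats it as the
~32 ms PELT half-life. The small standalone program below (not kernel
code; it just evaluates the geometric decay directly) shows why capping
staleness at one half-life bounds the error at the quoted 50%: a
blocked load that has not been refreshed for 32 ms should really have
decayed to half of its stale value.

/* Build with: cc pelt_decay.c -lm */
#include <math.h>
#include <stdio.h>

#define LOAD_AVG_PERIOD_MS 32.0 /* PELT half-life, as used by the patch */

int main(void)
{
        double ages_ms[] = { 4, 8, 16, 32, 64 };

        for (unsigned int i = 0; i < sizeof(ages_ms) / sizeof(ages_ms[0]); i++) {
                /* Fraction of the stale value that would remain after decay. */
                double remaining = pow(0.5, ages_ms[i] / LOAD_AVG_PERIOD_MS);

                printf("stale for %5.0f ms: decayed load is %5.1f%% of the stale value\n",
                       ages_ms[i], remaining * 100.0);
        }
        return 0;
}

At 32 ms of staleness the decayed load is exactly 50% of the stale
figure; updating at least that often keeps idle cfs_rqs from carrying
more than twice their properly decayed load between updates.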