[RFC,0/2] Disable RT-throttling for idle-inject threads

Message ID	20240410045417.3048209-1-quic_atulpant@quicinc.com
Headers	show Received: from mx0a-0031df01.pphosted.com (mx0a-0031df01.pphosted.com [205.220.168.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0989911713; Wed, 10 Apr 2024 04:55:12 +0000 (UTC) From: Atul Pant <quic_atulpant@quicinc.com> To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, rafael@kernel.org, daniel.lezcano@linaro.org Cc: Atul Pant <quic_atulpant@quicinc.com>, kernel@quicinc.com, linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org Subject: [RFC PATCH 0/2] Disable RT-throttling for idle-inject threads Date: Wed, 10 Apr 2024 10:24:15 +0530 Message-Id: <20240410045417.3048209-1-quic_atulpant@quicinc.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	Disable RT-throttling for idle-inject threads \| expand [RFC,0/2] Disable RT-throttling for idle-inject threads [RFC,1/2] sched/rt: Disable runtime accounting for idle threads with SCHED_FIFO policy [RFC,2/2] sched/idle: Add a description for play_idle_precise

Atul Pant April 10, 2024, 4:54 a.m. UTC

We are trying to implement a solution for thermal mitigation by using
idle injection on CPUs.  However we face some limitations with the
current idle-inject framework. As per our need, we want to start
injecting idle cycles on a cpu for indefinite time (until the
temperature/power of the CPU falls below a threshold). This will allow
to keep the hot CPUs in the sleep state until we see improvement in
temperature/power. If we set idle duration to a large value or have an
idle-injection ratio of 100%,  then the idle-inject RT thread suffers
from RT throttling. This results in the CPU exiting from the sleep state
and consume some power.

To solve this limitation, we propose a solution to disable RT-throttling
whenever idle-inject threads run. We achieve this by not accounting the
runtime for the idle-inject threads.

Atul Pant (2):
  sched/rt: Disable runtime accounting for idle threads with SCHED_FIFO
    policy
  sched/idle: Add a description for play_idle_precise

 kernel/sched/idle.c | 5 +++++
 kernel/sched/rt.c   | 4 +++-
 2 files changed, 8 insertions(+), 1 deletion(-)

Daniel Bristot de Oliveira April 10, 2024, 7 a.m. UTC | #1

On 4/10/24 06:54, Atul Pant wrote:
> To prevent the throttling of RT idle threads, like the idle-inject
> threads, skip accounting of runtime for these threads.
> 
> Signed-off-by: Atul Pant <quic_atulpant@quicinc.com>
> ---
>  kernel/sched/rt.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 4ac36eb4cdee..d20999270e75 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1075,7 +1075,9 @@ static void update_curr_rt(struct rq *rq)
>  		struct rt_rq *rt_rq = rt_rq_of_se(rt_se);
>  		int exceeded;
>  
> -		if (sched_rt_runtime(rt_rq) != RUNTIME_INF) {
> +		if (sched_rt_runtime(rt_rq) != RUNTIME_INF &&
> +				!(curr->policy == SCHED_FIFO &&
> +					curr->flags & PF_IDLE)) {


FYI, this will not be a problem with dl server because play_idle_precise()
disables preemption, so the dl server will not be scheduled until preempt_enable().

with the DL server, the time consumed as an rt task will not change the DL server
behavior because the logic inverts: it provides bandwidth for the fair
scheduler (instead of throttling the RT sched).

-- Daniel

>  			raw_spin_lock(&rt_rq->rt_runtime_lock);
>  			rt_rq->rt_time += delta_exec;
>  			exceeded = sched_rt_runtime_exceeded(rt_rq);

Peter Zijlstra April 10, 2024, 8:54 a.m. UTC | #2

On Wed, Apr 10, 2024 at 10:24:15AM +0530, Atul Pant wrote:
> We are trying to implement a solution for thermal mitigation by using
> idle injection on CPUs.  However we face some limitations with the
> current idle-inject framework. As per our need, we want to start
> injecting idle cycles on a cpu for indefinite time (until the
> temperature/power of the CPU falls below a threshold). This will allow
> to keep the hot CPUs in the sleep state until we see improvement in
> temperature/power. If we set idle duration to a large value or have an
> idle-injection ratio of 100%,  then the idle-inject RT thread suffers
> from RT throttling. This results in the CPU exiting from the sleep state
> and consume some power.
> 
> To solve this limitation, we propose a solution to disable RT-throttling
> whenever idle-inject threads run. We achieve this by not accounting the
> runtime for the idle-inject threads.

Running RT tasks for indefinite amounts of time will wreck the system.
Things like workqueues and other per-cpu threads expect service or
things will pile up and run to ground.

Idle injection, just like every other RT user must not be able to starve
the system of service.

If your system design requires this (I would argue it is broken), look
at other means, like CPU-hotplug (which I also really detest) -- which
takes down the CPU in a controlled manner and avoids the resource
issues.

Peter Zijlstra April 10, 2024, 11:46 a.m. UTC | #3

On Wed, Apr 10, 2024 at 04:59:33PM +0530, Atul Kumar Pant wrote:

> Hi Peter,
> We are trying to add support for true 100% idle-injection ratio from
> idle-injection framework. 

Yeah, I got that. I'm saying that is broken. Both from a requirement POV
and an implementation POV.

If your hardware needs 100% idle injection that means CPU availability
is unreliable and everything that relies on CPU-masks will be broken.

Furthermore, since idle-injection is build on top of FIFO, anything with
a higher priority (DL, stop) will simply preempt it anyway.

And, as already mentioned, hogging the system with FIFO will break
things.

So I would *very* strongly urge you to push back on whoever thought this
is 'needed'. This is *bad* hardware.

> It might happen that we want to run idle cycles for
> slightly more time than permitted by RT-bandwidth control.

The thing is configurable for a reason.

> We understand the
> concern about it hogging the cpu. Will it be better if we make it a choice for
> the user who uses idle-inject framework, whether to have true 100%
> idle-injection support or not?

None of the above mentioned issues will magically go away. Run a 100%
FIFO task for an indeterminate amount of time and you get to keep the
pieces.

Also, we'll be replacing the throttling code with DL servers 'soonish'
at which point all this will stop working anyway, since DL will preempt
anything FIFO, including your idle injection crud.

Daniel Bristot de Oliveira April 10, 2024, 12:24 p.m. UTC | #4

On 4/10/24 13:46, Peter Zijlstra wrote:
> Also, we'll be replacing the throttling code with DL servers 'soonish'
> at which point all this will stop working anyway, since DL will preempt
> anything FIFO, including your idle injection crud.

+1

also, given that the code spins with preempt disabled, with dl server it could
even become a non-rt thread...

FIFO RUNNING
	DL_SERVER activates
		their loop
			disables preemption()
			run()
			enable preemption()
	DL_SERVE throttled
FIFO BACK

So, there will be no need for this busy loop to be RT.

Anyways, it breaks RT and DL if it keeps running for too long... It can
also cause complaints like RCU stalls and loong wait on locks, e.g., on
kworkers...

-- Daniel

Steven Rostedt April 10, 2024, 1:31 p.m. UTC | #5

On Wed, 10 Apr 2024 10:24:15 +0530
Atul Pant <quic_atulpant@quicinc.com> wrote:

> We are trying to implement a solution for thermal mitigation by using
> idle injection on CPUs.  However we face some limitations with the
> current idle-inject framework. As per our need, we want to start
> injecting idle cycles on a cpu for indefinite time (until the
> temperature/power of the CPU falls below a threshold). This will allow
> to keep the hot CPUs in the sleep state until we see improvement in
> temperature/power. If we set idle duration to a large value or have an
> idle-injection ratio of 100%,  then the idle-inject RT thread suffers
> from RT throttling. This results in the CPU exiting from the sleep state
> and consume some power.
> 
> To solve this limitation, we propose a solution to disable RT-throttling
> whenever idle-inject threads run. We achieve this by not accounting the
> runtime for the idle-inject threads.

I'm going to assume that when dl-server is finally accepted, this will no
longer be an issue for you?

  https://lore.kernel.org/all/cover.1699095159.git.bristot@kernel.org/

-- Steve

[RFC,0/2] Disable RT-throttling for idle-inject threads

Message

Comments