From patchwork Tue Apr 5 01:07:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frederic Weisbecker X-Patchwork-Id: 558483 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9CF4BC433EF for ; Tue, 5 Apr 2022 01:59:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229580AbiDECBk (ORCPT ); Mon, 4 Apr 2022 22:01:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38412 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229546AbiDECBj (ORCPT ); Mon, 4 Apr 2022 22:01:39 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EF48C462B01; Mon, 4 Apr 2022 18:18:12 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 30A9060DC6; Tue, 5 Apr 2022 01:08:08 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 31008C2BBE4; Tue, 5 Apr 2022 01:08:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1649120887; bh=Jpz+Jv4Xaf7lFDh2SyosScKUc58hWmkQlsew08yht+o=; h=From:To:Cc:Subject:Date:From; b=gXub8KIAxJdu63/6MTrPAGneJXnd33dv3jqT5/0InvZyfMvDdsjypiJF0mto6q6Cv YWCf8Tb872r+2Rl4K9dpQUcB0YJk+XoyfGUjkRiSq1wT7U65peaU4UbicSrtviTo8W bKrnRxxCVyVNqiaycQ0H+MvUz/I08bwaZjFP+K+YB68g0lBYcBZPCap2uO3Um6Rr1H 31i+hM6JhR63/nOoS0nS25pbmjnnod8+BTvHDo0btkMBCDBxLEzm+X8BmKfzrn5mVr hP8iJuyoWGAAyVcn2uyRCX0dpaOgNLWjJX73pEa9FAxZ6EIuT9Gx0YdUddFU/a11Oz lO9XgGO2Tr/5Q== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , linux-rt-users , Mel Gorman , Sebastian Andrzej Siewior , Thomas Gleixner Subject: [RT][PATCH 1/2] rcutorture: Also force sched priority to timersd on boosting test Date: Tue, 5 Apr 2022 03:07:51 +0200 Message-Id: <20220405010752.1347437-1-frederic@kernel.org> X-Mailer: git-send-email 2.25.1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-rt-users@vger.kernel.org ksoftirqd is statically boosted to the priority level right above the one of rcu_torture_boost() so that timers, which torture readers rely on, get a chance to run while rcu_torture_boost() is polling. However timers processing got split from ksoftirqd into their own kthread (timersd) that isn't boosted. It has the same SCHED_FIFO low prio as rcu_torture_boost() and therefore timers can't preempt it and may starve. The issue can be triggered in practice on v5.17.1-rt17 using: ./kvm.sh --allcpus --configs TREE04 --duration 10m --kconfig "CONFIG_EXPERT=y CONFIG_PREEMPT_RT=y" triggering the following: rcu-torture: rcu_torture_boost boosting failed Boost inversion thread ->rt_priority 1 gp_state 12908 jiffies 3501 rcu: rcu_preempt: wait state: RCU_GP_WAIT_FQS(5) ->state: 0x402 ->rt_priority 2 delta ->gp_start 3502 ->gp_activity 3502 ->gp_req_activity 3502 ->gp_wake_time 3502 ->gp_wake_seq 12897 ->gp_seq 12901 ->gp_seq_needed 12904 ->gp_max 4057 ->gp_flags 0x0 rcu: rcu_node 0:7 ->gp_seq 12901 ->gp_seq_needed 12904 ->qsmask 0x2 .... ->n_boosts 0 rcu: rcu_node 0:3 ->gp_seq 12901 ->gp_seq_needed 12908 ->qsmask 0x0 b... ->n_boosts 22 rcu: cpu 0 ->gp_seq_needed 12904 rcu: rcu_node 4:7 ->gp_seq 12901 ->gp_seq_needed 12908 ->qsmask 0x0 b..G ->n_boosts 6 rcu: cpu 5 ->gp_seq_needed 12904 rcu: CB 1^0->2 KbclSW F5089 L5089 C1 ..... q0 S CPU 0 rcu: CB 2^0->3 KbclSW F16859 L16859 C1 ..... q0 S CPU 0 rcu: CB 3^0->-1 KbclSW F3502 L3502 C2 .W12908.N. q4 S CPU 0 rcu: nocb GP 4 KldtS .[W.] .G:12904 rnp 4:7 1459 S CPU 0 rcu: CB 4^4->5 KbclSW F9918 L10534 C4 ..... q0 S CPU 0 rcu: CB 5^4->6 KbclSW F2550 L2550 C2 .W12904.N. q8 S CPU 0 rcu: CB 6^4->7 KbclSW F6639 L7253 C0 ..... q0 S CPU 0 rcu: CB 7^4->-1 KbclSW F30685 L30685 C1 ..... q0 S CPU 0 rcu: RCU callbacks invoked since boot: 236075 rcu_tasks: RTGS_WAIT_CBS(11) since 409853 g:0 i:0/0 k. rcu_tasks_rude: RTGS_WAIT_CBS(11) since 409851 g:0 i:0/0 k. rcu_tasks_trace: RTGS_WAIT_CBS(11) since 409848 g:0 i:0/0 k. N0 h:0/0/0 Boost inversion: GP 12908 still pending. rcu-torture: rtc: 0000000093327eb2 ver: 1504 tfle: 0 rta: 1505 rtaf: 0 rtf: 1493 rtmbe: 0 rtmbkf: 0/782 rtbe: 0 rtbke: 0 rtbre: 0 rtbf: 1 rtb: 7 nt: 66220 onoff: 6/6:11/12 40,763:29,5050 1537:11766 (HZ=1000) barrier: 239/240:0 read-exits: 67 nocb-toggles: 0:0 rcu-torture: !!! ------------[ cut here ]------------ WARNING: CPU: 5 PID: 121 at kernel/rcu/rcutorture.c:1870 rcu_torture_stats_print+0x5ad/0x610 Modules linked in: CPU: 5 PID: 121 Comm: rcu_torture_sta Not tainted 5.17.1-rt17 #63 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009) RIP: 0010:rcu_torture_stats_print+0x5ad/0x610 Code: c7 01 02 00 74 02 0f 0b 48 83 3d 35 c7 01 02 00 74 02 0f 0b 48 83 3d 21 c7 01 02 00 74 02 0f 0b 48 83 3d 0d c7 01 02 00 74 02 <0f> 0b 83 eb 01 0f 8e ba fc ff ff 0f 0b e9 b3 fc ff ff 8b 05 8f c6 RSP: 0018:ffffa7560041fdf8 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 40000000ffffe735 RDI: 00000000ffffffff RBP: 00000000000005d5 R08: ffffffffaf65f6e0 R09: ffffffffafee3a7c R10: ffffa75600427d58 R11: 0000000020212121 R12: ffffa7560041fe20 R13: ffffa7560041fe78 R14: 0000000000000000 R15: 000000000001bd20 FS: 0000000000000000(0000) GS:ffff9fe7df540000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000014216000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? lock_timer_base+0x62/0x80 ? rcu_torture_stats_print+0x610/0x610 rcu_torture_stats+0x29/0x70 kthread+0x118/0x130 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 Fix this with statically boosting timersd just like is done with ksoftirqd. Suggested-by: Mel Gorman Cc: Sebastian Andrzej Siewior Cc: Thomas Gleixner Signed-off-by: Frederic Weisbecker --- include/linux/interrupt.h | 1 + kernel/rcu/rcutorture.c | 6 ++++++ kernel/softirq.c | 2 +- 3 files changed, 8 insertions(+), 1 deletion(-) diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h index db4a23d07a57..e4b8a04e67ce 100644 --- a/include/linux/interrupt.h +++ b/include/linux/interrupt.h @@ -606,6 +606,7 @@ extern void raise_softirq_irqoff(unsigned int nr); extern void raise_softirq(unsigned int nr); #ifdef CONFIG_PREEMPT_RT +DECLARE_PER_CPU(struct task_struct *, timersd); extern void raise_timer_softirq(void); extern void raise_hrtimer_softirq(void); diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c index 422f7e4cc08d..ac9715ed58ba 100644 --- a/kernel/rcu/rcutorture.c +++ b/kernel/rcu/rcutorture.c @@ -3283,6 +3283,12 @@ rcu_torture_init(void) WARN_ON_ONCE(!t); sp.sched_priority = 2; sched_setscheduler_nocheck(t, SCHED_FIFO, &sp); +#ifdef CONFIG_PREEMPT_RT + t = per_cpu(timersd, cpu); + WARN_ON_ONCE(!t); + sp.sched_priority = 2; + sched_setscheduler_nocheck(t, SCHED_FIFO, &sp); +#endif } } } diff --git a/kernel/softirq.c b/kernel/softirq.c index 22948c2109f5..89eb45614af6 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -624,7 +624,7 @@ static inline void tick_irq_exit(void) #endif } -static DEFINE_PER_CPU(struct task_struct *, timersd); +DEFINE_PER_CPU(struct task_struct *, timersd); static DEFINE_PER_CPU(unsigned long, pending_timer_softirq); static unsigned int local_pending_timers(void) From patchwork Tue Apr 5 01:07:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frederic Weisbecker X-Patchwork-Id: 556433 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9A3BEC433F5 for ; Tue, 5 Apr 2022 02:16:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229620AbiDECSc (ORCPT ); Mon, 4 Apr 2022 22:18:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49556 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229589AbiDECSb (ORCPT ); Mon, 4 Apr 2022 22:18:31 -0400 Received: from sin.source.kernel.org (sin.source.kernel.org [145.40.73.55]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B4B7317A2C2; Mon, 4 Apr 2022 18:13:14 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sin.source.kernel.org (Postfix) with ESMTPS id EFF8CCE1B74; Tue, 5 Apr 2022 01:08:10 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0C437C340EE; Tue, 5 Apr 2022 01:08:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1649120889; bh=kq/b6VF1wMSV/+Puxxy8XdTehugTZhED97PZCHZBYu0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=g9jgIrVBhksHbL9ZdPmZqBn9XVWsVHN0AzHOic0nz+08MsWNDHZA+E2mTTUL1++GK ms3B+Nh+FntxGjBJcXAug6OwaenRhP2kxCw568lmBJMOfyUwzBPZimPP5olaQKdIom a/OoaAa/bKQGCpzifUxA87T/GwRkXJuv80aSk1zdjUaiW4kFyqZ5xYs3E9+KCaR/bJ t3E5F5SRD1x2Zzm73n/gygIqufGkT+zsEO0X7R8PdUWK2zkf85uM5YKC2cm7MYsi3J nqEbzyILtq1Mdl7w93gI9TPscugaMBy6Fl+AVn5nhQPHrj2g12l3jUsWS1d9pkHzJx 7gsQa/nNMqRXQ== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , linux-rt-users , Mel Gorman , Sebastian Andrzej Siewior , Thomas Gleixner Subject: [RT][PATCH 2/2] tick: Fix timer storm since introduction of timersd Date: Tue, 5 Apr 2022 03:07:52 +0200 Message-Id: <20220405010752.1347437-2-frederic@kernel.org> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220405010752.1347437-1-frederic@kernel.org> References: <20220405010752.1347437-1-frederic@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-rt-users@vger.kernel.org If timers are pending while the tick is reprogrammed on nohz_mode, the next expiry is not armed to fire now, it is delayed one jiffy forward instead so as not to raise an inextinguishable timer storm with such scenario: 1) IRQ triggers and queue a timer 2) ksoftirqd() is woken up 3) IRQ tail: timer is reprogrammed to fire now 4) IRQ exit 5) TIMER interrupt 6) goto 3) ...all that until we finally reach ksoftirqd. Unfortunately we are checking the wrong softirq vector bitmask since timersd kthread has split from ksoftirqd. Timers now have their own vector state field that must be checked separately. As a result, the old timer storm is back. This shows up early on boot with extremely long initcalls: [ 333.004807] initcall dquot_init+0x0/0x111 returned 0 after 323822879 usecs and the cause is uncovered with the right trace events showing just 10 microseconds between ticks (~100 000 Hz): swapper/-1 1dn.h111 60818582us : hrtimer_expire_entry: hrtimer=00000000e0ef0f6b function=tick_sched_timer now=60415486608 swapper/-1 1dn.h111 60818592us : hrtimer_expire_entry: hrtimer=00000000e0ef0f6b function=tick_sched_timer now=60415496082 swapper/-1 1dn.h111 60818601us : hrtimer_expire_entry: hrtimer=00000000e0ef0f6b function=tick_sched_timer now=60415505550 swapper/-1 1dn.h111 60818611us : hrtimer_expire_entry: hrtimer=00000000e0ef0f6b function=tick_sched_timer now=60415515013 swapper/-1 1dn.h111 60818620us : hrtimer_expire_entry: hrtimer=00000000e0ef0f6b function=tick_sched_timer now=60415524483 swapper/-1 1dn.h111 60818630us : hrtimer_expire_entry: hrtimer=00000000e0ef0f6b function=tick_sched_timer now=60415533949 swapper/-1 1dn.h111 60818639us : hrtimer_expire_entry: hrtimer=00000000e0ef0f6b function=tick_sched_timer now=60415543426 swapper/-1 1dn.h111 60818649us : hrtimer_expire_entry: hrtimer=00000000e0ef0f6b function=tick_sched_timer now=60415553061 swapper/-1 1dn.h111 60818658us : hrtimer_expire_entry: hrtimer=00000000e0ef0f6b function=tick_sched_timer now=60415562511 Fix this with checking the right timer vector state from the nohz code. Signed-off-by: Frederic Weisbecker Cc: Mel Gorman Cc: Sebastian Andrzej Siewior Cc: Thomas Gleixner --- include/linux/interrupt.h | 12 ++++++++++++ kernel/softirq.c | 7 +------ kernel/time/tick-sched.c | 2 +- 3 files changed, 14 insertions(+), 7 deletions(-) diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h index e4b8a04e67ce..da248458f4d9 100644 --- a/include/linux/interrupt.h +++ b/include/linux/interrupt.h @@ -607,9 +607,16 @@ extern void raise_softirq(unsigned int nr); #ifdef CONFIG_PREEMPT_RT DECLARE_PER_CPU(struct task_struct *, timersd); +DECLARE_PER_CPU(unsigned long, pending_timer_softirq); + extern void raise_timer_softirq(void); extern void raise_hrtimer_softirq(void); +static inline unsigned int local_pending_timers(void) +{ + return __this_cpu_read(pending_timer_softirq); +} + #else static inline void raise_timer_softirq(void) { @@ -620,6 +627,11 @@ static inline void raise_hrtimer_softirq(void) { raise_softirq_irqoff(HRTIMER_SOFTIRQ); } + +static inline unsigned int local_pending_timers(void) +{ + return local_softirq_pending(); +} #endif DECLARE_PER_CPU(struct task_struct *, ksoftirqd); diff --git a/kernel/softirq.c b/kernel/softirq.c index 89eb45614af6..c0aef5f760e5 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -625,12 +625,7 @@ static inline void tick_irq_exit(void) } DEFINE_PER_CPU(struct task_struct *, timersd); -static DEFINE_PER_CPU(unsigned long, pending_timer_softirq); - -static unsigned int local_pending_timers(void) -{ - return __this_cpu_read(pending_timer_softirq); -} +DEFINE_PER_CPU(unsigned long, pending_timer_softirq); static void wake_timersd(void) { diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 17a283ce2b20..7c359f029b97 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -763,7 +763,7 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now) static inline bool local_timer_softirq_pending(void) { - return local_softirq_pending() & BIT(TIMER_SOFTIRQ); + return local_pending_timers() & BIT(TIMER_SOFTIRQ); } static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu)