Message ID | 20221219151503.385816-1-krzysztof.kozlowski@linaro.org |
---|---|
Series | PM: Fixes for Realtime systems |
Hi Krzysztof,

Thanks for looking into this!

I tested your patchset on the QDrive3 on a CentOS Stream 9 RT kernel (I couldn't test it on mainline because the latest RT patchset only supports 6.1, which is missing some bits needed to boot QDrive3).

It fixes the PSCI cpuidle issue I was encountering in [1]. However, I may have found another code path that triggers a similar issue:

    BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
    in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 113, name: kworker/4:2
    preempt_count: 1, expected: 0
    RCU nest depth: 0, expected: 0
    4 locks held by kworker/4:2/113:
     #0: ffff09b0c2376928 ((wq_completion)pm){+.+.}-{0:0}, at: process_one_work+0x1f4/0x7c0
     #1: ffff800008bf3dd0 ((work_completion)(&genpd->power_off_work)){+.+.}-{0:0}, at: process_one_work+0x1f4/0x7c0
     #2: ffff09b0c2e44860 (&genpd->rslock){....}-{2:2}, at: genpd_lock_rawspin+0x20/0x30
     #3: ffff09b0c6696a20 (&dev->power.lock){+.+.}-{2:2}, at: dev_pm_qos_flags+0x2c/0x60
    irq event stamp: 170
    hardirqs last  enabled at (169): [<ffffa1be822f8a78>] _raw_spin_unlock_irq+0x48/0xc4
    hardirqs last disabled at (170): [<ffffa1be822f8df4>] _raw_spin_lock_irqsave+0xb0/0xfc
    softirqs last  enabled at (0): [<ffffa1be814cfff0>] copy_process+0x68c/0x1500
    softirqs last disabled at (0): [<0000000000000000>] 0x0
    Preemption disabled at:
    [<ffffa1be81d7e620>] genpd_lock_rawspin+0x20/0x30
    CPU: 4 PID: 113 Comm: kworker/4:2 Tainted: G X --------- --- 5.14.0-rt14+ #2
    Hardware name: Qualcomm SA8540 ADP (DT)
    Workqueue: pm genpd_power_off_work_fn
    Call trace:
     dump_backtrace+0xb4/0x12c
     show_stack+0x1c/0x70
     dump_stack_lvl+0x98/0xd0
     dump_stack+0x14/0x2c
     __might_resched+0x180/0x220
     rt_spin_lock+0x74/0x11c
     dev_pm_qos_flags+0x2c/0x60
     genpd_power_off.part.0.isra.0+0xac/0x2d0
     genpd_power_off_work_fn+0x68/0x8c
     process_one_work+0x2b8/0x7c0
     worker_thread+0x15c/0x44c
     kthread+0xf8/0x104
     ret_from_fork+0x10/0x20

This happens consistently during boot. But on the mainline kernel, this code path has changed: genpd_power_off() no longer calls dev_pm_qos_flags(), so it might not happen on mainline. I hope to be able to test your patchset again soon on mainline with the next version of the RT patchset (which should be able to boot the QDrive3).

Best,
Adrien

[1] https://lore.kernel.org/all/20220615203605.1068453-1-athierry@redhat.com/
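The splat in the report follows from a PREEMPT_RT lock-nesting rule: a raw_spinlock_t still spins and disables preemption (here genpd->rslock, taken via genpd_lock_rawspin()), while a spinlock_t becomes a sleeping, rt_mutex-based lock, so taking dev->power.lock inside dev_pm_qos_flags() with preemption disabled trips the check (preempt_count: 1, expected: 0). The following is only a hypothetical userspace model of that bookkeeping, not kernel code: the class and function names are invented for illustration, and it tracks just a preempt count (the real __might_resched() check also considers, for example, the IRQ-disabled state).

```python
class PreemptModel:
    """Hypothetical model of the PREEMPT_RT rule behind the splat (not kernel code).

    It tracks a preempt count and records a warning whenever a sleeping
    lock is taken while that count is non-zero.
    """

    def __init__(self):
        self.preempt_count = 0
        self.warnings = []

    def raw_spin_lock(self, name):
        # raw_spinlock_t keeps spinning on RT and disables preemption.
        self.preempt_count += 1

    def raw_spin_unlock(self, name):
        self.preempt_count -= 1

    def spin_lock(self, name):
        # On RT, spinlock_t is rt_mutex-based and may sleep, so taking it
        # while preemption is disabled is reported as a bug.
        if self.preempt_count != 0:
            self.warnings.append(
                f"BUG: sleeping lock {name} taken with preempt_count={self.preempt_count}"
            )

    def spin_unlock(self, name):
        # Releasing the lock does not change the preempt count in this model.
        pass


def replay_report():
    """Replay the nesting from the trace: genpd->rslock, then dev->power.lock."""
    m = PreemptModel()
    m.raw_spin_lock("genpd->rslock")    # models genpd_lock_rawspin()
    m.spin_lock("dev->power.lock")      # models dev_pm_qos_flags()
    m.spin_unlock("dev->power.lock")
    m.raw_spin_unlock("genpd->rslock")
    return m.warnings


def replay_without_raw_lock():
    """The same sleeping lock taken with preemption enabled: no warning."""
    m = PreemptModel()
    m.spin_lock("dev->power.lock")
    m.spin_unlock("dev->power.lock")
    return m.warnings


if __name__ == "__main__":
    for line in replay_report():
        print(line)
```

In this model the warning disappears once the sleeping lock is no longer taken under the raw lock, which matches the observation in the thread that the path goes away when genpd_power_off() stops calling dev_pm_qos_flags().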
On Tue, 20 Dec 2022 at 22:36, Adrien Thierry <athierry@redhat.com> wrote:
>
> Hi Krzysztof,
>
> [...]
>
> This happens consistently during boot. But on the mainline kernel, this
> code path has changed: genpd_power_off no longer calls dev_pm_qos_flags.
> So it might not happen on mainline. I hope to be able to test your
> patchset again soon on mainline with the next version of the RT patchset
> (which should be able to boot the QDrive3).

You are right, since commit 3f9ee7da724a ("PM: domains: Don't check PM_QOS_FLAG_NO_POWER_OFF in genpd"), dev_pm_qos_flags() doesn't get called in genpd_power_off() anymore. That patch was introduced in v5.19.

>
> Best,
> Adrien
>
> [1] https://lore.kernel.org/all/20220615203605.1068453-1-athierry@redhat.com/
>

Kind regards
Uffe