Message ID | 20240222-cdns-qspi-pm-fix-v4-0-6b6af8bcbf59@bootlin.com |
---|---|
Series | spi: cadence-qspi: Fix runtime PM and system-wide suspend |
On Thu, 22 Feb 2024 11:12:28 +0100, Théo Lebrun wrote:
> This fixes runtime PM and system-wide suspend for the cadence-qspi
> driver. Seeing how runtime PM and autosuspend are enabled by default, I
> believe this affects all users of the driver.
>
> This series has been tested on both Mobileye EyeQ5 hardware and the TI
> J7200 EVM board, under s2idle.
>
> [...]

Applied to

https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next

Thanks!

[1/4] spi: cadence-qspi: fix pointer reference in runtime PM hooks
commit: 32ce3bb57b6b402de2aec1012511e7ac4e7449dc
[2/4] spi: cadence-qspi: remove system-wide suspend helper calls from runtime PM hooks
commit: 959043afe53ae80633e810416cee6076da6e91c6
[3/4] spi: cadence-qspi: put runtime in runtime PM hooks names
commit: 4efa1250b59ebf47ce64a7b6b7c3e2e0a2a9d35a
[4/4] spi: cadence-qspi: add system-wide suspend and resume callbacks
commit: 078d62de433b4f4556bb676e5dd670f0d4103376

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

On Mon, Feb 26, 2024 at 05:48:03PM +0530, Dhruva Gole wrote:
> On Feb 22, 2024 at 19:13:29 +0000, Mark Brown wrote:

> > [1/4] spi: cadence-qspi: fix pointer reference in runtime PM hooks
> > commit: 32ce3bb57b6b402de2aec1012511e7ac4e7449dc
> > [2/4] spi: cadence-qspi: remove system-wide suspend helper calls from runtime PM hooks
> > commit: 959043afe53ae80633e810416cee6076da6e91c6
> > [3/4] spi: cadence-qspi: put runtime in runtime PM hooks names
> > commit: 4efa1250b59ebf47ce64a7b6b7c3e2e0a2a9d35a
> > [4/4] spi: cadence-qspi: add system-wide suspend and resume callbacks
> > commit: 078d62de433b4f4556bb676e5dd670f0d4103376

> It seems like between 6.8.0-rc5-next-20240220 and
> 6.8.0-rc5-next-20240222 some of TI K3 platform boot have been broken.

Is this with some specific kernel configuration?

> It particularly seemed related to these patches because we can see
> cqspi_probe in the call trace and also cqspi_suspend toward the top.

It would be useful to bisect which patch, there's only 4 of them...

> See logs for kernel crash in [0] and working in [1]

> [0] https://gist.github.com/DhruvaG2000/ed997452b41d6e5301598225fc579800
> [1] https://gist.github.com/DhruvaG2000/d4e73111aeafaca555ba2d5208deb6dd

The relevant section from the failing log is:

[ 1.516342] printk: legacy bootconsole [ns16550a0] disabled
[ 1.533247] Unable to handle kernel paging request at virtual address 12800000340001b4

...

[ 1.709414] Call trace:
[ 1.711852] __mutex_lock.constprop.0+0x84/0x540
[ 1.716460] __mutex_lock_slowpath+0x14/0x20
[ 1.720719] mutex_lock+0x48/0x54
[ 1.724026] spi_controller_suspend+0x30/0x7c
[ 1.728377] cqspi_suspend+0x1c/0x6c
[ 1.731944] pm_generic_runtime_suspend+0x2c/0x44
[ 1.736640] genpd_runtime_suspend+0xa8/0x254

(it's generally helpful to provide the most relevant section directly.)

The issue here appears to be that we've registered for runtime suspend
prior to registering the controller...

Hello Dhruva,

On Mon Feb 26, 2024 at 1:18 PM CET, Dhruva Gole wrote:
> Hi Mark, Theo,
>
> + Nishanth, Vignesh (maintainers of TI K3)
>
> On Feb 22, 2024 at 19:13:29 +0000, Mark Brown wrote:
> > On Thu, 22 Feb 2024 11:12:28 +0100, Théo Lebrun wrote:
> > > This fixes runtime PM and system-wide suspend for the cadence-qspi
> > > driver. Seeing how runtime PM and autosuspend are enabled by default, I
> > > believe this affects all users of the driver.
> > >
> > > This series has been tested on both Mobileye EyeQ5 hardware and the TI
> > > J7200 EVM board, under s2idle.
> > >
> > > [...]
> >
> > Applied to
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next
> >
> > Thanks!
> >
> > [1/4] spi: cadence-qspi: fix pointer reference in runtime PM hooks
> > commit: 32ce3bb57b6b402de2aec1012511e7ac4e7449dc
> > [2/4] spi: cadence-qspi: remove system-wide suspend helper calls from runtime PM hooks
> > commit: 959043afe53ae80633e810416cee6076da6e91c6
> > [3/4] spi: cadence-qspi: put runtime in runtime PM hooks names
> > commit: 4efa1250b59ebf47ce64a7b6b7c3e2e0a2a9d35a
> > [4/4] spi: cadence-qspi: add system-wide suspend and resume callbacks
> > commit: 078d62de433b4f4556bb676e5dd670f0d4103376
>
> It seems like between 6.8.0-rc5-next-20240220 and
> 6.8.0-rc5-next-20240222 some of TI K3 platform boot have been broken.
>
> It particularly seemed related to these patches because we can see
> cqspi_probe in the call trace and also cqspi_suspend toward the top.
>
> See logs for kernel crash in [0] and working in [1]

I'm guessing we are talking about tags next-20240220 and next-20240222
on: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/

Neither of those tags include the patches about fixing PM hooks.

⟩ # next-20240220
⟩ git log --oneline --author theo.lebrun 2d5c7b7eb345 \
    drivers/spi/spi-cadence-quadspi.c

⟩ # next-20240222
⟩ git log --oneline --author theo.lebrun e31185ce00a9 \
    drivers/spi/spi-cadence-quadspi.c
0f3841a5e115 spi: cadence-qspi: report correct number of chip-select
7cc3522aedb5 spi: cadence-qspi: set maximum chip-select to 4
0d62c64a8e48 spi: cadence-qspi: assert each subnode flash CS is valid
⟩ # Those are unrelated patches.

Also it shows from the calltrace: this series renames the runtime
suspend/resume hooks to cqspi_runtime_* while the callstack you gave
talks about cqspi_suspend. It only gets called at system-wide suspend
following this series.

My guess is that this series will rather fix the issue that you are now
facing. :-) Could you try applying them and checking if that fixes your
error?

Regards,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

On Mon, Feb 26, 2024 at 01:27:57PM +0000, Mark Brown wrote:
> On Mon, Feb 26, 2024 at 05:48:03PM +0530, Dhruva Gole wrote:
> > On Feb 22, 2024 at 19:13:29 +0000, Mark Brown wrote:

> [ 1.709414] Call trace:
> [ 1.711852] __mutex_lock.constprop.0+0x84/0x540
> [ 1.716460] __mutex_lock_slowpath+0x14/0x20
> [ 1.720719] mutex_lock+0x48/0x54
> [ 1.724026] spi_controller_suspend+0x30/0x7c
> [ 1.728377] cqspi_suspend+0x1c/0x6c
> [ 1.731944] pm_generic_runtime_suspend+0x2c/0x44
> [ 1.736640] genpd_runtime_suspend+0xa8/0x254

> (it's generally helpful to provide the most relevant section directly.)

> The issue here appears to be that we've registered for runtime suspend
> prior to registering the controller...

Actually, no - after this series cqspi_suspend() is the system not
runtime PM operation and should not be called from runtime suspend. How
is that happening?

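Editorial note: the point made above, that after the series the system-wide
suspend callback should no longer be reachable from the runtime suspend path,
can be sketched with a small user-space model. This is not driver or kernel
code. Every name below (mock_device, mock_pm_ops, mock_qspi_*, the two
*_path() helpers) is hypothetical, and the model only assumes that a PM ops
table keeps system-sleep and runtime callbacks in separate slots.

```c
/* User-space model only; none of these names exist in the kernel. */

/* Hypothetical stand-in for a device, recording which hook ran. */
struct mock_device {
	int runtime_suspended;
	int system_suspended;
};

/* Reduced model of a PM ops table: one slot per kind of suspend. */
struct mock_pm_ops {
	int (*suspend)(struct mock_device *dev);         /* system-wide sleep */
	int (*runtime_suspend)(struct mock_device *dev); /* runtime PM */
};

/* System-wide suspend hook (the role cqspi_suspend() has after the series). */
static int mock_qspi_suspend(struct mock_device *dev)
{
	dev->system_suspended = 1;
	return 0;
}

/* Runtime suspend hook (the role of the renamed cqspi_runtime_* hooks). */
static int mock_qspi_runtime_suspend(struct mock_device *dev)
{
	dev->runtime_suspended = 1;
	return 0;
}

const struct mock_pm_ops mock_qspi_pm_ops = {
	.suspend = mock_qspi_suspend,
	.runtime_suspend = mock_qspi_runtime_suspend,
};

/* The runtime PM path consults only the runtime slot. */
int mock_runtime_suspend_path(struct mock_device *dev,
			      const struct mock_pm_ops *ops)
{
	return ops->runtime_suspend ? ops->runtime_suspend(dev) : 0;
}

/* The system sleep path consults only the system slot. */
int mock_system_suspend_path(struct mock_device *dev,
			     const struct mock_pm_ops *ops)
{
	return ops->suspend ? ops->suspend(dev) : 0;
}
```

With a table laid out like this, a trace showing the system-wide hook directly
under a runtime suspend entry point suggests a tree where the same function
was still installed in the runtime slot, which is consistent with the tested
tags not containing the series.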
Hello,

On Mon Feb 26, 2024 at 2:40 PM CET, Mark Brown wrote:
> On Mon, Feb 26, 2024 at 01:27:57PM +0000, Mark Brown wrote:
> > On Mon, Feb 26, 2024 at 05:48:03PM +0530, Dhruva Gole wrote:
> > > On Feb 22, 2024 at 19:13:29 +0000, Mark Brown wrote:
>
> > [ 1.709414] Call trace:
> > [ 1.711852] __mutex_lock.constprop.0+0x84/0x540
> > [ 1.716460] __mutex_lock_slowpath+0x14/0x20
> > [ 1.720719] mutex_lock+0x48/0x54
> > [ 1.724026] spi_controller_suspend+0x30/0x7c
> > [ 1.728377] cqspi_suspend+0x1c/0x6c
> > [ 1.731944] pm_generic_runtime_suspend+0x2c/0x44
> > [ 1.736640] genpd_runtime_suspend+0xa8/0x254
>
> > (it's generally helpful to provide the most relevant section directly.)
>
> > The issue here appears to be that we've registered for runtime suspend
> > prior to registering the controller...
>
> Actually, no - after this series cqspi_suspend() is the system not
> runtime PM operation and should not be called from runtime suspend. How
> is that happening?

You might have seen my answer by now. This series is not in the tags
quoted. I believe the memory corruption I fixed with this series is
being encountered for the first time on TI hardware. They probably did
not encounter it previously by luck.

Regards,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

------------------------------------------------------------------------

Hi,

On Feb 26, 2024 at 13:40:00 +0000, Mark Brown wrote:
> On Mon, Feb 26, 2024 at 01:27:57PM +0000, Mark Brown wrote:
> > On Mon, Feb 26, 2024 at 05:48:03PM +0530, Dhruva Gole wrote:
> > > On Feb 22, 2024 at 19:13:29 +0000, Mark Brown wrote:
>
> > [ 1.709414] Call trace:
> > [ 1.711852] __mutex_lock.constprop.0+0x84/0x540
> > [ 1.716460] __mutex_lock_slowpath+0x14/0x20
> > [ 1.720719] mutex_lock+0x48/0x54
> > [ 1.724026] spi_controller_suspend+0x30/0x7c
> > [ 1.728377] cqspi_suspend+0x1c/0x6c
> > [ 1.731944] pm_generic_runtime_suspend+0x2c/0x44
> > [ 1.736640] genpd_runtime_suspend+0xa8/0x254
>
> > (it's generally helpful to provide the most relevant section directly.)
>
> > The issue here appears to be that we've registered for runtime suspend
> > prior to registering the controller...
>
> Actually, no - after this series cqspi_suspend() is the system not
> runtime PM operation and should not be called from runtime suspend. How
> is that happening?

I tried dropping this entire series, but that doesn't really solve the
kernel boot issues. Also, this particular stack dump isn't easily
reproducible.

This series may not be the root cause; I will need some more time to
see what's breaking boot for us. For now, this series seems to be in
the clear. I will keep you posted if I find anything funny here.

FYI: we're just using the arm64 defconfig and the respective device DTs.

Hi,

On Feb 26, 2024 at 14:36:17 +0100, Théo Lebrun wrote:
> Hello Dhruva,
>
> On Mon Feb 26, 2024 at 1:18 PM CET, Dhruva Gole wrote:
> > Hi Mark, Theo,
> >
> > + Nishanth, Vignesh (maintainers of TI K3)
> >
> > On Feb 22, 2024 at 19:13:29 +0000, Mark Brown wrote:
> > > On Thu, 22 Feb 2024 11:12:28 +0100, Théo Lebrun wrote:
> > > > This fixes runtime PM and system-wide suspend for the cadence-qspi
> > > > driver. Seeing how runtime PM and autosuspend are enabled by default, I
> > > > believe this affects all users of the driver.
> > > >
> > > > This series has been tested on both Mobileye EyeQ5 hardware and the TI
> > > > J7200 EVM board, under s2idle.
> > > >
> > > > [...]
> > >
> > > Applied to
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next
> > >
> > > Thanks!
> > >
> > > [1/4] spi: cadence-qspi: fix pointer reference in runtime PM hooks
> > > commit: 32ce3bb57b6b402de2aec1012511e7ac4e7449dc
> > > [2/4] spi: cadence-qspi: remove system-wide suspend helper calls from runtime PM hooks
> > > commit: 959043afe53ae80633e810416cee6076da6e91c6
> > > [3/4] spi: cadence-qspi: put runtime in runtime PM hooks names
> > > commit: 4efa1250b59ebf47ce64a7b6b7c3e2e0a2a9d35a
> > > [4/4] spi: cadence-qspi: add system-wide suspend and resume callbacks
> > > commit: 078d62de433b4f4556bb676e5dd670f0d4103376
> >
> > It seems like between 6.8.0-rc5-next-20240220 and
> > 6.8.0-rc5-next-20240222 some of TI K3 platform boot have been broken.
> >
> > It particularly seemed related to these patches because we can see
> > cqspi_probe in the call trace and also cqspi_suspend toward the top.
> >
> > See logs for kernel crash in [0] and working in [1]
>
> I'm guessing we are talking about tags next-20240220 and next-20240222
> on: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/
>
> Neither of those tags include the patches about fixing PM hooks.
>
> ⟩ # next-20240220
> ⟩ git log --oneline --author theo.lebrun 2d5c7b7eb345 \
>     drivers/spi/spi-cadence-quadspi.c
>
> ⟩ # next-20240222
> ⟩ git log --oneline --author theo.lebrun e31185ce00a9 \
>     drivers/spi/spi-cadence-quadspi.c
> 0f3841a5e115 spi: cadence-qspi: report correct number of chip-select
> 7cc3522aedb5 spi: cadence-qspi: set maximum chip-select to 4
> 0d62c64a8e48 spi: cadence-qspi: assert each subnode flash CS is valid
> ⟩ # Those are unrelated patches.
>
> Also it shows from the calltrace: this series renames the runtime
> suspend/resume hooks to cqspi_runtime_* while the callstack you gave
> talks about cqspi_suspend. It only gets called at system-wide suspend
> following this series.
>
> My guess is that this series will rather fix the issue that you are now
> facing. :-) Could you try applying them and checking if that fixes your
> error?

Indeed, it seems that kernelci generated the 22 Feb build and no later
builds in our case, hence we were not testing -next with your patches
applied. Please pardon the confusion.

The boot logs from a local Linux build of the 27 Feb -next are here:

https://gist.github.com/DhruvaG2000/78ef6f2953b0940ef8ea38797f2ec6cb

It does seem like these patches help us fix the previous regressions.
Thanks for the fixes.

Hi,

This fixes runtime PM and system-wide suspend for the cadence-qspi
driver. Seeing how runtime PM and autosuspend are enabled by default, I
believe this affects all users of the driver.

This series has been tested on both Mobileye EyeQ5 hardware and the TI
J7200 EVM board, under s2idle.

Thanks all,
Théo

Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>

---

Changes in v4:
- Take Reviewed-by Dhruva Gole on patch 1/4.
- Fix struct dev_pm_ops declaration to avoid -Wunused-function warning
  when CONFIG_PM_SLEEP=n. Replace SET_*_PM_OPS() by *_PM_OPS(). See
  kernel test robot warning:
  https://lore.kernel.org/oe-kbuild-all/202402221505.712Q7MSU-lkp@intel.com/
- Link to v3: https://lore.kernel.org/r/20240209-cdns-qspi-pm-fix-v3-0-540ac222f26b@bootlin.com

Changes in v3:
- Move both bugfix patches to the start of the series.
- Remove Fixes: trailer from the function renaming patch.
- Link to v2: https://lore.kernel.org/r/20240205-cdns-qspi-pm-fix-v2-0-2e7bbad49a46@bootlin.com

Changes in v2:
- Split the initial change into three separate commits, to make intents
  clearer.
- Mark controller as suspended during the system-wide suspend.
- Link to v1: https://lore.kernel.org/r/20240202-cdns-qspi-pm-fix-v1-1-3c8feb2bfdd8@bootlin.com

---

Théo Lebrun (4):
- spi: cadence-qspi: fix pointer reference in runtime PM hooks
- spi: cadence-qspi: remove system-wide suspend helper calls from runtime PM hooks
- spi: cadence-qspi: put runtime in runtime PM hooks names
- spi: cadence-qspi: add system-wide suspend and resume callbacks

drivers/spi/spi-cadence-quadspi.c | 33 +++++++++++++++++++++------------
1 file changed, 21 insertions(+), 12 deletions(-)

---

base-commit: 13acce918af915278e49980a3038df31845dbf39
change-id: 20240202-cdns-qspi-pm-fix-29600cc6d7bf

Best regards,
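
Editorial note: the v4 changelog entry about replacing SET_*_PM_OPS() with
*_PM_OPS() to avoid -Wunused-function can be illustrated with a user-space
sketch. The DEMO_* macros and demo_* names are hypothetical and only model
the idea as I understand it: the older style compiles the initializer away
when sleep support is off, leaving static callbacks unreferenced, while the
newer style always names the callbacks and selects NULL through a
compile-time constant condition.

```c
#include <stddef.h>

/* Models CONFIG_PM_SLEEP; build with -DDEMO_PM_SLEEP=1 to enable. */
#ifndef DEMO_PM_SLEEP
#define DEMO_PM_SLEEP 0
#endif

struct demo_pm_ops {
	int (*suspend)(void);
	int (*resume)(void);
};

/*
 * Older style, shown for comparison only: with sleep support off, the
 * macro would expand to nothing, so static callbacks passed to it are
 * never referenced and the compiler warns about unused functions.
 *
 *   #if DEMO_PM_SLEEP
 *   #define DEMO_SET_SLEEP_OPS(s, r) .suspend = (s), .resume = (r),
 *   #else
 *   #define DEMO_SET_SLEEP_OPS(s, r)
 *   #endif
 */

/* Newer style: the callback is always named, NULL is chosen when disabled. */
#define DEMO_SLEEP_PTR(fn) (DEMO_PM_SLEEP ? (fn) : NULL)
#define DEMO_SLEEP_OPS(s, r) \
	.suspend = DEMO_SLEEP_PTR(s), \
	.resume = DEMO_SLEEP_PTR(r),

static int demo_suspend(void)
{
	return 0;
}

static int demo_resume(void)
{
	return 0;
}

/* Both callbacks are referenced here whatever DEMO_PM_SLEEP is set to. */
const struct demo_pm_ops demo_ops = {
	DEMO_SLEEP_OPS(demo_suspend, demo_resume)
};

/* Returns 1 when both sleep callbacks are populated, 0 otherwise. */
int demo_has_sleep_callbacks(const struct demo_pm_ops *ops)
{
	return ops->suspend != NULL && ops->resume != NULL;
}
```

With the default DEMO_PM_SLEEP=0, both slots are NULL and the unit still
compiles without an unused-function warning, because the static callbacks are
referenced in the discarded branch of the conditional.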