mbox series

[v2,0/5] spi: add support for pre-cooking messages

Message ID 20240219-mainline-spi-precook-message-v2-0-4a762c6701b9@baylibre.com
Headers show
Series spi: add support for pre-cooking messages | expand

Message

David Lechner Feb. 19, 2024, 10:33 p.m. UTC
This is a follow-up to [1] where it was suggested to break down the
proposed SPI offload support into smaller series.

This takes on the first suggested task of introducing an API to
"pre-cook" SPI messages. This idea was first discussed extensively in
2013 [2][3] and revisited more briefly 2022 [4].

The goal here is to be able to improve performance (higher throughput,
and reduced CPU usage) by allowing peripheral drivers that use the
same struct spi_message repeatedly to "pre-cook" the message once to
avoid repeating the same validation, and possibly other operations each
time the message is sent.

This series includes __spi_validate() and the automatic splitting of
xfers in the optimizations. Another frequently suggested optimization
is doing DMA mapping only once. This is not included in this series, but
can be added later (preferably by someone with a real use case for it).

To show how this all works and get some real-world measurements, this
series includes the core changes, optimization of a SPI controller
driver, and optimization of an ADC driver. This test case was only able
to take advantage of the single validation optimization, since it didn't
require splitting transfers. With these changes, CPU usage of the
threaded interrupt handler, which calls spi_sync(), was reduced from
83% to 73% while at the same time the sample rate (frequency of SPI
xfers) was increased from 20kHz to 25kHz.

[1]: https://lore.kernel.org/linux-spi/20240109-axi-spi-engine-series-3-v1-1-e42c6a986580@baylibre.com/T/
[2]: https://lore.kernel.org/linux-spi/E81F4810-48DD-41EE-B110-D0D848B8A510@martin.sperl.org/T/
[3]: https://lore.kernel.org/linux-spi/39DEC004-10A1-47EF-9D77-276188D2580C@martin.sperl.org/T/
[4]: https://lore.kernel.org/linux-spi/20220525163946.48ea40c9@erd992/T/

---
Changes in v2:
- Removed pre_optimized parameter from __spi_optimize_message()
- Added comment explaining purpose of pre_optimized flag
- Fixed missing doc comment for @pre_optimized
- Removed kernel doc inclusion (/** -> /*) from static members
- Removed unrelated comment about calling spi_finalize_current_message()
- Reworked IIO driver patch
- Link to v1: https://lore.kernel.org/r/20240212-mainline-spi-precook-message-v1-0-a2373cd72d36@baylibre.com

---
David Lechner (5):
      spi: add spi_optimize_message() APIs
      spi: move splitting transfers to spi_optimize_message()
      spi: stm32: move splitting transfers to optimize_message
      spi: axi-spi-engine: move message compile to optimize_message
      iio: adc: ad7380: use spi_optimize_message()

 drivers/iio/adc/ad7380.c         |  36 +++++-
 drivers/spi/spi-axi-spi-engine.c |  40 +++---
 drivers/spi/spi-stm32.c          |  28 +++--
 drivers/spi/spi.c                | 259 ++++++++++++++++++++++++++++++++-------
 include/linux/spi/spi.h          |  20 +++
 5 files changed, 297 insertions(+), 86 deletions(-)
---
base-commit: 55072343f1df834879b8bae9e419cd5cbb5f3259
prerequisite-patch-id: 844c06b6caf25a2724e130dfa7999dc90dd26fde
change-id: 20240208-mainline-spi-precook-message-189b2f08ba7f

Comments

Mark Brown Feb. 26, 2024, 7:16 p.m. UTC | #1
On Mon, 19 Feb 2024 16:33:17 -0600, David Lechner wrote:
> This is a follow-up to [1] where it was suggested to break down the
> proposed SPI offload support into smaller series.
> 
> This takes on the first suggested task of introducing an API to
> "pre-cook" SPI messages. This idea was first discussed extensively in
> 2013 [2][3] and revisited more briefly 2022 [4].
> 
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next

Thanks!

[1/5] spi: add spi_optimize_message() APIs
      commit: 7b1d87af14d9ae902ed0c5dc5fabf4eea5abdf02
[2/5] spi: move splitting transfers to spi_optimize_message()
      commit: fab53fea21a909e4e0656764a8ee7c356fe89d6f
[3/5] spi: stm32: move splitting transfers to optimize_message
      commit: c2bcfe7c6edf418d5adf731a7d60a8abd81e952f
[4/5] spi: axi-spi-engine: move message compile to optimize_message
      commit: 7dba2adb063bcf7a293eacb88980e0975b1fb1fd

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark
Oleksij Rempel June 28, 2024, 9:49 a.m. UTC | #2
Hi David,

On Mon, Feb 19, 2024 at 04:33:17PM -0600, David Lechner wrote:
> This is a follow-up to [1] where it was suggested to break down the
> proposed SPI offload support into smaller series.
> 
> This takes on the first suggested task of introducing an API to
> "pre-cook" SPI messages. This idea was first discussed extensively in
> 2013 [2][3] and revisited more briefly 2022 [4].
> 
> The goal here is to be able to improve performance (higher throughput,
> and reduced CPU usage) by allowing peripheral drivers that use the
> same struct spi_message repeatedly to "pre-cook" the message once to
> avoid repeating the same validation, and possibly other operations each
> time the message is sent.
> 
> This series includes __spi_validate() and the automatic splitting of
> xfers in the optimizations. Another frequently suggested optimization
> is doing DMA mapping only once. This is not included in this series, but
> can be added later (preferably by someone with a real use case for it).
> 
> To show how this all works and get some real-world measurements, this
> series includes the core changes, optimization of a SPI controller
> driver, and optimization of an ADC driver. This test case was only able
> to take advantage of the single validation optimization, since it didn't
> require splitting transfers. With these changes, CPU usage of the
> threaded interrupt handler, which calls spi_sync(), was reduced from
> 83% to 73% while at the same time the sample rate (frequency of SPI
> xfers) was increased from 20kHz to 25kHz.
> 
> [1]: https://lore.kernel.org/linux-spi/20240109-axi-spi-engine-series-3-v1-1-e42c6a986580@baylibre.com/T/
> [2]: https://lore.kernel.org/linux-spi/E81F4810-48DD-41EE-B110-D0D848B8A510@martin.sperl.org/T/
> [3]: https://lore.kernel.org/linux-spi/39DEC004-10A1-47EF-9D77-276188D2580C@martin.sperl.org/T/
> [4]: https://lore.kernel.org/linux-spi/20220525163946.48ea40c9@erd992/T/

I have a regression after this patch set. My system explodes with
following trace:
 Unable to handle kernel paging request at virtual address 0064f69bfd319200
 Mem abort info:                                  
   ESR = 0x0000000086000004
   EC = 0x21: IABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x04: level 0 translation fault
 [0064f69bfd319200] address between user and kernel address ranges
 Internal error: Oops: 0000000086000004 [#1] PREEMPT SMP
 Modules linked in: lan78xx etnaviv crct10dif_ce gpu_sched fsl_imx8_ddr_perf caam ina2xx ina3221 rtc_snvs error imx8mm_thermal imx_cpufreq_dt fuse
 CPU: 2 PID: 103 Comm: spi2 Tainted: G        W          6.8.0-rc3-00118-g82b98fb8cd33 #36
 Hardware name: Philips i.MX8MPlus Delta Evaluation Kit (DT)
 pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : 0x64f69bfd319200
 lr : spi_res_release+0x68/0xb8
 sp : ffff800083a6bcb0
 x29: ffff800083a6bcb0 x28: ffff800081906008 x27: ffff8000809feea8
 x26: 0000000000000001 x25: dead000000000122 x24: dead000000000100
 x23: ffff800083c7ba38 x22: ffff0000c3ca5000 x21: ffff800083c7b9e0
 x20: ffff800083c7ba40 x19: ffff800083c7bb48 x18: 0020000000000000
 x17: 0000000000901400 x16: 00000000b0891400 x15: 049d200e412d4100
 x14: 00000090fff6c1f4 x13: 000000280b412d41 x12: 000ed3ddde0a5202
 x11: 6e0194029343039e x10: 00000000000014c0 x9 : ffff8000809fd5dc
 x8 : ffff800081906008 x7 : ffff0000c3c9c41c x6 : ffff8000800f34b0
 x5 : 0000000000000000 x4 : ffff800083a6bab8 x3 : 8f64f69bfd319200
 x2 : ffff800083c7bb60 x1 : ffff800083c7b9e0 x0 : ffff0000c3ca5000
 Call trace:
  0x64f69bfd319200
  spi_async+0xac/0x110
9H0[]6 spi_mux_transfer_one_message+0xb8/0xf0
  __spi_pump_transfer_message+0x26c/0x650
  __spi_pump_messages+0xe0/0x228
  spi_pump_messages+0x20/0x38
  kthread_worker_fn+0xe0/0x2a8
  kthread+0x10c/0x120
  ret_from_fork+0x10/0x20
 Code: ???????? ???????? ???????? ???????? (????????) 
 ---[ end trace 0000000000000000 ]---
 Kernel panic - not syncing: Oops: Fatal exception
 SMP: stopping secondary CPUs
 Kernel Offset: disabled
 CPU features: 0x0,00000008,00020000,2100421b
 Memory Limit: none
 Rebooting in 10 seconds..

It seems to be spi_mux specific. We have seen similar trace on other system
with spi_mux.

Regards,
Oleksij
Mark Brown June 28, 2024, 12:17 p.m. UTC | #3
On Fri, Jun 28, 2024 at 11:49:38AM +0200, Oleksij Rempel wrote:
> On Mon, Feb 19, 2024 at 04:33:17PM -0600, David Lechner wrote:

> I have a regression after this patch set. My system explodes with
> following trace:

Not looked at this at all but were you able to try a bisect?

>  Call trace:
>   0x64f69bfd319200
>   spi_async+0xac/0x110
> 9H0[]6 spi_mux_transfer_one_message+0xb8/0xf0

Could you run this through addr2line please?
Oleksij Rempel June 28, 2024, 5:47 p.m. UTC | #4
Hi David,

On Fri, Jun 28, 2024 at 10:27:28AM -0500, David Lechner wrote:
> Hi Oleksij and Marc,
> 
> I'm supposed to be on vacation so I didn't look into this deeply yet
> but I can see what is happening here.
> 
> spi_mux_transfer_one_message() is calling spi_async() which is calling
> __spi_optimize_message() on an already optimized message.
> 
> Then it also calls __spi_unoptimize_message() which tries to release
> resources. But this fails because the spi-mux driver has swapped
> out the pointer to the device in the SPI message. This causes the
> wrong ctlr to be passed to spi_res_release(), causing the crash.
> 
> I don't know if a proper fix could be quite so simple, but here is
> something you could try (untested):
 
Thx! I'll test it at Monday.

Regards,
Oleksij
Oleksij Rempel July 1, 2024, 6:55 a.m. UTC | #5
On Fri, Jun 28, 2024 at 10:27:28AM -0500, David Lechner wrote:
> On 6/28/24 5:16 AM, Marc Kleine-Budde wrote:
> > On 28.06.2024 11:49:38, Oleksij Rempel wrote:
> >> It seems to be spi_mux specific. We have seen similar trace on other system
> >> with spi_mux.
> > 
> > Here is the other backtrace from another imx8mp system with a completely
> > different workload. Both have in common that they use a spi_mux on the
> > spi-imx driver.
> > 
> > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000dd0
> > Mem abort info:
> >   ESR = 0x0000000096000004
> >   EC = 0x25: DABT (current EL), IL = 32 bits
> >   SET = 0, FnV = 0
> >   EA = 0, S1PTW = 0
> >   FSC = 0x04: level 0 translation fault
> > Data abort info:
> >   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> >   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> >   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> > user pgtable: 4k pages, 48-bit VAs, pgdp=0000000046760000
> > [0000000000000dd0] pgd=0000000000000000, p4d=0000000000000000
> > Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> > Modules linked in: can_raw can ti_ads7950 industrialio_triggered_buffer kfifo_buf spi_mux fsl_imx8_ddr_perf at24 flexcan caam can_dev error rtc_snvs imx8mm_thermal spi_imx capture_events_irq cfg80211 iio_trig_hrtimer industrialio_sw_trigger ind>
> > CPU: 3 PID: 177 Comm: spi5 Not tainted 6.9.0 #1
> > Hardware name: xxxxxxxxxxxxxxxx (xxxxxxxxx) (DT)
> > pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > pc : spi_res_release+0x24/0xb8
> > lr : spi_async+0xac/0x118
> > sp : ffff8000823fbcc0
> > x29: ffff8000823fbcc0 x28: 0000000000000000 x27: 0000000000000000
> > x26: ffff8000807bef88 x25: ffff80008115c008 x24: 0000000000000000
> > x23: ffff8000826c3938 x22: 0000000000000000 x21: ffff0000076a9800
> > x20: 0000000000000000 x19: 0000000000000dc8 x18: 0000000000000000
> > x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffff88c0e760
> > x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
> > x11: ffff8000815a1f98 x10: ffff8000823fbb40 x9 : ffff8000807b8420
> > x8 : ffff800081508000 x7 : 0000000000000004 x6 : 0000000003ce4c66
> > x5 : 0000000001000000 x4 : 0000000000000000 x3 : 0000000001000000
> > x2 : 0000000000000000 x1 : ffff8000826c38e0 x0 : ffff0000076a9800
> > Call trace:
> >  spi_res_release+0x24/0xb8
> >  spi_async+0xac/0x118
> >  spi_mux_transfer_one_message+0xb8/0xf0 [spi_mux]
> >  __spi_pump_transfer_message+0x260/0x5d8
> >  __spi_pump_messages+0xdc/0x320
> >  spi_pump_messages+0x20/0x38
> >  kthread_worker_fn+0xdc/0x220
> >  kthread+0x118/0x128
> >  ret_from_fork+0x10/0x20
> > Code: a90153f3 a90363f7 91016037 f9403033 (f9400674) 
> > ---[ end trace 0000000000000000 ]---
> > 
> > regards,
> > Marc
> > 
> 
> 
> Hi Oleksij and Marc,
> 
> I'm supposed to be on vacation so I didn't look into this deeply yet
> but I can see what is happening here.
> 
> spi_mux_transfer_one_message() is calling spi_async() which is calling
> __spi_optimize_message() on an already optimized message.
> 
> Then it also calls __spi_unoptimize_message() which tries to release
> resources. But this fails because the spi-mux driver has swapped
> out the pointer to the device in the SPI message. This causes the
> wrong ctlr to be passed to spi_res_release(), causing the crash.
> 
> I don't know if a proper fix could be quite so simple, but here is
> something you could try (untested):

This issue is still reproducible with following trace:
[   19.566433] Call trace:
[   19.568882]  spi_async+0x90/0x118
[   19.572204]  spi_mux_transfer_one_message+0xd4/0x110
[   19.577175]  __spi_pump_transfer_message+0x2bc/0x6e8
[   19.582149]  __spi_pump_messages+0xe0/0x228
[   19.586339]  spi_pump_messages+0x20/0x38
[   19.590271]  kthread_worker_fn+0xe0/0x2e8
[   19.594286]  kthread+0x10c/0x120
[   19.597518]  ret_from_fork+0x10/0x20

addr2line for spi_async+0x90/0x118 -> drivers/spi/spi.c:2136
  2132 static void __spi_unoptimize_message(struct spi_message *msg)         
  2133 {                                                                     
  2134         struct spi_controller *ctlr = msg->spi->controller;
  2135
  2136         if (ctlr->unoptimize_message)
  2137                ctlr->unoptimize_message(msg);
  2138
  2139         spi_res_release(ctlr, msg);                                   
  2140
  2141         msg->optimized = false;
  2142         msg->opt_state = NULL;
  2143 }

Regards,
Oleksij