Message ID | 20220617125750.728590-1-arnd@kernel.org |
---|---|
Series | phase out CONFIG_VIRT_TO_BUS |
Arnd,

On 18.06.2022 at 00:57, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> All architecture-independent users of virt_to_bus() and bus_to_virt()
> have been fixed to use the dma mapping interfaces or have been
> removed now. This means the definitions on most architectures, and the
> CONFIG_VIRT_TO_BUS symbol are now obsolete and can be removed.
>
> The only exceptions to this are a few network and scsi drivers for m68k
> Amiga and VME machines and ppc32 Macintosh. These drivers work correctly
> with the old interfaces and are probably not worth changing.

The Amiga SCSI drivers are all old WD33C93 ones, and replacing
virt_to_bus by virt_to_phys in the dma_setup() function there would
cause no functional change at all.

drivers/vme/bridges/vme_ca91cx42.c hasn't been used at all on m68k (it
is a PCI-to-VME bridge chipset driver that would be needed on
architectures that natively use a PCI bus). I haven't found anything
that selects that driver, so I'm not sure it is even still in use.

That would allow you to drop the remaining virt_to_bus define from
arch/m68k/include/asm/virtconvert.h.

I could submit a patch to convert the Amiga SCSI drivers to use
virt_to_phys if Geert and the SCSI maintainers think it's worth the churn.

32bit powerpc is a different matter though.

Cheers,

Michael
On Sat, Jun 18, 2022 at 3:06 AM Michael Schmitz <schmitzmic@gmail.com> wrote:
> On 18.06.2022 at 00:57, Arnd Bergmann wrote:
> >
> > All architecture-independent users of virt_to_bus() and bus_to_virt()
> > have been fixed to use the dma mapping interfaces or have been
> > removed now. This means the definitions on most architectures, and the
> > CONFIG_VIRT_TO_BUS symbol are now obsolete and can be removed.
> >
> > The only exceptions to this are a few network and scsi drivers for m68k
> > Amiga and VME machines and ppc32 Macintosh. These drivers work correctly
> > with the old interfaces and are probably not worth changing.
>
> The Amiga SCSI drivers are all old WD33C93 ones, and replacing
> virt_to_bus by virt_to_phys in the dma_setup() function there would
> cause no functional change at all.

Ok, thanks for taking a look here.

> drivers/vme/bridges/vme_ca91cx42.c hasn't been used at all on m68k (it
> is a PCI-to-VME bridge chipset driver that would be needed on
> architectures that natively use a PCI bus). I haven't found anything
> that selects that driver, so not sure it is even still in use??

It's gone now, Greg has already taken my patches for this through
the staging tree.

> That would allow you to drop the remaining virt_to_bus define from
> arch/m68k/include/asm/virtconvert.h.
>
> I could submit a patch to convert the Amiga SCSI drivers to use
> virt_to_phys if Geert and the SCSI maintainers think it's worth the churn.

I don't think using virt_to_phys() is an improvement here, as
virt_to_bus() was originally meant as a better abstraction to
replace the use of virt_to_phys() to make drivers portable, before
it got replaced by the dma-mapping interface in turn.

It looks like the Amiga SCSI drivers have an open-coded version of
what dma_map_single() does, to do bounce buffering and cache
management. The ideal solution would be to convert the drivers to
actually use the appropriate dma-mapping interfaces and remove
this custom code.

The same could be done for the two vme drivers (scsi/mvme147.c
and ethernet/82596.c), which do the cache management but
apparently don't need swiotlb bounce buffering.

Rewriting the drivers to modern APIs is of course non-trivial,
and if you want a shortcut here, I would suggest introducing
platform specific helpers similar to isa_virt_to_bus() and calling
them amiga_virt_to_bus() and vme_virt_to_bus(), respectively.

Putting these into a platform specific header file at least helps
clarify that both the helper functions and the drivers using them
are non-portable.

> 32bit powerpc is a different matter though.

It's similar, but unrelated. The two apple ethernet drivers
(bmac and mace) can again either get changed to use the
dma-mapping interfaces, or get a custom pmac_virt_to_bus()/
pmac_bus_to_virt() helper.

There is also drivers/tty/serial/cpm_uart/cpm_uart_cpm2.c,
which I think just needs a trivial change, but I'm not sure
how to do it correctly.

Arnd
Arnd, Am 24.06.2022 um 21:10 schrieb Arnd Bergmann: > On Sat, Jun 18, 2022 at 3:06 AM Michael Schmitz <schmitzmic@gmail.com> wrote: >> Am 18.06.2022 um 00:57 schrieb Arnd Bergmann: >>> >>> All architecture-independent users of virt_to_bus() and bus_to_virt() >>> have been fixed to use the dma mapping interfaces or have been >>> removed now. This means the definitions on most architectures, and the >>> CONFIG_VIRT_TO_BUS symbol are now obsolete and can be removed. >>> >>> The only exceptions to this are a few network and scsi drivers for m68k >>> Amiga and VME machines and ppc32 Macintosh. These drivers work correctly >>> with the old interfaces and are probably not worth changing. >> >> The Amiga SCSI drivers are all old WD33C93 ones, and replacing >> virt_to_bus by virt_to_phys in the dma_setup() function there would >> cause no functional change at all. > > Ok, thanks for taking a look here. > >> drivers/vme/bridges/vme_ca91cx42.c hasn't been used at all on m68k (it >> is a PCI-to-VME bridge chipset driver that would be needed on >> architectures that natively use a PCI bus). I haven't found anything >> that selects that driver, so not sure it is even still in use?? > > It's gone now, Greg has already taken my patches for this through > the staging tree. One less to worry about, thanks. >> That would allow you to drop the remaining virt_to_bus define from >> arch/m68k/include/asm/virtconvert.h. >> >> I could submit a patch to convert the Amiga SCSI drivers to use >> virt_to_phys if Geert and the SCSI maintainers think it's worth the churn. > > I don't think using virt_to_phys() is an improvement here, as > virt_to_bus() was originally meant as a better abstraction to > replace the use of virt_to_phys() to make drivers portable, before > it got replaced by the dma-mapping interface in turn. > > It looks like the Amiga SCSI drivers have an open-coded version of > what dma_map_single() does, to do bounce buffering and cache > management. 
The ideal solution would be to convert the drivers > actually use the appropriate dma-mapping interfaces and remove > this custom code. I've taken another look at these drivers' dma_setup() functions and they all look much more complex than the Amiga ESP drivers (which do use the dma-mapping interface for parts of the DMA setup). From my limited understanding, the difference between the ESP and WD33C93 drivers is that the former are used on 040/060 accelerator boards only (where the processor does do bus snooping and DMA can access all of RAM). The latter ones would need cache management, could only use non-coherent mappings and would require special case handling for DMA-inaccessible RAM inside a device-specific dma ops' map_page() function. That's several bridges too far for me ... I have no Amiga hardware whatsoever, and know no one who could test changes to WD33C93 drivers for me. What I have is a NCR5380 with the proverbial 'pathological DMA' integration example (and its driver was never changed to even use virt_to_bus()!). I might learn enough about using the dma-mapping API on that one eventually (though the requirement for at least 1 MB swiotlb bounce buffers looks hard to meet), and use that to convert the WD33C93 drivers, but it would still remain untested. > The same could be done for the two vme drivers (scsi/mvme147.c > and ethernet/82596.c), which do the cache management but > apparently don't need swiotlb bounce buffering. > > Rewriting the drivers to modern APIs is of course non-trivial, > and if you want a shortcut here, I would suggest introducing > platform specific helpers similar to isa_virt_to_bus() and call > them amiga_virt_to_bus() and vme_virt_to_bus, respectively. I don't think Amiga and m68k VME differ at all in that respect, so might just call it m68k_virt_to_bus() for now. > Putting these into a platform specific header file at least helps > clarify that both the helper functions and the drivers using them > are non-portable. 
There are no platform specific header files other than asm/amigahw.h and asm/mvme147hw.h, currently only holding register address definitions. Would it be OK to add m68k_virt_to_bus() in there if it can't remain in asm/virtconvert.h, Geert? > >> 32bit powerpc is a different matter though. > > It's similar, but unrelated. The two apple ethernet drivers > (bmac and mace) can again either get changed to use the > dma-mapping interfaces, or get a custom pmac_virt_to_bus()/ > pmac_bus_to_virt() helper. Hmmm - I see Finn had done the DMA API conversion on macmace.c which might give some hints on what to do about mace.c ... no idea about bmac.c though. And again, haven't got hardware to test, so custom helpers is it, then. Cheers, Michael > There is also drivers/tty/serial/cpm_uart/cpm_uart_cpm2.c, > which I think just needs a trivial change, but I'm not sure > how to do it correctly. > > Arnd >
On Sun, Jun 26, 2022 at 7:21 AM Michael Schmitz <schmitzmic@gmail.com> wrote:
> > The same could be done for the two vme drivers (scsi/mvme147.c
> > and ethernet/82596.c), which do the cache management but
> > apparently don't need swiotlb bounce buffering.
> >
> > Rewriting the drivers to modern APIs is of course non-trivial,
> > and if you want a shortcut here, I would suggest introducing
> > platform specific helpers similar to isa_virt_to_bus() and call
> > them amiga_virt_to_bus() and vme_virt_to_bus, respectively.
>
> I don't think Amiga and m68k VME differ at all in that respect, so might
> just call it m68k_virt_to_bus() for now.
>
> > Putting these into a platform specific header file at least helps
> > clarify that both the helper functions and the drivers using them
> > are non-portable.
>
> There are no platform specific header files other than asm/amigahw.h and
> asm/mvme147hw.h, currently only holding register address definitions.
> Would it be OK to add m68k_virt_to_bus() in there if it can't remain in
> asm/virtconvert.h, Geert?

In that case, I would just leave it under the current name and not change
m68k at all. I don't like the m68k_virt_to_bus() name because there is
not anything CPU specific in what it does, and keeping it in a common
header does nothing to prevent it from being used on other platforms
either.

> >> 32bit powerpc is a different matter though.
> >
> > It's similar, but unrelated. The two apple ethernet drivers
> > (bmac and mace) can again either get changed to use the
> > dma-mapping interfaces, or get a custom pmac_virt_to_bus()/
> > pmac_bus_to_virt() helper.
>
> Hmmm - I see Finn had done the DMA API conversion on macmace.c which
> might give some hints on what to do about mace.c ... no idea about
> bmac.c though. And again, haven't got hardware to test, so custom
> helpers is it, then.

Ok.

Arnd
Arnd, Am 26.06.2022 um 20:36 schrieb Arnd Bergmann: >> There are no platform specific header files other than asm/amigahw.h and >> asm/mvme147hw.h, currently only holding register address definitions. >> Would it be OK to add m68k_virt_to_bus() in there if it can't remain in >> asm/virtconvert.h, Geert? > > In that case, I would just leave it under the current name and not change > m68k at all. I don't like the m68k_virt_to_bus() name because there is > not anything CPU specific in what it does, and keeping it in a common > header does nothing to prevent it from being used on other platforms > either. Fair enough. >>>> 32bit powerpc is a different matter though. >>> >>> It's similar, but unrelated. The two apple ethernet drivers >>> (bmac and mace) can again either get changed to use the >>> dma-mapping interfaces, or get a custom pmac_virt_to_bus()/ >>> pmac_bus_to_virt() helper. >> >> Hmmm - I see Finn had done the DMA API conversion on macmace.c which >> might give some hints on what to do about mace.c ... no idea about >> bmac.c though. And again, haven't got hardware to test, so custom >> helpers is it, then. > > Ok. Again, no platform specific headers to shift renamed helpers to, so may as well keep this as-is. Cheers, Michael > > Arnd >
Hi Michael,

On Sat, Jun 18, 2022 at 3:06 AM Michael Schmitz <schmitzmic@gmail.com> wrote:
> On 18.06.2022 at 00:57, Arnd Bergmann wrote:
> > From: Arnd Bergmann <arnd@arndb.de>
> >
> > All architecture-independent users of virt_to_bus() and bus_to_virt()
> > have been fixed to use the dma mapping interfaces or have been
> > removed now. This means the definitions on most architectures, and the
> > CONFIG_VIRT_TO_BUS symbol are now obsolete and can be removed.
> >
> > The only exceptions to this are a few network and scsi drivers for m68k
> > Amiga and VME machines and ppc32 Macintosh. These drivers work correctly
> > with the old interfaces and are probably not worth changing.
>
> The Amiga SCSI drivers are all old WD33C93 ones, and replacing
> virt_to_bus by virt_to_phys in the dma_setup() function there would
> cause no functional change at all.

FTR, the sgiwd93 driver uses dma_map_single().

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
Hi Geert, On 27/06/22 20:26, Geert Uytterhoeven wrote: > Hi Michael, > > On Sat, Jun 18, 2022 at 3:06 AM Michael Schmitz <schmitzmic@gmail.com> wrote: >> Am 18.06.2022 um 00:57 schrieb Arnd Bergmann: >>> From: Arnd Bergmann <arnd@arndb.de> >>> >>> All architecture-independent users of virt_to_bus() and bus_to_virt() >>> have been fixed to use the dma mapping interfaces or have been >>> removed now. This means the definitions on most architectures, and the >>> CONFIG_VIRT_TO_BUS symbol are now obsolete and can be removed. >>> >>> The only exceptions to this are a few network and scsi drivers for m68k >>> Amiga and VME machines and ppc32 Macintosh. These drivers work correctly >>> with the old interfaces and are probably not worth changing. >> The Amiga SCSI drivers are all old WD33C93 ones, and replacing >> virt_to_bus by virt_to_phys in the dma_setup() function there would >> cause no functional change at all. > FTR, the sgiwd93 driver use dma_map_single(). Thanks! From what I see, it doesn't have to deal with bounce buffers though? Cheers, Michael > > Gr{oetje,eeting}s, > > Geert > > -- > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org > > In personal conversations with technical people, I call myself a hacker. But > when I'm talking to journalists I just say "programmer" or something like that. > -- Linus Torvalds
Hi Geert,

On 28.06.2022 at 09:12, Michael Schmitz wrote:
> Hi Geert,
>
> On 27/06/22 20:26, Geert Uytterhoeven wrote:
>> Hi Michael,
>>
>> On Sat, Jun 18, 2022 at 3:06 AM Michael Schmitz <schmitzmic@gmail.com>
>> wrote:
>>> On 18.06.2022 at 00:57, Arnd Bergmann wrote:
>>>> From: Arnd Bergmann <arnd@arndb.de>
>>>>
>>>> All architecture-independent users of virt_to_bus() and bus_to_virt()
>>>> have been fixed to use the dma mapping interfaces or have been
>>>> removed now. This means the definitions on most architectures, and the
>>>> CONFIG_VIRT_TO_BUS symbol are now obsolete and can be removed.
>>>>
>>>> The only exceptions to this are a few network and scsi drivers for m68k
>>>> Amiga and VME machines and ppc32 Macintosh. These drivers work
>>>> correctly
>>>> with the old interfaces and are probably not worth changing.
>>> The Amiga SCSI drivers are all old WD33C93 ones, and replacing
>>> virt_to_bus by virt_to_phys in the dma_setup() function there would
>>> cause no functional change at all.
>> FTR, the sgiwd93 driver use dma_map_single().
>
> Thanks! From what I see, it doesn't have to deal with bounce buffers
> though?

Leaving the bounce buffer handling in place, and taking a few other
liberties - this is what converting the easiest case (a3000 SCSI) might
look like. Any obvious mistakes? The mvme147 driver would be very
similar to handle (after conversion to a platform device).

The driver allocates bounce buffers using kmalloc if it hits an
unaligned data buffer - can such buffers still even happen these days?
If I understand dma_map_single() correctly, the resulting dma handle
would be equally misaligned?

To allocate a bounce buffer, would it be OK to use dma_alloc_coherent()
even though AFAIU memory used for DMA buffers generally isn't consistent
on m68k?

Thinking ahead to the other two Amiga drivers - I wonder whether
allocating a static bounce buffer or a DMA pool at driver init is likely
to succeed if the kernel runs from the low 16 MB RAM chunk? It certainly
won't succeed if the kernel runs from a higher memory address, so the
present bounce buffer logic around amiga_chip_alloc() might still need
to be used here.

That leaves the question of whether converting the gvp11 and a2091 drivers
is actually worth it, if bounce buffers still have to be handled explicitly.

Untested (except for compile testing), un-checkpatched, don't try this
on any disk with valuable data ...

Cheers,

Michael
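The question about the low 16 MB RAM chunk can be made concrete with a small userspace sketch. It assumes that the 16 MB limit corresponds to the 24 bit DMA mask mentioned elsewhere in this thread (2^24 bytes), and it only models the address arithmetic. The names `dma_limit()` and `dma_reachable()` are hypothetical and are not kernel APIs; a real driver would express this through its DMA mask and the existing bounce buffer logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Highest address reachable with an n-bit DMA address (0xffffff for n = 24). */
static uint64_t dma_limit(unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);
	return bits == 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/* True if the whole buffer [phys, phys + len) lies at or below the limit. */
static bool dma_reachable(uint64_t phys, size_t len, unsigned int bits)
{
	uint64_t limit = dma_limit(bits);

	if (len == 0)
		return phys <= limit;
	/* Compare against (limit - phys) so that phys + len cannot overflow. */
	return phys <= limit && (uint64_t)(len - 1) <= limit - phys;
}
```

With `bits` set to 24, a buffer that ends exactly at 0xffffff is reachable, while one that starts at or crosses 0x1000000 is not and would still need a bounce buffer in low memory.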
Hi Michael, On Tue, Jun 28, 2022 at 5:26 AM Michael Schmitz <schmitzmic@gmail.com> wrote: > Am 28.06.2022 um 09:12 schrieb Michael Schmitz: > > On 27/06/22 20:26, Geert Uytterhoeven wrote: > >> On Sat, Jun 18, 2022 at 3:06 AM Michael Schmitz <schmitzmic@gmail.com> > >> wrote: > >>> Am 18.06.2022 um 00:57 schrieb Arnd Bergmann: > >>>> From: Arnd Bergmann <arnd@arndb.de> > >>>> > >>>> All architecture-independent users of virt_to_bus() and bus_to_virt() > >>>> have been fixed to use the dma mapping interfaces or have been > >>>> removed now. This means the definitions on most architectures, and the > >>>> CONFIG_VIRT_TO_BUS symbol are now obsolete and can be removed. > >>>> > >>>> The only exceptions to this are a few network and scsi drivers for m68k > >>>> Amiga and VME machines and ppc32 Macintosh. These drivers work > >>>> correctly > >>>> with the old interfaces and are probably not worth changing. > >>> The Amiga SCSI drivers are all old WD33C93 ones, and replacing > >>> virt_to_bus by virt_to_phys in the dma_setup() function there would > >>> cause no functional change at all. > >> FTR, the sgiwd93 driver use dma_map_single(). > > > > Thanks! From what I see, it doesn't have to deal with bounce buffers > > though? > > Leaving the bounce buffer handling in place, and taking a few other > liberties - this is what converting the easiest case (a3000 SCSI) might > look like. Any obvious mistakes? The mvme147 driver would be very > similar to handle (after conversion to a platform device). Thanks, looks reasonable. > The driver allocates bounce buffers using kmalloc if it hits an > unaligned data buffer - can such buffers still even happen these days? No idea. > If I understand dma_map_single() correctly, the resulting dma handle > would be equally misaligned? > > To allocate a bounce buffer, would it be OK to use dma_alloc_coherent() > even though AFAIU memory used for DMA buffers generally isn't consistent > on m68k? 
> > Thinking ahead to the other two Amiga drivers - I wonder whether > allocating a static bounce buffer or a DMA pool at driver init is likely > to succeed if the kernel runs from the low 16 MB RAM chunk? It certainly > won't succeed if the kernel runs from a higher memory address, so the > present bounce buffer logic around amiga_chip_alloc() might still need > to be used here. > > Leaves the question whether converting the gvp11 and a2091 drivers is > actually worth it, if bounce buffers still have to be handled explicitly. A2091 should be straight-forward, as A3000 is basically A2091 on the motherboard (comparing the two drivers, looks like someone's been sprinkling mb()s over the A3000 driver). I don't have any of these SCSI host adapters (not counting the A590 (~A2091) expansion of the old A500, which is not Linux-capable, and hasn't been powered on for 20 years). Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
On Tue, Jun 28, 2022 at 5:25 AM Michael Schmitz <schmitzmic@gmail.com> wrote:
> On 28.06.2022 at 09:12, Michael Schmitz wrote:
>
> Leaving the bounce buffer handling in place, and taking a few other
> liberties - this is what converting the easiest case (a3000 SCSI) might
> look like. Any obvious mistakes? The mvme147 driver would be very
> similar to handle (after conversion to a platform device).
>
> The driver allocates bounce buffers using kmalloc if it hits an
> unaligned data buffer - can such buffers still even happen these days?
> If I understand dma_map_single() correctly, the resulting dma handle
> would be equally misaligned?
>
> To allocate a bounce buffer, would it be OK to use dma_alloc_coherent()
> even though AFAIU memory used for DMA buffers generally isn't consistent
> on m68k?

I think it makes sense to skip the bounce buffering as you do here:
the only standardized way we have for integrating that part is to
use the swiotlb infrastructure, but as you mentioned earlier that
part is probably too resource-heavy here for Amiga.

I see two other problems with your patch though:

a) you still duplicate the cache handling: the cache_clear()/cache_push()
   is supposed to already be done by dma_map_single() when the device
   is not cache-coherent.

b) The bounce buffer is never mapped here, instead you have the
   virt_to_phys() here, which is not the same. I think you need to map
   the pointer that actually gets passed down to the device after deciding
   whether to use a bounce buffer or not.

Arnd
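The ordering asked for in point b) can be illustrated with a userspace model: first decide which buffer the device will really use, then map that buffer, and leave cache maintenance to the mapping call. This is only a sketch under stated assumptions. `model_map_single()` is a stand-in for the kernel's dma_map_single() and merely records what was mapped; `struct model_dev`, `setup_write()` and the alignment mask parameter are hypothetical names that do not come from the patch under review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t model_dma_addr_t;

/* Records what the stand-in mapping call was asked to map. */
struct model_dev {
	const void *last_mapped;
	unsigned int map_calls;
};

/*
 * Stand-in for dma_map_single(): the real call would also do the cache
 * maintenance for non-coherent devices, so the caller must not repeat it.
 */
static model_dma_addr_t model_map_single(struct model_dev *dev, void *ptr,
					 size_t len)
{
	assert(ptr != NULL && len > 0);
	dev->last_mapped = ptr;
	dev->map_calls++;
	return (model_dma_addr_t)ptr;
}

/*
 * Set up an outgoing transfer: choose the bounce buffer for misaligned
 * data, then map the pointer that is actually handed to the device.
 * 'bounce' must be suitably aligned and hold at least 'len' bytes.
 */
static model_dma_addr_t setup_write(struct model_dev *dev, void *data,
				    void *bounce, size_t len,
				    uintptr_t align_mask)
{
	void *target = data;

	if ((uintptr_t)data & align_mask) {
		memcpy(bounce, data, len);	/* stage data in the bounce buffer */
		target = bounce;
	}
	return model_map_single(dev, target, len);
}
```

In this model the mapping call is made exactly once per transfer and always on `target`, so a bounce buffer is never passed to the device without having been mapped.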
Hi Geert, On 28/06/22 19:03, Geert Uytterhoeven wrote: > >> Leaving the bounce buffer handling in place, and taking a few other >> liberties - this is what converting the easiest case (a3000 SCSI) might >> look like. Any obvious mistakes? The mvme147 driver would be very >> similar to handle (after conversion to a platform device). > Thanks, looks reasonable. Thanks, I'll take care of Arnd's comments and post a corrected version later. >> The driver allocates bounce buffers using kmalloc if it hits an >> unaligned data buffer - can such buffers still even happen these days? > No idea. Hmmm - I think I'll stick a WARN_ONCE() in there so we know whether this code path is still being used. > >> If I understand dma_map_single() correctly, the resulting dma handle >> would be equally misaligned? >> >> To allocate a bounce buffer, would it be OK to use dma_alloc_coherent() >> even though AFAIU memory used for DMA buffers generally isn't consistent >> on m68k? >> >> Thinking ahead to the other two Amiga drivers - I wonder whether >> allocating a static bounce buffer or a DMA pool at driver init is likely >> to succeed if the kernel runs from the low 16 MB RAM chunk? It certainly >> won't succeed if the kernel runs from a higher memory address, so the >> present bounce buffer logic around amiga_chip_alloc() might still need >> to be used here. >> >> Leaves the question whether converting the gvp11 and a2091 drivers is >> actually worth it, if bounce buffers still have to be handled explicitly. > A2091 should be straight-forward, as A3000 is basically A2091 on the > motherboard (comparing the two drivers, looks like someone's been > sprinkling mb()s over the A3000 driver). Yep, and at least the ones in the dma_setup() function are there for no reason (the compiler won't reorder stores around the cache flush calls, I hope?). Just leaves the 24 bit DMA mask there (and likely need for bounce buffers). 
> I don't have any of these SCSI host adapters (not counting the A590 > (~A2091) expansion of the old A500, which is not Linux-capable, and > hasn't been powered on for 20 years). I wonder whether kullervo has survived - that one was an A3000. Should have gone to Adrian a few years ago... Cheers, Michael > > Gr{oetje,eeting}s, > > Geert > > -- > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org > > In personal conversations with technical people, I call myself a hacker. But > when I'm talking to journalists I just say "programmer" or something like that. > -- Linus Torvalds
Hi Arnd, On 28/06/22 19:08, Arnd Bergmann wrote: > On Tue, Jun 28, 2022 at 5:25 AM Michael Schmitz <schmitzmic@gmail.com> wrote: >> Am 28.06.2022 um 09:12 schrieb Michael Schmitz: >> >> Leaving the bounce buffer handling in place, and taking a few other >> liberties - this is what converting the easiest case (a3000 SCSI) might >> look like. Any obvious mistakes? The mvme147 driver would be very >> similar to handle (after conversion to a platform device). >> >> The driver allocates bounce buffers using kmalloc if it hits an >> unaligned data buffer - can such buffers still even happen these days? >> If I understand dma_map_single() correctly, the resulting dma handle >> would be equally misaligned? >> >> To allocate a bounce buffer, would it be OK to use dma_alloc_coherent() >> even though AFAIU memory used for DMA buffers generally isn't consistent >> on m68k? > I think it makes sense to skip the bounce buffering as you do here: > the only standardized way we have for integrating that part is to > use the swiotlb infrastructure, but as you mentioned earlier that > part is probably too resource-heavy here for Amiga. OK, leaving the old custom logic in place allows to convert the 24 bit DMA drivers more easily. > > I see two other problems with your patch though: > > a) you still duplicate the cache handling: the cache_clear()/cache_push() > is supposed to already be done by dma_map_single() when the device > is not cache-coherent. That's one of the 'liberties' I alluded to. The reason I left these in is that I'm none too certain what device feature the DMA API uses to decide a device isn't cache-coherent. If it's dev->coherent_dma_mask, the way I set up the device in the a3000 driver should leave the coherent mask unchanged. For the Zorro drivers, devices are set up to use the same storage to store normal and coherent masks - something we most likely want to change. I need to think about the ramifications of that. 
Note that zorro_esp.c uses dma_sync_single_for_device() and uses a 32 bit coherent DMA mask which does work OK. I might ask Adrian to test a change to only set dev->dma_mask, and drop the dma_sync_single_for_device() calls if there's any doubt on this aspect. > b) The bounce buffer is never mapped here, instead you have the > virt_to_phys() here, which is not the same. I think you need to map > the pointer that actually gets passed down to the device after deciding > to use a bouce buffer or not. I hadn't realized that I can map the bounce buffer just as it's done for the SCp data buffer. Should have been obvious, but I'm still learning about the DMA API. I've updated the patch now, will re-send as part of a complete series once done. Cheers, Michael > > Arnd
On Tue, Jun 28, 2022 at 11:03 PM Michael Schmitz <schmitzmic@gmail.com> wrote:
> On 28/06/22 19:03, Geert Uytterhoeven wrote:
> >> The driver allocates bounce buffers using kmalloc if it hits an
> >> unaligned data buffer - can such buffers still even happen these days?
> > No idea.
> Hmmm - I think I'll stick a WARN_ONCE() in there so we know whether this
> code path is still being used.

kmalloc() guarantees alignment to the next power-of-two size or
KMALLOC_MIN_ALIGN, whichever is bigger. On m68k this means it
is cacheline aligned.

Arnd
On Tue, Jun 28, 2022 at 11:38 PM Michael Schmitz <schmitzmic@gmail.com> wrote: > On 28/06/22 19:08, Arnd Bergmann wrote: > > I see two other problems with your patch though: > > > > a) you still duplicate the cache handling: the cache_clear()/cache_push() > > is supposed to already be done by dma_map_single() when the device > > is not cache-coherent. > > That's one of the 'liberties' I alluded to. The reason I left these in > is that I'm none too certain what device feature the DMA API uses to > decide a device isn't cache-coherent. If it's dev->coherent_dma_mask, > the way I set up the device in the a3000 driver should leave the > coherent mask unchanged. For the Zorro drivers, devices are set up to > use the same storage to store normal and coherent masks - something we > most likely want to change. I need to think about the ramifications of > that. > > Note that zorro_esp.c uses dma_sync_single_for_device() and uses a 32 > bit coherent DMA mask which does work OK. I might ask Adrian to test a > change to only set dev->dma_mask, and drop the > dma_sync_single_for_device() calls if there's any doubt on this aspect. The "coherent_mask" is independent of the cache flushing. On some architectures, a device can indicate whether it needs cache management or not to guarantee coherency, but on m68k it appears that we always assume it does, see arch/m68k/kernel/dma.c > > b) The bounce buffer is never mapped here, instead you have the > > virt_to_phys() here, which is not the same. I think you need to map > > the pointer that actually gets passed down to the device after deciding > > to use a bouce buffer or not. > > I hadn't realized that I can map the bounce buffer just as it's done for > the SCp data buffer. Should have been obvious, but I'm still learning > about the DMA API. > > I've updated the patch now, will re-send as part of a complete series > once done. I suppose you can just drop the bounce buffer if this just comes from kmalloc(). Arnd
Hi Arnd, On 29/06/22 09:50, Arnd Bergmann wrote: > On Tue, Jun 28, 2022 at 11:03 PM Michael Schmitz <schmitzmic@gmail.com> wrote: >> On 28/06/22 19:03, Geert Uytterhoeven wrote: >>>> The driver allocates bounce buffers using kmalloc if it hits an >>>> unaligned data buffer - can such buffers still even happen these days? >>> No idea. >> Hmmm - I think I'll stick a WARN_ONCE() in there so we know whether this >> code path is still being used. > kmalloc() guarantees alignment to the next power-of-two size or > KMALLOC_MIN_ALIGN, whichever is bigger. On m68k this means it > is cacheline aligned. And all SCSI buffers are allocated using kmalloc? No way at all for user space to pass unaligned data? (SCSI is a weird beast - I have used a SCSI DAT tape driver many many years ago, which broke all sorts of assumptions about transfer block sizes ... but that might actually have been in the v0.99 days, many rewrites of SCSI midlevel ago). Just being cautious, as getting any of this tested will be a stretch. Cheers, Michael > > Arnd
Hi Arnd, On 29/06/22 09:55, Arnd Bergmann wrote: > On Tue, Jun 28, 2022 at 11:38 PM Michael Schmitz <schmitzmic@gmail.com> wrote: >> On 28/06/22 19:08, Arnd Bergmann wrote: >>> I see two other problems with your patch though: >>> >>> a) you still duplicate the cache handling: the cache_clear()/cache_push() >>> is supposed to already be done by dma_map_single() when the device >>> is not cache-coherent. >> That's one of the 'liberties' I alluded to. The reason I left these in >> is that I'm none too certain what device feature the DMA API uses to >> decide a device isn't cache-coherent. If it's dev->coherent_dma_mask, >> the way I set up the device in the a3000 driver should leave the >> coherent mask unchanged. For the Zorro drivers, devices are set up to >> use the same storage to store normal and coherent masks - something we >> most likely want to change. I need to think about the ramifications of >> that. >> >> Note that zorro_esp.c uses dma_sync_single_for_device() and uses a 32 >> bit coherent DMA mask which does work OK. I might ask Adrian to test a >> change to only set dev->dma_mask, and drop the >> dma_sync_single_for_device() calls if there's any doubt on this aspect. > The "coherent_mask" is independent of the cache flushing. On some > architectures, a device can indicate whether it needs cache management > or not to guarantee coherency, but on m68k it appears that we always > assume it does, see arch/m68k/kernel/dma.c Thanks - what I see there indicates that on the relevant platforms, pages mapped for DMA have their page table cache bits modified to make them non-cacheable (and I suppose unmapping restores the default cache bits). That means I should use dma_set_mask_and_coherent() here to take advantage of this, and no need to mess around with dma_sync_single_for_device() in the drivers' dma_setup() functions. >>> b) The bounce buffer is never mapped here, instead you have the >>> virt_to_phys() here, which is not the same. 
I think you need to map >>> the pointer that actually gets passed down to the device after deciding >>> to use a bounce buffer or not. >> I hadn't realized that I can map the bounce buffer just as it's done for >> the SCp data buffer. Should have been obvious, but I'm still learning >> about the DMA API. >> >> I've updated the patch now, will re-send as part of a complete series >> once done. > I suppose you can just drop the bounce buffer if this just comes > from kmalloc(). That's only true for a3000 and mvme147 though. Cheers, Michael > > Arnd
On 6/28/22 16:09, Michael Schmitz wrote: > On 29/06/22 09:50, Arnd Bergmann wrote: >> On Tue, Jun 28, 2022 at 11:03 PM Michael Schmitz >> <schmitzmic@gmail.com> wrote: >>> On 28/06/22 19:03, Geert Uytterhoeven wrote: >>>>> The driver allocates bounce buffers using kmalloc if it hits an >>>>> unaligned data buffer - can such buffers still even happen these days? >>>> No idea. >>> Hmmm - I think I'll stick a WARN_ONCE() in there so we know whether this >>> code path is still being used. >> kmalloc() guarantees alignment to the next power-of-two size or >> KMALLOC_MIN_ALIGN, whichever is bigger. On m68k this means it >> is cacheline aligned. > > And all SCSI buffers are allocated using kmalloc? No way at all for user > space to pass unaligned data? > > (SCSI is a weird beast - I have used a SCSI DAT tape driver many many > years ago, which broke all sorts of assumptions about transfer block > sizes ... but that might actually have been in the v0.99 days, many > rewrites of SCSI midlevel ago). > > Just being cautious, as getting any of this tested will be a stretch. An example of a user space application that passes an SG I/O data buffer to the kernel that is aligned to a four byte boundary but not to an eight byte boundary if the -s (scattered) command line option is used: https://github.com/osandov/blktests/blob/master/src/discontiguous-io.cpp Bart.
Hi Bart, On 29/06/22 11:50, Bart Van Assche wrote: > On 6/28/22 16:09, Michael Schmitz wrote: >> On 29/06/22 09:50, Arnd Bergmann wrote: >>> On Tue, Jun 28, 2022 at 11:03 PM Michael Schmitz >>> <schmitzmic@gmail.com> wrote: >>>> On 28/06/22 19:03, Geert Uytterhoeven wrote: >>>>>> The driver allocates bounce buffers using kmalloc if it hits an >>>>>> unaligned data buffer - can such buffers still even happen these >>>>>> days? >>>>> No idea. >>>> Hmmm - I think I'll stick a WARN_ONCE() in there so we know whether >>>> this >>>> code path is still being used. >>> kmalloc() guarantees alignment to the next power-of-two size or >>> KMALLOC_MIN_ALIGN, whichever is bigger. On m68k this means it >>> is cacheline aligned. >> >> And all SCSI buffers are allocated using kmalloc? No way at all for >> user space to pass unaligned data? >> >> (SCSI is a weird beast - I have used a SCSI DAT tape driver many many >> years ago, which broke all sorts of assumptions about transfer block >> sizes ... but that might actually have been in the v0.99 days, many >> rewrites of SCSI midlevel ago). >> >> Just being cautious, as getting any of this tested will be a stretch. > > An example of a user space application that passes an SG I/O data > buffer to the kernel that is aligned to a four byte boundary but not > to an eight byte boundary if the -s (scattered) command line option is > used: > https://github.com/osandov/blktests/blob/master/src/discontiguous-io.cpp Thanks - four byte alignment actually wouldn't be an issue for me. It's two byte or smaller that would trip up the SCSI DMA. While I'm sure such an even more pathological test case could be written, I was rather worried about st.c and sr.c input ... Cheers, Michael > > Bart.
Hi Bart, On 29/06/22 12:01, Michael Schmitz wrote: > >> An example of a user space application that passes an SG I/O data >> buffer to the kernel that is aligned to a four byte boundary but not >> to an eight byte boundary if the -s (scattered) command line option >> is used: >> https://github.com/osandov/blktests/blob/master/src/discontiguous-io.cpp > > Thanks - four byte alignment actually wouldn't be an issue for me. > It's two byte or smaller that would trip up the SCSI DMA. > > While I'm sure such an even more pathological test case could be > written, I was rather worried about st.c and sr.c input ... Never mind - I just saw that m68k defines ARCH_DMA_MINALIGN to be four bytes. Should be safe for all that matters, then. Cheers, Michael
On Wed, Jun 29, 2022 at 11:09:00AM +1200, Michael Schmitz wrote: > And all SCSI buffers are allocated using kmalloc? No way at all for user > space to pass unaligned data? Most that you will see actually comes from the page allocator. But the block layer has a dma_alignment limit, and when userspace sends I/O that is not properly aligned it will be bounce buffered before it is sent to the driver.
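The alignment test described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `needs_bounce()` is a hypothetical helper, not a block-layer function, and the mask convention (alignment minus one) and the fact that both the start address and the length are checked are assumptions made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel function: returns true when a buffer
 * would have to be bounced under the given alignment mask, where
 * align_mask is (alignment - 1), e.g. 3 for four-byte alignment.
 */
static bool needs_bounce(uintptr_t addr, size_t len, uintptr_t align_mask)
{
	/* The mask must have the form 2^n - 1. */
	assert((align_mask & (align_mask + 1)) == 0);

	/* A misaligned start address or a misaligned length both count. */
	return ((addr | len) & align_mask) != 0;
}
```

With a four-byte mask (3), a buffer at an address ending in 4 passes, while the same buffer fails an eight-byte mask (7), which is the case the discontiguous-io test mentioned earlier in the thread exercises.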
On Wed, Jun 29, 2022 at 09:38:00AM +1200, Michael Schmitz wrote: > That's one of the 'liberties' I alluded to. The reason I left these in is > that I'm none too certain what device feature the DMA API uses to decide a > device isn't cache-coherent. The DMA API does not look at device features at all. It needs to be told so by the platform code. Once an architecture implements the hooks to support non-coherent DMA all devices are treated as non-coherent by default unless overridden by the architecture either globally (using the global dma_default_coherent variable) or per-device (using the dev->dma_coherent field, usually set by arch_setup_dma_ops). > If it's dev->coherent_dma_mask, the way I set > up the device in the a3000 driver should leave the coherent mask unchanged. > For the Zorro drivers, devices are set up to use the same storage to store > normal and coherent masks - something we most likely want to change. I need > to think about the ramifications of that. No, the coherent mask is slightly misnamed and not actually related.
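The decision described above can be modelled in a few lines of userspace C. This is a simplified sketch, not kernel code: every `model_*` name is hypothetical, the struct is a heavily reduced stand-in for struct device, and the default of "non-coherent" is an assumption matching the description in this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the architecture-wide default. */
static bool model_default_coherent = false;

/* Hypothetical, heavily reduced stand-in for struct device. */
struct model_device {
	bool dma_coherent;                    /* consulted for cache handling */
	unsigned long long coherent_dma_mask; /* addressing limit only */
};

/* A new device inherits the global default. */
static void model_device_init(struct model_device *dev)
{
	assert(dev != NULL);
	dev->dma_coherent = model_default_coherent;
	dev->coherent_dma_mask = 0;
}

/* Platform code may override the default for one device. */
static void model_arch_setup(struct model_device *dev, bool coherent)
{
	assert(dev != NULL);
	dev->dma_coherent = coherent;
}

/* Cache maintenance depends on the flag alone, never on the mask. */
static bool model_needs_cache_sync(const struct model_device *dev)
{
	assert(dev != NULL);
	return !dev->dma_coherent;
}
```

In this model, changing `coherent_dma_mask` never changes the result of `model_needs_cache_sync()`; only the platform override does.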
From: Michael Schmitz > Sent: 29 June 2022 00:09 > > Hi Arnd, > > On 29/06/22 09:50, Arnd Bergmann wrote: > > On Tue, Jun 28, 2022 at 11:03 PM Michael Schmitz <schmitzmic@gmail.com> wrote: > >> On 28/06/22 19:03, Geert Uytterhoeven wrote: > >>>> The driver allocates bounce buffers using kmalloc if it hits an > >>>> unaligned data buffer - can such buffers still even happen these days? > >>> No idea. > >> Hmmm - I think I'll stick a WARN_ONCE() in there so we know whether this > >> code path is still being used. > > kmalloc() guarantees alignment to the next power-of-two size or > > KMALLOC_MIN_ALIGN, whichever is bigger. On m68k this means it > > is cacheline aligned. > > And all SCSI buffers are allocated using kmalloc? No way at all for user > space to pass unaligned data? I didn't think kmalloc() gave any such guarantee about alignment. There are cache-line alignment requirements on systems with non-coherent dma, but otherwise the alignment can be much smaller. One of the allocators adds a header to each item, IIRC that can lead to 'unexpected' alignments - especially on m68k. dma_alloc_coherent() does align to next 'power of 2'. And sometimes you need (eg) 16k allocates that are 16k aligned. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On 30/06/2022 at 10:04, David Laight wrote: > From: Michael Schmitz >> Sent: 29 June 2022 00:09 >> >> Hi Arnd, >> >> On 29/06/22 09:50, Arnd Bergmann wrote: >>> On Tue, Jun 28, 2022 at 11:03 PM Michael Schmitz <schmitzmic@gmail.com> wrote: >>>> On 28/06/22 19:03, Geert Uytterhoeven wrote: >>>>>> The driver allocates bounce buffers using kmalloc if it hits an >>>>>> unaligned data buffer - can such buffers still even happen these days? >>>>> No idea. >>>> Hmmm - I think I'll stick a WARN_ONCE() in there so we know whether this >>>> code path is still being used. >>> kmalloc() guarantees alignment to the next power-of-two size or >>> KMALLOC_MIN_ALIGN, whichever is bigger. On m68k this means it >>> is cacheline aligned. >> >> And all SCSI buffers are allocated using kmalloc? No way at all for user >> space to pass unaligned data? > > I didn't think kmalloc() gave any such guarantee about alignment. It does since commit 59bb47985c1d ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)") Christophe > There are cache-line alignment requirements on systems with non-coherent > dma, but otherwise the alignment can be much smaller. > > One of the allocators adds a header to each item, IIRC that can > lead to 'unexpected' alignments - especially on m68k. > > dma_alloc_coherent() does align to next 'power of 2'. > And sometimes you need (eg) 16k allocates that are 16k aligned. > > David > > - > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK > Registration No: 1397386 (Wales)
From: Christophe Leroy > Sent: 30 June 2022 10:40 > > On 30/06/2022 at 10:04, David Laight wrote: > > From: Michael Schmitz > >> Sent: 29 June 2022 00:09 > >> > >> Hi Arnd, > >> > >> On 29/06/22 09:50, Arnd Bergmann wrote: > >>> On Tue, Jun 28, 2022 at 11:03 PM Michael Schmitz <schmitzmic@gmail.com> wrote: > >>>> On 28/06/22 19:03, Geert Uytterhoeven wrote: > >>>>>> The driver allocates bounce buffers using kmalloc if it hits an > >>>>>> unaligned data buffer - can such buffers still even happen these days? > >>>>> No idea. > >>>> Hmmm - I think I'll stick a WARN_ONCE() in there so we know whether this > >>>> code path is still being used. > >>> kmalloc() guarantees alignment to the next power-of-two size or > >>> KMALLOC_MIN_ALIGN, whichever is bigger. On m68k this means it > >>> is cacheline aligned. > >> > >> And all SCSI buffers are allocated using kmalloc? No way at all for user > >> space to pass unaligned data? > > > > I didn't think kmalloc() gave any such guarantee about alignment. > > It does since commit 59bb47985c1d ("mm, sl[aou]b: guarantee natural > alignment for kmalloc(power-of-two)") Looks like it is done for 'power-of-two' less than PAGE_SIZE. This may not help scsi tape writes which could easily be (say) 47 bytes. I think that only guarantees 2 byte alignment on m68k. (Although increasing the min-alignment on m68k to 4 (or even 8) will probably make no measurable difference.) What happens above PAGE_SIZE? Any structure with a trailing [] field could easily request '64k + a_bit' bytes. You don't really want to extend this to 128k - but I suspect that is what happens. David > > Christophe > > > There are cache-line alignment requirements on systems with non-coherent > > dma, but otherwise the alignment can be much smaller. > > > > One of the allocators adds a header to each item, IIRC that can > > lead to 'unexpected' alignments - especially on m68k. > > > > dma_alloc_coherent() does align to next 'power of 2'.
> > And sometimes you need (eg) 16k allocates that are 16k aligned. > > > > David > > > > - > > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK > > Registration No: 1397386 (Wales) - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
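The alignment rule being debated here can be written down as a small userspace sketch. It is a model of the guarantee as described in this thread (natural alignment for power-of-two sizes, the minimum alignment otherwise), not allocator code. `kmalloc_guaranteed_align()` and `is_pow2()` are hypothetical names, and the minimum alignment is left as a parameter because the thread does not settle its value for m68k.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if x is a non-zero power of two. */
static bool is_pow2(size_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Hypothetical model: the alignment a caller could rely on for a request
 * of `size` bytes, given the architecture minimum `min_align`.
 */
static size_t kmalloc_guaranteed_align(size_t size, size_t min_align)
{
	assert(size > 0);
	assert(is_pow2(min_align));

	/* Power-of-two sizes are naturally aligned to their own size. */
	if (is_pow2(size) && size > min_align)
		return size;

	/* Everything else only gets the architecture minimum. */
	return min_align;
}
```

Under this model a 47 byte request with a minimum of 2 is only guaranteed two-byte alignment, while a 64 byte request is guaranteed 64-byte alignment, which is the distinction drawn in the mail above.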
Hi Christoph, On 29/06/22 18:21, Christoph Hellwig wrote: > On Wed, Jun 29, 2022 at 11:09:00AM +1200, Michael Schmitz wrote: >> And all SCSI buffers are allocated using kmalloc? No way at all for user >> space to pass unaligned data? > Most that you will see actually comes from the page allocator. But > the block layer has a dma_alignment limit, and when userspace sends > I/O that is not properly aligned it will be bounce buffered before > it is sent to the driver. That limit is set to L1_CACHE_BYTES on m68k so we're good here. Thanks, Michael
Hi Christoph, On 29/06/22 18:25, Christoph Hellwig wrote: > On Wed, Jun 29, 2022 at 09:38:00AM +1200, Michael Schmitz wrote: >> That's one of the 'liberties' I alluded to. The reason I left these in is >> that I'm none too certain what device feature the DMA API uses to decide a >> device isn't cache-coherent. > The DMA API does not look at device features at all. It needs to be > told so by the platform code. Once an architecture implements the > hooks to support non-coherent DMA all devices are treated as > non-coherent by default unless overridden by the architecture either > globally (using the global dma_default_coherent variable) or per-device > (using the dev->dma_coherent field, usually set by arch_setup_dma_ops). Haven't got any of that, so non-coherent DMA is all we can use (even though some of the RAM used for bounce buffers may actually be coherent due to the page table cache bits). > >> If it's dev->coherent_dma_mask, the way I set >> up the device in the a3000 driver should leave the coherent mask unchanged. >> For the Zorro drivers, devices are set up to use the same storage to store >> normal and coherent masks - something we most likely want to change. I need >> to think about the ramifications of that. > No, the coherent mask is slightly misnamed and not actually related. Thanks, that had me confused. Cheers, Michael
From: Arnd Bergmann <arnd@arndb.de>

The virt_to_bus/bus_to_virt interface has been deprecated for decades. After Jakub Kicinski put a lot of work into cleaning out the network drivers using them, there are only a couple of other drivers left, which can all be removed or otherwise cleaned up, to remove the old interface for good.

Any out of tree drivers using virt_to_bus() should be converted to using the dma-mapping interfaces, typically dma_alloc_coherent() or dma_map_single().

There are a few m68k and ppc32 specific drivers that keep using the interfaces, but these are all guarded with architecture-specific Kconfig dependencies, and are not actually broken.

There are still a number of drivers that are using virt_to_phys() and phys_to_virt() in place of dma-mapping operations, and these are often broken, but they are out of scope for this series.

I would like the first two patches to either get merged through the SCSI tree, or get an Ack from the SCSI maintainers so I can merge them through the asm-generic tree.

Arnd

---
Changes since v1:
- dropped VME patches that are already in staging-next
- dropped media patch that gets merged independently
- added a networking patch and dropped it again after it got merged
- replace BusLogic removal with a workaround

Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org> # dma-mapping
Cc: Marek Szyprowski <m.szyprowski@samsung.com> # dma-mapping
Cc: Robin Murphy <robin.murphy@arm.com> # dma-mapping
Cc: iommu@lists.linux-foundation.org
Cc: Khalid Aziz <khalid@gonehiking.org> # buslogic
Cc: Maciej W. Rozycki <macro@orcam.me.uk> # buslogic
Cc: Matt Wang <wwentao@vmware.com> # buslogic
Cc: Miquel van Smoorenburg <mikevs@xs4all.net> # dpt_i2o
Cc: Mark Salyzyn <salyzyn@android.com> # dpt_i2o
Cc: linux-scsi@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-arch@vger.kernel.org
Cc: linux-alpha@vger.kernel.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-parisc@vger.kernel.org
Cc: Denis Efremov <efremov@linux.com> # floppy

Arnd Bergmann (3):
  scsi: dpt_i2o: drop stale VIRT_TO_BUS dependency
  scsi: BusLogic remove bus_to_virt
  arch/*/: remove CONFIG_VIRT_TO_BUS

 .../core-api/bus-virt-phys-mapping.rst        | 220 ------------------
 Documentation/core-api/dma-api-howto.rst      |  14 --
 Documentation/core-api/index.rst              |   1 -
 .../translations/zh_CN/core-api/index.rst     |   1 -
 arch/alpha/Kconfig                            |   1 -
 arch/alpha/include/asm/floppy.h               |   2 +-
 arch/alpha/include/asm/io.h                   |   8 +-
 arch/ia64/Kconfig                             |   1 -
 arch/ia64/include/asm/io.h                    |   8 -
 arch/m68k/Kconfig                             |   1 -
 arch/m68k/include/asm/virtconvert.h           |   4 +-
 arch/microblaze/Kconfig                       |   1 -
 arch/microblaze/include/asm/io.h              |   2 -
 arch/mips/Kconfig                             |   1 -
 arch/mips/include/asm/io.h                    |   9 -
 arch/parisc/Kconfig                           |   1 -
 arch/parisc/include/asm/floppy.h              |   4 +-
 arch/parisc/include/asm/io.h                  |   2 -
 arch/powerpc/Kconfig                          |   1 -
 arch/powerpc/include/asm/io.h                 |   2 -
 arch/riscv/include/asm/page.h                 |   1 -
 arch/x86/Kconfig                              |   1 -
 arch/x86/include/asm/io.h                     |   9 -
 arch/xtensa/Kconfig                           |   1 -
 arch/xtensa/include/asm/io.h                  |   3 -
 drivers/scsi/BusLogic.c                       |  27 ++-
 drivers/scsi/Kconfig                          |   4 +-
 drivers/scsi/dpt_i2o.c                        |   4 +-
 include/asm-generic/io.h                      |  14 --
 mm/Kconfig                                    |   8 -
 30 files changed, 30 insertions(+), 326 deletions(-)
 delete mode 100644 Documentation/core-api/bus-virt-phys-mapping.rst