Message ID | 20190211174544.4302-2-will.deacon@arm.com |
---|---|
State | New |
Headers | show |
Series | Ensure inX() is ordered wrt delay() routines | expand |
On Mon, Feb 11, 2019 at 6:45 PM Will Deacon <will.deacon@arm.com> wrote: > > The inX() I/O accessors must enforce ordering against subsequent calls > to the delay() routines, so that a read-back from a device can be used > to postpone a subsequent write to the same device. > > On some architectures, including arm64, this ordering can only be > achieved by creating a dependency on the value returned by the inX() > operation, so we need to pass the value we read to the __io_par() > macro in this case. > > Reported-by: Andrew Murray <andrew.murray@arm.com> > Signed-off-by: Will Deacon <will.deacon@arm.com> > --- > include/asm-generic/io.h | 8 ++++---- For changing the asm-generic file in the arm64 tree, Acked-by: Arnd Bergmann <arnd@arndb.de> For all I can see, this should not conflict with the usage of the same macros on RISC-V, though it does make add a significant difference, so I'd like to see an Ack from the RISC-V folks as well (added to Cc), or possibly a change to arch/riscv/include/asm/io.h to do a corresponding change. Arnd
On Tue, Feb 12, 2019 at 12:55:17PM +0100, Arnd Bergmann wrote: > On Mon, Feb 11, 2019 at 6:45 PM Will Deacon <will.deacon@arm.com> wrote: > > > > The inX() I/O accessors must enforce ordering against subsequent calls > > to the delay() routines, so that a read-back from a device can be used > > to postpone a subsequent write to the same device. > > > > On some architectures, including arm64, this ordering can only be > > achieved by creating a dependency on the value returned by the inX() > > operation, so we need to pass the value we read to the __io_par() > > macro in this case. > > > > Reported-by: Andrew Murray <andrew.murray@arm.com> > > Signed-off-by: Will Deacon <will.deacon@arm.com> > > --- > > include/asm-generic/io.h | 8 ++++---- > > For changing the asm-generic file in the arm64 tree, > > Acked-by: Arnd Bergmann <arnd@arndb.de> Thanks, Arnd. > For all I can see, this should not conflict with the usage of the > same macros on RISC-V, though it does make add a significant > difference, so I'd like to see an Ack from the RISC-V folks as > well (added to Cc), or possibly a change to arch/riscv/include/asm/io.h > to do a corresponding change. There's already a comment in that header which says that the accesses are ordered wrt timer reads, so I don't think anything needs to change there. For consistency with the macro arguments, I could augment their __io_par to take the read value as an unused argument, if that's what you mean? Will
On Wed, Feb 13, 2019 at 6:46 PM Will Deacon <will.deacon@arm.com> wrote: > On Tue, Feb 12, 2019 at 12:55:17PM +0100, Arnd Bergmann wrote: > > > For all I can see, this should not conflict with the usage of the > > same macros on RISC-V, though it does make add a significant > > difference, so I'd like to see an Ack from the RISC-V folks as > > well (added to Cc), or possibly a change to arch/riscv/include/asm/io.h > > to do a corresponding change. > > There's already a comment in that header which says that the accesses are > ordered wrt timer reads, so I don't think anything needs to change there. > For consistency with the macro arguments, I could augment their __io_par to > take the read value as an unused argument, if that's what you mean? Yes, that's what I meant, I should have been clearer there. Arnd
On Wed, 13 Feb 2019 12:59:28 PST (-0800), Arnd Bergmann wrote: > On Wed, Feb 13, 2019 at 6:46 PM Will Deacon <will.deacon@arm.com> wrote: > >> On Tue, Feb 12, 2019 at 12:55:17PM +0100, Arnd Bergmann wrote: >> >> > For all I can see, this should not conflict with the usage of the >> > same macros on RISC-V, though it does make add a significant >> > difference, so I'd like to see an Ack from the RISC-V folks as >> > well (added to Cc), or possibly a change to arch/riscv/include/asm/io.h >> > to do a corresponding change. Thanks, the original patches didn't make it through my filters. >> There's already a comment in that header which says that the accesses are >> ordered wrt timer reads, so I don't think anything needs to change there. >> For consistency with the macro arguments, I could augment their __io_par to >> take the read value as an unused argument, if that's what you mean? FWIW, we don't really have a way to mandate this in the ISA yet as there's no formal model for either CSR orderings or the IO memory space. > Yes, that's what I meant, I should have been clearer there. That sounds reasonable to me. It looks like we can also go ahead and delete a bunch of arch/riscv/include/asm/io.h now that this stuff is in asm-generic, which would cause us to actually start using these things. I didn't know this had all been moved to asm-generic otherwise I would have cleaned this up earlier. I think this should do it, but this does bring up a bit of an issue: the RISC-V versions of reads and friends put barriers outside the loop, while the asm-generic version don't. What are these actually supposed to do? Either way that resolves, feel free to consider something like diff --git a/arch/riscv/include/asm/io.h b/arch/riscv/include/asm/io.h index b269451e7e85..378975f180a7 100644 --- a/arch/riscv/include/asm/io.h +++ b/arch/riscv/include/asm/io.h @@ -198,20 +198,20 @@ static inline u64 __raw_readq(const volatile void __iomem *addr) * writes. */ #define __io_pbr() __asm__ __volatile__ ("fence io,i" : : : "memory"); -#define __io_par() __asm__ __volatile__ ("fence i,ior" : : : "memory"); +#define __io_par(v) __asm__ __volatile__ ("fence i,ior" : : : "memory"); #define __io_pbw() __asm__ __volatile__ ("fence iow,o" : : : "memory"); #define __io_paw() __asm__ __volatile__ ("fence o,io" : : : "memory"); -#define inb(c) ({ u8 __v; __io_pbr(); __v = readb_cpu((void*)(PCI_IOBASE + (c))); __io_par(); __v; }) -#define inw(c) ({ u16 __v; __io_pbr(); __v = readw_cpu((void*)(PCI_IOBASE + (c))); __io_par(); __v; }) -#define inl(c) ({ u32 __v; __io_pbr(); __v = readl_cpu((void*)(PCI_IOBASE + (c))); __io_par(); __v; }) +#define inb(c) ({ u8 __v; __io_pbr(); __v = readb_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; }) +#define inw(c) ({ u16 __v; __io_pbr(); __v = readw_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; }) +#define inl(c) ({ u32 __v; __io_pbr(); __v = readl_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; }) #define outb(v,c) ({ __io_pbw(); writeb_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); }) #define outw(v,c) ({ __io_pbw(); writew_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); }) #define outl(v,c) ({ __io_pbw(); writel_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); }) #ifdef CONFIG_64BIT -#define inq(c) ({ u64 __v; __io_pbr(); __v = readq_cpu((void*)(c)); __io_par(); __v; }) +#define inq(c) ({ u64 __v; __io_pbr(); __v = readq_cpu((void*)(c)); __io_par(__v); __v; }) #define outq(v,c) ({ __io_pbw(); writeq_cpu((v),(void*)(c)); __io_paw(); }) #endif @@ -261,9 +261,9 @@ __io_reads_ins(reads, u32, l, __io_br(), __io_ar()) #define readsw(addr, buffer, count) __readsw(addr, buffer, count) #define readsl(addr, buffer, count) __readsl(addr, buffer, count) -__io_reads_ins(ins, u8, b, __io_pbr(), __io_par()) -__io_reads_ins(ins, u16, w, __io_pbr(), __io_par()) -__io_reads_ins(ins, u32, l, __io_pbr(), __io_par()) +__io_reads_ins(ins, u8, b, __io_pbr(), __io_par(addr)) +__io_reads_ins(ins, u16, w, __io_pbr(), __io_par(addr)) +__io_reads_ins(ins, u32, l, __io_pbr(), __io_par(addr)) #define insb(addr, buffer, count) __insb((void __iomem *)(long)addr, buffer, count) #define insw(addr, buffer, count) __insw((void __iomem *)(long)addr, buffer, count) #define insl(addr, buffer, count) __insl((void __iomem *)(long)addr, buffer, count) as Revewied-by: Palmer Dabbelt <palmer@sifive.com> when included along with the other diff. That way we can at least keep the macro signatures matching, the cleanup can come later... Thanks!
On Wed, Feb 13, 2019 at 01:57:50PM -0800, Palmer Dabbelt wrote: > On Wed, 13 Feb 2019 12:59:28 PST (-0800), Arnd Bergmann wrote: > > On Wed, Feb 13, 2019 at 6:46 PM Will Deacon <will.deacon@arm.com> wrote: > > > > > On Tue, Feb 12, 2019 at 12:55:17PM +0100, Arnd Bergmann wrote: > > > > > > > For all I can see, this should not conflict with the usage of the > > > > same macros on RISC-V, though it does make add a significant > > > > difference, so I'd like to see an Ack from the RISC-V folks as > > > > well (added to Cc), or possibly a change to arch/riscv/include/asm/io.h > > > > to do a corresponding change. > > Thanks, the original patches didn't make it through my filters. > > > > There's already a comment in that header which says that the accesses are > > > ordered wrt timer reads, so I don't think anything needs to change there. > > > For consistency with the macro arguments, I could augment their __io_par to > > > take the read value as an unused argument, if that's what you mean? > > FWIW, we don't really have a way to mandate this in the ISA yet as there's > no formal model for either CSR orderings or the IO memory space. Ah, so you may end up needing the dependency trick too, depending on where you land with the ISA. > > Yes, that's what I meant, I should have been clearer there. > > That sounds reasonable to me. It looks like we can also go ahead and delete > a bunch of arch/riscv/include/asm/io.h now that this stuff is in > asm-generic, which would cause us to actually start using these things. I > didn't know this had all been moved to asm-generic otherwise I would have > cleaned this up earlier. > > I think this should do it, but this does bring up a bit of an issue: the > RISC-V versions of reads and friends put barriers outside the loop, while > the asm-generic version don't. What are these actually supposed to do? You're referring to the string accessors (e.g. insb() and readsw()), right? arm and arm64 don't provide barriers here either, and I don't think they should have to given that these routines are usually used to poll data register-based FIFOs and therefore don't need to provide ordering guarantees against DMA operations. However, this is woefully undocumented and I shall try to address this in the next version of my memory-barriers.txt patch relating to this area [1]. > Either way that resolves, feel free to consider something like > > diff --git a/arch/riscv/include/asm/io.h b/arch/riscv/include/asm/io.h > index b269451e7e85..378975f180a7 100644 > --- a/arch/riscv/include/asm/io.h > +++ b/arch/riscv/include/asm/io.h > @@ -198,20 +198,20 @@ static inline u64 __raw_readq(const volatile void __iomem *addr) > * writes. > */ > #define __io_pbr() __asm__ __volatile__ ("fence io,i" : : : "memory"); > -#define __io_par() __asm__ __volatile__ ("fence i,ior" : : : "memory"); > +#define __io_par(v) __asm__ __volatile__ ("fence i,ior" : : : "memory"); > #define __io_pbw() __asm__ __volatile__ ("fence iow,o" : : : "memory"); > #define __io_paw() __asm__ __volatile__ ("fence o,io" : : : "memory"); > > -#define inb(c) ({ u8 __v; __io_pbr(); __v = readb_cpu((void*)(PCI_IOBASE + (c))); __io_par(); __v; }) > -#define inw(c) ({ u16 __v; __io_pbr(); __v = readw_cpu((void*)(PCI_IOBASE + (c))); __io_par(); __v; }) > -#define inl(c) ({ u32 __v; __io_pbr(); __v = readl_cpu((void*)(PCI_IOBASE + (c))); __io_par(); __v; }) > +#define inb(c) ({ u8 __v; __io_pbr(); __v = readb_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; }) > +#define inw(c) ({ u16 __v; __io_pbr(); __v = readw_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; }) > +#define inl(c) ({ u32 __v; __io_pbr(); __v = readl_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; }) > > #define outb(v,c) ({ __io_pbw(); writeb_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); }) > #define outw(v,c) ({ __io_pbw(); writew_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); }) > #define outl(v,c) ({ __io_pbw(); writel_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); }) > > #ifdef CONFIG_64BIT > -#define inq(c) ({ u64 __v; __io_pbr(); __v = readq_cpu((void*)(c)); __io_par(); __v; }) > +#define inq(c) ({ u64 __v; __io_pbr(); __v = readq_cpu((void*)(c)); __io_par(__v); __v; }) > #define outq(v,c) ({ __io_pbw(); writeq_cpu((v),(void*)(c)); __io_paw(); }) > #endif > > @@ -261,9 +261,9 @@ __io_reads_ins(reads, u32, l, __io_br(), __io_ar()) > #define readsw(addr, buffer, count) __readsw(addr, buffer, count) > #define readsl(addr, buffer, count) __readsl(addr, buffer, count) > > -__io_reads_ins(ins, u8, b, __io_pbr(), __io_par()) > -__io_reads_ins(ins, u16, w, __io_pbr(), __io_par()) > -__io_reads_ins(ins, u32, l, __io_pbr(), __io_par()) > +__io_reads_ins(ins, u8, b, __io_pbr(), __io_par(addr)) > +__io_reads_ins(ins, u16, w, __io_pbr(), __io_par(addr)) > +__io_reads_ins(ins, u32, l, __io_pbr(), __io_par(addr)) > #define insb(addr, buffer, count) __insb((void __iomem *)(long)addr, buffer, count) > #define insw(addr, buffer, count) __insw((void __iomem *)(long)addr, buffer, count) > #define insl(addr, buffer, count) __insl((void __iomem *)(long)addr, buffer, count) > > as > > Revewied-by: Palmer Dabbelt <palmer@sifive.com> > > when included along with the other diff. That way we can at least keep the > macro signatures matching, the cleanup can come later... Thanks, Palmer! I'll send a v2 of this patch, updated to fix up insq() as well as the readX() macros too, since they're likely to suffer the exact same issues as inX() in this regard. Will [1] https://lkml.org/lkml/2019/2/11/1803
diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h index d356f802945a..b5737c0d8316 100644 --- a/include/asm-generic/io.h +++ b/include/asm-generic/io.h @@ -65,7 +65,7 @@ #endif #ifndef __io_par -#define __io_par() __io_ar() +#define __io_par(v) __io_ar() #endif @@ -471,7 +471,7 @@ static inline u8 inb(unsigned long addr) __io_pbr(); val = __raw_readb(PCI_IOBASE + addr); - __io_par(); + __io_par(val); return val; } #endif @@ -484,7 +484,7 @@ static inline u16 inw(unsigned long addr) __io_pbr(); val = __le16_to_cpu(__raw_readw(PCI_IOBASE + addr)); - __io_par(); + __io_par(val); return val; } #endif @@ -497,7 +497,7 @@ static inline u32 inl(unsigned long addr) __io_pbr(); val = __le32_to_cpu(__raw_readl(PCI_IOBASE + addr)); - __io_par(); + __io_par(val); return val; } #endif
The inX() I/O accessors must enforce ordering against subsequent calls to the delay() routines, so that a read-back from a device can be used to postpone a subsequent write to the same device. On some architectures, including arm64, this ordering can only be achieved by creating a dependency on the value returned by the inX() operation, so we need to pass the value we read to the __io_par() macro in this case. Reported-by: Andrew Murray <andrew.murray@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> --- include/asm-generic/io.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) -- 2.11.0