| Message ID | 20210723214031.3251801-4-atish.patra@wdc.com |
|---|---|
| State | New |
| Series | Support non-coherent DMA on RISC-V using a global pool |
On Fri, Jul 23, 2021 at 3:40 PM Atish Patra <atish.patra@wdc.com> wrote:
>
> Currently, linux,dma-default is used to reserve a global non-coherent pool
> to allocate memory for dma operations. This can be useful for RISC-V as
> well, as the ISA specification doesn't specify a method to modify PMA
> attributes or page table entries to define a non-cacheable area yet.
> A non-cacheable memory window is an alternative option for vendors to
> support non-coherent devices. "dma-ranges" must be used in conjunction with
> the "linux,dma-default" property to define one or more mappings between
> device and CPU accessible memory regions.

'dma-ranges' applies to buses. And, well, maybe devices when the bus is not well defined. It is not a reserved-memory property.

Rob
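Rob's point can be illustrated with a small device-tree sketch (node names and addresses here are hypothetical, not taken from this series): `dma-ranges` belongs on a bus node, where it describes the translation between child bus addresses and parent CPU addresses, rather than on a reserved-memory node:

```dts
/* Hypothetical example only. A bus whose address 0x0 maps to CPU
 * physical address 0x10_0000_0000; dma-ranges lives on the bus node
 * as <child-bus-addr parent-cpu-addr size>.
 */
soc {
	#address-cells = <2>;
	#size-cells = <2>;

	dma_bus: bus@0 {
		compatible = "simple-bus";
		#address-cells = <1>;
		#size-cells = <1>;
		/* bus addr 0x0 -> CPU addr 0x10_0000_0000, 2 GiB window */
		dma-ranges = <0x0 0x10 0x00000000 0x80000000>;

		dma_dev: dma-device@100000 {
			/* a non-coherent DMA master sitting on this bus */
		};
	};
};
```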
On Fri, Jul 23, 2021 at 02:40:29PM -0700, Atish Patra wrote:
> Currently, linux,dma-default is used to reserve a global non-coherent pool
> to allocate memory for dma operations. This can be useful for RISC-V as
> well, as the ISA specification doesn't specify a method to modify PMA
> attributes or page table entries to define a non-cacheable area yet.
> A non-cacheable memory window is an alternative option for vendors to
> support non-coherent devices.

Please explain why you do not want to use the simple non-cachable window support using arch_dma_set_uncached as used by mips, nios2 and xtensa.

> +static int __dma_init_global_coherent(phys_addr_t phys_addr, dma_addr_t device_addr, size_t size)
> {
> 	struct dma_coherent_mem *mem;
>
> -	mem = dma_init_coherent_memory(phys_addr, phys_addr, size, true);
> +	if (phys_addr == device_addr)
> +		mem = dma_init_coherent_memory(phys_addr, device_addr, size, true);
> +	else
> +		mem = dma_init_coherent_memory(phys_addr, device_addr, size, false);

Nak. The phys_addr != device_addr support is going away. This needs to be filled in using the dma-ranges property hanging off the struct device.
On Mon, Jul 26, 2021 at 12:00 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Fri, Jul 23, 2021 at 02:40:29PM -0700, Atish Patra wrote:
> > Currently, linux,dma-default is used to reserve a global non-coherent pool
> > to allocate memory for dma operations. This can be useful for RISC-V as
> > well, as the ISA specification doesn't specify a method to modify PMA
> > attributes or page table entries to define a non-cacheable area yet.
> > A non-cacheable memory window is an alternative option for vendors to
> > support non-coherent devices.
>
> Please explain why you do not want to use the simple non-cachable
> window support using arch_dma_set_uncached as used by mips, nios2 and
> xtensa.

arch_dma_set_uncached works as well in this case. However, mips, nios2 & xtensa use a fixed (via config) value for the offset. A similar approach can't be used here because the platform-specific offset value has to be determined at runtime, so that a single kernel image can boot on all platforms. That's why we need the following additional changes for RISC-V to make it work:

1. A new DT property so that arch-specific code is aware of the non-cacheable window offset
   - either under the /chosen node, or a completely separate node with support for multiple non-cacheable windows. We also need to define how it is going to be referenced from an individual device if per-device non-cacheable window support is required in the future. As of now, the BeagleV uncached memory region lies in 0x10_0000_0000 - 0x17_FFFF_FFFF, which is mapped to the start of DRAM at 0x80000000. All of the non-coherent devices can do 32-bit DMA only.

2. Use dma-ranges and modify the arch_dma_set_uncached function to pass the struct device as an argument.

Either way, we will need arch-specific hookups and additional changes, while the global non-coherent pool provides a more elegant solution without any additional arch-specific code.

If arch_dma_set_uncached is still the preferred way to solve this problem, I can revise the patch with either approach 1 or approach 2.

> > +static int __dma_init_global_coherent(phys_addr_t phys_addr, dma_addr_t device_addr, size_t size)
> > {
> > 	struct dma_coherent_mem *mem;
> >
> > -	mem = dma_init_coherent_memory(phys_addr, phys_addr, size, true);
> > +	if (phys_addr == device_addr)
> > +		mem = dma_init_coherent_memory(phys_addr, device_addr, size, true);
> > +	else
> > +		mem = dma_init_coherent_memory(phys_addr, device_addr, size, false);
>
> Nak. The phys_addr != device_addr support is going away. This needs

ok.

> to be filled in using the dma-ranges property hanging off the struct device.

struct device is only accessible in rmem_dma_device_init. I couldn't find a proper way to access it during the dma_reserved_default_memory setup for the global pool.

Does that mean we should use a per-device memory pool instead of a global non-coherent pool?

--
Regards,
Atish
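Approach 1 above might look like the following hedged device-tree fragment. The property name is invented purely for illustration; no such binding exists or was proposed verbatim in this thread. The value is the offset between the BeagleV cached DRAM base (0x8000_0000) and its uncached alias (0x10_0000_0000), i.e. 0xF_8000_0000:

```dts
chosen {
	/* Hypothetical property: CPU-visible offset between cached DRAM
	 * and its uncached alias, read by arch code at boot. */
	riscv,uncached-offset = <0xf 0x80000000>;
};
```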
On Mon, Jul 26, 2021 at 03:47:54PM -0700, Atish Patra wrote:
> arch_dma_set_uncached works as well in this case. However, mips,
> nios2 & xtensa use a fixed (via config) value for the offset. A similar
> approach can't be used here because the platform-specific offset value
> has to be determined at runtime, so that a single kernel image can boot
> on all platforms.

Nothing in the interface requires a fixed offset. And using the offset has one enormous advantage in that there is no need to declare a statically sized pool - allocations are fully dynamic. And any kind of fixed pool tends to cause huge problems.

> 1. A new DT property so that arch-specific code is aware of the
> non-cacheable window offset.

Yes.

> individual device if per-device non-cacheable
> window support is required in the future. As of now, the BeagleV memory

If you require a per-device noncachable area you can use the per-device coherent pools. But why would you want that?

> region lies in 0x10_0000_0000 - 0x17_FFFF_FFFF,
> which is mapped to the start of DRAM at 0x80000000. All of the
> non-coherent devices can do 32-bit DMA only.

Adjust ZONE_DMA32 so that it takes the uncached offset into account.

> > > -	mem = dma_init_coherent_memory(phys_addr, phys_addr, size, true);
> > > +	if (phys_addr == device_addr)
> > > +		mem = dma_init_coherent_memory(phys_addr, device_addr, size, true);
> > > +	else
> > > +		mem = dma_init_coherent_memory(phys_addr, device_addr, size, false);
> >
> > Nak. The phys_addr != device_addr support is going away. This needs
>
> ok.
>
> > to be filled in using the dma-ranges property hanging off the struct device.
>
> struct device is only accessible in rmem_dma_device_init. I couldn't
> find a proper way to access it during the dma_reserved_default_memory
> setup for the global pool.
>
> Does that mean we should use a per-device memory pool instead of a
> global non-coherent pool?

Indeed, that would be a problem in this case. But if we can just use the uncached offset directly I think everything will be much simpler.
On Tue, Jul 27, 2021 at 1:52 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Mon, Jul 26, 2021 at 03:47:54PM -0700, Atish Patra wrote:
> > struct device is only accessible in rmem_dma_device_init. I couldn't
> > find a proper way to access it during the dma_reserved_default_memory
> > setup for the global pool.
> >
> > Does that mean we should use a per-device memory pool instead of a
> > global non-coherent pool?
>
> Indeed, that would be a problem in this case. But if we can just
> use the uncached offset directly I think everything will be much
> simpler.

Yes. I was planning to change this to use an uncached offset. However, the planned mass production of the BeagleV Starlight SBC is cancelled now [1]. As there is no other board that requires an uncached offset, I don't think there is a use case for adding uncached offset support for RISC-V right now. I will revisit this (hopefully we won't have to) in case any platform implements uncached window support in the future.

[1] https://www.cnx-software.com/2021/07/31/beaglev-starlight-sbc-wont-be-mass-manufactured-redesigned-beaglev-risc-v-sbc-expected-in-q1-2022/

--
Regards,
Atish
```diff
diff --git a/kernel/dma/coherent.c b/kernel/dma/coherent.c
index 97677df5408b..d0b33b1a76f0 100644
--- a/kernel/dma/coherent.c
+++ b/kernel/dma/coherent.c
@@ -9,6 +9,8 @@
 #include <linux/module.h>
 #include <linux/dma-direct.h>
 #include <linux/dma-map-ops.h>
+#include <linux/of_address.h>
+#include <linux/libfdt.h>
 
 struct dma_coherent_mem {
 	void *virt_base;
@@ -302,19 +304,27 @@ int dma_mmap_from_global_coherent(struct vm_area_struct *vma, void *vaddr,
 					vaddr, size, ret);
 }
 
-int dma_init_global_coherent(phys_addr_t phys_addr, size_t size)
+static int __dma_init_global_coherent(phys_addr_t phys_addr, dma_addr_t device_addr, size_t size)
 {
 	struct dma_coherent_mem *mem;
 
-	mem = dma_init_coherent_memory(phys_addr, phys_addr, size, true);
+	if (phys_addr == device_addr)
+		mem = dma_init_coherent_memory(phys_addr, device_addr, size, true);
+	else
+		mem = dma_init_coherent_memory(phys_addr, device_addr, size, false);
+
 	if (IS_ERR(mem))
 		return PTR_ERR(mem);
 	dma_coherent_default_memory = mem;
 	pr_info("DMA: default coherent area is set\n");
 	return 0;
 }
-#endif /* CONFIG_DMA_GLOBAL_POOL */
+
+int dma_init_global_coherent(phys_addr_t phys_addr, size_t size)
+{
+	return __dma_init_global_coherent(phys_addr, phys_addr, size);
+}
+#endif /* CONFIG_DMA_GLOBAL_POOL */
 
 /*
  * Support for reserved memory regions defined in device tree
  */
@@ -329,8 +339,8 @@ static int rmem_dma_device_init(struct reserved_mem *rmem, struct device *dev)
 	if (!rmem->priv) {
 		struct dma_coherent_mem *mem;
 
-		mem = dma_init_coherent_memory(rmem->base, rmem->base,
-					       rmem->size, true);
+		mem = dma_init_coherent_memory(rmem->base, rmem->base, rmem->size, true);
+
 		if (IS_ERR(mem))
 			return PTR_ERR(mem);
 		rmem->priv = mem;
@@ -358,7 +368,7 @@ static int __init rmem_dma_setup(struct reserved_mem *rmem)
 	if (of_get_flat_dt_prop(node, "reusable", NULL))
 		return -EINVAL;
 
-#ifdef CONFIG_ARM
+#if defined(CONFIG_ARM) || defined(CONFIG_RISCV)
 	if (!of_get_flat_dt_prop(node, "no-map", NULL)) {
 		pr_err("Reserved memory: regions without no-map are not yet supported\n");
 		return -EINVAL;
@@ -382,10 +392,33 @@ static int __init rmem_dma_setup(struct reserved_mem *rmem)
 #ifdef CONFIG_DMA_GLOBAL_POOL
 static int __init dma_init_reserved_memory(void)
 {
+	struct device_node *np;
+	const struct bus_dma_region *map = NULL;
+	int ret;
+	int64_t uc_offset = 0;
+
 	if (!dma_reserved_default_memory)
 		return -ENOMEM;
-	return dma_init_global_coherent(dma_reserved_default_memory->base,
-					dma_reserved_default_memory->size);
+
+	/* dma-ranges is only valid for global pool i.e. dma-default is set */
+	np = of_find_node_with_property(NULL, "linux,dma-default");
+	if (!np)
+		goto global_init;
+	of_node_put(np);
+
+	ret = of_dma_get_range(np, &map);
+	if (ret < 0)
+		goto global_init;
+
+	/* Sanity check for the non-coherent global pool from uncached region */
+	if (map->dma_start == dma_reserved_default_memory->base &&
+	    map->size == dma_reserved_default_memory->size)
+		uc_offset = map->offset;
+
+global_init:
+	return __dma_init_global_coherent(dma_reserved_default_memory->base + uc_offset,
+					  dma_reserved_default_memory->base,
+					  dma_reserved_default_memory->size);
 }
 core_initcall(dma_init_reserved_memory);
 #endif /* CONFIG_DMA_GLOBAL_POOL */
```
Currently, linux,dma-default is used to reserve a global non-coherent pool to allocate memory for DMA operations. This can be useful for RISC-V as well, as the ISA specification doesn't specify a method to modify PMA attributes or page table entries to define a non-cacheable area yet. A non-cacheable memory window is an alternative option for vendors to support non-coherent devices. "dma-ranges" must be used in conjunction with the "linux,dma-default" property to define one or more mappings between device and CPU accessible memory regions. This allows RISC-V to use the global pool for non-coherent platforms that rely on an uncached memory region that is outside of the system RAM.

Signed-off-by: Atish Patra <atish.patra@wdc.com>
---
 kernel/dma/coherent.c | 49 ++++++++++++++++++++++++++++++++++++-------
 1 file changed, 41 insertions(+), 8 deletions(-)
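For reference, a hedged sketch of the binding this patch envisions (node names, sizes, and addresses are hypothetical, and note that Rob Herring objected upthread that `dma-ranges` is a bus property, not a reserved-memory one, so this never became an accepted binding): a `shared-dma-pool` carved out of the uncached alias, marked as the default pool, with `dma-ranges` describing the device-address-to-CPU-address mapping:

```dts
reserved-memory {
	#address-cells = <2>;
	#size-cells = <2>;
	ranges;

	/* Global non-coherent pool placed in the uncached DRAM alias */
	dma_pool: dma-pool@1000000000 {
		compatible = "shared-dma-pool";
		reg = <0x10 0x00000000 0x0 0x400000>; /* 4 MiB */
		no-map;
		linux,dma-default;
		/* device addr 0x8000_0000 <-> CPU addr 0x10_0000_0000 */
		dma-ranges = <0x0 0x80000000 0x10 0x00000000 0x0 0x400000>;
	};
};
```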