[PATCHv8 0/8] System Cache support for GPU and required SMMU support

Message ID cover.1605621785.git.saiprakash.ranjan@codeaurora.org

Message

Sai Prakash Ranjan Nov. 17, 2020, 2:30 p.m. UTC
Some hardware variants contain a system cache, also called the last level
cache (LLC). This cache is typically a large block which is shared by
multiple clients on the SoC. The GPU uses the system cache to cache both
the GPU data buffers (like textures) as well as the SMMU pagetables. This
improves render performance and lowers power consumption by reducing the
bus traffic to the system memory.

The system cache architecture allows the cache to be split into slices
which can then be used by multiple SoC clients. This patch series is an
effort to enable and use two of those slices preallocated for the GPU,
one for the GPU data buffers and another for the GPU SMMU hardware
pagetables.
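
As a rough sketch of what the GPU-side slice handling can look like,
using the existing Qualcomm LLCC API (LLCC_GPU and LLCC_GPUHTW come from
the upstream llcc-qcom header; the function below is illustrative, not
lifted from the patches):

  #include <linux/err.h>
  #include <linux/soc/qcom/llcc-qcom.h>

  /*
   * Illustrative only: grab the two preallocated GPU slices, one for
   * data buffers and one for the SMMU hardware pagetable walker (HTW),
   * and activate them. The returned slice IDs would then be programmed
   * into the GPU's cache-control registers.
   */
  static int gpu_llc_slices_init(void)
  {
          struct llcc_slice_desc *llc = llcc_slice_getd(LLCC_GPU);
          struct llcc_slice_desc *htw = llcc_slice_getd(LLCC_GPUHTW);

          if (IS_ERR(llc) || IS_ERR(htw))
                  return -ENODEV;

          llcc_slice_activate(llc);
          llcc_slice_activate(htw);

          return 0;
  }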

Patches 1-6 add system cache support in the SMMU and GPU drivers.
Patches 7 and 8 are minor cleanups for the arm-smmu implementation.
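
To make the dependency between the two halves concrete, the GPU driver
ends up doing something along these lines with the attribute and quirk
named in the v8 changelog below (the struct and attribute spellings here
follow the changelog and patch titles and are illustrative; the exact
names may differ in the final revision):

  #include <linux/iommu.h>
  #include <linux/io-pgtable.h>

  /*
   * Sketch: ask the SMMU to walk the GPU pagetables as outer
   * write-back write-allocate so table walks can hit in the system
   * cache. This must happen before iommu_attach_device(), since the
   * io-pgtable is allocated at attach time.
   */
  static int gpu_enable_llc_pagetable_walks(struct iommu_domain *domain)
  {
          struct domain_attr_io_pgtbl_config cfg = {
                  .quirks = IO_PGTABLE_QUIRK_ARM_OUTER_WBWA,
          };

          return iommu_domain_set_attr(domain, DOMAIN_ATTR_IO_PGTABLE_CFG,
                                       &cfg);
  }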

Changes in v8:
 * Introduce a generic domain attribute for pagetable config (Will)
 * Rename quirk to more generic IO_PGTABLE_QUIRK_ARM_OUTER_WBWA (Will)
 * Move non-strict mode to use new struct domain_attr_io_pgtbl_config (Will)

Changes in v7:
 * Squash Jordan's patch to support MMU500 targets
 * Rebase on top of for-joerg/arm-smmu/updates and Jordan's short series for adreno-smmu impl

Changes in v6:
 * Move table to arm-smmu-qcom (Robin)

Changes in v5:
 * Drop cleanup of blank lines since it was intentional (Robin)
 * Rebase again on top of msm-next-pgtables as it moves pretty fast

Changes in v4:
 * Drop IOMMU_SYS_CACHE prot flag
 * Rebase on top of https://gitlab.freedesktop.org/drm/msm/-/tree/msm-next-pgtables

Changes in v3:
 * Move domain attribute setting to before iommu_attach_device()
 * Fix a few code style and checkpatch warnings
 * Rebase on top of Jordan's latest split pagetables and per-instance
   pagetables support

Changes in v2:
 * Addressed review comments and rebased on top of Jordan's split
   pagetables series

Jordan Crouse (1):
  drm/msm/a6xx: Add support for using system cache on MMU500 based
    targets

Sai Prakash Ranjan (5):
  iommu/io-pgtable-arm: Add support to use system cache
  iommu/arm-smmu: Add domain attribute for pagetable configuration
  iommu/arm-smmu: Move non-strict mode to use domain_attr_io_pgtbl_cfg
  iommu: arm-smmu-impl: Use table to list QCOM implementations
  iommu: arm-smmu-impl: Add a space before open parenthesis

Sharat Masetty (2):
  drm/msm: rearrange the gpu_rmw() function
  drm/msm/a6xx: Add support for using system cache(LLC)
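
For context on Sharat's first patch: gpu_rmw() is a small read-modify-write
register helper, roughly of the following shape (a sketch of the pattern
being rearranged, with an assumed helper name, not the exact code):

  #include <linux/io.h>
  #include <linux/types.h>

  /*
   * Sketch of a read-modify-write helper: clear the bits in 'mask',
   * then OR in 'or', in a single readl/writel pair.
   */
  static inline void msm_rmw(void __iomem *addr, u32 mask, u32 or)
  {
          u32 val = readl(addr);

          val &= ~mask;
          writel(val | or, addr);
  }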

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c      | 109 +++++++++++++++++++++
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h      |   5 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.c    |  17 ++++
 drivers/gpu/drm/msm/msm_drv.c              |   8 ++
 drivers/gpu/drm/msm/msm_drv.h              |   1 +
 drivers/gpu/drm/msm/msm_gpu.h              |   5 +-
 drivers/iommu/arm/arm-smmu/arm-smmu-impl.c |  11 +--
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c |  21 +++-
 drivers/iommu/arm/arm-smmu/arm-smmu.c      |  30 +++++-
 drivers/iommu/arm/arm-smmu/arm-smmu.h      |   3 +-
 drivers/iommu/io-pgtable-arm.c             |  10 +-
 include/linux/io-pgtable.h                 |   8 ++
 include/linux/iommu.h                      |   1 +
 13 files changed, 203 insertions(+), 26 deletions(-)


base-commit: a29bbb0861f487a5e144dc997a9f71a36c7a2404

Comments

Will Deacon Nov. 23, 2020, 3:21 p.m. UTC | #1
On Tue, Nov 17, 2020 at 08:00:39PM +0530, Sai Prakash Ranjan wrote:
> [...]
>
> Changes in v8:
>  * Introduce a generic domain attribute for pagetable config (Will)
>  * Rename quirk to more generic IO_PGTABLE_QUIRK_ARM_OUTER_WBWA (Will)
>  * Move non-strict mode to use new struct domain_attr_io_pgtbl_config (Will)

Modulo some minor comments I've made, this looks good to me. What is the
plan for merging it? I can take the IOMMU parts, but patches 4-6 touch the
MSM GPU driver and I'd like to avoid conflicts with that.

Will
Sai Prakash Ranjan Nov. 23, 2020, 5:01 p.m. UTC | #2
On 2020-11-23 20:51, Will Deacon wrote:
> On Tue, Nov 17, 2020 at 08:00:39PM +0530, Sai Prakash Ranjan wrote:
>> [...]
>
> Modulo some minor comments I've made, this looks good to me. What is the
> plan for merging it? I can take the IOMMU parts, but patches 4-6 touch the
> MSM GPU driver and I'd like to avoid conflicts with that.

SMMU bits are pretty much independent and the GPU relies on the domain
attribute and the quirk exposed, so as long as the SMMU changes go in
first it should be good.
Rob?

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
Rob Clark Nov. 23, 2020, 7:22 p.m. UTC | #3
On Mon, Nov 23, 2020 at 9:01 AM Sai Prakash Ranjan
<saiprakash.ranjan@codeaurora.org> wrote:
>
> On 2020-11-23 20:51, Will Deacon wrote:
> > On Tue, Nov 17, 2020 at 08:00:39PM +0530, Sai Prakash Ranjan wrote:
> >> [...]
> >
> > Modulo some minor comments I've made, this looks good to me. What is the
> > plan for merging it? I can take the IOMMU parts, but patches 4-6 touch the
> > MSM GPU driver and I'd like to avoid conflicts with that.
> >
>
> SMMU bits are pretty much independent and the GPU relies on the domain
> attribute and the quirk exposed, so as long as the SMMU changes go in
> first it should be good.
> Rob?

I suppose one option would be to split out the patch that adds the
attribute into its own patch, and merge that both thru drm and iommu?

If Will/Robin dislike that approach, I'll pick up the parts of the drm
patches which don't depend on the new attribute for v5.11 and the rest
for v5.12... or possibly a second late v5.11 pull req if airlied
doesn't hate me too much for it.

Going forward, I think we will have one or two more co-dependent
series, like the smmu iova fault handler improvements that Jordan
posted.  So I would like to hear how Will and Robin prefer to handle
those.

BR,
-R


Sai Prakash Ranjan Nov. 24, 2020, 4:02 a.m. UTC | #4
On 2020-11-24 00:52, Rob Clark wrote:
> On Mon, Nov 23, 2020 at 9:01 AM Sai Prakash Ranjan
> <saiprakash.ranjan@codeaurora.org> wrote:
>> [...]
>>
>> SMMU bits are pretty much independent and the GPU relies on the domain
>> attribute and the quirk exposed, so as long as the SMMU changes go in
>> first it should be good.
>> Rob?
>
> I suppose one option would be to split out the patch that adds the
> attribute into its own patch, and merge that both thru drm and iommu?

Ok I can split out domain attr and quirk into its own patch if Will is
fine with that approach.

> If Will/Robin dislike that approach, I'll pick up the parts of the drm
> patches which don't depend on the new attribute for v5.11 and the rest
> for v5.12... or possibly a second late v5.11 pull req if airlied
> doesn't hate me too much for it.
>
> Going forward, I think we will have one or two more co-dependent
> series, like the smmu iova fault handler improvements that Jordan
> posted.  So I would like to hear how Will and Robin prefer to handle
> those.

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
Will Deacon Nov. 24, 2020, 11:10 a.m. UTC | #5
On Tue, Nov 24, 2020 at 09:32:54AM +0530, Sai Prakash Ranjan wrote:
> On 2020-11-24 00:52, Rob Clark wrote:
> > On Mon, Nov 23, 2020 at 9:01 AM Sai Prakash Ranjan
> > <saiprakash.ranjan@codeaurora.org> wrote:
> > > [...]
> > >
> > > SMMU bits are pretty much independent and the GPU relies on the domain
> > > attribute and the quirk exposed, so as long as the SMMU changes go in
> > > first it should be good.
> > > Rob?
> >
> > I suppose one option would be to split out the patch that adds the
> > attribute into its own patch, and merge that both thru drm and iommu?
>
> Ok I can split out domain attr and quirk into its own patch if Will is
> fine with that approach.

Why don't I just queue the first two patches on their own branch and we
both pull that?

Will
Rob Clark Nov. 24, 2020, 10:08 p.m. UTC | #6
On Tue, Nov 24, 2020 at 1:43 PM Will Deacon <will@kernel.org> wrote:
>
> On Tue, Nov 24, 2020 at 11:05:39AM -0800, Rob Clark wrote:
> > On Tue, Nov 24, 2020 at 3:10 AM Will Deacon <will@kernel.org> wrote:
> > > On Tue, Nov 24, 2020 at 09:32:54AM +0530, Sai Prakash Ranjan wrote:
> > > > [...]
> > >
> > > Why don't I just queue the first two patches on their own branch and we
> > > both pull that?
> >
> > Ok, that works for me.  I normally base msm-next on -rc1 but I guess
> > as long as we base the branch on the older of our two -next branches,
> > that should work out nicely
>
> Turns out we're getting a v10 of Sai's stuff, so I've asked him to split
> patch two up anyway. Then I'll make a branch based on -rc1 that we can
> both pull.

Sounds good, thx

BR,
-R