[RFC,0/8] drm/msm: adreno: add support for DDR bandwidth scaling via GMU

Message ID 20241113-topic-sm8x50-gpu-bw-vote-v1-0-3b8d39737a9b@linaro.org

Message

Neil Armstrong Nov. 13, 2024, 3:48 p.m. UTC
The Adreno GPU Management Unit (GMU) can also vote for DDR bandwidth
alongside the frequency and power domain levels, but by default we let
the OPP core scale the interconnect DDR path.

While scaling the interconnect path has been sufficient so far, newer
GPUs like the A750 require specific vote parameters and bandwidth to
achieve full functionality.

In order to get the vote values to be used by the GPU Management
Unit (GMU), we need to parse all the possible OPP bandwidths and
compute a vote value to be sent to each of the Bus Clock
Managers (BCMs) declared in the GPU info struct.
The newly added dev_pm_opp_get_bandwidth() helper is used for this.
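
As an illustration, collecting the per-OPP bandwidth could look like
this minimal sketch (the dev_pm_opp_* calls follow the standard OPP
iteration pattern, with "dev" standing for the GPU device; everything
around them is an assumption, not the series' final code):

	unsigned long freq = 0;
	struct dev_pm_opp *opp;

	while (!IS_ERR(opp = dev_pm_opp_find_freq_ceil(dev, &freq))) {
		/* peak bandwidth of the first (DDR) interconnect path, in kBps */
		unsigned long peak_kBps = dev_pm_opp_get_bandwidth(opp, true, 0);

		/* ... encode peak_kBps into a vote for each declared BCM ... */
		dev_pm_opp_put(opp);
		freq++;
	}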

The vote array will then be used to dynamically generate the GMU
bw_table sent during the GMU power-up.

Those entries will then be used by the GMU when the driver passes the
appropriate bandwidth level while voting for a GPU frequency.

This will make sure all resources are voted consistently for the
same OPP: whatever decision the GMU takes, all resource votes will
remain synchronized.
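
For example (a sketch only: a6xx_gmu_set_gpu_perf(), the gpu_bw_peaks[]
field and the extra a6xx_hfi_set_freq() argument are assumptions used
for illustration, not this series' final code):

	static void a6xx_gmu_set_gpu_perf(struct a6xx_gmu *gmu,
					  struct dev_pm_opp *opp,
					  unsigned int freq_index)
	{
		unsigned long peak = dev_pm_opp_get_bandwidth(opp, true, 0);
		unsigned int bw_index = 0, i;

		/* highest pre-computed level not exceeding the OPP's peak */
		for (i = 0; i < gmu->nr_gpu_bws; i++)
			if (gmu->gpu_bw_peaks[i] <= peak)
				bw_index = i;

		/* both indexes travel in the same HFI DVFS vote */
		a6xx_hfi_set_freq(gmu, freq_index, bw_index);
	}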

Tested on SM8650 and SM8550 platforms.

Any feedback is welcome.

Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
---
Neil Armstrong (8):
      opp: core: implement dev_pm_opp_get_bandwidth
      drm/msm: adreno: add GMU_BW_VOTE quirk
      drm/msm: adreno: add plumbing to generate bandwidth vote table for GMU
      drm/msm: adreno: dynamically generate GMU bw table
      drm/msm: adreno: find bandwidth index of OPP and set it along freq index
      drm/msm: adreno: enable GMU bandwidth for A740 and A750
      arm64: qcom: dts: sm8550: add interconnect and opp-peak-kBps for GPU
      arm64: qcom: dts: sm8650: add interconnect and opp-peak-kBps for GPU

 arch/arm64/boot/dts/qcom/sm8550.dtsi      |  11 ++
 arch/arm64/boot/dts/qcom/sm8650.dtsi      |  14 +++
 drivers/gpu/drm/msm/adreno/a6xx_catalog.c |  26 ++++-
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c     | 180 +++++++++++++++++++++++++++++-
 drivers/gpu/drm/msm/adreno/a6xx_gmu.h     |  14 ++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h     |   1 +
 drivers/gpu/drm/msm/adreno/a6xx_hfi.c     |  54 ++++++---
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   |   1 +
 drivers/opp/core.c                        |  25 +++++
 include/linux/pm_opp.h                    |   7 ++
 10 files changed, 314 insertions(+), 19 deletions(-)
---
base-commit: 86313a9cd152330c634b25d826a281c6a002eb77
change-id: 20241113-topic-sm8x50-gpu-bw-vote-f5e022fe7a47

Best regards,

Comments

Viresh Kumar Nov. 14, 2024, 4:10 a.m. UTC | #1
On 13-11-24, 16:48, Neil Armstrong wrote:
> Add and implement the dev_pm_opp_get_bandwidth() to retrieve
> the OPP's bandwidth in the same was as the dev_pm_opp_get_voltage()

                                  way

> helper.
> 
> Retrieving bandwidth is required in the case of the Adreno GPU
> where the GPU Management Unit can handle the Bandwidth scaling.
> 
> The helper can get the peak or everage bandwidth for any of

                                 average

> the interconnect path.
> 
> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
> ---
>  drivers/opp/core.c     | 25 +++++++++++++++++++++++++
>  include/linux/pm_opp.h |  7 +++++++
>  2 files changed, 32 insertions(+)
> 
> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index 494f8860220d97fc690ebab5ed3b7f5f04f22d73..19fb82033de26b74e9604c33b9781689df2fe80a 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -106,6 +106,31 @@ static bool assert_single_clk(struct opp_table *opp_table)
>  	return !WARN_ON(opp_table->clk_count > 1);
>  }
>  
> +/**
> + * dev_pm_opp_get_bandwidth() - Gets the peak bandwidth corresponding to an opp

s/peak bandwidth/bandwidth/

> + * @opp:	opp for which bandwidth has to be returned
> + * @peak:	select peak or average bandwidth
> + * @index:	bandwidth index
> + *
> + * Return: peak bandwidth in kBps, else return 0

s/peak bandwidth/bandwidth/

> + */
> +unsigned long dev_pm_opp_get_bandwidth(struct dev_pm_opp *opp, bool peak, int index)
> +{
> +	if (IS_ERR_OR_NULL(opp)) {
> +		pr_err("%s: Invalid parameters\n", __func__);
> +		return 0;
> +	}
> +
> +	if (index >= opp->opp_table->path_count)
> +		return 0;
> +
> +	if (!opp->bandwidth)
> +		return 0;
> +
> +	return peak ? opp->bandwidth[index].peak : opp->bandwidth[index].avg;
> +}
> +EXPORT_SYMBOL_GPL(dev_pm_opp_get_bandwidth);

All other bandwidth APIs are named as _bw, maybe do same here too ?
Neil Armstrong Nov. 14, 2024, 9:23 a.m. UTC | #2
Hi,

On 14/11/2024 05:10, Viresh Kumar wrote:
> On 13-11-24, 16:48, Neil Armstrong wrote:
>> Add and implement the dev_pm_opp_get_bandwidth() to retrieve
>> the OPP's bandwidth in the same was as the dev_pm_opp_get_voltage()
> 
>                                    way
> 
>> helper.
>>
>> Retrieving bandwidth is required in the case of the Adreno GPU
>> where the GPU Management Unit can handle the Bandwidth scaling.
>>
>> The helper can get the peak or everage bandwidth for any of
> 
>                                   average

Aww, good catch, thanks

> 
>> the interconnect path.
>>
>> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
>> ---
>>   drivers/opp/core.c     | 25 +++++++++++++++++++++++++
>>   include/linux/pm_opp.h |  7 +++++++
>>   2 files changed, 32 insertions(+)
>>
>> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
>> index 494f8860220d97fc690ebab5ed3b7f5f04f22d73..19fb82033de26b74e9604c33b9781689df2fe80a 100644
>> --- a/drivers/opp/core.c
>> +++ b/drivers/opp/core.c
>> @@ -106,6 +106,31 @@ static bool assert_single_clk(struct opp_table *opp_table)
>>   	return !WARN_ON(opp_table->clk_count > 1);
>>   }
>>   
>> +/**
>> + * dev_pm_opp_get_bandwidth() - Gets the peak bandwidth corresponding to an opp
> 
> s/peak bandwidth/bandwidth/

Ack

> 
>> + * @opp:	opp for which bandwidth has to be returned
>> + * @peak:	select peak or average bandwidth
>> + * @index:	bandwidth index
>> + *
>> + * Return: peak bandwidth in kBps, else return 0
> 
> s/peak bandwidth/bandwidth/

Ack

> 
>> + */
>> +unsigned long dev_pm_opp_get_bandwidth(struct dev_pm_opp *opp, bool peak, int index)
>> +{
>> +	if (IS_ERR_OR_NULL(opp)) {
>> +		pr_err("%s: Invalid parameters\n", __func__);
>> +		return 0;
>> +	}
>> +
>> +	if (index >= opp->opp_table->path_count)
>> +		return 0;
>> +
>> +	if (!opp->bandwidth)
>> +		return 0;
>> +
>> +	return peak ? opp->bandwidth[index].peak : opp->bandwidth[index].avg;
>> +}
>> +EXPORT_SYMBOL_GPL(dev_pm_opp_get_bandwidth);
> 
> All other bandwidth APIs are named as _bw, maybe do same here too ?
> 

Sure, I wasn't sure about that, will switch to _bw.

Neil
Neil Armstrong Nov. 15, 2024, 9:21 a.m. UTC | #3
On 15/11/2024 08:07, Dmitry Baryshkov wrote:
> On Wed, Nov 13, 2024 at 04:48:28PM +0100, Neil Armstrong wrote:
>> The Adreno GPU Management Unit (GMU) can also scale the DDR Bandwidth
>> along the Frequency and Power Domain level, but by default we leave the
>> OPP core vote for the interconnect ddr path.
>>
>> While scaling via the interconnect path was sufficient, newer GPUs
>> like the A750 require specific vote parameters and bandwidth to
>> achieve full functionality.
>>
>> Add a new Quirk enabling DDR Bandwidth vote via GMU.
> 
> Please describe, why this is defined as a quirk rather than a proper
> platform-level property. From my experience with 6xx and 7xx, all the
> platforms need to send some kind of BW data to the GMU.

Well APRIV, CACHED_COHERENT & PREEMPTION are HW features, why this can't be part of this ?

Perhaps the "quirks" bitfield should be features instead ?

> 
>>
>> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
>> ---
>>   drivers/gpu/drm/msm/adreno/adreno_gpu.h | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>> index e71f420f8b3a8e6cfc52dd1c4d5a63ef3704a07f..20b6b7f49473d42751cd4fb4fc82849be42cb807 100644
>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
>> @@ -57,6 +57,7 @@ enum adreno_family {
>>   #define ADRENO_QUIRK_HAS_HW_APRIV		BIT(3)
>>   #define ADRENO_QUIRK_HAS_CACHED_COHERENT	BIT(4)
>>   #define ADRENO_QUIRK_PREEMPTION			BIT(5)
>> +#define ADRENO_QUIRK_GMU_BW_VOTE		BIT(6)
>>   
>>   /* Helper for formating the chip_id in the way that userspace tools like
>>    * crashdec expect.
>>
>> -- 
>> 2.34.1
>>
>
Dmitry Baryshkov Nov. 15, 2024, 2:18 p.m. UTC | #4
On Fri, 15 Nov 2024 at 11:21, Neil Armstrong <neil.armstrong@linaro.org> wrote:
>
> On 15/11/2024 08:07, Dmitry Baryshkov wrote:
> > On Wed, Nov 13, 2024 at 04:48:28PM +0100, Neil Armstrong wrote:
> >> The Adreno GPU Management Unit (GMU) can also scale the DDR Bandwidth
> >> along the Frequency and Power Domain level, but by default we leave the
> >> OPP core vote for the interconnect ddr path.
> >>
> >> While scaling via the interconnect path was sufficient, newer GPUs
> >> like the A750 require specific vote parameters and bandwidth to
> >> achieve full functionality.
> >>
> >> Add a new Quirk enabling DDR Bandwidth vote via GMU.
> >
> > Please describe, why this is defined as a quirk rather than a proper
> > platform-level property. From my experience with 6xx and 7xx, all the
> > platforms need to send some kind of BW data to the GMU.
>
> Well APRIV, CACHED_COHERENT & PREEMPTION are HW features, why this can't be part of this ?
>
> Perhaps the "quirks" bitfield should be features instead ?

Sounds like that.
Dmitry Baryshkov Nov. 15, 2024, 2:35 p.m. UTC | #5
On Fri, Nov 15, 2024 at 10:11:09AM +0100, Neil Armstrong wrote:
> On 15/11/2024 08:24, Dmitry Baryshkov wrote:
> > On Wed, Nov 13, 2024 at 04:48:30PM +0100, Neil Armstrong wrote:
> > > The Adreno GPU Management Unit (GMU) can also scale the ddr
> > > bandwidth along the frequency and power domain level, but for
> > > now we statically fill the bw_table with values from the
> > > downstream driver.
> > > 
> > > Only the first entry is used, which is a disable vote, so we
> > > currently rely on scaling via the linux interconnect paths.
> > > 
> > > Let's dynamically generate the bw_table with the vote values
> > > previously calculated from the OPPs.
> > 
> > Nice to see this being worked upon. I hope the code is generic
> > enough so that we can use it from other adreno_foo_build_bw_table()
> > functions.
> 
> I would hope so, but I don't have the HW to properly test it on those
> platforms.

Welcome to the club^W Lab.

> > > Those entries will then be used by the GMU when passing the
> > > appropriate bandwidth level when voting for a gpu frequency.
> > > 
> > > Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
> > > ---
> > >   drivers/gpu/drm/msm/adreno/a6xx_hfi.c | 48 +++++++++++++++++++++++++++--------
> > >   1 file changed, 37 insertions(+), 11 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_hfi.c b/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
> > > index cb8844ed46b29c4569d05eb7a24f7b27e173190f..9a89ba95843e7805d78f0e5ddbe328677b6431dd 100644
> > > --- a/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
> > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_hfi.c
> > > @@ -596,22 +596,48 @@ static void a730_build_bw_table(struct a6xx_hfi_msg_bw_table *msg)
> > >   	msg->cnoc_cmds_data[1][0] = 0x60000001;
> > >   }
> > > -static void a740_build_bw_table(struct a6xx_hfi_msg_bw_table *msg)
> > > +static void a740_generate_bw_table(struct adreno_gpu *adreno_gpu, struct a6xx_gmu *gmu,
> > > +				   struct a6xx_hfi_msg_bw_table *msg)
> > >   {
> > > -	msg->bw_level_num = 1;
> > > +	const struct a6xx_info *info = adreno_gpu->info->a6xx;
> > > +	unsigned int i, j;
> > > -	msg->ddr_cmds_num = 3;
> > >   	msg->ddr_wait_bitmask = 0x7;
> > > -	msg->ddr_cmds_addrs[0] = cmd_db_read_addr("SH0");
> > > -	msg->ddr_cmds_addrs[1] = cmd_db_read_addr("MC0");
> > > -	msg->ddr_cmds_addrs[2] = cmd_db_read_addr("ACV");
> > > +	for (i = 0; i < 3; i++) {
> > > +		if (!info->bcm[i].name)
> > > +			break;
> > > +		msg->ddr_cmds_addrs[i] = cmd_db_read_addr(info->bcm[i].name);
> > > +	}
> > > +	msg->ddr_cmds_num = i;
> > > -	msg->ddr_cmds_data[0][0] = 0x40000000;
> > > -	msg->ddr_cmds_data[0][1] = 0x40000000;
> > > -	msg->ddr_cmds_data[0][2] = 0x40000000;
> > > +	for (i = 0; i < gmu->nr_gpu_bws; ++i)
> > > +		for (j = 0; j < msg->ddr_cmds_num; j++)
> > > +			msg->ddr_cmds_data[i][j] = gmu->gpu_bw_votes[i][j];
> > > +	msg->bw_level_num = gmu->nr_gpu_bws;
> > > +}
> > > +
> > > +static void a740_build_bw_table(struct adreno_gpu *adreno_gpu, struct a6xx_gmu *gmu,
> > > +				struct a6xx_hfi_msg_bw_table *msg)
> > > +{
> > > +	if ((adreno_gpu->info->quirks & ADRENO_QUIRK_GMU_BW_VOTE) && gmu->nr_gpu_bws) {
> > > +		a740_generate_bw_table(adreno_gpu, gmu, msg);
> > > +	} else {
> > 
> > Why do we need a fallback code here?
> 
> Because at this particular commit, it would generate an invalid table; I should probably remove the fallback at the end

Or move this to generic code that generates a table if there is no bw
data (like there is none for older platforms with the current DTs).
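
Something like this maybe (untested sketch, with a6xx_generate_bw_table()
being a hypothetical generic version of your helper):

	if ((adreno_gpu->info->quirks & ADRENO_QUIRK_GMU_BW_VOTE) &&
	    gmu->nr_gpu_bws)
		a6xx_generate_bw_table(adreno_gpu, gmu, msg);
	else if (adreno_is_a740_family(adreno_gpu))
		a740_build_bw_table(msg); /* static table, as today */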

> 
> > 
> > > +		msg->bw_level_num = 1;
> > > -	/* TODO: add a proper dvfs table */
> > > +		msg->ddr_cmds_num = 3;
> > > +		msg->ddr_wait_bitmask = 0x7;
> > > +
> > > +		msg->ddr_cmds_addrs[0] = cmd_db_read_addr("SH0");
> > > +		msg->ddr_cmds_addrs[1] = cmd_db_read_addr("MC0");
> > > +		msg->ddr_cmds_addrs[2] = cmd_db_read_addr("ACV");
> > > +
> > > +		msg->ddr_cmds_data[0][0] = 0x40000000;
> > > +		msg->ddr_cmds_data[0][1] = 0x40000000;
> > > +		msg->ddr_cmds_data[0][2] = 0x40000000;
> > > +
> > > +		/* TODO: add a proper dvfs table */
> > 
> > I think the TODO is not applicable anymore.
> > 
> > > +	}
> > >   	msg->cnoc_cmds_num = 1;
> > >   	msg->cnoc_wait_bitmask = 0x1;
> > > @@ -691,7 +717,7 @@ static int a6xx_hfi_send_bw_table(struct a6xx_gmu *gmu)
> > >   	else if (adreno_is_a730(adreno_gpu))
> > >   		a730_build_bw_table(msg);
> > >   	else if (adreno_is_a740_family(adreno_gpu))
> > > -		a740_build_bw_table(msg);
> > > +		a740_build_bw_table(adreno_gpu, gmu, msg);
> > >   	else
> > >   		a6xx_build_bw_table(msg);
> > > 
> > > -- 
> > > 2.34.1
> > > 
> > 
>
Rob Clark Nov. 15, 2024, 3:10 p.m. UTC | #6
On Fri, Nov 15, 2024 at 6:18 AM Dmitry Baryshkov
<dmitry.baryshkov@linaro.org> wrote:
>
> On Fri, 15 Nov 2024 at 11:21, Neil Armstrong <neil.armstrong@linaro.org> wrote:
> >
> > On 15/11/2024 08:07, Dmitry Baryshkov wrote:
> > > On Wed, Nov 13, 2024 at 04:48:28PM +0100, Neil Armstrong wrote:
> > >> The Adreno GPU Management Unit (GMU) can also scale the DDR Bandwidth
> > >> along the Frequency and Power Domain level, but by default we leave the
> > >> OPP core vote for the interconnect ddr path.
> > >>
> > >> While scaling via the interconnect path was sufficient, newer GPUs
> > >> like the A750 require specific vote parameters and bandwidth to
> > >> achieve full functionality.
> > >>
> > >> Add a new Quirk enabling DDR Bandwidth vote via GMU.
> > >
> > > Please describe, why this is defined as a quirk rather than a proper
> > > platform-level property. From my experience with 6xx and 7xx, all the
> > > platforms need to send some kind of BW data to the GMU.
> >
> > Well APRIV, CACHED_COHERENT & PREEMPTION are HW features, why this can't be part of this ?
> >
> > Perhaps the "quirks" bitfield should be features instead ?
>
> Sounds like that.

But LMLOADKILL_DISABLE and TWO_PASS_USE_WFI are quirks.. so it is kind
of a mix of quirks and features.  So meh

BR,
-R

>
> --
> With best wishes
> Dmitry
Neil Armstrong Nov. 15, 2024, 3:28 p.m. UTC | #7
On 15/11/2024 16:10, Rob Clark wrote:
> On Fri, Nov 15, 2024 at 6:18 AM Dmitry Baryshkov
> <dmitry.baryshkov@linaro.org> wrote:
>>
>> On Fri, 15 Nov 2024 at 11:21, Neil Armstrong <neil.armstrong@linaro.org> wrote:
>>>
>>> On 15/11/2024 08:07, Dmitry Baryshkov wrote:
>>>> On Wed, Nov 13, 2024 at 04:48:28PM +0100, Neil Armstrong wrote:
>>>>> The Adreno GPU Management Unit (GMU) can also scale the DDR Bandwidth
>>>>> along the Frequency and Power Domain level, but by default we leave the
>>>>> OPP core vote for the interconnect ddr path.
>>>>>
>>>>> While scaling via the interconnect path was sufficient, newer GPUs
>>>>> like the A750 require specific vote parameters and bandwidth to
>>>>> achieve full functionality.
>>>>>
>>>>> Add a new Quirk enabling DDR Bandwidth vote via GMU.
>>>>
>>>> Please describe, why this is defined as a quirk rather than a proper
>>>> platform-level property. From my experience with 6xx and 7xx, all the
>>>> platforms need to send some kind of BW data to the GMU.
>>>
>>> Well APRIV, CACHED_COHERENT & PREEMPTION are HW features, why this can't be part of this ?
>>>
>>> Perhaps the "quirks" bitfield should be features instead ?
>>
>> Sounds like that.
> 
> But LMLOADKILL_DISABLE and TWO_PASS_USE_WFI are quirks.. so it is kind
> of a mix of quirks and features.  So meh

Well I can do a split and move the features into a clean .features bitfield, would it be ok ?
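
For instance (rough sketch only, the ADRENO_FEAT_* names are just
placeholders):

	/* true workarounds stay in .quirks */
	#define ADRENO_QUIRK_TWO_PASS_USE_WFI		BIT(0)
	#define ADRENO_QUIRK_LMLOADKILL_DISABLE		BIT(1)

	/* capabilities move to a new .features bitfield */
	#define ADRENO_FEAT_HAS_HW_APRIV		BIT(0)
	#define ADRENO_FEAT_HAS_CACHED_COHERENT		BIT(1)
	#define ADRENO_FEAT_PREEMPTION			BIT(2)
	#define ADRENO_FEAT_GMU_BW_VOTE			BIT(3)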

Neil

> 
> BR,
> -R
> 
>>
>> --
>> With best wishes
>> Dmitry