Message ID | 20241205165419.54080-1-robdclark@gmail.com |
---|---|
State | New |
Headers | show |
Series | [RFC] drm/msm: Add UABI to request perfcntr usage | expand |
On 12/12/24 4:58 PM, Akhil P Oommen wrote: > On 12/5/2024 10:24 PM, Rob Clark wrote: >> From: Rob Clark <robdclark@chromium.org> >> >> Performance counter usage falls into two categories: >> >> 1. Local usage, where the counter configuration, start, and end read >> happen within (locally to) a single SUBMIT. In this case, there is >> no dependency on counter configuration or values between submits, and >> in fact counters are normally cleared on context switches, making it >> impossible to rely on cross-submit state. >> >> 2. Global usage, where a single privilaged daemon/process is sampling >> counter values across all processes for profiling. >> >> The two categories are mutually exclusive. While you can have many >> processes making local counter usage, you cannot combine global and >> local usage without the two stepping on each others feet (by changing >> counter configuration). >> >> For global counter usage, there is already a SYSPROF param (since global >> counter usage requires disabling counter clearing on context switch). >> This patch adds a REQ_CNTRS param to request local counter usage. If >> one or more processes has requested counter usage, then a SYSPROF >> request will fail with -EBUSY. And if SYSPROF is active, then REQ_CNTRS >> will fail with -EBUSY, maintaining the mutual exclusivity. >> >> This is purely an advisory interface to help coordinate userspace. >> There is no real means of enforcement, but the worst that can happen if >> userspace ignores a REQ_CNTRS failure is that you'll get nonsense >> profiling data. >> >> Signed-off-by: Rob Clark <robdclark@chromium.org> >> --- >> kgsl takes a different approach, which involves a lot more UABI for >> assigning counters to different processes. But I think by taking >> advantage of the fact that mesa (freedreno+turnip) reconfigure the >> counters they need in each SUBMIT, for their respective gl/vk perf- >> counter extensions, we can take this simpler approach. > > KGSL's approach is preemption and ifpc safe (also whatever HW changes > that will come up in future generations). How will we ensure that here? > > I have plans to bring up IFPC support in near future. Also, I brought up > this point during preemption series. But from the responses, I felt that > profiling was not considered a serious usecase. Still I wonder how the > perfcounter extensions work accurately with preemption. So back then I implemented the postamble IB to clear perf counters and that gets disabled when sysprof (so global usage) is happening. The kernel is oblivious to "Local isage" of profiling but in that case really what we want to do is disable preemption which in my understanding can be done from userspace with a PKT. In my understanding this had us covered for all usecases. So what would you expect instead we should do kernel side to make profiling preemption safe? > > -Akhil > >> >> drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 + >> drivers/gpu/drm/msm/msm_drv.c | 5 ++- >> drivers/gpu/drm/msm/msm_gpu.c | 1 + >> drivers/gpu/drm/msm/msm_gpu.h | 29 +++++++++++++- >> drivers/gpu/drm/msm/msm_submitqueue.c | 52 ++++++++++++++++++++++++- >> include/uapi/drm/msm_drm.h | 1 + >> 6 files changed, 85 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c >> index 31bbf2c83de4..f688e37059b8 100644 >> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c >> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c >> @@ -441,6 +441,8 @@ int adreno_set_param(struct msm_gpu *gpu, struct msm_file_private *ctx, >> if (!capable(CAP_SYS_ADMIN)) >> return UERR(EPERM, drm, "invalid permissions"); >> return msm_file_private_set_sysprof(ctx, gpu, value); >> + case MSM_PARAM_REQ_CNTRS: >> + return msm_file_private_request_counters(ctx, gpu, value); >> default: >> return UERR(EINVAL, drm, "%s: invalid param: %u", gpu->name, param); >> } >> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c >> index 6416d2cb4efc..bf8314ff4a25 100644 >> --- a/drivers/gpu/drm/msm/msm_drv.c >> +++ b/drivers/gpu/drm/msm/msm_drv.c >> @@ -377,9 +377,12 @@ static void msm_postclose(struct drm_device *dev, struct drm_file *file) >> * It is not possible to set sysprof param to non-zero if gpu >> * is not initialized: >> */ >> - if (priv->gpu) >> + if (ctx->sysprof) >> msm_file_private_set_sysprof(ctx, priv->gpu, 0); >> >> + if (ctx->counters_requested) >> + msm_file_private_request_counters(ctx, priv->gpu, 0); >> + >> context_close(ctx); >> } >> >> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c >> index 82f204f3bb8f..013b59ca3bb1 100644 >> --- a/drivers/gpu/drm/msm/msm_gpu.c >> +++ b/drivers/gpu/drm/msm/msm_gpu.c >> @@ -991,6 +991,7 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev, >> gpu->nr_rings = nr_rings; >> >> refcount_set(&gpu->sysprof_active, 1); >> + refcount_set(&gpu->local_counters_active, 1); >> >> return 0; >> >> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h >> index e25009150579..83c61e523b1b 100644 >> --- a/drivers/gpu/drm/msm/msm_gpu.h >> +++ b/drivers/gpu/drm/msm/msm_gpu.h >> @@ -195,12 +195,28 @@ struct msm_gpu { >> int nr_rings; >> >> /** >> - * sysprof_active: >> + * @sysprof_active: >> * >> - * The count of contexts that have enabled system profiling. >> + * The count of contexts that have enabled system profiling plus one. >> + * >> + * Note: refcount_t does not like 0->1 transitions.. we want to keep >> + * the under/overflow checks that refcount_t provides, but allow >> + * multiple on/off transitions so we track the logical value plus one.) >> */ >> refcount_t sysprof_active; >> >> + /** >> + * @local_counters_active: >> + * >> + * The count of contexts that have requested local (intra-submit) >> + * performance counter usage plus one. >> + * >> + * Note: refcount_t does not like 0->1 transitions.. we want to keep >> + * the under/overflow checks that refcount_t provides, but allow >> + * multiple on/off transitions so we track the logical value plus one.) >> + */ >> + refcount_t local_counters_active; >> + >> /** >> * lock: >> * >> @@ -383,6 +399,13 @@ struct msm_file_private { >> */ >> int sysprof; >> >> + /** >> + * @counters_requested: >> + * >> + * Has the context requested local perfcntr usage. >> + */ >> + bool counters_requested; >> + >> /** >> * comm: Overridden task comm, see MSM_PARAM_COMM >> * >> @@ -626,6 +649,8 @@ void msm_submitqueue_destroy(struct kref *kref); >> >> int msm_file_private_set_sysprof(struct msm_file_private *ctx, >> struct msm_gpu *gpu, int sysprof); >> +int msm_file_private_request_counters(struct msm_file_private *ctx, >> + struct msm_gpu *gpu, int reqcntrs); >> void __msm_file_private_destroy(struct kref *kref); >> >> static inline void msm_file_private_put(struct msm_file_private *ctx) >> diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c b/drivers/gpu/drm/msm/msm_submitqueue.c >> index 7fed1de63b5d..1e1e21e6f7ae 100644 >> --- a/drivers/gpu/drm/msm/msm_submitqueue.c >> +++ b/drivers/gpu/drm/msm/msm_submitqueue.c >> @@ -10,6 +10,15 @@ >> int msm_file_private_set_sysprof(struct msm_file_private *ctx, >> struct msm_gpu *gpu, int sysprof) >> { >> + int ret = 0; >> + >> + mutex_lock(&gpu->lock); >> + >> + if (sysprof && (refcount_read(&gpu->local_counters_active) > 1)) { >> + ret = UERR(EBUSY, gpu->dev, "Local counter usage active"); >> + goto out_unlock; >> + } >> + >> /* >> * Since pm_runtime and sysprof_active are both refcounts, we >> * call apply the new value first, and then unwind the previous >> @@ -18,7 +27,8 @@ int msm_file_private_set_sysprof(struct msm_file_private *ctx, >> >> switch (sysprof) { >> default: >> - return UERR(EINVAL, gpu->dev, "Invalid sysprof: %d", sysprof); >> + ret = UERR(EINVAL, gpu->dev, "Invalid sysprof: %d", sysprof); >> + goto out_unlock; >> case 2: >> pm_runtime_get_sync(&gpu->pdev->dev); >> fallthrough; >> @@ -43,7 +53,45 @@ int msm_file_private_set_sysprof(struct msm_file_private *ctx, >> >> ctx->sysprof = sysprof; >> >> - return 0; >> +out_unlock: >> + mutex_unlock(&gpu->lock); >> + >> + return ret; >> +} >> + >> +int msm_file_private_request_counters(struct msm_file_private *ctx, >> + struct msm_gpu *gpu, int reqctrs) >> +{ >> + int ret = 0; >> + >> + mutex_lock(&gpu->lock); >> + >> + if (reqctrs && (refcount_read(&gpu->sysprof_active) > 1)) { >> + ret = UERR(EBUSY, gpu->dev, "System profiling active"); >> + goto out_unlock; >> + } >> + >> + if (reqctrs) { >> + if (ctx->counters_requested) { >> + ret = UERR(EINVAL, gpu->dev, "Already requested"); >> + goto out_unlock; >> + } >> + >> + ctx->counters_requested = true; >> + refcount_inc(&gpu->local_counters_active); >> + } else { >> + if (!ctx->counters_requested) { >> + ret = UERR(EINVAL, gpu->dev, "Not requested"); >> + goto out_unlock; >> + } >> + refcount_dec(&gpu->local_counters_active); >> + ctx->counters_requested = false; >> + } >> + >> +out_unlock: >> + mutex_unlock(&gpu->lock); >> + >> + return ret; >> } >> >> void __msm_file_private_destroy(struct kref *kref) >> diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h >> index 2342cb90857e..ae7fb355e4a1 100644 >> --- a/include/uapi/drm/msm_drm.h >> +++ b/include/uapi/drm/msm_drm.h >> @@ -91,6 +91,7 @@ struct drm_msm_timespec { >> #define MSM_PARAM_UBWC_SWIZZLE 0x12 /* RO */ >> #define MSM_PARAM_MACROTILE_MODE 0x13 /* RO */ >> #define MSM_PARAM_UCHE_TRAP_BASE 0x14 /* RO */ >> +#define MSM_PARAM_REQ_CNTRS 0x15 /* WO: request "local" (intra-submit) perfcntr usage */ >> >> /* For backwards compat. The original support for preemption was based on >> * a single ring per priority level so # of priority levels equals the # > Best regards,
On Thu, Dec 12, 2024 at 7:59 AM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote: > > On 12/5/2024 10:24 PM, Rob Clark wrote: > > From: Rob Clark <robdclark@chromium.org> > > > > Performance counter usage falls into two categories: > > > > 1. Local usage, where the counter configuration, start, and end read > > happen within (locally to) a single SUBMIT. In this case, there is > > no dependency on counter configuration or values between submits, and > > in fact counters are normally cleared on context switches, making it > > impossible to rely on cross-submit state. > > > > 2. Global usage, where a single privilaged daemon/process is sampling > > counter values across all processes for profiling. > > > > The two categories are mutually exclusive. While you can have many > > processes making local counter usage, you cannot combine global and > > local usage without the two stepping on each others feet (by changing > > counter configuration). > > > > For global counter usage, there is already a SYSPROF param (since global > > counter usage requires disabling counter clearing on context switch). > > This patch adds a REQ_CNTRS param to request local counter usage. If > > one or more processes has requested counter usage, then a SYSPROF > > request will fail with -EBUSY. And if SYSPROF is active, then REQ_CNTRS > > will fail with -EBUSY, maintaining the mutual exclusivity. > > > > This is purely an advisory interface to help coordinate userspace. > > There is no real means of enforcement, but the worst that can happen if > > userspace ignores a REQ_CNTRS failure is that you'll get nonsense > > profiling data. > > > > Signed-off-by: Rob Clark <robdclark@chromium.org> > > --- > > kgsl takes a different approach, which involves a lot more UABI for > > assigning counters to different processes. But I think by taking > > advantage of the fact that mesa (freedreno+turnip) reconfigure the > > counters they need in each SUBMIT, for their respective gl/vk perf- > > counter extensions, we can take this simpler approach. > > KGSL's approach is preemption and ifpc safe (also whatever HW changes > that will come up in future generations). How will we ensure that here? > > I have plans to bring up IFPC support in near future. Also, I brought up > this point during preemption series. But from the responses, I felt that > profiling was not considered a serious usecase. Still I wonder how the > perfcounter extensions work accurately with preemption. Re: IFPC, I think initially we have to inhibit IFPC when SYSPROF is active Longer term, I think we want to just save and restore all of the SEL regs as well as the counters themselves on preemption. AFAIU currently only the counters themselves are saved/restored. But there is only one 32b SEL reg for each 64b counter, so I'm not sure that you save that many cycles by not just saving/restoring the SEL regs as well. (And of course with REQ_CNTRS the kernel knows which processes need counter save/restore and which do not, so you are only taking the extra context switch overhead if a process is actually using the perfcntrs.) Alternatively, I think we could just declare this as a userspace problem, and solve it with CP_SET_AMBLE PREAMBLE/POSTAMBLE? Just for background, rendernode UABI is exposed to all processes that can use the GPU, ie. basically everything. Which makes it an attractive attack surface. This is why I prefer minimalism when it comes to UABI, and not adding new ioctls and complexity in the kernel when it is not essential ;-) BR, -R > -Akhil > > > > > drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 + > > drivers/gpu/drm/msm/msm_drv.c | 5 ++- > > drivers/gpu/drm/msm/msm_gpu.c | 1 + > > drivers/gpu/drm/msm/msm_gpu.h | 29 +++++++++++++- > > drivers/gpu/drm/msm/msm_submitqueue.c | 52 ++++++++++++++++++++++++- > > include/uapi/drm/msm_drm.h | 1 + > > 6 files changed, 85 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c > > index 31bbf2c83de4..f688e37059b8 100644 > > --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c > > +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c > > @@ -441,6 +441,8 @@ int adreno_set_param(struct msm_gpu *gpu, struct msm_file_private *ctx, > > if (!capable(CAP_SYS_ADMIN)) > > return UERR(EPERM, drm, "invalid permissions"); > > return msm_file_private_set_sysprof(ctx, gpu, value); > > + case MSM_PARAM_REQ_CNTRS: > > + return msm_file_private_request_counters(ctx, gpu, value); > > default: > > return UERR(EINVAL, drm, "%s: invalid param: %u", gpu->name, param); > > } > > diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c > > index 6416d2cb4efc..bf8314ff4a25 100644 > > --- a/drivers/gpu/drm/msm/msm_drv.c > > +++ b/drivers/gpu/drm/msm/msm_drv.c > > @@ -377,9 +377,12 @@ static void msm_postclose(struct drm_device *dev, struct drm_file *file) > > * It is not possible to set sysprof param to non-zero if gpu > > * is not initialized: > > */ > > - if (priv->gpu) > > + if (ctx->sysprof) > > msm_file_private_set_sysprof(ctx, priv->gpu, 0); > > > > + if (ctx->counters_requested) > > + msm_file_private_request_counters(ctx, priv->gpu, 0); > > + > > context_close(ctx); > > } > > > > diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c > > index 82f204f3bb8f..013b59ca3bb1 100644 > > --- a/drivers/gpu/drm/msm/msm_gpu.c > > +++ b/drivers/gpu/drm/msm/msm_gpu.c > > @@ -991,6 +991,7 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev, > > gpu->nr_rings = nr_rings; > > > > refcount_set(&gpu->sysprof_active, 1); > > + refcount_set(&gpu->local_counters_active, 1); > > > > return 0; > > > > diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h > > index e25009150579..83c61e523b1b 100644 > > --- a/drivers/gpu/drm/msm/msm_gpu.h > > +++ b/drivers/gpu/drm/msm/msm_gpu.h > > @@ -195,12 +195,28 @@ struct msm_gpu { > > int nr_rings; > > > > /** > > - * sysprof_active: > > + * @sysprof_active: > > * > > - * The count of contexts that have enabled system profiling. > > + * The count of contexts that have enabled system profiling plus one. > > + * > > + * Note: refcount_t does not like 0->1 transitions.. we want to keep > > + * the under/overflow checks that refcount_t provides, but allow > > + * multiple on/off transitions so we track the logical value plus one.) > > */ > > refcount_t sysprof_active; > > > > + /** > > + * @local_counters_active: > > + * > > + * The count of contexts that have requested local (intra-submit) > > + * performance counter usage plus one. > > + * > > + * Note: refcount_t does not like 0->1 transitions.. we want to keep > > + * the under/overflow checks that refcount_t provides, but allow > > + * multiple on/off transitions so we track the logical value plus one.) > > + */ > > + refcount_t local_counters_active; > > + > > /** > > * lock: > > * > > @@ -383,6 +399,13 @@ struct msm_file_private { > > */ > > int sysprof; > > > > + /** > > + * @counters_requested: > > + * > > + * Has the context requested local perfcntr usage. > > + */ > > + bool counters_requested; > > + > > /** > > * comm: Overridden task comm, see MSM_PARAM_COMM > > * > > @@ -626,6 +649,8 @@ void msm_submitqueue_destroy(struct kref *kref); > > > > int msm_file_private_set_sysprof(struct msm_file_private *ctx, > > struct msm_gpu *gpu, int sysprof); > > +int msm_file_private_request_counters(struct msm_file_private *ctx, > > + struct msm_gpu *gpu, int reqcntrs); > > void __msm_file_private_destroy(struct kref *kref); > > > > static inline void msm_file_private_put(struct msm_file_private *ctx) > > diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c b/drivers/gpu/drm/msm/msm_submitqueue.c > > index 7fed1de63b5d..1e1e21e6f7ae 100644 > > --- a/drivers/gpu/drm/msm/msm_submitqueue.c > > +++ b/drivers/gpu/drm/msm/msm_submitqueue.c > > @@ -10,6 +10,15 @@ > > int msm_file_private_set_sysprof(struct msm_file_private *ctx, > > struct msm_gpu *gpu, int sysprof) > > { > > + int ret = 0; > > + > > + mutex_lock(&gpu->lock); > > + > > + if (sysprof && (refcount_read(&gpu->local_counters_active) > 1)) { > > + ret = UERR(EBUSY, gpu->dev, "Local counter usage active"); > > + goto out_unlock; > > + } > > + > > /* > > * Since pm_runtime and sysprof_active are both refcounts, we > > * call apply the new value first, and then unwind the previous > > @@ -18,7 +27,8 @@ int msm_file_private_set_sysprof(struct msm_file_private *ctx, > > > > switch (sysprof) { > > default: > > - return UERR(EINVAL, gpu->dev, "Invalid sysprof: %d", sysprof); > > + ret = UERR(EINVAL, gpu->dev, "Invalid sysprof: %d", sysprof); > > + goto out_unlock; > > case 2: > > pm_runtime_get_sync(&gpu->pdev->dev); > > fallthrough; > > @@ -43,7 +53,45 @@ int msm_file_private_set_sysprof(struct msm_file_private *ctx, > > > > ctx->sysprof = sysprof; > > > > - return 0; > > +out_unlock: > > + mutex_unlock(&gpu->lock); > > + > > + return ret; > > +} > > + > > +int msm_file_private_request_counters(struct msm_file_private *ctx, > > + struct msm_gpu *gpu, int reqctrs) > > +{ > > + int ret = 0; > > + > > + mutex_lock(&gpu->lock); > > + > > + if (reqctrs && (refcount_read(&gpu->sysprof_active) > 1)) { > > + ret = UERR(EBUSY, gpu->dev, "System profiling active"); > > + goto out_unlock; > > + } > > + > > + if (reqctrs) { > > + if (ctx->counters_requested) { > > + ret = UERR(EINVAL, gpu->dev, "Already requested"); > > + goto out_unlock; > > + } > > + > > + ctx->counters_requested = true; > > + refcount_inc(&gpu->local_counters_active); > > + } else { > > + if (!ctx->counters_requested) { > > + ret = UERR(EINVAL, gpu->dev, "Not requested"); > > + goto out_unlock; > > + } > > + refcount_dec(&gpu->local_counters_active); > > + ctx->counters_requested = false; > > + } > > + > > +out_unlock: > > + mutex_unlock(&gpu->lock); > > + > > + return ret; > > } > > > > void __msm_file_private_destroy(struct kref *kref) > > diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h > > index 2342cb90857e..ae7fb355e4a1 100644 > > --- a/include/uapi/drm/msm_drm.h > > +++ b/include/uapi/drm/msm_drm.h > > @@ -91,6 +91,7 @@ struct drm_msm_timespec { > > #define MSM_PARAM_UBWC_SWIZZLE 0x12 /* RO */ > > #define MSM_PARAM_MACROTILE_MODE 0x13 /* RO */ > > #define MSM_PARAM_UCHE_TRAP_BASE 0x14 /* RO */ > > +#define MSM_PARAM_REQ_CNTRS 0x15 /* WO: request "local" (intra-submit) perfcntr usage */ > > > > /* For backwards compat. The original support for preemption was based on > > * a single ring per priority level so # of priority levels equals the # >
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c index 31bbf2c83de4..f688e37059b8 100644 --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c @@ -441,6 +441,8 @@ int adreno_set_param(struct msm_gpu *gpu, struct msm_file_private *ctx, if (!capable(CAP_SYS_ADMIN)) return UERR(EPERM, drm, "invalid permissions"); return msm_file_private_set_sysprof(ctx, gpu, value); + case MSM_PARAM_REQ_CNTRS: + return msm_file_private_request_counters(ctx, gpu, value); default: return UERR(EINVAL, drm, "%s: invalid param: %u", gpu->name, param); } diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index 6416d2cb4efc..bf8314ff4a25 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -377,9 +377,12 @@ static void msm_postclose(struct drm_device *dev, struct drm_file *file) * It is not possible to set sysprof param to non-zero if gpu * is not initialized: */ - if (priv->gpu) + if (ctx->sysprof) msm_file_private_set_sysprof(ctx, priv->gpu, 0); + if (ctx->counters_requested) + msm_file_private_request_counters(ctx, priv->gpu, 0); + context_close(ctx); } diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c index 82f204f3bb8f..013b59ca3bb1 100644 --- a/drivers/gpu/drm/msm/msm_gpu.c +++ b/drivers/gpu/drm/msm/msm_gpu.c @@ -991,6 +991,7 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev, gpu->nr_rings = nr_rings; refcount_set(&gpu->sysprof_active, 1); + refcount_set(&gpu->local_counters_active, 1); return 0; diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h index e25009150579..83c61e523b1b 100644 --- a/drivers/gpu/drm/msm/msm_gpu.h +++ b/drivers/gpu/drm/msm/msm_gpu.h @@ -195,12 +195,28 @@ struct msm_gpu { int nr_rings; /** - * sysprof_active: + * @sysprof_active: * - * The count of contexts that have enabled system profiling. + * The count of contexts that have enabled system profiling plus one. + * + * Note: refcount_t does not like 0->1 transitions.. we want to keep + * the under/overflow checks that refcount_t provides, but allow + * multiple on/off transitions so we track the logical value plus one.) */ refcount_t sysprof_active; + /** + * @local_counters_active: + * + * The count of contexts that have requested local (intra-submit) + * performance counter usage plus one. + * + * Note: refcount_t does not like 0->1 transitions.. we want to keep + * the under/overflow checks that refcount_t provides, but allow + * multiple on/off transitions so we track the logical value plus one.) + */ + refcount_t local_counters_active; + /** * lock: * @@ -383,6 +399,13 @@ struct msm_file_private { */ int sysprof; + /** + * @counters_requested: + * + * Has the context requested local perfcntr usage. + */ + bool counters_requested; + /** * comm: Overridden task comm, see MSM_PARAM_COMM * @@ -626,6 +649,8 @@ void msm_submitqueue_destroy(struct kref *kref); int msm_file_private_set_sysprof(struct msm_file_private *ctx, struct msm_gpu *gpu, int sysprof); +int msm_file_private_request_counters(struct msm_file_private *ctx, + struct msm_gpu *gpu, int reqcntrs); void __msm_file_private_destroy(struct kref *kref); static inline void msm_file_private_put(struct msm_file_private *ctx) diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c b/drivers/gpu/drm/msm/msm_submitqueue.c index 7fed1de63b5d..1e1e21e6f7ae 100644 --- a/drivers/gpu/drm/msm/msm_submitqueue.c +++ b/drivers/gpu/drm/msm/msm_submitqueue.c @@ -10,6 +10,15 @@ int msm_file_private_set_sysprof(struct msm_file_private *ctx, struct msm_gpu *gpu, int sysprof) { + int ret = 0; + + mutex_lock(&gpu->lock); + + if (sysprof && (refcount_read(&gpu->local_counters_active) > 1)) { + ret = UERR(EBUSY, gpu->dev, "Local counter usage active"); + goto out_unlock; + } + /* * Since pm_runtime and sysprof_active are both refcounts, we * call apply the new value first, and then unwind the previous @@ -18,7 +27,8 @@ int msm_file_private_set_sysprof(struct msm_file_private *ctx, switch (sysprof) { default: - return UERR(EINVAL, gpu->dev, "Invalid sysprof: %d", sysprof); + ret = UERR(EINVAL, gpu->dev, "Invalid sysprof: %d", sysprof); + goto out_unlock; case 2: pm_runtime_get_sync(&gpu->pdev->dev); fallthrough; @@ -43,7 +53,45 @@ int msm_file_private_set_sysprof(struct msm_file_private *ctx, ctx->sysprof = sysprof; - return 0; +out_unlock: + mutex_unlock(&gpu->lock); + + return ret; +} + +int msm_file_private_request_counters(struct msm_file_private *ctx, + struct msm_gpu *gpu, int reqctrs) +{ + int ret = 0; + + mutex_lock(&gpu->lock); + + if (reqctrs && (refcount_read(&gpu->sysprof_active) > 1)) { + ret = UERR(EBUSY, gpu->dev, "System profiling active"); + goto out_unlock; + } + + if (reqctrs) { + if (ctx->counters_requested) { + ret = UERR(EINVAL, gpu->dev, "Already requested"); + goto out_unlock; + } + + ctx->counters_requested = true; + refcount_inc(&gpu->local_counters_active); + } else { + if (!ctx->counters_requested) { + ret = UERR(EINVAL, gpu->dev, "Not requested"); + goto out_unlock; + } + refcount_dec(&gpu->local_counters_active); + ctx->counters_requested = false; + } + +out_unlock: + mutex_unlock(&gpu->lock); + + return ret; } void __msm_file_private_destroy(struct kref *kref) diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h index 2342cb90857e..ae7fb355e4a1 100644 --- a/include/uapi/drm/msm_drm.h +++ b/include/uapi/drm/msm_drm.h @@ -91,6 +91,7 @@ struct drm_msm_timespec { #define MSM_PARAM_UBWC_SWIZZLE 0x12 /* RO */ #define MSM_PARAM_MACROTILE_MODE 0x13 /* RO */ #define MSM_PARAM_UCHE_TRAP_BASE 0x14 /* RO */ +#define MSM_PARAM_REQ_CNTRS 0x15 /* WO: request "local" (intra-submit) perfcntr usage */ /* For backwards compat. The original support for preemption was based on * a single ring per priority level so # of priority levels equals the #