mbox series

[0/2] dma-fence: Rename dma_fence_is_signaled()

Message ID 20250409120640.106408-2-phasta@kernel.org
Headers show
Series dma-fence: Rename dma_fence_is_signaled() | expand

Message

Philipp Stanner April 9, 2025, 12:06 p.m. UTC
Hi all,

I'm currently debugging a Nouveau issue [1] and potentially might want to
add a function that just checks whether a fence is signaled already –
which then would obviously be called dma_fence_is_signaled().

In any case, I think it is reasonable to rename dma_fence_is_signaled()
so that it becomes very, very explicit when reading code that this is a
place where fences can get signaled.

This series obsoletes this patch: [2]

P.

[1] https://lore.kernel.org/all/20250403101353.42880-2-phasta@kernel.org/
[2] https://lore.kernel.org/all/20250408122217.61530-2-phasta@kernel.org/


Philipp Stanner (2):
  dma-fence: Rename dma_fence_is_signaled()
  dma-fence: Improve docu for dma_fence_check_and_signal()

 drivers/dma-buf/dma-fence-array.c             |  2 +-
 drivers/dma-buf/dma-fence-chain.c             |  6 +--
 drivers/dma-buf/dma-fence-unwrap.c            |  4 +-
 drivers/dma-buf/dma-fence.c                   |  6 +--
 drivers/dma-buf/dma-resv.c                    |  6 +--
 drivers/dma-buf/st-dma-fence-chain.c          | 10 ++--
 drivers/dma-buf/st-dma-fence.c                |  8 ++--
 drivers/dma-buf/sw_sync.c                     |  2 +-
 drivers/dma-buf/sync_file.c                   |  4 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c  |  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  8 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c       |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |  4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c     |  4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c       |  8 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c      |  6 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c      | 10 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        |  6 +--
 drivers/gpu/drm/amd/amdkfd/kfd_device.c       |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c          |  2 +-
 drivers/gpu/drm/drm_suballoc.c                |  6 +--
 drivers/gpu/drm/drm_syncobj.c                 |  6 +--
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c         |  2 +-
 drivers/gpu/drm/etnaviv/etnaviv_sched.c       |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_wait.c      |  4 +-
 .../drm/i915/gem/selftests/i915_gem_migrate.c |  2 +-
 drivers/gpu/drm/i915/i915_deps.c              |  6 +--
 drivers/gpu/drm/i915/i915_request.c           |  6 +--
 drivers/gpu/drm/i915/i915_sw_fence.c          |  4 +-
 drivers/gpu/drm/i915/i915_vma.c               |  2 +-
 drivers/gpu/drm/i915/selftests/i915_request.c |  4 +-
 drivers/gpu/drm/imagination/pvr_queue.c       | 10 ++--
 drivers/gpu/drm/lima/lima_sched.c             |  4 +-
 drivers/gpu/drm/msm/msm_gpu.c                 |  2 +-
 drivers/gpu/drm/nouveau/nouveau_drm.c         |  2 +-
 drivers/gpu/drm/nouveau/nouveau_fence.c       |  2 +-
 drivers/gpu/drm/panfrost/panfrost_job.c       |  4 +-
 drivers/gpu/drm/qxl/qxl_release.c             |  2 +-
 drivers/gpu/drm/scheduler/sched_entity.c      |  2 +-
 drivers/gpu/drm/scheduler/sched_main.c        |  4 +-
 drivers/gpu/drm/ttm/ttm_bo.c                  |  2 +-
 drivers/gpu/drm/v3d/v3d_sched.c               |  4 +-
 drivers/gpu/drm/vgem/vgem_fence.c             |  2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c         |  6 +--
 drivers/gpu/drm/xe/xe_bo.c                    |  2 +-
 drivers/gpu/drm/xe/xe_guc_submit.c            |  4 +-
 drivers/gpu/drm/xe/xe_hw_fence.c              |  2 +-
 drivers/gpu/drm/xe/xe_pt.c                    |  2 +-
 drivers/gpu/drm/xe/xe_range_fence.c           |  2 +-
 include/linux/dma-fence.h                     | 47 ++++++++++++-------
 51 files changed, 133 insertions(+), 120 deletions(-)

Comments

Boris Brezillon April 9, 2025, 12:39 p.m. UTC | #1
Hi Philipp,

On Wed,  9 Apr 2025 14:06:37 +0200
Philipp Stanner <phasta@kernel.org> wrote:

> dma_fence_is_signaled()'s name strongly reads as if this function were
> intended for checking whether a fence is already signaled. Also the
> boolean it returns hints at that.
> 
> The function's behavior, however, is more complex: it can check with a
> driver callback whether the hardware's sequence number indicates that
> the fence can already be treated as signaled, although the hardware's /
> driver's interrupt handler has not signaled it yet. If that's the case,
> the function also signals the fence.
> 
> (Presumably) this has caused a bug in Nouveau (unknown commit), where
> nouveau_fence_done() uses the function to check a fence, which causes a
> race.
> 
> Give the function a more obvious name.

This is just my personal view on this, but I find the new name just as
confusing as the old one. It sounds like something is checked, but it's
clear what, and then the fence is forcibly signaled like it would be if
you call drm_fence_signal(). Of course, this clarified by the doc, but
given the goal was to make the function name clearly reflect what it
does, I'm not convinced it's significantly better.

Maybe dma_fence_check_hw_state_and_propagate(), though it might be
too long of name. Oh well, feel free to ignore this comments if a
majority is fine with the new name.

Regards,

Boris
Philipp Stanner April 9, 2025, 12:51 p.m. UTC | #2
On Wed, 2025-04-09 at 14:39 +0200, Boris Brezillon wrote:
> Hi Philipp,
> 
> On Wed,  9 Apr 2025 14:06:37 +0200
> Philipp Stanner <phasta@kernel.org> wrote:
> 
> > dma_fence_is_signaled()'s name strongly reads as if this function
> > were
> > intended for checking whether a fence is already signaled. Also the
> > boolean it returns hints at that.
> > 
> > The function's behavior, however, is more complex: it can check
> > with a
> > driver callback whether the hardware's sequence number indicates
> > that
> > the fence can already be treated as signaled, although the
> > hardware's /
> > driver's interrupt handler has not signaled it yet. If that's the
> > case,
> > the function also signals the fence.
> > 
> > (Presumably) this has caused a bug in Nouveau (unknown commit),
> > where
> > nouveau_fence_done() uses the function to check a fence, which
> > causes a
> > race.
> > 
> > Give the function a more obvious name.
> 
> This is just my personal view on this, but I find the new name just
> as
> confusing as the old one. It sounds like something is checked, but
> it's
> clear what, and then the fence is forcibly signaled like it would be
> if
> you call drm_fence_signal(). Of course, this clarified by the doc,
> but
> given the goal was to make the function name clearly reflect what it
> does, I'm not convinced it's significantly better.
> 
> Maybe dma_fence_check_hw_state_and_propagate(), though it might be
> too long of name. Oh well, feel free to ignore this comments if a
> majority is fine with the new name.

Yoa, the name isn't perfect (the perfect name describing the whole
behavior would be
dma_fence_check_if_already_signaled_then_check_hardware_state_and_propa
gate() ^^'

My intention here is to have the reader realize "watch out, the fence
might get signaled here!", which is probably the most important event
regarding fences, which can race, invoke the callbacks and so on.

For details readers will then check the documentation.

But I'm of course open to see if there's a majority for this or that
name.

P.


> 
> Regards,
> 
> Boris
Christian König April 9, 2025, 1:14 p.m. UTC | #3
Am 09.04.25 um 14:56 schrieb Philipp Stanner:
> On Wed, 2025-04-09 at 14:51 +0200, Philipp Stanner wrote:
>> On Wed, 2025-04-09 at 14:39 +0200, Boris Brezillon wrote:
>>> Hi Philipp,
>>>
>>> On Wed,  9 Apr 2025 14:06:37 +0200
>>> Philipp Stanner <phasta@kernel.org> wrote:
>>>
>>>> dma_fence_is_signaled()'s name strongly reads as if this function
>>>> were
>>>> intended for checking whether a fence is already signaled. Also
>>>> the
>>>> boolean it returns hints at that.
>>>>
>>>> The function's behavior, however, is more complex: it can check
>>>> with a
>>>> driver callback whether the hardware's sequence number indicates
>>>> that
>>>> the fence can already be treated as signaled, although the
>>>> hardware's /
>>>> driver's interrupt handler has not signaled it yet. If that's the
>>>> case,
>>>> the function also signals the fence.
>>>>
>>>> (Presumably) this has caused a bug in Nouveau (unknown commit),
>>>> where
>>>> nouveau_fence_done() uses the function to check a fence, which
>>>> causes a
>>>> race.
>>>>
>>>> Give the function a more obvious name.
>>> This is just my personal view on this, but I find the new name just
>>> as
>>> confusing as the old one. It sounds like something is checked, but
>>> it's
>>> clear what, and then the fence is forcibly signaled like it would
>>> be
>>> if
>>> you call drm_fence_signal(). Of course, this clarified by the doc,
>>> but
>>> given the goal was to make the function name clearly reflect what
>>> it
>>> does, I'm not convinced it's significantly better.
>>>
>>> Maybe dma_fence_check_hw_state_and_propagate(), though it might be
>>> too long of name. Oh well, feel free to ignore this comments if a
>>> majority is fine with the new name.
>> Yoa, the name isn't perfect (the perfect name describing the whole
>> behavior would be
>> dma_fence_check_if_already_signaled_then_check_hardware_state_and_pro
>> pa
>> gate() ^^'
>>
>> My intention here is to have the reader realize "watch out, the fence
>> might get signaled here!", which is probably the most important event
>> regarding fences, which can race, invoke the callbacks and so on.
>>
>> For details readers will then check the documentation.
>>
>> But I'm of course open to see if there's a majority for this or that
>> name.
> how about:
>
> dma_fence_check_hw_and_signal() ?

I don't think that renaming the function is a good idea in the first place.

What the function does internally is an implementation detail of the framework.

For the code using this function it's completely irrelevant if the function might also signal the fence, what matters for the caller is the returned status of the fence. I think this also counts for the dma_fence_is_signaled() documentation.

What we should improve is the documentation of the dma_fence_ops->enable_signaling and dma_fence_ops->signaled callbacks.

Especially see the comment about reference counts on enable_signaling which is missing on the signaled callback. That is most likely the root cause why nouveau implemented enable_signaling correctly but not the other one.

But putting that aside I think we should make nails with heads and let the framework guarantee that the fences stay alive until they are signaled (one way or another). This completely removes the burden to keep a reference on unsignaled fences from the drivers / implementations and make things more over all more defensive.

Regards,
Christian.

>
> P.
>
>> P.
>>
>>
>>> Regards,
>>>
>>> Boris
Philipp Stanner April 9, 2025, 2:01 p.m. UTC | #4
On Wed, 2025-04-09 at 15:14 +0200, Christian König wrote:
> Am 09.04.25 um 14:56 schrieb Philipp Stanner:
> > On Wed, 2025-04-09 at 14:51 +0200, Philipp Stanner wrote:
> > > On Wed, 2025-04-09 at 14:39 +0200, Boris Brezillon wrote:
> > > > Hi Philipp,
> > > > 
> > > > On Wed,  9 Apr 2025 14:06:37 +0200
> > > > Philipp Stanner <phasta@kernel.org> wrote:
> > > > 
> > > > > dma_fence_is_signaled()'s name strongly reads as if this
> > > > > function
> > > > > were
> > > > > intended for checking whether a fence is already signaled.
> > > > > Also
> > > > > the
> > > > > boolean it returns hints at that.
> > > > > 
> > > > > The function's behavior, however, is more complex: it can
> > > > > check
> > > > > with a
> > > > > driver callback whether the hardware's sequence number
> > > > > indicates
> > > > > that
> > > > > the fence can already be treated as signaled, although the
> > > > > hardware's /
> > > > > driver's interrupt handler has not signaled it yet. If that's
> > > > > the
> > > > > case,
> > > > > the function also signals the fence.
> > > > > 
> > > > > (Presumably) this has caused a bug in Nouveau (unknown
> > > > > commit),
> > > > > where
> > > > > nouveau_fence_done() uses the function to check a fence,
> > > > > which
> > > > > causes a
> > > > > race.
> > > > > 
> > > > > Give the function a more obvious name.
> > > > This is just my personal view on this, but I find the new name
> > > > just
> > > > as
> > > > confusing as the old one. It sounds like something is checked,
> > > > but
> > > > it's
> > > > clear what, and then the fence is forcibly signaled like it
> > > > would
> > > > be
> > > > if
> > > > you call drm_fence_signal(). Of course, this clarified by the
> > > > doc,
> > > > but
> > > > given the goal was to make the function name clearly reflect
> > > > what
> > > > it
> > > > does, I'm not convinced it's significantly better.
> > > > 
> > > > Maybe dma_fence_check_hw_state_and_propagate(), though it might
> > > > be
> > > > too long of name. Oh well, feel free to ignore this comments if
> > > > a
> > > > majority is fine with the new name.
> > > Yoa, the name isn't perfect (the perfect name describing the
> > > whole
> > > behavior would be
> > > dma_fence_check_if_already_signaled_then_check_hardware_state_and
> > > _pro
> > > pa
> > > gate() ^^'
> > > 
> > > My intention here is to have the reader realize "watch out, the
> > > fence
> > > might get signaled here!", which is probably the most important
> > > event
> > > regarding fences, which can race, invoke the callbacks and so on.
> > > 
> > > For details readers will then check the documentation.
> > > 
> > > But I'm of course open to see if there's a majority for this or
> > > that
> > > name.
> > how about:
> > 
> > dma_fence_check_hw_and_signal() ?
> 
> I don't think that renaming the function is a good idea in the first
> place.
> 
> What the function does internally is an implementation detail of the
> framework.
> 
> For the code using this function it's completely irrelevant if the
> function might also signal the fence, what matters for the caller is
> the returned status of the fence. I think this also counts for the
> dma_fence_is_signaled() documentation.

It does obviously matter. As it's currently implemented, a lot of
important things happen implicitly.

I only see improvement by making things more obvious.

In any case, how would you call a wrapper that just does
test_bit(IS_SIGNALED, …) ?

P.

> 
> What we should improve is the documentation of the dma_fence_ops-
> >enable_signaling and dma_fence_ops->signaled callbacks.
> 
> Especially see the comment about reference counts on enable_signaling
> which is missing on the signaled callback. That is most likely the
> root cause why nouveau implemented enable_signaling correctly but not
> the other one.
> 
> But putting that aside I think we should make nails with heads and
> let the framework guarantee that the fences stay alive until they are
> signaled (one way or another). This completely removes the burden to
> keep a reference on unsignaled fences from the drivers /
> implementations and make things more over all more defensive.
> 
> Regards,
> Christian.
> 
> > 
> > P.
> > 
> > > P.
> > > 
> > > 
> > > > Regards,
> > > > 
> > > > Boris
>