Message ID | 20250409120640.106408-2-phasta@kernel.org |
---|---|
Series | dma-fence: Rename dma_fence_is_signaled() |
Hi Philipp,

On Wed, 9 Apr 2025 14:06:37 +0200
Philipp Stanner <phasta@kernel.org> wrote:

> dma_fence_is_signaled()'s name strongly reads as if this function were
> intended for checking whether a fence is already signaled. Also the
> boolean it returns hints at that.
>
> The function's behavior, however, is more complex: it can check with a
> driver callback whether the hardware's sequence number indicates that
> the fence can already be treated as signaled, although the hardware's /
> driver's interrupt handler has not signaled it yet. If that's the case,
> the function also signals the fence.
>
> (Presumably) this has caused a bug in Nouveau (unknown commit), where
> nouveau_fence_done() uses the function to check a fence, which causes a
> race.
>
> Give the function a more obvious name.

This is just my personal view on this, but I find the new name just as
confusing as the old one. It sounds like something is checked, but it's
not clear what, and then the fence is forcibly signaled like it would be
if you called dma_fence_signal(). Of course, this is clarified by the
doc, but given the goal was to make the function name clearly reflect
what it does, I'm not convinced it's significantly better.

Maybe dma_fence_check_hw_state_and_propagate(), though it might be too
long a name. Oh well, feel free to ignore these comments if a majority
is fine with the new name.

Regards,

Boris
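For readers following the thread, the behavior described in the quoted commit message can be modeled in plain userspace C. This is only a rough sketch under stated assumptions: every name prefixed with `model_` is hypothetical and not kernel API, locking and sequence-number wraparound are ignored, and "running callbacks" is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; all model_* names are hypothetical, not kernel API. */
struct model_fence;

struct model_fence_ops {
    /* Optional driver callback: has the "hardware" completed this fence? */
    bool (*signaled)(struct model_fence *f);
};

struct model_fence {
    const struct model_fence_ops *ops;
    bool signaled_flag;     /* stands in for the fence's "signaled" flag bit */
    unsigned int seqno;     /* sequence number this fence waits for */
    unsigned int hw_seqno;  /* last sequence number the "hardware" reached */
    int callbacks_run;      /* counts how often signaling side effects ran */
};

/* Stands in for signaling a fence: set the flag and run callbacks once. */
static void model_fence_signal(struct model_fence *f)
{
    if (f->signaled_flag)
        return;
    f->signaled_flag = true;
    f->callbacks_run++;
}

/* Driver callback: compare the hardware sequence number with the fence's. */
static bool model_hw_signaled(struct model_fence *f)
{
    return f->hw_seqno >= f->seqno;
}

/* Check the flag, then ask the driver, and signal if the hardware is done. */
static bool model_fence_is_signaled(struct model_fence *f)
{
    if (f->signaled_flag)
        return true;
    if (f->ops->signaled && f->ops->signaled(f)) {
        model_fence_signal(f);  /* the side effect this thread is about */
        return true;
    }
    return false;
}

static const struct model_fence_ops model_ops = {
    .signaled = model_hw_signaled,
};
```

In this model, calling `model_fence_is_signaled()` on a fence whose `hw_seqno` has caught up with `seqno` both returns true and bumps `callbacks_run`, which is the implicit state change the proposed rename is meant to draw attention to.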
On Wed, 2025-04-09 at 14:39 +0200, Boris Brezillon wrote:
> This is just my personal view on this, but I find the new name just as
> confusing as the old one. It sounds like something is checked, but it's
> not clear what, and then the fence is forcibly signaled like it would be
> if you called dma_fence_signal(). Of course, this is clarified by the
> doc, but given the goal was to make the function name clearly reflect
> what it does, I'm not convinced it's significantly better.
>
> Maybe dma_fence_check_hw_state_and_propagate(), though it might be too
> long a name. Oh well, feel free to ignore these comments if a majority
> is fine with the new name.

Yeah, the name isn't perfect (the perfect name describing the whole
behavior would be
dma_fence_check_if_already_signaled_then_check_hardware_state_and_propagate())
^^'

My intention here is to have the reader realize "watch out, the fence
might get signaled here!", which is probably the most important event
regarding fences: it can race, invoke the callbacks and so on.

For details, readers will then check the documentation.

But I'm of course open to seeing whether there's a majority for this or
that name.

P.

> Regards,
>
> Boris
On 09.04.25 at 14:56, Philipp Stanner wrote:
> how about:
>
> dma_fence_check_hw_and_signal() ?

I don't think that renaming the function is a good idea in the first
place.

What the function does internally is an implementation detail of the
framework.

For the code using this function it's completely irrelevant whether the
function might also signal the fence; what matters for the caller is the
returned status of the fence. I think this also applies to the
dma_fence_is_signaled() documentation.

What we should improve is the documentation of the
dma_fence_ops->enable_signaling and dma_fence_ops->signaled callbacks.

Especially see the comment about reference counts on enable_signaling,
which is missing on the signaled callback. That is most likely the root
cause why nouveau implemented enable_signaling correctly but not the
other one.

But putting that aside, I think we should do this properly and let the
framework guarantee that fences stay alive until they are signaled (one
way or another). This completely removes the burden of keeping a
reference on unsignaled fences from the drivers / implementations and
makes things more defensive overall.

Regards,
Christian.
On Wed, 2025-04-09 at 15:14 +0200, Christian König wrote:
> I don't think that renaming the function is a good idea in the first
> place.
>
> What the function does internally is an implementation detail of the
> framework.
>
> For the code using this function it's completely irrelevant whether the
> function might also signal the fence; what matters for the caller is
> the returned status of the fence. I think this also applies to the
> dma_fence_is_signaled() documentation.

It does obviously matter. As it's currently implemented, a lot of
important things happen implicitly.

I only see improvement by making things more obvious.

In any case, what would you call a wrapper that just does
test_bit(IS_SIGNALED, …)?

P.

> What we should improve is the documentation of the
> dma_fence_ops->enable_signaling and dma_fence_ops->signaled callbacks.
>
> Especially see the comment about reference counts on enable_signaling,
> which is missing on the signaled callback. That is most likely the
> root cause why nouveau implemented enable_signaling correctly but not
> the other one.
>
> But putting that aside, I think we should do this properly and let the
> framework guarantee that fences stay alive until they are signaled
> (one way or another). This completely removes the burden of keeping a
> reference on unsignaled fences from the drivers / implementations and
> makes things more defensive overall.
>
> Regards,
> Christian.
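The distinction behind the wrapper question above can be illustrated with a small userspace sketch. It is not kernel code and does not answer the naming question: the `toy_` names are placeholders invented for illustration, C11 atomics stand in for the kernel's bit operations, and a plain boolean stands in for whatever the driver would read from the hardware.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model only; all toy_* names are placeholders, not kernel API. */
#define TOY_SIGNALED_BIT 0u

struct toy_fence {
    atomic_ulong flags;  /* bit 0 models the "signaled" flag */
    bool hw_done;        /* what the "hardware" would report */
};

/* Pure query: reads the flag bit and never modifies the fence. */
static bool toy_flag_only_check(struct toy_fence *f)
{
    return (atomic_load(&f->flags) >> TOY_SIGNALED_BIT) & 1ul;
}

/* Query with a side effect: sets the flag if the hardware is done. */
static bool toy_check_hw_and_signal(struct toy_fence *f)
{
    if (toy_flag_only_check(f))
        return true;
    if (f->hw_done) {
        atomic_fetch_or(&f->flags, 1ul << TOY_SIGNALED_BIT);
        return true;
    }
    return false;
}
```

With `hw_done` set and the flag still clear, `toy_flag_only_check()` keeps returning false however often it is called, whereas a single call to `toy_check_hw_and_signal()` returns true and changes what the flag-only check reports afterwards.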