diff mbox series

[v12,1/7] media: v4l2: Add ignore_streaming flag

Message ID 20230915-wave5_v12_on_media_master-v12-1-92fc66cd685d@collabora.com
State New
Headers show
Series [v12,1/7] media: v4l2: Add ignore_streaming flag | expand

Commit Message

Sebastian Fricke Sept. 15, 2023, 9:11 p.m. UTC
Add a new flag to the `struct v4l2_m2m_dev` to toggle whether a queue
must be streaming in order to allow queuing jobs to the ready queue.
Currently, both queues (CAPTURE & OUTPUT) must be streaming in order to
allow adding new jobs. This behavior limits the usability of M2M for
some drivers, as these have to be able, to perform analysis of the
sequence to ensure, that userspace prepares the CAPTURE queue correctly.

Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
---
 include/media/v4l2-mem2mem.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Comments

Nicolas Dufresne Sept. 21, 2023, 6:39 p.m. UTC | #1
Le mercredi 20 septembre 2023 à 16:49 +0200, Hans Verkuil a écrit :
> On 20/09/2023 16:08, Nicolas Dufresne wrote:
> > cc Tomasz Figa
> > 
> > Le mercredi 20 septembre 2023 à 14:59 +0200, Hans Verkuil a écrit :
> > > On 15/09/2023 23:11, Sebastian Fricke wrote:
> > > > Add a new flag to the `struct v4l2_m2m_dev` to toggle whether a queue
> > > > must be streaming in order to allow queuing jobs to the ready queue.
> > > > Currently, both queues (CAPTURE & OUTPUT) must be streaming in order to
> > > > allow adding new jobs. This behavior limits the usability of M2M for
> > > > some drivers, as these have to be able, to perform analysis of the
> > > 
> > > able, to -> able to
> > > 
> > > > sequence to ensure, that userspace prepares the CAPTURE queue correctly.
> > > 
> > > ensure, that -> ensure that
> > > 
> > > > 
> > > > Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
> > > > Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
> > > > ---
> > > >  include/media/v4l2-mem2mem.h | 17 +++++++++++++++++
> > > >  1 file changed, 17 insertions(+)
> > > > 
> > > > diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
> > > > index d6c8eb2b5201..97a48e61e358 100644
> > > > --- a/include/media/v4l2-mem2mem.h
> > > > +++ b/include/media/v4l2-mem2mem.h
> > > > @@ -57,6 +57,16 @@ struct v4l2_m2m_dev;
> > > >   * @rdy_spinlock: spin lock to protect the struct usage
> > > >   * @num_rdy:	number of buffers ready to be processed
> > > >   * @buffered:	is the queue buffered?
> > > > + * @ignore_streaming: Dictates whether the queue must be streaming for a job to
> > > > + *		      be queued.
> > > > + *		      This is useful, for example, when the driver requires to
> > > > + *		      initialize the sequence with a firmware, where only a
> > > > + *		      queued OUTPUT queue buffer and STREAMON on the OUTPUT
> > > > + *		      queue is required to perform the anlysis of the bitstream
> > > > + *		      header.
> > > > + *		      This means the driver is responsible for implementing the
> > > > + *		      job_ready callback correctly to make sure that requirements
> > > > + *		      for actual decoding are met.
> > > 
> > > This is a bad description and field name.
> > 
> > I wonder what's your opinion about the buffered one then :-D
> 
> Even worse :-)
> 
> I still don't really understand what that does. Patches welcome.
> 
> > 
> > > 
> > > Basically what this field does is that, if true, the streaming state of the
> > > capture queue is ignored. So just call it that: ignore_cap_streaming.
> > > 
> > > And explain that, if true, job_ready() will be called even if the capture
> > > queue is not streaming, and that that can be used to allow hardware to
> > > analyze the bitstream header that arrives on the OUTPUT queue.
> > 
> > Ack.
> > 
> > > 
> > > Also, doesn't this field belong to struct v4l2_m2m_ctx? It makes no sense
> > > for the output queue, this is really a configuration for the m2m context as
> > > a whole.
> > 
> > Unless we come up with a completely new type of M2M that can behave like a gap
> > filler (like a video rate m2m), it indeed makes no sense for output. I'm just
> > illustrating that this is true "now" but someone can come up with valid
> > expectation. So I agree with you, we can move it up in the hierarchy.
> > 
> > Recently over IRC and other threads, Tomasz raised a concern that CODECs where
> > introducing too much complexity into M2M. And I believe buffered (which is
> > barely documented) and this mechanism was being pointed.
> > 
> > My take on that is that adding boolean configuration is what introduce
> > complexity, and we can fix it by doing less in the m2m. After this discussion, I
> > came with the idea that we should remove buffered and ignore_streaming. For
> > drivers that don't implement job_ready, this logic would be moved inside the
> > default implementation. We can then add a helper to check the common conditions.
> > 
> > The alternative suggested by Tomasz, was to layer two ops. We'd have a
> > device_ready() ops and its default implementation would include the check we
> > have and would call job_ready(). Personally, I'd rather remove then add, but I
> > understadt the reasoning and would be fine committing to that instead.
> > 
> > I'd like your feedback on this proposal. If this is something we want, I'll do
> > this prior to V13, otherwise we will address your comments and fix the added
> > mechanism. I think though that we agree that for decoders, this is nice addition
> > to not have to trigger work manually from vb2 ops.
> 
> It comes down to a matter of taste, I guess. I personally think that using bools
> to tweak the behavior of a framework does not necessarily increase complexity,
> provided it is clearly documented what it does and why it is needed.
> 
> I think an ignore_cap_streaming bool is pretty straightforward and has minimal
> impact in the code. As long as there are good comments.

So for wave5 we will opt for this and apply your suggested changes. And I may
come back later on the subject.

> 
> The 'buffered' flag is were this clearly failed completely, since I couldn't figure
> out what it is supposed to do. But that is not because it makes the code more
> complex, it is just because of shoddy documentation and naming.
> 
> Quite often implementing tweaks like that are quite easy in a framework, since
> you have all the information readily available. In a driver it can quickly become
> messy.

In this case, "buffered" is used to disable the checks for having at least one
buffer in the ready queues. In most cases, if you don't have at least 1 pending
capture and 1 pending output buffer, there is no point in calling device_run.

In reality, drivers will add use case specific checks in their job_ready()
implementation. For decoders, the cases I can think of are:

- On capture if you haven't parsed the stream header
- On capture if the driver removes them from ready queue as a way to track which
one are considered free and may be used at any time by the firmware
- On output queue, if you need device_run() to be called to complete the drain
the reorder queue

Yet, you want this check after stream headers are parsed, or whenever a new
bitstream decode operation is to be queued in the firmware. So this check gets
re-implemented, but dynamically, in all decoders.

Deinterlacers may needs this too with some algorithms (the one that introduce
delays at least). Its not clear to me why it was called buffered,
ignore_rdy_queue might have been an option, though I'm not fully confident. Note
that M2M can be confusing, since whenever you ask for last something, its always
relative to the ready queue, and may not make a lot of sense in the context it
is used.

> 
> For codec support there are a number of issues that increase complexity:
> implementing support for the LAST flag and events, and supporting buffers
> that can be held. Especially since driver implementations tend to vary.
> 
> I've been experimenting with some cleanups and changes in v4l2-mem2mem.c
> (https://git.linuxtv.org/hverkuil/media_tree.git/log/?h=enc-dec-cmd), mainly
> surrounding the handling of the LAST flag. Note: this is failing the compliance
> tests, I haven't had the time to pursue this further.
> 
> I'm not sure whether the best approach is to move things out of the m2m framework,
> or move things into the m2m framework, or add a more codec-specific layer on top
> of the m2m framework, or a combination of all of these.
> 
> It is something that needs experimentation, just see what works.

I can see you have omitted mark_stopped() calles when refactoring, which makes
these patches change the behaviour. Could be related.

This is no longer strictly related to this patch, but I think cmd_stop()
implementation (even after your changes) are miss-fit for driver that speaks to
firmware. As the firmware is being made aware of the free buffers, you can't
just cherry-pick from the capture queue, you have to synchronise your state with
the firmware while draining. The helper should be split in two parts I suppose,
but cutting the line isn't easy.

Thread safe usage of the numerous boolean implicated in the draining state is
also difficult. There is no other option then introduce a mutex or spinlock (if
the state is needed in job_ready() implementation) to make this thread safe and
reliable.

> 
> But for this specific flag: I think it is fine to put that in the m2m framework,
> just document and name it well.

Ack.

> 
> Regards,
> 
> 	Hans
> 
> > 
> > regards,
> > Nicolas
> > 
> > > 
> > > >   *
> > > >   * Queue for buffers ready to be processed as soon as this
> > > >   * instance receives access to the device.
> > > > @@ -69,6 +79,7 @@ struct v4l2_m2m_queue_ctx {
> > > >  	spinlock_t		rdy_spinlock;
> > > >  	u8			num_rdy;
> > > >  	bool			buffered;
> > > > +	bool			ignore_streaming;
> > > >  };
> > > >  
> > > >  /**
> > > > @@ -564,6 +575,12 @@ static inline void v4l2_m2m_set_dst_buffered(struct v4l2_m2m_ctx *m2m_ctx,
> > > >  	m2m_ctx->cap_q_ctx.buffered = buffered;
> > > >  }
> > > >  
> > > > +static inline void v4l2_m2m_set_dst_ignore_streaming(struct v4l2_m2m_ctx *m2m_ctx,
> > > > +						     bool ignore_streaming)
> > > > +{
> > > > +	m2m_ctx->cap_q_ctx.ignore_streaming = ignore_streaming;
> > > > +}
> > > > +
> > > 
> > > I think this is overkill, esp. when the field is moved to m2m_ctx. Just clearly
> > > document that drivers can set this.
> > > 
> > > Regards,
> > > 
> > > 	Hans
> > > 
> > > >  /**
> > > >   * v4l2_m2m_ctx_release() - release m2m context
> > > >   *
> > > > 
> > > 
> > 
>
Hans Verkuil Sept. 25, 2023, 9:03 a.m. UTC | #2
On 22/09/2023 22:20, Nicolas Dufresne wrote:
> Le vendredi 22 septembre 2023 à 10:28 +0200, Hans Verkuil a écrit :
>> On 21/09/2023 20:39, Nicolas Dufresne wrote:
>>> Le mercredi 20 septembre 2023 à 16:49 +0200, Hans Verkuil a écrit :
>>>> On 20/09/2023 16:08, Nicolas Dufresne wrote:
>>>>> cc Tomasz Figa
>>>>>
>>>>> Le mercredi 20 septembre 2023 à 14:59 +0200, Hans Verkuil a écrit :
>>>>>> On 15/09/2023 23:11, Sebastian Fricke wrote:
>>>>>>> Add a new flag to the `struct v4l2_m2m_dev` to toggle whether a queue
>>>>>>> must be streaming in order to allow queuing jobs to the ready queue.
>>>>>>> Currently, both queues (CAPTURE & OUTPUT) must be streaming in order to
>>>>>>> allow adding new jobs. This behavior limits the usability of M2M for
>>>>>>> some drivers, as these have to be able, to perform analysis of the
>>>>>>
>>>>>> able, to -> able to
>>>>>>
>>>>>>> sequence to ensure, that userspace prepares the CAPTURE queue correctly.
>>>>>>
>>>>>> ensure, that -> ensure that
>>>>>>
>>>>>>>
>>>>>>> Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
>>>>>>> Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
>>>>>>> ---
>>>>>>>  include/media/v4l2-mem2mem.h | 17 +++++++++++++++++
>>>>>>>  1 file changed, 17 insertions(+)
>>>>>>>
>>>>>>> diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
>>>>>>> index d6c8eb2b5201..97a48e61e358 100644
>>>>>>> --- a/include/media/v4l2-mem2mem.h
>>>>>>> +++ b/include/media/v4l2-mem2mem.h
>>>>>>> @@ -57,6 +57,16 @@ struct v4l2_m2m_dev;
>>>>>>>   * @rdy_spinlock: spin lock to protect the struct usage
>>>>>>>   * @num_rdy:	number of buffers ready to be processed
>>>>>>>   * @buffered:	is the queue buffered?
>>>>>>> + * @ignore_streaming: Dictates whether the queue must be streaming for a job to
>>>>>>> + *		      be queued.
>>>>>>> + *		      This is useful, for example, when the driver requires to
>>>>>>> + *		      initialize the sequence with a firmware, where only a
>>>>>>> + *		      queued OUTPUT queue buffer and STREAMON on the OUTPUT
>>>>>>> + *		      queue is required to perform the anlysis of the bitstream
>>>>>>> + *		      header.
>>>>>>> + *		      This means the driver is responsible for implementing the
>>>>>>> + *		      job_ready callback correctly to make sure that requirements
>>>>>>> + *		      for actual decoding are met.
>>>>>>
>>>>>> This is a bad description and field name.
>>>>>
>>>>> I wonder what's your opinion about the buffered one then :-D
>>>>
>>>> Even worse :-)
>>>>
>>>> I still don't really understand what that does. Patches welcome.
>>>>
>>>>>
>>>>>>
>>>>>> Basically what this field does is that, if true, the streaming state of the
>>>>>> capture queue is ignored. So just call it that: ignore_cap_streaming.
>>>>>>
>>>>>> And explain that, if true, job_ready() will be called even if the capture
>>>>>> queue is not streaming, and that that can be used to allow hardware to
>>>>>> analyze the bitstream header that arrives on the OUTPUT queue.
>>>>>
>>>>> Ack.
>>>>>
>>>>>>
>>>>>> Also, doesn't this field belong to struct v4l2_m2m_ctx? It makes no sense
>>>>>> for the output queue, this is really a configuration for the m2m context as
>>>>>> a whole.
>>>>>
>>>>> Unless we come up with a completely new type of M2M that can behave like a gap
>>>>> filler (like a video rate m2m), it indeed makes no sense for output. I'm just
>>>>> illustrating that this is true "now" but someone can come up with valid
>>>>> expectation. So I agree with you, we can move it up in the hierarchy.
>>>>>
>>>>> Recently over IRC and other threads, Tomasz raised a concern that CODECs where
>>>>> introducing too much complexity into M2M. And I believe buffered (which is
>>>>> barely documented) and this mechanism was being pointed.
>>>>>
>>>>> My take on that is that adding boolean configuration is what introduce
>>>>> complexity, and we can fix it by doing less in the m2m. After this discussion, I
>>>>> came with the idea that we should remove buffered and ignore_streaming. For
>>>>> drivers that don't implement job_ready, this logic would be moved inside the
>>>>> default implementation. We can then add a helper to check the common conditions.
>>>>>
>>>>> The alternative suggested by Tomasz, was to layer two ops. We'd have a
>>>>> device_ready() ops and its default implementation would include the check we
>>>>> have and would call job_ready(). Personally, I'd rather remove then add, but I
>>>>> understadt the reasoning and would be fine committing to that instead.
>>>>>
>>>>> I'd like your feedback on this proposal. If this is something we want, I'll do
>>>>> this prior to V13, otherwise we will address your comments and fix the added
>>>>> mechanism. I think though that we agree that for decoders, this is nice addition
>>>>> to not have to trigger work manually from vb2 ops.
>>>>
>>>> It comes down to a matter of taste, I guess. I personally think that using bools
>>>> to tweak the behavior of a framework does not necessarily increase complexity,
>>>> provided it is clearly documented what it does and why it is needed.
>>>>
>>>> I think an ignore_cap_streaming bool is pretty straightforward and has minimal
>>>> impact in the code. As long as there are good comments.
>>>
>>> So for wave5 we will opt for this and apply your suggested changes. And I may
>>> come back later on the subject.
>>>
>>>>
>>>> The 'buffered' flag is were this clearly failed completely, since I couldn't figure
>>>> out what it is supposed to do. But that is not because it makes the code more
>>>> complex, it is just because of shoddy documentation and naming.
>>>>
>>>> Quite often implementing tweaks like that are quite easy in a framework, since
>>>> you have all the information readily available. In a driver it can quickly become
>>>> messy.
>>>
>>> In this case, "buffered" is used to disable the checks for having at least one
>>> buffer in the ready queues. In most cases, if you don't have at least 1 pending
>>> capture and 1 pending output buffer, there is no point in calling device_run.
>>
>> So it is really similar to ignore_cap_streaming: that relaxes the streaming test,
>> and 'buffered' relaxes the 'must have at least one capture and output buffer ready'
>> test.
>>
>> So this should be renamed to: allow_empty_queues
>>
>> Although I would prefer to split this into two bools: allow_empty_capture_queue and
>> allow_empty_output_queue. It is more flexible that way and I actually think it is
>> easier to understand.
> 
> Its on the queue ctx, so it does not have to be typed. It would have to be typed
> if moved to m2m ctx.

Oops, I missed that. I'm not actually sure that's where it should be. If you
support flags that tweak the behavior, then put them together in a single place,
and not all over.

Regards,

	Hans

> 
>>
>> I see also see in the v4l2-mem2mem.c source that the debug messages are very poorly
>> worded:
>> 	src = v4l2_m2m_next_src_buf(m2m_ctx);
>>
>>         if (!src && !m2m_ctx->out_q_ctx.buffered) {
>>                 dprintk("No input buffers available\n");
>>                 goto job_unlock;
>>         }
>>
>> This should be either "source buffers" or "output buffers", but definitely not
>> "input buffers".
>>
>> Ditto for the dst part.
> 
> Indeed, I'll store this node somewhere for future work on the framework, this is
> not strictly related to wave5 anymore.
> 
>>
>>>
>>> In reality, drivers will add use case specific checks in their job_ready()
>>> implementation. For decoders, the cases I can think of are:
>>>
>>> - On capture if you haven't parsed the stream header
>>> - On capture if the driver removes them from ready queue as a way to track which
>>> one are considered free and may be used at any time by the firmware
>>> - On output queue, if you need device_run() to be called to complete the drain
>>> the reorder queue
>>>
>>> Yet, you want this check after stream headers are parsed, or whenever a new
>>> bitstream decode operation is to be queued in the firmware. So this check gets
>>> re-implemented, but dynamically, in all decoders.
>>>
>>> Deinterlacers may needs this too with some algorithms (the one that introduce
>>> delays at least). Its not clear to me why it was called buffered,
>>> ignore_rdy_queue might have been an option, though I'm not fully confident. Note
>>> that M2M can be confusing, since whenever you ask for last something, its always
>>> relative to the ready queue, and may not make a lot of sense in the context it
>>> is used.
>>>
>>>>
>>>> For codec support there are a number of issues that increase complexity:
>>>> implementing support for the LAST flag and events, and supporting buffers
>>>> that can be held. Especially since driver implementations tend to vary.
>>>>
>>>> I've been experimenting with some cleanups and changes in v4l2-mem2mem.c
>>>> (https://git.linuxtv.org/hverkuil/media_tree.git/log/?h=enc-dec-cmd), mainly
>>>> surrounding the handling of the LAST flag. Note: this is failing the compliance
>>>> tests, I haven't had the time to pursue this further.
>>>>
>>>> I'm not sure whether the best approach is to move things out of the m2m framework,
>>>> or move things into the m2m framework, or add a more codec-specific layer on top
>>>> of the m2m framework, or a combination of all of these.
>>>>
>>>> It is something that needs experimentation, just see what works.
>>>
>>> I can see you have omitted mark_stopped() calles when refactoring, which makes
>>> these patches change the behaviour. Could be related.
>>
>> Could be. I hope to be able to spend a bit of time on this today.
>>
>>>
>>> This is no longer strictly related to this patch, but I think cmd_stop()
>>> implementation (even after your changes) are miss-fit for driver that speaks to
>>> firmware. As the firmware is being made aware of the free buffers, you can't
>>> just cherry-pick from the capture queue, you have to synchronise your state with
>>> the firmware while draining. The helper should be split in two parts I suppose,
>>> but cutting the line isn't easy.
>>>
>>> Thread safe usage of the numerous boolean implicated in the draining state is
>>> also difficult. There is no other option then introduce a mutex or spinlock (if
>>> the state is needed in job_ready() implementation) to make this thread safe and
>>> reliable.
>>
>> Regards,
>>
>> 	Hans
>>
>>>
>>>>
>>>> But for this specific flag: I think it is fine to put that in the m2m framework,
>>>> just document and name it well.
>>>
>>> Ack.
>>>
>>>>
>>>> Regards,
>>>>
>>>> 	Hans
>>>>
>>>>>
>>>>> regards,
>>>>> Nicolas
>>>>>
>>>>>>
>>>>>>>   *
>>>>>>>   * Queue for buffers ready to be processed as soon as this
>>>>>>>   * instance receives access to the device.
>>>>>>> @@ -69,6 +79,7 @@ struct v4l2_m2m_queue_ctx {
>>>>>>>  	spinlock_t		rdy_spinlock;
>>>>>>>  	u8			num_rdy;
>>>>>>>  	bool			buffered;
>>>>>>> +	bool			ignore_streaming;
>>>>>>>  };
>>>>>>>  
>>>>>>>  /**
>>>>>>> @@ -564,6 +575,12 @@ static inline void v4l2_m2m_set_dst_buffered(struct v4l2_m2m_ctx *m2m_ctx,
>>>>>>>  	m2m_ctx->cap_q_ctx.buffered = buffered;
>>>>>>>  }
>>>>>>>  
>>>>>>> +static inline void v4l2_m2m_set_dst_ignore_streaming(struct v4l2_m2m_ctx *m2m_ctx,
>>>>>>> +						     bool ignore_streaming)
>>>>>>> +{
>>>>>>> +	m2m_ctx->cap_q_ctx.ignore_streaming = ignore_streaming;
>>>>>>> +}
>>>>>>> +
>>>>>>
>>>>>> I think this is overkill, esp. when the field is moved to m2m_ctx. Just clearly
>>>>>> document that drivers can set this.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> 	Hans
>>>>>>
>>>>>>>  /**
>>>>>>>   * v4l2_m2m_ctx_release() - release m2m context
>>>>>>>   *
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
diff mbox series

Patch

diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
index d6c8eb2b5201..97a48e61e358 100644
--- a/include/media/v4l2-mem2mem.h
+++ b/include/media/v4l2-mem2mem.h
@@ -57,6 +57,16 @@  struct v4l2_m2m_dev;
  * @rdy_spinlock: spin lock to protect the struct usage
  * @num_rdy:	number of buffers ready to be processed
  * @buffered:	is the queue buffered?
+ * @ignore_streaming: Dictates whether the queue must be streaming for a job to
+ *		      be queued.
+ *		      This is useful, for example, when the driver requires to
+ *		      initialize the sequence with a firmware, where only a
+ *		      queued OUTPUT queue buffer and STREAMON on the OUTPUT
+ *		      queue is required to perform the anlysis of the bitstream
+ *		      header.
+ *		      This means the driver is responsible for implementing the
+ *		      job_ready callback correctly to make sure that requirements
+ *		      for actual decoding are met.
  *
  * Queue for buffers ready to be processed as soon as this
  * instance receives access to the device.
@@ -69,6 +79,7 @@  struct v4l2_m2m_queue_ctx {
 	spinlock_t		rdy_spinlock;
 	u8			num_rdy;
 	bool			buffered;
+	bool			ignore_streaming;
 };
 
 /**
@@ -564,6 +575,12 @@  static inline void v4l2_m2m_set_dst_buffered(struct v4l2_m2m_ctx *m2m_ctx,
 	m2m_ctx->cap_q_ctx.buffered = buffered;
 }
 
+static inline void v4l2_m2m_set_dst_ignore_streaming(struct v4l2_m2m_ctx *m2m_ctx,
+						     bool ignore_streaming)
+{
+	m2m_ctx->cap_q_ctx.ignore_streaming = ignore_streaming;
+}
+
 /**
  * v4l2_m2m_ctx_release() - release m2m context
  *