Message ID | 20240714-linux-next-24-07-13-camss-fixes-v1-0-8f8954bc8c85@linaro.org |
---|---|
Series | media: qcom: camss: Fix two CAMSS bugs found by dogfooding with SoftISP |
On Sun, Jul 14, 2024 at 11:53:58PM +0100, Bryan O'Donoghue wrote:
> The use_count check was introduced so that multiple concurrent Raw Data
> Interfaces RDIs could be driven by different virtual channels VCs on the
> CSIPHY input driving the video pipeline.
>
> This is an invalid use of use_count though as use_count pertains to the
> number of times a video entity has been opened by user-space not the number
> of active streams.
>
> If use_count and stream-on count don't agree then stop_streaming() will
> break as is currently the case and has become apparent when using CAMSS
> with libcamera's released softisp 0.3.
>
> The use of use_count like this is a bit hacky and right now breaks regular
> usage of CAMSS for a single stream case.

Please be a bit more specific about how this manifests itself to the
user. I see error messages when stopping a stream (e.g. stopping qcam)
and the stream cannot be restarted (e.g. qcam fails with -EBUSY).

> One CAMSS specific way to handle multiple VCs on the same RDI might be:
>
> - Reference count each pipeline enable for CSIPHY, CSID, VFE and RDIx.
> - The video buffers are already associated with msm_vfeN_rdiX so
>   release video buffers when told to do so by stop_streaming.
> - Only release the power-domains for the CSIPHY, CSID and VFE when
>   their internal refcounts drop.
>
> Either way refusing to release video buffers based on use_count is
> erroneous and should be reverted. The silicon enabling code for selecting
> VCs is perfectly fine. Its a "known missing feature" that concurrent VCs
> won't work with CAMSS right now.
>
> Initial testing with this code didn't show an error but, SoftISP and "real"
> usage with Google Hangouts breaks the upstream code pretty quickly, we need
> to do a partial revert and take another pass at VCs.

Please include the error messages that users see so that people can find
this patch, for example:

[ 1265.509831] WARNING: CPU: 5 PID: 919 at drivers/media/common/videobuf2/videobuf2-core.c:2183 __vb2_queue_cancel+0x230/0x2c8 [videobuf2_common]
...
[ 1265.510630] Call trace:
[ 1265.510636]  __vb2_queue_cancel+0x230/0x2c8 [videobuf2_common]
[ 1265.510648]  vb2_core_streamoff+0x24/0xcc [videobuf2_common]
[ 1265.510660]  vb2_ioctl_streamoff+0x5c/0xa8 [videobuf2_v4l2]
[ 1265.510673]  v4l_streamoff+0x24/0x30 [videodev]
[ 1265.510707]  __video_do_ioctl+0x190/0x3f4 [videodev]
[ 1265.510732]  video_usercopy+0x304/0x8c4 [videodev]
[ 1265.510757]  video_ioctl2+0x18/0x34 [videodev]
[ 1265.510782]  v4l2_ioctl+0x40/0x60 [videodev]
...
[ 1265.510944] videobuf2_common: driver bug: stop_streaming operation is leaving buffer 0 in active state
[ 1265.511175] videobuf2_common: driver bug: stop_streaming operation is leaving buffer 1 in active state
[ 1265.511398] videobuf2_common: driver bug: stop_streaming operation is leaving buffer 2 in active state

> This commit partially reverts commit 89013969e232 ("media: camss: sm8250:
> Pipeline starting and stopping for multiple virtual channels")
>
> Fixes: 89013969e232 ("media: camss: sm8250: Pipeline starting and stopping for multiple virtual channels")

Looks like you're missing a CC stable tag here?

> Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>

Reported-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/lkml/ZoVNHOTI0PKMNt4_@hovoldconsulting.com/
Tested-by: Johan Hovold <johan+linaro@kernel.org>

Johan
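To make the failure mode and the refcounting idea from the quoted commit message concrete, here is a minimal userspace model in C. It is only a sketch: every name in it (model_video, model_block, the model_* helpers) is hypothetical, the "use_count > 1" threshold is an assumption about the shape of the guard, and none of it is the CAMSS driver code. It shows a stop path that bails out on use_count and so leaves buffers active, next to a stop path that always returns buffers and lets a per-block enable count decide when power is dropped.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NUM_BUFS 3

/* Hypothetical stand-in for one msm_vfeN_rdiX video node. */
struct model_video {
	int use_count;                   /* times user space opened the node */
	bool buf_active[MODEL_NUM_BUFS]; /* buffers still owned by the driver */
};

/* Hypothetical stand-in for a shared block (CSIPHY, CSID or VFE). */
struct model_block {
	int enable_count; /* pipeline enables, not opens */
	bool powered;
};

static void model_block_get(struct model_block *b)
{
	/* First enable powers the block up. */
	if (b->enable_count++ == 0)
		b->powered = true;
}

static void model_block_put(struct model_block *b)
{
	assert(b->enable_count > 0);
	/* Last disable powers the block down. */
	if (--b->enable_count == 0)
		b->powered = false;
}

static int model_active_bufs(const struct model_video *v)
{
	int n = 0;

	for (int i = 0; i < MODEL_NUM_BUFS; i++)
		n += v->buf_active[i] ? 1 : 0;
	return n;
}

static void model_start_streaming(struct model_video *v, struct model_block *b)
{
	for (int i = 0; i < MODEL_NUM_BUFS; i++)
		v->buf_active[i] = true;
	model_block_get(b);
}

/* Always hand the buffers back; power follows the enable count. */
static void model_stop_streaming(struct model_video *v, struct model_block *b)
{
	for (int i = 0; i < MODEL_NUM_BUFS; i++)
		v->buf_active[i] = false;
	model_block_put(b);
}

/* Guarded variant (assumed shape): skips the stop when opened more than once. */
static void model_stop_streaming_guarded(struct model_video *v, struct model_block *b)
{
	if (v->use_count > 1)
		return; /* buffers stay active, block stays enabled */
	model_stop_streaming(v, b);
}
```

In this model, a node with use_count == 2 that goes through model_stop_streaming_guarded() keeps all three buffers active, which corresponds to the "leaving buffer N in active state" warnings above. With model_stop_streaming() the buffers are always returned, and a block shared by two streams only powers down when the second stream stops.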
Dogfooding with SoftISP has uncovered two bugs in this series which I'm
posting fixes for.

- The first error:
  A simple race condition which to be honest I'm surprised I haven't found
  earlier nor has anybody else. Simply stated the order we typically
  end up loading CAMSS on boot has masked out the pm_runtime_enable() race
  condition that has been present in CAMSS for a long time.

  If you blacklist qcom-camss in modules.d and then modprobe after boot,
  the race condition shows up easily.

  Moving the pm_runtime_enable prior to subdevice registration fixes the
  problem.

- The second error:
  Nomenclature:
  - CSIPHY: CSI Physical layer analogue to digital domain serialiser
  - CSID: CSI Decoder
  - VFE: Video Front End
  - RDI: Raw Data Interface
  - VC: Virtual Channel

  In order to support streaming multiple virtual-channels on the same RDI a
  V4L2 provided use_count variable is used to decide whether or not to
  actually terminate streaming and release buffers for 'msm_vfe_rdiX'.

  Unfortunately use_count indicates the number of times msm_vfe_rdiX has
  been opened by user-space not the number of concurrent streams on
  msm_vfe_rdiX.

  Simply stated use_count and stream_count are two different things.

  The silicon enabling code to select between VCs is valid but, a different
  solution needs to be found to support _concurrent_ VC streams.

  Right now the upstream use_count as-is is breaking the non concurrent VC
  case and I don't believe there are upstream users of concurrent VCs on
  CAMSS.

  This series implements a revert for the invalid use_count check,
  retaining the ability to select which VC is active on the RDI.

  Dogfooding with libcamera's SoftISP in Hangouts, Zoom and multiple runs
  of libcamera's "qcam" application is a very different test-case to the
  simple capture of frames we previously did when validating the
  'use_count' change.

  A partial revert in expectation of a renewed push to fixup that
  concurrent VC issue is included.

Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
---
Bryan O'Donoghue (2):
      media: qcom: camss: Remove use_count guard in stop_streaming
      media: qcom: camss: Fix ordering of pm_runtime_enable

 drivers/media/platform/qcom/camss/camss-video.c | 6 ------
 drivers/media/platform/qcom/camss/camss.c       | 5 +++--
 2 files changed, 3 insertions(+), 8 deletions(-)
---
base-commit: c6ce8f9ab92edc9726996a0130bfc1c408132d47
change-id: 20240713-linux-next-24-07-13-camss-fixes-fa98c0965a5d

Best regards,
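
The ordering problem described under "The first error" in the cover letter can be illustrated with a small deterministic userspace model in C. This is a sketch under stated assumptions, not the CAMSS probe path: model_dev and all model_* functions are hypothetical, the "race" is simulated by having a consumer touch the device at the moment of registration, and returning -EACCES while runtime PM is disabled is an assumption about how the failure surfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device that uses runtime PM. */
struct model_dev {
	bool rpm_enabled;     /* set once runtime PM has been enabled */
	int first_get_result; /* what the first consumer saw */
};

/* Assumed behaviour: a resume request is refused while runtime PM is off. */
static int model_rpm_get(const struct model_dev *d)
{
	return d->rpm_enabled ? 0 : -EACCES;
}

static void model_rpm_enable(struct model_dev *d)
{
	d->rpm_enabled = true;
}

/*
 * Registration makes the device reachable. The model assumes a consumer
 * uses it immediately, which is the worst case of the race.
 */
static void model_register_subdevs(struct model_dev *d)
{
	d->first_get_result = model_rpm_get(d);
}

/* Problematic ordering: the device is visible before runtime PM is enabled. */
static int model_probe_register_first(struct model_dev *d)
{
	model_register_subdevs(d);
	model_rpm_enable(d);
	return d->first_get_result;
}

/* Ordering after the fix: runtime PM is enabled before registration. */
static int model_probe_enable_first(struct model_dev *d)
{
	model_rpm_enable(d);
	model_register_subdevs(d);
	return d->first_get_result;
}
```

With a zero-initialised model_dev, model_probe_register_first() reports the failure to the first consumer, while model_probe_enable_first() does not. On a real system the outcome depends on timing, which would fit the observation that the usual boot-time load order hid the problem.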