diff mbox series

[v2,1/2] drm/msm/dpu1: don't choke on disabling the writeback connector

Message ID 20240802-dpu-fix-wb-v2-1-7eac9eb8e895@linaro.org
State New
Headers show
Series drm/msm/dpu: two fixes targeting 6.11 | expand

Commit Message

Dmitry Baryshkov Aug. 2, 2024, 7:47 p.m. UTC
During suspend/resume process all connectors are explicitly disabled and
then reenabled. However resume fails because of the connector_status check:

[ 1185.831970] [dpu error]connector not connected 3

It doesn't make sense to check for the Writeback connected status (and
other drivers don't perform such check), so drop the check.

Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c")
Cc: stable@vger.kernel.org
Reported-by: Leonard Lausen <leonard@lausen.nl>
Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 ---
 1 file changed, 3 deletions(-)

Comments

Leonard Lausen Aug. 5, 2024, 2:27 a.m. UTC | #1
Dear Dmitry,

Thank you for the patch. Unfortunately, the patch triggers a regression with
respect to DRM CRTC state handling. With the patch applied, suspending and
resuming a lazor sc7180 with external display connected, looses CRTC state on
resume and prevents applying a new CRTC state. Without the patch, CRTC state is
preserved across suspend and resume and it remains possible to change CRTC
settings after resume. This means the patch regresses the user experience,
preventing "Night Light" mode to work as expected. I've validated this on
v6.10.2 vs. v6.10.2 with this patch applied.

While the cause for the bug uncovered by this change is likely separate, given
it's impact, would it be prudent to delay the application of this patch until
the related bug is identified and fixed? Otherwise we would be fixing a dmesg
error message "[dpu error]connector not connected 3" that appears to do no harm
but thereby break more critical user visible behavior.

Best regards
Leonard

On 8/2/24 15:47, Dmitry Baryshkov wrote:
> During suspend/resume process all connectors are explicitly disabled and
> then reenabled. However resume fails because of the connector_status check:
> 
> [ 1185.831970] [dpu error]connector not connected 3
> 
> It doesn't make sense to check for the Writeback connected status (and
> other drivers don't perform such check), so drop the check.
> 
> Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c")
> Cc: stable@vger.kernel.org
> Reported-by: Leonard Lausen <leonard@lausen.nl>
> Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57
> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> ---
>  drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
> index 16f144cbc0c9..8ff496082902 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
> @@ -42,9 +42,6 @@ static int dpu_wb_conn_atomic_check(struct drm_connector *connector,
>  	if (!conn_state || !conn_state->connector) {
>  		DPU_ERROR("invalid connector state\n");
>  		return -EINVAL;
> -	} else if (conn_state->connector->status != connector_status_connected) {
> -		DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
> -		return -EINVAL;
>  	}
>  
>  	crtc = conn_state->crtc;
>
Abhinav Kumar Aug. 5, 2024, 7:19 p.m. UTC | #2
On 8/2/2024 12:47 PM, Dmitry Baryshkov wrote:
> During suspend/resume process all connectors are explicitly disabled and
> then reenabled. However resume fails because of the connector_status check:
> 
> [ 1185.831970] [dpu error]connector not connected 3
> 
> It doesn't make sense to check for the Writeback connected status (and
> other drivers don't perform such check), so drop the check.
> 
> Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c")
> Cc: stable@vger.kernel.org
> Reported-by: Leonard Lausen <leonard@lausen.nl>
> Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57
> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> ---
>   drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 ---
>   1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
> index 16f144cbc0c9..8ff496082902 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
> @@ -42,9 +42,6 @@ static int dpu_wb_conn_atomic_check(struct drm_connector *connector,
>   	if (!conn_state || !conn_state->connector) {
>   		DPU_ERROR("invalid connector state\n");
>   		return -EINVAL;
> -	} else if (conn_state->connector->status != connector_status_connected) {
> -		DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
> -		return -EINVAL;
>   	}

For this issue, do we hit the connector->force = DRM_FORCE_OFF path?

Because otherwise, writeback does not implement .detect() callback today 
so its always connected.

But if that was the case how come this error is only for writeback. Even 
DP has the same connected check in atomic_check()

Change seems fine with me because ideally this seems like a no-op to me 
because writeback connector is assumed to be always connected but the 
issue is missing some details here.

>   
>   	crtc = conn_state->crtc;
>
Dmitry Baryshkov Aug. 7, 2024, 10:44 a.m. UTC | #3
On August 5, 2024 9:27:39 AM GMT+07:00, Leonard Lausen <leonard@lausen.nl> wrote:
>Dear Dmitry,
>
>Thank you for the patch. Unfortunately, the patch triggers a regression with
>respect to DRM CRTC state handling. With the patch applied, suspending and
>resuming a lazor sc7180 with external display connected, looses CRTC state on
>resume and prevents applying a new CRTC state. Without the patch, CRTC state is
>preserved across suspend and resume and it remains possible to change CRTC
>settings after resume. This means the patch regresses the user experience,
>preventing "Night Light" mode to work as expected. I've validated this on
>v6.10.2 vs. v6.10.2 with this patch applied.
>

Could you please clarify, I was under the impression that currently whole suspend/resume is broken, so it's more than a dmesg message.

>While the cause for the bug uncovered by this change is likely separate, given
>it's impact, would it be prudent to delay the application of this patch until
>the related bug is identified and fixed? Otherwise we would be fixing a dmesg
>error message "[dpu error]connector not connected 3" that appears to do no harm
>but thereby break more critical user visible behavior.
>
>Best regards
>Leonard
>
>On 8/2/24 15:47, Dmitry Baryshkov wrote:
>> During suspend/resume process all connectors are explicitly disabled and
>> then reenabled. However resume fails because of the connector_status check:
>> 
>> [ 1185.831970] [dpu error]connector not connected 3
>> 
>> It doesn't make sense to check for the Writeback connected status (and
>> other drivers don't perform such check), so drop the check.
>> 
>> Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c")
>> Cc: stable@vger.kernel.org
>> Reported-by: Leonard Lausen <leonard@lausen.nl>
>> Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57
>> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
>> ---
>>  drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 ---
>>  1 file changed, 3 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
>> index 16f144cbc0c9..8ff496082902 100644
>> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
>> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
>> @@ -42,9 +42,6 @@ static int dpu_wb_conn_atomic_check(struct drm_connector *connector,
>>  	if (!conn_state || !conn_state->connector) {
>>  		DPU_ERROR("invalid connector state\n");
>>  		return -EINVAL;
>> -	} else if (conn_state->connector->status != connector_status_connected) {
>> -		DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
>> -		return -EINVAL;
>>  	}
>>  
>>  	crtc = conn_state->crtc;
>> 
>
Dmitry Baryshkov Aug. 7, 2024, 10:46 a.m. UTC | #4
On August 6, 2024 2:19:46 AM GMT+07:00, Abhinav Kumar <quic_abhinavk@quicinc.com> wrote:
>
>
>On 8/2/2024 12:47 PM, Dmitry Baryshkov wrote:
>> During suspend/resume process all connectors are explicitly disabled and
>> then reenabled. However resume fails because of the connector_status check:
>> 
>> [ 1185.831970] [dpu error]connector not connected 3
>> 
>> It doesn't make sense to check for the Writeback connected status (and
>> other drivers don't perform such check), so drop the check.
>> 
>> Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c")
>> Cc: stable@vger.kernel.org
>> Reported-by: Leonard Lausen <leonard@lausen.nl>
>> Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57
>> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
>> ---
>>   drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 ---
>>   1 file changed, 3 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
>> index 16f144cbc0c9..8ff496082902 100644
>> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
>> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
>> @@ -42,9 +42,6 @@ static int dpu_wb_conn_atomic_check(struct drm_connector *connector,
>>   	if (!conn_state || !conn_state->connector) {
>>   		DPU_ERROR("invalid connector state\n");
>>   		return -EINVAL;
>> -	} else if (conn_state->connector->status != connector_status_connected) {
>> -		DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
>> -		return -EINVAL;
>>   	}
>
>For this issue, do we hit the connector->force = DRM_FORCE_OFF path?

It was hit during the suspend/resume, so yes, it is a forced off, but by the different means.

>
>Because otherwise, writeback does not implement .detect() callback today so its always connected.

It is undefined/unkown (3), not connected (1)

>
>But if that was the case how come this error is only for writeback. Even DP has the same connected check in atomic_check()
>
>Change seems fine with me because ideally this seems like a no-op to me because writeback connector is assumed to be always connected but the issue is missing some details here.
>
>>     	crtc = conn_state->crtc;
>>
Leonard Lausen Aug. 7, 2024, 3:41 p.m. UTC | #5
On 8/7/24 06:44, Dmitry Baryshkov wrote:> Could you please clarify, I was under the impression that currently whole suspend/resume is broken, so it's more than a dmesg message.

71174f362d67 specifically, or v6.9 more broadly regress in that we get "[dpu
error]connector not connected 3" and "[drm:drm_mode_config_helper_resume]
*ERROR* Failed to resume (-22)" if suspending and resuming the system while
external display is connected over USB-C DP. Suspend and resume itself
still works, and the external display also works after the resume, albeit
perhaps with a small delay due to the dpu error. This is also mentioned in
the issue description of https://gitlab.freedesktop.org/drm/msm/-/issues/57.
So while suspend/resume isn't fully broken, the error is still unexpected and
I thus bisected and identified 71174f362d67 as the first commit to trigger it.
While your patch avoids the dpu/drm error, it triggers issue with the CRTC state,
breaking the CRTC functionality after resume.

Might we be facing a race condition here, which is accidentally exposed by
71174f362d67 but requires a separate fix?
György Kurucz Aug. 30, 2024, 5:36 p.m. UTC | #6
Dear Dmitry,

For context, I have a Lenovo Yoga Slim 7x laptop, and was having issues 
with the display staying black after sleep. As a workaround, I could 
switch to a different VT and back.

> [ 1185.831970] [dpu error]connector not connected 3

I can confirm that I was seeing this exact error message as well.

>   	if (!conn_state || !conn_state->connector) {
>   		DPU_ERROR("invalid connector state\n");
>   		return -EINVAL;
> -	} else if (conn_state->connector->status != connector_status_connected) {
> -		DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
> -		return -EINVAL;
>   	}
>   
>   	crtc = conn_state->crtc;

After applying this patch, the screen now resumes successfully, and the 
errors are gone.

(For future reference, I am using this kernel: 
https://github.com/jhovold/linux/commits/wip/x1e80100-6.11-rc5/, commit 
cc2cb95cc77fec43edd407c993122085fa8c2dd7.)

Tested-by: György Kurucz <me@kuruczgy.com>

Best regards,
György
Leonard Lausen Aug. 31, 2024, 2:53 p.m. UTC | #7
Dear György,

On 8/30/24 13:36, György Kurucz wrote:
> For context, I have a Lenovo Yoga Slim 7x laptop, and was having issues with the display staying black after sleep. As a workaround, I could switch to a different VT and back.

Do you observe this issue on every suspend-resume cycle? On sc7180 trogdor lazor, I do observe the same "display staying black after sleep" in some of the suspend-resume cycles. I have not been able to observe a pattern with respect to when the issue occurs and when it does not. "[drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-22)" is printed in either case. Switching to a different VT and back works to restore display functionality after those suspend-resume cycles that experience the issue.

On sc7180 lazor, I do observe that this patch deterministically breaks restoring the CRTC state and functionality after resume. Can you please validate if you observe the same on Lenovo Yoga Slim 7x? Specifically, try set Night Light in your desktop environment to "Always On" and observe whether the screen remains in "Night Light" mode after resume. For lazor, "Night Light" is breaks after applying this patch and even manually toggling it off and on after resume does not restore "Night Light" / CRTC functionality.

Best regards
Leonard
Leonard Lausen Aug. 31, 2024, 3:47 p.m. UTC | #8
(Resend as there was an email SPF record issue)

Dear György,

On 8/30/24 13:36, György Kurucz wrote:
> For context, I have a Lenovo Yoga Slim 7x laptop, and was having issues with the display staying black after sleep. As a workaround, I could switch to a different VT and back.

Do you observe this issue on every suspend-resume cycle? On sc7180 trogdor lazor, I do observe the same "display staying black after sleep" in some of the suspend-resume cycles. I have not been able to observe a pattern with respect to when the issue occurs and when it does not. "[drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-22)" is printed in either case. Switching to a different VT and back works to restore display functionality after those suspend-resume cycles that experience the issue.

On sc7180 lazor, I do observe that this patch deterministically breaks restoring the CRTC state and functionality after resume. Can you please validate if you observe the same on Lenovo Yoga Slim 7x? Specifically, try set Night Light in your desktop environment to "Always On" and observe whether the screen remains in "Night Light" mode after resume. For lazor, "Night Light" is breaks after applying this patch and even manually toggling it off and on after resume does not restore "Night Light" / CRTC functionality.

Best regards
Leonard
György Kurucz Aug. 31, 2024, 6:46 p.m. UTC | #9
Dear Leonard,

> Do you observe this issue on every suspend-resume cycle?

I just did 10 suspend/resume cycles in a row to double check, and 
without this patch the screen never comes back (always have to switch VT 
back-and-forth to bring it back). The

[dpu error]connector not connected 3
[drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-22)

pair of error messages also consistently appears after all resumes.

Though I think e.g. Rob Clark reported that suspend/resume already works 
properly for him without this patch, so this experience is not universal 
on the Yoga Slim 7x.

> On sc7180 lazor, I do observe that this patch deterministically breaks restoring the CRTC state and functionality after resume. Can you please validate if you observe the same on Lenovo Yoga Slim 7x? Specifically, try set Night Light in your desktop environment to "Always On" and observe whether the screen remains in "Night Light" mode after resume. For lazor, "Night Light" is breaks after applying this patch and even manually toggling it off and on after resume does not restore "Night Light" / CRTC functionality.

Unfortunately I cannot test this, as color temperature adjustments seems 
to be completely non-functional for me in the first place. For color 
temperature adjustment, I use gammastep on my machines, which uses 
wlr_gamma_control_unstable_v1 under the hood. It outputs the following 
warnings:

Warning: Zero outputs support gamma adjustment.
Warning: 1/1 output(s) do not support gamma adjustment.

I haven't dug deeper into the cause yet, based on these it seems that 
wlroots isn't detecting the display as being gamma-adjustable in the 
first place.

Best regards,
György
Leonard Lausen Aug. 31, 2024, 7 p.m. UTC | #10
Dear György,

>> Do you observe this issue on every suspend-resume cycle?
> 
> I just did 10 suspend/resume cycles in a row to double check, and without this patch the screen never comes back (always have to switch VT back-and-forth to bring it back). The
> 
> [dpu error]connector not connected 3
> [drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-22)
> 
> pair of error messages also consistently appears after all resumes.
> 
> Though I think e.g. Rob Clark reported that suspend/resume already works properly for him without this patch, so this experience is not universal on the Yoga Slim 7x.

Ack. Do you mean that Rob Clark also uses Yoga Slim 7x but does not face the "screen never comes back (always have to switch VT back-and-forth to bring it back)" issue?

>> On sc7180 lazor, I do observe that this patch deterministically breaks restoring the CRTC state and functionality after resume. Can you please validate if you observe the same on Lenovo Yoga Slim 7x? Specifically, try set Night Light in your desktop environment to "Always On" and observe whether the screen remains in "Night Light" mode after resume. For lazor, "Night Light" is breaks after applying this patch and even manually toggling it off and on after resume does not restore "Night Light" / CRTC functionality.
> 
> Unfortunately I cannot test this, as color temperature adjustments seems to be completely non-functional for me in the first place. For color temperature adjustment, I use gammastep on my machines, which uses wlr_gamma_control_unstable_v1 under the hood. It outputs the following warnings:
> 
> Warning: Zero outputs support gamma adjustment.
> Warning: 1/1 output(s) do not support gamma adjustment.
> 
> I haven't dug deeper into the cause yet, based on these it seems that wlroots isn't detecting the display as being gamma-adjustable in the first place.

The cause is simple: Qualcomm SoCs don't implement GAMMA_LUT support. Your desktop environment needs to use Color Transform Matrix (CTM) on ARM/QCom devices. You can refer to https://bugs.kde.org/show_bug.cgi?id=455720 for further details. It would be great if you can validate whether this patch breaks CRTC state (which includes the CTM state) on Yoga Slim 7x, or whether that is specific to the trogdor lazor (Chromebook Acer Spin 513), though it may require you to install KDE. Gnome does not support CTM yet (https://gitlab.gnome.org/GNOME/mutter/-/issues/2318).

Best regards
Leonard
György Kurucz Aug. 31, 2024, 9:50 p.m. UTC | #11
Dear Leonard,

I installed KDE. First, I ran it with the my regular kernel without this 
patch. The first interesting thing I notice is that the screen *does* 
come back after resume. (The error messages are still present though.)

> Ack. Do you mean that Rob Clark also uses Yoga Slim 7x but does not face the "screen never comes back (always have to switch VT back-and-forth to bring it back)" issue?

Yes, at least that's what I gathered from our conversations on IRC. But 
with the above in mind, I now suspect that this comes down to desktop 
environment differences.

> It would be great if you can validate whether this patch breaks CRTC state (which includes the CTM state) on Yoga Slim 7x, or whether that is specific to the trogdor lazor (Chromebook Acer Spin 513), though it may require you to install KDE.

Well "Night Light" seems to be even more broken under KDE. I went into 
System Settings, set it to "Always on night light", and tried to adjust 
the temperature slider. While adjusting the slider, the screen goes 
black, and only comes back after a few seconds. The color temperature 
does not change, no matter what I change the slider to. Afterwards I 
tried with this patch as well, but it produces the exact same behavior.

Best regards,
György
Johan Hovold Nov. 19, 2024, 1:52 p.m. UTC | #12
On Fri, Aug 30, 2024 at 07:36:32PM +0200, György Kurucz wrote:

> For context, I have a Lenovo Yoga Slim 7x laptop, and was having issues 
> with the display staying black after sleep. As a workaround, I could 
> switch to a different VT and back.
> 
> > [ 1185.831970] [dpu error]connector not connected 3
> 
> I can confirm that I was seeing this exact error message as well.
> 
> >   	if (!conn_state || !conn_state->connector) {
> >   		DPU_ERROR("invalid connector state\n");
> >   		return -EINVAL;
> > -	} else if (conn_state->connector->status != connector_status_connected) {
> > -		DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
> > -		return -EINVAL;
> >   	}
> >   
> >   	crtc = conn_state->crtc;
> 
> After applying this patch, the screen now resumes successfully, and the 
> errors are gone.

I'm seeing the same issue as György on the x1e80100 CRD and Lenovo
ThinkPad T14s. Without this patch, the internal display fails to resume
properly (switching VT brings it back) and the following errors are
logged:

	[dpu error]connector not connected 3
	[drm:drm_mode_config_helper_resume [drm_kms_helper]] *ERROR* Failed to resume (-22)

I see the same symptoms with Xorg as well as sway.

Can we please get this fixed and backported as soon as possible?

Even if there are further issues with some "Night Light" functionality
on one machine, keeping this bug as workaround does not seem warranted
given that it breaks basic functionality for users.

The x1e80100 is the only platform I have access to with a writeback
connector, but this regression potentially affects a whole host of older
platforms as well.

Johan
Johan Hovold Nov. 20, 2024, 8:32 a.m. UTC | #13
On Tue, Nov 19, 2024 at 10:02:33PM -0500, Leonard Lausen wrote:

> The finding is that while 6.10.14 with this patch applied still suffers from
> that regression, 6.11.9 and 6.12 do not face the CRTC state regression.
> Therefore, whatever issue the patch uncovered in older kernels and which
> justified not merging it before due to regressing basic CTM functionality, is
> now fixed. The patch should be good to merge and backport to 6.11, but from my
> perspective should not be backported to older kernels unless the interaction
> with the DRM CRTC state issue is understood and an associated fix backported as
> well.

Thanks for testing. The 6.9 and 6.10 stable trees are EOL and backporting
to 6.11 should not cause any trouble then.

Johan
diff mbox series

Patch

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
index 16f144cbc0c9..8ff496082902 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
@@ -42,9 +42,6 @@  static int dpu_wb_conn_atomic_check(struct drm_connector *connector,
 	if (!conn_state || !conn_state->connector) {
 		DPU_ERROR("invalid connector state\n");
 		return -EINVAL;
-	} else if (conn_state->connector->status != connector_status_connected) {
-		DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
-		return -EINVAL;
 	}
 
 	crtc = conn_state->crtc;