Message ID | 20240802-dpu-fix-wb-v2-1-7eac9eb8e895@linaro.org |
---|---|
State | New |
Headers | show |
Series | drm/msm/dpu: two fixes targeting 6.11 | expand |
Dear Dmitry, Thank you for the patch. Unfortunately, the patch triggers a regression with respect to DRM CRTC state handling. With the patch applied, suspending and resuming a lazor sc7180 with external display connected, looses CRTC state on resume and prevents applying a new CRTC state. Without the patch, CRTC state is preserved across suspend and resume and it remains possible to change CRTC settings after resume. This means the patch regresses the user experience, preventing "Night Light" mode to work as expected. I've validated this on v6.10.2 vs. v6.10.2 with this patch applied. While the cause for the bug uncovered by this change is likely separate, given it's impact, would it be prudent to delay the application of this patch until the related bug is identified and fixed? Otherwise we would be fixing a dmesg error message "[dpu error]connector not connected 3" that appears to do no harm but thereby break more critical user visible behavior. Best regards Leonard On 8/2/24 15:47, Dmitry Baryshkov wrote: > During suspend/resume process all connectors are explicitly disabled and > then reenabled. However resume fails because of the connector_status check: > > [ 1185.831970] [dpu error]connector not connected 3 > > It doesn't make sense to check for the Writeback connected status (and > other drivers don't perform such check), so drop the check. > > Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c") > Cc: stable@vger.kernel.org > Reported-by: Leonard Lausen <leonard@lausen.nl> > Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57 > Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> > --- > drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 --- > 1 file changed, 3 deletions(-) > > diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c > index 16f144cbc0c9..8ff496082902 100644 > --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c > +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c > @@ -42,9 +42,6 @@ static int dpu_wb_conn_atomic_check(struct drm_connector *connector, > if (!conn_state || !conn_state->connector) { > DPU_ERROR("invalid connector state\n"); > return -EINVAL; > - } else if (conn_state->connector->status != connector_status_connected) { > - DPU_ERROR("connector not connected %d\n", conn_state->connector->status); > - return -EINVAL; > } > > crtc = conn_state->crtc; >
On 8/2/2024 12:47 PM, Dmitry Baryshkov wrote: > During suspend/resume process all connectors are explicitly disabled and > then reenabled. However resume fails because of the connector_status check: > > [ 1185.831970] [dpu error]connector not connected 3 > > It doesn't make sense to check for the Writeback connected status (and > other drivers don't perform such check), so drop the check. > > Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c") > Cc: stable@vger.kernel.org > Reported-by: Leonard Lausen <leonard@lausen.nl> > Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57 > Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> > --- > drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 --- > 1 file changed, 3 deletions(-) > > diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c > index 16f144cbc0c9..8ff496082902 100644 > --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c > +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c > @@ -42,9 +42,6 @@ static int dpu_wb_conn_atomic_check(struct drm_connector *connector, > if (!conn_state || !conn_state->connector) { > DPU_ERROR("invalid connector state\n"); > return -EINVAL; > - } else if (conn_state->connector->status != connector_status_connected) { > - DPU_ERROR("connector not connected %d\n", conn_state->connector->status); > - return -EINVAL; > } For this issue, do we hit the connector->force = DRM_FORCE_OFF path? Because otherwise, writeback does not implement .detect() callback today so its always connected. But if that was the case how come this error is only for writeback. Even DP has the same connected check in atomic_check() Change seems fine with me because ideally this seems like a no-op to me because writeback connector is assumed to be always connected but the issue is missing some details here. > > crtc = conn_state->crtc; >
On August 5, 2024 9:27:39 AM GMT+07:00, Leonard Lausen <leonard@lausen.nl> wrote: >Dear Dmitry, > >Thank you for the patch. Unfortunately, the patch triggers a regression with >respect to DRM CRTC state handling. With the patch applied, suspending and >resuming a lazor sc7180 with external display connected, looses CRTC state on >resume and prevents applying a new CRTC state. Without the patch, CRTC state is >preserved across suspend and resume and it remains possible to change CRTC >settings after resume. This means the patch regresses the user experience, >preventing "Night Light" mode to work as expected. I've validated this on >v6.10.2 vs. v6.10.2 with this patch applied. > Could you please clarify, I was under the impression that currently whole suspend/resume is broken, so it's more than a dmesg message. >While the cause for the bug uncovered by this change is likely separate, given >it's impact, would it be prudent to delay the application of this patch until >the related bug is identified and fixed? Otherwise we would be fixing a dmesg >error message "[dpu error]connector not connected 3" that appears to do no harm >but thereby break more critical user visible behavior. > >Best regards >Leonard > >On 8/2/24 15:47, Dmitry Baryshkov wrote: >> During suspend/resume process all connectors are explicitly disabled and >> then reenabled. However resume fails because of the connector_status check: >> >> [ 1185.831970] [dpu error]connector not connected 3 >> >> It doesn't make sense to check for the Writeback connected status (and >> other drivers don't perform such check), so drop the check. >> >> Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c") >> Cc: stable@vger.kernel.org >> Reported-by: Leonard Lausen <leonard@lausen.nl> >> Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57 >> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> >> --- >> drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 --- >> 1 file changed, 3 deletions(-) >> >> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c >> index 16f144cbc0c9..8ff496082902 100644 >> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c >> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c >> @@ -42,9 +42,6 @@ static int dpu_wb_conn_atomic_check(struct drm_connector *connector, >> if (!conn_state || !conn_state->connector) { >> DPU_ERROR("invalid connector state\n"); >> return -EINVAL; >> - } else if (conn_state->connector->status != connector_status_connected) { >> - DPU_ERROR("connector not connected %d\n", conn_state->connector->status); >> - return -EINVAL; >> } >> >> crtc = conn_state->crtc; >> >
On August 6, 2024 2:19:46 AM GMT+07:00, Abhinav Kumar <quic_abhinavk@quicinc.com> wrote: > > >On 8/2/2024 12:47 PM, Dmitry Baryshkov wrote: >> During suspend/resume process all connectors are explicitly disabled and >> then reenabled. However resume fails because of the connector_status check: >> >> [ 1185.831970] [dpu error]connector not connected 3 >> >> It doesn't make sense to check for the Writeback connected status (and >> other drivers don't perform such check), so drop the check. >> >> Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c") >> Cc: stable@vger.kernel.org >> Reported-by: Leonard Lausen <leonard@lausen.nl> >> Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57 >> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> >> --- >> drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 --- >> 1 file changed, 3 deletions(-) >> >> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c >> index 16f144cbc0c9..8ff496082902 100644 >> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c >> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c >> @@ -42,9 +42,6 @@ static int dpu_wb_conn_atomic_check(struct drm_connector *connector, >> if (!conn_state || !conn_state->connector) { >> DPU_ERROR("invalid connector state\n"); >> return -EINVAL; >> - } else if (conn_state->connector->status != connector_status_connected) { >> - DPU_ERROR("connector not connected %d\n", conn_state->connector->status); >> - return -EINVAL; >> } > >For this issue, do we hit the connector->force = DRM_FORCE_OFF path? It was hit during the suspend/resume, so yes, it is a forced off, but by the different means. > >Because otherwise, writeback does not implement .detect() callback today so its always connected. It is undefined/unkown (3), not connected (1) > >But if that was the case how come this error is only for writeback. Even DP has the same connected check in atomic_check() > >Change seems fine with me because ideally this seems like a no-op to me because writeback connector is assumed to be always connected but the issue is missing some details here. > >> crtc = conn_state->crtc; >>
On 8/7/24 06:44, Dmitry Baryshkov wrote:> Could you please clarify, I was under the impression that currently whole suspend/resume is broken, so it's more than a dmesg message. 71174f362d67 specifically, or v6.9 more broadly regress in that we get "[dpu error]connector not connected 3" and "[drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-22)" if suspending and resuming the system while external display is connected over USB-C DP. Suspend and resume itself still works, and the external display also works after the resume, albeit perhaps with a small delay due to the dpu error. This is also mentioned in the issue description of https://gitlab.freedesktop.org/drm/msm/-/issues/57. So while suspend/resume isn't fully broken, the error is still unexpected and I thus bisected and identified 71174f362d67 as the first commit to trigger it. While your patch avoids the dpu/drm error, it triggers issue with the CRTC state, breaking the CRTC functionality after resume. Might we be facing a race condition here, which is accidentally exposed by 71174f362d67 but requires a separate fix?
Dear Dmitry, For context, I have a Lenovo Yoga Slim 7x laptop, and was having issues with the display staying black after sleep. As a workaround, I could switch to a different VT and back. > [ 1185.831970] [dpu error]connector not connected 3 I can confirm that I was seeing this exact error message as well. > if (!conn_state || !conn_state->connector) { > DPU_ERROR("invalid connector state\n"); > return -EINVAL; > - } else if (conn_state->connector->status != connector_status_connected) { > - DPU_ERROR("connector not connected %d\n", conn_state->connector->status); > - return -EINVAL; > } > > crtc = conn_state->crtc; After applying this patch, the screen now resumes successfully, and the errors are gone. (For future reference, I am using this kernel: https://github.com/jhovold/linux/commits/wip/x1e80100-6.11-rc5/, commit cc2cb95cc77fec43edd407c993122085fa8c2dd7.) Tested-by: György Kurucz <me@kuruczgy.com> Best regards, György
Dear György,
On 8/30/24 13:36, György Kurucz wrote:
> For context, I have a Lenovo Yoga Slim 7x laptop, and was having issues with the display staying black after sleep. As a workaround, I could switch to a different VT and back.
Do you observe this issue on every suspend-resume cycle? On sc7180 trogdor lazor, I do observe the same "display staying black after sleep" in some of the suspend-resume cycles. I have not been able to observe a pattern with respect to when the issue occurs and when it does not. "[drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-22)" is printed in either case. Switching to a different VT and back works to restore display functionality after those suspend-resume cycles that experience the issue.
On sc7180 lazor, I do observe that this patch deterministically breaks restoring the CRTC state and functionality after resume. Can you please validate if you observe the same on Lenovo Yoga Slim 7x? Specifically, try set Night Light in your desktop environment to "Always On" and observe whether the screen remains in "Night Light" mode after resume. For lazor, "Night Light" is breaks after applying this patch and even manually toggling it off and on after resume does not restore "Night Light" / CRTC functionality.
Best regards
Leonard
(Resend as there was an email SPF record issue)
Dear György,
On 8/30/24 13:36, György Kurucz wrote:
> For context, I have a Lenovo Yoga Slim 7x laptop, and was having issues with the display staying black after sleep. As a workaround, I could switch to a different VT and back.
Do you observe this issue on every suspend-resume cycle? On sc7180 trogdor lazor, I do observe the same "display staying black after sleep" in some of the suspend-resume cycles. I have not been able to observe a pattern with respect to when the issue occurs and when it does not. "[drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-22)" is printed in either case. Switching to a different VT and back works to restore display functionality after those suspend-resume cycles that experience the issue.
On sc7180 lazor, I do observe that this patch deterministically breaks restoring the CRTC state and functionality after resume. Can you please validate if you observe the same on Lenovo Yoga Slim 7x? Specifically, try set Night Light in your desktop environment to "Always On" and observe whether the screen remains in "Night Light" mode after resume. For lazor, "Night Light" is breaks after applying this patch and even manually toggling it off and on after resume does not restore "Night Light" / CRTC functionality.
Best regards
Leonard
Dear Leonard, > Do you observe this issue on every suspend-resume cycle? I just did 10 suspend/resume cycles in a row to double check, and without this patch the screen never comes back (always have to switch VT back-and-forth to bring it back). The [dpu error]connector not connected 3 [drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-22) pair of error messages also consistently appears after all resumes. Though I think e.g. Rob Clark reported that suspend/resume already works properly for him without this patch, so this experience is not universal on the Yoga Slim 7x. > On sc7180 lazor, I do observe that this patch deterministically breaks restoring the CRTC state and functionality after resume. Can you please validate if you observe the same on Lenovo Yoga Slim 7x? Specifically, try set Night Light in your desktop environment to "Always On" and observe whether the screen remains in "Night Light" mode after resume. For lazor, "Night Light" is breaks after applying this patch and even manually toggling it off and on after resume does not restore "Night Light" / CRTC functionality. Unfortunately I cannot test this, as color temperature adjustments seems to be completely non-functional for me in the first place. For color temperature adjustment, I use gammastep on my machines, which uses wlr_gamma_control_unstable_v1 under the hood. It outputs the following warnings: Warning: Zero outputs support gamma adjustment. Warning: 1/1 output(s) do not support gamma adjustment. I haven't dug deeper into the cause yet, based on these it seems that wlroots isn't detecting the display as being gamma-adjustable in the first place. Best regards, György
Dear György, >> Do you observe this issue on every suspend-resume cycle? > > I just did 10 suspend/resume cycles in a row to double check, and without this patch the screen never comes back (always have to switch VT back-and-forth to bring it back). The > > [dpu error]connector not connected 3 > [drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-22) > > pair of error messages also consistently appears after all resumes. > > Though I think e.g. Rob Clark reported that suspend/resume already works properly for him without this patch, so this experience is not universal on the Yoga Slim 7x. Ack. Do you mean that Rob Clark also uses Yoga Slim 7x but does not face the "screen never comes back (always have to switch VT back-and-forth to bring it back)" issue? >> On sc7180 lazor, I do observe that this patch deterministically breaks restoring the CRTC state and functionality after resume. Can you please validate if you observe the same on Lenovo Yoga Slim 7x? Specifically, try set Night Light in your desktop environment to "Always On" and observe whether the screen remains in "Night Light" mode after resume. For lazor, "Night Light" is breaks after applying this patch and even manually toggling it off and on after resume does not restore "Night Light" / CRTC functionality. > > Unfortunately I cannot test this, as color temperature adjustments seems to be completely non-functional for me in the first place. For color temperature adjustment, I use gammastep on my machines, which uses wlr_gamma_control_unstable_v1 under the hood. It outputs the following warnings: > > Warning: Zero outputs support gamma adjustment. > Warning: 1/1 output(s) do not support gamma adjustment. > > I haven't dug deeper into the cause yet, based on these it seems that wlroots isn't detecting the display as being gamma-adjustable in the first place. The cause is simple: Qualcomm SoCs don't implement GAMMA_LUT support. Your desktop environment needs to use Color Transform Matrix (CTM) on ARM/QCom devices. You can refer to https://bugs.kde.org/show_bug.cgi?id=455720 for further details. It would be great if you can validate whether this patch breaks CRTC state (which includes the CTM state) on Yoga Slim 7x, or whether that is specific to the trogdor lazor (Chromebook Acer Spin 513), though it may require you to install KDE. Gnome does not support CTM yet (https://gitlab.gnome.org/GNOME/mutter/-/issues/2318). Best regards Leonard
Dear Leonard, I installed KDE. First, I ran it with the my regular kernel without this patch. The first interesting thing I notice is that the screen *does* come back after resume. (The error messages are still present though.) > Ack. Do you mean that Rob Clark also uses Yoga Slim 7x but does not face the "screen never comes back (always have to switch VT back-and-forth to bring it back)" issue? Yes, at least that's what I gathered from our conversations on IRC. But with the above in mind, I now suspect that this comes down to desktop environment differences. > It would be great if you can validate whether this patch breaks CRTC state (which includes the CTM state) on Yoga Slim 7x, or whether that is specific to the trogdor lazor (Chromebook Acer Spin 513), though it may require you to install KDE. Well "Night Light" seems to be even more broken under KDE. I went into System Settings, set it to "Always on night light", and tried to adjust the temperature slider. While adjusting the slider, the screen goes black, and only comes back after a few seconds. The color temperature does not change, no matter what I change the slider to. Afterwards I tried with this patch as well, but it produces the exact same behavior. Best regards, György
On Fri, Aug 30, 2024 at 07:36:32PM +0200, György Kurucz wrote: > For context, I have a Lenovo Yoga Slim 7x laptop, and was having issues > with the display staying black after sleep. As a workaround, I could > switch to a different VT and back. > > > [ 1185.831970] [dpu error]connector not connected 3 > > I can confirm that I was seeing this exact error message as well. > > > if (!conn_state || !conn_state->connector) { > > DPU_ERROR("invalid connector state\n"); > > return -EINVAL; > > - } else if (conn_state->connector->status != connector_status_connected) { > > - DPU_ERROR("connector not connected %d\n", conn_state->connector->status); > > - return -EINVAL; > > } > > > > crtc = conn_state->crtc; > > After applying this patch, the screen now resumes successfully, and the > errors are gone. I'm seeing the same issue as György on the x1e80100 CRD and Lenovo ThinkPad T14s. Without this patch, the internal display fails to resume properly (switching VT brings it back) and the following errors are logged: [dpu error]connector not connected 3 [drm:drm_mode_config_helper_resume [drm_kms_helper]] *ERROR* Failed to resume (-22) I see the same symptoms with Xorg as well as sway. Can we please get this fixed and backported as soon as possible? Even if there are further issues with some "Night Light" functionality on one machine, keeping this bug as workaround does not seem warranted given that it breaks basic functionality for users. The x1e80100 is the only platform I have access to with a writeback connector, but this regression potentially affects a whole host of older platforms as well. Johan
On Tue, Nov 19, 2024 at 10:02:33PM -0500, Leonard Lausen wrote: > The finding is that while 6.10.14 with this patch applied still suffers from > that regression, 6.11.9 and 6.12 do not face the CRTC state regression. > Therefore, whatever issue the patch uncovered in older kernels and which > justified not merging it before due to regressing basic CTM functionality, is > now fixed. The patch should be good to merge and backport to 6.11, but from my > perspective should not be backported to older kernels unless the interaction > with the DRM CRTC state issue is understood and an associated fix backported as > well. Thanks for testing. The 6.9 and 6.10 stable trees are EOL and backporting to 6.11 should not cause any trouble then. Johan
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c index 16f144cbc0c9..8ff496082902 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c @@ -42,9 +42,6 @@ static int dpu_wb_conn_atomic_check(struct drm_connector *connector, if (!conn_state || !conn_state->connector) { DPU_ERROR("invalid connector state\n"); return -EINVAL; - } else if (conn_state->connector->status != connector_status_connected) { - DPU_ERROR("connector not connected %d\n", conn_state->connector->status); - return -EINVAL; } crtc = conn_state->crtc;
During suspend/resume process all connectors are explicitly disabled and then reenabled. However resume fails because of the connector_status check: [ 1185.831970] [dpu error]connector not connected 3 It doesn't make sense to check for the Writeback connected status (and other drivers don't perform such check), so drop the check. Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c") Cc: stable@vger.kernel.org Reported-by: Leonard Lausen <leonard@lausen.nl> Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57 Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> --- drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 --- 1 file changed, 3 deletions(-)