From patchwork Thu Jun 4 08:12:11 2020
From: Daniel Vetter
To: DRI Development
Cc: Intel Graphics Development, LKML, linux-rdma@vger.kernel.org,
    amd-gfx@lists.freedesktop.org, linux-media@vger.kernel.org,
    linaro-mm-sig@lists.linaro.org, Chris Wilson, Maarten Lankhorst,
    Christian König, Rodrigo Siqueira, Haneen Mohammed, Daniel Vetter
Subject: [PATCH 05/18] drm/vkms: Annotate vblank timer
Date: Thu, 4 Jun 2020 10:12:11 +0200
Message-Id: <20200604081224.863494-6-daniel.vetter@ffwll.ch>
In-Reply-To: <20200604081224.863494-1-daniel.vetter@ffwll.ch>
References: <20200604081224.863494-1-daniel.vetter@ffwll.ch>

This is needed to signal the fences from page flips, so annotate it
accordingly. We need to annotate the entire timer callback, since if we
get stuck anywhere in there the timer stops, and hence the fences stop.
Just annotating the top part that does the vblank handling isn't enough.

Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: linux-rdma@vger.kernel.org
Cc: amd-gfx@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Cc: Chris Wilson
Cc: Maarten Lankhorst
Cc: Christian König
Signed-off-by: Daniel Vetter
Cc: Rodrigo Siqueira
Cc: Haneen Mohammed
Cc: Daniel Vetter
---
 drivers/gpu/drm/vkms/vkms_crtc.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c
index ac85e17428f8..a53a40848a72 100644
--- a/drivers/gpu/drm/vkms/vkms_crtc.c
+++ b/drivers/gpu/drm/vkms/vkms_crtc.c
@@ -1,5 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0+
 
+#include <linux/dma-fence.h>
+
 #include
 #include
 #include
@@ -14,7 +16,9 @@ static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer)
         struct drm_crtc *crtc = &output->crtc;
         struct vkms_crtc_state *state;
         u64 ret_overrun;
-        bool ret;
+        bool ret, fence_cookie;
+
+        fence_cookie = dma_fence_begin_signalling();
 
         ret_overrun = hrtimer_forward_now(&output->vblank_hrtimer,
                                           output->period_ns);
@@ -49,6 +53,8 @@ static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer)
                 DRM_DEBUG_DRIVER("Composer worker already queued\n");
         }
 
+        dma_fence_end_signalling(fence_cookie);
+
         return HRTIMER_RESTART;
 }
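For context, dma_fence_begin_signalling()/dma_fence_end_signalling() are the
lockdep annotations this series is built around: they mark a section of code
that must complete for a dma_fence to get signalled, so lockdep can complain
about anything inside that section which could itself block on fence
completion. A minimal, illustrative usage sketch (not taken from any driver;
the helper name at the end is hypothetical):

        bool fence_cookie;

        fence_cookie = dma_fence_begin_signalling();

        /*
         * Everything in here is on the critical path to signalling a
         * dma_fence, so it must not wait on fences, directly or indirectly:
         * no dma_fence_wait(), no dma_resv_lock(), no allocations that can
         * recurse into reclaim.
         */
        do_work_that_leads_to_dma_fence_signal();       /* hypothetical */

        dma_fence_end_signalling(fence_cookie);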
From patchwork Thu Jun 4 08:12:13 2020
From: Daniel Vetter
To: DRI Development
Cc: Intel Graphics Development, LKML, linux-rdma@vger.kernel.org,
    amd-gfx@lists.freedesktop.org, linux-media@vger.kernel.org,
    linaro-mm-sig@lists.linaro.org, Chris Wilson, Maarten Lankhorst,
    Christian König, Daniel Vetter
Subject: [PATCH 07/18] drm/atomic-helper: Add dma-fence annotations
Date: Thu, 4 Jun 2020 10:12:13 +0200
Message-Id: <20200604081224.863494-8-daniel.vetter@ffwll.ch>
In-Reply-To: <20200604081224.863494-1-daniel.vetter@ffwll.ch>
References: <20200604081224.863494-1-daniel.vetter@ffwll.ch>

This is a bit disappointing, since we need to split the annotations over
all the different parts.

I was considering just leaking the critical section into the
->atomic_commit_tail callback of each driver. But that would mean we need
to pass the fence_cookie into each driver (there's a total of 13
implementations of this hook right now), so a bad flag day, and also a bit
of a leaky abstraction.

Hence just do it function-by-function.
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: linux-rdma@vger.kernel.org
Cc: amd-gfx@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Cc: Chris Wilson
Cc: Maarten Lankhorst
Cc: Christian König
Signed-off-by: Daniel Vetter
---
 drivers/gpu/drm/drm_atomic_helper.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index 7cd7fe0d57b4..bfcc7857a9a1 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1549,6 +1549,7 @@ EXPORT_SYMBOL(drm_atomic_helper_wait_for_flip_done);
 void drm_atomic_helper_commit_tail(struct drm_atomic_state *old_state)
 {
         struct drm_device *dev = old_state->dev;
+        bool fence_cookie = dma_fence_begin_signalling();
 
         drm_atomic_helper_commit_modeset_disables(dev, old_state);
 
@@ -1560,6 +1561,8 @@ void drm_atomic_helper_commit_tail(struct drm_atomic_state *old_state)
 
         drm_atomic_helper_commit_hw_done(old_state);
 
+        dma_fence_end_signalling(fence_cookie);
+
         drm_atomic_helper_wait_for_vblanks(dev, old_state);
 
         drm_atomic_helper_cleanup_planes(dev, old_state);
@@ -1579,6 +1582,7 @@ EXPORT_SYMBOL(drm_atomic_helper_commit_tail);
 void drm_atomic_helper_commit_tail_rpm(struct drm_atomic_state *old_state)
 {
         struct drm_device *dev = old_state->dev;
+        bool fence_cookie = dma_fence_begin_signalling();
 
         drm_atomic_helper_commit_modeset_disables(dev, old_state);
 
@@ -1591,6 +1595,8 @@ void drm_atomic_helper_commit_tail_rpm(struct drm_atomic_state *old_state)
 
         drm_atomic_helper_commit_hw_done(old_state);
 
+        dma_fence_end_signalling(fence_cookie);
+
         drm_atomic_helper_wait_for_vblanks(dev, old_state);
 
         drm_atomic_helper_cleanup_planes(dev, old_state);
@@ -1606,6 +1612,9 @@ static void commit_tail(struct drm_atomic_state *old_state)
         ktime_t start;
         s64 commit_time_ms;
         unsigned int i, new_self_refresh_mask = 0;
+        bool fence_cookie;
+
+        fence_cookie = dma_fence_begin_signalling();
 
         funcs = dev->mode_config.helper_private;
 
@@ -1634,6 +1643,8 @@ static void commit_tail(struct drm_atomic_state *old_state)
                 if (new_crtc_state->self_refresh_active)
                         new_self_refresh_mask |= BIT(i);
 
+        dma_fence_end_signalling(fence_cookie);
+
         if (funcs && funcs->atomic_commit_tail)
                 funcs->atomic_commit_tail(old_state);
         else
@@ -1789,6 +1800,7 @@ int drm_atomic_helper_commit(struct drm_device *dev,
                              bool nonblock)
 {
         int ret;
+        bool fence_cookie;
 
         if (state->async_update) {
                 ret = drm_atomic_helper_prepare_planes(dev, state);
@@ -1811,6 +1823,8 @@ int drm_atomic_helper_commit(struct drm_device *dev,
         if (ret)
                 return ret;
 
+        fence_cookie = dma_fence_begin_signalling();
+
         if (!nonblock) {
                 ret = drm_atomic_helper_wait_for_fences(dev, state, true);
                 if (ret)
@@ -1848,6 +1862,7 @@ int drm_atomic_helper_commit(struct drm_device *dev,
          */
         drm_atomic_state_get(state);
+        dma_fence_end_signalling(fence_cookie);
         if (nonblock)
                 queue_work(system_unbound_wq, &state->commit_work);
         else
@@ -1856,6 +1871,7 @@ int drm_atomic_helper_commit(struct drm_device *dev,
         return 0;
 
 err:
+        dma_fence_end_signalling(fence_cookie);
         drm_atomic_helper_cleanup_planes(dev, state);
 
         return ret;
 }
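To make the placement easier to follow than in diff form, here is a
consolidated sketch of drm_atomic_helper_commit_tail() with the annotations
applied. The comments are my reading of why the critical section ends where
it does, and the intermediate plane/modeset-enable helper calls are elided:

void drm_atomic_helper_commit_tail(struct drm_atomic_state *old_state)
{
        struct drm_device *dev = old_state->dev;
        bool fence_cookie = dma_fence_begin_signalling();

        drm_atomic_helper_commit_modeset_disables(dev, old_state);

        /* ... plane commits and modeset enables elided ... */

        /*
         * After commit_hw_done() the commit's fences and events are allowed
         * to signal, so this is where the critical section has to end.
         */
        drm_atomic_helper_commit_hw_done(old_state);

        dma_fence_end_signalling(fence_cookie);

        /*
         * Waiting for vblanks and cleaning up planes may block, which is
         * fine once signalling no longer depends on this code.
         */
        drm_atomic_helper_wait_for_vblanks(dev, old_state);
        drm_atomic_helper_cleanup_planes(dev, old_state);
}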
From patchwork Thu Jun 4 08:12:15 2020
From: Daniel Vetter
To: DRI Development
Cc: Intel Graphics Development, LKML, linux-rdma@vger.kernel.org,
    amd-gfx@lists.freedesktop.org, linux-media@vger.kernel.org,
    linaro-mm-sig@lists.linaro.org, Chris Wilson, Maarten Lankhorst,
    Christian König, Daniel Vetter
Subject: [PATCH 09/18] drm/scheduler: use dma-fence annotations in main thread
Date: Thu, 4 Jun 2020 10:12:15 +0200
Message-Id: <20200604081224.863494-10-daniel.vetter@ffwll.ch>
In-Reply-To: <20200604081224.863494-1-daniel.vetter@ffwll.ch>
References: <20200604081224.863494-1-daniel.vetter@ffwll.ch>
If the scheduler RT thread gets stuck on a mutex that we're holding while
waiting for GPU workloads to complete, we have a problem.

Add dma-fence annotations so that lockdep can check this for us.

I've tried to review this quite carefully, and I think the annotation is
at the right spot. But I'm obviously no expert on the drm scheduler.

Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: linux-rdma@vger.kernel.org
Cc: amd-gfx@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Cc: Chris Wilson
Cc: Maarten Lankhorst
Cc: Christian König
Signed-off-by: Daniel Vetter
---
 drivers/gpu/drm/scheduler/sched_main.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 2f319102ae9f..06a736e506ad 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -763,9 +763,12 @@ static int drm_sched_main(void *param)
         struct sched_param sparam = {.sched_priority = 1};
         struct drm_gpu_scheduler *sched = (struct drm_gpu_scheduler *)param;
         int r;
+        bool fence_cookie;
 
         sched_setscheduler(current, SCHED_FIFO, &sparam);
 
+        fence_cookie = dma_fence_begin_signalling();
+
         while (!kthread_should_stop()) {
                 struct drm_sched_entity *entity = NULL;
                 struct drm_sched_fence *s_fence;
@@ -823,6 +826,9 @@ static int drm_sched_main(void *param)
 
                 wake_up(&sched->job_scheduled);
         }
+
+        dma_fence_end_signalling(fence_cookie);
+
         return 0;
 }
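For illustration, this is the kind of inversion the annotation is meant to
catch; driver_mutex and job_fence are made-up names, not from any real
driver:

        /* Thread A, e.g. some driver ioctl: */
        mutex_lock(&driver_mutex);
        dma_fence_wait(job_fence, false);       /* waits for the GPU job */
        mutex_unlock(&driver_mutex);

        /* Scheduler thread, which is needed to get job_fence signalled: */
        mutex_lock(&driver_mutex);      /* blocks -> job never runs, the
                                         * fence never signals, thread A
                                         * never drops the mutex */
        mutex_unlock(&driver_mutex);

With drm_sched_main() annotated, lockdep sees driver_mutex acquired inside
the fence-signalling critical section and also held across a
dma_fence_wait(), and reports the cycle without the deadlock having to
actually happen.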
From patchwork Thu Jun 4 08:12:17 2020
From: Daniel Vetter
To: DRI Development
Cc: Intel Graphics Development, LKML, linux-rdma@vger.kernel.org,
    amd-gfx@lists.freedesktop.org, linux-media@vger.kernel.org,
    linaro-mm-sig@lists.linaro.org, Chris Wilson, Maarten Lankhorst,
    Christian König, Daniel Vetter
Subject: [PATCH 11/18] drm/amdgpu: s/GFP_KERNEL/GFP_ATOMIC in scheduler code
Date: Thu, 4 Jun 2020 10:12:17 +0200
Message-Id: <20200604081224.863494-12-daniel.vetter@ffwll.ch>
In-Reply-To: <20200604081224.863494-1-daniel.vetter@ffwll.ch>
References: <20200604081224.863494-1-daniel.vetter@ffwll.ch>

My dma-fence lockdep annotations caught an inversion because we allocate
memory where we really shouldn't:

        kmem_cache_alloc+0x2b/0x6d0
        amdgpu_fence_emit+0x30/0x330 [amdgpu]
        amdgpu_ib_schedule+0x306/0x550 [amdgpu]
        amdgpu_job_run+0x10f/0x260 [amdgpu]
        drm_sched_main+0x1b9/0x490 [gpu_sched]
        kthread+0x12e/0x150

The trouble right now is that lockdep only validates against GFP_FS, which
would be good enough for shrinkers. But for mmu_notifiers we actually need
to avoid anything that isn't GFP_ATOMIC, since they can be called from any
page laundering, even if GFP_NOFS or GFP_NOIO are set. I guess we should
improve the lockdep annotations for fs_reclaim_acquire/release.

Of course the real fix is to properly preallocate this fence and stuff it
into the amdgpu job structure. But GFP_ATOMIC gets the lockdep splat out
of the way.

v2: Two more allocations in scheduler paths.
First one:

        __kmalloc+0x58/0x720
        amdgpu_vmid_grab+0x100/0xca0 [amdgpu]
        amdgpu_job_dependency+0xf9/0x120 [amdgpu]
        drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
        drm_sched_main+0xf9/0x490 [gpu_sched]

Second one:

        kmem_cache_alloc+0x2b/0x6d0
        amdgpu_sync_fence+0x7e/0x110 [amdgpu]
        amdgpu_vmid_grab+0x86b/0xca0 [amdgpu]
        amdgpu_job_dependency+0xf9/0x120 [amdgpu]
        drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
        drm_sched_main+0xf9/0x490 [gpu_sched]

Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: linux-rdma@vger.kernel.org
Cc: amd-gfx@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Cc: Chris Wilson
Cc: Maarten Lankhorst
Cc: Christian König
Signed-off-by: Daniel Vetter
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c   | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index d878fe7fee51..055b47241bb1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -143,7 +143,7 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f,
         uint32_t seq;
         int r;
 
-        fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_KERNEL);
+        fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_ATOMIC);
         if (fence == NULL)
                 return -ENOMEM;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
index fe92dcd94d4a..fdcd6659f5ad 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
@@ -208,7 +208,7 @@ static int amdgpu_vmid_grab_idle(struct amdgpu_vm *vm,
         if (ring->vmid_wait && !dma_fence_is_signaled(ring->vmid_wait))
                 return amdgpu_sync_fence(sync, ring->vmid_wait, false);
 
-        fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_KERNEL);
+        fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_ATOMIC);
         if (!fences)
                 return -ENOMEM;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index b87ca171986a..330476cc0c86 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -168,7 +168,7 @@ int amdgpu_sync_fence(struct amdgpu_sync *sync, struct dma_fence *f,
         if (amdgpu_sync_add_later(sync, f, explicit))
                 return 0;
 
-        e = kmem_cache_alloc(amdgpu_sync_slab, GFP_KERNEL);
+        e = kmem_cache_alloc(amdgpu_sync_slab, GFP_ATOMIC);
         if (!e)
                 return -ENOMEM;
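For reference, a rough sketch of the "proper preallocation" fix mentioned in
the commit message: allocate the fence object outside the fence-signalling
critical path and only consume it inside it. All structure, field and slab
names here are hypothetical, this is not the actual amdgpu change:

/* At job creation time, where blocking GFP_KERNEL allocations are fine. */
static int example_job_init(struct example_job *job)
{
        job->fence = kmem_cache_alloc(example_fence_slab, GFP_KERNEL);
        return job->fence ? 0 : -ENOMEM;
}

/*
 * The run_job/fence_emit path runs inside the fence-signalling critical
 * section; it only initializes the preallocated object and never calls
 * into the allocator at all.
 */
static struct dma_fence *example_fence_emit(struct example_job *job)
{
        struct example_fence *fence = job->fence;

        job->fence = NULL;      /* ownership moves to the emitted fence */
        dma_fence_init(&fence->base, &example_fence_ops, &fence->lock,
                       fence->context, job->seqno);
        return &fence->base;
}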
From patchwork Thu Jun 4 08:12:19 2020
From: Daniel Vetter
To: DRI Development
Cc: Intel Graphics Development, LKML, linux-rdma@vger.kernel.org,
    amd-gfx@lists.freedesktop.org, linux-media@vger.kernel.org,
    linaro-mm-sig@lists.linaro.org, Chris Wilson, Maarten Lankhorst,
    Christian König, Daniel Vetter
Subject: [PATCH 13/18] drm/amdgpu/dc: Stop dma_resv_lock inversion in commit_tail
Date: Thu, 4 Jun 2020 10:12:19 +0200
Message-Id: <20200604081224.863494-14-daniel.vetter@ffwll.ch>
In-Reply-To: <20200604081224.863494-1-daniel.vetter@ffwll.ch>
References: <20200604081224.863494-1-daniel.vetter@ffwll.ch>

Trying to grab dma_resv_lock while in commit_tail, before we've run all
the code that leads to the eventual signalling of the vblank event (which
can be a dma_fence), is deadlock-y. Don't do that.

Here the solution is easy, because just grabbing locks to read something
races anyway: we don't need to bother, READ_ONCE is equivalent and avoids
the locking issue.
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: linux-rdma@vger.kernel.org
Cc: amd-gfx@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Cc: Chris Wilson
Cc: Maarten Lankhorst
Cc: Christian König
Signed-off-by: Daniel Vetter
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index c575e7394d03..04c11443b9ca 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -6910,7 +6910,11 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
                  * explicitly on fences instead
                  * and in general should be called for
                  * blocking commit to as per framework helpers
+                 *
+                 * Yes, this deadlocks, since you're calling dma_resv_lock in a
+                 * path that leads to a dma_fence_signal(). Don't do that.
                  */
+#if 0
                 r = amdgpu_bo_reserve(abo, true);
                 if (unlikely(r != 0))
                         DRM_ERROR("failed to reserve buffer before flip\n");
@@ -6920,6 +6924,12 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
                 tmz_surface = amdgpu_bo_encrypted(abo);
 
                 amdgpu_bo_unreserve(abo);
+#endif
+                /*
+                 * this races anyway, so READ_ONCE isn't any better or worse
+                 * than the stuff above. Except the stuff above can deadlock.
+                 */
+                tiling_flags = READ_ONCE(abo->tiling_flags);
 
                 fill_dc_plane_info_and_addr(
                         dm->adev, new_plane_state, tiling_flags,
From patchwork Thu Jun 4 08:12:20 2020
From: Daniel Vetter
To: DRI Development
Cc: Intel Graphics Development, LKML, linux-rdma@vger.kernel.org,
    amd-gfx@lists.freedesktop.org, linux-media@vger.kernel.org,
    linaro-mm-sig@lists.linaro.org, Chris Wilson, Maarten Lankhorst,
    Christian König, Daniel Vetter
Subject: [PATCH 14/18] drm/scheduler: use dma-fence annotations in tdr work
Date: Thu, 4 Jun 2020 10:12:20 +0200
Message-Id: <20200604081224.863494-15-daniel.vetter@ffwll.ch>
In-Reply-To: <20200604081224.863494-1-daniel.vetter@ffwll.ch>
References: <20200604081224.863494-1-daniel.vetter@ffwll.ch>

In the face of unprivileged userspace being able to submit bogus GPU
workloads, the kernel needs GPU timeout and reset (tdr) to guarantee that
dma_fences actually complete. Annotate this worker to make sure we don't
have any accidental locking inversions or other problems lurking.

Originally this was part of the overall scheduler annotation patch. But
amdgpu has some glorious inversions here:

- grabs console_lock

- does a full modeset, which grabs all kinds of locks (drm_modeset_lock,
  dma_resv_lock) which can deadlock with dma_fence_wait held inside them.

- almost minor at that point, but the modeset code also allocates memory

These all look like they'll be very hard to fix properly; the hardware
seems to require a full display reset with any gpu recovery. Hence split
out as a separate patch.

Since amdgpu isn't the only hardware driver that needs to reset the
display (at least gen2/3 on intel have the same problem), we need a
generic solution for this. There are two tricks we could steal from
drm/i915 and lift to dma-fence:

- The big whack, aka force-complete all fences. i915 does this for all
  pending jobs if the reset is somehow stuck. Trouble is we'd need to do
  this for all fences in the entire system, and just the book-keeping for
  that will be fun. Plus lots of drivers use fences for all kinds of
  internal stuff like memory management, so unconditionally resetting all
  of them doesn't work.
  I'm also hoping that with these fence annotations we could enlist
  lockdep in finding the last offenders causing deadlocks, and we could
  remove this get-out-of-jail trick.

- The more feasible approach (across drivers, at least as part of the
  dma_fence contract) is what drm/i915 does for gen2/3: when we need to
  reset the display we wake up all dma_fence_wait_interruptible calls, or
  well, at least the equivalent of those in i915 internally. Relying on
  ioctl restart we force all other threads to release their locks, which
  means the tdr thread is guaranteed to be able to get them.

  I think we could implement this at the dma_fence level, including
  proper lockdep annotations:

  dma_fence_begin_tdr():
  - must be nested within a dma_fence_begin/end_signalling section
  - will wake up all interruptible (but not the non-interruptible)
    dma_fence_wait() calls and force them to complete with a -ERESTARTSYS
    errno code. All new interruptible calls to dma_fence_wait() will
    immediately fail with the same error code.

  dma_fence_end_tdr():
  - this will convert dma_fence_wait() calls back to normal.

  Of course interrupting dma_fence_wait is only ok if the caller
  specified that, which means we need to split the annotations into
  interruptible and non-interruptible versions. If we then make sure that
  we only use interruptible dma_fence_wait() calls while holding
  drm_modeset_lock, we can grab them in tdr code and allow display
  resets. Doing the same for dma_resv_lock might be a lot harder, so
  buffer updates must be avoided.

  What's worse, we're not going to be able to make the dma_fence_wait
  calls in mmu-notifiers interruptible; that doesn't work. So allocating
  memory still won't be allowed, even in tdr sections. Plus obviously we
  can use this trick only in tdr, it is rather intrusive.
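To make that proposal slightly more concrete, here is a very rough sketch of
what such an interface could look like. None of this exists in dma-fence
today, and the per-fence waiter bookkeeping is hand-waved away:

/* Hypothetical sketch only, not an existing dma-fence API. */
static atomic_t dma_fence_tdr_active = ATOMIC_INIT(0);

void dma_fence_begin_tdr(void)
{
        /* Must be nested inside a dma_fence_begin/end_signalling section. */
        atomic_inc(&dma_fence_tdr_active);
        /*
         * Wake every interruptible waiter so it bails out with -ERESTARTSYS
         * and, via ioctl restart, drops whatever locks it is holding.
         */
        dma_fence_wake_all_interruptible_waiters();     /* hand-waved */
}

void dma_fence_end_tdr(void)
{
        atomic_dec(&dma_fence_tdr_active);
}

/* Inside the interruptible wait path: */
long dma_fence_wait_interruptible_sketch(struct dma_fence *fence, long timeout)
{
        if (atomic_read(&dma_fence_tdr_active))
                return -ERESTARTSYS;    /* new waits fail immediately */

        return dma_fence_wait_timeout(fence, true, timeout);
}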
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: linux-rdma@vger.kernel.org
Cc: amd-gfx@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Cc: Chris Wilson
Cc: Maarten Lankhorst
Cc: Christian König
Signed-off-by: Daniel Vetter
---
 drivers/gpu/drm/scheduler/sched_main.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 06a736e506ad..e34a44376e87 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -279,9 +279,12 @@ static void drm_sched_job_timedout(struct work_struct *work)
 {
         struct drm_gpu_scheduler *sched;
         struct drm_sched_job *job;
+        bool fence_cookie;
 
         sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
 
+        fence_cookie = dma_fence_begin_signalling();
+
         /* Protects against concurrent deletion in drm_sched_get_cleanup_job */
         spin_lock(&sched->job_list_lock);
         job = list_first_entry_or_null(&sched->ring_mirror_list,
@@ -313,6 +316,8 @@ static void drm_sched_job_timedout(struct work_struct *work)
         spin_lock(&sched->job_list_lock);
         drm_sched_start_timeout(sched);
         spin_unlock(&sched->job_list_lock);
+
+        dma_fence_end_signalling(fence_cookie);
 }
 
 /**
From patchwork Thu Jun 4 08:12:23 2020
From: Daniel Vetter
To: DRI Development
Cc: Intel Graphics Development, LKML, linux-rdma@vger.kernel.org,
    amd-gfx@lists.freedesktop.org, linux-media@vger.kernel.org,
    linaro-mm-sig@lists.linaro.org, Chris Wilson, Maarten Lankhorst,
    Christian König, Daniel Vetter
Subject: [PATCH 17/18] drm/amdgpu: gpu recovery does full modesets
Date: Thu, 4 Jun 2020 10:12:23 +0200
Message-Id: <20200604081224.863494-18-daniel.vetter@ffwll.ch>
In-Reply-To: <20200604081224.863494-1-daniel.vetter@ffwll.ch>
References: <20200604081224.863494-1-daniel.vetter@ffwll.ch>

... I think it's time to stop this little exercise.

The lockdep splat, for the record:

[  132.583381] ======================================================
[  132.584091] WARNING: possible circular locking dependency detected
[  132.584775] 5.7.0-rc3+ #346 Tainted: G W
[  132.585461] ------------------------------------------------------
[  132.586184] kworker/2:3/865 is trying to acquire lock:
[  132.586857] ffffc90000677c70 (crtc_ww_class_acquire){+.+.}-{0:0}, at: drm_atomic_helper_suspend+0x38/0x120 [drm_kms_helper]
[  132.587569] but task is already holding lock:
[  132.589044] ffffffff82318c80 (dma_fence_map){++++}-{0:0}, at: drm_sched_job_timedout+0x25/0xf0 [gpu_sched]
[  132.589803] which lock already depends on the new lock.
[  132.592009] the existing dependency chain (in reverse order) is:
[  132.593507] -> #2 (dma_fence_map){++++}-{0:0}:
[  132.595019]        dma_fence_begin_signalling+0x50/0x60
[  132.595767]        drm_atomic_helper_commit+0xa1/0x180 [drm_kms_helper]
[  132.596567]        drm_client_modeset_commit_atomic+0x1ea/0x250 [drm]
[  132.597420]        drm_client_modeset_commit_locked+0x55/0x190 [drm]
[  132.598178]        drm_client_modeset_commit+0x24/0x40 [drm]
[  132.598948]        drm_fb_helper_restore_fbdev_mode_unlocked+0x4b/0xa0 [drm_kms_helper]
[  132.599738]        drm_fb_helper_set_par+0x30/0x40 [drm_kms_helper]
[  132.600539]        fbcon_init+0x2e8/0x660
[  132.601344]        visual_init+0xce/0x130
[  132.602156]        do_bind_con_driver+0x1bc/0x2b0
[  132.602970]        do_take_over_console+0x115/0x180
[  132.603763]        do_fbcon_takeover+0x58/0xb0
[  132.604564]        register_framebuffer+0x1ee/0x300
[  132.605369]        __drm_fb_helper_initial_config_and_unlock+0x36e/0x520 [drm_kms_helper]
[  132.606187]        amdgpu_fbdev_init+0xb3/0xf0 [amdgpu]
[  132.607032]        amdgpu_device_init.cold+0xe90/0x1677 [amdgpu]
[  132.607862]        amdgpu_driver_load_kms+0x5a/0x200 [amdgpu]
[  132.608697]        amdgpu_pci_probe+0xf7/0x180 [amdgpu]
[  132.609511]        local_pci_probe+0x42/0x80
[  132.610324]        pci_device_probe+0x104/0x1a0
[  132.611130]        really_probe+0x147/0x3c0
[  132.611939]        driver_probe_device+0xb6/0x100
[  132.612766]        device_driver_attach+0x53/0x60
[  132.613593]        __driver_attach+0x8c/0x150
[  132.614419]        bus_for_each_dev+0x7b/0xc0
[  132.615249]        bus_add_driver+0x14c/0x1f0
[  132.616071]        driver_register+0x6c/0xc0
[  132.616902]        do_one_initcall+0x5d/0x2f0
[  132.617731]        do_init_module+0x5c/0x230
[  132.618560]        load_module+0x2981/0x2bc0
[  132.619391]        __do_sys_finit_module+0xaa/0x110
[  132.620228]        do_syscall_64+0x5a/0x250
[  132.621064]        entry_SYSCALL_64_after_hwframe+0x49/0xb3
[  132.621903] -> #1 (crtc_ww_class_mutex){+.+.}-{3:3}:
[  132.623587]        __ww_mutex_lock.constprop.0+0xcc/0x10c0
[  132.624448]        ww_mutex_lock+0x43/0xb0
[  132.625315]        drm_modeset_lock+0x44/0x120 [drm]
[  132.626184]        drmm_mode_config_init+0x2db/0x8b0 [drm]
[  132.627098]        amdgpu_device_init.cold+0xbd1/0x1677 [amdgpu]
[  132.628007]        amdgpu_driver_load_kms+0x5a/0x200 [amdgpu]
[  132.628920]        amdgpu_pci_probe+0xf7/0x180 [amdgpu]
[  132.629804]        local_pci_probe+0x42/0x80
[  132.630690]        pci_device_probe+0x104/0x1a0
[  132.631583]        really_probe+0x147/0x3c0
[  132.632479]        driver_probe_device+0xb6/0x100
[  132.633379]        device_driver_attach+0x53/0x60
[  132.634275]        __driver_attach+0x8c/0x150
[  132.635170]        bus_for_each_dev+0x7b/0xc0
[  132.636069]        bus_add_driver+0x14c/0x1f0
[  132.636974]        driver_register+0x6c/0xc0
[  132.637870]        do_one_initcall+0x5d/0x2f0
[  132.638765]        do_init_module+0x5c/0x230
[  132.639654]        load_module+0x2981/0x2bc0
[  132.640522]        __do_sys_finit_module+0xaa/0x110
[  132.641372]        do_syscall_64+0x5a/0x250
[  132.642203]        entry_SYSCALL_64_after_hwframe+0x49/0xb3
[  132.643022] -> #0 (crtc_ww_class_acquire){+.+.}-{0:0}:
[  132.644643]        __lock_acquire+0x1241/0x23f0
[  132.645469]        lock_acquire+0xad/0x370
[  132.646274]        drm_modeset_acquire_init+0xd2/0x100 [drm]
[  132.647071]        drm_atomic_helper_suspend+0x38/0x120 [drm_kms_helper]
[  132.647902]        dm_suspend+0x1c/0x60 [amdgpu]
[  132.648698]        amdgpu_device_ip_suspend_phase1+0x83/0xe0 [amdgpu]
[  132.649498]        amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu]
[  132.650300]        amdgpu_device_gpu_recover.cold+0x4e6/0xe64 [amdgpu]
[  132.651084]        amdgpu_job_timedout+0xfb/0x150 [amdgpu]
[  132.651825]        drm_sched_job_timedout+0x8a/0xf0 [gpu_sched]
[  132.652594]        process_one_work+0x23c/0x580
[  132.653402]        worker_thread+0x50/0x3b0
[  132.654139]        kthread+0x12e/0x150
[  132.654868]        ret_from_fork+0x27/0x50
[  132.655598] other info that might help us debug this:
[  132.657739] Chain exists of: crtc_ww_class_acquire --> crtc_ww_class_mutex --> dma_fence_map
[  132.659877]  Possible unsafe locking scenario:
[  132.661416]        CPU0                    CPU1
[  132.662126]        ----                    ----
[  132.662847]   lock(dma_fence_map);
[  132.663574]                                lock(crtc_ww_class_mutex);
[  132.664319]                                lock(dma_fence_map);
[  132.665063]   lock(crtc_ww_class_acquire);
[  132.665799]  *** DEADLOCK ***
[  132.667965] 4 locks held by kworker/2:3/865:
[  132.668701]  #0: ffff8887fb81c938 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1bc/0x580
[  132.669462]  #1: ffffc90000677e58 ((work_completion)(&(&sched->work_tdr)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x580
[  132.670242]  #2: ffffffff82318c80 (dma_fence_map){++++}-{0:0}, at: drm_sched_job_timedout+0x25/0xf0 [gpu_sched]
[  132.671039]  #3: ffff8887b84a1748 (&adev->lock_reset){+.+.}-{3:3}, at: amdgpu_device_gpu_recover.cold+0x59e/0xe64 [amdgpu]
[  132.671902] stack backtrace:
[  132.673515] CPU: 2 PID: 865 Comm: kworker/2:3 Tainted: G W 5.7.0-rc3+ #346
[  132.674347] Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 4011 04/19/2018
[  132.675194] Workqueue: events drm_sched_job_timedout [gpu_sched]
[  132.676046] Call Trace:
[  132.676897]  dump_stack+0x8f/0xd0
[  132.677748]  check_noncircular+0x162/0x180
[  132.678604]  ? stack_trace_save+0x4b/0x70
[  132.679459]  __lock_acquire+0x1241/0x23f0
[  132.680311]  lock_acquire+0xad/0x370
[  132.681163]  ? drm_atomic_helper_suspend+0x38/0x120 [drm_kms_helper]
[  132.682021]  ? cpumask_next+0x16/0x20
[  132.682880]  ? module_assert_mutex_or_preempt+0x14/0x40
[  132.683737]  ? __module_address+0x28/0xf0
[  132.684601]  drm_modeset_acquire_init+0xd2/0x100 [drm]
[  132.685466]  ? drm_atomic_helper_suspend+0x38/0x120 [drm_kms_helper]
[  132.686335]  drm_atomic_helper_suspend+0x38/0x120 [drm_kms_helper]
[  132.687255]  dm_suspend+0x1c/0x60 [amdgpu]
[  132.688152]  amdgpu_device_ip_suspend_phase1+0x83/0xe0 [amdgpu]
[  132.689057]  ? amdgpu_fence_process+0x4c/0x150 [amdgpu]
[  132.689963]  amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu]
[  132.690893]  amdgpu_device_gpu_recover.cold+0x4e6/0xe64 [amdgpu]
[  132.691818]  amdgpu_job_timedout+0xfb/0x150 [amdgpu]
[  132.692707]  drm_sched_job_timedout+0x8a/0xf0 [gpu_sched]
[  132.693597]  process_one_work+0x23c/0x580
[  132.694487]  worker_thread+0x50/0x3b0
[  132.695373]  ? process_one_work+0x580/0x580
[  132.696264]  kthread+0x12e/0x150
[  132.697154]  ? kthread_create_worker_on_cpu+0x70/0x70
[  132.698057]  ret_from_fork+0x27/0x50

Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: linux-rdma@vger.kernel.org
Cc: amd-gfx@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Cc: Chris Wilson
Cc: Maarten Lankhorst
Cc: Christian König
Signed-off-by: Daniel Vetter
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 4c4492de670c..3ea4b9258fb0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2441,6 +2441,14 @@ static int amdgpu_device_ip_suspend_phase1(struct amdgpu_device *adev)
                 /* displays are handled separately */
                 if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_DCE) {
                         /* XXX handle errors */
+
+                        /*
+                         * This is dm_suspend, which calls modeset locks, and
+                         * that's a pretty good inversion against dma_fence_signal,
+                         * which gpu recovery is supposed to guarantee.
+                         *
+                         * Don't ask me how to fix this.
+                         */
                         r = adev->ip_blocks[i].version->funcs->suspend(adev);
                         /* XXX handle errors */
                         if (r) {