[4.9,43/53] futex: Handle early deadlock return correctly

From: Thomas Gleixner <tglx@linutronix.de>

commit 1a1fb985f2e2b85ec0d3dc2e519ee48389ec2434 upstream.

Commit 56222b212e8e ("futex: Drop hb->lock before enqueueing on the
rtmutex") changed the locking rules in the futex code so that the hash
bucket lock is no longer held while the waiter is enqueued into the
rtmutex wait list. This made the lock and the unlock path symmetric, but
unfortunately the possible early exit from __rt_mutex_proxy_start() due to
a detected deadlock was not updated accordingly. That allows a concurrent
unlocker to observe inconsistent state, which triggers the warning in the
unlock path.

futex_lock_pi()                         futex_unlock_pi()
  lock(hb->lock)
  queue(hb_waiter)                      lock(hb->lock)
  lock(rtmutex->wait_lock)
  unlock(hb->lock)
                                        // acquired hb->lock
                                        hb_waiter = futex_top_waiter()
                                        lock(rtmutex->wait_lock)
  __rt_mutex_proxy_start()
     ---> fail
          remove(rtmutex_waiter);
     ---> returns -EDEADLOCK
  unlock(rtmutex->wait_lock)
                                        // acquired wait_lock
                                        wake_futex_pi()
                                        rt_mutex_next_owner()
                                          --> returns NULL
                                          --> WARN

  lock(hb->lock)
  unqueue(hb_waiter)

The problem is caused by the remove(rtmutex_waiter) in the failure case of
__rt_mutex_proxy_start() as this lets the unlocker observe a waiter in the
hash bucket but no waiter on the rtmutex, i.e. inconsistent state.
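
The window can be replayed single-threaded in a toy model. The sketch
below is only an illustration: struct toy_state and the toy_*() helpers
are hypothetical names, not kernel code, and the locks are represented
only by the order of the steps.

```c
#include <stdbool.h>

/* Toy model; both fields are hypothetical stand-ins for kernel state. */
struct toy_state {
	bool hb_waiter_queued;      /* waiter visible in the hash bucket */
	bool rtmutex_waiter_queued; /* waiter visible on the rtmutex wait list */
};

/*
 * Locker side: a proxy start that always detects a deadlock. The waiter
 * is enqueued first; remove_on_failure selects whether it is dequeued
 * again before returning (the problematic behaviour described above).
 */
static int toy_proxy_start_fail(struct toy_state *s, bool remove_on_failure)
{
	s->rtmutex_waiter_queued = true;
	if (remove_on_failure)
		s->rtmutex_waiter_queued = false;
	return -1; /* stands in for the deadlock error code */
}

/*
 * Unlocker side: a waiter found in the hash bucket must also be found
 * on the rtmutex wait list, otherwise there is no next owner.
 */
static bool toy_unlocker_sees_consistent(const struct toy_state *s)
{
	if (!s->hb_waiter_queued)
		return true;
	return s->rtmutex_waiter_queued;
}

/* Replay the interleaving from the diagram in program order. */
static bool toy_replay(bool remove_on_failure)
{
	struct toy_state s = { false, false };

	s.hb_waiter_queued = true;                      /* queue(hb_waiter) */
	(void)toy_proxy_start_fail(&s, remove_on_failure);
	/* The locker has dropped wait_lock but not yet retaken hb->lock. */
	return toy_unlocker_sees_consistent(&s);        /* unlocker runs here */
}
```

In this model toy_replay(true) returns false (hash bucket waiter without
rtmutex waiter), while toy_replay(false) returns true.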

The original commit handles this correctly for the other early return cases
(timeout, signal) by delaying the removal of the rtmutex waiter until the
returning task has reacquired the hash bucket lock.

Treat the failure case of __rt_mutex_proxy_start() in the same way and let
the existing cleanup code handle the eventual handover of the rtmutex
gracefully. The regular rt_mutex_proxy_start() gains the rtmutex waiter
removal for the failure case, so that the other call sites still operate
correctly.
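
The resulting split of responsibilities can be sketched as follows. This
is a simplified model under stated assumptions, not the patch itself:
struct toy_rtmutex, TOY_EDEADLK and the toy_*() functions are
hypothetical, locking is not modelled, and seen_by_unlocker merely
records what a concurrent unlocker would observe in the window.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an rtmutex with at most one waiter. */
struct toy_rtmutex {
	bool waiter_queued;
};

#define TOY_EDEADLK 35 /* illustrative error value for this sketch */

/*
 * Inner helper: enqueue the waiter and report a detected deadlock, but
 * leave the waiter queued on failure. The caller owns the cleanup.
 */
static int toy_inner_proxy_start(struct toy_rtmutex *m, bool deadlock)
{
	m->waiter_queued = true;
	return deadlock ? -TOY_EDEADLK : 0;
}

/*
 * Regular wrapper for callers without a later cleanup step: it removes
 * the waiter itself when the inner helper fails.
 */
static int toy_proxy_start(struct toy_rtmutex *m, bool deadlock)
{
	int ret = toy_inner_proxy_start(m, deadlock);

	if (ret)
		m->waiter_queued = false;
	return ret;
}

/*
 * Futex-style caller: uses the inner helper directly and defers the
 * removal until after the point where the hash bucket lock would have
 * been reacquired.
 */
static int toy_futex_style_lock(struct toy_rtmutex *m, bool deadlock,
				bool *seen_by_unlocker)
{
	int ret = toy_inner_proxy_start(m, deadlock);

	/* Window in which a concurrent unlocker may look at the rtmutex. */
	*seen_by_unlocker = m->waiter_queued;

	if (ret)
		m->waiter_queued = false; /* cleanup after hb->lock is retaken */
	return ret;
}
```

With deadlock set to true, toy_proxy_start() returns with no waiter left
queued, whereas toy_futex_style_lock() keeps the waiter visible during
the window and removes it only afterwards.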

Add proper comments to the code so all these details are fully documented.

Thanks to Peter for helping with the analysis and writing the really
valuable code comments.

Fixes: 56222b212e8e ("futex: Drop hb->lock before enqueueing on the rtmutex")
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Co-developed-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: Stefan Liebler <stli@linux.ibm.com>
Cc: Sebastian Sewior <bigeasy@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1901292311410.1950@nanos.tec.linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/futex.c           |   28 ++++++++++++++++++----------
 kernel/locking/rtmutex.c |   37 ++++++++++++++++++++++++++++++++-----
 2 files changed, 50 insertions(+), 15 deletions(-)

Message ID	20210329075608.925964742@linuxfoundation.org
State	New
Date	Mon, 29 Mar 2021 09:58:18 +0200
From	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To	linux-kernel@vger.kernel.org, stable@vger.kernel.org
In-Reply-To	<20210329075607.561619583@linuxfoundation.org>