[5.4,08/98] btrfs: fix lockdep splat when reading qgroup config on mount

From: Filipe Manana <fdmanana@suse.com>

commit 3d05cad3c357a2b749912914356072b38435edfa upstream.

Lockdep reported the following splat when running test btrfs/190 from
fstests:

  [ 9482.126098] ======================================================
  [ 9482.126184] WARNING: possible circular locking dependency detected
  [ 9482.126281] 5.10.0-rc4-btrfs-next-73 #1 Not tainted
  [ 9482.126365] ------------------------------------------------------
  [ 9482.126456] mount/24187 is trying to acquire lock:
  [ 9482.126534] ffffa0c869a7dac0 (&fs_info->qgroup_rescan_lock){+.+.}-{3:3}, at: qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.126647]
		 but task is already holding lock:
  [ 9482.126777] ffffa0c892ebd3a0 (btrfs-quota-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x120 [btrfs]
  [ 9482.126886]
		 which lock already depends on the new lock.

  [ 9482.127078]
		 the existing dependency chain (in reverse order) is:
  [ 9482.127213]
		 -> #1 (btrfs-quota-00){++++}-{3:3}:
  [ 9482.127366]        lock_acquire+0xd8/0x490
  [ 9482.127436]        down_read_nested+0x45/0x220
  [ 9482.127528]        __btrfs_tree_read_lock+0x27/0x120 [btrfs]
  [ 9482.127613]        btrfs_read_lock_root_node+0x41/0x130 [btrfs]
  [ 9482.127702]        btrfs_search_slot+0x514/0xc30 [btrfs]
  [ 9482.127788]        update_qgroup_status_item+0x72/0x140 [btrfs]
  [ 9482.127877]        btrfs_qgroup_rescan_worker+0xde/0x680 [btrfs]
  [ 9482.127964]        btrfs_work_helper+0xf1/0x600 [btrfs]
  [ 9482.128039]        process_one_work+0x24e/0x5e0
  [ 9482.128110]        worker_thread+0x50/0x3b0
  [ 9482.128181]        kthread+0x153/0x170
  [ 9482.128256]        ret_from_fork+0x22/0x30
  [ 9482.128327]
		 -> #0 (&fs_info->qgroup_rescan_lock){+.+.}-{3:3}:
  [ 9482.128464]        check_prev_add+0x91/0xc60
  [ 9482.128551]        __lock_acquire+0x1740/0x3110
  [ 9482.128623]        lock_acquire+0xd8/0x490
  [ 9482.130029]        __mutex_lock+0xa3/0xb30
  [ 9482.130590]        qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.131577]        btrfs_read_qgroup_config+0x43a/0x550 [btrfs]
  [ 9482.132175]        open_ctree+0x1228/0x18a0 [btrfs]
  [ 9482.132756]        btrfs_mount_root.cold+0x13/0xed [btrfs]
  [ 9482.133325]        legacy_get_tree+0x30/0x60
  [ 9482.133866]        vfs_get_tree+0x28/0xe0
  [ 9482.134392]        fc_mount+0xe/0x40
  [ 9482.134908]        vfs_kern_mount.part.0+0x71/0x90
  [ 9482.135428]        btrfs_mount+0x13b/0x3e0 [btrfs]
  [ 9482.135942]        legacy_get_tree+0x30/0x60
  [ 9482.136444]        vfs_get_tree+0x28/0xe0
  [ 9482.136949]        path_mount+0x2d7/0xa70
  [ 9482.137438]        do_mount+0x75/0x90
  [ 9482.137923]        __x64_sys_mount+0x8e/0xd0
  [ 9482.138400]        do_syscall_64+0x33/0x80
  [ 9482.138873]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 9482.139346]
		 other info that might help us debug this:

  [ 9482.140735]  Possible unsafe locking scenario:

  [ 9482.141594]        CPU0                    CPU1
  [ 9482.142011]        ----                    ----
  [ 9482.142411]   lock(btrfs-quota-00);
  [ 9482.142806]                                lock(&fs_info->qgroup_rescan_lock);
  [ 9482.143216]                                lock(btrfs-quota-00);
  [ 9482.143629]   lock(&fs_info->qgroup_rescan_lock);
  [ 9482.144056]
		  *** DEADLOCK ***

  [ 9482.145242] 2 locks held by mount/24187:
  [ 9482.145637]  #0: ffffa0c8411c40e8 (&type->s_umount_key#44/1){+.+.}-{3:3}, at: alloc_super+0xb9/0x400
  [ 9482.146061]  #1: ffffa0c892ebd3a0 (btrfs-quota-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x120 [btrfs]
  [ 9482.146509]
		 stack backtrace:
  [ 9482.147350] CPU: 1 PID: 24187 Comm: mount Not tainted 5.10.0-rc4-btrfs-next-73 #1
  [ 9482.147788] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  [ 9482.148709] Call Trace:
  [ 9482.149169]  dump_stack+0x8d/0xb5
  [ 9482.149628]  check_noncircular+0xff/0x110
  [ 9482.150090]  check_prev_add+0x91/0xc60
  [ 9482.150561]  ? kvm_clock_read+0x14/0x30
  [ 9482.151017]  ? kvm_sched_clock_read+0x5/0x10
  [ 9482.151470]  __lock_acquire+0x1740/0x3110
  [ 9482.151941]  ? __btrfs_tree_read_lock+0x27/0x120 [btrfs]
  [ 9482.152402]  lock_acquire+0xd8/0x490
  [ 9482.152887]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.153354]  __mutex_lock+0xa3/0xb30
  [ 9482.153826]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.154301]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.154768]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.155226]  qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.155690]  btrfs_read_qgroup_config+0x43a/0x550 [btrfs]
  [ 9482.156160]  open_ctree+0x1228/0x18a0 [btrfs]
  [ 9482.156643]  btrfs_mount_root.cold+0x13/0xed [btrfs]
  [ 9482.157108]  ? rcu_read_lock_sched_held+0x5d/0x90
  [ 9482.157567]  ? kfree+0x31f/0x3e0
  [ 9482.158030]  legacy_get_tree+0x30/0x60
  [ 9482.158489]  vfs_get_tree+0x28/0xe0
  [ 9482.158947]  fc_mount+0xe/0x40
  [ 9482.159403]  vfs_kern_mount.part.0+0x71/0x90
  [ 9482.159875]  btrfs_mount+0x13b/0x3e0 [btrfs]
  [ 9482.160335]  ? rcu_read_lock_sched_held+0x5d/0x90
  [ 9482.160805]  ? kfree+0x31f/0x3e0
  [ 9482.161260]  ? legacy_get_tree+0x30/0x60
  [ 9482.161714]  legacy_get_tree+0x30/0x60
  [ 9482.162166]  vfs_get_tree+0x28/0xe0
  [ 9482.162616]  path_mount+0x2d7/0xa70
  [ 9482.163070]  do_mount+0x75/0x90
  [ 9482.163525]  __x64_sys_mount+0x8e/0xd0
  [ 9482.163986]  do_syscall_64+0x33/0x80
  [ 9482.164437]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 9482.164902] RIP: 0033:0x7f51e907caaa

This happens because at btrfs_read_qgroup_config() we can call
qgroup_rescan_init() while holding a read lock on a quota btree leaf,
acquired by the previous call to btrfs_search_slot_for_read(), and
qgroup_rescan_init() acquires the mutex qgroup_rescan_lock.

A qgroup rescan worker does the opposite: it acquires the mutex
qgroup_rescan_lock, at btrfs_qgroup_rescan_worker(), and then tries to
update the qgroup status item in the quota btree through the call to
update_qgroup_status_item(). This inversion of locking order
between the qgroup_rescan_lock mutex and quota btree locks causes the
splat.

Fix this simply by releasing and freeing the path before calling
qgroup_rescan_init() at btrfs_read_qgroup_config().

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/btrfs/qgroup.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Message ID: 20201201084653.760873662@linuxfoundation.org
State: Superseded
Headers:
  Return-Path: <stable-owner@kernel.org>
  From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  To: linux-kernel@vger.kernel.org
  Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>, stable@vger.kernel.org, Filipe Manana <fdmanana@suse.com>, David Sterba <dsterba@suse.com>
  Subject: [PATCH 5.4 08/98] btrfs: fix lockdep splat when reading qgroup config on mount
  Date: Tue, 1 Dec 2020 09:52:45 +0100
  Message-Id: <20201201084653.760873662@linuxfoundation.org>
  In-Reply-To: <20201201084652.827177826@linuxfoundation.org>
  References: <20201201084652.827177826@linuxfoundation.org>
  User-Agent: quilt/0.66
  MIME-Version: 1.0
  Content-Type: text/plain; charset=UTF-8
  Content-Transfer-Encoding: 8bit
  Precedence: bulk
