Message ID | 20241101114410.234311-1-dmantipov@yandex.ru |
---|---|
State | New |
Headers | show |
Series | [v2] Bluetooth: fix use-after-free in device_for_each_child() | expand |
Hi Dmitry, On Sat, Nov 2, 2024 at 5:51 AM Dmitry Antipov <dmantipov@yandex.ru> wrote: > > On 11/1/24 8:37 PM, Luiz Augusto von Dentz wrote: > > > Looks like this doesn't solve the problem, in fact I think you are > > getting it backwards, you are trying to reparent the parent dev not > > the child and I assume by destroying the parent device there should be > > some way to reset the parent which seems to be the intent the > > following code in hci_conn_del_sysfs: > > > > while (1) { > > struct device *dev; > > > > dev = device_find_child(&conn->dev, NULL, __match_tty); > > if (!dev) > > break; > > device_move(dev, NULL, DPM_ORDER_DEV_LAST); > > put_device(dev); > > } > > > > But note that it only does that after matching tty, but I guess we > > want to do it regardless otherwise we may have the child objects still > > access it, that said we should probably use device_for_each_child > > though if that is safe to do calls to device_move under its callback. > > I'm not sure that I've got your point. IIUC the problem is that a controller > (parent) instance may be freed _after_ the child (connection) has passed > 'device_unregister(&conn->dev)' but _before_ an actual removal of 'conn->dev' > from the devices hierarchy, thus leaving the dangling 'conn->dev.parent' > pointer. The latter may be fixed by reparenting 'conn->dev' to NULL explicitly. > And handling children of 'conn->dev' (i.e. the grandchilren of the controller) > is out of this problem's scope at all. > > And nothing to say about syzbot's innards but manual testing shows that the > following thing: > > void hci_conn_del_sysfs(struct hci_conn *conn) > { > struct hci_dev *hdev = conn->hdev; > > bt_dev_dbg(hdev, "conn %p", conn); > > if (!device_is_registered(&conn->dev)) { > /* If device_add() has *not* succeeded, use *only* put_device() > * to drop the reference count. > */ > put_device(&conn->dev); > return; > } > > while (1) { > struct device *dev; > > dev = device_find_any_child(&conn->dev); > if (!dev) > break; > printk(KERN_ERR "%s:%d: reparent dev@%p(%s) with parent@%p(%s)\n", > __FILE__, __LINE__, dev, dev_name(dev), dev->parent, > (dev->parent ? dev_name(dev->parent) : "<none>")); > device_move(dev, NULL, DPM_ORDER_DEV_LAST); > put_device(dev); > } > > device_unregister(&conn->dev); > } > > occasionally triggers the following crash: > > net/bluetooth/hci_sysfs.c:82: reparent dev@ffff888114be86f8(bnep0) with parent@ffff888111c64b68(hci4:200) > Oops: general protection fault, probably for non-canonical address 0xdffffc000000000b: 0000 [#1] PREEMPT SMP KASI > KASAN: null-ptr-deref in range [0x0000000000000058-0x000000000000005f] > CPU: 3 UID: 0 PID: 6033 Comm: repro Not tainted 6.12.0-rc5-00299-g11066801dd4b-dirty #8 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014 > RIP: 0010:klist_put+0x4d/0x1d0 > Code: c1 ea 03 80 3c 02 00 0f 85 74 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 23 49 83 e4 fe 49 8d 7c 24 58 49 > RSP: 0018:ffffc9000423f9b0 EFLAGS: 00010202 > RAX: dffffc0000000000 RBX: ffff88810f43ac60 RCX: 0000000000000000 > RDX: 000000000000000b RSI: ffffffff8a92d415 RDI: 0000000000000058 > RBP: 0000000000000001 R08: 0000000000000000 R09: fffffbfff1fad62c > R10: ffffffff8fd6b163 R11: 0000000000000000 R12: 0000000000000000 > R13: 0000000000000001 R14: ffffc9000423fa30 R15: ffffffff8a92d805 > FS: 00007f24ebb78740(0000) GS:ffff888135f00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 000055c52afb6950 CR3: 0000000020c1c000 CR4: 00000000000006f0 > Call Trace: > <TASK> > ? die_addr.cold+0x8/0xd > ? exc_general_protection+0x147/0x240 > ? asm_exc_general_protection+0x26/0x30 > ? klist_remove+0x155/0x2b0 > ? klist_put+0x15/0x1d0 > ? klist_put+0x4d/0x1d0 > klist_remove+0x15a/0x2b0 > ? __pfx_klist_remove+0x10/0x10 > device_move+0x12d/0x10b0 > hci_conn_del_sysfs.cold+0xcf/0x14a > hci_conn_del+0x467/0xd60 > hci_conn_hash_flush+0x18f/0x270 > hci_dev_close_sync+0x549/0x1260 > hci_dev_do_close+0x2e/0x90 > hci_unregister_dev+0x213/0x630 > vhci_release+0x79/0xf0 > ? __pfx_vhci_release+0x10/0x10 > __fput+0x3f6/0xb30 > task_work_run+0x151/0x250 > ? __pfx_task_work_run+0x10/0x10 > do_exit+0xa79/0x2c30 > ? do_raw_spin_lock+0x12a/0x2b0 > ? __pfx_do_exit+0x10/0x10 > do_group_exit+0xd5/0x2a0 > __x64_sys_exit_group+0x3e/0x50 > x64_sys_call+0x14af/0x14b0 > do_syscall_64+0xc7/0x250 > entry_SYSCALL_64_after_hwframe+0x77/0x7f Weird, this was not reproduced by syzbot when I asked it to test, how are you reproducing this? This looks like to be a problem with klist being corrupted somehow, anyway if we can't unparent the objects we need a way to unregister the child without waiting for the bnep thread to clean it up. > Dmitry
On 11/4/24 5:06 PM, Luiz Augusto von Dentz wrote: > Weird, this was not reproduced by syzbot when I asked it to test, how > are you reproducing this? This is not regular, looks like a race, and I'm not sure how many runs are required to reproduce. Anyway we can't blame syzbot just because it was unable to reproduce some weird thing. Dmitry
Hi, On Mon, Nov 4, 2024 at 9:58 AM Dmitry Antipov <dmantipov@yandex.ru> wrote: > > On 11/4/24 5:06 PM, Luiz Augusto von Dentz wrote: > > > Weird, this was not reproduced by syzbot when I asked it to test, how > > are you reproducing this? > > This is not regular, looks like a race, and I'm not sure how many runs > are required to reproduce. Anyway we can't blame syzbot just because > it was unable to reproduce some weird thing. But how you reproducing, or you just running syzbot test over and over locally? Anyway I will look into how we can actually unregister the likes of bnep net_dev in a more direct manner rather than doing via bnep kthread. > Dmitry >
diff --git a/net/bluetooth/hci_sysfs.c b/net/bluetooth/hci_sysfs.c index 367e32fe30eb..80ac537fa500 100644 --- a/net/bluetooth/hci_sysfs.c +++ b/net/bluetooth/hci_sysfs.c @@ -73,6 +73,8 @@ void hci_conn_del_sysfs(struct hci_conn *conn) return; } + device_move(&conn->dev, NULL, DPM_ORDER_DEV_LAST); + while (1) { struct device *dev;
Syzbot has reported the following KASAN splat: BUG: KASAN: slab-use-after-free in device_for_each_child+0x18f/0x1a0 Read of size 8 at addr ffff88801f605308 by task kbnepd bnep0/4980 CPU: 0 UID: 0 PID: 4980 Comm: kbnepd bnep0 Not tainted 6.12.0-rc4-00161-gae90f6a6170d #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x100/0x190 ? device_for_each_child+0x18f/0x1a0 print_report+0x13a/0x4cb ? __virt_addr_valid+0x5e/0x590 ? __phys_addr+0xc6/0x150 ? device_for_each_child+0x18f/0x1a0 kasan_report+0xda/0x110 ? device_for_each_child+0x18f/0x1a0 ? __pfx_dev_memalloc_noio+0x10/0x10 device_for_each_child+0x18f/0x1a0 ? __pfx_device_for_each_child+0x10/0x10 pm_runtime_set_memalloc_noio+0xf2/0x180 netdev_unregister_kobject+0x1ed/0x270 unregister_netdevice_many_notify+0x123c/0x1d80 ? __mutex_trylock_common+0xde/0x250 ? __pfx_unregister_netdevice_many_notify+0x10/0x10 ? trace_contention_end+0xe6/0x140 ? __mutex_lock+0x4e7/0x8f0 ? __pfx_lock_acquire.part.0+0x10/0x10 ? rcu_is_watching+0x12/0xc0 ? unregister_netdev+0x12/0x30 unregister_netdevice_queue+0x30d/0x3f0 ? __pfx_unregister_netdevice_queue+0x10/0x10 ? __pfx_down_write+0x10/0x10 unregister_netdev+0x1c/0x30 bnep_session+0x1fb3/0x2ab0 ? __pfx_bnep_session+0x10/0x10 ? __pfx_lock_release+0x10/0x10 ? __pfx_woken_wake_function+0x10/0x10 ? __kthread_parkme+0x132/0x200 ? __pfx_bnep_session+0x10/0x10 ? kthread+0x13a/0x370 ? __pfx_bnep_session+0x10/0x10 kthread+0x2b7/0x370 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x48/0x80 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Allocated by task 4974: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 __kasan_kmalloc+0xaa/0xb0 __kmalloc_noprof+0x1d1/0x440 hci_alloc_dev_priv+0x1d/0x2820 __vhci_create_device+0xef/0x7d0 vhci_write+0x2c7/0x480 vfs_write+0x6a0/0xfc0 ksys_write+0x12f/0x260 do_syscall_64+0xc7/0x250 entry_SYSCALL_64_after_hwframe+0x77/0x7f Freed by task 4979: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x60 __kasan_slab_free+0x4f/0x70 kfree+0x141/0x490 hci_release_dev+0x4d9/0x600 bt_host_release+0x6a/0xb0 device_release+0xa4/0x240 kobject_put+0x1ec/0x5a0 put_device+0x1f/0x30 vhci_release+0x81/0xf0 __fput+0x3f6/0xb30 task_work_run+0x151/0x250 do_exit+0xa79/0x2c30 do_group_exit+0xd5/0x2a0 get_signal+0x1fcd/0x2210 arch_do_signal_or_restart+0x93/0x780 syscall_exit_to_user_mode+0x140/0x290 do_syscall_64+0xd4/0x250 entry_SYSCALL_64_after_hwframe+0x77/0x7f In 'hci_conn_del_sysfs()', 'device_unregister()' may be called when an underlying (kobject) reference counter is greater than 1. This means that reparenting (happened when the device is actually freed) is delayed and, during that delay, parent controller device (hciX) may be deleted. Since the latter may create a dangling pointer to freed parent, avoid that scenario by reparenting to NULL explicitly. Reported-by: syzbot+6cf5652d3df49fae2e3f@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6cf5652d3df49fae2e3f Fixes: a85fb91e3d72 ("Bluetooth: Fix double free in hci_conn_cleanup") Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> --- v2: reparent per-connection 'struct device' explicitly --- net/bluetooth/hci_sysfs.c | 2 ++ 1 file changed, 2 insertions(+)