mbox series

[v2,0/6] fixes for debugfs/wireless locking issue

Message ID 20231124162522.16344-7-johannes@sipsolutions.net
Headers show
Series fixes for debugfs/wireless locking issue | expand

Message

Johannes Berg Nov. 24, 2023, 4:25 p.m. UTC
There's a locking issue in wireless where it takes a lock inside
a debugfs file handler that's also taken around the removal of
the debugfs file, and this causes a deadlock due to the proxy
object use. Fixing the debugfs removal is tricky because some
of the objects represented there fundamentally are deleted with
the lock held. Not taking the lock in the debugfs file is also
not really the right thing to do. Therefore, right now, the only
way to fix this would be to not have the debugfs files entirely,
but that isn't really helpful either.

Thus, here's a way to fix it that doesn't just remove the files.

The first patch is just a consistency thing in debugfs, right now
directory dentries don't have d_fsdata, normal file use the proxy
fops object, and automount uses the original autmount (function)
pointer, no proxy object. This is an issue if an automount dentry
were ever removed, which right now it isn't as far as I can tell,
but it still makes little sense, so fix it here by also allocating
an object for it, just not with real_fops making it a proxy, but
with the automount pointer inside.

The second patch annotates debugfs with lockdep to actually find
deadlock scenarios such as the one in wireless, indeed with that
and accessing one of the affected debugfs files, lockdep detects
the situation.

The third patch introduces a concept of 'cancellation', whereby a
debugfs file handler that is "long-running" may enter (and later
leave, of course) a cancellation section, where a function call
is made when the file is removed while the handler is running, and
the cancellation function can then abort the handling. This is
pretty generic so far.

The later patches (and I assume those would go through the wireless
tree) then fix the locking issue by declaring that the work that's
going to happen under lock is "long-running" per the above, and
setting up a cancellation. The way it works is that it actually
defers the real handling to a workqueue but then waits for it, but
that waiting can be aborted (and the work stopped) if the file is
removed before the work was able to run, thus fixing the deadlock.

I hope this is an acceptable compromise in terms of functionality
in debugfs vs. the user. It'd be possible to set up something of
the same sort in debugfs, but I feel the cancellation API is more
generic and thus more useful, and the actual details of what's in
the actual file handlers don't matter then.

johannes

Comments

Ben Greear Nov. 28, 2023, 3:48 p.m. UTC | #1
On 11/24/23 08:25, Johannes Berg wrote:
> There's a locking issue in wireless where it takes a lock inside
> a debugfs file handler that's also taken around the removal of
> the debugfs file, and this causes a deadlock due to the proxy
> object use. Fixing the debugfs removal is tricky because some
> of the objects represented there fundamentally are deleted with
> the lock held. Not taking the lock in the debugfs file is also
> not really the right thing to do. Therefore, right now, the only
> way to fix this would be to not have the debugfs files entirely,
> but that isn't really helpful either.

I have been running the RFC series and haven't noticed problems, so you
can add:

Tested-by: Ben Greear <greearb@candelatech.com>

Thanks,
Ben