mbox series

[00/10] netfs, ceph, nfs, cachefiles: Miscellaneous fixes/changes

Message ID 20241213135013.2964079-1-dhowells@redhat.com
Headers show
Series netfs, ceph, nfs, cachefiles: Miscellaneous fixes/changes | expand

Message

David Howells Dec. 13, 2024, 1:50 p.m. UTC
Hi Christian,

Here are some miscellaneous fixes and changes for netfslib and the ceph and
nfs filesystems:

 (1) Ignore silly-rename files from afs and nfs when building the header
     archive in a kernel build.

 (2) netfs: Fix the way read result collection applies results to folios
     when each folio is being read by multiple subrequests and the results
     come out of order.

 (3) netfs: Fix ENOMEM handling in buffered reads.

 (4) nfs: Fix an oops in nfs_netfs_init_request() when copying to the cache.

 (5) cachefiles: Parse the "secctx" command immediately to get the correct
     error rather than leaving it to the "bind" command.

 (6) netfs: Remove a redundant smp_rmb().  This isn't a bug per se and
     could be deferred.

 (7) netfs: Fix missing barriers by using clear_and_wake_up_bit().

 (8) netfs: Work around recursion in read retry by failing and abandoning
     the retried subrequest if no I/O is performed.

     [!] NOTE: This only works around the recursion problem if the
     	 recursion keeps returning no data.  If the server manages, say, to
     	 repeatedly return a single byte of data faster than the retry
     	 algorithm can complete, it will still recurse and the stack
     	 overrun may still occur.  Actually fixing this requires quite an
     	 intrusive change which will hopefully make the next merge window.

 (9) netfs: Fix the clearance of a folio_queue when unlocking the page if
     we're going to want to subsequently send the queue for copying to the
     cache (if, for example, we're using ceph).

(10) netfs: Fix the lack of cancellation of copy-to-cache when the cache
     for a file is temporarily disabled (for example when a DIO write is
     done to the file).  This patch and (9) fix hangs with ceph.

With these patches, I can run xfstest -g quick to completion on ceph with a
local cache.

The patches can also be found here with a bonus cifs patch:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=netfs-fixes

Thanks,
David

David Howells (8):
  kheaders: Ignore silly-rename files
  netfs: Fix non-contiguous donation between completed reads
  netfs: Fix enomem handling in buffered reads
  nfs: Fix oops in nfs_netfs_init_request() when copying to cache
  netfs: Fix missing barriers by using clear_and_wake_up_bit()
  netfs: Work around recursion by abandoning retry if nothing read
  netfs: Fix ceph copy to cache on write-begin
  netfs: Fix the (non-)cancellation of copy when cache is temporarily
    disabled

Max Kellermann (1):
  cachefiles: Parse the "secctx" immediately

Zilin Guan (1):
  netfs: Remove redundant use of smp_rmb()

 fs/9p/vfs_addr.c         |  6 +++++-
 fs/afs/write.c           |  5 ++++-
 fs/cachefiles/daemon.c   | 14 +++++++-------
 fs/cachefiles/internal.h |  3 ++-
 fs/cachefiles/security.c |  6 +++---
 fs/netfs/buffered_read.c | 28 ++++++++++++++++------------
 fs/netfs/direct_write.c  |  1 -
 fs/netfs/read_collect.c  | 33 +++++++++++++++++++--------------
 fs/netfs/read_pgpriv2.c  |  4 ++++
 fs/netfs/read_retry.c    |  6 ++++--
 fs/netfs/write_collect.c | 14 +++++---------
 fs/netfs/write_issue.c   |  2 ++
 fs/nfs/fscache.c         |  9 ++++++++-
 fs/smb/client/cifssmb.c  | 13 +++++++++----
 fs/smb/client/smb2pdu.c  |  9 ++++++---
 include/linux/netfs.h    |  6 +++---
 kernel/gen_kheaders.sh   |  1 +
 17 files changed, 98 insertions(+), 62 deletions(-)

Comments

David Howells Dec. 14, 2024, 1:44 p.m. UTC | #1
[Adding Paul McKenney as he's the expert.]

Akira Yokosawa <akiyks@gmail.com> wrote:

> David Howells wrote:
> > Use clear_and_wake_up_bit() rather than something like:
> > 
> > 	clear_bit_unlock(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
> > 	wake_up_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS);
> > 
> > as there needs to be a barrier inserted between which is present in
> > clear_and_wake_up_bit().
> 
> If I am reading the kernel-doc comment of clear_bit_unlock() [1, 2]:
> 
>     This operation is atomic and provides release barrier semantics.
> 
> correctly, there already seems to be a barrier which should be
> good enough.
> 
> [1]: https://www.kernel.org/doc/html/latest/core-api/kernel-api.html#c.clear_bit_unlock
> [2]: include/asm-generic/bitops/instrumented-lock.h
> 
> > 
> > Fixes: 288ace2f57c9 ("netfs: New writeback implementation")
> > Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
> 
> So I'm not sure this fixes anything.
> 
> What am I missing?

We may need two barriers.  You have three things to synchronise:

 (1) The stuff you did before unlocking.

 (2) The lock bit.

 (3) The task state.

clear_bit_unlock() interposes a release barrier between (1) and (2).

Neither clear_bit_unlock() nor wake_up_bit(), however, necessarily interpose a
barrier between (2) and (3).  I'm not sure it entirely matters, but it seems
that since we have a function that combines the two, we should probably use
it - though, granted, it might not actually be a fix.

David