mbox series

[0/6] Change ->mkdir() and vfs_mkdir() to return a dentry.

Message ID 20250220234630.983190-1-neilb@suse.de
Headers show
Series Change ->mkdir() and vfs_mkdir() to return a dentry. | expand

Message

NeilBrown Feb. 20, 2025, 11:36 p.m. UTC
I'm posting this to a wider audience now as I think it is close to its final form.
I have not included every fs maintainer explicitly (though this patch touches every writable FS)
but hope that fsdevel will catch enough of those).  I have included the affected clients
of vfs_mkdir: nfsd, smb/server, cachefiles, and the filesystems with non-trivial changes:
nfs, cephfs, hostfs, fuse.

mkdir is unique among object creation interfaces as there can only be
one dentry for an directory inode.  There is a possibilty of races which
could result in the inode created by mkdir already having a dentry when
mkdir comes to attach one.  To cope with this, three users of
vfs_mkdir() sometimes do a lookup to find the correct dentry when the
one that was passed in wasn't used.  This lookup is clumsy and racy.

This patch set changes mkdir interface so that the filesystem can
provide the correct dentry.  Some times this still requires a look-up
which can be racey, but having the filesystem do it limits this to only
when it is absolutely necessary.

So this series changes ->mkdir and vfs_mkdir() to allow a dentry to be
returned, changes a few filesystems to actually return a dentry
sometimes, and changes the callers of vfs_mkdir() to use the returned dentry.

I think it best if this could all land through the VFS tree as ask maitainers of:
 cachefiles nfsd smb/server
 hostfs ceph nfs fuse
to provide a Reviewed-by.

Thanks,
NeilBrown

Comments

Viacheslav Dubeyko Feb. 21, 2025, 1:48 a.m. UTC | #1
On Fri, 2025-02-21 at 10:36 +1100, NeilBrown wrote:
> ceph already splices the correct dentry (in splice_dentry()) from the
> result of mkdir but does nothing more with it.
> 
> Now that ->mkdir can return a dentry, return the correct dentry.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/ceph/dir.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> index 39e0f240de06..c1a1c168bb27 100644
> --- a/fs/ceph/dir.c
> +++ b/fs/ceph/dir.c
> @@ -1099,6 +1099,7 @@ static struct dentry *ceph_mkdir(struct mnt_idmap *idmap, struct inode *dir,
>  	struct ceph_client *cl = mdsc->fsc->client;
>  	struct ceph_mds_request *req;
>  	struct ceph_acl_sec_ctx as_ctx = {};
> +	struct dentry *ret = NULL;

I believe that it makes sense to initialize pointer by error here and always
return ret as output. If something goes wrong in the logic, then we already have
error.

>  	int err;
>  	int op;
>  
> @@ -1166,14 +1167,20 @@ static struct dentry *ceph_mkdir(struct mnt_idmap *idmap, struct inode *dir,
>  	    !req->r_reply_info.head->is_dentry)
>  		err = ceph_handle_notrace_create(dir, dentry);
>  out_req:
> +	if (!err && req->r_dentry != dentry)
> +		/* Some other dentry was spliced in */
> +		ret = dget(req->r_dentry);
>  	ceph_mdsc_put_request(req);
>  out:
>  	if (!err)
> +		/* Should this use 'ret' ?? */

Could we make a decision should or shouldn't? :)
It looks not good to leave this comment instead of proper implementation. Do we
have some obstacles to make this decision?

>  		ceph_init_inode_acls(d_inode(dentry), &as_ctx);
>  	else
>  		d_drop(dentry);
>  	ceph_release_acl_sec_ctx(&as_ctx);
> -	return ERR_PTR(err);
> +	if (err)
> +		return ERR_PTR(err);
> +	return ret;

What's about this?

return err ? ERR_PTR(err) : ret;

Thanks,
Slava.

>  }
>  
>  static int ceph_link(struct dentry *old_dentry, struct inode *dir,
Jeff Layton Feb. 21, 2025, 1:31 p.m. UTC | #2
On Fri, 2025-02-21 at 10:36 +1100, NeilBrown wrote:
> ceph already splices the correct dentry (in splice_dentry()) from the
> result of mkdir but does nothing more with it.
> 
> Now that ->mkdir can return a dentry, return the correct dentry.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/ceph/dir.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> index 39e0f240de06..c1a1c168bb27 100644
> --- a/fs/ceph/dir.c
> +++ b/fs/ceph/dir.c
> @@ -1099,6 +1099,7 @@ static struct dentry *ceph_mkdir(struct mnt_idmap *idmap, struct inode *dir,
>  	struct ceph_client *cl = mdsc->fsc->client;
>  	struct ceph_mds_request *req;
>  	struct ceph_acl_sec_ctx as_ctx = {};
> +	struct dentry *ret = NULL;
>  	int err;
>  	int op;
>  
> @@ -1166,14 +1167,20 @@ static struct dentry *ceph_mkdir(struct mnt_idmap *idmap, struct inode *dir,
>  	    !req->r_reply_info.head->is_dentry)
>  		err = ceph_handle_notrace_create(dir, dentry);
>  out_req:
> +	if (!err && req->r_dentry != dentry)
> +		/* Some other dentry was spliced in */
> +		ret = dget(req->r_dentry);
>  	ceph_mdsc_put_request(req);
>  out:
>  	if (!err)
> +		/* Should this use 'ret' ?? */

Probably? Is there a guarantee that "dentry" will even have an inode
attached if it got replaced by an disconnected one in the dcache?

>  		ceph_init_inode_acls(d_inode(dentry), &as_ctx);
>  	else
>  		d_drop(dentry);
>  	ceph_release_acl_sec_ctx(&as_ctx);
> -	return ERR_PTR(err);
> +	if (err)
> +		return ERR_PTR(err);
> +	return ret;
>  }
>  
>  static int ceph_link(struct dentry *old_dentry, struct inode *dir,