[RFC,v2,0/3] ceph: add support for snapshot names encryption


Message ID

20220315161959.19453-1-lhenriques@suse.de

Headers

From: Luís Henriques <lhenriques@suse.de>
To: Jeff Layton <jlayton@kernel.org>, Xiubo Li <xiubli@redhat.com>,
        Ilya Dryomov <idryomov@gmail.com>
Cc: ceph-devel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Luís Henriques <lhenriques@suse.de>
Subject: [RFC PATCH v2 0/3] ceph: add support for snapshot names encryption
Date: Tue, 15 Mar 2022 16:19:56 +0000
Message-Id: <20220315161959.19453-1-lhenriques@suse.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

ceph: add support for snapshot names encryption

Message

Luis Henriques March 15, 2022, 4:19 p.m. UTC

Hi!

A couple of changes since v1:

- Dropped the dentry->d_flags change in ceph_mkdir().  Thanks to Xiubo's
  suggestion, patch 0001 now skips calling ceph_fscrypt_prepare_context()
  if we're handling a snapshot.

- Added error handling to ceph_get_snapdir() in patch 0001 (Jeff had
  already pointed that out but I forgot to include that change in the
  previous revision).

- Rebased patch 0002 to the latest wip-fscrypt branch.

- Added some documentation regarding snapshot naming restrictions.

As before, in order to test this code the following PRs are required:

  mds: add protection from clients without fscrypt support #45073
  mds: use the whole string as the snapshot long name #45192
  mds: support alternate names for snapshots #45224
  mds: limit the snapshot names to 240 characters #45312

Luís Henriques (3):
  ceph: add support for encrypted snapshot names
  ceph: add support for handling encrypted snapshot names
  ceph: update documentation regarding snapshot naming limitations

 Documentation/filesystems/ceph.rst |  10 ++
 fs/ceph/crypto.c                   | 158 +++++++++++++++++++++++++----
 fs/ceph/crypto.h                   |  11 +-
 fs/ceph/inode.c                    |  31 +++++-
 4 files changed, 182 insertions(+), 28 deletions(-)

Comments

Xiubo Li March 16, 2022, 12:07 a.m. UTC | #1

On 3/16/22 12:19 AM, Luís Henriques wrote:
> Since filenames in encrypted directories are already encrypted and shown
> as a base64-encoded string when the directory is locked, snapshot names
> should show a similar behaviour.
>
> Signed-off-by: Luís Henriques <lhenriques@suse.de>
> ---
>   fs/ceph/inode.c | 31 +++++++++++++++++++++++++++----
>   1 file changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> index 7b670e2405c1..359e29896f16 100644
> --- a/fs/ceph/inode.c
> +++ b/fs/ceph/inode.c
> @@ -91,9 +91,15 @@ struct inode *ceph_new_inode(struct inode *dir, struct dentry *dentry,
>   	if (err < 0)
>   		goto out_err;
>   
> -	err = ceph_fscrypt_prepare_context(dir, inode, as_ctx);
> -	if (err)
> -		goto out_err;
> +	/*
> +	 * We'll skip setting fscrypt context for snapshots, leaving that for
> +	 * the handle_reply().
> +	 */
> +	if (ceph_snap(dir) != CEPH_SNAPDIR) {
> +		err = ceph_fscrypt_prepare_context(dir, inode, as_ctx);
> +		if (err)
> +			goto out_err;
> +	}
>   
>   	return inode;
>   out_err:
> @@ -157,6 +163,7 @@ struct inode *ceph_get_snapdir(struct inode *parent)
>   	};
>   	struct inode *inode = ceph_get_inode(parent->i_sb, vino, NULL);
>   	struct ceph_inode_info *ci = ceph_inode(inode);
> +	int ret = -ENOTDIR;
>   
>   	if (IS_ERR(inode))
>   		return inode;
> @@ -182,6 +189,22 @@ struct inode *ceph_get_snapdir(struct inode *parent)
>   	ci->i_rbytes = 0;
>   	ci->i_btime = ceph_inode(parent)->i_btime;
>   
> +	/* if encrypted, just borrow fscrypt_auth from parent */
> +	if (IS_ENCRYPTED(parent)) {
> +		struct ceph_inode_info *pci = ceph_inode(parent);
> +
> +		ci->fscrypt_auth = kmemdup(pci->fscrypt_auth,
> +					   pci->fscrypt_auth_len,
> +					   GFP_KERNEL);
> +		if (ci->fscrypt_auth) {
> +			inode->i_flags |= S_ENCRYPTED;
> +			ci->fscrypt_auth_len = pci->fscrypt_auth_len;
> +		} else {
> +			dout("Failed to alloc snapdir fscrypt_auth\n");
> +			ret = -ENOMEM;
> +			goto err;
> +		}
> +	}
>   	if (inode->i_state & I_NEW) {
>   		inode->i_op = &ceph_snapdir_iops;
>   		inode->i_fop = &ceph_snapdir_fops;
> @@ -195,7 +218,7 @@ struct inode *ceph_get_snapdir(struct inode *parent)
>   		discard_new_inode(inode);
>   	else
>   		iput(inode);
> -	return ERR_PTR(-ENOTDIR);
> +	return ERR_PTR(ret);
>   }
>   
>   const struct inode_operations ceph_file_iops = {
>
LGTM.

Reviewed-by: Xiubo Li <xiubli@redhat.com>

Luis Henriques March 16, 2022, 11 a.m. UTC | #2

Xiubo Li <xiubli@redhat.com> writes:

> On 3/16/22 12:19 AM, Luís Henriques wrote:
>> When creating a snapshot, the .snap directories for every subdirectory will
>> show the snapshot name in the "long format":
>>
>>    # mkdir .snap/my-snap
>>    # ls my-dir/.snap/
>>    _my-snap_1099511627782
>>
>> Encrypted snapshots will need to be able to handle these snapshot names by
>> encrypting/decrypting only the snapshot part of the string ('my-snap').
>>
>> Also, since the MDS prevents snapshot names from being longer than 240
>> characters, it is necessary to adapt CEPH_NOHASH_NAME_MAX to accommodate this
>> extra limitation.
>>
>> Signed-off-by: Luís Henriques <lhenriques@suse.de>
>> ---
>>   fs/ceph/crypto.c | 158 +++++++++++++++++++++++++++++++++++++++++------
>>   fs/ceph/crypto.h |  11 ++--
>>   2 files changed, 145 insertions(+), 24 deletions(-)
>>
>> diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
>> index c125a79019b3..06a4b918201c 100644
>> --- a/fs/ceph/crypto.c
>> +++ b/fs/ceph/crypto.c
>> @@ -128,18 +128,95 @@ void ceph_fscrypt_as_ctx_to_req(struct ceph_mds_request *req, struct ceph_acl_se
>>   	swap(req->r_fscrypt_auth, as->fscrypt_auth);
>>   }
>>   
>> -int ceph_encode_encrypted_dname(const struct inode *parent, struct qstr *d_name, char *buf)
>> +/*
>> + * User-created snapshots can't start with '_'.  Snapshots that start with this
>> + * character are special (hint: they aren't real snapshots) and use the
>> + * following format:
>> + *
>> + *   _<SNAPSHOT-NAME>_<INODE-NUMBER>
>> + *
>> + * where:
>> + *  - <SNAPSHOT-NAME> - the real snapshot name that may need to be decrypted,
>> + *  - <INODE-NUMBER> - the inode number for the actual snapshot
>> + *
>> + * This function parses these snapshot names and returns the inode
>> + * <INODE-NUMBER>.  'name_len' will also be set with the <SNAPSHOT-NAME>
>> + * length.
>> + */
>> +static struct inode *parse_longname(const struct inode *parent, const char *name,
>> +				    int *name_len)
>>   {
>> +	struct inode *dir = NULL;
>> +	struct ceph_vino vino = { .snap = CEPH_NOSNAP };
>> +	char *inode_number;
>> +	char *name_end;
>> +	int orig_len = *name_len;
>> +	int ret = -EIO;
>> +
>> +	/* Skip initial '_' */
>> +	name++;
>> +	name_end = strrchr(name, '_');
>> +	if (!name_end) {
>> +		dout("Failed to parse long snapshot name: %s\n", name);
>> +		return ERR_PTR(-EIO);
>> +	}
>> +	*name_len = (name_end - name);
>> +	if (*name_len <= 0) {
>> +		pr_err("Failed to parse long snapshot name\n");
>> +		return ERR_PTR(-EIO);
>> +	}
>> +
>> +	/* Get the inode number */
>> +	inode_number = kmemdup_nul(name_end + 1,
>> +				   orig_len - *name_len - 2,
>> +				   GFP_KERNEL);
>> +	if (!inode_number)
>> +		return ERR_PTR(-ENOMEM);
>> +	ret = kstrtou64(inode_number, 0, &vino.ino);
>> +	if (ret) {
>> +		dout("Failed to parse inode number: %s\n", name);
>> +		dir = ERR_PTR(ret);
>> +		goto out;
>> +	}
>> +
>> +	/* And finally the inode */
>> +	dir = ceph_get_inode(parent->i_sb, vino, NULL);
>
> Maybe you should use ceph_find_inode() here? We shouldn't insert a new one
> here. And IMO the parent dir inode must already be in the cache...

Right, that makes sense.  I'll swap it for ceph_find_inode().

>> +	if (IS_ERR(dir))
>> +		dout("Can't find inode %s (%s)\n", inode_number, name);
>> +
>> +out:
>> +	kfree(inode_number);
>> +	return dir;
>> +}
>
> Here I think you have missed one case: not all long snap names need to be
> decrypted. Those that come from parent snap realms which are not encrypted
> must be left alone, for example:
>
> mkdir dir1
>
> fscrypt encrypt dir1
>
> mkdir dir1/dir2
>
> mkdir .snap/root_snap
>
> mkdir dir1/.snap/dir1_snap
>
> ls dir1/dir2/.snap/
>
> _root_snap_1  _dir1_snap_1099511628283
>
> You shouldn't encrypt the "_root_snap_1" long name.

Ah!  Good catch!  Yes, this case isn't being covered.  I'll fix it by
following your suggestion below.
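
For readers following along, the long-name format described in the patch
comment can be sketched in plain userspace C.  This is only an illustration,
not the kernel code: split_longname() is a hypothetical helper, and strtoull()
stands in for kstrtou64().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace sketch: split "_<SNAPSHOT-NAME>_<INODE-NUMBER>".
 * On success returns 0 and stores the length of <SNAPSHOT-NAME> in
 * *name_len and the parsed inode number in *ino.
 * split_longname() is a hypothetical name used for illustration only.
 */
static int split_longname(const char *longname, size_t *name_len, uint64_t *ino)
{
	const char *name, *sep;
	char *end;
	unsigned long long val;

	if (!longname || longname[0] != '_')
		return -EINVAL;

	name = longname + 1;		/* skip the initial '_' */
	sep = strrchr(name, '_');	/* the last '_' precedes the inode number */
	if (!sep || sep == name)
		return -EINVAL;		/* no separator, or empty snapshot name */
	if (sep[1] < '0' || sep[1] > '9')
		return -EINVAL;		/* inode number must start with a digit */

	errno = 0;
	val = strtoull(sep + 1, &end, 0);	/* base 0, as in the patch */
	if (errno || *end != '\0')
		return -EINVAL;

	*name_len = (size_t)(sep - name);
	*ino = (uint64_t)val;
	return 0;
}
```

With the names from the example above, "_root_snap_1" splits into the
9-character name "root_snap" and inode 1, so only that middle part would ever
be a candidate for encryption.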

>> +
>> +int ceph_encode_encrypted_dname(struct inode *parent, struct qstr *d_name, char *buf)
>> +{
>> +	struct inode *dir = parent;
>> +	struct qstr iname;
>>   	u32 len;
>> +	int name_len;
>>   	int elen;
>>   	int ret;
>> -	u8 *cryptbuf;
>> +	u8 *cryptbuf = NULL;
>>   
>>   	if (!fscrypt_has_encryption_key(parent)) {
>>   		memcpy(buf, d_name->name, d_name->len);
>>   		return d_name->len;
>>   	}
>>   
>> +	iname.name = d_name->name;
>> +	name_len = d_name->len;
>> +
>> +	/* Handle the special case of snapshot names that start with '_' */
>> +	if ((ceph_snap(dir) == CEPH_SNAPDIR) && (name_len > 0) &&
>> +	    (iname.name[0] == '_')) {
>> +		dir = parse_longname(parent, iname.name, &name_len);
>> +		if (IS_ERR(dir))
>> +			return PTR_ERR(dir);
>> +		iname.name++; /* skip initial '_' */
>> +	}
>> +	iname.len = name_len;
>> +
>
> Maybe you can do this just before checking fscrypt_has_encryption_key() to
> fix the issue mentioned above?
>
>
>>   	/*
>>   	 * convert cleartext d_name to ciphertext
>>   	 * if result is longer than CEPH_NOKEY_NAME_MAX,
>> @@ -147,18 +224,22 @@ int ceph_encode_encrypted_dname(const struct inode *parent, struct qstr *d_name,
>>   	 *
>>   	 * See: fscrypt_setup_filename
>>   	 */
>> -	if (!fscrypt_fname_encrypted_size(parent, d_name->len, NAME_MAX, &len))
>> -		return -ENAMETOOLONG;
>> +	if (!fscrypt_fname_encrypted_size(dir, iname.len, NAME_MAX, &len)) {
>> +		elen = -ENAMETOOLONG;
>> +		goto out;
>> +	}
>>   
>>   	/* Allocate a buffer appropriate to hold the result */
>>   	cryptbuf = kmalloc(len > CEPH_NOHASH_NAME_MAX ? NAME_MAX : len, GFP_KERNEL);
>> -	if (!cryptbuf)
>> -		return -ENOMEM;
>> +	if (!cryptbuf) {
>> +		elen = -ENOMEM;
>> +		goto out;
>> +	}
>>   
>> -	ret = fscrypt_fname_encrypt(parent, d_name, cryptbuf, len);
>> +	ret = fscrypt_fname_encrypt(dir, &iname, cryptbuf, len);
>>   	if (ret) {
>> -		kfree(cryptbuf);
>> -		return ret;
>> +		elen = ret;
>> +		goto out;
>>   	}
>>   
>>   	/* hash the end if the name is long enough */
>> @@ -174,12 +255,24 @@ int ceph_encode_encrypted_dname(const struct inode *parent, struct qstr *d_name,
>>   
>>   	/* base64 encode the encrypted name */
>>   	elen = fscrypt_base64url_encode(cryptbuf, len, buf);
>> -	kfree(cryptbuf);
>>   	dout("base64-encoded ciphertext name = %.*s\n", elen, buf);
>> +
>> +	if ((elen > 0) && (dir != parent)) {
>> +		char tmp_buf[FSCRYPT_BASE64URL_CHARS(NAME_MAX)];
>> +
>
> Do we really need FSCRYPT_BASE64URL_CHARS(NAME_MAX)? Since you have fixed the
> 189->180 code, the encrypted long snap name shouldn't exceed 255.
>
> I think the NAME_MAX is enough.

Yes, correct.  I'll change that too.

> You should also check elen here: it shouldn't exceed 240 after encryption. Or
> should we fail here directly with a warning log?

Right, that should probably be logged.  I'll add that check.
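
As a rough userspace sketch of what such a check could look like (not the
kernel code): build_longname() and SNAP_NAME_MAX are hypothetical names, and
the sketch assumes the 240-character MDS limit applies to the base64-encoded
snapshot part before it is wrapped as "_<name>_<ino>".

```c
#include <errno.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SNAP_NAME_MAX 240	/* assumed: the MDS limit mentioned in this thread */

/*
 * Rebuild "_<encoded-name>_<ino>" into buf.  Returns the resulting length,
 * -EINVAL for an empty name, or -ENAMETOOLONG if the encoded name exceeds
 * the assumed limit or the result does not fit in buf.
 */
static int build_longname(char *buf, size_t buflen,
			  const char *enc, size_t enc_len, uint64_t ino)
{
	int n;

	if (enc_len == 0)
		return -EINVAL;
	if (enc_len > SNAP_NAME_MAX)
		return -ENAMETOOLONG;	/* a caller could also log a warning here */

	n = snprintf(buf, buflen, "_%.*s_%" PRIu64, (int)enc_len, enc, ino);
	if (n < 0)
		return -EIO;
	if ((size_t)n >= buflen)
		return -ENAMETOOLONG;	/* output was truncated */
	return n;
}
```

Failing early with -ENAMETOOLONG keeps a too-long encoded name from ever
reaching the MDS; whether to warn or fail silently is the open question above.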

Thanks a lot for your review, Xiubo.

Cheers,

Luis Henriques March 17, 2022, 3:59 p.m. UTC | #3

Jeff Layton <jlayton@kernel.org> writes:

> On Thu, 2022-03-17 at 11:11 +0000, Luís Henriques wrote:
>> Xiubo Li <xiubli@redhat.com> writes:
>> 
>> > On 3/17/22 6:01 PM, Jeff Layton wrote:
>> > > I'm not sure we want to worry about .snap directories here since they
>> > > aren't "real". IIRC, snaps are inherited from parents too, so you could
>> > > do something like
>> > > 
>> > >      mkdir dir1
>> > >      mkdir dir1/.snap/snap1
>> > >      mkdir dir1/dir2
>> > >      fscrypt encrypt dir1/dir2
>> > > 
>> > > There should be nothing to prevent encrypting dir2, but I'm pretty sure
>> > > dir2/.snap will not be empty at that point.
>> > 
>> > If we don't take care of this, then how do we know which snapshots should be
>> > encrypted/decrypted and which shouldn't when building the path in lookup and
>> > when reading the snapdir?
>> 
>> In my patchset (of which I plan to send a new revision later today; I think I
>> still need to rebase it) this is handled by using the *real* snapshot
>> parent inode.  If we're decrypting/encrypting a name for a snapshot that
>> starts with a '_' character, we first find the parent inode for that
>> snapshot and only do the operation if that parent is encrypted.
>> 
>> In the other email I suggested that we could prevent enabling encryption
>> in a directory when there are snapshots above in the hierarchy.  But now
>> that I think more about it, it won't solve any problem because you could
>> create those snapshots later and then you would still need to handle these
>> (non-encrypted) "_name_xxxx" snapshots anyway.
>> 
>
> Yeah, that sounds about right.
>
> What happens if you don't have the snapshot parent's inode in cache?
> That can happen if you (e.g.) are running NFS over ceph, or if you get
> crafty with name_to_handle_at() and open_by_handle_at().
>
> Do we have to do a LOOKUPINO in that case or does the trace contain that
> info? If it doesn't then that could really suck in a big hierarchy if
> there are a lot of different snapshot parent inodes to hunt down.
>
> I think this is a case where the client just doesn't have complete
> control over the dentry name. It may be better to just not encrypt them
> if it's too ugly.

I *think* this is covered by my last revision.  I didn't really test
NFS, but this is why the patches are using ceph_get_inode() and falling
back to ceph_find_inode().  I tested this by directly mounting an
encrypted directory that had snapshots from a realm that wasn't in the
mount root.

(Obviously, these snapshot names are *not* encrypted because they belong
to snapshots that are not encrypted either.)
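
The rule discussed in this thread (use the snapshot's *real* parent inode to
decide whether a name goes through fscrypt) can be sketched in userspace C as
follows.  Everything here is illustrative: struct snap_owner and
snap_name_is_encrypted() are hypothetical, and the array lookup merely stands
in for finding the inode in the cache.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a cached inode: number plus fscrypt state. */
struct snap_owner {
	uint64_t ino;
	bool encrypted;
};

/*
 * Decide whether a name listed in a .snap directory should be
 * encrypted/decrypted.  Returns 1 (yes), 0 (no), or -1 if a long name
 * cannot be parsed or its owning inode is not known.
 */
static int snap_name_is_encrypted(const char *name, bool dir_encrypted,
				  const struct snap_owner *owners, size_t n)
{
	const char *sep;
	char *end;
	uint64_t ino;
	size_t i;

	/* Short names belong to this directory's own snapshots. */
	if (name[0] != '_')
		return dir_encrypted ? 1 : 0;

	/* Long name: "_<name>_<ino>", owned by the inode <ino>. */
	sep = strrchr(name + 1, '_');
	if (!sep || sep == name + 1 || sep[1] < '0' || sep[1] > '9')
		return -1;
	ino = strtoull(sep + 1, &end, 0);
	if (*end != '\0')
		return -1;

	for (i = 0; i < n; i++)
		if (owners[i].ino == ino)
			return owners[i].encrypted ? 1 : 0;

	return -1;	/* owner not cached; a real client would have to fetch it */
}
```

With Xiubo's earlier example, "_root_snap_1" maps to the unencrypted root
(inode 1) and is left alone, while "_dir1_snap_1099511628283" maps to the
encrypted dir1 and is handled by fscrypt.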

Cheers,