diff mbox series

fs/ceph/mds_client: give up on paths longer than PATH_MAX

Message ID 20241118222828.240530-1-max.kellermann@ionos.com
State New
Headers show
Series fs/ceph/mds_client: give up on paths longer than PATH_MAX | expand

Commit Message

Max Kellermann Nov. 18, 2024, 10:28 p.m. UTC
If the full path to be built by ceph_mdsc_build_path() happens to be
longer than PATH_MAX, then this function will enter an endless (retry)
loop, effectively blocking the whole task.  Most of the machine
becomes unusable, making this a very simple and effective DoS
vulnerability.

I cannot imagine why this retry was ever implemented, but it seems
rather useless and harmful to me.  Let's remove it and fail with
ENAMETOOLONG instead.

Cc: stable@vger.kernel.org
Reported-by: Dario Weißer <dario@cure53.de>
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
---
 fs/ceph/mds_client.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

Comments

Ilya Dryomov Nov. 19, 2024, 12:38 p.m. UTC | #1
On Mon, Nov 18, 2024 at 11:28 PM Max Kellermann
<max.kellermann@ionos.com> wrote:
>
> If the full path to be built by ceph_mdsc_build_path() happens to be
> longer than PATH_MAX, then this function will enter an endless (retry)
> loop, effectively blocking the whole task.  Most of the machine
> becomes unusable, making this a very simple and effective DoS
> vulnerability.
>
> I cannot imagine why this retry was ever implemented, but it seems
> rather useless and harmful to me.  Let's remove it and fail with
> ENAMETOOLONG instead.

Hi Max,

When this was put in place in 2009, I think the idea of a retry was
copied from CIFS.  Jeff preserved the retry when he massaged this code
to not warn in case a rename race is detected [1].  CIFS got rid of it
only a couple of years ago [2][3].

Adding Patrick and Venky as well, please chime in.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f5946bcc5e79038f9f7cb66ec25bd3b2d39b2775
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f6a9bc336b600e1266e6eebb0972d75d5b93aea9
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=991e72eb0e99764219865b9a3a07328695148e14

Thanks,

                Ilya

>
> Cc: stable@vger.kernel.org
> Reported-by: Dario Weißer <dario@cure53.de>
> Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
> ---
>  fs/ceph/mds_client.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index c4a5fd94bbbb..4f6ac015edcd 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -2808,12 +2808,11 @@ char *ceph_mdsc_build_path(struct ceph_mds_client *mdsc, struct dentry *dentry,
>
>         if (pos < 0) {
>                 /*
> -                * A rename didn't occur, but somehow we didn't end up where
> -                * we thought we would. Throw a warning and try again.
> +                * The path is longer than PATH_MAX and this function
> +                * cannot ever succeed.  Creating paths that long is
> +                * possible with Ceph, but Linux cannot use them.
>                  */
> -               pr_warn_client(cl, "did not end path lookup where expected (pos = %d)\n",
> -                              pos);
> -               goto retry;
> +               return ERR_PTR(-ENAMETOOLONG);
>         }
>
>         *pbase = base;
> --
> 2.45.2
>
Max Kellermann Nov. 19, 2024, 1:02 p.m. UTC | #2
On Tue, Nov 19, 2024 at 1:51 PM Jeff Layton <jlayton@kernel.org> wrote:
> -ENAMETOOLONG could be problematic there. This function is often called
> when we have a dentry and need to build a path to it to send to the MDS
> in a call. The system call that caused us to generate this path
> probably doesn't involve a pathname itself, so the caller may be
> confused by an -ENAMETOOLONG return.

It is unfortunate that the Ceph-MDS protocol requires having to
convert a file descriptor back to a path name - but do you really
believe EIO would cause less confusion? ENAMETOOLONG is exactly what
happens, even if it's an internal error. But there are many error
codes that describe internal errors, so there's some prior art.

EIO just doesn't fit, returning EIO would be confusing - even more so
because EIO isn't a documented error code for open().

If this is about building path names for sending to the MDS, and not
for the userspace ABI, maybe the PATH_MAX limitation is wrong here. If
Ceph doesn't have such a limitation, the Ceph code shouldn't use the
userspace ABI limit for protocol use.

> You may want to go with a more generic error code here -- -EIO or
> something. It might also be worthwhile to leave in a pr_warn_once or
> something since there may be users confused by this error return.

Users cannot read the kernel log, and this allows users to flood the
kernel log. So we get all the disadvantages of the kernel log while
our users get none of the advantages.
Patrick Donnelly Nov. 19, 2024, 1:58 p.m. UTC | #3
On Tue, Nov 19, 2024 at 8:02 AM Max Kellermann <max.kellermann@ionos.com> wrote:
>
> On Tue, Nov 19, 2024 at 1:51 PM Jeff Layton <jlayton@kernel.org> wrote:
> > -ENAMETOOLONG could be problematic there. This function is often called
> > when we have a dentry and need to build a path to it to send to the MDS
> > in a call. The system call that caused us to generate this path
> > probably doesn't involve a pathname itself, so the caller may be
> > confused by an -ENAMETOOLONG return.
>
> It is unfortunate that the Ceph-MDS protocol requires having to
> convert a file descriptor back to a path name - but do you really
> believe EIO would cause less confusion? ENAMETOOLONG is exactly what
> happens, even if it's an internal error. But there are many error
> codes that describe internal errors, so there's some prior art.

The protocol does **not** require building the full path for most
operations unless it involves a snapshot. For snapshots we have to
climb the directory tree until we find the directory with the
snapshot. e.g.:

$ tree -a foo/ foo/.snap/ foo/bar/baz/.snap/
foo/
└── bar
    └── baz
        └── file
foo/.snap/
└── 1
    └── bar
        └── baz
            └── file
foo/bar/baz/.snap/
├── _1_1099511627779
│   └── file
└── 2
    └── file


If you read "file" via foo/.snap/1 you get:

2024-11-19T13:47:23.523+0000 7f9b3b79b640  1 --
[v2:172.21.10.4:6874/192645635,v1:172.21.10.4:6875/192645635] <==
client.4417 172.21.10.4:0/1322260999 43506 ====
client_request(client.4417:121 open #0x10000000003//1/bar/baz/file
2024-11-19T13:47:23.524518+0000 caller_uid=1141,
caller_gid=1141{1000,1141,}) ==== 199+0+0 (crc 0 0 0) 0x55acb2d55180
con 0x55acb2b3cc00

and for foo/bar/baz/.snap/2/

2024-11-19T13:47:56.796+0000 7f9b3b79b640  1 --
[v2:172.21.10.4:6874/192645635,v1:172.21.10.4:6875/192645635] <==
client.4417 172.21.10.4:0/1322260999 43578 ====
client_request(client.4417:155 open #0x10000000005//2/file
2024-11-19T13:47:56.798370+0000 caller_uid=1141,
caller_gid=1141{1000,1141,}) ==== 191+0+0 (crc 0 0 0) 0x55acb2d56000
con 0x55acb2b3cc00

(Note: the MDS protocol indicates a snapshot in the relative file path
via double forward slash.)

If you create "file":

2024-11-19T13:56:56.895+0000 7f9b34f8e640  7 mds.0.server
reply_client_request 0 ((0) Success) client_request(client.4467:7
create owner_uid=1141, owner_gid=1141 #0x10000000005/file
2024-11-19T13:56:56.890430+0000 caller_uid=1141,
caller_gid=1141{1000,1141,})

(During path lookups when planning to read the file, the client will
usually get read caps so it doesn't need to formally open the file. So
this last example uses a create.)
Patrick Donnelly Nov. 19, 2024, 3:13 p.m. UTC | #4
On Tue, Nov 19, 2024 at 9:54 AM Max Kellermann <max.kellermann@ionos.com> wrote:
>
> On Tue, Nov 19, 2024 at 2:58 PM Patrick Donnelly <pdonnell@redhat.com> wrote:
> > The protocol does **not** require building the full path for most
> > operations unless it involves a snapshot.
>
> We don't use Ceph snapshots, but before today's emergency update, we
> could shoot down an arbitrary server with a single (unprivileged)
> system call using this vulnerability.
>
> I'm not sure what your point is, but this vulnerability exists, it
> works without snapshots and we think it's serious.

I'm not suggesting there isn't a bug. I'm correcting a misunderstanding.
diff mbox series

Patch

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index c4a5fd94bbbb..4f6ac015edcd 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2808,12 +2808,11 @@  char *ceph_mdsc_build_path(struct ceph_mds_client *mdsc, struct dentry *dentry,
 
 	if (pos < 0) {
 		/*
-		 * A rename didn't occur, but somehow we didn't end up where
-		 * we thought we would. Throw a warning and try again.
+		 * The path is longer than PATH_MAX and this function
+		 * cannot ever succeed.  Creating paths that long is
+		 * possible with Ceph, but Linux cannot use them.
 		 */
-		pr_warn_client(cl, "did not end path lookup where expected (pos = %d)\n",
-			       pos);
-		goto retry;
+		return ERR_PTR(-ENAMETOOLONG);
 	}
 
 	*pbase = base;