Message ID | 20200918124533.3487701-2-hch@lst.de |
---|---|
State | New |
Headers | show |
Series | [1/9] kernel: add a PF_FORCE_COMPAT flag | expand |
On Fri, Sep 18, 2020 at 02:45:25PM +0200, Christoph Hellwig wrote: > Add a flag to force processing a syscall as a compat syscall. This is > required so that in_compat_syscall() works for I/O submitted by io_uring > helper threads on behalf of compat syscalls. > > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > arch/sparc/include/asm/compat.h | 3 ++- > arch/x86/include/asm/compat.h | 2 +- > fs/io_uring.c | 9 +++++++++ > include/linux/compat.h | 5 ++++- > include/linux/sched.h | 1 + > 5 files changed, 17 insertions(+), 3 deletions(-) > > diff --git a/arch/sparc/include/asm/compat.h b/arch/sparc/include/asm/compat.h > index 40a267b3bd5208..fee6c51d36e869 100644 > --- a/arch/sparc/include/asm/compat.h > +++ b/arch/sparc/include/asm/compat.h > @@ -211,7 +211,8 @@ static inline int is_compat_task(void) > static inline bool in_compat_syscall(void) > { > /* Vector 0x110 is LINUX_32BIT_SYSCALL_TRAP */ > - return pt_regs_trap_type(current_pt_regs()) == 0x110; > + return pt_regs_trap_type(current_pt_regs()) == 0x110 || > + (current->flags & PF_FORCE_COMPAT); Can't say I like that approach ;-/ Reasoning about the behaviour is much harder when it's controlled like that - witness set_fs() shite...
On Fri, Sep 18, 2020 at 3:44 PM Christoph Hellwig <hch@lst.de> wrote: > > On Fri, Sep 18, 2020 at 02:40:12PM +0100, Al Viro wrote: > > > /* Vector 0x110 is LINUX_32BIT_SYSCALL_TRAP */ > > > - return pt_regs_trap_type(current_pt_regs()) == 0x110; > > > + return pt_regs_trap_type(current_pt_regs()) == 0x110 || > > > + (current->flags & PF_FORCE_COMPAT); > > > > Can't say I like that approach ;-/ Reasoning about the behaviour is much > > harder when it's controlled like that - witness set_fs() shite... > > I don't particularly like it either. But do you have a better idea > how to deal with io_uring vs compat tasks? Do we need to worry about something other than the compat_iovec struct for now? Regarding the code in io_import_iovec(), it would seem that can easily be handled by exposing an internal helper. Instead of #ifdef CONFIG_COMPAT if (req->ctx->compat) return compat_import_iovec(rw, buf, sqe_len, UIO_FASTIOV, iovec, iter); #endif return import_iovec(rw, buf, sqe_len, UIO_FASTIOV, iovec, iter); This could do __import_iovec(rw, buf, sqe_len, UIO_FASTIOV, iovec, iter, req->ctx->compat); With the normal import_iovec() becoming a trivial wrapper around the same thing: ssize_t import_iovec(int type, const struct iovec __user * uvector, unsigned nr_segs, unsigned fast_segs, struct iovec **iov, struct iov_iter *i) { return __import_iovec(type, uvector, nr_segs, fast_segs, iov, i, in_compat_syscall()); } Arnd
From: Al Viro > Sent: 18 September 2020 14:58 > > On Fri, Sep 18, 2020 at 03:44:06PM +0200, Christoph Hellwig wrote: > > On Fri, Sep 18, 2020 at 02:40:12PM +0100, Al Viro wrote: > > > > /* Vector 0x110 is LINUX_32BIT_SYSCALL_TRAP */ > > > > - return pt_regs_trap_type(current_pt_regs()) == 0x110; > > > > + return pt_regs_trap_type(current_pt_regs()) == 0x110 || > > > > + (current->flags & PF_FORCE_COMPAT); > > > > > > Can't say I like that approach ;-/ Reasoning about the behaviour is much > > > harder when it's controlled like that - witness set_fs() shite... > > > > I don't particularly like it either. But do you have a better idea > > how to deal with io_uring vs compat tasks? > > <wry> git rm fs/io_uring.c would make a good starting point </wry> > Yes, I know it's not going to happen, but one can dream... Maybe the io_uring code needs some changes to make it vaguely safe. - No support for 32-bit compat mixed working (or at all?). Plausibly a special worker could do 32bit work. - ring structure (I'm assuming mapped by mmap()) never mapped in more than one process (not cloned by fork()). - No implicit handover of files to another process. Would need an munmap, handover, mmap sequence. In any case the io_ring rather abuses the import_iovec() interface. The canonical sequence is (types from memory): struct iovec cache[8], *iov = cache; struct iter iter; ... rval = import_iovec(..., &iov, 8, &iter); // Do read/write user using 'iter' free(iov); I don't think there is any strict requirement that iter.iov is set to either 'cache' or 'iov' (it probably must point into one of them.) But the io_uring code will make that assumption because the actual copies can be done much later and it doesn't save 'iter'. It gets itself in a right mess because it doesn't separate the 'address I need to free' from 'the iov[] for any transfers'. io_uring is also the only code that relies on import_iovec() returning the iter.count on success. It would be much better to have: iov = import_iovec(..., &cache, ...); free(iov); and use ERR_PTR() et al for error detectoion. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Fri, Sep 18, 2020 at 8:16 AM Christoph Hellwig <hch@lst.de> wrote: > > On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote: > > Said that, why not provide a variant that would take an explicit > > "is it compat" argument and use it there? And have the normal > > one pass in_compat_syscall() to that... > > That would help to not introduce a regression with this series yes. > But it wouldn't fix existing bugs when io_uring is used to access > read or write methods that use in_compat_syscall(). One example that > I recently ran into is drivers/scsi/sg.c. Aside from the potentially nasty use of per-task variables, one thing I don't like about PF_FORCE_COMPAT is that it's one-way. If we're going to have a generic mechanism for this, shouldn't we allow a full override of the syscall arch instead of just allowing forcing compat so that a compat syscall can do a non-compat operation?
On Sat, 19 Sep 2020, Arnd Bergmann wrote: > On Sat, Sep 19, 2020 at 6:21 PM Andy Lutomirski <luto@kernel.org> wrote: > > On Fri, Sep 18, 2020 at 8:16 AM Christoph Hellwig <hch@lst.de> wrote: > > > On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote: > > > > Said that, why not provide a variant that would take an explicit > > > > "is it compat" argument and use it there? And have the normal one > > > > pass in_compat_syscall() to that... > > > > > > That would help to not introduce a regression with this series yes. > > > But it wouldn't fix existing bugs when io_uring is used to access > > > read or write methods that use in_compat_syscall(). One example > > > that I recently ran into is drivers/scsi/sg.c. > > Ah, so reading /dev/input/event* would suffer from the same issue, and > that one would in fact be broken by your patch in the hypothetical case > that someone tried to use io_uring to read /dev/input/event on x32... > > For reference, I checked the socket timestamp handling that has a number > of corner cases with time32/time64 formats in compat mode, but none of > those appear to be affected by the problem. > > > Aside from the potentially nasty use of per-task variables, one thing > > I don't like about PF_FORCE_COMPAT is that it's one-way. If we're > > going to have a generic mechanism for this, shouldn't we allow a full > > override of the syscall arch instead of just allowing forcing compat > > so that a compat syscall can do a non-compat operation? > > The only reason it's needed here is that the caller is in a kernel > thread rather than a system call. Are there any possible scenarios where > one would actually need the opposite? > Quite possibly. The ext4 vs. compat getdents bug is still unresolved. Please see, https://lore.kernel.org/lkml/CAFEAcA9W+JK7_TrtTnL1P2ES1knNPJX9wcUvhfLwxLq9augq1w@mail.gmail.com/ > Arnd >
On Sun, Sep 20, 2020 at 11:07 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Sun, Sep 20, 2020 at 04:15:10PM +0100, Matthew Wilcox wrote: > > On Fri, Sep 18, 2020 at 02:45:25PM +0200, Christoph Hellwig wrote: > > > Add a flag to force processing a syscall as a compat syscall. This is > > > required so that in_compat_syscall() works for I/O submitted by io_uring > > > helper threads on behalf of compat syscalls. > > > > Al doesn't like this much, but my suggestion is to introduce two new > > opcodes -- IORING_OP_READV32 and IORING_OP_WRITEV32. The compat code > > can translate IORING_OP_READV to IORING_OP_READV32 and then the core > > code can know what that user pointer is pointing to. > > Let's separate two issues: > 1) compat syscalls want 32bit iovecs. Nothing to do with the > drivers, dealt with just fine. > 2) a few drivers are really fucked in head. They use different > *DATA* layouts for reads/writes, depending upon the calling process. > IOW, if you fork/exec a 32bit binary and your stdin is one of those, > reads from stdin in parent and child will yield different data layouts. > On the same struct file. > That's what Christoph worries about (/dev/sg he'd mentioned is > one of those). > > IMO we should simply have that dozen or so of pathological files > marked with FMODE_SHITTY_ABI; it's not about how they'd been opened - > it describes the userland ABI provided by those. And it's cast in stone. > I wonder if this is really quite cast in stone. We could also have FMODE_SHITTY_COMPAT and set that when a file like this is *opened* in compat mode. Then that particular struct file would be read and written using the compat data format. The change would be user-visible, but the user that would see it would be very strange indeed. I don't have a strong opinion as to whether that is better or worse than denying io_uring access to these things, but at least it moves the special case out of io_uring. --Andy
On Sun, Sep 20, 2020 at 12:23 PM Matthew Wilcox <willy@infradead.org> wrote: > > On Sun, Sep 20, 2020 at 08:10:31PM +0100, Al Viro wrote: > > IMO it's much saner to mark those and refuse to touch them from io_uring... > > Simpler solution is to remove io_uring from the 32-bit syscall list. > If you're a 32-bit process, you don't get to use io_uring. Would > any real users actually care about that? We could go one step farther and declare that we're done adding *any* new compat syscalls :) -- Andy Lutomirski AMA Capital Management, LLC
From: Arnd Bergmann > Sent: 20 September 2020 21:49 > > On Sun, Sep 20, 2020 at 9:28 PM Andy Lutomirski <luto@kernel.org> wrote: > > On Sun, Sep 20, 2020 at 12:23 PM Matthew Wilcox <willy@infradead.org> wrote: > > > > > > On Sun, Sep 20, 2020 at 08:10:31PM +0100, Al Viro wrote: > > > > IMO it's much saner to mark those and refuse to touch them from io_uring... > > > > > > Simpler solution is to remove io_uring from the 32-bit syscall list. > > > If you're a 32-bit process, you don't get to use io_uring. Would > > > any real users actually care about that? > > > > We could go one step farther and declare that we're done adding *any* > > new compat syscalls :) > > Would you also stop adding system calls to native 32-bit systems then? > > On memory constrained systems (less than 2GB a.t.m.), there is still a > strong demand for running 32-bit user space, but all of the recent Arm > cores (after Cortex-A55) dropped the ability to run 32-bit kernels, so > that compat mode may eventually become the primary way to run > Linux on cheap embedded systems. > > I don't think there is any chance we can realistically take away io_uring > from the 32-bit ABI any more now. Can't it just run requests from 32bit apps in a kernel thread that has the 'in_compat_syscall' flag set? Not that i recall seeing the code where it saves the 'compat' nature of any requests. It is already completely f*cked if you try to pass the command ring to a child process - it uses the wrong 'mm'. I suspect there are some really horrid security holes in that area. David. - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On 21/09/2020 00:13, David Laight wrote: > From: Arnd Bergmann >> Sent: 20 September 2020 21:49 >> >> On Sun, Sep 20, 2020 at 9:28 PM Andy Lutomirski <luto@kernel.org> wrote: >>> On Sun, Sep 20, 2020 at 12:23 PM Matthew Wilcox <willy@infradead.org> wrote: >>>> >>>> On Sun, Sep 20, 2020 at 08:10:31PM +0100, Al Viro wrote: >>>>> IMO it's much saner to mark those and refuse to touch them from io_uring... >>>> >>>> Simpler solution is to remove io_uring from the 32-bit syscall list. >>>> If you're a 32-bit process, you don't get to use io_uring. Would >>>> any real users actually care about that? >>> >>> We could go one step farther and declare that we're done adding *any* >>> new compat syscalls :) >> >> Would you also stop adding system calls to native 32-bit systems then? >> >> On memory constrained systems (less than 2GB a.t.m.), there is still a >> strong demand for running 32-bit user space, but all of the recent Arm >> cores (after Cortex-A55) dropped the ability to run 32-bit kernels, so >> that compat mode may eventually become the primary way to run >> Linux on cheap embedded systems. >> >> I don't think there is any chance we can realistically take away io_uring >> from the 32-bit ABI any more now. > > Can't it just run requests from 32bit apps in a kernel thread that has > the 'in_compat_syscall' flag set? > Not that i recall seeing the code where it saves the 'compat' nature > of any requests. > > It is already completely f*cked if you try to pass the command ring > to a child process - it uses the wrong 'mm'. And how so? io_uring uses mm of a submitter. The exception is SQPOLL mode, but it requires CAP_SYS_ADMIN or CAP_SYS_NICE anyway. > I suspect there are some really horrid security holes in that area. > > David. > > - > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK > Registration No: 1397386 (Wales) > -- Pavel Begunkov
diff --git a/arch/sparc/include/asm/compat.h b/arch/sparc/include/asm/compat.h index 40a267b3bd5208..fee6c51d36e869 100644 --- a/arch/sparc/include/asm/compat.h +++ b/arch/sparc/include/asm/compat.h @@ -211,7 +211,8 @@ static inline int is_compat_task(void) static inline bool in_compat_syscall(void) { /* Vector 0x110 is LINUX_32BIT_SYSCALL_TRAP */ - return pt_regs_trap_type(current_pt_regs()) == 0x110; + return pt_regs_trap_type(current_pt_regs()) == 0x110 || + (current->flags & PF_FORCE_COMPAT); } #define in_compat_syscall in_compat_syscall #endif diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h index d4edf281fff49d..fbab072d4e5b31 100644 --- a/arch/x86/include/asm/compat.h +++ b/arch/x86/include/asm/compat.h @@ -208,7 +208,7 @@ static inline bool in_32bit_syscall(void) #ifdef CONFIG_COMPAT static inline bool in_compat_syscall(void) { - return in_32bit_syscall(); + return in_32bit_syscall() || (current->flags & PF_FORCE_COMPAT); } #define in_compat_syscall in_compat_syscall /* override the generic impl */ #endif diff --git a/fs/io_uring.c b/fs/io_uring.c index 3790c7fe9fee22..5755d557c3f7bc 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -5449,6 +5449,9 @@ static int io_req_defer_prep(struct io_kiocb *req, if (unlikely(ret)) return ret; + if (req->ctx->compat) + current->flags |= PF_FORCE_COMPAT; + switch (req->opcode) { case IORING_OP_NOP: break; @@ -5546,6 +5549,7 @@ static int io_req_defer_prep(struct io_kiocb *req, break; } + current->flags &= ~PF_FORCE_COMPAT; return ret; } @@ -5669,6 +5673,9 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe, struct io_ring_ctx *ctx = req->ctx; int ret; + if (ctx->compat) + current->flags |= PF_FORCE_COMPAT; + switch (req->opcode) { case IORING_OP_NOP: ret = io_nop(req, cs); @@ -5898,6 +5905,8 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe, break; } + current->flags &= ~PF_FORCE_COMPAT; + if (ret) return ret; diff --git a/include/linux/compat.h b/include/linux/compat.h index b354ce58966e2d..685066f7ad325f 100644 --- a/include/linux/compat.h +++ b/include/linux/compat.h @@ -891,7 +891,10 @@ asmlinkage long compat_sys_socketcall(int call, u32 __user *args); */ #ifndef in_compat_syscall -static inline bool in_compat_syscall(void) { return is_compat_task(); } +static inline bool in_compat_syscall(void) +{ + return is_compat_task() || (current->flags & PF_FORCE_COMPAT); +} #endif /** diff --git a/include/linux/sched.h b/include/linux/sched.h index afe01e232935fa..c8b183b5655a1e 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1491,6 +1491,7 @@ extern struct pid *cad_pid; */ #define PF_IDLE 0x00000002 /* I am an IDLE thread */ #define PF_EXITING 0x00000004 /* Getting shut down */ +#define PF_FORCE_COMPAT 0x00000008 /* acting as compat task */ #define PF_VCPU 0x00000010 /* I'm a virtual CPU */ #define PF_WQ_WORKER 0x00000020 /* I'm a workqueue worker */ #define PF_FORKNOEXEC 0x00000040 /* Forked but didn't exec */
Add a flag to force processing a syscall as a compat syscall. This is required so that in_compat_syscall() works for I/O submitted by io_uring helper threads on behalf of compat syscalls. Signed-off-by: Christoph Hellwig <hch@lst.de> --- arch/sparc/include/asm/compat.h | 3 ++- arch/x86/include/asm/compat.h | 2 +- fs/io_uring.c | 9 +++++++++ include/linux/compat.h | 5 ++++- include/linux/sched.h | 1 + 5 files changed, 17 insertions(+), 3 deletions(-)