Message ID | 20221121152909.3414096-1-Jason@zx2c4.com |
---|---|
Headers | show |
Series | implement getrandom() in vDSO | expand |
Hey folks, Exciting development: one of the glibc maintainers, Adhemerval, has written up the beginning of an implementation for this series: https://github.com/zatrazz/glibc/commits/azanella/arc4random-vdso I assume it'll continue to mature while this patch stews on the list here. But so far in my testing, it works well, and the performance boost is there and real. I've patched it into my system's glibc and am daily driving it. Jason
* Jason A. Donenfeld: > + * The vgetrandom() function in userspace requires an opaque state, which this > + * function provides to userspace, by mapping a certain number of special pages > + * into the calling process. It takes a hint as to the number of opaque states > + * desired, and returns the number of opaque states actually allocated, the > + * size of each one in bytes, and the address of the first state. > + */ > +SYSCALL_DEFINE3(vgetrandom_alloc, unsigned long __user *, num, > + unsigned long __user *, size_per_each, unsigned int, flags) I think you should make this __u64, so that you get a consistent userspace interface on all architectures, without the need for compat system calls. Thanks, Florian
* Jason A. Donenfeld: > Hi Florian, > > On Wed, Nov 23, 2022 at 11:46:58AM +0100, Florian Weimer wrote: >> * Jason A. Donenfeld: >> >> > + * The vgetrandom() function in userspace requires an opaque state, which this >> > + * function provides to userspace, by mapping a certain number of special pages >> > + * into the calling process. It takes a hint as to the number of opaque states >> > + * desired, and returns the number of opaque states actually allocated, the >> > + * size of each one in bytes, and the address of the first state. >> > + */ >> > +SYSCALL_DEFINE3(vgetrandom_alloc, unsigned long __user *, num, >> > + unsigned long __user *, size_per_each, unsigned int, flags) >> >> I think you should make this __u64, so that you get a consistent >> userspace interface on all architectures, without the need for compat >> system calls. > > That would be quite unconventional. Most syscalls that take lengths do > so with the native register size (`unsigned long`, `size_t`), rather > than u64. If you can point to a recent trend away from this by > indicating some commits that added new syscalls with u64, I'd be happy > to be shown otherwise. But AFAIK, that's not the way it's done. See clone3 and struct clone_args. It's more common with pointers, which are now 64 bits unconditionally: struct futex_waitv, struct rseq_cs and struct rseq. If the length or pointer is a system call argument, widening it to 64 bits is not necessary because zero-extension to the full register eliminates the need for a compat system call. But if you pass the address to a size or pointer, you'll need compat syscalls if you don't make the passed data __u64. Thanks, Florian
Hi Florian, On Thu, Nov 24, 2022 at 06:25:39AM +0100, Florian Weimer wrote: > * Jason A. Donenfeld: > > > Hi Florian, > > > > On Wed, Nov 23, 2022 at 11:46:58AM +0100, Florian Weimer wrote: > >> * Jason A. Donenfeld: > >> > >> > + * The vgetrandom() function in userspace requires an opaque state, which this > >> > + * function provides to userspace, by mapping a certain number of special pages > >> > + * into the calling process. It takes a hint as to the number of opaque states > >> > + * desired, and returns the number of opaque states actually allocated, the > >> > + * size of each one in bytes, and the address of the first state. > >> > + */ > >> > +SYSCALL_DEFINE3(vgetrandom_alloc, unsigned long __user *, num, > >> > + unsigned long __user *, size_per_each, unsigned int, flags) > >> > >> I think you should make this __u64, so that you get a consistent > >> userspace interface on all architectures, without the need for compat > >> system calls. > > > > That would be quite unconventional. Most syscalls that take lengths do > > so with the native register size (`unsigned long`, `size_t`), rather > > than u64. If you can point to a recent trend away from this by > > indicating some commits that added new syscalls with u64, I'd be happy > > to be shown otherwise. But AFAIK, that's not the way it's done. > > See clone3 and struct clone_args. The struct is one thing. But actually, clone3 takes a `size_t`: SYSCALL_DEFINE2(clone3, struct clone_args __user *, uargs, size_t, size) I take from this that I too should use `size_t` rather than `unsigned long.` And it doesn't seem like there's any compat clone3. Jason
Hi Florian, On Thu, Nov 24, 2022 at 01:15:24PM +0100, Florian Weimer wrote: > * Jason A. Donenfeld: > > > Hi Florian, > > > > On Thu, Nov 24, 2022 at 06:25:39AM +0100, Florian Weimer wrote: > >> * Jason A. Donenfeld: > >> > >> > Hi Florian, > >> > > >> > On Wed, Nov 23, 2022 at 11:46:58AM +0100, Florian Weimer wrote: > >> >> * Jason A. Donenfeld: > >> >> > >> >> > + * The vgetrandom() function in userspace requires an opaque state, which this > >> >> > + * function provides to userspace, by mapping a certain number of special pages > >> >> > + * into the calling process. It takes a hint as to the number of opaque states > >> >> > + * desired, and returns the number of opaque states actually allocated, the > >> >> > + * size of each one in bytes, and the address of the first state. > >> >> > + */ > >> >> > +SYSCALL_DEFINE3(vgetrandom_alloc, unsigned long __user *, num, > >> >> > + unsigned long __user *, size_per_each, unsigned int, flags) > >> >> > >> >> I think you should make this __u64, so that you get a consistent > >> >> userspace interface on all architectures, without the need for compat > >> >> system calls. > >> > > >> > That would be quite unconventional. Most syscalls that take lengths do > >> > so with the native register size (`unsigned long`, `size_t`), rather > >> > than u64. If you can point to a recent trend away from this by > >> > indicating some commits that added new syscalls with u64, I'd be happy > >> > to be shown otherwise. But AFAIK, that's not the way it's done. > >> > >> See clone3 and struct clone_args. > > > > The struct is one thing. But actually, clone3 takes a `size_t`: > > > > SYSCALL_DEFINE2(clone3, struct clone_args __user *, uargs, size_t, size) > > > > I take from this that I too should use `size_t` rather than `unsigned > > long.` And it doesn't seem like there's any compat clone3. > > But vgetrandom_alloc does not use unsigned long, but unsigned long *. > You need to look at the contents for struct clone_args for comparison. Ah! I see what you mean; that's a good point. The usual register clearing thing isn't going to happen because these are addresses. I still am somewhat hesitant, though, because `size_t` is really the "proper" type to be used. Maybe the compat syscall thing is just a necessary evil? The other direction would be making this a u32, since 640k ought to be enough for anybody and such, but maybe that'd be a mistake too. So I'm not sure. Anybody else on the list with experience adding syscalls have an opinion? Jason
On Thu, Nov 24, 2022 at 01:24:42PM +0100, Jason A. Donenfeld wrote: > Hi Florian, > > On Thu, Nov 24, 2022 at 01:15:24PM +0100, Florian Weimer wrote: > > * Jason A. Donenfeld: > > > > > Hi Florian, > > > > > > On Thu, Nov 24, 2022 at 06:25:39AM +0100, Florian Weimer wrote: > > >> * Jason A. Donenfeld: > > >> > > >> > Hi Florian, > > >> > > > >> > On Wed, Nov 23, 2022 at 11:46:58AM +0100, Florian Weimer wrote: > > >> >> * Jason A. Donenfeld: > > >> >> > > >> >> > + * The vgetrandom() function in userspace requires an opaque state, which this > > >> >> > + * function provides to userspace, by mapping a certain number of special pages > > >> >> > + * into the calling process. It takes a hint as to the number of opaque states > > >> >> > + * desired, and returns the number of opaque states actually allocated, the > > >> >> > + * size of each one in bytes, and the address of the first state. > > >> >> > + */ > > >> >> > +SYSCALL_DEFINE3(vgetrandom_alloc, unsigned long __user *, num, > > >> >> > + unsigned long __user *, size_per_each, unsigned int, flags) > > >> >> > > >> >> I think you should make this __u64, so that you get a consistent > > >> >> userspace interface on all architectures, without the need for compat > > >> >> system calls. > > >> > > > >> > That would be quite unconventional. Most syscalls that take lengths do > > >> > so with the native register size (`unsigned long`, `size_t`), rather > > >> > than u64. If you can point to a recent trend away from this by > > >> > indicating some commits that added new syscalls with u64, I'd be happy > > >> > to be shown otherwise. But AFAIK, that's not the way it's done. > > >> > > >> See clone3 and struct clone_args. For system calls that take structs as arguments we use u64 in the struct for proper alignment so we can extend structs without regressing old kernels. We have a few of those extensible struct system calls. But we don't really have a lot system calls that pass u64 as a pointer outside of a structure so far. Neither as register and nor as pointer iirc. Passing them as a register arg is problematic because of 32bit arches. But passing as pointer should be fine but it is indeed uncommon. > > > > > > The struct is one thing. But actually, clone3 takes a `size_t`: > > > > > > SYSCALL_DEFINE2(clone3, struct clone_args __user *, uargs, size_t, size) > > > > > > I take from this that I too should use `size_t` rather than `unsigned > > > long.` And it doesn't seem like there's any compat clone3. > > > > But vgetrandom_alloc does not use unsigned long, but unsigned long *. > > You need to look at the contents for struct clone_args for comparison. > > Ah! I see what you mean; that's a good point. The usual register > clearing thing isn't going to happen because these are addresses. > > I still am somewhat hesitant, though, because `size_t` is really the > "proper" type to be used. Maybe the compat syscall thing is just a > necessary evil? We try to avoid adding new compat-requiring syscalls like the plague usually. (At least for new syscalls that don't need to inherit behavior from earlier syscalls they are a revisions of.) > > The other direction would be making this a u32, since 640k ought to be > enough for anybody and such, but maybe that'd be a mistake too. I think making this a size_t is fine. We haven't traditionally used u32 for sizes. All syscalls that pass structs versioned by size use size_t. So I would recommend to stick with that. Alternatively, you could also introduce a simple struct versioned by size for this system call similar to mount_setatt() and clone3() and so on. This way you don't need to worry about future extensibilty. Just a thought.
On Thu, Nov 24, 2022, at 13:48, Jason A. Donenfeld wrote: > On Thu, Nov 24, 2022 at 01:24:42PM +0100, Jason A. Donenfeld wrote: > Looks like set_mempolicy, get_mempoliy, and migrate_pages pass an > unsigned long pointer and I don't see any compat stuff around it: > > SYSCALL_DEFINE3(set_mempolicy, int, mode, const unsigned long > __user *, nmask, > unsigned long, maxnode) > > SYSCALL_DEFINE5(get_mempolicy, int __user *, policy, > unsigned long __user *, nmask, unsigned long, maxnode, > unsigned long, addr, unsigned long, flags) > > SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode, > const unsigned long __user *, old_nodes, > const unsigned long __user *, new_nodes) Compat handling for these is done all the way down in the pointer access: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/mempolicy.c#n1368 This works here because it's a special bitmap but is not the best approach if you just have a pointer to a single value. Arnd
On Thu, Nov 24, 2022 at 01:24:42PM +0100, Jason A. Donenfeld wrote: > Hi Florian, > > On Thu, Nov 24, 2022 at 01:15:24PM +0100, Florian Weimer wrote: > > * Jason A. Donenfeld: > > > > > Hi Florian, > > > > > > On Thu, Nov 24, 2022 at 06:25:39AM +0100, Florian Weimer wrote: > > >> * Jason A. Donenfeld: > > >> > > >> > Hi Florian, > > >> > > > >> > On Wed, Nov 23, 2022 at 11:46:58AM +0100, Florian Weimer wrote: > > >> >> * Jason A. Donenfeld: > > >> >> > > >> >> > + * The vgetrandom() function in userspace requires an opaque state, which this > > >> >> > + * function provides to userspace, by mapping a certain number of special pages > > >> >> > + * into the calling process. It takes a hint as to the number of opaque states > > >> >> > + * desired, and returns the number of opaque states actually allocated, the > > >> >> > + * size of each one in bytes, and the address of the first state. > > >> >> > + */ > > >> >> > +SYSCALL_DEFINE3(vgetrandom_alloc, unsigned long __user *, num, > > >> >> > + unsigned long __user *, size_per_each, unsigned int, flags) > > >> >> > > >> >> I think you should make this __u64, so that you get a consistent > > >> >> userspace interface on all architectures, without the need for compat > > >> >> system calls. > > >> > > > >> > That would be quite unconventional. Most syscalls that take lengths do > > >> > so with the native register size (`unsigned long`, `size_t`), rather > > >> > than u64. If you can point to a recent trend away from this by > > >> > indicating some commits that added new syscalls with u64, I'd be happy > > >> > to be shown otherwise. But AFAIK, that's not the way it's done. > > >> > > >> See clone3 and struct clone_args. > > > > > > The struct is one thing. But actually, clone3 takes a `size_t`: > > > > > > SYSCALL_DEFINE2(clone3, struct clone_args __user *, uargs, size_t, size) > > > > > > I take from this that I too should use `size_t` rather than `unsigned > > > long.` And it doesn't seem like there's any compat clone3. > > > > But vgetrandom_alloc does not use unsigned long, but unsigned long *. > > You need to look at the contents for struct clone_args for comparison. > > The other direction would be making this a u32 I think `unsigned int` is actually a sensible size for what these values should be. That eliminates the problem and potential bikeshed too. So I'll go with that for v+1. Jason