Message ID | 20191108211323.1806194-2-arnd@arndb.de |
---|---|
State | Accepted |
Commit | bdd565f817a74b9e30edec108f7cb1dbc762b8a6 |
Headers | show |
Series | y2038 cleanups | expand |
On Fri, Nov 08, 2019 at 10:12:10PM +0100, Arnd Bergmann wrote: > There are two 'struct timeval' fields in 'struct rusage'. > > Unfortunately the definition of timeval is now ambiguous when used in > user space with a libc that has a 64-bit time_t, and this also changes > the 'rusage' definition in user space in a way that is incompatible with > the system call interface. > > While there is no good solution to avoid all ambiguity here, change > the definition in the kernel headers to be compatible with the kernel > ABI, using __kernel_old_timeval as an unambiguous base type. > > In previous discussions, there was also a plan to add a replacement > for rusage based on 64-bit timestamps and nanosecond resolution, > i.e. 'struct __kernel_timespec'. I have patches for that as well, > if anyone thinks we should do that. > > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > --- > Question: should we also rename 'struct rusage' into 'struct __kernel_rusage' > here, to make them completely unambiguous? The patch looks ok to me. I must confess I looked into rusage long ago so __kernel_timespec type used in uapi made me nervious at first, but then i found that we've this type defined in time_types.h uapi so userspace should be safe. I also like the idea of __kernel_rusage but definitely on top of the series. Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
On Tue, Nov 12, 2019 at 10:09 PM Cyrill Gorcunov <gorcunov@gmail.com> wrote: > > On Fri, Nov 08, 2019 at 10:12:10PM +0100, Arnd Bergmann wrote: > > --- > > Question: should we also rename 'struct rusage' into 'struct __kernel_rusage' > > here, to make them completely unambiguous? > > The patch looks ok to me. I must confess I looked into rusage long ago > so __kernel_timespec type used in uapi made me nervious at first, > but then i found that we've this type defined in time_types.h uapi > so userspace should be safe. I also like the idea of __kernel_rusage > but definitely on top of the series. There are clearly too many time types at the moment, but I'm in the process of throwing out the ones we no longer need now. I do have a number patches implementing other variants for the syscall, and I suppose that if we end up adding __kernel_rusage, that would have to go with a set of syscalls using 64-bit seconds/nanoseconds rather than the old 32/64 microseconds. I don't know what other changes remain that anyone would want from sys_waitid() now that it does support pidfd. If there is still a need for a new waitid() replacement, that should take that new __kernel_rusage I think, but until then I hope we are fine with today's getrusage+waitid based on the current struct rusage. BSD has wait6() to return separate rusage structures for 'self' and 'children', but I could not find any application (using the freebsd sources and debian code search) that actually uses that information, so there might not be any demand for that. > Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Thanks, Arnd
On Wed, Nov 13, 2019 at 11:02:12AM +0100, Arnd Bergmann wrote: ... > > There are clearly too many time types at the moment, but I'm in the > process of throwing out the ones we no longer need now. Cool! > I do have a number patches implementing other variants for the syscall, > and I suppose that if we end up adding __kernel_rusage, that would > have to go with a set of syscalls using 64-bit seconds/nanoseconds > rather than the old 32/64 microseconds. I don't know what other > changes remain that anyone would want from sys_waitid() now that > it does support pidfd. > > If there is still a need for a new waitid() replacement, that should take > that new __kernel_rusage I think, but until then I hope we are fine > with today's getrusage+waitid based on the current struct rusage. Definitely. > > BSD has wait6() to return separate rusage structures for 'self' and > 'children', but I could not find any application (using the freebsd > sources and debian code search) that actually uses that information, > so there might not be any demand for that. Thanks for detailed info Arnd!
On Wed, Nov 13, 2019 at 11:02:12AM +0100, Arnd Bergmann wrote: > On Tue, Nov 12, 2019 at 10:09 PM Cyrill Gorcunov <gorcunov@gmail.com> wrote: > > > > On Fri, Nov 08, 2019 at 10:12:10PM +0100, Arnd Bergmann wrote: > > > > --- > > > Question: should we also rename 'struct rusage' into 'struct __kernel_rusage' > > > here, to make them completely unambiguous? > > > > The patch looks ok to me. I must confess I looked into rusage long ago > > so __kernel_timespec type used in uapi made me nervious at first, > > but then i found that we've this type defined in time_types.h uapi > > so userspace should be safe. I also like the idea of __kernel_rusage > > but definitely on top of the series. > > There are clearly too many time types at the moment, but I'm in the > process of throwing out the ones we no longer need now. > > I do have a number patches implementing other variants for the syscall, > and I suppose that if we end up adding __kernel_rusage, that would > have to go with a set of syscalls using 64-bit seconds/nanoseconds > rather than the old 32/64 microseconds. I don't know what other > changes remain that anyone would want from sys_waitid() now that > it does support pidfd. > > If there is still a need for a new waitid() replacement, that should take > that new __kernel_rusage I think, but until then I hope we are fine > with today's getrusage+waitid based on the current struct rusage. Note, that glibc does _not_ expose the rusage argument, i.e. most of userspace is unaware that waitid() does allow you to get rusage information. So users first need to know that waitid() has an rusage argument and then need to call the waitid() syscall directly. > > BSD has wait6() to return separate rusage structures for 'self' and > 'children', but I could not find any application (using the freebsd > sources and debian code search) that actually uses that information, > so there might not be any demand for that. Speaking specifically for Linux now, I think that rusage does not actually expose the information most relevant users are interested in. On Linux nowadays it is _way_ more interesting to retrieve stats relative to the cgroup the task lived in etc. Doing a git grep -i rusage in the systemd source code shows that rusage is used _nowhere_. And I consider an init system to be the most likely candidate to be interested in rusage. Christian
On Thu, Nov 14, 2019 at 1:38 AM Christian Brauner <christian.brauner@ubuntu.com> wrote: > On Wed, Nov 13, 2019 at 11:02:12AM +0100, Arnd Bergmann wrote: > > On Tue, Nov 12, 2019 at 10:09 PM Cyrill Gorcunov <gorcunov@gmail.com> wrote: > > > > > > On Fri, Nov 08, 2019 at 10:12:10PM +0100, Arnd Bergmann wrote: > > > > > > --- > > > > Question: should we also rename 'struct rusage' into 'struct __kernel_rusage' > > > > here, to make them completely unambiguous? > > > > > > The patch looks ok to me. I must confess I looked into rusage long ago > > > so __kernel_timespec type used in uapi made me nervious at first, > > > but then i found that we've this type defined in time_types.h uapi > > > so userspace should be safe. I also like the idea of __kernel_rusage > > > but definitely on top of the series. > > > > There are clearly too many time types at the moment, but I'm in the > > process of throwing out the ones we no longer need now. > > > > I do have a number patches implementing other variants for the syscall, > > and I suppose that if we end up adding __kernel_rusage, that would > > have to go with a set of syscalls using 64-bit seconds/nanoseconds > > rather than the old 32/64 microseconds. I don't know what other > > changes remain that anyone would want from sys_waitid() now that > > it does support pidfd. > > > > If there is still a need for a new waitid() replacement, that should take > > that new __kernel_rusage I think, but until then I hope we are fine > > with today's getrusage+waitid based on the current struct rusage. > > Note, that glibc does _not_ expose the rusage argument, i.e. most of > userspace is unaware that waitid() does allow you to get rusage > information. So users first need to know that waitid() has an rusage > argument and then need to call the waitid() syscall directly. On architectures that don't have a wait4 syscall (riscv32 for now), glibc uses waitid to implement wait4 and wait3. > > BSD has wait6() to return separate rusage structures for 'self' and > > 'children', but I could not find any application (using the freebsd > > sources and debian code search) that actually uses that information, > > so there might not be any demand for that. > > Speaking specifically for Linux now, I think that rusage does not > actually expose the information most relevant users are interested in. > On Linux nowadays it is _way_ more interesting to retrieve stats > relative to the cgroup the task lived in etc. > Doing a git grep -i rusage in the systemd source code shows that rusage > is used _nowhere_. And I consider an init system to be the most likely > candidate to be interested in rusage. I looked at a couple of implementations of time(1), this is one example that sometimes uses wait3(), though other implementations just call getrusage() in the parent process before the fork/exec. None of them actually seem to report better than millisecond resolution, so there is not a strict reason to do a timespec replacement for these. Arnd
[+Cc Florian, libc-alpha] On Thu, Nov 14, 2019 at 11:18:15AM +0100, Arnd Bergmann wrote: > On Thu, Nov 14, 2019 at 1:38 AM Christian Brauner > <christian.brauner@ubuntu.com> wrote: > > On Wed, Nov 13, 2019 at 11:02:12AM +0100, Arnd Bergmann wrote: > > > On Tue, Nov 12, 2019 at 10:09 PM Cyrill Gorcunov <gorcunov@gmail.com> wrote: > > > > > > > > On Fri, Nov 08, 2019 at 10:12:10PM +0100, Arnd Bergmann wrote: > > > > > > > > --- > > > > > Question: should we also rename 'struct rusage' into 'struct __kernel_rusage' > > > > > here, to make them completely unambiguous? > > > > > > > > The patch looks ok to me. I must confess I looked into rusage long ago > > > > so __kernel_timespec type used in uapi made me nervious at first, > > > > but then i found that we've this type defined in time_types.h uapi > > > > so userspace should be safe. I also like the idea of __kernel_rusage > > > > but definitely on top of the series. > > > > > > There are clearly too many time types at the moment, but I'm in the > > > process of throwing out the ones we no longer need now. > > > > > > I do have a number patches implementing other variants for the syscall, > > > and I suppose that if we end up adding __kernel_rusage, that would > > > have to go with a set of syscalls using 64-bit seconds/nanoseconds > > > rather than the old 32/64 microseconds. I don't know what other > > > changes remain that anyone would want from sys_waitid() now that > > > it does support pidfd. > > > > > > If there is still a need for a new waitid() replacement, that should take > > > that new __kernel_rusage I think, but until then I hope we are fine > > > with today's getrusage+waitid based on the current struct rusage. > > > > Note, that glibc does _not_ expose the rusage argument, i.e. most of > > userspace is unaware that waitid() does allow you to get rusage > > information. So users first need to know that waitid() has an rusage > > argument and then need to call the waitid() syscall directly. > > On architectures that don't have a wait4 syscall (riscv32 for now), > glibc uses waitid to implement wait4 and wait3. Yes, and there's an ongoing discussion to implement wait4() on all arches through waitid(), I think. I haven't followed it too closely. > > > > BSD has wait6() to return separate rusage structures for 'self' and > > > 'children', but I could not find any application (using the freebsd > > > sources and debian code search) that actually uses that information, > > > so there might not be any demand for that. > > > > Speaking specifically for Linux now, I think that rusage does not > > actually expose the information most relevant users are interested in. > > On Linux nowadays it is _way_ more interesting to retrieve stats > > relative to the cgroup the task lived in etc. > > Doing a git grep -i rusage in the systemd source code shows that rusage > > is used _nowhere_. And I consider an init system to be the most likely > > candidate to be interested in rusage. > > I looked at a couple of implementations of time(1), this is one example > that sometimes uses wait3(), though other implementations just call > getrusage() in the parent process before the fork/exec. None of them > actually seem to report better than millisecond resolution, so there is > not a strict reason to do a timespec replacement for these. Right, though I have to say that for the sake of consistency I'd much rather have a replacement. We're doing all this work right now so we might as well. But I get the point. Christian
diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c index bf497b8b0ec6..bbe7a0da6264 100644 --- a/arch/alpha/kernel/osf_sys.c +++ b/arch/alpha/kernel/osf_sys.c @@ -963,7 +963,7 @@ put_tv32(struct timeval32 __user *o, struct timespec64 *i) } static inline long -put_tv_to_tv32(struct timeval32 __user *o, struct timeval *i) +put_tv_to_tv32(struct timeval32 __user *o, struct __kernel_old_timeval *i) { return copy_to_user(o, &(struct timeval32){ .tv_sec = i->tv_sec, diff --git a/include/uapi/linux/resource.h b/include/uapi/linux/resource.h index cc00fd079631..74ef57b38f9f 100644 --- a/include/uapi/linux/resource.h +++ b/include/uapi/linux/resource.h @@ -22,8 +22,8 @@ #define RUSAGE_THREAD 1 /* only the calling thread */ struct rusage { - struct timeval ru_utime; /* user time used */ - struct timeval ru_stime; /* system time used */ + struct __kernel_old_timeval ru_utime; /* user time used */ + struct __kernel_old_timeval ru_stime; /* system time used */ __kernel_long_t ru_maxrss; /* maximum resident set size */ __kernel_long_t ru_ixrss; /* integral shared memory size */ __kernel_long_t ru_idrss; /* integral unshared data size */ diff --git a/kernel/sys.c b/kernel/sys.c index a611d1d58c7d..d3aef31e24dc 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -1763,8 +1763,8 @@ void getrusage(struct task_struct *p, int who, struct rusage *r) unlock_task_sighand(p, &flags); out: - r->ru_utime = ns_to_timeval(utime); - r->ru_stime = ns_to_timeval(stime); + r->ru_utime = ns_to_kernel_old_timeval(utime); + r->ru_stime = ns_to_kernel_old_timeval(stime); if (who != RUSAGE_CHILDREN) { struct mm_struct *mm = get_task_mm(p);
There are two 'struct timeval' fields in 'struct rusage'. Unfortunately the definition of timeval is now ambiguous when used in user space with a libc that has a 64-bit time_t, and this also changes the 'rusage' definition in user space in a way that is incompatible with the system call interface. While there is no good solution to avoid all ambiguity here, change the definition in the kernel headers to be compatible with the kernel ABI, using __kernel_old_timeval as an unambiguous base type. In previous discussions, there was also a plan to add a replacement for rusage based on 64-bit timestamps and nanosecond resolution, i.e. 'struct __kernel_timespec'. I have patches for that as well, if anyone thinks we should do that. Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- Question: should we also rename 'struct rusage' into 'struct __kernel_rusage' here, to make them completely unambiguous? --- arch/alpha/kernel/osf_sys.c | 2 +- include/uapi/linux/resource.h | 4 ++-- kernel/sys.c | 4 ++-- 3 files changed, 5 insertions(+), 5 deletions(-) -- 2.20.0