
[0/6] sched: Cleanup task_struct::state

Message ID 20210602131225.336600299@infradead.org

Message

Peter Zijlstra June 2, 2021, 1:12 p.m. UTC
Hi!

The task_struct::state variable is a bit odd in a number of ways:

 - it's declared 'volatile' (against current practice);
 - it's 'unsigned long', which is a weird size;
 - its type is inconsistent when used for function arguments.

These patches clean that up by making it consistently 'unsigned int' and
replacing (almost) all accesses with READ_ONCE()/WRITE_ONCE(). To avoid
missing any, the variable is also renamed, so a missed conversion results in a
compile error.

The first few patches fix a number of pre-existing errors and introduce a few
helpers to make the final conversion less painful.
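
For illustration, the access pattern the series converges on looks roughly
like this (a minimal sketch: the __state name and the READ_ONCE()/WRITE_ONCE()
usage follow the patches, but the example functions themselves are made up):

	/* Hypothetical helper, only to show the converted read side. */
	static bool example_task_uninterruptible(struct task_struct *p)
	{
		/* Was: p->state == TASK_UNINTERRUPTIBLE, on a 'volatile long'. */
		return READ_ONCE(p->__state) == TASK_UNINTERRUPTIBLE;
	}

	/* Hypothetical helper, only to show the converted write side;
	 * real code would normally go through set_current_state(). */
	static void example_mark_interruptible(void)
	{
		/* Was: current->state = TASK_INTERRUPTIBLE; */
		WRITE_ONCE(current->__state, TASK_INTERRUPTIBLE);
	}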

Comments

Mathieu Desnoyers June 2, 2021, 2:01 p.m. UTC | #1
----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:

> Remove yet another few p->state accesses.

[...]

> 
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -212,6 +212,8 @@ struct task_group;
> 
> #endif
> 
> +#define get_current_state()	READ_ONCE(current->state)

Why use a macro rather than a static inline here?

Thanks,

Mathieu
Mathieu Desnoyers June 2, 2021, 2:06 p.m. UTC | #2
----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:

> Change the type and name of task_struct::state. Drop the volatile and
> shrink it to an 'unsigned int'. Rename it in order to find all uses
> such that we can use READ_ONCE/WRITE_ONCE as appropriate.
> 
[...]
> 
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c

[...]
> @@ -1559,7 +1560,8 @@ static int fill_psinfo(struct elf_prpsin
> 	psinfo->pr_pgrp = task_pgrp_vnr(p);
> 	psinfo->pr_sid = task_session_vnr(p);
> 
> -	i = p->state ? ffz(~p->state) + 1 : 0;
> +	state = READ_ONCE(p->__state);
> +	i = state ? ffz(~state) + 1 : 0;
> 	psinfo->pr_state = i;
> 	psinfo->pr_sname = (i > 5) ? '.' : "RSDTZW"[i];
> 	psinfo->pr_zomb = psinfo->pr_sname == 'Z';

[...]

> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -113,13 +113,13 @@ struct task_group;
> 					 __TASK_TRACED | EXIT_DEAD | EXIT_ZOMBIE | \
> 					 TASK_PARKED)
> 
> -#define task_is_running(task)		(READ_ONCE((task)->state) == TASK_RUNNING)
> +#define task_is_running(task)		(READ_ONCE((task)->__state) == TASK_RUNNING)
> 
> -#define task_is_traced(task)		((task->state & __TASK_TRACED) != 0)
> +#define task_is_traced(task)		((READ_ONCE(task->__state) & __TASK_TRACED) != 0)
> 
> -#define task_is_stopped(task)		((task->state & __TASK_STOPPED) != 0)
> +#define task_is_stopped(task)		((READ_ONCE(task->__state) & __TASK_STOPPED) != 0)
> 
> -#define task_is_stopped_or_traced(task)	((task->state & (__TASK_STOPPED | __TASK_TRACED)) != 0)
> +#define task_is_stopped_or_traced(task)	((READ_ONCE(task->__state) & (__TASK_STOPPED | __TASK_TRACED)) != 0)
> 
> #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
> 
> @@ -134,14 +134,14 @@ struct task_group;
> 	do {							\
> 		WARN_ON_ONCE(is_special_task_state(state_value));\
> 		current->task_state_change = _THIS_IP_;		\
> -		current->state = (state_value);			\
> +		WRITE_ONCE(current->__state, (state_value));	\
> 	} while (0)

Why not introduce set_task_state(p) and get_task_state(p) rather than sprinkling
READ_ONCE() and WRITE_ONCE() all over the kernel?

Thanks,

Mathieu
Peter Zijlstra June 2, 2021, 2:12 p.m. UTC | #3
On Wed, Jun 02, 2021 at 10:01:29AM -0400, Mathieu Desnoyers wrote:
> ----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:
> 
> > Remove yet another few p->state accesses.
> 
> [...]
> 
> > 
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -212,6 +212,8 @@ struct task_group;
> > 
> > #endif
> > 
> > +#define get_current_state()	READ_ONCE(current->state)
> 
> Why use a macro rather than a static inline here?

Mostly to be consistent; all that state stuff is macros. I suppose we
could try to make them inlines at the end or so -- if the header maze
allows.
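
For comparison, the static inline form being discussed would look roughly like
this (a sketch only; the series keeps the macro, and it is written here against
the post-series field name and type):

	static inline unsigned int get_current_state(void)
	{
		return READ_ONCE(current->__state);
	}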
Peter Zijlstra June 2, 2021, 2:20 p.m. UTC | #4
On Wed, Jun 02, 2021 at 10:06:58AM -0400, Mathieu Desnoyers wrote:
> ----- On Jun 2, 2021, at 9:12 AM, Peter Zijlstra peterz@infradead.org wrote:

> > @@ -134,14 +134,14 @@ struct task_group;
> > 	do {							\
> > 		WARN_ON_ONCE(is_special_task_state(state_value));\
> > 		current->task_state_change = _THIS_IP_;		\
> > -		current->state = (state_value);			\
> > +		WRITE_ONCE(current->__state, (state_value));	\
> > 	} while (0)
> 
> Why not introduce set_task_state(p) and get_task_state(p) rather than sprinkling
> READ_ONCE() and WRITE_ONCE() all over the kernel?

set_task_state() is fundamentally unsound; there are very few sites that
_set_ state on anything other than current, and those sites are super
tricky, e.g. ptrace.

Having get_task_state() would seem to suggest it's actually a sane thing
to do; it's not, really. Inspecting remote state is full of races, and
some of that really wants cleaning up, but that's for another day.
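
To make that concern concrete, a remote read such as the sketch below (the
helper name is made up, for illustration only) is only a snapshot; nothing
orders it against the target task's own state transitions, which is why a
blessed get_task_state() helper could be misleading:

	/* Hypothetical; shows why a remote state read is only a hint. */
	static bool task_looks_blocked(struct task_struct *p)
	{
		unsigned int state = READ_ONCE(p->__state);

		/*
		 * This is a snapshot: p can be woken (or go to sleep) the
		 * instant after the load, so callers must not treat the
		 * result as anything more than a hint.
		 */
		return state & (TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE);
	}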
Will Deacon June 2, 2021, 2:59 p.m. UTC | #5
On Wed, Jun 02, 2021 at 03:12:27PM +0200, Peter Zijlstra wrote:
> Replace a bunch of 'p->state == TASK_RUNNING' with a new helper:
> task_is_running(p).
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/kernel/process.c |    4 ++--
>  block/blk-mq.c            |    2 +-
>  include/linux/sched.h     |    2 ++
>  kernel/locking/lockdep.c  |    2 +-
>  kernel/rcu/tree_plugin.h  |    2 +-
>  kernel/sched/core.c       |    6 +++---
>  kernel/sched/stats.h      |    2 +-
>  kernel/signal.c           |    2 +-
>  kernel/softirq.c          |    3 +--
>  mm/compaction.c           |    2 +-
>  10 files changed, 14 insertions(+), 13 deletions(-)
> 
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -931,7 +931,7 @@ unsigned long get_wchan(struct task_stru
>  	unsigned long start, bottom, top, sp, fp, ip, ret = 0;
>  	int count = 0;
>  
> -	if (p == current || p->state == TASK_RUNNING)
> +	if (p == current || task_is_running(p))

Looks like this one in get_wchan() has been cargo-culted across most of
arch/, so those will need fixing up before you rename the struct member.

There's also a weird one in tools/bpf/runqslower/runqslower.bpf.c (!)

Will
Will Deacon June 2, 2021, 3:02 p.m. UTC | #6
On Wed, Jun 02, 2021 at 03:12:29PM +0200, Peter Zijlstra wrote:
> Remove yet another few p->state accesses.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  block/blk-mq.c        |    2 +-
>  include/linux/sched.h |    2 ++
>  kernel/freezer.c      |    2 +-
>  kernel/sched/core.c   |    6 +++---
>  4 files changed, 7 insertions(+), 5 deletions(-)

I think you can include kernel/kcsan/report.c here too.

With that:

Acked-by: Will Deacon <will@kernel.org>

Will
Will Deacon June 2, 2021, 3:10 p.m. UTC | #7
On Wed, Jun 02, 2021 at 03:12:31PM +0200, Peter Zijlstra wrote:
> Change the type and name of task_struct::state. Drop the volatile and
> shrink it to an 'unsigned int'. Rename it in order to find all uses
> such that we can use READ_ONCE/WRITE_ONCE as appropriate.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  block/blk-mq.c                 |    2 -
>  drivers/md/dm.c                |    6 ++--
>  fs/binfmt_elf.c                |    8 +++---
>  fs/userfaultfd.c               |    4 +--
>  include/linux/sched.h          |   31 +++++++++++------------
>  include/linux/sched/debug.h    |    2 -
>  include/linux/sched/signal.h   |    2 -
>  init/init_task.c               |    2 -
>  kernel/cgroup/cgroup-v1.c      |    2 -
>  kernel/debug/kdb/kdb_support.c |   18 +++++++------
>  kernel/fork.c                  |    4 +--
>  kernel/hung_task.c             |    2 -
>  kernel/kthread.c               |    4 +--
>  kernel/locking/mutex.c         |    6 ++--
>  kernel/locking/rtmutex.c       |    4 +--
>  kernel/locking/rwsem.c         |    2 -
>  kernel/ptrace.c                |   12 ++++-----
>  kernel/rcu/rcutorture.c        |    4 +--
>  kernel/rcu/tree_stall.h        |   12 ++++-----
>  kernel/sched/core.c            |   53 +++++++++++++++++++++--------------------
>  kernel/sched/deadline.c        |   10 +++----
>  kernel/sched/fair.c            |   11 +++++---
>  lib/syscall.c                  |    4 +--
>  net/core/dev.c                 |    2 -
>  24 files changed, 108 insertions(+), 99 deletions(-)

I think this makes the code a _lot_ easier to understand, so:

Acked-by: Will Deacon <will@kernel.org>

on the assumption that you'll fix get_wchan() for !x86 as well.

Cheers,

Will
Davidlohr Bueso June 2, 2021, 7:54 p.m. UTC | #8
On Wed, 02 Jun 2021, Peter Zijlstra wrote:

-ENOCHANGELOG

But yeah, I thought we had gotten rid of all these.

>
>Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Davidlohr Bueso June 2, 2021, 11:15 p.m. UTC | #9
On Wed, 02 Jun 2021, Peter Zijlstra wrote:

>Replace a bunch of 'p->state == TASK_RUNNING' with a new helper:
>task_is_running(p).
>
>Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Acked-by: Davidlohr Bueso <dbueso@suse.de>

But afaict ....

>---
> arch/x86/kernel/process.c |    4 ++--
> block/blk-mq.c            |    2 +-
> include/linux/sched.h     |    2 ++
> kernel/locking/lockdep.c  |    2 +-
> kernel/rcu/tree_plugin.h  |    2 +-
> kernel/sched/core.c       |    6 +++---
> kernel/sched/stats.h      |    2 +-
> kernel/signal.c           |    2 +-
> kernel/softirq.c          |    3 +--
> mm/compaction.c           |    2 +-
> 10 files changed, 14 insertions(+), 13 deletions(-)

there are also (on top of the already mentioned arch/):

kernel/kcsan/report.c:  const bool is_running = current->state == TASK_RUNNING;
kernel/locking/lockdep.c:       if (p->state == TASK_RUNNING && p != current)
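
Presumably both of those would take the same one-line conversion as the hunks
above, something like (a sketch, not the actual follow-up patch):

	kernel/kcsan/report.c:
	-	const bool is_running = current->state == TASK_RUNNING;
	+	const bool is_running = task_is_running(current);

	kernel/locking/lockdep.c:
	-	if (p->state == TASK_RUNNING && p != current)
	+	if (task_is_running(p) && p != current)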
Peter Zijlstra June 3, 2021, 6:39 a.m. UTC | #10
On Wed, Jun 02, 2021 at 12:54:58PM -0700, Davidlohr Bueso wrote:
> On Wed, 02 Jun 2021, Peter Zijlstra wrote:
> 
> -ENOCHANGELOG

I completely failed to come up with something useful, and still do. The
subject says it all.

> But yeah, I thought we had gotten rid of all these.

I too was surprised to find it :-)