| Message ID | 359c926bc85cdf79650e39f2344c2083002545bb.1427347966.git.viresh.kumar@linaro.org |
|---|---|
| State | New |
On 27 March 2015 at 01:48, Andrew Morton <akpm@linux-foundation.org> wrote:
> Shouldn't this be viewed as a shortcoming of the core timer code?

Yeah, it is. Some (not so pretty) solutions were tried earlier to fix that,
but they were rejected for obvious reasons [1].

> vmstat_shepherd() is merely rescheduling itself with
> schedule_delayed_work(). That's a dead bog simple operation and if
> it's producing suboptimal behaviour then we shouldn't be fixing it with
> elaborate workarounds in the caller?

I understand that, and that's why I sent it as an RFC to get the discussion
started. Does anyone else have another (acceptable) idea to get this resolved?

--
viresh

[1] http://lists.linaro.org/pipermail/linaro-kernel/2013-November/008866.html
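For context, the self-rescheduling pattern Andrew refers to looks roughly like this in the pre-patch mm/vmstat.c; this is a trimmed sketch based on the lines the patch below removes, not a verbatim copy of upstream:

```c
/*
 * Sketch of the existing shepherd in mm/vmstat.c (simplified).  The work
 * handler re-arms its own delayed work, so the underlying timer is queued
 * from its own expiry path and therefore stays on the same, possibly idle,
 * CPU.
 */
static void vmstat_shepherd(struct work_struct *w);
static DECLARE_DELAYED_WORK(shepherd, vmstat_shepherd);

static void vmstat_shepherd(struct work_struct *w)
{
	/* ... kick vmstat_update on the CPUs that need it ... */

	/* Re-arm: the "dead bog simple" operation under discussion. */
	schedule_delayed_work(&shepherd,
		round_jiffies_relative(sysctl_stat_interval));
}
```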
On Fri, Mar 27, 2015 at 3:00 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Mar 27, 2015 at 10:16:13AM +0100, Peter Zijlstra wrote:
>> So the issue seems to be that we need base->running_timer in order to
>> tell if a callback is running, right?
>>
>> We could align the base on 8 bytes to gain an extra bit in the pointer
>> and use that bit to indicate the running state. Then these sites can
>> spin on that bit while we can change the actual base pointer.
>
> Even though tvec_base has ____cacheline_aligned stuck on, most are
> allocated using kzalloc_node() which does not actually respect that but
> already guarantees a minimum u64 alignment, so I think we can use that
> third bit without too much magic.

Right. So what I tried earlier was very much similar to what you are
suggesting. The only difference was that I hadn't made much of an attempt
to save memory.

But Thomas didn't like it, and I believe that rant will hold true for what
you are suggesting as well, won't it?

http://lists.linaro.org/pipermail/linaro-kernel/2013-November/008866.html
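To illustrate the trick Peter describes, here is a hedged, stand-alone sketch of tagging a "running" flag into a spare low bit of an aligned pointer; the helper names are made up and this is not the kernel's actual tvec_base code:

```c
/*
 * Illustrative only: with struct tvec_base guaranteed at least 8-byte
 * alignment, bits 0-2 of a pointer to it are always zero, so one of them
 * can be borrowed to mean "a timer callback is currently running" while
 * the remaining bits still identify the base.  Names are hypothetical.
 */
struct tvec_base;			/* opaque here; >= 8-byte aligned */

#define BASE_RUNNING	0x4UL		/* the "third bit" Peter mentions */

static inline struct tvec_base *tagged_to_base(unsigned long tagged)
{
	return (struct tvec_base *)(tagged & ~BASE_RUNNING);
}

static inline int base_is_running(unsigned long tagged)
{
	return (tagged & BASE_RUNNING) != 0;
}

static inline unsigned long base_mark_running(unsigned long tagged)
{
	return tagged | BASE_RUNNING;
}
```

Per Peter's description, the sites that currently check base->running_timer could spin on that bit, leaving the pointer part of the word free to be switched to a different base.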
On 27 March 2015 at 17:32, Peter Zijlstra <peterz@infradead.org> wrote:
> What's not clear to me is why that thing is allocated at all, AFAICT
> something like:
>
> static DEFINE_PER_CPU(struct tvec_base, tvec_bases);
>
> Should do the right thing and be much simpler.

Does this comment from timers.c answer your query?

	/*
	 * This is for the boot CPU - we use compile-time
	 * static initialisation because per-cpu memory isn't
	 * ready yet and because the memory allocators are not
	 * initialised either.
	 */
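For reference, the existing arrangement in the timer code is, roughly and from memory (treat it as a sketch rather than a verbatim quote), a per-CPU pointer that initially points at a statically initialised boot base, with the remaining bases allocated at runtime; Peter's suggestion would replace that with a plain per-CPU variable:

```c
/*
 * Roughly the current scheme (sketch, not verbatim): the boot CPU needs a
 * compile-time initialised base because per-cpu memory and the allocators
 * are not ready yet, so a per-CPU *pointer* starts out aimed at it and the
 * other CPUs' bases are kzalloc_node()'d later.
 */
struct tvec_base boot_tvec_bases;
static DEFINE_PER_CPU(struct tvec_base *, tvec_bases) = &boot_tvec_bases;

/* Peter's suggestion: no indirection and no runtime allocation at all. */
/* static DEFINE_PER_CPU(struct tvec_base, tvec_bases); */
```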
On 27 March 2015 at 19:49, Michal Hocko <mhocko@suse.cz> wrote:
> Wouldn't something like I was suggesting few months back
> (http://article.gmane.org/gmane.linux.kernel.mm/127569) solve this
> problem as well? Scheduler should be idle aware, no? I mean it shouldn't
> wake up an idle CPU if the task might run on another one.

Probably, yes. Let's see what others have to say about it.
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 4f5cd974e11a..d45e4243a046 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1424,8 +1424,18 @@ static bool need_update(int cpu)
  * inactivity.
  */
 static void vmstat_shepherd(struct work_struct *w);
+static DECLARE_WORK(shepherd, vmstat_shepherd);
 
-static DECLARE_DELAYED_WORK(shepherd, vmstat_shepherd);
+/*
+ * Two timers are used here to avoid waking up an idle CPU. If a single timer is
+ * kept, then re-arming the timer from its handler doesn't let us migrate it.
+ */
+static struct timer_list shepherd_timer[2];
+#define toggle_timer() (shepherd_timer_index = !shepherd_timer_index, \
+			&shepherd_timer[shepherd_timer_index])
+
+static void vmstat_shepherd_timer(unsigned long data);
+static int shepherd_timer_index;
 
 static void vmstat_shepherd(struct work_struct *w)
 {
@@ -1441,15 +1451,19 @@ static void vmstat_shepherd(struct work_struct *w)
 				&per_cpu(vmstat_work, cpu), 0);
 
 	put_online_cpus();
+}
 
-	schedule_delayed_work(&shepherd,
-		round_jiffies_relative(sysctl_stat_interval));
+static void vmstat_shepherd_timer(unsigned long data)
+{
+	mod_timer(toggle_timer(),
+		  jiffies + round_jiffies_relative(sysctl_stat_interval));
 
+	schedule_work(&shepherd);
 }
 
 static void __init start_shepherd_timer(void)
 {
-	int cpu;
+	int cpu, i = -1;
 
 	for_each_possible_cpu(cpu)
 		INIT_DELAYED_WORK(per_cpu_ptr(&vmstat_work, cpu),
@@ -1459,8 +1473,13 @@ static void __init start_shepherd_timer(void)
 		BUG();
 	cpumask_copy(cpu_stat_off, cpu_online_mask);
 
-	schedule_delayed_work(&shepherd,
-		round_jiffies_relative(sysctl_stat_interval));
+	while (++i < 2) {
+		init_timer(&shepherd_timer[i]);
+		shepherd_timer[i].function = vmstat_shepherd_timer;
+	};
+
+	mod_timer(toggle_timer(),
+		  jiffies + round_jiffies_relative(sysctl_stat_interval));
 }
 
 static void vmstat_cpu_dead(int node)
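To make the control flow easier to follow, this is what vmstat_shepherd_timer() amounts to once the toggle_timer() macro from the hunk above is expanded; it is an illustrative expansion only, not extra code in the patch:

```c
static void vmstat_shepherd_timer(unsigned long data)
{
	/* Flip to the timer that is NOT the one currently executing. */
	shepherd_timer_index = !shepherd_timer_index;

	/*
	 * Arming a timer other than the one whose callback is running
	 * leaves the timer core free to queue it on a non-idle CPU's base,
	 * which is the whole point of keeping two timers.
	 */
	mod_timer(&shepherd_timer[shepherd_timer_index],
		  jiffies + round_jiffies_relative(sysctl_stat_interval));

	/* The actual vmstat work is then done from the workqueue. */
	schedule_work(&shepherd);
}
```

On each expiry the handler of shepherd_timer[0] re-arms shepherd_timer[1] and vice versa, so the re-armed timer is never the one the timer core currently marks as running.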