Message ID: 20171205171018.9203-1-patrick.bellasi@arm.com
Series: Utilization estimation (util_est) for FAIR tasks
On Tue, Dec 05, 2017 at 05:10:14PM +0000, Patrick Bellasi wrote:
> With this feature enabled, the measured overhead is in the range of ~1%
> on the same HW/SW test configuration.

That's quite a lot; did you look where that comes from?
On Tue, 2017-12-05 at 17:10 +0000, Patrick Bellasi wrote:
> This is a respin of:
>    https://lkml.org/lkml/2017/11/9/546
> which has been rebased on v4.15-rc2 to have util_est now working on top
> of the recent PeterZ's:
>    [PATCH -v2 00/18] sched/fair: A bit of a cgroup/PELT overhaul
>
> The aim of this series is to improve some PELT behaviors to make it a
> better fit for the scheduling of tasks common in embedded mobile
> use-cases, without affecting other classes of workloads.

I thought perhaps this patch set would improve the below behavior, but
alas it does not. That's 3 instances of firefox playing youtube clips
being shoved into a corner by hogs sitting on 7 of 8 runqueues. PELT
serializes the threaded desktop, making that threading kinda pointless,
and CFS not all that fair.

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+  P COMMAND
 6569 root      20   0    4048    704    628 R 100.0 0.004   5:10.48  7 cpuhog
 6573 root      20   0    4048    712    636 R 100.0 0.004   5:07.47  5 cpuhog
 6581 root      20   0    4048    696    620 R 100.0 0.004   5:07.36  1 cpuhog
 6585 root      20   0    4048    812    736 R 100.0 0.005   5:08.14  4 cpuhog
 6589 root      20   0    4048    712    636 R 100.0 0.004   5:06.42  6 cpuhog
 6577 root      20   0    4048    720    644 R 99.80 0.005   5:06.52  3 cpuhog
 6593 root      20   0    4048    728    652 R 99.60 0.005   5:04.25  0 cpuhog
 6755 mikeg     20   0 2714788 885324 179196 S 19.96 5.544   2:14.36  2 Web Content
 6620 mikeg     20   0 2318348 312336 145044 S 8.383 1.956   0:51.51  2 firefox
 3190 root      20   0  323944  71704  42368 S 3.194 0.449   0:11.90  2 Xorg
 3718 root      20   0 3009580  67112  49256 S 0.599 0.420   0:02.89  2 kwin_x11
 3761 root      20   0  769760  90740  62048 S 0.399 0.568   0:03.46  2 konsole
 3845 root       9 -11  791224  20132  14236 S 0.399 0.126   0:03.00  2 pulseaudio
 3722 root      20   0 3722308 172568  88088 S 0.200 1.081   0:04.35  2 plasmashel

 ------------------------------------------------------------------------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Sum delay ms     | Maximum delay at      |
 ------------------------------------------------------------------------------------------------------------------------------------
  Web Content:6755      |   2864.862 ms |     7314 | avg:    0.299 ms | max:   40.374 ms | sum: 2189.472 ms | max at:    375.769240 |
  Compositor:6680       |   1889.847 ms |     4672 | avg:    0.531 ms | max:   29.092 ms | sum: 2478.559 ms | max at:    375.759405 |
  MediaPl~back #3:(13)  |   3269.777 ms |     7853 | avg:    0.218 ms | max:   19.451 ms | sum: 1711.635 ms | max at:    391.123970 |
  MediaPl~back #4:(10)  |   1472.986 ms |     8189 | avg:    0.236 ms | max:   18.653 ms | sum: 1933.886 ms | max at:    376.124211 |
  MediaPl~back #1:(9)   |    601.788 ms |     6598 | avg:    0.247 ms | max:   17.823 ms | sum: 1627.852 ms | max at:    401.122567 |
  firefox:6620          |    303.181 ms |     6232 | avg:    0.111 ms | max:   15.602 ms | sum:  691.865 ms | max at:    385.078558 |
  Socket Thread:6639    |    667.537 ms |     4806 | avg:    0.069 ms | max:   12.638 ms | sum:  329.387 ms | max at:    380.827323 |
  MediaPD~oder #1:6835  |    154.737 ms |     1592 | avg:    0.700 ms | max:   10.139 ms | sum: 1113.688 ms | max at:    392.575370 |
  MediaTimer #1:6828    |     42.660 ms |     5250 | avg:    0.575 ms | max:    9.845 ms | sum: 3018.994 ms | max at:    380.823677 |
  MediaPD~oder #2:6840  |    150.822 ms |     1583 | avg:    0.703 ms | max:    9.639 ms | sum: 1112.962 ms | max at:    380.823741 |
  ...
Hi Mike,

On 13-Dec 18:56, Mike Galbraith wrote:
> On Tue, 2017-12-05 at 17:10 +0000, Patrick Bellasi wrote:
> > This is a respin of:
> >    https://lkml.org/lkml/2017/11/9/546
> > which has been rebased on v4.15-rc2 to have util_est now working on top
> > of the recent PeterZ's:
> >    [PATCH -v2 00/18] sched/fair: A bit of a cgroup/PELT overhaul
> >
> > The aim of this series is to improve some PELT behaviors to make it a
> > better fit for the scheduling of tasks common in embedded mobile
> > use-cases, without affecting other classes of workloads.
>
> I thought perhaps this patch set would improve the below behavior, but
> alas it does not. That's 3 instances of firefox playing youtube clips
> being shoved into a corner by hogs sitting on 7 of 8 runqueues. PELT
> serializes the threaded desktop, making that threading kinda pointless,
> and CFS not all that fair.

Perhaps I don't completely get your use-case.
Are the cpuhog threads pinned to CPUs, or do they just happen to be
always running on the same CPU?

I guess you would expect the three Firefox instances to be spread on
different CPUs. But whether this is possible depends also on the
specific task composition generated by Firefox, doesn't it?

Being a video playback pipeline, I would not be surprised to see that
most of the time we actually have only 1 or 2 tasks RUNNABLE, while
the others are sleeping... and if an HW decoder is involved, even if
you have three instances running you likely get only one pipeline
active at a time...

If that's the case, why should CFS move Firefox tasks around?

>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+  P COMMAND
>  6569 root      20   0    4048    704    628 R 100.0 0.004   5:10.48  7 cpuhog
>  6573 root      20   0    4048    712    636 R 100.0 0.004   5:07.47  5 cpuhog
>  6581 root      20   0    4048    696    620 R 100.0 0.004   5:07.36  1 cpuhog
>  6585 root      20   0    4048    812    736 R 100.0 0.005   5:08.14  4 cpuhog
>  6589 root      20   0    4048    712    636 R 100.0 0.004   5:06.42  6 cpuhog
>  6577 root      20   0    4048    720    644 R 99.80 0.005   5:06.52  3 cpuhog
>  6593 root      20   0    4048    728    652 R 99.60 0.005   5:04.25  0 cpuhog
>  6755 mikeg     20   0 2714788 885324 179196 S 19.96 5.544   2:14.36  2 Web Content
>  6620 mikeg     20   0 2318348 312336 145044 S 8.383 1.956   0:51.51  2 firefox
>  3190 root      20   0  323944  71704  42368 S 3.194 0.449   0:11.90  2 Xorg
>  3718 root      20   0 3009580  67112  49256 S 0.599 0.420   0:02.89  2 kwin_x11
>  3761 root      20   0  769760  90740  62048 S 0.399 0.568   0:03.46  2 konsole
>  3845 root       9 -11  791224  20132  14236 S 0.399 0.126   0:03.00  2 pulseaudio
>  3722 root      20   0 3722308 172568  88088 S 0.200 1.081   0:04.35  2 plasmashel

Is this always happening... or do Firefox tasks sometimes get a chance
to run on CPUs other than CPU2? Could it be that, looking at htop
output, we miss these small opportunities?
> ------------------------------------------------------------------------------------------------------------------------------------
>  Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Sum delay ms     | Maximum delay at      |
> ------------------------------------------------------------------------------------------------------------------------------------
>  Web Content:6755      |   2864.862 ms |     7314 | avg:    0.299 ms | max:   40.374 ms | sum: 2189.472 ms | max at:    375.769240 |
>  Compositor:6680       |   1889.847 ms |     4672 | avg:    0.531 ms | max:   29.092 ms | sum: 2478.559 ms | max at:    375.759405 |
>  MediaPl~back #3:(13)  |   3269.777 ms |     7853 | avg:    0.218 ms | max:   19.451 ms | sum: 1711.635 ms | max at:    391.123970 |
>  MediaPl~back #4:(10)  |   1472.986 ms |     8189 | avg:    0.236 ms | max:   18.653 ms | sum: 1933.886 ms | max at:    376.124211 |
>  MediaPl~back #1:(9)   |    601.788 ms |     6598 | avg:    0.247 ms | max:   17.823 ms | sum: 1627.852 ms | max at:    401.122567 |
>  firefox:6620          |    303.181 ms |     6232 | avg:    0.111 ms | max:   15.602 ms | sum:  691.865 ms | max at:    385.078558 |
>  Socket Thread:6639    |    667.537 ms |     4806 | avg:    0.069 ms | max:   12.638 ms | sum:  329.387 ms | max at:    380.827323 |
>  MediaPD~oder #1:6835  |    154.737 ms |     1592 | avg:    0.700 ms | max:   10.139 ms | sum: 1113.688 ms | max at:    392.575370 |
>  MediaTimer #1:6828    |     42.660 ms |     5250 | avg:    0.575 ms | max:    9.845 ms | sum: 3018.994 ms | max at:    380.823677 |
>  MediaPD~oder #2:6840  |    150.822 ms |     1583 | avg:    0.703 ms | max:    9.639 ms | sum: 1112.962 ms | max at:    380.823741 |

How do you get these stats?

It's definitely an interesting use-case; however, I think it's out of
the scope of util_est.

Regarding the specific statement "CFS not all that fair", I would say
that the fairness of CFS is defined and has to be evaluated within a
single CPU and on a temporal (not clock-cycle) basis.

AFAIK, vruntime is progressed based on elapsed time, thus you can have
two tasks which get the same time slice but consume it at different
frequencies. In this case too we are not that fair, are we?

Thus, in the end it all boils down to some (as much as possible)
low-overhead heuristics. A proper description of a reproducible
use-case can help in improving them.

Can we model your use-case using a simple rt-app configuration?
This would likely help to have a simple and reproducible testing
scenario to better understand where the issue eventually is...
maybe by looking at an execution trace.

Cheers Patrick

--
#include <best/regards.h>

Patrick Bellasi
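Patrick's point about time-based vruntime can be made concrete with a small
stand-alone sketch. This is an illustration only, not fair.c code: the
helper name advance_vruntime and the simplified weight scaling are ours,
though NICE_0_LOAD mirrors the kernel's fixed-point convention. vruntime
accrues with elapsed wall-clock time scaled by task weight, so a task on a
low-frequency CPU is charged just as much vruntime per millisecond as one
running flat out.

#include <stdio.h>

/* Weight of a nice-0 task in the kernel's fixed-point convention. */
#define NICE_0_LOAD 1024ULL

/* Simplified vruntime accrual: elapsed wall-clock runtime, scaled
 * inversely by load weight; cycles and frequency never enter into it. */
static unsigned long long advance_vruntime(unsigned long long vruntime,
					   unsigned long long delta_exec_ns,
					   unsigned long long weight)
{
	return vruntime + delta_exec_ns * NICE_0_LOAD / weight;
}

int main(void)
{
	/* Two nice-0 tasks each run 10ms of wall-clock time, one on a
	 * fast CPU and one on a slow one: their vruntime advances
	 * identically, whatever number of cycles they actually retired. */
	printf("fast CPU task: %llu ns\n",
	       advance_vruntime(0, 10000000ULL, NICE_0_LOAD));
	printf("slow CPU task: %llu ns\n",
	       advance_vruntime(0, 10000000ULL, NICE_0_LOAD));
	return 0;
}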
On Fri, 2017-12-15 at 16:13 +0000, Patrick Bellasi wrote:
> Hi Mike,
>
> On 13-Dec 18:56, Mike Galbraith wrote:
> > On Tue, 2017-12-05 at 17:10 +0000, Patrick Bellasi wrote:
> > > This is a respin of:
> > >    https://lkml.org/lkml/2017/11/9/546
> > > which has been rebased on v4.15-rc2 to have util_est now working on top
> > > of the recent PeterZ's:
> > >    [PATCH -v2 00/18] sched/fair: A bit of a cgroup/PELT overhaul
> > >
> > > The aim of this series is to improve some PELT behaviors to make it a
> > > better fit for the scheduling of tasks common in embedded mobile
> > > use-cases, without affecting other classes of workloads.
> >
> > I thought perhaps this patch set would improve the below behavior, but
> > alas it does not. That's 3 instances of firefox playing youtube clips
> > being shoved into a corner by hogs sitting on 7 of 8 runqueues. PELT
> > serializes the threaded desktop, making that threading kinda pointless,
> > and CFS not all that fair.
>
> Perhaps I don't completely get your use-case.
> Are the cpuhog threads pinned to CPUs, or do they just happen to be
> always running on the same CPU?

Nothing is pinned.

> I guess you would expect the three Firefox instances to be spread on
> different CPUs. But whether this is possible depends also on the
> specific task composition generated by Firefox, doesn't it?

It depends on load balancing. We're letting firefox threads stack up
to 6 deep while single hogs dominate the box.

> Being a video playback pipeline, I would not be surprised to see that
> most of the time we actually have only 1 or 2 tasks RUNNABLE, while
> the others are sleeping... and if an HW decoder is involved, even if
> you have three instances running you likely get only one pipeline
> active at a time...
>
> If that's the case, why should CFS move Firefox tasks around?

No, while they are indeed ~fairly synchronous, there is overlap. If
there were not, there would be no wait time being accumulated. The
load wants to consume roughly one full core worth, but to achieve
that, it needs access to more than one runqueue, which we are not
facilitating.

> Is this always happening... or do Firefox tasks sometimes get a chance
> to run on CPUs other than CPU2?

There is some escape going on, but not enough for the load to get its
fair share. I have it sort of fixed up locally, but while patch keeps
changing, it's not getting any prettier, nor is it particularly
interested in letting me keep some performance gains I want, so...

> How do you get these stats?

perf sched record/perf sched lat. I twiddled it to output accumulated
wait times as well for convenience, stock only shows max. See below.
If you play with perf sched, you'll notice some.. oddities about it.

> It's definitely an interesting use-case; however, I think it's out of
> the scope of util_est.

Yeah. If I had been less busy and read the whole thing, I wouldn't
have taken it out for a spin.

> Regarding the specific statement "CFS not all that fair", I would say
> that the fairness of CFS is defined and has to be evaluated within a
> single CPU and on a temporal (not clock-cycle) basis.

No, that doesn't really fly. In fact, in the group scheduling code, we
actively pursue box-wide fairness. PELT is going a bit too far ATM.

Point: if you think it's OK to serialize these firefox threads, would
you still think so if those were kernel threads instead? Serializing
your kernel is a clear fail, but unpinned kthreads can be stacked up
just as effectively as those browser threads are, eat needless wakeup
latency and pass it on.
> AFAIK, vruntime is progressed based on elapsed time, thus you can have
> two tasks which get the same time slice but consume it at different
> frequencies. In this case too we are not that fair, are we?

Time slices don't really exist as a concrete quantum in CFS. There's
vruntime equalization, and that's it.

> Thus, in the end it all boils down to some (as much as possible)
> low-overhead heuristics. A proper description of a reproducible
> use-case can help in improving them.

Nah, heuristics are fickle beasts, they WILL knife you in the back,
it's just a question of how often, and how deep.

> Can we model your use-case using a simple rt-app configuration?

No idea.

> This would likely help to have a simple and reproducible testing
> scenario to better understand where the issue eventually is...
> maybe by looking at an execution trace.

It should be reproducible by anyone: just fire up NR_CPUS-1 pure hogs,
point firefox at youtube, open three clips in tabs, watch tasks stack.

Root cause IMHO is PELT having grown too aggressive. SIS was made more
aggressive to compensate, but when you slam that door you get the full
PELT impact, and it stings, as does too aggressive bouncing when you
leave the escape hatch open. Sticky wicket that. Both of those want a
gentle wrap upside the head, as they're both acting a bit nutty.

	-Mike

---
 tools/perf/builtin-sched.c | 34 ++++++++++++++++++++++++++--------
 1 file changed, 26 insertions(+), 8 deletions(-)

--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -212,6 +212,7 @@ struct perf_sched {
 	u64		 run_avg;
 	u64		 all_runtime;
 	u64		 all_count;
+	u64		 all_lat;
 	u64		 cpu_last_switched[MAX_CPUS];
 	struct rb_root	 atom_root, sorted_atom_root, merged_atom_root;
 	struct list_head sort_list, cmp_pid;
@@ -1286,6 +1287,7 @@ static void output_lat_thread(struct per

 	sched->all_runtime += work_list->total_runtime;
 	sched->all_count += work_list->nb_atoms;
+	sched->all_lat += work_list->total_lat;

 	if (work_list->num_merged > 1)
 		ret = printf("  %s:(%d) ", thread__comm_str(work_list->thread), work_list->num_merged);
@@ -1298,10 +1300,11 @@ static void output_lat_thread(struct per
 	avg = work_list->total_lat / work_list->nb_atoms;
 	timestamp__scnprintf_usec(work_list->max_lat_at, max_lat_at, sizeof(max_lat_at));

-	printf("|%11.3f ms |%9" PRIu64 " | avg:%9.3f ms | max:%9.3f ms | max at: %13s s\n",
+	printf("|%11.3f ms |%9" PRIu64 " | avg:%9.3f ms | max:%9.3f ms | sum:%9.3f ms | max at: %13s s\n",
 		 (double)work_list->total_runtime / NSEC_PER_MSEC,
 		 work_list->nb_atoms, (double)avg / NSEC_PER_MSEC,
 		 (double)work_list->max_lat / NSEC_PER_MSEC,
+		 (double)work_list->total_lat / NSEC_PER_MSEC,
 		 max_lat_at);
 }

@@ -1347,6 +1350,16 @@ static int max_cmp(struct work_atoms *l,
 	return 0;
 }

+static int sum_cmp(struct work_atoms *l, struct work_atoms *r)
+{
+	if (l->total_lat < r->total_lat)
+		return -1;
+	if (l->total_lat > r->total_lat)
+		return 1;
+
+	return 0;
+}
+
 static int switch_cmp(struct work_atoms *l, struct work_atoms *r)
 {
 	if (l->nb_atoms < r->nb_atoms)
@@ -1378,6 +1391,10 @@ static int sort_dimension__add(const cha
 		.name = "max",
 		.cmp = max_cmp,
 	};
+	static struct sort_dimension sum_sort_dimension = {
+		.name = "sum",
+		.cmp = sum_cmp,
+	};
 	static struct sort_dimension pid_sort_dimension = {
 		.name = "pid",
 		.cmp = pid_cmp,
@@ -1394,6 +1411,7 @@ static int sort_dimension__add(const cha
 		&pid_sort_dimension,
 		&avg_sort_dimension,
 		&max_sort_dimension,
+		&sum_sort_dimension,
 		&switch_sort_dimension,
 		&runtime_sort_dimension,
 	};
@@ -3090,9 +3108,9 @@ static int perf_sched__lat(struct perf_s
 	perf_sched__merge_lat(sched);
 	perf_sched__sort_lat(sched);

-	printf("\n -----------------------------------------------------------------------------------------------------------------\n");
-	printf("  Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at       |\n");
-	printf(" -----------------------------------------------------------------------------------------------------------------\n");
+	printf("\n ------------------------------------------------------------------------------------------------------------------------------------\n");
+	printf("  Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Sum delay ms     | Maximum delay at      |\n");
+	printf(" ------------------------------------------------------------------------------------------------------------------------------------\n");

 	next = rb_first(&sched->sorted_atom_root);

@@ -3105,11 +3123,11 @@ static int perf_sched__lat(struct perf_s
 		thread__zput(work_list->thread);
 	}

-	printf(" -----------------------------------------------------------------------------------------------------------------\n");
-	printf("  TOTAL:                |%11.3f ms |%9" PRIu64 " |\n",
-	       (double)sched->all_runtime / NSEC_PER_MSEC, sched->all_count);
+	printf(" ------------------------------------------------------------------------------------------------------------\n");
+	printf("  TOTAL:                |%11.3f ms |%9" PRIu64 " |                                     |%14.3f ms |\n",
+	       (double)sched->all_runtime / NSEC_PER_MSEC, sched->all_count, (double)sched->all_lat / NSEC_PER_MSEC);

-	printf(" ---------------------------------------------------\n");
+	printf(" ------------------------------------------------------------------------------------------------------------\n");

 	print_bad_events(sched);
 	printf("\n");
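With the patch above applied, the new column can also drive sorting: since
sum_cmp is registered via sort_dimension__add alongside avg, max, switch and
runtime, something like "perf sched record" followed by "perf sched latency
--sort sum" should order the table by accumulated wait time (assuming the
stock --sort option parsing, which feeds sort_dimension__add).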
On Tuesday, December 5, 2017 6:10:18 PM CET Patrick Bellasi wrote:
> When schedutil looks at the CPU utilization, the current PELT value for
> that CPU is returned straight away. In certain scenarios this can have
> undesired side effects and delays in frequency selection.
>
> For example, since the task utilization is decayed at wakeup time, a
> long-sleeping big task that is newly enqueued does not immediately add a
> significant contribution to the target CPU. This introduces some latency
> before schedutil will be able to detect the best frequency required by
> that task.
>
> Moreover, the PELT signal build-up time is a function of the current
> frequency, because of the scale-invariant load tracking support. Thus,
> starting from a lower frequency, the utilization build-up time will
> increase even more and further delay the selection of the actual
> frequency which better serves the task's requirements.
>
> In order to reduce these kinds of latencies, this patch integrates the
> usage of the CPU's estimated utilization in the sugov_get_util function.
>
> The estimated utilization of a CPU is defined to be the maximum between
> its PELT utilization and the sum of the estimated utilization of each
> currently RUNNABLE task on that CPU.
> This allows properly representing the expected utilization of a CPU which,
> for example, has just got a big task running after a long sleep period,
> and ultimately it allows selecting the best frequency to run a task
> right after it wakes up.
>
> Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
> Reviewed-by: Brendan Jackman <brendan.jackman@arm.com>
> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: Paul Turner <pjt@google.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Morten Rasmussen <morten.rasmussen@arm.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-pm@vger.kernel.org
>
> ---
> Changes v1->v2:
>  - rebase on top of v4.15-rc2
>  - tested that overhauled PELT code does not affect the util_est
> ---
>  kernel/sched/cpufreq_schedutil.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 2f52ec0f1539..465430d99440 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -183,7 +183,11 @@ static void sugov_get_util(unsigned long *util, unsigned long *max, int cpu)
>
>  	cfs_max = arch_scale_cpu_capacity(NULL, cpu);
>
> -	*util = min(rq->cfs.avg.util_avg, cfs_max);
> +	*util = rq->cfs.avg.util_avg;

I would use a local variable here.

That *util everywhere looks a bit dirtyish.

> +	if (sched_feat(UTIL_EST))
> +		*util = max(*util, rq->cfs.util_est_runnable);
> +	*util = min(*util, cfs_max);
> +
>  	*max = cfs_max;
>  }
On Fri, 2017-12-15 at 21:23 +0100, Mike Galbraith wrote:
>
> Point: if you think it's OK to serialize these firefox threads, would
> you still think so if those were kernel threads instead? Serializing
> your kernel is a clear fail, but unpinned kthreads can be stacked up
> just as effectively as those browser threads are, eat needless wakeup
> latency and pass it on.

FWIW, somewhat cheezy example of that below. (later, /me returns to
[apparently endless] squabble w. PELT/SIS;)

bonnie in an nfs mount of its own box, competing with 7 hogs:

 ------------------------------------------------------------------------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Sum delay ms     | Maximum delay at      |
 ------------------------------------------------------------------------------------------------------------------------------------
  kworker/3:0:29        |    630.078 ms |    89669 | avg:    0.011 ms | max:  102.340 ms | sum:  962.919 ms | max at:    310.501277 |
  kworker/3:1H:464      |   1179.868 ms |   101944 | avg:    0.005 ms | max:  102.232 ms | sum:  480.915 ms | max at:    310.501273 |
  kswapd0:78            |   2662.230 ms |     1661 | avg:    0.128 ms | max:   93.935 ms | sum:  213.258 ms | max at:    310.503419 |
  nfsd:2039             |   3257.143 ms |    78448 | avg:    0.112 ms | max:   86.039 ms | sum: 8795.767 ms | max at:    258.847140 |
  nfsd:2038             |   3185.730 ms |    76253 | avg:    0.113 ms | max:   78.348 ms | sum: 8580.676 ms | max at:    258.831370 |
  nfsd:2042             |   3256.554 ms |    81423 | avg:    0.110 ms | max:   74.941 ms | sum: 8929.015 ms | max at:    288.397203 |
  nfsd:2040             |   3314.826 ms |    80396 | avg:    0.105 ms | max:   51.039 ms | sum: 8471.816 ms | max at:    363.870078 |
  nfsd:2036             |   3058.867 ms |    70460 | avg:    0.115 ms | max:   44.629 ms | sum: 8092.319 ms | max at:    250.074253 |
  nfsd:2037             |   3113.592 ms |    74276 | avg:    0.115 ms | max:   43.294 ms | sum: 8556.110 ms | max at:    310.443722 |
  konsole:4013          |    402.509 ms |      894 | avg:    0.148 ms | max:   38.129 ms | sum:  132.050 ms | max at:    332.156495 |
  haveged:497           |     11.831 ms |     1224 | avg:    0.104 ms | max:   37.575 ms | sum:  127.706 ms | max at:    350.669645 |
  nfsd:2043             |   3316.033 ms |    78303 | avg:    0.115 ms | max:   36.511 ms | sum: 8995.138 ms | max at:    248.576108 |
  nfsd:2035             |   3064.108 ms |    67413 | avg:    0.115 ms | max:   28.221 ms | sum: 7746.306 ms | max at:    313.785682 |
  bash:7022             |      0.342 ms |        1 | avg:   22.959 ms | max:   22.959 ms | sum:   22.959 ms | max at:    262.258960 |
  kworker/u16:4:354     |   2073.383 ms |     1550 | avg:    0.050 ms | max:   21.203 ms | sum:   77.185 ms | max at:    332.220678 |
  kworker/4:3:6975      |   1189.868 ms |   115776 | avg:    0.018 ms | max:   20.856 ms | sum: 2071.894 ms | max at:    348.142757 |
  kworker/2:4:6981      |    335.895 ms |    26617 | avg:    0.023 ms | max:   20.726 ms | sum:  625.102 ms | max at:    248.522083 |
  bash:7021             |      0.517 ms |        2 | avg:   10.363 ms | max:   20.726 ms | sum:   20.727 ms | max at:    262.235708 |
  ksoftirqd/2:22        |     65.718 ms |      998 | avg:    0.138 ms | max:   19.072 ms | sum:  137.827 ms | max at:    332.221676 |
  kworker/7:3:6969      |    625.724 ms |    84153 | avg:    0.010 ms | max:   18.838 ms | sum:  876.603 ms | max at:    264.188983 |
  bonnie:6965           |  79637.998 ms |    35434 | avg:    0.007 ms | max:   18.719 ms | sum:  256.748 ms | max at:    331.299867 |
Hi Rafael,

On 16-Dec 03:35, Rafael J. Wysocki wrote:
> On Tuesday, December 5, 2017 6:10:18 PM CET Patrick Bellasi wrote:

[...]

> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 2f52ec0f1539..465430d99440 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -183,7 +183,11 @@ static void sugov_get_util(unsigned long *util, unsigned long *max, int cpu)
> >
> >  	cfs_max = arch_scale_cpu_capacity(NULL, cpu);
> >
> > -	*util = min(rq->cfs.avg.util_avg, cfs_max);
> > +	*util = rq->cfs.avg.util_avg;
>
> I would use a local variable here.
>
> That *util everywhere looks a bit dirtyish.

Yes, right... will update for the next respin.

> > +	if (sched_feat(UTIL_EST))
> > +		*util = max(*util, rq->cfs.util_est_runnable);
> > +	*util = min(*util, cfs_max);
> > +
> >  	*max = cfs_max;
> >  }

Cheers Patrick

--
#include <best/regards.h>

Patrick Bellasi
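For reference, a sketch of what sugov_get_util() could look like with
Rafael's local-variable suggestion folded in. This is an assumption based
only on the hunk quoted above, not the actual follow-up patch: the util_cfs
name and the cpu_rq()/rq declarations are guesses at the surrounding
function context.

static void sugov_get_util(unsigned long *util, unsigned long *max, int cpu)
{
	struct rq *rq = cpu_rq(cpu);
	unsigned long cfs_max = arch_scale_cpu_capacity(NULL, cpu);
	unsigned long util_cfs = rq->cfs.avg.util_avg;

	/* Estimated utilization: the max of the CPU's PELT signal and
	 * the sum of the estimated utilization of its RUNNABLE tasks. */
	if (sched_feat(UTIL_EST))
		util_cfs = max(util_cfs, rq->cfs.util_est_runnable);

	*util = min(util_cfs, cfs_max);
	*max = cfs_max;
}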