Message ID | 1555443521-579-4-git-send-email-thara.gopinath@linaro.org |
---|---|
State | New |
Series | Introduce Thermal Pressure |
On Tuesday 16 Apr 2019 at 15:38:41 (-0400), Thara Gopinath wrote:
> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> @@ -177,6 +178,9 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
>
>  		if (policy->max > clipped_freq)
>  			cpufreq_verify_within_limits(policy, 0, clipped_freq);
> +
> +		sched_update_thermal_pressure(policy->cpus,
> +				policy->max, policy->cpuinfo.max_freq);

Is this something we could do in CPUFreq? Directly in
cpufreq_verify_within_limits() perhaps?

That would redefine the 'thermal pressure' framework in a more abstract
way and make the scheduler look at 'frequency capping' events,
regardless of the reason for capping.

That would reflect user-defined frequency constraints in cpu_capacity,
in addition to the thermal stuff. I'm not sure if there is another use
case for frequency capping?

Perhaps the Intel boost stuff could be factored in there? That is,
at times when the boost freq is not reachable, capacity_of() would appear
smaller ... Unless this wants to be reflected instantaneously?

Thoughts?
Quentin
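To make the 'frequency capping' idea above concrete, here is a minimal userspace sketch of how a capped maximum frequency could be turned into a capacity reduction. It is not the implementation of sched_update_thermal_pressure(), whose body is not shown in this thread. It only assumes that capacity scales linearly with frequency on the scheduler's 1024-point scale (SCHED_CAPACITY_SHIFT is 10). The helper names capped_capacity() and capacity_pressure() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Same fixed-point scale the scheduler uses for CPU capacity (1 << 10). */
#define CAPACITY_SHIFT 10
#define CAPACITY_SCALE ((uint64_t)1 << CAPACITY_SHIFT)

/*
 * Hypothetical helper: capacity still available when the maximum
 * frequency is capped, assuming capacity is proportional to frequency.
 * Frequencies are in kHz, as in struct cpufreq_policy.
 */
static uint64_t capped_capacity(uint64_t capped_freq, uint64_t max_freq)
{
	assert(max_freq > 0 && capped_freq <= max_freq);
	return (capped_freq << CAPACITY_SHIFT) / max_freq;
}

/* Hypothetical helper: capacity lost to the cap, whatever its cause. */
static uint64_t capacity_pressure(uint64_t capped_freq, uint64_t max_freq)
{
	return CAPACITY_SCALE - capped_capacity(capped_freq, max_freq);
}
```

With these assumptions, capping a 2000000 kHz CPU at 1500000 kHz leaves a capacity of 768 and a pressure of 256, and an uncapped CPU has zero pressure. Nothing in the arithmetic depends on whether the cap came from thermal mitigation, userspace, or firmware, which is the point of the suggestion.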
On 04/18/2019 05:48 AM, Quentin Perret wrote:
> On Tuesday 16 Apr 2019 at 15:38:41 (-0400), Thara Gopinath wrote:
>> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
>> @@ -177,6 +178,9 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
>>
>>  		if (policy->max > clipped_freq)
>>  			cpufreq_verify_within_limits(policy, 0, clipped_freq);
>> +
>> +		sched_update_thermal_pressure(policy->cpus,
>> +				policy->max, policy->cpuinfo.max_freq);
>
> Is this something we could do in CPUFreq? Directly in
> cpufreq_verify_within_limits() perhaps?
>
> That would redefine the 'thermal pressure' framework in a more abstract
> way and make the scheduler look at 'frequency capping' events,
> regardless of the reason for capping.
>
> That would reflect user-defined frequency constraints in cpu_capacity,
> in addition to the thermal stuff. I'm not sure if there is another use
> case for frequency capping?

Hi Quentin,

Thanks for the review. Sorry for the delay in responding; I was on
vacation for the past few days.

I think there is one major difference between user-defined frequency
constraints and frequency constraints due to thermal events: the length
of time the system spends in the constrained state. Typically, a user
constraint lasts for seconds if not minutes, and I think in this case
cpu_capacity_orig should reflect the constraint, not cpu_capacity as in
this patch set. Also, for a user constraint there is possibly no need to
accumulate and average the capacity constraints; instantaneous values
can be applied directly to cpu_capacity_orig. Thermal pressure, on the
other hand, is more spiky, sometimes on the order of milliseconds or
microseconds, which requires the accumulating and averaging.

> Perhaps the Intel boost stuff could be factored in there? That is,
> at times when the boost freq is not reachable, capacity_of() would appear
> smaller ... Unless this wants to be reflected instantaneously?

Again, do you think Intel boost is better reflected in
cpu_capacity_orig rather than cpu_capacity?

--
Regards
Thara
Hi guys,

On 23/04/2019 23:38, Thara Gopinath wrote:
> On 04/18/2019 05:48 AM, Quentin Perret wrote:
>> Is this something we could do in CPUFreq? Directly in
>> cpufreq_verify_within_limits() perhaps?
>>
>> That would redefine the 'thermal pressure' framework in a more abstract
>> way and make the scheduler look at 'frequency capping' events,
>> regardless of the reason for capping.
>>
>> That would reflect user-defined frequency constraints in cpu_capacity,
>> in addition to the thermal stuff. I'm not sure if there is another use
>> case for frequency capping?
> Hi Quentin,
> Thanks for the review. Sorry for the delay in responding; I was on
> vacation for the past few days.
> I think there is one major difference between user-defined frequency
> constraints and frequency constraints due to thermal events: the length
> of time the system spends in the constrained state.
> Typically, a user constraint lasts for seconds if not minutes, and I
> think in this case cpu_capacity_orig should reflect the constraint, not
> cpu_capacity as in this patch set. Also, for a user constraint there is
> possibly no need to accumulate and average the capacity constraints;
> instantaneous values can be applied directly to cpu_capacity_orig.
> Thermal pressure, on the other hand, is more spiky, sometimes on the
> order of milliseconds or microseconds, which requires the accumulating
> and averaging.

I think we can't make any assumptions about the intentions of the user
when restricting the OPP range through the cpufreq interface, but it
would still be nice to do something, and reflecting it as thermal
pressure would be a good start. It might not be due to thermal, but it
is a capacity restriction that would have the same result. Also, if the
user has the ability to tune the decay period, he has control over the
behavior of the signal. Given that there currently isn't a smarter
mechanism (modifying capacity_orig, re-normalising the capacity range)
for long-term capping, even treating it as short-term capping is a good
start. But this is a bigger exercise and it needs thorough
consideration, so it could be skipped, in my opinion, for now.

Also, if we want to stick with the "definition", userspace would still
be able to reflect thermal pressure through the thermal limits interface
by setting the cooling device state, which will be reflected in this
update as well. So userspace would have a mechanism to reflect thermal
pressure.

One addition: I like that the thermal pressure framework is not tied to
cpufreq. There are firmware solutions that do not bother informing
cpufreq of limits being changed, and therefore all of this could be
skipped. But any firmware driver could call
sched_update_thermal_pressure on notifications of limit changes from
firmware, which is an important feature.

>> Perhaps the Intel boost stuff could be factored in there? That is,
>> at times when the boost freq is not reachable, capacity_of() would appear
>> smaller ... Unless this wants to be reflected instantaneously?
> Again, do you think Intel boost is better reflected in
> cpu_capacity_orig rather than cpu_capacity?

The changes here would happen even faster than thermal capping, same as
other restrictions imposed by firmware, so it would not seem right to me
to reflect it in capacity_orig. Reflecting it as thermal pressure is
another matter, which I'd say should be up to the client. The big
disadvantage I'd see here is coping with decisions made while being
capped once you're no longer capped, and the other way around. I
believe these changes would happen too often, and they would not follow
the ramp-up/ramp-down behavior that we expect from thermal mitigation.
That's why I believe averaging/regulation of the signal works well in
the thermal case, and it might not for power-related fast restrictions.

But given these three cases above, it might be that the ideal solution
is for this framework to be made more generic, and for each client to be
able to obtain and configure a pressure signal to be reflected
separately in the capacity of each CPU.

My two pennies' worth,
Ionela.
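The "accumulating and averaging" with a tunable decay period discussed above can be illustrated with a simple exponentially weighted moving average. This is only a sketch of the general technique, not the averaging algorithm used by the patch set, which is not shown in this thread. The struct pressure_avg type, the pressure_avg_update() function, and the shift-based decay knob are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-CPU state for an averaged pressure signal. */
struct pressure_avg {
	uint64_t avg;        /* averaged pressure, in capacity units */
	unsigned int shift;  /* decay knob: a larger shift reacts more slowly */
};

/*
 * Fold one instantaneous pressure sample into the average:
 *   avg += (sample - avg) / 2^shift
 * written with unsigned arithmetic so it never underflows.
 */
static void pressure_avg_update(struct pressure_avg *p, uint64_t sample)
{
	assert(p->shift < 32);
	if (sample >= p->avg)
		p->avg += (sample - p->avg) >> p->shift;
	else
		p->avg -= (p->avg - sample) >> p->shift;
}
```

With shift set to 1, two consecutive samples of 512 move the average from 0 to 256 and then to 384, and a following sample of 0 halves it to 192. A spiky input is therefore smoothed, while a long-lived cap eventually dominates the average, which is why tuning the decay changes how the signal behaves for short-term versus long-term capping.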
On Tue, Apr 16, 2019 at 03:38:41PM -0400, Thara Gopinath wrote:
> Enable cpufreq cooling device to update the thermal pressure in
> event of a capped maximum frequency or removal of capped maximum
> frequency.
>
> Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
> ---
>  drivers/thermal/cpu_cooling.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> index 6fff161..d5cc3c3 100644
> --- a/drivers/thermal/cpu_cooling.c
> +++ b/drivers/thermal/cpu_cooling.c
> @@ -31,6 +31,7 @@
>  #include <linux/slab.h>
>  #include <linux/cpu.h>
>  #include <linux/cpu_cooling.h>
> +#include <linux/sched/thermal.h>
>
>  #include <trace/events/thermal.h>
>
> @@ -177,6 +178,9 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
>
>  		if (policy->max > clipped_freq)
>  			cpufreq_verify_within_limits(policy, 0, clipped_freq);
> +
> +		sched_update_thermal_pressure(policy->cpus,
> +				policy->max, policy->cpuinfo.max_freq);

If it's already telling the cpufreq thing, why not get it from sugov
instead?
On Tuesday 23 Apr 2019 at 18:38:46 (-0400), Thara Gopinath wrote:
> I think there is one major difference between user-defined frequency
> constraints and frequency constraints due to thermal events: the length
> of time the system spends in the constrained state.
> Typically, a user constraint lasts for seconds if not minutes, and I
> think in this case cpu_capacity_orig should reflect the constraint, not
> cpu_capacity as in this patch set.

That might not always be true, I think. There are tons of userspace
thermal daemons out there, and I wouldn't be surprised if they were
writing into the cpufreq sysfs files, although I'm not sure.

Another thing is, if you want to change the capacity_orig value, you'll
need to rebuild the sched domains and all, I believe. Otherwise there is
a risk of 'breaking' the sd_asym flags. So we need to make sure we're
happy to pay that price.

> Also, for a user constraint there is possibly no need to accumulate and
> average the capacity constraints; instantaneous values can be applied
> directly to cpu_capacity_orig. Thermal pressure, on the other hand, is
> more spiky, sometimes on the order of milliseconds or microseconds,
> which requires the accumulating and averaging.
> >
> > Perhaps the Intel boost stuff could be factored in there? That is,
> > at times when the boost freq is not reachable, capacity_of() would appear
> > smaller ... Unless this wants to be reflected instantaneously?
> Again, do you think Intel boost is better reflected in
> cpu_capacity_orig rather than cpu_capacity?

I'm not even sure if we want to reflect it at all TBH, but I'd be
interested to see what Intel folks think :-)

Thanks,
Quentin
On Thu, 25 Apr 2019 at 12:45, Quentin Perret <quentin.perret@arm.com> wrote:
>
> On Tuesday 23 Apr 2019 at 18:38:46 (-0400), Thara Gopinath wrote:
> > I think there is one major difference between user-defined frequency
> > constraints and frequency constraints due to thermal events: the length
> > of time the system spends in the constrained state.
> > Typically, a user constraint lasts for seconds if not minutes, and I
> > think in this case cpu_capacity_orig should reflect the constraint, not
> > cpu_capacity as in this patch set.
>
> That might not always be true, I think. There are tons of userspace
> thermal daemons out there, and I wouldn't be surprised if they were
> writing into the cpufreq sysfs files, although I'm not sure.

It would be better for them to use the sysfs set_target interface of
the cpu_cooling device in this case.

>
> Another thing is, if you want to change the capacity_orig value, you'll
> need to rebuild the sched domains and all, I believe. Otherwise there is
> a risk of 'breaking' the sd_asym flags. So we need to make sure we're
> happy to pay that price.

That would be the goal: if userspace uses the sysfs interface of
cpufreq to set a new max frequency, it should be considered a
long-lived change relative to the scheduling rate, and in this case it
would make sense to update capacity_orig and rebuild the sched_domain.
On Thursday 25 Apr 2019 at 14:04:10 (+0200), Vincent Guittot wrote:
> On Thu, 25 Apr 2019 at 12:45, Quentin Perret <quentin.perret@arm.com> wrote:
> >
> > On Tuesday 23 Apr 2019 at 18:38:46 (-0400), Thara Gopinath wrote:
> > > I think there is one major difference between user-defined frequency
> > > constraints and frequency constraints due to thermal events: the length
> > > of time the system spends in the constrained state.
> > > Typically, a user constraint lasts for seconds if not minutes, and I
> > > think in this case cpu_capacity_orig should reflect the constraint, not
> > > cpu_capacity as in this patch set.
> >
> > That might not always be true, I think. There are tons of userspace
> > thermal daemons out there, and I wouldn't be surprised if they were
> > writing into the cpufreq sysfs files, although I'm not sure.
>
> It would be better for them to use the sysfs set_target interface of
> the cpu_cooling device in this case.

Right

> > Another thing is, if you want to change the capacity_orig value, you'll
> > need to rebuild the sched domains and all, I believe. Otherwise there is
> > a risk of 'breaking' the sd_asym flags. So we need to make sure we're
> > happy to pay that price.
>
> That would be the goal: if userspace uses the sysfs interface of
> cpufreq to set a new max frequency, it should be considered a
> long-lived change relative to the scheduling rate, and in this case it
> would make sense to update capacity_orig and rebuild the sched_domain.

I guess as long as we don't rebuild too frequently that could work.
Perhaps we could put some rate limiting in there to enforce that. Though
we don't do it for hotplug so ... :/
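The rate limiting floated above could look roughly like the following userspace sketch, which allows at most one sched domain rebuild per interval and remembers that a rebuild is still owed when a request is refused. Nothing like this is proposed as code in the thread. The struct rebuild_limiter type, its fields, and rebuild_allowed() are hypothetical, and timestamps are plain millisecond counters supplied by the caller rather than kernel jiffies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state for throttling expensive sched domain rebuilds. */
struct rebuild_limiter {
	uint64_t min_interval_ms;  /* minimum spacing between rebuilds */
	uint64_t last_rebuild_ms;  /* time of the last allowed rebuild */
	bool has_rebuilt;          /* false until the first rebuild */
	bool pending;              /* a request was refused and is still owed */
};

/*
 * Returns true if a rebuild may run at now_ms. Otherwise marks the
 * request as pending so the caller can retry once the interval expires.
 */
static bool rebuild_allowed(struct rebuild_limiter *rl, uint64_t now_ms)
{
	/* Time must not go backwards relative to the last rebuild. */
	assert(!rl->has_rebuilt || now_ms >= rl->last_rebuild_ms);

	if (rl->has_rebuilt &&
	    now_ms - rl->last_rebuild_ms < rl->min_interval_ms) {
		rl->pending = true;
		return false;
	}

	rl->last_rebuild_ms = now_ms;
	rl->has_rebuilt = true;
	rl->pending = false;
	return true;
}
```

With a 1000 ms interval, a request at time 0 is allowed, a request at 500 ms is refused and recorded as pending, and a request at 1000 ms is allowed again. The trade-off is that capacity_orig can be stale for up to one interval after a change in the user-set maximum frequency.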
On 04/24/2019 11:56 AM, Ionela Voinescu wrote:
> Hi guys,
>
> On 23/04/2019 23:38, Thara Gopinath wrote:
>> I think there is one major difference between user-defined frequency
>> constraints and frequency constraints due to thermal events: the length
>> of time the system spends in the constrained state.
>> Typically, a user constraint lasts for seconds if not minutes, and I
>> think in this case cpu_capacity_orig should reflect the constraint, not
>> cpu_capacity as in this patch set. Also, for a user constraint there is
>> possibly no need to accumulate and average the capacity constraints;
>> instantaneous values can be applied directly to cpu_capacity_orig.
>> Thermal pressure, on the other hand, is more spiky, sometimes on the
>> order of milliseconds or microseconds, which requires the accumulating
>> and averaging.
>
> I think we can't make any assumptions about the intentions of the user
> when restricting the OPP range through the cpufreq interface, but it
> would still be nice to do something, and reflecting it as thermal
> pressure would be a good start. It might not be due to thermal, but it
> is a capacity restriction that would have the same result. Also, if the
> user has the ability to tune the decay period, he has control over the
> behavior of the signal. Given that there currently isn't a smarter
> mechanism (modifying capacity_orig, re-normalising the capacity range)
> for long-term capping, even treating it as short-term capping is a good
> start. But this is a bigger exercise and it needs thorough
> consideration, so it could be skipped, in my opinion, for now.
>
> Also, if we want to stick with the "definition", userspace would still
> be able to reflect thermal pressure through the thermal limits interface
> by setting the cooling device state, which will be reflected in this
> update as well. So userspace would have a mechanism to reflect thermal
> pressure.

Yes, target_state under cooling devices can be set, and this will be
reflected as thermal pressure.

>
> One addition: I like that the thermal pressure framework is not tied to
> cpufreq. There are firmware solutions that do not bother informing
> cpufreq of limits being changed, and therefore all of this could be
> skipped. But any firmware driver could call
> sched_update_thermal_pressure on notifications of limit changes from
> firmware, which is an important feature.

For my part, I am open to discussion on the best place to call
sched_update_thermal_pressure from. Seeing the discussion and the
different opinions, I am wondering whether there should be a SoC- or
platform-specific hook provided for better abstraction.

--
Regards
Thara
On 04/25/2019 06:45 AM, Quentin Perret wrote:
> On Tuesday 23 Apr 2019 at 18:38:46 (-0400), Thara Gopinath wrote:
>> I think there is one major difference between user-defined frequency
>> constraints and frequency constraints due to thermal events: the length
>> of time the system spends in the constrained state.
>> Typically, a user constraint lasts for seconds if not minutes, and I
>> think in this case cpu_capacity_orig should reflect the constraint, not
>> cpu_capacity as in this patch set.
>
> That might not always be true, I think. There are tons of userspace
> thermal daemons out there, and I wouldn't be surprised if they were
> writing into the cpufreq sysfs files, although I'm not sure.
>
> Another thing is, if you want to change the capacity_orig value, you'll
> need to rebuild the sched domains and all, I believe. Otherwise there is
> a risk of 'breaking' the sd_asym flags. So we need to make sure we're
> happy to pay that price.

Hi Quentin,

I saw Vincent's reply on this and my answer is similar. I completely
agree that this will involve a rebuild of the sched domains. My thinking
is that when user space caps the max frequency through cpufreq, the cap
is meant to be long term, and hence we could live with rebuilding the
sched domains. If user space wants to control the max frequency of a CPU
for thermal reasons, then the cooling device sysfs interface should be
used. In practice, I am interested in knowing why thermal daemons
control the cpufreq sysfs files instead of the cooling device files.

--
Regards
Thara
diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index 6fff161..d5cc3c3 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -31,6 +31,7 @@
 #include <linux/slab.h>
 #include <linux/cpu.h>
 #include <linux/cpu_cooling.h>
+#include <linux/sched/thermal.h>
 
 #include <trace/events/thermal.h>
 
@@ -177,6 +178,9 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
 
 		if (policy->max > clipped_freq)
 			cpufreq_verify_within_limits(policy, 0, clipped_freq);
+
+		sched_update_thermal_pressure(policy->cpus,
+				policy->max, policy->cpuinfo.max_freq);
 		break;
 	}
 	mutex_unlock(&cooling_list_lock);
Enable cpufreq cooling device to update the thermal pressure in
event of a capped maximum frequency or removal of capped maximum
frequency.

Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
---
 drivers/thermal/cpu_cooling.c | 4 ++++
 1 file changed, 4 insertions(+)

--
2.1.4