Message ID | tencent_6D2374392DB66C9D23BF6E2546638A42EC08@qq.com |
---|---|
State | New |
Headers | show |
Series | [v3] PM: EM: Fix potential division-by-zero error in em_compute_costs() | expand |
On 4/14/25 10:04, Yaxiong Tian wrote: > From: Yaxiong Tian <tianyaxiong@kylinos.cn> > > When the device is of a non-CPU type, table[i].performance won't be > initialized in the previous em_init_performance(), resulting in division > by zero when calculating costs in em_compute_costs(). > > Since the 'cost' algorithm is only used for EAS energy efficiency > calculations and is currently not utilized by other device drivers, we > should add the _is_cpu_device(dev) check to prevent this division-by-zero > issue. > > Fixes: <1b600da51073> ("PM: EM: Optimize em_cpu_energy() and remove division") > Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn> > --- > kernel/power/energy_model.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c > index d9b7e2b38c7a..fc972cc1fc12 100644 > --- a/kernel/power/energy_model.c > +++ b/kernel/power/energy_model.c > @@ -235,7 +235,7 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table, > > /* Compute the cost of each performance state. */ > for (i = nr_states - 1; i >= 0; i--) { > - unsigned long power_res, cost; > + unsigned long power_res, cost = 0; > > if ((flags & EM_PERF_DOMAIN_ARTIFICIAL) && cb->get_cost) { > ret = cb->get_cost(dev, table[i].frequency, &cost); > @@ -244,7 +244,7 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table, > cost, ret); > return -EINVAL; > } > - } else { > + } else if (_is_cpu_device(dev)) { > /* increase resolution of 'cost' precision */ > power_res = table[i].power * 10; > cost = power_res / table[i].performance; LGTM, Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
On Mon, Apr 14, 2025 at 11:09 AM Yaxiong Tian <iambestgod@qq.com> wrote: > > From: Yaxiong Tian <tianyaxiong@kylinos.cn> > > When the device is of a non-CPU type, table[i].performance won't be > initialized in the previous em_init_performance(), resulting in division > by zero when calculating costs in em_compute_costs(). > > Since the 'cost' algorithm is only used for EAS energy efficiency > calculations and is currently not utilized by other device drivers, we > should add the _is_cpu_device(dev) check to prevent this division-by-zero > issue. > > Fixes: <1b600da51073> ("PM: EM: Optimize em_cpu_energy() and remove division") Please look at the Fixes: tags in the kernel git history. They don't look like the one above. > Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn> > --- > kernel/power/energy_model.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c > index d9b7e2b38c7a..fc972cc1fc12 100644 > --- a/kernel/power/energy_model.c > +++ b/kernel/power/energy_model.c > @@ -235,7 +235,7 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table, > > /* Compute the cost of each performance state. */ > for (i = nr_states - 1; i >= 0; i--) { > - unsigned long power_res, cost; > + unsigned long power_res, cost = 0; > > if ((flags & EM_PERF_DOMAIN_ARTIFICIAL) && cb->get_cost) { > ret = cb->get_cost(dev, table[i].frequency, &cost); > @@ -244,7 +244,7 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table, > cost, ret); > return -EINVAL; > } > - } else { > + } else if (_is_cpu_device(dev)) { Can't you just check this upfront at the beginning of the function and make it bail out if dev is not a CPU device? > /* increase resolution of 'cost' precision */ > power_res = table[i].power * 10; > cost = power_res / table[i].performance; > --
在 2025/4/16 01:17, Rafael J. Wysocki 写道: > On Mon, Apr 14, 2025 at 11:09 AM Yaxiong Tian <iambestgod@qq.com> wrote: >> >> From: Yaxiong Tian <tianyaxiong@kylinos.cn> >> >> When the device is of a non-CPU type, table[i].performance won't be >> initialized in the previous em_init_performance(), resulting in division >> by zero when calculating costs in em_compute_costs(). >> >> Since the 'cost' algorithm is only used for EAS energy efficiency >> calculations and is currently not utilized by other device drivers, we >> should add the _is_cpu_device(dev) check to prevent this division-by-zero >> issue. >> >> Fixes: <1b600da51073> ("PM: EM: Optimize em_cpu_energy() and remove division") > > Please look at the Fixes: tags in the kernel git history. They don't > look like the one above. > Yes, there's an extra '<>' here. >> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn> >> --- >> kernel/power/energy_model.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c >> index d9b7e2b38c7a..fc972cc1fc12 100644 >> --- a/kernel/power/energy_model.c >> +++ b/kernel/power/energy_model.c >> @@ -235,7 +235,7 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table, >> >> /* Compute the cost of each performance state. */ >> for (i = nr_states - 1; i >= 0; i--) { >> - unsigned long power_res, cost; >> + unsigned long power_res, cost = 0; >> >> if ((flags & EM_PERF_DOMAIN_ARTIFICIAL) && cb->get_cost) { >> ret = cb->get_cost(dev, table[i].frequency, &cost); >> @@ -244,7 +244,7 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table, >> cost, ret); >> return -EINVAL; >> } >> - } else { >> + } else if (_is_cpu_device(dev)) { > > Can't you just check this upfront at the beginning of the function and > make it bail out if dev is not a CPU device? > Sure, But the current implementation applies em_compute_costs() to both non-CPU devices and CPU devices. After carefully reviewing the latest code, I've found this issue has expanded in scope. There are currently three call paths for invoking em_compute_costs(): 1) Registering performance domains (for both non-CPU and CPU devices) em_dev_register_perf_domain() → em_create_pd() → em_create_perf_table() → em_compute_costs() 2)EM update paths (CPU devices only) Periodic 1000ms update check via em_update_work work item: em_check_capacity_update() → em_adjust_new_capacity() → em_recalc_and_update() → em_compute_costs() Exynos-chip initialization: em_dev_update_chip_binning() → em_recalc_and_update() → em_compute_costs() 3) Device cost computation (non-CPU devices only - currently unused) em_dev_compute_costs() → em_compute_costs() Note: In em_dev_compute_costs(), when calling em_compute_costs(), neither the callback (cb) nor flags are set.In fact, it either does nothing at all or performs incorrect operations. Therefore, should we mandate that non-CPU devices must provide a get_cost callback? So Should we add a check at the beginning of the em_compute_costs() to: if (!_is_cpu_device(dev) && !cb->get_cost) { dev_dbg(dev, "EM: No get_cost provided, cost unset.\n"); return 0; } And Modify em_dev_compute_costs() to require callers to provide the cb callback function,Also need to update its corresponding comments. >> /* increase resolution of 'cost' precision */ >> power_res = table[i].power * 10; >> cost = power_res / table[i].performance; >> --
On Wed, Apr 16, 2025 at 4:57 AM Yaxiong Tian <iambestgod@qq.com> wrote: > > 在 2025/4/16 01:17, Rafael J. Wysocki 写道: > > On Mon, Apr 14, 2025 at 11:09 AM Yaxiong Tian <iambestgod@qq.com> wrote: > >> > >> From: Yaxiong Tian <tianyaxiong@kylinos.cn> > >> > >> When the device is of a non-CPU type, table[i].performance won't be > >> initialized in the previous em_init_performance(), resulting in division > >> by zero when calculating costs in em_compute_costs(). > >> > >> Since the 'cost' algorithm is only used for EAS energy efficiency > >> calculations and is currently not utilized by other device drivers, we > >> should add the _is_cpu_device(dev) check to prevent this division-by-zero > >> issue. > >> > >> Fixes: <1b600da51073> ("PM: EM: Optimize em_cpu_energy() and remove division") > > > > Please look at the Fixes: tags in the kernel git history. They don't > > look like the one above. > > > Yes, there's an extra '<>' here. > > >> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn> > >> --- > >> kernel/power/energy_model.c | 4 ++-- > >> 1 file changed, 2 insertions(+), 2 deletions(-) > >> > >> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c > >> index d9b7e2b38c7a..fc972cc1fc12 100644 > >> --- a/kernel/power/energy_model.c > >> +++ b/kernel/power/energy_model.c > >> @@ -235,7 +235,7 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table, > >> > >> /* Compute the cost of each performance state. */ > >> for (i = nr_states - 1; i >= 0; i--) { > >> - unsigned long power_res, cost; > >> + unsigned long power_res, cost = 0; > >> > >> if ((flags & EM_PERF_DOMAIN_ARTIFICIAL) && cb->get_cost) { > >> ret = cb->get_cost(dev, table[i].frequency, &cost); > >> @@ -244,7 +244,7 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table, > >> cost, ret); > >> return -EINVAL; > >> } > >> - } else { > >> + } else if (_is_cpu_device(dev)) { > > > > Can't you just check this upfront at the beginning of the function and > > make it bail out if dev is not a CPU device? > > > Sure, But the current implementation applies em_compute_costs() to both > non-CPU devices and CPU devices. Maybe it shouldn't do that for non-CPU ones? > After carefully reviewing the latest code, > I've found this issue has expanded in scope. > > There are currently three call paths for invoking em_compute_costs(): > > 1) Registering performance domains (for both non-CPU and CPU devices) > em_dev_register_perf_domain() → em_create_pd() → > em_create_perf_table() → em_compute_costs() > > 2)EM update paths (CPU devices only) > > Periodic 1000ms update check via em_update_work work item: > em_check_capacity_update() → em_adjust_new_capacity() → > em_recalc_and_update() → em_compute_costs() > > Exynos-chip initialization: > em_dev_update_chip_binning() → em_recalc_and_update() → em_compute_costs() > > 3) Device cost computation (non-CPU devices only - currently unused) > em_dev_compute_costs() → em_compute_costs() So because this one is unused and AFAICS the cost values are never used for non-CPU devices, it's better to just avoid computing them at all. > Note: In em_dev_compute_costs(), when calling em_compute_costs(), > neither the callback (cb) nor flags are set.In fact, it either does > nothing at all or performs incorrect operations. > > Therefore, should we mandate that non-CPU devices must provide a > get_cost callback? Why would that be an improvement? > So Should we add a check at the beginning of the em_compute_costs() to: > > if (!_is_cpu_device(dev) && !cb->get_cost) { > dev_dbg(dev, "EM: No get_cost provided, cost unset.\n"); > return 0; > } > And Modify em_dev_compute_costs() to require callers to provide the cb > callback function,Also need to update its corresponding comments. > > > >> /* increase resolution of 'cost' precision */ > >> power_res = table[i].power * 10; > >> cost = power_res / table[i].performance; > >> -- I think until there is a user of em_dev_compute_costs() this is all moot and hard to figure out. I would drop em_dev_compute_costs() altogether for now and put a _is_cpu_device(dev) upfront check into em_compute_costs().
On 4/16/25 12:58, Rafael J. Wysocki wrote: > On Wed, Apr 16, 2025 at 4:57 AM Yaxiong Tian <iambestgod@qq.com> wrote: >> >> 在 2025/4/16 01:17, Rafael J. Wysocki 写道: >>> On Mon, Apr 14, 2025 at 11:09 AM Yaxiong Tian <iambestgod@qq.com> wrote: >>>> >>>> From: Yaxiong Tian <tianyaxiong@kylinos.cn> >>>> >>>> When the device is of a non-CPU type, table[i].performance won't be >>>> initialized in the previous em_init_performance(), resulting in division >>>> by zero when calculating costs in em_compute_costs(). >>>> >>>> Since the 'cost' algorithm is only used for EAS energy efficiency >>>> calculations and is currently not utilized by other device drivers, we >>>> should add the _is_cpu_device(dev) check to prevent this division-by-zero >>>> issue. >>>> >>>> Fixes: <1b600da51073> ("PM: EM: Optimize em_cpu_energy() and remove division") >>> >>> Please look at the Fixes: tags in the kernel git history. They don't >>> look like the one above. >>> >> Yes, there's an extra '<>' here. >> >>>> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn> >>>> --- >>>> kernel/power/energy_model.c | 4 ++-- >>>> 1 file changed, 2 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c >>>> index d9b7e2b38c7a..fc972cc1fc12 100644 >>>> --- a/kernel/power/energy_model.c >>>> +++ b/kernel/power/energy_model.c >>>> @@ -235,7 +235,7 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table, >>>> >>>> /* Compute the cost of each performance state. */ >>>> for (i = nr_states - 1; i >= 0; i--) { >>>> - unsigned long power_res, cost; >>>> + unsigned long power_res, cost = 0; >>>> >>>> if ((flags & EM_PERF_DOMAIN_ARTIFICIAL) && cb->get_cost) { >>>> ret = cb->get_cost(dev, table[i].frequency, &cost); >>>> @@ -244,7 +244,7 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table, >>>> cost, ret); >>>> return -EINVAL; >>>> } >>>> - } else { >>>> + } else if (_is_cpu_device(dev)) { >>> >>> Can't you just check this upfront at the beginning of the function and >>> make it bail out if dev is not a CPU device? >>> >> Sure, But the current implementation applies em_compute_costs() to both >> non-CPU devices and CPU devices. > > Maybe it shouldn't do that for non-CPU ones? It shouldn't call this cost computation for non-CPU devices. Let me check that.
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index d9b7e2b38c7a..fc972cc1fc12 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -235,7 +235,7 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table, /* Compute the cost of each performance state. */ for (i = nr_states - 1; i >= 0; i--) { - unsigned long power_res, cost; + unsigned long power_res, cost = 0; if ((flags & EM_PERF_DOMAIN_ARTIFICIAL) && cb->get_cost) { ret = cb->get_cost(dev, table[i].frequency, &cost); @@ -244,7 +244,7 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table, cost, ret); return -EINVAL; } - } else { + } else if (_is_cpu_device(dev)) { /* increase resolution of 'cost' precision */ power_res = table[i].power * 10; cost = power_res / table[i].performance;