Message ID | 20250113044107.566-1-gautham.shenoy@amd.com |
---|---|
State | Accepted |
Commit | 0834667545962ef1c5e8684ed32b45d9c574acd3 |
Headers | show |
Series | acpi-cpufreq: Fix max-frequency computation | expand |
On Mon, Jan 13, 2025 at 10:11:07AM +0530, Gautham R. Shenoy wrote: > commit 3c55e94c0ade ("cpufreq: ACPI: Extend frequency tables to cover > boost frequencies") introduces an assumption in > acpi_cpufreq_cpu_init() that the first entry in the P-state table is > the nominal frequency. This assumption is incorrect. The frequency > corresponding to the P0 P-State need not be the same as the nominal > frequency advertised via CPPC. > > Since the driver is using the CPPC.highest_perf and CPPC.nominal_perf > to compute the boost-ratio, it makes sense to use CPPC.nominal_freq to > compute the max-frequency. CPPC.nominal_freq is advertised on > platforms supporting CPPC revisions 3 or higher. > > Hence, fallback to using the first entry in the P-State table only on > platforms that do not advertise CPPC.nominal_freq. > Gautham, this got recently pulled in 5.15.179 [0] but it seems to have broken what max CPU get reported. I hit the issue on Ubuntu 22.04 with kernel 5.15.0-140-generic. My read from [1] is that that kernel is pretty much 5.15.79. I rebuilt it with this patch removed and max CPU now show as before. Here some output that may help, which is what is mostly down to what is reported by /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq . Posting the whole lscpu hoping that contain more useful data. Happy to provide more if needed. Ubuntu 22.04 with 5.15.0-140-generic (affected, note CPU max MHz: 2000.0000): $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 48 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 128 On-line CPU(s) list: 0-127 Vendor ID: AuthenticAMD Model name: AMD EPYC 7713P 64-Core Processor CPU family: 25 Model: 1 Thread(s) per core: 2 Core(s) per socket: 64 Socket(s): 1 Stepping: 1 Frequency boost: enabled CPU max MHz: 2000.0000 CPU min MHz: 1500.0000 BogoMIPS: 3992.30 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monit or ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy a bm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm _local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyas id decodeassists pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow _recov succor smca fsrm Virtualization features: Virtualization: AMD-V Caches (sum of all): L1d: 2 MiB (64 instances) L1i: 2 MiB (64 instances) L2: 32 MiB (64 instances) L3: 256 MiB (8 instances) NUMA: NUMA node(s): 1 NUMA node0 CPU(s): 0-127 Vulnerabilities: Gather data sampling: Not affected Itlb multihit: Not affected L1tf: Not affected Mds: Not affected Meltdown: Not affected Mmio stale data: Not affected Reg file data sampling: Not affected Retbleed: Not affected Spec rstack overflow: Mitigation; safe RET Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affe cted Srbds: Not affected Tsx async abort: Not affected With 5.15.0-999-generic (5.15.0-140-generic without this patch), max CPU is back to 3720.7029, which is also what I get with 5.15.0-139-generic. $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 48 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 128 On-line CPU(s) list: 0-127 Vendor ID: AuthenticAMD Model name: AMD EPYC 7713P 64-Core Processor CPU family: 25 Model: 1 Thread(s) per core: 2 Core(s) per socket: 64 Socket(s): 1 Stepping: 1 Frequency boost: enabled CPU max MHz: 3720.7029 CPU min MHz: 1500.0000 BogoMIPS: 3992.55 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monit or ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy a bm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm _local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyas id decodeassists pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow _recov succor smca fsrm Virtualization features: Virtualization: AMD-V Caches (sum of all): L1d: 2 MiB (64 instances) L1i: 2 MiB (64 instances) L2: 32 MiB (64 instances) L3: 256 MiB (8 instances) NUMA: NUMA node(s): 1 NUMA node0 CPU(s): 0-127 Vulnerabilities: Gather data sampling: Not affected Itlb multihit: Not affected L1tf: Not affected Mds: Not affected Meltdown: Not affected Mmio stale data: Not affected Reg file data sampling: Not affected Retbleed: Not affected Spec rstack overflow: Mitigation; safe RET Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affe cted Srbds: Not affected Tsx async abort: Not affected Thought to post here instead of [0] to get your thought on this. Am I missing something simple to get the right value? Or should this be pulled out of 5.15 LTS? Thanks [0]: https://lore.kernel.org/all/2025031356-rerun-remold-30b9@gregkh/ [1]: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2106026 > Fixes: 3c55e94c0ade ("cpufreq: ACPI: Extend frequency tables to cover boost frequencies") > Tested-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com> > Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com> > --- > drivers/cpufreq/acpi-cpufreq.c | 36 +++++++++++++++++++++++++--------- > 1 file changed, 27 insertions(+), 9 deletions(-) > > diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c > index c9ebacf5c88e..744fcdeab173 100644 > --- a/drivers/cpufreq/acpi-cpufreq.c > +++ b/drivers/cpufreq/acpi-cpufreq.c > @@ -623,7 +623,14 @@ static int acpi_cpufreq_blacklist(struct cpuinfo_x86 *c) > #endif > > #ifdef CONFIG_ACPI_CPPC_LIB > -static u64 get_max_boost_ratio(unsigned int cpu) > +/* > + * get_max_boost_ratio: Computes the max_boost_ratio as the ratio > + * between the highest_perf and the nominal_perf. > + * > + * Returns the max_boost_ratio for @cpu. Returns the CPPC nominal > + * frequency via @nominal_freq if it is non-NULL pointer. > + */ > +static u64 get_max_boost_ratio(unsigned int cpu, u64 *nominal_freq) > { > struct cppc_perf_caps perf_caps; > u64 highest_perf, nominal_perf; > @@ -652,6 +659,9 @@ static u64 get_max_boost_ratio(unsigned int cpu) > > nominal_perf = perf_caps.nominal_perf; > > + if (nominal_freq) > + *nominal_freq = perf_caps.nominal_freq; > + > if (!highest_perf || !nominal_perf) { > pr_debug("CPU%d: highest or nominal performance missing\n", cpu); > return 0; > @@ -664,8 +674,12 @@ static u64 get_max_boost_ratio(unsigned int cpu) > > return div_u64(highest_perf << SCHED_CAPACITY_SHIFT, nominal_perf); > } > + > #else > -static inline u64 get_max_boost_ratio(unsigned int cpu) { return 0; } > +static inline u64 get_max_boost_ratio(unsigned int cpu, u64 *nominal_freq) > +{ > + return 0; > +} > #endif > > static int acpi_cpufreq_cpu_init(struct cpufreq_policy *policy) > @@ -677,7 +691,7 @@ static int acpi_cpufreq_cpu_init(struct cpufreq_policy *policy) > struct cpuinfo_x86 *c = &cpu_data(cpu); > unsigned int valid_states = 0; > unsigned int result = 0; > - u64 max_boost_ratio; > + u64 max_boost_ratio, nominal_freq = 0; > unsigned int i; > #ifdef CONFIG_SMP > static int blacklisted; > @@ -827,16 +841,20 @@ static int acpi_cpufreq_cpu_init(struct cpufreq_policy *policy) > } > freq_table[valid_states].frequency = CPUFREQ_TABLE_END; > > - max_boost_ratio = get_max_boost_ratio(cpu); > + max_boost_ratio = get_max_boost_ratio(cpu, &nominal_freq); > if (max_boost_ratio) { > - unsigned int freq = freq_table[0].frequency; > + unsigned int freq = nominal_freq; > > /* > - * Because the loop above sorts the freq_table entries in the > - * descending order, freq is the maximum frequency in the table. > - * Assume that it corresponds to the CPPC nominal frequency and > - * use it to set cpuinfo.max_freq. > + * The loop above sorts the freq_table entries in the > + * descending order. If ACPI CPPC has not advertised > + * the nominal frequency (this is possible in CPPC > + * revisions prior to 3), then use the first entry in > + * the pstate table as a proxy for nominal frequency. > */ > + if (!freq) > + freq = freq_table[0].frequency; > + > policy->cpuinfo.max_freq = freq * max_boost_ratio >> SCHED_CAPACITY_SHIFT; > } else { > /* > -- > 2.34.1 >
Hello Manu, On Tue, May 27, 2025 at 08:24:27PM -0700, Manu Bretelle wrote: > On Mon, Jan 13, 2025 at 10:11:07AM +0530, Gautham R. Shenoy wrote: > > commit 3c55e94c0ade ("cpufreq: ACPI: Extend frequency tables to cover > > boost frequencies") introduces an assumption in > > acpi_cpufreq_cpu_init() that the first entry in the P-state table is > > the nominal frequency. This assumption is incorrect. The frequency > > corresponding to the P0 P-State need not be the same as the nominal > > frequency advertised via CPPC. > > > > Since the driver is using the CPPC.highest_perf and CPPC.nominal_perf > > to compute the boost-ratio, it makes sense to use CPPC.nominal_freq to > > compute the max-frequency. CPPC.nominal_freq is advertised on > > platforms supporting CPPC revisions 3 or higher. > > > > Hence, fallback to using the first entry in the P-State table only on > > platforms that do not advertise CPPC.nominal_freq. > > > > Gautham, this got recently pulled in 5.15.179 [0] but it seems to have broken > what max CPU get reported. Thanks for reporting this. > > I hit the issue on Ubuntu 22.04 with kernel 5.15.0-140-generic. My read from [1] > is that that kernel is pretty much 5.15.79. > I rebuilt it with this patch removed and max CPU now show as before. > > Here some output that may help, which is what is mostly down to what is reported > by /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq . Posting the whole > lscpu hoping that contain more useful data. Happy to provide more if needed. > > Ubuntu 22.04 with 5.15.0-140-generic (affected, note CPU max MHz: 2000.0000): > > $ lscpu > Architecture: x86_64 > CPU op-mode(s): 32-bit, 64-bit > Address sizes: 48 bits physical, 48 bits virtual > Byte Order: Little Endian > CPU(s): 128 > On-line CPU(s) list: 0-127 > Vendor ID: AuthenticAMD > Model name: AMD EPYC 7713P 64-Core Processor > CPU family: 25 > Model: 1 > Thread(s) per core: 2 > Core(s) per socket: 64 > Socket(s): 1 > Stepping: 1 > Frequency boost: enabled > CPU max MHz: 2000.0000 > CPU min MHz: 1500.0000 [..snip..] > > With 5.15.0-999-generic (5.15.0-140-generic without this patch), max CPU is back > to 3720.7029, which is also what I get with 5.15.0-139-generic. > > $ lscpu > Architecture: x86_64 > CPU op-mode(s): 32-bit, 64-bit > Address sizes: 48 bits physical, 48 bits virtual > Byte Order: Little Endian > CPU(s): 128 > On-line CPU(s) list: 0-127 > Vendor ID: AuthenticAMD > Model name: AMD EPYC 7713P 64-Core Processor > CPU family: 25 > Model: 1 > Thread(s) per core: 2 > Core(s) per socket: 64 > Socket(s): 1 > Stepping: 1 > Frequency boost: enabled > CPU max MHz: 3720.7029 > CPU min MHz: 1500.0000 [..snip..] > > > Thought to post here instead of [0] to get your thought on this. Am I missing > something simple to get the right value? Or should this be pulled out of 5.15 > LTS? No, the patch has a bug. The nominal_frequency returned from the get_max_boost_ratio() function was in MHz, while cpufreq maintains frequencies in KHz due to which the computed max_frequency was incorrect and thus as a fallback, cpufreq reported P0 frequency as the cpuinfo_max_freq. Can you please try the following patch on top of the original one? ------------------------x8------------------------------------------------
> > No, the patch has a bug. The nominal_frequency returned from the > get_max_boost_ratio() function was in MHz, while cpufreq maintains > frequencies in KHz due to which the computed max_frequency was > incorrect and thus as a fallback, cpufreq reported P0 frequency as the > cpuinfo_max_freq. > > Can you please try the following patch on top of the original one? Thanks for the quick turnaround Gautham. I applied this patch on top of a fresh Ubuntu 22.04 5.15.0-140-generic tree and confirmed that CPU max MHz reports its original value. Thanks! Manu $ uname -r 5.15.0-9991-generic $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 48 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 128 On-line CPU(s) list: 0-127 Vendor ID: AuthenticAMD Model name: AMD EPYC 7713P 64-Core Processor CPU family: 25 Model: 1 Thread(s) per core: 2 Core(s) per socket: 64 Socket(s): 1 Stepping: 1 Frequency boost: enabled CPU max MHz: 3720.7029 CPU min MHz: 1500.0000 BogoMIPS: 3992.55 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscal l nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perf ctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xs aveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdp ru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausef ilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succ or smca fsrm Virtualization features: Virtualization: AMD-V Caches (sum of all): L1d: 2 MiB (64 instances) L1i: 2 MiB (64 instances) L2: 32 MiB (64 instances) L3: 256 MiB (8 instances) NUMA: NUMA node(s): 1 NUMA node0 CPU(s): 0-127 Vulnerabilities: Gather data sampling: Not affected Itlb multihit: Not affected L1tf: Not affected Mds: Not affected Meltdown: Not affected Mmio stale data: Not affected Reg file data sampling: Not affected Retbleed: Not affected Spec rstack overflow: Mitigation; safe RET Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected Srbds: Not affected Tsx async abort: Not affected > > > ------------------------x8------------------------------------------------ > > From 13d5c28823ed03353059801281d3b22e9f139a8d Mon Sep 17 00:00:00 2001 > From: "Gautham R. Shenoy" <gautham.shenoy@amd.com> > Date: Wed, 28 May 2025 16:43:33 +0530 > Subject: [PATCH] acpi-cpufreq: Fix nominal_freq units to KHz in get_max_boost_ratio() > > commit 083466754596 ("cpufreq: ACPI: Fix max-frequency computation") > modified get_max_boost_ratio() to return the nominal_freq advertised > in the _CPC object for the purposes of computing the maximum > frequency. The frequencies advertised in _CPC objects are in MHz but > cpufreq expects the frequency to be in KHz. Because the > nominal_frequency was not converted to KHz, the cpuinfo_max_frequency > that got computed was incorrect and the cpufreq reported the P0 > frequency as the cpuinfo_max_freq. > > Fix this by returning nominal_freq in KHz in get_max_boost_ratio() > > Reported-by: Manu Bretelle <chantr4@gmail.com> > Closes: https://lore.kernel.org/lkml/aDaB63tDvbdcV0cg@HQ-GR2X1W2P57/ > Fixes: 083466754596 ("cpufreq: ACPI: Fix max-frequency computation") > Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Tested-by: Manu Bretelle <chantr4@gmail.com> > --- > drivers/cpufreq/acpi-cpufreq.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c > index d26b610e4f24..76768fe213a9 100644 > --- a/drivers/cpufreq/acpi-cpufreq.c > +++ b/drivers/cpufreq/acpi-cpufreq.c > @@ -660,7 +660,7 @@ static u64 get_max_boost_ratio(unsigned int cpu, u64 *nominal_freq) > nominal_perf = perf_caps.nominal_perf; > > if (nominal_freq) > - *nominal_freq = perf_caps.nominal_freq; > + *nominal_freq = perf_caps.nominal_freq * 1000; > > if (!highest_perf || !nominal_perf) { > pr_debug("CPU%d: highest or nominal performance missing\n", cpu); > -- > 2.34.1 > > > -- > Thanks and Regards > gautham.
diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c index c9ebacf5c88e..744fcdeab173 100644 --- a/drivers/cpufreq/acpi-cpufreq.c +++ b/drivers/cpufreq/acpi-cpufreq.c @@ -623,7 +623,14 @@ static int acpi_cpufreq_blacklist(struct cpuinfo_x86 *c) #endif #ifdef CONFIG_ACPI_CPPC_LIB -static u64 get_max_boost_ratio(unsigned int cpu) +/* + * get_max_boost_ratio: Computes the max_boost_ratio as the ratio + * between the highest_perf and the nominal_perf. + * + * Returns the max_boost_ratio for @cpu. Returns the CPPC nominal + * frequency via @nominal_freq if it is non-NULL pointer. + */ +static u64 get_max_boost_ratio(unsigned int cpu, u64 *nominal_freq) { struct cppc_perf_caps perf_caps; u64 highest_perf, nominal_perf; @@ -652,6 +659,9 @@ static u64 get_max_boost_ratio(unsigned int cpu) nominal_perf = perf_caps.nominal_perf; + if (nominal_freq) + *nominal_freq = perf_caps.nominal_freq; + if (!highest_perf || !nominal_perf) { pr_debug("CPU%d: highest or nominal performance missing\n", cpu); return 0; @@ -664,8 +674,12 @@ static u64 get_max_boost_ratio(unsigned int cpu) return div_u64(highest_perf << SCHED_CAPACITY_SHIFT, nominal_perf); } + #else -static inline u64 get_max_boost_ratio(unsigned int cpu) { return 0; } +static inline u64 get_max_boost_ratio(unsigned int cpu, u64 *nominal_freq) +{ + return 0; +} #endif static int acpi_cpufreq_cpu_init(struct cpufreq_policy *policy) @@ -677,7 +691,7 @@ static int acpi_cpufreq_cpu_init(struct cpufreq_policy *policy) struct cpuinfo_x86 *c = &cpu_data(cpu); unsigned int valid_states = 0; unsigned int result = 0; - u64 max_boost_ratio; + u64 max_boost_ratio, nominal_freq = 0; unsigned int i; #ifdef CONFIG_SMP static int blacklisted; @@ -827,16 +841,20 @@ static int acpi_cpufreq_cpu_init(struct cpufreq_policy *policy) } freq_table[valid_states].frequency = CPUFREQ_TABLE_END; - max_boost_ratio = get_max_boost_ratio(cpu); + max_boost_ratio = get_max_boost_ratio(cpu, &nominal_freq); if (max_boost_ratio) { - unsigned int freq = freq_table[0].frequency; + unsigned int freq = nominal_freq; /* - * Because the loop above sorts the freq_table entries in the - * descending order, freq is the maximum frequency in the table. - * Assume that it corresponds to the CPPC nominal frequency and - * use it to set cpuinfo.max_freq. + * The loop above sorts the freq_table entries in the + * descending order. If ACPI CPPC has not advertised + * the nominal frequency (this is possible in CPPC + * revisions prior to 3), then use the first entry in + * the pstate table as a proxy for nominal frequency. */ + if (!freq) + freq = freq_table[0].frequency; + policy->cpuinfo.max_freq = freq * max_boost_ratio >> SCHED_CAPACITY_SHIFT; } else { /*