Message ID | 20250211194407.2577252-1-sohil.mehta@intel.com |
---|---|
Headers | show |
Series | Prepare for new Intel Family numbers | expand |
X86_FEATURE_REP_GOOD is a linux defined feature flag to track whether
fast string operations should be used for copy_page(). It is also used
as a backup alternative for clear_page() if enhanced fast string
operations (ERMS) are not available.
Currently, the flag is only set for Family 6 processors. Extend the
check to include upcoming processors in Family 18 and 19.
It is uncertain whether X86_FEATURE_REP_GOOD should be set for Family 15
(Pentium 4) as well. Commit 185f3b9da24c ("x86: make intel.c have 64-bit
support code") that originally set the flag also set the
x86_cache_alignment preference for Family 15 processors in the same
commit. The omission of the Family 15 may have been intentional.
Also, move the check before a related check in early_init_intel() to
avoid resetting the flag.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
---
v2: Separate out the REP_GOOD (copy page) specific change into a
separate commit.
On 2/11/25 11:44, Sohil Mehta wrote: > Introduce names for some old pentium 4 models and replace the x86_model > checks with VFM ones. Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
On 2/11/25 11:44, Sohil Mehta wrote: > + if (c->x86_vendor == X86_VENDOR_INTEL && > + ((c->x86_vfm >= INTEL_PENTIUM_PRO && c->x86_vfm <= INTEL_PENTIUM_M_DOTHAN) || > + (c->x86_vfm >= INTEL_P4_WILLAMETTE && c->x86_vfm <= INTEL_P4_CEDARMILL))) { Since these are both closed checks and not open-ended, is the if (c->x86_vendor == X86_VENDOR_INTEL && bit needed or superfluous? Also, super nit, can you vertically align the two range checks, please? ((c->x86_vfm >= INTEL_PENTIUM_PRO && c->x86_vfm <= INTEL_PENTIUM_M_DOTHAN) || (c->x86_vfm >= INTEL_P4_WILLAMETTE && c->x86_vfm <= INTEL_P4_CEDARMILL))) {
On 2/11/2025 12:58 PM, Dave Hansen wrote: > On 2/11/25 11:43, Sohil Mehta wrote: >> + /* >> + * Return without adjustment if the Family isn't 6. >> + * The rest of the function assumes Family 6. >> + */ >> + if (c->x86 != 6) >> + return tjmax; > > Shouldn't we be converting this over to the vfm matches? > For drivers/, I mainly focused on fixes instead of cleanups. Converting drivers over to VFM checks is significant work. There are a lot of such comparisons and switch cases (probably more than 50) across drivers/cpufreq/ and drivers/hwmon/. Some of the functions might need significant refactoring and rewrites. I think someone with expertise in that particular driver should probably do it. I did start with it initially but it is beyond my bandwidth at the moment. > This is kinda icky: > >> + return family > 15 || >> + (family == 6 && >> + model > 0xe && >> + model != 0x1c && >> + model != 0x26 && >> + model != 0x27 && >> + model != 0x35 && >> + model != 0x36); >> } > > I'm not sure how this escaped so far. Probably because it's not in arch/x86. >
On 2/11/2025 1:09 PM, Dave Hansen wrote: > On 2/11/25 11:44, Sohil Mehta wrote: >> + if (c->x86_vendor == X86_VENDOR_INTEL && >> + ((c->x86_vfm >= INTEL_PENTIUM_PRO && c->x86_vfm <= INTEL_PENTIUM_M_DOTHAN) || >> + (c->x86_vfm >= INTEL_P4_WILLAMETTE && c->x86_vfm <= INTEL_P4_CEDARMILL))) { > > Since these are both closed checks and not open-ended, is the > > if (c->x86_vendor == X86_VENDOR_INTEL && > > bit needed or superfluous? > You are right, since it is close ended on both sides we should be able to remove the X86_VENDOR_INTEL. I was thinking if we should leave it there to avoid confusion. But, INTEL_* in the VFM string is a good enough hint that the checks are Intel specific. Also, it's not like this check is going to be modified frequently. > Also, super nit, can you vertically align the two range checks, please? > > ((c->x86_vfm >= INTEL_PENTIUM_PRO && c->x86_vfm <= > INTEL_PENTIUM_M_DOTHAN) || > (c->x86_vfm >= INTEL_P4_WILLAMETTE && c->x86_vfm <= > INTEL_P4_CEDARMILL))) { > >
On 11/02/2025 8:53 pm, Dave Hansen wrote: > On 2/11/25 11:43, Sohil Mehta wrote: >> + /* >> + * Modern CPUs are generally expected to have a sane fast string >> + * implementation. However, the BIOS may disable it on certain CPUs >> + * via the architectural FAST_STRING bit. >> + */ >> + if (IS_ENABLED(CONFIG_X86_64) && (c->x86 == 6 || c->x86 > 15)) >> + set_cpu_cap(c, X86_FEATURE_REP_GOOD); > I'm not sure the BIOS comment is helpful here. > > Also, at this point, let's just make the check >=6 (or the >=PPRO > equivalent). > > It will only matter if *all* of these are true: > 1. Someone has a 64-bit capable P4 that powers on > 2. They're running a 64-bit mainline kernel > 3. String copy is *actually* slower than the alternative > 4. They are performance sensitive enough to notice > > We don't even know the answer to #3 for sure. Let's just say what we're > doing in a comment: > > /* Assume that any 64-bit CPU has a good implementation */ If you're going to override the BIOS setting, then you need to explicitly set MSR_MISC_ENABLE.FAST_STRINGS. Otherwise you're claiming to Linux that REP is good even when hardware is prohibited from using optimisations. ~Andrew
On Tue, 2025-02-11 at 12:58 -0800, Dave Hansen wrote: > On 2/11/25 11:43, Sohil Mehta wrote: > > + /* > > + * Return without adjustment if the Family isn't 6. > > + * The rest of the function assumes Family 6. > > + */ > > + if (c->x86 != 6) > > + return tjmax; > > Shouldn't we be converting this over to the vfm matches? > > This is kinda icky: > > > + return family > 15 || > > + (family == 6 && > > + model > 0xe && > > + model != 0x1c && > > + model != 0x26 && > > + model != 0x27 && > > + model != 0x35 && > > + model != 0x36); > > } > > I'm not sure how this escaped so far. Probably because it's not in > arch/x86. > This code was introduced 10+ years ago, and it only brings a warning message when reading MSR_IA32_TEMPERATURE_TARGET fails. So probably no one has ever checked this. thanks, rui
On Tue, 2025-02-11 at 13:38 -0800, Sohil Mehta wrote: > On 2/11/2025 12:58 PM, Dave Hansen wrote: > > On 2/11/25 11:43, Sohil Mehta wrote: > > > + /* > > > + * Return without adjustment if the Family isn't 6. > > > + * The rest of the function assumes Family 6. > > > + */ > > > + if (c->x86 != 6) > > > + return tjmax; > > > > Shouldn't we be converting this over to the vfm matches? > > > > For drivers/, I mainly focused on fixes instead of cleanups. > > Converting drivers over to VFM checks is significant work. There are > a > lot of such comparisons and switch cases (probably more than 50) > across > drivers/cpufreq/ and drivers/hwmon/. > > Some of the functions might need significant refactoring and > rewrites. I > think someone with expertise in that particular driver should > probably > do it. I did start with it initially but it is beyond my bandwidth at > the moment. > I agree. adjust_tjmax() contains a list of quirks based on PCI- ID/x86_vendor_id/x86_model/x86_stepping. The common problem is that all the quirks are for Fam6 processors but the family id is not checked. So the fix is sufficient. In fact, I think it is better to move the check to the very beginning of adjust_tjmax(). Plus that, I do think we can have more cleanups on top 1. rename adjust_tjmax() to adjust_tjmax_for_fam6() 2. move all model specific quirks altogether and avoid model checks in the main functions. 3. for processors newer than fam6, the driver should fail to probe rather than using a hardcoded value when reading MSR_IA32_TEMPERATURE_TARGET fails. maybe I can start with something like below. --- drivers/hwmon/coretemp.c | 98 +++++++++++++++++++++++----------------- 1 file changed, 57 insertions(+), 41 deletions(-) diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c index 1aa67a2b5f18..fc2cf607aa36 100644 --- a/drivers/hwmon/coretemp.c +++ b/drivers/hwmon/coretemp.c @@ -99,6 +99,7 @@ struct platform_data { struct device_attribute name_attr; }; +/* Beginning of Model specific quirks */ struct tjmax_pci { unsigned int device; int tjmax; @@ -147,12 +148,11 @@ static const struct tjmax_model tjmax_model_table[] = { */ }; -static bool is_pkg_temp_data(struct temp_data *tdata) -{ - return tdata->index < 0; -} - -static int adjust_tjmax(struct cpuinfo_x86 *c, u32 id, struct device *dev) +/* + * Adjust tjmax value for early Fam6 CPUs with unreadable MSR_IA32_TEMPERATURE_TARGET + * NOTE: the calculated value may not be correct. + */ +static int adjust_tjmax_for_fam6(struct cpuinfo_x86 *c, u32 id, struct device *dev) { /* The 100C is default for both mobile and non mobile CPUs */ @@ -163,8 +163,16 @@ static int adjust_tjmax(struct cpuinfo_x86 *c, u32 id, struct device *dev) u32 eax, edx; int i; u16 devfn = PCI_DEVFN(0, 0); - struct pci_dev *host_bridge = pci_get_domain_bus_and_slot(0, 0, devfn); + struct pci_dev *host_bridge; + + /* + * Return without adjustment if the Family isn't 6. + * The rest of the function assumes Family 6. + */ + if (c->x86 != 6) + return tjmax; + host_bridge = pci_get_domain_bus_and_slot(0, 0, devfn); /* * Explicit tjmax table entries override heuristics. * First try PCI host bridge IDs, followed by model ID strings @@ -185,12 +193,6 @@ static int adjust_tjmax(struct cpuinfo_x86 *c, u32 id, struct device *dev) return tjmax_table[i].tjmax; } - /* - * Return without adjustment if the Family isn't 6. - * The rest of the function assumes Family 6. - */ - if (c->x86 != 6) - return tjmax; for (i = 0; i < ARRAY_SIZE(tjmax_model_table); i++) { const struct tjmax_model *tm = &tjmax_model_table[i]; @@ -280,6 +282,37 @@ static bool cpu_has_tjmax(struct cpuinfo_x86 *c) model != 0x36); } +static bool cpu_has_ttarget(struct temp_data *tdata) +{ + struct cpuinfo_x86 *c = &cpu_data(tdata->cpu); + + /* + * The target temperature is available on older CPUs but not in the + * MSR_IA32_TEMPERATURE_TARGET register. Atoms don't have the register + * at all. + */ + if (c->x86 > 15 || (c->x86 == 6 && c->x86_model > 0xe && c->x86_model != 0x1c)) + return true; + return false; +} + +static bool cpu_has_broken_ucode(unsigned int cpu) +{ + struct cpuinfo_x86 *c = &cpu_data(cpu); + + /* + * Check if we have problem with errata AE18 of Core processors: + * Readings might stop update when processor visited too deep sleep, + * fixed for stepping D0 (6EC). + */ + if (c->x86 == 6 && c->x86_model == 0xe && c->x86_stepping < 0xc && c->microcode < 0x39) { + pr_err("Errata AE18 not fixed, update BIOS or microcode of the CPU!\n"); + return true; + } + return false; +} +/* End of Model specific quirks */ + static int get_tjmax(struct temp_data *tdata, struct device *dev) { struct cpuinfo_x86 *c = &cpu_data(tdata->cpu); @@ -312,9 +345,8 @@ static int get_tjmax(struct temp_data *tdata, struct device *dev) } else { /* * An assumption is made for early CPUs and unreadable MSR. - * NOTE: the calculated value may not be correct. */ - tdata->tjmax = adjust_tjmax(c, tdata->cpu, dev); + tdata->tjmax = adjust_tjmax_for_fam6(c, tdata->cpu, dev); } return tdata->tjmax; } @@ -324,6 +356,8 @@ static int get_ttarget(struct temp_data *tdata, struct device *dev) u32 eax, edx; int tjmax, ttarget_offset, ret; + if (!cpu_has_ttarget(tdata)) + return -ENODEV; /* * ttarget is valid only if tjmax can be retrieved from * MSR_IA32_TEMPERATURE_TARGET @@ -348,6 +382,11 @@ static int max_zones __read_mostly; /* Array of zone pointers. Serialized by cpu hotplug lock */ static struct platform_device **zone_devices; +static bool is_pkg_temp_data(struct temp_data *tdata) +{ + return tdata->index < 0; +} + static ssize_t show_label(struct device *dev, struct device_attribute *devattr, char *buf) { @@ -460,23 +499,6 @@ static int create_core_attrs(struct temp_data *tdata, struct device *dev) return sysfs_create_group(&dev->kobj, &tdata->attr_group); } - -static int chk_ucode_version(unsigned int cpu) -{ - struct cpuinfo_x86 *c = &cpu_data(cpu); - - /* - * Check if we have problem with errata AE18 of Core processors: - * Readings might stop update when processor visited too deep sleep, - * fixed for stepping D0 (6EC). - */ - if (c->x86 == 6 && c->x86_model == 0xe && c->x86_stepping < 0xc && c->microcode < 0x39) { - pr_err("Errata AE18 not fixed, update BIOS or microcode of the CPU!\n"); - return -ENODEV; - } - return 0; -} - static struct platform_device *coretemp_get_pdev(unsigned int cpu) { int id = topology_logical_die_id(cpu); @@ -585,14 +607,8 @@ static int create_core_data(struct platform_device *pdev, unsigned int cpu, /* Make sure tdata->tjmax is a valid indicator for dynamic/static tjmax */ get_tjmax(tdata, &pdev->dev); - /* - * The target temperature is available on older CPUs but not in the - * MSR_IA32_TEMPERATURE_TARGET register. Atoms don't have the register - * at all. - */ - if (c->x86 > 15 || (c->x86 == 6 && c->x86_model > 0xe && c->x86_model != 0x1c)) - if (get_ttarget(tdata, &pdev->dev) >= 0) - tdata->attr_size++; + if (get_ttarget(tdata, &pdev->dev) >= 0) + tdata->attr_size++; /* Create sysfs interfaces */ err = create_core_attrs(tdata, pdata->hwmon_dev); @@ -696,7 +712,7 @@ static int coretemp_cpu_online(unsigned int cpu) struct device *hwmon; /* Check the microcode version of the CPU */ - if (chk_ucode_version(cpu)) + if (cpu_has_broken_ucode(cpu)) return -EINVAL; /*
On 2/12/25 05:43, Zhang, Rui wrote: > I agree. > adjust_tjmax() contains a list of quirks based on PCI- > ID/x86_vendor_id/x86_model/x86_stepping. The common problem is that all > the quirks are for Fam6 processors but the family id is not checked. So > the fix is sufficient. In fact, I think it is better to move the check > to the very beginning of adjust_tjmax(). Or, heck, just remove the model list. dev_warn_once() if the rdmsr fails. Who cares about one more line in dmesg? Why not do the attached patch?
On 2/11/2025 4:54 PM, Andrew Cooper wrote: > If you're going to override the BIOS setting, then you need to > explicitly set MSR_MISC_ENABLE.FAST_STRINGS. > > Otherwise you're claiming to Linux that REP is good even when hardware > is prohibited from using optimisations. > I think the current checks have unnecessary overlap which makes them confusing. We should be fine if we only rely on the architectural MSR_MISC_ENABLE.FAST_STRINGS bit and rely just on the BIOS setting. My justification is below. The simplified version of the current checks is as follows: Check 1 (Based on Family Model numbers): > /* > * Unconditionally set REP_GOOD on early Family 6 processors > */ > if (IS_ENABLED(CONFIG_X86_64) && > (c->x86_vfm >= INTEL_PENTIUM_PRO && c->x86_vfm < INTEL_PENTIUM_M_DOTHAN)) > set_cpu_cap(c, X86_FEATURE_REP_GOOD); This check is mostly redundant since it is targeted for 64 bit and very few if any of those CPUs support 64 bit processing. I suggest that we get rid of this check completely. The risk here is fairly limited as well. Check 2 (Based on MISC_ENABLE.FAST_STRING): > /* > * If fast string is not enabled in IA32_MISC_ENABLE for any reason, > * clear the fast string and enhanced fast string CPU capabilities. > */ > if (c->x86_vfm >= INTEL_PENTIUM_M_DOTHAN) { > rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable); > if (misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING) { > /* X86_FEATURE_ERMS will be automatically set based on CPUID */ > set_cpu_cap(c, X86_FEATURE_REP_GOOD); > } else { > pr_info("Disabled fast string operations\n"); > setup_clear_cpu_cap(X86_FEATURE_REP_GOOD); > setup_clear_cpu_cap(X86_FEATURE_ERMS); > } > } This is the only real check that is needed and should likely suffice in all meaningful scenarios. Comments?
On 12/02/2025 9:19 pm, Sohil Mehta wrote: > Check 1 (Based on Family Model numbers): >> /* >> * Unconditionally set REP_GOOD on early Family 6 processors >> */ >> if (IS_ENABLED(CONFIG_X86_64) && >> (c->x86_vfm >= INTEL_PENTIUM_PRO && c->x86_vfm < INTEL_PENTIUM_M_DOTHAN)) >> set_cpu_cap(c, X86_FEATURE_REP_GOOD); > This check is mostly redundant since it is targeted for 64 bit and very > few if any of those CPUs support 64 bit processing. I suggest that we > get rid of this check completely. The risk here is fairly limited as well. PENTIUM_PRO is model 0x1. M_DOTHAN isn't introduced until patch 10, but is model 0xd. And model 0xf (Memron) is the first 64bit capable fam6 CPU, so this is dead code given the CONFIG_X86_64 which the compiler can't actually optimise out. > > Check 2 (Based on MISC_ENABLE.FAST_STRING): >> /* >> * If fast string is not enabled in IA32_MISC_ENABLE for any reason, >> * clear the fast string and enhanced fast string CPU capabilities. I'd suggest that a better way of phrasing this is: /* BIOSes typically have a knob for Fast Strings. Honour the user's wishes. */ >> */ >> if (c->x86_vfm >= INTEL_PENTIUM_M_DOTHAN) { >> rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable); >> if (misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING) { >> /* X86_FEATURE_ERMS will be automatically set based on CPUID */ >> set_cpu_cap(c, X86_FEATURE_REP_GOOD); >> } else { >> pr_info("Disabled fast string operations\n"); >> setup_clear_cpu_cap(X86_FEATURE_REP_GOOD); >> setup_clear_cpu_cap(X86_FEATURE_ERMS); >> } >> } MSR_MISC_ENABLE exists on all 64bit CPUs, and some 32bit ones too. Therefore, this section alone seems to suffice in order to set up REP_GOOD properly. ~Andrew
On 2/13/2025 3:02 PM, Andrew Cooper wrote: > On 12/02/2025 9:19 pm, Sohil Mehta wrote: >> Check 1 (Based on Family Model numbers): >>> /* >>> * Unconditionally set REP_GOOD on early Family 6 processors >>> */ >>> if (IS_ENABLED(CONFIG_X86_64) && >>> (c->x86_vfm >= INTEL_PENTIUM_PRO && c->x86_vfm < INTEL_PENTIUM_M_DOTHAN)) >>> set_cpu_cap(c, X86_FEATURE_REP_GOOD); >> This check is mostly redundant since it is targeted for 64 bit and very >> few if any of those CPUs support 64 bit processing. I suggest that we >> get rid of this check completely. The risk here is fairly limited as well. > > PENTIUM_PRO is model 0x1. M_DOTHAN isn't introduced until patch 10, but > is model 0xd. > > And model 0xf (Memron) is the first 64bit capable fam6 CPU, so this is > dead code given the CONFIG_X86_64 which the compiler can't actually > optimise out. > Thanks for confirming. I figured this is likely dead code but wasn't completely sure.
On Wed, 2025-02-12 at 08:57 -0800, Dave Hansen wrote: > On 2/12/25 05:43, Zhang, Rui wrote: > > I agree. > > adjust_tjmax() contains a list of quirks based on PCI- > > ID/x86_vendor_id/x86_model/x86_stepping. The common problem is that > > all > > the quirks are for Fam6 processors but the family id is not > > checked. So > > the fix is sufficient. In fact, I think it is better to move the > > check > > to the very beginning of adjust_tjmax(). > > Or, heck, just remove the model list. dev_warn_once() if the rdmsr > fails. Who cares about one more line in dmesg? > > Why not do the attached patch? The patch looks good to me. -rui