
[7/7] x86_64: Optimize modf/modff for x86_64-v2

Message ID 20250528180100.172042-8-adhemerval.zanella@linaro.org
State New
Series Simplify and optimize modf/modff

Commit Message

Adhemerval Zanella Netto May 28, 2025, 5:59 p.m. UTC
SSE4.1 (part of the x86-64-v2 ISA level) provides a direct instruction for
trunc (ROUNDSD/ROUNDSS), which improves modf/modff performance and reduces
text size.  On a Ryzen 9 (Zen 3) with GCC 14.2.1:

x86_64-v2
reciprocal-throughput        master        patch       difference
workload-0_1                 7.9610       7.7914            2.13%
workload-1_maxint            9.4323       7.8021           17.28%
workload-maxint_maxfloat     8.7379       7.8049           10.68%
workload-integral            7.9492       7.7991            1.89%

latency                      master        patch       difference
workload-0_1                 7.9511      10.8910          -36.97%
workload-1_maxint           15.8278      10.9048           31.10%
workload-maxint_maxfloat    11.3495      10.9139            3.84%
workload-integral           11.5938      10.9071            5.92%

x86_64-v3
reciprocal-throughput        master        patch       difference
workload-0_1                 8.7522       7.9781            8.84%
workload-1_maxint            9.6690       7.9872           17.39%
workload-maxint_maxfloat     8.7634       7.9857            8.87%
workload-integral            8.7397       7.9893            8.59%

latency                      master        patch       difference
workload-0_1                 8.7447       9.5589           -9.31%
workload-1_maxint           13.7480       9.5690           30.40%
workload-maxint_maxfloat    10.0092       9.5680            4.41%
workload-integral            9.7518       9.5743            1.82%
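The difference column appears to be computed relative to master, as (master - patch) / master * 100, so positive values mean the patched build takes less time and negative values mean it takes more. The helper below is a hypothetical illustration of that inferred formula, not code from the benchtests scripts; for instance, the x86_64-v2 workload-1_maxint throughput row gives (9.4323 - 7.8021) / 9.4323, which is about 17.28%.

```c
#include <assert.h>
#include <math.h>

/* Inferred formula for the "difference" column: percentage change
   relative to master, positive when the patch is faster.  */
static double
percent_difference (double master, double patch)
{
  return (master - patch) / master * 100.0;
}
```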

Checked on x86_64-linux-gnu.
---
 sysdeps/x86_64/fpu/math-use-builtins-trunc.h | 9 +++++++++
 1 file changed, 9 insertions(+)
 create mode 100644 sysdeps/x86_64/fpu/math-use-builtins-trunc.h

Patch

diff --git a/sysdeps/x86_64/fpu/math-use-builtins-trunc.h b/sysdeps/x86_64/fpu/math-use-builtins-trunc.h
new file mode 100644
index 0000000000..c2387eb3da
--- /dev/null
+++ b/sysdeps/x86_64/fpu/math-use-builtins-trunc.h
@@ -0,0 +1,9 @@ 
+#ifdef __SSE4_1__
+# define USE_TRUNC_BUILTIN 1
+# define USE_TRUNCF_BUILTIN 1
+#else
+# define USE_TRUNC_BUILTIN 0
+# define USE_TRUNCF_BUILTIN 0
+#endif
+#define USE_TRUNCL_BUILTIN 0
+#define USE_TRUNCF128_BUILTIN 0
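The header above only sets feature macros. A sketch of how a generic trunc might consume them is shown below; sketch_trunc and its fallback path are illustrative assumptions rather than glibc's implementation, and the macro block at the top is a local copy of the header's logic so the snippet compiles on its own.

```c
#include <assert.h>
#include <math.h>

/* Local copy of the header's logic for this illustration.  */
#ifdef __SSE4_1__
# define USE_TRUNC_BUILTIN 1
#else
# define USE_TRUNC_BUILTIN 0
#endif

/* Hypothetical generic trunc that selects the compiler builtin when the
   target guarantees a direct instruction for it.  */
static double
sketch_trunc (double x)
{
#if USE_TRUNC_BUILTIN
  /* With SSE4.1, GCC can emit a single ROUNDSD here.  */
  return __builtin_trunc (x);
#else
  /* Portable fallback: NaN, Inf and |x| >= 2^52 are already integral.  */
  if (!(fabs (x) < 0x1p52))
    return x;
  double i = (double) (long long) x;   /* C conversion truncates toward zero.  */
  return copysign (i, x);              /* Keep -0.0 for -1 < x <= -0.0.  */
#endif
}
```

Building with -msse4.1 (or -march=x86-64-v2) selects the builtin branch; otherwise the fallback is compiled, which mirrors how the header keeps the non-SSE4.1 path unchanged.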