math: Add internal roundeven_finite

Message ID	20241119122522.3290493-1-adhemerval.zanella@linaro.org
State	Accepted
Commit	3b1c5a539b7b8cb833f22012d1a95a4847594747
Headers	show Delivered-To: patch@linaro.org Received-SPF: pass (google.com: domain of libc-alpha-bounces~patch=linaro.org@sourceware.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) client-ip=2620:52:3:1:0:246e:9693:128c; DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org F25153858CDA From: Adhemerval Zanella <adhemerval.zanella@linaro.org> To: libc-alpha@sourceware.org Cc: Paul Zimmermann <Paul.Zimmermann@inria.fr> Subject: [PATCH] math: Add internal roundeven_finite Date: Tue, 19 Nov 2024 09:23:58 -0300 Message-ID: <20241119122522.3290493-1-adhemerval.zanella@linaro.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: list Errors-To: libc-alpha-bounces~patch=linaro.org@sourceware.org
Series	math: Add internal roundeven_finite \| expand math: Add internal roundeven_finite

Message ID

20241119122522.3290493-1-adhemerval.zanella@linaro.org

State

Accepted

Commit

3b1c5a539b7b8cb833f22012d1a95a4847594747

Headers

Received-SPF: pass (google.com: domain of
 libc-alpha-bounces~patch=linaro.org@sourceware.org designates
 2620:52:3:1:0:246e:9693:128c as permitted sender)
 client-ip=2620:52:3:1:0:246e:9693:128c;
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org F25153858CDA
From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: libc-alpha@sourceware.org
Cc: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Subject: [PATCH] math: Add internal roundeven_finite
Date: Tue, 19 Nov 2024 09:23:58 -0300
Message-ID: <20241119122522.3290493-1-adhemerval.zanella@linaro.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: list
Errors-To: libc-alpha-bounces~patch=linaro.org@sourceware.org

Series

math: Add internal roundeven_finite | expand

Commit Message

Adhemerval Zanella Netto Nov. 19, 2024, 12:23 p.m. UTC

Some CORE-MATH routines uses roundeven and most of ISA do not have
an specific instruction for the operation.  In this case, the call
will be routed to generic implementation.

However, if the ISA does support round() and ctz() there is a better
alternative (as used by CORE-MATH).

This patch adds such optimization and also enables it on powerpc.
On a power10 it shows the following improvement:

expm1f
latency       master       patched       improvement
power10       4.3910        2.6595            39.43%
power10       15.8168      11.9993            24.14%

Checked on powerpc64le-linux-gnu and aarch64-linux-gnu.
---
 sysdeps/ieee754/flt-32/e_gammaf_r.c  |  2 +-
 sysdeps/ieee754/flt-32/math_config.h | 29 ++++++++++++++++++++++++++++
 sysdeps/ieee754/flt-32/s_expm1f.c    |  2 +-
 sysdeps/powerpc/fpu/math_private.h   |  5 +++++
 4 files changed, 36 insertions(+), 2 deletions(-)

Comments

Andreas Schwab Nov. 19, 2024, 12:40 p.m. UTC | #1

On Nov 19 2024, Adhemerval Zanella wrote:

> +/* Round x to nearest integer value in floating-point format, rounding halfway
> +  cases to even.  If the input is non finte the result is unspecified.  */
                                         finite

Florian Weimer Nov. 19, 2024, 1:32 p.m. UTC | #2

* Adhemerval Zanella:

> +static inline double
> +roundeven_finite (double_t x)
> +{
> +  double_t y = round (x);
> +  if (fabs (x - y) == 0.5)
> +    {
> +      union { double f; uint64_t i; } u = {y};
> +      union { double f; uint64_t i; } v = {(x > 0) ? y - 1.0 : y + 1.0};
> +      if (__builtin_ctzll (v.i) > __builtin_ctzll (u.i))
> +        y = v.f;
> +    }
> +  return y;
> +}

Inconsistent use of double and double_t?

Would this work instead?

  union { double f; unsigned long i; } v = {y - __builtin_copysign (1.0, x)};

And maybe add

   if (!isfinite (x))
    __builtin_unreachable ();

at the start?

Thanks,
Florian

Adhemerval Zanella Netto Nov. 20, 2024, 12:40 p.m. UTC | #3

On 19/11/24 09:40, Andreas Schwab wrote:
> On Nov 19 2024, Adhemerval Zanella wrote:
> 
>> +/* Round x to nearest integer value in floating-point format, rounding halfway
>> +  cases to even.  If the input is non finte the result is unspecified.  */
>                                          finite
> 

Ack.

Adhemerval Zanella Netto Nov. 20, 2024, 12:53 p.m. UTC | #4

On 19/11/24 10:32, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> +static inline double
>> +roundeven_finite (double_t x)
>> +{
>> +  double_t y = round (x);
>> +  if (fabs (x - y) == 0.5)
>> +    {
>> +      union { double f; uint64_t i; } u = {y};
>> +      union { double f; uint64_t i; } v = {(x > 0) ? y - 1.0 : y + 1.0};
>> +      if (__builtin_ctzll (v.i) > __builtin_ctzll (u.i))
>> +        y = v.f;
>> +    }
>> +  return y;
>> +}
> 
> Inconsistent use of double and double_t?

Indeed, it should be double here.

> 
> Would this work instead?
> 
>   union { double f; unsigned long i; } v = {y - __builtin_copysign (1.0, x)};

Hum, it would and it does generate a slight better code on powerpc.

> 
> And maybe add
> 
>    if (!isfinite (x))
>     __builtin_unreachable ();

Ok.

> 
> at the start?
> 
> Thanks,
> Florian
>

diff --git a/sysdeps/ieee754/flt-32/e_gammaf_r.c b/sysdeps/ieee754/flt-32/e_gammaf_r.c
index 6b1f95d50f..66e8caee0b 100644
--- a/sysdeps/ieee754/flt-32/e_gammaf_r.c
+++ b/sysdeps/ieee754/flt-32/e_gammaf_r.c
@@ -140,7 +140,7 @@  __ieee754_gammaf_r (float x, int *signgamp)
    };
 
   double m = z - 0x1.7p+1;
-  double i = roundeven (m);
+  double i = roundeven_finite (m);
   double step = copysign (1.0, i);
   double d = m - i, d2 = d * d, d4 = d2 * d2, d8 = d4 * d4;
   double f = (c[0] + d * c[1]) + d2 * (c[2] + d * c[3])
diff --git a/sysdeps/ieee754/flt-32/math_config.h b/sysdeps/ieee754/flt-32/math_config.h
index dc07ebd459..4336beb926 100644
--- a/sysdeps/ieee754/flt-32/math_config.h
+++ b/sysdeps/ieee754/flt-32/math_config.h
@@ -57,6 +57,35 @@  static inline int32_t
 converttoint (double_t x);
 #endif
 
+#ifndef ROUNDEVEN_INTRINSICS
+/* When set, roundeven_finite will route to the internal roundeven function.  */
+# define ROUNDEVEN_INTRINSICS 1
+#endif
+
+#if ROUNDEVEN_INTRINSICS
+/* Round x to nearest integer value in floating-point format, rounding halfway
+  cases to even.  If the input is non finte the result is unspecified.  */
+static inline double_t
+roundeven_finite (double_t x)
+{
+  return roundeven (x);
+}
+#else
+static inline double
+roundeven_finite (double_t x)
+{
+  double_t y = round (x);
+  if (fabs (x - y) == 0.5)
+    {
+      union { double f; uint64_t i; } u = {y};
+      union { double f; uint64_t i; } v = {(x > 0) ? y - 1.0 : y + 1.0};
+      if (__builtin_ctzll (v.i) > __builtin_ctzll (u.i))
+        y = v.f;
+    }
+  return y;
+}
+#endif
+
 static inline uint32_t
 asuint (float f)
 {
diff --git a/sysdeps/ieee754/flt-32/s_expm1f.c b/sysdeps/ieee754/flt-32/s_expm1f.c
index edd7c9acf8..a36e5781f5 100644
--- a/sysdeps/ieee754/flt-32/s_expm1f.c
+++ b/sysdeps/ieee754/flt-32/s_expm1f.c
@@ -95,7 +95,7 @@  __expm1f (float x)
       return __math_oflowf (0);
     }
   double a = iln2 * z;
-  double ia = roundeven (a);
+  double ia = roundeven_finite (a);
   double h = a - ia;
   double h2 = h * h;
   uint64_t u = asuint64 (ia + big);
diff --git a/sysdeps/powerpc/fpu/math_private.h b/sysdeps/powerpc/fpu/math_private.h
index 9ef35b20cd..b22f53d366 100644
--- a/sysdeps/powerpc/fpu/math_private.h
+++ b/sysdeps/powerpc/fpu/math_private.h
@@ -59,4 +59,9 @@  __ieee754_sqrtf128 (_Float128 __x)
 #define _GL_HAS_BUILTIN_ILOGB 0
 #endif
 
+#ifdef _ARCH_PWR6
+/* ISA 2.03 provides frin/round() and cntlzw/ctznll().  */
+# define ROUNDEVEN_INTRINSICS 0
+#endif
+
 #endif /* _PPC_MATH_PRIVATE_H_ */

math: Add internal roundeven_finite

Commit Message

Comments

Patch