Message ID | 20250131191844.2582716-9-adhemerval.zanella@linaro.org |
---|---|
State | New |
Series | Add c23 CORE-MATH binary32 implementations to libm |
I confirm we get correct rounding for all rounding modes and all binary32
inputs on x86_64.

I will let others review the code, since as a CORE-MATH developer I might be
biased.

Paul

> From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
> Cc: DJ Delorie <dj@redhat.com>,
>  Joseph Myers <josmyers@redhat.com>,
>  Paul Zimmermann <Paul.Zimmermann@inria.fr>,
>  Alexei Sibidanov <sibid@uvic.ca>
> Date: Fri, 31 Jan 2025 16:17:12 -0300
>
> The CORE-MATH implementation is correctly rounded (for any rounding mode)
> and shows better performance than the generic acospif.
>
> The code was adapted to glibc style and to use the definitions from
> math_config.h (to handle errno, overflow, and underflow).
>
> Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
> gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):
>
> latency                 master   patched  improvement
> x86_64                 54.8281   42.9070       21.74%
> x86_64v2               54.1717   42.7497       21.08%
> x86_64v3               49.3552   34.1512       30.81%
> aarch64 (Neoverse)     17.9395   14.3733       19.88%
> power8                 20.3110    8.8609       56.37%
> power10                11.3113   8.84067       21.84%
>
> reciprocal-throughput   master   patched  improvement
> x86_64                 21.2301   14.4803       31.79%
> x86_64v2               20.6858   13.9506       32.56%
> x86_64v3               16.1944   11.3377       29.99%
> aarch64 (Neoverse)     11.4474   7.13282       37.69%
> power8                 10.6916   3.57547       66.56%
> power10                4.64269   3.54145       23.72%
> ---
>  SHARED-FILES                                  |   4 +
>  sysdeps/aarch64/libm-test-ulps                |   4 -
>  sysdeps/arc/fpu/libm-test-ulps                |   4 -
>  sysdeps/arc/nofpu/libm-test-ulps              |   1 -
>  sysdeps/arm/libm-test-ulps                    |   4 -
>  sysdeps/hppa/fpu/libm-test-ulps               |   4 -
>  sysdeps/i386/fpu/libm-test-ulps               |   4 -
>  .../i386/i686/fpu/multiarch/libm-test-ulps    |   4 -
>  sysdeps/ieee754/flt-32/s_acospif.c            | 137 ++++++++++++++++++
>  sysdeps/loongarch/lp64/libm-test-ulps         |   4 -
>  sysdeps/mips/mips64/libm-test-ulps            |   4 -
>  sysdeps/or1k/fpu/libm-test-ulps               |   4 -
>  sysdeps/or1k/nofpu/libm-test-ulps             |   1 -
>  sysdeps/powerpc/fpu/libm-test-ulps            |   4 -
>  sysdeps/riscv/nofpu/libm-test-ulps            |   1 -
>  sysdeps/riscv/rvd/libm-test-ulps              |   4 -
sysdeps/s390/fpu/libm-test-ulps | 4 - > sysdeps/sparc/fpu/libm-test-ulps | 4 - > sysdeps/x86_64/fpu/libm-test-ulps | 4 - > 19 files changed, 141 insertions(+), 59 deletions(-) > create mode 100644 sysdeps/ieee754/flt-32/s_acospif.c > > diff --git a/SHARED-FILES b/SHARED-FILES > index 032c407881..3fde72644a 100644 > --- a/SHARED-FILES > +++ b/SHARED-FILES > @@ -334,3 +334,7 @@ sysdeps/ieee754/flt-32/s_tanhf.c: > (src/binary32/tanh/tanhf.c in CORE-MATH) > - the code was adapted to use glibc code style and internal > functions to handle errno, overflow, and underflow. > +sysdeps/ieee754/flt-32/s_acospif.c: > + (src/binary32/acospi/acospif.c in CORE-MATH) > + - the code was adapted to use glibc code style and internal > + functions to handle errno, overflow, and underflow. > diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps > index 59ec929176..1a403d95b6 100644 > --- a/sysdeps/aarch64/libm-test-ulps > +++ b/sysdeps/aarch64/libm-test-ulps > @@ -51,22 +51,18 @@ ldouble: 3 > > Function: "acospi": > double: 2 > -float: 1 > ldouble: 2 > > Function: "acospi_downward": > double: 1 > -float: 2 > ldouble: 1 > > Function: "acospi_towardzero": > double: 1 > -float: 2 > ldouble: 1 > > Function: "acospi_upward": > double: 2 > -float: 1 > ldouble: 2 > > Function: "asin": > diff --git a/sysdeps/arc/fpu/libm-test-ulps b/sysdeps/arc/fpu/libm-test-ulps > index 82bc15602c..c0c5daa589 100644 > --- a/sysdeps/arc/fpu/libm-test-ulps > +++ b/sysdeps/arc/fpu/libm-test-ulps > @@ -27,19 +27,15 @@ double: 3 > > Function: "acospi": > double: 2 > -float: 1 > > Function: "acospi_downward": > double: 1 > -float: 2 > > Function: "acospi_towardzero": > double: 1 > -float: 2 > > Function: "acospi_upward": > double: 2 > -float: 1 > > Function: "asin": > double: 1 > diff --git a/sysdeps/arc/nofpu/libm-test-ulps b/sysdeps/arc/nofpu/libm-test-ulps > index aa93d71244..2b34f5a0ab 100644 > --- a/sysdeps/arc/nofpu/libm-test-ulps > +++ b/sysdeps/arc/nofpu/libm-test-ulps > @@ -9,7 +9,6 
@@ double: 2 > > Function: "acospi": > double: 2 > -float: 1 > > Function: "asin": > double: 1 > diff --git a/sysdeps/arm/libm-test-ulps b/sysdeps/arm/libm-test-ulps > index 218ffa8b4b..afb0532a66 100644 > --- a/sysdeps/arm/libm-test-ulps > +++ b/sysdeps/arm/libm-test-ulps > @@ -27,19 +27,15 @@ double: 2 > > Function: "acospi": > double: 2 > -float: 1 > > Function: "acospi_downward": > double: 1 > -float: 2 > > Function: "acospi_towardzero": > double: 1 > -float: 2 > > Function: "acospi_upward": > double: 2 > -float: 1 > > Function: "asin": > double: 1 > diff --git a/sysdeps/hppa/fpu/libm-test-ulps b/sysdeps/hppa/fpu/libm-test-ulps > index 2b8fa35078..b9959c8a12 100644 > --- a/sysdeps/hppa/fpu/libm-test-ulps > +++ b/sysdeps/hppa/fpu/libm-test-ulps > @@ -27,19 +27,15 @@ double: 2 > > Function: "acospi": > double: 2 > -float: 1 > > Function: "acospi_downward": > double: 1 > -float: 2 > > Function: "acospi_towardzero": > double: 1 > -float: 2 > > Function: "acospi_upward": > double: 2 > -float: 1 > > Function: "asin": > double: 1 > diff --git a/sysdeps/i386/fpu/libm-test-ulps b/sysdeps/i386/fpu/libm-test-ulps > index b8983447fe..85c58f34e9 100644 > --- a/sysdeps/i386/fpu/libm-test-ulps > +++ b/sysdeps/i386/fpu/libm-test-ulps > @@ -41,25 +41,21 @@ ldouble: 3 > > Function: "acospi": > double: 1 > -float: 1 > float128: 2 > ldouble: 1 > > Function: "acospi_downward": > double: 1 > -float: 2 > float128: 1 > ldouble: 3 > > Function: "acospi_towardzero": > double: 1 > -float: 2 > float128: 1 > ldouble: 3 > > Function: "acospi_upward": > double: 2 > -float: 1 > float128: 2 > ldouble: 2 > > diff --git a/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps b/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps > index 750d51906b..bc14e7e115 100644 > --- a/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps > +++ b/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps > @@ -41,25 +41,21 @@ ldouble: 3 > > Function: "acospi": > double: 1 > -float: 1 > float128: 2 > ldouble: 3 > > Function: 
"acospi_downward": > double: 1 > -float: 2 > float128: 1 > ldouble: 3 > > Function: "acospi_towardzero": > double: 1 > -float: 2 > float128: 1 > ldouble: 3 > > Function: "acospi_upward": > double: 2 > -float: 1 > float128: 2 > ldouble: 2 > > diff --git a/sysdeps/ieee754/flt-32/s_acospif.c b/sysdeps/ieee754/flt-32/s_acospif.c > new file mode 100644 > index 0000000000..03d63a74c8 > --- /dev/null > +++ b/sysdeps/ieee754/flt-32/s_acospif.c > @@ -0,0 +1,137 @@ > +/* Correctly-rounded half-revolution arc-cosine function for binary32 value. > + > +Copyright (c) 2022-2025 Alexei Sibidanov. > + > +The original version of this file was copied from the CORE-MATH > +project (file src/binary32/acospi/acospif.c, revision 1a6a9ab). > + > +Permission is hereby granted, free of charge, to any person obtaining a copy > +of this software and associated documentation files (the "Software"), to deal > +in the Software without restriction, including without limitation the rights > +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell > +copies of the Software, and to permit persons to whom the Software is > +furnished to do so, subject to the following conditions: > + > +The above copyright notice and this permission notice shall be included in all > +copies or substantial portions of the Software. > + > +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE > +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, > +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > +SOFTWARE. 
> + > +*/ > + > +#include <math.h> > +#include <stdint.h> > +#include <libm-alias-float.h> > +#include "math_config.h" > + > +float > +__acospif (float x) > +{ > + float ax = fabsf (x); > + double az = ax; > + double z = x; > + uint32_t t = asuint (x); > + int e = (t >> 23) & 0xff; > + if (__glibc_unlikely (e >= 127)) > + { > + if (x == 1.0f) > + return 0.0f; > + if (x == -1.0f) > + return 1.0f; > + if (e == 0xff && (t << 9)) > + return x + x; /* nan */ > + return __math_edomf ((x - x) / (x - x)); /* nan */ > + } > + int s = 146 - e; > + int i = 0; > + if (__glibc_likely (s < 32)) > + i = ((t & (~0u >> 9)) | 1 << 23) >> s; > + static const double ch[][8] = { > + { 0x1.45f306dc9c882p-2, 0x1.b2995e7b7dc2fp-5, 0x1.8723a1cf50c7ep-6, > + 0x1.d1a4591d16a29p-7, 0x1.3ce3aa68ddaeep-7, 0x1.d3182ab0cc1bfp-8, > + 0x1.62b379a8b88e3p-8, 0x1.6811411fcfec2p-8 }, > + { 0x1.ffffffffd3cd9p-2, -0x1.17cc1b3355fd5p-4, 0x1.d067a1e8d5a99p-6, > + -0x1.08e16fb09314ap-6, 0x1.5eed43d42dcb2p-7, -0x1.f58baca7acc71p-8, > + 0x1.5dab64e2dcf15p-8, -0x1.59270e30797acp-9 }, > + { 0x1.fffffff7c4622p-2, -0x1.17cc149ded3a2p-4, 0x1.d0654d4cb2c1ap-6, > + -0x1.08c3ba713d33ap-6, 0x1.5d2053481079cp-7, -0x1.e485ebc545e7ep-8, > + 0x1.303baca167dddp-8, -0x1.dee8d16d06b38p-10 }, > + { 0x1.ffffffa749848p-2, -0x1.17cbe7155935p-4, 0x1.d05a312269adfp-6, > + -0x1.0862b3ee617d7p-6, 0x1.5920708db2a73p-7, -0x1.cb0463b3862c3p-8, > + 0x1.02b82478f95d7p-8, -0x1.52a7b8579e729p-10 }, > + { 0x1.fffffe1f92bb5p-2, -0x1.17cb3e74c64e3p-4, 0x1.d03af67311cbfp-6, > + -0x1.079441cbfc7ap-6, 0x1.52b4287805a61p-7, -0x1.ac3286d604a98p-8, > + 0x1.b2f1210d9701bp-9, -0x1.e740ddc25afd6p-11 }, > + { 0x1.fffff92beb6e2p-2, -0x1.17c986fe9518bp-4, 0x1.cff98167c9a5ep-6, > + -0x1.0638b591eae52p-6, 0x1.4a0803828959ep-7, -0x1.8adeca229f11dp-8, > + 0x1.6b9a7ba05dfcep-9, -0x1.640521a43b2dp-11 }, > + { 0x1.ffffeccee5bfcp-2, -0x1.17c5f1753f5eap-4, 0x1.cf874e4fe258fp-6, > + -0x1.043e6cf77b256p-6, 0x1.3f7db42227d92p-7, -0x1.691a6fa2a2882p-8, > + 
0x1.2f6543162bc61p-9, -0x1.07d5da05822b6p-11 }, > + { 0x1.ffffd2f64431dp-2, -0x1.17bf8208c10c1p-4, 0x1.ced7487cdb124p-6, > + -0x1.01a0d30932905p-6, 0x1.3388f99b254dap-7, -0x1.4844e245c65bdp-8, > + 0x1.fa777150197c6p-10, -0x1.8c1ecf16a05c8p-12 }, > + { 0x1.ffffa36d1712ep-2, -0x1.17b523971bd4ep-4, 0x1.cddee26de2deep-6, > + -0x1.fccb00abaaabcp-7, 0x1.269afc3622342p-7, -0x1.2933152686752p-8, > + 0x1.a76d4956cc9a3p-10, -0x1.2ce7d6dc651cep-12 }, > + { 0x1.ffff5402ab3a1p-2, -0x1.17a5ba85da77ap-4, 0x1.cc96894e05c02p-6, > + -0x1.f532143cb832ep-7, 0x1.19180b660ff09p-7, -0x1.0c57417a78b3cp-8, > + 0x1.62e26cbd7bb1ep-10, -0x1.ce28d33fe1df3p-13 }, > + { 0x1.fffed8d639751p-2, -0x1.1790349f3ae76p-4, 0x1.caf9a4fd1b398p-6, > + -0x1.ec986b111342ep-7, 0x1.0b53c3ad4baa4p-7, -0x1.e3c2282eeace4p-9, > + 0x1.2a55369f55bbep-10, -0x1.667fe48c396e8p-13 }, > + { 0x1.fffe24b714161p-2, -0x1.177394fbcb719p-4, 0x1.c90652d920ebdp-6, > + -0x1.e3239197bddf1p-7, 0x1.fb2188525b025p-8, -0x1.b3aadd451afc7p-9, > + 0x1.f74020f31fdabp-11, -0x1.18b0cb246768dp-13 }, > + { 0x1.fffd298bec9e2p-2, -0x1.174efbfd34648p-4, 0x1.c6bcfe48ea92bp-6, > + -0x1.d8f9f2a16157cp-7, 0x1.e0044f56c8864p-8, -0x1.883e2347fe76cp-9, > + 0x1.a9f0e3c1b7af5p-11, -0x1.bb5acc0e60825p-14 }, > + { 0x1.fffbd8b784c4dp-2, -0x1.1721abdd3722ep-4, 0x1.c41fee756d4bp-6, > + -0x1.ce40bccf8065fp-7, 0x1.c59b684b70ef9p-8, -0x1.6133d027996b3p-9, > + 0x1.69cad01106397p-11, -0x1.60f8e45494156p-14 }, > + { 0x1.fffa23749cf88p-2, -0x1.16eb0a8285c06p-4, 0x1.c132d762e1b0dp-6, > + -0x1.c31a959398f4ep-7, 0x1.ac1c5b46bc8ap-8, -0x1.3e34f1abe51dcp-9, > + 0x1.346738737c0b9p-11, -0x1.1b227a3f5c75p-14 }, > + { 0x1.fff7fb25bb407p-2, -0x1.16aaa14d7564p-4, 0x1.bdfa75fca5ff2p-6, > + -0x1.b7a6e260d079cp-7, 0x1.93ab06911033cp-8, -0x1.1ee5560967fd5p-9, > + 0x1.07d31060838bfp-11, -0x1.c96f33a283115p-15 }, > + }; > + const double *c = ch[i]; > + double z2 = z * z; > + double z4 = z2 * z2; > + if (__glibc_unlikely (i == 0)) > + { > + double c0 = c[0] + z2 * c[1]; > + double c2 
= c[2] + z2 * c[3]; > + double c4 = c[4] + z2 * c[5]; > + double c6 = c[6] + z2 * c[7]; > + c0 += c2 * z4; > + c4 += c6 * z4; > + /* For |x| <= 0x1.0fd288p-127, c0 += c4*(z4*z4) would raise a spurious > + underflow exception, we use an FMA instead, where c4 * z4 does not > + underflow. */ > + c0 = fma (c4 * z4, z4, c0); > + return 0.5 - z * c0; > + } > + else > + { > + double f = sqrt (1 - az); > + double c0 = c[0] + az * c[1]; > + double c2 = c[2] + az * c[3]; > + double c4 = c[4] + az * c[5]; > + double c6 = c[6] + az * c[7]; > + c0 += c2 * z2; > + c4 += c6 * z2; > + c0 += c4 * z4; > + static const double o[] = { 0, 1 }; > + double r = o[t >> 31] + c0 * copysign (f, x); > + return r; > + } > +} > +libm_alias_float (__acospi, acospi) > diff --git a/sysdeps/loongarch/lp64/libm-test-ulps b/sysdeps/loongarch/lp64/libm-test-ulps > index f8bf089773..ce84ddf1e6 100644 > --- a/sysdeps/loongarch/lp64/libm-test-ulps > +++ b/sysdeps/loongarch/lp64/libm-test-ulps > @@ -35,22 +35,18 @@ ldouble: 3 > > Function: "acospi": > double: 2 > -float: 1 > ldouble: 2 > > Function: "acospi_downward": > double: 1 > -float: 2 > ldouble: 1 > > Function: "acospi_towardzero": > double: 1 > -float: 2 > ldouble: 1 > > Function: "acospi_upward": > double: 2 > -float: 1 > ldouble: 2 > > Function: "asin": > diff --git a/sysdeps/mips/mips64/libm-test-ulps b/sysdeps/mips/mips64/libm-test-ulps > index 98079e08e9..67c37dfd5e 100644 > --- a/sysdeps/mips/mips64/libm-test-ulps > +++ b/sysdeps/mips/mips64/libm-test-ulps > @@ -35,22 +35,18 @@ ldouble: 3 > > Function: "acospi": > double: 2 > -float: 1 > ldouble: 2 > > Function: "acospi_downward": > double: 1 > -float: 2 > ldouble: 1 > > Function: "acospi_towardzero": > double: 1 > -float: 2 > ldouble: 1 > > Function: "acospi_upward": > double: 2 > -float: 1 > ldouble: 2 > > Function: "asin": > diff --git a/sysdeps/or1k/fpu/libm-test-ulps b/sysdeps/or1k/fpu/libm-test-ulps > index b0de024cae..d3b1036d29 100644 > --- a/sysdeps/or1k/fpu/libm-test-ulps > +++ 
b/sysdeps/or1k/fpu/libm-test-ulps > @@ -27,19 +27,15 @@ double: 2 > > Function: "acospi": > double: 2 > -float: 1 > > Function: "acospi_downward": > double: 1 > -float: 2 > > Function: "acospi_towardzero": > double: 1 > -float: 2 > > Function: "acospi_upward": > double: 2 > -float: 1 > > Function: "asin": > double: 1 > diff --git a/sysdeps/or1k/nofpu/libm-test-ulps b/sysdeps/or1k/nofpu/libm-test-ulps > index aa047f3b6f..14b7e0f3f9 100644 > --- a/sysdeps/or1k/nofpu/libm-test-ulps > +++ b/sysdeps/or1k/nofpu/libm-test-ulps > @@ -27,7 +27,6 @@ double: 2 > > Function: "acospi": > double: 2 > -float: 1 > > Function: "asin": > double: 1 > diff --git a/sysdeps/powerpc/fpu/libm-test-ulps b/sysdeps/powerpc/fpu/libm-test-ulps > index cf3dec38a9..c9c86de147 100644 > --- a/sysdeps/powerpc/fpu/libm-test-ulps > +++ b/sysdeps/powerpc/fpu/libm-test-ulps > @@ -43,25 +43,21 @@ ldouble: 4 > > Function: "acospi": > double: 2 > -float: 1 > float128: 1 > ldouble: 1 > > Function: "acospi_downward": > double: 1 > -float: 2 > float128: 1 > ldouble: 4 > > Function: "acospi_towardzero": > double: 1 > -float: 2 > float128: 1 > ldouble: 4 > > Function: "acospi_upward": > double: 2 > -float: 1 > float128: 2 > ldouble: 4 > > diff --git a/sysdeps/riscv/nofpu/libm-test-ulps b/sysdeps/riscv/nofpu/libm-test-ulps > index d971ee20b9..6206a9531a 100644 > --- a/sysdeps/riscv/nofpu/libm-test-ulps > +++ b/sysdeps/riscv/nofpu/libm-test-ulps > @@ -35,7 +35,6 @@ ldouble: 2 > > Function: "acospi": > double: 2 > -float: 1 > ldouble: 2 > > Function: "asin": > diff --git a/sysdeps/riscv/rvd/libm-test-ulps b/sysdeps/riscv/rvd/libm-test-ulps > index 0f849067be..124ca4b719 100644 > --- a/sysdeps/riscv/rvd/libm-test-ulps > +++ b/sysdeps/riscv/rvd/libm-test-ulps > @@ -35,22 +35,18 @@ ldouble: 3 > > Function: "acospi": > double: 2 > -float: 1 > ldouble: 2 > > Function: "acospi_downward": > double: 1 > -float: 2 > ldouble: 1 > > Function: "acospi_towardzero": > double: 1 > -float: 2 > ldouble: 1 > > Function: 
"acospi_upward": > double: 2 > -float: 1 > ldouble: 2 > > Function: "asin": > diff --git a/sysdeps/s390/fpu/libm-test-ulps b/sysdeps/s390/fpu/libm-test-ulps > index 76a1f3c7e5..364ccf3326 100644 > --- a/sysdeps/s390/fpu/libm-test-ulps > +++ b/sysdeps/s390/fpu/libm-test-ulps > @@ -35,22 +35,18 @@ ldouble: 3 > > Function: "acospi": > double: 2 > -float: 1 > ldouble: 2 > > Function: "acospi_downward": > double: 1 > -float: 2 > ldouble: 1 > > Function: "acospi_towardzero": > double: 1 > -float: 2 > ldouble: 1 > > Function: "acospi_upward": > double: 2 > -float: 1 > ldouble: 2 > > Function: "asin": > diff --git a/sysdeps/sparc/fpu/libm-test-ulps b/sysdeps/sparc/fpu/libm-test-ulps > index 02a80c499c..1174972002 100644 > --- a/sysdeps/sparc/fpu/libm-test-ulps > +++ b/sysdeps/sparc/fpu/libm-test-ulps > @@ -35,22 +35,18 @@ ldouble: 3 > > Function: "acospi": > double: 2 > -float: 1 > ldouble: 2 > > Function: "acospi_downward": > double: 1 > -float: 2 > ldouble: 1 > > Function: "acospi_towardzero": > double: 1 > -float: 2 > ldouble: 1 > > Function: "acospi_upward": > double: 2 > -float: 1 > ldouble: 2 > > Function: "asin": > diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps > index e454a63eea..5ed5112b49 100644 > --- a/sysdeps/x86_64/fpu/libm-test-ulps > +++ b/sysdeps/x86_64/fpu/libm-test-ulps > @@ -83,25 +83,21 @@ float: 2 > > Function: "acospi": > double: 2 > -float: 1 > float128: 2 > ldouble: 3 > > Function: "acospi_downward": > double: 1 > -float: 2 > float128: 1 > ldouble: 3 > > Function: "acospi_towardzero": > double: 1 > -float: 2 > float128: 1 > ldouble: 3 > > Function: "acospi_upward": > double: 2 > -float: 1 > float128: 2 > ldouble: 2 > > -- > 2.43.0 > >
diff --git a/SHARED-FILES b/SHARED-FILES index 032c407881..3fde72644a 100644 --- a/SHARED-FILES +++ b/SHARED-FILES @@ -334,3 +334,7 @@ sysdeps/ieee754/flt-32/s_tanhf.c: (src/binary32/tanh/tanhf.c in CORE-MATH) - the code was adapted to use glibc code style and internal functions to handle errno, overflow, and underflow. +sysdeps/ieee754/flt-32/s_acospif.c: + (src/binary32/acospi/acospif.c in CORE-MATH) + - the code was adapted to use glibc code style and internal + functions to handle errno, overflow, and underflow. diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps index 59ec929176..1a403d95b6 100644 --- a/sysdeps/aarch64/libm-test-ulps +++ b/sysdeps/aarch64/libm-test-ulps @@ -51,22 +51,18 @@ ldouble: 3 Function: "acospi": double: 2 -float: 1 ldouble: 2 Function: "acospi_downward": double: 1 -float: 2 ldouble: 1 Function: "acospi_towardzero": double: 1 -float: 2 ldouble: 1 Function: "acospi_upward": double: 2 -float: 1 ldouble: 2 Function: "asin": diff --git a/sysdeps/arc/fpu/libm-test-ulps b/sysdeps/arc/fpu/libm-test-ulps index 82bc15602c..c0c5daa589 100644 --- a/sysdeps/arc/fpu/libm-test-ulps +++ b/sysdeps/arc/fpu/libm-test-ulps @@ -27,19 +27,15 @@ double: 3 Function: "acospi": double: 2 -float: 1 Function: "acospi_downward": double: 1 -float: 2 Function: "acospi_towardzero": double: 1 -float: 2 Function: "acospi_upward": double: 2 -float: 1 Function: "asin": double: 1 diff --git a/sysdeps/arc/nofpu/libm-test-ulps b/sysdeps/arc/nofpu/libm-test-ulps index aa93d71244..2b34f5a0ab 100644 --- a/sysdeps/arc/nofpu/libm-test-ulps +++ b/sysdeps/arc/nofpu/libm-test-ulps @@ -9,7 +9,6 @@ double: 2 Function: "acospi": double: 2 -float: 1 Function: "asin": double: 1 diff --git a/sysdeps/arm/libm-test-ulps b/sysdeps/arm/libm-test-ulps index 218ffa8b4b..afb0532a66 100644 --- a/sysdeps/arm/libm-test-ulps +++ b/sysdeps/arm/libm-test-ulps @@ -27,19 +27,15 @@ double: 2 Function: "acospi": double: 2 -float: 1 Function: "acospi_downward": double: 1 -float: 2 
Function: "acospi_towardzero": double: 1 -float: 2 Function: "acospi_upward": double: 2 -float: 1 Function: "asin": double: 1 diff --git a/sysdeps/hppa/fpu/libm-test-ulps b/sysdeps/hppa/fpu/libm-test-ulps index 2b8fa35078..b9959c8a12 100644 --- a/sysdeps/hppa/fpu/libm-test-ulps +++ b/sysdeps/hppa/fpu/libm-test-ulps @@ -27,19 +27,15 @@ double: 2 Function: "acospi": double: 2 -float: 1 Function: "acospi_downward": double: 1 -float: 2 Function: "acospi_towardzero": double: 1 -float: 2 Function: "acospi_upward": double: 2 -float: 1 Function: "asin": double: 1 diff --git a/sysdeps/i386/fpu/libm-test-ulps b/sysdeps/i386/fpu/libm-test-ulps index b8983447fe..85c58f34e9 100644 --- a/sysdeps/i386/fpu/libm-test-ulps +++ b/sysdeps/i386/fpu/libm-test-ulps @@ -41,25 +41,21 @@ ldouble: 3 Function: "acospi": double: 1 -float: 1 float128: 2 ldouble: 1 Function: "acospi_downward": double: 1 -float: 2 float128: 1 ldouble: 3 Function: "acospi_towardzero": double: 1 -float: 2 float128: 1 ldouble: 3 Function: "acospi_upward": double: 2 -float: 1 float128: 2 ldouble: 2 diff --git a/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps b/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps index 750d51906b..bc14e7e115 100644 --- a/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps +++ b/sysdeps/i386/i686/fpu/multiarch/libm-test-ulps @@ -41,25 +41,21 @@ ldouble: 3 Function: "acospi": double: 1 -float: 1 float128: 2 ldouble: 3 Function: "acospi_downward": double: 1 -float: 2 float128: 1 ldouble: 3 Function: "acospi_towardzero": double: 1 -float: 2 float128: 1 ldouble: 3 Function: "acospi_upward": double: 2 -float: 1 float128: 2 ldouble: 2 diff --git a/sysdeps/ieee754/flt-32/s_acospif.c b/sysdeps/ieee754/flt-32/s_acospif.c new file mode 100644 index 0000000000..03d63a74c8 --- /dev/null +++ b/sysdeps/ieee754/flt-32/s_acospif.c @@ -0,0 +1,137 @@ +/* Correctly-rounded half-revolution arc-cosine function for binary32 value. + +Copyright (c) 2022-2025 Alexei Sibidanov. 
+ +The original version of this file was copied from the CORE-MATH +project (file src/binary32/acospi/acospif.c, revision 1a6a9ab). + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. 
+ +*/ + +#include <math.h> +#include <stdint.h> +#include <libm-alias-float.h> +#include "math_config.h" + +float +__acospif (float x) +{ + float ax = fabsf (x); + double az = ax; + double z = x; + uint32_t t = asuint (x); + int e = (t >> 23) & 0xff; + if (__glibc_unlikely (e >= 127)) + { + if (x == 1.0f) + return 0.0f; + if (x == -1.0f) + return 1.0f; + if (e == 0xff && (t << 9)) + return x + x; /* nan */ + return __math_edomf ((x - x) / (x - x)); /* nan */ + } + int s = 146 - e; + int i = 0; + if (__glibc_likely (s < 32)) + i = ((t & (~0u >> 9)) | 1 << 23) >> s; + static const double ch[][8] = { + { 0x1.45f306dc9c882p-2, 0x1.b2995e7b7dc2fp-5, 0x1.8723a1cf50c7ep-6, + 0x1.d1a4591d16a29p-7, 0x1.3ce3aa68ddaeep-7, 0x1.d3182ab0cc1bfp-8, + 0x1.62b379a8b88e3p-8, 0x1.6811411fcfec2p-8 }, + { 0x1.ffffffffd3cd9p-2, -0x1.17cc1b3355fd5p-4, 0x1.d067a1e8d5a99p-6, + -0x1.08e16fb09314ap-6, 0x1.5eed43d42dcb2p-7, -0x1.f58baca7acc71p-8, + 0x1.5dab64e2dcf15p-8, -0x1.59270e30797acp-9 }, + { 0x1.fffffff7c4622p-2, -0x1.17cc149ded3a2p-4, 0x1.d0654d4cb2c1ap-6, + -0x1.08c3ba713d33ap-6, 0x1.5d2053481079cp-7, -0x1.e485ebc545e7ep-8, + 0x1.303baca167dddp-8, -0x1.dee8d16d06b38p-10 }, + { 0x1.ffffffa749848p-2, -0x1.17cbe7155935p-4, 0x1.d05a312269adfp-6, + -0x1.0862b3ee617d7p-6, 0x1.5920708db2a73p-7, -0x1.cb0463b3862c3p-8, + 0x1.02b82478f95d7p-8, -0x1.52a7b8579e729p-10 }, + { 0x1.fffffe1f92bb5p-2, -0x1.17cb3e74c64e3p-4, 0x1.d03af67311cbfp-6, + -0x1.079441cbfc7ap-6, 0x1.52b4287805a61p-7, -0x1.ac3286d604a98p-8, + 0x1.b2f1210d9701bp-9, -0x1.e740ddc25afd6p-11 }, + { 0x1.fffff92beb6e2p-2, -0x1.17c986fe9518bp-4, 0x1.cff98167c9a5ep-6, + -0x1.0638b591eae52p-6, 0x1.4a0803828959ep-7, -0x1.8adeca229f11dp-8, + 0x1.6b9a7ba05dfcep-9, -0x1.640521a43b2dp-11 }, + { 0x1.ffffeccee5bfcp-2, -0x1.17c5f1753f5eap-4, 0x1.cf874e4fe258fp-6, + -0x1.043e6cf77b256p-6, 0x1.3f7db42227d92p-7, -0x1.691a6fa2a2882p-8, + 0x1.2f6543162bc61p-9, -0x1.07d5da05822b6p-11 }, + { 0x1.ffffd2f64431dp-2, -0x1.17bf8208c10c1p-4, 
0x1.ced7487cdb124p-6, + -0x1.01a0d30932905p-6, 0x1.3388f99b254dap-7, -0x1.4844e245c65bdp-8, + 0x1.fa777150197c6p-10, -0x1.8c1ecf16a05c8p-12 }, + { 0x1.ffffa36d1712ep-2, -0x1.17b523971bd4ep-4, 0x1.cddee26de2deep-6, + -0x1.fccb00abaaabcp-7, 0x1.269afc3622342p-7, -0x1.2933152686752p-8, + 0x1.a76d4956cc9a3p-10, -0x1.2ce7d6dc651cep-12 }, + { 0x1.ffff5402ab3a1p-2, -0x1.17a5ba85da77ap-4, 0x1.cc96894e05c02p-6, + -0x1.f532143cb832ep-7, 0x1.19180b660ff09p-7, -0x1.0c57417a78b3cp-8, + 0x1.62e26cbd7bb1ep-10, -0x1.ce28d33fe1df3p-13 }, + { 0x1.fffed8d639751p-2, -0x1.1790349f3ae76p-4, 0x1.caf9a4fd1b398p-6, + -0x1.ec986b111342ep-7, 0x1.0b53c3ad4baa4p-7, -0x1.e3c2282eeace4p-9, + 0x1.2a55369f55bbep-10, -0x1.667fe48c396e8p-13 }, + { 0x1.fffe24b714161p-2, -0x1.177394fbcb719p-4, 0x1.c90652d920ebdp-6, + -0x1.e3239197bddf1p-7, 0x1.fb2188525b025p-8, -0x1.b3aadd451afc7p-9, + 0x1.f74020f31fdabp-11, -0x1.18b0cb246768dp-13 }, + { 0x1.fffd298bec9e2p-2, -0x1.174efbfd34648p-4, 0x1.c6bcfe48ea92bp-6, + -0x1.d8f9f2a16157cp-7, 0x1.e0044f56c8864p-8, -0x1.883e2347fe76cp-9, + 0x1.a9f0e3c1b7af5p-11, -0x1.bb5acc0e60825p-14 }, + { 0x1.fffbd8b784c4dp-2, -0x1.1721abdd3722ep-4, 0x1.c41fee756d4bp-6, + -0x1.ce40bccf8065fp-7, 0x1.c59b684b70ef9p-8, -0x1.6133d027996b3p-9, + 0x1.69cad01106397p-11, -0x1.60f8e45494156p-14 }, + { 0x1.fffa23749cf88p-2, -0x1.16eb0a8285c06p-4, 0x1.c132d762e1b0dp-6, + -0x1.c31a959398f4ep-7, 0x1.ac1c5b46bc8ap-8, -0x1.3e34f1abe51dcp-9, + 0x1.346738737c0b9p-11, -0x1.1b227a3f5c75p-14 }, + { 0x1.fff7fb25bb407p-2, -0x1.16aaa14d7564p-4, 0x1.bdfa75fca5ff2p-6, + -0x1.b7a6e260d079cp-7, 0x1.93ab06911033cp-8, -0x1.1ee5560967fd5p-9, + 0x1.07d31060838bfp-11, -0x1.c96f33a283115p-15 }, + }; + const double *c = ch[i]; + double z2 = z * z; + double z4 = z2 * z2; + if (__glibc_unlikely (i == 0)) + { + double c0 = c[0] + z2 * c[1]; + double c2 = c[2] + z2 * c[3]; + double c4 = c[4] + z2 * c[5]; + double c6 = c[6] + z2 * c[7]; + c0 += c2 * z4; + c4 += c6 * z4; + /* For |x| <= 0x1.0fd288p-127, c0 += c4*(z4*z4) 
would raise a spurious + underflow exception, we use an FMA instead, where c4 * z4 does not + underflow. */ + c0 = fma (c4 * z4, z4, c0); + return 0.5 - z * c0; + } + else + { + double f = sqrt (1 - az); + double c0 = c[0] + az * c[1]; + double c2 = c[2] + az * c[3]; + double c4 = c[4] + az * c[5]; + double c6 = c[6] + az * c[7]; + c0 += c2 * z2; + c4 += c6 * z2; + c0 += c4 * z4; + static const double o[] = { 0, 1 }; + double r = o[t >> 31] + c0 * copysign (f, x); + return r; + } +} +libm_alias_float (__acospi, acospi) diff --git a/sysdeps/loongarch/lp64/libm-test-ulps b/sysdeps/loongarch/lp64/libm-test-ulps index f8bf089773..ce84ddf1e6 100644 --- a/sysdeps/loongarch/lp64/libm-test-ulps +++ b/sysdeps/loongarch/lp64/libm-test-ulps @@ -35,22 +35,18 @@ ldouble: 3 Function: "acospi": double: 2 -float: 1 ldouble: 2 Function: "acospi_downward": double: 1 -float: 2 ldouble: 1 Function: "acospi_towardzero": double: 1 -float: 2 ldouble: 1 Function: "acospi_upward": double: 2 -float: 1 ldouble: 2 Function: "asin": diff --git a/sysdeps/mips/mips64/libm-test-ulps b/sysdeps/mips/mips64/libm-test-ulps index 98079e08e9..67c37dfd5e 100644 --- a/sysdeps/mips/mips64/libm-test-ulps +++ b/sysdeps/mips/mips64/libm-test-ulps @@ -35,22 +35,18 @@ ldouble: 3 Function: "acospi": double: 2 -float: 1 ldouble: 2 Function: "acospi_downward": double: 1 -float: 2 ldouble: 1 Function: "acospi_towardzero": double: 1 -float: 2 ldouble: 1 Function: "acospi_upward": double: 2 -float: 1 ldouble: 2 Function: "asin": diff --git a/sysdeps/or1k/fpu/libm-test-ulps b/sysdeps/or1k/fpu/libm-test-ulps index b0de024cae..d3b1036d29 100644 --- a/sysdeps/or1k/fpu/libm-test-ulps +++ b/sysdeps/or1k/fpu/libm-test-ulps @@ -27,19 +27,15 @@ double: 2 Function: "acospi": double: 2 -float: 1 Function: "acospi_downward": double: 1 -float: 2 Function: "acospi_towardzero": double: 1 -float: 2 Function: "acospi_upward": double: 2 -float: 1 Function: "asin": double: 1 diff --git a/sysdeps/or1k/nofpu/libm-test-ulps 
b/sysdeps/or1k/nofpu/libm-test-ulps index aa047f3b6f..14b7e0f3f9 100644 --- a/sysdeps/or1k/nofpu/libm-test-ulps +++ b/sysdeps/or1k/nofpu/libm-test-ulps @@ -27,7 +27,6 @@ double: 2 Function: "acospi": double: 2 -float: 1 Function: "asin": double: 1 diff --git a/sysdeps/powerpc/fpu/libm-test-ulps b/sysdeps/powerpc/fpu/libm-test-ulps index cf3dec38a9..c9c86de147 100644 --- a/sysdeps/powerpc/fpu/libm-test-ulps +++ b/sysdeps/powerpc/fpu/libm-test-ulps @@ -43,25 +43,21 @@ ldouble: 4 Function: "acospi": double: 2 -float: 1 float128: 1 ldouble: 1 Function: "acospi_downward": double: 1 -float: 2 float128: 1 ldouble: 4 Function: "acospi_towardzero": double: 1 -float: 2 float128: 1 ldouble: 4 Function: "acospi_upward": double: 2 -float: 1 float128: 2 ldouble: 4 diff --git a/sysdeps/riscv/nofpu/libm-test-ulps b/sysdeps/riscv/nofpu/libm-test-ulps index d971ee20b9..6206a9531a 100644 --- a/sysdeps/riscv/nofpu/libm-test-ulps +++ b/sysdeps/riscv/nofpu/libm-test-ulps @@ -35,7 +35,6 @@ ldouble: 2 Function: "acospi": double: 2 -float: 1 ldouble: 2 Function: "asin": diff --git a/sysdeps/riscv/rvd/libm-test-ulps b/sysdeps/riscv/rvd/libm-test-ulps index 0f849067be..124ca4b719 100644 --- a/sysdeps/riscv/rvd/libm-test-ulps +++ b/sysdeps/riscv/rvd/libm-test-ulps @@ -35,22 +35,18 @@ ldouble: 3 Function: "acospi": double: 2 -float: 1 ldouble: 2 Function: "acospi_downward": double: 1 -float: 2 ldouble: 1 Function: "acospi_towardzero": double: 1 -float: 2 ldouble: 1 Function: "acospi_upward": double: 2 -float: 1 ldouble: 2 Function: "asin": diff --git a/sysdeps/s390/fpu/libm-test-ulps b/sysdeps/s390/fpu/libm-test-ulps index 76a1f3c7e5..364ccf3326 100644 --- a/sysdeps/s390/fpu/libm-test-ulps +++ b/sysdeps/s390/fpu/libm-test-ulps @@ -35,22 +35,18 @@ ldouble: 3 Function: "acospi": double: 2 -float: 1 ldouble: 2 Function: "acospi_downward": double: 1 -float: 2 ldouble: 1 Function: "acospi_towardzero": double: 1 -float: 2 ldouble: 1 Function: "acospi_upward": double: 2 -float: 1 ldouble: 2 Function: 
"asin": diff --git a/sysdeps/sparc/fpu/libm-test-ulps b/sysdeps/sparc/fpu/libm-test-ulps index 02a80c499c..1174972002 100644 --- a/sysdeps/sparc/fpu/libm-test-ulps +++ b/sysdeps/sparc/fpu/libm-test-ulps @@ -35,22 +35,18 @@ ldouble: 3 Function: "acospi": double: 2 -float: 1 ldouble: 2 Function: "acospi_downward": double: 1 -float: 2 ldouble: 1 Function: "acospi_towardzero": double: 1 -float: 2 ldouble: 1 Function: "acospi_upward": double: 2 -float: 1 ldouble: 2 Function: "asin": diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps index e454a63eea..5ed5112b49 100644 --- a/sysdeps/x86_64/fpu/libm-test-ulps +++ b/sysdeps/x86_64/fpu/libm-test-ulps @@ -83,25 +83,21 @@ float: 2 Function: "acospi": double: 2 -float: 1 float128: 2 ldouble: 3 Function: "acospi_downward": double: 1 -float: 2 float128: 1 ldouble: 3 Function: "acospi_towardzero": double: 1 -float: 2 float128: 1 ldouble: 3 Function: "acospi_upward": double: 2 -float: 1 float128: 2 ldouble: 2