[00/11] Add more CORE-math implementations to libm

Message ID 20241111134740.1410635-1-adhemerval.zanella@linaro.org

Adhemerval Zanella Nov. 11, 2024, 1:45 p.m. UTC
This patchset adds optimized, correctly rounded implementations of
cbrtf, erff, erfcf, lgammaf, and tanf from CORE-MATH.  Each function
gets a benchmark to evaluate the performance improvement.

I tested the implementations on recent hardware (Ryzen 9 5900X for
x86_64, Ampere/Neoverse for aarch64, POWER10 for powerpc, and
Loongson-3C5000L-LL for loongarch), and all of them show good
performance improvements.  Like the implementations from ARM optimized
routines, the CORE-MATH ones take advantage of recent ISA and platform
support (such as FMA and rounding instructions, along with FP throughput).

Adhemerval Zanella (11):
  benchtests: Add cbrtf benchmark
  benchtests: Add erff benchmark
  benchtests: Add erfcf benchmark
  benchtests: Add lgammaf benchmark
  benchtests: Add tanf benchmark
  math: Use cbrtf from CORE-MATH
  math: Split s_erfF in erff and erfc
  math: Use erff from CORE-MATH
  math: Use erfcf from CORE-MATH
  math: Use lgammaf from CORE-MATH
  math: Use tanf from CORE-MATH

 SHARED-FILES                                  |   26 +
 benchtests/Makefile                           |    5 +
 benchtests/cbrtf-inputs                       | 1005 ++++++
 benchtests/erfcf-inputs                       |  795 +++++
 benchtests/erff-inputs                        |  795 +++++
 benchtests/lgammaf-inputs                     | 1005 ++++++
 benchtests/tanf-inputs                        | 3005 +++++++++++++++++
 math/Makefile                                 |    1 +
 sysdeps/aarch64/libm-test-ulps                |   13 -
 sysdeps/alpha/fpu/libm-test-ulps              |   20 -
 sysdeps/arc/fpu/libm-test-ulps                |   20 -
 sysdeps/arc/nofpu/libm-test-ulps              |    7 -
 sysdeps/arm/libm-test-ulps                    |   22 -
 sysdeps/csky/fpu/libm-test-ulps               |   22 -
 sysdeps/csky/nofpu/libm-test-ulps             |   22 -
 sysdeps/generic/math_int128.h                 |  144 +
 sysdeps/hppa/fpu/libm-test-ulps               |   20 -
 sysdeps/i386/fpu/libm-test-ulps               |   16 -
 .../i386/i686/fpu/multiarch/libm-test-ulps    |   13 -
 sysdeps/ieee754/dbl-64/s_erfc.c               |    1 +
 sysdeps/ieee754/float128/s_erfcf128.c         |    1 +
 sysdeps/ieee754/flt-32/e_lgammaf_r.c          |  576 ++--
 sysdeps/ieee754/flt-32/k_tanf.c               |  102 +-
 sysdeps/ieee754/flt-32/lgamma_negf.c          |  283 +-
 sysdeps/ieee754/flt-32/s_cbrtf.c              |  136 +-
 sysdeps/ieee754/flt-32/s_erfcf.c              |  185 +
 sysdeps/ieee754/flt-32/s_erff.c               |  470 +--
 sysdeps/ieee754/flt-32/s_tanf.c               |  220 +-
 sysdeps/ieee754/ldbl-128/s_erfcl.c            |    1 +
 sysdeps/ieee754/ldbl-128ibm/s_erfcl.c         |    1 +
 sysdeps/ieee754/ldbl-96/s_erfcl.c             |    1 +
 sysdeps/loongarch/lp64/libm-test-ulps         |   20 -
 sysdeps/m68k/coldfire/fpu/libm-test-ulps      |    1 -
 sysdeps/m68k/m680x0/fpu/libm-test-ulps        |   16 -
 sysdeps/microblaze/libm-test-ulps             |    7 -
 sysdeps/mips/mips32/libm-test-ulps            |   22 -
 sysdeps/mips/mips64/libm-test-ulps            |   20 -
 sysdeps/nios2/libm-test-ulps                  |    7 -
 sysdeps/or1k/fpu/libm-test-ulps               |   22 -
 sysdeps/or1k/nofpu/libm-test-ulps             |   22 -
 sysdeps/powerpc/fpu/libm-test-ulps            |   20 -
 sysdeps/powerpc/nofpu/libm-test-ulps          |   20 -
 sysdeps/riscv/nofpu/libm-test-ulps            |   20 -
 sysdeps/riscv/rvd/libm-test-ulps              |   20 -
 sysdeps/s390/fpu/libm-test-ulps               |   20 -
 sysdeps/sh/libm-test-ulps                     |   12 -
 sysdeps/sparc/fpu/libm-test-ulps              |   20 -
 sysdeps/x86_64/fpu/libm-test-ulps             |   20 -
 48 files changed, 7815 insertions(+), 1407 deletions(-)
 create mode 100644 benchtests/cbrtf-inputs
 create mode 100644 benchtests/erfcf-inputs
 create mode 100644 benchtests/erff-inputs
 create mode 100644 benchtests/lgammaf-inputs
 create mode 100644 benchtests/tanf-inputs
 create mode 100644 sysdeps/generic/math_int128.h
 create mode 100644 sysdeps/ieee754/dbl-64/s_erfc.c
 create mode 100644 sysdeps/ieee754/float128/s_erfcf128.c
 create mode 100644 sysdeps/ieee754/flt-32/s_erfcf.c
 create mode 100644 sysdeps/ieee754/ldbl-128/s_erfcl.c
 create mode 100644 sysdeps/ieee754/ldbl-128ibm/s_erfcl.c
 create mode 100644 sysdeps/ieee754/ldbl-96/s_erfcl.c