From patchwork Wed Nov 20 13:22:11 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella
X-Patchwork-Id: 844527
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Cc: Paul Zimmermann, Florian Weimer
Subject: [PATCH v2] math: Add internal roundeven_finite
Date: Wed, 20 Nov 2024 10:22:11 -0300
Message-ID: <20241120132239.1314280-1-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.43.0
List-Id: Libc-alpha mailing list

Some CORE-MATH routines use roundeven, and most ISAs do not have a
specific instruction for the operation.  In this case, the call is
routed to the generic implementation.

However, if the ISA does support round() and ctz(), there is a
better alternative (as used by CORE-MATH).

This patch adds this optimization and also enables it on powerpc.
On a power10 it shows the following improvement:

expm1f                      master   patched   improvement
latency                     9.8574    7.0139        28.85%
reciprocal-throughput       4.3742    2.6592        39.21%

Checked on powerpc64le-linux-gnu and aarch64-linux-gnu.
---
 sysdeps/ieee754/flt-32/e_gammaf_r.c  |  2 +-
 sysdeps/ieee754/flt-32/math_config.h | 27 +++++++++++++++++++++++++++
 sysdeps/ieee754/flt-32/s_expm1f.c    |  2 +-
 sysdeps/powerpc/fpu/math_private.h   |  5 +++++
 4 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/sysdeps/ieee754/flt-32/e_gammaf_r.c b/sysdeps/ieee754/flt-32/e_gammaf_r.c
index 6b1f95d50f..66e8caee0b 100644
--- a/sysdeps/ieee754/flt-32/e_gammaf_r.c
+++ b/sysdeps/ieee754/flt-32/e_gammaf_r.c
@@ -140,7 +140,7 @@ __ieee754_gammaf_r (float x, int *signgamp)
     };
 
   double m = z - 0x1.7p+1;
-  double i = roundeven (m);
+  double i = roundeven_finite (m);
   double step = copysign (1.0, i);
   double d = m - i, d2 = d * d, d4 = d2 * d2, d8 = d4 * d4;
   double f = (c[0] + d * c[1]) + d2 * (c[2] + d * c[3])
diff --git a/sysdeps/ieee754/flt-32/math_config.h b/sysdeps/ieee754/flt-32/math_config.h
index dc07ebd459..b30a03eeb4 100644
--- a/sysdeps/ieee754/flt-32/math_config.h
+++ b/sysdeps/ieee754/flt-32/math_config.h
@@ -57,6 +57,33 @@ static inline int32_t
 converttoint (double_t x);
 #endif
 
+#ifndef ROUNDEVEN_INTRINSICS
+/* When set, roundeven_finite will route to the internal roundeven function.  */
+# define ROUNDEVEN_INTRINSICS 1
+#endif
+
+/* Round x to nearest integer value in floating-point format, rounding halfway
+   cases to even.  If the input is non-finite the result is unspecified.  */
+static inline double
+roundeven_finite (double x)
+{
+  if (!isfinite (x))
+    __builtin_unreachable ();
+#if ROUNDEVEN_INTRINSICS
+  return roundeven (x);
+#else
+  double y = round (x);
+  if (fabs (x - y) == 0.5)
+    {
+      union { double f; uint64_t i; } u = {y};
+      union { double f; uint64_t i; } v = {y - copysign (1.0, x)};
+      if (__builtin_ctzll (v.i) > __builtin_ctzll (u.i))
+        y = v.f;
+    }
+  return y;
+#endif
+}
+
 static inline uint32_t
 asuint (float f)
 {
diff --git a/sysdeps/ieee754/flt-32/s_expm1f.c b/sysdeps/ieee754/flt-32/s_expm1f.c
index edd7c9acf8..a36e5781f5 100644
--- a/sysdeps/ieee754/flt-32/s_expm1f.c
+++ b/sysdeps/ieee754/flt-32/s_expm1f.c
@@ -95,7 +95,7 @@ __expm1f (float x)
       return __math_oflowf (0);
     }
   double a = iln2 * z;
-  double ia = roundeven (a);
+  double ia = roundeven_finite (a);
   double h = a - ia;
   double h2 = h * h;
   uint64_t u = asuint64 (ia + big);
diff --git a/sysdeps/powerpc/fpu/math_private.h b/sysdeps/powerpc/fpu/math_private.h
index 9ef35b20cd..b22f53d366 100644
--- a/sysdeps/powerpc/fpu/math_private.h
+++ b/sysdeps/powerpc/fpu/math_private.h
@@ -59,4 +59,9 @@ __ieee754_sqrtf128 (_Float128 __x)
 #define _GL_HAS_BUILTIN_ILOGB 0
 #endif
 
+#ifdef _ARCH_PWR6
+/* ISA 2.03 provides frin/round() and cntlzw/ctzll().  */
+# define ROUNDEVEN_INTRINSICS 0
+#endif
+
 #endif /* _PPC_MATH_PRIVATE_H_ */