Message ID | 20241125122934.463747-1-adhemerval.zanella@linaro.org |
---|---|
State | Accepted |
Commit | 68d71289425bb133c6cbf0f5065da6b1d99f81fc |
Headers | show |
Series | [COMMITTED] math: Fix non-portability in the computation of signgam in lgammaf | expand |
On 2024-11-25 04:29, Adhemerval Zanella wrote: > - int k; > - if (__builtin_expect (fx >= 0x1p31f, 0)) > - k = INT_MAX; > + if (__glibc_unlikely (fx >= 0)) > + *signgamp = 1; > else > - k = fx; > - *signgamp = 1 - (((k & (k >> 31)) & 1) << 1); > + /* gamma(x) is negative in (-2n-1,-2n), thus when fx is odd. */ > + *signgamp = 1 - ((((int) fx) & 1) << 1); The following avoids the need for the __glibc_unlikely (and shouldn't that be __glibc_likely?) and for the machine-code conditional branch (at least on x86-64) and is easier to understand (at least to me...): *signgamp = ((fx < 0) & (int) fx) != 0 ? -1 : 1;
On 2024-11-25 11:16:25 -0800, Paul Eggert wrote: > On 2024-11-25 04:29, Adhemerval Zanella wrote: > > - int k; > > - if (__builtin_expect (fx >= 0x1p31f, 0)) > > - k = INT_MAX; > > + if (__glibc_unlikely (fx >= 0)) > > + *signgamp = 1; > > else > > - k = fx; > > - *signgamp = 1 - (((k & (k >> 31)) & 1) << 1); > > + /* gamma(x) is negative in (-2n-1,-2n), thus when fx is odd. */ > > + *signgamp = 1 - ((((int) fx) & 1) << 1); > > The following avoids the need for the __glibc_unlikely (and shouldn't that > be __glibc_likely?) and for the machine-code conditional branch (at least on > x86-64) and is easier to understand (at least to me...): > > *signgamp = ((fx < 0) & (int) fx) != 0 ? -1 : 1; But for the most common cases (fx >= 0, which should be likely), I suppose that it would be slower (though I'm wondering whether this is noticeable).
On 2024-11-25 12:22, Vincent Lefevre wrote: > But for the most common cases (fx >= 0, which should be likely), > I suppose that it would be slower (though I'm wondering whether > this is noticeable). Athough it's indeed not a big deal, statically the proposed code is a clear win on x86-64, as it generates only 27 bytes in code space, compared to the current 48 bytes. Dynamically it's closer, but even assuming the argument is always nonnegative the current and proposed code both execute the same number (27) of bytes' worth of instructions; and although the proposed code executes one more instruction, once that gets converted to micro-ops I doubt whether it'd be any slower. Plus, the proposed code has no conditional branches and this avoids polluting the branch predictor.
diff --git a/sysdeps/ieee754/flt-32/e_lgammaf_r.c b/sysdeps/ieee754/flt-32/e_lgammaf_r.c index cb65513056..447376dc55 100644 --- a/sysdeps/ieee754/flt-32/e_lgammaf_r.c +++ b/sysdeps/ieee754/flt-32/e_lgammaf_r.c @@ -181,12 +181,11 @@ __ieee754_lgammaf_r (float x, int *signgamp) Note that for a binary32 |x| >= 2^23, x is necessarily an integer, and we already dealed with negative integers, thus now: -2^23 < x < +Inf and x is not a negative integer nor 0, 1, 2. */ - int k; - if (__builtin_expect (fx >= 0x1p31f, 0)) - k = INT_MAX; + if (__glibc_unlikely (fx >= 0)) + *signgamp = 1; else - k = fx; - *signgamp = 1 - (((k & (k >> 31)) & 1) << 1); + /* gamma(x) is negative in (-2n-1,-2n), thus when fx is odd. */ + *signgamp = 1 - ((((int) fx) & 1) << 1); double z = ax, f; if (__glibc_unlikely (ax < 0x1.52p-1f))
From: Vincent Lefevre <vincent@vinc17.net> The k>>31 in signgam = 1 - (((k&(k>>31))&1)<<1); is not portable: * The ISO C standard says "If E1 has a signed type and a negative value, the resulting value is implementation-defined." (this is still in C23). * If the int type is larger than 32 bits (e.g. a 64-bit type), then k = INT_MAX; line 144 will make k>>31 put 1 in bit 0 (thus signgam will be -1) while 0 is expected. Moreover, instead of the fx >= 0x1p31f condition, testing fx >= 0 is probably better for 2 reasons: The signgam expression has more or less a condition on the sign of fx (the goal of k>>31, which can be dropped with this new condition). Since fx ≥ 0 should be the most common case, one can get signgam directly in this case (value 1). And this simplifies the expression for the other case (fx < 0). This new condition may be easier/faster to test on the processor (e.g. by avoiding a load of a constant from the memory). This is commit d41459c731865516318f813cf4c966dafa0eecbf from CORE-MATH. Checked on x86_64-linux-gnu. --- sysdeps/ieee754/flt-32/e_lgammaf_r.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)