Message ID | 70b62c30-956f-0c3c-92f1-0cd5f1acdb2d@arm.com |
---|---|
State | New |
Headers | show |
On 01/18/2017 11:08 AM, Richard Earnshaw (lists) wrote: > PR 79121 is a silent wrong code regression where, when generating a > shift from an extended value moving from one to two machine registers, > the type of the right shift is for the most significant word should be > determined by the signedness of the inner type, not the signedness of > the result type. > > gcc: > PR rtl-optimization/79121 > * expr.c (expand_expr_real_2, case LSHIFT_EXPR): Look at the signedness > of the inner type when shifting an extended value. > > testsuite: > * gcc.c-torture/execute/pr79121.c: New test. > > Bootstrapped on x86_64 and cross-tested on ARM. I had to refamiliarize myself with this code and nearly got the analysis wrong (again). Due to the copying of the low word into the high word we have to select the type of shift based on the type of the object that was the source of the NOP conversion. The code currently makes that determination based on the signedness of the shift, which is wrong. OK for the trunk. jeff
On 18/01/17 21:07, Jeff Law wrote: > On 01/18/2017 11:08 AM, Richard Earnshaw (lists) wrote: >> PR 79121 is a silent wrong code regression where, when generating a >> shift from an extended value moving from one to two machine registers, >> the type of the right shift is for the most significant word should be >> determined by the signedness of the inner type, not the signedness of >> the result type. >> >> gcc: >> PR rtl-optimization/79121 >> * expr.c (expand_expr_real_2, case LSHIFT_EXPR): Look at the >> signedness >> of the inner type when shifting an extended value. >> >> testsuite: >> * gcc.c-torture/execute/pr79121.c: New test. >> >> Bootstrapped on x86_64 and cross-tested on ARM. > I had to refamiliarize myself with this code and nearly got the analysis > wrong (again). > > Due to the copying of the low word into the high word we have to select > the type of shift based on the type of the object that was the source of > the NOP conversion. The code currently makes that determination based > on the signedness of the shift, which is wrong. > > > OK for the trunk. > > jeff > > Thanks, Jeff. I made some minor tweaks to the comments (adding a bit more about signed vs unsigned) and committed the following. What about gcc-6? R.diff --git a/gcc/expr.c b/gcc/expr.c index 4c54faf..2d8868e 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -9056,9 +9056,9 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode, /* Left shift optimization when shifting across word_size boundary. - If mode == GET_MODE_WIDER_MODE (word_mode), then normally there isn't - native instruction to support this wide mode left shift. Given below - scenario: + If mode == GET_MODE_WIDER_MODE (word_mode), then normally + there isn't native instruction to support this wide mode + left shift. Given below scenario: Type A = (Type) B << C @@ -9067,10 +9067,11 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode, | word_size | - If the shift amount C caused we shift B to across the word size - boundary, i.e part of B shifted into high half of destination - register, and part of B remains in the low half, then GCC will use - the following left shift expand logic: + If the shift amount C caused we shift B to across the word + size boundary, i.e part of B shifted into high half of + destination register, and part of B remains in the low + half, then GCC will use the following left shift expand + logic: 1. Initialize dest_low to B. 2. Initialize every bit of dest_high to the sign bit of B. @@ -9080,20 +9081,30 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode, 5. Logic right shift D by (word_size - C). 6. Or the result of 4 and 5 to finalize dest_high. - While, by checking gimple statements, if operand B is coming from - signed extension, then we can simplify above expand logic into: + While, by checking gimple statements, if operand B is + coming from signed extension, then we can simplify above + expand logic into: 1. dest_high = src_low >> (word_size - C). 2. dest_low = src_low << C. - We can use one arithmetic right shift to finish all the purpose of - steps 2, 4, 5, 6, thus we reduce the steps needed from 6 into 2. */ + We can use one arithmetic right shift to finish all the + purpose of steps 2, 4, 5, 6, thus we reduce the steps + needed from 6 into 2. + + The case is similar for zero extension, except that we + initialize dest_high to zero rather than copies of the sign + bit from B. Furthermore, we need to use a logical right shift + in this case. + + The choice of sign-extension versus zero-extension is + determined entirely by whether or not B is signed and is + independent of the current setting of unsignedp. */ temp = NULL_RTX; if (code == LSHIFT_EXPR && target && REG_P (target) - && ! unsignedp && mode == GET_MODE_WIDER_MODE (word_mode) && GET_MODE_SIZE (mode) == 2 * GET_MODE_SIZE (word_mode) && TREE_CONSTANT (treeop1) @@ -9114,6 +9125,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode, rtx_insn *seq, *seq_old; unsigned int high_off = subreg_highpart_offset (word_mode, mode); + bool extend_unsigned + = TYPE_UNSIGNED (TREE_TYPE (gimple_assign_rhs1 (def))); rtx low = lowpart_subreg (word_mode, op0, mode); rtx dest_low = lowpart_subreg (word_mode, target, mode); rtx dest_high = simplify_gen_subreg (word_mode, target, @@ -9125,7 +9138,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode, start_sequence (); /* dest_high = src_low >> (word_size - C). */ temp = expand_variable_shift (RSHIFT_EXPR, word_mode, low, - rshift, dest_high, unsignedp); + rshift, dest_high, + extend_unsigned); if (temp != dest_high) emit_move_insn (dest_high, temp); diff --git a/gcc/testsuite/gcc.c-torture/execute/pr79121.c b/gcc/testsuite/gcc.c-torture/execute/pr79121.c new file mode 100644 index 0000000..9fca7fb --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr79121.c @@ -0,0 +1,34 @@ +extern void abort (void); + +__attribute__ ((noinline, noclone)) unsigned long long f1 (int x) +{ + return ((unsigned long long) x) << 4; +} + +__attribute__ ((noinline, noclone)) long long f2 (unsigned x) +{ + return ((long long) x) << 4; +} + +__attribute__ ((noinline, noclone)) unsigned long long f3 (unsigned x) +{ + return ((unsigned long long) x) << 4; +} + +__attribute__ ((noinline, noclone)) long long f4 (int x) +{ + return ((long long) x) << 4; +} + +int main () +{ + if (f1 (0xf0000000) != 0xffffffff00000000) + abort (); + if (f2 (0xf0000000) != 0xf00000000) + abort (); + if (f3 (0xf0000000) != 0xf00000000) + abort (); + if (f4 (0xf0000000) != 0xffffffff00000000) + abort (); + return 0; +}
On 01/19/2017 03:37 AM, Richard Earnshaw (lists) wrote: > On 18/01/17 21:07, Jeff Law wrote: >> On 01/18/2017 11:08 AM, Richard Earnshaw (lists) wrote: >>> PR 79121 is a silent wrong code regression where, when generating a >>> shift from an extended value moving from one to two machine registers, >>> the type of the right shift is for the most significant word should be >>> determined by the signedness of the inner type, not the signedness of >>> the result type. >>> >>> gcc: >>> PR rtl-optimization/79121 >>> * expr.c (expand_expr_real_2, case LSHIFT_EXPR): Look at the >>> signedness >>> of the inner type when shifting an extended value. >>> >>> testsuite: >>> * gcc.c-torture/execute/pr79121.c: New test. >>> >>> Bootstrapped on x86_64 and cross-tested on ARM. >> I had to refamiliarize myself with this code and nearly got the analysis >> wrong (again). >> >> Due to the copying of the low word into the high word we have to select >> the type of shift based on the type of the object that was the source of >> the NOP conversion. The code currently makes that determination based >> on the signedness of the shift, which is wrong. >> >> >> OK for the trunk. >> >> jeff >> >> > > Thanks, Jeff. I made some minor tweaks to the comments (adding a bit > more about signed vs unsigned) and committed the following. > > What about gcc-6? I'd think it should be fine for gcc-6 as well. I don't think that code has changed since it first went in. So the analysis and patch should both still apply. jeff
diff --git a/gcc/expr.c b/gcc/expr.c index 4c54faf..cae13a4 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -9093,7 +9093,6 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode, if (code == LSHIFT_EXPR && target && REG_P (target) - && ! unsignedp && mode == GET_MODE_WIDER_MODE (word_mode) && GET_MODE_SIZE (mode) == 2 * GET_MODE_SIZE (word_mode) && TREE_CONSTANT (treeop1) @@ -9114,6 +9113,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode, rtx_insn *seq, *seq_old; unsigned int high_off = subreg_highpart_offset (word_mode, mode); + bool extend_unsigned + = TYPE_UNSIGNED (TREE_TYPE (gimple_assign_rhs1 (def))); rtx low = lowpart_subreg (word_mode, op0, mode); rtx dest_low = lowpart_subreg (word_mode, target, mode); rtx dest_high = simplify_gen_subreg (word_mode, target, @@ -9125,7 +9126,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode, start_sequence (); /* dest_high = src_low >> (word_size - C). */ temp = expand_variable_shift (RSHIFT_EXPR, word_mode, low, - rshift, dest_high, unsignedp); + rshift, dest_high, + extend_unsigned); if (temp != dest_high) emit_move_insn (dest_high, temp); diff --git a/gcc/testsuite/gcc.c-torture/execute/pr79121.c b/gcc/testsuite/gcc.c-torture/execute/pr79121.c new file mode 100644 index 0000000..9fca7fb --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr79121.c @@ -0,0 +1,34 @@ +extern void abort (void); + +__attribute__ ((noinline, noclone)) unsigned long long f1 (int x) +{ + return ((unsigned long long) x) << 4; +} + +__attribute__ ((noinline, noclone)) long long f2 (unsigned x) +{ + return ((long long) x) << 4; +} + +__attribute__ ((noinline, noclone)) unsigned long long f3 (unsigned x) +{ + return ((unsigned long long) x) << 4; +} + +__attribute__ ((noinline, noclone)) long long f4 (int x) +{ + return ((long long) x) << 4; +} + +int main () +{ + if (f1 (0xf0000000) != 0xffffffff00000000) + abort (); + if (f2 (0xf0000000) != 0xf00000000) + abort (); + if (f3 (0xf0000000) != 0xf00000000) + abort (); + if (f4 (0xf0000000) != 0xffffffff00000000) + abort (); + return 0; +}