Message ID | alpine.DEB.2.02.1611041255001.15240@laptop-mg.saclay.inria.fr |
---|---|
State | Accepted |
Commit | 40fd269ab128d1f5fa7688d7d5e7298744c08cdd |
Headers | show |
On Fri, Nov 4, 2016 at 1:34 PM, Marc Glisse <marc.glisse@inria.fr> wrote: > Hello, > > this kind of simplification is already handled by fold_comparison, but the > code is common with TRUNC_DIV_EXPR, which yields suboptimal code. In > particular, for an unsigned number, X/8==0 means x<=7, while X/[ex]8==0 can > be turned into X==0. > > When we have an explicit division by 0, there is a better optimisation > possible (insert __builtin_unreachable() or __builtin_trap() after that > statement, as in find_explicit_erroneous_behavior), so I don't touch it. > > For the overflow inequality case, it would have been a bit clearer to > generate > (cmp { build_zero_cst (TREE_TYPE (@0)); } @2) > and let that be constant folded instead of having that ugly and error-prone > test in constant_boolean_node, but I saved one tree ;-) > > Bootstrap+regtest on powerpc64le-unknown-linux-gnu, all the regressions are > in the libitm part of the testsuite, they should be fixed by > https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02220.html , I'll rerun the > testsuite when that patch is in. Ok. You fail to handle A /[ex] -2 < 2? Is that on purpose? Or just lazy so you dont' have to handle inverting the comparison? Thanks, Richard. > 2016-11-07 Marc Glisse <marc.glisse@inria.fr> > > gcc/ > * fold-const.c (fold_comparison): Ignore EXACT_DIV_EXPR. > * match.pd (A /[ex] B CMP C): New simplifications. > > gcc/testsuite/ > * gcc.dg/tree-ssa/cmpexactdiv.c: New file. > > -- > Marc Glisse
On Fri, Nov 4, 2016 at 2:15 PM, Richard Biener <richard.guenther@gmail.com> wrote: > On Fri, Nov 4, 2016 at 1:34 PM, Marc Glisse <marc.glisse@inria.fr> wrote: >> Hello, >> >> this kind of simplification is already handled by fold_comparison, but the >> code is common with TRUNC_DIV_EXPR, which yields suboptimal code. In >> particular, for an unsigned number, X/8==0 means x<=7, while X/[ex]8==0 can >> be turned into X==0. >> >> When we have an explicit division by 0, there is a better optimisation >> possible (insert __builtin_unreachable() or __builtin_trap() after that >> statement, as in find_explicit_erroneous_behavior), so I don't touch it. >> >> For the overflow inequality case, it would have been a bit clearer to >> generate >> (cmp { build_zero_cst (TREE_TYPE (@0)); } @2) >> and let that be constant folded instead of having that ugly and error-prone >> test in constant_boolean_node, but I saved one tree ;-) >> >> Bootstrap+regtest on powerpc64le-unknown-linux-gnu, all the regressions are >> in the libitm part of the testsuite, they should be fixed by >> https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02220.html , I'll rerun the >> testsuite when that patch is in. > > Ok. > > You fail to handle A /[ex] -2 < 2? Is that on purpose? Or just lazy so you > dont' have to handle inverting the comparison? Oh, I suppose nothing generates exact divisions by negative numbers. Richard. > Thanks, > Richard. > >> 2016-11-07 Marc Glisse <marc.glisse@inria.fr> >> >> gcc/ >> * fold-const.c (fold_comparison): Ignore EXACT_DIV_EXPR. >> * match.pd (A /[ex] B CMP C): New simplifications. >> >> gcc/testsuite/ >> * gcc.dg/tree-ssa/cmpexactdiv.c: New file. >> >> -- >> Marc Glisse
On Fri, 4 Nov 2016, Richard Biener wrote: > On Fri, Nov 4, 2016 at 2:15 PM, Richard Biener > <richard.guenther@gmail.com> wrote: >> On Fri, Nov 4, 2016 at 1:34 PM, Marc Glisse <marc.glisse@inria.fr> wrote: >>> Hello, >>> >>> this kind of simplification is already handled by fold_comparison, but the >>> code is common with TRUNC_DIV_EXPR, which yields suboptimal code. In >>> particular, for an unsigned number, X/8==0 means x<=7, while X/[ex]8==0 can >>> be turned into X==0. >>> >>> When we have an explicit division by 0, there is a better optimisation >>> possible (insert __builtin_unreachable() or __builtin_trap() after that >>> statement, as in find_explicit_erroneous_behavior), so I don't touch it. >>> >>> For the overflow inequality case, it would have been a bit clearer to >>> generate >>> (cmp { build_zero_cst (TREE_TYPE (@0)); } @2) >>> and let that be constant folded instead of having that ugly and error-prone >>> test in constant_boolean_node, but I saved one tree ;-) >>> >>> Bootstrap+regtest on powerpc64le-unknown-linux-gnu, all the regressions are >>> in the libitm part of the testsuite, they should be fixed by >>> https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02220.html , I'll rerun the >>> testsuite when that patch is in. >> >> Ok. >> >> You fail to handle A /[ex] -2 < 2? Is that on purpose? Or just lazy so you >> dont' have to handle inverting the comparison? > > Oh, I suppose nothing generates exact divisions by negative numbers. Yes, that :-) I think I'd rather wait until I can test it before simplifying those too much. -- Marc Glisse
Index: gcc/fold-const.c =================================================================== --- gcc/fold-const.c (revision 241840) +++ gcc/fold-const.c (working copy) @@ -8740,22 +8740,21 @@ fold_comparison (location_t loc, enum tr SET_EXPR_LOCATION (tem, loc); return tem; } return fold_build2_loc (loc, code, type, cval1, cval2); } } } /* We can fold X/C1 op C2 where C1 and C2 are integer constants into a single range test. */ - if ((TREE_CODE (arg0) == TRUNC_DIV_EXPR - || TREE_CODE (arg0) == EXACT_DIV_EXPR) + if (TREE_CODE (arg0) == TRUNC_DIV_EXPR && TREE_CODE (arg1) == INTEGER_CST && TREE_CODE (TREE_OPERAND (arg0, 1)) == INTEGER_CST && !integer_zerop (TREE_OPERAND (arg0, 1)) && !TREE_OVERFLOW (TREE_OPERAND (arg0, 1)) && !TREE_OVERFLOW (arg1)) { tem = fold_div_compare (loc, code, type, arg0, arg1); if (tem != NULL_TREE) return tem; } Index: gcc/match.pd =================================================================== --- gcc/match.pd (revision 241840) +++ gcc/match.pd (working copy) @@ -2314,20 +2314,50 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (ne @0 { build_real (TREE_TYPE (@0), c2); })))) /* sqrt(x) < c is the same as x < c*c, if we ignore NaNs. */ (if (! HONOR_NANS (@0)) (cmp @0 { build_real (TREE_TYPE (@0), c2); }) /* sqrt(x) < c is the same as x >= 0 && x < c*c. */ (if (GENERIC) (truth_andif (ge @0 { build_real (TREE_TYPE (@0), dconst0); }) (cmp @0 { build_real (TREE_TYPE (@0), c2); })))))))))))) +/* Fold A /[ex] B CMP C to A CMP B * C. */ +(for cmp (eq ne) + (simplify + (cmp (exact_div @0 @1) INTEGER_CST@2) + (if (!integer_zerop (@1)) + (if (wi::eq_p (@2, 0)) + (cmp @0 @2) + (if (TREE_CODE (@1) == INTEGER_CST) + (with + { + bool ovf; + wide_int prod = wi::mul (@2, @1, TYPE_SIGN (TREE_TYPE (@1)), &ovf); + } + (if (ovf) + { constant_boolean_node (cmp == NE_EXPR, type); } + (cmp @0 { wide_int_to_tree (TREE_TYPE (@0), prod); })))))))) +(for cmp (lt le gt ge) + (simplify + (cmp (exact_div @0 INTEGER_CST@1) INTEGER_CST@2) + (if (wi::gt_p (@1, 0, TYPE_SIGN (TREE_TYPE (@1)))) + (with + { + bool ovf; + wide_int prod = wi::mul (@2, @1, TYPE_SIGN (TREE_TYPE (@1)), &ovf); + } + (if (ovf) + { constant_boolean_node (wi::lt_p (@2, 0, TYPE_SIGN (TREE_TYPE (@2))) + != (cmp == LT_EXPR || cmp == LE_EXPR), type); } + (cmp @0 { wide_int_to_tree (TREE_TYPE (@0), prod); })))))) + /* Unordered tests if either argument is a NaN. */ (simplify (bit_ior (unordered @0 @0) (unordered @1 @1)) (if (types_match (@0, @1)) (unordered @0 @1))) (simplify (bit_and (ordered @0 @0) (ordered @1 @1)) (if (types_match (@0, @1)) (ordered @0 @1))) (simplify Index: gcc/testsuite/gcc.dg/tree-ssa/cmpexactdiv.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/cmpexactdiv.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/cmpexactdiv.c (working copy) @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fdump-tree-optimized" } */ + +int f(int *p, int *q){ + __SIZE_TYPE__ n = q - p; + return n == 0; +} + +int g(int *p, int *q){ + __PTRDIFF_TYPE__ n = q - p; + return n <= 2; +} + +int h(long *p, long *q){ + __SIZE_TYPE__ n = q - p; + return n == (__SIZE_TYPE__)(-1)/2; +} + +/* { dg-final { scan-tree-dump-not "== 0" "optimized" } } */ +/* { dg-final { scan-tree-dump "<= 8" "optimized" { target { int32 } } } } */ +/* { dg-final { scan-tree-dump "return 0" "optimized" } } */