diff mbox

Inline across -ffast-math boundary

Message ID CAKdteOaWtkgVLB0jkTDGd9XYXJcqXyXCGg6WFP4CUoYCG-wQ4Q@mail.gmail.com
State New
Headers show

Commit Message

Christophe Lyon May 3, 2016, 12:04 p.m. UTC
On 3 May 2016 at 10:09, Richard Biener <rguenther@suse.de> wrote:
> On Tue, 3 May 2016, Christophe Lyon wrote:

>

>> On 2 May 2016 at 19:01, Jan Hubicka <hubicka@ucw.cz> wrote:

>> >> On Thu, 21 Apr 2016, Jan Hubicka wrote:

>> >>

>> >> > Hi,

>> >> > this patch implements the long promised logic to inline across -ffast-math

>> >> > boundary when eitehr caller or callee has no fp operations in it.  This is

>> >> > needed to resolve code quality regression on Firefox with LTO where

>> >> > -O3/-O2/-Ofast flags are combined and we fail to inline a lot of comdats

>> >> > otherwise.

>> >> >

>> >> > Bootstrapped/regtested x86_64-linux. Ricahrd, I would like to know your opinion

>> >> > on fp_expression_p predicate - it is bit ugly but I do not know how to implement

>> >> > it better.

>> >> >

>> >> > We still won't inline -O1 code into -O2+ because flag_strict_overflow differs.

>> >> > I will implement similar logic for overflows incrementally. Similarly flag_errno_math

>> >> > can be handled better, but I am not sure it matters - I think wast majority of time

>> >> > users set errno_math in sync with other -ffast-math implied flags.

>> >>

>> >> Note that for reasons PR70586 shows (const functions having possible

>> >> trapping side-effect because of FP math or division) we'd like to

>> >> have sth like "uses FP math" "uses possibly trapping integer math"

>> >> "uses integer math with undefined overflow" on a per function level

>> >> and propagated alongside pure/const/nothrow state.

>> >>

>> >> So maybe you can fit that into a more suitable place than just the

>> >> inliner (which of course is interested in "uses FP math locally",

>> >> not the transitive answer we need for PR70586).

>> >

>> > We don't really have much more suitable place - ipa-inline-analysis is

>> > doing most of the analysis of function body that is usefull for IPA passes,

>> > not only for inliner. It should be renamed perhaps to something like

>> > function_body_summary.  I will do that later this stage1.

>> >

>> > For PR70686 in addition to transitive answer we will need to know that

>> > the transformation is win. Const function may take a lot of time and

>> > introducing new call on code path that did not used it previously is

>> > bad idea unless we know that the function is very cheap (which may

>> > be true only for fast builtins, I don't know)

>> >

>> > This patch impleemnts the suggested check for presence of FP parameters.

>> > We can play with special casing the moves incrementally.

>> >

>> > Bootstrapped/regtested x86_64-linux.

>> >

>> > Index: testsuite/ChangeLog

>> > ===================================================================

>> > --- testsuite/ChangeLog (revision 235765)

>> > +++ testsuite/ChangeLog (working copy)

>> > @@ -1,3 +1,7 @@

>> > +2016-05-02  Jan Hubicka  <hubicka@ucw.cz>

>> > +

>> > +       * gcc.dg/ipa/inline-8.c: New testcase.

>> > +

>> >  2016-05-02  Jakub Jelinek  <jakub@redhat.com>

>> >

>> >         PR rtl-optimization/70467

>> > Index: testsuite/gcc.dg/ipa/inline-8.c

>> > ===================================================================

>> > --- testsuite/gcc.dg/ipa/inline-8.c     (revision 0)

>> > +++ testsuite/gcc.dg/ipa/inline-8.c     (working copy)

>> > @@ -0,0 +1,36 @@

>> > +/* Verify that we do not inline isnanf test info -ffast-math code but that we

>> > +   do inline trivial functions across -Ofast boundary.  */

>> > +/* { dg-do run } */

>> > +/* { dg-options "-O2"  } */

>> > +#include <math.h>

>> > +/* Can't be inlined because isnanf will be optimized out.  */

>> > +int

>> > +cmp (float a)

>> > +{

>> > +  return isnanf (a);

>>

>> Hi,

>>

>> This new testcase does not pass on bare-metal configs (using newlib), because:

>> warning: implicit declaration of function 'isnanf'

>> [-Wimplicit-function-declaration]

>> warning: incompatible implicit declaration of built-in function 'isnanf'

>>

>> I'm not sure what's the appropriate dg-require to avoid this?

>

> c99_runtime I guess.

>

Indeed, that what was used in a previous occurrence of a similar
problem (PR 69227).

I've attached the small (obvious?) patch to make the new inline-8.c
test UNSUPPORTED
without c99_runtime.

OK?

Christophe.

> Richard.

>

>> Christophe

>>

>> > +}

>> > +/* Can be inlined.  */

>> > +int

>> > +move (int a)

>> > +{

>> > +  return a;

>> > +}

>> > +float a;

>> > +void

>> > +set ()

>> > +{

>> > + a=nan("");

>> > +}

>> > +float b;

>> > +__attribute__ ((optimize("Ofast")))

>> > +int

>> > +main()

>> > +{

>> > +  b++;

>> > +  if (cmp(a))

>> > +    __builtin_abort ();

>> > +  float a = move (1);

>> > +  if (!__builtin_constant_p (a))

>> > +    __builtin_abort ();

>> > +  return 0;

>> > +}

>> > Index: ChangeLog

>> > ===================================================================

>> > --- ChangeLog   (revision 235765)

>> > +++ ChangeLog   (working copy)

>> > @@ -1,3 +1,18 @@

>> > +2016-05-02  Jan Hubicka  <hubicka@ucw.cz>

>> > +

>> > +       * ipa-inline-analysis.c (reset_inline_summary): Clear fp_expressions

>> > +       (dump_inline_summary): Dump it.

>> > +       (fp_expression_p): New predicate.

>> > +       (estimate_function_body_sizes): Use it.

>> > +       (inline_merge_summary): Merge fp_expressions.

>> > +       (inline_read_section): Read fp_expressions.

>> > +       (inline_write_summary): Write fp_expressions.

>> > +       * ipa-inline.c (can_inline_edge_p): Permit inlining across fp math

>> > +       codegen boundary if either caller or callee is !fp_expressions.

>> > +       * ipa-inline.h (inline_summary): Add fp_expressions.

>> > +       * ipa-inline-transform.c (inline_call): When inlining !fp_expressions

>> > +       to fp_expressions be sure the fp generation flags are updated.

>> > +

>> >  2016-05-02  Jakub Jelinek  <jakub@redhat.com>

>> >

>> >         PR rtl-optimization/70467

>> > Index: ipa-inline-analysis.c

>> > ===================================================================

>> > --- ipa-inline-analysis.c       (revision 235693)

>> > +++ ipa-inline-analysis.c       (working copy)

>> > @@ -850,7 +850,8 @@ evaluate_conditions_for_known_args (stru

>> >           if (known_aggs.exists ())

>> >             {

>> >               agg = known_aggs[c->operand_num];

>> > -             val = ipa_find_agg_cst_for_param (agg, c->offset, c->by_ref);

>> > +             val = ipa_find_agg_cst_for_param (agg, known_vals[c->operand_num],

>> > +                                               c->offset, c->by_ref);

>> >             }

>> >           else

>> >             val = NULL_TREE;

>> > @@ -1069,6 +1070,7 @@ reset_inline_summary (struct cgraph_node

>> >      reset_inline_edge_summary (e);

>> >    for (e = node->indirect_calls; e; e = e->next_callee)

>> >      reset_inline_edge_summary (e);

>> > +  info->fp_expressions = false;

>> >  }

>> >

>> >  /* Hook that is called by cgraph.c when a node is removed.  */

>> > @@ -1423,6 +1425,8 @@ dump_inline_summary (FILE *f, struct cgr

>> >         fprintf (f, " inlinable");

>> >        if (s->contains_cilk_spawn)

>> >         fprintf (f, " contains_cilk_spawn");

>> > +      if (s->fp_expressions)

>> > +       fprintf (f, " fp_expression");

>> >        fprintf (f, "\n  self time:       %i\n", s->self_time);

>> >        fprintf (f, "  global time:     %i\n", s->time);

>> >        fprintf (f, "  self size:       %i\n", s->self_size);

>> > @@ -2459,6 +2463,21 @@ clobber_only_eh_bb_p (basic_block bb, bo

>> >    return true;

>> >  }

>> >

>> > +/* Return true if STMT compute a floating point expression that may be affected

>> > +   by -ffast-math and similar flags.  */

>> > +

>> > +static bool

>> > +fp_expression_p (gimple *stmt)

>> > +{

>> > +  ssa_op_iter i;

>> > +  tree op;

>> > +

>> > +  FOR_EACH_SSA_TREE_OPERAND (op, stmt, i, SSA_OP_DEF|SSA_OP_USE)

>> > +    if (FLOAT_TYPE_P (TREE_TYPE (op)))

>> > +      return true;

>> > +  return false;

>> > +}

>> > +

>> >  /* Compute function body size parameters for NODE.

>> >     When EARLY is true, we compute only simple summaries without

>> >     non-trivial predicates to drive the early inliner.  */

>> > @@ -2733,6 +2752,13 @@ estimate_function_body_sizes (struct cgr

>> >                                        this_time * (2 - prob), &p);

>> >                 }

>> >

>> > +             if (!info->fp_expressions && fp_expression_p (stmt))

>> > +               {

>> > +                 info->fp_expressions = true;

>> > +                 if (dump_file)

>> > +                   fprintf (dump_file, "   fp_expression set\n");

>> > +               }

>> > +

>> >               gcc_assert (time >= 0);

>> >               gcc_assert (size >= 0);

>> >             }

>> > @@ -3577,6 +3603,8 @@ inline_merge_summary (struct cgraph_edge

>> >    else

>> >      toplev_predicate = true_predicate ();

>> >

>> > +  info->fp_expressions |= callee_info->fp_expressions;

>> > +

>> >    if (callee_info->conds)

>> >      evaluate_properties_for_edge (edge, true, &clause, NULL, NULL, NULL);

>> >    if (ipa_node_params_sum && callee_info->conds)

>> > @@ -4229,6 +4257,7 @@ inline_read_section (struct lto_file_dec

>> >        bp = streamer_read_bitpack (&ib);

>> >        info->inlinable = bp_unpack_value (&bp, 1);

>> >        info->contains_cilk_spawn = bp_unpack_value (&bp, 1);

>> > +      info->fp_expressions = bp_unpack_value (&bp, 1);

>> >

>> >        count2 = streamer_read_uhwi (&ib);

>> >        gcc_assert (!info->conds);

>> > @@ -4395,6 +4424,7 @@ inline_write_summary (void)

>> >           bp = bitpack_create (ob->main_stream);

>> >           bp_pack_value (&bp, info->inlinable, 1);

>> >           bp_pack_value (&bp, info->contains_cilk_spawn, 1);

>> > +         bp_pack_value (&bp, info->fp_expressions, 1);

>> >           streamer_write_bitpack (&bp);

>> >           streamer_write_uhwi (ob, vec_safe_length (info->conds));

>> >           for (i = 0; vec_safe_iterate (info->conds, i, &c); i++)

>> > Index: ipa-inline.c

>> > ===================================================================

>> > --- ipa-inline.c        (revision 235693)

>> > +++ ipa-inline.c        (working copy)

>> > @@ -338,7 +338,7 @@ can_inline_edge_p (struct cgraph_edge *e

>> >    else if (e->call_stmt_cannot_inline_p)

>> >      {

>> >        if (e->inline_failed != CIF_FUNCTION_NOT_OPTIMIZED)

>> > -        e->inline_failed = CIF_MISMATCHED_ARGUMENTS;

>> > +        e->inline_failed = e->caller->thunk.thunk_p ? CIF_THUNK : CIF_MISMATCHED_ARGUMENTS;

>> >        inlinable = false;

>> >      }

>> >    /* Don't inline if the functions have different EH personalities.  */

>> > @@ -396,6 +396,8 @@ can_inline_edge_p (struct cgraph_edge *e

>> >              (DECL_DISREGARD_INLINE_LIMITS (callee->decl)

>> >               && lookup_attribute ("always_inline",

>> >                                    DECL_ATTRIBUTES (callee->decl)));

>> > +      inline_summary *caller_info = inline_summaries->get (caller);

>> > +      inline_summary *callee_info = inline_summaries->get (callee);

>> >

>> >       /* Until GCC 4.9 we did not check the semantics alterning flags

>> >         bellow and inline across optimization boundry.

>> > @@ -417,16 +419,21 @@ can_inline_edge_p (struct cgraph_edge *e

>> >                    == !opt_for_fn (callee->decl, optimize) || !always_inline))

>> >               || check_match (flag_wrapv)

>> >               || check_match (flag_trapv)

>> > -             /* Strictly speaking only when the callee uses FP math.  */

>> > -             || check_maybe_up (flag_rounding_math)

>> > -             || check_maybe_up (flag_trapping_math)

>> > -             || check_maybe_down (flag_unsafe_math_optimizations)

>> > -             || check_maybe_down (flag_finite_math_only)

>> > -             || check_maybe_up (flag_signaling_nans)

>> > -             || check_maybe_down (flag_cx_limited_range)

>> > -             || check_maybe_up (flag_signed_zeros)

>> > -             || check_maybe_down (flag_associative_math)

>> > -             || check_maybe_down (flag_reciprocal_math)

>> > +             /* When caller or callee does FP math, be sure FP codegen flags

>> > +                compatible.  */

>> > +             || ((caller_info->fp_expressions && callee_info->fp_expressions)

>> > +                 && (check_maybe_up (flag_rounding_math)

>> > +                     || check_maybe_up (flag_trapping_math)

>> > +                     || check_maybe_down (flag_unsafe_math_optimizations)

>> > +                     || check_maybe_down (flag_finite_math_only)

>> > +                     || check_maybe_up (flag_signaling_nans)

>> > +                     || check_maybe_down (flag_cx_limited_range)

>> > +                     || check_maybe_up (flag_signed_zeros)

>> > +                     || check_maybe_down (flag_associative_math)

>> > +                     || check_maybe_down (flag_reciprocal_math)

>> > +                     /* Strictly speaking only when the callee contains function

>> > +                        calls that may end up setting errno.  */

>> > +                     || check_maybe_up (flag_errno_math)))

>> >               /* We do not want to make code compiled with exceptions to be

>> >                  brought into a non-EH function unless we know that the callee

>> >                  does not throw.

>> > @@ -435,9 +442,6 @@ can_inline_edge_p (struct cgraph_edge *e

>> >                   && DECL_FUNCTION_PERSONALITY (callee->decl))

>> >               || (check_maybe_up (flag_exceptions)

>> >                   && DECL_FUNCTION_PERSONALITY (callee->decl))

>> > -             /* Strictly speaking only when the callee contains function

>> > -                calls that may end up setting errno.  */

>> > -             || check_maybe_up (flag_errno_math)

>> >               /* When devirtualization is diabled for callee, it is not safe

>> >                  to inline it as we possibly mangled the type info.

>> >                  Allow early inlining of always inlines.  */

>> > Index: ipa-inline.h

>> > ===================================================================

>> > --- ipa-inline.h        (revision 235693)

>> > +++ ipa-inline.h        (working copy)

>> > @@ -132,6 +132,8 @@ struct GTY(()) inline_summary

>> >    /* True wen there is only one caller of the function before small function

>> >       inlining.  */

>> >    unsigned int single_caller : 1;

>> > +  /* True if function contains any floating point expressions.  */

>> > +  unsigned int fp_expressions : 1;

>> >

>> >    /* Information about function that will result after applying all the

>> >       inline decisions present in the callgraph.  Generally kept up to

>> > Index: ipa-inline-transform.c

>> > ===================================================================

>> > --- ipa-inline-transform.c      (revision 235693)

>> > +++ ipa-inline-transform.c      (working copy)

>> > @@ -338,6 +338,63 @@ inline_call (struct cgraph_edge *e, bool

>> >        DECL_FUNCTION_SPECIFIC_OPTIMIZATION (to->decl)

>> >          = build_optimization_node (&opts);

>> >      }

>> > +  inline_summary *caller_info = inline_summaries->get (to);

>> > +  inline_summary *callee_info = inline_summaries->get (callee);

>> > +  if (!caller_info->fp_expressions && callee_info->fp_expressions)

>> > +    {

>> > +      caller_info->fp_expressions = true;

>> > +      if (opt_for_fn (callee->decl, flag_rounding_math)

>> > +         != opt_for_fn (to->decl, flag_rounding_math)

>> > +         || opt_for_fn (callee->decl, flag_trapping_math)

>> > +            != opt_for_fn (to->decl, flag_trapping_math)

>> > +         || opt_for_fn (callee->decl, flag_unsafe_math_optimizations)

>> > +            != opt_for_fn (to->decl, flag_unsafe_math_optimizations)

>> > +         || opt_for_fn (callee->decl, flag_finite_math_only)

>> > +            != opt_for_fn (to->decl, flag_finite_math_only)

>> > +         || opt_for_fn (callee->decl, flag_signaling_nans)

>> > +            != opt_for_fn (to->decl, flag_signaling_nans)

>> > +         || opt_for_fn (callee->decl, flag_cx_limited_range)

>> > +            != opt_for_fn (to->decl, flag_cx_limited_range)

>> > +         || opt_for_fn (callee->decl, flag_signed_zeros)

>> > +            != opt_for_fn (to->decl, flag_signed_zeros)

>> > +         || opt_for_fn (callee->decl, flag_associative_math)

>> > +            != opt_for_fn (to->decl, flag_associative_math)

>> > +         || opt_for_fn (callee->decl, flag_reciprocal_math)

>> > +            != opt_for_fn (to->decl, flag_reciprocal_math)

>> > +         || opt_for_fn (callee->decl, flag_errno_math)

>> > +            != opt_for_fn (to->decl, flag_errno_math))

>> > +       {

>> > +         struct gcc_options opts = global_options;

>> > +

>> > +         cl_optimization_restore (&opts, opts_for_fn (to->decl));

>> > +         opts.x_flag_rounding_math

>> > +           = opt_for_fn (callee->decl, flag_rounding_math);

>> > +         opts.x_flag_trapping_math

>> > +           = opt_for_fn (callee->decl, flag_trapping_math);

>> > +         opts.x_flag_unsafe_math_optimizations

>> > +           = opt_for_fn (callee->decl, flag_unsafe_math_optimizations);

>> > +         opts.x_flag_finite_math_only

>> > +           = opt_for_fn (callee->decl, flag_finite_math_only);

>> > +         opts.x_flag_signaling_nans

>> > +           = opt_for_fn (callee->decl, flag_signaling_nans);

>> > +         opts.x_flag_cx_limited_range

>> > +           = opt_for_fn (callee->decl, flag_cx_limited_range);

>> > +         opts.x_flag_signed_zeros

>> > +           = opt_for_fn (callee->decl, flag_signed_zeros);

>> > +         opts.x_flag_associative_math

>> > +           = opt_for_fn (callee->decl, flag_associative_math);

>> > +         opts.x_flag_reciprocal_math

>> > +           = opt_for_fn (callee->decl, flag_reciprocal_math);

>> > +         opts.x_flag_errno_math

>> > +           = opt_for_fn (callee->decl, flag_errno_math);

>> > +         if (dump_file)

>> > +           fprintf (dump_file, "Copying FP flags from %s:%i to %s:%i\n",

>> > +                    callee->name (), callee->order, to->name (), to->order);

>> > +         build_optimization_node (&opts);

>> > +         DECL_FUNCTION_SPECIFIC_OPTIMIZATION (to->decl)

>> > +            = build_optimization_node (&opts);

>> > +       }

>> > +    }

>> >

>> >    /* If aliases are involved, redirect edge to the actual destination and

>> >       possibly remove the aliases.  */

>>

>>

>

> --

> Richard Biener <rguenther@suse.de>

> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
2016-05-03  Christophe Lyon  <christophe.lyon@linaro.org>

	* gcc.dg/ipa/inline-8.c: Require c99_runtime.
diff mbox

Patch

diff --git a/gcc/testsuite/gcc.dg/ipa/inline-8.c b/gcc/testsuite/gcc.dg/ipa/inline-8.c
index a4ac5d0..9629064 100644
--- a/gcc/testsuite/gcc.dg/ipa/inline-8.c
+++ b/gcc/testsuite/gcc.dg/ipa/inline-8.c
@@ -1,7 +1,9 @@ 
 /* Verify that we do not inline isnanf test info -ffast-math code but that we
    do inline trivial functions across -Ofast boundary.  */
 /* { dg-do run } */
+/* { dg-require-effective-target c99_runtime } */
 /* { dg-options "-O2"  } */
+/* { dg-add-options c99_runtime } */
 #include <math.h>
 /* Can't be inlined because isnanf will be optimized out.  */
 int