[ARM] Use vector wide add for mixed-mode adds

Kyrill,

I have attached a patch that address your comments. The only change I 
would ask you to re-consider renaming is the function 'bool 
aarch32_simd_check_vect_par_cnst_half'. This function was copied from 
the aarch64 port and I thought it as important to match the naming for 
maintenance purposes. I did rename the function to 'bool 
arm_simd_check_vect_par_cnst_half_p'. I changed 'aarch32' to 'arm' and 
added '_p' per you suggestions. Is this okay?

I implemented all your other change suggestions.

2015-12-16  Michael Collison  <michael.collison@linaro.org>

     * config/arm/neon.md (widen_<us>sum<mode>): New patterns where
     mode is VQI to improve mixed mode vectorization.
     * config/arm/neon.md (vec_sel_widen_ssum_lo<VQI:mode><VW:mode>3): New
     define_insn to match low half of signed vaddw.
     * config/arm/neon.md (vec_sel_widen_ssum_hi<VQI:mode><VW:mode>3): New
     define_insn to match high half of signed vaddw.
     * config/arm/neon.md (vec_sel_widen_usum_lo<VQI:mode><VW:mode>3): New
     define_insn to match low half of unsigned vaddw.
     * config/arm/neon.md (vec_sel_widen_usum_hi<VQI:mode><VW:mode>3): New
     define_insn to match high half of unsigned vaddw.
     * config/arm/arm.c (arm_simd_vect_par_cnst_half): New function.
     (arm_simd_check_vect_par_cnst_half_p): Likewise.
     * config/arm/arm-protos.h (arm_simd_vect_par_cnst_half): Prototype
     for new function.
     (arm_simd_check_vect_par_cnst_half_p): Likewise.
     * config/arm/predicates.md (vect_par_constant_high): Support
     big endian and simplify by calling
     arm_simd_check_vect_par_cnst_half
     (vect_par_constant_low): Likewise.
     * testsuite/gcc.target/arm/neon-vaddws16.c: New test.
     * testsuite/gcc.target/arm/neon-vaddws32.c: New test.
     * testsuite/gcc.target/arm/neon-vaddwu16.c: New test.
     * testsuite/gcc.target/arm/neon-vaddwu32.c: New test.
     * testsuite/gcc.target/arm/neon-vaddwu8.c: New test.
     * testsuite/lib/target-supports.exp
     (check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate
     that arm neon support vector widen sum of HImode TO SImode.

On 12/10/2015 08:09 AM, Kyrill Tkachov wrote:
> Hi Michael,

>

> A few comments while I look deeper into this patch...

>

> On 30/11/15 01:18, Michael Collison wrote:

>>

>> This is a modified version of my previous patch that supports vector 

>> wide add. I added support for vaddw on big endian when generating the 

>> parallel operand for the vector select.

>>

>> There are four failing test cases on arm big endian with similar 

>> code. They are:

>>

>> gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects execution test

>> gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects execution test

>> gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects execution test

>> gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects execution test

>>

>>

>> The failures occur without my patch and are related to a bug with 

>> vector loads using VUZP operations.

>>

>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68532

>>

>> Validated on arm-none-eabi, arm-none-linux-gnueabi, 

>> arm-none-linux-gnueabihf, and armeb-none-linux-gnueabihf.

>>

>> 2015-11-29  Michael Collison <michael.collison@linaro.org>

>>

>>     * config/arm/neon.md (widen_<us>sum<mode>): New patterns where

>>     mode is VQI to improve mixed mode vectorization.

>>     * config/arm/neon.md (vec_sel_widen_ssum_lo<VQI:mode><VW:mode>3): 

>> New

>>     define_insn to match low half of signed vaddw.

>>     * config/arm/neon.md (vec_sel_widen_ssum_hi<VQI:mode><VW:mode>3): 

>> New

>>     define_insn to match high half of signed vaddw.

>>     * config/arm/neon.md (vec_sel_widen_usum_lo<VQI:mode><VW:mode>3): 

>> New

>>     define_insn to match low half of unsigned vaddw.

>>     * config/arm/neon.md (vec_sel_widen_usum_hi<VQI:mode><VW:mode>3): 

>> New

>>     define_insn to match high half of unsigned vaddw.

>>     * config/arm/arm.c (aarch32_simd_vect_par_cnst_half): New function.

>>     (aarch32_simd_check_vect_par_cnst_half): Likewise.

>>     * config/arm/arm-protos.h (aarch32_simd_vect_par_cnst_half): 

>> Prototype

>>     for new function.

>>     (aarch32_simd_check_vect_par_cnst_half): Likewise.

>>     * config/arm/predicates.md (vect_par_constant_high): Support

>>     big endian and simplify by calling

>>     aarch32_simd_check_vect_par_cnst_half

>>     (vect_par_constant_low): Likewise.

>>     * testsuite/gcc.target/arm/neon-vaddws16.c: New test.

>>     * testsuite/gcc.target/arm/neon-vaddws32.c: New test.

>>     * testsuite/gcc.target/arm/neon-vaddwu16.c: New test.

>>     * testsuite/gcc.target/arm/neon-vaddwu32.c: New test.

>>     * testsuite/gcc.target/arm/neon-vaddwu8.c: New test.

>>     * testsuite/lib/target-supports.exp

>>     (check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate

>>     that arm neon support vector widen sum of HImode TO SImode.

>>

>> Okay for trunk?

>>

>

> --- a/gcc/config/arm/arm-protos.h

> +++ b/gcc/config/arm/arm-protos.h

> @@ -50,7 +50,9 @@ extern tree arm_builtin_decl (unsigned code, bool 

> initialize_p

>                    ATTRIBUTE_UNUSED);

>  extern void arm_init_builtins (void);

>  extern void arm_atomic_assign_expand_fenv (tree *hold, tree *clear, 

> tree *update);

> -

> +extern rtx aarch32_simd_vect_par_cnst_half (machine_mode mode, bool 

> high);

> +extern bool aarch32_simd_check_vect_par_cnst_half (rtx op, 

> machine_mode mode,

> +                           bool high);

>

>

> Please use arm instead of aarch32 in the name to be consistent with 

> the rest of the

> backend. Also, for functions that return a bool without side-effects 

> it's preferable

> to finish their name with '_p'. So for the second one I'd drop the 

> 'check' and call

> it something like "arm_vector_of_lane_nums_p ", is that a more 

> descriptive name?

>

> +/* Check OP for validity as a PARALLEL RTX vector with elements

> +   numbering the lanes of either the high (HIGH == TRUE) or low lanes,

> +   from the perspective of the architecture.  See the diagram above

> +   aarch64_simd_vect_par_cnst_half for more details.  */

> +

>

> aarch64?

>

> --- a/gcc/config/arm/neon.md

> +++ b/gcc/config/arm/neon.md

> @@ -1174,6 +1174,51 @@

>

>  ;; Widening operations

>

> +(define_expand "widen_ssum<mode>3"

> +  [(set (match_operand:<V_double_width> 0 "s_register_operand" "")

> +    (plus:<V_double_width> (sign_extend:<V_double_width> 

> (match_operand:VQI 1 "s_register_operand" ""))

> +                   (match_operand:<V_double_width> 2 

> "s_register_operand" "")))]

> +  "TARGET_NEON"

> +  {

> +    machine_mode mode = GET_MODE (operands[1]);

> +    rtx p1, p2;

> +

> +    p1  = aarch32_simd_vect_par_cnst_half (mode, false);

> +    p2  = aarch32_simd_vect_par_cnst_half (mode, true);

> +

> +    if (operands[0] != operands[2])

> +      emit_move_insn (operands[0], operands[2]);

> +

> +    emit_insn (gen_vec_sel_widen_ssum_lo<mode><V_half>3 (operands[0], 

> operands[1], p1, operands[0]));

> +    emit_insn (gen_vec_sel_widen_ssum_hi<mode><V_half>3 (operands[0], 

> operands[1], p2, operands[0]));

> +    DONE;

> +  }

>

> Please format these properly to avoid long lines.

> Thanks,

> Kyrill

>

>

-- 
Michael Collison
Linaro Toolchain Working Group
michael.collison@linaro.org

[ARM] Use vector wide add for mixed-mode adds

Commit Message

Patch