Message ID | 87lgh6bsmr.fsf@linaro.org |
---|---|
State | New |
Series | Don't use permutes for single-element accesses (PR83753) |
On Tue, Jan 9, 2018 at 10:59 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> After cunrolling the inner loop, the remaining loop in the testcase
> has a single 32-bit access and a group of 64-bit accesses. We first
> try to vectorise at 128 bits (VF 4), but decide not to for cost reasons.
> We then try with 64 bits (VF 2) instead. This means that the group
> of 64-bit accesses uses a single-element vector, which is deliberately
> supported as of r251538. We then try to create "permutes" for these
> single-element vectors and fall foul of:
>
>   for (i = 0; i < 6; i++)
>     sel[i] += exact_div (nelt, 2);
>
> in vect_grouped_store_supported, since nelt==1.
>
> Maybe we shouldn't even be trying to vectorise statements in the
> single-element case, and instead just copy the scalar statement
> for each member of the group. But until then, this patch treats
> non-strided grouped accesses as VMAT_CONTIGUOUS if no permutation
> is necessary.
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?

Ok.

Richard.

> Richard
>
>
> 2018-01-09  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
>         PR tree-optimization/83753
>         * tree-vect-stmts.c (get_group_load_store_type): Use VMAT_CONTIGUOUS
>         for non-strided grouped accesses if the number of elements is 1.
>
> gcc/testsuite/
>         PR tree-optimization/83753
>         * gcc.dg/torture/pr83753.c: New test.
>
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c 2018-01-09 15:46:34.439449019 +0000
> +++ gcc/tree-vect-stmts.c 2018-01-09 18:15:53.481983778 +0000
> @@ -1849,10 +1849,16 @@ get_group_load_store_type (gimple *stmt,
>           && (can_overrun_p || !would_overrun_p)
>           && compare_step_with_zero (stmt) > 0)
>         {
> -         /* First try using LOAD/STORE_LANES. */
> -         if (vls_type == VLS_LOAD
> -             ? vect_load_lanes_supported (vectype, group_size)
> -             : vect_store_lanes_supported (vectype, group_size))
> +         /* First cope with the degenerate case of a single-element
> +            vector. */
> +         if (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 1U))
> +           *memory_access_type = VMAT_CONTIGUOUS;
> +
> +         /* Otherwise try using LOAD/STORE_LANES. */
> +         if (*memory_access_type == VMAT_ELEMENTWISE
> +             && (vls_type == VLS_LOAD
> +                 ? vect_load_lanes_supported (vectype, group_size)
> +                 : vect_store_lanes_supported (vectype, group_size)))
>             {
>               *memory_access_type = VMAT_LOAD_STORE_LANES;
>               overrun_p = would_overrun_p;
> Index: gcc/testsuite/gcc.dg/torture/pr83753.c
> ===================================================================
> --- /dev/null 2018-01-08 18:48:58.045015662 +0000
> +++ gcc/testsuite/gcc.dg/torture/pr83753.c 2018-01-09 18:15:53.480983817 +0000
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mcpu=xgene1" { target aarch64*-*-* } } */
> +
> +typedef struct {
> +  int m1[10];
> +  double m2[10][8];
> +} blah;
> +
> +void
> +foo (blah *info) {
> +  int i, d;
> +
> +  for (d=0; d<10; d++) {
> +    info->m1[d] = 0;
> +    info->m2[d][0] = 1;
> +    for (i=1; i<8; i++)
> +      info->m2[d][i] = 2;
> +  }
> +}
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c 2018-01-09 15:46:34.439449019 +0000
+++ gcc/tree-vect-stmts.c 2018-01-09 18:15:53.481983778 +0000
@@ -1849,10 +1849,16 @@ get_group_load_store_type (gimple *stmt,
          && (can_overrun_p || !would_overrun_p)
          && compare_step_with_zero (stmt) > 0)
        {
-         /* First try using LOAD/STORE_LANES. */
-         if (vls_type == VLS_LOAD
-             ? vect_load_lanes_supported (vectype, group_size)
-             : vect_store_lanes_supported (vectype, group_size))
+         /* First cope with the degenerate case of a single-element
+            vector. */
+         if (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 1U))
+           *memory_access_type = VMAT_CONTIGUOUS;
+
+         /* Otherwise try using LOAD/STORE_LANES. */
+         if (*memory_access_type == VMAT_ELEMENTWISE
+             && (vls_type == VLS_LOAD
+                 ? vect_load_lanes_supported (vectype, group_size)
+                 : vect_store_lanes_supported (vectype, group_size)))
            {
              *memory_access_type = VMAT_LOAD_STORE_LANES;
              overrun_p = would_overrun_p;
Index: gcc/testsuite/gcc.dg/torture/pr83753.c
===================================================================
--- /dev/null 2018-01-08 18:48:58.045015662 +0000
+++ gcc/testsuite/gcc.dg/torture/pr83753.c 2018-01-09 18:15:53.480983817 +0000
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=xgene1" { target aarch64*-*-* } } */
+
+typedef struct {
+  int m1[10];
+  double m2[10][8];
+} blah;
+
+void
+foo (blah *info) {
+  int i, d;
+
+  for (d=0; d<10; d++) {
+    info->m1[d] = 0;
+    info->m2[d][0] = 1;
+    for (i=1; i<8; i++)
+      info->m2[d][i] = 2;
+  }
+}
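
For readers who want to see the selection order in isolation, here is a minimal standalone sketch of the control flow the patch sets up. It is a toy model and not GCC code: the enum, `choose_access_type`, `halves_evenly`, and the two boolean parameters are hypothetical stand-ins, and the final permute fallback is an assumption about the code that follows the hunk, which the patch itself does not show. `halves_evenly` models the assumed precondition of `exact_div (nelt, 2)`, namely that `nelt` is a multiple of 2, which cannot hold when `nelt == 1`.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the vectoriser's memory access kinds.  */
enum access_type
{
  ACCESS_ELEMENTWISE,
  ACCESS_CONTIGUOUS,
  ACCESS_LOAD_STORE_LANES,
  ACCESS_CONTIGUOUS_PERMUTE
};

/* Models the assumed precondition of exact_div (nelt, 2): the division
   must leave no remainder.  A single-element vector fails this.  */
bool
halves_evenly (unsigned nelt)
{
  return nelt % 2 == 0;
}

/* Toy model of the patched selection order.  NELT is the number of
   vector elements; the two flags stand in for the target queries.  */
enum access_type
choose_access_type (unsigned nelt, bool lanes_supported,
                    bool permute_supported)
{
  enum access_type type = ACCESS_ELEMENTWISE;

  /* First cope with the degenerate single-element vector: no
     permutation is needed, so a plain contiguous access will do.  */
  if (nelt == 1)
    type = ACCESS_CONTIGUOUS;

  /* Otherwise try load/store lanes.  */
  if (type == ACCESS_ELEMENTWISE && lanes_supported)
    type = ACCESS_LOAD_STORE_LANES;

  /* Assumed fallback: only consider permutes if nothing else was
     chosen, so the nelt == 1 case never reaches the halving step.  */
  if (type == ACCESS_ELEMENTWISE && permute_supported
      && halves_evenly (nelt))
    type = ACCESS_CONTIGUOUS_PERMUTE;

  return type;
}
```

In this model, `choose_access_type (1, true, true)` settles on the contiguous access before either target query is consulted, which mirrors how the patch keeps single-element vectors away from the permute check.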