Message ID | CAAgBjMm2O=bSQBen=Mn7QdPKQEcJwRMs3MV8+wMuR7hvBr5n-w@mail.gmail.com |
---|---|
State | New |
Headers | show |
Series | PR111648: Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding | expand |
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: > Hi, > The attached patch attempts to fix PR111648. > As mentioned in PR, the issue is when a1 is a multiple of vector > length, we end up creating following encoding in result: { base_elem, > arg[0], arg[1], ... } (assuming S = 1), > where arg is chosen input vector, which is incorrect, since the > encoding originally in arg would be: { arg[0], arg[1], arg[2], ... } > > For the test-case mentioned in PR, vectorizer pass creates > VEC_PERM_EXPR<arg0, arg, sel> where: > arg0: { -16, -9, -10, -11 } > arg1: { -12, -5, -6, -7 } > sel = { 3, 4, 5, 6 } > > arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3. > Since a1 = 4 and arg_len = 4, it ended up creating the result with > following encoding: > res = { arg0[3], arg1[0], arg1[1] } // npatterns = 1, nelts_per_pattern = 3 > = { -11, -12, -5 } > > So for res[3], it used S = (-5) - (-12) = 7 > And hence computed it as -5 + 7 = 2. > instead of selecting arg1[2], ie, -6. > > The patch tweaks valid_mask_for_fold_vec_perm_cst_p to punt if a1 is a multiple > of vector length, so a1 ... ae select elements only from stepped part > of the pattern > from input vector and return false for this case. > > Since the vectors are VLS, fold_vec_perm_cst then sets: > res_npatterns = res_nelts > res_nelts_per_pattern = 1 > which seems to fix the issue by encoding all the elements. > > The patch resulted in Case 4 and Case 5 failing from test_nunits_min_2 because > they used sel = { 0, 0, 1, ... } and {len, 0, 1, ... } respectively, > which used a1 = 0, and thus selected arg1[0]. > > I removed Case 4 because it was already covered in test_nunits_min_4, > and moved Case 5 to test_nunits_min_4, with sel = { len, 1, 2, ... } > and added a new Case 9 to test for this issue. > > Passes bootstrap+test on aarch64-linux-gnu with and without SVE, > and on x86_64-linux-gnu. > Does the patch look OK ? > > Thanks, > Prathamesh > > [PR111648] Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding. > > gcc/ChangeLog: > PR tree-optimization/111648 > * fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): Punt if a1 > is a multiple of vector length. > (test_nunits_min_2): Remove Case 4 and move Case 5 to ... > (test_nunits_min_4): ... here and rename case numbers. Also add > Case 9. > > gcc/testsuite/ChangeLog: > PR tree-optimization/111648 > * gcc.dg/vect/pr111648.c: New test. > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > index 4f8561509ff..c5f421d6b76 100644 > --- a/gcc/fold-const.cc > +++ b/gcc/fold-const.cc > @@ -10682,8 +10682,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, > return false; > } > > - /* Ensure that the stepped sequence always selects from the same > - input pattern. */ > + /* Ensure that the stepped sequence always selects from the stepped > + part of same input pattern. */ > unsigned arg_npatterns > = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0) > : VECTOR_CST_NPATTERNS (arg1); > @@ -10694,6 +10694,20 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, > *reason = "step is not multiple of npatterns"; > return false; > } > + > + /* If a1 is a multiple of len, it will select base element of input > + vector resulting in following encoding: > + { base_elem, arg[0], arg[1], ... } where arg is the chosen input > + vector. This encoding is not originally present in arg, since it's > + defined as: > + { arg[0], arg[1], arg[2], ... }. */ > + > + if (multiple_p (a1, arg_len)) > + { > + if (reason) > + *reason = "selecting base element of input vector"; > + return false; > + } That wouldn't catch (for example) cases where a1 == arg_len + 1 and the second argument has 2 stepped patterns. The equivalent condition that handles multiple patterns would probably be to reject q1 < arg_npatterns. But that's only necessary if: (1) the argument has three elements per pattern (i.e. has a stepped sequence) and (2) element 2 - element 1 != element 1 - element 0 I think we should check those to avoid pessimising VLA cases. Thanks, Richard > } > > return true; > @@ -17425,47 +17439,6 @@ test_nunits_min_2 (machine_mode vmode) > tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) }; > validate_res (2, 2, res, expected_res); > } > - > - /* Case 4: mask = {0, 0, 1, ...} // (1, 3) > - Test that the stepped sequence of the pattern selects from > - same input pattern. Since input vectors have npatterns = 2, > - and step (a2 - a1) = 1, step is not a multiple of npatterns > - in input vector. So return NULL_TREE. */ > - { > - tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1); > - tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1); > - poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > - > - vec_perm_builder builder (len, 1, 3); > - poly_uint64 mask_elems[] = { 0, 0, 1 }; > - builder_push_elems (builder, mask_elems); > - > - vec_perm_indices sel (builder, 2, len); > - const char *reason; > - tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, > - &reason); > - ASSERT_TRUE (res == NULL_TREE); > - ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns")); > - } > - > - /* Case 5: mask = {len, 0, 1, ...} // (1, 3) > - Test that stepped sequence of the pattern selects from arg0. > - res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3) */ > - { > - tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); > - tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); > - poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > - > - vec_perm_builder builder (len, 1, 3); > - poly_uint64 mask_elems[] = { len, 0, 1 }; > - builder_push_elems (builder, mask_elems); > - > - vec_perm_indices sel (builder, 2, len); > - tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); > - > - tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) }; > - validate_res (1, 3, res, expected_res); > - } > } > } > > @@ -17528,7 +17501,26 @@ test_nunits_min_4 (machine_mode vmode) > validate_res (1, 3, res, expected_res); > } > > - /* Case 4: > + /* Case 4: mask = {len, 1, 2, ...} // (1, 3) > + Test that stepped sequence of the pattern selects from arg0. > + res = { arg1[0], arg0[1], arg0[2], ... } // (1, 3) */ > + { > + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); > + tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > + > + vec_perm_builder builder (len, 1, 3); > + poly_uint64 mask_elems[] = { len, 1, 2 }; > + builder_push_elems (builder, mask_elems); > + > + vec_perm_indices sel (builder, 2, len); > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); > + > + tree expected_res[] = { ARG1(0), ARG0(1), ARG0(2) }; > + validate_res (1, 3, res, expected_res); > + } > + > + /* Case 5: > sel = { len, 0, 2, ... } // (1, 3) > This should return NULL because we cross the input vectors. > Because, > @@ -17561,7 +17553,7 @@ test_nunits_min_4 (machine_mode vmode) > ASSERT_TRUE (!strcmp (reason, "crossed input vectors")); > } > > - /* Case 5: npatterns(arg0) = 4 > npatterns(sel) = 2 > + /* Case 6: npatterns(arg0) = 4 > npatterns(sel) = 2 > mask = { 0, len, 1, len + 1, ...} // (2, 2) > res = { arg0[0], arg1[0], arg0[1], arg1[1], ... } // (2, 2) > > @@ -17583,7 +17575,7 @@ test_nunits_min_4 (machine_mode vmode) > validate_res (2, 2, res, expected_res); > } > > - /* Case 6: Test combination in sel, where one pattern is dup and other > + /* Case 7: Test combination in sel, where one pattern is dup and other > is stepped sequence. > sel = { 0, 0, 0, 1, 0, 2, ... } // (2, 3) > res = { arg0[0], arg0[0], arg0[0], > @@ -17605,7 +17597,7 @@ test_nunits_min_4 (machine_mode vmode) > validate_res (2, 3, res, expected_res); > } > > - /* Case 7: PR111048: Check that we set arg_npatterns correctly, > + /* Case 8: PR111048: Check that we set arg_npatterns correctly, > when arg0, arg1 and sel have different number of patterns. > arg0 is of shape (1, 1) > arg1 is of shape (4, 1) > @@ -17634,6 +17626,51 @@ test_nunits_min_4 (machine_mode vmode) > ASSERT_TRUE (res == NULL_TREE); > ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns")); > } > + > + /* Case 9: PR111648 - a1 is multiple of vector length, > + which results in incorrect encoding. Verify that we return > + NULL for this case. > + sel = { base_elem, len, len+1, ... } // (1, 3) > + In this case, the single pattern is: { base_elem len, len+1, ...} > + Let's assume that base_elem is used for indexing into arg0, > + and a1 ... ae chooses elements from arg1. > + So res = { arg0[base_elem], arg1[0], arg1[1], ... } // (1, 3) > + Which creates an incorrect encoding with S = arg1[1] - arg1[0] > + while the original encoding in arg1 is > + arg1: { arg1[0], arg1[1], arg1[2], ... } > + with S = arg1[2] - arg1[1]. > + > + As a concrete example, for above PR: > + arg0: { -16, -9, -10, -11 } > + arg1: { -12, -5, -6, -7 } > + sel = { 3, 4, 5, 6 } > + > + arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3. > + Since a1 = 4 and arg_len = 4, it ended up creating the result with > + following encoding: > + res = { arg0[3], arg1[0], arg1[1] } // (1, 3) > + = { -11, -12, -5 } > + > + So for res[3], it used S = (-5) - (-12) = 7 > + And hence computed it as -5 + 7 = 2. > + instead of arg1[2], ie, -6, which is the correct value. > + Ensure that valid_mask_for_fold_vec_perm_cst returns false for this case. */ > + { > + tree arg0 = build_vec_cst_rand (vmode, 1, 3); > + tree arg1 = build_vec_cst_rand (vmode, 1, 3); > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > + > + vec_perm_builder builder (len, 1, 3); > + poly_uint64 mask_elems[] = { 0, len, len+1 }; > + builder_push_elems (builder, mask_elems); > + > + vec_perm_indices sel (builder, 2, len); > + const char *reason; > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason); > + ASSERT_TRUE (res == NULL_TREE); > + ASSERT_TRUE (!strcmp (reason, > + "selecting base element of input vector")); > + } > } > } > > diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c > new file mode 100644 > index 00000000000..093e2b02654 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/pr111648.c > @@ -0,0 +1,23 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > +int a; > +int *b = &a; > +static int **c = &b; > +static int d; > +short e; > +short f; > + > +_Bool foo () > +{ > + f = -21; > + for (; f < -5; f++) { > + e = f ^ 3; > + d = *b; > + **c = e; > + } > + > + return d == -6; > +} > + > +/* { dg-final { scan-tree-dump "return 1" "optimized" } } */
On Mon, 9 Oct 2023 at 17:05, Richard Sandiford <richard.sandiford@arm.com> wrote: > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: > > Hi, > > The attached patch attempts to fix PR111648. > > As mentioned in PR, the issue is when a1 is a multiple of vector > > length, we end up creating following encoding in result: { base_elem, > > arg[0], arg[1], ... } (assuming S = 1), > > where arg is chosen input vector, which is incorrect, since the > > encoding originally in arg would be: { arg[0], arg[1], arg[2], ... } > > > > For the test-case mentioned in PR, vectorizer pass creates > > VEC_PERM_EXPR<arg0, arg, sel> where: > > arg0: { -16, -9, -10, -11 } > > arg1: { -12, -5, -6, -7 } > > sel = { 3, 4, 5, 6 } > > > > arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3. > > Since a1 = 4 and arg_len = 4, it ended up creating the result with > > following encoding: > > res = { arg0[3], arg1[0], arg1[1] } // npatterns = 1, nelts_per_pattern = 3 > > = { -11, -12, -5 } > > > > So for res[3], it used S = (-5) - (-12) = 7 > > And hence computed it as -5 + 7 = 2. > > instead of selecting arg1[2], ie, -6. > > > > The patch tweaks valid_mask_for_fold_vec_perm_cst_p to punt if a1 is a multiple > > of vector length, so a1 ... ae select elements only from stepped part > > of the pattern > > from input vector and return false for this case. > > > > Since the vectors are VLS, fold_vec_perm_cst then sets: > > res_npatterns = res_nelts > > res_nelts_per_pattern = 1 > > which seems to fix the issue by encoding all the elements. > > > > The patch resulted in Case 4 and Case 5 failing from test_nunits_min_2 because > > they used sel = { 0, 0, 1, ... } and {len, 0, 1, ... } respectively, > > which used a1 = 0, and thus selected arg1[0]. > > > > I removed Case 4 because it was already covered in test_nunits_min_4, > > and moved Case 5 to test_nunits_min_4, with sel = { len, 1, 2, ... } > > and added a new Case 9 to test for this issue. > > > > Passes bootstrap+test on aarch64-linux-gnu with and without SVE, > > and on x86_64-linux-gnu. > > Does the patch look OK ? > > > > Thanks, > > Prathamesh > > > > [PR111648] Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding. > > > > gcc/ChangeLog: > > PR tree-optimization/111648 > > * fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): Punt if a1 > > is a multiple of vector length. > > (test_nunits_min_2): Remove Case 4 and move Case 5 to ... > > (test_nunits_min_4): ... here and rename case numbers. Also add > > Case 9. > > > > gcc/testsuite/ChangeLog: > > PR tree-optimization/111648 > > * gcc.dg/vect/pr111648.c: New test. > > > > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > > index 4f8561509ff..c5f421d6b76 100644 > > --- a/gcc/fold-const.cc > > +++ b/gcc/fold-const.cc > > @@ -10682,8 +10682,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, > > return false; > > } > > > > - /* Ensure that the stepped sequence always selects from the same > > - input pattern. */ > > + /* Ensure that the stepped sequence always selects from the stepped > > + part of same input pattern. */ > > unsigned arg_npatterns > > = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0) > > : VECTOR_CST_NPATTERNS (arg1); > > @@ -10694,6 +10694,20 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, > > *reason = "step is not multiple of npatterns"; > > return false; > > } > > + > > + /* If a1 is a multiple of len, it will select base element of input > > + vector resulting in following encoding: > > + { base_elem, arg[0], arg[1], ... } where arg is the chosen input > > + vector. This encoding is not originally present in arg, since it's > > + defined as: > > + { arg[0], arg[1], arg[2], ... }. */ > > + > > + if (multiple_p (a1, arg_len)) > > + { > > + if (reason) > > + *reason = "selecting base element of input vector"; > > + return false; > > + } > > That wouldn't catch (for example) cases where a1 == arg_len + 1 and the > second argument has 2 stepped patterns. Ah right, thanks for pointing out. In the attached patch I extended the check so that r1 < arg_npatterns which should check if we are choosing base elements from any of the patterns in arg (and not just first). Does that look OK ? > > The equivalent condition that handles multiple patterns would > probably be to reject q1 < arg_npatterns. But that's only necessary if: Sorry, I don't understand -- we use q1 only for determining which vector to choose from, and r1 will give the index for first element ? > > (1) the argument has three elements per pattern (i.e. has a stepped > sequence) and > > (2) element 2 - element 1 != element 1 - element 0 > > I think we should check those to avoid pessimising VLA cases. Thanks for the suggestions. In attached POC patch (stage-1 tested), I added the above checks, does it look in the right direction ? Also, should this patch be the right fix for PR111754 ? Thanks, Prathamesh > > Thanks, > Richard > > > } > > > > return true; > > @@ -17425,47 +17439,6 @@ test_nunits_min_2 (machine_mode vmode) > > tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) }; > > validate_res (2, 2, res, expected_res); > > } > > - > > - /* Case 4: mask = {0, 0, 1, ...} // (1, 3) > > - Test that the stepped sequence of the pattern selects from > > - same input pattern. Since input vectors have npatterns = 2, > > - and step (a2 - a1) = 1, step is not a multiple of npatterns > > - in input vector. So return NULL_TREE. */ > > - { > > - tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1); > > - tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1); > > - poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > - > > - vec_perm_builder builder (len, 1, 3); > > - poly_uint64 mask_elems[] = { 0, 0, 1 }; > > - builder_push_elems (builder, mask_elems); > > - > > - vec_perm_indices sel (builder, 2, len); > > - const char *reason; > > - tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, > > - &reason); > > - ASSERT_TRUE (res == NULL_TREE); > > - ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns")); > > - } > > - > > - /* Case 5: mask = {len, 0, 1, ...} // (1, 3) > > - Test that stepped sequence of the pattern selects from arg0. > > - res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3) */ > > - { > > - tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); > > - tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); > > - poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > - > > - vec_perm_builder builder (len, 1, 3); > > - poly_uint64 mask_elems[] = { len, 0, 1 }; > > - builder_push_elems (builder, mask_elems); > > - > > - vec_perm_indices sel (builder, 2, len); > > - tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); > > - > > - tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) }; > > - validate_res (1, 3, res, expected_res); > > - } > > } > > } > > > > @@ -17528,7 +17501,26 @@ test_nunits_min_4 (machine_mode vmode) > > validate_res (1, 3, res, expected_res); > > } > > > > - /* Case 4: > > + /* Case 4: mask = {len, 1, 2, ...} // (1, 3) > > + Test that stepped sequence of the pattern selects from arg0. > > + res = { arg1[0], arg0[1], arg0[2], ... } // (1, 3) */ > > + { > > + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); > > + tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); > > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > + > > + vec_perm_builder builder (len, 1, 3); > > + poly_uint64 mask_elems[] = { len, 1, 2 }; > > + builder_push_elems (builder, mask_elems); > > + > > + vec_perm_indices sel (builder, 2, len); > > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); > > + > > + tree expected_res[] = { ARG1(0), ARG0(1), ARG0(2) }; > > + validate_res (1, 3, res, expected_res); > > + } > > + > > + /* Case 5: > > sel = { len, 0, 2, ... } // (1, 3) > > This should return NULL because we cross the input vectors. > > Because, > > @@ -17561,7 +17553,7 @@ test_nunits_min_4 (machine_mode vmode) > > ASSERT_TRUE (!strcmp (reason, "crossed input vectors")); > > } > > > > - /* Case 5: npatterns(arg0) = 4 > npatterns(sel) = 2 > > + /* Case 6: npatterns(arg0) = 4 > npatterns(sel) = 2 > > mask = { 0, len, 1, len + 1, ...} // (2, 2) > > res = { arg0[0], arg1[0], arg0[1], arg1[1], ... } // (2, 2) > > > > @@ -17583,7 +17575,7 @@ test_nunits_min_4 (machine_mode vmode) > > validate_res (2, 2, res, expected_res); > > } > > > > - /* Case 6: Test combination in sel, where one pattern is dup and other > > + /* Case 7: Test combination in sel, where one pattern is dup and other > > is stepped sequence. > > sel = { 0, 0, 0, 1, 0, 2, ... } // (2, 3) > > res = { arg0[0], arg0[0], arg0[0], > > @@ -17605,7 +17597,7 @@ test_nunits_min_4 (machine_mode vmode) > > validate_res (2, 3, res, expected_res); > > } > > > > - /* Case 7: PR111048: Check that we set arg_npatterns correctly, > > + /* Case 8: PR111048: Check that we set arg_npatterns correctly, > > when arg0, arg1 and sel have different number of patterns. > > arg0 is of shape (1, 1) > > arg1 is of shape (4, 1) > > @@ -17634,6 +17626,51 @@ test_nunits_min_4 (machine_mode vmode) > > ASSERT_TRUE (res == NULL_TREE); > > ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns")); > > } > > + > > + /* Case 9: PR111648 - a1 is multiple of vector length, > > + which results in incorrect encoding. Verify that we return > > + NULL for this case. > > + sel = { base_elem, len, len+1, ... } // (1, 3) > > + In this case, the single pattern is: { base_elem len, len+1, ...} > > + Let's assume that base_elem is used for indexing into arg0, > > + and a1 ... ae chooses elements from arg1. > > + So res = { arg0[base_elem], arg1[0], arg1[1], ... } // (1, 3) > > + Which creates an incorrect encoding with S = arg1[1] - arg1[0] > > + while the original encoding in arg1 is > > + arg1: { arg1[0], arg1[1], arg1[2], ... } > > + with S = arg1[2] - arg1[1]. > > + > > + As a concrete example, for above PR: > > + arg0: { -16, -9, -10, -11 } > > + arg1: { -12, -5, -6, -7 } > > + sel = { 3, 4, 5, 6 } > > + > > + arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3. > > + Since a1 = 4 and arg_len = 4, it ended up creating the result with > > + following encoding: > > + res = { arg0[3], arg1[0], arg1[1] } // (1, 3) > > + = { -11, -12, -5 } > > + > > + So for res[3], it used S = (-5) - (-12) = 7 > > + And hence computed it as -5 + 7 = 2. > > + instead of arg1[2], ie, -6, which is the correct value. > > + Ensure that valid_mask_for_fold_vec_perm_cst returns false for this case. */ > > + { > > + tree arg0 = build_vec_cst_rand (vmode, 1, 3); > > + tree arg1 = build_vec_cst_rand (vmode, 1, 3); > > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > + > > + vec_perm_builder builder (len, 1, 3); > > + poly_uint64 mask_elems[] = { 0, len, len+1 }; > > + builder_push_elems (builder, mask_elems); > > + > > + vec_perm_indices sel (builder, 2, len); > > + const char *reason; > > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason); > > + ASSERT_TRUE (res == NULL_TREE); > > + ASSERT_TRUE (!strcmp (reason, > > + "selecting base element of input vector")); > > + } > > } > > } > > > > diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c > > new file mode 100644 > > index 00000000000..093e2b02654 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/pr111648.c > > @@ -0,0 +1,23 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > > + > > +int a; > > +int *b = &a; > > +static int **c = &b; > > +static int d; > > +short e; > > +short f; > > + > > +_Bool foo () > > +{ > > + f = -21; > > + for (; f < -5; f++) { > > + e = f ^ 3; > > + d = *b; > > + **c = e; > > + } > > + > > + return d == -6; > > +} > > + > > +/* { dg-final { scan-tree-dump "return 1" "optimized" } } */ diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc index 4f8561509ff..ae914cbd880 100644 --- a/gcc/fold-const.cc +++ b/gcc/fold-const.cc @@ -10607,6 +10607,25 @@ vec_cst_ctor_to_array (tree arg, unsigned int nelts, tree *elts) return true; } +static poly_int64 +vector_cst_elt_poly_index (tree arg, poly_uint64 index) +{ + int q; + poly_uint64 r; + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg)); + + if (!can_div_trunc_p (index, len, &q, &r)) + return NULL_TREE; + + unsigned HOST_WIDE_INT i; + if (!r.is_constant (&i)) + return NULL_TREE; + + tree elem = vector_cst_elt (arg, i); + gcc_assert (elem != NULL_TREE); + return tree_to_poly_int64 (elem); +} + /* Helper routine for fold_vec_perm_cst to check if SEL is a suitable mask for VLA vec_perm folding. REASON if specified, will contain the reason why SEL is not suitable. @@ -10684,9 +10703,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, /* Ensure that the stepped sequence always selects from the same input pattern. */ - unsigned arg_npatterns - = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0) - : VECTOR_CST_NPATTERNS (arg1); + tree arg = ((q1 & 1) == 0) ? arg0 : arg1; + unsigned arg_npatterns = VECTOR_CST_NPATTERNS (arg); if (!multiple_p (step, arg_npatterns)) { @@ -10694,6 +10712,21 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, *reason = "step is not multiple of npatterns"; return false; } + + /* Ensure that a1 only selects from stepped part of the pattern from arg, + if the pattern is not a natural stepped sequence, ie, + ((a2 - a1) != (a1 - a0)). */ + + poly_int64 arg_elem0 = vector_cst_elt_poly_index (arg, sel[pattern]); + poly_int64 arg_elem1 = vector_cst_elt_poly_index (arg, a1); + poly_int64 arg_elem2 = vector_cst_elt_poly_index (arg, a2); + if (!known_eq (arg_elem2 - arg_elem1, arg_elem1 - arg_elem0) + && known_lt (r1, arg_npatterns)) + { + if (reason) + *reason = "choosing base element from input vector"; + return false; + } } return true;
On Wed, 11 Oct 2023 at 16:42, Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote: > > On Mon, 9 Oct 2023 at 17:05, Richard Sandiford > <richard.sandiford@arm.com> wrote: > > > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: > > > Hi, > > > The attached patch attempts to fix PR111648. > > > As mentioned in PR, the issue is when a1 is a multiple of vector > > > length, we end up creating following encoding in result: { base_elem, > > > arg[0], arg[1], ... } (assuming S = 1), > > > where arg is chosen input vector, which is incorrect, since the > > > encoding originally in arg would be: { arg[0], arg[1], arg[2], ... } > > > > > > For the test-case mentioned in PR, vectorizer pass creates > > > VEC_PERM_EXPR<arg0, arg, sel> where: > > > arg0: { -16, -9, -10, -11 } > > > arg1: { -12, -5, -6, -7 } > > > sel = { 3, 4, 5, 6 } > > > > > > arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3. > > > Since a1 = 4 and arg_len = 4, it ended up creating the result with > > > following encoding: > > > res = { arg0[3], arg1[0], arg1[1] } // npatterns = 1, nelts_per_pattern = 3 > > > = { -11, -12, -5 } > > > > > > So for res[3], it used S = (-5) - (-12) = 7 > > > And hence computed it as -5 + 7 = 2. > > > instead of selecting arg1[2], ie, -6. > > > > > > The patch tweaks valid_mask_for_fold_vec_perm_cst_p to punt if a1 is a multiple > > > of vector length, so a1 ... ae select elements only from stepped part > > > of the pattern > > > from input vector and return false for this case. > > > > > > Since the vectors are VLS, fold_vec_perm_cst then sets: > > > res_npatterns = res_nelts > > > res_nelts_per_pattern = 1 > > > which seems to fix the issue by encoding all the elements. > > > > > > The patch resulted in Case 4 and Case 5 failing from test_nunits_min_2 because > > > they used sel = { 0, 0, 1, ... } and {len, 0, 1, ... } respectively, > > > which used a1 = 0, and thus selected arg1[0]. > > > > > > I removed Case 4 because it was already covered in test_nunits_min_4, > > > and moved Case 5 to test_nunits_min_4, with sel = { len, 1, 2, ... } > > > and added a new Case 9 to test for this issue. > > > > > > Passes bootstrap+test on aarch64-linux-gnu with and without SVE, > > > and on x86_64-linux-gnu. > > > Does the patch look OK ? > > > > > > Thanks, > > > Prathamesh > > > > > > [PR111648] Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding. > > > > > > gcc/ChangeLog: > > > PR tree-optimization/111648 > > > * fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): Punt if a1 > > > is a multiple of vector length. > > > (test_nunits_min_2): Remove Case 4 and move Case 5 to ... > > > (test_nunits_min_4): ... here and rename case numbers. Also add > > > Case 9. > > > > > > gcc/testsuite/ChangeLog: > > > PR tree-optimization/111648 > > > * gcc.dg/vect/pr111648.c: New test. > > > > > > > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > > > index 4f8561509ff..c5f421d6b76 100644 > > > --- a/gcc/fold-const.cc > > > +++ b/gcc/fold-const.cc > > > @@ -10682,8 +10682,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, > > > return false; > > > } > > > > > > - /* Ensure that the stepped sequence always selects from the same > > > - input pattern. */ > > > + /* Ensure that the stepped sequence always selects from the stepped > > > + part of same input pattern. */ > > > unsigned arg_npatterns > > > = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0) > > > : VECTOR_CST_NPATTERNS (arg1); > > > @@ -10694,6 +10694,20 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, > > > *reason = "step is not multiple of npatterns"; > > > return false; > > > } > > > + > > > + /* If a1 is a multiple of len, it will select base element of input > > > + vector resulting in following encoding: > > > + { base_elem, arg[0], arg[1], ... } where arg is the chosen input > > > + vector. This encoding is not originally present in arg, since it's > > > + defined as: > > > + { arg[0], arg[1], arg[2], ... }. */ > > > + > > > + if (multiple_p (a1, arg_len)) > > > + { > > > + if (reason) > > > + *reason = "selecting base element of input vector"; > > > + return false; > > > + } > > > > That wouldn't catch (for example) cases where a1 == arg_len + 1 and the > > second argument has 2 stepped patterns. > Ah right, thanks for pointing out. In the attached patch I extended the check > so that r1 < arg_npatterns which should check if we are choosing base > elements from any of the patterns in arg (and not just first). > Does that look OK ? > > > > The equivalent condition that handles multiple patterns would > > probably be to reject q1 < arg_npatterns. But that's only necessary if: > Sorry, I don't understand -- we use q1 only for determining which > vector to choose from, > and r1 will give the index for first element ? > > > > (1) the argument has three elements per pattern (i.e. has a stepped > > sequence) and > > > > (2) element 2 - element 1 != element 1 - element 0 > > > > I think we should check those to avoid pessimising VLA cases. > Thanks for the suggestions. In attached POC patch (stage-1 tested), I > added the above checks, > does it look in the right direction ? Also, should this patch be the > right fix for PR111754 ? Oops sorry, this patch causes build errors on aarch64. Please ignore it. Sorry for the noise. Thanks, Prathamesh > > Thanks, > Prathamesh > > > > Thanks, > > Richard > > > > > } > > > > > > return true; > > > @@ -17425,47 +17439,6 @@ test_nunits_min_2 (machine_mode vmode) > > > tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) }; > > > validate_res (2, 2, res, expected_res); > > > } > > > - > > > - /* Case 4: mask = {0, 0, 1, ...} // (1, 3) > > > - Test that the stepped sequence of the pattern selects from > > > - same input pattern. Since input vectors have npatterns = 2, > > > - and step (a2 - a1) = 1, step is not a multiple of npatterns > > > - in input vector. So return NULL_TREE. */ > > > - { > > > - tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1); > > > - tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1); > > > - poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > > - > > > - vec_perm_builder builder (len, 1, 3); > > > - poly_uint64 mask_elems[] = { 0, 0, 1 }; > > > - builder_push_elems (builder, mask_elems); > > > - > > > - vec_perm_indices sel (builder, 2, len); > > > - const char *reason; > > > - tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, > > > - &reason); > > > - ASSERT_TRUE (res == NULL_TREE); > > > - ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns")); > > > - } > > > - > > > - /* Case 5: mask = {len, 0, 1, ...} // (1, 3) > > > - Test that stepped sequence of the pattern selects from arg0. > > > - res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3) */ > > > - { > > > - tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); > > > - tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); > > > - poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > > - > > > - vec_perm_builder builder (len, 1, 3); > > > - poly_uint64 mask_elems[] = { len, 0, 1 }; > > > - builder_push_elems (builder, mask_elems); > > > - > > > - vec_perm_indices sel (builder, 2, len); > > > - tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); > > > - > > > - tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) }; > > > - validate_res (1, 3, res, expected_res); > > > - } > > > } > > > } > > > > > > @@ -17528,7 +17501,26 @@ test_nunits_min_4 (machine_mode vmode) > > > validate_res (1, 3, res, expected_res); > > > } > > > > > > - /* Case 4: > > > + /* Case 4: mask = {len, 1, 2, ...} // (1, 3) > > > + Test that stepped sequence of the pattern selects from arg0. > > > + res = { arg1[0], arg0[1], arg0[2], ... } // (1, 3) */ > > > + { > > > + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); > > > + tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); > > > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > > + > > > + vec_perm_builder builder (len, 1, 3); > > > + poly_uint64 mask_elems[] = { len, 1, 2 }; > > > + builder_push_elems (builder, mask_elems); > > > + > > > + vec_perm_indices sel (builder, 2, len); > > > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); > > > + > > > + tree expected_res[] = { ARG1(0), ARG0(1), ARG0(2) }; > > > + validate_res (1, 3, res, expected_res); > > > + } > > > + > > > + /* Case 5: > > > sel = { len, 0, 2, ... } // (1, 3) > > > This should return NULL because we cross the input vectors. > > > Because, > > > @@ -17561,7 +17553,7 @@ test_nunits_min_4 (machine_mode vmode) > > > ASSERT_TRUE (!strcmp (reason, "crossed input vectors")); > > > } > > > > > > - /* Case 5: npatterns(arg0) = 4 > npatterns(sel) = 2 > > > + /* Case 6: npatterns(arg0) = 4 > npatterns(sel) = 2 > > > mask = { 0, len, 1, len + 1, ...} // (2, 2) > > > res = { arg0[0], arg1[0], arg0[1], arg1[1], ... } // (2, 2) > > > > > > @@ -17583,7 +17575,7 @@ test_nunits_min_4 (machine_mode vmode) > > > validate_res (2, 2, res, expected_res); > > > } > > > > > > - /* Case 6: Test combination in sel, where one pattern is dup and other > > > + /* Case 7: Test combination in sel, where one pattern is dup and other > > > is stepped sequence. > > > sel = { 0, 0, 0, 1, 0, 2, ... } // (2, 3) > > > res = { arg0[0], arg0[0], arg0[0], > > > @@ -17605,7 +17597,7 @@ test_nunits_min_4 (machine_mode vmode) > > > validate_res (2, 3, res, expected_res); > > > } > > > > > > - /* Case 7: PR111048: Check that we set arg_npatterns correctly, > > > + /* Case 8: PR111048: Check that we set arg_npatterns correctly, > > > when arg0, arg1 and sel have different number of patterns. > > > arg0 is of shape (1, 1) > > > arg1 is of shape (4, 1) > > > @@ -17634,6 +17626,51 @@ test_nunits_min_4 (machine_mode vmode) > > > ASSERT_TRUE (res == NULL_TREE); > > > ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns")); > > > } > > > + > > > + /* Case 9: PR111648 - a1 is multiple of vector length, > > > + which results in incorrect encoding. Verify that we return > > > + NULL for this case. > > > + sel = { base_elem, len, len+1, ... } // (1, 3) > > > + In this case, the single pattern is: { base_elem len, len+1, ...} > > > + Let's assume that base_elem is used for indexing into arg0, > > > + and a1 ... ae chooses elements from arg1. > > > + So res = { arg0[base_elem], arg1[0], arg1[1], ... } // (1, 3) > > > + Which creates an incorrect encoding with S = arg1[1] - arg1[0] > > > + while the original encoding in arg1 is > > > + arg1: { arg1[0], arg1[1], arg1[2], ... } > > > + with S = arg1[2] - arg1[1]. > > > + > > > + As a concrete example, for above PR: > > > + arg0: { -16, -9, -10, -11 } > > > + arg1: { -12, -5, -6, -7 } > > > + sel = { 3, 4, 5, 6 } > > > + > > > + arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3. > > > + Since a1 = 4 and arg_len = 4, it ended up creating the result with > > > + following encoding: > > > + res = { arg0[3], arg1[0], arg1[1] } // (1, 3) > > > + = { -11, -12, -5 } > > > + > > > + So for res[3], it used S = (-5) - (-12) = 7 > > > + And hence computed it as -5 + 7 = 2. > > > + instead of arg1[2], ie, -6, which is the correct value. > > > + Ensure that valid_mask_for_fold_vec_perm_cst returns false for this case. */ > > > + { > > > + tree arg0 = build_vec_cst_rand (vmode, 1, 3); > > > + tree arg1 = build_vec_cst_rand (vmode, 1, 3); > > > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > > + > > > + vec_perm_builder builder (len, 1, 3); > > > + poly_uint64 mask_elems[] = { 0, len, len+1 }; > > > + builder_push_elems (builder, mask_elems); > > > + > > > + vec_perm_indices sel (builder, 2, len); > > > + const char *reason; > > > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason); > > > + ASSERT_TRUE (res == NULL_TREE); > > > + ASSERT_TRUE (!strcmp (reason, > > > + "selecting base element of input vector")); > > > + } > > > } > > > } > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c > > > new file mode 100644 > > > index 00000000000..093e2b02654 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/vect/pr111648.c > > > @@ -0,0 +1,23 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > > > + > > > +int a; > > > +int *b = &a; > > > +static int **c = &b; > > > +static int d; > > > +short e; > > > +short f; > > > + > > > +_Bool foo () > > > +{ > > > + f = -21; > > > + for (; f < -5; f++) { > > > + e = f ^ 3; > > > + d = *b; > > > + **c = e; > > > + } > > > + > > > + return d == -6; > > > +} > > > + > > > +/* { dg-final { scan-tree-dump "return 1" "optimized" } } */
On Wed, 11 Oct 2023 at 16:57, Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote: > > On Wed, 11 Oct 2023 at 16:42, Prathamesh Kulkarni > <prathamesh.kulkarni@linaro.org> wrote: > > > > On Mon, 9 Oct 2023 at 17:05, Richard Sandiford > > <richard.sandiford@arm.com> wrote: > > > > > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: > > > > Hi, > > > > The attached patch attempts to fix PR111648. > > > > As mentioned in PR, the issue is when a1 is a multiple of vector > > > > length, we end up creating following encoding in result: { base_elem, > > > > arg[0], arg[1], ... } (assuming S = 1), > > > > where arg is chosen input vector, which is incorrect, since the > > > > encoding originally in arg would be: { arg[0], arg[1], arg[2], ... } > > > > > > > > For the test-case mentioned in PR, vectorizer pass creates > > > > VEC_PERM_EXPR<arg0, arg, sel> where: > > > > arg0: { -16, -9, -10, -11 } > > > > arg1: { -12, -5, -6, -7 } > > > > sel = { 3, 4, 5, 6 } > > > > > > > > arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3. > > > > Since a1 = 4 and arg_len = 4, it ended up creating the result with > > > > following encoding: > > > > res = { arg0[3], arg1[0], arg1[1] } // npatterns = 1, nelts_per_pattern = 3 > > > > = { -11, -12, -5 } > > > > > > > > So for res[3], it used S = (-5) - (-12) = 7 > > > > And hence computed it as -5 + 7 = 2. > > > > instead of selecting arg1[2], ie, -6. > > > > > > > > The patch tweaks valid_mask_for_fold_vec_perm_cst_p to punt if a1 is a multiple > > > > of vector length, so a1 ... ae select elements only from stepped part > > > > of the pattern > > > > from input vector and return false for this case. > > > > > > > > Since the vectors are VLS, fold_vec_perm_cst then sets: > > > > res_npatterns = res_nelts > > > > res_nelts_per_pattern = 1 > > > > which seems to fix the issue by encoding all the elements. > > > > > > > > The patch resulted in Case 4 and Case 5 failing from test_nunits_min_2 because > > > > they used sel = { 0, 0, 1, ... } and {len, 0, 1, ... } respectively, > > > > which used a1 = 0, and thus selected arg1[0]. > > > > > > > > I removed Case 4 because it was already covered in test_nunits_min_4, > > > > and moved Case 5 to test_nunits_min_4, with sel = { len, 1, 2, ... } > > > > and added a new Case 9 to test for this issue. > > > > > > > > Passes bootstrap+test on aarch64-linux-gnu with and without SVE, > > > > and on x86_64-linux-gnu. > > > > Does the patch look OK ? > > > > > > > > Thanks, > > > > Prathamesh > > > > > > > > [PR111648] Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding. > > > > > > > > gcc/ChangeLog: > > > > PR tree-optimization/111648 > > > > * fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): Punt if a1 > > > > is a multiple of vector length. > > > > (test_nunits_min_2): Remove Case 4 and move Case 5 to ... > > > > (test_nunits_min_4): ... here and rename case numbers. Also add > > > > Case 9. > > > > > > > > gcc/testsuite/ChangeLog: > > > > PR tree-optimization/111648 > > > > * gcc.dg/vect/pr111648.c: New test. > > > > > > > > > > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > > > > index 4f8561509ff..c5f421d6b76 100644 > > > > --- a/gcc/fold-const.cc > > > > +++ b/gcc/fold-const.cc > > > > @@ -10682,8 +10682,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, > > > > return false; > > > > } > > > > > > > > - /* Ensure that the stepped sequence always selects from the same > > > > - input pattern. */ > > > > + /* Ensure that the stepped sequence always selects from the stepped > > > > + part of same input pattern. */ > > > > unsigned arg_npatterns > > > > = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0) > > > > : VECTOR_CST_NPATTERNS (arg1); > > > > @@ -10694,6 +10694,20 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, > > > > *reason = "step is not multiple of npatterns"; > > > > return false; > > > > } > > > > + > > > > + /* If a1 is a multiple of len, it will select base element of input > > > > + vector resulting in following encoding: > > > > + { base_elem, arg[0], arg[1], ... } where arg is the chosen input > > > > + vector. This encoding is not originally present in arg, since it's > > > > + defined as: > > > > + { arg[0], arg[1], arg[2], ... }. */ > > > > + > > > > + if (multiple_p (a1, arg_len)) > > > > + { > > > > + if (reason) > > > > + *reason = "selecting base element of input vector"; > > > > + return false; > > > > + } > > > > > > That wouldn't catch (for example) cases where a1 == arg_len + 1 and the > > > second argument has 2 stepped patterns. > > Ah right, thanks for pointing out. In the attached patch I extended the check > > so that r1 < arg_npatterns which should check if we are choosing base > > elements from any of the patterns in arg (and not just first). > > Does that look OK ? > > > > > > The equivalent condition that handles multiple patterns would > > > probably be to reject q1 < arg_npatterns. But that's only necessary if: > > Sorry, I don't understand -- we use q1 only for determining which > > vector to choose from, > > and r1 will give the index for first element ? > > > > > > (1) the argument has three elements per pattern (i.e. has a stepped > > > sequence) and > > > > > > (2) element 2 - element 1 != element 1 - element 0 > > > > > > I think we should check those to avoid pessimising VLA cases. > > Thanks for the suggestions. In attached POC patch (stage-1 tested), I > > added the above checks, > > does it look in the right direction ? Also, should this patch be the > > right fix for PR111754 ? > Oops sorry, this patch causes build errors on aarch64. Please ignore it. > Sorry for the noise. Hi Richard, The attached patch passes bootstrap+test on aarch64-linux-gnu with and without SVE, and on x86_64-linux-gnu. Does it look OK ? Thanks, Prathamesh > > Thanks, > Prathamesh > > > > Thanks, > > Prathamesh > > > > > > Thanks, > > > Richard > > > > > > > } > > > > > > > > return true; > > > > @@ -17425,47 +17439,6 @@ test_nunits_min_2 (machine_mode vmode) > > > > tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) }; > > > > validate_res (2, 2, res, expected_res); > > > > } > > > > - > > > > - /* Case 4: mask = {0, 0, 1, ...} // (1, 3) > > > > - Test that the stepped sequence of the pattern selects from > > > > - same input pattern. Since input vectors have npatterns = 2, > > > > - and step (a2 - a1) = 1, step is not a multiple of npatterns > > > > - in input vector. So return NULL_TREE. */ > > > > - { > > > > - tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1); > > > > - tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1); > > > > - poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > > > - > > > > - vec_perm_builder builder (len, 1, 3); > > > > - poly_uint64 mask_elems[] = { 0, 0, 1 }; > > > > - builder_push_elems (builder, mask_elems); > > > > - > > > > - vec_perm_indices sel (builder, 2, len); > > > > - const char *reason; > > > > - tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, > > > > - &reason); > > > > - ASSERT_TRUE (res == NULL_TREE); > > > > - ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns")); > > > > - } > > > > - > > > > - /* Case 5: mask = {len, 0, 1, ...} // (1, 3) > > > > - Test that stepped sequence of the pattern selects from arg0. > > > > - res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3) */ > > > > - { > > > > - tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); > > > > - tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); > > > > - poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > > > - > > > > - vec_perm_builder builder (len, 1, 3); > > > > - poly_uint64 mask_elems[] = { len, 0, 1 }; > > > > - builder_push_elems (builder, mask_elems); > > > > - > > > > - vec_perm_indices sel (builder, 2, len); > > > > - tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); > > > > - > > > > - tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) }; > > > > - validate_res (1, 3, res, expected_res); > > > > - } > > > > } > > > > } > > > > > > > > @@ -17528,7 +17501,26 @@ test_nunits_min_4 (machine_mode vmode) > > > > validate_res (1, 3, res, expected_res); > > > > } > > > > > > > > - /* Case 4: > > > > + /* Case 4: mask = {len, 1, 2, ...} // (1, 3) > > > > + Test that stepped sequence of the pattern selects from arg0. > > > > + res = { arg1[0], arg0[1], arg0[2], ... } // (1, 3) */ > > > > + { > > > > + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); > > > > + tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); > > > > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > > > + > > > > + vec_perm_builder builder (len, 1, 3); > > > > + poly_uint64 mask_elems[] = { len, 1, 2 }; > > > > + builder_push_elems (builder, mask_elems); > > > > + > > > > + vec_perm_indices sel (builder, 2, len); > > > > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); > > > > + > > > > + tree expected_res[] = { ARG1(0), ARG0(1), ARG0(2) }; > > > > + validate_res (1, 3, res, expected_res); > > > > + } > > > > + > > > > + /* Case 5: > > > > sel = { len, 0, 2, ... } // (1, 3) > > > > This should return NULL because we cross the input vectors. > > > > Because, > > > > @@ -17561,7 +17553,7 @@ test_nunits_min_4 (machine_mode vmode) > > > > ASSERT_TRUE (!strcmp (reason, "crossed input vectors")); > > > > } > > > > > > > > - /* Case 5: npatterns(arg0) = 4 > npatterns(sel) = 2 > > > > + /* Case 6: npatterns(arg0) = 4 > npatterns(sel) = 2 > > > > mask = { 0, len, 1, len + 1, ...} // (2, 2) > > > > res = { arg0[0], arg1[0], arg0[1], arg1[1], ... } // (2, 2) > > > > > > > > @@ -17583,7 +17575,7 @@ test_nunits_min_4 (machine_mode vmode) > > > > validate_res (2, 2, res, expected_res); > > > > } > > > > > > > > - /* Case 6: Test combination in sel, where one pattern is dup and other > > > > + /* Case 7: Test combination in sel, where one pattern is dup and other > > > > is stepped sequence. > > > > sel = { 0, 0, 0, 1, 0, 2, ... } // (2, 3) > > > > res = { arg0[0], arg0[0], arg0[0], > > > > @@ -17605,7 +17597,7 @@ test_nunits_min_4 (machine_mode vmode) > > > > validate_res (2, 3, res, expected_res); > > > > } > > > > > > > > - /* Case 7: PR111048: Check that we set arg_npatterns correctly, > > > > + /* Case 8: PR111048: Check that we set arg_npatterns correctly, > > > > when arg0, arg1 and sel have different number of patterns. > > > > arg0 is of shape (1, 1) > > > > arg1 is of shape (4, 1) > > > > @@ -17634,6 +17626,51 @@ test_nunits_min_4 (machine_mode vmode) > > > > ASSERT_TRUE (res == NULL_TREE); > > > > ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns")); > > > > } > > > > + > > > > + /* Case 9: PR111648 - a1 is multiple of vector length, > > > > + which results in incorrect encoding. Verify that we return > > > > + NULL for this case. > > > > + sel = { base_elem, len, len+1, ... } // (1, 3) > > > > + In this case, the single pattern is: { base_elem len, len+1, ...} > > > > + Let's assume that base_elem is used for indexing into arg0, > > > > + and a1 ... ae chooses elements from arg1. > > > > + So res = { arg0[base_elem], arg1[0], arg1[1], ... } // (1, 3) > > > > + Which creates an incorrect encoding with S = arg1[1] - arg1[0] > > > > + while the original encoding in arg1 is > > > > + arg1: { arg1[0], arg1[1], arg1[2], ... } > > > > + with S = arg1[2] - arg1[1]. > > > > + > > > > + As a concrete example, for above PR: > > > > + arg0: { -16, -9, -10, -11 } > > > > + arg1: { -12, -5, -6, -7 } > > > > + sel = { 3, 4, 5, 6 } > > > > + > > > > + arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3. > > > > + Since a1 = 4 and arg_len = 4, it ended up creating the result with > > > > + following encoding: > > > > + res = { arg0[3], arg1[0], arg1[1] } // (1, 3) > > > > + = { -11, -12, -5 } > > > > + > > > > + So for res[3], it used S = (-5) - (-12) = 7 > > > > + And hence computed it as -5 + 7 = 2. > > > > + instead of arg1[2], ie, -6, which is the correct value. > > > > + Ensure that valid_mask_for_fold_vec_perm_cst returns false for this case. */ > > > > + { > > > > + tree arg0 = build_vec_cst_rand (vmode, 1, 3); > > > > + tree arg1 = build_vec_cst_rand (vmode, 1, 3); > > > > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > > > + > > > > + vec_perm_builder builder (len, 1, 3); > > > > + poly_uint64 mask_elems[] = { 0, len, len+1 }; > > > > + builder_push_elems (builder, mask_elems); > > > > + > > > > + vec_perm_indices sel (builder, 2, len); > > > > + const char *reason; > > > > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason); > > > > + ASSERT_TRUE (res == NULL_TREE); > > > > + ASSERT_TRUE (!strcmp (reason, > > > > + "selecting base element of input vector")); > > > > + } > > > > } > > > > } > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c > > > > new file mode 100644 > > > > index 00000000000..093e2b02654 > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.dg/vect/pr111648.c > > > > @@ -0,0 +1,23 @@ > > > > +/* { dg-do compile } */ > > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > > > > + > > > > +int a; > > > > +int *b = &a; > > > > +static int **c = &b; > > > > +static int d; > > > > +short e; > > > > +short f; > > > > + > > > > +_Bool foo () > > > > +{ > > > > + f = -21; > > > > + for (; f < -5; f++) { > > > > + e = f ^ 3; > > > > + d = *b; > > > > + **c = e; > > > > + } > > > > + > > > > + return d == -6; > > > > +} > > > > + > > > > +/* { dg-final { scan-tree-dump "return 1" "optimized" } } */ diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc index 4f8561509ff..55a6a68c16c 100644 --- a/gcc/fold-const.cc +++ b/gcc/fold-const.cc @@ -10684,9 +10684,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, /* Ensure that the stepped sequence always selects from the same input pattern. */ - unsigned arg_npatterns - = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0) - : VECTOR_CST_NPATTERNS (arg1); + tree arg = ((q1 & 1) == 0) ? arg0 : arg1; + unsigned arg_npatterns = VECTOR_CST_NPATTERNS (arg); if (!multiple_p (step, arg_npatterns)) { @@ -10694,6 +10693,29 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, *reason = "step is not multiple of npatterns"; return false; } + + /* If a1 chooses base element from arg, ensure that it's a natural + stepped sequence, ie, (arg[2] - arg[1]) == (arg[1] - arg[0]) + to preserve arg's encoding. */ + + unsigned HOST_WIDE_INT index; + if (!r1.is_constant (&index)) + return false; + if (index < arg_npatterns) + { + tree arg_elem0 = vector_cst_elt (arg, index); + tree arg_elem1 = vector_cst_elt (arg, index + arg_npatterns); + tree arg_elem2 = vector_cst_elt (arg, index + arg_npatterns * 2); + + if (!operand_equal_p (const_binop (MINUS_EXPR, arg_elem2, arg_elem1), + const_binop (MINUS_EXPR, arg_elem1, arg_elem0), + 0)) + { + if (reason) + *reason = "not a natural stepped sequence"; + return false; + } + } } return true; @@ -17161,7 +17183,8 @@ namespace test_fold_vec_perm_cst { static tree build_vec_cst_rand (machine_mode vmode, unsigned npatterns, unsigned nelts_per_pattern, - int step = 0, int threshold = 100) + int step = 0, bool natural_stepped = false, + int threshold = 100) { tree inner_type = lang_hooks.types.type_for_mode (GET_MODE_INNER (vmode), 1); tree vectype = build_vector_type_for_mode (inner_type, vmode); @@ -17176,17 +17199,28 @@ build_vec_cst_rand (machine_mode vmode, unsigned npatterns, // Fill a1 for each pattern for (unsigned i = 0; i < npatterns; i++) - builder.quick_push (build_int_cst (inner_type, rand () % threshold)); - + { + tree a1; + if (natural_stepped) + { + tree a0 = builder[i]; + wide_int a0_val = wi::to_wide (a0); + wide_int a1_val = a0_val + step; + a1 = wide_int_to_tree (inner_type, a1_val); + } + else + a1 = build_int_cst (inner_type, rand () % threshold); + builder.quick_push (a1); + } if (nelts_per_pattern == 2) return builder.build (); for (unsigned i = npatterns * 2; i < npatterns * nelts_per_pattern; i++) { tree prev_elem = builder[i - npatterns]; - int prev_elem_val = TREE_INT_CST_LOW (prev_elem); - int val = prev_elem_val + step; - builder.quick_push (build_int_cst (inner_type, val)); + wide_int prev_elem_val = wi::to_wide (prev_elem); + wide_int val = prev_elem_val + step; + builder.quick_push (wide_int_to_tree (inner_type, val)); } return builder.build (); @@ -17432,7 +17466,7 @@ test_nunits_min_2 (machine_mode vmode) and step (a2 - a1) = 1, step is not a multiple of npatterns in input vector. So return NULL_TREE. */ { - tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1); + tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1, true); tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1); poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); @@ -17452,7 +17486,7 @@ test_nunits_min_2 (machine_mode vmode) Test that stepped sequence of the pattern selects from arg0. res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3) */ { - tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1, true); tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); @@ -17466,6 +17500,62 @@ test_nunits_min_2 (machine_mode vmode) tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) }; validate_res (1, 3, res, expected_res); } + + /* Case 6: PR111648 - a1 chooses base element from input vector arg. + In this case ensure that arg has a natural stepped sequence + to preserve arg's encoding. + + As a concrete example, consider: + arg0: { -16, -9, -10, ... } // (1, 3) + arg1: { -12, -5, -6, ... } // (1, 3) + sel = { 0, len, len + 1, ... } // (1, 3) + + This will create res with following encoding: + res = { arg0[0], arg1[0], arg1[1], ... } // (1, 3) + = { -16, -12, -5, ... } + + The step in above encoding would be: (-5) - (-12) = 7 + And hence res[3] would be computed as -5 + 7 = 2. + instead of arg1[2], ie, -6. + Ensure that valid_mask_for_fold_vec_perm_cst returns false + for this case. */ + { + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); + tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); + + vec_perm_builder builder (len, 1, 3); + poly_uint64 mask_elems[] = { 0, len, len+1 }; + builder_push_elems (builder, mask_elems); + + vec_perm_indices sel (builder, 2, len); + const char *reason; + /* FIXME: It may happen that build_vec_cst_rand may build a natural + stepped pattern, even if we didn't explicitly tell it to. So folding + may not always fail, but if it does, ensure that's because arg1 does + not have a natural stepped sequence (and not due to other reason) */ + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason); + if (res == NULL_TREE) + ASSERT_TRUE (!strcmp (reason, "not a natural stepped sequence")); + } + + /* Case 7: Same as Case 6, except that arg1 contains natural stepped + sequence and thus folding should be valid for this case. */ + { + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); + tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1, true); + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); + + vec_perm_builder builder (len, 1, 3); + poly_uint64 mask_elems[] = { 0, len, len+1 }; + builder_push_elems (builder, mask_elems); + + vec_perm_indices sel (builder, 2, len); + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); + + tree expected_res[] = { ARG0(0), ARG1(0), ARG1(1) }; + validate_res (1, 3, res, expected_res); + } } } diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c new file mode 100644 index 00000000000..093e2b02654 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/pr111648.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +int a; +int *b = &a; +static int **c = &b; +static int d; +short e; +short f; + +_Bool foo () +{ + f = -21; + for (; f < -5; f++) { + e = f ^ 3; + d = *b; + **c = e; + } + + return d == -6; +} + +/* { dg-final { scan-tree-dump "return 1" "optimized" } } */
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: > On Wed, 11 Oct 2023 at 16:57, Prathamesh Kulkarni > <prathamesh.kulkarni@linaro.org> wrote: >> >> On Wed, 11 Oct 2023 at 16:42, Prathamesh Kulkarni >> <prathamesh.kulkarni@linaro.org> wrote: >> > >> > On Mon, 9 Oct 2023 at 17:05, Richard Sandiford >> > <richard.sandiford@arm.com> wrote: >> > > >> > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: >> > > > Hi, >> > > > The attached patch attempts to fix PR111648. >> > > > As mentioned in PR, the issue is when a1 is a multiple of vector >> > > > length, we end up creating following encoding in result: { base_elem, >> > > > arg[0], arg[1], ... } (assuming S = 1), >> > > > where arg is chosen input vector, which is incorrect, since the >> > > > encoding originally in arg would be: { arg[0], arg[1], arg[2], ... } >> > > > >> > > > For the test-case mentioned in PR, vectorizer pass creates >> > > > VEC_PERM_EXPR<arg0, arg, sel> where: >> > > > arg0: { -16, -9, -10, -11 } >> > > > arg1: { -12, -5, -6, -7 } >> > > > sel = { 3, 4, 5, 6 } >> > > > >> > > > arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3. >> > > > Since a1 = 4 and arg_len = 4, it ended up creating the result with >> > > > following encoding: >> > > > res = { arg0[3], arg1[0], arg1[1] } // npatterns = 1, nelts_per_pattern = 3 >> > > > = { -11, -12, -5 } >> > > > >> > > > So for res[3], it used S = (-5) - (-12) = 7 >> > > > And hence computed it as -5 + 7 = 2. >> > > > instead of selecting arg1[2], ie, -6. >> > > > >> > > > The patch tweaks valid_mask_for_fold_vec_perm_cst_p to punt if a1 is a multiple >> > > > of vector length, so a1 ... ae select elements only from stepped part >> > > > of the pattern >> > > > from input vector and return false for this case. >> > > > >> > > > Since the vectors are VLS, fold_vec_perm_cst then sets: >> > > > res_npatterns = res_nelts >> > > > res_nelts_per_pattern = 1 >> > > > which seems to fix the issue by encoding all the elements. >> > > > >> > > > The patch resulted in Case 4 and Case 5 failing from test_nunits_min_2 because >> > > > they used sel = { 0, 0, 1, ... } and {len, 0, 1, ... } respectively, >> > > > which used a1 = 0, and thus selected arg1[0]. >> > > > >> > > > I removed Case 4 because it was already covered in test_nunits_min_4, >> > > > and moved Case 5 to test_nunits_min_4, with sel = { len, 1, 2, ... } >> > > > and added a new Case 9 to test for this issue. >> > > > >> > > > Passes bootstrap+test on aarch64-linux-gnu with and without SVE, >> > > > and on x86_64-linux-gnu. >> > > > Does the patch look OK ? >> > > > >> > > > Thanks, >> > > > Prathamesh >> > > > >> > > > [PR111648] Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding. >> > > > >> > > > gcc/ChangeLog: >> > > > PR tree-optimization/111648 >> > > > * fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): Punt if a1 >> > > > is a multiple of vector length. >> > > > (test_nunits_min_2): Remove Case 4 and move Case 5 to ... >> > > > (test_nunits_min_4): ... here and rename case numbers. Also add >> > > > Case 9. >> > > > >> > > > gcc/testsuite/ChangeLog: >> > > > PR tree-optimization/111648 >> > > > * gcc.dg/vect/pr111648.c: New test. >> > > > >> > > > >> > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc >> > > > index 4f8561509ff..c5f421d6b76 100644 >> > > > --- a/gcc/fold-const.cc >> > > > +++ b/gcc/fold-const.cc >> > > > @@ -10682,8 +10682,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, >> > > > return false; >> > > > } >> > > > >> > > > - /* Ensure that the stepped sequence always selects from the same >> > > > - input pattern. */ >> > > > + /* Ensure that the stepped sequence always selects from the stepped >> > > > + part of same input pattern. */ >> > > > unsigned arg_npatterns >> > > > = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0) >> > > > : VECTOR_CST_NPATTERNS (arg1); >> > > > @@ -10694,6 +10694,20 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, >> > > > *reason = "step is not multiple of npatterns"; >> > > > return false; >> > > > } >> > > > + >> > > > + /* If a1 is a multiple of len, it will select base element of input >> > > > + vector resulting in following encoding: >> > > > + { base_elem, arg[0], arg[1], ... } where arg is the chosen input >> > > > + vector. This encoding is not originally present in arg, since it's >> > > > + defined as: >> > > > + { arg[0], arg[1], arg[2], ... }. */ >> > > > + >> > > > + if (multiple_p (a1, arg_len)) >> > > > + { >> > > > + if (reason) >> > > > + *reason = "selecting base element of input vector"; >> > > > + return false; >> > > > + } >> > > >> > > That wouldn't catch (for example) cases where a1 == arg_len + 1 and the >> > > second argument has 2 stepped patterns. >> > Ah right, thanks for pointing out. In the attached patch I extended the check >> > so that r1 < arg_npatterns which should check if we are choosing base >> > elements from any of the patterns in arg (and not just first). >> > Does that look OK ? >> > > >> > > The equivalent condition that handles multiple patterns would >> > > probably be to reject q1 < arg_npatterns. But that's only necessary if: >> > Sorry, I don't understand -- we use q1 only for determining which >> > vector to choose from, >> > and r1 will give the index for first element ? >> > > >> > > (1) the argument has three elements per pattern (i.e. has a stepped >> > > sequence) and >> > > >> > > (2) element 2 - element 1 != element 1 - element 0 >> > > >> > > I think we should check those to avoid pessimising VLA cases. >> > Thanks for the suggestions. In attached POC patch (stage-1 tested), I >> > added the above checks, >> > does it look in the right direction ? Also, should this patch be the >> > right fix for PR111754 ? >> Oops sorry, this patch causes build errors on aarch64. Please ignore it. >> Sorry for the noise. > Hi Richard, > The attached patch passes bootstrap+test on aarch64-linux-gnu with and > without SVE, > and on x86_64-linux-gnu. > Does it look OK ? > > Thanks, > Prathamesh >> >> Thanks, >> Prathamesh >> > >> > Thanks, >> > Prathamesh >> > > >> > > Thanks, >> > > Richard >> > > >> > > > } >> > > > >> > > > return true; >> > > > @@ -17425,47 +17439,6 @@ test_nunits_min_2 (machine_mode vmode) >> > > > tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) }; >> > > > validate_res (2, 2, res, expected_res); >> > > > } >> > > > - >> > > > - /* Case 4: mask = {0, 0, 1, ...} // (1, 3) >> > > > - Test that the stepped sequence of the pattern selects from >> > > > - same input pattern. Since input vectors have npatterns = 2, >> > > > - and step (a2 - a1) = 1, step is not a multiple of npatterns >> > > > - in input vector. So return NULL_TREE. */ >> > > > - { >> > > > - tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1); >> > > > - tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1); >> > > > - poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); >> > > > - >> > > > - vec_perm_builder builder (len, 1, 3); >> > > > - poly_uint64 mask_elems[] = { 0, 0, 1 }; >> > > > - builder_push_elems (builder, mask_elems); >> > > > - >> > > > - vec_perm_indices sel (builder, 2, len); >> > > > - const char *reason; >> > > > - tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, >> > > > - &reason); >> > > > - ASSERT_TRUE (res == NULL_TREE); >> > > > - ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns")); >> > > > - } >> > > > - >> > > > - /* Case 5: mask = {len, 0, 1, ...} // (1, 3) >> > > > - Test that stepped sequence of the pattern selects from arg0. >> > > > - res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3) */ >> > > > - { >> > > > - tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); >> > > > - tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); >> > > > - poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); >> > > > - >> > > > - vec_perm_builder builder (len, 1, 3); >> > > > - poly_uint64 mask_elems[] = { len, 0, 1 }; >> > > > - builder_push_elems (builder, mask_elems); >> > > > - >> > > > - vec_perm_indices sel (builder, 2, len); >> > > > - tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); >> > > > - >> > > > - tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) }; >> > > > - validate_res (1, 3, res, expected_res); >> > > > - } >> > > > } >> > > > } >> > > > >> > > > @@ -17528,7 +17501,26 @@ test_nunits_min_4 (machine_mode vmode) >> > > > validate_res (1, 3, res, expected_res); >> > > > } >> > > > >> > > > - /* Case 4: >> > > > + /* Case 4: mask = {len, 1, 2, ...} // (1, 3) >> > > > + Test that stepped sequence of the pattern selects from arg0. >> > > > + res = { arg1[0], arg0[1], arg0[2], ... } // (1, 3) */ >> > > > + { >> > > > + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); >> > > > + tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); >> > > > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); >> > > > + >> > > > + vec_perm_builder builder (len, 1, 3); >> > > > + poly_uint64 mask_elems[] = { len, 1, 2 }; >> > > > + builder_push_elems (builder, mask_elems); >> > > > + >> > > > + vec_perm_indices sel (builder, 2, len); >> > > > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); >> > > > + >> > > > + tree expected_res[] = { ARG1(0), ARG0(1), ARG0(2) }; >> > > > + validate_res (1, 3, res, expected_res); >> > > > + } >> > > > + >> > > > + /* Case 5: >> > > > sel = { len, 0, 2, ... } // (1, 3) >> > > > This should return NULL because we cross the input vectors. >> > > > Because, >> > > > @@ -17561,7 +17553,7 @@ test_nunits_min_4 (machine_mode vmode) >> > > > ASSERT_TRUE (!strcmp (reason, "crossed input vectors")); >> > > > } >> > > > >> > > > - /* Case 5: npatterns(arg0) = 4 > npatterns(sel) = 2 >> > > > + /* Case 6: npatterns(arg0) = 4 > npatterns(sel) = 2 >> > > > mask = { 0, len, 1, len + 1, ...} // (2, 2) >> > > > res = { arg0[0], arg1[0], arg0[1], arg1[1], ... } // (2, 2) >> > > > >> > > > @@ -17583,7 +17575,7 @@ test_nunits_min_4 (machine_mode vmode) >> > > > validate_res (2, 2, res, expected_res); >> > > > } >> > > > >> > > > - /* Case 6: Test combination in sel, where one pattern is dup and other >> > > > + /* Case 7: Test combination in sel, where one pattern is dup and other >> > > > is stepped sequence. >> > > > sel = { 0, 0, 0, 1, 0, 2, ... } // (2, 3) >> > > > res = { arg0[0], arg0[0], arg0[0], >> > > > @@ -17605,7 +17597,7 @@ test_nunits_min_4 (machine_mode vmode) >> > > > validate_res (2, 3, res, expected_res); >> > > > } >> > > > >> > > > - /* Case 7: PR111048: Check that we set arg_npatterns correctly, >> > > > + /* Case 8: PR111048: Check that we set arg_npatterns correctly, >> > > > when arg0, arg1 and sel have different number of patterns. >> > > > arg0 is of shape (1, 1) >> > > > arg1 is of shape (4, 1) >> > > > @@ -17634,6 +17626,51 @@ test_nunits_min_4 (machine_mode vmode) >> > > > ASSERT_TRUE (res == NULL_TREE); >> > > > ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns")); >> > > > } >> > > > + >> > > > + /* Case 9: PR111648 - a1 is multiple of vector length, >> > > > + which results in incorrect encoding. Verify that we return >> > > > + NULL for this case. >> > > > + sel = { base_elem, len, len+1, ... } // (1, 3) >> > > > + In this case, the single pattern is: { base_elem len, len+1, ...} >> > > > + Let's assume that base_elem is used for indexing into arg0, >> > > > + and a1 ... ae chooses elements from arg1. >> > > > + So res = { arg0[base_elem], arg1[0], arg1[1], ... } // (1, 3) >> > > > + Which creates an incorrect encoding with S = arg1[1] - arg1[0] >> > > > + while the original encoding in arg1 is >> > > > + arg1: { arg1[0], arg1[1], arg1[2], ... } >> > > > + with S = arg1[2] - arg1[1]. >> > > > + >> > > > + As a concrete example, for above PR: >> > > > + arg0: { -16, -9, -10, -11 } >> > > > + arg1: { -12, -5, -6, -7 } >> > > > + sel = { 3, 4, 5, 6 } >> > > > + >> > > > + arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3. >> > > > + Since a1 = 4 and arg_len = 4, it ended up creating the result with >> > > > + following encoding: >> > > > + res = { arg0[3], arg1[0], arg1[1] } // (1, 3) >> > > > + = { -11, -12, -5 } >> > > > + >> > > > + So for res[3], it used S = (-5) - (-12) = 7 >> > > > + And hence computed it as -5 + 7 = 2. >> > > > + instead of arg1[2], ie, -6, which is the correct value. >> > > > + Ensure that valid_mask_for_fold_vec_perm_cst returns false for this case. */ >> > > > + { >> > > > + tree arg0 = build_vec_cst_rand (vmode, 1, 3); >> > > > + tree arg1 = build_vec_cst_rand (vmode, 1, 3); >> > > > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); >> > > > + >> > > > + vec_perm_builder builder (len, 1, 3); >> > > > + poly_uint64 mask_elems[] = { 0, len, len+1 }; >> > > > + builder_push_elems (builder, mask_elems); >> > > > + >> > > > + vec_perm_indices sel (builder, 2, len); >> > > > + const char *reason; >> > > > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason); >> > > > + ASSERT_TRUE (res == NULL_TREE); >> > > > + ASSERT_TRUE (!strcmp (reason, >> > > > + "selecting base element of input vector")); >> > > > + } >> > > > } >> > > > } >> > > > >> > > > diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c >> > > > new file mode 100644 >> > > > index 00000000000..093e2b02654 >> > > > --- /dev/null >> > > > +++ b/gcc/testsuite/gcc.dg/vect/pr111648.c >> > > > @@ -0,0 +1,23 @@ >> > > > +/* { dg-do compile } */ >> > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */ >> > > > + >> > > > +int a; >> > > > +int *b = &a; >> > > > +static int **c = &b; >> > > > +static int d; >> > > > +short e; >> > > > +short f; >> > > > + >> > > > +_Bool foo () >> > > > +{ >> > > > + f = -21; >> > > > + for (; f < -5; f++) { >> > > > + e = f ^ 3; >> > > > + d = *b; >> > > > + **c = e; >> > > > + } >> > > > + >> > > > + return d == -6; >> > > > +} >> > > > + >> > > > +/* { dg-final { scan-tree-dump "return 1" "optimized" } } */ > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > index 4f8561509ff..55a6a68c16c 100644 > --- a/gcc/fold-const.cc > +++ b/gcc/fold-const.cc > @@ -10684,9 +10684,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, > > /* Ensure that the stepped sequence always selects from the same > input pattern. */ > - unsigned arg_npatterns > - = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0) > - : VECTOR_CST_NPATTERNS (arg1); > + tree arg = ((q1 & 1) == 0) ? arg0 : arg1; > + unsigned arg_npatterns = VECTOR_CST_NPATTERNS (arg); > > if (!multiple_p (step, arg_npatterns)) > { > @@ -10694,6 +10693,29 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, > *reason = "step is not multiple of npatterns"; > return false; > } > + > + /* If a1 chooses base element from arg, ensure that it's a natural > + stepped sequence, ie, (arg[2] - arg[1]) == (arg[1] - arg[0]) > + to preserve arg's encoding. */ > + > + unsigned HOST_WIDE_INT index; > + if (!r1.is_constant (&index)) > + return false; > + if (index < arg_npatterns) > + { I don't know whether it matters in practice, but I think the two conditions above are more natural as: if (maybe_lt (r1, arg_npatterns)) { unsigned HOST_WIDE_INT index; if (!r1.is_constant (&index)) return false; ...[code below]... } > + tree arg_elem0 = vector_cst_elt (arg, index); > + tree arg_elem1 = vector_cst_elt (arg, index + arg_npatterns); > + tree arg_elem2 = vector_cst_elt (arg, index + arg_npatterns * 2); > + > + if (!operand_equal_p (const_binop (MINUS_EXPR, arg_elem2, arg_elem1), > + const_binop (MINUS_EXPR, arg_elem1, arg_elem0), > + 0)) This needs to check whether const_binop returns null. Maybe: tree step1, step2; if (!(step1 = const_binop (MINUS_EXPR, arg_elem1, arg_elem0)) || !(step2 = const_binop (MINUS_EXPR, arg_elem2, arg_elem1)) || !operand_equal_p (step1, step2, 0)) OK with those changes, thanks. Richard > + { > + if (reason) > + *reason = "not a natural stepped sequence"; > + return false; > + } > + } > } > > return true; > @@ -17161,7 +17183,8 @@ namespace test_fold_vec_perm_cst { > static tree > build_vec_cst_rand (machine_mode vmode, unsigned npatterns, > unsigned nelts_per_pattern, > - int step = 0, int threshold = 100) > + int step = 0, bool natural_stepped = false, > + int threshold = 100) > { > tree inner_type = lang_hooks.types.type_for_mode (GET_MODE_INNER (vmode), 1); > tree vectype = build_vector_type_for_mode (inner_type, vmode); > @@ -17176,17 +17199,28 @@ build_vec_cst_rand (machine_mode vmode, unsigned npatterns, > > // Fill a1 for each pattern > for (unsigned i = 0; i < npatterns; i++) > - builder.quick_push (build_int_cst (inner_type, rand () % threshold)); > - > + { > + tree a1; > + if (natural_stepped) > + { > + tree a0 = builder[i]; > + wide_int a0_val = wi::to_wide (a0); > + wide_int a1_val = a0_val + step; > + a1 = wide_int_to_tree (inner_type, a1_val); > + } > + else > + a1 = build_int_cst (inner_type, rand () % threshold); > + builder.quick_push (a1); > + } > if (nelts_per_pattern == 2) > return builder.build (); > > for (unsigned i = npatterns * 2; i < npatterns * nelts_per_pattern; i++) > { > tree prev_elem = builder[i - npatterns]; > - int prev_elem_val = TREE_INT_CST_LOW (prev_elem); > - int val = prev_elem_val + step; > - builder.quick_push (build_int_cst (inner_type, val)); > + wide_int prev_elem_val = wi::to_wide (prev_elem); > + wide_int val = prev_elem_val + step; > + builder.quick_push (wide_int_to_tree (inner_type, val)); > } > > return builder.build (); > @@ -17432,7 +17466,7 @@ test_nunits_min_2 (machine_mode vmode) > and step (a2 - a1) = 1, step is not a multiple of npatterns > in input vector. So return NULL_TREE. */ > { > - tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1); > + tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1, true); > tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1); > poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > @@ -17452,7 +17486,7 @@ test_nunits_min_2 (machine_mode vmode) > Test that stepped sequence of the pattern selects from arg0. > res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3) */ > { > - tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); > + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1, true); > tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); > poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > @@ -17466,6 +17500,62 @@ test_nunits_min_2 (machine_mode vmode) > tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) }; > validate_res (1, 3, res, expected_res); > } > + > + /* Case 6: PR111648 - a1 chooses base element from input vector arg. > + In this case ensure that arg has a natural stepped sequence > + to preserve arg's encoding. > + > + As a concrete example, consider: > + arg0: { -16, -9, -10, ... } // (1, 3) > + arg1: { -12, -5, -6, ... } // (1, 3) > + sel = { 0, len, len + 1, ... } // (1, 3) > + > + This will create res with following encoding: > + res = { arg0[0], arg1[0], arg1[1], ... } // (1, 3) > + = { -16, -12, -5, ... } > + > + The step in above encoding would be: (-5) - (-12) = 7 > + And hence res[3] would be computed as -5 + 7 = 2. > + instead of arg1[2], ie, -6. > + Ensure that valid_mask_for_fold_vec_perm_cst returns false > + for this case. */ > + { > + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); > + tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > + > + vec_perm_builder builder (len, 1, 3); > + poly_uint64 mask_elems[] = { 0, len, len+1 }; > + builder_push_elems (builder, mask_elems); > + > + vec_perm_indices sel (builder, 2, len); > + const char *reason; > + /* FIXME: It may happen that build_vec_cst_rand may build a natural > + stepped pattern, even if we didn't explicitly tell it to. So folding > + may not always fail, but if it does, ensure that's because arg1 does > + not have a natural stepped sequence (and not due to other reason) */ > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason); > + if (res == NULL_TREE) > + ASSERT_TRUE (!strcmp (reason, "not a natural stepped sequence")); > + } > + > + /* Case 7: Same as Case 6, except that arg1 contains natural stepped > + sequence and thus folding should be valid for this case. */ > + { > + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); > + tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1, true); > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > + > + vec_perm_builder builder (len, 1, 3); > + poly_uint64 mask_elems[] = { 0, len, len+1 }; > + builder_push_elems (builder, mask_elems); > + > + vec_perm_indices sel (builder, 2, len); > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); > + > + tree expected_res[] = { ARG0(0), ARG1(0), ARG1(1) }; > + validate_res (1, 3, res, expected_res); > + } > } > } > > diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c > new file mode 100644 > index 00000000000..093e2b02654 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/pr111648.c > @@ -0,0 +1,23 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > +int a; > +int *b = &a; > +static int **c = &b; > +static int d; > +short e; > +short f; > + > +_Bool foo () > +{ > + f = -21; > + for (; f < -5; f++) { > + e = f ^ 3; > + d = *b; > + **c = e; > + } > + > + return d == -6; > +} > + > +/* { dg-final { scan-tree-dump "return 1" "optimized" } } */
On Tue, 17 Oct 2023 at 02:40, Richard Sandiford <richard.sandiford@arm.com> wrote: > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: > > On Wed, 11 Oct 2023 at 16:57, Prathamesh Kulkarni > > <prathamesh.kulkarni@linaro.org> wrote: > >> > >> On Wed, 11 Oct 2023 at 16:42, Prathamesh Kulkarni > >> <prathamesh.kulkarni@linaro.org> wrote: > >> > > >> > On Mon, 9 Oct 2023 at 17:05, Richard Sandiford > >> > <richard.sandiford@arm.com> wrote: > >> > > > >> > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: > >> > > > Hi, > >> > > > The attached patch attempts to fix PR111648. > >> > > > As mentioned in PR, the issue is when a1 is a multiple of vector > >> > > > length, we end up creating following encoding in result: { base_elem, > >> > > > arg[0], arg[1], ... } (assuming S = 1), > >> > > > where arg is chosen input vector, which is incorrect, since the > >> > > > encoding originally in arg would be: { arg[0], arg[1], arg[2], ... } > >> > > > > >> > > > For the test-case mentioned in PR, vectorizer pass creates > >> > > > VEC_PERM_EXPR<arg0, arg, sel> where: > >> > > > arg0: { -16, -9, -10, -11 } > >> > > > arg1: { -12, -5, -6, -7 } > >> > > > sel = { 3, 4, 5, 6 } > >> > > > > >> > > > arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3. > >> > > > Since a1 = 4 and arg_len = 4, it ended up creating the result with > >> > > > following encoding: > >> > > > res = { arg0[3], arg1[0], arg1[1] } // npatterns = 1, nelts_per_pattern = 3 > >> > > > = { -11, -12, -5 } > >> > > > > >> > > > So for res[3], it used S = (-5) - (-12) = 7 > >> > > > And hence computed it as -5 + 7 = 2. > >> > > > instead of selecting arg1[2], ie, -6. > >> > > > > >> > > > The patch tweaks valid_mask_for_fold_vec_perm_cst_p to punt if a1 is a multiple > >> > > > of vector length, so a1 ... ae select elements only from stepped part > >> > > > of the pattern > >> > > > from input vector and return false for this case. > >> > > > > >> > > > Since the vectors are VLS, fold_vec_perm_cst then sets: > >> > > > res_npatterns = res_nelts > >> > > > res_nelts_per_pattern = 1 > >> > > > which seems to fix the issue by encoding all the elements. > >> > > > > >> > > > The patch resulted in Case 4 and Case 5 failing from test_nunits_min_2 because > >> > > > they used sel = { 0, 0, 1, ... } and {len, 0, 1, ... } respectively, > >> > > > which used a1 = 0, and thus selected arg1[0]. > >> > > > > >> > > > I removed Case 4 because it was already covered in test_nunits_min_4, > >> > > > and moved Case 5 to test_nunits_min_4, with sel = { len, 1, 2, ... } > >> > > > and added a new Case 9 to test for this issue. > >> > > > > >> > > > Passes bootstrap+test on aarch64-linux-gnu with and without SVE, > >> > > > and on x86_64-linux-gnu. > >> > > > Does the patch look OK ? > >> > > > > >> > > > Thanks, > >> > > > Prathamesh > >> > > > > >> > > > [PR111648] Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding. > >> > > > > >> > > > gcc/ChangeLog: > >> > > > PR tree-optimization/111648 > >> > > > * fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): Punt if a1 > >> > > > is a multiple of vector length. > >> > > > (test_nunits_min_2): Remove Case 4 and move Case 5 to ... > >> > > > (test_nunits_min_4): ... here and rename case numbers. Also add > >> > > > Case 9. > >> > > > > >> > > > gcc/testsuite/ChangeLog: > >> > > > PR tree-optimization/111648 > >> > > > * gcc.dg/vect/pr111648.c: New test. > >> > > > > >> > > > > >> > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > >> > > > index 4f8561509ff..c5f421d6b76 100644 > >> > > > --- a/gcc/fold-const.cc > >> > > > +++ b/gcc/fold-const.cc > >> > > > @@ -10682,8 +10682,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, > >> > > > return false; > >> > > > } > >> > > > > >> > > > - /* Ensure that the stepped sequence always selects from the same > >> > > > - input pattern. */ > >> > > > + /* Ensure that the stepped sequence always selects from the stepped > >> > > > + part of same input pattern. */ > >> > > > unsigned arg_npatterns > >> > > > = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0) > >> > > > : VECTOR_CST_NPATTERNS (arg1); > >> > > > @@ -10694,6 +10694,20 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, > >> > > > *reason = "step is not multiple of npatterns"; > >> > > > return false; > >> > > > } > >> > > > + > >> > > > + /* If a1 is a multiple of len, it will select base element of input > >> > > > + vector resulting in following encoding: > >> > > > + { base_elem, arg[0], arg[1], ... } where arg is the chosen input > >> > > > + vector. This encoding is not originally present in arg, since it's > >> > > > + defined as: > >> > > > + { arg[0], arg[1], arg[2], ... }. */ > >> > > > + > >> > > > + if (multiple_p (a1, arg_len)) > >> > > > + { > >> > > > + if (reason) > >> > > > + *reason = "selecting base element of input vector"; > >> > > > + return false; > >> > > > + } > >> > > > >> > > That wouldn't catch (for example) cases where a1 == arg_len + 1 and the > >> > > second argument has 2 stepped patterns. > >> > Ah right, thanks for pointing out. In the attached patch I extended the check > >> > so that r1 < arg_npatterns which should check if we are choosing base > >> > elements from any of the patterns in arg (and not just first). > >> > Does that look OK ? > >> > > > >> > > The equivalent condition that handles multiple patterns would > >> > > probably be to reject q1 < arg_npatterns. But that's only necessary if: > >> > Sorry, I don't understand -- we use q1 only for determining which > >> > vector to choose from, > >> > and r1 will give the index for first element ? > >> > > > >> > > (1) the argument has three elements per pattern (i.e. has a stepped > >> > > sequence) and > >> > > > >> > > (2) element 2 - element 1 != element 1 - element 0 > >> > > > >> > > I think we should check those to avoid pessimising VLA cases. > >> > Thanks for the suggestions. In attached POC patch (stage-1 tested), I > >> > added the above checks, > >> > does it look in the right direction ? Also, should this patch be the > >> > right fix for PR111754 ? > >> Oops sorry, this patch causes build errors on aarch64. Please ignore it. > >> Sorry for the noise. > > Hi Richard, > > The attached patch passes bootstrap+test on aarch64-linux-gnu with and > > without SVE, > > and on x86_64-linux-gnu. > > Does it look OK ? > > > > Thanks, > > Prathamesh > >> > >> Thanks, > >> Prathamesh > >> > > >> > Thanks, > >> > Prathamesh > >> > > > >> > > Thanks, > >> > > Richard > >> > > > >> > > > } > >> > > > > >> > > > return true; > >> > > > @@ -17425,47 +17439,6 @@ test_nunits_min_2 (machine_mode vmode) > >> > > > tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) }; > >> > > > validate_res (2, 2, res, expected_res); > >> > > > } > >> > > > - > >> > > > - /* Case 4: mask = {0, 0, 1, ...} // (1, 3) > >> > > > - Test that the stepped sequence of the pattern selects from > >> > > > - same input pattern. Since input vectors have npatterns = 2, > >> > > > - and step (a2 - a1) = 1, step is not a multiple of npatterns > >> > > > - in input vector. So return NULL_TREE. */ > >> > > > - { > >> > > > - tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1); > >> > > > - tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1); > >> > > > - poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > >> > > > - > >> > > > - vec_perm_builder builder (len, 1, 3); > >> > > > - poly_uint64 mask_elems[] = { 0, 0, 1 }; > >> > > > - builder_push_elems (builder, mask_elems); > >> > > > - > >> > > > - vec_perm_indices sel (builder, 2, len); > >> > > > - const char *reason; > >> > > > - tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, > >> > > > - &reason); > >> > > > - ASSERT_TRUE (res == NULL_TREE); > >> > > > - ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns")); > >> > > > - } > >> > > > - > >> > > > - /* Case 5: mask = {len, 0, 1, ...} // (1, 3) > >> > > > - Test that stepped sequence of the pattern selects from arg0. > >> > > > - res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3) */ > >> > > > - { > >> > > > - tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); > >> > > > - tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); > >> > > > - poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > >> > > > - > >> > > > - vec_perm_builder builder (len, 1, 3); > >> > > > - poly_uint64 mask_elems[] = { len, 0, 1 }; > >> > > > - builder_push_elems (builder, mask_elems); > >> > > > - > >> > > > - vec_perm_indices sel (builder, 2, len); > >> > > > - tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); > >> > > > - > >> > > > - tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) }; > >> > > > - validate_res (1, 3, res, expected_res); > >> > > > - } > >> > > > } > >> > > > } > >> > > > > >> > > > @@ -17528,7 +17501,26 @@ test_nunits_min_4 (machine_mode vmode) > >> > > > validate_res (1, 3, res, expected_res); > >> > > > } > >> > > > > >> > > > - /* Case 4: > >> > > > + /* Case 4: mask = {len, 1, 2, ...} // (1, 3) > >> > > > + Test that stepped sequence of the pattern selects from arg0. > >> > > > + res = { arg1[0], arg0[1], arg0[2], ... } // (1, 3) */ > >> > > > + { > >> > > > + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); > >> > > > + tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); > >> > > > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > >> > > > + > >> > > > + vec_perm_builder builder (len, 1, 3); > >> > > > + poly_uint64 mask_elems[] = { len, 1, 2 }; > >> > > > + builder_push_elems (builder, mask_elems); > >> > > > + > >> > > > + vec_perm_indices sel (builder, 2, len); > >> > > > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); > >> > > > + > >> > > > + tree expected_res[] = { ARG1(0), ARG0(1), ARG0(2) }; > >> > > > + validate_res (1, 3, res, expected_res); > >> > > > + } > >> > > > + > >> > > > + /* Case 5: > >> > > > sel = { len, 0, 2, ... } // (1, 3) > >> > > > This should return NULL because we cross the input vectors. > >> > > > Because, > >> > > > @@ -17561,7 +17553,7 @@ test_nunits_min_4 (machine_mode vmode) > >> > > > ASSERT_TRUE (!strcmp (reason, "crossed input vectors")); > >> > > > } > >> > > > > >> > > > - /* Case 5: npatterns(arg0) = 4 > npatterns(sel) = 2 > >> > > > + /* Case 6: npatterns(arg0) = 4 > npatterns(sel) = 2 > >> > > > mask = { 0, len, 1, len + 1, ...} // (2, 2) > >> > > > res = { arg0[0], arg1[0], arg0[1], arg1[1], ... } // (2, 2) > >> > > > > >> > > > @@ -17583,7 +17575,7 @@ test_nunits_min_4 (machine_mode vmode) > >> > > > validate_res (2, 2, res, expected_res); > >> > > > } > >> > > > > >> > > > - /* Case 6: Test combination in sel, where one pattern is dup and other > >> > > > + /* Case 7: Test combination in sel, where one pattern is dup and other > >> > > > is stepped sequence. > >> > > > sel = { 0, 0, 0, 1, 0, 2, ... } // (2, 3) > >> > > > res = { arg0[0], arg0[0], arg0[0], > >> > > > @@ -17605,7 +17597,7 @@ test_nunits_min_4 (machine_mode vmode) > >> > > > validate_res (2, 3, res, expected_res); > >> > > > } > >> > > > > >> > > > - /* Case 7: PR111048: Check that we set arg_npatterns correctly, > >> > > > + /* Case 8: PR111048: Check that we set arg_npatterns correctly, > >> > > > when arg0, arg1 and sel have different number of patterns. > >> > > > arg0 is of shape (1, 1) > >> > > > arg1 is of shape (4, 1) > >> > > > @@ -17634,6 +17626,51 @@ test_nunits_min_4 (machine_mode vmode) > >> > > > ASSERT_TRUE (res == NULL_TREE); > >> > > > ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns")); > >> > > > } > >> > > > + > >> > > > + /* Case 9: PR111648 - a1 is multiple of vector length, > >> > > > + which results in incorrect encoding. Verify that we return > >> > > > + NULL for this case. > >> > > > + sel = { base_elem, len, len+1, ... } // (1, 3) > >> > > > + In this case, the single pattern is: { base_elem len, len+1, ...} > >> > > > + Let's assume that base_elem is used for indexing into arg0, > >> > > > + and a1 ... ae chooses elements from arg1. > >> > > > + So res = { arg0[base_elem], arg1[0], arg1[1], ... } // (1, 3) > >> > > > + Which creates an incorrect encoding with S = arg1[1] - arg1[0] > >> > > > + while the original encoding in arg1 is > >> > > > + arg1: { arg1[0], arg1[1], arg1[2], ... } > >> > > > + with S = arg1[2] - arg1[1]. > >> > > > + > >> > > > + As a concrete example, for above PR: > >> > > > + arg0: { -16, -9, -10, -11 } > >> > > > + arg1: { -12, -5, -6, -7 } > >> > > > + sel = { 3, 4, 5, 6 } > >> > > > + > >> > > > + arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3. > >> > > > + Since a1 = 4 and arg_len = 4, it ended up creating the result with > >> > > > + following encoding: > >> > > > + res = { arg0[3], arg1[0], arg1[1] } // (1, 3) > >> > > > + = { -11, -12, -5 } > >> > > > + > >> > > > + So for res[3], it used S = (-5) - (-12) = 7 > >> > > > + And hence computed it as -5 + 7 = 2. > >> > > > + instead of arg1[2], ie, -6, which is the correct value. > >> > > > + Ensure that valid_mask_for_fold_vec_perm_cst returns false for this case. */ > >> > > > + { > >> > > > + tree arg0 = build_vec_cst_rand (vmode, 1, 3); > >> > > > + tree arg1 = build_vec_cst_rand (vmode, 1, 3); > >> > > > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > >> > > > + > >> > > > + vec_perm_builder builder (len, 1, 3); > >> > > > + poly_uint64 mask_elems[] = { 0, len, len+1 }; > >> > > > + builder_push_elems (builder, mask_elems); > >> > > > + > >> > > > + vec_perm_indices sel (builder, 2, len); > >> > > > + const char *reason; > >> > > > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason); > >> > > > + ASSERT_TRUE (res == NULL_TREE); > >> > > > + ASSERT_TRUE (!strcmp (reason, > >> > > > + "selecting base element of input vector")); > >> > > > + } > >> > > > } > >> > > > } > >> > > > > >> > > > diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c > >> > > > new file mode 100644 > >> > > > index 00000000000..093e2b02654 > >> > > > --- /dev/null > >> > > > +++ b/gcc/testsuite/gcc.dg/vect/pr111648.c > >> > > > @@ -0,0 +1,23 @@ > >> > > > +/* { dg-do compile } */ > >> > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > >> > > > + > >> > > > +int a; > >> > > > +int *b = &a; > >> > > > +static int **c = &b; > >> > > > +static int d; > >> > > > +short e; > >> > > > +short f; > >> > > > + > >> > > > +_Bool foo () > >> > > > +{ > >> > > > + f = -21; > >> > > > + for (; f < -5; f++) { > >> > > > + e = f ^ 3; > >> > > > + d = *b; > >> > > > + **c = e; > >> > > > + } > >> > > > + > >> > > > + return d == -6; > >> > > > +} > >> > > > + > >> > > > +/* { dg-final { scan-tree-dump "return 1" "optimized" } } */ > > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > > index 4f8561509ff..55a6a68c16c 100644 > > --- a/gcc/fold-const.cc > > +++ b/gcc/fold-const.cc > > @@ -10684,9 +10684,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, > > > > /* Ensure that the stepped sequence always selects from the same > > input pattern. */ > > - unsigned arg_npatterns > > - = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0) > > - : VECTOR_CST_NPATTERNS (arg1); > > + tree arg = ((q1 & 1) == 0) ? arg0 : arg1; > > + unsigned arg_npatterns = VECTOR_CST_NPATTERNS (arg); > > > > if (!multiple_p (step, arg_npatterns)) > > { > > @@ -10694,6 +10693,29 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, > > *reason = "step is not multiple of npatterns"; > > return false; > > } > > + > > + /* If a1 chooses base element from arg, ensure that it's a natural > > + stepped sequence, ie, (arg[2] - arg[1]) == (arg[1] - arg[0]) > > + to preserve arg's encoding. */ > > + > > + unsigned HOST_WIDE_INT index; > > + if (!r1.is_constant (&index)) > > + return false; > > + if (index < arg_npatterns) > > + { > > I don't know whether it matters in practice, but I think the two conditions > above are more natural as: > > if (maybe_lt (r1, arg_npatterns)) > { > unsigned HOST_WIDE_INT index; > if (!r1.is_constant (&index)) > return false; > > ...[code below]... > } > > > + tree arg_elem0 = vector_cst_elt (arg, index); > > + tree arg_elem1 = vector_cst_elt (arg, index + arg_npatterns); > > + tree arg_elem2 = vector_cst_elt (arg, index + arg_npatterns * 2); > > + > > + if (!operand_equal_p (const_binop (MINUS_EXPR, arg_elem2, arg_elem1), > > + const_binop (MINUS_EXPR, arg_elem1, arg_elem0), > > + 0)) > > This needs to check whether const_binop returns null. Maybe: > > tree step1, step2; > if (!(step1 = const_binop (MINUS_EXPR, arg_elem1, arg_elem0)) > || !(step2 = const_binop (MINUS_EXPR, arg_elem2, arg_elem1)) > || !operand_equal_p (step1, step2, 0)) > > OK with those changes, thanks. Hi Richard, Thanks for the suggestions, updated the attached patch accordingly. Bootstrapped+tested with and without SVE on aarch64-linux-gnu and x86_64-linux-gnu. OK to commit ? Thanks, Prathamesh > > Richard > > > + { > > + if (reason) > > + *reason = "not a natural stepped sequence"; > > + return false; > > + } > > + } > > } > > > > return true; > > @@ -17161,7 +17183,8 @@ namespace test_fold_vec_perm_cst { > > static tree > > build_vec_cst_rand (machine_mode vmode, unsigned npatterns, > > unsigned nelts_per_pattern, > > - int step = 0, int threshold = 100) > > + int step = 0, bool natural_stepped = false, > > + int threshold = 100) > > { > > tree inner_type = lang_hooks.types.type_for_mode (GET_MODE_INNER (vmode), 1); > > tree vectype = build_vector_type_for_mode (inner_type, vmode); > > @@ -17176,17 +17199,28 @@ build_vec_cst_rand (machine_mode vmode, unsigned npatterns, > > > > // Fill a1 for each pattern > > for (unsigned i = 0; i < npatterns; i++) > > - builder.quick_push (build_int_cst (inner_type, rand () % threshold)); > > - > > + { > > + tree a1; > > + if (natural_stepped) > > + { > > + tree a0 = builder[i]; > > + wide_int a0_val = wi::to_wide (a0); > > + wide_int a1_val = a0_val + step; > > + a1 = wide_int_to_tree (inner_type, a1_val); > > + } > > + else > > + a1 = build_int_cst (inner_type, rand () % threshold); > > + builder.quick_push (a1); > > + } > > if (nelts_per_pattern == 2) > > return builder.build (); > > > > for (unsigned i = npatterns * 2; i < npatterns * nelts_per_pattern; i++) > > { > > tree prev_elem = builder[i - npatterns]; > > - int prev_elem_val = TREE_INT_CST_LOW (prev_elem); > > - int val = prev_elem_val + step; > > - builder.quick_push (build_int_cst (inner_type, val)); > > + wide_int prev_elem_val = wi::to_wide (prev_elem); > > + wide_int val = prev_elem_val + step; > > + builder.quick_push (wide_int_to_tree (inner_type, val)); > > } > > > > return builder.build (); > > @@ -17432,7 +17466,7 @@ test_nunits_min_2 (machine_mode vmode) > > and step (a2 - a1) = 1, step is not a multiple of npatterns > > in input vector. So return NULL_TREE. */ > > { > > - tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1); > > + tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1, true); > > tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1); > > poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > > > @@ -17452,7 +17486,7 @@ test_nunits_min_2 (machine_mode vmode) > > Test that stepped sequence of the pattern selects from arg0. > > res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3) */ > > { > > - tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); > > + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1, true); > > tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); > > poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > > > @@ -17466,6 +17500,62 @@ test_nunits_min_2 (machine_mode vmode) > > tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) }; > > validate_res (1, 3, res, expected_res); > > } > > + > > + /* Case 6: PR111648 - a1 chooses base element from input vector arg. > > + In this case ensure that arg has a natural stepped sequence > > + to preserve arg's encoding. > > + > > + As a concrete example, consider: > > + arg0: { -16, -9, -10, ... } // (1, 3) > > + arg1: { -12, -5, -6, ... } // (1, 3) > > + sel = { 0, len, len + 1, ... } // (1, 3) > > + > > + This will create res with following encoding: > > + res = { arg0[0], arg1[0], arg1[1], ... } // (1, 3) > > + = { -16, -12, -5, ... } > > + > > + The step in above encoding would be: (-5) - (-12) = 7 > > + And hence res[3] would be computed as -5 + 7 = 2. > > + instead of arg1[2], ie, -6. > > + Ensure that valid_mask_for_fold_vec_perm_cst returns false > > + for this case. */ > > + { > > + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); > > + tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); > > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > + > > + vec_perm_builder builder (len, 1, 3); > > + poly_uint64 mask_elems[] = { 0, len, len+1 }; > > + builder_push_elems (builder, mask_elems); > > + > > + vec_perm_indices sel (builder, 2, len); > > + const char *reason; > > + /* FIXME: It may happen that build_vec_cst_rand may build a natural > > + stepped pattern, even if we didn't explicitly tell it to. So folding > > + may not always fail, but if it does, ensure that's because arg1 does > > + not have a natural stepped sequence (and not due to other reason) */ > > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason); > > + if (res == NULL_TREE) > > + ASSERT_TRUE (!strcmp (reason, "not a natural stepped sequence")); > > + } > > + > > + /* Case 7: Same as Case 6, except that arg1 contains natural stepped > > + sequence and thus folding should be valid for this case. */ > > + { > > + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); > > + tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1, true); > > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > + > > + vec_perm_builder builder (len, 1, 3); > > + poly_uint64 mask_elems[] = { 0, len, len+1 }; > > + builder_push_elems (builder, mask_elems); > > + > > + vec_perm_indices sel (builder, 2, len); > > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); > > + > > + tree expected_res[] = { ARG0(0), ARG1(0), ARG1(1) }; > > + validate_res (1, 3, res, expected_res); > > + } > > } > > } > > > > diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c > > new file mode 100644 > > index 00000000000..093e2b02654 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/pr111648.c > > @@ -0,0 +1,23 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > > + > > +int a; > > +int *b = &a; > > +static int **c = &b; > > +static int d; > > +short e; > > +short f; > > + > > +_Bool foo () > > +{ > > + f = -21; > > + for (; f < -5; f++) { > > + e = f ^ 3; > > + d = *b; > > + **c = e; > > + } > > + > > + return d == -6; > > +} > > + > > +/* { dg-final { scan-tree-dump "return 1" "optimized" } } */ diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc index 44118e799eb..40767736389 100644 --- a/gcc/fold-const.cc +++ b/gcc/fold-const.cc @@ -10692,9 +10692,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, /* Ensure that the stepped sequence always selects from the same input pattern. */ - unsigned arg_npatterns - = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0) - : VECTOR_CST_NPATTERNS (arg1); + tree arg = ((q1 & 1) == 0) ? arg0 : arg1; + unsigned arg_npatterns = VECTOR_CST_NPATTERNS (arg); if (!multiple_p (step, arg_npatterns)) { @@ -10702,6 +10701,31 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, *reason = "step is not multiple of npatterns"; return false; } + + /* If a1 chooses base element from arg, ensure that it's a natural + stepped sequence, ie, (arg[2] - arg[1]) == (arg[1] - arg[0]) + to preserve arg's encoding. */ + + if (maybe_lt (r1, arg_npatterns)) + { + unsigned HOST_WIDE_INT index; + if (!r1.is_constant (&index)) + return false; + + tree arg_elem0 = vector_cst_elt (arg, index); + tree arg_elem1 = vector_cst_elt (arg, index + arg_npatterns); + tree arg_elem2 = vector_cst_elt (arg, index + arg_npatterns * 2); + + tree step1, step2; + if (!(step1 = const_binop (MINUS_EXPR, arg_elem1, arg_elem0)) + || !(step2 = const_binop (MINUS_EXPR, arg_elem2, arg_elem1)) + || !operand_equal_p (step1, step2, 0)) + { + if (reason) + *reason = "not a natural stepped sequence"; + return false; + } + } } return true; @@ -17165,7 +17189,8 @@ namespace test_fold_vec_perm_cst { static tree build_vec_cst_rand (machine_mode vmode, unsigned npatterns, unsigned nelts_per_pattern, - int step = 0, int threshold = 100) + int step = 0, bool natural_stepped = false, + int threshold = 100) { tree inner_type = lang_hooks.types.type_for_mode (GET_MODE_INNER (vmode), 1); tree vectype = build_vector_type_for_mode (inner_type, vmode); @@ -17180,17 +17205,28 @@ build_vec_cst_rand (machine_mode vmode, unsigned npatterns, // Fill a1 for each pattern for (unsigned i = 0; i < npatterns; i++) - builder.quick_push (build_int_cst (inner_type, rand () % threshold)); - + { + tree a1; + if (natural_stepped) + { + tree a0 = builder[i]; + wide_int a0_val = wi::to_wide (a0); + wide_int a1_val = a0_val + step; + a1 = wide_int_to_tree (inner_type, a1_val); + } + else + a1 = build_int_cst (inner_type, rand () % threshold); + builder.quick_push (a1); + } if (nelts_per_pattern == 2) return builder.build (); for (unsigned i = npatterns * 2; i < npatterns * nelts_per_pattern; i++) { tree prev_elem = builder[i - npatterns]; - int prev_elem_val = TREE_INT_CST_LOW (prev_elem); - int val = prev_elem_val + step; - builder.quick_push (build_int_cst (inner_type, val)); + wide_int prev_elem_val = wi::to_wide (prev_elem); + wide_int val = prev_elem_val + step; + builder.quick_push (wide_int_to_tree (inner_type, val)); } return builder.build (); @@ -17436,7 +17472,7 @@ test_nunits_min_2 (machine_mode vmode) and step (a2 - a1) = 1, step is not a multiple of npatterns in input vector. So return NULL_TREE. */ { - tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1); + tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1, true); tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1); poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); @@ -17456,7 +17492,7 @@ test_nunits_min_2 (machine_mode vmode) Test that stepped sequence of the pattern selects from arg0. res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3) */ { - tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1, true); tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); @@ -17470,6 +17506,62 @@ test_nunits_min_2 (machine_mode vmode) tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) }; validate_res (1, 3, res, expected_res); } + + /* Case 6: PR111648 - a1 chooses base element from input vector arg. + In this case ensure that arg has a natural stepped sequence + to preserve arg's encoding. + + As a concrete example, consider: + arg0: { -16, -9, -10, ... } // (1, 3) + arg1: { -12, -5, -6, ... } // (1, 3) + sel = { 0, len, len + 1, ... } // (1, 3) + + This will create res with following encoding: + res = { arg0[0], arg1[0], arg1[1], ... } // (1, 3) + = { -16, -12, -5, ... } + + The step in above encoding would be: (-5) - (-12) = 7 + And hence res[3] would be computed as -5 + 7 = 2. + instead of arg1[2], ie, -6. + Ensure that valid_mask_for_fold_vec_perm_cst returns false + for this case. */ + { + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); + tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); + + vec_perm_builder builder (len, 1, 3); + poly_uint64 mask_elems[] = { 0, len, len+1 }; + builder_push_elems (builder, mask_elems); + + vec_perm_indices sel (builder, 2, len); + const char *reason; + /* FIXME: It may happen that build_vec_cst_rand may build a natural + stepped pattern, even if we didn't explicitly tell it to. So folding + may not always fail, but if it does, ensure that's because arg1 does + not have a natural stepped sequence (and not due to other reason) */ + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason); + if (res == NULL_TREE) + ASSERT_TRUE (!strcmp (reason, "not a natural stepped sequence")); + } + + /* Case 7: Same as Case 6, except that arg1 contains natural stepped + sequence and thus folding should be valid for this case. */ + { + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); + tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1, true); + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); + + vec_perm_builder builder (len, 1, 3); + poly_uint64 mask_elems[] = { 0, len, len+1 }; + builder_push_elems (builder, mask_elems); + + vec_perm_indices sel (builder, 2, len); + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); + + tree expected_res[] = { ARG0(0), ARG1(0), ARG1(1) }; + validate_res (1, 3, res, expected_res); + } } }
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: > On Tue, 17 Oct 2023 at 02:40, Richard Sandiford > <richard.sandiford@arm.com> wrote: >> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: >> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc >> > index 4f8561509ff..55a6a68c16c 100644 >> > --- a/gcc/fold-const.cc >> > +++ b/gcc/fold-const.cc >> > @@ -10684,9 +10684,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, >> > >> > /* Ensure that the stepped sequence always selects from the same >> > input pattern. */ >> > - unsigned arg_npatterns >> > - = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0) >> > - : VECTOR_CST_NPATTERNS (arg1); >> > + tree arg = ((q1 & 1) == 0) ? arg0 : arg1; >> > + unsigned arg_npatterns = VECTOR_CST_NPATTERNS (arg); >> > >> > if (!multiple_p (step, arg_npatterns)) >> > { >> > @@ -10694,6 +10693,29 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, >> > *reason = "step is not multiple of npatterns"; >> > return false; >> > } >> > + >> > + /* If a1 chooses base element from arg, ensure that it's a natural >> > + stepped sequence, ie, (arg[2] - arg[1]) == (arg[1] - arg[0]) >> > + to preserve arg's encoding. */ >> > + >> > + unsigned HOST_WIDE_INT index; >> > + if (!r1.is_constant (&index)) >> > + return false; >> > + if (index < arg_npatterns) >> > + { >> >> I don't know whether it matters in practice, but I think the two conditions >> above are more natural as: >> >> if (maybe_lt (r1, arg_npatterns)) >> { >> unsigned HOST_WIDE_INT index; >> if (!r1.is_constant (&index)) >> return false; >> >> ...[code below]... >> } >> >> > + tree arg_elem0 = vector_cst_elt (arg, index); >> > + tree arg_elem1 = vector_cst_elt (arg, index + arg_npatterns); >> > + tree arg_elem2 = vector_cst_elt (arg, index + arg_npatterns * 2); >> > + >> > + if (!operand_equal_p (const_binop (MINUS_EXPR, arg_elem2, arg_elem1), >> > + const_binop (MINUS_EXPR, arg_elem1, arg_elem0), >> > + 0)) >> >> This needs to check whether const_binop returns null. Maybe: >> >> tree step1, step2; >> if (!(step1 = const_binop (MINUS_EXPR, arg_elem1, arg_elem0)) >> || !(step2 = const_binop (MINUS_EXPR, arg_elem2, arg_elem1)) >> || !operand_equal_p (step1, step2, 0)) >> >> OK with those changes, thanks. > Hi Richard, > Thanks for the suggestions, updated the attached patch accordingly. > Bootstrapped+tested with and without SVE on aarch64-linux-gnu and > x86_64-linux-gnu. > OK to commit ? Yes, thanks. Richard > > Thanks, > Prathamesh >> >> Richard >> >> > + { >> > + if (reason) >> > + *reason = "not a natural stepped sequence"; >> > + return false; >> > + } >> > + } >> > } >> > >> > return true; >> > @@ -17161,7 +17183,8 @@ namespace test_fold_vec_perm_cst { >> > static tree >> > build_vec_cst_rand (machine_mode vmode, unsigned npatterns, >> > unsigned nelts_per_pattern, >> > - int step = 0, int threshold = 100) >> > + int step = 0, bool natural_stepped = false, >> > + int threshold = 100) >> > { >> > tree inner_type = lang_hooks.types.type_for_mode (GET_MODE_INNER (vmode), 1); >> > tree vectype = build_vector_type_for_mode (inner_type, vmode); >> > @@ -17176,17 +17199,28 @@ build_vec_cst_rand (machine_mode vmode, unsigned npatterns, >> > >> > // Fill a1 for each pattern >> > for (unsigned i = 0; i < npatterns; i++) >> > - builder.quick_push (build_int_cst (inner_type, rand () % threshold)); >> > - >> > + { >> > + tree a1; >> > + if (natural_stepped) >> > + { >> > + tree a0 = builder[i]; >> > + wide_int a0_val = wi::to_wide (a0); >> > + wide_int a1_val = a0_val + step; >> > + a1 = wide_int_to_tree (inner_type, a1_val); >> > + } >> > + else >> > + a1 = build_int_cst (inner_type, rand () % threshold); >> > + builder.quick_push (a1); >> > + } >> > if (nelts_per_pattern == 2) >> > return builder.build (); >> > >> > for (unsigned i = npatterns * 2; i < npatterns * nelts_per_pattern; i++) >> > { >> > tree prev_elem = builder[i - npatterns]; >> > - int prev_elem_val = TREE_INT_CST_LOW (prev_elem); >> > - int val = prev_elem_val + step; >> > - builder.quick_push (build_int_cst (inner_type, val)); >> > + wide_int prev_elem_val = wi::to_wide (prev_elem); >> > + wide_int val = prev_elem_val + step; >> > + builder.quick_push (wide_int_to_tree (inner_type, val)); >> > } >> > >> > return builder.build (); >> > @@ -17432,7 +17466,7 @@ test_nunits_min_2 (machine_mode vmode) >> > and step (a2 - a1) = 1, step is not a multiple of npatterns >> > in input vector. So return NULL_TREE. */ >> > { >> > - tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1); >> > + tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1, true); >> > tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1); >> > poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); >> > >> > @@ -17452,7 +17486,7 @@ test_nunits_min_2 (machine_mode vmode) >> > Test that stepped sequence of the pattern selects from arg0. >> > res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3) */ >> > { >> > - tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); >> > + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1, true); >> > tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); >> > poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); >> > >> > @@ -17466,6 +17500,62 @@ test_nunits_min_2 (machine_mode vmode) >> > tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) }; >> > validate_res (1, 3, res, expected_res); >> > } >> > + >> > + /* Case 6: PR111648 - a1 chooses base element from input vector arg. >> > + In this case ensure that arg has a natural stepped sequence >> > + to preserve arg's encoding. >> > + >> > + As a concrete example, consider: >> > + arg0: { -16, -9, -10, ... } // (1, 3) >> > + arg1: { -12, -5, -6, ... } // (1, 3) >> > + sel = { 0, len, len + 1, ... } // (1, 3) >> > + >> > + This will create res with following encoding: >> > + res = { arg0[0], arg1[0], arg1[1], ... } // (1, 3) >> > + = { -16, -12, -5, ... } >> > + >> > + The step in above encoding would be: (-5) - (-12) = 7 >> > + And hence res[3] would be computed as -5 + 7 = 2. >> > + instead of arg1[2], ie, -6. >> > + Ensure that valid_mask_for_fold_vec_perm_cst returns false >> > + for this case. */ >> > + { >> > + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); >> > + tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); >> > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); >> > + >> > + vec_perm_builder builder (len, 1, 3); >> > + poly_uint64 mask_elems[] = { 0, len, len+1 }; >> > + builder_push_elems (builder, mask_elems); >> > + >> > + vec_perm_indices sel (builder, 2, len); >> > + const char *reason; >> > + /* FIXME: It may happen that build_vec_cst_rand may build a natural >> > + stepped pattern, even if we didn't explicitly tell it to. So folding >> > + may not always fail, but if it does, ensure that's because arg1 does >> > + not have a natural stepped sequence (and not due to other reason) */ >> > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason); >> > + if (res == NULL_TREE) >> > + ASSERT_TRUE (!strcmp (reason, "not a natural stepped sequence")); >> > + } >> > + >> > + /* Case 7: Same as Case 6, except that arg1 contains natural stepped >> > + sequence and thus folding should be valid for this case. */ >> > + { >> > + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); >> > + tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1, true); >> > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); >> > + >> > + vec_perm_builder builder (len, 1, 3); >> > + poly_uint64 mask_elems[] = { 0, len, len+1 }; >> > + builder_push_elems (builder, mask_elems); >> > + >> > + vec_perm_indices sel (builder, 2, len); >> > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); >> > + >> > + tree expected_res[] = { ARG0(0), ARG1(0), ARG1(1) }; >> > + validate_res (1, 3, res, expected_res); >> > + } >> > } >> > } >> > >> > diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c >> > new file mode 100644 >> > index 00000000000..093e2b02654 >> > --- /dev/null >> > +++ b/gcc/testsuite/gcc.dg/vect/pr111648.c >> > @@ -0,0 +1,23 @@ >> > +/* { dg-do compile } */ >> > +/* { dg-options "-O2 -fdump-tree-optimized" } */ >> > + >> > +int a; >> > +int *b = &a; >> > +static int **c = &b; >> > +static int d; >> > +short e; >> > +short f; >> > + >> > +_Bool foo () >> > +{ >> > + f = -21; >> > + for (; f < -5; f++) { >> > + e = f ^ 3; >> > + d = *b; >> > + **c = e; >> > + } >> > + >> > + return d == -6; >> > +} >> > + >> > +/* { dg-final { scan-tree-dump "return 1" "optimized" } } */ > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > index 44118e799eb..40767736389 100644 > --- a/gcc/fold-const.cc > +++ b/gcc/fold-const.cc > @@ -10692,9 +10692,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, > > /* Ensure that the stepped sequence always selects from the same > input pattern. */ > - unsigned arg_npatterns > - = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0) > - : VECTOR_CST_NPATTERNS (arg1); > + tree arg = ((q1 & 1) == 0) ? arg0 : arg1; > + unsigned arg_npatterns = VECTOR_CST_NPATTERNS (arg); > > if (!multiple_p (step, arg_npatterns)) > { > @@ -10702,6 +10701,31 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, > *reason = "step is not multiple of npatterns"; > return false; > } > + > + /* If a1 chooses base element from arg, ensure that it's a natural > + stepped sequence, ie, (arg[2] - arg[1]) == (arg[1] - arg[0]) > + to preserve arg's encoding. */ > + > + if (maybe_lt (r1, arg_npatterns)) > + { > + unsigned HOST_WIDE_INT index; > + if (!r1.is_constant (&index)) > + return false; > + > + tree arg_elem0 = vector_cst_elt (arg, index); > + tree arg_elem1 = vector_cst_elt (arg, index + arg_npatterns); > + tree arg_elem2 = vector_cst_elt (arg, index + arg_npatterns * 2); > + > + tree step1, step2; > + if (!(step1 = const_binop (MINUS_EXPR, arg_elem1, arg_elem0)) > + || !(step2 = const_binop (MINUS_EXPR, arg_elem2, arg_elem1)) > + || !operand_equal_p (step1, step2, 0)) > + { > + if (reason) > + *reason = "not a natural stepped sequence"; > + return false; > + } > + } > } > > return true; > @@ -17165,7 +17189,8 @@ namespace test_fold_vec_perm_cst { > static tree > build_vec_cst_rand (machine_mode vmode, unsigned npatterns, > unsigned nelts_per_pattern, > - int step = 0, int threshold = 100) > + int step = 0, bool natural_stepped = false, > + int threshold = 100) > { > tree inner_type = lang_hooks.types.type_for_mode (GET_MODE_INNER (vmode), 1); > tree vectype = build_vector_type_for_mode (inner_type, vmode); > @@ -17180,17 +17205,28 @@ build_vec_cst_rand (machine_mode vmode, unsigned npatterns, > > // Fill a1 for each pattern > for (unsigned i = 0; i < npatterns; i++) > - builder.quick_push (build_int_cst (inner_type, rand () % threshold)); > - > + { > + tree a1; > + if (natural_stepped) > + { > + tree a0 = builder[i]; > + wide_int a0_val = wi::to_wide (a0); > + wide_int a1_val = a0_val + step; > + a1 = wide_int_to_tree (inner_type, a1_val); > + } > + else > + a1 = build_int_cst (inner_type, rand () % threshold); > + builder.quick_push (a1); > + } > if (nelts_per_pattern == 2) > return builder.build (); > > for (unsigned i = npatterns * 2; i < npatterns * nelts_per_pattern; i++) > { > tree prev_elem = builder[i - npatterns]; > - int prev_elem_val = TREE_INT_CST_LOW (prev_elem); > - int val = prev_elem_val + step; > - builder.quick_push (build_int_cst (inner_type, val)); > + wide_int prev_elem_val = wi::to_wide (prev_elem); > + wide_int val = prev_elem_val + step; > + builder.quick_push (wide_int_to_tree (inner_type, val)); > } > > return builder.build (); > @@ -17436,7 +17472,7 @@ test_nunits_min_2 (machine_mode vmode) > and step (a2 - a1) = 1, step is not a multiple of npatterns > in input vector. So return NULL_TREE. */ > { > - tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1); > + tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1, true); > tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1); > poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > @@ -17456,7 +17492,7 @@ test_nunits_min_2 (machine_mode vmode) > Test that stepped sequence of the pattern selects from arg0. > res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3) */ > { > - tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); > + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1, true); > tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); > poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > @@ -17470,6 +17506,62 @@ test_nunits_min_2 (machine_mode vmode) > tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) }; > validate_res (1, 3, res, expected_res); > } > + > + /* Case 6: PR111648 - a1 chooses base element from input vector arg. > + In this case ensure that arg has a natural stepped sequence > + to preserve arg's encoding. > + > + As a concrete example, consider: > + arg0: { -16, -9, -10, ... } // (1, 3) > + arg1: { -12, -5, -6, ... } // (1, 3) > + sel = { 0, len, len + 1, ... } // (1, 3) > + > + This will create res with following encoding: > + res = { arg0[0], arg1[0], arg1[1], ... } // (1, 3) > + = { -16, -12, -5, ... } > + > + The step in above encoding would be: (-5) - (-12) = 7 > + And hence res[3] would be computed as -5 + 7 = 2. > + instead of arg1[2], ie, -6. > + Ensure that valid_mask_for_fold_vec_perm_cst returns false > + for this case. */ > + { > + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); > + tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > + > + vec_perm_builder builder (len, 1, 3); > + poly_uint64 mask_elems[] = { 0, len, len+1 }; > + builder_push_elems (builder, mask_elems); > + > + vec_perm_indices sel (builder, 2, len); > + const char *reason; > + /* FIXME: It may happen that build_vec_cst_rand may build a natural > + stepped pattern, even if we didn't explicitly tell it to. So folding > + may not always fail, but if it does, ensure that's because arg1 does > + not have a natural stepped sequence (and not due to other reason) */ > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason); > + if (res == NULL_TREE) > + ASSERT_TRUE (!strcmp (reason, "not a natural stepped sequence")); > + } > + > + /* Case 7: Same as Case 6, except that arg1 contains natural stepped > + sequence and thus folding should be valid for this case. */ > + { > + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); > + tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1, true); > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > + > + vec_perm_builder builder (len, 1, 3); > + poly_uint64 mask_elems[] = { 0, len, len+1 }; > + builder_push_elems (builder, mask_elems); > + > + vec_perm_indices sel (builder, 2, len); > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); > + > + tree expected_res[] = { ARG0(0), ARG1(0), ARG1(1) }; > + validate_res (1, 3, res, expected_res); > + } > } > } >
On Wed, 18 Oct 2023 at 23:22, Richard Sandiford <richard.sandiford@arm.com> wrote: > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: > > On Tue, 17 Oct 2023 at 02:40, Richard Sandiford > > <richard.sandiford@arm.com> wrote: > >> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes: > >> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > >> > index 4f8561509ff..55a6a68c16c 100644 > >> > --- a/gcc/fold-const.cc > >> > +++ b/gcc/fold-const.cc > >> > @@ -10684,9 +10684,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, > >> > > >> > /* Ensure that the stepped sequence always selects from the same > >> > input pattern. */ > >> > - unsigned arg_npatterns > >> > - = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0) > >> > - : VECTOR_CST_NPATTERNS (arg1); > >> > + tree arg = ((q1 & 1) == 0) ? arg0 : arg1; > >> > + unsigned arg_npatterns = VECTOR_CST_NPATTERNS (arg); > >> > > >> > if (!multiple_p (step, arg_npatterns)) > >> > { > >> > @@ -10694,6 +10693,29 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, > >> > *reason = "step is not multiple of npatterns"; > >> > return false; > >> > } > >> > + > >> > + /* If a1 chooses base element from arg, ensure that it's a natural > >> > + stepped sequence, ie, (arg[2] - arg[1]) == (arg[1] - arg[0]) > >> > + to preserve arg's encoding. */ > >> > + > >> > + unsigned HOST_WIDE_INT index; > >> > + if (!r1.is_constant (&index)) > >> > + return false; > >> > + if (index < arg_npatterns) > >> > + { > >> > >> I don't know whether it matters in practice, but I think the two conditions > >> above are more natural as: > >> > >> if (maybe_lt (r1, arg_npatterns)) > >> { > >> unsigned HOST_WIDE_INT index; > >> if (!r1.is_constant (&index)) > >> return false; > >> > >> ...[code below]... > >> } > >> > >> > + tree arg_elem0 = vector_cst_elt (arg, index); > >> > + tree arg_elem1 = vector_cst_elt (arg, index + arg_npatterns); > >> > + tree arg_elem2 = vector_cst_elt (arg, index + arg_npatterns * 2); > >> > + > >> > + if (!operand_equal_p (const_binop (MINUS_EXPR, arg_elem2, arg_elem1), > >> > + const_binop (MINUS_EXPR, arg_elem1, arg_elem0), > >> > + 0)) > >> > >> This needs to check whether const_binop returns null. Maybe: > >> > >> tree step1, step2; > >> if (!(step1 = const_binop (MINUS_EXPR, arg_elem1, arg_elem0)) > >> || !(step2 = const_binop (MINUS_EXPR, arg_elem2, arg_elem1)) > >> || !operand_equal_p (step1, step2, 0)) > >> > >> OK with those changes, thanks. > > Hi Richard, > > Thanks for the suggestions, updated the attached patch accordingly. > > Bootstrapped+tested with and without SVE on aarch64-linux-gnu and > > x86_64-linux-gnu. > > OK to commit ? > > Yes, thanks. Thanks, committed to trunk in 3ec8ecb8e92faec889bc6f7aeac9ff59e82b4f7f. Thanks, Prathamesh > > Richard > > > > > Thanks, > > Prathamesh > >> > >> Richard > >> > >> > + { > >> > + if (reason) > >> > + *reason = "not a natural stepped sequence"; > >> > + return false; > >> > + } > >> > + } > >> > } > >> > > >> > return true; > >> > @@ -17161,7 +17183,8 @@ namespace test_fold_vec_perm_cst { > >> > static tree > >> > build_vec_cst_rand (machine_mode vmode, unsigned npatterns, > >> > unsigned nelts_per_pattern, > >> > - int step = 0, int threshold = 100) > >> > + int step = 0, bool natural_stepped = false, > >> > + int threshold = 100) > >> > { > >> > tree inner_type = lang_hooks.types.type_for_mode (GET_MODE_INNER (vmode), 1); > >> > tree vectype = build_vector_type_for_mode (inner_type, vmode); > >> > @@ -17176,17 +17199,28 @@ build_vec_cst_rand (machine_mode vmode, unsigned npatterns, > >> > > >> > // Fill a1 for each pattern > >> > for (unsigned i = 0; i < npatterns; i++) > >> > - builder.quick_push (build_int_cst (inner_type, rand () % threshold)); > >> > - > >> > + { > >> > + tree a1; > >> > + if (natural_stepped) > >> > + { > >> > + tree a0 = builder[i]; > >> > + wide_int a0_val = wi::to_wide (a0); > >> > + wide_int a1_val = a0_val + step; > >> > + a1 = wide_int_to_tree (inner_type, a1_val); > >> > + } > >> > + else > >> > + a1 = build_int_cst (inner_type, rand () % threshold); > >> > + builder.quick_push (a1); > >> > + } > >> > if (nelts_per_pattern == 2) > >> > return builder.build (); > >> > > >> > for (unsigned i = npatterns * 2; i < npatterns * nelts_per_pattern; i++) > >> > { > >> > tree prev_elem = builder[i - npatterns]; > >> > - int prev_elem_val = TREE_INT_CST_LOW (prev_elem); > >> > - int val = prev_elem_val + step; > >> > - builder.quick_push (build_int_cst (inner_type, val)); > >> > + wide_int prev_elem_val = wi::to_wide (prev_elem); > >> > + wide_int val = prev_elem_val + step; > >> > + builder.quick_push (wide_int_to_tree (inner_type, val)); > >> > } > >> > > >> > return builder.build (); > >> > @@ -17432,7 +17466,7 @@ test_nunits_min_2 (machine_mode vmode) > >> > and step (a2 - a1) = 1, step is not a multiple of npatterns > >> > in input vector. So return NULL_TREE. */ > >> > { > >> > - tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1); > >> > + tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1, true); > >> > tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1); > >> > poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > >> > > >> > @@ -17452,7 +17486,7 @@ test_nunits_min_2 (machine_mode vmode) > >> > Test that stepped sequence of the pattern selects from arg0. > >> > res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3) */ > >> > { > >> > - tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); > >> > + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1, true); > >> > tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); > >> > poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > >> > > >> > @@ -17466,6 +17500,62 @@ test_nunits_min_2 (machine_mode vmode) > >> > tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) }; > >> > validate_res (1, 3, res, expected_res); > >> > } > >> > + > >> > + /* Case 6: PR111648 - a1 chooses base element from input vector arg. > >> > + In this case ensure that arg has a natural stepped sequence > >> > + to preserve arg's encoding. > >> > + > >> > + As a concrete example, consider: > >> > + arg0: { -16, -9, -10, ... } // (1, 3) > >> > + arg1: { -12, -5, -6, ... } // (1, 3) > >> > + sel = { 0, len, len + 1, ... } // (1, 3) > >> > + > >> > + This will create res with following encoding: > >> > + res = { arg0[0], arg1[0], arg1[1], ... } // (1, 3) > >> > + = { -16, -12, -5, ... } > >> > + > >> > + The step in above encoding would be: (-5) - (-12) = 7 > >> > + And hence res[3] would be computed as -5 + 7 = 2. > >> > + instead of arg1[2], ie, -6. > >> > + Ensure that valid_mask_for_fold_vec_perm_cst returns false > >> > + for this case. */ > >> > + { > >> > + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); > >> > + tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); > >> > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > >> > + > >> > + vec_perm_builder builder (len, 1, 3); > >> > + poly_uint64 mask_elems[] = { 0, len, len+1 }; > >> > + builder_push_elems (builder, mask_elems); > >> > + > >> > + vec_perm_indices sel (builder, 2, len); > >> > + const char *reason; > >> > + /* FIXME: It may happen that build_vec_cst_rand may build a natural > >> > + stepped pattern, even if we didn't explicitly tell it to. So folding > >> > + may not always fail, but if it does, ensure that's because arg1 does > >> > + not have a natural stepped sequence (and not due to other reason) */ > >> > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason); > >> > + if (res == NULL_TREE) > >> > + ASSERT_TRUE (!strcmp (reason, "not a natural stepped sequence")); > >> > + } > >> > + > >> > + /* Case 7: Same as Case 6, except that arg1 contains natural stepped > >> > + sequence and thus folding should be valid for this case. */ > >> > + { > >> > + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); > >> > + tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1, true); > >> > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > >> > + > >> > + vec_perm_builder builder (len, 1, 3); > >> > + poly_uint64 mask_elems[] = { 0, len, len+1 }; > >> > + builder_push_elems (builder, mask_elems); > >> > + > >> > + vec_perm_indices sel (builder, 2, len); > >> > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); > >> > + > >> > + tree expected_res[] = { ARG0(0), ARG1(0), ARG1(1) }; > >> > + validate_res (1, 3, res, expected_res); > >> > + } > >> > } > >> > } > >> > > >> > diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c > >> > new file mode 100644 > >> > index 00000000000..093e2b02654 > >> > --- /dev/null > >> > +++ b/gcc/testsuite/gcc.dg/vect/pr111648.c > >> > @@ -0,0 +1,23 @@ > >> > +/* { dg-do compile } */ > >> > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > >> > + > >> > +int a; > >> > +int *b = &a; > >> > +static int **c = &b; > >> > +static int d; > >> > +short e; > >> > +short f; > >> > + > >> > +_Bool foo () > >> > +{ > >> > + f = -21; > >> > + for (; f < -5; f++) { > >> > + e = f ^ 3; > >> > + d = *b; > >> > + **c = e; > >> > + } > >> > + > >> > + return d == -6; > >> > +} > >> > + > >> > +/* { dg-final { scan-tree-dump "return 1" "optimized" } } */ > > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > > index 44118e799eb..40767736389 100644 > > --- a/gcc/fold-const.cc > > +++ b/gcc/fold-const.cc > > @@ -10692,9 +10692,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, > > > > /* Ensure that the stepped sequence always selects from the same > > input pattern. */ > > - unsigned arg_npatterns > > - = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0) > > - : VECTOR_CST_NPATTERNS (arg1); > > + tree arg = ((q1 & 1) == 0) ? arg0 : arg1; > > + unsigned arg_npatterns = VECTOR_CST_NPATTERNS (arg); > > > > if (!multiple_p (step, arg_npatterns)) > > { > > @@ -10702,6 +10701,31 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, > > *reason = "step is not multiple of npatterns"; > > return false; > > } > > + > > + /* If a1 chooses base element from arg, ensure that it's a natural > > + stepped sequence, ie, (arg[2] - arg[1]) == (arg[1] - arg[0]) > > + to preserve arg's encoding. */ > > + > > + if (maybe_lt (r1, arg_npatterns)) > > + { > > + unsigned HOST_WIDE_INT index; > > + if (!r1.is_constant (&index)) > > + return false; > > + > > + tree arg_elem0 = vector_cst_elt (arg, index); > > + tree arg_elem1 = vector_cst_elt (arg, index + arg_npatterns); > > + tree arg_elem2 = vector_cst_elt (arg, index + arg_npatterns * 2); > > + > > + tree step1, step2; > > + if (!(step1 = const_binop (MINUS_EXPR, arg_elem1, arg_elem0)) > > + || !(step2 = const_binop (MINUS_EXPR, arg_elem2, arg_elem1)) > > + || !operand_equal_p (step1, step2, 0)) > > + { > > + if (reason) > > + *reason = "not a natural stepped sequence"; > > + return false; > > + } > > + } > > } > > > > return true; > > @@ -17165,7 +17189,8 @@ namespace test_fold_vec_perm_cst { > > static tree > > build_vec_cst_rand (machine_mode vmode, unsigned npatterns, > > unsigned nelts_per_pattern, > > - int step = 0, int threshold = 100) > > + int step = 0, bool natural_stepped = false, > > + int threshold = 100) > > { > > tree inner_type = lang_hooks.types.type_for_mode (GET_MODE_INNER (vmode), 1); > > tree vectype = build_vector_type_for_mode (inner_type, vmode); > > @@ -17180,17 +17205,28 @@ build_vec_cst_rand (machine_mode vmode, unsigned npatterns, > > > > // Fill a1 for each pattern > > for (unsigned i = 0; i < npatterns; i++) > > - builder.quick_push (build_int_cst (inner_type, rand () % threshold)); > > - > > + { > > + tree a1; > > + if (natural_stepped) > > + { > > + tree a0 = builder[i]; > > + wide_int a0_val = wi::to_wide (a0); > > + wide_int a1_val = a0_val + step; > > + a1 = wide_int_to_tree (inner_type, a1_val); > > + } > > + else > > + a1 = build_int_cst (inner_type, rand () % threshold); > > + builder.quick_push (a1); > > + } > > if (nelts_per_pattern == 2) > > return builder.build (); > > > > for (unsigned i = npatterns * 2; i < npatterns * nelts_per_pattern; i++) > > { > > tree prev_elem = builder[i - npatterns]; > > - int prev_elem_val = TREE_INT_CST_LOW (prev_elem); > > - int val = prev_elem_val + step; > > - builder.quick_push (build_int_cst (inner_type, val)); > > + wide_int prev_elem_val = wi::to_wide (prev_elem); > > + wide_int val = prev_elem_val + step; > > + builder.quick_push (wide_int_to_tree (inner_type, val)); > > } > > > > return builder.build (); > > @@ -17436,7 +17472,7 @@ test_nunits_min_2 (machine_mode vmode) > > and step (a2 - a1) = 1, step is not a multiple of npatterns > > in input vector. So return NULL_TREE. */ > > { > > - tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1); > > + tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1, true); > > tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1); > > poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > > > @@ -17456,7 +17492,7 @@ test_nunits_min_2 (machine_mode vmode) > > Test that stepped sequence of the pattern selects from arg0. > > res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3) */ > > { > > - tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); > > + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1, true); > > tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); > > poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > > > @@ -17470,6 +17506,62 @@ test_nunits_min_2 (machine_mode vmode) > > tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) }; > > validate_res (1, 3, res, expected_res); > > } > > + > > + /* Case 6: PR111648 - a1 chooses base element from input vector arg. > > + In this case ensure that arg has a natural stepped sequence > > + to preserve arg's encoding. > > + > > + As a concrete example, consider: > > + arg0: { -16, -9, -10, ... } // (1, 3) > > + arg1: { -12, -5, -6, ... } // (1, 3) > > + sel = { 0, len, len + 1, ... } // (1, 3) > > + > > + This will create res with following encoding: > > + res = { arg0[0], arg1[0], arg1[1], ... } // (1, 3) > > + = { -16, -12, -5, ... } > > + > > + The step in above encoding would be: (-5) - (-12) = 7 > > + And hence res[3] would be computed as -5 + 7 = 2. > > + instead of arg1[2], ie, -6. > > + Ensure that valid_mask_for_fold_vec_perm_cst returns false > > + for this case. */ > > + { > > + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); > > + tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); > > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > + > > + vec_perm_builder builder (len, 1, 3); > > + poly_uint64 mask_elems[] = { 0, len, len+1 }; > > + builder_push_elems (builder, mask_elems); > > + > > + vec_perm_indices sel (builder, 2, len); > > + const char *reason; > > + /* FIXME: It may happen that build_vec_cst_rand may build a natural > > + stepped pattern, even if we didn't explicitly tell it to. So folding > > + may not always fail, but if it does, ensure that's because arg1 does > > + not have a natural stepped sequence (and not due to other reason) */ > > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason); > > + if (res == NULL_TREE) > > + ASSERT_TRUE (!strcmp (reason, "not a natural stepped sequence")); > > + } > > + > > + /* Case 7: Same as Case 6, except that arg1 contains natural stepped > > + sequence and thus folding should be valid for this case. */ > > + { > > + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); > > + tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1, true); > > + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); > > + > > + vec_perm_builder builder (len, 1, 3); > > + poly_uint64 mask_elems[] = { 0, len, len+1 }; > > + builder_push_elems (builder, mask_elems); > > + > > + vec_perm_indices sel (builder, 2, len); > > + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); > > + > > + tree expected_res[] = { ARG0(0), ARG1(0), ARG1(1) }; > > + validate_res (1, 3, res, expected_res); > > + } > > } > > } > >
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc index 4f8561509ff..c5f421d6b76 100644 --- a/gcc/fold-const.cc +++ b/gcc/fold-const.cc @@ -10682,8 +10682,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, return false; } - /* Ensure that the stepped sequence always selects from the same - input pattern. */ + /* Ensure that the stepped sequence always selects from the stepped + part of same input pattern. */ unsigned arg_npatterns = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0) : VECTOR_CST_NPATTERNS (arg1); @@ -10694,6 +10694,20 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, *reason = "step is not multiple of npatterns"; return false; } + + /* If a1 is a multiple of len, it will select base element of input + vector resulting in following encoding: + { base_elem, arg[0], arg[1], ... } where arg is the chosen input + vector. This encoding is not originally present in arg, since it's + defined as: + { arg[0], arg[1], arg[2], ... }. */ + + if (multiple_p (a1, arg_len)) + { + if (reason) + *reason = "selecting base element of input vector"; + return false; + } } return true; @@ -17425,47 +17439,6 @@ test_nunits_min_2 (machine_mode vmode) tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) }; validate_res (2, 2, res, expected_res); } - - /* Case 4: mask = {0, 0, 1, ...} // (1, 3) - Test that the stepped sequence of the pattern selects from - same input pattern. Since input vectors have npatterns = 2, - and step (a2 - a1) = 1, step is not a multiple of npatterns - in input vector. So return NULL_TREE. */ - { - tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1); - tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1); - poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); - - vec_perm_builder builder (len, 1, 3); - poly_uint64 mask_elems[] = { 0, 0, 1 }; - builder_push_elems (builder, mask_elems); - - vec_perm_indices sel (builder, 2, len); - const char *reason; - tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, - &reason); - ASSERT_TRUE (res == NULL_TREE); - ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns")); - } - - /* Case 5: mask = {len, 0, 1, ...} // (1, 3) - Test that stepped sequence of the pattern selects from arg0. - res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3) */ - { - tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); - tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); - poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); - - vec_perm_builder builder (len, 1, 3); - poly_uint64 mask_elems[] = { len, 0, 1 }; - builder_push_elems (builder, mask_elems); - - vec_perm_indices sel (builder, 2, len); - tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); - - tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) }; - validate_res (1, 3, res, expected_res); - } } } @@ -17528,7 +17501,26 @@ test_nunits_min_4 (machine_mode vmode) validate_res (1, 3, res, expected_res); } - /* Case 4: + /* Case 4: mask = {len, 1, 2, ...} // (1, 3) + Test that stepped sequence of the pattern selects from arg0. + res = { arg1[0], arg0[1], arg0[2], ... } // (1, 3) */ + { + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); + tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); + + vec_perm_builder builder (len, 1, 3); + poly_uint64 mask_elems[] = { len, 1, 2 }; + builder_push_elems (builder, mask_elems); + + vec_perm_indices sel (builder, 2, len); + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); + + tree expected_res[] = { ARG1(0), ARG0(1), ARG0(2) }; + validate_res (1, 3, res, expected_res); + } + + /* Case 5: sel = { len, 0, 2, ... } // (1, 3) This should return NULL because we cross the input vectors. Because, @@ -17561,7 +17553,7 @@ test_nunits_min_4 (machine_mode vmode) ASSERT_TRUE (!strcmp (reason, "crossed input vectors")); } - /* Case 5: npatterns(arg0) = 4 > npatterns(sel) = 2 + /* Case 6: npatterns(arg0) = 4 > npatterns(sel) = 2 mask = { 0, len, 1, len + 1, ...} // (2, 2) res = { arg0[0], arg1[0], arg0[1], arg1[1], ... } // (2, 2) @@ -17583,7 +17575,7 @@ test_nunits_min_4 (machine_mode vmode) validate_res (2, 2, res, expected_res); } - /* Case 6: Test combination in sel, where one pattern is dup and other + /* Case 7: Test combination in sel, where one pattern is dup and other is stepped sequence. sel = { 0, 0, 0, 1, 0, 2, ... } // (2, 3) res = { arg0[0], arg0[0], arg0[0], @@ -17605,7 +17597,7 @@ test_nunits_min_4 (machine_mode vmode) validate_res (2, 3, res, expected_res); } - /* Case 7: PR111048: Check that we set arg_npatterns correctly, + /* Case 8: PR111048: Check that we set arg_npatterns correctly, when arg0, arg1 and sel have different number of patterns. arg0 is of shape (1, 1) arg1 is of shape (4, 1) @@ -17634,6 +17626,51 @@ test_nunits_min_4 (machine_mode vmode) ASSERT_TRUE (res == NULL_TREE); ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns")); } + + /* Case 9: PR111648 - a1 is multiple of vector length, + which results in incorrect encoding. Verify that we return + NULL for this case. + sel = { base_elem, len, len+1, ... } // (1, 3) + In this case, the single pattern is: { base_elem len, len+1, ...} + Let's assume that base_elem is used for indexing into arg0, + and a1 ... ae chooses elements from arg1. + So res = { arg0[base_elem], arg1[0], arg1[1], ... } // (1, 3) + Which creates an incorrect encoding with S = arg1[1] - arg1[0] + while the original encoding in arg1 is + arg1: { arg1[0], arg1[1], arg1[2], ... } + with S = arg1[2] - arg1[1]. + + As a concrete example, for above PR: + arg0: { -16, -9, -10, -11 } + arg1: { -12, -5, -6, -7 } + sel = { 3, 4, 5, 6 } + + arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3. + Since a1 = 4 and arg_len = 4, it ended up creating the result with + following encoding: + res = { arg0[3], arg1[0], arg1[1] } // (1, 3) + = { -11, -12, -5 } + + So for res[3], it used S = (-5) - (-12) = 7 + And hence computed it as -5 + 7 = 2. + instead of arg1[2], ie, -6, which is the correct value. + Ensure that valid_mask_for_fold_vec_perm_cst returns false for this case. */ + { + tree arg0 = build_vec_cst_rand (vmode, 1, 3); + tree arg1 = build_vec_cst_rand (vmode, 1, 3); + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); + + vec_perm_builder builder (len, 1, 3); + poly_uint64 mask_elems[] = { 0, len, len+1 }; + builder_push_elems (builder, mask_elems); + + vec_perm_indices sel (builder, 2, len); + const char *reason; + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason); + ASSERT_TRUE (res == NULL_TREE); + ASSERT_TRUE (!strcmp (reason, + "selecting base element of input vector")); + } } } diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c new file mode 100644 index 00000000000..093e2b02654 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/pr111648.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +int a; +int *b = &a; +static int **c = &b; +static int d; +short e; +short f; + +_Bool foo () +{ + f = -21; + for (; f < -5; f++) { + e = f ^ 3; + d = *b; + **c = e; + } + + return d == -6; +} + +/* { dg-final { scan-tree-dump "return 1" "optimized" } } */