| Message ID | 5507813E.3060106@linaro.org |
|---|---|
| State | New |
ping?

Thanks,
Kugan

On 17/03/15 12:19, Kugan wrote:
>
>
> On 17/03/15 03:48, Kyrill Tkachov wrote:
>>
>> On 16/03/15 13:15, Kugan wrote:
>>> On 16/03/15 23:32, Kugan wrote:
>>>>>> lower-subreg.c:compute_costs() only cares about the cost of a
>>>>>> (set (reg) (const_int)) move, but I think the intention, at least
>>>>>> for now, is to return extra_cost->vect.alu for all the vector
>>>>>> operations.
>>>>> Almost, what we want at the moment is COSTS_N_INSNS (1) +
>>>>> extra_cost->vect.alu
>>>> Thanks Kyrill for the review.
>>>>
>>>>>> Regression tested on aarch64-linux-gnu with no new regressions.
>>>>>> Is this OK for trunk?
>>>>> Are you sure it's a (set (reg) (const_int)) that's being costed
>>>>> here? I thought for moves into vector registers it would be a
>>>>> (set (reg) (const_vector)), which we don't handle in our rtx costs
>>>>> currently. I think the correct approach would be to extend the
>>>>> aarch64_rtx_costs switch statement to handle the CONST_VECTOR case.
>>>>> I believe you can use aarch64_simd_valid_immediate to check whether
>>>>> x is a valid immediate for a simd instruction and give it a cost of
>>>>> extra_cost->vect.alu. The logic should be similar to the CONST_INT
>>>>> case.
>>>> Sorry about the (set (reg) (const_int)) above. But the actual RTL
>>>> that is being split at 220r.subreg2 is
>>>>
>>>> (insn 11 10 12 2 (set (subreg:V4SF (reg/v:OI 77 [ __o ]) 0)
>>>>         (subreg:V4SF (reg/v:OI 73 [ __o ]) 0))
>>>>      /home/kugan/work/builds/gcc-fsf-gcc/tools/lib/gcc/aarch64-none-linux-gnu/5.0.0/include/arm_neon.h:22625
>>>>      800 {*aarch64_simd_movv4sf}
>>>>      (nil))
>>>>
>>>> And also, if we return an RTX cost above COSTS_N_INSNS (1), it will
>>>> be split and it doesn't recover from there. Therefore we need
>>>> something like the below to prevent that happening.
>>>>
>>> Hi Kyrill,
>>>
>>> How about the attached patch? It is similar to what is currently done
>>> for scalar register moves.
>>
>> Hi Kugan,
>> yeah, I think this is a better approach, though I can't approve.
>>
>
> Here is the patch with a minor comment update. Regression tested on
> aarch64-linux-gnu with no new regressions. Is this OK for trunk?
>
> Thanks,
> Kugan
>
> gcc/ChangeLog:
>
> 2015-03-17  Kugan Vivekanandarajah  <kuganv@linaro.org>
>             Jim Wilson  <jim.wilson@linaro.org>
>
>       PR target/65375
>       * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle
>       vector register copies.
Ping? Now that Stage 1 is open, is this OK for trunk?

Thanks,
Kugan

On 26/03/15 18:21, Kugan wrote:
> ping?
>
> Thanks,
> Kugan
>
> On 17/03/15 12:19, Kugan wrote:
>>
>>
>> On 17/03/15 03:48, Kyrill Tkachov wrote:
>>>
>>> On 16/03/15 13:15, Kugan wrote:
>>>> On 16/03/15 23:32, Kugan wrote:
>>>>>>> lower-subreg.c:compute_costs() only cares about the cost of a
>>>>>>> (set (reg) (const_int)) move, but I think the intention, at least
>>>>>>> for now, is to return extra_cost->vect.alu for all the vector
>>>>>>> operations.
>>>>>> Almost, what we want at the moment is COSTS_N_INSNS (1) +
>>>>>> extra_cost->vect.alu
>>>>> Thanks Kyrill for the review.
>>>>>
>>>>>>> Regression tested on aarch64-linux-gnu with no new regressions.
>>>>>>> Is this OK for trunk?
>>>>>> Are you sure it's a (set (reg) (const_int)) that's being costed
>>>>>> here? I thought for moves into vector registers it would be a
>>>>>> (set (reg) (const_vector)), which we don't handle in our rtx costs
>>>>>> currently. I think the correct approach would be to extend the
>>>>>> aarch64_rtx_costs switch statement to handle the CONST_VECTOR
>>>>>> case. I believe you can use aarch64_simd_valid_immediate to check
>>>>>> whether x is a valid immediate for a simd instruction and give it
>>>>>> a cost of extra_cost->vect.alu. The logic should be similar to the
>>>>>> CONST_INT case.
>>>>> Sorry about the (set (reg) (const_int)) above. But the actual RTL
>>>>> that is being split at 220r.subreg2 is
>>>>>
>>>>> (insn 11 10 12 2 (set (subreg:V4SF (reg/v:OI 77 [ __o ]) 0)
>>>>>         (subreg:V4SF (reg/v:OI 73 [ __o ]) 0))
>>>>>      /home/kugan/work/builds/gcc-fsf-gcc/tools/lib/gcc/aarch64-none-linux-gnu/5.0.0/include/arm_neon.h:22625
>>>>>      800 {*aarch64_simd_movv4sf}
>>>>>      (nil))
>>>>>
>>>>> And also, if we return an RTX cost above COSTS_N_INSNS (1), it will
>>>>> be split and it doesn't recover from there. Therefore we need
>>>>> something like the below to prevent that happening.
>>>>>
>>>> Hi Kyrill,
>>>>
>>>> How about the attached patch? It is similar to what is currently
>>>> done for scalar register moves.
>>>
>>> Hi Kugan,
>>> yeah, I think this is a better approach, though I can't approve.
>>>
>>
>> Here is the patch with a minor comment update. Regression tested on
>> aarch64-linux-gnu with no new regressions. Is this OK for trunk?
>>
>> Thanks,
>> Kugan
>>
>> gcc/ChangeLog:
>>
>> 2015-03-17  Kugan Vivekanandarajah  <kuganv@linaro.org>
>>             Jim Wilson  <jim.wilson@linaro.org>
>>
>>       PR target/65375
>>       * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle
>>       vector register copies.
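For context on the insn quoted above: in the GCC 5-era arm_neon.h the multi-vector load/store intrinsics are written in terms of an internal `__builtin_aarch64_simd_oi` variable named `__o`, which is what appears as the `reg/v:OI ... [ __o ]` pseudo with V4SF subregs. The thread does not identify the intrinsic behind arm_neon.h:22625, so the snippet below is only an assumed reproducer, not the PR 65375 testcase; `vld2q_f32`/`vst2q_f32` are simply one pair whose two-Q-register value is carried in OImode.

```c
/* Assumed reproducer (not taken from PR 65375): a float32x4x2_t round
   trip.  The intrinsics' internal __o value occupies two Q registers
   (OImode), and register-to-register copies of it appear as V4SF
   subregs of that OI pseudo, much like the insn split at 220r.subreg2.
   Build with an AArch64 compiler, e.g.
     aarch64-linux-gnu-gcc -O2 -S reproducer.c  */
#include <arm_neon.h>

void
copy_two_quads (const float *src, float *dst)
{
  float32x4x2_t v = vld2q_f32 (src);  /* deinterleaving load into two Q regs */
  vst2q_f32 (dst, v);                 /* interleaving store of the same pair */
}
```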
On 15/04/15 19:25, James Greenhalgh wrote:
> On Tue, Apr 14, 2015 at 11:08:55PM +0100, Kugan wrote:
>> Now that Stage 1 is open, is this OK for trunk?
>
> Hi Kugan,
>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index cba3c1a..d6ad0af 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -5544,10 +5544,17 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
>>
>>        /* Fall through.  */
>>      case REG:
>> +      if (VECTOR_MODE_P (GET_MODE (op0)) && REG_P (op1))
>> +        {
>> +          /* The cost is 1 per vector-register copied.  */
>> +          int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
>> +                          / GET_MODE_SIZE (V4SImode);
>> +          *cost = COSTS_N_INSNS (n_minus_1 + 1);
>> +        }
>>        /* const0_rtx is in general free, but we will use an
>>           instruction to set a register to 0.  */
>> -      if (REG_P (op1) || op1 == const0_rtx)
>> -        {
>> +      else if (REG_P (op1) || op1 == const0_rtx)
>> +        {
>>           /* The cost is 1 per register copied.  */
>>           int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
>>                           / UNITS_PER_WORD;
>
> I would not have expected control flow to reach this point, as we have:

It does for mode == VOIDmode. The RTL X is, for example:

  (set (reg:V8DI 67 virtual-incoming-args)
       (reg:V8DI 68 virtual-stack-vars))

>
>>   /* TODO: The cost infrastructure currently does not handle
>>      vector operations.  Assume that all vector operations
>>      are equally expensive.  */
>>   if (VECTOR_MODE_P (mode))
>>     {
>>       if (speed)
>>         *cost += extra_cost->vect.alu;
>>       return true;
>>     }
>
> But, I see that this check is broken for a set RTX (which has no mode).
> So, your patch works, but only due to a bug in my original implementation.
> This leaves the code with quite a messy design.
>
> There are two ways I see that we could clean things up, both of which
> require some reworking of your patch.
>
> Either we remove my check above and teach the RTX costs how to properly
> cost vector operations, or we fix my check to catch all vector RTX
> and add the special cases for the small subset of things we understand
> up there.
>
> The correct approach in the long term is to fix the RTX costs to correctly
> understand vector operations, so I'd much prefer to see a patch along
> these lines, though I appreciate that is a substantially more invasive
> piece of work.

I agree that rtx costs for vector operations are not handled right now. We
might not be able to completely separate it as Kyrill suggested. We still
need the vector SET with VOIDmode to be handled inline. This patch is that
part. We can work on the others as a separate function, if you prefer that.
I am happy to look at this as a separate patch.

Thanks,
Kugan
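To make the cost arithmetic in the hunk above concrete, here is a small standalone sketch (not GCC code): it hard-codes `COSTS_N_INSNS (N)` as `N * 4`, its usual definition in rtl.h, and 16 bytes for `GET_MODE_SIZE (V4SImode)`, i.e. one Q register, then evaluates the formula for the 16-byte V4SF copy from the original report, the 32-byte OImode pseudo it lives in, and the 64-byte V8DImode copy in the example above.

```c
/* Standalone sketch of the cost formula added by the patch; not GCC code.
   COSTS_N_INSNS and the mode sizes are hard-coded here for illustration.  */
#include <stdio.h>

#define COSTS_N_INSNS(n) ((n) * 4)   /* as defined in rtl.h */
#define V4SI_MODE_SIZE 16            /* one 128-bit Q register */

static int
vector_copy_cost (int mode_size_bytes)
{
  /* One instruction per Q register copied, as in the patch.  */
  int n_minus_1 = (mode_size_bytes - 1) / V4SI_MODE_SIZE;
  return COSTS_N_INSNS (n_minus_1 + 1);
}

int
main (void)
{
  printf ("V4SF (16 bytes): %d\n", vector_copy_cost (16));  /*  4 = 1 insn  */
  printf ("OI   (32 bytes): %d\n", vector_copy_cost (32));  /*  8 = 2 insns */
  printf ("V8DI (64 bytes): %d\n", vector_copy_cost (64));  /* 16 = 4 insns */
  return 0;
}
```

Since a single-vector copy comes out at COSTS_N_INSNS (1), lower-subreg no longer treats the V4SF move as worth splitting, which is the behaviour the thread set out to fix.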
On 15/04/15 21:18, James Greenhalgh wrote:
> On Wed, Apr 15, 2015 at 11:45:36AM +0100, Kugan wrote:
>>> There are two ways I see that we could clean things up, both of which
>>> require some reworking of your patch.
>>>
>>> Either we remove my check above and teach the RTX costs how to properly
>>> cost vector operations, or we fix my check to catch all vector RTX
>>> and add the special cases for the small subset of things we understand
>>> up there.
>>>
>>> The correct approach in the long term is to fix the RTX costs to correctly
>>> understand vector operations, so I'd much prefer to see a patch along
>>> these lines, though I appreciate that is a substantially more invasive
>>> piece of work.
>>>
>>
>> I agree that rtx costs for vector operations are not handled right now.
>> We might not be able to completely separate it as Kyrill suggested. We
>> still need the vector SET with VOIDmode to be handled inline. This patch
>> is that part. We can work on the others as a separate function, if you
>> prefer that. I am happy to look at this as a separate patch.
>
> My point is that adding your patch while keeping the logic at the top
> which claims to catch ALL vector operations makes for less readable
> code.
>
> At the very least you'll need to update this comment:
>
>   /* TODO: The cost infrastructure currently does not handle
>      vector operations.  Assume that all vector operations
>      are equally expensive.  */
>
> to make it clear that this doesn't catch vector set operations.
>
> But fixing the comment doesn't improve the messy code so I'd certainly
> prefer to see one of the other approaches which have been discussed.

I see your point. Let me work on this based on your suggestions above.

Thanks,
Kugan
> On Apr 15, 2015, at 2:18 PM, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>
> On Wed, Apr 15, 2015 at 11:45:36AM +0100, Kugan wrote:
>>> There are two ways I see that we could clean things up, both of which
>>> require some reworking of your patch.
>>>
>>> Either we remove my check above and teach the RTX costs how to properly
>>> cost vector operations, or we fix my check to catch all vector RTX
>>> and add the special cases for the small subset of things we understand
>>> up there.
>>>
>>> The correct approach in the long term is to fix the RTX costs to correctly
>>> understand vector operations, so I'd much prefer to see a patch along
>>> these lines, though I appreciate that is a substantially more invasive
>>> piece of work.
>>>
>>
>> I agree that rtx costs for vector operations are not handled right now.
>> We might not be able to completely separate it as Kyrill suggested. We
>> still need the vector SET with VOIDmode to be handled inline. This patch
>> is that part. We can work on the others as a separate function, if you
>> prefer that. I am happy to look at this as a separate patch.
>
> My point is that adding your patch while keeping the logic at the top
> which claims to catch ALL vector operations makes for less readable
> code.
>
> At the very least you'll need to update this comment:
>
>   /* TODO: The cost infrastructure currently does not handle
>      vector operations.  Assume that all vector operations
>      are equally expensive.  */
>
> to make it clear that this doesn't catch vector set operations.
>
> But fixing the comment doesn't improve the messy code so I'd certainly
> prefer to see one of the other approaches which have been discussed.

While I am for cleaning up messy code, I want to avoid Kugan's patch being
held hostage until all the proper refactorings and cleanups are done.

If we consider the patch on its own merits:
Is it a worthwhile improvement? -- [Probably, "yes".]
Does it make the current spaghetti code significantly more difficult to
understand? -- [Probably, "no", if we update the current comments.]

Let's discuss the effort of cleaning up RTX costs as a separate task. It can
be either a joint effort for ARM and Linaro, or one of us can tackle it.

Thank you,

--
Maxim Kuvyrkov
www.linaro.org
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index cba3c1a..d6ad0af 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5544,10 +5544,17 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
 
       /* Fall through.  */
     case REG:
+      if (VECTOR_MODE_P (GET_MODE (op0)) && REG_P (op1))
+        {
+          /* The cost is 1 per vector-register copied.  */
+          int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
+                          / GET_MODE_SIZE (V4SImode);
+          *cost = COSTS_N_INSNS (n_minus_1 + 1);
+        }
       /* const0_rtx is in general free, but we will use an
          instruction to set a register to 0.  */
-      if (REG_P (op1) || op1 == const0_rtx)
-        {
+      else if (REG_P (op1) || op1 == const0_rtx)
+        {
           /* The cost is 1 per register copied.  */
           int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
                           / UNITS_PER_WORD;
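Two properties of this hunk are worth spelling out. First, the divisor is GET_MODE_SIZE (V4SImode), 16 bytes, i.e. one Q register, so the new branch mirrors the scalar branch below it (which divides by UNITS_PER_WORD) and charges one instruction per vector register moved. Second, a copy that fits in a single vector register therefore costs exactly COSTS_N_INSNS (1), which is what stops lower-subreg from splitting moves like the V4SF insn quoted at the top of the thread; the sketch after James's review message above walks through the numbers for the larger OI and V8DI cases.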