Message ID | 1386107477-24165-13-git-send-email-peter.maydell@linaro.org |
---|---|
State | Superseded |
Headers | show |
On 12/04/2013 10:51 AM, Peter Maydell wrote: > @@ -184,6 +184,18 @@ static TCGv_i64 cpu_reg(DisasContext *s, int reg) > } > } > > +/* read a cpu register in 32bit/64bit mode to dst */ > +static void read_cpu_reg(DisasContext *s, TCGv_i64 dst, int reg, int sf) > +{ > + if (reg == 31) { > + tcg_gen_movi_i64(dst, 0); > + } else if (sf) { > + tcg_gen_mov_i64(dst, cpu_X[reg]); > + } else { /* (!sf) */ > + tcg_gen_ext32u_i64(dst, cpu_X[reg]); > + } > +} I think this should be more like cpu_reg and return a TCGv instead of force a copy into a newly generated temporary. You've got a pool of auto-freeing temporaries, after all. > + if (op) { /* CBNZ */ > + tcg_gen_brcondi_i64(TCG_COND_EQ, tcg_cmp, 0, label_nomatch); > + } else { /* CBZ */ > + tcg_gen_brcondi_i64(TCG_COND_NE, tcg_cmp, 0, label_nomatch); > + } Similar comments to TBNZ/TBZ apply here. r~
On 4 December 2013 00:10, Richard Henderson <rth@twiddle.net> wrote: > On 12/04/2013 10:51 AM, Peter Maydell wrote: >> @@ -184,6 +184,18 @@ static TCGv_i64 cpu_reg(DisasContext *s, int reg) >> } >> } >> >> +/* read a cpu register in 32bit/64bit mode to dst */ >> +static void read_cpu_reg(DisasContext *s, TCGv_i64 dst, int reg, int sf) >> +{ >> + if (reg == 31) { >> + tcg_gen_movi_i64(dst, 0); >> + } else if (sf) { >> + tcg_gen_mov_i64(dst, cpu_X[reg]); >> + } else { /* (!sf) */ >> + tcg_gen_ext32u_i64(dst, cpu_X[reg]); >> + } >> +} > > I think this should be more like cpu_reg and return a TCGv instead of force a > copy into a newly generated temporary. You've got a pool of auto-freeing > temporaries, after all. You're right that we can just make this function return the TCGv temp rather than making the caller pass one in. Are you suggesting the 64-bit case should return cpu_X[reg] rather than a copy of it, though? I think it would be pretty hard to reason about if you had to remember that sometimes you got a trashable copy and sometimes the real register. (I'm still a bit on the fence about that pool of auto-freeing temporaries. Manual temp management is certainly a fertile source of decoder bugs, but in the longer term we might want to push the functionality down into the common code rather than having an ad-hoc thing in one decoder.) >> + if (op) { /* CBNZ */ >> + tcg_gen_brcondi_i64(TCG_COND_EQ, tcg_cmp, 0, label_nomatch); >> + } else { /* CBZ */ >> + tcg_gen_brcondi_i64(TCG_COND_NE, tcg_cmp, 0, label_nomatch); >> + } > > Similar comments to TBNZ/TBZ apply here. Agreed. thanks -- PMM
On 12/04/2013 01:32 PM, Peter Maydell wrote: > You're right that we can just make this function return the TCGv > temp rather than making the caller pass one in. Are you suggesting > the 64-bit case should return cpu_X[reg] rather than a copy of it, > though? I think it would be pretty hard to reason about if you had to > remember that sometimes you got a trashable copy and sometimes > the real register. I think that the normal case is to write to the output in one step, and thus having an input overlap an output isn't your problem but tcg's. I would think the case of multi-step output to be fairly rare, and easily solvable by always allocating an output temporary. Am I off-base on the multi-output thing? Modulo flags setting, of course, but I think that special case ought to be relatively solvable. > (I'm still a bit on the fence about that pool of auto-freeing temporaries. > Manual temp management is certainly a fertile source of decoder bugs, > but in the longer term we might want to push the functionality down into > the common code rather than having an ad-hoc thing in one decoder.) See also Sparc and S390. ;-) I'm open to suggestions for doing this generically, but I've no great ideas off-hand. r~
On 4 December 2013 00:48, Richard Henderson <rth@twiddle.net> wrote: > On 12/04/2013 01:32 PM, Peter Maydell wrote: >> You're right that we can just make this function return the TCGv >> temp rather than making the caller pass one in. Are you suggesting >> the 64-bit case should return cpu_X[reg] rather than a copy of it, >> though? I think it would be pretty hard to reason about if you had to >> remember that sometimes you got a trashable copy and sometimes >> the real register. > > I think that the normal case is to write to the output in one step, and thus > having an input overlap an output isn't your problem but tcg's. I would think > the case of multi-step output to be fairly rare, and easily solvable by always > allocating an output temporary. Does a copy to temp actually cost us anything, though? I was under the impression the TCG optimizer would basically throw it away again. I suppose you could just treat the semantics of it as "assume you can't trash this TCGv" in all cases and rely on the auto-free. > Am I off-base on the multi-output thing? Modulo flags setting, of course, but > I think that special case ought to be relatively solvable. > >> (I'm still a bit on the fence about that pool of auto-freeing temporaries. >> Manual temp management is certainly a fertile source of decoder bugs, >> but in the longer term we might want to push the functionality down into >> the common code rather than having an ad-hoc thing in one decoder.) > > See also Sparc and S390. ;-) Ah, I hadn't noticed we were already doing this in other frontends. -- PMM
On 4 December 2013 00:48, Richard Henderson <rth@twiddle.net> wrote: > On 12/04/2013 01:32 PM, Peter Maydell wrote: >> You're right that we can just make this function return the TCGv >> temp rather than making the caller pass one in. Are you suggesting >> the 64-bit case should return cpu_X[reg] rather than a copy of it, >> though? I think it would be pretty hard to reason about if you had to >> remember that sometimes you got a trashable copy and sometimes >> the real register. > > I think that the normal case is to write to the output in one step, and thus > having an input overlap an output isn't your problem but tcg's. I would think > the case of multi-step output to be fairly rare, and easily solvable by always > allocating an output temporary. So I had a look at this, and it seems to make code like this (from the handling for logic ops with shifted registers): tcg_rm = read_cpu_reg(s, rm, sf); if (shift_amount) { shift_reg_imm(tcg_rm, tcg_rm, sf, shift_type, shift_amount); } if (invert) { tcg_gen_not_i64(tcg_rm, tcg_rm); /* we zero extend later on (!sf) */ } a bit awkward if the value returned from read_cpu_reg() isn't a trashable copy, because of the sequence of "maybe we need to change the value like this" conditionals. The code basically needs to copy the return value from read_cpu_reg() into its own temporary immediately. There's a similar thing in the conditional-select code where the else-case's value might be inverted and might be incremented and might be neither. I think that kind of thing tips the balance in favour of having read_cpu_reg() always return a trashable temporary. thanks -- PMM
On 12/05/2013 06:02 AM, Peter Maydell wrote: > I think that kind of thing tips the balance in favour of having > read_cpu_reg() always return a trashable temporary. Fair enough. r~
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c index bcb59db..37dc462 100644 --- a/target-arm/translate-a64.c +++ b/target-arm/translate-a64.c @@ -184,6 +184,18 @@ static TCGv_i64 cpu_reg(DisasContext *s, int reg) } } +/* read a cpu register in 32bit/64bit mode to dst */ +static void read_cpu_reg(DisasContext *s, TCGv_i64 dst, int reg, int sf) +{ + if (reg == 31) { + tcg_gen_movi_i64(dst, 0); + } else if (sf) { + tcg_gen_mov_i64(dst, cpu_X[reg]); + } else { /* (!sf) */ + tcg_gen_ext32u_i64(dst, cpu_X[reg]); + } +} + /* * the instruction disassembly implemented here matches * the instruction encoding classifications in chapter 3 (C3) @@ -209,10 +221,39 @@ static void disas_uncond_b_imm(DisasContext *s, uint32_t insn) gen_goto_tb(s, 0, addr); } -/* Compare & branch (immediate) */ +/* C3.2.1 Compare & branch (immediate) + * 31 30 25 24 23 5 4 0 + * +----+-------------+----+---------------------+--------+ + * | sf | 0 1 1 0 1 0 | op | imm19 | Rt | + * +----+-------------+----+---------------------+--------+ + */ static void disas_comp_b_imm(DisasContext *s, uint32_t insn) { - unsupported_encoding(s, insn); + unsigned int sf, op, rt; + uint64_t addr; + int label_nomatch; + TCGv_i64 tcg_cmp; + + sf = extract32(insn, 31, 1); + op = extract32(insn, 24, 1); + rt = extract32(insn, 0, 5); + addr = s->pc + sextract32(insn, 5, 19) * 4 - 4; + + tcg_cmp = tcg_temp_new_i64(); + read_cpu_reg(s, tcg_cmp, rt, sf); + label_nomatch = gen_new_label(); + + if (op) { /* CBNZ */ + tcg_gen_brcondi_i64(TCG_COND_EQ, tcg_cmp, 0, label_nomatch); + } else { /* CBZ */ + tcg_gen_brcondi_i64(TCG_COND_NE, tcg_cmp, 0, label_nomatch); + } + + tcg_temp_free_i64(tcg_cmp); + + gen_goto_tb(s, 0, addr); + gen_set_label(label_nomatch); + gen_goto_tb(s, 1, s->pc); } /* C3.2.5 Test & branch (immediate)