Message ID | 20181218063911.2112-1-richard.henderson@linaro.org |
---|---|
Headers | show |
Series | tcg, target/ppc vector improvements | expand |
On 18/12/2018 06:38, Richard Henderson wrote: > This implements some of the things that I talked about with Mark > this morning / yesterday. In particular: > > (0) Implement expanders for nand, nor, eqv logical operations. > > (1) Implement saturating arithmetic for the tcg backend. > > While I had expanders for these, they always went to helpers. > It's easy enough to expand byte and half-word operations for x86. > Beyond that, 32 and 64-bit operations can be expanded with integers. > > (2) Implement minmax arithmetic for the tcg backend. > > While I had integral minmax operations, I had not yet added > any vector expanders for this. (The integral stuff came in > for atomic minmax.) > > (3) Trivial conversions to minmax for target/arm. > > (4) Patches 11-18 are identical to Mark's. > > (5) Patches 19-25 implement splat and logicals for VMX and VSX. > > VSX is no more difficult than VMX for these. It does seem to be > just about everything that we can do for VSX at the momement. > > (6) Patches 26-33 implement saturating arithmetic for VMX. > > (7) Patch 34 implements minmax arithmetic for VMX. > > I've tested the new operations via aarch64 guest, as that's the set > of risu test cases I've got handy. The rest is untested so far. Thank you for working on this! I've just given this patchset a spin on my test images and here's what I found: - The version of my target/ppc patchset you've used is the one that I posted to the mailing list which doesn't have the GEN_FLOAT macro fixes, removal of the uint64_t * cast that you requested, and additional SoBs I've taken this patchset, replaced my patches with the latest versions, and repushed to github at https://github.com/mcayland/qemu/tree/ppc-altivec-rth. - This patchset introduces visual artefacts on-screen for both OS X and OS 9 A quick bisection suggests that there could be 2 separate issues related to the implementation of splat: Patch "target/ppc: convert vspltis[bhw] to use vector operations" causes a black border to appear around the OS X splash screen (https://www.ilande.co.uk/tmp/qemu/badapple1.png) which may suggest an overflow/alignment issue. Following on from this, the next patch "target/ppc: convert vsplt[bhw] to use vector operations" causes corruption of the OS X splash screen (https://www.ilande.co.uk/tmp/qemu/badapple2.png) in a way that suggests there may be an endian issue. Having said that, the results look really promising, and I don't think it will take too long to resolve any outstanding issues. I will be around on IRC later today if that helps too. ATB, Mark.
On 18/12/2018 09:49, Mark Cave-Ayland wrote: > A quick bisection suggests that there could be 2 separate issues related to the > implementation of splat: > > Patch "target/ppc: convert vspltis[bhw] to use vector operations" causes a black > border to appear around the OS X splash screen > (https://www.ilande.co.uk/tmp/qemu/badapple1.png) which may suggest an > overflow/alignment issue. This one appears to be a sign extension issue - if I make use of the same technique used by the previous helper then this problem goes away. Below is my experimental diff to be squashed into "target/ppc: convert vspltis[bhw] to use vector operations": diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c index be638cdb1a..6cd25c8dc6 100644 --- a/target/ppc/translate/vmx-impl.inc.c +++ b/target/ppc/translate/vmx-impl.inc.c @@ -723,12 +723,12 @@ GEN_VXRFORM_DUAL(vcmpgtfp, PPC_ALTIVEC, PPC_NONE, \ #define GEN_VXFORM_DUPI(name, tcg_op, opc2, opc3) \ static void glue(gen_, name)(DisasContext *ctx) \ { \ - int simm; \ + int8_t simm; \ if (unlikely(!ctx->altivec_enabled)) { \ gen_exception(ctx, POWERPC_EXCP_VPU); \ return; \ } \ - simm = SIMM5(ctx->opcode); \ + simm = (int8_t)(SIMM5(ctx->opcode) << 3) >> 3; \ tcg_op(avr64_offset(rD(ctx->opcode), true), 16, 16, simm); \ } ATB, Mark.
On 18/12/2018 09:49, Mark Cave-Ayland wrote: > Following on from this, the next patch "target/ppc: convert vsplt[bhw] to use vector > operations" causes corruption of the OS X splash screen > (https://www.ilande.co.uk/tmp/qemu/badapple2.png) in a way that suggests there may be > an endian issue. Changing "#ifndef HOST_WORDS_BIGENDIAN" to "#ifdef HOST_WORDS_BIGENDIAN" in this patch helps a lot, but something still isn't quite right: https://www.ilande.co.uk/tmp/qemu/badapple3.png. Adding some more debugging seems to suggest that boffs is being handled correctly based upon vece/uimm... ins: vsplth bofs before: 304e0 bofs after: 304e0 uimm: 0 vece: 1 ins: vsplth bofs before: 304e0 bofs after: 304e2 uimm: 1 vece: 1 ins: vsplth bofs before: 304e0 bofs after: 304e4 uimm: 2 vece: 1 ins: vsplth bofs before: 304e0 bofs after: 304e6 uimm: 3 vece: 1 ins: vsplth bofs before: 304e0 bofs after: 304e0 uimm: 0 vece: 1 ins: vsplth bofs before: 304e0 bofs after: 304e2 uimm: 1 vece: 1 ins: vsplth bofs before: 304e0 bofs after: 304e4 uimm: 2 vece: 1 ins: vsplth bofs before: 304e0 bofs after: 304e6 uimm: 3 vece: 1 ins: vsplth bofs before: 30560 bofs after: 3056e uimm: 7 vece: 1 ins: vsplth bofs before: 30540 bofs after: 3054e uimm: 7 vece: 1 ins: vsplth bofs before: 30490 bofs after: 30492 uimm: 1 vece: 1 ins: vspltw bofs before: 30580 bofs after: 3058c uimm: 3 vece: 2 ins: vsplth bofs before: 30580 bofs after: 30586 uimm: 3 vece: 1 ins: vspltw bofs before: 30580 bofs after: 30580 uimm: 0 vece: 2 ins: vspltb bofs before: 30560 bofs after: 30560 uimm: 0 vece: 0 ins: vsplth bofs before: 304d0 bofs after: 304d0 uimm: 0 vece: 1 ins: vsplth bofs before: 304d0 bofs after: 304d2 uimm: 1 vece: 1 ins: vsplth bofs before: 304d0 bofs after: 304d4 uimm: 2 vece: 1 ins: vsplth bofs before: 304d0 bofs after: 304d4 uimm: 2 vece: 1 ATB, Mark.
On 12/18/18 6:51 AM, Mark Cave-Ayland wrote: > On 18/12/2018 09:49, Mark Cave-Ayland wrote: > >> A quick bisection suggests that there could be 2 separate issues related to the >> implementation of splat: >> >> Patch "target/ppc: convert vspltis[bhw] to use vector operations" causes a black >> border to appear around the OS X splash screen >> (https://www.ilande.co.uk/tmp/qemu/badapple1.png) which may suggest an >> overflow/alignment issue. > > This one appears to be a sign extension issue - if I make use of the same technique > used by the previous helper then this problem goes away. Below is my experimental > diff to be squashed into "target/ppc: convert vspltis[bhw] to use vector operations": > > diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c > index be638cdb1a..6cd25c8dc6 100644 > --- a/target/ppc/translate/vmx-impl.inc.c > +++ b/target/ppc/translate/vmx-impl.inc.c > @@ -723,12 +723,12 @@ GEN_VXRFORM_DUAL(vcmpgtfp, PPC_ALTIVEC, PPC_NONE, \ > #define GEN_VXFORM_DUPI(name, tcg_op, opc2, opc3) \ > static void glue(gen_, name)(DisasContext *ctx) \ > { \ > - int simm; \ > + int8_t simm; This shouldn't matter. \ > if (unlikely(!ctx->altivec_enabled)) { \ > gen_exception(ctx, POWERPC_EXCP_VPU); \ > return; \ > } \ > - simm = SIMM5(ctx->opcode); \ > + simm = (int8_t)(SIMM5(ctx->opcode) << 3) >> 3; \ This suggests that SIMM5 should be using sextract32. r~
On 12/18/18 7:05 AM, Mark Cave-Ayland wrote: > On 18/12/2018 09:49, Mark Cave-Ayland wrote: > >> Following on from this, the next patch "target/ppc: convert vsplt[bhw] to use vector >> operations" causes corruption of the OS X splash screen >> (https://www.ilande.co.uk/tmp/qemu/badapple2.png) in a way that suggests there may be >> an endian issue. > > Changing "#ifndef HOST_WORDS_BIGENDIAN" to "#ifdef HOST_WORDS_BIGENDIAN" in this > patch helps a lot, but something still isn't quite right: > https://www.ilande.co.uk/tmp/qemu/badapple3.png. I can't figure out what the host+guest endian rules for ppc_avr_t are at all. Certainly there appear to be bugs wrt vscr and which end of the register we pull the value. On the tcg side we take host endianness into account, and on the gdb side we always use u32[3]. r~
On 18/12/2018 15:07, Richard Henderson wrote: >> This one appears to be a sign extension issue - if I make use of the same technique >> used by the previous helper then this problem goes away. Below is my experimental >> diff to be squashed into "target/ppc: convert vspltis[bhw] to use vector operations": >> >> diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c >> index be638cdb1a..6cd25c8dc6 100644 >> --- a/target/ppc/translate/vmx-impl.inc.c >> +++ b/target/ppc/translate/vmx-impl.inc.c >> @@ -723,12 +723,12 @@ GEN_VXRFORM_DUAL(vcmpgtfp, PPC_ALTIVEC, PPC_NONE, \ >> #define GEN_VXFORM_DUPI(name, tcg_op, opc2, opc3) \ >> static void glue(gen_, name)(DisasContext *ctx) \ >> { \ >> - int simm; \ >> + int8_t simm; > > This shouldn't matter. > \ >> if (unlikely(!ctx->altivec_enabled)) { \ >> gen_exception(ctx, POWERPC_EXCP_VPU); \ >> return; \ >> } \ >> - simm = SIMM5(ctx->opcode); \ >> + simm = (int8_t)(SIMM5(ctx->opcode) << 3) >> 3; \ > > This suggests that SIMM5 should be using sextract32. There's certainly an obvious typo here, but on its own it doesn't fix the issue: diff --git a/target/ppc/internal.h b/target/ppc/internal.h index b77d564a65..08eee1cd84 100644 --- a/target/ppc/internal.h +++ b/target/ppc/internal.h @@ -124,7 +124,7 @@ EXTRACT_SHELPER(SIMM, 0, 16); /* 16 bits unsigned immediate value */ EXTRACT_HELPER(UIMM, 0, 16); /* 5 bits signed immediate value */ -EXTRACT_HELPER(SIMM5, 16, 5); +EXTRACT_SHELPER(SIMM5, 16, 5); /* 5 bits signed immediate value */ EXTRACT_HELPER(UIMM5, 16, 5); /* 4 bits unsigned immediate value */ ATB, Mark.
On 18/12/2018 15:17, Richard Henderson wrote: > On 12/18/18 7:05 AM, Mark Cave-Ayland wrote: >> On 18/12/2018 09:49, Mark Cave-Ayland wrote: >> >>> Following on from this, the next patch "target/ppc: convert vsplt[bhw] to use vector >>> operations" causes corruption of the OS X splash screen >>> (https://www.ilande.co.uk/tmp/qemu/badapple2.png) in a way that suggests there may be >>> an endian issue. >> >> Changing "#ifndef HOST_WORDS_BIGENDIAN" to "#ifdef HOST_WORDS_BIGENDIAN" in this >> patch helps a lot, but something still isn't quite right: >> https://www.ilande.co.uk/tmp/qemu/badapple3.png. > > I can't figure out what the host+guest endian rules for ppc_avr_t are at all. > > Certainly there appear to be bugs wrt vscr and which end of the register we > pull the value. On the tcg side we take host endianness into account, and on > the gdb side we always use u32[3]. That seems wrong to me. Given that the ppc_avr_t is a union then I'd expect it to be in host order? Certainly in the VMX helper macros I've looked at, the members are set directly with no byte swapping. ATB, Mark.
On 12/18/18 7:26 AM, Mark Cave-Ayland wrote: > That seems wrong to me. Given that the ppc_avr_t is a union then I'd expect it to be > in host order? Certainly in the VMX helper macros I've looked at, the members are set > directly with no byte swapping. "Host order"? For both words of the vector? That's certainly going to cause problems wrt VSX and FPU registers. We're hard-coding that as fpu == vsx.u64[0] (both before and after your patch set). For vscr, on master we have void helper_mtvscr(CPUPPCState *env, ppc_avr_t *r) { #if defined(HOST_WORDS_BIGENDIAN) env->vscr = r->u32[3]; #else env->vscr = r->u32[0]; #endif and if (needs_byteswap) { vmxregset->avr[i].u64[0] = bswap64(cpu->env.avr[i].u64[1]); vmxregset->avr[i].u64[1] = bswap64(cpu->env.avr[i].u64[0]); } else { vmxregset->avr[i].u64[0] = cpu->env.avr[i].u64[0]; vmxregset->avr[i].u64[1] = cpu->env.avr[i].u64[1]; } } vmxregset->vscr.u32[3] = cpu_to_dump32(s, cpu->env.vscr); For helper macros that apply the same operation to all lanes, it doesn't matter which order in which the lanes are processed, so of course I would expect them to be processed in host order. It's cases that do not apply the same operation, such as merges, where the problems would arise. There are at least 3 schemes being employed to address this: #if defined(HOST_WORDS_BIGENDIAN) #define HI_IDX 0 #define LO_IDX 1 #define AVRB(i) u8[i] #define AVRW(i) u32[i] #else #define HI_IDX 1 #define LO_IDX 0 #define AVRB(i) u8[15-(i)] #define AVRW(i) u32[3-(i)] #endif ... #if defined(HOST_WORDS_BIGENDIAN) #define EL_IDX(i) (i) #else #define EL_IDX(i) (3 - (i)) #endif ... #define EL_IDX(i) (i) #else #define EL_IDX(i) (1 - (i)) #endif ... #if defined(HOST_WORDS_BIGENDIAN) result.u8[i] = a->u8[indexA] ^ b->u8[indexB]; #else result.u8[i] = a->u8[15-indexA] ^ b->u8[15-indexB]; #endif r~
On 18/12/2018 09:49, Mark Cave-Ayland wrote: > Following on from this, the next patch "target/ppc: convert vsplt[bhw] to use vector > operations" causes corruption of the OS X splash screen > (https://www.ilande.co.uk/tmp/qemu/badapple2.png) in a way that suggests there may be > an endian issue. After some more digging I've found out what's going on here by dumping out the AVR registers before and after: Before the patch: BEFORE: uimm: 0 size: 2 sreg: 99 @ 0x7f54fd7157a0 - 1 6a 1 d9 1 15 fd 63 0 0 0 0 0 0 0 0 dreg: 99 @ 0x7f54fd715870 - 7f ff de ad 7f ff de ad 7f ff de ad 7f ff de ad AFTER: dreg: 99 @ 0x7f54fd715870 - 1 6a 1 6a 1 6a 1 6a 1 6a 1 6a 1 6a 1 6a BEFORE: uimm: 1 size: 2 sreg: 99 @ 0x7f54fd7157a0 - 1 6a 1 d9 1 15 fd 63 0 0 0 0 0 0 0 0 dreg: 99 @ 0x7f54fd715870 - 1 6a 1 6a 1 6a 1 6a 1 6a 1 6a 1 6a 1 6a AFTER: dreg: 99 @ 0x7f54fd715870 - 1 d9 1 d9 1 d9 1 d9 1 d9 1 d9 1 d9 1 d9 After the patch: BEFORE: uimm: 0 size: 2 sreg: 5 @ 0x7fe5a0c4a7a0 - 1 6a 1 d9 1 15 fd 63 0 0 0 0 0 0 0 0 dreg: 18 @ 0x7fe5a0c4a870 - 7f ff de ad 7f ff de ad 7f ff de ad 7f ff de ad AFTER: dreg: 18 @ 0x7fe5a0c4a870 - 5d 1 5d 1 5d 1 5d 1 5d 1 5d 1 5d 1 5d 1 BEFORE: uimm: 1 size: 2 sreg: 5 @ 0x7fe5a0c4a7a0 - 1 6a 1 d9 1 15 fd 63 0 0 0 0 0 0 0 0 dreg: 18 @ 0x7fe5a0c4a870 - 5d 1 5d 1 5d 1 5d 1 5d 1 5d 1 5d 1 5d 1 AFTER: dreg: 18 @ 0x7fe5a0c4a870 - 6a 1 6a 1 6a 1 6a 1 6a 1 6a 1 6a 1 6a 1 As you can see vsplth splat is one byte off with this patch applied and the cause is the xor in the #ifndef HOST_WORDS_BIGENDIAN block: before the xor is applied, bofs is aligned to 2 bytes and with bofs ^ 15 the LSB is set to 1 again, introducing the 1 byte error. Applying the following patch to mask bofs based upon the size of vece seems to fix the issue here for me on little-endian Intel: diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c index 59d3bc6e02..41ddbd879f 100644 --- a/target/ppc/translate/vmx-impl.inc.c +++ b/target/ppc/translate/vmx-impl.inc.c @@ -815,6 +815,7 @@ static void gen_vsplt(DisasContext *ctx, int vece) bofs += (uimm << vece) & 15; #ifndef HOST_WORDS_BIGENDIAN bofs ^= 15; + bofs &= ~((1 << vece) - 1); #endif tcg_gen_gvec_dup_mem(vece, dofs, bofs, 16, 16); ATB, Mark.
On 18/12/2018 06:38, Richard Henderson wrote: > This implements some of the things that I talked about with Mark > this morning / yesterday. In particular: > > (0) Implement expanders for nand, nor, eqv logical operations. > > (1) Implement saturating arithmetic for the tcg backend. > > While I had expanders for these, they always went to helpers. > It's easy enough to expand byte and half-word operations for x86. > Beyond that, 32 and 64-bit operations can be expanded with integers. > > (2) Implement minmax arithmetic for the tcg backend. > > While I had integral minmax operations, I had not yet added > any vector expanders for this. (The integral stuff came in > for atomic minmax.) > > (3) Trivial conversions to minmax for target/arm. > > (4) Patches 11-18 are identical to Mark's. > > (5) Patches 19-25 implement splat and logicals for VMX and VSX. > > VSX is no more difficult than VMX for these. It does seem to be > just about everything that we can do for VSX at the momement. > > (6) Patches 26-33 implement saturating arithmetic for VMX. > > (7) Patch 34 implements minmax arithmetic for VMX. > > I've tested the new operations via aarch64 guest, as that's the set > of risu test cases I've got handy. The rest is untested so far. I've taken my previous PPC patchsets below: [PATCH v5 0/9] target/ppc: prepare for conversion to TCG vector operations https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg00063.html [PATCH v2 0/8] target/ppc: remove various endian hacks from int_helper.c https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg06149.html and then rebased this patchset on top of them (including a squash of the vsplt fix posted earlier today at https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg00287.html) and pushed the result to https://github.com/mcayland/qemu/tree/ppc-altivec-v5.5-rth. Fixing the vsplt instruction now gives a readable display in my MacOS tests, but I'm still seeing "shadows" such as https://www.ilande.co.uk/tmp/qemu/badapple4.png which I've bisected down to: commit 71f229eb331e979971a0a79e5a2fcdfb9380bd06 Author: Richard Henderson <richard.henderson@linaro.org> Date: Mon Dec 17 22:39:10 2018 -0800 target/ppc: convert vadd*s and vsub*s to vector operations Signed-off-by: Richard Henderson <richard.henderson@linaro.org> So looks like there's something still not quite right with the saturation flag/vector saturation implementation. ATB, Mark.
On 1/4/19 4:31 AM, Mark Cave-Ayland wrote: > Fixing the vsplt instruction now gives a readable display in my MacOS tests, but I'm > still seeing "shadows" such as https://www.ilande.co.uk/tmp/qemu/badapple4.png which > I've bisected down to: > > > commit 71f229eb331e979971a0a79e5a2fcdfb9380bd06 > Author: Richard Henderson <richard.henderson@linaro.org> > Date: Mon Dec 17 22:39:10 2018 -0800 > > target/ppc: convert vadd*s and vsub*s to vector operations > > Signed-off-by: Richard Henderson <richard.henderson@linaro.org> > > > So looks like there's something still not quite right with the saturation flag/vector > saturation implementation. > Ok, I'll try and set up some RISU tests to track this down next week. r~