Message ID | 20191018194900.34795-1-Richard.Earnshaw@arm.com |
---|---|
Headers | show |
Series | Rewrite DImode arithmetic support | expand |
Hi Richard, On Fri, Oct 18, 2019 at 08:48:31PM +0100, Richard Earnshaw wrote: > > This series of patches rewrites all the DImode arithmetic patterns for > the Arm backend when compiling for Arm or Thumb2 to split the > operations during expand (the thumb1 code is unchanged and cannot > benefit from early splitting as we are unable to expose the carry > flag). Very nice :-) I have a bunch of testcases from when I did something similar for PowerPC that I wanted to test... But I cannot get your series to apply. Do you have a git repo I can pull from? Here is one test case (it's a bit geared towards what our ISA can do): === typedef unsigned int u32; typedef unsigned long long u64; u64 add(u64 a, u64 b) { return a + b; } u64 add1(u64 a) { return a + 1; } u64 add42(u64 a) { return a + 42; } u64 addm1(u64 a) { return a - 1; } u64 addff(u64 a) { return a + 0xffffffffULL; } u64 addH(u64 a) { return a + 0x123400005678ULL; } u64 addH0(u64 a) { return a + 0x123400000000ULL; } u64 addS(u64 a, u32 b) { return a + b; } u64 addSH(u64 a, u32 b) { return a + ((u64)b << 32); } u64 addB1(u64 a) { return a + 0x100000000ULL; } u64 addB8(u64 a) { return a + 0x800000000ULL; } u64 addSH42(u64 a, u32 b) { return a + ((u64)b << 32) + 42; } u64 addSHm1(u64 a, u32 b) { return a + ((u64)b << 32) - 1; } u64 addSHff(u64 a, u32 b) { return a + ((u64)b << 32) + 0xffffffffULL; } === rs6000 -m32 currently has non-optimal code for addm1, addSHm1; trunk arm has non-optimal code for addH0, addSH, addB1, addB8, addSH42, addSHm1, and addSHff if I understand well enough. So I'd love to see what it does with your series applied :-) Segher
On 19/10/2019 17:31, Segher Boessenkool wrote: > Hi Richard, > > On Fri, Oct 18, 2019 at 08:48:31PM +0100, Richard Earnshaw wrote: >> >> This series of patches rewrites all the DImode arithmetic patterns for >> the Arm backend when compiling for Arm or Thumb2 to split the >> operations during expand (the thumb1 code is unchanged and cannot >> benefit from early splitting as we are unable to expose the carry >> flag). > > Very nice :-) > > I have a bunch of testcases from when I did something similar for PowerPC > that I wanted to test... But I cannot get your series to apply. Do you > have a git repo I can pull from? > Perhaps because it's already committed to trunk? > Here is one test case (it's a bit geared towards what our ISA can do): > > === > typedef unsigned int u32; > typedef unsigned long long u64; > > u64 add(u64 a, u64 b) { return a + b; } > u64 add1(u64 a) { return a + 1; } > u64 add42(u64 a) { return a + 42; } > u64 addm1(u64 a) { return a - 1; } > u64 addff(u64 a) { return a + 0xffffffffULL; } > u64 addH(u64 a) { return a + 0x123400005678ULL; } > u64 addH0(u64 a) { return a + 0x123400000000ULL; } > u64 addS(u64 a, u32 b) { return a + b; } > u64 addSH(u64 a, u32 b) { return a + ((u64)b << 32); } > u64 addB1(u64 a) { return a + 0x100000000ULL; } > u64 addB8(u64 a) { return a + 0x800000000ULL; } > > u64 addSH42(u64 a, u32 b) { return a + ((u64)b << 32) + 42; } > u64 addSHm1(u64 a, u32 b) { return a + ((u64)b << 32) - 1; } > u64 addSHff(u64 a, u32 b) { return a + ((u64)b << 32) + 0xffffffffULL; } > === > > rs6000 -m32 currently has non-optimal code for addm1, addSHm1; trunk arm > has non-optimal code for addH0, addSH, addB1, addB8, addSH42, addSHm1, and > addSHff if I understand well enough. So I'd love to see what it does with > your series applied :-) > > > Segher > We do pretty well on this. Only addSHm1 needs three insns (except where the constant isn't valid for arm), and I think that's the minimum for this case anyway. Several of the tests only need one insn. R. .arch armv8-a .eabi_attribute 20, 1 .eabi_attribute 21, 1 .eabi_attribute 23, 3 .eabi_attribute 24, 1 .eabi_attribute 25, 1 .eabi_attribute 26, 1 .eabi_attribute 30, 2 .eabi_attribute 34, 1 .eabi_attribute 18, 4 .file "lltest.c" .text .align 2 .global add .syntax unified .arm .fpu softvfp .type add, %function add: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. adds r0, r0, r2 adc r1, r1, r3 bx lr .size add, .-add .align 2 .global add1 .syntax unified .arm .fpu softvfp .type add1, %function add1: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. adds r0, r0, #1 adc r1, r1, #0 bx lr .size add1, .-add1 .align 2 .global add42 .syntax unified .arm .fpu softvfp .type add42, %function add42: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. adds r0, r0, #42 adc r1, r1, #0 bx lr .size add42, .-add42 .align 2 .global addm1 .syntax unified .arm .fpu softvfp .type addm1, %function addm1: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. subs r0, r0, #1 sbc r1, r1, #0 bx lr .size addm1, .-addm1 .align 2 .global addff .syntax unified .arm .fpu softvfp .type addff, %function addff: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. subs r0, r0, #1 adc r1, r1, #0 bx lr .size addff, .-addff .align 2 .global addH .syntax unified .arm .fpu softvfp .type addH, %function addH: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. movw r3, #22136 adds r0, r0, r3 movw r3, #4660 adc r1, r3, r1 bx lr .size addH, .-addH .align 2 .global addH0 .syntax unified .arm .fpu softvfp .type addH0, %function addH0: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. add r1, r1, #4608 add r1, r1, #52 bx lr .size addH0, .-addH0 .align 2 .global addS .syntax unified .arm .fpu softvfp .type addS, %function addS: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. adds r0, r2, r0 adc r1, r1, #0 bx lr .size addS, .-addS .align 2 .global addSH .syntax unified .arm .fpu softvfp .type addSH, %function addSH: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. add r1, r2, r1 bx lr .size addSH, .-addSH .align 2 .global addB1 .syntax unified .arm .fpu softvfp .type addB1, %function addB1: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. add r1, r1, #1 bx lr .size addB1, .-addB1 .align 2 .global addB8 .syntax unified .arm .fpu softvfp .type addB8, %function addB8: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. add r1, r1, #8 bx lr .size addB8, .-addB8 .align 2 .global addSH42 .syntax unified .arm .fpu softvfp .type addSH42, %function addSH42: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. adds r0, r0, #42 adc r1, r1, r2 bx lr .size addSH42, .-addSH42 .align 2 .global addSHm1 .syntax unified .arm .fpu softvfp .type addSHm1, %function addSHm1: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. subs r0, r0, #1 sbc r1, r1, #0 add r1, r2, r1 bx lr .size addSHm1, .-addSHm1 .align 2 .global addSHff .syntax unified .arm .fpu softvfp .type addSHff, %function addSHff: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. subs r0, r0, #1 adc r1, r1, r2 bx lr .size addSHff, .-addSHff .ident "GCC: (trunk) 10.0.0 20191018 (experimental) [master revision 54f1e150a38:1ddbabe127b:e55e9d95a5ab8397197a5e358ba0185f9471f043]"
On Fri, Oct 18, 2019 at 8:49 PM Richard Earnshaw <Richard.Earnshaw@arm.com> wrote: > > > This series of patches rewrites all the DImode arithmetic patterns for > the Arm backend when compiling for Arm or Thumb2 to split the > operations during expand (the thumb1 code is unchanged and cannot > benefit from early splitting as we are unable to expose the carry > flag). > > This has a number of benefits: > - register allocation has more freedom to use independent > registers for the upper and lower halves of the register > - we can make better use of combine for spotting insn merge > opportunities without needing many additional patterns that are > only used for DImode > - we eliminate a number of bugs in the machine description where > the carry calculations were not correctly propagated by the > split patterns (we mostly got away with this because the > splitting previously happened only after most of the important > optimization passes had been run). > > The patch series starts by paring back all the DImode arithmetic > support to a very simple form without any splitting at all and then > progressively re-implementing the patterns with early split > operations. This proved to be the only sane way of untangling the > existing code due to a number of latent bugs which would have been > exposed if a different approach had been taken. > > Each patch should produce a working compiler (it did when it was > originally written), though since the patch set has been re-ordered > slightly there is a possibility that some of the intermediate steps > may have missing test updates that are only cleaned up later. > However, only the end of the series should be considered complete. > I've kept the patch as a series to permit easier regression hunting > should that prove necessary. Yay ! it's quite nice to see this go in. Ramana > > R. > > Richard Earnshaw (29): > [arm] Rip out DImode addition and subtraction splits. > [arm] Perform early splitting of adddi3. > [arm] Early split zero- and sign-extension > [arm] Rewrite addsi3_carryin_shift_<optab> in canonical form > [arm] fix constraints on addsi3_carryin_alt2 > [arm] Early split subdi3 > [arm] Remove redundant DImode subtract patterns > [arm] Introduce arm_carry_operation > [arm] Correctly cost addition with a carry-in > [arm] Correct cost calculations involving borrow for subtracts. > [arm] Reduce cost of insns that are simple reg-reg moves. > [arm] Implement negscc using SBC when appropriate. > [arm] Add alternative canonicalizations for subtract-with-carry + > shift > [arm] Early split simple DImode equality comparisons > [arm] Improve handling of DImode comparisions against constants. > [arm] early split most DImode comparison operations. > [arm] Handle some constant comparisons using rsbs+rscs > [arm] Cleanup dead code - old support for DImode comparisons > [arm] Handle immediate values in uaddvsi4 > [arm] Early expansion of uaddvdi4. > [arm] Improve code generation for addvsi4. > [arm] Allow the summation result of signed add-with-overflow to be > discarded. > [arm] Early split addvdi4 > [arm] Improve constant handling for usubvsi4. > [arm] Early expansion of usubvdi4. > [arm] Improve constant handling for subvsi4. > [arm] Early expansion of subvdi4 > [arm] Improvements to negvsi4 and negvdi4. > [arm] Fix testsuite nit when compiling for thumb2 > > gcc/config/arm/arm-modes.def | 19 +- > gcc/config/arm/arm-protos.h | 1 + > gcc/config/arm/arm.c | 598 ++++- > gcc/config/arm/arm.md | 2020 ++++++++++------- > gcc/config/arm/iterators.md | 15 +- > gcc/config/arm/predicates.md | 29 +- > gcc/config/arm/thumb2.md | 8 +- > .../gcc.dg/builtin-arith-overflow-3.c | 41 + > gcc/testsuite/gcc.target/arm/negdi-3.c | 4 +- > 9 files changed, 1757 insertions(+), 978 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/builtin-arith-overflow-3.c >
On Sun, Oct 20, 2019 at 12:21:21PM +0100, Richard Earnshaw (lists) wrote: > On 19/10/2019 17:31, Segher Boessenkool wrote: > > I have a bunch of testcases from when I did something similar for PowerPC > > that I wanted to test... But I cannot get your series to apply. Do you > > have a git repo I can pull from? > > Perhaps because it's already committed to trunk? Oh probably. Duh. Thanks :-) > > u64 addH(u64 a) { return a + 0x123400005678ULL; } > > u64 addH0(u64 a) { return a + 0x123400000000ULL; } (If you change those to 0x3400000078ULL etc. it'll test as meant on arm as well: to see if it uses immediates in the insn where it can. It looks like it'll work fine fwiw). > We do pretty well on this. Only addSHm1 needs three insns (except where > the constant isn't valid for arm), and I think that's the minimum for > this case anyway. Several of the tests only need one insn. Yeah, very nice :-) Segher
On 18/10/2019 21:48, Richard Earnshaw wrote: > Each patch should produce a working compiler (it did when it was > originally written), though since the patch set has been re-ordered > slightly there is a possibility that some of the intermediate steps > may have missing test updates that are only cleaned up later. > However, only the end of the series should be considered complete. > I've kept the patch as a series to permit easier regression hunting > should that prove necessary. Thanks for this information: my validation system was designed in such a way that it will run the GCC testsuite after each of your patches, so I'll keep in mind not to report regressions (I've noticed several already). I can perform a manual validation taking your 29 patches as a single one and compare the results with those of the revision preceding the one were you committed patch #1. Do you think it would be useful? Christophe
On 21/10/2019 12:51, Christophe Lyon wrote: > On 18/10/2019 21:48, Richard Earnshaw wrote: >> Each patch should produce a working compiler (it did when it was >> originally written), though since the patch set has been re-ordered >> slightly there is a possibility that some of the intermediate steps >> may have missing test updates that are only cleaned up later. >> However, only the end of the series should be considered complete. >> I've kept the patch as a series to permit easier regression hunting >> should that prove necessary. > > Thanks for this information: my validation system was designed in such a > way that it will run the GCC testsuite after each of your patches, so > I'll keep in mind not to report regressions (I've noticed several already). > > > I can perform a manual validation taking your 29 patches as a single one > and compare the results with those of the revision preceding the one > were you committed patch #1. Do you think it would be useful? > > > Christophe > > I think if you can filter out any that are removed by later patches and then report against the patch that caused the regression itself then that would be the best. But I realise that would be more work for you, so a round-up against the combined set would be OK. BTW, I'm aware of an issue with the compiler now generating <alu> reg, reg, shift <reg> in Thumb2; no need to report that again. Thanks, R.
On 21/10/2019 14:24, Richard Earnshaw (lists) wrote: > On 21/10/2019 12:51, Christophe Lyon wrote: >> On 18/10/2019 21:48, Richard Earnshaw wrote: >>> Each patch should produce a working compiler (it did when it was >>> originally written), though since the patch set has been re-ordered >>> slightly there is a possibility that some of the intermediate steps >>> may have missing test updates that are only cleaned up later. >>> However, only the end of the series should be considered complete. >>> I've kept the patch as a series to permit easier regression hunting >>> should that prove necessary. >> >> Thanks for this information: my validation system was designed in such a way that it will run the GCC testsuite after each of your patches, so I'll keep in mind not to report regressions (I've noticed several already). >> >> >> I can perform a manual validation taking your 29 patches as a single one and compare the results with those of the revision preceding the one were you committed patch #1. Do you think it would be useful? >> >> >> Christophe >> >> > > I think if you can filter out any that are removed by later patches and then report against the patch that caused the regression itself then that would be the best. But I realise that would be more work for you, so a round-up against the combined set would be OK. > > BTW, I'm aware of an issue with the compiler now generating > > <alu> reg, reg, shift <reg> > > in Thumb2; no need to report that again. > > Thanks, > R. > . > Hi Richard, The validation of the whole set shows 1 regression, which was also reported by the validation of r277179 (early split most DImode comparison operations) When GCC is configured as: --target arm-none-eabi --with-mode default --with-cpu default --with-fpu default (that is, no --with-mode, --with-cpu, --with-fpu option) I'm using binutils-2.28 and newlib-3.1.0 I can see: FAIL: g++.dg/opt/pr36449.C -std=gnu++14 execution test (whatever -std=gnu++XX option) I'm executing the tests using qemu-4.1.0 -cpu arm926 The qemu traces shows that code enters main, then _Znwj (operator new), then _malloc_r The qemu traces end with: IN: _malloc_r^M 0x00019224: e3a00ffe mov r0, #0x3f8^M 0x00019228: e3a0c07f mov ip, #0x7f^M 0x0001922c: e3a0e07e mov lr, #0x7e^M 0x00019230: eafffe41 b #0x18b3c^M ^M R00=00049418 R01=00000000 R02=00000554 R03=00040000^M R04=00000000 R05=08000008 R06=00049418 R07=00000000^M R08=00000000 R09=00000000 R10=000492d8 R11=fffeb4b4^M R12=00000060 R13=fffeb460 R14=00018b14 R15=00019224^M PSR=20000010 --C- A usr32^M ----------------^M IN: _malloc_r^M 0x00018b3c: e59f76f8 ldr r7, [pc, #0x6f8]^M 0x00018b40: e0870000 add r0, r7, r0^M 0x00018b44: e5903004 ldr r3, [r0, #4]^M 0x00018b48: e2400008 sub r0, r0, #8^M 0x00018b4c: e1500003 cmp r0, r3^M 0x00018b50: 1a000005 bne #0x18b6c^M ^M R00=000003f8 R01=00000000 R02=00000554 R03=00040000^M R04=00000000 R05=08000008 R06=00049418 R07=00000000^M R08=00000000 R09=00000000 R10=000492d8 R11=fffeb4b4^M R12=0000007f R13=fffeb460 R14=0000007e R15=00018b3c^M PSR=20000010 --C- A usr32^M R00=00049c30 R01=00000000 R02=00000554 R03=00049c30^M R04=00000000 R05=08000008 R06=00049418 R07=00049840^M R08=00000000 R09=00000000 R10=000492d8 R11=fffeb4b4^M R12=0000007f R13=fffeb460 R14=0000007e R15=00018b54^M PSR=60000010 -ZC- A usr32^M ----------------^M IN: _malloc_r^M 0x00019120: e1a02a0b lsl r2, fp, #0x14^M 0x00019124: e1a02a22 lsr r2, r2, #0x14^M 0x00019128: e3520000 cmp r2, #0^M 0x0001912c: 1afffee7 bne #0x18cd0^M ^M R00=0004b000 R01=08002108 R02=00049e40 R03=0004b000^M R04=0004a8e0 R05=08000008 R06=00049418 R07=00049840^M R08=08001000 R09=00000720 R10=00049e0c R11=0004b000^M R12=0000007f R13=fffeb460 R14=00018ca0 R15=00019120^M PSR=60000010 -ZC- A usr32^M ----------------^M IN: _malloc_r^M 0x00019130: e5974008 ldr r4, [r7, #8]^M 0x00019134: e0898008 add r8, sb, r8^M 0x00019138: e3888001 orr r8, r8, #1^M 0x0001913c: e5848004 str r8, [r4, #4]^M 0x00019140: eaffff14 b #0x18d98^M ^M R00=0004b000 R01=08002108 R02=00000000 R03=0004b000^M R04=0004a8e0 R05=08000008 R06=00049418 R07=00049840^M R08=08001000 R09=00000720 R10=00049e0c R11=0004b000^M R12=0000007f R13=fffeb460 R14=00018ca0 R15=00019130^M PSR=60000010 -ZC- A usr32^M R00=0004b000 R01=08002108 R02=00000000 R03=0004b000^M R04=0004a8e0 R05=08000008 R06=00049418 R07=00049840^M R08=08001721 R09=00000720 R10=00049e0c R11=0004b000^M R12=0000007f R13=fffeb460 R14=00018ca0 R15=00018d98^M PSR=60000010 -ZC- A usr32^M Christophe
On 23/10/2019 09:28, Christophe Lyon wrote: > On 21/10/2019 14:24, Richard Earnshaw (lists) wrote: >> On 21/10/2019 12:51, Christophe Lyon wrote: >>> On 18/10/2019 21:48, Richard Earnshaw wrote: >>>> Each patch should produce a working compiler (it did when it was >>>> originally written), though since the patch set has been re-ordered >>>> slightly there is a possibility that some of the intermediate steps >>>> may have missing test updates that are only cleaned up later. >>>> However, only the end of the series should be considered complete. >>>> I've kept the patch as a series to permit easier regression hunting >>>> should that prove necessary. >>> >>> Thanks for this information: my validation system was designed in >>> such a way that it will run the GCC testsuite after each of your >>> patches, so I'll keep in mind not to report regressions (I've noticed >>> several already). >>> >>> >>> I can perform a manual validation taking your 29 patches as a single >>> one and compare the results with those of the revision preceding the >>> one were you committed patch #1. Do you think it would be useful? >>> >>> >>> Christophe >>> >>> >> >> I think if you can filter out any that are removed by later patches >> and then report against the patch that caused the regression itself >> then that would be the best. But I realise that would be more work >> for you, so a round-up against the combined set would be OK. >> >> BTW, I'm aware of an issue with the compiler now generating >> >> <alu> reg, reg, shift <reg> >> >> in Thumb2; no need to report that again. >> >> Thanks, >> R. >> . >> > > > Hi Richard, > > The validation of the whole set shows 1 regression, which was also > reported by the validation of r277179 (early split most DImode > comparison operations) > > When GCC is configured as: > --target arm-none-eabi > --with-mode default > --with-cpu default > --with-fpu default > (that is, no --with-mode, --with-cpu, --with-fpu option) > I'm using binutils-2.28 and newlib-3.1.0 > > I can see: > FAIL: g++.dg/opt/pr36449.C -std=gnu++14 execution test > (whatever -std=gnu++XX option) That's strange. The assembler code generated for that test is unchanged from before the patch series, so I can't see how it can't be a problem in the test itself. What's more, I can't seem to reproduce this myself. Similarly, in my build the code for _Znwj, malloc, malloc_r and free_r are also unchanged, while the malloc_[un]lock functions are empty stubs (not surprising as we aren't multi-threaded). So the only thing that looks to have really changed are the linker offsets (some of the library code has changed, but I don't think it's really reached in practice, so shouldn't be relevant). > > I'm executing the tests using qemu-4.1.0 -cpu arm926 > The qemu traces shows that code enters main, then _Znwj (operator new), > then _malloc_r > The qemu traces end with: What do you mean by 'end with'? What's the failure mode of the test? A crash, or the test exiting with a failure code? > IN: _malloc_r^M > 0x00019224: e3a00ffe mov r0, #0x3f8^M > 0x00019228: e3a0c07f mov ip, #0x7f^M > 0x0001922c: e3a0e07e mov lr, #0x7e^M > 0x00019230: eafffe41 b #0x18b3c^M > ^M > R00=00049418 R01=00000000 R02=00000554 R03=00040000^M > R04=00000000 R05=08000008 R06=00049418 R07=00000000^M > R08=00000000 R09=00000000 R10=000492d8 R11=fffeb4b4^M > R12=00000060 R13=fffeb460 R14=00018b14 R15=00019224^M > PSR=20000010 --C- A usr32^M > ----------------^M > IN: _malloc_r^M > 0x00018b3c: e59f76f8 ldr r7, [pc, #0x6f8]^M > 0x00018b40: e0870000 add r0, r7, r0^M > 0x00018b44: e5903004 ldr r3, [r0, #4]^M > 0x00018b48: e2400008 sub r0, r0, #8^M > 0x00018b4c: e1500003 cmp r0, r3^M > 0x00018b50: 1a000005 bne #0x18b6c^M But this block neither jumps to, nor falls through to .... > ^M > R00=000003f8 R01=00000000 R02=00000554 R03=00040000^M > R04=00000000 R05=08000008 R06=00049418 R07=00000000^M > R08=00000000 R09=00000000 R10=000492d8 R11=fffeb4b4^M > R12=0000007f R13=fffeb460 R14=0000007e R15=00018b3c^M > PSR=20000010 --C- A usr32^M > R00=00049c30 R01=00000000 R02=00000554 R03=00049c30^M > R04=00000000 R05=08000008 R06=00049418 R07=00049840^M > R08=00000000 R09=00000000 R10=000492d8 R11=fffeb4b4^M > R12=0000007f R13=fffeb460 R14=0000007e R15=00018b54^M > PSR=60000010 -ZC- A usr32^M > ----------------^M > IN: _malloc_r^M ...here. So there's some trace missing by the looks of it; or some other problem. > 0x00019120: e1a02a0b lsl r2, fp, #0x14^M > 0x00019124: e1a02a22 lsr r2, r2, #0x14^M > 0x00019128: e3520000 cmp r2, #0^M > 0x0001912c: 1afffee7 bne #0x18cd0^M and the same here. > ^M > R00=0004b000 R01=08002108 R02=00049e40 R03=0004b000^M > R04=0004a8e0 R05=08000008 R06=00049418 R07=00049840^M > R08=08001000 R09=00000720 R10=00049e0c R11=0004b000^M > R12=0000007f R13=fffeb460 R14=00018ca0 R15=00019120^M > PSR=60000010 -ZC- A usr32^M > ----------------^M > IN: _malloc_r^M > 0x00019130: e5974008 ldr r4, [r7, #8]^M > 0x00019134: e0898008 add r8, sb, r8^M > 0x00019138: e3888001 orr r8, r8, #1^M > 0x0001913c: e5848004 str r8, [r4, #4]^M > 0x00019140: eaffff14 b #0x18d98^M > ^M > R00=0004b000 R01=08002108 R02=00000000 R03=0004b000^M > R04=0004a8e0 R05=08000008 R06=00049418 R07=00049840^M > R08=08001000 R09=00000720 R10=00049e0c R11=0004b000^M > R12=0000007f R13=fffeb460 R14=00018ca0 R15=00019130^M > PSR=60000010 -ZC- A usr32^M > R00=0004b000 R01=08002108 R02=00000000 R03=0004b000^M > R04=0004a8e0 R05=08000008 R06=00049418 R07=00049840^M > R08=08001721 R09=00000720 R10=00049e0c R11=0004b000^M > R12=0000007f R13=fffeb460 R14=00018ca0 R15=00018d98^M > PSR=60000010 -ZC- A usr32^M > > Christophe >
On 23/10/2019 15:21, Richard Earnshaw (lists) wrote: > On 23/10/2019 09:28, Christophe Lyon wrote: >> On 21/10/2019 14:24, Richard Earnshaw (lists) wrote: >>> On 21/10/2019 12:51, Christophe Lyon wrote: >>>> On 18/10/2019 21:48, Richard Earnshaw wrote: >>>>> Each patch should produce a working compiler (it did when it was >>>>> originally written), though since the patch set has been re-ordered >>>>> slightly there is a possibility that some of the intermediate steps >>>>> may have missing test updates that are only cleaned up later. >>>>> However, only the end of the series should be considered complete. >>>>> I've kept the patch as a series to permit easier regression hunting >>>>> should that prove necessary. >>>> >>>> Thanks for this information: my validation system was designed in such a way that it will run the GCC testsuite after each of your patches, so I'll keep in mind not to report regressions (I've noticed several already). >>>> >>>> >>>> I can perform a manual validation taking your 29 patches as a single one and compare the results with those of the revision preceding the one were you committed patch #1. Do you think it would be useful? >>>> >>>> >>>> Christophe >>>> >>>> >>> >>> I think if you can filter out any that are removed by later patches and then report against the patch that caused the regression itself then that would be the best. But I realise that would be more work for you, so a round-up against the combined set would be OK. >>> >>> BTW, I'm aware of an issue with the compiler now generating >>> >>> <alu> reg, reg, shift <reg> >>> >>> in Thumb2; no need to report that again. >>> >>> Thanks, >>> R. >>> . >>> >> >> >> Hi Richard, >> >> The validation of the whole set shows 1 regression, which was also reported by the validation of r277179 (early split most DImode comparison operations) >> >> When GCC is configured as: >> --target arm-none-eabi >> --with-mode default >> --with-cpu default >> --with-fpu default >> (that is, no --with-mode, --with-cpu, --with-fpu option) >> I'm using binutils-2.28 and newlib-3.1.0 >> >> I can see: >> FAIL: g++.dg/opt/pr36449.C -std=gnu++14 execution test >> (whatever -std=gnu++XX option) > > That's strange. The assembler code generated for that test is unchanged from before the patch series, so I can't see how it can't be a problem in the test itself. What's more, I can't seem to reproduce this myself. As you have noticed, I have created PR92207 to help understand this. > > Similarly, in my build the code for _Znwj, malloc, malloc_r and free_r are also unchanged, while the malloc_[un]lock functions are empty stubs (not surprising as we aren't multi-threaded). > > So the only thing that looks to have really changed are the linker offsets (some of the library code has changed, but I don't think it's really reached in practice, so shouldn't be relevant). > >> >> I'm executing the tests using qemu-4.1.0 -cpu arm926 >> The qemu traces shows that code enters main, then _Znwj (operator new), then _malloc_r >> The qemu traces end with: > > What do you mean by 'end with'? What's the failure mode of the test? A crash, or the test exiting with a failure code? > qemu complains with: qemu: uncaught target signal 11 (Segmentation fault) - core dumped Segmentation fault (core dumped) 'end with' because my automated validation builds do not keep the full execution traces (that would need too much disk space) >> IN: _malloc_r^M >> 0x00019224: e3a00ffe mov r0, #0x3f8^M >> 0x00019228: e3a0c07f mov ip, #0x7f^M >> 0x0001922c: e3a0e07e mov lr, #0x7e^M >> 0x00019230: eafffe41 b #0x18b3c^M >> ^M >> R00=00049418 R01=00000000 R02=00000554 R03=00040000^M >> R04=00000000 R05=08000008 R06=00049418 R07=00000000^M >> R08=00000000 R09=00000000 R10=000492d8 R11=fffeb4b4^M >> R12=00000060 R13=fffeb460 R14=00018b14 R15=00019224^M >> PSR=20000010 --C- A usr32^M >> ----------------^M >> IN: _malloc_r^M >> 0x00018b3c: e59f76f8 ldr r7, [pc, #0x6f8]^M >> 0x00018b40: e0870000 add r0, r7, r0^M >> 0x00018b44: e5903004 ldr r3, [r0, #4]^M >> 0x00018b48: e2400008 sub r0, r0, #8^M >> 0x00018b4c: e1500003 cmp r0, r3^M >> 0x00018b50: 1a000005 bne #0x18b6c^M > > But this block neither jumps to, nor falls through to .... >> ^M >> R00=000003f8 R01=00000000 R02=00000554 R03=00040000^M >> R04=00000000 R05=08000008 R06=00049418 R07=00000000^M >> R08=00000000 R09=00000000 R10=000492d8 R11=fffeb4b4^M >> R12=0000007f R13=fffeb460 R14=0000007e R15=00018b3c^M >> PSR=20000010 --C- A usr32^M >> R00=00049c30 R01=00000000 R02=00000554 R03=00049c30^M >> R04=00000000 R05=08000008 R06=00049418 R07=00049840^M >> R08=00000000 R09=00000000 R10=000492d8 R11=fffeb4b4^M >> R12=0000007f R13=fffeb460 R14=0000007e R15=00018b54^M >> PSR=60000010 -ZC- A usr32^M >> ----------------^M >> IN: _malloc_r^M > > ...here. So there's some trace missing by the looks of it; or some other problem. > >> 0x00019120: e1a02a0b lsl r2, fp, #0x14^M >> 0x00019124: e1a02a22 lsr r2, r2, #0x14^M >> 0x00019128: e3520000 cmp r2, #0^M >> 0x0001912c: 1afffee7 bne #0x18cd0^M > > and the same here. yes, qemu traces are 'incomplete'. > >> ^M >> R00=0004b000 R01=08002108 R02=00049e40 R03=0004b000^M >> R04=0004a8e0 R05=08000008 R06=00049418 R07=00049840^M >> R08=08001000 R09=00000720 R10=00049e0c R11=0004b000^M >> R12=0000007f R13=fffeb460 R14=00018ca0 R15=00019120^M >> PSR=60000010 -ZC- A usr32^M >> ----------------^M >> IN: _malloc_r^M >> 0x00019130: e5974008 ldr r4, [r7, #8]^M >> 0x00019134: e0898008 add r8, sb, r8^M >> 0x00019138: e3888001 orr r8, r8, #1^M >> 0x0001913c: e5848004 str r8, [r4, #4]^M >> 0x00019140: eaffff14 b #0x18d98^M >> ^M >> R00=0004b000 R01=08002108 R02=00000000 R03=0004b000^M >> R04=0004a8e0 R05=08000008 R06=00049418 R07=00049840^M >> R08=08001000 R09=00000720 R10=00049e0c R11=0004b000^M >> R12=0000007f R13=fffeb460 R14=00018ca0 R15=00019130^M >> PSR=60000010 -ZC- A usr32^M >> R00=0004b000 R01=08002108 R02=00000000 R03=0004b000^M >> R04=0004a8e0 R05=08000008 R06=00049418 R07=00049840^M >> R08=08001721 R09=00000720 R10=00049e0c R11=0004b000^M >> R12=0000007f R13=fffeb460 R14=00018ca0 R15=00018d98^M >> PSR=60000010 -ZC- A usr32^M >> >> Christophe >> > > . >
On 24/10/2019 11:16, Christophe Lyon wrote: > On 23/10/2019 15:21, Richard Earnshaw (lists) wrote: >> On 23/10/2019 09:28, Christophe Lyon wrote: >>> On 21/10/2019 14:24, Richard Earnshaw (lists) wrote: >>>> On 21/10/2019 12:51, Christophe Lyon wrote: >>>>> On 18/10/2019 21:48, Richard Earnshaw wrote: >>>>>> Each patch should produce a working compiler (it did when it was >>>>>> originally written), though since the patch set has been re-ordered >>>>>> slightly there is a possibility that some of the intermediate steps >>>>>> may have missing test updates that are only cleaned up later. >>>>>> However, only the end of the series should be considered complete. >>>>>> I've kept the patch as a series to permit easier regression hunting >>>>>> should that prove necessary. >>>>> >>>>> Thanks for this information: my validation system was designed in >>>>> such a way that it will run the GCC testsuite after each of your >>>>> patches, so I'll keep in mind not to report regressions (I've >>>>> noticed several already). >>>>> >>>>> >>>>> I can perform a manual validation taking your 29 patches as a >>>>> single one and compare the results with those of the revision >>>>> preceding the one were you committed patch #1. Do you think it >>>>> would be useful? >>>>> >>>>> >>>>> Christophe >>>>> >>>>> >>>> >>>> I think if you can filter out any that are removed by later patches >>>> and then report against the patch that caused the regression itself >>>> then that would be the best. But I realise that would be more work >>>> for you, so a round-up against the combined set would be OK. >>>> >>>> BTW, I'm aware of an issue with the compiler now generating >>>> >>>> <alu> reg, reg, shift <reg> >>>> >>>> in Thumb2; no need to report that again. >>>> >>>> Thanks, >>>> R. >>>> . >>>> >>> >>> >>> Hi Richard, >>> >>> The validation of the whole set shows 1 regression, which was also >>> reported by the validation of r277179 (early split most DImode >>> comparison operations) >>> >>> When GCC is configured as: >>> --target arm-none-eabi >>> --with-mode default >>> --with-cpu default >>> --with-fpu default >>> (that is, no --with-mode, --with-cpu, --with-fpu option) >>> I'm using binutils-2.28 and newlib-3.1.0 >>> >>> I can see: >>> FAIL: g++.dg/opt/pr36449.C -std=gnu++14 execution test >>> (whatever -std=gnu++XX option) >> >> That's strange. The assembler code generated for that test is >> unchanged from before the patch series, so I can't see how it can't be >> a problem in the test itself. What's more, I can't seem to reproduce >> this myself. > > As you have noticed, I have created PR92207 to help understand this. > >> >> Similarly, in my build the code for _Znwj, malloc, malloc_r and free_r >> are also unchanged, while the malloc_[un]lock functions are empty >> stubs (not surprising as we aren't multi-threaded). >> >> So the only thing that looks to have really changed are the linker >> offsets (some of the library code has changed, but I don't think it's >> really reached in practice, so shouldn't be relevant). >> >>> >>> I'm executing the tests using qemu-4.1.0 -cpu arm926 >>> The qemu traces shows that code enters main, then _Znwj (operator >>> new), then _malloc_r >>> The qemu traces end with: >> >> What do you mean by 'end with'? What's the failure mode of the test? >> A crash, or the test exiting with a failure code? >> > qemu complains with: > qemu: uncaught target signal 11 (Segmentation fault) - core dumped > Segmentation fault (core dumped) > > 'end with' because my automated validation builds do not keep the full > execution traces (that would need too much disk space) > As I've said in the PR, this looks like a bug in the qemu+newlib code. We call sbrk() which says, OK, but then the page isn't mapped by qemu into the process and it then faults. So I think these changes are off the hook, it's just bad luck that they expose the issue at this point in time. R.
On 24/10/2019 17:10, Richard Earnshaw (lists) wrote: > On 24/10/2019 11:16, Christophe Lyon wrote: >> On 23/10/2019 15:21, Richard Earnshaw (lists) wrote: >>> On 23/10/2019 09:28, Christophe Lyon wrote: >>>> On 21/10/2019 14:24, Richard Earnshaw (lists) wrote: >>>>> On 21/10/2019 12:51, Christophe Lyon wrote: >>>>>> On 18/10/2019 21:48, Richard Earnshaw wrote: >>>>>>> Each patch should produce a working compiler (it did when it was >>>>>>> originally written), though since the patch set has been re-ordered >>>>>>> slightly there is a possibility that some of the intermediate steps >>>>>>> may have missing test updates that are only cleaned up later. >>>>>>> However, only the end of the series should be considered complete. >>>>>>> I've kept the patch as a series to permit easier regression hunting >>>>>>> should that prove necessary. >>>>>> >>>>>> Thanks for this information: my validation system was designed in >>>>>> such a way that it will run the GCC testsuite after each of your >>>>>> patches, so I'll keep in mind not to report regressions (I've >>>>>> noticed several already). >>>>>> >>>>>> >>>>>> I can perform a manual validation taking your 29 patches as a >>>>>> single one and compare the results with those of the revision >>>>>> preceding the one were you committed patch #1. Do you think it >>>>>> would be useful? >>>>>> >>>>>> >>>>>> Christophe >>>>>> >>>>>> >>>>> >>>>> I think if you can filter out any that are removed by later patches >>>>> and then report against the patch that caused the regression itself >>>>> then that would be the best. But I realise that would be more work >>>>> for you, so a round-up against the combined set would be OK. >>>>> >>>>> BTW, I'm aware of an issue with the compiler now generating >>>>> >>>>> <alu> reg, reg, shift <reg> >>>>> >>>>> in Thumb2; no need to report that again. >>>>> >>>>> Thanks, >>>>> R. >>>>> . >>>>> >>>> >>>> >>>> Hi Richard, >>>> >>>> The validation of the whole set shows 1 regression, which was also >>>> reported by the validation of r277179 (early split most DImode >>>> comparison operations) >>>> >>>> When GCC is configured as: >>>> --target arm-none-eabi >>>> --with-mode default >>>> --with-cpu default >>>> --with-fpu default >>>> (that is, no --with-mode, --with-cpu, --with-fpu option) >>>> I'm using binutils-2.28 and newlib-3.1.0 >>>> >>>> I can see: >>>> FAIL: g++.dg/opt/pr36449.C -std=gnu++14 execution test >>>> (whatever -std=gnu++XX option) >>> >>> That's strange. The assembler code generated for that test is >>> unchanged from before the patch series, so I can't see how it can't >>> be a problem in the test itself. What's more, I can't seem to >>> reproduce this myself. >> >> As you have noticed, I have created PR92207 to help understand this. >> >>> >>> Similarly, in my build the code for _Znwj, malloc, malloc_r and >>> free_r are also unchanged, while the malloc_[un]lock functions are >>> empty stubs (not surprising as we aren't multi-threaded). >>> >>> So the only thing that looks to have really changed are the linker >>> offsets (some of the library code has changed, but I don't think it's >>> really reached in practice, so shouldn't be relevant). >>> >>>> >>>> I'm executing the tests using qemu-4.1.0 -cpu arm926 >>>> The qemu traces shows that code enters main, then _Znwj (operator >>>> new), then _malloc_r >>>> The qemu traces end with: >>> >>> What do you mean by 'end with'? What's the failure mode of the test? >>> A crash, or the test exiting with a failure code? >>> >> qemu complains with: >> qemu: uncaught target signal 11 (Segmentation fault) - core dumped >> Segmentation fault (core dumped) >> >> 'end with' because my automated validation builds do not keep the full >> execution traces (that would need too much disk space) >> > > As I've said in the PR, this looks like a bug in the qemu+newlib code. > We call sbrk() which says, OK, but then the page isn't mapped by qemu > into the process and it then faults. > > So I think these changes are off the hook, it's just bad luck that they > expose the issue at this point in time. > > R. > I've closed the PR as invalid, because this is a newlib bug that is fixed on trunk. https://sourceware.org/ml/newlib/2019/msg00413.html R.
On Fri, 25 Oct 2019 at 12:08, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote: > > On 24/10/2019 17:10, Richard Earnshaw (lists) wrote: > > On 24/10/2019 11:16, Christophe Lyon wrote: > >> On 23/10/2019 15:21, Richard Earnshaw (lists) wrote: > >>> On 23/10/2019 09:28, Christophe Lyon wrote: > >>>> On 21/10/2019 14:24, Richard Earnshaw (lists) wrote: > >>>>> On 21/10/2019 12:51, Christophe Lyon wrote: > >>>>>> On 18/10/2019 21:48, Richard Earnshaw wrote: > >>>>>>> Each patch should produce a working compiler (it did when it was > >>>>>>> originally written), though since the patch set has been re-ordered > >>>>>>> slightly there is a possibility that some of the intermediate steps > >>>>>>> may have missing test updates that are only cleaned up later. > >>>>>>> However, only the end of the series should be considered complete. > >>>>>>> I've kept the patch as a series to permit easier regression hunting > >>>>>>> should that prove necessary. > >>>>>> > >>>>>> Thanks for this information: my validation system was designed in > >>>>>> such a way that it will run the GCC testsuite after each of your > >>>>>> patches, so I'll keep in mind not to report regressions (I've > >>>>>> noticed several already). > >>>>>> > >>>>>> > >>>>>> I can perform a manual validation taking your 29 patches as a > >>>>>> single one and compare the results with those of the revision > >>>>>> preceding the one were you committed patch #1. Do you think it > >>>>>> would be useful? > >>>>>> > >>>>>> > >>>>>> Christophe > >>>>>> > >>>>>> > >>>>> > >>>>> I think if you can filter out any that are removed by later patches > >>>>> and then report against the patch that caused the regression itself > >>>>> then that would be the best. But I realise that would be more work > >>>>> for you, so a round-up against the combined set would be OK. > >>>>> > >>>>> BTW, I'm aware of an issue with the compiler now generating > >>>>> > >>>>> <alu> reg, reg, shift <reg> > >>>>> > >>>>> in Thumb2; no need to report that again. > >>>>> > >>>>> Thanks, > >>>>> R. > >>>>> . > >>>>> > >>>> > >>>> > >>>> Hi Richard, > >>>> > >>>> The validation of the whole set shows 1 regression, which was also > >>>> reported by the validation of r277179 (early split most DImode > >>>> comparison operations) > >>>> > >>>> When GCC is configured as: > >>>> --target arm-none-eabi > >>>> --with-mode default > >>>> --with-cpu default > >>>> --with-fpu default > >>>> (that is, no --with-mode, --with-cpu, --with-fpu option) > >>>> I'm using binutils-2.28 and newlib-3.1.0 > >>>> > >>>> I can see: > >>>> FAIL: g++.dg/opt/pr36449.C -std=gnu++14 execution test > >>>> (whatever -std=gnu++XX option) > >>> > >>> That's strange. The assembler code generated for that test is > >>> unchanged from before the patch series, so I can't see how it can't > >>> be a problem in the test itself. What's more, I can't seem to > >>> reproduce this myself. > >> > >> As you have noticed, I have created PR92207 to help understand this. > >> > >>> > >>> Similarly, in my build the code for _Znwj, malloc, malloc_r and > >>> free_r are also unchanged, while the malloc_[un]lock functions are > >>> empty stubs (not surprising as we aren't multi-threaded). > >>> > >>> So the only thing that looks to have really changed are the linker > >>> offsets (some of the library code has changed, but I don't think it's > >>> really reached in practice, so shouldn't be relevant). > >>> > >>>> > >>>> I'm executing the tests using qemu-4.1.0 -cpu arm926 > >>>> The qemu traces shows that code enters main, then _Znwj (operator > >>>> new), then _malloc_r > >>>> The qemu traces end with: > >>> > >>> What do you mean by 'end with'? What's the failure mode of the test? > >>> A crash, or the test exiting with a failure code? > >>> > >> qemu complains with: > >> qemu: uncaught target signal 11 (Segmentation fault) - core dumped > >> Segmentation fault (core dumped) > >> > >> 'end with' because my automated validation builds do not keep the full > >> execution traces (that would need too much disk space) > >> > > > > As I've said in the PR, this looks like a bug in the qemu+newlib code. > > We call sbrk() which says, OK, but then the page isn't mapped by qemu > > into the process and it then faults. > > > > So I think these changes are off the hook, it's just bad luck that they > > expose the issue at this point in time. > > > > R. > > > > I've closed the PR as invalid, because this is a newlib bug that is > fixed on trunk. https://sourceware.org/ml/newlib/2019/msg00413.html > Thanks for the analysis. It looks like I have to upgrade the newlib version I'm using for validations. Christophe > R.