Message ID | CAKdteOafjnhgqshyDvk0Q5r_yT4cVgz11PrPi=ye5oOCNhQaHg@mail.gmail.com |
---|---|
State | New |
Headers | show |
On 22/09/16 15:11, Nick Clifton wrote: > Hi Christophe, > >> OK? > > Approved - please apply. Just one small nit: > > + PLT stub. If a long branch stub is needed, we'll add a > + Thumb->Arm one and branch directly to the ARM PLT entry. > + Here, we have to check if a pre-PLT Thumb->ARM stub > + is needed and if it will be close enough. > + */ > > Please move the closing */ back to the end of the line above it... > > Cheers > Nick > So this patch got me wondering. If the PLT slot is so far away that we need an indirect jump to get there, why don't we clone the PLT code sequence at the veneer location? As I recall there's nothing architectural about placing all the PLT slots together, it's just more convenient to do that. We could even have the clone in Thumb-2 code if that's appropriate, so that it's compatible with tail calls. R.
Hi, Sorry for the delay, I've just committed the patch as approved by Nick. Tristan, is it OK to backport this patch to 2.27 ? 2.26 ? On 23 September 2016 at 01:25, Alan Modra <amodra@gmail.com> wrote: > On Thu, Sep 22, 2016 at 04:39:30PM +0100, Richard Earnshaw (lists) wrote: >> So this patch got me wondering. If the PLT slot is so far away that we >> need an indirect jump to get there, why don't we clone the PLT code >> sequence at the veneer location? As I recall there's nothing >> architectural about placing all the PLT slots together, it's just more >> convenient to do that. We could even have the clone in Thumb-2 code if >> that's appropriate, so that it's compatible with tail calls. > If there are multiple PLT slots for the same function, doesn't this break the uniqueness of a function pointer? > Yeah, that's what ppc64 does. It does have one downside in that > synthetic plt symbols are not so easy to add. > > /* FIXME: It would be very much nicer to put sym@plt on the > stub rather than on the glink branch table entry. The > objdump disassembler would then use a sensible symbol > name on plt calls. The difficulty in doing so is > a) finding the stubs, and, > b) matching stubs against plt entries, and, > c) there can be multiple stubs for a given plt entry. > > Solving (a) could be done by code scanning, but older > ppc64 binaries used different stubs to current code. > (b) is the tricky one since you need to known the toc > pointer for at least one function that uses a pic stub to > be able to calculate the plt address referenced. > (c) means gdb would need to set multiple breakpoints (or > find the glink branch itself) when setting breakpoints > for pending shared library loads. */ > > We avoid the problem by enabling --emit-stub-syms by default. > > -- > Alan Modra > Australia Development Lab, IBM
On 28/09/16 00:57, Christophe Lyon wrote: > Hi, > > Sorry for the delay, I've just committed the patch as approved by Nick. > > Tristan, is it OK to backport this patch to 2.27 ? 2.26 ? > > On 23 September 2016 at 01:25, Alan Modra <amodra@gmail.com> wrote: >> On Thu, Sep 22, 2016 at 04:39:30PM +0100, Richard Earnshaw (lists) wrote: >>> So this patch got me wondering. If the PLT slot is so far away that we >>> need an indirect jump to get there, why don't we clone the PLT code >>> sequence at the veneer location? As I recall there's nothing >>> architectural about placing all the PLT slots together, it's just more >>> convenient to do that. We could even have the clone in Thumb-2 code if >>> that's appropriate, so that it's compatible with tail calls. >> > > If there are multiple PLT slots for the same function, > doesn't this break the uniqueness of a function pointer? > Yes, you have to ensure that only one plt stub is used for function pointers (this is true globally for the program, so each shared library with a PLT for a function f() has to be able to sort this out), but that's a relatively small issue that's well within the linkers control. Duplicate stubs would be entirely private and can't leak into function pointer addresses. R. >> Yeah, that's what ppc64 does. It does have one downside in that >> synthetic plt symbols are not so easy to add. >> >> /* FIXME: It would be very much nicer to put sym@plt on the >> stub rather than on the glink branch table entry. The >> objdump disassembler would then use a sensible symbol >> name on plt calls. The difficulty in doing so is >> a) finding the stubs, and, >> b) matching stubs against plt entries, and, >> c) there can be multiple stubs for a given plt entry. >> >> Solving (a) could be done by code scanning, but older >> ppc64 binaries used different stubs to current code. >> (b) is the tricky one since you need to known the toc >> pointer for at least one function that uses a pic stub to >> be able to calculate the plt address referenced. >> (c) means gdb would need to set multiple breakpoints (or >> find the glink branch itself) when setting breakpoints >> for pending shared library loads. */ >> >> We avoid the problem by enabling --emit-stub-syms by default. >> >> -- >> Alan Modra >> Australia Development Lab, IBM >
Hi Tristan, Is it OK to backport this patch to 2.27 ? 2.26? Thanks, Christophe On 28 September 2016 at 01:57, Christophe Lyon <christophe.lyon@linaro.org> wrote: > Hi, > > Sorry for the delay, I've just committed the patch as approved by Nick. > > Tristan, is it OK to backport this patch to 2.27 ? 2.26 ? > > On 23 September 2016 at 01:25, Alan Modra <amodra@gmail.com> wrote: >> On Thu, Sep 22, 2016 at 04:39:30PM +0100, Richard Earnshaw (lists) wrote: >>> So this patch got me wondering. If the PLT slot is so far away that we >>> need an indirect jump to get there, why don't we clone the PLT code >>> sequence at the veneer location? As I recall there's nothing >>> architectural about placing all the PLT slots together, it's just more >>> convenient to do that. We could even have the clone in Thumb-2 code if >>> that's appropriate, so that it's compatible with tail calls. >> > > If there are multiple PLT slots for the same function, > doesn't this break the uniqueness of a function pointer? > >> Yeah, that's what ppc64 does. It does have one downside in that >> synthetic plt symbols are not so easy to add. >> >> /* FIXME: It would be very much nicer to put sym@plt on the >> stub rather than on the glink branch table entry. The >> objdump disassembler would then use a sensible symbol >> name on plt calls. The difficulty in doing so is >> a) finding the stubs, and, >> b) matching stubs against plt entries, and, >> c) there can be multiple stubs for a given plt entry. >> >> Solving (a) could be done by code scanning, but older >> ppc64 binaries used different stubs to current code. >> (b) is the tricky one since you need to known the toc >> pointer for at least one function that uses a pic stub to >> be able to calculate the plt address referenced. >> (c) means gdb would need to set multiple breakpoints (or >> find the glink branch itself) when setting breakpoints >> for pending shared library loads. */ >> >> We avoid the problem by enabling --emit-stub-syms by default. >> >> -- >> Alan Modra >> Australia Development Lab, IBM
Tristian, Ping for backport to 2.26 / 2.27 ? Thanks, Christophe On 3 October 2016 at 09:06, Christophe Lyon <christophe.lyon@linaro.org> wrote: > Hi Tristan, > > Is it OK to backport this patch to 2.27 ? 2.26? > > Thanks, > > Christophe > > > On 28 September 2016 at 01:57, Christophe Lyon > <christophe.lyon@linaro.org> wrote: >> Hi, >> >> Sorry for the delay, I've just committed the patch as approved by Nick. >> >> Tristan, is it OK to backport this patch to 2.27 ? 2.26 ? >> >> On 23 September 2016 at 01:25, Alan Modra <amodra@gmail.com> wrote: >>> On Thu, Sep 22, 2016 at 04:39:30PM +0100, Richard Earnshaw (lists) wrote: >>>> So this patch got me wondering. If the PLT slot is so far away that we >>>> need an indirect jump to get there, why don't we clone the PLT code >>>> sequence at the veneer location? As I recall there's nothing >>>> architectural about placing all the PLT slots together, it's just more >>>> convenient to do that. We could even have the clone in Thumb-2 code if >>>> that's appropriate, so that it's compatible with tail calls. >>> >> >> If there are multiple PLT slots for the same function, >> doesn't this break the uniqueness of a function pointer? >> >>> Yeah, that's what ppc64 does. It does have one downside in that >>> synthetic plt symbols are not so easy to add. >>> >>> /* FIXME: It would be very much nicer to put sym@plt on the >>> stub rather than on the glink branch table entry. The >>> objdump disassembler would then use a sensible symbol >>> name on plt calls. The difficulty in doing so is >>> a) finding the stubs, and, >>> b) matching stubs against plt entries, and, >>> c) there can be multiple stubs for a given plt entry. >>> >>> Solving (a) could be done by code scanning, but older >>> ppc64 binaries used different stubs to current code. >>> (b) is the tricky one since you need to known the toc >>> pointer for at least one function that uses a pic stub to >>> be able to calculate the plt address referenced. >>> (c) means gdb would need to set multiple breakpoints (or >>> find the glink branch itself) when setting breakpoints >>> for pending shared library loads. */ >>> >>> We avoid the problem by enabling --emit-stub-syms by default. >>> >>> -- >>> Alan Modra >>> Australia Development Lab, IBM
On 10 October 2016 at 10:32, Tristan Gingold <gingold@adacore.com> wrote: > >> On 10 Oct 2016, at 10:22, Christophe Lyon <christophe.lyon@linaro.org> wrote: >> >> Tristian, >> >> Ping for backport to 2.26 / 2.27 ? > > Yes, that's ok for 2.27 (sorry I missed your mail). > There will be no more 2.26 releases. > OK thanks, done for 2.27. Christophe > Tristan. >
diff --git a/bfd/elf32-arm.c b/bfd/elf32-arm.c index 6e68be1..8c0d716 100644 --- a/bfd/elf32-arm.c +++ b/bfd/elf32-arm.c @@ -3945,17 +3945,44 @@ arm_type_of_stub (struct bfd_link_info *info, /* Note when dealing with PLT entries: the main PLT stub is in ARM mode, so if the branch is in Thumb mode, another Thumb->ARM stub will be inserted later just before the ARM - PLT stub. We don't take this extra distance into account - here, because if a long branch stub is needed, we'll add a - Thumb->Arm one and branch directly to the ARM PLT entry - because it avoids spreading offset corrections in several - places. */ + PLT stub. If a long branch stub is needed, we'll add a + Thumb->Arm one and branch directly to the ARM PLT entry. + Here, we have to check if a pre-PLT Thumb->ARM stub + is needed and if it will be close enough. + */ destination = (splt->output_section->vma + splt->output_offset + root_plt->offset); st_type = STT_FUNC; - branch_type = ST_BRANCH_TO_ARM; + + /* Thumb branch/call to PLT: it can become a branch to ARM + or to Thumb. We must perform the same checks and + corrections as in elf32_arm_final_link_relocate. */ + if ((r_type == R_ARM_THM_CALL) + || (r_type == R_ARM_THM_JUMP24)) + { + if (globals->use_blx + && r_type == R_ARM_THM_CALL + && !thumb_only) + { + /* If the Thumb BLX instruction is available, convert + the BL to a BLX instruction to call the ARM-mode + PLT entry. */ + branch_type = ST_BRANCH_TO_ARM; + } + else + { + if (!thumb_only) + /* Target the Thumb stub before the ARM PLT entry. */ + destination -= PLT_THUMB_STUB_SIZE; + branch_type = ST_BRANCH_TO_THUMB; + } + } + else + { + branch_type = ST_BRANCH_TO_ARM; + } } } /* Calls to STT_GNU_IFUNC symbols should go through a PLT. */ @@ -3991,6 +4018,15 @@ arm_type_of_stub (struct bfd_link_info *info, || (r_type == R_ARM_THM_JUMP19)) && !use_plt)) { + /* If we need to insert a Thumb-Thumb long branch stub to a + PLT, use one that branches directly to the ARM PLT + stub. If we pretended we'd use the pre-PLT Thumb->ARM + stub, undo this now. */ + if ((branch_type == ST_BRANCH_TO_THUMB) && use_plt && !thumb_only) { + branch_type = ST_BRANCH_TO_ARM; + branch_offset += PLT_THUMB_STUB_SIZE; + } + if (branch_type == ST_BRANCH_TO_THUMB) { /* Thumb to thumb. */ diff --git a/ld/testsuite/ld-arm/arm-elf.exp b/ld/testsuite/ld-arm/arm-elf.exp index 24d0b4c..02a35f4 100644 --- a/ld/testsuite/ld-arm/arm-elf.exp +++ b/ld/testsuite/ld-arm/arm-elf.exp @@ -559,6 +559,12 @@ set armeabitests_nonacl { {readelf -Ds farcall-mixed-app.sym}} "farcall-mixed-app-v5"} + {"Mixed ARM/Thumb2 dynamic application with farcalls" "tmpdir/mixed-lib.so -T arm-dyn.ld --section-start .mid_thumb=0x10081c0 --section-start .far_arm=0x2100000 --section-start .far_thumb=0x2200000" "" "" + {farcall-mixed-app2.s} + {{objdump -fdw farcall-mixed-app2.d} {objdump -Rw farcall-mixed-app2.r} + {readelf -Ds farcall-mixed-app2.sym}} + "farcall-mixed-app2"} + {"Mixed ARM/Thumb shared library with long branches (v4t)" "-shared -T arm-lib.ld" "" "-march=armv4t" {farcall-mixed-lib1.s farcall-mixed-lib2.s} {{objdump -fdw farcall-mixed-lib-v4t.d}} diff --git a/ld/testsuite/ld-arm/farcall-mixed-app2.d b/ld/testsuite/ld-arm/farcall-mixed-app2.d new file mode 100644 index 0000000..54338d0 --- /dev/null +++ b/ld/testsuite/ld-arm/farcall-mixed-app2.d @@ -0,0 +1,99 @@ + +tmpdir/farcall-mixed-app2: file format elf32-(little|big)arm +architecture: arm.*, flags 0x00000112: +EXEC_P, HAS_SYMS, D_PAGED +start address 0x.* + +Disassembly of section .plt: + +.* <lib_func2@plt-0x14>: + .*: e52de004 push {lr} ; \(str lr, \[sp, #-4\]!\) + .*: e59fe004 ldr lr, \[pc, #4\] ; .* <lib_func2@plt-0x4> + .*: e08fe00e add lr, pc, lr + .*: e5bef008 ldr pc, \[lr, #8\]! + .*: .* +.* <lib_func2@plt>: + .*: 4778 bx pc + .*: 46c0 nop ; \(mov r8, r8\) + .*: e28fc6.* add ip, pc, #.* + .*: e28cca.* add ip, ip, #.* ; 0x.* + .*: e5bcf.* ldr pc, \[ip, #.*\]!.* +.* <lib_func1@plt>: + .*: e28fc6.* add ip, pc, #.* + .*: e28cca.* add ip, ip, #.* ; 0x.* + .*: e5bcf.* ldr pc, \[ip, #.*\]!.* + +Disassembly of section .text: + +.* <_start>: + .*: e1a0c00d mov ip, sp + .*: e92dd800 push {fp, ip, lr, pc} + .*: eb000008 bl .* <__app_func_veneer> + .*: ebfffff6 bl .* <lib_func1@plt> + .*: ebfffff2 bl .* <lib_func2@plt\+0x4> + .*: e89d6800 ldm sp, {fp, sp, lr} + .*: e12fff1e bx lr + .*: e1a00000 nop ; \(mov r0, r0\) + +.* <app_tfunc_close>: + .*: b500 push {lr} + .*: f7ff efde blx 81e0 <lib_func2@plt\+0x4> + .*: bd00 pop {pc} + .*: 4770 bx lr + .*: 46c0 nop ; \(mov r8, r8\) +#... + +.* <__app_func_veneer>: + .*: e51ff004 ldr pc, \[pc, #-4\] ; 8234 <__app_func_veneer\+0x4> + .*: 02100000 .word 0x02100000 + +Disassembly of section .mid_thumb: + +.* <mid_tfunc>: +#... + .*: f400 9000 b.w .* <lib_func2@plt> + .*: f000 b800 b.w .* <__lib_func2_from_thumb> + +.* <__lib_func2_from_thumb>: + .*: 4778 bx pc + .*: 46c0 nop ; \(mov r8, r8\) + .*: e51ff004 ldr pc, \[pc, #-4\] ; 10081e8 <__lib_func2_from_thumb\+0x8> + .*: 000081e0 .word 0x000081e0 + .*: 00000000 .word 0x00000000 + +Disassembly of section .far_arm: + +.* <app_func>: + .*: e1a0c00d mov ip, sp + .*: e92dd800 push {fp, ip, lr, pc} + .*: eb000006 bl .* <__lib_func1_veneer> + .*: eb000007 bl .* <__lib_func2_veneer> + .*: e89d6800 ldm sp, {fp, sp, lr} + .*: e12fff1e bx lr + .*: e1a00000 nop ; \(mov r0, r0\) + .*: e1a00000 nop ; \(mov r0, r0\) + +.* <app_func2>: + .*: e12fff1e bx lr +#... + +.* <__lib_func1_veneer>: + .*: e51ff004 ldr pc, \[pc, #-4\] ; .* <__lib_func1_veneer\+0x4> + .*: 000081ec .word 0x000081ec +.* <__lib_func2_veneer>: + .*: e51ff004 ldr pc, \[pc, #-4\] ; .* <__lib_func2_veneer\+0x4> + .*: 000081e0 .word 0x000081e0 + +Disassembly of section .far_thumb: + +.* <app_tfunc>: + .*: b500 push {lr} + .*: f000 e806 blx .* <__lib_func2_from_thumb> + .*: bd00 pop {pc} + .*: 4770 bx lr + .*: 46c0 nop ; \(mov r8, r8\) +#... + +.* <__lib_func2_from_thumb>: + .*: e51ff004 ldr pc, \[pc, #-4\] ; 2200014 <__lib_func2_from_thumb\+0x4> + .*: 000081e0 .word 0x000081e0 diff --git a/ld/testsuite/ld-arm/farcall-mixed-app2.r b/ld/testsuite/ld-arm/farcall-mixed-app2.r new file mode 100644 index 0000000..910a361 --- /dev/null +++ b/ld/testsuite/ld-arm/farcall-mixed-app2.r @@ -0,0 +1,10 @@ + +tmpdir/farcall-mixed-app.*: file format elf32-(little|big)arm + +DYNAMIC RELOCATION RECORDS +OFFSET TYPE VALUE +.* R_ARM_COPY data_obj +.* R_ARM_JUMP_SLOT lib_func2 +.* R_ARM_JUMP_SLOT lib_func1 + + diff --git a/ld/testsuite/ld-arm/farcall-mixed-app2.s b/ld/testsuite/ld-arm/farcall-mixed-app2.s new file mode 100644 index 0000000..ee315e9 --- /dev/null +++ b/ld/testsuite/ld-arm/farcall-mixed-app2.s @@ -0,0 +1,76 @@ + .text + .p2align 4 + .globl _start +_start: + mov ip, sp + stmdb sp!, {r11, ip, lr, pc} + bl app_func + bl lib_func1 + bl lib_func2 + ldmia sp, {r11, sp, lr} + bx lr + + .p2align 4 + .globl app_tfunc_close + .type app_tfunc_close,%function + .thumb_func + .code 16 +app_tfunc_close: + push {lr} + bl lib_func2 + pop {pc} + bx lr + +@ We will place the section .mid_thumb at 0xFFFEF8. +@ Just far enough for XXXX + .section .mid_thumb, "xa" + + .p2align 4 + .globl mid_tfunc + .type mid_tfunc,%function + .thumb_func + .code 16 +mid_tfunc: + .syntax unified + .space 24 + b.w lib_func2 + b.w lib_func2 + +@ We will place the section .far_arm at 0x2100000. + .section .far_arm, "xa" + + .arm + .p2align 4 + .globl app_func + .type app_func,%function +app_func: + mov ip, sp + stmdb sp!, {r11, ip, lr, pc} + bl lib_func1 + bl lib_func2 + ldmia sp, {r11, sp, lr} + bx lr + + .arm + .p2align 4 + .globl app_func2 + .type app_func2,%function +app_func2: + bx lr + +@ We will place the section .far_thumb at 0x2200000. + .section .far_thumb, "xa" + + .p2align 4 + .globl app_tfunc + .type app_tfunc,%function + .thumb_func + .code 16 +app_tfunc: + push {lr} + bl lib_func2 + pop {pc} + bx lr + + .data + .long data_obj diff --git a/ld/testsuite/ld-arm/farcall-mixed-app2.sym b/ld/testsuite/ld-arm/farcall-mixed-app2.sym new file mode 100644 index 0000000..1d3bd1d --- /dev/null +++ b/ld/testsuite/ld-arm/farcall-mixed-app2.sym @@ -0,0 +1,15 @@ + +Symbol table for image: + +Num +Buc: +Value +Size +Type +Bind +Vis +Ndx +Name + +.. +..: ........ +0 +NOTYPE +GLOBAL +DEFAULT +11 _edata + +.. +..: ........ +0 +NOTYPE +GLOBAL +DEFAULT +12 __bss_start__ + +.. +..: ........ +0 +NOTYPE +GLOBAL +DEFAULT +12 _end + +.. +..: ........ +4 +OBJECT +GLOBAL +DEFAULT +12 data_obj + +.. +..: ........ +0 +NOTYPE +GLOBAL +DEFAULT +12 __bss_end__ + +.. +..: 0*[^0]*.* +0 +FUNC +GLOBAL +DEFAULT +UND lib_func1 + +.. +..: ........ +0 +NOTYPE +GLOBAL +DEFAULT +11 __data_start + +.. +..: ........ +0 +NOTYPE +GLOBAL +DEFAULT +12 __end__ + +.. +..: ........ +0 +NOTYPE +GLOBAL +DEFAULT +12 __bss_start + +.. +..: .......0 +0 +FUNC +GLOBAL +DEFAULT +15 app_func2 + +.. +..: 0*[^0]*.* +0 +FUNC +GLOBAL +DEFAULT +UND lib_func2 + +.. +..: ........ +0 +NOTYPE +GLOBAL +DEFAULT +12 _bss_end__