Message ID: 20230109135828.879136-1-mark.rutland@arm.com
Series: arm64/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS
On Mon, Jan 9, 2023 at 6:06 PM Mark Rutland <mark.rutland@arm.com> wrote:
>
> Sorry, that is something I had intended to do but I hadn't extracted a
> reproducer yet. I'll try to come up with something that can be included in the
> commit message and reported to GCC folk (and double-check at the same time that
> there's not another hidden cause)

Yeah, no worries :) I suggested it because from my quick test it didn't appear
to be reproducible trivially, so I thought having the reproducer would be nice.

> I'm happy to move these, I just wasn't sure what the policy would be w.r.t. the
> existing __weak and __cold definitions since those end up depending upon
> __function_aligned.
>
> I assume I should move them all? i.e. move __weak as well?

Yeah, with the current policy, all should be moved since their behavior now
depends on the config (eventually).

Cheers,
Miguel
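As a rough sketch of what the config-dependent definitions discussed above
could look like (the macro bodies here are assumptions for illustration, not
the series' final code):

/*
 * Hedged sketch only: CONFIG_FUNCTION_ALIGNMENT and __function_aligned are
 * assumed to behave roughly as discussed above; the real definitions in the
 * series may differ.
 */
#if defined(CONFIG_FUNCTION_ALIGNMENT) && CONFIG_FUNCTION_ALIGNMENT > 0
#define __function_aligned	__attribute__((aligned(CONFIG_FUNCTION_ALIGNMENT)))
#else
#define __function_aligned
#endif

/*
 * Because GCC drops the -falign-functions alignment for cold functions, a
 * config-dependent __cold would either need to carry the alignment
 * explicitly or drop the cold attribute when alignment matters.
 */
#if defined(CONFIG_FUNCTION_ALIGNMENT) && CONFIG_FUNCTION_ALIGNMENT > 0
#define __cold			__function_aligned
#else
#define __cold			__attribute__((cold))
#endif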
From: Mark Rutland
> Sent: 09 January 2023 13:58
>
> This series adds a new DYNAMIC_FTRACE_WITH_CALL_OPS mechanism, and
> enables support for this on arm64. This significantly reduces the
> overhead of tracing when a callsite/tracee has a single associated
> tracer, avoids a number of issues that make it undesirable and
> infeasible to use dynamically-allocated trampolines (e.g. branch range
> limitations), and makes it possible to implement support for
> DYNAMIC_FTRACE_WITH_DIRECT_CALLS in future.
>
> The main idea is to give each ftrace callsite an associated pointer to
> an ftrace_ops. The architecture's ftrace_caller trampoline can recover
> the ops pointer and invoke ops->func from this without needing to use
> ftrace_ops_list_func, which has to iterate through all registered ops.
>
> To do this, we use -fpatchable-function-entry=M,N, where N NOPs are
> placed before the function entry point...

Doesn't this bump the minimum gcc version up to something like 9.0?

How does it interact with the 'CFI stuff' that also uses the same area?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Tue, Jan 10, 2023 at 08:55:58AM +0000, David Laight wrote:
> From: Mark Rutland
> > Sent: 09 January 2023 13:58
> >
> > This series adds a new DYNAMIC_FTRACE_WITH_CALL_OPS mechanism, and
> > enables support for this on arm64. This significantly reduces the
> > overhead of tracing when a callsite/tracee has a single associated
> > tracer, avoids a number of issues that make it undesirable and
> > infeasible to use dynamically-allocated trampolines (e.g. branch range
> > limitations), and makes it possible to implement support for
> > DYNAMIC_FTRACE_WITH_DIRECT_CALLS in future.
> >
> > The main idea is to give each ftrace callsite an associated pointer to
> > an ftrace_ops. The architecture's ftrace_caller trampoline can recover
> > the ops pointer and invoke ops->func from this without needing to use
> > ftrace_ops_list_func, which has to iterate through all registered ops.
> >
> > To do this, we use -fpatchable-function-entry=M,N, where N NOPs are
> > placed before the function entry point...
>
> Doesn't this bump the minimum gcc version up to something like 9.0?

This doesn't bump the minimum GCC version, but users of older toolchains won't
get the speedup.

We already support -fpatchable-function-entry based ftrace with GCC 8+ (and
this is necessary to play nicely with pointer authentication); for older GCC
versions we still support using -pg / mcount.

> How does it interact with the 'CFI stuff' that also uses the same area?

There's some more detail in patch 8, but the summary is that they're mutually
exclusive for now (enforced by Kconfig), and I'm working with others to get
the improved compiler support necessary for them to play nicely together.

Currently LLVM will place the type-hash before the pre-function NOPs, which
works if everything has pre-function NOPs, but doesn't work for calls between
instrumented and non-instrumented functions, since the latter don't have
pre-function NOPs and the type hash ends up at a different offset. Making
them work well together will need that improved compiler support.

GCC doesn't currently have KCFI support, but the plan is to match whatever
LLVM does. Atop that we'll need some trivial changes to the asm function
macros, but without the underlying compiler support there's not much point.

Thanks,
Mark.
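As a rough illustration of what -fpatchable-function-entry does (a sketch
with example values, not code from the series):

/*
 * example.c - illustrative only. With GCC 8+ or Clang:
 *
 *   cc -O2 -fpatchable-function-entry=4,2 -S example.c
 *
 * the compiler emits 4 NOPs per function in total: 2 *before* the entry
 * point of traceable() and 2 at the entry point itself. The pre-entry
 * space is what the series uses to hold a pointer to the function's
 * ftrace_ops, so the trampoline can load that pointer relative to the
 * patched call site instead of walking the global ops list.
 */
void traceable(void)
{
	/* The body is irrelevant; only the patchable entry matters here. */
}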
On Mon, Jan 09, 2023 at 01:58:23PM +0000, Mark Rutland wrote:

> diff --git a/arch/arm64/include/asm/linkage.h b/arch/arm64/include/asm/linkage.h
> index 1436fa1cde24d..df18a3446ce82 100644
> --- a/arch/arm64/include/asm/linkage.h
> +++ b/arch/arm64/include/asm/linkage.h
> @@ -5,8 +5,14 @@
>  #include <asm/assembler.h>
>  #endif
>
> -#define __ALIGN		.align 2
> -#define __ALIGN_STR	".align 2"
> +#if CONFIG_FUNCTION_ALIGNMENT > 0
> +#define ARM64_FUNCTION_ALIGNMENT CONFIG_FUNCTION_ALIGNMENT
> +#else
> +#define ARM64_FUNCTION_ALIGNMENT 4
> +#endif
> +
> +#define __ALIGN		.balign ARM64_FUNCTION_ALIGNMENT
> +#define __ALIGN_STR	".balign " #ARM64_FUNCTION_ALIGNMENT

Isn't that much the same as having ARM64 select FUNCTION_ALIGNMENT_4B and
simply removing all these lines and relying on the default behaviour?
On Tue, Jan 10, 2023 at 09:35:18PM +0100, Peter Zijlstra wrote:
> On Mon, Jan 09, 2023 at 01:58:23PM +0000, Mark Rutland wrote:
>
> > diff --git a/arch/arm64/include/asm/linkage.h b/arch/arm64/include/asm/linkage.h
> > index 1436fa1cde24d..df18a3446ce82 100644
> > --- a/arch/arm64/include/asm/linkage.h
> > +++ b/arch/arm64/include/asm/linkage.h
> > @@ -5,8 +5,14 @@
> >  #include <asm/assembler.h>
> >  #endif
> >
> > -#define __ALIGN		.align 2
> > -#define __ALIGN_STR	".align 2"
> > +#if CONFIG_FUNCTION_ALIGNMENT > 0
> > +#define ARM64_FUNCTION_ALIGNMENT CONFIG_FUNCTION_ALIGNMENT
> > +#else
> > +#define ARM64_FUNCTION_ALIGNMENT 4
> > +#endif
> > +
> > +#define __ALIGN		.balign ARM64_FUNCTION_ALIGNMENT
> > +#define __ALIGN_STR	".balign " #ARM64_FUNCTION_ALIGNMENT
>
> Isn't that much the same as having ARM64 select FUNCTION_ALIGNMENT_4B
> and simply removing all these lines and relying on the default
> behaviour?

There's a proposal (with some rough performance claims) to select
FUNCTION_ALIGNMENT_16B over at:

  https://lore.kernel.org/r/20221208053649.540891-1-almasrymina@google.com

so we could just go with that?

Will
On Tue, Jan 10, 2023 at 09:35:18PM +0100, Peter Zijlstra wrote:
> On Mon, Jan 09, 2023 at 01:58:23PM +0000, Mark Rutland wrote:
>
> > diff --git a/arch/arm64/include/asm/linkage.h b/arch/arm64/include/asm/linkage.h
> > index 1436fa1cde24d..df18a3446ce82 100644
> > --- a/arch/arm64/include/asm/linkage.h
> > +++ b/arch/arm64/include/asm/linkage.h
> > @@ -5,8 +5,14 @@
> >  #include <asm/assembler.h>
> >  #endif
> >
> > -#define __ALIGN		.align 2
> > -#define __ALIGN_STR	".align 2"
> > +#if CONFIG_FUNCTION_ALIGNMENT > 0
> > +#define ARM64_FUNCTION_ALIGNMENT CONFIG_FUNCTION_ALIGNMENT
> > +#else
> > +#define ARM64_FUNCTION_ALIGNMENT 4
> > +#endif
> > +
> > +#define __ALIGN		.balign ARM64_FUNCTION_ALIGNMENT
> > +#define __ALIGN_STR	".balign " #ARM64_FUNCTION_ALIGNMENT
>
> Isn't that much the same as having ARM64 select FUNCTION_ALIGNMENT_4B
> and simply removing all these lines and relying on the default
> behaviour?

Yes, it is. I was focussed on obviously retaining the existing semantic by
default, and I missed that was possible by selecting FUNCTION_ALIGNMENT_4B.

Thanks,
Mark.
On Tue, Jan 10, 2023 at 08:43:20PM +0000, Will Deacon wrote:
> On Tue, Jan 10, 2023 at 09:35:18PM +0100, Peter Zijlstra wrote:
> > On Mon, Jan 09, 2023 at 01:58:23PM +0000, Mark Rutland wrote:
> >
> > > diff --git a/arch/arm64/include/asm/linkage.h b/arch/arm64/include/asm/linkage.h
> > > index 1436fa1cde24d..df18a3446ce82 100644
> > > --- a/arch/arm64/include/asm/linkage.h
> > > +++ b/arch/arm64/include/asm/linkage.h
> > > @@ -5,8 +5,14 @@
> > >  #include <asm/assembler.h>
> > >  #endif
> > >
> > > -#define __ALIGN		.align 2
> > > -#define __ALIGN_STR	".align 2"
> > > +#if CONFIG_FUNCTION_ALIGNMENT > 0
> > > +#define ARM64_FUNCTION_ALIGNMENT CONFIG_FUNCTION_ALIGNMENT
> > > +#else
> > > +#define ARM64_FUNCTION_ALIGNMENT 4
> > > +#endif
> > > +
> > > +#define __ALIGN		.balign ARM64_FUNCTION_ALIGNMENT
> > > +#define __ALIGN_STR	".balign " #ARM64_FUNCTION_ALIGNMENT
> >
> > Isn't that much the same as having ARM64 select FUNCTION_ALIGNMENT_4B
> > and simply removing all these lines and relying on the default
> > behaviour?
>
> There's a proposal (with some rough performance claims) to select
> FUNCTION_ALIGNMENT_16B over at:
>
>   https://lore.kernel.org/r/20221208053649.540891-1-almasrymina@google.com
>
> so we could just go with that?

I reckon it'd be worth having that as a separate patch atop, to split the
infrastructure from the actual change, but I'm happy to go with 16B
immediately if you'd prefer. It'd be nice if we could get some numbers...

Thanks,
Mark.
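For reference, the "default behaviour" being relied on here looks roughly
like the following sketch; the macro names follow the diff above, but the
exact generic definitions in the kernel may differ:

/*
 * Rough sketch of the generic linkage macros an architecture gets when it
 * selects FUNCTION_ALIGNMENT_4B (or a larger FUNCTION_ALIGNMENT_* option):
 * the alignment is derived from CONFIG_FUNCTION_ALIGNMENT, so arm64 would
 * not need its own ARM64_FUNCTION_ALIGNMENT fallback. Illustrative only.
 */
#define ___stringify(x...)	#x
#define __stringify(x...)	___stringify(x)

#define __ALIGN		.balign CONFIG_FUNCTION_ALIGNMENT
#define __ALIGN_STR	__stringify(__ALIGN)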
On Mon, Jan 09, 2023 at 03:43:16PM +0100, Miguel Ojeda wrote:
> On Mon, Jan 9, 2023 at 2:58 PM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > As far as I can tell, GCC doesn't respect '-falign-functions=N':
> >
> > * When the __weak__ attribute is used
> >
> >   GCC seems to forget the alignment specified by '-falign-functions=N',
> >   but will respect the '__aligned__(N)' function attribute. Thus, we can
> >   work around this by explicitly setting the alignment for weak
> >   functions.
> >
> > * When the __cold__ attribute is used
> >
> >   GCC seems to forget the alignment specified by '-falign-functions=N',
> >   and also doesn't seem to respect the '__aligned__(N)' function
> >   attribute. The only way to work around this is to not use the __cold__
> >   attribute.
>
> If you happen to have a reduced case, then it would be nice to link it
> in the commit. A bug report to GCC would also be nice.
>
> I gave it a very quick try in Compiler Explorer, but I couldn't
> reproduce it, so I guess it depends on flags, non-trivial functions or
> something else.

So having spent today coming up with tests, it turns out it's not quite as I
described above, but in a sense worse. I'm posting a summary here for
posterity; I'll try to get this to compiler folk shortly.

GCC appears to not align cold functions to the alignment specified by
`-falign-functions=N` when compiling at `-O1` or above. Alignment *can* be
restored with explicit attributes on each function, but due to some
interprocedural analysis, callees can be implicitly marked as cold (losing
their default alignment), which means we don't have a reliable mechanism to
ensure functions are always aligned short of annotating *every* function
explicitly (and I suspect that's not sufficient due to some interprocedural
optimizations). I've tested with the 12.1.0 binary release from the
kernel.org cross toolchains page.

LLVM always seems to respect `-falign-functions=N` at both `-O1` and `-O2`
(I tested the 10.0.0, 11.0.0, 11.0.1, and 15.0.6 binary releases from
llvm.org).
For example:

| [mark@lakrids:/mnt/data/tests/gcc-alignment]% cat test-cold.c
| #define __cold \
| 	__attribute__((cold))
|
| #define EXPORT_FUNC_PTR(func) \
| 	typeof((func)) *__ptr_##func = (func)
|
| __cold
| void cold_func_a(void) { }
|
| __cold
| void cold_func_b(void) { }
|
| __cold
| void cold_func_c(void) { }
|
| static __cold
| void static_cold_func_a(void) { }
| EXPORT_FUNC_PTR(static_cold_func_a);
|
| static __cold
| void static_cold_func_b(void) { }
| EXPORT_FUNC_PTR(static_cold_func_b);
|
| static __cold
| void static_cold_func_c(void) { }
| EXPORT_FUNC_PTR(static_cold_func_c);
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-gcc -falign-functions=16 -c test-cold.c -O1
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -d test-cold.o
|
| test-cold.o:     file format elf64-littleaarch64
|
|
| Disassembly of section .text:
|
| 0000000000000000 <static_cold_func_a>:
|    0:	d65f03c0 	ret
|
| 0000000000000004 <static_cold_func_b>:
|    4:	d65f03c0 	ret
|
| 0000000000000008 <static_cold_func_c>:
|    8:	d65f03c0 	ret
|
| 000000000000000c <cold_func_a>:
|    c:	d65f03c0 	ret
|
| 0000000000000010 <cold_func_b>:
|   10:	d65f03c0 	ret
|
| 0000000000000014 <cold_func_c>:
|   14:	d65f03c0 	ret
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -h test-cold.o
|
| test-cold.o:     file format elf64-littleaarch64
|
| Sections:
| Idx Name            Size      VMA               LMA               File off  Algn
|   0 .text           00000018  0000000000000000  0000000000000000  00000040  2**2
|                     CONTENTS, ALLOC, LOAD, READONLY, CODE
|   1 .data           00000018  0000000000000000  0000000000000000  00000058  2**3
|                     CONTENTS, ALLOC, LOAD, RELOC, DATA
|   2 .bss            00000000  0000000000000000  0000000000000000  00000070  2**0
|                     ALLOC
|   3 .comment        00000013  0000000000000000  0000000000000000  00000070  2**0
|                     CONTENTS, READONLY
|   4 .note.GNU-stack 00000000  0000000000000000  0000000000000000  00000083  2**0
|                     CONTENTS, READONLY
|   5 .eh_frame       00000090  0000000000000000  0000000000000000  00000088  2**3
|                     CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA

In simple cases, alignment *can* be restored if an explicit function
attribute is used.
For example:

| [mark@lakrids:/mnt/data/tests/gcc-alignment]% cat test-aligned-cold.c
| #define __aligned(n) \
| 	__attribute__((aligned(n)))
|
| #define __cold \
| 	__attribute__((cold)) __aligned(16)
|
| #define EXPORT_FUNC_PTR(func) \
| 	typeof((func)) *__ptr_##func = (func)
|
| __cold
| void cold_func_a(void) { }
|
| __cold
| void cold_func_b(void) { }
|
| __cold
| void cold_func_c(void) { }
|
| static __cold
| void static_cold_func_a(void) { }
| EXPORT_FUNC_PTR(static_cold_func_a);
|
| static __cold
| void static_cold_func_b(void) { }
| EXPORT_FUNC_PTR(static_cold_func_b);
|
| static __cold
| void static_cold_func_c(void) { }
| EXPORT_FUNC_PTR(static_cold_func_c);
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-gcc -falign-functions=16 -c test-aligned-cold.c -O1
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -d test-aligned-cold.o
|
| test-aligned-cold.o:     file format elf64-littleaarch64
|
|
| Disassembly of section .text:
|
| 0000000000000000 <static_cold_func_a>:
|    0:	d65f03c0 	ret
|    4:	d503201f 	nop
|    8:	d503201f 	nop
|    c:	d503201f 	nop
|
| 0000000000000010 <static_cold_func_b>:
|   10:	d65f03c0 	ret
|   14:	d503201f 	nop
|   18:	d503201f 	nop
|   1c:	d503201f 	nop
|
| 0000000000000020 <static_cold_func_c>:
|   20:	d65f03c0 	ret
|   24:	d503201f 	nop
|   28:	d503201f 	nop
|   2c:	d503201f 	nop
|
| 0000000000000030 <cold_func_a>:
|   30:	d65f03c0 	ret
|   34:	d503201f 	nop
|   38:	d503201f 	nop
|   3c:	d503201f 	nop
|
| 0000000000000040 <cold_func_b>:
|   40:	d65f03c0 	ret
|   44:	d503201f 	nop
|   48:	d503201f 	nop
|   4c:	d503201f 	nop
|
| 0000000000000050 <cold_func_c>:
|   50:	d65f03c0 	ret
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -h test-aligned-cold.o
|
| test-aligned-cold.o:     file format elf64-littleaarch64
|
| Sections:
| Idx Name            Size      VMA               LMA               File off  Algn
|   0 .text           00000054  0000000000000000  0000000000000000  00000040  2**4
|                     CONTENTS, ALLOC, LOAD, READONLY, CODE
|   1 .data           00000018  0000000000000000  0000000000000000  00000098  2**3
|                     CONTENTS, ALLOC, LOAD, RELOC, DATA
|   2 .bss            00000000  0000000000000000  0000000000000000  000000b0  2**0
|                     ALLOC
|   3 .comment        00000013  0000000000000000  0000000000000000  000000b0  2**0
|                     CONTENTS, READONLY
|   4 .note.GNU-stack 00000000  0000000000000000  0000000000000000  000000c3  2**0
|                     CONTENTS, READONLY
|   5 .eh_frame       00000090  0000000000000000  0000000000000000  000000c8  2**3
|                     CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA

Unfortunately it appears that some interprocedural analysis determines that
if a callee is only called/referenced from cold callers, the callee is marked
as cold, and the alignment it would have got from the command line option is
dropped. If it's given an explicit alignment attribute, the alignment is
retained.
For example:

| [mark@lakrids:/mnt/data/tests/gcc-alignment]% cat test-aligned-cold-caller.c
| #define noinline \
| 	__attribute__((noinline))
|
| #define __aligned(n) \
| 	__attribute__((aligned(n)))
|
| #define __cold \
| 	__attribute__((cold)) __aligned(16)
|
| #define EXPORT_FUNC_PTR(func) \
| 	typeof((func)) *__ptr_##func = (func)
|
| static noinline void callee_a(void)
| {
| 	asm volatile("// callee_a\n" ::: "memory");
| }
|
| static noinline void callee_b(void)
| {
| 	asm volatile("// callee_b\n" ::: "memory");
| }
|
| static noinline void callee_c(void)
| {
| 	asm volatile("// callee_c\n" ::: "memory");
| }
| __cold
| void cold_func_a(void) { callee_a(); }
|
| __cold
| void cold_func_b(void) { callee_b(); }
|
| __cold
| void cold_func_c(void) { callee_c(); }
|
| static __cold
| void static_cold_func_a(void) { callee_a(); }
| EXPORT_FUNC_PTR(static_cold_func_a);
|
| static __cold
| void static_cold_func_b(void) { callee_b(); }
| EXPORT_FUNC_PTR(static_cold_func_b);
|
| static __cold
| void static_cold_func_c(void) { callee_c(); }
| EXPORT_FUNC_PTR(static_cold_func_c);
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-gcc -falign-functions=16 -c test-aligned-cold-caller.c -O1
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -d test-aligned-cold-caller.o
|
| test-aligned-cold-caller.o:     file format elf64-littleaarch64
|
|
| Disassembly of section .text:
|
| 0000000000000000 <callee_a>:
|    0:	d65f03c0 	ret
|
| 0000000000000004 <callee_b>:
|    4:	d65f03c0 	ret
|
| 0000000000000008 <callee_c>:
|    8:	d65f03c0 	ret
|    c:	d503201f 	nop
|
| 0000000000000010 <static_cold_func_a>:
|   10:	a9bf7bfd 	stp	x29, x30, [sp, #-16]!
|   14:	910003fd 	mov	x29, sp
|   18:	97fffffa 	bl	0 <callee_a>
|   1c:	a8c17bfd 	ldp	x29, x30, [sp], #16
|   20:	d65f03c0 	ret
|   24:	d503201f 	nop
|   28:	d503201f 	nop
|   2c:	d503201f 	nop
|
| 0000000000000030 <static_cold_func_b>:
|   30:	a9bf7bfd 	stp	x29, x30, [sp, #-16]!
|   34:	910003fd 	mov	x29, sp
|   38:	97fffff3 	bl	4 <callee_b>
|   3c:	a8c17bfd 	ldp	x29, x30, [sp], #16
|   40:	d65f03c0 	ret
|   44:	d503201f 	nop
|   48:	d503201f 	nop
|   4c:	d503201f 	nop
|
| 0000000000000050 <static_cold_func_c>:
|   50:	a9bf7bfd 	stp	x29, x30, [sp, #-16]!
|   54:	910003fd 	mov	x29, sp
|   58:	97ffffec 	bl	8 <callee_c>
|   5c:	a8c17bfd 	ldp	x29, x30, [sp], #16
|   60:	d65f03c0 	ret
|   64:	d503201f 	nop
|   68:	d503201f 	nop
|   6c:	d503201f 	nop
|
| 0000000000000070 <cold_func_a>:
|   70:	a9bf7bfd 	stp	x29, x30, [sp, #-16]!
|   74:	910003fd 	mov	x29, sp
|   78:	97ffffe2 	bl	0 <callee_a>
|   7c:	a8c17bfd 	ldp	x29, x30, [sp], #16
|   80:	d65f03c0 	ret
|   84:	d503201f 	nop
|   88:	d503201f 	nop
|   8c:	d503201f 	nop
|
| 0000000000000090 <cold_func_b>:
|   90:	a9bf7bfd 	stp	x29, x30, [sp, #-16]!
|   94:	910003fd 	mov	x29, sp
|   98:	97ffffdb 	bl	4 <callee_b>
|   9c:	a8c17bfd 	ldp	x29, x30, [sp], #16
|   a0:	d65f03c0 	ret
|   a4:	d503201f 	nop
|   a8:	d503201f 	nop
|   ac:	d503201f 	nop
|
| 00000000000000b0 <cold_func_c>:
|   b0:	a9bf7bfd 	stp	x29, x30, [sp, #-16]!
|   b4:	910003fd 	mov	x29, sp
|   b8:	97ffffd4 	bl	8 <callee_c>
|   bc:	a8c17bfd 	ldp	x29, x30, [sp], #16
|   c0:	d65f03c0 	ret
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -h test-aligned-cold-caller.o
|
| test-aligned-cold-caller.o:     file format elf64-littleaarch64
|
| Sections:
| Idx Name            Size      VMA               LMA               File off  Algn
|   0 .text           000000c4  0000000000000000  0000000000000000  00000040  2**4
|                     CONTENTS, ALLOC, LOAD, READONLY, CODE
|   1 .data           00000018  0000000000000000  0000000000000000  00000108  2**3
|                     CONTENTS, ALLOC, LOAD, RELOC, DATA
|   2 .bss            00000000  0000000000000000  0000000000000000  00000120  2**0
|                     ALLOC
|   3 .comment        00000013  0000000000000000  0000000000000000  00000120  2**0
|                     CONTENTS, READONLY
|   4 .note.GNU-stack 00000000  0000000000000000  0000000000000000  00000133  2**0
|                     CONTENTS, READONLY
|   5 .eh_frame       00000110  0000000000000000  0000000000000000  00000138  2**3
|                     CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA

Thanks,
Mark.
On Wed, Jan 11, 2023 at 06:27:53PM +0000, Mark Rutland wrote:
> On Mon, Jan 09, 2023 at 03:43:16PM +0100, Miguel Ojeda wrote:
> > On Mon, Jan 9, 2023 at 2:58 PM Mark Rutland <mark.rutland@arm.com> wrote:
> > >
> > > As far as I can tell, GCC doesn't respect '-falign-functions=N':
> > >
> > > * When the __weak__ attribute is used
> > >
> > >   GCC seems to forget the alignment specified by '-falign-functions=N',
> > >   but will respect the '__aligned__(N)' function attribute. Thus, we can
> > >   work around this by explicitly setting the alignment for weak
> > >   functions.
> > >
> > > * When the __cold__ attribute is used
> > >
> > >   GCC seems to forget the alignment specified by '-falign-functions=N',
> > >   and also doesn't seem to respect the '__aligned__(N)' function
> > >   attribute. The only way to work around this is to not use the __cold__
> > >   attribute.
> >
> > If you happen to have a reduced case, then it would be nice to link it
> > in the commit. A bug report to GCC would also be nice.
> >
> > I gave it a very quick try in Compiler Explorer, but I couldn't
> > reproduce it, so I guess it depends on flags, non-trivial functions or
> > something else.
>
> So having spent today coming up with tests, it turns out it's not quite as I
> described above, but in a sense worse. I'm posting a summary here for
> posterity; I'll try to get this to compiler folk shortly.

I've added the cold bits to an existing ticket:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345

I have not been able to reproduce the issue with __weak__, so I'll go dig into
that some more; it's likely I was mistaken there.

Thanks,
Mark.
On Thu, Jan 12, 2023 at 11:38:17AM +0000, Mark Rutland wrote:
> On Wed, Jan 11, 2023 at 06:27:53PM +0000, Mark Rutland wrote:
> > On Mon, Jan 09, 2023 at 03:43:16PM +0100, Miguel Ojeda wrote:
> > > On Mon, Jan 9, 2023 at 2:58 PM Mark Rutland <mark.rutland@arm.com> wrote:
> > > >
> > > > As far as I can tell, GCC doesn't respect '-falign-functions=N':
> > > >
> > > > * When the __weak__ attribute is used
> > > >
> > > >   GCC seems to forget the alignment specified by '-falign-functions=N',
> > > >   but will respect the '__aligned__(N)' function attribute. Thus, we can
> > > >   work around this by explicitly setting the alignment for weak
> > > >   functions.
> > > >
> > > > * When the __cold__ attribute is used
> > > >
> > > >   GCC seems to forget the alignment specified by '-falign-functions=N',
> > > >   and also doesn't seem to respect the '__aligned__(N)' function
> > > >   attribute. The only way to work around this is to not use the __cold__
> > > >   attribute.
> > >
> > > If you happen to have a reduced case, then it would be nice to link it
> > > in the commit. A bug report to GCC would also be nice.
> > >
> > > I gave it a very quick try in Compiler Explorer, but I couldn't
> > > reproduce it, so I guess it depends on flags, non-trivial functions or
> > > something else.
> >
> > So having spent today coming up with tests, it turns out it's not quite as I
> > described above, but in a sense worse. I'm posting a summary here for
> > posterity; I'll try to get this to compiler folk shortly.
>
> I've added the cold bits to an existing ticket:
>
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345
>
> I have not been able to reproduce the issue with __weak__, so I'll go dig into
> that some more; it's likely I was mistaken there.

It turns out that was a red herring; GCC is actually implicitly marking the
abort() function as cold, and as Linux's implementation happened to be marked
as weak I assumed that was the culprit.

I'll drop the changes to weak and update our abort implementation
specifically, with a comment. I'll also go update the ticket above.

Thanks,
Mark.
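A hypothetical sketch of the kind of change being described (not the actual
patch; the alignment value, attributes, and body are assumptions made for
illustration):

/*
 * Illustrative only: GCC implicitly treats abort() as cold, which drops the
 * -falign-functions alignment, so the kernel's implementation would need an
 * explicit alignment attribute to stay aligned. Compile freestanding (e.g.
 * with -ffreestanding) to avoid clashing with the hosted abort() prototype.
 */
__attribute__((__noreturn__))
__attribute__((__aligned__(8)))
void abort(void)
{
	/* Stand-in for the kernel's real abort() behaviour. */
	for (;;)
		;
}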
On Fri, Jan 13, 2023 at 1:49 PM Mark Rutland <mark.rutland@arm.com> wrote:
>
> It turns out that was a red herring; GCC is actually implicitly marking the
> abort() function as cold, and as Linux's implementation happened to be marked
> as weak I assumed that was the culprit.

That and your previous message probably explain why I couldn't reproduce it.

Thanks a lot for all the details -- the `cold` issue is reproducible since
gcc 4.6 at least:

  https://godbolt.org/z/PoxazzT9T

The `abort` case appears to happen since gcc 8.1.

Cheers,
Miguel