Message ID | 20221219153525.632521981@infradead.org |
---|---|
Headers | show |
Series | Introduce cmpxchg128() -- aka. the demise of cmpxchg_double() | expand |
Hi Peter, On Mon, Dec 19, 2022 at 04:35:25PM +0100, Peter Zijlstra wrote: > Hi, > > Since Linus hated on cmpxchg_double(), a few patches to get rid of it, as proposed here: > > https://lkml.kernel.org/r/Y2U3WdU61FvYlpUh@hirez.programming.kicks-ass.net > > based on tip/master because Linus' tree is moving a wee bit fast at the moment. > > 0day robot is all green for building, very limited testing on arm64/s390 > for obvious raisins -- I tried to get the asm right, but please, double > check. > I added some test cases for cmpxcgh128 APIs, and found two issues. I will reply separately in the patches. The test cases themselves are at the end, let me know if you want to me to send a proper patch. Regards, Boqun ------------------------------------------------------------>8 Subject: [PATCH] atomic: Add test cases for cmpxchg128 family Besides for 32bit and 64bit cmpxchg, we only test via atomic_cmpxchg_* APIs, add tests via cmpxchg* APIs while we are at it. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> --- lib/atomic64_test.c | 41 ++++++++++++++++++++++++++++++++++++++++- 1 file changed, 40 insertions(+), 1 deletion(-) diff --git a/lib/atomic64_test.c b/lib/atomic64_test.c index d9d170238165..7f79d0704ba8 100644 --- a/lib/atomic64_test.c +++ b/lib/atomic64_test.c @@ -76,12 +76,19 @@ do { \ BUG_ON(atomic##bit##_read(&v) != expect); \ } while (0) +#define TEST_ARGS_PLAIN(_, op, init, ret, expect, args...) \ +do { \ + __WRITE_ONCE(n, init); \ + BUG_ON(op(&n, ##args) != ret); \ + BUG_ON(__READ_ONCE(n) != expect); \ +} while (0) + #define XCHG_FAMILY_TEST(bit, init, new) \ do { \ FAMILY_TEST(TEST_ARGS, bit, xchg, init, init, new, new); \ } while (0) -#define CMPXCHG_FAMILY_TEST(bit, init, new, wrong) \ +#define ATOMIC_CMPXCHG_FAMILY_TEST(bit, init, new, wrong) \ do { \ FAMILY_TEST(TEST_ARGS, bit, cmpxchg, \ init, init, new, init, new); \ @@ -89,6 +96,14 @@ do { \ init, init, init, wrong, new); \ } while (0) +#define CMPXCHG_FAMILY_TEST(bit, init, new, wrong) \ +do { \ + FAMILY_TEST(TEST_ARGS_PLAIN, _, cmpxchg##bit, \ + init, init, new, init, new); \ + FAMILY_TEST(TEST_ARGS_PLAIN, _, cmpxchg##bit, \ + init, init, init, wrong, new); \ +} while (0) + #define INC_RETURN_FAMILY_TEST(bit, i) \ do { \ FAMILY_TEST(TEST_ARGS, bit, inc_return, \ @@ -109,6 +124,7 @@ static __init void test_atomic(void) int one = 1; atomic_t v; + int n; int r; TEST(, add, +=, onestwos); @@ -139,6 +155,7 @@ static __init void test_atomic(void) DEC_RETURN_FAMILY_TEST(, v0); XCHG_FAMILY_TEST(, v0, v1); + ATOMIC_CMPXCHG_FAMILY_TEST(, v0, v1, onestwos); CMPXCHG_FAMILY_TEST(, v0, v1, onestwos); } @@ -155,6 +172,7 @@ static __init void test_atomic64(void) int r_int; atomic64_t v = ATOMIC64_INIT(v0); + long long n = 0; long long r = v0; BUG_ON(v.counter != r); @@ -201,6 +219,7 @@ static __init void test_atomic64(void) DEC_RETURN_FAMILY_TEST(64, v0); XCHG_FAMILY_TEST(64, v0, v1); + ATOMIC_CMPXCHG_FAMILY_TEST(64, v0, v1, v2); CMPXCHG_FAMILY_TEST(64, v0, v1, v2); INIT(v0); @@ -245,10 +264,30 @@ static __init void test_atomic64(void) BUG_ON(!r_int); } +#ifdef system_has_cmpxchg128 +static __init void test_atomic128(void) +{ + long long v0 = 0xaaa31337c001d00dLL; + long long v1 = 0xdeadbeefdeafcafeLL; + long long v2 = 0xfaceabadf00df001LL; + long long v3 = 0x8000000000000000LL; + + s128 init = ((s128)v0 << 64) + v1; + s128 new = ((s128)v1 << 64) + v0; + s128 wrong = ((s128)v2 << 64) + v3; + s128 n = 1; + + CMPXCHG_FAMILY_TEST(128, init, new, wrong); +} +#else +static __init void test_atomic128(void) {} +#endif + static __init int test_atomics_init(void) { test_atomic(); test_atomic64(); + test_atomic128(); #ifdef CONFIG_X86 pr_info("passed for %s platform %s CX8 and %s SSE\n",
Hi! > Introduce [us]128 (when available). Unlike [us]64, ensure they are > always naturally aligned. > > This also enables 128bit wide atomics (which require natural > alignment) such as cmpxchg128(). > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > --- > include/linux/types.h | 5 +++++ > include/uapi/linux/types.h | 4 ++++ > 2 files changed, 9 insertions(+) > > --- a/include/linux/types.h > +++ b/include/linux/types.h > @@ -10,6 +10,11 @@ > #define DECLARE_BITMAP(name,bits) \ > unsigned long name[BITS_TO_LONGS(bits)] > > +#ifdef __SIZEOF_INT128__ > +typedef __s128 s128; > +typedef __u128 u128; > +#endif Should this come as a note here? > Introduce [us]128 (when available). Unlike [us]64, ensure they are > always naturally aligned. BR, Pavel
On Mon, Dec 19, 2022, at 16:35, Peter Zijlstra wrote: > In order to replace cmpxchg_double() with the newly minted > cmpxchg128() family of functions, wire it up in this_cpu_cmpxchg(). > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Does this work on x86 chips without X86_FEATURE_CX16? As far as I can tell, the new percpu_cmpxchg128_op uses the cmpxchg16b instruction unconditionally without checking for the feature bit first, and is now used by this_cpu_cmpxchg() unconditionally as well. This is fine for the moment if the only user is mm/slub.c and that retains the system_has_cmpxchg128() runtime check, but I think a better interface would be to guarantee that this_cpu_cmpxchg() always ends up either in a working inline asm or the generic fallback but not an invalid opcode. For consistency, I would also suggest this_cpu_cmpxchg() to take the same argument types as cmpxchg(): at most 'unsigned long' sized, with additional this_cpu_cmpxchg64() and this_cpu_cmpxchg128() macros that take fixed-size arguments. I have an older patch set that additionally converts all 8-bit and 16-bit cmpxchg()/xchg() calls to cmpxchg_8()/ xchg_8()/cmpxchg_16()/xchg_16() and and leaves only the fixed-32bit and variable typed 'unsigned long' sized callers for the weakly typed variant. Arnd
On Wed, Jan 04, 2023 at 01:09:16PM +0100, Heiko Carstens wrote: > On Mon, Dec 19, 2022 at 04:35:32PM +0100, Peter Zijlstra wrote: > > In order to replace cmpxchg_double() with the newly minted > > cmpxchg128() family of functions, wire it up in this_cpu_cmpxchg(). > > > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > ... > > --- a/arch/s390/include/asm/percpu.h > > +++ b/arch/s390/include/asm/percpu.h > > +#define this_cpu_cmpxchg_16(pcp, oval, nval) \ > > +({ \ > > + u128 old__ = __pcpu_cast_128((nval), (nval)); \ > > + u128 new__ = __pcpu_cast_128((oval), (oval)); \ > > spot the bug... please merge the below into this patch. D'oh, luckily I got a fresh pile of brown paper bags for xmas.
On Mon, Dec 19, 2022 at 04:35:33PM +0100, Peter Zijlstra wrote: > In order to depricate cmpxchg_double(), replace all its usage with > cmpxchg128(). > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > --- > arch/s390/include/asm/cpu_mf.h | 29 ++++++++++++----- > arch/s390/kernel/perf_cpum_sf.c | 65 +++++++++++++++++++++++++--------------- > 2 files changed, 63 insertions(+), 31 deletions(-) So, Alexander Gordeev reported that this code was already prior to your changes potentially broken with respect to missing READ_ONCE() within the cmpxchg_double() loops. In order to fix that and have a patch that can be backported I would go with something like the patch below, which I would also plan to send for -rc4, unless there are objections. This can then easily be converted to the new cmpxchg128() later.
On Tue, Jan 10, 2023 at 08:23:05AM +0100, Heiko Carstens wrote: > So, Alexander Gordeev reported that this code was already prior to your > changes potentially broken with respect to missing READ_ONCE() within the > cmpxchg_double() loops. Unless there's an early exit, that shouldn't matter. If you managed to read garbage the cmpxchg itself will simply fail and the loop retries. > @@ -1294,12 +1306,16 @@ static void hw_perf_event_update(struct perf_event *event, int flush_all) > num_sdb++; > > /* Reset trailer (using compare-double-and-swap) */ > + /* READ_ONCE() 16 byte header */ > + prev.val = __cdsg(&te->header.val, 0, 0); > do { > + old.val = prev.val; > + new.val = prev.val; > + new.f = 0; > + new.a = 1; > + new.overflow = 0; > + prev.val = __cdsg(&te->header.val, old.val, new.val); > + } while (prev.val != old.val); So this, and > + /* READ_ONCE() 16 byte header */ > + prev.val = __cdsg(&te->header.val, 0, 0); > do { > + old.val = prev.val; > + new.val = prev.val; > + orig_overflow = old.overflow; > + new.f = 0; > + new.overflow = 0; > if (idx == aux->alert_mark) > + new.a = 1; > else > + new.a = 0; > + prev.val = __cdsg(&te->header.val, old.val, new.val); > + } while (prev.val != old.val); this case are just silly and expensive. If that initial read is split and manages to read gibberish the cmpxchg will fail and we retry anyway. > + /* READ_ONCE() 16 byte header */ > + prev.val = __cdsg(&te->header.val, 0, 0); > do { > + old.val = prev.val; > + new.val = prev.val; > + *overflow = old.overflow; > + if (old.f) { > /* > * SDB is already set by hardware. > * Abort and try to set somewhere > @@ -1490,10 +1509,10 @@ static bool aux_set_alert(struct aux_buffer *aux, unsigned long alert_index, > */ > return false; > } > + new.a = 1; > + new.overflow = 0; > + prev.val = __cdsg(&te->header.val, old.val, new.val); > + } while (prev.val != old.val); And while this case has an early exit, it only cares about a single bit (although you made it a full word) and so also shouldn't care. If aux_reset_buffer() returns false, @overflow isn't consumed. So I really don't see the point of this patch.
On Tue, Jan 10, 2023 at 09:32:55AM +0100, Peter Zijlstra wrote: > On Tue, Jan 10, 2023 at 08:23:05AM +0100, Heiko Carstens wrote: > > > So, Alexander Gordeev reported that this code was already prior to your > > changes potentially broken with respect to missing READ_ONCE() within the > > cmpxchg_double() loops. > > Unless there's an early exit, that shouldn't matter. If you managed to > read garbage the cmpxchg itself will simply fail and the loop retries. I don't think that's true; without READ_ONCE() the compiler could (but is very unlikely to) read multiple times, and that could cause problems. For example: | prev = *ptr; | | do { | new = some_function_of(prev); | old = cmpxchg(ptr, prev, new); | } while (old != prev); Could effectively become: | prev1 = *ptr; | prev2 = *ptr; | | do { | new = some_function_of(prev1) | old = cmpxchg(ptr, prev2, new); | } while (old != prev2); ... which would effectively udpate from a stale value, throwing away prev2. That and the two generated reads could be in either order. So I do think it's warranted to use READ_ONCE() for the prev value feeding into a cmpxchg operation, even if that's only for the "once" part rather than lack of tearing. Thanks, Mark.
On Tue, Jan 10, 2023 at 09:32:55AM +0100, Peter Zijlstra wrote: > On Tue, Jan 10, 2023 at 08:23:05AM +0100, Heiko Carstens wrote: > > So, Alexander Gordeev reported that this code was already prior to your > > changes potentially broken with respect to missing READ_ONCE() within the > > cmpxchg_double() loops. > > Unless there's an early exit, that shouldn't matter. If you managed to > read garbage the cmpxchg itself will simply fail and the loop retries. > > > @@ -1294,12 +1306,16 @@ static void hw_perf_event_update(struct perf_event *event, int flush_all) > > num_sdb++; > > > > /* Reset trailer (using compare-double-and-swap) */ > > + /* READ_ONCE() 16 byte header */ > > + prev.val = __cdsg(&te->header.val, 0, 0); > > do { > > + old.val = prev.val; > > + new.val = prev.val; > > + new.f = 0; > > + new.a = 1; > > + new.overflow = 0; > > + prev.val = __cdsg(&te->header.val, old.val, new.val); > > + } while (prev.val != old.val); > > So this, and ... > this case are just silly and expensive. If that initial read is split > and manages to read gibberish the cmpxchg will fail and we retry anyway. While I do agree that there is no need to necessarily read the whole 16 bytes atomically in advance here, there is still the problem about the missing initial READ_ONCE() in the original code. As I tried to outline here: For example: /* Reset trailer (using compare-double-and-swap) */ do { te_flags = te->flags & ~SDB_TE_BUFFER_FULL_MASK; te_flags |= SDB_TE_ALERT_REQ_MASK; } while (!cmpxchg_double(&te->flags, &te->overflow, te->flags, te->overflow, te_flags, 0ULL)); The compiler could generate code where te->flags used within the cmpxchg_double() call may be refetched from memory and which is not necessarily identical to the previous read version which was used to generate te_flags. Which in turn means that an incorrect update could happen. Is there anything that prevents te->flags from being read several times? > > + /* READ_ONCE() 16 byte header */ > > + prev.val = __cdsg(&te->header.val, 0, 0); > > do { > > + old.val = prev.val; > > + new.val = prev.val; > > + *overflow = old.overflow; > > + if (old.f) { > > /* > > * SDB is already set by hardware. > > * Abort and try to set somewhere > > @@ -1490,10 +1509,10 @@ static bool aux_set_alert(struct aux_buffer *aux, unsigned long alert_index, > > */ > > return false; > > } > > + new.a = 1; > > + new.overflow = 0; > > + prev.val = __cdsg(&te->header.val, old.val, new.val); > > + } while (prev.val != old.val); > > And while this case has an early exit, it only cares about a single bit > (although you made it a full word) and so also shouldn't care. If > aux_reset_buffer() returns false, @overflow isn't consumed. Yes, except that it is anything but obvious that @overflow isn't consumed. > So I really don't see the point of this patch. As stated above: READ_ONCE() is missing. And while at it I wanted to have a consistent complete previous value - also considering that cdsg is not very expensive. And while it also reuse the returned values from cdsg, instead of throwing them away and reading from memory again in a splitted and potentially inconsistent way.
On Tue, Jan 10, 2023 at 12:46:44PM +0100, Heiko Carstens wrote: > > > + /* READ_ONCE() 16 byte header */ > > > + prev.val = __cdsg(&te->header.val, 0, 0); > > > do { > > > + old.val = prev.val; > > > + new.val = prev.val; > > > + *overflow = old.overflow; I guess, it would also make sense to place write to overflow after the while loop. So the output variable left intact in case the function bailed out. Not sure if it should be part of this patch though. > > > + if (old.f) { > > > /* > > > * SDB is already set by hardware. > > > * Abort and try to set somewhere > > > @@ -1490,10 +1509,10 @@ static bool aux_set_alert(struct aux_buffer *aux, unsigned long alert_index, > > > */ > > > return false; > > > } > > > + new.a = 1; > > > + new.overflow = 0; > > > + prev.val = __cdsg(&te->header.val, old.val, new.val); > > > + } while (prev.val != old.val); > > > > And while this case has an early exit, it only cares about a single bit > > (although you made it a full word) and so also shouldn't care. If > > aux_reset_buffer() returns false, @overflow isn't consumed. > > Yes, except that it is anything but obvious that @overflow isn't consumed.