From patchwork Fri Jul 25 15:22:51 2014
X-Patchwork-Submitter: Ian Campbell
X-Patchwork-Id: 34291
From: Ian Campbell <ian.campbell@citrix.com>
To: xen-devel@lists.xen.org
Cc: julien.grall@linaro.org, tim@xen.org, Ian Campbell <ian.campbell@citrix.com>,
    stefano.stabellini@eu.citrix.com
Date: Fri, 25 Jul 2014 16:22:51 +0100
Message-ID: <80a33cc325055bc9d63e4ef272c5b7f68f8fa812.1406301772.git.ian.campbell@citrix.com>
X-Mailer: git-send-email 1.7.10.4
Subject: [Xen-devel] [PATCH 1/2] xen: arm: update arm64 assembly primitives to Linux v3.16-rc6

The only really interesting changes here are the updates to the mem*
routines, which switch to genuinely optimised versions and introduce an
optimised memcmp.
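
(Not part of the patch itself, just for reference: the contract the
imported optimised memcmp has to honour is the usual C one, i.e. the sign
of the result is the sign of the first differing byte pair compared as
unsigned char, over at most n bytes. A minimal byte-wise sketch, using a
hypothetical name so it does not clash with the real symbol:)

/* Reference-only sketch of the memcmp contract; the imported arm64
 * routine is an optimised equivalent working eight bytes at a time. */
#include <stddef.h>

static int ref_memcmp(const void *s1, const void *s2, size_t n)
{
    const unsigned char *p1 = s1, *p2 = s2;

    for ( ; n != 0; n--, p1++, p2++ )
        if ( *p1 != *p2 )
            return (int)*p1 - (int)*p2; /* sign of first differing byte */

    return 0;                           /* all n bytes equal */
}

The imported assembly arrives at the same sign by comparing 64-bit words
and, on the first differing word, byte-reversing on little-endian
(CPU_LE(rev ...)) so that clz locates the first differing byte.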
bitops: No change to the bits we import. Record new baseline.

cmpxchg: Import:
  60010e5 arm64: cmpxchg: update macros to prevent warnings
    Author: Mark Hambleton
    Signed-off-by: Mark Hambleton
    Signed-off-by: Mark Brown
    Signed-off-by: Catalin Marinas
  e1dfda9 arm64: xchg: prevent warning if return value is unused
    Author: Will Deacon
    Signed-off-by: Will Deacon
    Signed-off-by: Catalin Marinas

  e1dfda9 resolves the warning which previously caused us to skip
  60010e508111.

  Since arm32 and arm64 now differ here (as do Linux arm and arm64), the
  existing definition in asm/system.h is moved to asm/arm32/cmpxchg.h.
  Previously it shadowed the arm64 one, but the two happened to be
  identical.

atomics: Import:
  8715466 arch,arm64: Convert smp_mb__*()
    Author: Peter Zijlstra
    Signed-off-by: Peter Zijlstra

  This just drops some smp_mb__*_atomic_* definitions which are unused
  by us.

spinlocks: No change. Record new baseline.

mem*: Import:
  808dbac arm64: lib: Implement optimized memcpy routine
    Author: zhichang.yuan
    Signed-off-by: Zhichang Yuan
    Signed-off-by: Deepak Saxena
    Signed-off-by: Catalin Marinas
  280adc1 arm64: lib: Implement optimized memmove routine
    Author: zhichang.yuan
    Signed-off-by: Zhichang Yuan
    Signed-off-by: Deepak Saxena
    Signed-off-by: Catalin Marinas
  b29a51f arm64: lib: Implement optimized memset routine
    Author: zhichang.yuan
    Signed-off-by: Zhichang Yuan
    Signed-off-by: Deepak Saxena
    Signed-off-by: Catalin Marinas
  d875c9b arm64: lib: Implement optimized memcmp routine
    Author: zhichang.yuan
    Signed-off-by: Zhichang Yuan
    Signed-off-by: Deepak Saxena
    Signed-off-by: Catalin Marinas

  These import various routines from Linaro's Cortex Strings library.

  Added assembler.h, similar to the one on arm32, to define the various
  magic symbols which these imported routines depend on (e.g. CPU_LE()
  and CPU_BE()).

str*: No changes. Record new baseline. Correct the paths in the README.

*_page: No changes. Record new baseline. The README previously said
clear_page was unused while copy_page was used, which was backwards.

Signed-off-by: Ian Campbell
Acked-by: Julien Grall
---
 xen/arch/arm/README.LinuxPrimitives |  36 +++--
 xen/arch/arm/arm64/lib/Makefile     |   2 +-
 xen/arch/arm/arm64/lib/assembler.h  |  13 ++
 xen/arch/arm/arm64/lib/memchr.S     |   1 +
 xen/arch/arm/arm64/lib/memcmp.S     | 258 +++++++++++++++++++++++++++++++++++
 xen/arch/arm/arm64/lib/memcpy.S     | 193 +++++++++++++++++++++++---
 xen/arch/arm/arm64/lib/memmove.S    | 191 ++++++++++++++++++++++----
 xen/arch/arm/arm64/lib/memset.S     | 208 +++++++++++++++++++++++++---
 xen/include/asm-arm/arm32/cmpxchg.h |   3 +
 xen/include/asm-arm/arm64/atomic.h  |   5 -
 xen/include/asm-arm/arm64/cmpxchg.h |  35 +++--
 xen/include/asm-arm/string.h        |   5 +
 xen/include/asm-arm/system.h        |   3 -
 13 files changed, 844 insertions(+), 109 deletions(-)
 create mode 100644 xen/arch/arm/arm64/lib/assembler.h
 create mode 100644 xen/arch/arm/arm64/lib/memcmp.S
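
(Also for reference rather than part of the patch: the statement-expression
forms of xchg()/cmpxchg() used below avoid the warning which previously
made us skip 60010e5 because the macro's value is a plain temporary rather
than a bare cast expression, so callers which discard the result no longer
trigger gcc's "value computed is not used" warning. A minimal sketch of
that shape, with a hypothetical non-atomic __fake_xchg() stand-in so it
compiles on its own:)

/* Sketch only: __fake_xchg() is a non-atomic stand-in so the macros can
 * be compiled in isolation; the real __xchg() is an atomic swap. */
#include <stdio.h>

static unsigned long __fake_xchg(unsigned long x, volatile void *ptr, int size)
{
    unsigned long old = *(volatile unsigned long *)ptr;

    (void)size;
    *(volatile unsigned long *)ptr = x;
    return old;
}

/* Old form: a bare cast. Discarding the result can warn. */
#define xchg_cast(ptr, x) \
    ((__typeof__(*(ptr)))__fake_xchg((unsigned long)(x), (ptr), sizeof(*(ptr))))

/* New form (the shape used in this patch): a GNU statement expression
 * whose value is a temporary, so discarding it is warning-free. */
#define xchg_stmt(ptr, x)                                               \
({                                                                      \
    __typeof__(*(ptr)) __ret;                                           \
    __ret = (__typeof__(*(ptr)))                                        \
        __fake_xchg((unsigned long)(x), (ptr), sizeof(*(ptr)));         \
    __ret;                                                              \
})

int main(void)
{
    volatile unsigned long word = 1;

    /* xchg_cast(&word, 0UL); here could warn with -Wunused-value. */
    xchg_stmt(&word, 0UL);  /* result deliberately ignored: no warning */
    printf("word is now %lu\n", (unsigned long)word);
    return 0;
}

(Statement expressions and __typeof__ are GNU extensions, as already used
throughout Xen, so build with gcc or clang.)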
diff --git a/xen/arch/arm/README.LinuxPrimitives b/xen/arch/arm/README.LinuxPrimitives
index 6cd03ca..69eeb70 100644
--- a/xen/arch/arm/README.LinuxPrimitives
+++ b/xen/arch/arm/README.LinuxPrimitives
@@ -6,29 +6,26 @@ were last updated.
 
 arm64:
 =====================================================================
 
-bitops: last sync @ v3.14-rc7 (last commit: 8e86f0b)
+bitops: last sync @ v3.16-rc6 (last commit: 8715466b6027)
 
 linux/arch/arm64/lib/bitops.S           xen/arch/arm/arm64/lib/bitops.S
 linux/arch/arm64/include/asm/bitops.h   xen/include/asm-arm/arm64/bitops.h
 
 ---------------------------------------------------------------------
 
-cmpxchg: last sync @ v3.14-rc7 (last commit: 95c4189)
+cmpxchg: last sync @ v3.16-rc6 (last commit: e1dfda9ced9b)
 
 linux/arch/arm64/include/asm/cmpxchg.h  xen/include/asm-arm/arm64/cmpxchg.h
 
-Skipped:
-  60010e5 arm64: cmpxchg: update macros to prevent warnings
-
 ---------------------------------------------------------------------
 
-atomics: last sync @ v3.14-rc7 (last commit: 95c4189)
+atomics: last sync @ v3.16-rc6 (last commit: 8715466b6027)
 
 linux/arch/arm64/include/asm/atomic.h   xen/include/asm-arm/arm64/atomic.h
 
 ---------------------------------------------------------------------
 
-spinlocks: last sync @ v3.14-rc7 (last commit: 95c4189)
+spinlocks: last sync @ v3.16-rc6 (last commit: 95c4189689f9)
 
 linux/arch/arm64/include/asm/spinlock.h xen/include/asm-arm/arm64/spinlock.h
 
@@ -38,30 +35,31 @@ Skipped:
 
 ---------------------------------------------------------------------
 
-mem*: last sync @ v3.14-rc7 (last commit: 4a89922)
+mem*: last sync @ v3.16-rc6 (last commit: d875c9b37240)
 
-linux/arch/arm64/lib/memchr.S           xen/arch/arm/arm64/lib/memchr.S
-linux/arch/arm64/lib/memcpy.S           xen/arch/arm/arm64/lib/memcpy.S
-linux/arch/arm64/lib/memmove.S          xen/arch/arm/arm64/lib/memmove.S
-linux/arch/arm64/lib/memset.S           xen/arch/arm/arm64/lib/memset.S
+linux/arch/arm64/lib/memchr.S           xen/arch/arm/arm64/lib/memchr.S
+linux/arch/arm64/lib/memcmp.S           xen/arch/arm/arm64/lib/memcmp.S
+linux/arch/arm64/lib/memcpy.S           xen/arch/arm/arm64/lib/memcpy.S
+linux/arch/arm64/lib/memmove.S          xen/arch/arm/arm64/lib/memmove.S
+linux/arch/arm64/lib/memset.S           xen/arch/arm/arm64/lib/memset.S
 
-for i in memchr.S memcpy.S memmove.S memset.S ; do
+for i in memchr.S memcmp.S memcpy.S memmove.S memset.S ; do
     diff -u linux/arch/arm64/lib/$i xen/arch/arm/arm64/lib/$i
 done
 
 ---------------------------------------------------------------------
 
-str*: last sync @ v3.14-rc7 (last commit: 2b8cac8)
+str*: last sync @ v3.16-rc6 (last commit: 2b8cac814cd5)
 
-linux/arch/arm/lib/strchr.S             xen/arch/arm/arm64/lib/strchr.S
-linux/arch/arm/lib/strrchr.S            xen/arch/arm/arm64/lib/strrchr.S
+linux/arch/arm64/lib/strchr.S           xen/arch/arm/arm64/lib/strchr.S
+linux/arch/arm64/lib/strrchr.S          xen/arch/arm/arm64/lib/strrchr.S
 
 ---------------------------------------------------------------------
 
-{clear,copy}_page: last sync @ v3.14-rc7 (last commit: f27bb13)
+{clear,copy}_page: last sync @ v3.16-rc6 (last commit: f27bb139c387)
 
-linux/arch/arm64/lib/clear_page.S       unused in Xen
-linux/arch/arm64/lib/copy_page.S        xen/arch/arm/arm64/lib/copy_page.S
+linux/arch/arm64/lib/clear_page.S       xen/arch/arm/arm64/lib/clear_page.S
+linux/arch/arm64/lib/copy_page.S        unused in Xen
 
 =====================================================================
 arm32
diff --git a/xen/arch/arm/arm64/lib/Makefile b/xen/arch/arm/arm64/lib/Makefile
index b895afa..2e7fb64 100644
--- a/xen/arch/arm/arm64/lib/Makefile
+++ b/xen/arch/arm/arm64/lib/Makefile
@@ -1,4 +1,4 @@
-obj-y += memcpy.o memmove.o memset.o memchr.o
+obj-y += memcpy.o memcmp.o memmove.o memset.o memchr.o
 obj-y += clear_page.o
 obj-y += bitops.o find_next_bit.o
 obj-y += strchr.o strrchr.o
diff --git a/xen/arch/arm/arm64/lib/assembler.h
b/xen/arch/arm/arm64/lib/assembler.h new file mode 100644 index 0000000..84669d1 --- /dev/null +++ b/xen/arch/arm/arm64/lib/assembler.h @@ -0,0 +1,13 @@ +#ifndef __ASM_ASSEMBLER_H__ +#define __ASM_ASSEMBLER_H__ + +#ifndef __ASSEMBLY__ +#error "Only include this from assembly code" +#endif + +/* Only LE support so far */ +#define CPU_BE(x...) +#define CPU_LE(x...) x + +#endif /* __ASM_ASSEMBLER_H__ */ + diff --git a/xen/arch/arm/arm64/lib/memchr.S b/xen/arch/arm/arm64/lib/memchr.S index 3cc1b01..b04590c 100644 --- a/xen/arch/arm/arm64/lib/memchr.S +++ b/xen/arch/arm/arm64/lib/memchr.S @@ -18,6 +18,7 @@ */ #include +#include "assembler.h" /* * Find a character in an area of memory. diff --git a/xen/arch/arm/arm64/lib/memcmp.S b/xen/arch/arm/arm64/lib/memcmp.S new file mode 100644 index 0000000..9aad925 --- /dev/null +++ b/xen/arch/arm/arm64/lib/memcmp.S @@ -0,0 +1,258 @@ +/* + * Copyright (C) 2013 ARM Ltd. + * Copyright (C) 2013 Linaro. + * + * This code is based on glibc cortex strings work originally authored by Linaro + * and re-licensed under GPLv2 for the Linux kernel. The original code can + * be found @ + * + * http://bazaar.launchpad.net/~linaro-toolchain-dev/cortex-strings/trunk/ + * files/head:/src/aarch64/ + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see . + */ + +#include +#include "assembler.h" + +/* +* compare memory areas(when two memory areas' offset are different, +* alignment handled by the hardware) +* +* Parameters: +* x0 - const memory area 1 pointer +* x1 - const memory area 2 pointer +* x2 - the maximal compare byte length +* Returns: +* x0 - a compare result, maybe less than, equal to, or greater than ZERO +*/ + +/* Parameters and result. */ +src1 .req x0 +src2 .req x1 +limit .req x2 +result .req x0 + +/* Internal variables. */ +data1 .req x3 +data1w .req w3 +data2 .req x4 +data2w .req w4 +has_nul .req x5 +diff .req x6 +endloop .req x7 +tmp1 .req x8 +tmp2 .req x9 +tmp3 .req x10 +pos .req x11 +limit_wd .req x12 +mask .req x13 + +ENTRY(memcmp) + cbz limit, .Lret0 + eor tmp1, src1, src2 + tst tmp1, #7 + b.ne .Lmisaligned8 + ands tmp1, src1, #7 + b.ne .Lmutual_align + sub limit_wd, limit, #1 /* limit != 0, so no underflow. */ + lsr limit_wd, limit_wd, #3 /* Convert to Dwords. */ + /* + * The input source addresses are at alignment boundary. + * Directly compare eight bytes each time. + */ +.Lloop_aligned: + ldr data1, [src1], #8 + ldr data2, [src2], #8 +.Lstart_realigned: + subs limit_wd, limit_wd, #1 + eor diff, data1, data2 /* Non-zero if differences found. */ + csinv endloop, diff, xzr, cs /* Last Dword or differences. */ + cbz endloop, .Lloop_aligned + + /* Not reached the limit, must have found a diff. */ + tbz limit_wd, #63, .Lnot_limit + + /* Limit % 8 == 0 => the diff is in the last 8 bytes. */ + ands limit, limit, #7 + b.eq .Lnot_limit + /* + * The remained bytes less than 8. It is needed to extract valid data + * from last eight bytes of the intended memory range. + */ + lsl limit, limit, #3 /* bytes-> bits. 
*/ + mov mask, #~0 +CPU_BE( lsr mask, mask, limit ) +CPU_LE( lsl mask, mask, limit ) + bic data1, data1, mask + bic data2, data2, mask + + orr diff, diff, mask + b .Lnot_limit + +.Lmutual_align: + /* + * Sources are mutually aligned, but are not currently at an + * alignment boundary. Round down the addresses and then mask off + * the bytes that precede the start point. + */ + bic src1, src1, #7 + bic src2, src2, #7 + ldr data1, [src1], #8 + ldr data2, [src2], #8 + /* + * We can not add limit with alignment offset(tmp1) here. Since the + * addition probably make the limit overflown. + */ + sub limit_wd, limit, #1/*limit != 0, so no underflow.*/ + and tmp3, limit_wd, #7 + lsr limit_wd, limit_wd, #3 + add tmp3, tmp3, tmp1 + add limit_wd, limit_wd, tmp3, lsr #3 + add limit, limit, tmp1/* Adjust the limit for the extra. */ + + lsl tmp1, tmp1, #3/* Bytes beyond alignment -> bits.*/ + neg tmp1, tmp1/* Bits to alignment -64. */ + mov tmp2, #~0 + /*mask off the non-intended bytes before the start address.*/ +CPU_BE( lsl tmp2, tmp2, tmp1 )/*Big-endian.Early bytes are at MSB*/ + /* Little-endian. Early bytes are at LSB. */ +CPU_LE( lsr tmp2, tmp2, tmp1 ) + + orr data1, data1, tmp2 + orr data2, data2, tmp2 + b .Lstart_realigned + + /*src1 and src2 have different alignment offset.*/ +.Lmisaligned8: + cmp limit, #8 + b.lo .Ltiny8proc /*limit < 8: compare byte by byte*/ + + and tmp1, src1, #7 + neg tmp1, tmp1 + add tmp1, tmp1, #8/*valid length in the first 8 bytes of src1*/ + and tmp2, src2, #7 + neg tmp2, tmp2 + add tmp2, tmp2, #8/*valid length in the first 8 bytes of src2*/ + subs tmp3, tmp1, tmp2 + csel pos, tmp1, tmp2, hi /*Choose the maximum.*/ + + sub limit, limit, pos + /*compare the proceeding bytes in the first 8 byte segment.*/ +.Ltinycmp: + ldrb data1w, [src1], #1 + ldrb data2w, [src2], #1 + subs pos, pos, #1 + ccmp data1w, data2w, #0, ne /* NZCV = 0b0000. */ + b.eq .Ltinycmp + cbnz pos, 1f /*diff occurred before the last byte.*/ + cmp data1w, data2w + b.eq .Lstart_align +1: + sub result, data1, data2 + ret + +.Lstart_align: + lsr limit_wd, limit, #3 + cbz limit_wd, .Lremain8 + + ands xzr, src1, #7 + b.eq .Lrecal_offset + /*process more leading bytes to make src1 aligned...*/ + add src1, src1, tmp3 /*backwards src1 to alignment boundary*/ + add src2, src2, tmp3 + sub limit, limit, tmp3 + lsr limit_wd, limit, #3 + cbz limit_wd, .Lremain8 + /*load 8 bytes from aligned SRC1..*/ + ldr data1, [src1], #8 + ldr data2, [src2], #8 + + subs limit_wd, limit_wd, #1 + eor diff, data1, data2 /*Non-zero if differences found.*/ + csinv endloop, diff, xzr, ne + cbnz endloop, .Lunequal_proc + /*How far is the current SRC2 from the alignment boundary...*/ + and tmp3, tmp3, #7 + +.Lrecal_offset:/*src1 is aligned now..*/ + neg pos, tmp3 +.Lloopcmp_proc: + /* + * Divide the eight bytes into two parts. First,backwards the src2 + * to an alignment boundary,load eight bytes and compare from + * the SRC2 alignment boundary. If all 8 bytes are equal,then start + * the second part's comparison. Otherwise finish the comparison. + * This special handle can garantee all the accesses are in the + * thread/task space in avoid to overrange access. + */ + ldr data1, [src1,pos] + ldr data2, [src2,pos] + eor diff, data1, data2 /* Non-zero if differences found. */ + cbnz diff, .Lnot_limit + + /*The second part process*/ + ldr data1, [src1], #8 + ldr data2, [src2], #8 + eor diff, data1, data2 /* Non-zero if differences found. 
*/ + subs limit_wd, limit_wd, #1 + csinv endloop, diff, xzr, ne/*if limit_wd is 0,will finish the cmp*/ + cbz endloop, .Lloopcmp_proc +.Lunequal_proc: + cbz diff, .Lremain8 + +/*There is differnence occured in the latest comparison.*/ +.Lnot_limit: +/* +* For little endian,reverse the low significant equal bits into MSB,then +* following CLZ can find how many equal bits exist. +*/ +CPU_LE( rev diff, diff ) +CPU_LE( rev data1, data1 ) +CPU_LE( rev data2, data2 ) + + /* + * The MS-non-zero bit of DIFF marks either the first bit + * that is different, or the end of the significant data. + * Shifting left now will bring the critical information into the + * top bits. + */ + clz pos, diff + lsl data1, data1, pos + lsl data2, data2, pos + /* + * We need to zero-extend (char is unsigned) the value and then + * perform a signed subtraction. + */ + lsr data1, data1, #56 + sub result, data1, data2, lsr #56 + ret + +.Lremain8: + /* Limit % 8 == 0 =>. all data are equal.*/ + ands limit, limit, #7 + b.eq .Lret0 + +.Ltiny8proc: + ldrb data1w, [src1], #1 + ldrb data2w, [src2], #1 + subs limit, limit, #1 + + ccmp data1w, data2w, #0, ne /* NZCV = 0b0000. */ + b.eq .Ltiny8proc + sub result, data1, data2 + ret +.Lret0: + mov result, #0 + ret +ENDPROC(memcmp) diff --git a/xen/arch/arm/arm64/lib/memcpy.S b/xen/arch/arm/arm64/lib/memcpy.S index c8197c6..7cc885d 100644 --- a/xen/arch/arm/arm64/lib/memcpy.S +++ b/xen/arch/arm/arm64/lib/memcpy.S @@ -1,5 +1,13 @@ /* * Copyright (C) 2013 ARM Ltd. + * Copyright (C) 2013 Linaro. + * + * This code is based on glibc cortex strings work originally authored by Linaro + * and re-licensed under GPLv2 for the Linux kernel. The original code can + * be found @ + * + * http://bazaar.launchpad.net/~linaro-toolchain-dev/cortex-strings/trunk/ + * files/head:/src/aarch64/ * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as @@ -15,6 +23,8 @@ */ #include +#include +#include "assembler.h" /* * Copy a buffer from src to dest (alignment handled by the hardware) @@ -26,27 +36,166 @@ * Returns: * x0 - dest */ +dstin .req x0 +src .req x1 +count .req x2 +tmp1 .req x3 +tmp1w .req w3 +tmp2 .req x4 +tmp2w .req w4 +tmp3 .req x5 +tmp3w .req w5 +dst .req x6 + +A_l .req x7 +A_h .req x8 +B_l .req x9 +B_h .req x10 +C_l .req x11 +C_h .req x12 +D_l .req x13 +D_h .req x14 + ENTRY(memcpy) - mov x4, x0 - subs x2, x2, #8 - b.mi 2f -1: ldr x3, [x1], #8 - subs x2, x2, #8 - str x3, [x4], #8 - b.pl 1b -2: adds x2, x2, #4 - b.mi 3f - ldr w3, [x1], #4 - sub x2, x2, #4 - str w3, [x4], #4 -3: adds x2, x2, #2 - b.mi 4f - ldrh w3, [x1], #2 - sub x2, x2, #2 - strh w3, [x4], #2 -4: adds x2, x2, #1 - b.mi 5f - ldrb w3, [x1] - strb w3, [x4] -5: ret + mov dst, dstin + cmp count, #16 + /*When memory length is less than 16, the accessed are not aligned.*/ + b.lo .Ltiny15 + + neg tmp2, src + ands tmp2, tmp2, #15/* Bytes to reach alignment. */ + b.eq .LSrcAligned + sub count, count, tmp2 + /* + * Copy the leading memory data from src to dst in an increasing + * address order.By this way,the risk of overwritting the source + * memory data is eliminated when the distance between src and + * dst is less than 16. The memory accesses here are alignment. 
+ */ + tbz tmp2, #0, 1f + ldrb tmp1w, [src], #1 + strb tmp1w, [dst], #1 +1: + tbz tmp2, #1, 2f + ldrh tmp1w, [src], #2 + strh tmp1w, [dst], #2 +2: + tbz tmp2, #2, 3f + ldr tmp1w, [src], #4 + str tmp1w, [dst], #4 +3: + tbz tmp2, #3, .LSrcAligned + ldr tmp1, [src],#8 + str tmp1, [dst],#8 + +.LSrcAligned: + cmp count, #64 + b.ge .Lcpy_over64 + /* + * Deal with small copies quickly by dropping straight into the + * exit block. + */ +.Ltail63: + /* + * Copy up to 48 bytes of data. At this point we only need the + * bottom 6 bits of count to be accurate. + */ + ands tmp1, count, #0x30 + b.eq .Ltiny15 + cmp tmp1w, #0x20 + b.eq 1f + b.lt 2f + ldp A_l, A_h, [src], #16 + stp A_l, A_h, [dst], #16 +1: + ldp A_l, A_h, [src], #16 + stp A_l, A_h, [dst], #16 +2: + ldp A_l, A_h, [src], #16 + stp A_l, A_h, [dst], #16 +.Ltiny15: + /* + * Prefer to break one ldp/stp into several load/store to access + * memory in an increasing address order,rather than to load/store 16 + * bytes from (src-16) to (dst-16) and to backward the src to aligned + * address,which way is used in original cortex memcpy. If keeping + * the original memcpy process here, memmove need to satisfy the + * precondition that src address is at least 16 bytes bigger than dst + * address,otherwise some source data will be overwritten when memove + * call memcpy directly. To make memmove simpler and decouple the + * memcpy's dependency on memmove, withdrew the original process. + */ + tbz count, #3, 1f + ldr tmp1, [src], #8 + str tmp1, [dst], #8 +1: + tbz count, #2, 2f + ldr tmp1w, [src], #4 + str tmp1w, [dst], #4 +2: + tbz count, #1, 3f + ldrh tmp1w, [src], #2 + strh tmp1w, [dst], #2 +3: + tbz count, #0, .Lexitfunc + ldrb tmp1w, [src] + strb tmp1w, [dst] + +.Lexitfunc: + ret + +.Lcpy_over64: + subs count, count, #128 + b.ge .Lcpy_body_large + /* + * Less than 128 bytes to copy, so handle 64 here and then jump + * to the tail. + */ + ldp A_l, A_h, [src],#16 + stp A_l, A_h, [dst],#16 + ldp B_l, B_h, [src],#16 + ldp C_l, C_h, [src],#16 + stp B_l, B_h, [dst],#16 + stp C_l, C_h, [dst],#16 + ldp D_l, D_h, [src],#16 + stp D_l, D_h, [dst],#16 + + tst count, #0x3f + b.ne .Ltail63 + ret + + /* + * Critical loop. Start at a new cache line boundary. Assuming + * 64 bytes per line this ensures the entire loop is in one line. + */ + .p2align L1_CACHE_SHIFT +.Lcpy_body_large: + /* pre-get 64 bytes data. */ + ldp A_l, A_h, [src],#16 + ldp B_l, B_h, [src],#16 + ldp C_l, C_h, [src],#16 + ldp D_l, D_h, [src],#16 +1: + /* + * interlace the load of next 64 bytes data block with store of the last + * loaded 64 bytes data. + */ + stp A_l, A_h, [dst],#16 + ldp A_l, A_h, [src],#16 + stp B_l, B_h, [dst],#16 + ldp B_l, B_h, [src],#16 + stp C_l, C_h, [dst],#16 + ldp C_l, C_h, [src],#16 + stp D_l, D_h, [dst],#16 + ldp D_l, D_h, [src],#16 + subs count, count, #64 + b.ge 1b + stp A_l, A_h, [dst],#16 + stp B_l, B_h, [dst],#16 + stp C_l, C_h, [dst],#16 + stp D_l, D_h, [dst],#16 + + tst count, #0x3f + b.ne .Ltail63 + ret ENDPROC(memcpy) diff --git a/xen/arch/arm/arm64/lib/memmove.S b/xen/arch/arm/arm64/lib/memmove.S index 1bf0936..f4065b9 100644 --- a/xen/arch/arm/arm64/lib/memmove.S +++ b/xen/arch/arm/arm64/lib/memmove.S @@ -1,5 +1,13 @@ /* * Copyright (C) 2013 ARM Ltd. + * Copyright (C) 2013 Linaro. + * + * This code is based on glibc cortex strings work originally authored by Linaro + * and re-licensed under GPLv2 for the Linux kernel. 
The original code can + * be found @ + * + * http://bazaar.launchpad.net/~linaro-toolchain-dev/cortex-strings/trunk/ + * files/head:/src/aarch64/ * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as @@ -15,6 +23,8 @@ */ #include +#include +#include "assembler.h" /* * Move a buffer from src to test (alignment handled by the hardware). @@ -27,30 +37,161 @@ * Returns: * x0 - dest */ +dstin .req x0 +src .req x1 +count .req x2 +tmp1 .req x3 +tmp1w .req w3 +tmp2 .req x4 +tmp2w .req w4 +tmp3 .req x5 +tmp3w .req w5 +dst .req x6 + +A_l .req x7 +A_h .req x8 +B_l .req x9 +B_h .req x10 +C_l .req x11 +C_h .req x12 +D_l .req x13 +D_h .req x14 + ENTRY(memmove) - cmp x0, x1 - b.ls memcpy - add x4, x0, x2 - add x1, x1, x2 - subs x2, x2, #8 - b.mi 2f -1: ldr x3, [x1, #-8]! - subs x2, x2, #8 - str x3, [x4, #-8]! - b.pl 1b -2: adds x2, x2, #4 - b.mi 3f - ldr w3, [x1, #-4]! - sub x2, x2, #4 - str w3, [x4, #-4]! -3: adds x2, x2, #2 - b.mi 4f - ldrh w3, [x1, #-2]! - sub x2, x2, #2 - strh w3, [x4, #-2]! -4: adds x2, x2, #1 - b.mi 5f - ldrb w3, [x1, #-1] - strb w3, [x4, #-1] -5: ret + cmp dstin, src + b.lo memcpy + add tmp1, src, count + cmp dstin, tmp1 + b.hs memcpy /* No overlap. */ + + add dst, dstin, count + add src, src, count + cmp count, #16 + b.lo .Ltail15 /*probably non-alignment accesses.*/ + + ands tmp2, src, #15 /* Bytes to reach alignment. */ + b.eq .LSrcAligned + sub count, count, tmp2 + /* + * process the aligned offset length to make the src aligned firstly. + * those extra instructions' cost is acceptable. It also make the + * coming accesses are based on aligned address. + */ + tbz tmp2, #0, 1f + ldrb tmp1w, [src, #-1]! + strb tmp1w, [dst, #-1]! +1: + tbz tmp2, #1, 2f + ldrh tmp1w, [src, #-2]! + strh tmp1w, [dst, #-2]! +2: + tbz tmp2, #2, 3f + ldr tmp1w, [src, #-4]! + str tmp1w, [dst, #-4]! +3: + tbz tmp2, #3, .LSrcAligned + ldr tmp1, [src, #-8]! + str tmp1, [dst, #-8]! + +.LSrcAligned: + cmp count, #64 + b.ge .Lcpy_over64 + + /* + * Deal with small copies quickly by dropping straight into the + * exit block. + */ +.Ltail63: + /* + * Copy up to 48 bytes of data. At this point we only need the + * bottom 6 bits of count to be accurate. + */ + ands tmp1, count, #0x30 + b.eq .Ltail15 + cmp tmp1w, #0x20 + b.eq 1f + b.lt 2f + ldp A_l, A_h, [src, #-16]! + stp A_l, A_h, [dst, #-16]! +1: + ldp A_l, A_h, [src, #-16]! + stp A_l, A_h, [dst, #-16]! +2: + ldp A_l, A_h, [src, #-16]! + stp A_l, A_h, [dst, #-16]! + +.Ltail15: + tbz count, #3, 1f + ldr tmp1, [src, #-8]! + str tmp1, [dst, #-8]! +1: + tbz count, #2, 2f + ldr tmp1w, [src, #-4]! + str tmp1w, [dst, #-4]! +2: + tbz count, #1, 3f + ldrh tmp1w, [src, #-2]! + strh tmp1w, [dst, #-2]! +3: + tbz count, #0, .Lexitfunc + ldrb tmp1w, [src, #-1] + strb tmp1w, [dst, #-1] + +.Lexitfunc: + ret + +.Lcpy_over64: + subs count, count, #128 + b.ge .Lcpy_body_large + /* + * Less than 128 bytes to copy, so handle 64 bytes here and then jump + * to the tail. + */ + ldp A_l, A_h, [src, #-16] + stp A_l, A_h, [dst, #-16] + ldp B_l, B_h, [src, #-32] + ldp C_l, C_h, [src, #-48] + stp B_l, B_h, [dst, #-32] + stp C_l, C_h, [dst, #-48] + ldp D_l, D_h, [src, #-64]! + stp D_l, D_h, [dst, #-64]! + + tst count, #0x3f + b.ne .Ltail63 + ret + + /* + * Critical loop. Start at a new cache line boundary. Assuming + * 64 bytes per line this ensures the entire loop is in one line. + */ + .p2align L1_CACHE_SHIFT +.Lcpy_body_large: + /* pre-load 64 bytes data. 
*/ + ldp A_l, A_h, [src, #-16] + ldp B_l, B_h, [src, #-32] + ldp C_l, C_h, [src, #-48] + ldp D_l, D_h, [src, #-64]! +1: + /* + * interlace the load of next 64 bytes data block with store of the last + * loaded 64 bytes data. + */ + stp A_l, A_h, [dst, #-16] + ldp A_l, A_h, [src, #-16] + stp B_l, B_h, [dst, #-32] + ldp B_l, B_h, [src, #-32] + stp C_l, C_h, [dst, #-48] + ldp C_l, C_h, [src, #-48] + stp D_l, D_h, [dst, #-64]! + ldp D_l, D_h, [src, #-64]! + subs count, count, #64 + b.ge 1b + stp A_l, A_h, [dst, #-16] + stp B_l, B_h, [dst, #-32] + stp C_l, C_h, [dst, #-48] + stp D_l, D_h, [dst, #-64]! + + tst count, #0x3f + b.ne .Ltail63 + ret ENDPROC(memmove) diff --git a/xen/arch/arm/arm64/lib/memset.S b/xen/arch/arm/arm64/lib/memset.S index 25a4fb6..4ee714d 100644 --- a/xen/arch/arm/arm64/lib/memset.S +++ b/xen/arch/arm/arm64/lib/memset.S @@ -1,5 +1,13 @@ /* * Copyright (C) 2013 ARM Ltd. + * Copyright (C) 2013 Linaro. + * + * This code is based on glibc cortex strings work originally authored by Linaro + * and re-licensed under GPLv2 for the Linux kernel. The original code can + * be found @ + * + * http://bazaar.launchpad.net/~linaro-toolchain-dev/cortex-strings/trunk/ + * files/head:/src/aarch64/ * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as @@ -15,6 +23,8 @@ */ #include +#include +#include "assembler.h" /* * Fill in the buffer with character c (alignment handled by the hardware) @@ -26,27 +36,181 @@ * Returns: * x0 - buf */ + +dstin .req x0 +val .req w1 +count .req x2 +tmp1 .req x3 +tmp1w .req w3 +tmp2 .req x4 +tmp2w .req w4 +zva_len_x .req x5 +zva_len .req w5 +zva_bits_x .req x6 + +A_l .req x7 +A_lw .req w7 +dst .req x8 +tmp3w .req w9 +tmp3 .req x9 + ENTRY(memset) - mov x4, x0 - and w1, w1, #0xff - orr w1, w1, w1, lsl #8 - orr w1, w1, w1, lsl #16 - orr x1, x1, x1, lsl #32 - subs x2, x2, #8 - b.mi 2f -1: str x1, [x4], #8 - subs x2, x2, #8 - b.pl 1b -2: adds x2, x2, #4 - b.mi 3f - sub x2, x2, #4 - str w1, [x4], #4 -3: adds x2, x2, #2 - b.mi 4f - sub x2, x2, #2 - strh w1, [x4], #2 -4: adds x2, x2, #1 - b.mi 5f - strb w1, [x4] -5: ret + mov dst, dstin /* Preserve return value. */ + and A_lw, val, #255 + orr A_lw, A_lw, A_lw, lsl #8 + orr A_lw, A_lw, A_lw, lsl #16 + orr A_l, A_l, A_l, lsl #32 + + cmp count, #15 + b.hi .Lover16_proc + /*All store maybe are non-aligned..*/ + tbz count, #3, 1f + str A_l, [dst], #8 +1: + tbz count, #2, 2f + str A_lw, [dst], #4 +2: + tbz count, #1, 3f + strh A_lw, [dst], #2 +3: + tbz count, #0, 4f + strb A_lw, [dst] +4: + ret + +.Lover16_proc: + /*Whether the start address is aligned with 16.*/ + neg tmp2, dst + ands tmp2, tmp2, #15 + b.eq .Laligned +/* +* The count is not less than 16, we can use stp to store the start 16 bytes, +* then adjust the dst aligned with 16.This process will make the current +* memory address at alignment boundary. +*/ + stp A_l, A_l, [dst] /*non-aligned store..*/ + /*make the dst aligned..*/ + sub count, count, tmp2 + add dst, dst, tmp2 + +.Laligned: + cbz A_l, .Lzero_mem + +.Ltail_maybe_long: + cmp count, #64 + b.ge .Lnot_short +.Ltail63: + ands tmp1, count, #0x30 + b.eq 3f + cmp tmp1w, #0x20 + b.eq 1f + b.lt 2f + stp A_l, A_l, [dst], #16 +1: + stp A_l, A_l, [dst], #16 +2: + stp A_l, A_l, [dst], #16 +/* +* The last store length is less than 16,use stp to write last 16 bytes. +* It will lead some bytes written twice and the access is non-aligned. 
+*/ +3: + ands count, count, #15 + cbz count, 4f + add dst, dst, count + stp A_l, A_l, [dst, #-16] /* Repeat some/all of last store. */ +4: + ret + + /* + * Critical loop. Start at a new cache line boundary. Assuming + * 64 bytes per line, this ensures the entire loop is in one line. + */ + .p2align L1_CACHE_SHIFT +.Lnot_short: + sub dst, dst, #16/* Pre-bias. */ + sub count, count, #64 +1: + stp A_l, A_l, [dst, #16] + stp A_l, A_l, [dst, #32] + stp A_l, A_l, [dst, #48] + stp A_l, A_l, [dst, #64]! + subs count, count, #64 + b.ge 1b + tst count, #0x3f + add dst, dst, #16 + b.ne .Ltail63 +.Lexitfunc: + ret + + /* + * For zeroing memory, check to see if we can use the ZVA feature to + * zero entire 'cache' lines. + */ +.Lzero_mem: + cmp count, #63 + b.le .Ltail63 + /* + * For zeroing small amounts of memory, it's not worth setting up + * the line-clear code. + */ + cmp count, #128 + b.lt .Lnot_short /*count is at least 128 bytes*/ + + mrs tmp1, dczid_el0 + tbnz tmp1, #4, .Lnot_short + mov tmp3w, #4 + and zva_len, tmp1w, #15 /* Safety: other bits reserved. */ + lsl zva_len, tmp3w, zva_len + + ands tmp3w, zva_len, #63 + /* + * ensure the zva_len is not less than 64. + * It is not meaningful to use ZVA if the block size is less than 64. + */ + b.ne .Lnot_short +.Lzero_by_line: + /* + * Compute how far we need to go to become suitably aligned. We're + * already at quad-word alignment. + */ + cmp count, zva_len_x + b.lt .Lnot_short /* Not enough to reach alignment. */ + sub zva_bits_x, zva_len_x, #1 + neg tmp2, dst + ands tmp2, tmp2, zva_bits_x + b.eq 2f /* Already aligned. */ + /* Not aligned, check that there's enough to copy after alignment.*/ + sub tmp1, count, tmp2 + /* + * grantee the remain length to be ZVA is bigger than 64, + * avoid to make the 2f's process over mem range.*/ + cmp tmp1, #64 + ccmp tmp1, zva_len_x, #8, ge /* NZCV=0b1000 */ + b.lt .Lnot_short + /* + * We know that there's at least 64 bytes to zero and that it's safe + * to overrun by 64 bytes. + */ + mov count, tmp1 +1: + stp A_l, A_l, [dst] + stp A_l, A_l, [dst, #16] + stp A_l, A_l, [dst, #32] + subs tmp2, tmp2, #64 + stp A_l, A_l, [dst, #48] + add dst, dst, #64 + b.ge 1b + /* We've overrun a bit, so adjust dst downwards.*/ + add dst, dst, tmp2 +2: + sub count, count, zva_len_x +3: + dc zva, dst + add dst, dst, zva_len_x + subs count, count, zva_len_x + b.ge 3b + ands count, count, zva_bits_x + b.ne .Ltail_maybe_long + ret ENDPROC(memset) diff --git a/xen/include/asm-arm/arm32/cmpxchg.h b/xen/include/asm-arm/arm32/cmpxchg.h index 3f4e7a1..9a511f2 100644 --- a/xen/include/asm-arm/arm32/cmpxchg.h +++ b/xen/include/asm-arm/arm32/cmpxchg.h @@ -40,6 +40,9 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size return ret; } +#define xchg(ptr,x) \ + ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr)))) + /* * Atomic compare and exchange. Compare OLD with MEM, if identical, * store NEW in MEM. Return the initial value in MEM. 
Success is diff --git a/xen/include/asm-arm/arm64/atomic.h b/xen/include/asm-arm/arm64/atomic.h index b5d50f2..b49219e 100644 --- a/xen/include/asm-arm/arm64/atomic.h +++ b/xen/include/asm-arm/arm64/atomic.h @@ -136,11 +136,6 @@ static inline int __atomic_add_unless(atomic_t *v, int a, int u) #define atomic_add_negative(i,v) (atomic_add_return(i, v) < 0) -#define smp_mb__before_atomic_dec() smp_mb() -#define smp_mb__after_atomic_dec() smp_mb() -#define smp_mb__before_atomic_inc() smp_mb() -#define smp_mb__after_atomic_inc() smp_mb() - #endif /* * Local variables: diff --git a/xen/include/asm-arm/arm64/cmpxchg.h b/xen/include/asm-arm/arm64/cmpxchg.h index 4e930ce..ae42b2f 100644 --- a/xen/include/asm-arm/arm64/cmpxchg.h +++ b/xen/include/asm-arm/arm64/cmpxchg.h @@ -54,7 +54,12 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size } #define xchg(ptr,x) \ - ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr)))) +({ \ + __typeof__(*(ptr)) __ret; \ + __ret = (__typeof__(*(ptr))) \ + __xchg((unsigned long)(x), (ptr), sizeof(*(ptr))); \ + __ret; \ +}) extern void __bad_cmpxchg(volatile void *ptr, int size); @@ -144,17 +149,23 @@ static inline unsigned long __cmpxchg_mb(volatile void *ptr, unsigned long old, return ret; } -#define cmpxchg(ptr,o,n) \ - ((__typeof__(*(ptr)))__cmpxchg_mb((ptr), \ - (unsigned long)(o), \ - (unsigned long)(n), \ - sizeof(*(ptr)))) - -#define cmpxchg_local(ptr,o,n) \ - ((__typeof__(*(ptr)))__cmpxchg((ptr), \ - (unsigned long)(o), \ - (unsigned long)(n), \ - sizeof(*(ptr)))) +#define cmpxchg(ptr, o, n) \ +({ \ + __typeof__(*(ptr)) __ret; \ + __ret = (__typeof__(*(ptr))) \ + __cmpxchg_mb((ptr), (unsigned long)(o), (unsigned long)(n), \ + sizeof(*(ptr))); \ + __ret; \ +}) + +#define cmpxchg_local(ptr, o, n) \ +({ \ + __typeof__(*(ptr)) __ret; \ + __ret = (__typeof__(*(ptr))) \ + __cmpxchg((ptr), (unsigned long)(o), \ + (unsigned long)(n), sizeof(*(ptr))); \ + __ret; \ +}) #endif /* diff --git a/xen/include/asm-arm/string.h b/xen/include/asm-arm/string.h index 3242762..dfad1fe 100644 --- a/xen/include/asm-arm/string.h +++ b/xen/include/asm-arm/string.h @@ -17,6 +17,11 @@ extern char * strchr(const char * s, int c); #define __HAVE_ARCH_MEMCPY extern void * memcpy(void *, const void *, __kernel_size_t); +#if defined(CONFIG_ARM_64) +#define __HAVE_ARCH_MEMCMP +extern int memcmp(const void *, const void *, __kernel_size_t); +#endif + /* Some versions of gcc don't have this builtin. It's non-critical anyway. */ #define __HAVE_ARCH_MEMMOVE extern void *memmove(void *dest, const void *src, size_t n); diff --git a/xen/include/asm-arm/system.h b/xen/include/asm-arm/system.h index 7aaaf50..ce3d38a 100644 --- a/xen/include/asm-arm/system.h +++ b/xen/include/asm-arm/system.h @@ -33,9 +33,6 @@ #define smp_wmb() dmb(ishst) -#define xchg(ptr,x) \ - ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr)))) - /* * This is used to ensure the compiler did actually allocate the register we * asked it for some inline assembly sequences. Apparently we can't trust