From patchwork Fri Oct 9 16:16:05 2015
X-Patchwork-Submitter: Christophe Lyon
X-Patchwork-Id: 54712
Date: Fri, 9 Oct 2015 18:16:05 +0200
Subject: Re: [AArch64_be] Fix vtbl[34] and vtbx4
From: Christophe Lyon <christophe.lyon@linaro.org>
To: James Greenhalgh
Cc: "gcc-patches@gcc.gnu.org"
In-Reply-To: <20151008091230.GA13098@arm.com>
References: <20151007150941.GA31205@arm.com> <20151008091230.GA13098@arm.com>

On 8 October 2015 at 11:12, James Greenhalgh wrote:
> On Wed, Oct 07, 2015 at 09:07:30PM +0100, Christophe Lyon wrote:
>> On 7 October 2015 at 17:09, James Greenhalgh wrote:
>> > On Tue, Sep 15, 2015 at 05:25:25PM +0100, Christophe Lyon wrote:
>> >
>> > Why do we want this for vtbx4 rather than putting out a VTBX instruction
>> > directly (as in the inline asm versions you replace)?
>> >
>> I just followed the pattern used for vtbx3.
>>
>> > This sequence does make sense for vtbx3.
>> In fact, I don't see why vtbx3 and vtbx4 should be different?
>
> The difference between TBL and TBX is in their handling of a request to
> select an out-of-range value. For TBL this returns zero, for TBX this
> returns the value which was already in the destination register.
>
> Because the byte-vectors used by the TBX instruction in aarch64 are 128-bit
> (so two of them together allow selecting elements in the range 0-31), and
> vtbx3 needs to emulate the AArch32 behaviour of picking elements from 3x64-bit
> vectors (allowing elements in the range 0-23), we need to manually check for
> values which would have been out-of-range on AArch32, but are not out
> of range for AArch64, and handle them appropriately. For vtbx4 on the other
> hand, 2x128-bit registers give the range 0..31 and 4x64-bit registers also give
> the range 0..31, so we don't need the special masked handling.
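As a side note for readers of the thread: a scalar C model of the TBL/TBX
semantics described above could look like the sketch below. This is purely
illustrative and is not part of the patch; 'nbytes' stands for the size of
the byte table (16 or 32 for the AArch64 forms, 8 to 32 in steps of 8 for
the AArch32 ones).

  /* Illustrative scalar model only, not the GCC implementation.  */
  static inline unsigned char
  tbl_byte (const unsigned char *table, unsigned nbytes, unsigned char idx)
  {
    /* TBL: an out-of-range index selects zero.  */
    return idx < nbytes ? table[idx] : 0;
  }

  static inline unsigned char
  tbx_byte (const unsigned char *table, unsigned nbytes, unsigned char idx,
            unsigned char dest)
  {
    /* TBX: an out-of-range index keeps the byte already in the destination.  */
    return idx < nbytes ? table[idx] : dest;
  }

In these terms, vtbx4 has a 32-byte AArch32 table (4x64-bit) and a 32-byte
AArch64 table (2x128-bit), so the in-range sets coincide and TBX applies
directly; vtbx3 has only a 24-byte AArch32 table against a 32-byte AArch64
one, which is why indices 24..31 need the extra mask-and-select step.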
>
> You can find the suggested instruction sequences for the Neon intrinsics
> in this document:
>
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf
>

Hi James,

Please find attached an updated version which hopefully addresses your
comments.

Tested on aarch64-none-elf and aarch64_be-none-elf using the Foundation
Model.

OK?

Christophe.

>> >> /* vtrn */
>> >>
>> >> __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
>> >> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
>> >> index b8a45d1..dfbd9cd 100644
>> >> --- a/gcc/config/aarch64/iterators.md
>> >> +++ b/gcc/config/aarch64/iterators.md
>> >> @@ -100,6 +100,8 @@
>> >>  ;; All modes.
>> >>  (define_mode_iterator VALL [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF V2DF])
>> >>
>> >> +(define_mode_iterator V8Q [V8QI])
>> >> +
>> >
>> > This can be dropped if you use VAR1 in aarch64-builtins.c.
>> >
>> > Thanks for working on this, with your patch applied, the only
>> > remaining intrinsics I see failing for aarch64_be are:
>> >
>> >   vqtbl2_*8
>> >   vqtbl2q_*8
>> >   vqtbl3_*8
>> >   vqtbl3q_*8
>> >   vqtbl4_*8
>> >   vqtbl4q_*8
>> >
>> >   vqtbx2_*8
>> >   vqtbx2q_*8
>> >   vqtbx3_*8
>> >   vqtbx3q_*8
>> >   vqtbx4_*8
>> >   vqtbx4q_*8
>> >
>> Quite possibly. Which tests are you looking at? Since these are
>> aarch64-specific, they are not part of the
>> tests I added (advsimd-intrinsics). Do you mean
>> gcc.target/aarch64/table-intrinsics.c?
>
> Sorry, yes I should have given a reference. I'm running with a variant of
> a testcase from the LLVM test-suite repository:
>
>   SingleSource/UnitTests/Vector/AArch64/aarch64_neon_intrinsics.c
>
> This has an execute test for most of the intrinsics specified for AArch64.
> It needs some modification to cover the intrinsics we don't implement yet.
>
> Thanks,
> James
>

2015-10-09  Christophe Lyon  <christophe.lyon@linaro.org>

        * config/aarch64/aarch64-simd-builtins.def: Update builtins
        tables: add tbl3 and tbx4.
        * config/aarch64/aarch64-simd.md (aarch64_tbl3v8qi): New.
        (aarch64_tbx4v8qi): New.
        * config/aarch64/arm_neon.h (vtbl3_s8, vtbl3_u8, vtbl3_p8)
        (vtbl4_s8, vtbl4_u8, vtbl4_p8, vtbx4_s8, vtbx4_u8, vtbx4_p8):
        Rewrite using builtin functions.
        * config/aarch64/iterators.md (UNSPEC_TBX): New.

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index d0f298a..c16e82c9 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -405,3 +405,8 @@
   VAR1 (BINOPP, crypto_pmull, 0, di)
   VAR1 (BINOPP, crypto_pmull, 0, v2di)
 
+  /* Implemented by aarch64_tbl3v8qi.  */
+  VAR1 (BINOP, tbl3, 0, v8qi)
+
+  /* Implemented by aarch64_tbx4v8qi.  */
+  VAR1 (TERNOP, tbx4, 0, v8qi)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 9777418..6027582 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4716,6 +4714,27 @@
   [(set_attr "type" "neon_tbl2_q")]
 )
 
+(define_insn "aarch64_tbl3v8qi"
+  [(set (match_operand:V8QI 0 "register_operand" "=w")
+        (unspec:V8QI [(match_operand:OI 1 "register_operand" "w")
+                      (match_operand:V8QI 2 "register_operand" "w")]
+                      UNSPEC_TBL))]
+  "TARGET_SIMD"
+  "tbl\\t%S0.8b, {%S1.16b - %T1.16b}, %S2.8b"
+  [(set_attr "type" "neon_tbl3")]
+)
+
+(define_insn "aarch64_tbx4v8qi"
+  [(set (match_operand:V8QI 0 "register_operand" "=w")
+        (unspec:V8QI [(match_operand:V8QI 1 "register_operand" "0")
+                      (match_operand:OI 2 "register_operand" "w")
+                      (match_operand:V8QI 3 "register_operand" "w")]
+                      UNSPEC_TBX))]
+  "TARGET_SIMD"
+  "tbx\\t%S0.8b, {%S2.16b - %T2.16b}, %S3.8b"
+  [(set_attr "type" "neon_tbl4")]
+)
+
 (define_insn_and_split "aarch64_combinev16qi"
   [(set (match_operand:OI 0 "register_operand" "=w")
         (unspec:OI [(match_operand:V16QI 1 "register_operand" "w")
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 6dfebe7..e99819e 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -10902,13 +10902,14 @@ vtbl3_s8 (int8x8x3_t tab, int8x8_t idx)
 {
   int8x8_t result;
   int8x16x2_t temp;
+  __builtin_aarch64_simd_oi __o;
   temp.val[0] = vcombine_s8 (tab.val[0], tab.val[1]);
   temp.val[1] = vcombine_s8 (tab.val[2], vcreate_s8 (__AARCH64_UINT64_C (0x0)));
-  __asm__ ("ld1 {v16.16b - v17.16b }, %1\n\t"
-           "tbl %0.8b, {v16.16b - v17.16b}, %2.8b\n\t"
-           : "=w"(result)
-           : "Q"(temp), "w"(idx)
-           : "v16", "v17", "memory");
+  __o = __builtin_aarch64_set_qregoiv16qi (__o,
+                                           (int8x16_t) temp.val[0], 0);
+  __o = __builtin_aarch64_set_qregoiv16qi (__o,
+                                           (int8x16_t) temp.val[1], 1);
+  result = __builtin_aarch64_tbl3v8qi (__o, idx);
   return result;
 }
 
@@ -10917,13 +10918,14 @@ vtbl3_u8 (uint8x8x3_t tab, uint8x8_t idx)
 {
   uint8x8_t result;
   uint8x16x2_t temp;
+  __builtin_aarch64_simd_oi __o;
   temp.val[0] = vcombine_u8 (tab.val[0], tab.val[1]);
   temp.val[1] = vcombine_u8 (tab.val[2], vcreate_u8 (__AARCH64_UINT64_C (0x0)));
-  __asm__ ("ld1 {v16.16b - v17.16b }, %1\n\t"
-           "tbl %0.8b, {v16.16b - v17.16b}, %2.8b\n\t"
-           : "=w"(result)
-           : "Q"(temp), "w"(idx)
-           : "v16", "v17", "memory");
+  __o = __builtin_aarch64_set_qregoiv16qi (__o,
+                                           (int8x16_t) temp.val[0], 0);
+  __o = __builtin_aarch64_set_qregoiv16qi (__o,
+                                           (int8x16_t) temp.val[1], 1);
+  result = (uint8x8_t)__builtin_aarch64_tbl3v8qi (__o, (int8x8_t)idx);
   return result;
 }
 
@@ -10932,13 +10934,14 @@ vtbl3_p8 (poly8x8x3_t tab, uint8x8_t idx)
 {
   poly8x8_t result;
   poly8x16x2_t temp;
+  __builtin_aarch64_simd_oi __o;
   temp.val[0] = vcombine_p8 (tab.val[0], tab.val[1]);
   temp.val[1] = vcombine_p8 (tab.val[2], vcreate_p8 (__AARCH64_UINT64_C (0x0)));
-  __asm__ ("ld1 {v16.16b - v17.16b }, %1\n\t"
-           "tbl %0.8b, {v16.16b - v17.16b}, %2.8b\n\t"
-           : "=w"(result)
-           : "Q"(temp), "w"(idx)
-           : "v16", "v17", "memory");
+  __o = __builtin_aarch64_set_qregoiv16qi (__o,
+                                           (int8x16_t) temp.val[0], 0);
+  __o = __builtin_aarch64_set_qregoiv16qi (__o,
+                                           (int8x16_t) temp.val[1], 1);
+  result = (poly8x8_t)__builtin_aarch64_tbl3v8qi (__o, (int8x8_t)idx);
   return result;
 }
 
@@ -10947,13 +10950,14 @@ vtbl4_s8 (int8x8x4_t tab, int8x8_t idx)
 {
   int8x8_t result;
   int8x16x2_t temp;
+  __builtin_aarch64_simd_oi __o;
   temp.val[0] = vcombine_s8 (tab.val[0], tab.val[1]);
   temp.val[1] = vcombine_s8 (tab.val[2], tab.val[3]);
__asm__ ("ld1 {v16.16b - v17.16b }, %1\n\t" - "tbl %0.8b, {v16.16b - v17.16b}, %2.8b\n\t" - : "=w"(result) - : "Q"(temp), "w"(idx) - : "v16", "v17", "memory"); + __o = __builtin_aarch64_set_qregoiv16qi (__o, + (int8x16_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregoiv16qi (__o, + (int8x16_t) temp.val[1], 1); + result = __builtin_aarch64_tbl3v8qi (__o, idx); return result; } @@ -10962,13 +10966,14 @@ vtbl4_u8 (uint8x8x4_t tab, uint8x8_t idx) { uint8x8_t result; uint8x16x2_t temp; + __builtin_aarch64_simd_oi __o; temp.val[0] = vcombine_u8 (tab.val[0], tab.val[1]); temp.val[1] = vcombine_u8 (tab.val[2], tab.val[3]); - __asm__ ("ld1 {v16.16b - v17.16b }, %1\n\t" - "tbl %0.8b, {v16.16b - v17.16b}, %2.8b\n\t" - : "=w"(result) - : "Q"(temp), "w"(idx) - : "v16", "v17", "memory"); + __o = __builtin_aarch64_set_qregoiv16qi (__o, + (int8x16_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregoiv16qi (__o, + (int8x16_t) temp.val[1], 1); + result = (uint8x8_t)__builtin_aarch64_tbl3v8qi (__o, (int8x8_t)idx); return result; } @@ -10977,13 +10982,14 @@ vtbl4_p8 (poly8x8x4_t tab, uint8x8_t idx) { poly8x8_t result; poly8x16x2_t temp; + __builtin_aarch64_simd_oi __o; temp.val[0] = vcombine_p8 (tab.val[0], tab.val[1]); temp.val[1] = vcombine_p8 (tab.val[2], tab.val[3]); - __asm__ ("ld1 {v16.16b - v17.16b }, %1\n\t" - "tbl %0.8b, {v16.16b - v17.16b}, %2.8b\n\t" - : "=w"(result) - : "Q"(temp), "w"(idx) - : "v16", "v17", "memory"); + __o = __builtin_aarch64_set_qregoiv16qi (__o, + (int8x16_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregoiv16qi (__o, + (int8x16_t) temp.val[1], 1); + result = (poly8x8_t)__builtin_aarch64_tbl3v8qi (__o, (int8x8_t)idx); return result; } @@ -11023,51 +11029,6 @@ vtbx2_p8 (poly8x8_t r, poly8x8x2_t tab, uint8x8_t idx) return result; } -__extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) -vtbx4_s8 (int8x8_t r, int8x8x4_t tab, int8x8_t idx) -{ - int8x8_t result = r; - int8x16x2_t temp; - temp.val[0] = vcombine_s8 (tab.val[0], tab.val[1]); - temp.val[1] = vcombine_s8 (tab.val[2], tab.val[3]); - __asm__ ("ld1 {v16.16b - v17.16b }, %1\n\t" - "tbx %0.8b, {v16.16b - v17.16b}, %2.8b\n\t" - : "+w"(result) - : "Q"(temp), "w"(idx) - : "v16", "v17", "memory"); - return result; -} - -__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) -vtbx4_u8 (uint8x8_t r, uint8x8x4_t tab, uint8x8_t idx) -{ - uint8x8_t result = r; - uint8x16x2_t temp; - temp.val[0] = vcombine_u8 (tab.val[0], tab.val[1]); - temp.val[1] = vcombine_u8 (tab.val[2], tab.val[3]); - __asm__ ("ld1 {v16.16b - v17.16b }, %1\n\t" - "tbx %0.8b, {v16.16b - v17.16b}, %2.8b\n\t" - : "+w"(result) - : "Q"(temp), "w"(idx) - : "v16", "v17", "memory"); - return result; -} - -__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) -vtbx4_p8 (poly8x8_t r, poly8x8x4_t tab, uint8x8_t idx) -{ - poly8x8_t result = r; - poly8x16x2_t temp; - temp.val[0] = vcombine_p8 (tab.val[0], tab.val[1]); - temp.val[1] = vcombine_p8 (tab.val[2], tab.val[3]); - __asm__ ("ld1 {v16.16b - v17.16b }, %1\n\t" - "tbx %0.8b, {v16.16b - v17.16b}, %2.8b\n\t" - : "+w"(result) - : "Q"(temp), "w"(idx) - : "v16", "v17", "memory"); - return result; -} - /* End of temporary inline asm. */ /* Start of optimal implementations in approved order. 
@@ -23221,6 +23182,58 @@ vtbx3_p8 (poly8x8_t __r, poly8x8x3_t __tab, uint8x8_t __idx)
   return vbsl_p8 (__mask, __tbl, __r);
 }
 
+/* vtbx4 */
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vtbx4_s8 (int8x8_t __r, int8x8x4_t __tab, int8x8_t __idx)
+{
+  int8x8_t result;
+  int8x16x2_t temp;
+  __builtin_aarch64_simd_oi __o;
+  temp.val[0] = vcombine_s8 (__tab.val[0], __tab.val[1]);
+  temp.val[1] = vcombine_s8 (__tab.val[2], __tab.val[3]);
+  __o = __builtin_aarch64_set_qregoiv16qi (__o,
+                                           (int8x16_t) temp.val[0], 0);
+  __o = __builtin_aarch64_set_qregoiv16qi (__o,
+                                           (int8x16_t) temp.val[1], 1);
+  result = __builtin_aarch64_tbx4v8qi (__r, __o, __idx);
+  return result;
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vtbx4_u8 (uint8x8_t __r, uint8x8x4_t __tab, uint8x8_t __idx)
+{
+  uint8x8_t result;
+  uint8x16x2_t temp;
+  __builtin_aarch64_simd_oi __o;
+  temp.val[0] = vcombine_u8 (__tab.val[0], __tab.val[1]);
+  temp.val[1] = vcombine_u8 (__tab.val[2], __tab.val[3]);
+  __o = __builtin_aarch64_set_qregoiv16qi (__o,
+                                           (int8x16_t) temp.val[0], 0);
+  __o = __builtin_aarch64_set_qregoiv16qi (__o,
+                                           (int8x16_t) temp.val[1], 1);
+  result = (uint8x8_t)__builtin_aarch64_tbx4v8qi ((int8x8_t)__r, __o,
+                                                  (int8x8_t)__idx);
+  return result;
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vtbx4_p8 (poly8x8_t __r, poly8x8x4_t __tab, uint8x8_t __idx)
+{
+  poly8x8_t result;
+  poly8x16x2_t temp;
+  __builtin_aarch64_simd_oi __o;
+  temp.val[0] = vcombine_p8 (__tab.val[0], __tab.val[1]);
+  temp.val[1] = vcombine_p8 (__tab.val[2], __tab.val[3]);
+  __o = __builtin_aarch64_set_qregoiv16qi (__o,
+                                           (int8x16_t) temp.val[0], 0);
+  __o = __builtin_aarch64_set_qregoiv16qi (__o,
+                                           (int8x16_t) temp.val[1], 1);
+  result = (poly8x8_t)__builtin_aarch64_tbx4v8qi ((int8x8_t)__r, __o,
+                                                  (int8x8_t)__idx);
+  return result;
+}
+
 /* vtrn */
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index b8a45d1..d856117 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -253,6 +253,7 @@
     UNSPEC_USHLL        ; Used in aarch64-simd.md.
     UNSPEC_ADDP         ; Used in aarch64-simd.md.
     UNSPEC_TBL          ; Used in vector permute patterns.
+    UNSPEC_TBX          ; Used in vector permute patterns.
    UNSPEC_CONCAT       ; Used in vector permute patterns.
     UNSPEC_ZIP1         ; Used in vector permute patterns.
     UNSPEC_ZIP2         ; Used in vector permute patterns.
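For completeness, here is a small self-contained usage sketch for the
reworked vtbx4_s8 (a hypothetical example, not part of the patch or of the
testsuite changes): in-range indices select bytes from the 32-byte table
formed by the four input vectors, while out-of-range indices keep the
corresponding byte of the first argument.

  #include <arm_neon.h>
  #include <stdio.h>

  int
  main (void)
  {
    int8_t bytes[32];
    for (int i = 0; i < 32; i++)
      bytes[i] = i;

    /* Build the 4x64-bit lookup table holding values 0..31.  */
    int8x8x4_t tab;
    tab.val[0] = vld1_s8 (bytes);
    tab.val[1] = vld1_s8 (bytes + 8);
    tab.val[2] = vld1_s8 (bytes + 16);
    tab.val[3] = vld1_s8 (bytes + 24);

    int8_t r_init[8]  = { -1, -1, -1, -1, -1, -1, -1, -1 };
    int8_t indices[8] = { 0, 7, 15, 23, 31, 32, 40, 100 };
    int8x8_t r   = vld1_s8 (r_init);
    int8x8_t idx = vld1_s8 (indices);

    int8_t out[8];
    vst1_s8 (out, vtbx4_s8 (r, tab, idx));

    /* Expected output: 0 7 15 23 31 -1 -1 -1, since indices >= 32 keep
       the corresponding byte of 'r'.  */
    for (int i = 0; i < 8; i++)
      printf ("%d ", out[i]);
    printf ("\n");
    return 0;
  }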