From patchwork Wed Jan 20 15:22:11 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Greenhalgh
X-Patchwork-Id: 60039
Delivered-To: patch@linaro.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
From: James Greenhalgh
Subject: [Patch AArch64] GCC 6 regression in vector performance. - Fix vector initialization to happen with lane load instructions.
Date: Wed, 20 Jan 2016 15:22:11 +0000
Message-ID: <1453303331-14492-1-git-send-email-james.greenhalgh@arm.com>

Hi,

In a number of cases where we try to create vectors we end up spilling
to the stack and then filling. This is one example distilled from a
couple of micro-benchmarks where the issue shows up. The reason for the
extra cost in this case is the unnecessary use of the stack. The patch
attempts to finesse this by using lane loads or vector inserts to
produce the right results.

This patch is mostly Ramana's work; I've just cleaned it up a little.

This has been in a number of our trees lately, and we haven't seen any
regressions. I've also bootstrapped and tested it, and run a set of
benchmarks to show no regressions on Cortex-A57 or Cortex-A53.

The patch fixes some regressions caused by the more aggressive
vectorization in GCC 6, so I'd like to propose it to go in even though
we are in Stage 4.

OK?

Thanks,
James

---
gcc/

2016-01-20  James Greenhalgh
	    Ramana Radhakrishnan

	* config/aarch64/aarch64.c (aarch64_expand_vector_init): Refactor,
	always use lane loads to construct non-constant vectors.

gcc/testsuite/

2016-01-20  James Greenhalgh
	    Ramana Radhakrishnan

	* gcc.target/aarch64/vector_initialization_nostack.c: New.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 03bc1b9..3787b38 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10985,28 +10985,37 @@ aarch64_simd_make_constant (rtx vals)
     return NULL_RTX;
 }
 
+/* Expand a vector initialisation sequence, such that TARGET is
+   initialised to contain VALS.  */
+
 void
 aarch64_expand_vector_init (rtx target, rtx vals)
 {
   machine_mode mode = GET_MODE (target);
   machine_mode inner_mode = GET_MODE_INNER (mode);
+  /* The number of vector elements.  */
   int n_elts = GET_MODE_NUNITS (mode);
+  /* The number of vector elements which are not constant.  */
   int n_var = 0;
   rtx any_const = NULL_RTX;
+  /* The first element of vals.  */
+  rtx v0 = XVECEXP (vals, 0, 0);
   bool all_same = true;
 
+  /* Count the number of variable elements to initialise.  */
   for (int i = 0; i < n_elts; ++i)
     {
       rtx x = XVECEXP (vals, 0, i);
-      if (!CONST_INT_P (x) && !CONST_DOUBLE_P (x))
+      if (!(CONST_INT_P (x) || CONST_DOUBLE_P (x)))
 	++n_var;
       else
 	any_const = x;
 
-      if (i > 0 && !rtx_equal_p (x, XVECEXP (vals, 0, 0)))
-	all_same = false;
+      all_same &= rtx_equal_p (x, v0);
     }
 
+  /* No variable elements, hand off to aarch64_simd_make_constant which knows
+     how best to handle this.  */
   if (n_var == 0)
     {
       rtx constant = aarch64_simd_make_constant (vals);
@@ -11020,14 +11029,15 @@ aarch64_expand_vector_init (rtx target, rtx vals)
   /* Splat a single non-constant element if we can.  */
   if (all_same)
     {
-      rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, 0));
+      rtx x = copy_to_mode_reg (inner_mode, v0);
       aarch64_emit_move (target, gen_rtx_VEC_DUPLICATE (mode, x));
       return;
     }
 
-  /* Half the fields (or less) are non-constant.  Load constant then overwrite
-     varying fields.  Hope that this is more efficient than using the stack.  */
-  if (n_var <= n_elts/2)
+  /* Initialise a vector which is part-variable.  We want to first try
+     to build those lanes which are constant in the most efficient way we
+     can.  */
+  if (n_var != n_elts)
     {
       rtx copy = copy_rtx (vals);
 
@@ -11054,31 +11064,21 @@ aarch64_expand_vector_init (rtx target, rtx vals)
 	  XVECEXP (copy, 0, i) = subst;
 	}
       aarch64_expand_vector_init (target, copy);
+    }
 
-      /* Insert variables.  */
-      enum insn_code icode = optab_handler (vec_set_optab, mode);
-      gcc_assert (icode != CODE_FOR_nothing);
+  /* Insert the variable lanes directly.  */
 
-      for (int i = 0; i < n_elts; i++)
-	{
-	  rtx x = XVECEXP (vals, 0, i);
-	  if (CONST_INT_P (x) || CONST_DOUBLE_P (x))
-	    continue;
-	  x = copy_to_mode_reg (inner_mode, x);
-	  emit_insn (GEN_FCN (icode) (target, x, GEN_INT (i)));
-	}
-      return;
-    }
+  enum insn_code icode = optab_handler (vec_set_optab, mode);
+  gcc_assert (icode != CODE_FOR_nothing);
 
-  /* Construct the vector in memory one field at a time
-     and load the whole vector.  */
-  rtx mem = assign_stack_temp (mode, GET_MODE_SIZE (mode));
   for (int i = 0; i < n_elts; i++)
-    emit_move_insn (adjust_address_nv (mem, inner_mode,
-				       i * GET_MODE_SIZE (inner_mode)),
-		    XVECEXP (vals, 0, i));
-  emit_move_insn (target, mem);
-
+    {
+      rtx x = XVECEXP (vals, 0, i);
+      if (CONST_INT_P (x) || CONST_DOUBLE_P (x))
+	continue;
+      x = copy_to_mode_reg (inner_mode, x);
+      emit_insn (GEN_FCN (icode) (target, x, GEN_INT (i)));
+    }
 }
 
 static unsigned HOST_WIDE_INT
diff --git a/gcc/testsuite/gcc.target/aarch64/vector_initialization_nostack.c b/gcc/testsuite/gcc.target/aarch64/vector_initialization_nostack.c
new file mode 100644
index 0000000..bbad04d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vector_initialization_nostack.c
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -ftree-vectorize -fno-vect-cost-model" } */
+float arr_f[100][100];
+float
+f9 (void)
+{
+
+  int i;
+  float sum = 0;
+  for (i = 0; i < 100; i++)
+    sum += arr_f[i][0] * arr_f[0][i];
+  return sum;
+
+}
+
+
+int arr[100][100];
+int
+f10 (void)
+{
+
+  int i;
+  int sum = 0;
+  for (i = 0; i < 100; i++)
+    sum += arr[i][0] * arr[0][i];
+  return sum;
+
+}
+
+double arr_d[100][100];
+double
+f11 (void)
+{
+  int i;
+  double sum = 0;
+  for (i = 0; i < 100; i++)
+    sum += arr_d[i][0] * arr_d[0][i];
+  return sum;
+}
+
+char arr_c[100][100];
+char
+f12 (void)
+{
+  int i;
+  char sum = 0;
+  for (i = 0; i < 100; i++)
+    sum += arr_c[i][0] * arr_c[0][i];
+  return sum;
+}
+
+
+/* { dg-final { scan-assembler-not "sp" } } */
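
For anyone reviewing the control flow rather than the RTL details, the order of decisions in the patched aarch64_expand_vector_init can be modelled roughly as below. This is only a toy sketch in plain C, not GCC code: the names elt, plan_kind, elt_equal, plan_vector_init and old_code_used_stack are made up for illustration, and the model ignores how the constant lanes are actually materialised.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for one vector element.  This is NOT a GCC type: a real
   element is an rtx, and constness is tested with CONST_INT_P and
   CONST_DOUBLE_P.  */
typedef struct
{
  bool is_const; /* true for a compile-time constant */
  int value;     /* the constant, or an arbitrary id for a variable */
} elt;

/* Hypothetical names for the three strategies the patched function
   chooses between.  */
typedef enum
{
  PLAN_CONSTANT,    /* no variable lanes: build the whole constant */
  PLAN_SPLAT,       /* every lane identical: duplicate one value */
  PLAN_INSERT_LANES /* build any constant lanes, then insert variables */
} plan_kind;

bool
elt_equal (elt a, elt b)
{
  return a.is_const == b.is_const && a.value == b.value;
}

/* Follow the order of checks in the patched function.  *N_INSERTS
   receives the number of per-lane inserts the chosen plan needs.  */
plan_kind
plan_vector_init (const elt *vals, int n_elts, int *n_inserts)
{
  assert (n_elts > 0);
  int n_var = 0;
  bool all_same = true;

  /* Count variable lanes and check whether every lane matches lane 0.  */
  for (int i = 0; i < n_elts; ++i)
    {
      if (!vals[i].is_const)
        ++n_var;
      all_same &= elt_equal (vals[i], vals[0]);
    }

  *n_inserts = 0;
  if (n_var == 0)
    return PLAN_CONSTANT;
  if (all_same)
    return PLAN_SPLAT;

  /* After the patch there is no stack fallback: every variable lane is
     inserted directly, however many there are.  */
  *n_inserts = n_var;
  return PLAN_INSERT_LANES;
}

/* The pre-patch condition for falling back to a stack temporary.  Only
   meaningful for vectors that are neither all-constant nor a splat.  */
bool
old_code_used_stack (int n_var, int n_elts)
{
  return n_var > n_elts / 2;
}
```

For a four-lane vector with three variable lanes, the sketch reports three lane inserts, whereas the old n_var <= n_elts/2 test would have sent that case through the stack temporary.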