From patchwork Fri Jun 10 08:56:37 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
X-Patchwork-Id: 69748
Delivered-To: patch@linaro.org
Received: by 10.140.106.246 with SMTP id e109csp185078qgf;
 Fri, 10 Jun 2016 01:57:03 -0700 (PDT)
X-Received: by 10.66.216.202 with SMTP id os10mr1116810pac.91.1465549023824; 
 Fri, 10 Jun 2016 01:57:03 -0700 (PDT)
Return-Path: <gcc-patches-return-429505-patch=linaro.org@gcc.gnu.org>
Received: from sourceware.org (server1.sourceware.org. [209.132.180.131])
 by mx.google.com with ESMTPS id
 xh3si11580308pab.237.2016.06.10.01.57.03 for <patch@linaro.org>
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Fri, 10 Jun 2016 01:57:03 -0700 (PDT)
Received-SPF: pass (google.com: domain of
 gcc-patches-return-429505-patch=linaro.org@gcc.gnu.org
 designates 209.132.180.131 as permitted sender)
 client-ip=209.132.180.131; 
Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org;
 spf=pass (google.com: domain of
 gcc-patches-return-429505-patch=linaro.org@gcc.gnu.org
 designates 209.132.180.131 as permitted sender)
 smtp.mailfrom=gcc-patches-return-429505-patch=linaro.org@gcc.gnu.org;
 dmarc=fail (p=NONE dis=NONE) header.from=linaro.org
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
 :list-unsubscribe:list-archive:list-post:list-help:sender
 :mime-version:in-reply-to:references:from:date:message-id
 :subject:to:cc:content-type; q=dns; s=default; b=FSvPrO8lQk0XPcE
 W52hVBaiFU0oKINyKEfFsA9qEqrThuQcOKCnpiuyX7RUBooYo/yVXJkPOkNDBUp5
 iaRpHjPVaStj8+j6HbzgaaopOMtIVc+k02z8ESk/YjuiH6evmkM6e6pAiacFWFTh
 SkJfcxdhuzD16dgZTKdcStMyv52Q=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
 :list-unsubscribe:list-archive:list-post:list-help:sender
 :mime-version:in-reply-to:references:from:date:message-id
 :subject:to:cc:content-type; s=default; bh=3Ut66roTgEQxjpDfSBYh/
 SAkAdc=; b=rMjNgkgj9O+dbd5BmrnuBlaOTT4qOvUxEY6OOA/WTW2wGPTQOCsN2
 nCHtH5qFM4QDzCaBDnB239Vq3DuP685bBpUI/O/D2nZC5z7P7W6I48KSeGCd9TCi
 Qn8iiODa7qEvDlfyRiDPTIWJHB4iRK3wN3oZEZpe/CJNXiPl5k/TpA=
Received: (qmail 114586 invoked by alias); 10 Jun 2016 08:56:51 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <mailto:gcc-patches-unsubscribe-patch=linaro.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 114561 invoked by uid 89); 10 Jun 2016 08:56:50 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL, BAYES_00,
 RCVD_IN_DNSWL_LOW,
 SPF_PASS autolearn=ham version=3.3.2 spammy=aarch64*-*-*,
 aarch64***, dump_enabled_p, 5786
X-HELO: mail-it0-f42.google.com
Received: from mail-it0-f42.google.com (HELO mail-it0-f42.google.com)
 (209.85.214.42) by sourceware.org
 (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256
 encrypted) ESMTPS; Fri, 10 Jun 2016 08:56:40 +0000
Received: by mail-it0-f42.google.com with SMTP id a5so59357980ita.1 for
 <gcc-patches@gcc.gnu.org>; Fri, 10 Jun 2016 01:56:40 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net;
 s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:from:date
 :message-id:subject:to:cc;
 bh=90sUPwVCJuXsYZFwZzeRKQjX1ga7mnWoYTgi+tP2uCs=;
 b=LqmIBQ+GkiTEzLCgV/FarxakQNDPu9FKGvBxxLPbVcEM/eHNgxKoWOz0KnBBaUXovD
 a0u+7582q8BQMZrcSbX0CNs6RfegtBTuMZ7SZnwZEL99nKeO5tYl3cy7ElI+o7A34sK8
 /AAQaxznFoI1OVUL+7qspPKedLsdx4p6TjRxY1VZnKFLFMraZ6pPgQD0P7DY3I3yTaut
 FQLoa/n0M0Cyao4HRmpOoGbY5g+rGU1fvV/bitymSEKr5TQpUHOvJl7rga15UjWqtB+n
 YLUBN9zS2GRJa7khxsBpPovkw2jcDZuMiud6iZQKAMFgRp/XFPZPyd8jIOhwGmyBBGrf
 LwuA==
X-Gm-Message-State: ALyK8tLHsKi3MpcJpnJkjT/KdfAgfOrpBY1uirOa3ML0nHqOFjAH3TdW8PDvxhvmjAo2Mw3WQ1RxIiUWYtCBv7Tt
X-Received: by 10.36.123.77 with SMTP id q74mr27800898itc.44.1465548998285;
 Fri, 10 Jun 2016 01:56:38 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.36.48.133 with HTTP; Fri, 10 Jun 2016 01:56:37 -0700 (PDT)
In-Reply-To: <CAJA7tRZebyUSrU1i9ecH=6O5n9imHiQ4SaTAiur=bPvTgswR2A@mail.gmail.com>
References: <CAAgBjMk0Hdask2JU8xs4fj_Ai1e0ggxB+h3ayb=NOGQBYJ8ccQ@mail.gmail.com>
 <55BB4127.5050202@foss.arm.com>
 <CAAgBjMnteaOO=JFifLD1CG7s-JE5r=65wcQm+HBwSMtpwg3ReA@mail.gmail.com>
 <CAJA7tRaf8sZHmzHCt0n8rGJKt4bQycfSU+ABmb8rbcKb1VFQ=A@mail.gmail.com>
 <CAAgBjM=X8rXnDnNV4-sWC_0fCt9hcvuOx2-hJ=vwsAb1oBq22Q@mail.gmail.com>
 <CAAgBjMkh=0tyzaMjqpSQg_mV8GYH-Tx9WV5xjHs2k0H0=UeqtQ@mail.gmail.com>
 <CAJA7tRZebyUSrU1i9ecH=6O5n9imHiQ4SaTAiur=bPvTgswR2A@mail.gmail.com>
From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
Date: Fri, 10 Jun 2016 14:26:37 +0530
Message-ID: <CAAgBjM=YH2YQMAN+Nk__81x1rsfJSwQCOcgTYotbk8ksHfTV8A@mail.gmail.com>
Subject: Re: [ARM] implement division using vrecpe/vrecps with
 -funsafe-math-optimizations
To: Ramana Radhakrishnan <ramana.gcc@googlemail.com>
Cc: Ramana Radhakrishnan <ramana.radhakrishnan@foss.arm.com>,
 gcc Patches <gcc-patches@gcc.gnu.org>,
 Charles Baylis <charles.baylis@linaro.org>
X-IsSubscribed: yes

On 7 June 2016 at 14:07, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:
>>> Please find the updated patch attached.
>>> It passes testsuite for arm-none-linux-gnueabi, arm-none-linux-gnueabihf and
>>> arm-none-eabi.
>>> However the test-case added in the patch (neon-vect-div-1.c) fails to
>>> get vectorized at -O2
>>> for armeb-none-linux-gnueabihf.
>>> Charles suggested me to try with -O3, which worked.
>>> It appears the test-case fails to get vectorized with
>>> -fvect-cost-model=cheap (which is default enabled at -O2)
>>> and passes for -fno-vect-cost-model / -fvect-cost-model=dynamic
>>>
>>> I can't figure out why it fails -fvect-cost-model=cheap.
>>> From the vect dump (attached):
>>> neon-vect-div-1.c:12:3: note: Setting misalignment to -1.
>>> neon-vect-div-1.c:12:3: note: not vectorized: unsupported unaligned load.*_9
>> Hi,
>> I think I have some idea why the test-case fails attached with patch
>> fail to get vectorized on armeb with -O2.
>>
>> Issue with big endian vectorizer:
>> The patch does not cause regressions on big endian vectorizer but
>> fails to vectorize the test-cases attached with the patch, while they
>> get vectorized on
>> litttle-endian.
>> Fails with armeb with the following message in dump:
>> note: not vectorized: unsupported unaligned load.*_9
>>
>> The behavior of big and little endian vectorizer seems to be different
>> in arm_builtin_support_vector_misalignment() which overrides the hook
>> targetm.vectorize.support_vector_misalignment().
>>
>> targetm.vectorize.support_vector_misalignment is called by
>> vect_supportable_dr_alignment () which in turn is called
>> by verify_data_refs_alignment ().
>>
>> Execution upto following condition is common between arm and armeb
>> in vect_supportable_dr_alignment():
>>
>> if ((TYPE_USER_ALIGN (type) && !is_packed)
>>       || targetm.vectorize.support_vector_misalignment (mode, type,
>>                                             DR_MISALIGNMENT (dr), is_packed))
>>         /* Can't software pipeline the loads, but can at least do them.  */
>>         return dr_unaligned_supported;
>>
>> For little endian case:
>> arm_builtin_support_vector_misalignment() is called with
>> V2SF mode and misalignment == -1, and the following condition
>> becomes true:
>> /* If the misalignment is unknown, we should be able to handle the access
>>          so long as it is not to a member of a packed data structure.  */
>>   if (misalignment == -1)
>>     return true;
>>
>> Since the hook returned true we enter the condition above in
>> vect_supportable_dr_alignment() and return dr_unaligned_supported;
>>
>> For big-endian:
>> arm_builtin_support_vector_misalignment() is called with V2SF mode.
>> The following condition that gates the entire function body fails:
>>  if (TARGET_NEON && !BYTES_BIG_ENDIAN && unaligned_access)
>> and the default hook gets called with V2SF mode and the default hook
>> returns false because
>> movmisalign_optab does not exist for V2SF mode.
>>
>> So the condition above in vect_supportable_dr_alignment() fails
>> and we come here:
>>  /* Unsupported.  */
>> return dr_unaligned_unsupported;
>>
>> And hence we get the unaligned load not supported message in the dump
>> for armeb in verify_data_ref_alignment ():
>>
>> static bool
>> verify_data_ref_alignment (data_reference_p dr)
>> {
>>   enum dr_alignment_support supportable_dr_alignment
>>     = vect_supportable_dr_alignment (dr, false);
>>   if (!supportable_dr_alignment)
>>     {
>>       if (dump_enabled_p ())
>>         {
>>           if (DR_IS_READ (dr))
>>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>                              "not vectorized: unsupported unaligned load.");
>>
>> With -O3, the test-cases vectorize for armeb, because loop peeling for alignment
>> is turned on.
>> The above behavior is also reproducible with test-case which is
>> irrelevant to the patch.
>> for instance, we get the same unsupported unaligned load for following
>> test-case (replaced / with +)
>>
>> void
>> foo (int len, float * __restrict p, float *__restrict x)
>> {
>>   len = len & ~31;
>>   for (int i = 0; i < len; i++)
>>     p[i] = p[i] + x[i];
>> }
>> Is the patch OK to commit after bootstrap+test ?
>
>
> Thanks for the analysis - all the test needs is an additional marker
> to skip it on armeb (is there a helper for misaligned loads from the
> vectorizer ? ) - Ah probably vect_hw_misalign is sufficient for your
> usecase and you want to appropriately fix it for little endian arm
> with neon support enabled.
Hi,
I added dg-require-effective-target vect_hw_misalign to the tests in the patch,
and modified vect_hw_misalign to return true for little-endian arm configs
with neon support enabled. The patch makes the tests unsupported for
armeb.
Does it look correct ?

Unfortunately the change to vect_hw_misalign breaks gcc.dg/vect/vect-align-1.c,
which were passing before:
XPASS: gcc.dg/vect/vect-align-1.c scan-tree-dump-times vect "Alignment
of access forced using versioning" 1
FAIL: gcc.dg/vect/vect-align-1.c scan-tree-dump-times vect
"Vectorizing an unaligned access" 1

I am not sure how to fix this and would be grateful for suggestions.

Thanks,
Prathamesh
>
> From the patch.
>
>>>+   && flag_unsafe_math_optimizations && flag_reciprocal_math"
>
> Why do we need flag_unsafe_math_optimizations && flag_reciprocal_math
> ? flag_unsafe_math_optimizations should be sufficient since it enables
> flag_reciprocal_math - the reason for flag_unsafe_math_optimizations
> is to prevent loss of precision and the fact that on neon denormalized
> numbers are flushed to zero.
>
> Ok with that change and a quick test with vect_hw_misalign added to
> your testcase.
>
> Sorry about the delay in reviewing.
>
> Ramana
>
>
>>
>> Thanks,
>> Prathamesh
>>>
>>> Thanks,
>>> Prathamesh
>>>>
>>>> Thanks,
>>>> Ramana
>>>>
>>>>>
>>>>> Thanks,
>>>>> Prathamesh
>>>>>>
>>>>>>
>>>>>> moving on to the patches.
>>>>>>
>>>>>>> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
>>>>>>> index 654d9d5..28c2e2a 100644
>>>>>>> --- a/gcc/config/arm/neon.md
>>>>>>> +++ b/gcc/config/arm/neon.md
>>>>>>> @@ -548,6 +548,32 @@
>>>>>>>                      (const_string "neon_mul_<V_elem_ch><q>")))]
>>>>>>>  )
>>>>>>>
>>>>>>
>>>>>> Please add a comment here.
>>>>>>
>>>>>>> +(define_expand "div<mode>3"
>>>>>>> +  [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
>>>>>>> +        (div:VCVTF (match_operand:VCVTF 1 "s_register_operand" "w")
>>>>>>> +               (match_operand:VCVTF 2 "s_register_operand" "w")))]
>>>>>>
>>>>>> I want to double check that this doesn't collide with Alan's patches for FP16 especially if he reuses the VCVTF iterator for all the vcvt f16 cases.
>>>>>>
>>>>>>> +  "TARGET_NEON && flag_unsafe_math_optimizations && flag_reciprocal_math"
>>>>>>> +  {
>>>>>>> +    rtx rec = gen_reg_rtx (<MODE>mode);
>>>>>>> +    rtx vrecps_temp = gen_reg_rtx (<MODE>mode);
>>>>>>> +
>>>>>>> +    /* Reciprocal estimate */
>>>>>>> +    emit_insn (gen_neon_vrecpe<mode> (rec, operands[2]));
>>>>>>> +
>>>>>>> +    /* Perform 2 iterations of Newton-Raphson method for better accuracy */
>>>>>>> +    for (int i = 0; i < 2; i++)
>>>>>>> +      {
>>>>>>> +     emit_insn (gen_neon_vrecps<mode> (vrecps_temp, rec, operands[2]));
>>>>>>> +     emit_insn (gen_mul<mode>3 (rec, rec, vrecps_temp));
>>>>>>> +      }
>>>>>>> +
>>>>>>> +    /* We now have reciprocal in rec, perform operands[0] = operands[1] * rec */
>>>>>>> +    emit_insn (gen_mul<mode>3 (operands[0], operands[1], rec));
>>>>>>> +    DONE;
>>>>>>> +  }
>>>>>>> +)
>>>>>>> +
>>>>>>> +
>>>>>>>  (define_insn "mul<mode>3add<mode>_neon"
>>>>>>>    [(set (match_operand:VDQW 0 "s_register_operand" "=w")
>>>>>>>          (plus:VDQW (mult:VDQW (match_operand:VDQW 2 "s_register_operand" "w")
>>>>>>> diff --git a/gcc/testsuite/gcc.target/arm/vect-div-1.c b/gcc/testsuite/gcc.target/arm/vect-div-1.c
>>>>>>> new file mode 100644
>>>>>>> index 0000000..e562ef3
>>>>>>> --- /dev/null
>>>>>>> +++ b/gcc/testsuite/gcc.target/arm/vect-div-1.c
>>>>>>> @@ -0,0 +1,14 @@
>>>>>>> +/* { dg-do compile } */
>>>>>>> +/* { dg-require-effective-target arm_v8_neon_ok } */
>>>>>>> +/* { dg-options "-O2 -funsafe-math-optimizations -ftree-vectorize -fdump-tree-vect-all" } */
>>>>>>> +/* { dg-add-options arm_v8_neon } */
>>>>>>
>>>>>> No this is wrong.
>>>>>>
>>>>>> What is armv8 specific about this test ? This is just like another test that is for Neon. vrecpe / vrecps are not instructions that were introduced in the v8 version of the architecture. They've existed in the base Neon instruction set. The code generation above in the patterns will be enabled when TARGET_NEON is true which can happen when -mfpu=neon -mfloat-abi={softfp/hard} is true.
>>>>>>
>>>>>>> +
>>>>>>> +void
>>>>>>> +foo (int len, float * __restrict p, float *__restrict x)
>>>>>>> +{
>>>>>>> +  len = len & ~31;
>>>>>>> +  for (int i = 0; i < len; i++)
>>>>>>> +    p[i] = p[i] / x[i];
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>>>>>>> diff --git a/gcc/testsuite/gcc.target/arm/vect-div-2.c b/gcc/testsuite/gcc.target/arm/vect-div-2.c
>>>>>>> new file mode 100644
>>>>>>> index 0000000..8e15d0a
>>>>>>> --- /dev/null
>>>>>>> +++ b/gcc/testsuite/gcc.target/arm/vect-div-2.c
>>>>>>> @@ -0,0 +1,14 @@
>>>>>>> +/* { dg-do compile } */
>>>>>>> +/* { dg-require-effective-target arm_v8_neon_ok } */
>>>>>>
>>>>>> And likewise.
>>>>>>
>>>>>>> +/* { dg-options "-O2 -funsafe-math-optimizations -fno-reciprocal-math -ftree-vectorize -fdump-tree-vect-all" } */
>>>>>>> +/* { dg-add-options arm_v8_neon } */
>>>>>>> +
>>>>>>> +void
>>>>>>> +foo (int len, float * __restrict p, float *__restrict x)
>>>>>>> +{
>>>>>>> +  len = len & ~31;
>>>>>>> +  for (int i = 0; i < len; i++)
>>>>>>> +    p[i] = p[i] / x[i];
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" } } */
>>>>>>
>>>>>>
>>>>>> regards
>>>>>> Ramana

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index e2fdfbb..fbd4bb6 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -578,6 +578,38 @@
                     (const_string "neon_mul_<V_elem_ch><q>")))]
 )
 
+/* Perform division using multiply-by-reciprocal. 
+   Reciprocal is calculated using Newton-Raphson method.
+   Enabled with -funsafe-math-optimizations -freciprocal-math
+   and disabled for -Os since it increases code size .  */
+
+(define_expand "div<mode>3"
+  [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
+        (div:VCVTF (match_operand:VCVTF 1 "s_register_operand" "w")
+		  (match_operand:VCVTF 2 "s_register_operand" "w")))]
+  "TARGET_NEON && !optimize_size
+   && flag_unsafe_math_optimizations && flag_reciprocal_math"
+  {
+    rtx rec = gen_reg_rtx (<MODE>mode);
+    rtx vrecps_temp = gen_reg_rtx (<MODE>mode);
+
+    /* Reciprocal estimate.  */
+    emit_insn (gen_neon_vrecpe<mode> (rec, operands[2]));
+
+    /* Perform 2 iterations of newton-raphson method.  */
+    for (int i = 0; i < 2; i++)
+      {
+	emit_insn (gen_neon_vrecps<mode> (vrecps_temp, rec, operands[2]));
+	emit_insn (gen_mul<mode>3 (rec, rec, vrecps_temp));
+      }
+
+    /* We now have reciprocal in rec, perform operands[0] = operands[1] * rec.  */
+    emit_insn (gen_mul<mode>3 (operands[0], operands[1], rec));
+    DONE;
+  }
+)
+
+
 (define_insn "mul<mode>3add<mode>_neon"
   [(set (match_operand:VDQW 0 "s_register_operand" "=w")
         (plus:VDQW (mult:VDQW (match_operand:VDQW 2 "s_register_operand" "w")
diff --git a/gcc/testsuite/gcc.target/arm/neon-vect-div-1.c b/gcc/testsuite/gcc.target/arm/neon-vect-div-1.c
new file mode 100644
index 0000000..dc507a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vect-div-1.c
@@ -0,0 +1,16 @@
+/* Test pattern div<mode>3.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-require-effective-target vect_hw_misalign } */
+/* { dg-options "-O2 -ftree-vectorize -funsafe-math-optimizations -fdump-tree-vect-all" } */
+/* { dg-add-options arm_neon } */
+
+void
+foo (int len, float * __restrict p, float *__restrict x)
+{
+  len = len & ~31;
+  for (int i = 0; i < len; i++)
+    p[i] = p[i] / x[i];
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vect-div-2.c b/gcc/testsuite/gcc.target/arm/neon-vect-div-2.c
new file mode 100644
index 0000000..9654232
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vect-div-2.c
@@ -0,0 +1,17 @@
+/* Test pattern div<mode>3.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-require-effective-target vect_hw_misalign } */
+/* { dg-options "-O3 -funsafe-math-optimizations -fno-reciprocal-math -fdump-tree-vect-all" } */
+/* { dg-add-options arm_neon } */
+
+void
+foo (int len, float * __restrict p, float *__restrict x)
+{
+  len = len & ~31;
+  for (int i = 0; i < len; i++)
+    p[i] = p[i] / x[i];
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 0b991a5..48feb99 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4812,7 +4812,9 @@ proc check_effective_target_vect_hw_misalign { } {
         set et_vect_hw_misalign_saved 0
        if { [istarget i?86-*-*] || [istarget x86_64-*-*]
             || ([istarget powerpc*-*-*] && [check_p8vector_hw_available])
-	    || [istarget aarch64*-*-*] } {
+	    || [istarget aarch64*-*-*]
+            || ([istarget arm-*-*]
+		&& [is-effective-target arm_neon_ok]) } {
           set et_vect_hw_misalign_saved 1
        }
     }