From patchwork Thu Feb 8 13:08:38 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella
X-Patchwork-Id: 770849
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Cc: "H . J .
 Lu", Noah Goldstein, Sajan Karumanchi, bmerry@sarao.ac.za, pmallapp@amd.com
Subject: [PATCH v3 1/3] x86: Fix Zen3/Zen4 ERMS selection (BZ 30994)
Date: Thu, 8 Feb 2024 10:08:38 -0300
Message-Id: <20240208130840.533348-2-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240208130840.533348-1-adhemerval.zanella@linaro.org>
References: <20240208130840.533348-1-adhemerval.zanella@linaro.org>

Using REP MOVSB in memcpy/memmove does not show much performance
improvement on Zen3/Zen4 cores compared to the vectorized loops.  Also,
as reported in BZ 30994, if the source is aligned and the destination is
not, the performance can be 20x slower.

The performance difference is most noticeable with small buffer sizes,
close to the lower bound at which memcpy/memmove starts to use ERMS.
The performance of REP MOVSB is similar to that of the vectorized
instructions at the upper size limit (the L2 cache size).  Also, there
is no drawback to multiple cores sharing the cache.

Checked on x86_64-linux-gnu on Zen3.
Reviewed-by: H.J. Lu
---
 sysdeps/x86/dl-cacheinfo.h | 38 ++++++++++++++++++--------------------
 1 file changed, 18 insertions(+), 20 deletions(-)

diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index d5101615e3..f34d12846c 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -791,7 +791,6 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
   long int data = -1;
   long int shared = -1;
   long int shared_per_thread = -1;
-  long int core = -1;
   unsigned int threads = 0;
   unsigned long int level1_icache_size = -1;
   unsigned long int level1_icache_linesize = -1;
@@ -809,7 +808,6 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
   if (cpu_features->basic.kind == arch_kind_intel)
     {
       data = handle_intel (_SC_LEVEL1_DCACHE_SIZE, cpu_features);
-      core = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
       shared = handle_intel (_SC_LEVEL3_CACHE_SIZE, cpu_features);
       shared_per_thread = shared;
 
@@ -822,7 +820,8 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
         = handle_intel (_SC_LEVEL1_DCACHE_ASSOC, cpu_features);
       level1_dcache_linesize
         = handle_intel (_SC_LEVEL1_DCACHE_LINESIZE, cpu_features);
-      level2_cache_size = core;
+      level2_cache_size
+        = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
       level2_cache_assoc
         = handle_intel (_SC_LEVEL2_CACHE_ASSOC, cpu_features);
       level2_cache_linesize
@@ -835,12 +834,12 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
       level4_cache_size
         = handle_intel (_SC_LEVEL4_CACHE_SIZE, cpu_features);
 
-      get_common_cache_info (&shared, &shared_per_thread, &threads, core);
+      get_common_cache_info (&shared, &shared_per_thread, &threads,
+                             level2_cache_size);
     }
   else if (cpu_features->basic.kind == arch_kind_zhaoxin)
     {
       data = handle_zhaoxin (_SC_LEVEL1_DCACHE_SIZE);
-      core = handle_zhaoxin (_SC_LEVEL2_CACHE_SIZE);
       shared = handle_zhaoxin (_SC_LEVEL3_CACHE_SIZE);
       shared_per_thread = shared;
 
@@ -849,19 +848,19 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
       level1_dcache_size = data;
       level1_dcache_assoc = handle_zhaoxin (_SC_LEVEL1_DCACHE_ASSOC);
       level1_dcache_linesize = handle_zhaoxin (_SC_LEVEL1_DCACHE_LINESIZE);
-      level2_cache_size = core;
+      level2_cache_size = handle_zhaoxin (_SC_LEVEL2_CACHE_SIZE);
       level2_cache_assoc = handle_zhaoxin (_SC_LEVEL2_CACHE_ASSOC);
       level2_cache_linesize = handle_zhaoxin (_SC_LEVEL2_CACHE_LINESIZE);
       level3_cache_size = shared;
       level3_cache_assoc = handle_zhaoxin (_SC_LEVEL3_CACHE_ASSOC);
       level3_cache_linesize = handle_zhaoxin (_SC_LEVEL3_CACHE_LINESIZE);
 
-      get_common_cache_info (&shared, &shared_per_thread, &threads, core);
+      get_common_cache_info (&shared, &shared_per_thread, &threads,
+                             level2_cache_size);
     }
   else if (cpu_features->basic.kind == arch_kind_amd)
     {
       data = handle_amd (_SC_LEVEL1_DCACHE_SIZE);
-      core = handle_amd (_SC_LEVEL2_CACHE_SIZE);
       shared = handle_amd (_SC_LEVEL3_CACHE_SIZE);
 
       level1_icache_size = handle_amd (_SC_LEVEL1_ICACHE_SIZE);
@@ -869,7 +868,7 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
       level1_dcache_size = data;
       level1_dcache_assoc = handle_amd (_SC_LEVEL1_DCACHE_ASSOC);
       level1_dcache_linesize = handle_amd (_SC_LEVEL1_DCACHE_LINESIZE);
-      level2_cache_size = core;
+      level2_cache_size = handle_amd (_SC_LEVEL2_CACHE_SIZE);
       level2_cache_assoc = handle_amd (_SC_LEVEL2_CACHE_ASSOC);
       level2_cache_linesize = handle_amd (_SC_LEVEL2_CACHE_LINESIZE);
       level3_cache_size = shared;
@@ -880,12 +879,12 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
       if (shared <= 0)
         {
           /* No shared L3 cache.  All we have is the L2 cache.  */
-          shared = core;
+          shared = level2_cache_size;
         }
       else if (cpu_features->basic.family < 0x17)
         {
           /* Account for exclusive L2 and L3 caches.  */
-          shared += core;
+          shared += level2_cache_size;
         }
 
       shared_per_thread = shared;
@@ -987,6 +986,12 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
   if (CPU_FEATURE_USABLE_P (cpu_features, FSRM))
     rep_movsb_threshold = 2112;
 
+  /* For AMD CPUs that support ERMS (Zen3+), REP MOVSB is in many cases
+     slower than the vectorized path (and for some alignments it is
+     much slower, see BZ #30994).  */
+  if (cpu_features->basic.kind == arch_kind_amd)
+    rep_movsb_threshold = non_temporal_threshold;
+
   /* The default threshold to use Enhanced REP STOSB.  */
   unsigned long int rep_stosb_threshold = 2048;
 
@@ -1028,16 +1033,9 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
                            SIZE_MAX);
 
   unsigned long int rep_movsb_stop_threshold;
-  /* ERMS feature is implemented from AMD Zen3 architecture and it is
-     performing poorly for data above L2 cache size. Henceforth, adding
-     an upper bound threshold parameter to limit the usage of Enhanced
-     REP MOVSB operations and setting its value to L2 cache size.  */
-  if (cpu_features->basic.kind == arch_kind_amd)
-    rep_movsb_stop_threshold = core;
   /* Setting the upper bound of ERMS to the computed value of
-     non-temporal threshold for architectures other than AMD.  */
-  else
-    rep_movsb_stop_threshold = non_temporal_threshold;
+     non-temporal threshold for all architectures.  */
+  rep_movsb_stop_threshold = non_temporal_threshold;
 
   cpu_features->data_cache_size = data;
   cpu_features->shared_cache_size = shared;

From patchwork Thu Feb 8 13:08:39 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella
X-Patchwork-Id: 770851
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Cc: "H . J .
 Lu", Noah Goldstein, Sajan Karumanchi, bmerry@sarao.ac.za, pmallapp@amd.com
Subject: [PATCH v3 2/3] x86: Do not prefer ERMS for memset on Zen3+
Date: Thu, 8 Feb 2024 10:08:39 -0300
Message-Id: <20240208130840.533348-3-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240208130840.533348-1-adhemerval.zanella@linaro.org>
References: <20240208130840.533348-1-adhemerval.zanella@linaro.org>

On the AMD Zen3+ architecture, the vectorized loop performs slightly
better than ERMS.

Checked on x86_64-linux-gnu on Zen3.
Reviewed-by: H.J. Lu
---
 sysdeps/x86/dl-cacheinfo.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index f34d12846c..5a98f70364 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -1021,6 +1021,11 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
      minimum value is fixed.  */
   rep_stosb_threshold = TUNABLE_GET (x86_rep_stosb_threshold,
                                      long int, NULL);
+  if (cpu_features->basic.kind == arch_kind_amd
+      && !TUNABLE_IS_INITIALIZED (x86_rep_stosb_threshold))
+    /* On the AMD Zen3+ architecture, the vectorized loop performs
+       slightly better than ERMS.  */
+    rep_stosb_threshold = SIZE_MAX;
 
   TUNABLE_SET_WITH_BOUNDS (x86_data_cache_size, data, 0, SIZE_MAX);
   TUNABLE_SET_WITH_BOUNDS (x86_shared_cache_size, shared, 0, SIZE_MAX);

From patchwork Thu Feb 8 13:08:40 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella
X-Patchwork-Id: 770850
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Cc: "H . J .
 Lu", Noah Goldstein, Sajan Karumanchi, bmerry@sarao.ac.za, pmallapp@amd.com
Subject: [PATCH v3 3/3] x86: Expand the comment on when REP STOSB is used on memset
Date: Thu, 8 Feb 2024 10:08:40 -0300
Message-Id: <20240208130840.533348-4-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240208130840.533348-1-adhemerval.zanella@linaro.org>
References: <20240208130840.533348-1-adhemerval.zanella@linaro.org>

Reviewed-by: H.J. Lu
---
 sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
index 9984c3ca0f..97839a2248 100644
--- a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
+++ b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
@@ -21,7 +21,9 @@
    2. If size is less than VEC, use integer register stores.
    3. If size is from VEC_SIZE to 2 * VEC_SIZE, use 2 VEC stores.
    4. If size is from 2 * VEC_SIZE to 4 * VEC_SIZE, use 4 VEC stores.
-   5. If size is more to 4 * VEC_SIZE, align to 4 * VEC_SIZE with
+   5. On machines with the ERMS feature, if size is greater than or equal
+      to __x86_rep_stosb_threshold, then REP STOSB will be used.
+   6. If size is more than 4 * VEC_SIZE, align to 4 * VEC_SIZE with
       4 VEC stores and store 4 * VEC at a time until done.  */
 
 #include
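
Illustrative note: the combined effect of patches 1/3 and 2/3 on the
threshold selection can be modelled as a small standalone C fragment.
This is only a sketch under stated assumptions, not the glibc
implementation.  struct cpu_model, struct thresholds, select_thresholds
and uses_rep_movsb are hypothetical names; the base REP MOVSB default of
2048 is an assumed placeholder (the patch context only shows the FSRM
value of 2112); and the half-open range check in uses_rep_movsb is an
assumption about how memmove consumes the two thresholds.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the parts of cpu_features and the tunables
   machinery that the two patches consult.  */
struct cpu_model
{
  bool is_amd;                    /* models basic.kind == arch_kind_amd */
  bool has_fsrm;                  /* models CPU_FEATURE_USABLE_P (.., FSRM) */
  bool stosb_tunable_set;         /* models TUNABLE_IS_INITIALIZED (..) */
  size_t stosb_tunable;           /* user-supplied value, if set */
  size_t non_temporal_threshold;  /* assumed to be computed earlier */
};

struct thresholds
{
  size_t rep_movsb_threshold;
  size_t rep_movsb_stop_threshold;
  size_t rep_stosb_threshold;
};

struct thresholds
select_thresholds (const struct cpu_model *cpu)
{
  struct thresholds t;

  /* Assumed placeholder default; only the FSRM value is in the patch.  */
  t.rep_movsb_threshold = 2048;
  if (cpu->has_fsrm)
    t.rep_movsb_threshold = 2112;

  /* Patch 1/3: on AMD, raise the lower bound to the non-temporal
     threshold.  */
  if (cpu->is_amd)
    t.rep_movsb_threshold = cpu->non_temporal_threshold;

  /* Patch 1/3: the upper bound is now the same on all architectures.  */
  t.rep_movsb_stop_threshold = cpu->non_temporal_threshold;

  /* Patch 2/3: on AMD, never pick REP STOSB unless the user set the
     tunable explicitly.  2048 is the default shown in the patch.  */
  t.rep_stosb_threshold = cpu->stosb_tunable_set ? cpu->stosb_tunable : 2048;
  if (cpu->is_amd && !cpu->stosb_tunable_set)
    t.rep_stosb_threshold = SIZE_MAX;

  return t;
}

/* Assumption: REP MOVSB is chosen only for sizes in
   [rep_movsb_threshold, rep_movsb_stop_threshold).  */
bool
uses_rep_movsb (const struct thresholds *t, size_t size)
{
  return size >= t->rep_movsb_threshold
         && size < t->rep_movsb_stop_threshold;
}
```

Under these assumptions, an AMD model ends up with rep_movsb_threshold
equal to rep_movsb_stop_threshold, so the modelled REP MOVSB range is
empty, and rep_stosb_threshold is SIZE_MAX unless the tunable was set
explicitly.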