From patchwork Tue Feb 6 17:43:20 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella
X-Patchwork-Id: 770400
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Cc: "H . J . Lu", Noah Goldstein, Sajan Karumanchi, bmerry@sarao.ac.za, pmallapp@amd.com
Subject: [PATCH v2 1/3] x86: Fix Zen3/Zen4 ERMS selection (BZ 30994)
Date: Tue, 6 Feb 2024 14:43:20 -0300
Message-Id: <20240206174322.2317679-2-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240206174322.2317679-1-adhemerval.zanella@linaro.org>
References: <20240206174322.2317679-1-adhemerval.zanella@linaro.org>

Using REP MOVSB in memcpy/memmove does not show much performance
improvement on Zen3/Zen4 cores compared to the vectorized loops.  Also,
as reported in BZ 30994, if the source is aligned and the destination is
not, performance can be 20x slower.

The performance difference is most noticeable with small buffer sizes,
close to the lower bound at which memcpy/memmove starts to use ERMS.
At the upper size limit (the L2 cache size), REP MOVSB performs
similarly to the vectorized instructions.  Also, there is no drawback to
multiple cores sharing the cache.

A new tunable, glibc.cpu.x86_rep_movsb_stop_threshold, allows setting
the upper bound on the size for which 'rep movsb' is used.

Checked on x86_64-linux-gnu on Zen3.
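The stop-threshold default that this patch introduces can be sketched in
plain C as below.  This is only an illustration under stated assumptions,
not the glibc code: the enum, the function names, and the parameters are
hypothetical stand-ins for the cpu_features fields and tunables used in
dl-cacheinfo.h, and the window check is a simplified assumption about how
memcpy/memmove consult the start and stop thresholds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the CPU vendor kind kept in cpu_features.  */
enum cpu_kind { CPU_INTEL, CPU_AMD, CPU_OTHER };

/* Sketch of the selection in the patch: a user-supplied stop threshold is
   honored only if it was set and is larger than the start threshold.
   Otherwise AMD gets 0 and every other vendor gets the non-temporal
   threshold.  */
size_t
pick_rep_movsb_stop_threshold (enum cpu_kind kind, bool tunable_set,
                               size_t tunable_value,
                               size_t rep_movsb_threshold,
                               size_t non_temporal_threshold)
{
  if (tunable_set && tunable_value > rep_movsb_threshold)
    return tunable_value;
  return kind == CPU_AMD ? 0 : non_temporal_threshold;
}

/* Assumption: REP MOVSB is considered only for sizes in [start, stop).
   The real assembly has further conditions (overlap, alignment).  */
bool
would_use_rep_movsb (size_t size, size_t start, size_t stop)
{
  return size >= start && size < stop;
}

/* Example values only; 2048 and 786432 are illustrative numbers.  */
void
example_movsb_selection (void)
{
  size_t amd = pick_rep_movsb_stop_threshold (CPU_AMD, false, 0,
                                              2048, 786432);
  size_t intel = pick_rep_movsb_stop_threshold (CPU_INTEL, false, 0,
                                                2048, 786432);
  /* A stop threshold of 0 leaves an empty window on AMD.  */
  assert (!would_use_rep_movsb (4096, 2048, amd));
  assert (would_use_rep_movsb (4096, 2048, intel));
}
```

With a stop threshold of 0 the window is empty, which is how the sketch
models "never use REP MOVSB by default on AMD"; setting the tunable above
the start threshold reopens it.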
---
 manual/tunables.texi         |  9 +++++++
 sysdeps/x86/dl-cacheinfo.h   | 50 +++++++++++++++++++++---------------
 sysdeps/x86/dl-tunables.list | 10 ++++++++
 3 files changed, 48 insertions(+), 21 deletions(-)

diff --git a/manual/tunables.texi b/manual/tunables.texi
index be97190d67..ee5d90b91b 100644
--- a/manual/tunables.texi
+++ b/manual/tunables.texi
@@ -569,6 +569,15 @@ greater than zero, and currently defaults to 2048 bytes.
 This tunable is specific to i386 and x86-64.
 @end deftp
 
+@deftp Tunable glibc.cpu.x86_rep_movsb_stop_threshold
+The @code{glibc.cpu.x86_rep_movsb_stop_threshold} tunable allows the user to
+set the threshold in bytes to stop using "rep movsb".  The value must be
+greater than zero, and currently, the default depends on the CPU and the
+cache size.
+
+This tunable is specific to i386 and x86-64.
+@end deftp
+
 @deftp Tunable glibc.cpu.x86_rep_stosb_threshold
 The @code{glibc.cpu.x86_rep_stosb_threshold} tunable allows the user to
 set threshold in bytes to start using "rep stosb".  The value must be
diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index d5101615e3..74b804c5e6 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -791,7 +791,6 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
   long int data = -1;
   long int shared = -1;
   long int shared_per_thread = -1;
-  long int core = -1;
   unsigned int threads = 0;
   unsigned long int level1_icache_size = -1;
   unsigned long int level1_icache_linesize = -1;
@@ -809,7 +808,6 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
   if (cpu_features->basic.kind == arch_kind_intel)
     {
       data = handle_intel (_SC_LEVEL1_DCACHE_SIZE, cpu_features);
-      core = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
       shared = handle_intel (_SC_LEVEL3_CACHE_SIZE, cpu_features);
       shared_per_thread = shared;
 
@@ -822,7 +820,8 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
         = handle_intel (_SC_LEVEL1_DCACHE_ASSOC, cpu_features);
       level1_dcache_linesize
         = handle_intel (_SC_LEVEL1_DCACHE_LINESIZE, cpu_features);
-      level2_cache_size = core;
+      level2_cache_size
+        = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
       level2_cache_assoc
         = handle_intel (_SC_LEVEL2_CACHE_ASSOC, cpu_features);
       level2_cache_linesize
@@ -835,12 +834,12 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
       level4_cache_size
         = handle_intel (_SC_LEVEL4_CACHE_SIZE, cpu_features);
 
-      get_common_cache_info (&shared, &shared_per_thread, &threads, core);
+      get_common_cache_info (&shared, &shared_per_thread, &threads,
+                             level2_cache_size);
     }
   else if (cpu_features->basic.kind == arch_kind_zhaoxin)
     {
       data = handle_zhaoxin (_SC_LEVEL1_DCACHE_SIZE);
-      core = handle_zhaoxin (_SC_LEVEL2_CACHE_SIZE);
       shared = handle_zhaoxin (_SC_LEVEL3_CACHE_SIZE);
       shared_per_thread = shared;
 
@@ -849,19 +848,19 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
       level1_dcache_size = data;
       level1_dcache_assoc = handle_zhaoxin (_SC_LEVEL1_DCACHE_ASSOC);
       level1_dcache_linesize = handle_zhaoxin (_SC_LEVEL1_DCACHE_LINESIZE);
-      level2_cache_size = core;
+      level2_cache_size = handle_zhaoxin (_SC_LEVEL2_CACHE_SIZE);
       level2_cache_assoc = handle_zhaoxin (_SC_LEVEL2_CACHE_ASSOC);
       level2_cache_linesize = handle_zhaoxin (_SC_LEVEL2_CACHE_LINESIZE);
       level3_cache_size = shared;
       level3_cache_assoc = handle_zhaoxin (_SC_LEVEL3_CACHE_ASSOC);
       level3_cache_linesize = handle_zhaoxin (_SC_LEVEL3_CACHE_LINESIZE);
 
-      get_common_cache_info (&shared, &shared_per_thread, &threads, core);
+      get_common_cache_info (&shared, &shared_per_thread, &threads,
+                             level2_cache_size);
     }
   else if (cpu_features->basic.kind == arch_kind_amd)
     {
       data = handle_amd (_SC_LEVEL1_DCACHE_SIZE);
-      core = handle_amd (_SC_LEVEL2_CACHE_SIZE);
       shared = handle_amd (_SC_LEVEL3_CACHE_SIZE);
 
       level1_icache_size = handle_amd (_SC_LEVEL1_ICACHE_SIZE);
@@ -869,7 +868,7 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
       level1_dcache_size = data;
       level1_dcache_assoc = handle_amd (_SC_LEVEL1_DCACHE_ASSOC);
       level1_dcache_linesize = handle_amd (_SC_LEVEL1_DCACHE_LINESIZE);
-      level2_cache_size = core;
+      level2_cache_size = handle_amd (_SC_LEVEL2_CACHE_SIZE);
       level2_cache_assoc = handle_amd (_SC_LEVEL2_CACHE_ASSOC);
       level2_cache_linesize = handle_amd (_SC_LEVEL2_CACHE_LINESIZE);
       level3_cache_size = shared;
@@ -880,12 +879,12 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
       if (shared <= 0)
         {
           /* No shared L3 cache.  All we have is the L2 cache.  */
-          shared = core;
+          shared = level2_cache_size;
         }
       else if (cpu_features->basic.family < 0x17)
         {
           /* Account for exclusive L2 and L3 caches.  */
-          shared += core;
+          shared += level2_cache_size;
         }
 
       shared_per_thread = shared;
@@ -1028,16 +1027,25 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
                            SIZE_MAX);
 
   unsigned long int rep_movsb_stop_threshold;
-  /* ERMS feature is implemented from AMD Zen3 architecture and it is
-     performing poorly for data above L2 cache size. Henceforth, adding
-     an upper bound threshold parameter to limit the usage of Enhanced
-     REP MOVSB operations and setting its value to L2 cache size.  */
-  if (cpu_features->basic.kind == arch_kind_amd)
-    rep_movsb_stop_threshold = core;
-  /* Setting the upper bound of ERMS to the computed value of
-     non-temporal threshold for architectures other than AMD.  */
-  else
-    rep_movsb_stop_threshold = non_temporal_threshold;
+  /* If the tunable is set to a valid value (larger than the minimal
+     threshold to use ERMS), use it instead of the default values.  */
+  rep_movsb_stop_threshold = TUNABLE_GET (x86_rep_movsb_stop_threshold,
+                                          long int, NULL);
+  if (!TUNABLE_IS_INITIALIZED (x86_rep_movsb_stop_threshold)
+      || rep_movsb_stop_threshold <= rep_movsb_threshold)
+    {
+      /* For AMD CPUs that support ERMS (Zen3+), REP MOVSB is in a lot of
+         cases slower than the vectorized path (and for some alignments,
+         it is really slow, check BZ #30994).  */
+      if (cpu_features->basic.kind == arch_kind_amd)
+        rep_movsb_stop_threshold = 0;
+      else
+        /* Setting the upper bound of ERMS to the computed value of
+           non-temporal threshold for architectures other than AMD.  */
+        rep_movsb_stop_threshold = non_temporal_threshold;
+    }
+  TUNABLE_SET_WITH_BOUNDS (x86_rep_stosb_threshold, rep_stosb_threshold, 1,
+                           SIZE_MAX);
 
   cpu_features->data_cache_size = data;
   cpu_features->shared_cache_size = shared;
diff --git a/sysdeps/x86/dl-tunables.list b/sysdeps/x86/dl-tunables.list
index 7d82da0dec..80cf5563ab 100644
--- a/sysdeps/x86/dl-tunables.list
+++ b/sysdeps/x86/dl-tunables.list
@@ -49,6 +49,16 @@ glibc {
       # if the tunable value is set by user or not [BZ #27069].
       minval: 1
     }
+    x86_rep_movsb_stop_threshold {
+      # For AMD CPUs that support ERMS (Zen3+), REP MOVSB is not faster
+      # than the vectorized path (and for some destination alignment it
+      # is really slow, check BZ #30994).  On Intel CPUs, the size limit
+      # to use ERMS is [1/8, 1/2] of the size of the chip's cache (check
+      # dl-cacheinfo.h).
+      # This tunable allows the caller to set the limit where to use REP
+      # MOVSB on memcpy/memmove.
+      type: SIZE_T
+    }
     x86_rep_stosb_threshold {
       type: SIZE_T
       # Since there is overhead to set up REP STOSB operation, REP STOSB

From patchwork Tue Feb 6 17:43:21 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella
X-Patchwork-Id: 770401
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Cc: "H . J . Lu", Noah Goldstein, Sajan Karumanchi, bmerry@sarao.ac.za, pmallapp@amd.com
Subject: [PATCH v2 2/3] x86: Do not prefer ERMS for memset on Zen3+
Date: Tue, 6 Feb 2024 14:43:21 -0300
Message-Id: <20240206174322.2317679-3-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240206174322.2317679-1-adhemerval.zanella@linaro.org>
References: <20240206174322.2317679-1-adhemerval.zanella@linaro.org>

On the AMD Zen3+ architecture, the vectorized loop performs slightly
better than ERMS.

Checked on x86_64-linux-gnu on Zen3.
---
 sysdeps/x86/dl-cacheinfo.h | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index 74b804c5e6..f2cd6f179d 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -1010,11 +1010,17 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
   if (tunable_size > minimum_rep_movsb_threshold)
     rep_movsb_threshold = tunable_size;
 
-  /* NB: The default value of the x86_rep_stosb_threshold tunable is the
-     same as the default value of __x86_rep_stosb_threshold and the
-     minimum value is fixed.  */
-  rep_stosb_threshold = TUNABLE_GET (x86_rep_stosb_threshold,
-                                     long int, NULL);
+  /* For AMD Zen3+ architecture, the performance of the vectorized loop is
+     slightly better than ERMS.  */
+  if (cpu_features->basic.kind == arch_kind_amd)
+    rep_stosb_threshold = SIZE_MAX;
+
+  if (TUNABLE_IS_INITIALIZED (x86_rep_stosb_threshold))
+    /* NB: The default value of the x86_rep_stosb_threshold tunable is the
+       same as the default value of __x86_rep_stosb_threshold and the
+       minimum value is fixed.  */
+    rep_stosb_threshold = TUNABLE_GET (x86_rep_stosb_threshold,
+                                       long int, NULL);
 
   TUNABLE_SET_WITH_BOUNDS (x86_data_cache_size, data, 0, SIZE_MAX);
   TUNABLE_SET_WITH_BOUNDS (x86_shared_cache_size, shared, 0, SIZE_MAX);

From patchwork Tue Feb 6 17:43:22 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella
X-Patchwork-Id: 770402
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Cc: "H . J . Lu", Noah Goldstein, Sajan Karumanchi, bmerry@sarao.ac.za, pmallapp@amd.com
Subject: [PATCH v2 3/3] x86: Expand the comment on when REP STOSB is used on memset
Date: Tue, 6 Feb 2024 14:43:22 -0300
Message-Id: <20240206174322.2317679-4-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240206174322.2317679-1-adhemerval.zanella@linaro.org>
References: <20240206174322.2317679-1-adhemerval.zanella@linaro.org>

---
 sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
index 9984c3ca0f..97839a2248 100644
--- a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
+++ b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
@@ -21,7 +21,9 @@
    2. If size is less than VEC, use integer register stores.
    3. If size is from VEC_SIZE to 2 * VEC_SIZE, use 2 VEC stores.
    4. If size is from 2 * VEC_SIZE to 4 * VEC_SIZE, use 4 VEC stores.
-   5. If size is more to 4 * VEC_SIZE, align to 4 * VEC_SIZE with
+   5. On machines with the ERMS feature, if size is greater than or equal
+      to __x86_rep_stosb_threshold, then REP STOSB will be used.
+   6. If size is more than 4 * VEC_SIZE, align to 4 * VEC_SIZE with
       4 VEC stores and store 4 * VEC at a time until done.  */
 
 #include
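
The size classes listed in the memset comment, together with the default
from patch 2/3, can be sketched in C as follows.  This is a reading of the
comment rather than of the assembly: the enum, both function names, and
the builtin_default parameter are hypothetical, and the order of the checks
assumes the REP STOSB threshold is larger than 4 * VEC_SIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the strategies named in the comment.  */
enum memset_path
{
  PATH_INTEGER_STORES,
  PATH_2_VEC,
  PATH_4_VEC,
  PATH_REP_STOSB,
  PATH_ALIGNED_LOOP
};

/* Patch 2/3 in miniature: AMD defaults to SIZE_MAX (effectively never),
   and an explicitly set tunable overrides the default.  builtin_default
   stands for whatever value other CPUs start with (an assumption).  */
size_t
pick_rep_stosb_threshold (bool is_amd, bool tunable_set,
                          size_t tunable_value, size_t builtin_default)
{
  size_t threshold = is_amd ? SIZE_MAX : builtin_default;
  if (tunable_set)
    threshold = tunable_value;
  return threshold;
}

/* Classify a memset size following items 2 to 6 of the comment.  */
enum memset_path
classify_memset (size_t size, size_t vec_size, bool has_erms,
                 size_t rep_stosb_threshold)
{
  if (size < vec_size)
    return PATH_INTEGER_STORES;
  if (size <= 2 * vec_size)
    return PATH_2_VEC;
  if (size <= 4 * vec_size)
    return PATH_4_VEC;
  if (has_erms && size >= rep_stosb_threshold)
    return PATH_REP_STOSB;
  return PATH_ALIGNED_LOOP;
}

/* Example values only: a 32-byte vector and a 2048-byte default.  */
void
example_memset_selection (void)
{
  size_t amd = pick_rep_stosb_threshold (true, false, 0, 2048);
  size_t other = pick_rep_stosb_threshold (false, false, 0, 2048);
  assert (classify_memset (4096, 32, true, amd) == PATH_ALIGNED_LOOP);
  assert (classify_memset (4096, 32, true, other) == PATH_REP_STOSB);
}
```

Under these assumptions a 4096-byte memset on an AMD CPU falls through to
the aligned vector loop unless the x86_rep_stosb_threshold tunable is set
explicitly.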