From patchwork Wed Apr 24 22:56:56 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 791543 Delivered-To: patch@linaro.org Received: by 2002:a5d:4884:0:b0:346:15ad:a2a with SMTP id g4csp1093188wrq; Wed, 24 Apr 2024 15:58:15 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCU0pgriy+8IPlyiCGM3wp8wkcD1KH7YjRzd/55gQlLGmYJgTOJ4+mgedLPlKXiR3Ce0k7qe4sCy4ZU7iA5w1KNe X-Google-Smtp-Source: AGHT+IGQD+rvYQx1t5I42IFVhdGrU5VHbkaOH6NGB4Q2MWf5vI1h6FkQIxuQzGw8P5lKANM8yX/c X-Received: by 2002:a05:620a:2482:b0:78e:ef2c:6e with SMTP id i2-20020a05620a248200b0078eef2c006emr4861781qkn.32.1713999495268; Wed, 24 Apr 2024 15:58:15 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1713999495; cv=none; d=google.com; s=arc-20160816; b=SPQNT9JnjDtUP/HqZ3Qx7HsiBReew6Q7OoRryfTMnJds3NEiwbUboVrfDzH64VfbpX AQBpmGQ9smB+CRF/d9QSEzWPzzt17OLbLc6g4OZ4NkZh8WEmStD10QG3nvgqJlO7Hxbx VkNZhTIGmOKk+rX6eHAWR332a/0qAnBTZjrR2vBBwJNDZNLccZ9VQ+PEV/XnbnW5m9Lr m1mVMNrykS5N3paS+fe92r2OPgynpBebjmoIh4OGYevTHgseGmTlv9NK0Oo5sJngEb3a LvfFE+gPDFIjTTPLlckfyQXl4b6o8MrH/kpd33xbZO9TL0VQtIw+TZpb0fz7gu3hc8EA RPpg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=r70v09Z1jYUdUmudZIEve8pIi40latYcxErAtIALfYA=; fh=IOfZmL/4G2LGBtSV+LzySu7eotL7HJ1AQcRx3etIBXU=; b=gVo9BMNCMLKXYCZcgDeaOxZJRVmqE56TtnzR686rkvMZwWIPK6YKD15JW/HLONzyyY QjmmpCEAXsY4tx/Z0Z5vZKntaRgW45OJFR7OKwDjDXn2X8alUZ3NgB2D54UYO5+8PQPv HK7k1DUTWwoBAxzPDV4GShHOGKcSLnCZahHSABieovmPm0UDzRMOeSKZVoMVpqKT3Bij lUvgXvxVlpPxYmXnI1DwVfhimsfGeff7+/HwqnF0kkLycoZbmpgff/lrnzn2cF7FgMZ0 jmXo34lAErVTIwFmaNbrvplp4FxftXst2k3w6Sqe1B4DA1MdVaYawJUXBOo9/DXZDvoc pJEA==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=stAeXYE6; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id p9-20020ae9f309000000b0078ebaca480fsi15874288qkg.627.2024.04.24.15.58.15 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Wed, 24 Apr 2024 15:58:15 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=stAeXYE6; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rzlY9-0005NZ-NS; Wed, 24 Apr 2024 18:57:13 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rzlY7-0005Ma-Kc for qemu-devel@nongnu.org; Wed, 24 Apr 2024 18:57:11 -0400 Received: from mail-pf1-x42a.google.com ([2607:f8b0:4864:20::42a]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rzlY5-0001lh-J0 for qemu-devel@nongnu.org; Wed, 24 Apr 2024 18:57:11 -0400 Received: by mail-pf1-x42a.google.com with SMTP id d2e1a72fcca58-6f28bb6d747so384014b3a.3 for ; Wed, 24 Apr 2024 15:57:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1713999428; x=1714604228; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=r70v09Z1jYUdUmudZIEve8pIi40latYcxErAtIALfYA=; b=stAeXYE6HUEL8RLD/MH+HDjoeoEVjei5ie+KIYo7hz7GiytyMxfjHBmGsmHR/IHdmR Nx/FJ+u3ZBPPbAgls1UqJ9jjJ48PwLD/Wboseku0a5ZJ36EXT7V0hneUbykeVrvD5RX6 Mqt1ZXDCjOEuXN0aLR8UaFphiNBIynuL8/dsOqka+cjbur7jOBVBgoF4zVe4lFDUpGfq yW4jV22bxOgroblHZyC7YeE53NVkQtq7NLJkXQV3QBegg5Yc6XVFqz9ybvAr2y05AgR2 ohpbo+9j8q7k+ABeQgHeA83dFhn5pjOlMGEYTR9nDgkWzrfF4j825WBE5x5kGTp5QrKp CWew== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1713999428; x=1714604228; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=r70v09Z1jYUdUmudZIEve8pIi40latYcxErAtIALfYA=; b=AdBtv3xgIqdZcSzgeeH/tLCEblIE3ARsPs5rmSZbij9+sFeIoRLfnM7UgGA60A1qCh SZLnb4ly0QAqHo0r1Mv1jhN71Z6C08e3xoAheTI7dUtVMKIWOc4THtDp3fQ8TToEnRpX vBeai8oa1TQJsQEUWMs9bIlNSS35X5Y3w+iyG2MVN7d5lI0wqAadFdkONp8PXxCCPSbx qb8Y7Q3WWI0uoPV1j4RKqiNTEuJYIikdaynkHX4r/M1jN6n43m3IvwEQe8SJBug5i0Ld PW0PrL5fh1Z5yM0xN/37mKSv+tUrAaxLO2ukSA2OYSO2+/rENd0TvDP1Zm1Aj+304BYL ajgQ== X-Gm-Message-State: AOJu0YwGV1lpefwe8zr+beD93rEsK2OljKdMI8DK8QCOUAIdQq3n3+lE h+89OHe28SI2JymkGh/hf5fxDck55DHqKqYyEEQFlHS9/xhflqTgCJbQ7CJ/g6hZ7P8c7N08FLn h X-Received: by 2002:a05:6a00:893:b0:6ed:caf6:6e54 with SMTP id q19-20020a056a00089300b006edcaf66e54mr4665877pfj.28.1713999428081; Wed, 24 Apr 2024 15:57:08 -0700 (PDT) Received: from stoup.. ([156.19.246.23]) by smtp.gmail.com with ESMTPSA id gu26-20020a056a004e5a00b006ed9760b815sm11947413pfb.211.2024.04.24.15.57.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Apr 2024 15:57:07 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Cc: Alexander Monakov , Mikhail Romanov Subject: [PATCH v6 01/10] util/bufferiszero: Remove SSE4.1 variant Date: Wed, 24 Apr 2024 15:56:56 -0700 Message-Id: <20240424225705.929812-2-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240424225705.929812-1-richard.henderson@linaro.org> References: <20240424225705.929812-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::42a; envelope-from=richard.henderson@linaro.org; helo=mail-pf1-x42a.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org From: Alexander Monakov The SSE4.1 variant is virtually identical to the SSE2 variant, except for using 'PTEST+JNZ' in place of 'PCMPEQB+PMOVMSKB+CMP+JNE' for testing if an SSE register is all zeroes. The PTEST instruction decodes to two uops, so it can be handled only by the complex decoder, and since CMP+JNE are macro-fused, both sequences decode to three uops. The uops comprising the PTEST instruction dispatch to p0 and p5 on Intel CPUs, so PCMPEQB+PMOVMSKB is comparatively more flexible from dispatch standpoint. Hence, the use of PTEST brings no benefit from throughput standpoint. Its latency is not important, since it feeds only a conditional jump, which terminates the dependency chain. I never observed PTEST variants to be faster on real hardware. Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov Reviewed-by: Richard Henderson Message-Id: <20240206204809.9859-2-amonakov@ispras.ru> --- util/bufferiszero.c | 29 ----------------------------- 1 file changed, 29 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 3e6a5dfd63..f5a3634f9a 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -100,34 +100,6 @@ buffer_zero_sse2(const void *buf, size_t len) } #ifdef CONFIG_AVX2_OPT -static bool __attribute__((target("sse4"))) -buffer_zero_sse4(const void *buf, size_t len) -{ - __m128i t = _mm_loadu_si128(buf); - __m128i *p = (__m128i *)(((uintptr_t)buf + 5 * 16) & -16); - __m128i *e = (__m128i *)(((uintptr_t)buf + len) & -16); - - /* Loop over 16-byte aligned blocks of 64. */ - while (likely(p <= e)) { - __builtin_prefetch(p); - if (unlikely(!_mm_testz_si128(t, t))) { - return false; - } - t = p[-4] | p[-3] | p[-2] | p[-1]; - p += 4; - } - - /* Finish the aligned tail. */ - t |= e[-3]; - t |= e[-2]; - t |= e[-1]; - - /* Finish the unaligned tail. */ - t |= _mm_loadu_si128(buf + len - 16); - - return _mm_testz_si128(t, t); -} - static bool __attribute__((target("avx2"))) buffer_zero_avx2(const void *buf, size_t len) { @@ -221,7 +193,6 @@ select_accel_cpuinfo(unsigned info) #endif #ifdef CONFIG_AVX2_OPT { CPUINFO_AVX2, 128, buffer_zero_avx2 }, - { CPUINFO_SSE4, 64, buffer_zero_sse4 }, #endif { CPUINFO_SSE2, 64, buffer_zero_sse2 }, { CPUINFO_ALWAYS, 0, buffer_zero_int }, From patchwork Wed Apr 24 22:56:57 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 791549 Delivered-To: patch@linaro.org Received: by 2002:a5d:4884:0:b0:346:15ad:a2a with SMTP id g4csp1093288wrq; Wed, 24 Apr 2024 15:58:40 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCUCVFhFP1c68cF0VknHEMtCJrQp1VynXsKVHMRfZg7pFj5sM+F/Gcu07ULoQQDS3vrsHKP61+7tKq+PyCrjsf89 X-Google-Smtp-Source: AGHT+IE766OHDuO2mgYED4Wt0uRCeldaWJZYeBVqem2gU2JUAS4cTAC+EZqU4k2b9yKnw/WdcNg+ X-Received: by 2002:ad4:4baf:0:b0:6a0:93c6:140e with SMTP id i15-20020ad44baf000000b006a093c6140emr1938522qvw.17.1713999520194; Wed, 24 Apr 2024 15:58:40 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1713999520; cv=none; d=google.com; s=arc-20160816; b=UHss55PijqzVW/8TKV/7XvAsDHKti/ZGRMcrckCz8QGPeW2Q+mcLQqFL7MsVAaOeDM 5n7tRaD1lsA9UhQZAzh+DYltXYxLx9DwaWGG01dbfx26K7YgfCE2MKSl98nuSSWp2nYa Jxc0qqxS/UuWjL6Nm0UCmRPKeFJE+Xc6QyCX4HciUdHT8/8oz8/36KlGUAaUx+Tn9csk QUUf3osAUwToPzoQdEt3GcMtz5grzUXPCohxwW1mEepX2axmhn5ozgV4/IiB7jMwwAN6 DRfqKtVjurZR9kbJ8DaEfAmV3fgfSNk89rENMiOWwfkKmhSRno8TuLwJXYnGXHYkObxa EplA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=Sc6Au+pExfgVbkEVt3xBcrgjk6DJCZWLxXzjjgNJTcI=; fh=IOfZmL/4G2LGBtSV+LzySu7eotL7HJ1AQcRx3etIBXU=; b=lpn3Dm8NN4oYwVgUomo3bT3m4acOsS2FP4pvwCbUdlXFJTAdVvcZFptiovdDZulUuP 7pZvKtAXRFi4v1at0TyBYVANmI6Q7X1ryt5vqmuLP+HdNCH5YjfD/FLx3jorESGzNiHe XYE5clBwNWm+mibS1i9qx/T8jLNqOLkXPJEzfxt8vU+5SQtHv7lHqaeky0px6WrgFaB5 m6vw5JvXfNh6DdW3SKNCcj5tFQ+BJPrW6V3qufOGAeURG4li/ncbYQXhsZ60oOFuIlwL obyhvEB2SiGhPwotydXFAHOtP5UjHABOzYC0ONhzZKSlNkxgMYIkOJPsByQfwyZNe5wl B3PQ==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=EkzqbIs1; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id kd7-20020a056214400700b006a09fce11casi58106qvb.579.2024.04.24.15.58.40 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Wed, 24 Apr 2024 15:58:40 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=EkzqbIs1; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rzlYA-0005Nz-PH; Wed, 24 Apr 2024 18:57:14 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rzlY8-0005N7-SG for qemu-devel@nongnu.org; Wed, 24 Apr 2024 18:57:12 -0400 Received: from mail-pf1-x432.google.com ([2607:f8b0:4864:20::432]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rzlY6-0001lm-OJ for qemu-devel@nongnu.org; Wed, 24 Apr 2024 18:57:12 -0400 Received: by mail-pf1-x432.google.com with SMTP id d2e1a72fcca58-6ee12766586so385432b3a.0 for ; Wed, 24 Apr 2024 15:57:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1713999429; x=1714604229; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Sc6Au+pExfgVbkEVt3xBcrgjk6DJCZWLxXzjjgNJTcI=; b=EkzqbIs1QpxUNxSBX+DIvfTyj+AW5b4wyB5AC9aCD+AGm5Vq7uFtpaY8ymDmZPISxH n//9vISUaPvSMyOpYXqkbHRDun4+7yi9DYLRGFGVtWABhmxDaJkY1wuwF1LXt5HgJfGu eDRUnLIOR45pohQwwSYrXozAMmuONGbRGllZWpQWYOaoq765AqeXm1S9OzRT4VDXi3GR M6rG/o1xxE6n8gGyLSY40O+MEbTg4LHHfOxQsyda7y0pzPGEU+auVrcB1uw5DzDOqYum nhAFXrqbcYGeslDTwJssCbc23VLel/ekXN8Di440eAhAykxfsdUk7FE3Z1bjA/ptrKkp edHg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1713999429; x=1714604229; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Sc6Au+pExfgVbkEVt3xBcrgjk6DJCZWLxXzjjgNJTcI=; b=rcWd4j2FbHdGj7UdoqgxkRwoyFzqk5YeppuhnmFQxC0lo5r8kOtEVDHgjQcYT6QLgz 1RRuD7NK/q8mHZ1u1LwuAWU9wVp2AfdiX2DZ+WRtoiiL+xK+yhjcsAPVrwE2/nXDxjo4 HoOhknOTdfZw8IWmS7e29yimDH8fkr+HjcTF/DtkFjEyi4ke9/IQ+6ngfOpJioRxYEQk sVGgDmLPZSaFPyK4xaYaBidncijrOouSwV9CieVDelBGjYUk20Qc1nvNLG5Kqe0cV4qr HO+rb5Z09PgrZ7AmRTvz9eJQZ9aRhR5g45Zi4CnP4pdK79fuUXwpIeUAmSTh4mND0GCS YHUg== X-Gm-Message-State: AOJu0YzYUji6IUNVfk6Iu71wvdfLmh2jP+SkzqOspMYuCFvf05jGiru5 iseVSKKUpfB0VpwAHFPmueOcKD3ohP7CoefKzHfyfRBkd6LtaXkXDQIhpzoNNC2ZCssxTrZI3lN u X-Received: by 2002:a05:6a00:992:b0:6ea:f3fb:26fe with SMTP id u18-20020a056a00099200b006eaf3fb26femr1698891pfg.12.1713999429116; Wed, 24 Apr 2024 15:57:09 -0700 (PDT) Received: from stoup.. ([156.19.246.23]) by smtp.gmail.com with ESMTPSA id gu26-20020a056a004e5a00b006ed9760b815sm11947413pfb.211.2024.04.24.15.57.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Apr 2024 15:57:08 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Cc: Alexander Monakov , Mikhail Romanov Subject: [PATCH v6 02/10] util/bufferiszero: Remove AVX512 variant Date: Wed, 24 Apr 2024 15:56:57 -0700 Message-Id: <20240424225705.929812-3-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240424225705.929812-1-richard.henderson@linaro.org> References: <20240424225705.929812-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::432; envelope-from=richard.henderson@linaro.org; helo=mail-pf1-x432.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org From: Alexander Monakov Thanks to early checks in the inline buffer_is_zero wrapper, the SIMD routines are invoked much more rarely in normal use when most buffers are non-zero. This makes use of AVX512 unprofitable, as it incurs extra frequency and voltage transition periods during which the CPU operates at reduced performance, as described in https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html Signed-off-by: Mikhail Romanov Signed-off-by: Alexander Monakov Reviewed-by: Richard Henderson Message-Id: <20240206204809.9859-4-amonakov@ispras.ru> Signed-off-by: Richard Henderson --- util/bufferiszero.c | 38 +++----------------------------------- 1 file changed, 3 insertions(+), 35 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index f5a3634f9a..641d5f9b9e 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -64,7 +64,7 @@ buffer_zero_int(const void *buf, size_t len) } } -#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) || defined(__SSE2__) +#if defined(CONFIG_AVX2_OPT) || defined(__SSE2__) #include /* Note that each of these vectorized functions require len >= 64. */ @@ -128,41 +128,12 @@ buffer_zero_avx2(const void *buf, size_t len) } #endif /* CONFIG_AVX2_OPT */ -#ifdef CONFIG_AVX512F_OPT -static bool __attribute__((target("avx512f"))) -buffer_zero_avx512(const void *buf, size_t len) -{ - /* Begin with an unaligned head of 64 bytes. */ - __m512i t = _mm512_loadu_si512(buf); - __m512i *p = (__m512i *)(((uintptr_t)buf + 5 * 64) & -64); - __m512i *e = (__m512i *)(((uintptr_t)buf + len) & -64); - - /* Loop over 64-byte aligned blocks of 256. */ - while (p <= e) { - __builtin_prefetch(p); - if (unlikely(_mm512_test_epi64_mask(t, t))) { - return false; - } - t = p[-4] | p[-3] | p[-2] | p[-1]; - p += 4; - } - - t |= _mm512_loadu_si512(buf + len - 4 * 64); - t |= _mm512_loadu_si512(buf + len - 3 * 64); - t |= _mm512_loadu_si512(buf + len - 2 * 64); - t |= _mm512_loadu_si512(buf + len - 1 * 64); - - return !_mm512_test_epi64_mask(t, t); - -} -#endif /* CONFIG_AVX512F_OPT */ - /* * Make sure that these variables are appropriately initialized when * SSE2 is enabled on the compiler command-line, but the compiler is * too old to support CONFIG_AVX2_OPT. */ -#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) +#if defined(CONFIG_AVX2_OPT) # define INIT_USED 0 # define INIT_LENGTH 0 # define INIT_ACCEL buffer_zero_int @@ -188,9 +159,6 @@ select_accel_cpuinfo(unsigned info) unsigned len; bool (*fn)(const void *, size_t); } all[] = { -#ifdef CONFIG_AVX512F_OPT - { CPUINFO_AVX512F, 256, buffer_zero_avx512 }, -#endif #ifdef CONFIG_AVX2_OPT { CPUINFO_AVX2, 128, buffer_zero_avx2 }, #endif @@ -208,7 +176,7 @@ select_accel_cpuinfo(unsigned info) return 0; } -#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) +#if defined(CONFIG_AVX2_OPT) static void __attribute__((constructor)) init_accel(void) { used_accel = select_accel_cpuinfo(cpuinfo_init()); From patchwork Wed Apr 24 22:56:58 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 791545 Delivered-To: patch@linaro.org Received: by 2002:a5d:4884:0:b0:346:15ad:a2a with SMTP id g4csp1093191wrq; Wed, 24 Apr 2024 15:58:15 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCWK/uknvkQmhhXEq7vPb50AqG7oBv9yWIeP3FhtXscb+zkTd64mwfMRjol5/DYBbdnFCfnfk4QkWBLnoS+VahS8 X-Google-Smtp-Source: AGHT+IFWzsd4Ox586SLgxlQExxsOtDgvh9BwNbAOR777eikg+yOM1JYZvMKe/uefd4vd0cLZ7/CJ X-Received: by 2002:a05:6214:27e5:b0:6a0:9b70:8966 with SMTP id jt5-20020a05621427e500b006a09b708966mr1910150qvb.38.1713999495700; Wed, 24 Apr 2024 15:58:15 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1713999495; cv=none; d=google.com; s=arc-20160816; b=PwmLYof4fBrCh3QYpPVIPRO6Ersd2X9uRGcJeLL4hc8196ht4fbLgP7t23av2RWR49 BsywfUHdIx9Ka/gWzcIyWIIhnX/Zabqs+U7vZk5a0w6RgaeJgIw93NcZIN8t97KohUTZ NbqB96Oznl7UAG65Pf3E2qWlpk3gh+oU2T4pIh0l/wL64yPurPzGc4vp3kI4e+IYGo0j popY2Dz7+5IfISyva0syERlYpsItP5cmj6y++qh2/Zff2/ktj4i+82SJDd7LDfVnVoVZ anWOClv67gzWWuxfPMHUKwJYPAohWvziiOTh4++aqWpF8bI5gOHbLzkrywFRInN2tg3j eFeA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=Aco89HGhDUqxtzZSl7gQEXCIzX7CLhr3rgmAxnnD0GI=; fh=IOfZmL/4G2LGBtSV+LzySu7eotL7HJ1AQcRx3etIBXU=; b=GjALNo2Q+V25vvqcVMT4me0iygR668GN7NDsuUhSj2XiINaoDnaozqWU6HnoO9I09U y0NmDJqD+6PZumzfyW54MCLFC+TiHtDNMaSomHN0TJUVAQciJLoJ9upAu7NDdDjHoKLw O9M150EBO/pMKlgI4GlFOSWyOCXfmM1nWIP39Xq7VPIQbSlfS+euAk84thNtX0Gno0kD iEJtqfcuMU/PAtBnpp0fZUa/fS2ummIaKCNVDjAEyk3fdl/UiKIr09XgpstbBMmCc6ho Ewk9113/3TXLYvnyncwEG0sZOOsiloTDeS0mIDzJ/SDjTQcE2CNTLo7RA9vdrf8pF6uz vv8A==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=i7AOEhjG; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id o10-20020ad45c8a000000b0069b63f86310si16331380qvh.127.2024.04.24.15.58.15 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Wed, 24 Apr 2024 15:58:15 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=i7AOEhjG; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rzlYJ-0005PF-7v; Wed, 24 Apr 2024 18:57:23 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rzlYG-0005Ov-Rw for qemu-devel@nongnu.org; Wed, 24 Apr 2024 18:57:20 -0400 Received: from mail-oa1-x33.google.com ([2001:4860:4864:20::33]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rzlYE-0001mp-UK for qemu-devel@nongnu.org; Wed, 24 Apr 2024 18:57:20 -0400 Received: by mail-oa1-x33.google.com with SMTP id 586e51a60fabf-2351c03438cso206049fac.2 for ; Wed, 24 Apr 2024 15:57:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1713999438; x=1714604238; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Aco89HGhDUqxtzZSl7gQEXCIzX7CLhr3rgmAxnnD0GI=; b=i7AOEhjGGugck17CHRV6r+vpb11b/6JZo/IQlW+BDq7tMc1Rbk1ieINquMWbKduXcQ /1eu3isyp3F2nqGv4DNK9Lfu5813cJ9ChPJ/FkpefdMlcGuqHkLi5tXP+jE47PP9bmTN 3pwVy+NoNu5E3j8Pv9iSLpTeiIc5LTZJt4HL7ueeOUUo56KoAzHuaybEef/CKPbrnpfb FoO7YHzuL5uPi1hQcJUDuhbDSVNxtF9KzwqyE7SWl0fnP3XPW7cmukY0YhKOWN+3n0+C 0eA9NNvSX2s+N0RJ31ptjAw9XvZ5pxFgQb/EH5tsq3CllQDGE8KChN4xSUiJLCQQp4jk 53LA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1713999438; x=1714604238; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Aco89HGhDUqxtzZSl7gQEXCIzX7CLhr3rgmAxnnD0GI=; b=vD2GQ7Yb+twbHtkr73J8jWMsqW8RHyvsHbx1gRIVPY7Gy1ZWcxhevyTvzHxJbSMawU 1B0XVPdPL2yiIXnMopqekbeJBcHByE9zaUZ9GT5xooJe4rNgEee+Z7v3GyP7WQXDUxa2 KBkuTjpH0zKt7rwO1L25FKFbSeloOn9W8/bUush2xpuUClvwiNcE4Vk0tPTELmUTRszX 40H1G8nl84FeROfNGAJwCYzNokObXxoLnALXIY3hSVE6FrLPFkP+sXe9AP2fLUHPKXGp ELj1/Vrsx19QtOzFNpCgQeWCP7hFeAcqPYebiCiv0lwduNbCh9oKY/9XIb8UqXESucq1 eXAA== X-Gm-Message-State: AOJu0Yw8hgCjuvDjvhyFjg0OSwa4Dr6maiKaiHMdstzxKLKQC3ykicGp Yf+QNnNXJBMnXUwj4ui4tsXuqVeaATZZTVUmF+lrRFROXGQJOEF8QrjeylCLPcNlGQs8yH4dbke C X-Received: by 2002:a05:6870:d29c:b0:221:793a:3b9a with SMTP id d28-20020a056870d29c00b00221793a3b9amr4650594oae.40.1713999430087; Wed, 24 Apr 2024 15:57:10 -0700 (PDT) Received: from stoup.. ([156.19.246.23]) by smtp.gmail.com with ESMTPSA id gu26-20020a056a004e5a00b006ed9760b815sm11947413pfb.211.2024.04.24.15.57.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Apr 2024 15:57:09 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Cc: Alexander Monakov , Mikhail Romanov Subject: [PATCH v6 03/10] util/bufferiszero: Reorganize for early test for acceleration Date: Wed, 24 Apr 2024 15:56:58 -0700 Message-Id: <20240424225705.929812-4-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240424225705.929812-1-richard.henderson@linaro.org> References: <20240424225705.929812-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2001:4860:4864:20::33; envelope-from=richard.henderson@linaro.org; helo=mail-oa1-x33.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org From: Alexander Monakov Test for length >= 256 inline, where is is often a constant. Before calling into the accelerated routine, sample three bytes from the buffer, which handles most non-zero buffers. Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov Message-Id: <20240206204809.9859-3-amonakov@ispras.ru> [rth: Use __builtin_constant_p; move the indirect call out of line.] Signed-off-by: Richard Henderson --- include/qemu/cutils.h | 32 ++++++++++++++++- util/bufferiszero.c | 84 +++++++++++++++++-------------------------- 2 files changed, 63 insertions(+), 53 deletions(-) diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h index 92c927a6a3..741dade7cf 100644 --- a/include/qemu/cutils.h +++ b/include/qemu/cutils.h @@ -187,9 +187,39 @@ char *freq_to_str(uint64_t freq_hz); /* used to print char* safely */ #define STR_OR_NULL(str) ((str) ? (str) : "null") -bool buffer_is_zero(const void *buf, size_t len); +/* + * Check if a buffer is all zeroes. + */ + +bool buffer_is_zero_ool(const void *vbuf, size_t len); +bool buffer_is_zero_ge256(const void *vbuf, size_t len); bool test_buffer_is_zero_next_accel(void); +static inline bool buffer_is_zero_sample3(const char *buf, size_t len) +{ + /* + * For any reasonably sized buffer, these three samples come from + * three different cachelines. In qemu-img usage, we find that + * each byte eliminates more than half of all buffer testing. + * It is therefore critical to performance that the byte tests + * short-circuit, so that we do not pull in additional cache lines. + * Do not "optimize" this to !(a | b | c). + */ + return !buf[0] && !buf[len - 1] && !buf[len / 2]; +} + +#ifdef __OPTIMIZE__ +static inline bool buffer_is_zero(const void *buf, size_t len) +{ + return (__builtin_constant_p(len) && len >= 256 + ? buffer_is_zero_sample3(buf, len) && + buffer_is_zero_ge256(buf, len) + : buffer_is_zero_ool(buf, len)); +} +#else +#define buffer_is_zero buffer_is_zero_ool +#endif + /* * Implementation of ULEB128 (http://en.wikipedia.org/wiki/LEB128) * Input is limited to 14-bit numbers diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 641d5f9b9e..972f394cbd 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -26,8 +26,9 @@ #include "qemu/bswap.h" #include "host/cpuinfo.h" -static bool -buffer_zero_int(const void *buf, size_t len) +static bool (*buffer_is_zero_accel)(const void *, size_t); + +static bool buffer_is_zero_integer(const void *buf, size_t len) { if (unlikely(len < 8)) { /* For a very small buffer, simply accumulate all the bytes. */ @@ -128,60 +129,38 @@ buffer_zero_avx2(const void *buf, size_t len) } #endif /* CONFIG_AVX2_OPT */ -/* - * Make sure that these variables are appropriately initialized when - * SSE2 is enabled on the compiler command-line, but the compiler is - * too old to support CONFIG_AVX2_OPT. - */ -#if defined(CONFIG_AVX2_OPT) -# define INIT_USED 0 -# define INIT_LENGTH 0 -# define INIT_ACCEL buffer_zero_int -#else -# ifndef __SSE2__ -# error "ISA selection confusion" -# endif -# define INIT_USED CPUINFO_SSE2 -# define INIT_LENGTH 64 -# define INIT_ACCEL buffer_zero_sse2 -#endif - -static unsigned used_accel = INIT_USED; -static unsigned length_to_accel = INIT_LENGTH; -static bool (*buffer_accel)(const void *, size_t) = INIT_ACCEL; - static unsigned __attribute__((noinline)) select_accel_cpuinfo(unsigned info) { /* Array is sorted in order of algorithm preference. */ static const struct { unsigned bit; - unsigned len; bool (*fn)(const void *, size_t); } all[] = { #ifdef CONFIG_AVX2_OPT - { CPUINFO_AVX2, 128, buffer_zero_avx2 }, + { CPUINFO_AVX2, buffer_zero_avx2 }, #endif - { CPUINFO_SSE2, 64, buffer_zero_sse2 }, - { CPUINFO_ALWAYS, 0, buffer_zero_int }, + { CPUINFO_SSE2, buffer_zero_sse2 }, + { CPUINFO_ALWAYS, buffer_is_zero_integer }, }; for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) { if (info & all[i].bit) { - length_to_accel = all[i].len; - buffer_accel = all[i].fn; + buffer_is_zero_accel = all[i].fn; return all[i].bit; } } return 0; } -#if defined(CONFIG_AVX2_OPT) +static unsigned used_accel; + static void __attribute__((constructor)) init_accel(void) { used_accel = select_accel_cpuinfo(cpuinfo_init()); } -#endif /* CONFIG_AVX2_OPT */ + +#define INIT_ACCEL NULL bool test_buffer_is_zero_next_accel(void) { @@ -194,36 +173,37 @@ bool test_buffer_is_zero_next_accel(void) used_accel |= used; return used; } - -static bool select_accel_fn(const void *buf, size_t len) -{ - if (likely(len >= length_to_accel)) { - return buffer_accel(buf, len); - } - return buffer_zero_int(buf, len); -} - #else -#define select_accel_fn buffer_zero_int bool test_buffer_is_zero_next_accel(void) { return false; } + +#define INIT_ACCEL buffer_is_zero_integer #endif -/* - * Checks if a buffer is all zeroes - */ -bool buffer_is_zero(const void *buf, size_t len) +static bool (*buffer_is_zero_accel)(const void *, size_t) = INIT_ACCEL; + +bool buffer_is_zero_ool(const void *buf, size_t len) { if (unlikely(len == 0)) { return true; } + if (!buffer_is_zero_sample3(buf, len)) { + return false; + } + /* All bytes are covered for any len <= 3. */ + if (unlikely(len <= 3)) { + return true; + } - /* Fetch the beginning of the buffer while we select the accelerator. */ - __builtin_prefetch(buf); - - /* Use an optimized zero check if possible. Note that this also - includes a check for an unrolled loop over 64-bit integers. */ - return select_accel_fn(buf, len); + if (likely(len >= 256)) { + return buffer_is_zero_accel(buf, len); + } + return buffer_is_zero_integer(buf, len); +} + +bool buffer_is_zero_ge256(const void *buf, size_t len) +{ + return buffer_is_zero_accel(buf, len); } From patchwork Wed Apr 24 22:56:59 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 791551 Delivered-To: patch@linaro.org Received: by 2002:a5d:4884:0:b0:346:15ad:a2a with SMTP id g4csp1093349wrq; Wed, 24 Apr 2024 15:58:54 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCUQ1Oi6ZR0a48eb795EYyUTHYZLeGDRp9ZdgbI+5tjcGaodAaKixBCKEZuAGkMGuKn3gP5nHalHDmlMbnOLpLoS X-Google-Smtp-Source: AGHT+IGneJink9fDmmACGtWvjQsfLbRsl1uUwAO8sXrmso6u1b9FjiNGTKLUT3uUObBYX2rp3uVs X-Received: by 2002:a05:6214:1310:b0:69c:b559:547d with SMTP id pn16-20020a056214131000b0069cb559547dmr2206110qvb.25.1713999533932; Wed, 24 Apr 2024 15:58:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1713999533; cv=none; d=google.com; s=arc-20160816; b=qM0aWa1mYlImYCYVF2Z0zPFYCQDFG8AKKfwd2yJy0zgZqGshWRGdTVkpL68RidKfs+ EVCYOmWj4qMyEGP7tZtvWV0KM3gLSx/7PhuO8ee3jZOVqHqLR9rS25nMGkLDLLcpXvI2 tgdYhMtoi2gJ2sxJMa98SkoRlQMETDlaJKMfesDC4meQ3i8kO13zTauNjfxRu4avlO80 kkGpOVG1YRrcCWs/53BTKE4LlQaw0fOV8YzZSY7ecj+SVQq4kgGCp590dWAqSk5x0Iq7 K7ufoeqWCjGStcJ1w+b0bt8snmiLyLE176i8rejMu2fGfvCRg5Q3jiR6mloflWBNaFPK 5wig== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=RPMJ9Fq7Hagw2E+aglC+ITwqURcRV1FccXRLb5FLQFU=; fh=IOfZmL/4G2LGBtSV+LzySu7eotL7HJ1AQcRx3etIBXU=; b=HfUFpvZ9eiqqZuS/IN74BqKgLwqumFIXSE8UY5w+Ej+ohMjR8dInM83Z2Fa/6+2oOr +toFEwJYE+U0l/SqQfIiOg2BlR041gVNLQIEAFVepBTOeflf5Lin9KKg4lcNk+svYRLt vAg6dNu5OuQYOi96KSFSTc/dtYsTcX8YlAevLCzPxI9rBNBYPMUfvShkNgXy8T5LsAlt iO1ZVjfdGvMogfhp9nEohLmt2F/gIOGTWNxG9UUyUUK5CzcMXQCillDj4JZ18I5R1lq8 GI/ZgHTbYbe/5sbqcl7FjvnLz8SV7ZRCXPqs4DbxEMDqsmAHSNVh4o2Xs+3r7+DswWc7 jlyA==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=dIkgolER; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id e8-20020a0562140d8800b0069b4c0a5c9bsi15947284qve.330.2024.04.24.15.58.53 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Wed, 24 Apr 2024 15:58:53 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=dIkgolER; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rzlYP-0005TV-3l; Wed, 24 Apr 2024 18:57:29 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rzlYI-0005PB-OE for qemu-devel@nongnu.org; Wed, 24 Apr 2024 18:57:23 -0400 Received: from mail-pf1-x42d.google.com ([2607:f8b0:4864:20::42d]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rzlYF-0001mu-Uk for qemu-devel@nongnu.org; Wed, 24 Apr 2024 18:57:21 -0400 Received: by mail-pf1-x42d.google.com with SMTP id d2e1a72fcca58-6f0aeee172dso308397b3a.1 for ; Wed, 24 Apr 2024 15:57:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1713999438; x=1714604238; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=RPMJ9Fq7Hagw2E+aglC+ITwqURcRV1FccXRLb5FLQFU=; b=dIkgolERSWYXhsFzfxo3/NE4n7/13xtTcWddkyqqpEeUF/kIAy1WPT3YZmjDbBi7JW jcuicdnEO55OeRrkYBZ746HmOR05Yx8PVrZ7CPjZarrfIlPn3SZvciBYBRLa8BNslgdL nInt77x7Syx9pD9hq5Ej06yYXvCsgx4A0PkWNiYYgWf5CwGms7cYUkM18xhs70tUPmx3 oHQNKSM577JHubEVHshpn+u3i1vBCAZ4nkPy/VPx97cUbWiSf3RaRMqdcj4QgbhuqeXV tR8ftzQNqh753MroFVwPBxxFjSGFSiJvNJkSF7tg5F/5l3RO3IUh6hG84r8GFqHeve31 c9JQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1713999438; x=1714604238; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=RPMJ9Fq7Hagw2E+aglC+ITwqURcRV1FccXRLb5FLQFU=; b=F/f1AzC4ze12+Q4srNmvWFeoslxAsMBcDtzGyfK6TCx65FIIH49uExv3lNCwyhXnwM CrD/0M5AYFPr0qsK5D8fwOrMVKlV87p4rBkE4V3XwMs21E3oBHlY7fqY67m5iZ+flVoK NK7ZTOPxdagB6q+yWEzLgIhupUX11/vlN3joVbfy+myOAf2ES6xvgDA1oSvHtebgjTPA yYwGbfBcRc4P8wLsTuv799S7SNnFp6zE1XEQkCwnA2B0e9jUIB281nCgT1aKjBDVCLdP t3qN087/N6GJoFqBtbVXBFEDH+7PqQbgjWmPO0ONMeKdf6JcDGNQ2d6iBs1oKUSzryht kFVA== X-Gm-Message-State: AOJu0Yy0LPLKox/hfMV7IPSo2kly1XIYZDh8boWhS16eL1seKIvHIVSe nrKOtNfvdUfFYLy0K8aZEybiRAPZGfFgl1F97oHHoGHTIe64PeWqW1l8fV0HeIGFQM0+HNfDNO0 l X-Received: by 2002:a05:6a00:a86:b0:6ed:2f0d:8d73 with SMTP id b6-20020a056a000a8600b006ed2f0d8d73mr1623257pfl.3.1713999438469; Wed, 24 Apr 2024 15:57:18 -0700 (PDT) Received: from stoup.. ([156.19.246.23]) by smtp.gmail.com with ESMTPSA id gu26-20020a056a004e5a00b006ed9760b815sm11947413pfb.211.2024.04.24.15.57.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Apr 2024 15:57:18 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Cc: Alexander Monakov , Mikhail Romanov Subject: [PATCH v6 04/10] util/bufferiszero: Remove useless prefetches Date: Wed, 24 Apr 2024 15:56:59 -0700 Message-Id: <20240424225705.929812-5-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240424225705.929812-1-richard.henderson@linaro.org> References: <20240424225705.929812-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::42d; envelope-from=richard.henderson@linaro.org; helo=mail-pf1-x42d.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org From: Alexander Monakov Use of prefetching in bufferiszero.c is quite questionable: - prefetches are issued just a few CPU cycles before the corresponding line would be hit by demand loads; - they are done for simple access patterns, i.e. where hardware prefetchers can perform better; - they compete for load ports in loops that should be limited by load port throughput rather than ALU throughput. Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov Reviewed-by: Richard Henderson Message-Id: <20240206204809.9859-5-amonakov@ispras.ru> --- util/bufferiszero.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 972f394cbd..00118d649e 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -50,7 +50,6 @@ static bool buffer_is_zero_integer(const void *buf, size_t len) const uint64_t *e = (uint64_t *)(((uintptr_t)buf + len) & -8); for (; p + 8 <= e; p += 8) { - __builtin_prefetch(p + 8); if (t) { return false; } @@ -80,7 +79,6 @@ buffer_zero_sse2(const void *buf, size_t len) /* Loop over 16-byte aligned blocks of 64. */ while (likely(p <= e)) { - __builtin_prefetch(p); t = _mm_cmpeq_epi8(t, zero); if (unlikely(_mm_movemask_epi8(t) != 0xFFFF)) { return false; @@ -111,7 +109,6 @@ buffer_zero_avx2(const void *buf, size_t len) /* Loop over 32-byte aligned blocks of 128. */ while (p <= e) { - __builtin_prefetch(p); if (unlikely(!_mm256_testz_si256(t, t))) { return false; } From patchwork Wed Apr 24 22:57:00 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 791542 Delivered-To: patch@linaro.org Received: by 2002:a5d:4884:0:b0:346:15ad:a2a with SMTP id g4csp1093182wrq; Wed, 24 Apr 2024 15:58:14 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCVdFmoe0pvonjx2KBsAfWEpe0WggLkKOFDC2daf8jV0LisQc93kYaXj0+MdsOGuJh2TukmuRMouDpqQq5qTzoKJ X-Google-Smtp-Source: AGHT+IFv+QDgwYWEMX4tGMWbeY5KkPIMLZYKiItdaVfNlqQHFKmIavMVN4QDjEut9Hy2UU+zMWcI X-Received: by 2002:a05:6122:4686:b0:4d4:3621:b245 with SMTP id di6-20020a056122468600b004d43621b245mr4692828vkb.16.1713999494694; Wed, 24 Apr 2024 15:58:14 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1713999494; cv=none; d=google.com; s=arc-20160816; b=CnAs3W1APM6rPUf3sVrwn+9EL2UlPJrQHOUgiy2ThMZ8XQCgFyhJDhicmOY6maKFt2 wvutqPn7PGCJT3SDHH+Bwb8KT0r4tiE7HtZzd4ydn862QmemTwlyMWKI/2CUmgkNqmiV B4gJopRRu1/8EM1esRpiwmNBzGvXpTFqKoRfVzjfGCOe6DfMjOJ+StPUJWjF1TTvrrgi 2FQeNvwWM/ke59y304F320twIaz1ULicxjxijstVMcuk4Df3yfELJygAeHgxfJybiPH3 PV8ZYp3NPbrqqT8s78AyDQ/WObKWm5tqssIP+lllTaEnseptUl1yPmpB59iz9EKxqkmQ 4lOQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=LJEBA8HhooZ+SwbOdLUPlZU44jMDHr0d5xq5JBRqrDA=; fh=IOfZmL/4G2LGBtSV+LzySu7eotL7HJ1AQcRx3etIBXU=; b=Pl3QDiGZWDvQp5qkpHLLwWNUPMbmu7RuzjP5fAlnF8xXhYiuRmfDel/qv4Hb59uZey w7bNBRdTqDVvL8SgAJKCLVjS+jT/pM2ELCD8unuF84ggyMpesLoCE2rUdNYKvfKop3CU SirU00mqBw6B/B5QHE03VY7jUoPiJrnOf4NsI9IhDoRDsfq9SxlJcJlsqlWBhSjF60F0 ZZzfPRC7LoW4kmlRZuUhd8QmTY2/Gg6SVwm33pk8ZUqYRFBmYe4GSN/6f9pqaD0PklJ9 90uf6iYv1wLjLF/wo7PpHm9swQvWwmR44ByxPicL9jUIZRxXdWtH0NYrsqea2HRL8Vm0 i1Zw==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=E1eKpOVR; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id pj16-20020a05620a1d9000b00789ed541240si15999498qkn.330.2024.04.24.15.58.14 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Wed, 24 Apr 2024 15:58:14 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=E1eKpOVR; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rzlYL-0005Pu-8n; Wed, 24 Apr 2024 18:57:26 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rzlYI-0005PA-O9 for qemu-devel@nongnu.org; Wed, 24 Apr 2024 18:57:23 -0400 Received: from mail-pf1-x42e.google.com ([2607:f8b0:4864:20::42e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rzlYG-0001oR-Ub for qemu-devel@nongnu.org; Wed, 24 Apr 2024 18:57:22 -0400 Received: by mail-pf1-x42e.google.com with SMTP id d2e1a72fcca58-6eddff25e4eso379884b3a.3 for ; Wed, 24 Apr 2024 15:57:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1713999439; x=1714604239; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=LJEBA8HhooZ+SwbOdLUPlZU44jMDHr0d5xq5JBRqrDA=; b=E1eKpOVRBOgDaPJmaOIbooYloL7Yg2+lbDt5J+at04bIdYSRz4MA3dCRnNrW2dPt2W xiVykh+eCL/AKSUQJbDr8kHy+lsOExJEHI8ipTKEJbvr0FSW+RmnAHTcYAc/rUGT85hi 41t2ehynlY+8A9rYqxLJSijdIx7zh0JYE1SEr68Y30kc0RcE/vyBDnK06q5OrRex2g3R LZhlGjBWSyujOUUUgfS9YzF7zjBMW4QYh+Oo18dZHLDcqTMn2x4jTWFaN+1VyMD62Zgm X4e7zmpvfr4ZlAmlD2Kyfix5lSsM8QZUsCRSLfPH/RmY7nvyjQ1tGwPDAgNC3zrfp5Hh Yosg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1713999439; x=1714604239; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=LJEBA8HhooZ+SwbOdLUPlZU44jMDHr0d5xq5JBRqrDA=; b=amDSUbjA6Ll37fMApMWTH0ellNVRwk+dZ2NOQYfvOMFYnWZD4c0wdSQ/FG2F8QBXT8 MgFdFvzDzZNC3CzGFofOoX9BXkPH3KiZmxbaKQ92aKs2hacXqSYVrLh7FOCw6M+c5WJG fgAJ9FPPJXuEwdjnwsO4R7zeE8yY8cc/wUv0RLabTk62X3An9uCqSR6+9KJKg8V0L8XH L0r5VHU7iYBxwDc8Gwb4272GDaxmncThkSB2DtqVckT0ZfogJBJsMTgCXoaDE2jKTKvi T9L3YvNAQl+VdLyh0VtufGKg7kV72mRNpsGKbP30FEXdJI6gdUYcmbi8ykgKR4GcsEGO ipjw== X-Gm-Message-State: AOJu0Yz5521L2HPqyZHSM11pJPGklV1s4DO5zD5C78OD5hFbu/U13RJx NTUZquSn9Jg31SrHScg5dTmJvQw1luxHRzEW7Bm7alLyEJYXuRPqU5NFMU7N8R6d4/7CL8Yeez3 8 X-Received: by 2002:a05:6a20:5650:b0:1ac:e0fa:fb24 with SMTP id is16-20020a056a20565000b001ace0fafb24mr3721458pzc.29.1713999439466; Wed, 24 Apr 2024 15:57:19 -0700 (PDT) Received: from stoup.. ([156.19.246.23]) by smtp.gmail.com with ESMTPSA id gu26-20020a056a004e5a00b006ed9760b815sm11947413pfb.211.2024.04.24.15.57.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Apr 2024 15:57:19 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Cc: Alexander Monakov , Mikhail Romanov Subject: [PATCH v6 05/10] util/bufferiszero: Optimize SSE2 and AVX2 variants Date: Wed, 24 Apr 2024 15:57:00 -0700 Message-Id: <20240424225705.929812-6-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240424225705.929812-1-richard.henderson@linaro.org> References: <20240424225705.929812-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::42e; envelope-from=richard.henderson@linaro.org; helo=mail-pf1-x42e.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org From: Alexander Monakov Increase unroll factor in SIMD loops from 4x to 8x in order to move their bottlenecks from ALU port contention to load issue rate (two loads per cycle on popular x86 implementations). Avoid using out-of-bounds pointers in loop boundary conditions. Follow SSE2 implementation strategy in the AVX2 variant. Avoid use of PTEST, which is not profitable there (like in the removed SSE4 variant). Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov Reviewed-by: Richard Henderson Message-Id: <20240206204809.9859-6-amonakov@ispras.ru> --- util/bufferiszero.c | 111 +++++++++++++++++++++++++++++--------------- 1 file changed, 73 insertions(+), 38 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 00118d649e..02df82b4ff 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -67,62 +67,97 @@ static bool buffer_is_zero_integer(const void *buf, size_t len) #if defined(CONFIG_AVX2_OPT) || defined(__SSE2__) #include -/* Note that each of these vectorized functions require len >= 64. */ +/* Helper for preventing the compiler from reassociating + chains of binary vector operations. */ +#define SSE_REASSOC_BARRIER(vec0, vec1) asm("" : "+x"(vec0), "+x"(vec1)) + +/* Note that these vectorized functions may assume len >= 256. */ static bool __attribute__((target("sse2"))) buffer_zero_sse2(const void *buf, size_t len) { - __m128i t = _mm_loadu_si128(buf); - __m128i *p = (__m128i *)(((uintptr_t)buf + 5 * 16) & -16); - __m128i *e = (__m128i *)(((uintptr_t)buf + len) & -16); - __m128i zero = _mm_setzero_si128(); + /* Unaligned loads at head/tail. */ + __m128i v = *(__m128i_u *)(buf); + __m128i w = *(__m128i_u *)(buf + len - 16); + /* Align head/tail to 16-byte boundaries. */ + const __m128i *p = QEMU_ALIGN_PTR_DOWN(buf + 16, 16); + const __m128i *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 16); + __m128i zero = { 0 }; - /* Loop over 16-byte aligned blocks of 64. */ - while (likely(p <= e)) { - t = _mm_cmpeq_epi8(t, zero); - if (unlikely(_mm_movemask_epi8(t) != 0xFFFF)) { + /* Collect a partial block at tail end. */ + v |= e[-1]; w |= e[-2]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-3]; w |= e[-4]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-5]; w |= e[-6]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-7]; v |= w; + + /* + * Loop over complete 128-byte blocks. + * With the head and tail removed, e - p >= 14, so the loop + * must iterate at least once. + */ + do { + v = _mm_cmpeq_epi8(v, zero); + if (unlikely(_mm_movemask_epi8(v) != 0xFFFF)) { return false; } - t = p[-4] | p[-3] | p[-2] | p[-1]; - p += 4; - } + v = p[0]; w = p[1]; + SSE_REASSOC_BARRIER(v, w); + v |= p[2]; w |= p[3]; + SSE_REASSOC_BARRIER(v, w); + v |= p[4]; w |= p[5]; + SSE_REASSOC_BARRIER(v, w); + v |= p[6]; w |= p[7]; + SSE_REASSOC_BARRIER(v, w); + v |= w; + p += 8; + } while (p < e - 7); - /* Finish the aligned tail. */ - t |= e[-3]; - t |= e[-2]; - t |= e[-1]; - - /* Finish the unaligned tail. */ - t |= _mm_loadu_si128(buf + len - 16); - - return _mm_movemask_epi8(_mm_cmpeq_epi8(t, zero)) == 0xFFFF; + return _mm_movemask_epi8(_mm_cmpeq_epi8(v, zero)) == 0xFFFF; } #ifdef CONFIG_AVX2_OPT static bool __attribute__((target("avx2"))) buffer_zero_avx2(const void *buf, size_t len) { - /* Begin with an unaligned head of 32 bytes. */ - __m256i t = _mm256_loadu_si256(buf); - __m256i *p = (__m256i *)(((uintptr_t)buf + 5 * 32) & -32); - __m256i *e = (__m256i *)(((uintptr_t)buf + len) & -32); + /* Unaligned loads at head/tail. */ + __m256i v = *(__m256i_u *)(buf); + __m256i w = *(__m256i_u *)(buf + len - 32); + /* Align head/tail to 32-byte boundaries. */ + const __m256i *p = QEMU_ALIGN_PTR_DOWN(buf + 32, 32); + const __m256i *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 32); + __m256i zero = { 0 }; - /* Loop over 32-byte aligned blocks of 128. */ - while (p <= e) { - if (unlikely(!_mm256_testz_si256(t, t))) { + /* Collect a partial block at tail end. */ + v |= e[-1]; w |= e[-2]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-3]; w |= e[-4]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-5]; w |= e[-6]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-7]; v |= w; + + /* Loop over complete 256-byte blocks. */ + for (; p < e - 7; p += 8) { + /* PTEST is not profitable here. */ + v = _mm256_cmpeq_epi8(v, zero); + if (unlikely(_mm256_movemask_epi8(v) != 0xFFFFFFFF)) { return false; } - t = p[-4] | p[-3] | p[-2] | p[-1]; - p += 4; - } ; + v = p[0]; w = p[1]; + SSE_REASSOC_BARRIER(v, w); + v |= p[2]; w |= p[3]; + SSE_REASSOC_BARRIER(v, w); + v |= p[4]; w |= p[5]; + SSE_REASSOC_BARRIER(v, w); + v |= p[6]; w |= p[7]; + SSE_REASSOC_BARRIER(v, w); + v |= w; + } - /* Finish the last block of 128 unaligned. */ - t |= _mm256_loadu_si256(buf + len - 4 * 32); - t |= _mm256_loadu_si256(buf + len - 3 * 32); - t |= _mm256_loadu_si256(buf + len - 2 * 32); - t |= _mm256_loadu_si256(buf + len - 1 * 32); - - return _mm256_testz_si256(t, t); + return _mm256_movemask_epi8(_mm256_cmpeq_epi8(v, zero)) == 0xFFFFFFFF; } #endif /* CONFIG_AVX2_OPT */ From patchwork Wed Apr 24 22:57:01 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 791552 Delivered-To: patch@linaro.org Received: by 2002:a5d:4884:0:b0:346:15ad:a2a with SMTP id g4csp1093351wrq; Wed, 24 Apr 2024 15:58:55 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCUcMKtGvzmk536P3R3BU5zZ097kJVr+YbqiPlnU0ilYx2JnaqlidJzrullO5fjiTrWUBNWawN7feLbtZavI8+u/ X-Google-Smtp-Source: AGHT+IF3pqvGaZ1jlqXDLjFMv/h0sMqeH7i30rgbnQXTslL/2j2G05nbj3O3g3bsadyun68vdPPW X-Received: by 2002:a05:620a:4803:b0:790:98c8:a765 with SMTP id eb3-20020a05620a480300b0079098c8a765mr1511739qkb.1.1713999534851; Wed, 24 Apr 2024 15:58:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1713999534; cv=none; d=google.com; s=arc-20160816; b=zKS5UoXU4tKjlAUbk4McqW7zWe7B78fxzW5nC3OLuD6qInP8ipYcaA6DYcMzkF7ayO /pLJ5/57b1gd3kUIvaTrA+1P8ibPkqw+6Qs+q1qUa9Jvi7Hyagp/HUmi1HpwPtYraWZj MCbtmWifXXFyYGnEkfl+sp72tm3ngdQNQX7/BmM7BU4EzUN2iUMb1Wo65705MqtfIfxC KOck06rM5i5Qigz+r7uXQmW7AxU+Kq6BpImcL8AViTGrZSz+zbavGO4dFEbzVJCFu4D8 eidGU04M7VNYVkGkooxRYACDCKvgG/cbKtBIxvO2qjMhBQgKLxG59kWOzd4+a8USyZcg Ja4A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:to:from :dkim-signature; bh=m6Ty8XllMR28noTCDsk0N/pyr7ynqbZ4X/2NsrzAk7g=; fh=PnYt+qEB9tAfMKoqBm2xjKOFpYyFFGPudh5cVIoieJM=; b=EtRJats/jlHNAZqvwq8ByvFebnEJQ/2ioWLKAuQgdINPdVljEySAIBDqRso0K+63eo irEcv91qhG+3ISQIM3oUu6gEvWPozy8HpbzaJNBcuNVnBHPXcY4jzKjrHGhpp/3yy+dJ oxb1DUXReA8KxtX1FSZ9kFCNZLnkqR/w6WOSIJuD9yGNCkH2jOaIcVK4+SJD0Iw4TB7r yK/Wqtz3vchZ+Fn8M8JPEQ5n6T8b4j2CPyca05OJ5uBc1pxZfMsSXwlNpp8LwcYx3mlW 5f26pOcForJxJxejmgsFYA+kpXFBffO/2d8co+ujyYC7SXIhpV2sdbXbun5P2vIlQMgo K3lQ==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=EIJF2JfF; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id br18-20020a05620a461200b00790988f1978si759162qkb.528.2024.04.24.15.58.54 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Wed, 24 Apr 2024 15:58:54 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=EIJF2JfF; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rzlYN-0005Sb-9D; Wed, 24 Apr 2024 18:57:27 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rzlYK-0005Pg-2X for qemu-devel@nongnu.org; Wed, 24 Apr 2024 18:57:24 -0400 Received: from mail-pg1-x535.google.com ([2607:f8b0:4864:20::535]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rzlYH-0001oe-QB for qemu-devel@nongnu.org; Wed, 24 Apr 2024 18:57:23 -0400 Received: by mail-pg1-x535.google.com with SMTP id 41be03b00d2f7-60585faa69fso307995a12.1 for ; Wed, 24 Apr 2024 15:57:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1713999440; x=1714604240; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:from:to:cc:subject:date:message-id :reply-to; bh=m6Ty8XllMR28noTCDsk0N/pyr7ynqbZ4X/2NsrzAk7g=; b=EIJF2JfFle4HshL+IB2q0BEng8M3pVv8dlpmDkzESnUYV9Xdw0mHZknY4Wtt28xBgB PMCL1SSlm4zJXxSCvaylY4neRB1pUidZRK0gTdp+O6bd6GGGm3hQlixzdvQMx1VQ27dm 0oTkWnMosRiXacHJVvChmjv4Ydhn44mXgCXDGilCMESKDp8IhNUG5W4UeMuYHz1SY0+I n6ajMcNSlOOfmudFITevwxfh/q2G9Mu4MiTZy6eE9r7KpK+Sfp2kmgt/lj1rz0QnjO/T MSIy6dkvtDfifhfdj13yHeGsraNw6z/hpKuDGryA8NMs+OaBwbKtDH0vvBF0/EPQbJ8q 5i3g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1713999440; x=1714604240; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=m6Ty8XllMR28noTCDsk0N/pyr7ynqbZ4X/2NsrzAk7g=; b=jeXPeDvGmytZwrXMjufH3+i5CcQ+b+tin+SUuC4rj9lFkdJiaJQ4WfFhkrP47AH9Lt ELBWQi+FwotJBd66ACHR5nFAzLYPaAj2RkvetcAsi8EZp9QXtIJ9o2tDwF7HLiIIFe6/ 3QwevKKLr3hdAtB/FbVJsRV41ncdSDohLYHUlwbvHOhu5TpMekUJfbcw9j5wrX1LUgwP 3QBrMvfbxVr7BHPYfkk0Pq2jMXySghzIuxcvXPFoiUtN1Hdsmkat6awFWWTPkFtcSgS+ o4cgzxDPT25vPUgjb51rJMKxqWzQqKL1SlIlTZ6U3AjIs56OmUh1+TEzuJ2+tW4jBN9k 7Xgw== X-Gm-Message-State: AOJu0Yz80XXeCasE9nosdeiyeOTaVoEB44PCtXH0tuhiEnQPen8Q607+ JD7CE29AQobkJkrKIbXEdFeLV75IYh99EaJr8rOWBO4bdy0zTnQowAgOfFyei+9brYKsf5TcYHM N X-Received: by 2002:a05:6a21:3989:b0:1a7:9ed1:fc21 with SMTP id ad9-20020a056a21398900b001a79ed1fc21mr1899489pzc.22.1713999440407; Wed, 24 Apr 2024 15:57:20 -0700 (PDT) Received: from stoup.. ([156.19.246.23]) by smtp.gmail.com with ESMTPSA id gu26-20020a056a004e5a00b006ed9760b815sm11947413pfb.211.2024.04.24.15.57.19 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Apr 2024 15:57:19 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Subject: [PATCH v6 06/10] util/bufferiszero: Improve scalar variant Date: Wed, 24 Apr 2024 15:57:01 -0700 Message-Id: <20240424225705.929812-7-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240424225705.929812-1-richard.henderson@linaro.org> References: <20240424225705.929812-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::535; envelope-from=richard.henderson@linaro.org; helo=mail-pg1-x535.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org Split less-than and greater-than 256 cases. Use unaligned accesses for head and tail. Avoid using out-of-bounds pointers in loop boundary conditions. Signed-off-by: Richard Henderson Reviewed-by: Philippe Mathieu-Daudé --- util/bufferiszero.c | 85 +++++++++++++++++++++++++++------------------ 1 file changed, 51 insertions(+), 34 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 02df82b4ff..c9a7ded016 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -28,40 +28,57 @@ static bool (*buffer_is_zero_accel)(const void *, size_t); -static bool buffer_is_zero_integer(const void *buf, size_t len) +static bool buffer_is_zero_int_lt256(const void *buf, size_t len) { - if (unlikely(len < 8)) { - /* For a very small buffer, simply accumulate all the bytes. */ - const unsigned char *p = buf; - const unsigned char *e = buf + len; - unsigned char t = 0; + uint64_t t; + const uint64_t *p, *e; - do { - t |= *p++; - } while (p < e); - - return t == 0; - } else { - /* Otherwise, use the unaligned memory access functions to - handle the beginning and end of the buffer, with a couple - of loops handling the middle aligned section. */ - uint64_t t = ldq_he_p(buf); - const uint64_t *p = (uint64_t *)(((uintptr_t)buf + 8) & -8); - const uint64_t *e = (uint64_t *)(((uintptr_t)buf + len) & -8); - - for (; p + 8 <= e; p += 8) { - if (t) { - return false; - } - t = p[0] | p[1] | p[2] | p[3] | p[4] | p[5] | p[6] | p[7]; - } - while (p < e) { - t |= *p++; - } - t |= ldq_he_p(buf + len - 8); - - return t == 0; + /* + * Use unaligned memory access functions to handle + * the beginning and end of the buffer. + */ + if (unlikely(len <= 8)) { + return (ldl_he_p(buf) | ldl_he_p(buf + len - 4)) == 0; } + + t = ldq_he_p(buf) | ldq_he_p(buf + len - 8); + p = QEMU_ALIGN_PTR_DOWN(buf + 8, 8); + e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 8); + + /* Read 0 to 31 aligned words from the middle. */ + while (p < e) { + t |= *p++; + } + return t == 0; +} + +static bool buffer_is_zero_int_ge256(const void *buf, size_t len) +{ + /* + * Use unaligned memory access functions to handle + * the beginning and end of the buffer. + */ + uint64_t t = ldq_he_p(buf) | ldq_he_p(buf + len - 8); + const uint64_t *p = QEMU_ALIGN_PTR_DOWN(buf + 8, 8); + const uint64_t *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 8); + + /* Collect a partial block at the tail end. */ + t |= e[-7] | e[-6] | e[-5] | e[-4] | e[-3] | e[-2] | e[-1]; + + /* + * Loop over 64 byte blocks. + * With the head and tail removed, e - p >= 30, + * so the loop must iterate at least 3 times. + */ + do { + if (t) { + return false; + } + t = p[0] | p[1] | p[2] | p[3] | p[4] | p[5] | p[6] | p[7]; + p += 8; + } while (p < e - 7); + + return t == 0; } #if defined(CONFIG_AVX2_OPT) || defined(__SSE2__) @@ -173,7 +190,7 @@ select_accel_cpuinfo(unsigned info) { CPUINFO_AVX2, buffer_zero_avx2 }, #endif { CPUINFO_SSE2, buffer_zero_sse2 }, - { CPUINFO_ALWAYS, buffer_is_zero_integer }, + { CPUINFO_ALWAYS, buffer_is_zero_int_ge256 }, }; for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) { @@ -211,7 +228,7 @@ bool test_buffer_is_zero_next_accel(void) return false; } -#define INIT_ACCEL buffer_is_zero_integer +#define INIT_ACCEL buffer_is_zero_int_ge256 #endif static bool (*buffer_is_zero_accel)(const void *, size_t) = INIT_ACCEL; @@ -232,7 +249,7 @@ bool buffer_is_zero_ool(const void *buf, size_t len) if (likely(len >= 256)) { return buffer_is_zero_accel(buf, len); } - return buffer_is_zero_integer(buf, len); + return buffer_is_zero_int_lt256(buf, len); } bool buffer_is_zero_ge256(const void *buf, size_t len) From patchwork Wed Apr 24 22:57:02 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 791548 Delivered-To: patch@linaro.org Received: by 2002:a5d:4884:0:b0:346:15ad:a2a with SMTP id g4csp1093287wrq; Wed, 24 Apr 2024 15:58:40 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCVj+6SVVFyIFEsYIBzju9leImYDfL8YHmlOEDzjKSWdiKKv9UCPfZsejHmI8tXy6T6qP/3qhgFxTePq/0gS0j3c X-Google-Smtp-Source: AGHT+IEWmPr9BgGtVLa1+7meIo0zODLWTeZodJrVECg3XiI1gi53J13CDDUDHYTSLnYO3yyzdLfL X-Received: by 2002:a05:690c:7489:b0:615:4e88:c036 with SMTP id jv9-20020a05690c748900b006154e88c036mr4292918ywb.41.1713999520079; Wed, 24 Apr 2024 15:58:40 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1713999520; cv=none; d=google.com; s=arc-20160816; b=iFa8CJGYlwhcLQTrREAGG0z5+bxw5oU2VEV7MSl7kHQJOhJjB9WPWd1OH4Mi2AA/xf KQx0oOk85fpqJKKaJfvul1mcyC8SnWfZe8WwZzyKl+iTonT9oJQpXmV34uA5AaNZglqG pGI1UZn+VVsnNqX/VP1vLcW8xPrDQH2SghlJi6Okrw0qUHp3T8CUyYjZV4TYXoGLLt02 lsc0Y8ZJ+rLXgQ1/9iO7cxQG+03Wowy1agDKvrz83xXM0p6ZrBf2JBmJb2Fme4OH2V2L +D/4KN3q2HViqgQejOxGo1JXSxitnseVljudsXOoVC4PJmdCdpLGuBgwpnzgCdwXfJk4 EqyA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:to:from :dkim-signature; bh=bh4lWnPe+YOgpHGdevl5r/TyXWqM+m5cBerY0p6p+5U=; fh=PnYt+qEB9tAfMKoqBm2xjKOFpYyFFGPudh5cVIoieJM=; b=QlsshB2hmmqqEykMGQpNGinskqJZnL9q2NoZDCm87FAILif5SYO40wpxVEXpMAWTmb Lf7axO3GBSunLVW6RtJI/rLPsh4zFGbV970wStx7ssMfpnOPRfw2bGM6T+KH27RUQaLi nbk93+GqqybJEB+DiqLsEmURs0h0c4xwLjVDspB4gJaSb1NZcLHxU1/eoXQ+6QgQHeb0 QFQPP6SWf4j2LuJxBSIGOaJsqi7GKb6txiAsabavulnN9+QKu9O9Fxh6Gq7gLV/59Xwv 40mcuEj2cFU/B7XpdEmtILrYwLINS6rkqoiuAJBH78i1BYVXbSvfTcOq7k8PHBtEDJ2E FTNg==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=gHMWaM+q; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id dh5-20020a05622a4e0500b00434ecf9f456si14751063qtb.574.2024.04.24.15.58.39 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Wed, 24 Apr 2024 15:58:40 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=gHMWaM+q; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rzlYP-0005Tf-8q; Wed, 24 Apr 2024 18:57:29 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rzlYK-0005Ph-5f for qemu-devel@nongnu.org; Wed, 24 Apr 2024 18:57:24 -0400 Received: from mail-oi1-x235.google.com ([2607:f8b0:4864:20::235]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rzlYI-0001oq-Pl for qemu-devel@nongnu.org; Wed, 24 Apr 2024 18:57:23 -0400 Received: by mail-oi1-x235.google.com with SMTP id 5614622812f47-3c730f599abso285706b6e.0 for ; Wed, 24 Apr 2024 15:57:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1713999441; x=1714604241; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:from:to:cc:subject:date:message-id :reply-to; bh=bh4lWnPe+YOgpHGdevl5r/TyXWqM+m5cBerY0p6p+5U=; b=gHMWaM+qXHvRo+uYZtJMNwdm5fiwh2UowvoULAI+ApwsTRr/J3f0bS1yeFDrzQaEw9 rGQo4Q1BqDXdCrWupVV6s0IeeiHP5lSSlHyP8S0l16I6cP4FbyFaKv6TPJM5kkKLNdBc f6IpUzqXhzBMznDUkU35kqhQuoVrjZpmcUli1cjF+ZSpwnhTGqB/lgpdibFl29/DxVRl KmhGoSMtrgN2+BxzdrBASccE55bgsI8UMb2Alu2FOReGDKsh0WIddO2zlTBcYOOdEbNV oqwyb9ur/yg6Wy6YknuvY8xkiscveJQ0IIXY5xofZGwrnHokvg7EAY/nat1l5D3IHu+P 7dUg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1713999441; x=1714604241; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=bh4lWnPe+YOgpHGdevl5r/TyXWqM+m5cBerY0p6p+5U=; b=vD+EEmp+kw/84gKPZuKVo/m/WZIj6E4xyRn3x59oPwC9fVgf9iM8dzAyZwS3VJOibL 91PZ7kCRxMOTFdhd11TVBWdQo3KeSZZh8MDcXq4+pNzlMQgUwZoj/wgQ9lEXtI2iMPG7 SmVYRZDKyIO5MviRk2AFdTEz/R30bSrWb/D+wtRPzVhqEWackuCV7Fbye0AlhT4Q0/eR vQZm6FMWwwGGNQIrKOTYXeQgYWlHerBQUftfOOAVwIs1FlELhXUSQtqVQwKSznHZHyLH 1A+cLWLHCA2cDRxDod1p8wJwtHyiS5xAG5Fa9oTUOlKBn22xsdhBmmGEUjvEfcZi4p1o OEuA== X-Gm-Message-State: AOJu0YxN5sFkMmu9zoRU21DKWtfz4EsE5j4jXKVitj8yR+x7OIAaWrEH yNpQiSjbAk71TunovrgLxZHvD/IUJxBIilkXv4/4vxcgTR4AbDgWiA/D0z6ww5S652ruvzoY/9N D X-Received: by 2002:a05:6808:b2c:b0:3c6:e81:4272 with SMTP id t12-20020a0568080b2c00b003c60e814272mr3851579oij.10.1713999441420; Wed, 24 Apr 2024 15:57:21 -0700 (PDT) Received: from stoup.. ([156.19.246.23]) by smtp.gmail.com with ESMTPSA id gu26-20020a056a004e5a00b006ed9760b815sm11947413pfb.211.2024.04.24.15.57.20 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Apr 2024 15:57:20 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Subject: [PATCH v6 07/10] util/bufferiszero: Introduce biz_accel_fn typedef Date: Wed, 24 Apr 2024 15:57:02 -0700 Message-Id: <20240424225705.929812-8-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240424225705.929812-1-richard.henderson@linaro.org> References: <20240424225705.929812-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::235; envelope-from=richard.henderson@linaro.org; helo=mail-oi1-x235.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org Signed-off-by: Richard Henderson Reviewed-by: Philippe Mathieu-Daudé --- util/bufferiszero.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index c9a7ded016..eb8030a3f0 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -26,7 +26,8 @@ #include "qemu/bswap.h" #include "host/cpuinfo.h" -static bool (*buffer_is_zero_accel)(const void *, size_t); +typedef bool (*biz_accel_fn)(const void *, size_t); +static biz_accel_fn buffer_is_zero_accel; static bool buffer_is_zero_int_lt256(const void *buf, size_t len) { @@ -178,13 +179,15 @@ buffer_zero_avx2(const void *buf, size_t len) } #endif /* CONFIG_AVX2_OPT */ + + static unsigned __attribute__((noinline)) select_accel_cpuinfo(unsigned info) { /* Array is sorted in order of algorithm preference. */ static const struct { unsigned bit; - bool (*fn)(const void *, size_t); + biz_accel_fn fn; } all[] = { #ifdef CONFIG_AVX2_OPT { CPUINFO_AVX2, buffer_zero_avx2 }, @@ -231,7 +234,7 @@ bool test_buffer_is_zero_next_accel(void) #define INIT_ACCEL buffer_is_zero_int_ge256 #endif -static bool (*buffer_is_zero_accel)(const void *, size_t) = INIT_ACCEL; +static biz_accel_fn buffer_is_zero_accel = INIT_ACCEL; bool buffer_is_zero_ool(const void *buf, size_t len) { From patchwork Wed Apr 24 22:57:03 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 791547 Delivered-To: patch@linaro.org Received: by 2002:a5d:4884:0:b0:346:15ad:a2a with SMTP id g4csp1093276wrq; Wed, 24 Apr 2024 15:58:38 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCUftQVcL+1KGyyzDw7jQX2lS1c2iDb9HOEZGbYCjBiqYMhWF2qm6iZqd7SnzIxGAriSx6oaMc7+4nMaVc6wsfWE X-Google-Smtp-Source: AGHT+IHz57+wvQEVbXSVzw6nwOPvweHO1q7/p8MuNQb/FFtjjv5W/HSsjm1uEbVd3JlUpXwvepev X-Received: by 2002:a05:622a:5e87:b0:439:f5f1:2d78 with SMTP id er7-20020a05622a5e8700b00439f5f12d78mr4771157qtb.25.1713999517884; Wed, 24 Apr 2024 15:58:37 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1713999517; cv=none; d=google.com; s=arc-20160816; b=dyRJPO13jyWPKlboCj4cltQpf6q/VmbUyswkROn+lEw7zmMmhOTSpBin8kNOi56wkA t9M8bl7PsWxN5x/l7P0stT5w1VuygpyQKceud+FUSrAMNaEtJtdzS1quLfz2vy6IbrcL x/MBtUOeXadqfh62nmD8R3qfdRPov09OGaTQnRcGuMAO6yFug/bWDgPqs+HAMKNsbDPe pGC+vQWcuZ6iW8/gMkpqVxGo1b6vw/1BPcKq4AkSIwvRGAfYGNZw6E8yl3v4OCYP5eWM 45AeyHuRCUstsHVAMba2YccKYlDDw0sdZt4fwpeKb098Y7Z0FY7OyNdkyGIJcgEzgHlB rfVA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:to:from :dkim-signature; bh=DA4w6vWBcl/o/fCPqKs/e8JAcC2+x8OTjBrtE2E7+xQ=; fh=PnYt+qEB9tAfMKoqBm2xjKOFpYyFFGPudh5cVIoieJM=; b=tk69kpKyjYUr/UO/U64NCNbuvjm3YONEJG3YVMc5cs+/2fHHqrDo3xv2epPEwsDnF1 9zI1HTHAMBsc8DmhT90H/8TqVRrD/R22oU7aSKzqUTNUZPVoBg+b17wlF4JIiXYx5zfs DpNxEznz4+2UeiuwLcIh1phZaf57FgLzVLdiYH7mU3HVsMkSQgmLcrdBabScxeGGqDnB QhLrjB5pDaBBC7NIf2tPN5OESwV/w5tFtSMY+v58YzaIXZ/nrIbRhBqEm34eRe9FQ4H/ hevSilikHmhM39HQtwAcEZF9APmCMuEVSZcLoUWhPEn6jYz3flOgLuRJcx1JyulfL0FV 9G5Q==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=kwDdxqO3; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id o10-20020a05622a138a00b00439f37bf751si5010720qtk.628.2024.04.24.15.58.37 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Wed, 24 Apr 2024 15:58:37 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=kwDdxqO3; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rzlYW-0005ck-PN; Wed, 24 Apr 2024 18:57:36 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rzlYL-0005Py-8G for qemu-devel@nongnu.org; Wed, 24 Apr 2024 18:57:25 -0400 Received: from mail-oa1-x30.google.com ([2001:4860:4864:20::30]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rzlYJ-0001p5-KV for qemu-devel@nongnu.org; Wed, 24 Apr 2024 18:57:25 -0400 Received: by mail-oa1-x30.google.com with SMTP id 586e51a60fabf-22ed075a629so211118fac.3 for ; Wed, 24 Apr 2024 15:57:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1713999442; x=1714604242; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:from:to:cc:subject:date:message-id :reply-to; bh=DA4w6vWBcl/o/fCPqKs/e8JAcC2+x8OTjBrtE2E7+xQ=; b=kwDdxqO3dwbHcyqBo43kRllE23TQcpEJyqFw8JemDE5txYhXaZjr5WgsBgcTiU7Ejm oPepvBkr+bBnN/0hkqyAlcVobmN58khHUdPfJczQI0/7C5o+MNqpRkRyjFIlamDwvpKU tApVYVfuCq9BYWEm5OXybUvs8IEfLWpfcJsMgEKA9geYq6lI+pnsi8B844h2GHR9ZXSw z5h9tUzhYwbDn13d0SS4qoIp3JmMZThXE5CDzf7G/SarOiQ6QvxZTWx6LEE2SFb5Tq2t fZBk2bjZtot+kRPNWg21LZ8auV/QHVhJtc0JIhzNYYnyOoTGpl6DAWKgYk83SS2WRNF6 K8GQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1713999442; x=1714604242; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=DA4w6vWBcl/o/fCPqKs/e8JAcC2+x8OTjBrtE2E7+xQ=; b=acAzHLS3v1oiBDndWd9ICYdvvpzewzCa6qcHkhRXyqTJlsV8aMaQlZArsgS7mv3Bre Rj6pEsgn3Gj6T3Ol7OwJnjFO1z0u1zRbgwrSmhh6wDCGzrV25L88vcZO9lR2BOMGrYBQ KXA2SITLRu2Uj6qF5kHv8MhSkcFVWL6L4OEAnAPrqUWPKMPNue/MVjuFWhQUvzyeJBtY sejZRpv4TB45NAa/VJ9w0upDeGq4NN8Dw4Q3GENtp1uPhoeBjUwf7P70xAugDbKvVWhP ggwVwnjPYf49YEC0EIO0O4Kv1clita1BGLf1882O55/DFx6IJUIFGWetJ+mKOMwcpQTM ATHA== X-Gm-Message-State: AOJu0Yy1kWlDpHUXWIuTmRDv2tHKFSiJL+/Wik5Bb/bbjs0iXCsXjFPF 5XLhlK+z6o5vNAr5G2y55c8f/cZ0UvS41LwP3TrHwvPUcbhxxFui/kgWXMF7pzNVmKFQ1E5a2vD T X-Received: by 2002:a05:6870:a345:b0:238:b140:1ab with SMTP id y5-20020a056870a34500b00238b14001abmr4443648oak.48.1713999442267; Wed, 24 Apr 2024 15:57:22 -0700 (PDT) Received: from stoup.. ([156.19.246.23]) by smtp.gmail.com with ESMTPSA id gu26-20020a056a004e5a00b006ed9760b815sm11947413pfb.211.2024.04.24.15.57.21 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Apr 2024 15:57:21 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Subject: [PATCH v6 08/10] util/bufferiszero: Simplify test_buffer_is_zero_next_accel Date: Wed, 24 Apr 2024 15:57:03 -0700 Message-Id: <20240424225705.929812-9-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240424225705.929812-1-richard.henderson@linaro.org> References: <20240424225705.929812-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2001:4860:4864:20::30; envelope-from=richard.henderson@linaro.org; helo=mail-oa1-x30.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org Because the three alternatives are monotonic, we don't need to keep a couple of bitmasks, just identify the strongest alternative at startup. Signed-off-by: Richard Henderson Reviewed-by: Philippe Mathieu-Daudé --- util/bufferiszero.c | 56 ++++++++++++++++++--------------------------- 1 file changed, 22 insertions(+), 34 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index eb8030a3f0..ff003dc40e 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -179,51 +179,39 @@ buffer_zero_avx2(const void *buf, size_t len) } #endif /* CONFIG_AVX2_OPT */ - - -static unsigned __attribute__((noinline)) -select_accel_cpuinfo(unsigned info) -{ - /* Array is sorted in order of algorithm preference. */ - static const struct { - unsigned bit; - biz_accel_fn fn; - } all[] = { +static biz_accel_fn const accel_table[] = { + buffer_is_zero_int_ge256, + buffer_zero_sse2, #ifdef CONFIG_AVX2_OPT - { CPUINFO_AVX2, buffer_zero_avx2 }, + buffer_zero_avx2, #endif - { CPUINFO_SSE2, buffer_zero_sse2 }, - { CPUINFO_ALWAYS, buffer_is_zero_int_ge256 }, - }; - - for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) { - if (info & all[i].bit) { - buffer_is_zero_accel = all[i].fn; - return all[i].bit; - } - } - return 0; -} - -static unsigned used_accel; +}; +static unsigned accel_index; static void __attribute__((constructor)) init_accel(void) { - used_accel = select_accel_cpuinfo(cpuinfo_init()); + unsigned info = cpuinfo_init(); + unsigned index = (info & CPUINFO_SSE2 ? 1 : 0); + +#ifdef CONFIG_AVX2_OPT + if (info & CPUINFO_AVX2) { + index = 2; + } +#endif + + accel_index = index; + buffer_is_zero_accel = accel_table[index]; } #define INIT_ACCEL NULL bool test_buffer_is_zero_next_accel(void) { - /* - * Accumulate the accelerators that we've already tested, and - * remove them from the set to test this round. We'll get back - * a zero from select_accel_cpuinfo when there are no more. - */ - unsigned used = select_accel_cpuinfo(cpuinfo & ~used_accel); - used_accel |= used; - return used; + if (accel_index != 0) { + buffer_is_zero_accel = accel_table[--accel_index]; + return true; + } + return false; } #else bool test_buffer_is_zero_next_accel(void) From patchwork Wed Apr 24 22:57:04 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 791550 Delivered-To: patch@linaro.org Received: by 2002:a5d:4884:0:b0:346:15ad:a2a with SMTP id g4csp1093306wrq; Wed, 24 Apr 2024 15:58:43 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCVLb4Yn4mv3wsFvALfXMtP0uREolRb4ySkujJKDRW97RBK2feRvoKnSFY8UCYOvpdUGQTRc/ELV/nB4gKk2bCIA X-Google-Smtp-Source: AGHT+IEBzE4t3lw0AxQDXKoH7DP8tBKYZ+4tBPxfb21W90vUK54fcNxhHe0Vj9u8ms/gXsikihM/ X-Received: by 2002:ac8:59cc:0:b0:437:b868:6769 with SMTP id f12-20020ac859cc000000b00437b8686769mr4171384qtf.11.1713999523562; Wed, 24 Apr 2024 15:58:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1713999523; cv=none; d=google.com; s=arc-20160816; b=XP+iJ/kV99wO6nev9o/foDuPcX2nlwrp8RScw2B1jq8Ep+ljA4kKCFFTnJbonYBkFu pGvdNyKtfZaq9/uofjdkgogCzvfa9OvQBt3dZgrsCI7MD3Gd+hy5fybYydDlOHdP9cK+ aZkcgi/OsUjlXkbLcozG95HOE8RjfocUYgLbmsWeo803EWNofFexsdRppIRpov9TfOWa bQI6e/O+N0FLSqn2W1PeBQDO9ravipTIDeZVZi14fEawyq9lcbsIZ1iHh2YCP0D7fGNm sPbX5OY/UtjPXQuBjZixia8KJ60qu8kD7QYL7ZBRkpaffLNMqfn++ltr0qvBsAbzyWap ikVQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:to:from :dkim-signature; bh=fwoFN6B2qbFkqUCTGf1HoUgZR053rQ1g2covN+0RlE8=; fh=PnYt+qEB9tAfMKoqBm2xjKOFpYyFFGPudh5cVIoieJM=; b=vFXymTNrZKFiUBbRNv+6YwMX6oAR8DhHrXQmM5Qd/THXDtRv+Q5jbpeHLL+ChpCq8L e15ZQYGmu4qkhz1H2GL4facu/BJcFHISl5EYqSZfu2yPw6vfMdrx06grmy2mr0jkzz9u iRFusTQl0GYIp9UNhjXYlWEdUjmp+wUdEwBMzyfibrCV1I0bb6BfPf8z26cg1xkA/t6O sAfIBhcXdpuS9SqTsDTXI3UAEf9cJCt5tqVZi7X6l+tWCCX9cEoGD06v6naYjw0RnQJH YkAvTEbrfWTvkMWvXZhEhBAjeo6kdIl0gV6+0z0w29VPwrblGRZI3kHQwZGDuRG4OEjo wgIg==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=jaqRTVAy; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id v36-20020a05622a18a400b004345fefbd57si16873168qtc.256.2024.04.24.15.58.43 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Wed, 24 Apr 2024 15:58:43 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=jaqRTVAy; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rzlYR-0005UW-Ea; Wed, 24 Apr 2024 18:57:31 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rzlYL-0005Q1-Pc for qemu-devel@nongnu.org; Wed, 24 Apr 2024 18:57:26 -0400 Received: from mail-pf1-x432.google.com ([2607:f8b0:4864:20::432]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rzlYK-0001pD-A3 for qemu-devel@nongnu.org; Wed, 24 Apr 2024 18:57:25 -0400 Received: by mail-pf1-x432.google.com with SMTP id d2e1a72fcca58-6effe9c852eso372618b3a.3 for ; Wed, 24 Apr 2024 15:57:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1713999443; x=1714604243; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:from:to:cc:subject:date:message-id :reply-to; bh=fwoFN6B2qbFkqUCTGf1HoUgZR053rQ1g2covN+0RlE8=; b=jaqRTVAyHV/qeNkMG4M2udOLBqLZpP33d1JgmFJVbkS6G9tUIYAV+W7TWA3h16qSXK D5/w6Qn3bdwLFcRg2eCBWHtwqTRY8tD0i6uMpqrdy5mvF/ve++a6VhSnxWvxL4Qqv6NZ EKZBv2qDMededM//hgPOSvQhlO5cOYG4+XqKUeHAYI1wvB6KW8c34h8HhhptWks7ARO5 6htFN6megQYgq2+WBCcj0Ju3OEzKODF1d1CLLXjLMOU0bTybuz5mRZpQcS1p+M5Vb2kk 7DSPUfTIubofzce/CuPR+63NkIeyU7LJKzuESI2wm/+OCTUoLQ3Sm3AOfSiYHZMLV7o6 xbKA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1713999443; x=1714604243; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=fwoFN6B2qbFkqUCTGf1HoUgZR053rQ1g2covN+0RlE8=; b=Ssa0twqbbzFbRwTPpnHmK8QG5qwB/xKXil6LulE7xGIzZAY5jekrayTuz9jA4Y/XYc JEk4LP56T6GrH8Uhv472qLnLwuMjE3PaUbhzKtTm2XI+95SWBkOSMdeZKmjMt+iOJwHJ Iq8v2sVRXenJT+8YNqxLQEI4OKFzIqaVz8dvKnI3s5vKmxd5RlvYyQmYMSjrzinrhH8J heu7FnMf+GcdVgYTqt+ZXBVlfRuFSQjiuJ2v7P8X/crQfGHneJ5d+knX6iM5vWtXgW8W Lk8SkYLcAN6tGqZziZmHhHMp1JxNeoHuqc+VNgmeALVI19N2bRRJyPm7kTjoDsL9XUwg rMZg== X-Gm-Message-State: AOJu0Yw0hFmrCIege40BzVP2BsVsCm33NdxZM5FFvXTVJNFPNVmz/oO4 HUtHrpmKOxHpQu3S6vCxIn+MNu7khMnPt8lEZyuVapZJYhhyJ4257CemP3zGn8aYHsbHaXwJ4QF i X-Received: by 2002:a05:6a20:c901:b0:1a7:a3cb:7901 with SMTP id gx1-20020a056a20c90100b001a7a3cb7901mr4050098pzb.61.1713999443062; Wed, 24 Apr 2024 15:57:23 -0700 (PDT) Received: from stoup.. ([156.19.246.23]) by smtp.gmail.com with ESMTPSA id gu26-20020a056a004e5a00b006ed9760b815sm11947413pfb.211.2024.04.24.15.57.22 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Apr 2024 15:57:22 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Subject: [PATCH v6 09/10] util/bufferiszero: Add simd acceleration for aarch64 Date: Wed, 24 Apr 2024 15:57:04 -0700 Message-Id: <20240424225705.929812-10-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240424225705.929812-1-richard.henderson@linaro.org> References: <20240424225705.929812-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::432; envelope-from=richard.henderson@linaro.org; helo=mail-pf1-x432.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org Because non-embedded aarch64 is expected to have AdvSIMD enabled, merely double-check with the compiler flags for __ARM_NEON and don't bother with a runtime check. Otherwise, model the loop after the x86 SSE2 function. Use UMAXV for the vector reduction. This is 3 cycles on cortex-a76 and 2 cycles on neoverse-n1. Signed-off-by: Richard Henderson Reviewed-by: Philippe Mathieu-Daudé --- util/bufferiszero.c | 77 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 77 insertions(+) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index ff003dc40e..38477a3eac 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -213,7 +213,84 @@ bool test_buffer_is_zero_next_accel(void) } return false; } + +#elif defined(__aarch64__) && defined(__ARM_NEON) +#include + +#define REASSOC_BARRIER(vec0, vec1) asm("" : "+w"(vec0), "+w"(vec1)) + +static bool buffer_is_zero_simd(const void *buf, size_t len) +{ + uint32x4_t t0, t1, t2, t3; + + /* Align head/tail to 16-byte boundaries. */ + const uint32x4_t *p = QEMU_ALIGN_PTR_DOWN(buf + 16, 16); + const uint32x4_t *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 16); + + /* Unaligned loads at head/tail. */ + t0 = vld1q_u32(buf) | vld1q_u32(buf + len - 16); + + /* Collect a partial block at tail end. */ + t1 = e[-7] | e[-6]; + t2 = e[-5] | e[-4]; + t3 = e[-3] | e[-2]; + t0 |= e[-1]; + REASSOC_BARRIER(t0, t1); + REASSOC_BARRIER(t2, t3); + t0 |= t1; + t2 |= t3; + REASSOC_BARRIER(t0, t2); + t0 |= t2; + + /* + * Loop over complete 128-byte blocks. + * With the head and tail removed, e - p >= 14, so the loop + * must iterate at least once. + */ + do { + /* + * Reduce via UMAXV. Whatever the actual result, + * it will only be zero if all input bytes are zero. + */ + if (unlikely(vmaxvq_u32(t0) != 0)) { + return false; + } + + t0 = p[0] | p[1]; + t1 = p[2] | p[3]; + t2 = p[4] | p[5]; + t3 = p[6] | p[7]; + REASSOC_BARRIER(t0, t1); + REASSOC_BARRIER(t2, t3); + t0 |= t1; + t2 |= t3; + REASSOC_BARRIER(t0, t2); + t0 |= t2; + p += 8; + } while (p < e - 7); + + return vmaxvq_u32(t0) == 0; +} + +static biz_accel_fn const accel_table[] = { + buffer_is_zero_int_ge256, + buffer_is_zero_simd, +}; + +static unsigned accel_index = 1; +#define INIT_ACCEL buffer_is_zero_simd + +bool test_buffer_is_zero_next_accel(void) +{ + if (accel_index != 0) { + buffer_is_zero_accel = accel_table[--accel_index]; + return true; + } + return false; +} + #else + bool test_buffer_is_zero_next_accel(void) { return false; From patchwork Wed Apr 24 22:57:05 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 791546 Delivered-To: patch@linaro.org Received: by 2002:a5d:4884:0:b0:346:15ad:a2a with SMTP id g4csp1093273wrq; Wed, 24 Apr 2024 15:58:37 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCV526dMcEF4A1QqiVP15BaubvxDvl1eirCAjdIGJLIRqzAMS8OpRZGpCij1ILHpsW2SfcFs4Yw0cC9yB7I+iOKZ X-Google-Smtp-Source: AGHT+IE1D5l5rUBMbm2j0B988Z1ua6SNP6ViD/8YCBOvrD9ZZGwEczp4uX7caeVqOhK73Lk2mxDS X-Received: by 2002:a05:6214:11ad:b0:6a0:82ae:f0b with SMTP id u13-20020a05621411ad00b006a082ae0f0bmr4430833qvv.33.1713999517282; Wed, 24 Apr 2024 15:58:37 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1713999517; cv=none; d=google.com; s=arc-20160816; b=Rv+YpOSQCxgwnz1mm0ouKZKME2DzTSFUejuEqS/ybpXqoytQ/YzOIHbG0k+V/Yrhbv Gm4bMNWkdMR8POfycipCdZJC3Peg4ZJhNn6EJh3Dp+M1M3n+a1FXfH6aYGnH1uWY+0FC s0TA1BUb2b1KHAB1XI4uCfZsyrahSqtLe2n8dEMyvKb+Ys/fX2lFfC/uIK/7EfOEyKQ2 Jv61Aix3ZBOwFgs1v44MT20JuKUH1ktZlQNYLR8CN8MNIVF4kX3E574/g2omMTwLPPpx hG3OGwdVfWOHBPW9WXNzyp5LlHXcEXuhyQi6sbuh9zdRDh8/HhVJJFzlB2jEZsYA0NsD 2GUA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:to:from :dkim-signature; bh=uDnD2JhudGEyHc/H+ZZoiEcnN6tpSM/3wit8Um6ah6g=; fh=PnYt+qEB9tAfMKoqBm2xjKOFpYyFFGPudh5cVIoieJM=; b=UrrEsomLgHo4kBSC9GCiqoyr3u0dfOyIo2HvsqzPe728zP6aYhup4JohfBHjZuCToy Z55xvXfhnWgEMDMOjq7VXvyYK/3ZGPXQK/8tc4VEJoq7hVfg4mWtqCRBtkvUuY02emj2 kOBQHSwhn9jzLVfbqg63W6rxMA71t9d6JlBDVpvWk6hJBacyPGgkTjyGPgOVQPKN3eBH i9pPIygsYIEBcmfR1bgNLtGqKyTzIzwQtKED3Y1aDHBVYKBmO8WKNcr5NPLb+lVOYrsf JERBczHNray4l4MTMhUwd5v6KIK0253wFlJ5M6rHRyUcdYcGa49ar3KC/5ZRrv/kXpFO kAww==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=mG+rKjic; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id gw3-20020a0562140f0300b006a03ff213dbsi16680653qvb.68.2024.04.24.15.58.37 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Wed, 24 Apr 2024 15:58:37 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=mG+rKjic; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rzlYS-0005Y3-6S; Wed, 24 Apr 2024 18:57:32 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rzlYM-0005QJ-LS for qemu-devel@nongnu.org; Wed, 24 Apr 2024 18:57:27 -0400 Received: from mail-oo1-xc2f.google.com ([2607:f8b0:4864:20::c2f]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rzlYL-0001pT-6O for qemu-devel@nongnu.org; Wed, 24 Apr 2024 18:57:26 -0400 Received: by mail-oo1-xc2f.google.com with SMTP id 006d021491bc7-5aa2a74c238so281508eaf.3 for ; Wed, 24 Apr 2024 15:57:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1713999444; x=1714604244; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:from:to:cc:subject:date:message-id :reply-to; bh=uDnD2JhudGEyHc/H+ZZoiEcnN6tpSM/3wit8Um6ah6g=; b=mG+rKjic6fGDOeNcPoWewMQB9wlHzaoVWPFgKzb4qzDWBFMe7HvJxL/PAYJa2FH1Kw a/lqLwbEVtA7fz5EOZnLbkiud5jRMWv6TrOXEuauR/3aXt9g45/o8x4sQiMziOuNI5BV FRmgDPqDQFw6V5ghxvGIDbMfvlXwVcF4eWiy2DNUyzkIXgpVkFvbYoI4mHGxaxe+XhIc G/cm4gI0lKmy8ObgyH3qDOboFoOBR7PYR/TODRjBUkiQq32HWt/NrAGTcwPHCaqXcRF/ 9gSb65BYIWIXumGRFmKWTFcvQum9Uld8XiEdv963uT23soDgFNEhPEaXnF+cZB1JiJL2 9+UQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1713999444; x=1714604244; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=uDnD2JhudGEyHc/H+ZZoiEcnN6tpSM/3wit8Um6ah6g=; b=r2AHREOJYN0EnauQxG38EsUzsGxfk3ewZ5Z1ojb8ZfuBX4mZ+pIvyFqCfXbdKCJXcw p8f2AP39BdpLCXVlYO5WQg1+WIb+WN6DwrgomJ7JnMIUwnsddhD93RTiCPmHe++xKy5v VDExOEiPIyrYODiVyt9HgJruvYBfOvT2njQgwCihJ49CbCdN7Z52m5VqxSnwvA/zESP0 uPDvx6YWYWdXe3KnNL64BT+NGskhqv+Ghyvwr33gevZmZCCmZx9OqHB/BmVUR8LmDFJZ uMYd3r0ubcauq58YNUC3N2rpQvvoVwV13kv7SAulLtbXXYZENXgRS4U/Qz6MnKl1x5Gy /4vQ== X-Gm-Message-State: AOJu0Yz/WE/VuaTXJWBLiaHILkxDgf2J+E7oLxpa43460isNvGBKP1wN QMqq1RqDNr4z/HZeozJlLD9VblVKuDP4u0qEnvlhPdh8msxUrMl78v7yGe9wBwMvgUJnsyNx2j0 u X-Received: by 2002:a05:6870:3101:b0:22e:8ed:9f7a with SMTP id v1-20020a056870310100b0022e08ed9f7amr4331016oaa.4.1713999443927; Wed, 24 Apr 2024 15:57:23 -0700 (PDT) Received: from stoup.. ([156.19.246.23]) by smtp.gmail.com with ESMTPSA id gu26-20020a056a004e5a00b006ed9760b815sm11947413pfb.211.2024.04.24.15.57.23 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Apr 2024 15:57:23 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Subject: [PATCH v6 10/10] tests/bench: Add bufferiszero-bench Date: Wed, 24 Apr 2024 15:57:05 -0700 Message-Id: <20240424225705.929812-11-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240424225705.929812-1-richard.henderson@linaro.org> References: <20240424225705.929812-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::c2f; envelope-from=richard.henderson@linaro.org; helo=mail-oo1-xc2f.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org Benchmark each acceleration function vs an aligned buffer of zeros. Signed-off-by: Richard Henderson Reviewed-by: Philippe Mathieu-Daudé --- tests/bench/bufferiszero-bench.c | 47 ++++++++++++++++++++++++++++++++ tests/bench/meson.build | 1 + 2 files changed, 48 insertions(+) create mode 100644 tests/bench/bufferiszero-bench.c diff --git a/tests/bench/bufferiszero-bench.c b/tests/bench/bufferiszero-bench.c new file mode 100644 index 0000000000..222695c1fa --- /dev/null +++ b/tests/bench/bufferiszero-bench.c @@ -0,0 +1,47 @@ +/* + * QEMU buffer_is_zero speed benchmark + * + * This work is licensed under the terms of the GNU GPL, version 2 or + * (at your option) any later version. See the COPYING file in the + * top-level directory. + */ +#include "qemu/osdep.h" +#include "qemu/cutils.h" +#include "qemu/units.h" + +static void test(const void *opaque) +{ + size_t max = 64 * KiB; + void *buf = g_malloc0(max); + int accel_index = 0; + + do { + if (accel_index != 0) { + g_test_message("%s", ""); /* gnu_printf Werror for simple "" */ + } + for (size_t len = 1 * KiB; len <= max; len *= 4) { + double total = 0.0; + + g_test_timer_start(); + do { + buffer_is_zero_ge256(buf, len); + total += len; + } while (g_test_timer_elapsed() < 0.5); + + total /= MiB; + g_test_message("buffer_is_zero #%d: %2zuKB %8.0f MB/sec", + accel_index, len / (size_t)KiB, + total / g_test_timer_last()); + } + accel_index++; + } while (test_buffer_is_zero_next_accel()); + + g_free(buf); +} + +int main(int argc, char **argv) +{ + g_test_init(&argc, &argv, NULL); + g_test_add_data_func("/cutils/bufferiszero/speed", NULL, test); + return g_test_run(); +} diff --git a/tests/bench/meson.build b/tests/bench/meson.build index 7e76338a52..4cd7a2f6b5 100644 --- a/tests/bench/meson.build +++ b/tests/bench/meson.build @@ -21,6 +21,7 @@ benchs = {} if have_block benchs += { + 'bufferiszero-bench': [], 'benchmark-crypto-hash': [crypto], 'benchmark-crypto-hmac': [crypto], 'benchmark-crypto-cipher': [crypto],