From patchwork Thu Feb 6 07:20:48 2025
X-Patchwork-Submitter: Kanchana P Sridhar
X-Patchwork-Id: 862662
From: Kanchana P Sridhar
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org,
    yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev,
    usamaarif642@gmail.com, ryan.roberts@arm.com, 21cnbao@gmail.com,
    akpm@linux-foundation.org, linux-crypto@vger.kernel.org,
    herbert@gondor.apana.org.au, davem@davemloft.net, clabbe@baylibre.com,
    ardb@kernel.org, ebiggers@google.com, surenb@google.com,
    kristen.c.accardi@intel.com
Cc: wajdi.k.feghali@intel.com, vinodh.gopal@intel.com,
    kanchana.p.sridhar@intel.com
Subject: [PATCH v6 02/16] crypto: acomp - Define new interfaces for
 compress/decompress batching.
Date: Wed, 5 Feb 2025 23:20:48 -0800
Message-Id: <20250206072102.29045-3-kanchana.p.sridhar@intel.com>
In-Reply-To: <20250206072102.29045-1-kanchana.p.sridhar@intel.com>
References: <20250206072102.29045-1-kanchana.p.sridhar@intel.com>

This commit adds get_batch_size(), batch_compress() and batch_decompress()
interfaces to:

  struct acomp_alg
  struct crypto_acomp

A crypto_acomp compression algorithm that supports batching of compressions
and decompressions must provide implementations for these APIs.

A new helper function acomp_has_async_batching() can be invoked to query
whether a crypto_acomp has registered these batching interfaces.

A higher-level module like zswap can call acomp_has_async_batching() to
detect whether the compressor supports batching, and if so, call the new
crypto_acomp_batch_size() to detect the maximum batch size supported by
the compressor. Based on this, zswap can use the minimum of any
zswap-specific upper limit on batch size and the compressor's maximum
batch size to allocate batching resources.

This allows the Intel IAA driver, iaa_crypto, to register implementations
for the get_batch_size(), batch_compress() and batch_decompress()
acomp_alg interfaces. These can subsequently be invoked from the kernel
zswap/zram modules to compress/decompress pages in parallel on the IAA
hardware accelerator, improving swapout/swapin performance through the
newly added corresponding crypto_acomp APIs:

  crypto_acomp_batch_size()
  crypto_acomp_batch_compress()
  crypto_acomp_batch_decompress()
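For example, a higher-level module might size its batching resources as in
this minimal sketch (illustrative only, not part of this patch;
SWAP_CRYPTO_BATCH_SIZE and the function name are hypothetical caller-side
names):

  #include <linux/minmax.h>
  #include <crypto/acompress.h>

  /* Hypothetical caller-side cap on batching resources. */
  #define SWAP_CRYPTO_BATCH_SIZE 8U

  static unsigned int swap_alloc_batch_size(struct crypto_acomp *tfm)
  {
          /*
           * crypto_acomp_batch_size() returns the algorithm's maximum
           * batch size when the batching interfaces are registered, and
           * 1 otherwise, so this is always a safe allocation size for
           * the caller's reqs[]/buffers[] arrays.
           */
          return min(SWAP_CRYPTO_BATCH_SIZE, crypto_acomp_batch_size(tfm));
  }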
Signed-off-by: Kanchana P Sridhar
---
 crypto/acompress.c                  |   3 +
 include/crypto/acompress.h          | 111 ++++++++++++++++++++++++++++
 include/crypto/internal/acompress.h |  19 +++++
 3 files changed, 133 insertions(+)

diff --git a/crypto/acompress.c b/crypto/acompress.c
index cb6444d09dd7..165559a8b9bd 100644
--- a/crypto/acompress.c
+++ b/crypto/acompress.c
@@ -84,6 +84,9 @@ static int crypto_acomp_init_tfm(struct crypto_tfm *tfm)

         acomp->compress = alg->compress;
         acomp->decompress = alg->decompress;
+        acomp->get_batch_size = alg->get_batch_size;
+        acomp->batch_compress = alg->batch_compress;
+        acomp->batch_decompress = alg->batch_decompress;
         acomp->dst_free = alg->dst_free;
         acomp->reqsize = alg->reqsize;

diff --git a/include/crypto/acompress.h b/include/crypto/acompress.h
index eadc24514056..8451ade70fd8 100644
--- a/include/crypto/acompress.h
+++ b/include/crypto/acompress.h
@@ -43,6 +43,10 @@ struct acomp_req {
  *
  * @compress:   Function performs a compress operation
  * @decompress: Function performs a de-compress operation
+ * @get_batch_size: Maximum batch-size for batching compress/decompress
+ *                  operations.
+ * @batch_compress: Function performs a batch compress operation
+ * @batch_decompress: Function performs a batch decompress operation
  * @dst_free:   Frees destination buffer if allocated inside the
  *              algorithm
  * @reqsize:    Context size for (de)compression requests
@@ -51,6 +55,21 @@ struct acomp_req {
 struct crypto_acomp {
         int (*compress)(struct acomp_req *req);
         int (*decompress)(struct acomp_req *req);
+        unsigned int (*get_batch_size)(void);
+        bool (*batch_compress)(struct acomp_req *reqs[],
+                               struct crypto_wait *wait,
+                               struct page *pages[],
+                               u8 *dsts[],
+                               unsigned int dlens[],
+                               int errors[],
+                               int nr_pages);
+        bool (*batch_decompress)(struct acomp_req *reqs[],
+                                 struct crypto_wait *wait,
+                                 u8 *srcs[],
+                                 struct page *pages[],
+                                 unsigned int slens[],
+                                 int errors[],
+                                 int nr_pages);
         void (*dst_free)(struct scatterlist *dst);
         unsigned int reqsize;
         struct crypto_tfm base;
@@ -142,6 +161,13 @@ static inline bool acomp_is_async(struct crypto_acomp *tfm)
                        CRYPTO_ALG_ASYNC;
 }

+static inline bool acomp_has_async_batching(struct crypto_acomp *tfm)
+{
+        return (acomp_is_async(tfm) &&
+                (crypto_comp_alg_common(tfm)->base.cra_flags & CRYPTO_ALG_TYPE_ACOMPRESS) &&
+                tfm->get_batch_size && tfm->batch_compress && tfm->batch_decompress);
+}
+
 static inline struct crypto_acomp *crypto_acomp_reqtfm(struct acomp_req *req)
 {
         return __crypto_acomp_tfm(req->base.tfm);
@@ -306,4 +332,89 @@ static inline int crypto_acomp_decompress(struct acomp_req *req)
         return crypto_acomp_reqtfm(req)->decompress(req);
 }

+/**
+ * crypto_acomp_batch_size() -- Get the algorithm's batch size
+ *
+ * Function returns the algorithm's batch size for batching operations
+ *
+ * @tfm: ACOMPRESS tfm handle allocated with crypto_alloc_acomp()
+ *
+ * Return: crypto_acomp's batch size.
+ */
+static inline unsigned int crypto_acomp_batch_size(struct crypto_acomp *tfm)
+{
+        if (acomp_has_async_batching(tfm))
+                return tfm->get_batch_size();
+
+        return 1;
+}
+
+/**
+ * crypto_acomp_batch_compress() -- Invoke asynchronous compress of
+ *                                  a batch of requests
+ *
+ * Function invokes the asynchronous batch compress operation
+ *
+ * @reqs: @nr_pages asynchronous compress requests.
+ * @wait: crypto_wait for acomp batch compress with synchronous/asynchronous
+ *        request chaining. If NULL, the driver must provide a way to process
+ *        request completions asynchronously.
+ * @pages: Pages to be compressed.
+ * @dsts: Pre-allocated destination buffers to store results of compression.
+ * @dlens: Will contain the compressed lengths.
+ * @errors: zero on successful compression of the corresponding
+ *          req, or error code in case of error.
+ * @nr_pages: The number of pages to be compressed.
+ *
+ * Returns true if all compress requests complete successfully,
+ * false otherwise.
+ */
+static inline bool crypto_acomp_batch_compress(struct acomp_req *reqs[],
+                                               struct crypto_wait *wait,
+                                               struct page *pages[],
+                                               u8 *dsts[],
+                                               unsigned int dlens[],
+                                               int errors[],
+                                               int nr_pages)
+{
+        struct crypto_acomp *tfm = crypto_acomp_reqtfm(reqs[0]);
+
+        return tfm->batch_compress(reqs, wait, pages, dsts,
+                                   dlens, errors, nr_pages);
+}
+
+/**
+ * crypto_acomp_batch_decompress() -- Invoke asynchronous decompress of
+ *                                    a batch of requests
+ *
+ * Function invokes the asynchronous batch decompress operation
+ *
+ * @reqs: @nr_pages asynchronous decompress requests.
+ * @wait: crypto_wait for acomp batch decompress with synchronous/asynchronous
+ *        request chaining. If NULL, the driver must provide a way to process
+ *        request completions asynchronously.
+ * @srcs: The src buffers to be decompressed.
+ * @pages: The pages to store the decompressed buffers.
+ * @slens: Compressed lengths of @srcs.
+ * @errors: zero on successful decompression of the corresponding
+ *          req, or error code in case of error.
+ * @nr_pages: The number of pages to be decompressed.
+ *
+ * Returns true if all decompress requests complete successfully,
+ * false otherwise.
+ */
+static inline bool crypto_acomp_batch_decompress(struct acomp_req *reqs[],
+                                                 struct crypto_wait *wait,
+                                                 u8 *srcs[],
+                                                 struct page *pages[],
+                                                 unsigned int slens[],
+                                                 int errors[],
+                                                 int nr_pages)
+{
+        struct crypto_acomp *tfm = crypto_acomp_reqtfm(reqs[0]);
+
+        return tfm->batch_decompress(reqs, wait, srcs, pages,
+                                     slens, errors, nr_pages);
+}
+
 #endif

diff --git a/include/crypto/internal/acompress.h b/include/crypto/internal/acompress.h
index 53b4ef59b48c..df0e192801ff 100644
--- a/include/crypto/internal/acompress.h
+++ b/include/crypto/internal/acompress.h
@@ -17,6 +17,10 @@
  *
  * @compress:   Function performs a compress operation
  * @decompress: Function performs a de-compress operation
+ * @get_batch_size: Maximum batch-size for batching compress/decompress
+ *                  operations.
+ * @batch_compress: Function performs a batch compress operation
+ * @batch_decompress: Function performs a batch decompress operation
  * @dst_free:   Frees destination buffer if allocated inside the algorithm
  * @init:       Initialize the cryptographic transformation object.
  *              This function is used to initialize the cryptographic
@@ -37,6 +41,21 @@ struct acomp_alg {
         int (*compress)(struct acomp_req *req);
         int (*decompress)(struct acomp_req *req);
+        unsigned int (*get_batch_size)(void);
+        bool (*batch_compress)(struct acomp_req *reqs[],
+                               struct crypto_wait *wait,
+                               struct page *pages[],
+                               u8 *dsts[],
+                               unsigned int dlens[],
+                               int errors[],
+                               int nr_pages);
+        bool (*batch_decompress)(struct acomp_req *reqs[],
+                                 struct crypto_wait *wait,
+                                 u8 *srcs[],
+                                 struct page *pages[],
+                                 unsigned int slens[],
+                                 int errors[],
+                                 int nr_pages);
         void (*dst_free)(struct scatterlist *dst);
         int (*init)(struct crypto_acomp *tfm);
         void (*exit)(struct crypto_acomp *tfm);

From patchwork Thu Feb 6 07:20:50 2025
X-Patchwork-Submitter: Kanchana P Sridhar
X-Patchwork-Id: 862660
From: Kanchana P Sridhar
Subject: [PATCH v6 04/16] crypto: iaa - Implement batch_compress(),
 batch_decompress() API in iaa_crypto.
Date: Wed, 5 Feb 2025 23:20:50 -0800
Message-Id: <20250206072102.29045-5-kanchana.p.sridhar@intel.com>
In-Reply-To: <20250206072102.29045-1-kanchana.p.sridhar@intel.com>
References: <20250206072102.29045-1-kanchana.p.sridhar@intel.com>

This patch provides iaa_crypto driver implementations for the newly added
crypto_acomp batch_compress() and batch_decompress() interfaces, using
acomp request chaining.

iaa_crypto also implements the new crypto_acomp get_batch_size()
interface, which returns an iaa_crypto-specific constant,
IAA_CRYPTO_MAX_BATCH_SIZE (currently set to 8U).
This allows swap modules such as zswap/zram to allocate the required
batching resources and then invoke fully asynchronous, parallel batch
compression/decompression of pages on systems with Intel IAA by calling
these APIs, respectively:

  crypto_acomp_batch_size(...);
  crypto_acomp_batch_compress(...);
  crypto_acomp_batch_decompress(...);

This enables zswap compress-batching code to be developed in a manner
similar to the current single-page synchronous calls to:

  crypto_acomp_compress(...);
  crypto_acomp_decompress(...);

thereby facilitating an encapsulated and modular hand-off between the
kernel zswap/zram code and the crypto_acomp layer.

Since iaa_crypto supports the use of acomp request chaining, this patch
also adds CRYPTO_ALG_REQ_CHAIN to the iaa_acomp_fixed_deflate algorithm's
cra_flags.
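For illustration only (not part of this patch): a sketch of how a swap
module that has already allocated reqs[] and destination buffers might
drive page compression in batch-size chunks. The function name and the
min_t()-based chunking are assumptions, and error handling is elided:

  #include <linux/minmax.h>
  #include <crypto/acompress.h>

  static bool example_compress_pages(struct crypto_acomp *tfm,
                                     struct acomp_req *reqs[],
                                     struct crypto_wait *wait,
                                     struct page *pages[],
                                     u8 *dsts[],
                                     unsigned int dlens[],
                                     int errors[],
                                     int nr_pages)
  {
          unsigned int batch_size = crypto_acomp_batch_size(tfm);
          int i;

          for (i = 0; i < nr_pages; i += batch_size) {
                  int n = min_t(int, batch_size, nr_pages - i);

                  /* The driver compresses up to batch_size pages in parallel. */
                  if (!crypto_acomp_batch_compress(&reqs[i], wait, &pages[i],
                                                   &dsts[i], &dlens[i],
                                                   &errors[i], n))
                          return false;
          }

          return true;
  }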
Suggested-by: Yosry Ahmed
Suggested-by: Herbert Xu
Signed-off-by: Kanchana P Sridhar
---
 drivers/crypto/intel/iaa/iaa_crypto.h      |   9 +
 drivers/crypto/intel/iaa/iaa_crypto_main.c | 395 ++++++++++++++++++++-
 2 files changed, 403 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/intel/iaa/iaa_crypto.h b/drivers/crypto/intel/iaa/iaa_crypto.h
index 56985e395263..b3b67c44ec8a 100644
--- a/drivers/crypto/intel/iaa/iaa_crypto.h
+++ b/drivers/crypto/intel/iaa/iaa_crypto.h
@@ -39,6 +39,15 @@
                          IAA_DECOMP_CHECK_FOR_EOB | \
                          IAA_DECOMP_STOP_ON_EOB)

+/*
+ * The maximum compress/decompress batch size for IAA's implementation of
+ * the crypto_acomp batch_compress() and batch_decompress() interfaces.
+ * The IAA compression algorithms should provide the crypto_acomp
+ * get_batch_size() interface through a function that returns this
+ * constant.
+ */
+#define IAA_CRYPTO_MAX_BATCH_SIZE 8U
+
 /* Representation of IAA workqueue */
 struct iaa_wq {
         struct list_head list;

diff --git a/drivers/crypto/intel/iaa/iaa_crypto_main.c b/drivers/crypto/intel/iaa/iaa_crypto_main.c
index d7983ab3c34a..61134a7ad1da 100644
--- a/drivers/crypto/intel/iaa/iaa_crypto_main.c
+++ b/drivers/crypto/intel/iaa/iaa_crypto_main.c
@@ -1807,6 +1807,396 @@ static void compression_ctx_init(struct iaa_compression_ctx *ctx)
         ctx->use_irq = use_irq;
 }

+static int iaa_comp_poll(struct acomp_req *req)
+{
+        struct idxd_desc *idxd_desc;
+        struct idxd_device *idxd;
+        struct iaa_wq *iaa_wq;
+        struct pci_dev *pdev;
+        struct device *dev;
+        struct idxd_wq *wq;
+        bool compress_op;
+        int ret;
+
+        idxd_desc = req->base.data;
+        if (!idxd_desc)
+                return -EAGAIN;
+
+        compress_op = (idxd_desc->iax_hw->opcode == IAX_OPCODE_COMPRESS);
+        wq = idxd_desc->wq;
+        iaa_wq = idxd_wq_get_private(wq);
+        idxd = iaa_wq->iaa_device->idxd;
+        pdev = idxd->pdev;
+        dev = &pdev->dev;
+
+        ret = check_completion(dev, idxd_desc->iax_completion, true, true);
+        if (ret == -EAGAIN)
+                return ret;
+        if (ret)
+                goto out;
+
+        req->dlen = idxd_desc->iax_completion->output_size;
+
+        /* Update stats */
+        if (compress_op) {
+                update_total_comp_bytes_out(req->dlen);
+                update_wq_comp_bytes(wq, req->dlen);
+        } else {
+                update_total_decomp_bytes_in(req->slen);
+                update_wq_decomp_bytes(wq, req->slen);
+        }
+
+        if (iaa_verify_compress && (idxd_desc->iax_hw->opcode == IAX_OPCODE_COMPRESS)) {
+                struct crypto_tfm *tfm = req->base.tfm;
+                dma_addr_t src_addr, dst_addr;
+                u32 compression_crc;
+
+                compression_crc = idxd_desc->iax_completion->crc;
+
+                dma_sync_sg_for_device(dev, req->dst, 1, DMA_FROM_DEVICE);
+                dma_sync_sg_for_device(dev, req->src, 1, DMA_TO_DEVICE);
+
+                src_addr = sg_dma_address(req->src);
+                dst_addr = sg_dma_address(req->dst);
+
+                ret = iaa_compress_verify(tfm, req, wq, src_addr, req->slen,
+                                          dst_addr, &req->dlen, compression_crc);
+        }
+out:
+        /* caller doesn't call crypto_wait_req, so no acomp_request_complete() */
+
+        dma_unmap_sg(dev, req->dst, sg_nents(req->dst), DMA_FROM_DEVICE);
+        dma_unmap_sg(dev, req->src, sg_nents(req->src), DMA_TO_DEVICE);
+
+        idxd_free_desc(idxd_desc->wq, idxd_desc);
+
+        dev_dbg(dev, "%s: returning ret=%d\n", __func__, ret);
+
+        return ret;
+}
+
+static unsigned int iaa_comp_get_batch_size(void)
+{
+        return IAA_CRYPTO_MAX_BATCH_SIZE;
+}
+
+static void iaa_set_req_poll(
+        struct acomp_req *reqs[],
+        int nr_reqs,
+        bool set_flag)
+{
+        int i;
+
+        for (i = 0; i < nr_reqs; ++i) {
+                set_flag ? (reqs[i]->flags |= CRYPTO_ACOMP_REQ_POLL) :
+                           (reqs[i]->flags &= ~CRYPTO_ACOMP_REQ_POLL);
+        }
+}
+
+/**
+ * This API provides IAA compress batching functionality for use by swap
+ * modules.
+ *
+ * @reqs: @nr_pages asynchronous compress requests.
+ * @wait: crypto_wait for acomp batch compress implemented using request
+ *        chaining. Required if async_mode is "false". If async_mode is
+ *        "true" and @wait is NULL, the completions will be processed using
+ *        asynchronous polling of the requests' completion statuses.
+ * @pages: Pages to be compressed by IAA.
+ * @dsts: Pre-allocated destination buffers to store results of IAA
+ *        compression. Each element of @dsts must be of size "PAGE_SIZE * 2".
+ * @dlens: Will contain the compressed lengths.
+ * @errors: zero on successful compression of the corresponding
+ *          req, or error code in case of error.
+ * @nr_pages: The number of pages, up to IAA_CRYPTO_MAX_BATCH_SIZE,
+ *            to be compressed.
+ *
+ * Returns true if all compress requests complete successfully,
+ * false otherwise.
+ */
+static bool iaa_comp_acompress_batch(
+        struct acomp_req *reqs[],
+        struct crypto_wait *wait,
+        struct page *pages[],
+        u8 *dsts[],
+        unsigned int dlens[],
+        int errors[],
+        int nr_pages)
+{
+        struct scatterlist inputs[IAA_CRYPTO_MAX_BATCH_SIZE];
+        struct scatterlist outputs[IAA_CRYPTO_MAX_BATCH_SIZE];
+        bool compressions_done = false;
+        bool async = (async_mode && !use_irq);
+        bool async_poll = (async && !wait);
+        int i, err = 0;
+
+        BUG_ON(nr_pages > IAA_CRYPTO_MAX_BATCH_SIZE);
+        BUG_ON(!async && !wait);
+
+        if (async)
+                iaa_set_req_poll(reqs, nr_pages, true);
+        else
+                iaa_set_req_poll(reqs, nr_pages, false);
+
+        /*
+         * Prepare and submit acomp_reqs to IAA. IAA will process these
+         * compress jobs in parallel if async_mode is true.
+         */
+        for (i = 0; i < nr_pages; ++i) {
+                sg_init_table(&inputs[i], 1);
+                sg_set_page(&inputs[i], pages[i], PAGE_SIZE, 0);
+
+                /*
+                 * Each dst buffer should be of size (PAGE_SIZE * 2).
+                 * Reflect same in sg_list.
+                 */
+                sg_init_one(&outputs[i], dsts[i], PAGE_SIZE * 2);
+                acomp_request_set_params(reqs[i], &inputs[i],
+                                         &outputs[i], PAGE_SIZE, dlens[i]);
+
+                /*
+                 * As long as the API is called with a valid "wait", chain the
+                 * requests for synchronous/asynchronous compress ops.
+                 * If async_mode is in effect, but the API is called with a
+                 * NULL "wait", submit the requests first, and poll for
+                 * their completion status later, after all descriptors have
+                 * been submitted.
+                 */
+                if (!async_poll) {
+                        /* acomp request chaining. */
+                        if (i)
+                                acomp_request_chain(reqs[i], reqs[0]);
+                        else
+                                acomp_reqchain_init(reqs[0], 0, crypto_req_done,
+                                                    wait);
+                } else {
+                        errors[i] = iaa_comp_acompress(reqs[i]);
+
+                        if (errors[i] != -EINPROGRESS) {
+                                errors[i] = -EINVAL;
+                                err = -EINVAL;
+                        } else {
+                                errors[i] = -EAGAIN;
+                        }
+                }
+        }
+
+        if (!async_poll) {
+                if (async)
+                        /* Process the request chain in parallel. */
+                        err = crypto_wait_req(acomp_do_async_req_chain(reqs[0],
+                                                iaa_comp_acompress, iaa_comp_poll),
+                                              wait);
+                else
+                        /* Process the request chain in series. */
+                        err = crypto_wait_req(acomp_do_req_chain(reqs[0],
+                                                iaa_comp_acompress), wait);
+
+                for (i = 0; i < nr_pages; ++i) {
+                        errors[i] = acomp_request_err(reqs[i]);
+                        if (errors[i]) {
+                                err = -EINVAL;
+                                pr_debug("Request chaining req %d compress error %d\n", i, errors[i]);
+                        } else {
+                                dlens[i] = reqs[i]->dlen;
+                        }
+                }
+
+                goto reset_reqs;
+        }
+
+        /*
+         * Asynchronously poll for and process IAA compress job completions.
+         */
+        while (!compressions_done) {
+                compressions_done = true;
+
+                for (i = 0; i < nr_pages; ++i) {
+                        /*
+                         * Skip, if the compression has already completed
+                         * successfully or with an error.
+                         */
+                        if (errors[i] != -EAGAIN)
+                                continue;
+
+                        errors[i] = iaa_comp_poll(reqs[i]);
+
+                        if (errors[i]) {
+                                if (errors[i] == -EAGAIN)
+                                        compressions_done = false;
+                                else
+                                        err = -EINVAL;
+                        } else {
+                                dlens[i] = reqs[i]->dlen;
+                        }
+                }
+        }
+
+reset_reqs:
+        /*
+         * For the same 'reqs[]' to be usable by
+         * iaa_comp_acompress()/iaa_comp_adecompress(),
+         * clear the CRYPTO_ACOMP_REQ_POLL bit on all acomp_reqs, and the
+         * CRYPTO_TFM_REQ_CHAIN bit on the reqs[0].
+         */
+        iaa_set_req_poll(reqs, nr_pages, false);
+        if (!async_poll)
+                acomp_reqchain_clear(reqs[0], wait);
+
+        return !err;
+}
+
+/**
+ * This API provides IAA decompress batching functionality for use by swap
+ * modules.
+ *
+ * @reqs: @nr_pages asynchronous decompress requests.
+ * @wait: crypto_wait for acomp batch decompress implemented using request
+ *        chaining. Required if async_mode is "false". If async_mode is
+ *        "true" and @wait is NULL, the completions will be processed using
+ *        asynchronous polling of the requests' completion statuses.
+ * @srcs: The src buffers to be decompressed by IAA.
+ * @pages: The pages to store the decompressed buffers.
+ * @slens: Compressed lengths of @srcs.
+ * @errors: zero on successful decompression of the corresponding
+ *          req, or error code in case of error.
+ * @nr_pages: The number of pages, up to IAA_CRYPTO_MAX_BATCH_SIZE,
+ *            to be decompressed.
+ *
+ * Returns true if all decompress requests complete successfully,
+ * false otherwise.
+ */
+static bool iaa_comp_adecompress_batch(
+        struct acomp_req *reqs[],
+        struct crypto_wait *wait,
+        u8 *srcs[],
+        struct page *pages[],
+        unsigned int slens[],
+        int errors[],
+        int nr_pages)
+{
+        struct scatterlist inputs[IAA_CRYPTO_MAX_BATCH_SIZE];
+        struct scatterlist outputs[IAA_CRYPTO_MAX_BATCH_SIZE];
+        unsigned int dlens[IAA_CRYPTO_MAX_BATCH_SIZE];
+        bool decompressions_done = false;
+        bool async = (async_mode && !use_irq);
+        bool async_poll = (async && !wait);
+        int i, err = 0;
+
+        BUG_ON(nr_pages > IAA_CRYPTO_MAX_BATCH_SIZE);
+        BUG_ON(!async && !wait);
+
+        if (async)
+                iaa_set_req_poll(reqs, nr_pages, true);
+        else
+                iaa_set_req_poll(reqs, nr_pages, false);
+
+        /*
+         * Prepare and submit acomp_reqs to IAA. IAA will process these
+         * decompress jobs in parallel if async_mode is true.
+         */
+        for (i = 0; i < nr_pages; ++i) {
+                dlens[i] = PAGE_SIZE;
+                sg_init_one(&inputs[i], srcs[i], slens[i]);
+                sg_init_table(&outputs[i], 1);
+                sg_set_page(&outputs[i], pages[i], PAGE_SIZE, 0);
+                acomp_request_set_params(reqs[i], &inputs[i],
+                                         &outputs[i], slens[i], dlens[i]);
+
+                /*
+                 * As long as the API is called with a valid "wait", chain the
+                 * requests for synchronous/asynchronous decompress ops.
+                 * If async_mode is in effect, but the API is called with a
+                 * NULL "wait", submit the requests first, and poll for
+                 * their completion status later, after all descriptors have
+                 * been submitted.
+                 */
+                if (!async_poll) {
+                        /* acomp request chaining. */
+                        if (i)
+                                acomp_request_chain(reqs[i], reqs[0]);
+                        else
+                                acomp_reqchain_init(reqs[0], 0, crypto_req_done,
+                                                    wait);
+                } else {
+                        errors[i] = iaa_comp_adecompress(reqs[i]);
+
+                        if (errors[i] != -EINPROGRESS) {
+                                errors[i] = -EINVAL;
+                                err = -EINVAL;
+                        } else {
+                                errors[i] = -EAGAIN;
+                        }
+                }
+        }
+
+        if (!async_poll) {
+                if (async)
+                        /* Process the request chain in parallel. */
+                        err = crypto_wait_req(acomp_do_async_req_chain(reqs[0],
+                                                iaa_comp_adecompress, iaa_comp_poll),
+                                              wait);
+                else
+                        /* Process the request chain in series. */
+                        err = crypto_wait_req(acomp_do_req_chain(reqs[0],
+                                                iaa_comp_adecompress), wait);
+
+                for (i = 0; i < nr_pages; ++i) {
+                        errors[i] = acomp_request_err(reqs[i]);
+                        if (errors[i]) {
+                                err = -EINVAL;
+                                pr_debug("Request chaining req %d decompress error %d\n", i, errors[i]);
+                        } else {
+                                dlens[i] = reqs[i]->dlen;
+                                BUG_ON(dlens[i] != PAGE_SIZE);
+                        }
+                }
+
+                goto reset_reqs;
+        }
+
+        /*
+         * Asynchronously poll for and process IAA decompress job completions.
+         */
+        while (!decompressions_done) {
+                decompressions_done = true;
+
+                for (i = 0; i < nr_pages; ++i) {
+                        /*
+                         * Skip, if the decompression has already completed
+                         * successfully or with an error.
+                         */
+                        if (errors[i] != -EAGAIN)
+                                continue;
+
+                        errors[i] = iaa_comp_poll(reqs[i]);
+
+                        if (errors[i]) {
+                                if (errors[i] == -EAGAIN)
+                                        decompressions_done = false;
+                                else
+                                        err = -EINVAL;
+                        } else {
+                                dlens[i] = reqs[i]->dlen;
+                                BUG_ON(dlens[i] != PAGE_SIZE);
+                        }
+                }
+        }
+
+reset_reqs:
+        /*
+         * For the same 'reqs[]' to be usable by
+         * iaa_comp_acompress()/iaa_comp_adecompress(),
+         * clear the CRYPTO_ACOMP_REQ_POLL bit on all acomp_reqs, and the
+         * CRYPTO_TFM_REQ_CHAIN bit on the reqs[0].
+         */
+        iaa_set_req_poll(reqs, nr_pages, false);
+        if (!async_poll)
+                acomp_reqchain_clear(reqs[0], wait);
+
+        return !err;
+}
+
 static int iaa_comp_init_fixed(struct crypto_acomp *acomp_tfm)
 {
         struct crypto_tfm *tfm = crypto_acomp_tfm(acomp_tfm);
@@ -1832,10 +2222,13 @@ static struct acomp_alg iaa_acomp_fixed_deflate = {
         .compress = iaa_comp_acompress,
         .decompress = iaa_comp_adecompress,
         .dst_free = dst_free,
+        .get_batch_size = iaa_comp_get_batch_size,
+        .batch_compress = iaa_comp_acompress_batch,
+        .batch_decompress = iaa_comp_adecompress_batch,
         .base = {
                 .cra_name = "deflate",
                 .cra_driver_name = "deflate-iaa",
-                .cra_flags = CRYPTO_ALG_ASYNC,
+                .cra_flags = CRYPTO_ALG_ASYNC | CRYPTO_ALG_REQ_CHAIN,
                 .cra_ctxsize = sizeof(struct iaa_compression_ctx),
                 .cra_module = THIS_MODULE,
                 .cra_priority = IAA_ALG_PRIORITY,

From patchwork Thu Feb 6 07:20:52 2025
X-Patchwork-Submitter: Kanchana P Sridhar
X-Patchwork-Id: 862659
From: Kanchana P Sridhar
Subject: [PATCH v6 06/16] crypto: iaa - Disable iaa_verify_compress by default.
Date: Wed, 5 Feb 2025 23:20:52 -0800
Message-Id: <20250206072102.29045-7-kanchana.p.sridhar@intel.com>

This patch loads the iaa_crypto driver with "iaa_verify_compress" disabled
by default, to facilitate performance comparisons with software
compressors (which do not run compress verification by default either).
Earlier, iaa_crypto compress verification was enabled by default.

With this patch, users who want to enable compress verification can do so
with these steps:

  1) Disable all the IAA device/wq bindings that happen at boot time.
  2) rmmod iaa_crypto
  3) modprobe iaa_crypto
  4) echo 1 > /sys/bus/dsa/drivers/crypto/verify_compress
  5) Re-run initialization of the IAA devices and wqs.

Signed-off-by: Kanchana P Sridhar
---
 drivers/crypto/intel/iaa/iaa_crypto_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/intel/iaa/iaa_crypto_main.c b/drivers/crypto/intel/iaa/iaa_crypto_main.c
index 6d49f82165fe..f4807a828034 100644
--- a/drivers/crypto/intel/iaa/iaa_crypto_main.c
+++ b/drivers/crypto/intel/iaa/iaa_crypto_main.c
@@ -94,7 +94,7 @@ static bool iaa_crypto_enabled;
 static bool iaa_crypto_registered;

 /* Verify results of IAA compress or not */
-static bool iaa_verify_compress = true;
+static bool iaa_verify_compress = false;

 static ssize_t verify_compress_show(struct device_driver *driver, char *buf)
 {

From patchwork Thu Feb 6 07:20:54 2025
X-Patchwork-Submitter: Kanchana P Sridhar
X-Patchwork-Id: 862658
From: Kanchana P Sridhar
Subject: [PATCH v6 08/16] crypto: iaa - Map IAA devices/wqs to cores based on
 packages instead of NUMA.
Date: Wed, 5 Feb 2025 23:20:54 -0800
Message-Id: <20250206072102.29045-9-kanchana.p.sridhar@intel.com>

This patch modifies the algorithm for mapping available IAA devices and
wqs to cores, as they are being discovered, based on packages instead of
NUMA nodes.
This leads to a more realistic mapping of IAA devices as
compression/decompression resources for a package, rather than for a NUMA
node. It also resolves problems that were observed during internal
validation on Intel platforms with many more NUMA nodes than packages: in
such cases, the earlier NUMA-based allocation caused some IAAs to be
over-subscribed and some to not be utilized at all.

As a result of this change from NUMA to packages, some of the core
functions used by the iaa_crypto driver's "probe" and "remove" APIs have
been re-written. The new infrastructure maintains a static/global mapping
of "local wqs" per IAA device, in the "struct iaa_device" itself. The
earlier implementation would allocate this data per-cpu, even though it
never changes once the IAA devices/wqs have been initialized.

Two main outcomes of this new iaa_crypto driver infrastructure are:

1) It resolves the "task blocked for more than x seconds" errors observed
   during internal validation on Intel systems with the earlier NUMA-node
   based mappings, which were root-caused to the non-optimal IAA-to-core
   mappings described above.

2) It gives a NUM_THREADS-factor reduction in the memory footprint of
   initializing IAA devices/wqs, by eliminating the per-cpu copies of each
   IAA device's wqs. On a 384-core Intel Granite Rapids server with 8 IAA
   devices, this saves 140MiB.
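For illustration only (the topology numbers and the helper below are
assumptions, not part of the driver): on a hypothetical 2-package system
with 192 logical CPUs per package and 8 IAA devices, nr_iaa_per_package is
4 and cpus_per_iaa is 48, so the package-based mapping reduces to:

  /* Hypothetical topology: 2 packages x 192 CPUs, 8 IAA devices total. */
  enum { NR_PACKAGES = 2, NR_CPUS_PER_PACKAGE = 192, NR_IAA = 8 };

  static int example_cpu_to_iaa(int cpu)
  {
          int nr_iaa_per_package = NR_IAA / NR_PACKAGES;                   /* 4 */
          int cpus_per_iaa = (NR_PACKAGES * NR_CPUS_PER_PACKAGE) / NR_IAA; /* 48 */
          /*
           * Stand-in for topology_logical_package_id(cpu), assuming
           * contiguous logical CPU numbering within each package.
           */
          int package_id = cpu / NR_CPUS_PER_PACKAGE;
          int base_iaa = package_id * nr_iaa_per_package;

          return base_iaa + ((cpu % NR_CPUS_PER_PACKAGE) / cpus_per_iaa);
  }

  /*
   * e.g. example_cpu_to_iaa(200) == 4: cpu 200 sits on package 1 and is
   * served by the first of that package's four IAA devices.
   */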
Signed-off-by: Kanchana P Sridhar
---
 drivers/crypto/intel/iaa/iaa_crypto.h      |  17 +-
 drivers/crypto/intel/iaa/iaa_crypto_main.c | 276 ++++++++++++---------
 2 files changed, 171 insertions(+), 122 deletions(-)

diff --git a/drivers/crypto/intel/iaa/iaa_crypto.h b/drivers/crypto/intel/iaa/iaa_crypto.h
index b3b67c44ec8a..74d25e62df12 100644
--- a/drivers/crypto/intel/iaa/iaa_crypto.h
+++ b/drivers/crypto/intel/iaa/iaa_crypto.h
@@ -55,6 +55,7 @@ struct iaa_wq {
         struct idxd_wq *wq;
         int ref;
         bool remove;
+        bool mapped;

         struct iaa_device *iaa_device;

@@ -72,6 +73,13 @@ struct iaa_device_compression_mode {
         dma_addr_t aecs_comp_table_dma_addr;
 };

+struct wq_table_entry {
+        struct idxd_wq **wqs;
+        int max_wqs;
+        int n_wqs;
+        int cur_wq;
+};
+
 /* Representation of IAA device with wqs, populated by probe */
 struct iaa_device {
         struct list_head list;
@@ -82,19 +90,14 @@ struct iaa_device {
         int n_wq;
         struct list_head wqs;

+        struct wq_table_entry *iaa_local_wqs;
+
         atomic64_t comp_calls;
         atomic64_t comp_bytes;
         atomic64_t decomp_calls;
         atomic64_t decomp_bytes;
 };

-struct wq_table_entry {
-        struct idxd_wq **wqs;
-        int max_wqs;
-        int n_wqs;
-        int cur_wq;
-};
-
 #define IAA_AECS_ALIGN 32

 /*

diff --git a/drivers/crypto/intel/iaa/iaa_crypto_main.c b/drivers/crypto/intel/iaa/iaa_crypto_main.c
index 2c5b7ce041d6..418f78454875 100644
--- a/drivers/crypto/intel/iaa/iaa_crypto_main.c
+++ b/drivers/crypto/intel/iaa/iaa_crypto_main.c
@@ -30,8 +30,9 @@
 /* number of iaa instances probed */
 static unsigned int nr_iaa;
 static unsigned int nr_cpus;
-static unsigned int nr_nodes;
-static unsigned int nr_cpus_per_node;
+static unsigned int nr_packages;
+static unsigned int nr_cpus_per_package;
+static unsigned int nr_iaa_per_package;

 /* Number of physical cpus sharing each iaa instance */
 static unsigned int cpus_per_iaa;
@@ -462,17 +463,46 @@ static void remove_device_compression_modes(struct iaa_device *iaa_device)
  * Functions for use in crypto probe and remove interfaces:
  * allocate/init/query/deallocate devices/wqs.
  ***********************************************************/
-static struct iaa_device *iaa_device_alloc(void)
+static struct iaa_device *iaa_device_alloc(struct idxd_device *idxd)
 {
+        struct wq_table_entry *local;
         struct iaa_device *iaa_device;

         iaa_device = kzalloc(sizeof(*iaa_device), GFP_KERNEL);
         if (!iaa_device)
-                return NULL;
+                goto err;
+
+        iaa_device->idxd = idxd;
+
+        /* IAA device's local wqs. */
+        iaa_device->iaa_local_wqs = kzalloc(sizeof(struct wq_table_entry), GFP_KERNEL);
+        if (!iaa_device->iaa_local_wqs)
+                goto err;
+
+        local = iaa_device->iaa_local_wqs;
+
+        local->wqs = kzalloc(iaa_device->idxd->max_wqs * sizeof(struct wq *), GFP_KERNEL);
+        if (!local->wqs)
+                goto err;
+
+        local->max_wqs = iaa_device->idxd->max_wqs;
+        local->n_wqs = 0;

         INIT_LIST_HEAD(&iaa_device->wqs);

         return iaa_device;
+
+err:
+        if (iaa_device) {
+                if (iaa_device->iaa_local_wqs) {
+                        if (iaa_device->iaa_local_wqs->wqs)
+                                kfree(iaa_device->iaa_local_wqs->wqs);
+                        kfree(iaa_device->iaa_local_wqs);
+                }
+                kfree(iaa_device);
+        }
+
+        return NULL;
 }

 static bool iaa_has_wq(struct iaa_device *iaa_device, struct idxd_wq *wq)
@@ -491,12 +521,10 @@ static struct iaa_device *add_iaa_device(struct idxd_device *idxd)
 {
         struct iaa_device *iaa_device;

-        iaa_device = iaa_device_alloc();
+        iaa_device = iaa_device_alloc(idxd);
         if (!iaa_device)
                 return NULL;

-        iaa_device->idxd = idxd;
-
         list_add_tail(&iaa_device->list, &iaa_devices);

         nr_iaa++;
@@ -537,6 +565,7 @@ static int add_iaa_wq(struct iaa_device *iaa_device, struct idxd_wq *wq,
         iaa_wq->wq = wq;
         iaa_wq->iaa_device = iaa_device;
         idxd_wq_set_private(wq, iaa_wq);
+        iaa_wq->mapped = false;

         list_add_tail(&iaa_wq->list, &iaa_device->wqs);

@@ -580,6 +609,13 @@ static void free_iaa_device(struct iaa_device *iaa_device)
                 return;

         remove_device_compression_modes(iaa_device);
+
+        if (iaa_device->iaa_local_wqs) {
+                if (iaa_device->iaa_local_wqs->wqs)
+                        kfree(iaa_device->iaa_local_wqs->wqs);
+                kfree(iaa_device->iaa_local_wqs);
+        }
+
         kfree(iaa_device);
 }

@@ -716,9 +752,14 @@ static int save_iaa_wq(struct idxd_wq *wq)
         if (WARN_ON(nr_iaa == 0))
                 return -EINVAL;

-        cpus_per_iaa = (nr_nodes * nr_cpus_per_node) / nr_iaa;
+        cpus_per_iaa = (nr_packages * nr_cpus_per_package) / nr_iaa;
         if (!cpus_per_iaa)
                 cpus_per_iaa = 1;
+
+        nr_iaa_per_package = nr_iaa / nr_packages;
+        if (!nr_iaa_per_package)
+                nr_iaa_per_package = 1;
+
 out:
         return 0;
 }
@@ -735,53 +776,45 @@ static void remove_iaa_wq(struct idxd_wq *wq)
         }

         if (nr_iaa) {
-                cpus_per_iaa = (nr_nodes * nr_cpus_per_node) / nr_iaa;
+                cpus_per_iaa = (nr_packages * nr_cpus_per_package) / nr_iaa;
                 if (!cpus_per_iaa)
                         cpus_per_iaa = 1;
-        } else
+
+                nr_iaa_per_package = nr_iaa / nr_packages;
+                if (!nr_iaa_per_package)
+                        nr_iaa_per_package = 1;
+        } else {
                 cpus_per_iaa = 1;
+                nr_iaa_per_package = 1;
+        }
 }

 /***************************************************************
  * Mapping IAA devices and wqs to cores with per-cpu wq_tables.
  ***************************************************************/
-static void wq_table_free_entry(int cpu)
-{
-        struct wq_table_entry *entry = per_cpu_ptr(wq_table, cpu);
-
-        kfree(entry->wqs);
-        memset(entry, 0, sizeof(*entry));
-}
-
-static void wq_table_clear_entry(int cpu)
-{
-        struct wq_table_entry *entry = per_cpu_ptr(wq_table, cpu);
-
-        entry->n_wqs = 0;
-        entry->cur_wq = 0;
-        memset(entry->wqs, 0, entry->max_wqs * sizeof(struct idxd_wq *));
-}
-
-static void clear_wq_table(void)
+/*
+ * Given a cpu, find the closest IAA instance. The idea is to try to
+ * choose the most appropriate IAA instance for a caller and spread
+ * available workqueues around to clients.
+ */
+static inline int cpu_to_iaa(int cpu)
 {
-        int cpu;
-
-        for (cpu = 0; cpu < nr_cpus; cpu++)
-                wq_table_clear_entry(cpu);
+        int package_id, base_iaa, iaa = 0;

-        pr_debug("cleared wq table\n");
-}
+        if (!nr_packages || !nr_iaa_per_package)
+                return 0;

-static void free_wq_table(void)
-{
-        int cpu;
+        package_id = topology_logical_package_id(cpu);
+        base_iaa = package_id * nr_iaa_per_package;
+        iaa = base_iaa + ((cpu % nr_cpus_per_package) / cpus_per_iaa);

-        for (cpu = 0; cpu < nr_cpus; cpu++)
-                wq_table_free_entry(cpu);
+        pr_debug("cpu = %d, package_id = %d, base_iaa = %d, iaa = %d",
+                 cpu, package_id, base_iaa, iaa);

-        free_percpu(wq_table);
+        if (iaa >= 0 && iaa < nr_iaa)
+                return iaa;

-        pr_debug("freed wq table\n");
+        return (nr_iaa - 1);
 }

@@ -795,13 +828,11 @@ static int alloc_wq_table(int max_wqs)
         for (cpu = 0; cpu < nr_cpus; cpu++) {
                 entry = per_cpu_ptr(wq_table, cpu);

-                entry->wqs = kcalloc(max_wqs, sizeof(struct wq *), GFP_KERNEL);
-                if (!entry->wqs) {
-                        free_wq_table();
-                        return -ENOMEM;
-                }
+                entry->wqs = NULL;
                 entry->max_wqs = max_wqs;
+                entry->n_wqs = 0;
+                entry->cur_wq = 0;
         }

         pr_debug("initialized wq table\n");

@@ -809,33 +840,27 @@
-static void wq_table_add(int cpu, struct idxd_wq *wq)
+static void wq_table_add(int cpu, struct wq_table_entry *iaa_local_wqs)
 {
         struct wq_table_entry *entry = per_cpu_ptr(wq_table, cpu);

-        if (WARN_ON(entry->n_wqs == entry->max_wqs))
-                return;
-
-        entry->wqs[entry->n_wqs++] = wq;
+        entry->wqs = iaa_local_wqs->wqs;
+        entry->max_wqs = iaa_local_wqs->max_wqs;
+        entry->n_wqs = iaa_local_wqs->n_wqs;
+        entry->cur_wq = 0;

-        pr_debug("%s: added iaa wq %d.%d to idx %d of cpu %d\n", __func__,
+        pr_debug("%s: cpu %d: added %d iaa local wqs up to wq %d.%d\n", __func__,
+                 cpu, entry->n_wqs,
                  entry->wqs[entry->n_wqs - 1]->idxd->id,
-                 entry->wqs[entry->n_wqs - 1]->id, entry->n_wqs - 1, cpu);
+                 entry->wqs[entry->n_wqs - 1]->id);
 }

 static int wq_table_add_wqs(int iaa, int cpu)
 {
         struct iaa_device *iaa_device, *found_device = NULL;
-        int ret = 0, cur_iaa = 0, n_wqs_added = 0;
-        struct idxd_device *idxd;
-        struct iaa_wq *iaa_wq;
-        struct pci_dev *pdev;
-        struct device *dev;
+        int ret = 0, cur_iaa = 0;

         list_for_each_entry(iaa_device, &iaa_devices, list) {
-                idxd = iaa_device->idxd;
-                pdev = idxd->pdev;
-                dev = &pdev->dev;

                 if (cur_iaa != iaa) {
                         cur_iaa++;
@@ -843,7 +868,8 @@ static int wq_table_add_wqs(int iaa, int cpu)
                 }

                 found_device = iaa_device;
-                dev_dbg(dev, "getting wq from iaa_device %d, cur_iaa %d\n",
+                dev_dbg(&found_device->idxd->pdev->dev,
+                        "getting wq from iaa_device %d, cur_iaa %d\n",
                         found_device->idxd->id, cur_iaa);
                 break;
         }
@@ -858,29 +884,58 @@ static int wq_table_add_wqs(int iaa, int cpu)
                 }

                 cur_iaa = 0;
-                idxd = found_device->idxd;
-                pdev = idxd->pdev;
-                dev = &pdev->dev;
-                dev_dbg(dev, "getting wq from only iaa_device %d, cur_iaa %d\n",
+                dev_dbg(&found_device->idxd->pdev->dev,
+                        "getting wq from only iaa_device %d, cur_iaa %d\n",
                         found_device->idxd->id, cur_iaa);
         }

-        list_for_each_entry(iaa_wq, &found_device->wqs, list) {
-                wq_table_add(cpu, iaa_wq->wq);
-                pr_debug("rebalance: added wq for cpu=%d: iaa wq %d.%d\n",
-                         cpu, iaa_wq->wq->idxd->id, iaa_wq->wq->id);
-                n_wqs_added++;
+        wq_table_add(cpu, found_device->iaa_local_wqs);
+
+out:
+        return ret;
+}
+
+static int map_iaa_device_wqs(struct iaa_device *iaa_device)
+{
+        struct wq_table_entry *local;
+        int ret = 0, n_wqs_added = 0;
+        struct iaa_wq *iaa_wq;
+
+        local = iaa_device->iaa_local_wqs;
+
+        list_for_each_entry(iaa_wq, &iaa_device->wqs, list) {
+                if (iaa_wq->mapped && ++n_wqs_added)
+                        continue;
+
+                pr_debug("iaa_device %px: processing wq %d.%d\n", iaa_device, iaa_device->idxd->id, iaa_wq->wq->id);
+
+                if (WARN_ON(local->n_wqs == local->max_wqs))
+                        break;
+
+                local->wqs[local->n_wqs++] = iaa_wq->wq;
+                pr_debug("iaa_device %px: added local wq %d.%d\n", iaa_device, iaa_device->idxd->id, iaa_wq->wq->id);
+
+                iaa_wq->mapped = true;
+                ++n_wqs_added;
         }

-        if (!n_wqs_added) {
-                pr_debug("couldn't find any iaa wqs!\n");
+        if (!n_wqs_added && !iaa_device->n_wq) {
+                pr_debug("iaa_device %d: couldn't find any iaa wqs!\n", iaa_device->idxd->id);
                 ret = -EINVAL;
-                goto out;
         }
-out:
+
         return ret;
 }

+static void map_iaa_devices(void)
+{
+        struct iaa_device *iaa_device;
+
+        list_for_each_entry(iaa_device, &iaa_devices, list) {
+                BUG_ON(map_iaa_device_wqs(iaa_device));
+        }
+}
+
 /*
  * Rebalance the wq table so that given a cpu, it's easy to find the
  * closest IAA instance. The idea is to try to choose the most
@@ -889,48 +944,42 @@ static int wq_table_add_wqs(int iaa, int cpu)
  */
 static void rebalance_wq_table(void)
 {
-        const struct cpumask *node_cpus;
-        int node, cpu, iaa = -1;
+        int cpu, iaa;

         if (nr_iaa == 0)
                 return;

-        pr_debug("rebalance: nr_nodes=%d, nr_cpus %d, nr_iaa %d, cpus_per_iaa %d\n",
-                 nr_nodes, nr_cpus, nr_iaa, cpus_per_iaa);
+        map_iaa_devices();

-        clear_wq_table();
+        pr_debug("rebalance: nr_packages=%d, nr_cpus %d, nr_iaa %d, cpus_per_iaa %d\n",
+                 nr_packages, nr_cpus, nr_iaa, cpus_per_iaa);

-        if (nr_iaa == 1) {
-                for (cpu = 0; cpu < nr_cpus; cpu++) {
-                        if (WARN_ON(wq_table_add_wqs(0, cpu))) {
-                                pr_debug("could not add any wqs for iaa 0 to cpu %d!\n", cpu);
-                                return;
-                        }
+        for (cpu = 0; cpu < nr_cpus; cpu++) {
+                iaa = cpu_to_iaa(cpu);
+                pr_debug("rebalance: cpu=%d iaa=%d\n", cpu, iaa);
+
+                if (WARN_ON(iaa == -1)) {
+                        pr_debug("rebalance (cpu_to_iaa(%d)) failed!\n", cpu);
+                        return;
                 }

-                return;
+                if (WARN_ON(wq_table_add_wqs(iaa, cpu))) {
+                        pr_debug("could not add any wqs for iaa %d to cpu %d!\n", iaa, cpu);
+                        return;
+                }
         }

-        for_each_node_with_cpus(node) {
-                node_cpus = cpumask_of_node(node);
-
-                for (cpu = 0; cpu < cpumask_weight(node_cpus); cpu++) {
-                        int node_cpu = cpumask_nth(cpu, node_cpus);
-
-                        if (WARN_ON(node_cpu >= nr_cpu_ids)) {
-                                pr_debug("node_cpu %d doesn't exist!\n", node_cpu);
-                                return;
-                        }
-
-                        if ((cpu % cpus_per_iaa) == 0)
-                                iaa++;
+        pr_debug("Finished rebalance local wqs.");
+}

-                        if (WARN_ON(wq_table_add_wqs(iaa, node_cpu))) {
-                                pr_debug("could not add any wqs for iaa %d to cpu %d!\n", iaa, cpu);
-                                return;
-                        }
-                }
+static void free_wq_tables(void)
+{
+        if (wq_table) {
+                free_percpu(wq_table);
+                wq_table = NULL;
         }
+
+        pr_debug("freed local wq table\n");
 }

 /***************************************************************
@@ -2347,7 +2396,7 @@ static int iaa_crypto_probe(struct idxd_dev *idxd_dev)
                 free_iaa_wq(idxd_wq_get_private(wq));
 err_save:
         if (first_wq)
-                free_wq_table();
+                free_wq_tables();
 err_alloc:
         mutex_unlock(&iaa_devices_lock);
         idxd_drv_disable_wq(wq);
@@ -2397,7 +2446,9 @@ static void iaa_crypto_remove(struct idxd_dev *idxd_dev)

         if (nr_iaa == 0) {
                 iaa_crypto_enabled = false;
-                free_wq_table();
+                free_wq_tables();
+                BUG_ON(!list_empty(&iaa_devices));
+                INIT_LIST_HEAD(&iaa_devices);
                 module_put(THIS_MODULE);

                 pr_info("iaa_crypto now DISABLED\n");
@@ -2423,16 +2474,11 @@ static struct idxd_device_driver iaa_crypto_driver = {
 static int __init iaa_crypto_init_module(void)
 {
         int ret = 0;
-        int node;

+        INIT_LIST_HEAD(&iaa_devices);
         nr_cpus = num_possible_cpus();
-        for_each_node_with_cpus(node)
-                nr_nodes++;
-        if (!nr_nodes) {
-                pr_err("IAA couldn't find any nodes with cpus\n");
-                return -ENODEV;
-        }
-        nr_cpus_per_node = nr_cpus / nr_nodes;
+        nr_cpus_per_package = topology_num_cores_per_package();
+        nr_packages = topology_max_packages();

         if (crypto_has_comp("deflate-generic", 0, 0))
                 deflate_generic_tfm = crypto_alloc_comp("deflate-generic", 0, 0);

From patchwork Thu Feb 6 07:20:57 2025
X-Patchwork-Submitter: Kanchana P Sridhar
X-Patchwork-Id: 862657

From patchwork Thu Feb 6 07:20:57 2025
X-Patchwork-Submitter: Kanchana P Sridhar
X-Patchwork-Id: 862657
From: Kanchana P Sridhar
Subject: [PATCH v6 11/16] crypto: iaa - Fix for "deflate_generic_tfm" global being accessed without locks.
Date: Wed, 5 Feb 2025 23:20:57 -0800
Message-Id: <20250206072102.29045-12-kanchana.p.sridhar@intel.com>
In-Reply-To: <20250206072102.29045-1-kanchana.p.sridhar@intel.com>
References: <20250206072102.29045-1-kanchana.p.sridhar@intel.com>

The mainline implementation of deflate_generic_decompress() has a bug in
its use of this global variable:

    static struct crypto_comp *deflate_generic_tfm;

The "deflate_generic_tfm" is allocated at module init time and freed
during module cleanup. Any call that takes the software decompress
fallback path, for instance when descriptor allocation or job submission
fails, triggers this bug in deflate_generic_decompress(): the procedure
accesses "deflate_generic_tfm" without any protection.

While stress-testing workloads under high memory pressure, with one IAA
device and "deflate-iaa" as the compressor, descriptor allocation times
out and the software fallback route is taken. With multiple processes
calling:

    ret = crypto_comp_decompress(deflate_generic_tfm,
                                 src, req->slen, dst, &req->dlen);

we end up with data corruption that results in req->dlen being larger
than PAGE_SIZE; zswap_decompress() subsequently raises a kernel bug.
Under high contention and memory pressure, this bug is highly likely to
manifest.

This is resolved by adding a mutex, which is locked before accessing
"deflate_generic_tfm" and unlocked after the crypto_comp call is done.
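
For reference, the serialization pattern the fix applies is sketched below with simplified names; only deflate_generic_tfm and the new mutex correspond to real identifiers, and sw_fallback_decompress() is a hypothetical wrapper (in the driver, the lock/unlock lands directly in deflate_generic_decompress(), as the diff below shows):

    /* Sketch: a single module-lifetime software-fallback tfm must be
     * serialized, since the crypto_comp state behind it is not safe
     * for concurrent use.
     */
    static struct crypto_comp *deflate_generic_tfm;
    static DEFINE_MUTEX(deflate_generic_tfm_lock);

    static int sw_fallback_decompress(const u8 *src, unsigned int slen,
    				      u8 *dst, unsigned int *dlen)
    {
    	int ret;

    	mutex_lock(&deflate_generic_tfm_lock);
    	ret = crypto_comp_decompress(deflate_generic_tfm, src, slen, dst, dlen);
    	mutex_unlock(&deflate_generic_tfm_lock);

    	return ret;
    }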
Signed-off-by: Kanchana P Sridhar
---
 drivers/crypto/intel/iaa/iaa_crypto_main.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/crypto/intel/iaa/iaa_crypto_main.c b/drivers/crypto/intel/iaa/iaa_crypto_main.c
index 5292d8f7ebd6..6796c783dd16 100644
--- a/drivers/crypto/intel/iaa/iaa_crypto_main.c
+++ b/drivers/crypto/intel/iaa/iaa_crypto_main.c
@@ -105,6 +105,7 @@ static struct iaa_compression_mode *iaa_compression_modes[IAA_COMP_MODES_MAX];
 
 LIST_HEAD(iaa_devices);
 DEFINE_MUTEX(iaa_devices_lock);
+DEFINE_MUTEX(deflate_generic_tfm_lock);
 
 /* If enabled, IAA hw crypto algos are registered, unavailable otherwise */
 static bool iaa_crypto_enabled;
@@ -1407,6 +1408,9 @@ static int deflate_generic_decompress(struct acomp_req *req)
 	int ret;
 
 	req->dlen = PAGE_SIZE;
+
+	mutex_lock(&deflate_generic_tfm_lock);
+
 	src = kmap_local_page(sg_page(req->src)) + req->src->offset;
 	dst = kmap_local_page(sg_page(req->dst)) + req->dst->offset;
 
@@ -1416,6 +1420,8 @@ static int deflate_generic_decompress(struct acomp_req *req)
 	kunmap_local(src);
 	kunmap_local(dst);
 
+	mutex_unlock(&deflate_generic_tfm_lock);
+
 	update_total_sw_decomp_calls();
 
 	return ret;

From patchwork Thu Feb 6 07:21:00 2025
X-Patchwork-Submitter: Kanchana P Sridhar
X-Patchwork-Id: 862655
From: Kanchana P Sridhar
Subject: [PATCH v6 14/16] mm: zswap: Introduce zswap_compress_folio() to compress all pages in a folio.
Date: Wed, 5 Feb 2025 23:21:00 -0800
Message-Id: <20250206072102.29045-15-kanchana.p.sridhar@intel.com>
In-Reply-To: <20250206072102.29045-1-kanchana.p.sridhar@intel.com>
References: <20250206072102.29045-1-kanchana.p.sridhar@intel.com>

This patch factors the work of compressing each page of a folio, and
storing the compressed memory in the zpool, into a distinct
zswap_compress_folio() procedure. For now, zswap_compress_folio() simply
calls zswap_compress() for each page in the folio it is called with.

This facilitates adding compress batching in subsequent patches.
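
To illustrate why the refactor helps, here is a sketch (not part of this patch) of how a batching backend could later be dispatched inside the new helper without disturbing zswap_store_folio(); compressor_supports_batching() and zswap_batch_compress_all() are hypothetical names, not kernel APIs:

    static bool zswap_compress_folio_sketch(struct folio *folio,
    					    struct zswap_entry *entries[],
    					    struct zswap_pool *pool)
    {
    	long index, nr_pages = folio_nr_pages(folio);

    	/* Batch-capable compressors compress many pages per call. */
    	if (compressor_supports_batching(pool))		/* hypothetical */
    		return zswap_batch_compress_all(folio, entries, pool);

    	/* Everyone else keeps the page-at-a-time loop from this patch. */
    	for (index = 0; index < nr_pages; ++index)
    		if (!zswap_compress(folio_page(folio, index), entries[index], pool))
    			return false;

    	return true;
    }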
Signed-off-by: Kanchana P Sridhar
---
 mm/zswap.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index af682bf0f690..6563d12e907b 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1509,6 +1509,22 @@ static void shrink_worker(struct work_struct *w)
 * main API
 **********************************/
 
+static bool zswap_compress_folio(struct folio *folio,
+				 struct zswap_entry *entries[],
+				 struct zswap_pool *pool)
+{
+	long index, nr_pages = folio_nr_pages(folio);
+
+	for (index = 0; index < nr_pages; ++index) {
+		struct page *page = folio_page(folio, index);
+
+		if (!zswap_compress(page, entries[index], pool))
+			return false;
+	}
+
+	return true;
+}
+
 /*
  * Store all pages in a folio.
  *
@@ -1542,12 +1558,8 @@ static bool zswap_store_folio(struct folio *folio,
 		entries[index]->handle = (unsigned long)ERR_PTR(-EINVAL);
 	}
 
-	for (index = 0; index < nr_pages; ++index) {
-		struct page *page = folio_page(folio, index);
-
-		if (!zswap_compress(page, entries[index], pool))
-			goto store_folio_failed;
-	}
+	if (!zswap_compress_folio(folio, entries, pool))
+		goto store_folio_failed;
 
 	for (index = 0; index < nr_pages; ++index) {
 		swp_entry_t page_swpentry = page_swap_entry(folio_page(folio, index));

From patchwork Thu Feb 6 07:21:02 2025
X-Patchwork-Submitter: Kanchana P Sridhar
X-Patchwork-Id: 862656
From: Kanchana P Sridhar
Subject: [PATCH v6 16/16] mm: zswap: Fix for zstd performance regression with 2M folios.
Date: Wed, 5 Feb 2025 23:21:02 -0800
Message-Id: <20250206072102.29045-17-kanchana.p.sridhar@intel.com>
In-Reply-To: <20250206072102.29045-1-kanchana.p.sridhar@intel.com>
References: <20250206072102.29045-1-kanchana.p.sridhar@intel.com>

With the previous patch, which enables support for batch compressions in
zswap_compress_folio(), a 6.2% throughput regression was seen with zstd
and 2M folios, using vm-scalability/usemem.

For compressors that don't support batching, this was root-caused to the
following zswap_store_folio() structure:

 Batched stores:
 ---------------
 - Allocate all entries,
 - Compress all entries,
 - Store all entries in xarray/LRU.

Hence, the above structure is maintained only for batched stores, while
the following structure is implemented for sequential stores of large
folio pages. This fixes the zstd regression, and preserves common code
paths for batched and sequential stores of a folio:

 Sequential stores:
 ------------------
 For each page in folio:
 - allocate an entry,
 - compress the page,
 - store the entry in xarray/LRU.

This is submitted as a separate patch only for code review purposes. I
will squash this with the previous commit in subsequent versions of this
patch-series.
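
For clarity, a simplified sketch of the control flow the diff below implements; all zswap details are elided and the comments name placeholder steps:

    /* One loop body serves both cases by picking the window size up front:
     *   - batching (batch_size > 1): one pass over the whole folio,
     *   - sequential: a one-page window that advances each iteration.
     */
    static bool store_folio_sketch(long nr_folio_pages, unsigned int batch_size)
    {
    	long from_index = 0;
    	long nr_pages = (batch_size > 1) ? nr_folio_pages : 1;

    	while (1) {
    		/* allocate entries for pages [from_index, nr_pages)    */
    		/* compress pages [from_index, nr_pages)                */
    		/* publish entries [from_index, nr_pages) in xarray/LRU */

    		from_index = nr_pages++;
    		if (nr_pages > nr_folio_pages)
    			break;
    	}

    	return true;
    }

With batch_size > 1 the loop runs exactly once over the whole folio; with batch_size == 1 it processes pages [0,1), [1,2), and so on, restoring the per-page alloc/compress/store locality that zstd had before batching was introduced.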
Signed-off-by: Kanchana P Sridhar
---
 mm/zswap.c | 193 ++++++++++++++++++++++++++++++-----------------------
 1 file changed, 111 insertions(+), 82 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index f1cba77eda62..7bfc720a6201 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1592,40 +1592,32 @@ static bool zswap_batch_compress(struct folio *folio,
 	return ret;
 }
 
-static bool zswap_compress_folio(struct folio *folio,
-				 struct zswap_entry *entries[],
-				 struct zswap_pool *pool)
+static __always_inline bool zswap_compress_folio(struct folio *folio,
+						 struct zswap_entry *entries[],
+						 long from_index,
+						 struct zswap_pool *pool,
+						 unsigned int batch_size,
+						 struct crypto_acomp_ctx *acomp_ctx)
 {
-	long index, nr_pages = folio_nr_pages(folio);
-	struct crypto_acomp_ctx *acomp_ctx;
-	unsigned int batch_size;
+	long index = from_index, nr_pages = folio_nr_pages(folio);
 	bool ret = true;
 
-	acomp_ctx = acomp_ctx_get_cpu_lock(pool);
-	batch_size = acomp_ctx->nr_reqs;
-
 	if ((batch_size > 1) && (nr_pages > 1)) {
-		for (index = 0; index < nr_pages; index += batch_size) {
+		for (; index < nr_pages; index += batch_size) {
 			if (!zswap_batch_compress(folio, index, batch_size,
 						  &entries[index], pool, acomp_ctx)) {
 				ret = false;
-				goto unlock_acomp_ctx;
+				break;
 			}
 		}
 	} else {
-		for (index = 0; index < nr_pages; ++index) {
-			struct page *page = folio_page(folio, index);
+		struct page *page = folio_page(folio, index);
 
-			if (!zswap_compress(page, entries[index], pool, acomp_ctx)) {
-				ret = false;
-				goto unlock_acomp_ctx;
-			}
-		}
+		if (!zswap_compress(page, entries[index], pool, acomp_ctx))
+			ret = false;
 	}
 
-unlock_acomp_ctx:
-	acomp_ctx_put_unlock(acomp_ctx);
 	return ret;
 }
 
@@ -1637,92 +1629,128 @@ static bool zswap_compress_folio(struct folio *folio,
  * handles to ERR_PTR(-EINVAL) at allocation time, and the fact that the
  * entry's handle is subsequently modified only upon a successful zpool_malloc()
  * after the page is compressed.
+ *
+ * For compressors that don't support batching, the following structure
+ * showed a performance regression with zstd/2M folios:
+ *
+ * Batched stores:
+ * ---------------
+ * - Allocate all entries,
+ * - Compress all entries,
+ * - Store all entries in xarray/LRU.
+ *
+ * Hence, the above structure is maintained only for batched stores, and the
+ * following structure is implemented for sequential stores of large folio
+ * pages, that fixes the regression, while preserving common code paths for
+ * batched and sequential stores of a folio:
+ *
+ * Sequential stores:
+ * ------------------
+ * For each page in folio:
+ * - allocate an entry,
+ * - compress the page,
+ * - store the entry in xarray/LRU.
  */
 static bool zswap_store_folio(struct folio *folio,
 			      struct obj_cgroup *objcg,
 			      struct zswap_pool *pool)
 {
-	long index, from_index = 0, nr_pages = folio_nr_pages(folio);
+	long index = 0, from_index = 0, nr_pages, nr_folio_pages = folio_nr_pages(folio);
 	struct zswap_entry **entries = NULL;
+	struct crypto_acomp_ctx *acomp_ctx;
 	int node_id = folio_nid(folio);
+	unsigned int batch_size;
 
-	entries = kmalloc(nr_pages * sizeof(*entries), GFP_KERNEL);
+	entries = kmalloc(nr_folio_pages * sizeof(*entries), GFP_KERNEL);
 	if (!entries)
 		return false;
 
-	for (index = 0; index < nr_pages; ++index) {
-		entries[index] = zswap_entry_cache_alloc(GFP_KERNEL, node_id);
+	acomp_ctx = acomp_ctx_get_cpu_lock(pool);
+	batch_size = acomp_ctx->nr_reqs;
 
-		if (!entries[index]) {
-			zswap_reject_kmemcache_fail++;
-			nr_pages = index;
-			goto store_folio_failed;
+	nr_pages = (batch_size > 1) ? nr_folio_pages : 1;
+
+	while (1) {
+		for (index = from_index; index < nr_pages; ++index) {
+			entries[index] = zswap_entry_cache_alloc(GFP_KERNEL, node_id);
+
+			if (!entries[index]) {
+				zswap_reject_kmemcache_fail++;
+				nr_pages = index;
+				goto store_folio_failed;
+			}
+
+			entries[index]->handle = (unsigned long)ERR_PTR(-EINVAL);
 		}
 
-		entries[index]->handle = (unsigned long)ERR_PTR(-EINVAL);
-	}
+		if (!zswap_compress_folio(folio, entries, from_index, pool, batch_size, acomp_ctx))
+			goto store_folio_failed;
 
-	if (!zswap_compress_folio(folio, entries, pool))
-		goto store_folio_failed;
+		for (index = from_index; index < nr_pages; ++index) {
+			swp_entry_t page_swpentry = page_swap_entry(folio_page(folio, index));
+			struct zswap_entry *old, *entry = entries[index];
 
-	for (index = 0; index < nr_pages; ++index) {
-		swp_entry_t page_swpentry = page_swap_entry(folio_page(folio, index));
-		struct zswap_entry *old, *entry = entries[index];
+			old = xa_store(swap_zswap_tree(page_swpentry),
+				       swp_offset(page_swpentry),
+				       entry, GFP_KERNEL);
+			if (xa_is_err(old)) {
+				int err = xa_err(old);
 
-		old = xa_store(swap_zswap_tree(page_swpentry),
-			       swp_offset(page_swpentry),
-			       entry, GFP_KERNEL);
-		if (xa_is_err(old)) {
-			int err = xa_err(old);
+				WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err);
+				zswap_reject_alloc_fail++;
+				from_index = index;
+				goto store_folio_failed;
+			}
 
-			WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err);
-			zswap_reject_alloc_fail++;
-			from_index = index;
-			goto store_folio_failed;
-		}
+			/*
+			 * We may have had an existing entry that became stale when
+			 * the folio was redirtied and now the new version is being
+			 * swapped out. Get rid of the old.
+			 */
+			if (old)
+				zswap_entry_free(old);
 
-		/*
-		 * We may have had an existing entry that became stale when
-		 * the folio was redirtied and now the new version is being
-		 * swapped out. Get rid of the old.
-		 */
-		if (old)
-			zswap_entry_free(old);
+			/*
+			 * The entry is successfully compressed and stored in the tree, there is
+			 * no further possibility of failure. Grab refs to the pool and objcg,
+			 * charge zswap memory, and increment zswap_stored_pages.
+			 * The opposite actions will be performed by zswap_entry_free()
+			 * when the entry is removed from the tree.
+			 */
+			zswap_pool_get(pool);
+			if (objcg) {
+				obj_cgroup_get(objcg);
+				obj_cgroup_charge_zswap(objcg, entry->length);
+			}
+			atomic_long_inc(&zswap_stored_pages);
 
-		/*
-		 * The entry is successfully compressed and stored in the tree, there is
-		 * no further possibility of failure. Grab refs to the pool and objcg,
-		 * charge zswap memory, and increment zswap_stored_pages.
-		 * The opposite actions will be performed by zswap_entry_free()
-		 * when the entry is removed from the tree.
-		 */
-		zswap_pool_get(pool);
-		if (objcg) {
-			obj_cgroup_get(objcg);
-			obj_cgroup_charge_zswap(objcg, entry->length);
+			/*
+			 * We finish initializing the entry while it's already in xarray.
+			 * This is safe because:
+			 *
+			 * 1. Concurrent stores and invalidations are excluded by folio lock.
+			 *
+			 * 2. Writeback is excluded by the entry not being on the LRU yet.
+			 *    The publishing order matters to prevent writeback from seeing
+			 *    an incoherent entry.
+			 */
+			entry->pool = pool;
+			entry->swpentry = page_swpentry;
+			entry->objcg = objcg;
+			entry->referenced = true;
+			if (entry->length) {
+				INIT_LIST_HEAD(&entry->lru);
+				zswap_lru_add(&zswap_list_lru, entry);
+			}
 		}
-		atomic_long_inc(&zswap_stored_pages);
 
-		/*
-		 * We finish initializing the entry while it's already in xarray.
-		 * This is safe because:
-		 *
-		 * 1. Concurrent stores and invalidations are excluded by folio lock.
-		 *
-		 * 2. Writeback is excluded by the entry not being on the LRU yet.
-		 *    The publishing order matters to prevent writeback from seeing
-		 *    an incoherent entry.
-		 */
-		entry->pool = pool;
-		entry->swpentry = page_swpentry;
-		entry->objcg = objcg;
-		entry->referenced = true;
-		if (entry->length) {
-			INIT_LIST_HEAD(&entry->lru);
-			zswap_lru_add(&zswap_list_lru, entry);
-		}
+		from_index = nr_pages++;
+
+		if (nr_pages > nr_folio_pages)
+			break;
 	}
 
+	acomp_ctx_put_unlock(acomp_ctx);
 	kfree(entries);
 
 	return true;
@@ -1734,6 +1762,7 @@ static bool zswap_store_folio(struct folio *folio,
 			zswap_entry_cache_free(entries[index]);
 	}
 
+	acomp_ctx_put_unlock(acomp_ctx);
 	kfree(entries);
 
 	return false;
 }