[v5,09/12] crypto: iaa - Distribute compress jobs from all cores to all IAAs on a package.

This change enables processes running on any logical core on a package to
use all the IAA devices enabled on that package for compress jobs. In
other words, compress jobs originating from any process on a package are
distributed in a round-robin manner to the available IAA devices on the
same package.

The main premise behind this change is to ensure that the compress engines
on every IAA device are neither idle, under-utilized nor over-utilized.
The compress engines on all IAA devices are treated as a global resource
for the package, with the goal of maximizing compression throughput.

This allows (batched) compressions originating from zswap/zram on any core
of a package to use all IAA devices present in that package.

A new per-cpu "global_wq_table" in the iaa_crypto driver implements this.
The global WQ of each IAA can be thought of as a WQ to which all cores on
that package can submit compress jobs.

To use this feature, the user must configure 2 WQs per IAA device.

Each IAA will have 2 WQs:
 wq.0 (local WQ):
   Used for decompress jobs from the cores that the cpu_to_iaa() algorithm,
   which evenly balances logical cores across IAA devices, maps to this IAA.

 wq.1 (global WQ):
   Used for compress jobs from *all* logical cores on that package.
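The following is a minimal userspace sketch of the per-job routing described above. It assumes 2 WQs per IAA. The names route_job(), struct wq_ref and enum iaa_job_type are hypothetical and do not come from the driver. The sketch only shows that the job direction selects both the target device and the target WQ.

```c
#include <assert.h>

/* Hypothetical job types; the driver's real request handling differs. */
enum iaa_job_type { IAA_JOB_COMPRESS, IAA_JOB_DECOMPRESS };

/* Hypothetical (device, wq) pair identifying where a descriptor goes. */
struct wq_ref {
	int iaa;	/* IAA device index within the package */
	int wq;		/* WQ index on that device: 0 = local, 1 = global */
};

#define LOCAL_WQ_IDX	0	/* wq.0 */
#define GLOBAL_WQ_IDX	1	/* wq.1 */

/*
 * Route one job. local_iaa is the IAA that cpu_to_iaa() assigns to the
 * submitting CPU; next_global_iaa is the IAA owning the next global WQ
 * in that CPU's global_wq_table.
 */
static struct wq_ref route_job(enum iaa_job_type type, int local_iaa,
			       int next_global_iaa)
{
	struct wq_ref ref;

	assert(local_iaa >= 0 && next_global_iaa >= 0);
	if (type == IAA_JOB_COMPRESS) {
		/* Compress: any IAA on the package, via its global WQ. */
		ref.iaa = next_global_iaa;
		ref.wq = GLOBAL_WQ_IDX;
	} else {
		/* Decompress: always the CPU's local IAA, via its local WQ. */
		ref.iaa = local_iaa;
		ref.wq = LOCAL_WQ_IDX;
	}
	return ref;
}
```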

The iaa_crypto driver will place the global WQs of all IAA devices on a
package in the global_wq_table of every cpu on that package. When the
driver receives a compress job, it looks up the "next" global WQ in the
cpu's global_wq_table and submits the descriptor to it.
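The table and its "next" lookup can be modelled as a per-cpu array with a wrapping cursor. This is an illustrative userspace model, not the driver's code. The names struct gwq_table, gwq_table_add() and gwq_table_next() are hypothetical, and each WQ is reduced to an (IAA index, WQ index) pair.

```c
#include <assert.h>

#define MAX_GLOBAL_WQS 16

/* Hypothetical stand-in for a WQ: owning IAA and WQ index on it. */
struct wq_id {
	int iaa;	/* IAA device index within the package */
	int wq;		/* WQ index on that device, e.g. 1 for wq.1 */
};

/* Hypothetical per-cpu table of every global WQ on the CPU's package. */
struct gwq_table {
	struct wq_id wqs[MAX_GLOBAL_WQS];
	int n_wqs;	/* number of valid entries in wqs[] */
	int cur_wq;	/* index of the next WQ to hand out */
};

/* Append one global WQ (e.g. wq.1 of some IAA) to a CPU's table. */
static void gwq_table_add(struct gwq_table *t, int iaa, int wq)
{
	assert(t->n_wqs < MAX_GLOBAL_WQS);
	t->wqs[t->n_wqs].iaa = iaa;
	t->wqs[t->n_wqs].wq = wq;
	t->n_wqs++;
}

/* Return the "next" global WQ and advance the cursor, wrapping around. */
static struct wq_id gwq_table_next(struct gwq_table *t)
{
	struct wq_id id;

	assert(t->n_wqs > 0);
	id = t->wqs[t->cur_wq];
	t->cur_wq = (t->cur_wq + 1) % t->n_wqs;
	return id;
}
```

Because each CPU owns its cursor, this model needs no counter shared between CPUs. A kernel implementation would also have to account for preemption while a CPU updates its own per-cpu table.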

The starting WQ in each cpu's global_wq_table is the global WQ of the IAA
nearest to that cpu, so that the starting global WQ is staggered across
cpus. This results in uniform usage of all IAAs on the package for
compress jobs.
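The effect of staggering can be modelled with a closed-form rule. The model below makes several assumptions: one global WQ per IAA, a table ordered by IAA index, g_consec_descs_per_gwq = 1, and "nearest" meaning the IAA that cpu_to_iaa() maps the CPU to. The text does not define "nearest" precisely, so that last point is an assumption. Both helpers are hypothetical names.

```c
#include <assert.h>

/*
 * IAA (index within the package) that receives the k-th compress job
 * (k = 0, 1, 2, ...) from a CPU whose table starts at local_iaa.
 */
static int iaa_for_kth_compress(int local_iaa, int n_iaas, long k)
{
	assert(n_iaas > 0 && local_iaa >= 0 && local_iaa < n_iaas && k >= 0);
	return (int)((local_iaa + k) % n_iaas);
}

/*
 * Count how many first compress jobs (one per CPU) land on each IAA.
 * With staggered == 0, every CPU starts at table index 0, i.e. IAA 0.
 */
static void first_job_load(int *load, int n_iaas, const int *local_iaas,
			   int n_cpus, int staggered)
{
	for (int i = 0; i < n_iaas; i++)
		load[i] = 0;
	for (int c = 0; c < n_cpus; c++) {
		int start = staggered ? local_iaas[c] : 0;

		load[iaa_for_kth_compress(start, n_iaas, 0)]++;
	}
}
```

Consider four CPUs mapped to IAAs 0 to 3. When the start is staggered, their first jobs land on four different IAAs. Without staggering, all four first jobs hit IAA 0.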

Two new driver module parameters are added for this feature:

g_wqs_per_iaa (default 0):

 /sys/bus/dsa/drivers/crypto/g_wqs_per_iaa

 This is the number of global WQs that can be configured per IAA device.
 The recommended setting is 1. This enables the feature once the user has
 configured 2 WQs per IAA using the higher-level scripts described in
 Documentation/driver-api/crypto/iaa/iaa-crypto.rst.
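One plausible way to classify a device's WQs as local or global is sketched below. The rule that the last g_wqs_per_iaa WQs of a device are the global ones is an assumption. The text only establishes the 2-WQ case (wq.0 local, wq.1 global) and the default of 0. is_global_wq() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical: treat the last g_wqs_per_iaa WQs of an IAA as global.
 * n_wqs = 2, g_wqs_per_iaa = 1  ->  wq.0 local, wq.1 global.
 * g_wqs_per_iaa = 0 (the default) ->  no global WQs.
 * At least one WQ is assumed to stay local for decompress jobs.
 */
static bool is_global_wq(int wq_idx, int n_wqs, int g_wqs_per_iaa)
{
	assert(wq_idx >= 0 && wq_idx < n_wqs);
	assert(g_wqs_per_iaa >= 0 && g_wqs_per_iaa < n_wqs);
	return wq_idx >= n_wqs - g_wqs_per_iaa;
}
```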

g_consec_descs_per_gwq (default 1):

 /sys/bus/dsa/drivers/crypto/g_consec_descs_per_gwq

 This is the number of consecutive compress jobs that a given core submits
 to the same global WQ (i.e. to the same IAA device) before moving to the
 next global WQ. The default is 1, which is also the recommended setting
 for this feature.
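A sketch of how g_consec_descs_per_gwq could govern a per-cpu cursor follows. It stays on the current global WQ for consec_descs jobs and then advances. struct gwq_cursor and gwq_pick() are hypothetical, and global WQs are identified only by their owning IAA index.

```c
#include <assert.h>

#define MAX_GLOBAL_WQS 16

/* Hypothetical per-cpu cursor over the package's global WQs. */
struct gwq_cursor {
	int iaa[MAX_GLOBAL_WQS];	/* IAA owning each global WQ */
	int n_wqs;			/* valid entries in iaa[] */
	int cur_wq;			/* table index currently in use */
	int descs_on_cur;		/* jobs sent to cur_wq in this run */
};

/*
 * Pick the IAA for the next compress job: keep using the current global
 * WQ until consec_descs jobs have gone to it, then move to the next one.
 */
static int gwq_pick(struct gwq_cursor *c, int consec_descs)
{
	int idx;

	assert(c->n_wqs > 0 && consec_descs > 0);
	if (c->descs_on_cur == consec_descs) {
		c->cur_wq = (c->cur_wq + 1) % c->n_wqs;
		c->descs_on_cur = 0;
	}
	idx = c->cur_wq;
	c->descs_on_cur++;
	return c->iaa[idx];
}
```

With consec_descs = 1, this reduces to plain round-robin (0, 1, 2, 3, 0, ...). With consec_descs = 2 over four IAAs, the sequence is 0, 0, 1, 1, 2, 2, and so on.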

Decompress jobs from any core are sent to its "local" IAA. This is the IAA
that the driver assigns with the cpu_to_iaa() mapping algorithm, which
evenly balances the assignment of logical cores to IAA devices on a
package.

On a 2-package Sapphire Rapids server where each package has 56 cores
(112 logical CPUs) and 4 IAA devices, compress/decompress jobs are mapped
as follows when the user configures 2 WQs per IAA device. In this setup,
wq.1 of each IAA is added to the global WQ table of every logical core on
that package:

 package(s):        2
 package0 CPU(s):   0-55,112-167
 package1 CPU(s):   56-111,168-223

 Compress jobs:
 --------------
 package 0:
 iaa_crypto will send compress jobs from all cpus (0-55,112-167) to all IAA
 devices on the package (iax1/iax3/iax5/iax7) in a round-robin manner:
 iaa:   iax1           iax3           iax5           iax7

 package 1:
 iaa_crypto will send compress jobs from all cpus (56-111,168-223) to all
 IAA devices on the package (iax9/iax11/iax13/iax15) in a round-robin manner:
 iaa:   iax9           iax11          iax13           iax15

 Decompress jobs:
 ----------------
 package 0:
 cpu   0-13,112-125   14-27,126-139  28-41,140-153  42-55,154-167
 iaa:  iax1           iax3           iax5           iax7

 package 1:
 cpu   56-69,168-181  70-83,182-195  84-97,196-209   98-111,210-223
 iaa:  iax9           iax11          iax13           iax15
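The mapping in this example can be reproduced with a few lines of arithmetic. The sketch hard-codes the example topology. It assumes that cpu and cpu + 112 are SMT siblings, as the CPU lists imply. The iaxN numbering (odd numbers only) is copied from the example and is not a general rule. The compress helper additionally assumes one global WQ per IAA, g_consec_descs_per_gwq = 1, and a staggered start at the local IAA. All function names are hypothetical, and this is not the driver's cpu_to_iaa() implementation.

```c
#include <assert.h>

/* Example topology: 2 packages x 56 cores, SMT2, 4 IAAs per package. */
#define NR_PACKAGES	2
#define CORES_PER_PKG	56
#define IAAS_PER_PKG	4
#define NR_CORES	(NR_PACKAGES * CORES_PER_PKG)	/* 112 cores */
#define NR_CPUS		(2 * NR_CORES)			/* 224 CPUs */
#define CORES_PER_IAA	(CORES_PER_PKG / IAAS_PER_PKG)	/* 14 */

/* Assumes cpu and cpu + NR_CORES are SMT siblings, as listed above. */
static int cpu_package(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return (cpu % NR_CORES) / CORES_PER_PKG;
}

/* Local IAA index (0..3) within the package, used for decompress. */
static int cpu_local_iaa(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return ((cpu % NR_CORES) % CORES_PER_PKG) / CORES_PER_IAA;
}

/* Example naming only: the i-th IAA system-wide appears as iax(2i+1). */
static int iax_number(int pkg, int iaa_in_pkg)
{
	return 2 * (pkg * IAAS_PER_PKG + iaa_in_pkg) + 1;
}

/* Device that handles decompress jobs from cpu. */
static int decompress_iax(int cpu)
{
	return iax_number(cpu_package(cpu), cpu_local_iaa(cpu));
}

/* Device that handles the k-th compress job from cpu (k >= 0). */
static int compress_iax(int cpu, int k)
{
	assert(k >= 0);
	return iax_number(cpu_package(cpu),
			  (cpu_local_iaa(cpu) + k) % IAAS_PER_PKG);
}
```

For example, decompress_iax(126) yields 3 (iax3) and decompress_iax(223) yields 15 (iax15), matching the decompress table. CPU 0's compress jobs cycle through iax1, iax3, iax5 and iax7.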

Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
---
 drivers/crypto/intel/iaa/iaa_crypto.h      |   1 +
 drivers/crypto/intel/iaa/iaa_crypto_main.c | 385 ++++++++++++++++++++-
 2 files changed, 378 insertions(+), 8 deletions(-)

Message ID	20241221063119.29140-10-kanchana.p.sridhar@intel.com
State	Superseded
From	Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Date	Fri, 20 Dec 2024 22:31:16 -0800
Series	zswap IAA compress batching
 [v5,00/12] zswap IAA compress batching
 [v5,01/12] crypto: acomp - Add synchronous/asynchronous acomp request chaining.
 [v5,02/12] crypto: acomp - Define new interfaces for compress/decompress batching.
 [v5,03/12] crypto: iaa - Add an acomp_req flag CRYPTO_ACOMP_REQ_POLL to enable async mode.
 [v5,04/12] crypto: iaa - Implement batch_compress(), batch_decompress() API in iaa_crypto.
 [v5,05/12] crypto: iaa - Make async mode the default.
 [v5,06/12] crypto: iaa - Disable iaa_verify_compress by default.
 [v5,07/12] crypto: iaa - Re-organize the iaa_crypto driver code.
 [v5,08/12] crypto: iaa - Map IAA devices/wqs to cores based on packages instead of NUMA.
 [v5,09/12] crypto: iaa - Distribute compress jobs from all cores to all IAAs on a package.
 [v5,10/12] mm: zswap: Allocate pool batching resources if the crypto_alg supports batching.
 [v5,11/12] mm: zswap: Restructure & simplify zswap_store() to make it amenable for batching.
 [v5,12/12] mm: zswap: Compress batching with Intel IAA in zswap_store() of large folios.
