[v7,00/19] Add iommufd physical device operations for replace and alloc hwpt

Message ID 0-v7-6c0fd698eda2+5e3-iommufd_alloc_jgg@nvidia.com

Jason Gunthorpe May 15, 2023, 2 p.m. UTC
This is the basic functionality for iommufd to support
iommufd_device_replace() and IOMMU_HWPT_ALLOC for physical devices.

iommufd_device_replace() allows changing the HWPT associated with the
device to a new IOAS or HWPT. Replace does this in a way that failure leaves
things unchanged, and utilizes the iommu iommu_group_replace_domain() API
to allow the iommu driver to perform an optional non-disruptive change.
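
As a rough illustration of the "failure leaves things unchanged" contract,
here is a small userspace model. It is only a sketch: the toy_* types and
functions are hypothetical stand-ins and not the kernel structures, and the
real code also handles reserved regions, MSI setup and locking.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel structures. */
struct toy_domain {
	int id;
	int fail_attach;	/* when non-zero, switching to this domain fails */
};

struct toy_group {
	struct toy_domain *cur;	/* currently attached domain, NULL if detached */
};

/* Models a driver-level switch: it either fully succeeds or changes nothing. */
static int toy_group_replace_domain(struct toy_group *g, struct toy_domain *d)
{
	if (d->fail_attach)
		return -ENOMEM;
	g->cur = d;
	return 0;
}

/*
 * On success *old receives the previous domain, or NULL if nothing changed.
 * On failure the group keeps its current domain and *old is NULL.
 */
static int toy_device_replace(struct toy_group *g, struct toy_domain *d,
			      struct toy_domain **old)
{
	struct toy_domain *prev;
	int rc;

	assert(g && d && old);
	*old = NULL;

	/* Replace is only valid on something that is already attached */
	if (!g->cur)
		return -EINVAL;
	/* Replacing with the same domain is a no-op */
	if (g->cur == d)
		return 0;

	prev = g->cur;
	rc = toy_group_replace_domain(g, d);
	if (rc)
		return rc;	/* g->cur was never touched */
	*old = prev;
	return 0;
}
```

The caller only learns about the old domain once the switch has succeeded,
so there is never a window where the group sits in some intermediate state.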

IOMMU_HWPT_ALLOC allows HWPTs to be explicitly allocated by the user and
used by attach or replace. At this point it isn't very useful since the
HWPT is the same as the automatically managed HWPT from the IOAS. However
a following series will allow userspace to customize the created HWPT.
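
To make the shape of an explicit allocation request concrete, here is a
minimal userspace sketch. The toy_hwpt_alloc struct, its field names and the
-EOPNOTSUPP return code are assumptions for illustration only, not the uAPI
definition; the one grounded point is that unknown flags and non-zero
reserved fields get rejected so they can be given meaning later.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical mirror of an allocation request; field names are assumptions. */
struct toy_hwpt_alloc {
	uint32_t size;		/* size of this struct, for extensibility */
	uint32_t flags;		/* no flags defined in this sketch, must be 0 */
	uint32_t dev_id;	/* device the HWPT is allocated for */
	uint32_t pt_id;		/* IOAS to build the HWPT from */
	uint32_t out_hwpt_id;	/* filled in on success */
	uint32_t reserved;	/* must be 0 */
};

/*
 * Validate the request and hand back an ID. next_id stands in for whatever
 * object ID allocator the real implementation uses.
 */
static int toy_hwpt_alloc(struct toy_hwpt_alloc *cmd, uint32_t next_id)
{
	assert(cmd);

	/* Reject anything we do not understand, leaving the output untouched */
	if (cmd->flags || cmd->reserved)
		return -EOPNOTSUPP;

	cmd->out_hwpt_id = next_id;
	return 0;
}
```

The returned out_hwpt_id is what a later attach or replace would take as its
pt_id input.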

The implementation is complicated because we have to introduce some
per-iommu_group memory in iommufd and redo how we think about multi-device
groups to be more explicit. This solves all the locking problems in the
prior attempts.
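
One way to picture the per-group state is a single record that tracks the
attached HWPT for every device in the group, so a replace moves them all at
once. The sketch below models only the reference accounting, using plain
counters. The toy_* names are hypothetical, locking is omitted, and it
assumes each attached device holds one reference on the group's HWPT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; a plain counter replaces the kernel refcount. */
struct toy_hwpt {
	unsigned int users;
};

struct toy_igroup {
	struct toy_hwpt *hwpt;		/* HWPT shared by all devices in the group */
	unsigned int num_devices;	/* attached devices, one reference each */
};

/*
 * Move the group's references from the old HWPT to new_hwpt. One reference
 * on the old HWPT is left for the caller, who must drop it.
 */
static struct toy_hwpt *toy_group_switch(struct toy_igroup *ig,
					 struct toy_hwpt *new_hwpt)
{
	struct toy_hwpt *old = ig->hwpt;

	assert(old && new_hwpt && ig->num_devices > 0);
	assert(old->users >= ig->num_devices);

	/* Every attached device now references the new HWPT */
	new_hwpt->users += ig->num_devices;
	/* Drop all but one of the device references on the old HWPT */
	old->users -= ig->num_devices - 1;

	ig->hwpt = new_hwpt;
	return old;
}
```

With three devices in the group, the new HWPT gains three references and the
old one loses two, leaving the last one for the caller to release.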

This series is infrastructure work for the following series which:
 - Add replace for attach
 - Expose replace through VFIO APIs
 - Implement driver parameters for HWPT creation (nesting)

Once review of this is complete I will keep it on a side branch and
accumulate the following series when they are ready so we can have a
stable base and make more incremental progress. When we have all the parts
together to get a full implementation it can go to Linus.

This is on github: https://github.com/jgunthorpe/linux/commits/iommufd_hwpt

v7:
 - Rebase to v6.4-rc2, update to new signature of iommufd_get_ioas()
v6: https://lore.kernel.org/r/0-v6-fdb604df649a+369-iommufd_alloc_jgg@nvidia.com
 - Go back to the v4 locking arrangement, now with both the attach/detach
   igroup->locks inside the functions. Kevin says he needs this for a
   followup series. This still fixes the syzkaller bug
 - Fix two more error unwind locking bugs where
   iommufd_object_abort_and_destroy(hwpt) would deadlock or be mislocked.
   Make sure fail_nth will catch these mistakes
 - Add a patch allowing objects to have a different abort function than
   destroy function. This allows hwpt abort to require the caller to
   continue to hold the lock and enforces this with lockdep.
v5: https://lore.kernel.org/r/0-v5-6716da355392+c5-iommufd_alloc_jgg@nvidia.com
 - Go back to the v3 version of the code, keep the comment changes from
   v4. Syzkaller says the group lock change in v4 didn't work.
 - Adjust the fail_nth test to cover the path syzkaller found. We need to
   have an ioas with a mapped page installed to inject a failure during
   domain attachment.
v4: https://lore.kernel.org/r/0-v4-9cd79ad52ee8+13f5-iommufd_alloc_jgg@nvidia.com
 - Refine comments and commit messages
 - Move the group lock into iommufd_hw_pagetable_attach()
 - Fix error unwind in iommufd_device_do_replace()
v3: https://lore.kernel.org/r/0-v3-61d41fd9e13e+1f5-iommufd_alloc_jgg@nvidia.com
 - Refine comments and commit messages
 - Adjust the flow in iommufd_device_auto_get_domain() so pt_id is only
   set on success
 - Reject replace on non-attached devices
 - Add missing __reserved check for IOMMU_HWPT_ALLOC
v2: https://lore.kernel.org/r/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia.com
 - Use WARN_ON for the igroup->group test and move that logic to a
   function iommufd_group_try_get()
 - Change igroup->devices to igroup->device_list.
   Replace will need to iterate over all attached idevs
 - Rename to iommufd_group_setup_msi()
 - New patch to export iommu_get_resv_regions()
 - New patch to use per-device reserved regions instead of per-group
   regions
 - Split out the reorganizing of iommufd_device_change_pt() from the
   replace patch
 - Replace uses the per-dev reserved regions
 - Use stdev_id in a few more places in the selftest
 - Fix error handling in IOMMU_HWPT_ALLOC
 - Clarify comments
 - Rebase on v6.3-rc1
v1: https://lore.kernel.org/all/0-v1-7612f88c19f5+2f21-iommufd_alloc_jgg@nvidia.com/

Jason Gunthorpe (17):
  iommufd: Move isolated msi enforcement to iommufd_device_bind()
  iommufd: Add iommufd_group
  iommufd: Replace the hwpt->devices list with iommufd_group
  iommu: Export iommu_get_resv_regions()
  iommufd: Keep track of each device's reserved regions instead of
    groups
  iommufd: Use the iommufd_group to avoid duplicate MSI setup
  iommufd: Make sw_msi_start a group global
  iommufd: Move putting a hwpt to a helper function
  iommufd: Add enforced_cache_coherency to iommufd_hw_pagetable_alloc()
  iommufd: Allow a hwpt to be aborted after allocation
  iommufd: Fix locking around hwpt allocation
  iommufd: Reorganize iommufd_device_attach into
    iommufd_device_change_pt
  iommufd: Add iommufd_device_replace()
  iommufd: Make destroy_rwsem use a lock class per object type
  iommufd: Add IOMMU_HWPT_ALLOC
  iommufd/selftest: Return the real idev id from selftest mock_domain
  iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC

Nicolin Chen (2):
  iommu: Introduce a new iommu_group_replace_domain() API
  iommufd/selftest: Test iommufd_device_replace()

 drivers/iommu/iommu-priv.h                    |  10 +
 drivers/iommu/iommu.c                         |  41 +-
 drivers/iommu/iommufd/device.c                | 553 +++++++++++++-----
 drivers/iommu/iommufd/hw_pagetable.c          | 112 +++-
 drivers/iommu/iommufd/io_pagetable.c          |  32 +-
 drivers/iommu/iommufd/iommufd_private.h       |  52 +-
 drivers/iommu/iommufd/iommufd_test.h          |   6 +
 drivers/iommu/iommufd/main.c                  |  24 +-
 drivers/iommu/iommufd/selftest.c              |  40 ++
 include/linux/iommufd.h                       |   1 +
 include/uapi/linux/iommufd.h                  |  26 +
 tools/testing/selftests/iommu/iommufd.c       |  67 ++-
 .../selftests/iommu/iommufd_fail_nth.c        |  67 ++-
 tools/testing/selftests/iommu/iommufd_utils.h |  63 +-
 14 files changed, 868 insertions(+), 226 deletions(-)
 create mode 100644 drivers/iommu/iommu-priv.h


base-commit: f1fcbaa18b28dec10281551dfe6ed3a3ed80e3d6

Comments

Yi Liu July 7, 2023, 8 a.m. UTC | #1
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Monday, May 15, 2023 10:00 PM
> 
> Replace allows all the devices in a group to move in one step to a new
> HWPT. Further, the HWPT move is done without going through a blocking
> domain so that the IOMMU driver can implement some level of
> non-disruption to ongoing DMA if that has meaning for it (eg for future
> special driver domains)
> 
> Replace uses a lot of the same logic as normal attach, except the actual
> domain change over has different restrictions, and we are careful to
> sequence things so that failure is going to leave everything the way it
> was, and not get trapped in a blocking domain or something if there is
> ENOMEM.
> 
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/iommufd/device.c | 99 ++++++++++++++++++++++++++++++++++
>  drivers/iommu/iommufd/main.c   |  1 +
>  2 files changed, 100 insertions(+)
> 
> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> index b7868c877d1c1c..ce758fbe3c525d 100644
> --- a/drivers/iommu/iommufd/device.c
> +++ b/drivers/iommu/iommufd/device.c
> @@ -4,6 +4,7 @@
>  #include <linux/iommufd.h>
>  #include <linux/slab.h>
>  #include <linux/iommu.h>
> +#include "../iommu-priv.h"
> 
>  #include "io_pagetable.h"
>  #include "iommufd_private.h"
> @@ -365,6 +366,84 @@ iommufd_device_do_attach(struct iommufd_device *idev,
>  	return NULL;
>  }
> 
> +static struct iommufd_hw_pagetable *
> +iommufd_device_do_replace(struct iommufd_device *idev,
> +			  struct iommufd_hw_pagetable *hwpt)
> +{
> +	struct iommufd_group *igroup = idev->igroup;
> +	struct iommufd_hw_pagetable *old_hwpt;
> +	unsigned int num_devices = 0;
> +	struct iommufd_device *cur;
> +	int rc;
> +
> +	mutex_lock(&idev->igroup->lock);
> +
> +	if (igroup->hwpt == NULL) {
> +		rc = -EINVAL;
> +		goto err_unlock;
> +	}
> +
> +	if (hwpt == igroup->hwpt) {
> +		mutex_unlock(&idev->igroup->lock);
> +		return NULL;
> +	}
> +
> +	/* Try to upgrade the domain we have */
> +	list_for_each_entry(cur, &igroup->device_list, group_item) {
> +		num_devices++;
> +		if (cur->enforce_cache_coherency) {
> +			rc = iommufd_hw_pagetable_enforce_cc(hwpt);
> +			if (rc)
> +				goto err_unlock;
> +		}
> +	}
> +
> +	old_hwpt = igroup->hwpt;
> +	if (hwpt->ioas != old_hwpt->ioas) {
> +		list_for_each_entry(cur, &igroup->device_list, group_item) {
> +			rc = iopt_table_enforce_dev_resv_regions(
> +				&hwpt->ioas->iopt, cur->dev, NULL);
> +			if (rc)
> +				goto err_unresv;
> +		}
> +	}
> +
> +	rc = iommufd_group_setup_msi(idev->igroup, hwpt);
> +	if (rc)
> +		goto err_unresv;
> +
> +	rc = iommu_group_replace_domain(igroup->group, hwpt->domain);
> +	if (rc)
> +		goto err_unresv;
> +
> +	if (hwpt->ioas != old_hwpt->ioas) {
> +		list_for_each_entry(cur, &igroup->device_list, group_item)
> +			iopt_remove_reserved_iova(&old_hwpt->ioas->iopt,
> +						  cur->dev);
> +	}
> +
> +	igroup->hwpt = hwpt;
> +
> +	/*
> +	 * Move the refcounts held by the device_list to the new hwpt. Retain a
> +	 * refcount for this thread as the caller will free it.
> +	 */
> +	refcount_add(num_devices, &hwpt->obj.users);
> +	if (num_devices > 1)
> +		WARN_ON(refcount_sub_and_test(num_devices - 1,
> +					      &old_hwpt->obj.users));
> +	mutex_unlock(&idev->igroup->lock);
> +
> +	/* Caller must destroy old_hwpt */
> +	return old_hwpt;
> +err_unresv:
> +	list_for_each_entry(cur, &igroup->device_list, group_item)
> +		iopt_remove_reserved_iova(&hwpt->ioas->iopt, cur->dev);
> +err_unlock:
> +	mutex_unlock(&idev->igroup->lock);
> +	return ERR_PTR(rc);
> +}
> +
>  typedef struct iommufd_hw_pagetable *(*attach_fn)(
>  	struct iommufd_device *idev, struct iommufd_hw_pagetable *hwpt);
> 
> @@ -523,6 +602,26 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
>  }
>  EXPORT_SYMBOL_NS_GPL(iommufd_device_attach, IOMMUFD);
> 
> +/**
> + * iommufd_device_replace - Change the device's iommu_domain
> + * @idev: device to change
> + * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HW_PAGETABLE
> + *         Output the IOMMUFD_OBJ_HW_PAGETABLE ID
> + *
> + * This is the same as
> + *   iommufd_device_detach();
> + *   iommufd_device_attach();

Adding a blank line here would fix the following "make htmldocs" warning:

Documentation/userspace-api/iommufd:184: ./drivers/iommu/iommufd/device.c:665: WARNING: Definition list ends without a blank line; unexpected unindent.

Regards,
Yi Liu

> + * If it fails then no change is made to the attachment. The iommu driver may
> + * implement this so there is no disruption in translation. This can only be
> + * called if iommufd_device_attach() has already succeeded.
> + */
> +int iommufd_device_replace(struct iommufd_device *idev, u32 *pt_id)
> +{
> +	return iommufd_device_change_pt(idev, pt_id,
> +					&iommufd_device_do_replace);
> +}
> +EXPORT_SYMBOL_NS_GPL(iommufd_device_replace, IOMMUFD);
> +
>  /**
>   * iommufd_device_detach - Disconnect a device to an iommu_domain
>   * @idev: device to detach
> diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
> index 24f30f384df6f9..5b7f70543adb24 100644
> --- a/drivers/iommu/iommufd/main.c
> +++ b/drivers/iommu/iommufd/main.c
> @@ -466,5 +466,6 @@ module_exit(iommufd_exit);
>  MODULE_ALIAS_MISCDEV(VFIO_MINOR);
>  MODULE_ALIAS("devname:vfio/vfio");
>  #endif
> +MODULE_IMPORT_NS(IOMMUFD_INTERNAL);
>  MODULE_DESCRIPTION("I/O Address Space Management for passthrough devices");
>  MODULE_LICENSE("GPL");
> --
> 2.40.1