mbox series

[v6,00/10] iommufd: Add vIOMMU infrastructure (Part-2: vDEVICE)

Message ID cover.1730313494.git.nicolinc@nvidia.com
Headers show
Series iommufd: Add vIOMMU infrastructure (Part-2: vDEVICE) | expand

Message

Nicolin Chen Oct. 30, 2024, 9:35 p.m. UTC
Following the previous vIOMMU series, this adds another vDEVICE structure,
representing the association from an iommufd_device to an iommufd_viommu.
This gives the whole architecture a new "v" layer:
  _______________________________________________________________________
 |                      iommufd (with vIOMMU/vDEVICE)                    |
 |                        _____________      _____________               |
 |                       |             |    |             |              |
 |      |----------------|    vIOMMU   |<---|   vDEVICE   |<------|      |
 |      |                |             |    |_____________|       |      |
 |      |     ______     |             |     _____________     ___|____  |
 |      |    |      |    |             |    |             |   |        | |
 |      |    | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE | |
 |      |    |______|    |_____________|    |_____________|   |________| |
 |______|________|______________|__________________|_______________|_____|
        |        |              |                  |               |
  ______v_____   |        ______v_____       ______v_____       ___v__
 |   struct   |  |  PFN  |  (paging)  |     |  (nested)  |     |struct|
 |iommu_device|  |------>|iommu_domain|<----|iommu_domain|<----|device|
 |____________|   storage|____________|     |____________|     |______|

This vDEVICE object is used to collect and store all vIOMMU-related device
information/attributes in a VM. As an initial series for vDEVICE, add only
the virt_id to the vDEVICE, which is a vIOMMU specific device ID in a VM:
e.g. vSID of ARM SMMUv3, vDeviceID of AMD IOMMU, and vRID of Intel VT-d to
a Context Table. This virt_id helps IOMMU drivers to link the vID to a pID
of the device against the physical IOMMU instance. This is essential for a
vIOMMU-based invalidation, where the request contains a device's vID for a
device cache flush, e.g. ATC invalidation.

Therefore, with this vDEVICE object, support a vIOMMU-based invalidation,
by reusing IOMMUFD_CMD_HWPT_INVALIDATE for a vIOMMU object to flush cache
with a given driver data.

As for the implementation of the series, add driver support in ARM SMMUv3
for a real world use case.

This series is on Github:
https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p2-v6
(QEMU branch for testing will be provided in Jason's nesting series)

Changelog
v6
 * Fixed kdoc in the uAPI header
 * Fixed indentations in iommufd.rst
 * Replaced vdev->idev with vdev->dev
 * Added "Reviewed-by" from Kevin and Jason
 * Updated kdoc of struct iommu_vdevice_alloc
 * Fixed lockdep function call in iommufd_viommu_find_dev
 * Added missing iommu_dev validation between viommu and idev
 * Skipped SMMUv3 driver changes (to post in a separate series)
 * Replaced !cache_invalidate_user in WARN_ON of the allocation path
   with cache_invalidate_user validation in iommufd_hwpt_invalidate
v5
 https://lore.kernel.org/all/cover.1729897278.git.nicolinc@nvidia.com/
 * Dropped driver-allocated vDEVICE support
 * Changed vdev_to_dev helper to iommufd_viommu_find_dev
v4
 https://lore.kernel.org/all/cover.1729555967.git.nicolinc@nvidia.com/
 * Added missing brackets in switch-case
 * Fixed the unreleased idev refcount issue
 * Reworked the iommufd_vdevice_alloc allocator
 * Dropped support for IOMMU_VIOMMU_TYPE_DEFAULT
 * Added missing TEST_LENGTH and fail_nth coverages
 * Added a verification to the driver-allocated vDEVICE object
 * Added an iommufd_vdevice_abort for a missing mutex protection
 * Added a u64 structure arm_vsmmu_invalidation_cmd for user command
   conversion
v3
 https://lore.kernel.org/all/cover.1728491532.git.nicolinc@nvidia.com/
 * Added Jason's Reviewed-by
 * Split this invalidation part out of the part-1 series
 * Repurposed VDEV_ID ioctl to a wider vDEVICE structure and ioctl
 * Reduced viommu_api functions by allowing drivers to access viommu
   and vdevice structure directly
 * Dropped vdevs_rwsem by using xa_lock instead
 * Dropped arm_smmu_cache_invalidate_user
v2
 https://lore.kernel.org/all/cover.1724776335.git.nicolinc@nvidia.com/
 * Limited vdev_id to one per idev
 * Added a rw_sem to protect the vdev_id list
 * Reworked driver-level APIs with proper lockings
 * Added a new viommu_api file for IOMMUFD_DRIVER config
 * Dropped useless iommu_dev point from the viommu structure
 * Added missing index numnbers to new types in the uAPI header
 * Dropped IOMMU_VIOMMU_INVALIDATE uAPI; Instead, reuse the HWPT one
 * Reworked mock_viommu_cache_invalidate() using the new iommu helper
 * Reordered details of set/unset_vdev_id handlers for proper lockings
v1
 https://lore.kernel.org/all/cover.1723061377.git.nicolinc@nvidia.com/

Thanks!
Nicolin

Jason Gunthorpe (1):
  iommu: Add iommu_copy_struct_from_full_user_array helper

Nicolin Chen (9):
  iommufd/viommu: Add IOMMUFD_OBJ_VDEVICE and IOMMU_VDEVICE_ALLOC ioctl
  iommufd/selftest: Add IOMMU_VDEVICE_ALLOC test coverage
  iommu/viommu: Add cache_invalidate to iommufd_viommu_ops
  iommufd: Allow hwpt_id to carry viommu_id for IOMMU_HWPT_INVALIDATE
  iommufd/viommu: Add iommufd_viommu_find_dev helper
  iommufd/selftest: Add mock_viommu_cache_invalidate
  iommufd/selftest: Add IOMMU_TEST_OP_DEV_CHECK_CACHE test command
  iommufd/selftest: Add vIOMMU coverage for IOMMU_HWPT_INVALIDATE ioctl
  Documentation: userspace-api: iommufd: Update vDEVICE

 drivers/iommu/iommufd/iommufd_private.h       |  18 ++
 drivers/iommu/iommufd/iommufd_test.h          |  30 +++
 include/linux/iommu.h                         |  48 ++++-
 include/linux/iommufd.h                       |  22 ++
 include/uapi/linux/iommufd.h                  |  31 ++-
 tools/testing/selftests/iommu/iommufd_utils.h |  83 +++++++
 drivers/iommu/iommufd/driver.c                |  13 ++
 drivers/iommu/iommufd/hw_pagetable.c          |  40 +++-
 drivers/iommu/iommufd/main.c                  |   6 +
 drivers/iommu/iommufd/selftest.c              |  98 ++++++++-
 drivers/iommu/iommufd/viommu.c                |  76 +++++++
 tools/testing/selftests/iommu/iommufd.c       | 204 +++++++++++++++++-
 .../selftests/iommu/iommufd_fail_nth.c        |   4 +
 Documentation/userspace-api/iommufd.rst       |  41 +++-
 14 files changed, 688 insertions(+), 26 deletions(-)

Comments

Zhangfei Gao Oct. 31, 2024, 6:28 a.m. UTC | #1
On Thu, 31 Oct 2024 at 05:36, Nicolin Chen <nicolinc@nvidia.com> wrote:
>
> Following the previous vIOMMU series, this adds another vDEVICE structure,
> representing the association from an iommufd_device to an iommufd_viommu.
> This gives the whole architecture a new "v" layer:
>   _______________________________________________________________________
>  |                      iommufd (with vIOMMU/vDEVICE)                    |
>  |                        _____________      _____________               |
>  |                       |             |    |             |              |
>  |      |----------------|    vIOMMU   |<---|   vDEVICE   |<------|      |
>  |      |                |             |    |_____________|       |      |
>  |      |     ______     |             |     _____________     ___|____  |
>  |      |    |      |    |             |    |             |   |        | |
>  |      |    | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE | |
>  |      |    |______|    |_____________|    |_____________|   |________| |
>  |______|________|______________|__________________|_______________|_____|
>         |        |              |                  |               |
>   ______v_____   |        ______v_____       ______v_____       ___v__
>  |   struct   |  |  PFN  |  (paging)  |     |  (nested)  |     |struct|
>  |iommu_device|  |------>|iommu_domain|<----|iommu_domain|<----|device|
>  |____________|   storage|____________|     |____________|     |______|
>
> This vDEVICE object is used to collect and store all vIOMMU-related device
> information/attributes in a VM. As an initial series for vDEVICE, add only
> the virt_id to the vDEVICE, which is a vIOMMU specific device ID in a VM:
> e.g. vSID of ARM SMMUv3, vDeviceID of AMD IOMMU, and vRID of Intel VT-d to
> a Context Table. This virt_id helps IOMMU drivers to link the vID to a pID
> of the device against the physical IOMMU instance. This is essential for a
> vIOMMU-based invalidation, where the request contains a device's vID for a
> device cache flush, e.g. ATC invalidation.
>
> Therefore, with this vDEVICE object, support a vIOMMU-based invalidation,
> by reusing IOMMUFD_CMD_HWPT_INVALIDATE for a vIOMMU object to flush cache
> with a given driver data.
>
> As for the implementation of the series, add driver support in ARM SMMUv3
> for a real world use case.
>
> This series is on Github:
> https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p2-v6
> (QEMU branch for testing will be provided in Jason's nesting series)

Thanks Nico

I tested on aarch64, functions are OK.

But with some hacks
https://github.com/Linaro/linux-kernel-uadk/commit/22f47d6f3a34a0867a187473bd5ba0e0ee3395d4

Thanks
Jason Gunthorpe Oct. 31, 2024, 11:59 a.m. UTC | #2
On Thu, Oct 31, 2024 at 02:28:12PM +0800, Zhangfei Gao wrote:

> > As for the implementation of the series, add driver support in ARM SMMUv3
> > for a real world use case.
> >
> > This series is on Github:
> > https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p2-v6
> > (QEMU branch for testing will be provided in Jason's nesting series)
> 
> Thanks Nico
> 
> I tested on aarch64, functions are OK.
> 
> But with some hacks
> https://github.com/Linaro/linux-kernel-uadk/commit/22f47d6f3a34a0867a187473bd5ba0e0ee3395d4

Hmm, it seems like we should permit IOMMU_HWPT_FAULT_ID_VALID domain
creation on ARM?

Jason
Jason Gunthorpe Oct. 31, 2024, 1:30 p.m. UTC | #3
On Wed, Oct 30, 2024 at 02:35:32PM -0700, Nicolin Chen wrote:
> This avoids a bigger trouble of exposing struct iommufd_device and struct
> iommufd_vdevice in the public header.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  include/linux/iommufd.h        |  8 ++++++++
>  drivers/iommu/iommufd/driver.c | 13 +++++++++++++
>  2 files changed, 21 insertions(+)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason
Jason Gunthorpe Oct. 31, 2024, 3:33 p.m. UTC | #4
On Wed, Oct 30, 2024 at 02:35:31PM -0700, Nicolin Chen wrote:
> From: Jason Gunthorpe <jgg@nvidia.com>
> 
> The iommu_copy_struct_from_user_array helper can be used to copy a single
> entry from a user array which might not be efficient if the array is big.
> 
> Add a new iommu_copy_struct_from_full_user_array to copy the entire user
> array at once. Update the existing iommu_copy_struct_from_user_array kdoc
> accordingly.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  include/linux/iommu.h | 48 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 47 insertions(+), 1 deletion(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason
Nicolin Chen Oct. 31, 2024, 4:41 p.m. UTC | #5
On Thu, Oct 31, 2024 at 02:28:12PM +0800, Zhangfei Gao wrote:
> On Thu, 31 Oct 2024 at 05:36, Nicolin Chen <nicolinc@nvidia.com> wrote:
> >
> > Following the previous vIOMMU series, this adds another vDEVICE structure,
> > representing the association from an iommufd_device to an iommufd_viommu.
> > This gives the whole architecture a new "v" layer:
> >   _______________________________________________________________________
> >  |                      iommufd (with vIOMMU/vDEVICE)                    |
> >  |                        _____________      _____________               |
> >  |                       |             |    |             |              |
> >  |      |----------------|    vIOMMU   |<---|   vDEVICE   |<------|      |
> >  |      |                |             |    |_____________|       |      |
> >  |      |     ______     |             |     _____________     ___|____  |
> >  |      |    |      |    |             |    |             |   |        | |
> >  |      |    | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE | |
> >  |      |    |______|    |_____________|    |_____________|   |________| |
> >  |______|________|______________|__________________|_______________|_____|
> >         |        |              |                  |               |
> >   ______v_____   |        ______v_____       ______v_____       ___v__
> >  |   struct   |  |  PFN  |  (paging)  |     |  (nested)  |     |struct|
> >  |iommu_device|  |------>|iommu_domain|<----|iommu_domain|<----|device|
> >  |____________|   storage|____________|     |____________|     |______|
> >
> > This vDEVICE object is used to collect and store all vIOMMU-related device
> > information/attributes in a VM. As an initial series for vDEVICE, add only
> > the virt_id to the vDEVICE, which is a vIOMMU specific device ID in a VM:
> > e.g. vSID of ARM SMMUv3, vDeviceID of AMD IOMMU, and vRID of Intel VT-d to
> > a Context Table. This virt_id helps IOMMU drivers to link the vID to a pID
> > of the device against the physical IOMMU instance. This is essential for a
> > vIOMMU-based invalidation, where the request contains a device's vID for a
> > device cache flush, e.g. ATC invalidation.
> >
> > Therefore, with this vDEVICE object, support a vIOMMU-based invalidation,
> > by reusing IOMMUFD_CMD_HWPT_INVALIDATE for a vIOMMU object to flush cache
> > with a given driver data.
> >
> > As for the implementation of the series, add driver support in ARM SMMUv3
> > for a real world use case.
> >
> > This series is on Github:
> > https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p2-v6
> > (QEMU branch for testing will be provided in Jason's nesting series)
> 
> Thanks Nico
> 
> I tested on aarch64, functions are OK.
> 
> But with some hacks
> https://github.com/Linaro/linux-kernel-uadk/commit/22f47d6f3a34a0867a187473bd5ba0e0ee3395d4

Though we will have some small changes to this two series (and
possibly a small change to SMMUv3 driver too for that flags in
the hacks), would you mind giving this two series a Tested-by?

Thanks!
Nicolin
Nicolin Chen Oct. 31, 2024, 4:43 p.m. UTC | #6
On Thu, Oct 31, 2024 at 08:59:37AM -0300, Jason Gunthorpe wrote:
> On Thu, Oct 31, 2024 at 02:28:12PM +0800, Zhangfei Gao wrote:
> 
> > > As for the implementation of the series, add driver support in ARM SMMUv3
> > > for a real world use case.
> > >
> > > This series is on Github:
> > > https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p2-v6
> > > (QEMU branch for testing will be provided in Jason's nesting series)
> > 
> > Thanks Nico
> > 
> > I tested on aarch64, functions are OK.
> > 
> > But with some hacks
> > https://github.com/Linaro/linux-kernel-uadk/commit/22f47d6f3a34a0867a187473bd5ba0e0ee3395d4
> 
> Hmm, it seems like we should permit IOMMU_HWPT_FAULT_ID_VALID domain
> creation on ARM?

+1, since we proved PRI could work :)

Thanks
Nicolin
Jason Gunthorpe Oct. 31, 2024, 5:04 p.m. UTC | #7
On Thu, Oct 31, 2024 at 09:56:37AM -0700, Nicolin Chen wrote:
> On Thu, Oct 31, 2024 at 10:29:41AM -0300, Jason Gunthorpe wrote:
> > On Wed, Oct 30, 2024 at 02:35:27PM -0700, Nicolin Chen wrote:
> > > +void iommufd_vdevice_destroy(struct iommufd_object *obj)
> > > +{
> > > +	struct iommufd_vdevice *vdev =
> > > +		container_of(obj, struct iommufd_vdevice, obj);
> > > +	struct iommufd_viommu *viommu = vdev->viommu;
> > > +
> > > +	/* xa_cmpxchg is okay to fail if alloc returned -EEXIST previously */
> > > +	xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL);
> > 
> > There are crazy races that would cause this not to work. Another
> > thread could have successfully destroyed whatever caused EEXIST and
> > the successfully registered this same vdev to the same id. Then this
> > will wrongly erase the other threads entry.
> >
> > It would be better to skip the erase directly if the EEXIST unwind is
> > being taken.
>
> Hmm, is the "another thread" an alloc() or a destroy()? 

I was thinking both

> It doesn't seem to me that there could be another destroy() on the
> same object since this current destroy() is the abort to an
> unfinalized object.  And it doesn't seem that another alloc() will
> get the same vdev ptr since every vdev allocation in the alloc()
> will be different?

Ah so you are saying that since the vdev 'old' is local to this thread
it can't possibly by aliased by another?

I was worried the id could be aliased, but yes, that seems right that
the vdev cmpxchg would reject that.

So lets leave it

Jason
Nicolin Chen Oct. 31, 2024, 6:46 p.m. UTC | #8
On Thu, Oct 31, 2024 at 02:04:46PM -0300, Jason Gunthorpe wrote:
> On Thu, Oct 31, 2024 at 09:56:37AM -0700, Nicolin Chen wrote:
> > On Thu, Oct 31, 2024 at 10:29:41AM -0300, Jason Gunthorpe wrote:
> > > On Wed, Oct 30, 2024 at 02:35:27PM -0700, Nicolin Chen wrote:
> > > > +void iommufd_vdevice_destroy(struct iommufd_object *obj)
> > > > +{
> > > > +	struct iommufd_vdevice *vdev =
> > > > +		container_of(obj, struct iommufd_vdevice, obj);
> > > > +	struct iommufd_viommu *viommu = vdev->viommu;
> > > > +
> > > > +	/* xa_cmpxchg is okay to fail if alloc returned -EEXIST previously */
> > > > +	xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL);
> > > 
> > > There are crazy races that would cause this not to work. Another
> > > thread could have successfully destroyed whatever caused EEXIST and
> > > the successfully registered this same vdev to the same id. Then this
> > > will wrongly erase the other threads entry.
> > >
> > > It would be better to skip the erase directly if the EEXIST unwind is
> > > being taken.
> >
> > Hmm, is the "another thread" an alloc() or a destroy()? 
> 
> I was thinking both
> 
> > It doesn't seem to me that there could be another destroy() on the
> > same object since this current destroy() is the abort to an
> > unfinalized object.  And it doesn't seem that another alloc() will
> > get the same vdev ptr since every vdev allocation in the alloc()
> > will be different?
> 
> Ah so you are saying that since the vdev 'old' is local to this thread
> it can't possibly by aliased by another?
> 
> I was worried the id could be aliased, but yes, that seems right that
> the vdev cmpxchg would reject that.
> 
> So lets leave it

Ack. I'll still update this since xa_cmpxchg can give other errno:
+	/* xa_cmpxchg is okay to fail if alloc returned -EEXIST previously */
-	/* xa_cmpxchg is okay to fail if alloc failed xa_cmpxchg previously */

Thanks
Nicolin