Message ID | 20230511143844.22693-1-yi.l.liu@intel.com |
---|---|
Series | iommufd: Add nesting infrastructure |
On Fri, May 19, 2023 at 09:56:04AM +0000, Tian, Kevin wrote:
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Thursday, May 11, 2023 10:39 PM
> >
> > Lu Baolu (2):
> >   iommu: Add new iommu op to create domains owned by userspace
> >   iommu: Add nested domain support
> >
> > Nicolin Chen (5):
> >   iommufd/hw_pagetable: Do not populate user-managed hw_pagetables
> >   iommufd/selftest: Add domain_alloc_user() support in iommu mock
> >   iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with user data
> >   iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
> >   iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
> >
> > Yi Liu (4):
> >   iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation
> >   iommufd: Pass parent hwpt and user_data to
> >     iommufd_hw_pagetable_alloc()
> >   iommufd: IOMMU_HWPT_ALLOC allocation with user data
> >   iommufd: Add IOMMU_HWPT_INVALIDATE
> >
>
> I didn't see any change in iommufd_hw_pagetable_attach() to handle
> stage-1 hwpt differently.
>
> In concept whatever reserved regions existing on a device should be
> directly reflected on the hwpt which the device is attached to.
>
> So with nesting presumably the reserved regions of the device have
> been reported to the userspace and it's user's responsibility to avoid
> allocating IOVA from those reserved regions in stage-1 hwpt.

Presumably

> It's not necessary to add reserved regions to the IOAS of the parent
> hwpt since the device doesn't access that address space after it's
> attached to stage-1. The parent is used only for address translation
> in the iommu side.

But if we don't put them in the IOAS of the parent there is no way for
userspace to learn what they are to forward to the VM?

Since we expect the parent IOAS to be usable in an identity mode I
think they should be added, at least I can't see a reason not to add
them.

Which is definitely complicating some parts of this..

Jason
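For readers following along, the "user's responsibility to avoid allocating IOVA from those reserved regions" point amounts to an overlap check on the userspace side. The following is only a minimal sketch under assumptions: the struct, the function name and the example ranges are hypothetical and are not part of the series or of any iommufd uAPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one reserved IOVA range, inclusive on both ends. */
struct sketch_resv_region {
	uint64_t start;
	uint64_t last;
};

/* Made-up example ranges; real ones would be whatever the device reports. */
static const struct sketch_resv_region sketch_resv[] = {
	{ 0x08000000, 0x080fffff },
	{ 0xfee00000, 0xfeefffff },
};
#define SKETCH_RESV_COUNT (sizeof(sketch_resv) / sizeof(sketch_resv[0]))

/*
 * Return true if [start, last] does not intersect any reserved region,
 * i.e. the range could be handed out as stage-1 IOVA.
 */
static bool sketch_iova_usable(const struct sketch_resv_region *resv, size_t n,
			       uint64_t start, uint64_t last)
{
	assert(start <= last);

	for (size_t i = 0; i < n; i++) {
		/* Two inclusive ranges overlap when each starts before the other ends. */
		if (start <= resv[i].last && resv[i].start <= last)
			return false;
	}
	return true;
}
```

A VMM-side allocator would call something like sketch_iova_usable() before placing a stage-1 mapping. How it learns the ranges in the first place is exactly what the rest of this thread debates.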
On Wed, May 24, 2023 at 03:48:43AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Friday, May 19, 2023 7:50 PM
> >
> > On Fri, May 19, 2023 at 09:56:04AM +0000, Tian, Kevin wrote:
> > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > Sent: Thursday, May 11, 2023 10:39 PM
> > > >
> > > > Lu Baolu (2):
> > > >   iommu: Add new iommu op to create domains owned by userspace
> > > >   iommu: Add nested domain support
> > > >
> > > > Nicolin Chen (5):
> > > >   iommufd/hw_pagetable: Do not populate user-managed hw_pagetables
> > > >   iommufd/selftest: Add domain_alloc_user() support in iommu mock
> > > >   iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with user data
> > > >   iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
> > > >   iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
> > > >
> > > > Yi Liu (4):
> > > >   iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation
> > > >   iommufd: Pass parent hwpt and user_data to
> > > >     iommufd_hw_pagetable_alloc()
> > > >   iommufd: IOMMU_HWPT_ALLOC allocation with user data
> > > >   iommufd: Add IOMMU_HWPT_INVALIDATE
> > > >
> > >
> > > I didn't see any change in iommufd_hw_pagetable_attach() to handle
> > > stage-1 hwpt differently.
> > >
> > > In concept whatever reserved regions existing on a device should be
> > > directly reflected on the hwpt which the device is attached to.
> > >
> > > So with nesting presumably the reserved regions of the device have
> > > been reported to the userspace and it's user's responsibility to avoid
> > > allocating IOVA from those reserved regions in stage-1 hwpt.
> >
> > Presumably
> >
> > > It's not necessary to add reserved regions to the IOAS of the parent
> > > hwpt since the device doesn't access that address space after it's
> > > attached to stage-1. The parent is used only for address translation
> > > in the iommu side.
> >
> > But if we don't put them in the IOAS of the parent there is no way for
> > userspace to learn what they are to forward to the VM ?
>
> emmm I wonder whether that is the right interface to report
> per-device reserved regions.

The iommu driver needs to report different reserved regions for the S1
and S2 iommu_domains, and the IOAS should only get the reserved
regions for the S2.

Currently the API has no way to report per-domain reserved regions and
that is possibly OK for now. The S2 really doesn't have reserved
regions beyond the domain aperture.

So an ioctl to directly query the reserved regions for a dev_id makes
sense.

> > Since we expect the parent IOAS to be usable in an identity mode I
> > think they should be added, at least I can't see a reason not to add
> > them.
>
> this is a good point.

But it mixes things

The S2 doesn't have reserved ranges restrictions, we always have some
model of a S1, even for identity mode, that would carry the reserved
ranges.

> With that it makes more sense to make it a vendor specific choice.

It isn't vendor specific, the ranges come from the domain that is
attached to the IOAS, and we simply don't import ranges for a S2
domain.

Jason
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, June 6, 2023 10:18 PM
>
> > > > It's not necessary to add reserved regions to the IOAS of the parent
> > > > hwpt since the device doesn't access that address space after it's
> > > > attached to stage-1. The parent is used only for address translation
> > > > in the iommu side.
> > >
> > > But if we don't put them in the IOAS of the parent there is no way for
> > > userspace to learn what they are to forward to the VM ?
> >
> > emmm I wonder whether that is the right interface to report
> > per-device reserved regions.
>
> The iommu driver needs to report different reserved regions for the S1
> and S2 iommu_domains,

I can see the difference between RID and RID+PASID, but I'm not sure
whether it's an actual requirement with regard to the attached domain.

e.g. if only talking about RID then the same set of reserved regions
should be reported for both S1 attach and S2 attach.

> and the IOAS should only get the reserved regions for the S2.
>
> Currently the API has no way to report per-domain reserved regions and
> that is possibly OK for now. The S2 really doesn't have reserved
> regions beyond the domain aperture.
>
> So an ioctl to directly query the reserved regions for a dev_id makes
> sense.

Or more specifically query the reserved regions for RID-based access.

Ideally for PASID there is no reserved region, otherwise SVA won't work. 😊

> > > Since we expect the parent IOAS to be usable in an identity mode I
> > > think they should be added, at least I can't see a reason not to add
> > > them.
> >
> > this is a good point.
>
> But it mixes things
>
> The S2 doesn't have reserved ranges restrictions, we always have some
> model of a S1, even for identity mode, that would carry the reserved
> ranges.
>
> > With that it makes more sense to make it a vendor specific choice.
>
> It isn't vendor specific, the ranges come from the domain that is
> attached to the IOAS, and we simply don't import ranges for a S2
> domain.
>

With the above I think the ranges are static per device.

When talking about RID-based nesting alone, ARM needs to add reserved
regions to the parent IOAS, as identity is a valid S1 mode in nesting.

But for Intel, RID nesting excludes identity (which becomes a direct
attach to S2), so the reserved regions apply to S1 instead of the parent
IOAS.
On Tue, Jun 20, 2023 at 01:43:42AM +0000, Tian, Kevin wrote:

> I wonder whether we have argued past each other.
>
> This series adds reserved regions to S2. I challenged the necessity as
> S2 is not directly accessed by the device.
>
> Then you replied that doing so still made sense to support identity
> S1.

I think I said/meant if we attach the "s2" iommu domain as a direct
attach for identity - eg at boot time, then the IOAS must gain the
reserved regions. This is our normal protocol.

But when we use the "s2" iommu domain as an actual nested S2 then we
don't gain reserved regions.

> Intel VT-d supports 4 configurations:
>   - passthrough (i.e. identity mapped)
>   - S1 only
>   - S2 only
>   - nested
>
> 'S2 only' is used when vIOMMU is configured in passthrough.

S2 only is modeled as attaching an S2 format iommu domain to the RID,
and when this is done the IOAS should gain the reserved regions
because it is no different behavior than attaching any other iommu
domain to a RID.

When the S2 is replaced with a S1 nest then the IOAS should lose
those reserved regions since it is no longer attached to a RID.

> My understanding of ARM SMMU is that from host p.o.v. the CD is the
> S1 in the nested configuration. 'identity' is one configuration in the CD
> then it's in the business of nesting.

I think it is the same. A CD doesn't come into the picture until the
guest installs a CD pointing STE. Until that time the S2 is being used
as identity.

It sounds like the same basic flow.

> My preference was that ALLOC_HWPT allows vIOMMU to opt whether
> reserved regions of dev_id should be added to the IOAS of the parent
> S2 hwpt.

Having an API to explicitly load reserved regions of a specific device
to an IOAS makes some sense to me.

Jason
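The rule described in this message (the IOAS gains a device's reserved regions while the S2 domain is attached directly to the RID, and loses them once the S2 becomes the parent of a nested S1) can be written down as a tiny model for a single device. This is a sketch of the stated behavior only: the enum and function names are hypothetical and do not correspond to iommufd code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of how one device currently uses the S2 domain. */
enum sketch_attach {
	SKETCH_DETACHED,   /* device not attached at all */
	SKETCH_S2_DIRECT,  /* S2 format domain attached straight to the RID */
	SKETCH_S1_NESTED,  /* RID attached to a nested S1, S2 is only the parent */
};

/*
 * Should the IOAS behind the S2 carry this device's reserved regions?
 * Only a direct attach makes the device access the IOAS address space.
 */
static bool sketch_ioas_holds_dev_resv(enum sketch_attach mode)
{
	switch (mode) {
	case SKETCH_S2_DIRECT:
		return true;
	case SKETCH_S1_NESTED:
	case SKETCH_DETACHED:
		return false;
	}
	assert(!"unknown attach mode");
	return false;
}
```

Replacing a direct S2 attach with a nested S1 therefore flips the answer from true to false, which matches the "gain" and "lose" wording above. The SW_MSI case raised later in the thread is not covered by this model.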
On Wed, Jun 21, 2023 at 06:02:21AM +0000, Tian, Kevin wrote:
> > On Tue, Jun 20, 2023 at 01:43:42AM +0000, Tian, Kevin wrote:
> > > I wonder whether we have argued past each other.
> > >
> > > This series adds reserved regions to S2. I challenged the necessity as
> > > S2 is not directly accessed by the device.
> > >
> > > Then you replied that doing so still made sense to support identity
> > > S1.
> >
> > I think I said/meant if we attach the "s2" iommu domain as a direct
> > attach for identity - eg at boot time, then the IOAS must gain the
> > reserved regions. This is our normal protocol.
> >
> > But when we use the "s2" iommu domain as an actual nested S2 then we
> > don't gain reserved regions.
>
> Then we're aligned.
>
> Yi/Nicolin, please update this series to not automatically add reserved
> regions to S2 in the nesting configuration.

I'm a bit late to the conversation here. Yet, how about the
IOMMU_RESV_SW_MSI on ARM in the nesting configuration? We'd still call
iommufd_group_setup_msi() on the S2 HWPT, despite attaching the device
to a nested S1 HWPT, right?

> It also implies that the user cannot rely on IOAS_IOVA_RANGES to
> learn reserved regions for arranging addresses in S1.
>
> Then we also need a new ioctl to report reserved regions per dev_id.

So, in a nesting configuration, QEMU would poll a device's S2 MSI
region (i.e. IOMMU_RESV_SW_MSI) to prevent conflict?

Thanks
Nic
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, June 21, 2023 8:05 PM
>
> On Wed, Jun 21, 2023 at 06:02:21AM +0000, Tian, Kevin wrote:
> > > > My understanding of ARM SMMU is that from host p.o.v. the CD is the
> > > > S1 in the nested configuration. 'identity' is one configuration in the CD
> > > > then it's in the business of nesting.
> > >
> > > I think it is the same. A CD doesn't come into the picture until the
> > > guest installs a CD pointing STE. Until that time the S2 is being used
> > > as identity.
> > >
> > > It sounds like the same basic flow.
> >
> > After a CD table is installed in a STE I assume the SMMU still allows to
> > configure an individual CD entry as identity? e.g. while vSVA is enabled
> > on a device the guest can continue to keep CD#0 as identity when the
> > default domain of the device is set as 'passthrough'. In this case the
> > IOAS still needs to gain reserved regions even though S2 is not directly
> > attached from host p.o.v.
>
> In any nesting configuration the hypervisor cannot directly restrict
> what IOVA the guest will use. The VM could make a normal nest and try
> to use unusable IOVA. Identity is not really special.

Sure. What I was talking about is the end result, e.g. after the user
explicitly requests to load reserved regions into an IOAS.

>
> The VMM should construct the guest memory map so that an identity
> iommu_domain can meet the reserved requirements - it needs to do this
> anyhow for the initial boot part. It should try to forward the
> reserved regions to the guest via ACPI/etc.

Yes.

>
> Being able to explicitly load reserved regions into an IOAS seems like
> a useful way to help construct this.
>

And it's correct in concept because the IOAS is 'implicitly' accessed by
the device when the guest domain is identity in this case.
On Tue, Jun 27, 2023 at 06:02:13AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Tuesday, June 27, 2023 1:29 AM
> >
> > > I'm not sure whether the MSI region needs a special MSI type or
> > > just a general RESV_DIRECT type for 1:1 mapping, though.
> >
> > I don't quite get this part. Isn't MSI having IOMMU_RESV_MSI
> > and IOMMU_RESV_SW_MSI? Or does it just mean we should report
> > the iommu_resv_type along with reserved regions in new ioctl?
> >
>
> Currently those are iommu internal types. When defining the new
> ioctl we need think about what are necessary presenting to the user.
>
> Probably just a list of reserved regions plus a flag to mark which
> one is SW_MSI? Except SW_MSI all other reserved region types
> just need the user to reserve them w/o knowing more detail.

I think I prefer the idea we just import the reserved regions from a
devid and do not expose any of this detail to userspace.

Kernel can make only the SW_MSI a mandatory cut out when the S2 is
attached.

Jason
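One possible reading of this proposal, sketched below purely as an illustration: a device's regions become cut-outs in the IOAS either because userspace explicitly asked for them to be imported, or because the region is the SW_MSI window, which is always cut out. The types, names and example ranges are hypothetical stand-ins. The kernel's real enum iommu_resv_type is deliberately not reproduced here, and the actual interface was still undecided at this point in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel-internal region types. */
enum sketch_resv_type {
	SKETCH_RESV_DIRECT,
	SKETCH_RESV_MSI,
	SKETCH_RESV_SW_MSI,
};

struct sketch_dev_resv {
	uint64_t start;
	uint64_t last;
	enum sketch_resv_type type;
};

/* Made-up regions for one imaginary device. */
static const struct sketch_dev_resv sketch_dev[] = {
	{ 0x08000000, 0x080fffff, SKETCH_RESV_SW_MSI },
	{ 0x000a0000, 0x000bffff, SKETCH_RESV_DIRECT },
	{ 0xfee00000, 0xfeefffff, SKETCH_RESV_MSI },
};

/* SW_MSI is always cut out; everything else only on explicit import. */
static bool sketch_is_cut_out(const struct sketch_dev_resv *r, bool imported)
{
	return imported || r->type == SKETCH_RESV_SW_MSI;
}

/* Count how many of the device's regions end up as IOAS cut-outs. */
static size_t sketch_count_cut_outs(const struct sketch_dev_resv *dev,
				    size_t n, bool imported)
{
	size_t count = 0;

	assert(dev != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (sketch_is_cut_out(&dev[i], imported))
			count++;
	}
	return count;
}
```

With the three made-up regions, a plain attach leaves one cut-out (the SW_MSI window) and an explicit import yields all three. Userspace never sees the type field in this model, which is the point of not exposing the detail.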