Message ID | f205a4e2f5971cd4b1033d7cac41683e10ebabfb.1740504232.git.nicolinc@nvidia.com |
---|---|
State | New |
Series | iommufd: Add vIOMMU infrastructure (Part-3: vEVENTQ) |
On Tue, Mar 11, 2025 at 03:57:16PM +0000, Will Deacon wrote:
> On Tue, Feb 25, 2025 at 09:25:40AM -0800, Nicolin Chen wrote:
> > Use it to store all vSMMU-related data. The vsid (Virtual Stream ID) will
> > be the first use case. Since the vsid reader will be the eventq handler
> > that already holds a streams_mutex, reuse that to fenche the vmaster too.
>
> "fenche"? fence :)

And I fixed other nits too.

> > +int arm_smmu_attach_prepare_vmaster(struct arm_smmu_attach_state *state,
> > +				    struct arm_smmu_nested_domain *nested_domain)
> > +{
> > +	struct arm_smmu_vmaster *vmaster;
> > +	unsigned long vsid;
> > +	int ret;
> > +
> > +	iommu_group_mutex_assert(state->master->dev);
> > +
> > +	/* Skip invalid vSTE */
> > +	if (!(nested_domain->ste[0] & cpu_to_le64(STRTAB_STE_0_V)))
> > +		return 0;
>
> Ok, and we don't need to set 'state->vmaster' in this case because we
> only report stage-1 faults back to the vSMMU?

This is a good question that I didn't ask myself hard enough..

I think we should probably drop it. An invalid STE should trigger
a C_BAD_STE event that is in the supported vEVENT list. I'll run
some tests before removing this line from v9.

> With the nits fixed:
>
> Acked-by: Will Deacon <will@kernel.org>

Thanks!
Nicolin
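[For context on the locking note above: the consumer of master->vmaster is the physical event queue handler, which already runs under smmu->streams_mutex when it resolves a StreamID to a master. A minimal sketch of that reader side, assuming a hypothetical forwarding helper arm_vsmmu_report_event(); the actual vEVENTQ patches may name and shape this differently:]

/* Sketch only: helper name and event fixup are assumptions, not the series' code */
static int arm_smmu_forward_event_to_vsmmu(struct arm_smmu_device *smmu,
					   struct arm_smmu_master *master,
					   u64 *evt)
{
	struct arm_smmu_vmaster *vmaster;

	lockdep_assert_held(&smmu->streams_mutex);

	vmaster = master->vmaster;
	if (!vmaster)
		return -EOPNOTSUPP; /* no vSMMU owns this master */

	/* Swap the physical StreamID for the virtual one before forwarding */
	evt[0] &= ~EVTQ_0_SID;
	evt[0] |= FIELD_PREP(EVTQ_0_SID, vmaster->vsid);

	/* Hypothetical helper: push the fixed-up record into the vEVENTQ */
	return arm_vsmmu_report_event(vmaster->vsmmu, evt);
}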
On Mon, Mar 17, 2025 at 12:44:23PM -0300, Jason Gunthorpe wrote:
> On Tue, Mar 11, 2025 at 10:43:08AM -0700, Nicolin Chen wrote:
> > > > +int arm_smmu_attach_prepare_vmaster(struct arm_smmu_attach_state *state,
> > > > +				    struct arm_smmu_nested_domain *nested_domain)
> > > > +{
> > > > +	struct arm_smmu_vmaster *vmaster;
> > > > +	unsigned long vsid;
> > > > +	int ret;
> > > > +
> > > > +	iommu_group_mutex_assert(state->master->dev);
> > > > +
> > > > +	/* Skip invalid vSTE */
> > > > +	if (!(nested_domain->ste[0] & cpu_to_le64(STRTAB_STE_0_V)))
> > > > +		return 0;
> > >
> > > Ok, and we don't need to set 'state->vmaster' in this case because we
> > > only report stage-1 faults back to the vSMMU?
> >
> > This is a good question that I didn't ask myself hard enough..
> >
> > I think we should probably drop it. An invalid STE should trigger
> > a C_BAD_STE event that is in the supported vEVENT list. I'll run
> > some tests before removing this line from v9.
>
> It won't trigger C_BAD_STE, recall Robin was opposed to that, so we have this:
>
> static void arm_smmu_make_nested_domain_ste(
> 	struct arm_smmu_ste *target, struct arm_smmu_master *master,
> 	struct arm_smmu_nested_domain *nested_domain, bool ats_enabled)
> {
> 	unsigned int cfg =
> 		FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(nested_domain->ste[0]));
>
> 	/*
> 	 * Userspace can request a non-valid STE through the nesting interface.
> 	 * We relay that into an abort physical STE with the intention that
> 	 * C_BAD_STE for this SID can be generated to userspace.
> 	 */
> 	if (!(nested_domain->ste[0] & cpu_to_le64(STRTAB_STE_0_V)))
> 		cfg = STRTAB_STE_0_CFG_ABORT;
>
> So, in the case of a non-valid STE, and a device access, the HW will
> generate one of the translation faults and that will be forwarded.
>
> Some software component will have to transform those fault events into
> C_BAD_STE for the VM.

Hmm, double-checked the spec. It does say that C_BAD_STE would be
triggered:

" V, bit [0] STE Valid.
[...]
Device transactions that select an STE with this field configured
to 0 are terminated with an abort reported back to the device and
a C_BAD_STE event is recorded."

I also did a hack test unsetting the V bit in the kernel. Then, the
HW did report C_BAD_STE (0x4) back to the VM (via vEVENTQ).

Thanks
Nicolin
On Mon, Mar 17, 2025 at 11:49:14AM -0700, Nicolin Chen wrote:
> On Mon, Mar 17, 2025 at 12:44:23PM -0300, Jason Gunthorpe wrote:
> > On Tue, Mar 11, 2025 at 10:43:08AM -0700, Nicolin Chen wrote:
> > > > > +int arm_smmu_attach_prepare_vmaster(struct arm_smmu_attach_state *state,
> > > > > +				    struct arm_smmu_nested_domain *nested_domain)
> > > > > +{
> > > > > +	struct arm_smmu_vmaster *vmaster;
> > > > > +	unsigned long vsid;
> > > > > +	int ret;
> > > > > +
> > > > > +	iommu_group_mutex_assert(state->master->dev);
> > > > > +
> > > > > +	/* Skip invalid vSTE */
> > > > > +	if (!(nested_domain->ste[0] & cpu_to_le64(STRTAB_STE_0_V)))
> > > > > +		return 0;
> > > >
> > > > Ok, and we don't need to set 'state->vmaster' in this case because we
> > > > only report stage-1 faults back to the vSMMU?
> > >
> > > This is a good question that I didn't ask myself hard enough..
> > >
> > > I think we should probably drop it. An invalid STE should trigger
> > > a C_BAD_STE event that is in the supported vEVENT list. I'll run
> > > some tests before removing this line from v9.
> >
> > It won't trigger C_BAD_STE, recall Robin was opposed to that, so we have this:
> >
> > static void arm_smmu_make_nested_domain_ste(
> > 	struct arm_smmu_ste *target, struct arm_smmu_master *master,
> > 	struct arm_smmu_nested_domain *nested_domain, bool ats_enabled)
> > {
> > 	unsigned int cfg =
> > 		FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(nested_domain->ste[0]));
> >
> > 	/*
> > 	 * Userspace can request a non-valid STE through the nesting interface.
> > 	 * We relay that into an abort physical STE with the intention that
> > 	 * C_BAD_STE for this SID can be generated to userspace.
> > 	 */
> > 	if (!(nested_domain->ste[0] & cpu_to_le64(STRTAB_STE_0_V)))
> > 		cfg = STRTAB_STE_0_CFG_ABORT;
> >
> > So, in the case of a non-valid STE, and a device access, the HW will
> > generate one of the translation faults and that will be forwarded.
> >
> > Some software component will have to transform those fault events into
> > C_BAD_STE for the VM.
>
> Hmm, double-checked the spec. It does say that C_BAD_STE would be
> triggered:
>
> " V, bit [0] STE Valid.
> [...]
> Device transactions that select an STE with this field configured
> to 0 are terminated with an abort reported back to the device and
> a C_BAD_STE event is recorded."
>
> I also did a hack test unsetting the V bit in the kernel. Then, the
> HW did report C_BAD_STE (0x4) back to the VM (via vEVENTQ).

Yes, I expect that C_BAD_STE will forward just fine. But, as above, it
should never be generated by HW because the hypervisor kernel will
never install a bad STE; we detect that and convert it to abort.

Jason
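[For reference, synthesizing the event Jason mentions would be small if it were ever needed: per the spec text quoted above, C_BAD_STE is event code 0x4, and the StreamID sits in bits [63:32] of the first doubleword of the 32-byte event record. A sketch under those assumptions; EVT_ID_C_BAD_STE is defined locally for this example, while EVTQ_0_ID and EVTQ_0_SID are the driver's existing field masks:]

#define EVT_ID_C_BAD_STE	0x04	/* event code, per the SMMUv3 spec */

/* Sketch: build a minimal C_BAD_STE record carrying the virtual StreamID */
static void arm_smmu_make_c_bad_ste_evt(u64 evt[4], u32 vsid)
{
	memset(evt, 0, 4 * sizeof(*evt));	/* event records are 32 bytes */
	evt[0] = FIELD_PREP(EVTQ_0_ID, EVT_ID_C_BAD_STE) |
		 FIELD_PREP(EVTQ_0_SID, vsid);
}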
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index bd9d7c85576a..36961a3579f2 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -799,6 +799,11 @@ struct arm_smmu_stream {
 	struct rb_node node;
 };
 
+struct arm_smmu_vmaster {
+	struct arm_vsmmu *vsmmu;
+	unsigned long vsid;
+};
+
 struct arm_smmu_event {
 	u8 stall : 1,
 	   ssv : 1,
@@ -824,6 +829,7 @@ struct arm_smmu_master {
 	struct arm_smmu_device *smmu;
 	struct device *dev;
 	struct arm_smmu_stream *streams;
+	struct arm_smmu_vmaster *vmaster; /* use smmu->streams_mutex */
 	/* Locked by the iommu core using the group mutex */
 	struct arm_smmu_ctx_desc_cfg cd_table;
 	unsigned int num_streams;
@@ -972,6 +978,7 @@ struct arm_smmu_attach_state {
 	bool disable_ats;
 	ioasid_t ssid;
 	/* Resulting state */
+	struct arm_smmu_vmaster *vmaster;
 	bool ats_enabled;
 };
 
@@ -1055,9 +1062,30 @@ struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev,
 				       struct iommu_domain *parent,
 				       struct iommufd_ctx *ictx,
 				       unsigned int viommu_type);
+int arm_smmu_attach_prepare_vmaster(struct arm_smmu_attach_state *state,
+				    struct arm_smmu_nested_domain *nested_domain);
+void arm_smmu_attach_commit_vmaster(struct arm_smmu_attach_state *state);
+void arm_smmu_master_clear_vmaster(struct arm_smmu_master *master);
 #else
 #define arm_smmu_hw_info NULL
 #define arm_vsmmu_alloc NULL
+
+static inline int
+arm_smmu_attach_prepare_vmaster(struct arm_smmu_attach_state *state,
+				struct arm_smmu_nested_domain *nested_domain)
+{
+	return 0; /* NOP */
+}
+
+static inline void
+arm_smmu_attach_commit_vmaster(struct arm_smmu_attach_state *state)
+{
+}
+
+static inline void
+arm_smmu_master_clear_vmaster(struct arm_smmu_master *master)
+{
+}
 #endif /* CONFIG_ARM_SMMU_V3_IOMMUFD */
 
 #endif /* _ARM_SMMU_V3_H */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index 5aa2e7af58b4..6b712b1ab429 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -85,6 +85,51 @@ static void arm_smmu_make_nested_domain_ste(
 	}
 }
 
+int arm_smmu_attach_prepare_vmaster(struct arm_smmu_attach_state *state,
+				    struct arm_smmu_nested_domain *nested_domain)
+{
+	struct arm_smmu_vmaster *vmaster;
+	unsigned long vsid;
+	int ret;
+
+	iommu_group_mutex_assert(state->master->dev);
+
+	/* Skip invalid vSTE */
+	if (!(nested_domain->ste[0] & cpu_to_le64(STRTAB_STE_0_V)))
+		return 0;
+
+	ret = iommufd_viommu_get_vdev_id(&nested_domain->vsmmu->core,
+					 state->master->dev, &vsid);
+	if (ret)
+		return ret;
+
+	vmaster = kzalloc(sizeof(*vmaster), GFP_KERNEL);
+	if (!vmaster)
+		return -ENOMEM;
+	vmaster->vsmmu = nested_domain->vsmmu;
+	vmaster->vsid = vsid;
+	state->vmaster = vmaster;
+
+	return 0;
+}
+
+void arm_smmu_attach_commit_vmaster(struct arm_smmu_attach_state *state)
+{
+	struct arm_smmu_master *master = state->master;
+
+	mutex_lock(&master->smmu->streams_mutex);
+	kfree(master->vmaster);
+	master->vmaster = state->vmaster;
+	mutex_unlock(&master->smmu->streams_mutex);
+}
+
+void arm_smmu_master_clear_vmaster(struct arm_smmu_master *master)
+{
+	struct arm_smmu_attach_state state = { .master = master };
+
+	arm_smmu_attach_commit_vmaster(&state);
+}
+
 static int arm_smmu_attach_dev_nested(struct iommu_domain *domain,
 				      struct device *dev)
 {
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 358072b4e293..964d2cf27d3d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2803,6 +2803,7 @@ int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state,
 	struct arm_smmu_domain *smmu_domain =
 		to_smmu_domain_devices(new_domain);
 	unsigned long flags;
+	int ret;
 
 	/*
 	 * arm_smmu_share_asid() must not see two domains pointing to the same
@@ -2832,9 +2833,18 @@ int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state,
 	}
 
 	if (smmu_domain) {
+		if (new_domain->type == IOMMU_DOMAIN_NESTED) {
+			ret = arm_smmu_attach_prepare_vmaster(
+				state, to_smmu_nested_domain(new_domain));
+			if (ret)
+				return ret;
+		}
+
 		master_domain = kzalloc(sizeof(*master_domain), GFP_KERNEL);
-		if (!master_domain)
+		if (!master_domain) {
+			kfree(state->vmaster);
 			return -ENOMEM;
+		}
 		master_domain->master = master;
 		master_domain->ssid = state->ssid;
 		if (new_domain->type == IOMMU_DOMAIN_NESTED)
@@ -2861,6 +2871,7 @@ int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state,
 			spin_unlock_irqrestore(&smmu_domain->devices_lock,
 					       flags);
 			kfree(master_domain);
+			kfree(state->vmaster);
 			return -EINVAL;
 		}
 
@@ -2893,6 +2904,8 @@ void arm_smmu_attach_commit(struct arm_smmu_attach_state *state)
 
 	lockdep_assert_held(&arm_smmu_asid_lock);
 
+	arm_smmu_attach_commit_vmaster(state);
+
 	if (state->ats_enabled && !master->ats_enabled) {
 		arm_smmu_enable_ats(master);
 	} else if (state->ats_enabled && master->ats_enabled) {
@@ -3162,6 +3175,7 @@ static int arm_smmu_attach_dev_identity(struct iommu_domain *domain,
 	struct arm_smmu_ste ste;
 	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
 
+	arm_smmu_master_clear_vmaster(master);
 	arm_smmu_make_bypass_ste(master->smmu, &ste);
 	arm_smmu_attach_dev_ste(domain, dev, &ste, STRTAB_STE_1_S1DSS_BYPASS);
 	return 0;
@@ -3180,7 +3194,9 @@ static int arm_smmu_attach_dev_blocked(struct iommu_domain *domain,
 					struct device *dev)
 {
 	struct arm_smmu_ste ste;
+	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
 
+	arm_smmu_master_clear_vmaster(master);
 	arm_smmu_make_abort_ste(&ste);
 	arm_smmu_attach_dev_ste(domain, dev, &ste,
 				STRTAB_STE_1_S1DSS_TERMINATE);
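[Reading the hunks together, the lifecycle this patch wires up can be recapped as follows; a summary of the code above, not additional changes from the series:]

/*
 * Call-flow recap of the diff above (summary only, not part of the patch):
 *
 * Attach to a nested domain:
 *   arm_smmu_attach_prepare()
 *     arm_smmu_attach_prepare_vmaster()  - under the group mutex: look up
 *                                          the vSID, allocate the vmaster,
 *                                          park it in state->vmaster
 *   arm_smmu_attach_commit()
 *     arm_smmu_attach_commit_vmaster()   - under streams_mutex: free the old
 *                                          master->vmaster, install the new
 *                                          one (NULL for non-nested domains)
 *
 * Attach to identity/blocked domains:
 *   arm_smmu_master_clear_vmaster()      - commits a NULL vmaster so the
 *                                          eventq handler stops forwarding
 */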