Message ID | 20230209041642.9346-4-yi.l.liu@intel.com |
---|---|
State | New |
Headers | show |
Series | iommufd: Add iommu capability reporting | expand |
> From: Liu, Yi L <yi.l.liu@intel.com> > Sent: Thursday, February 9, 2023 12:17 PM > > +int iommufd_device_get_info(struct iommufd_ucmd *ucmd) > +{ > + struct iommu_device_info *cmd = ucmd->cmd; > + struct iommufd_object *dev_obj; > + struct device *dev; > + const struct iommu_ops *ops; > + void *data; > + unsigned int length, data_len; > + int rc; > + > + if (cmd->flags || cmd->__reserved || !cmd->data_len) > + return -EOPNOTSUPP; do we want !cmd->data_len being a way to check how many bytes are required in a following call to get the vendor info? > + > + /* > + * Zero the trailing bytes for userspace if the buffer is bigger > + * than the data size kernel actually has. > + */ "Zero the trailing bytes if the user buffer ..." > + > +/** > + * struct iommu_device_info - ioctl(IOMMU_DEVICE_GET_INFO) > + * @size: sizeof(struct iommu_device_info) > + * @flags: Must be 0 > + * @dev_id: The device being attached to the IOMMU iommufd > + * @data_len: Input the type specific data buffer length in bytes also is an output field > + * > + * Query the hardware iommu capability for given device which has been > + * bound to iommufd. @data_len is set to be the size of the buffer to > + * type specific data and the data will be filled. Trailing bytes are "@data_len is the size of the buffer which captures iommu type specific data" > + * zeroed if the user buffer is larger than the data kernel has. > + * > + * The type specific data would be used to sync capability between the > + * vIOMMU and the hardware IOMMU, also for the availabillity checking of > + * iommu hardware features like dirty page tracking in I/O page table. It's fine to report format information related to stage-1 page table which userspace manages. but IMHO this should not be an interface to report which capability is supported by iommufd. Having hardware supporting dirty bit doesn't mean the underlying iommu driver provides necessary support to iommufd dirty tracking. We either don't state this way if following what this series does which simply dumps all raw iommu hw info to userspace, or explicitly require iommu driver to only report information as required when a feature is supported by iommufd. > + * > + * The @out_device_type will be filled if the ioctl succeeds. It would s/will be/is/ > + * be used to decode the data filled in the buffer pointed by @data_ptr. s/would be/is/ > + */ > +struct iommu_device_info { > + __u32 size; > + __u32 flags; > + __u32 dev_id; > + __u32 data_len; > + __aligned_u64 data_ptr; moving forward iommu hw cap is not the only information reported via this interface, e.g. it can be also used to report pasid mode. from this angle it's better to rename above two fields to be iommu specific, e.g.: __u32 iommu_data_len; __aligned_u64 iommu_data_ptr; > + __u32 out_device_type; > + __u32 __reserved; > +}; > +#define IOMMU_DEVICE_GET_INFO _IO(IOMMUFD_TYPE, > IOMMUFD_CMD_DEVICE_GET_INFO) > #endif > -- > 2.34.1
On 10/02/2023 07:55, Tian, Kevin wrote: >> From: Liu, Yi L <yi.l.liu@intel.com> >> Sent: Thursday, February 9, 2023 12:17 PM >> + * zeroed if the user buffer is larger than the data kernel has. >> + * >> + * The type specific data would be used to sync capability between the >> + * vIOMMU and the hardware IOMMU, also for the availabillity checking of >> + * iommu hardware features like dirty page tracking in I/O page table. > > It's fine to report format information related to stage-1 page table > which userspace manages. > > but IMHO this should not be an interface to report which capability is > supported by iommufd. Having hardware supporting dirty bit > doesn't mean the underlying iommu driver provides necessary support > to iommufd dirty tracking. > +1 In fact this shouldn't really be a way to check any capability as we are dumping straight away the IOMMU hardware registers. By dumping raw IOMMU hardware data, forces the application to understand IOMMU hardware specific formats. Maybe that's OK for hw nesting as there's a lot of info you need to know for the vIOMMU pgtables, pasid and etc so both are tightly coupled. But it is a bit disconnected from what really the software (IOMMUFD) and driver can use, without getting onto assumptions. [Calling it IOMMU_DEVICE_GET_HW_INFO would be bit more obvious if you're not asking IOMMUFD support] For capability checking, I think this really should be returning capabilities that both IOMMU driver supports ... and that IOMMUFD understands and marshalled on a format of its own defined by IOMMUFD. Maybe using IOMMU_CAP or some other thing within the kernel (now that's per-device), or even simpler. That's at least had written initially for the dirty tracking series.
On Fri, Feb 10, 2023 at 11:10:34AM +0000, Joao Martins wrote: > On 10/02/2023 07:55, Tian, Kevin wrote: > >> From: Liu, Yi L <yi.l.liu@intel.com> > >> Sent: Thursday, February 9, 2023 12:17 PM > >> + * zeroed if the user buffer is larger than the data kernel has. > >> + * > >> + * The type specific data would be used to sync capability between the > >> + * vIOMMU and the hardware IOMMU, also for the availabillity checking of > >> + * iommu hardware features like dirty page tracking in I/O page table. > > > > It's fine to report format information related to stage-1 page table > > which userspace manages. > > > > but IMHO this should not be an interface to report which capability is > > supported by iommufd. Having hardware supporting dirty bit > > doesn't mean the underlying iommu driver provides necessary support > > to iommufd dirty tracking. > > > > +1 > > In fact this shouldn't really be a way to check any capability as we are dumping > straight away the IOMMU hardware registers. By dumping raw IOMMU hardware data, > forces the application to understand IOMMU hardware specific formats. Maybe > that's OK for hw nesting as there's a lot of info you need to know for the > vIOMMU pgtables, pasid and etc so both are tightly coupled. But it is a bit > disconnected from what really the software (IOMMUFD) and driver can use, without > getting onto assumptions. > > [Calling it IOMMU_DEVICE_GET_HW_INFO would be bit more obvious if you're not > asking IOMMUFD support] > > For capability checking, I think this really should be returning capabilities > that both IOMMU driver supports ... and that IOMMUFD understands and marshalled > on a format of its own defined by IOMMUFD. Maybe using IOMMU_CAP or some other > thing within the kernel (now that's per-device), or even simpler. That's at > least had written initially for the dirty tracking series. Yes, the design of IOMMU_DEVICE_GET_INFO is already split. The HW specific structure should only contain things related to operating the HW specific paths (ie other HW specific structures in other APIs) The enclosing struct should have general information about operating the device through the abstract API - like dirty tracking. But it may be that HW will individually report dirty tracking capabilties related to special page table types - ie can the S2 page table do dirty tracking Jason
On Fri, Feb 10, 2023 at 07:55:42AM +0000, Tian, Kevin wrote: > > From: Liu, Yi L <yi.l.liu@intel.com> > > Sent: Thursday, February 9, 2023 12:17 PM > > > > +int iommufd_device_get_info(struct iommufd_ucmd *ucmd) > > +{ > > + struct iommu_device_info *cmd = ucmd->cmd; > > + struct iommufd_object *dev_obj; > > + struct device *dev; > > + const struct iommu_ops *ops; > > + void *data; > > + unsigned int length, data_len; > > + int rc; > > + > > + if (cmd->flags || cmd->__reserved || !cmd->data_len) > > + return -EOPNOTSUPP; > > do we want !cmd->data_len being a way to check how many bytes are > required in a following call to get the vendor info? There is no reason, if the userspace doesn't have a struct large enough then it also doesn't know what the extra bytes should even be used for. No reason to read the. > > +struct iommu_device_info { > > + __u32 size; > > + __u32 flags; > > + __u32 dev_id; > > + __u32 data_len; > > + __aligned_u64 data_ptr; > > moving forward iommu hw cap is not the only information reported > via this interface, e.g. it can be also used to report pasid mode. > > from this angle it's better to rename above two fields to be iommu > specific, e.g.: > > __u32 iommu_data_len; > __aligned_u64 iommu_data_ptr; maybe call it hw info Jason
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 8a9834fc129a..3b64aef24807 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -134,6 +134,78 @@ void iommufd_device_unbind(struct iommufd_device *idev) } EXPORT_SYMBOL_NS_GPL(iommufd_device_unbind, IOMMUFD); +static int iommufd_zero_fill_user(u64 ptr, int bytes) +{ + int index = 0; + + for (; index < bytes; index++) { + if (put_user(0, (uint8_t __user *)(ptr + index))) + return -EFAULT; + } + return 0; +} + +int iommufd_device_get_info(struct iommufd_ucmd *ucmd) +{ + struct iommu_device_info *cmd = ucmd->cmd; + struct iommufd_object *dev_obj; + struct device *dev; + const struct iommu_ops *ops; + void *data; + unsigned int length, data_len; + int rc; + + if (cmd->flags || cmd->__reserved || !cmd->data_len) + return -EOPNOTSUPP; + + dev_obj = iommufd_get_object(ucmd->ictx, cmd->dev_id, + IOMMUFD_OBJ_DEVICE); + if (IS_ERR(dev_obj)) + return PTR_ERR(dev_obj); + + dev = container_of(dev_obj, struct iommufd_device, obj)->dev; + + ops = dev_iommu_ops(dev); + if (!ops || !ops->hw_info) { + rc = -EOPNOTSUPP; + goto out_put; + } + + data = ops->hw_info(dev, &data_len); + if (IS_ERR(data)) { + rc = PTR_ERR(data); + goto out_put; + } + + length = min(cmd->data_len, data_len); + if (copy_to_user(u64_to_user_ptr(cmd->data_ptr), data, length)) { + rc = -EFAULT; + goto out_free_data; + } + + /* + * Zero the trailing bytes for userspace if the buffer is bigger + * than the data size kernel actually has. + */ + if (length < cmd->data_len) { + rc = iommufd_zero_fill_user(cmd->data_ptr + length, + cmd->data_len - length); + if (rc) + goto out_free_data; + } + + cmd->out_device_type = ops->driver_type; + cmd->data_len = data_len; + + rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd)); + +out_free_data: + kfree(data); +out_put: + iommufd_put_object(dev_obj); + return rc; +} + static int iommufd_device_setup_msi(struct iommufd_device *idev, struct iommufd_hw_pagetable *hwpt, phys_addr_t sw_msi_start) diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 200c783800ad..4a0a1a7fdae1 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -258,6 +258,7 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, void iommufd_hw_pagetable_destroy(struct iommufd_object *obj); void iommufd_device_destroy(struct iommufd_object *obj); +int iommufd_device_get_info(struct iommufd_ucmd *ucmd); struct iommufd_access { struct iommufd_object obj; diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 3fbe636c3d8a..59aa30ad1090 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -250,6 +250,7 @@ static int iommufd_option(struct iommufd_ucmd *ucmd) union ucmd_buffer { struct iommu_destroy destroy; + struct iommu_device_info info; struct iommu_ioas_alloc alloc; struct iommu_ioas_allow_iovas allow_iovas; struct iommu_ioas_copy ioas_copy; @@ -281,6 +282,8 @@ struct iommufd_ioctl_op { } static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { IOCTL_OP(IOMMU_DESTROY, iommufd_destroy, struct iommu_destroy, id), + IOCTL_OP(IOMMU_DEVICE_GET_INFO, iommufd_device_get_info, struct iommu_device_info, + __reserved), IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl, struct iommu_ioas_alloc, out_ioas_id), IOCTL_OP(IOMMU_IOAS_ALLOW_IOVAS, iommufd_ioas_allow_iovas, diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index fda75c8450ee..6cfe102f26f3 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -45,6 +45,7 @@ enum { IOMMUFD_CMD_IOAS_UNMAP, IOMMUFD_CMD_OPTION, IOMMUFD_CMD_VFIO_IOAS, + IOMMUFD_CMD_DEVICE_GET_INFO, }; /** @@ -371,4 +372,39 @@ struct iommu_device_info_vtd { __aligned_u64 cap_reg; __aligned_u64 ecap_reg; }; + +/** + * struct iommu_device_info - ioctl(IOMMU_DEVICE_GET_INFO) + * @size: sizeof(struct iommu_device_info) + * @flags: Must be 0 + * @dev_id: The device being attached to the IOMMU + * @data_len: Input the type specific data buffer length in bytes + * @data_ptr: Pointer to the type specific structure (e.g. + * struct iommu_device_info_vtd) + * @out_device_type: Output the underlying iommu hardware type, it is + * one of enum iommu_device_data_type. + * @__reserved: Must be 0 + * + * Query the hardware iommu capability for given device which has been + * bound to iommufd. @data_len is set to be the size of the buffer to + * type specific data and the data will be filled. Trailing bytes are + * zeroed if the user buffer is larger than the data kernel has. + * + * The type specific data would be used to sync capability between the + * vIOMMU and the hardware IOMMU, also for the availabillity checking of + * iommu hardware features like dirty page tracking in I/O page table. + * + * The @out_device_type will be filled if the ioctl succeeds. It would + * be used to decode the data filled in the buffer pointed by @data_ptr. + */ +struct iommu_device_info { + __u32 size; + __u32 flags; + __u32 dev_id; + __u32 data_len; + __aligned_u64 data_ptr; + __u32 out_device_type; + __u32 __reserved; +}; +#define IOMMU_DEVICE_GET_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DEVICE_GET_INFO) #endif