Message ID: 1590697854-21364-1-git-send-email-kwankhede@nvidia.com
Series: Add UAPIs to support migration for VFIO devices
On Fri, May 29, 2020 at 02:00:50AM +0530, Kirti Wankhede wrote:
> + * Calling the IOCTL with VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP flag set
> + * returns the dirty pages bitmap for IOMMU container for a given IOVA range.
> + * The user must specify the IOVA range and the pgsize through the structure
> + * vfio_iommu_type1_dirty_bitmap_get in the data[] portion. This interface
> + * supports getting a bitmap of the smallest supported pgsize only and can be
> + * modified in future to get a bitmap of any specified supported pgsize. The
> + * user must provide a zeroed memory area for the bitmap memory and specify its
> + * size in bitmap.size. One bit is used to represent one page consecutively

Does "user must provide a zeroed memory area" actually mean "the vendor
driver sets bits for pages that are dirty and leaves bits unchanged for
pages that were not dirtied"? That is more flexible than, and different
from, requiring userspace to zero the memory.

For example, if userspace doesn't actually have to zero the memory, then
it can accumulate dirty pages from multiple devices by passing the same
bitmap buffer to multiple VFIO devices.

If that's the intention, then the documentation shouldn't say "must"
zero memory, because applications will need to violate that :). Instead
the documentation should describe the behavior (setting dirty bits,
leaving clean bits unmodified).

> +struct vfio_iommu_type1_dirty_bitmap_get {
> +	__u64              iova;	/* IO virtual address */
> +	__u64              size;	/* Size of iova range */
> +	struct vfio_bitmap bitmap;
> +};

Can the user application efficiently seek to the next dirty bit or does
this API force it to scan the entire iova space each time?
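[On the seek question: the proposed UAPI returns raw bits, so how efficiently
userspace can find the next dirty page depends on how it scans the buffer. A
rough sketch, assuming one bit per page packed into 64-bit words as the quoted
comment describes; the helper name `next_dirty_bit` is mine, not part of this
series. Skipping clean words with a single compare avoids testing every bit:]

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the index of the next set (dirty) bit at
 * or after 'start' in a bitmap of 'nbits' bits stored as 64-bit words,
 * or 'nbits' if no dirty bit remains. Clean 64-page runs are skipped
 * with one compare per word. */
static size_t next_dirty_bit(const uint64_t *bitmap, size_t nbits, size_t start)
{
	size_t word = start / 64;
	size_t nwords = (nbits + 63) / 64;
	uint64_t cur;

	if (start >= nbits)
		return nbits;

	/* Mask off bits below 'start' in the first word. */
	cur = bitmap[word] & (~UINT64_C(0) << (start % 64));
	for (;;) {
		if (cur) {
			size_t bit = word * 64 + (size_t)__builtin_ctzll(cur);
			return bit < nbits ? bit : nbits;
		}
		if (++word == nwords)
			return nbits;
		cur = bitmap[word];
	}
}
```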
On Fri, May 29, 2020 at 02:00:46AM +0530, Kirti Wankhede wrote:
> * IOCTL VFIO_IOMMU_DIRTY_PAGES to get dirty pages bitmap with
>   respect to IOMMU container rather than per device. All pages pinned by
>   vendor driver through vfio_pin_pages external API has to be marked as
>   dirty during migration. When IOMMU capable device is present in the
>   container and all pages are pinned and mapped, then all pages are marked
>   dirty.

From what I can tell only the iommu participates in dirty page tracking.
This places the responsibility for dirty page tracking on IOMMUs. My
understanding is that support for dirty page tracking is currently not
available in IOMMUs.

Can a PCI device implement its own DMA dirty log and let an mdev driver
implement the dirty page tracking using this mechanism? That way we
don't need to treat all pinned pages as dirty all the time.

Stefan
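[Editor's note: for the bitmap buffer discussed earlier in this thread, the
size userspace must allocate follows from the quoted UAPI comment: one bit per
page at the smallest supported pgsize, rounded up to whole 64-bit words. A
minimal sketch of that arithmetic; the helper name is hypothetical, not part
of the series:]

```c
#include <stdint.h>

/* Bytes of dirty-bitmap storage needed for an IOVA range, assuming one
 * bit per page and storage rounded up to whole 64-bit words. The range
 * is assumed to be a pgsize-aligned multiple of pgsize. */
static uint64_t dirty_bitmap_bytes(uint64_t range_size, uint64_t pgsize)
{
	uint64_t npages = range_size / pgsize;
	uint64_t nwords = (npages + 63) / 64;	/* round up to 64-bit words */

	return nwords * 8;
}
```

[For a 1 GiB range at 4 KiB pages this comes to 32 KiB of bitmap per request.]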
On Tue, 29 Sep 2020 09:27:02 +0100
Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Fri, May 29, 2020 at 02:00:46AM +0530, Kirti Wankhede wrote:
> > * IOCTL VFIO_IOMMU_DIRTY_PAGES to get dirty pages bitmap with
> >   respect to IOMMU container rather than per device. All pages pinned by
> >   vendor driver through vfio_pin_pages external API has to be marked as
> >   dirty during migration. When IOMMU capable device is present in the
> >   container and all pages are pinned and mapped, then all pages are marked
> >   dirty.
>
> From what I can tell only the iommu participates in dirty page tracking.
> This places the responsibility for dirty page tracking on IOMMUs. My
> understanding is that support for dirty page tracking is currently not
> available in IOMMUs.
>
> Can a PCI device implement its own DMA dirty log and let an mdev driver
> implement the dirty page tracking using this mechanism? That way we
> don't need to treat all pinned pages as dirty all the time.

Look at the last patch in this series; there we define a mechanism
whereby the act of a vendor driver pinning pages both marks those pages
dirty and indicates a mode in the vfio type1 container where the scope
of dirty pages is limited to those pages pinned by the driver. The
vfio_dma_rw() interface does the same. We could clearly implement a
more lightweight interface for this as well, one without pinning or
memory access, but there are no proposed users for such an interface
currently. Thanks,

Alex