Message ID | 20201008171558.410886-1-jean-philippe@linaro.org |
---|---|
Headers | show |
Series | virtio-iommu: VFIO integration | expand |
On Thu, 8 Oct 2020 19:15:56 +0200 Jean-Philippe Brucker <jean-philippe@linaro.org> wrote: > From: Bharat Bhushan <bbhushan2@marvell.com> > > Set IOMMU supported page size mask same as host Linux supported page > size mask. > > Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com> > Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> > --- > hw/vfio/common.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c > index 13471ae2943..e66054b02a7 100644 > --- a/hw/vfio/common.c > +++ b/hw/vfio/common.c > @@ -636,6 +636,14 @@ static void vfio_listener_region_add(MemoryListener *listener, > int128_get64(llend), > iommu_idx); > > + ret = memory_region_iommu_set_page_size_mask(giommu->iommu, > + container->pgsizes, > + &err); > + if (ret) { > + g_free(giommu); > + goto fail; > + } > + > ret = memory_region_register_iommu_notifier(section->mr, &giommu->n, > &err); > if (ret) { Acked-by: Alex Williamson <alex.williamson@redhat.com>
Hi Jean, On 10/8/20 7:15 PM, Jean-Philippe Brucker wrote: > From: Bharat Bhushan <bbhushan2@marvell.com> > > Extend VIRTIO_IOMMU_T_MAP/UNMAP request to notify memory listeners. It > will call VFIO notifier to map/unmap regions in the physical IOMMU. > > Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com> > Signed-off-by: Eric Auger <eric.auger@redhat.com> > Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> > --- > v10: > * Use the flags from IOMMUMemoryRegion > * Pass virt_start/virt_end rather than size, to avoid dealing with > overflow and for consistency. > --- > hw/virtio/virtio-iommu.c | 53 ++++++++++++++++++++++++++++++++++++++++ > hw/virtio/trace-events | 2 ++ > 2 files changed, 55 insertions(+) > > diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c > index 33115e82186..fcdf3a819f8 100644 > --- a/hw/virtio/virtio-iommu.c > +++ b/hw/virtio/virtio-iommu.c > @@ -125,6 +125,49 @@ static gint interval_cmp(gconstpointer a, gconstpointer b, gpointer user_data) > } > } > > +static void virtio_iommu_notify_map(IOMMUMemoryRegion *mr, hwaddr virt_start, > + hwaddr virt_end, hwaddr paddr) > +{ > + IOMMUTLBEntry entry; > + IOMMUNotifierFlag flags = mr->iommu_notify_flags; > + > + if (!(flags & IOMMU_NOTIFIER_MAP)) { > + return; > + } > + > + trace_virtio_iommu_notify_map(mr->parent_obj.name, virt_start, virt_end, > + paddr); > + > + entry.target_as = &address_space_memory; > + entry.addr_mask = virt_end - virt_start; > + entry.iova = virt_start; > + entry.perm = IOMMU_RW; logically you should be able to cascade the struct virtio_iommu_req_map *req flags field instead. > + entry.translated_addr = paddr; > + > + memory_region_notify_iommu(mr, 0, entry); > +} > + > +static void virtio_iommu_notify_unmap(IOMMUMemoryRegion *mr, hwaddr virt_start, > + hwaddr virt_end) > +{ > + IOMMUTLBEntry entry; > + IOMMUNotifierFlag flags = mr->iommu_notify_flags; > + > + if (!(flags & IOMMU_NOTIFIER_UNMAP)) { > + return; > + } > + > + trace_virtio_iommu_notify_unmap(mr->parent_obj.name, virt_start, virt_end); > + > + entry.target_as = &address_space_memory; > + entry.addr_mask = virt_end - virt_start; > + entry.iova = virt_start; > + entry.perm = IOMMU_NONE; > + entry.translated_addr = 0; > + > + memory_region_notify_iommu(mr, 0, entry); > +} > + > static void virtio_iommu_detach_endpoint_from_domain(VirtIOIOMMUEndpoint *ep) > { > if (!ep->domain) { > @@ -315,6 +358,7 @@ static int virtio_iommu_map(VirtIOIOMMU *s, > VirtIOIOMMUDomain *domain; > VirtIOIOMMUInterval *interval; > VirtIOIOMMUMapping *mapping; > + VirtIOIOMMUEndpoint *ep; > > if (flags & ~VIRTIO_IOMMU_MAP_F_MASK) { > return VIRTIO_IOMMU_S_INVAL; > @@ -344,6 +388,10 @@ static int virtio_iommu_map(VirtIOIOMMU *s, > > g_tree_insert(domain->mappings, interval, mapping); > > + QLIST_FOREACH(ep, &domain->endpoint_list, next) { > + virtio_iommu_notify_map(ep->iommu_mr, virt_start, virt_end, phys_start); > + } > + > return VIRTIO_IOMMU_S_OK; > } > > @@ -356,6 +404,7 @@ static int virtio_iommu_unmap(VirtIOIOMMU *s, > VirtIOIOMMUMapping *iter_val; > VirtIOIOMMUInterval interval, *iter_key; > VirtIOIOMMUDomain *domain; > + VirtIOIOMMUEndpoint *ep; > int ret = VIRTIO_IOMMU_S_OK; > > trace_virtio_iommu_unmap(domain_id, virt_start, virt_end); > @@ -373,6 +422,10 @@ static int virtio_iommu_unmap(VirtIOIOMMU *s, > uint64_t current_high = iter_key->high; > > if (interval.low <= current_low && interval.high >= current_high) { > + QLIST_FOREACH(ep, &domain->endpoint_list, next) { > + virtio_iommu_notify_unmap(ep->iommu_mr, current_low, > + current_high); > + } > g_tree_remove(domain->mappings, iter_key); > trace_virtio_iommu_unmap_done(domain_id, current_low, current_high); > } else { > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events > index cf1e59de302..65a48555c78 100644 > --- a/hw/virtio/trace-events > +++ b/hw/virtio/trace-events > @@ -106,6 +106,8 @@ virtio_iommu_put_domain(uint32_t domain_id) "Free domain=%d" > virtio_iommu_translate_out(uint64_t virt_addr, uint64_t phys_addr, uint32_t sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d" > virtio_iommu_report_fault(uint8_t reason, uint32_t flags, uint32_t endpoint, uint64_t addr) "FAULT reason=%d flags=%d endpoint=%d address =0x%"PRIx64 > virtio_iommu_fill_resv_property(uint32_t devid, uint8_t subtype, uint64_t start, uint64_t end) "dev= %d, type=%d start=0x%"PRIx64" end=0x%"PRIx64 > +virtio_iommu_notify_map(const char *name, uint64_t virt_start, uint64_t virt_end, uint64_t phys_start) "mr=%s virt_start=0x%"PRIx64" virt_end=0x%"PRIx64" phys_start=0x%"PRIx64 > +virtio_iommu_notify_unmap(const char *name, uint64_t virt_start, uint64_t virt_end) "mr=%s virt_start=0x%"PRIx64" virt_end=0x%"PRIx64 > > # virtio-mem.c > virtio_mem_send_response(uint16_t type) "type=%" PRIu16 > Besides it looks good to me. Thanks Eric
Hi Jean, On 10/8/20 7:15 PM, Jean-Philippe Brucker wrote: > This series adds support for VFIO endpoints to virtio-iommu. > > Versions 1 to 9 were posted by Bharat Bhushan, but I am taking over for > now since he doesn't have much time to spend on it. Thanks again Bharat > for the work! > > Two major changes since [v9]: > > * Don't use per-endoint page_size_mask properties. Instead keep a global > page size for the virtio-iommu device, updated when adding a VFIO > endpoint. Reject hotplug if the page size is incompatible. > > * Try to make the MAP/UNMAP paths more efficient, by keeping track of > memory region within the endpoint structure. > > More testing would be appreciated, since I can only test using a software > model as host at the moment. But it does seem to hold well with PCIe > hotplug/unplug, and pass-through to guest userspace, which are no mean > feat. I tested vhost migration and the following configurations: host 4K- guest 4K: vhost, vfio, vfio hotplug host 64K - guest 64K: vhost, vfio, vfio hotplug host 4K - guest 64K: vhost, vfio, vfio hoplug All those configs worked for me. I haven't noticed any isse with those. Thanks Eric > > Note that one page size combination is not supported: host 64kB guest > 4kB cannot work, because the host IOMMU driver will automatically pick > 64kB pages which prevents mapping at a smaller granule. Supporting this > case would require introducing a page size negotiation mechanism from > the guest all the way to the host IOMMU driver. Possible, but not > planned at the moment. > > [v9] https://lore.kernel.org/qemu-devel/20200323084617.1782-1-bbhushan2@marvell.com/ > > Bharat Bhushan (7): > virtio-iommu: Add memory notifiers for map/unmap > virtio-iommu: Call memory notifiers in attach/detach > virtio-iommu: Add replay() memory region callback > virtio-iommu: Add notify_flag_changed() memory region callback > memory: Add interface to set iommu page size mask > vfio: Set IOMMU page size as per host supported page size > virtio-iommu: Set supported page size mask > > Jean-Philippe Brucker (3): > virtio-iommu: Fix virtio_iommu_mr() > virtio-iommu: Store memory region in endpoint struct > vfio: Don't issue full 2^64 unmap > > include/exec/memory.h | 26 +++++ > hw/vfio/common.c | 19 ++++ > hw/virtio/virtio-iommu.c | 204 ++++++++++++++++++++++++++++++++++++++- > softmmu/memory.c | 13 +++ > hw/virtio/trace-events | 5 + > 5 files changed, 265 insertions(+), 2 deletions(-) >
On Thu, Oct 08, 2020 at 07:15:57PM +0200, Jean-Philippe Brucker wrote: > From: Bharat Bhushan <bbhushan2@marvell.com> > > The virtio-iommu device can deal with arbitrary page sizes for virtual > endpoints, but for endpoints assigned with VFIO it must follow the page > granule used by the host IOMMU driver. > > Implement the interface to set the vIOMMU page size mask, called by VFIO > for each endpoint. We assume that all host IOMMU drivers use the same > page granule (the host page granule). Override the page_size_mask field > in the virtio config space. > > Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com> > Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> > --- > v10: Use global page mask, allowing VFIO to override it until boot. > --- > hw/virtio/virtio-iommu.c | 51 ++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 51 insertions(+) > > diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c > index 8823bfc804a..dd0b3093d1b 100644 > --- a/hw/virtio/virtio-iommu.c > +++ b/hw/virtio/virtio-iommu.c > @@ -914,6 +914,56 @@ static int virtio_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu_mr, > return 0; > } > > +static int virtio_iommu_set_page_size_mask(IOMMUMemoryRegion *mr, > + uint64_t page_size_mask, > + Error **errp) > +{ > + int new_granule, old_granule; > + IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr); > + VirtIOIOMMU *s = sdev->viommu; > + > + if (!page_size_mask) { > + return -1; > + } > + > + new_granule = ctz64(page_size_mask); > + old_granule = ctz64(s->config.page_size_mask); > + > + /* > + * Modifying the page size after machine initialization isn't supported. > + * Having a different mask is possible but the guest will use sub-optimal > + * block sizes, so warn about it. > + */ > + if (qdev_hotplug) { > + if (new_granule != old_granule) { > + error_setg(errp, > + "virtio-iommu page mask 0x%"PRIx64 > + " is incompatible with mask 0x%"PRIx64, > + s->config.page_size_mask, page_size_mask); > + return -1; > + } else if (page_size_mask != s->config.page_size_mask) { > + warn_report("virtio-iommu page mask 0x%"PRIx64 > + " does not match 0x%"PRIx64, > + s->config.page_size_mask, page_size_mask); > + } > + return 0; > + } > + > + /* > + * Disallow shrinking the page size. For example if an endpoint only > + * supports 64kB pages, we can't globally enable 4kB pages. But that > + * shouldn't happen, the host is unlikely to setup differing page granules. > + * The other bits are only hints describing optimal block sizes. > + */ > + if (new_granule < old_granule) { > + error_setg(errp, "memory region shrinks the virtio-iommu page granule"); > + return -1; > + } My understanding is that shrink is actually allowed, instead we should forbid growing of the mask? For example, initially the old_granule will always points to the guest page size. Then as long as the host page size (which new_granule represents) is smaller than the old_granule, then it seems fine... Or am I wrong? Another thing, IIUC this function will be majorly called in vfio code when the container page mask will be passed into it. If there're multiple vfio containers that support different host IOMMU page sizes, then IIUC the order of the call to virtio_iommu_set_page_size_mask() is undefined. It's probably related to which "-device vfio-pci,..." parameter is earlier. To make this simpler, I'm thinking whether we should just forbid the case where devices have different iommu page sizes. So when assigned devices are used, we make sure all host iommu page sizes are the same, and the value should be smaller than guest page size. Otherwise we'll simply fall back to guest psize. Thanks, > + > + s->config.page_size_mask = page_size_mask; > + return 0; > +} > + > static void virtio_iommu_device_realize(DeviceState *dev, Error **errp) > { > VirtIODevice *vdev = VIRTIO_DEVICE(dev); > @@ -1146,6 +1196,7 @@ static void virtio_iommu_memory_region_class_init(ObjectClass *klass, > imrc->translate = virtio_iommu_translate; > imrc->replay = virtio_iommu_replay; > imrc->notify_flag_changed = virtio_iommu_notify_flag_changed; > + imrc->iommu_set_page_size_mask = virtio_iommu_set_page_size_mask; > } > > static const TypeInfo virtio_iommu_info = { > -- > 2.28.0 >
On Mon, Oct 19, 2020 at 05:35:39PM -0400, Peter Xu wrote: > > + /* > > + * Disallow shrinking the page size. For example if an endpoint only > > + * supports 64kB pages, we can't globally enable 4kB pages. But that > > + * shouldn't happen, the host is unlikely to setup differing page granules. > > + * The other bits are only hints describing optimal block sizes. > > + */ > > + if (new_granule < old_granule) { > > + error_setg(errp, "memory region shrinks the virtio-iommu page granule"); > > + return -1; > > + } > > My understanding is that shrink is actually allowed, instead we should forbid > growing of the mask? For example, initially the old_granule will always points > to the guest page size. Then as long as the host page size (which new_granule > represents) is smaller than the old_granule, then it seems fine... Or am I wrong? The case I was checking against is two assigned devices with different page sizes. First one sets a 64kB page size, then the second one shouldn't be able to shrink it back to 4kB, because the guest would create mappings not aligned on 64kB, which can't be applied by the pIOMMU of the first device. But let's forget this case for now, in practice all assigned devices use the host page size. > > Another thing, IIUC this function will be majorly called in vfio code when the > container page mask will be passed into it. If there're multiple vfio > containers that support different host IOMMU page sizes, then IIUC the order of > the call to virtio_iommu_set_page_size_mask() is undefined. It's probably > related to which "-device vfio-pci,..." parameter is earlier. > > To make this simpler, I'm thinking whether we should just forbid the case where > devices have different iommu page sizes. So when assigned devices are used, we > make sure all host iommu page sizes are the same, and the value should be > smaller than guest page size. Otherwise we'll simply fall back to guest psize. Mostly agree, I need to simplify this function. I don't think we care about guest page size, though. Currently our default mask is TARGET_PAGE_MASK, which is the smallest size supported by vCPUs (generally 4kB), but it doesn't really mean guest page size, since the guest can choose a larger granule at runtime. Besides virtio-iommu can in theory map at byte granule if there isn't any assigned device, so our default mask could as well be ~0ULL (but doesn't work at the moment, I've tried). So what I'd like to do for next version: * Set qemu_real_host_page_mask as the default page mask, instead of the rather arbitrary TARGET_PAGE_MASK. Otherwise we cannot hotplug assigned devices on a 64kB host, since TARGET_PAGE_MASK is pretty much always 4kB. * Disallow changing the page size. It's simpler and works in practice if we default to qemu_real_host_page_mask. * For non-hotplug devices, allow changing the rest of the mask. For hotplug devices, only warn about it. Thanks, Jean
On Fri, Oct 16, 2020 at 09:58:28AM +0200, Auger Eric wrote: > > +static void virtio_iommu_notify_map(IOMMUMemoryRegion *mr, hwaddr virt_start, > > + hwaddr virt_end, hwaddr paddr) > > +{ > > + IOMMUTLBEntry entry; > > + IOMMUNotifierFlag flags = mr->iommu_notify_flags; > > + > > + if (!(flags & IOMMU_NOTIFIER_MAP)) { > > + return; > > + } > > + > > + trace_virtio_iommu_notify_map(mr->parent_obj.name, virt_start, virt_end, > > + paddr); > > + > > + entry.target_as = &address_space_memory; > > + entry.addr_mask = virt_end - virt_start; > > + entry.iova = virt_start; > > + entry.perm = IOMMU_RW; > logically you should be able to cascade the struct virtio_iommu_req_map > *req flags field instead. Agreed. I'm also thinking of adding a check for VIRTIO_IOMMU_MAP_F_MMIO, to avoid going further into the notifier and maybe do the same for unmap. Thanks, Jean
On Thu, Oct 22, 2020 at 06:39:37PM +0200, Jean-Philippe Brucker wrote: > So what I'd like to do for next version: > > * Set qemu_real_host_page_mask as the default page mask, instead of the > rather arbitrary TARGET_PAGE_MASK. Oh, I thought TARGET_PAGE_MASK was intended - kernel committ 39b3b3c9cac1 ("iommu/virtio: Reject IOMMU page granule larger than PAGE_SIZE", 2020-03-27) explicitly introduced a check that virtio-iommu kernel driver will fail directly if this psize is bigger than PAGE_SIZE in the guest. So it sounds reasonable to have the default value as PAGE_SIZE (if it's the same as TARGET_PAGE_SIZE in QEMU, which seems true?). For example, I'm thinking whether qemu_real_host_page_mask could be bigger than PAGE_SIZE in the guest in some environments, then it seems virtio-iommu won't boot anymore without assigned devices, because that extra check above will always fail. > Otherwise we cannot hotplug assigned > devices on a 64kB host, since TARGET_PAGE_MASK is pretty much always > 4kB. > > * Disallow changing the page size. It's simpler and works in > practice if we default to qemu_real_host_page_mask. > > * For non-hotplug devices, allow changing the rest of the mask. For > hotplug devices, only warn about it. Could I ask what's "the rest of the mask"? On the driver side, I see that viommu_domain_finalise() will pick the largest supported page size to use, if so then we seem to be quite restricted on what page size we can use. I'm also a bit curious about what scenario we plan to support in this initial version, especially for ARM. For x86, I think it's probably always 4k everywhere so it's fairly simple. Know little on ARM side... Thanks,
On Thu, Oct 22, 2020 at 04:56:16PM -0400, Peter Xu wrote: > On Thu, Oct 22, 2020 at 06:39:37PM +0200, Jean-Philippe Brucker wrote: > > So what I'd like to do for next version: > > > > * Set qemu_real_host_page_mask as the default page mask, instead of the > > rather arbitrary TARGET_PAGE_MASK. > > Oh, I thought TARGET_PAGE_MASK was intended - kernel committ 39b3b3c9cac1 > ("iommu/virtio: Reject IOMMU page granule larger than PAGE_SIZE", 2020-03-27) > explicitly introduced a check that virtio-iommu kernel driver will fail > directly if this psize is bigger than PAGE_SIZE in the guest. So it sounds > reasonable to have the default value as PAGE_SIZE (if it's the same as > TARGET_PAGE_SIZE in QEMU, which seems true?). > > For example, I'm thinking whether qemu_real_host_page_mask could be bigger than > PAGE_SIZE in the guest in some environments, then it seems virtio-iommu won't > boot anymore without assigned devices, because that extra check above will > always fail. Right, I missed this problem again. Switching to qemu_real_host_page_mask is probably not the best idea until we solve the host64k-guest4k problem. > > > Otherwise we cannot hotplug assigned > > devices on a 64kB host, since TARGET_PAGE_MASK is pretty much always > > 4kB. > > > > * Disallow changing the page size. It's simpler and works in > > practice if we default to qemu_real_host_page_mask. > > > > * For non-hotplug devices, allow changing the rest of the mask. For > > hotplug devices, only warn about it. > > Could I ask what's "the rest of the mask"? The LSB in the mask defines the page size. The other bits define which block sizes are supported, for example 2MB and 1GB blocks with a 4k page size. These are only for optimization, the upper bits of the mask could also be all 1s. If the guest aligns its mappings on those block sizes, then the host can use intermediate levels in the page tables resulting in fewer IOTLB entries. > On the driver side, I see that > viommu_domain_finalise() will pick the largest supported page size to use, if > so then we seem to be quite restricted on what page size we can use. In Linux iommu_dma_alloc_remap() tries to allocate blocks based on the page mask (copied by viommu_domain_finalise() into domain->pgsize_bitmap) > I'm also a bit curious about what scenario we plan to support in this initial > version, especially for ARM. For x86, I think it's probably always 4k > everywhere so it's fairly simple. Know little on ARM side... Arm CPUs and SMMU support 4k, 16k and 64k page sizes. I don't think 16k is used anywhere but some distributions chose 64k (RHEL, I think?), others 4k, so we need to support both. Unfortunately as noted above host64k-guest4k is not possible without adding a negotiation mechanism to virtio-iommu, host VFIO and IOMMU driver. Thanks, Jean
On Fri, Oct 23, 2020 at 09:48:58AM +0200, Jean-Philippe Brucker wrote: > Arm CPUs and SMMU support 4k, 16k and 64k page sizes. I don't think 16k is > used anywhere but some distributions chose 64k (RHEL, I think?), others > 4k, so we need to support both. > > Unfortunately as noted above host64k-guest4k is not possible without > adding a negotiation mechanism to virtio-iommu, host VFIO and IOMMU > driver. I see. Then it seems we would still need to support host4k-guest64k. Maybe for assigned case, we can simply AND all the psize_masks of all the vfio containers that supported to replace the default psize mask (TARGET_PAGE_SIZE) without caring about whether it's shrinking or not? Note that current patch only update config.psize_mask to the new one, but I think we need to calculate the subset of all containers rather than a simply update. Then with the help of 39b3b3c9cac1 imho we'll gracefully fail the probe if the psize is not suitable anyway, e.g., host64k-guest4k. Thanks,
On Fri, Oct 23, 2020 at 12:47:02PM -0400, Peter Xu wrote: > On Fri, Oct 23, 2020 at 09:48:58AM +0200, Jean-Philippe Brucker wrote: > > Arm CPUs and SMMU support 4k, 16k and 64k page sizes. I don't think 16k is > > used anywhere but some distributions chose 64k (RHEL, I think?), others > > 4k, so we need to support both. > > > > Unfortunately as noted above host64k-guest4k is not possible without > > adding a negotiation mechanism to virtio-iommu, host VFIO and IOMMU > > driver. > > I see. Then it seems we would still need to support host4k-guest64k. > > Maybe for assigned case, we can simply AND all the psize_masks of all the vfio > containers that supported to replace the default psize mask (TARGET_PAGE_SIZE) > without caring about whether it's shrinking or not? Note that current patch > only update config.psize_mask to the new one, but I think we need to calculate > the subset of all containers rather than a simply update. Yes I think an AND is the right operation. We'll return an error if the resulting mask is 0. Then for hotplug, I think I'll keep the current "best effort" code from this patch. If necessary we could later add a parameter to set a default mask and guarantee hotplug success. Thanks, Jean > Then with the help > of 39b3b3c9cac1 imho we'll gracefully fail the probe if the psize is not > suitable anyway, e.g., host64k-guest4k. > > Thanks, > > -- > Peter Xu >
On Tue, Oct 27, 2020 at 06:38:40PM +0100, Jean-Philippe Brucker wrote: > On Fri, Oct 23, 2020 at 12:47:02PM -0400, Peter Xu wrote: > > On Fri, Oct 23, 2020 at 09:48:58AM +0200, Jean-Philippe Brucker wrote: > > > Arm CPUs and SMMU support 4k, 16k and 64k page sizes. I don't think 16k is > > > used anywhere but some distributions chose 64k (RHEL, I think?), others > > > 4k, so we need to support both. > > > > > > Unfortunately as noted above host64k-guest4k is not possible without > > > adding a negotiation mechanism to virtio-iommu, host VFIO and IOMMU > > > driver. > > > > I see. Then it seems we would still need to support host4k-guest64k. > > > > Maybe for assigned case, we can simply AND all the psize_masks of all the vfio > > containers that supported to replace the default psize mask (TARGET_PAGE_SIZE) > > without caring about whether it's shrinking or not? Note that current patch > > only update config.psize_mask to the new one, but I think we need to calculate > > the subset of all containers rather than a simply update. > > Yes I think an AND is the right operation. We'll return an error if the > resulting mask is 0. Then for hotplug, I think I'll keep the current "best > effort" code from this patch. If necessary we could later add a parameter > to set a default mask and guarantee hotplug success. > > Thanks, > Jean So I should expect a new version with that? > > Then with the help > > of 39b3b3c9cac1 imho we'll gracefully fail the probe if the psize is not > > suitable anyway, e.g., host64k-guest4k. > > > > Thanks, > > > > -- > > Peter Xu > >
On Thu, Oct 08, 2020 at 03:22:54PM -0600, Alex Williamson wrote: > On Thu, 8 Oct 2020 19:15:56 +0200 > Jean-Philippe Brucker <jean-philippe@linaro.org> wrote: > > > From: Bharat Bhushan <bbhushan2@marvell.com> > > > > Set IOMMU supported page size mask same as host Linux supported page > > size mask. > > > > Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com> > > Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> > > --- > > hw/vfio/common.c | 8 ++++++++ > > 1 file changed, 8 insertions(+) > > > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c > > index 13471ae2943..e66054b02a7 100644 > > --- a/hw/vfio/common.c > > +++ b/hw/vfio/common.c > > @@ -636,6 +636,14 @@ static void vfio_listener_region_add(MemoryListener *listener, > > int128_get64(llend), > > iommu_idx); > > > > + ret = memory_region_iommu_set_page_size_mask(giommu->iommu, > > + container->pgsizes, > > + &err); > > + if (ret) { > > + g_free(giommu); > > + goto fail; > > + } > > + > > ret = memory_region_register_iommu_notifier(section->mr, &giommu->n, > > &err); > > if (ret) { > > Acked-by: Alex Williamson <alex.williamson@redhat.com> This one too, seems independent of the rest of the patchset and can be merged separately, right? If so Acked-by: Michael S. Tsirkin <mst@redhat.com>
On Thu, Oct 08, 2020 at 07:15:48PM +0200, Jean-Philippe Brucker wrote: > This series adds support for VFIO endpoints to virtio-iommu. > > Versions 1 to 9 were posted by Bharat Bhushan, but I am taking over for > now since he doesn't have much time to spend on it. Thanks again Bharat > for the work! ok so just minor things left, correct? Do you plan to post v11? > Two major changes since [v9]: > > * Don't use per-endoint page_size_mask properties. Instead keep a global > page size for the virtio-iommu device, updated when adding a VFIO > endpoint. Reject hotplug if the page size is incompatible. > > * Try to make the MAP/UNMAP paths more efficient, by keeping track of > memory region within the endpoint structure. > > More testing would be appreciated, since I can only test using a software > model as host at the moment. But it does seem to hold well with PCIe > hotplug/unplug, and pass-through to guest userspace, which are no mean > feat. > > Note that one page size combination is not supported: host 64kB guest > 4kB cannot work, because the host IOMMU driver will automatically pick > 64kB pages which prevents mapping at a smaller granule. Supporting this > case would require introducing a page size negotiation mechanism from > the guest all the way to the host IOMMU driver. Possible, but not > planned at the moment. > > [v9] https://lore.kernel.org/qemu-devel/20200323084617.1782-1-bbhushan2@marvell.com/ > > Bharat Bhushan (7): > virtio-iommu: Add memory notifiers for map/unmap > virtio-iommu: Call memory notifiers in attach/detach > virtio-iommu: Add replay() memory region callback > virtio-iommu: Add notify_flag_changed() memory region callback > memory: Add interface to set iommu page size mask > vfio: Set IOMMU page size as per host supported page size > virtio-iommu: Set supported page size mask > > Jean-Philippe Brucker (3): > virtio-iommu: Fix virtio_iommu_mr() > virtio-iommu: Store memory region in endpoint struct > vfio: Don't issue full 2^64 unmap > > include/exec/memory.h | 26 +++++ > hw/vfio/common.c | 19 ++++ > hw/virtio/virtio-iommu.c | 204 ++++++++++++++++++++++++++++++++++++++- > softmmu/memory.c | 13 +++ > hw/virtio/trace-events | 5 + > 5 files changed, 265 insertions(+), 2 deletions(-) > > -- > 2.28.0
On Fri, Oct 30, 2020 at 06:27:35AM -0400, Michael S. Tsirkin wrote: > On Thu, Oct 08, 2020 at 07:15:48PM +0200, Jean-Philippe Brucker wrote: > > This series adds support for VFIO endpoints to virtio-iommu. > > > > Versions 1 to 9 were posted by Bharat Bhushan, but I am taking over for > > now since he doesn't have much time to spend on it. Thanks again Bharat > > for the work! > > ok so just minor things left, correct? Do you plan to post v11? Yes, today or early next week Thanks, Jean
On Fri, Oct 30, 2020 at 06:26:31AM -0400, Michael S. Tsirkin wrote: > On Thu, Oct 08, 2020 at 03:22:54PM -0600, Alex Williamson wrote: > > On Thu, 8 Oct 2020 19:15:56 +0200 > > Jean-Philippe Brucker <jean-philippe@linaro.org> wrote: > > > > > From: Bharat Bhushan <bbhushan2@marvell.com> > > > > > > Set IOMMU supported page size mask same as host Linux supported page > > > size mask. > > > > > > Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com> > > > Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> > > > --- > > > hw/vfio/common.c | 8 ++++++++ > > > 1 file changed, 8 insertions(+) > > > > > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c > > > index 13471ae2943..e66054b02a7 100644 > > > --- a/hw/vfio/common.c > > > +++ b/hw/vfio/common.c > > > @@ -636,6 +636,14 @@ static void vfio_listener_region_add(MemoryListener *listener, > > > int128_get64(llend), > > > iommu_idx); > > > > > > + ret = memory_region_iommu_set_page_size_mask(giommu->iommu, > > > + container->pgsizes, > > > + &err); > > > + if (ret) { > > > + g_free(giommu); > > > + goto fail; > > > + } > > > + > > > ret = memory_region_register_iommu_notifier(section->mr, &giommu->n, > > > &err); > > > if (ret) { > > > > Acked-by: Alex Williamson <alex.williamson@redhat.com> > > This one too, seems independent of the rest of the > patchset and can be merged separately, right? This depends on patch 7 which introduces the function Thanks, Jean > If so > > Acked-by: Michael S. Tsirkin <mst@redhat.com> >