Message ID | 20201021085655.1192025-1-daniel.vetter@ffwll.ch |
---|---|
Series | follow_pfn and other iomap races |
On Wed, Oct 21, 2020 at 10:56:51AM +0200, Daniel Vetter wrote:
> There's three ways to access PCI BARs from userspace: /dev/mem, sysfs
> files, and the old proc interface. Two check against
> iomem_is_exclusive, proc never did. And with CONFIG_IO_STRICT_DEVMEM,
> this starts to matter, since we don't want random userspace having
> access to PCI BARs while a driver is loaded and using it.
>
> Fix this by adding the same iomem_is_exclusive() check we already have
> on the sysfs side in pci_mmap_resource().
>
> References: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: linux-pci@vger.kernel.org
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.com>

Maybe not for fixing in this series, but this access to
IORESOURCE_BUSY doesn't have any locking.

The write side holds the resource_lock at least..

>       ret = pci_mmap_page_range(dev, i, vma,
>                                 fpriv->mmap_state, write_combine);

At this point the vma isn't linked into the address space, so doesn't
this happen?

     CPU 0                                  CPU1
 mmap_region()
   vma = vm_area_alloc
   proc_bus_pci_mmap
    iomem_is_exclusive
    pci_mmap_page_range
                                        revoke_devmem
                                         unmap_mapping_range()
     // vma is not linked to the address space here,
     // unmap doesn't find it
  vma_link()
  !!! The VMA gets mapped with the revoked PTEs

I couldn't find anything that prevents it at least, no mmap_sem on the
unmap side, just the i_mmap_lock

Not seeing how address space and pre-populating during mmap work
together? Did I miss locking someplace?

Not something to be fixed for this series, this is clearly an
improvement, but seems like another problem to tackle?

Jason

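The interleaving above can be replayed in a small userspace model. This is only a sketch under stated assumptions: every `toy_*` name is hypothetical, the structs are not kernel types, and the two "CPUs" are replayed sequentially in the order shown in the diagram rather than run concurrently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; none of these are kernel types or kernel APIs. */
struct toy_vma {
	bool linked;    /* reachable from the address_space (after "vma_link") */
	bool has_ptes;  /* PTEs were populated by the ->mmap callback */
};

struct toy_resource {
	bool exclusive; /* models a region a driver has claimed exclusively */
};

/* Models the ->mmap callback: exclusivity check, then PTE pre-population. */
static int toy_mmap(struct toy_vma *vma, const struct toy_resource *res)
{
	if (res->exclusive)
		return -1;
	vma->has_ptes = true;
	return 0;
}

/* Models the revoke: claim the region, zap PTEs of *linked* VMAs only. */
static void toy_revoke(struct toy_resource *res, struct toy_vma *vmas[], size_t n)
{
	res->exclusive = true;
	for (size_t i = 0; i < n; i++)
		if (vmas[i]->linked)
			vmas[i]->has_ptes = false;
}

/* Models vma_link(): from here on the revoke walk can find the VMA. */
static void toy_vma_link(struct toy_vma *vma)
{
	assert(!vma->linked);
	vma->linked = true;
}

/* Replays the diagram; returns true if PTEs survive the revoke. */
bool toy_race_leaves_stale_ptes(void)
{
	struct toy_vma vma = { false, false };
	struct toy_vma *all[] = { &vma };
	struct toy_resource res = { false };

	if (toy_mmap(&vma, &res) != 0)  /* CPU 0: check passes, PTEs installed */
		return false;
	toy_revoke(&res, all, 1);       /* CPU 1: VMA not linked, nothing zapped */
	toy_vma_link(&vma);             /* CPU 0: VMA becomes visible too late */

	return res.exclusive && vma.linked && vma.has_ptes;
}
```

In this model the function returns true: the region ends up exclusive while the VMA still carries the PTEs that were installed before it was linked.
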
On Wed, Oct 21, 2020 at 10:56:39AM +0200, Daniel Vetter wrote:
> Hi all,
>
> Round 3 of my patch series to clamp down a bunch of races and gaps
> around follow_pfn and other access to iomem mmaps. Previous version:
>
> v1: https://lore.kernel.org/dri-devel/20201007164426.1812530-1-daniel.vetter@ffwll.ch/
> v2: https://lore.kernel.org/dri-devel/20201009075934.3509076-1-daniel.vetter@ffwll.ch
>
> And the discussion that sparked this journey:
>
> https://lore.kernel.org/dri-devel/20201007164426.1812530-1-daniel.vetter@ffwll.ch/
>
> I was waiting for the testing result for habanalabs from Oded, but I guess
> Oded was waiting for my v3.
>
> Changes in v3:
> - Bunch of polish all over, no functional changes aside from one barrier
>   in the resource code, for consistency.
> - A few more r-b tags.
>
> Changes in v2:
> - tons of small polish&fixes all over, thanks to all the reviewers who
>   spotted issues
> - I managed to test at least the generic_access_phys and pci mmap revoke
>   stuff with a few gdb sessions using our i915 debug tools (hence now also
>   the drm/i915 patch to properly request all the pci bar regions)
> - reworked approach for the pci mmap revoke: Infrastructure moved into
>   kernel/resource.c, address_space mapping is now set up at open time for
>   everyone (which required some sysfs changes). Does indeed look a lot
>   cleaner and a lot less invasive than I feared at first.
>
> The big thing I can't test are all the frame_vector changes in habanalabs,
> exynos and media. Gerald has given the s390 patch a spin already.
>
> Review, testing, feedback all very much welcome.
>
> Daniel Vetter (16):
>   drm/exynos: Stop using frame_vector helpers
>   drm/exynos: Use FOLL_LONGTERM for g2d cmdlists
>   misc/habana: Stop using frame_vector helpers
>   misc/habana: Use FOLL_LONGTERM for userptr
>   mm/frame-vector: Use FOLL_LONGTERM
>   media: videobuf2: Move frame_vector into media subsystem
>   mm: Close race in generic_access_phys
>   s390/pci: Remove races against pte updates
>   mm: Add unsafe_follow_pfn
>   media/videbuf1|2: Mark follow_pfn usage as unsafe
>   vfio/type1: Mark follow_pfn as unsafe
>   PCI: Obey iomem restrictions for procfs mmap
>   /dev/mem: Only set filp->f_mapping
>   resource: Move devmem revoke code to resource framework
>   sysfs: Support zapping of binary attr mmaps
>   PCI: Revoke mappings like devmem

The whole thing looks like a great improvement!

Thanks,
Jason

On Wed, Oct 21, 2020 at 04:42:11PM +0200, Daniel Vetter wrote:
> Uh yes. In drivers/gpu this isn't a problem because we only install
> ptes from the vm_ops->fault handler. So no races. And I don't think
> you can fix this otherwise through holding locks: mmap_sem we can't
> hold because before vma_link we don't even know which mm_struct is
> involved, so can't solve the race. Plus this would be worse than
> mm_take_all_locks used by mmu notifier. And the address_space
> i_mmap_lock is also no good since it's not held during the ->mmap
> callback, when we write the ptes. And the resource lock is even less
> useful, since we're not going to hold that at vma_link() time for
> sure.
>
> Hence delaying the pte writes after the vma_link, which means ->fault
> time, looks like the only way to close this gap.
>
> Trouble is I have no idea how to do this cleanly ...

How about add a vm_ops callback 'install_pages'/'prefault_pages' ?

Call it after vma_link() - basically just move the remap_pfn, under
some other lock, into there.

Jason

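The ordering proposed above can be sketched with the same kind of userspace model. Everything here is an assumption made for illustration: the `defer_*` names are hypothetical, `defer_install_pages()` only stands in for the suggested 'install_pages'/'prefault_pages' hook (no such vm_ops callback is claimed to exist), and the model treats the hook and the revoke as serialized by "some other lock", so each step runs atomically.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins only; none of these are kernel types or kernel APIs. */
struct defer_vma { bool linked; bool has_ptes; };
struct defer_res { bool exclusive; };

/* ->mmap only validates the request; it no longer installs any PTEs. */
static int defer_mmap(const struct defer_res *res)
{
	return res->exclusive ? -1 : 0;
}

static void defer_vma_link(struct defer_vma *vma)
{
	vma->linked = true;
}

/*
 * Hypothetical hook run after linking. Assumed to be serialized against
 * defer_revoke(), so the liveness check and the PTE install are atomic.
 */
static int defer_install_pages(struct defer_vma *vma, const struct defer_res *res)
{
	assert(vma->linked);
	if (res->exclusive)
		return -1; /* mapping was revoked in the meantime */
	vma->has_ptes = true;
	return 0;
}

/* Revoke: claim the region and zap the VMA if it is already linked. */
static void defer_revoke(struct defer_res *res, struct defer_vma *vma)
{
	res->exclusive = true;
	if (vma->linked)
		vma->has_ptes = false;
}

/*
 * Replays one interleaving. revoke_step selects where the revoke lands:
 * 0 = before linking, 1 = between link and install, 2 = after install.
 * Returns true if PTEs survive a revoke.
 */
bool defer_stale_ptes_after(int revoke_step)
{
	struct defer_vma vma = { false, false };
	struct defer_res res = { false };

	if (defer_mmap(&res) != 0)
		return false;
	if (revoke_step == 0)
		defer_revoke(&res, &vma);
	defer_vma_link(&vma);
	if (revoke_step == 1)
		defer_revoke(&res, &vma);
	(void)defer_install_pages(&vma, &res);
	if (revoke_step == 2)
		defer_revoke(&res, &vma);

	return res.exclusive && vma.has_ptes;
}
```

In all three interleavings the model ends without stale PTEs: either the install sees the revoked state and refuses, or the revoke finds the linked VMA and zaps it. The model says nothing about how such a hook would interact with real callers of remap_pfn_range().
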
On Wed, Oct 21, 2020 at 05:54:54PM +0200, Daniel Vetter wrote:
> The trouble is that io_remap_pfn adjust vma->pgoff, so we'd need to
> split that. So ideally ->mmap would never set up any ptes.

/dev/mem makes pgoff == pfn so it doesn't get changed by remap.

pgoff doesn't get touched for MAP_SHARED either, so there are other
users that could work like this - eg anyone mmaping IO memory is
probably OK.

> I guess one option would be if remap_pfn_range would steal the
> vma->vm_ops pointer for itself, then it could set up the correct
> ->install_ptes hook. But there's tons of callers for that, so not sure
> that's a bright idea.

The caller has to check that the mapping is still live, and I think
hold a lock across the remap? Auto-deferring it doesn't seem feasible.

Jason

On Wed, Oct 21, 2020 at 1:57 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> When we care about pagecache maintenance, we need to make sure that
> both f_mapping and i_mapping point at the right mapping.
>
> But for iomem mappings we only care about the virtual/pte side of
> things, so f_mapping is enough. Also setting inode->i_mapping was
> confusing me as a driver maintainer, since in e.g. drivers/gpu we
> don't do that. Per Dan this seems to be copypasta from places which do
> care about pagecache consistency, but not needed. Hence remove it for
> slightly less confusion.
>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

On Wed, Oct 21, 2020 at 09:24:08PM +0200, Daniel Vetter wrote:
> On Wed, Oct 21, 2020 at 6:37 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Wed, Oct 21, 2020 at 05:54:54PM +0200, Daniel Vetter wrote:
> >
> > > The trouble is that io_remap_pfn adjust vma->pgoff, so we'd need to
> > > split that. So ideally ->mmap would never set up any ptes.
> >
> > /dev/mem makes pgoff == pfn so it doesn't get changed by remap.
> >
> > pgoff doesn't get touched for MAP_SHARED either, so there are other
> > users that could work like this - eg anyone mmaping IO memory is
> > probably OK.
>
> I was more generally thinking for io_remap_pfn_users because of the
> mkwrite use-case we might have in fbdev emulation in drm.

You have a use case for MAP_PRIVATE and io_remap_pfn_range()??

Jason

On Thu, Oct 22, 2020 at 09:00:44AM +0200, Daniel Vetter wrote:
> On Thu, Oct 22, 2020 at 1:20 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Wed, Oct 21, 2020 at 09:24:08PM +0200, Daniel Vetter wrote:
> > > On Wed, Oct 21, 2020 at 6:37 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > >
> > > > On Wed, Oct 21, 2020 at 05:54:54PM +0200, Daniel Vetter wrote:
> > > >
> > > > > The trouble is that io_remap_pfn adjust vma->pgoff, so we'd need to
> > > > > split that. So ideally ->mmap would never set up any ptes.
> > > >
> > > > /dev/mem makes pgoff == pfn so it doesn't get changed by remap.
> > > >
> > > > pgoff doesn't get touched for MAP_SHARED either, so there are other
> > > > users that could work like this - eg anyone mmaping IO memory is
> > > > probably OK.
> > >
> > > I was more generally thinking for io_remap_pfn_users because of the
> > > mkwrite use-case we might have in fbdev emulation in drm.
> >
> > You have a use case for MAP_PRIVATE and io_remap_pfn_range()??
>
> Uh no :-)

So it is fine, the pgoff mangling only happens for MAP_PRIVATE

Jason

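The MAP_PRIVATE condition can be sketched in userspace as below. This is an illustration under stated assumptions, not kernel code: the flag values are meant to mirror VM_SHARED and VM_MAYWRITE as defined in include/linux/mm.h around this time, the predicate is modelled on the kernel's is_cow_mapping() helper, and `toy_pgoff_vma` and `toy_remap_pgoff()` are hypothetical names that model only the pgoff update of remap_pfn_range().

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror VM_SHARED / VM_MAYWRITE; toy copies, not kernel headers. */
#define TOY_VM_SHARED   0x00000008UL
#define TOY_VM_MAYWRITE 0x00000020UL

/* A mapping is copy-on-write when it may be written but is not shared. */
bool toy_is_cow_mapping(unsigned long vm_flags)
{
	return (vm_flags & (TOY_VM_SHARED | TOY_VM_MAYWRITE)) == TOY_VM_MAYWRITE;
}

/* Hypothetical stand-in for the two vm_area_struct fields that matter here. */
struct toy_pgoff_vma {
	unsigned long vm_flags;
	unsigned long vm_pgoff;
};

/* Models only the pgoff side effect: it is overwritten for COW mappings. */
void toy_remap_pgoff(struct toy_pgoff_vma *vma, unsigned long pfn)
{
	assert(vma);
	if (toy_is_cow_mapping(vma->vm_flags))
		vma->vm_pgoff = pfn;
}
```

With these definitions a writable MAP_SHARED mapping (both flags set) keeps its original vm_pgoff, while a writable private mapping (TOY_VM_MAYWRITE only) has it replaced by the pfn.
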
On Thu, Oct 22, 2020 at 1:43 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Thu, Oct 22, 2020 at 09:00:44AM +0200, Daniel Vetter wrote:
> > On Thu, Oct 22, 2020 at 1:20 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Wed, Oct 21, 2020 at 09:24:08PM +0200, Daniel Vetter wrote:
> > > > On Wed, Oct 21, 2020 at 6:37 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > > >
> > > > > On Wed, Oct 21, 2020 at 05:54:54PM +0200, Daniel Vetter wrote:
> > > > >
> > > > > > The trouble is that io_remap_pfn adjust vma->pgoff, so we'd need to
> > > > > > split that. So ideally ->mmap would never set up any ptes.
> > > > >
> > > > > /dev/mem makes pgoff == pfn so it doesn't get changed by remap.
> > > > >
> > > > > pgoff doesn't get touched for MAP_SHARED either, so there are other
> > > > > users that could work like this - eg anyone mmaping IO memory is
> > > > > probably OK.
> > > >
> > > > I was more generally thinking for io_remap_pfn_users because of the
> > > > mkwrite use-case we might have in fbdev emulation in drm.
> > >
> > > You have a use case for MAP_PRIVATE and io_remap_pfn_range()??
> >
> > Uh no :-)
>
> So it is fine, the pgoff mangling only happens for MAP_PRIVATE

Ah right I got confused, thanks for clarifying.
-Daniel