diff mbox series

[3/3] mm: unexport follow_pfn

Message ID 20210316153303.3216674-4-daniel.vetter@ffwll.ch
State New
Headers show
Series switch to unsafe_follow_pfn | expand

Commit Message

Daniel Vetter March 16, 2021, 3:33 p.m. UTC
Both kvm (in bd2fae8da794 ("KVM: do not assume PTE is writable after
follow_pfn")) and vfio (in 07956b6269d3 ("vfio/type1: Use
follow_pte()")) have lost their callsites of follow_pfn(). All the
other ones have been switched over to unsafe_follow_pfn because they
cannot be fixed without breaking userspace api.

Argueably the vfio code is still racy, but that's kinda a bigger
picture. But since it does leak the pte beyond where it drops the pt
lock, without anything else like an mmu notifier guaranteeing
coherence, the problem is at least clearly visible in the vfio code.
So good enough with me.

I've decided to keep the explanation that after dropping the pt lock
you must have an mmu notifier if you keep using the pte somehow by
adjusting it and moving it into the kerneldoc for the new follow_pte()
function.

Cc: 3pvd@google.com
Cc: Jann Horn <jannh@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 include/linux/mm.h |  2 --
 mm/memory.c        | 26 +++++---------------------
 mm/nommu.c         | 13 +------------
 3 files changed, 6 insertions(+), 35 deletions(-)

Comments

Daniel Vetter March 24, 2021, 7:17 p.m. UTC | #1
On Wed, Mar 24, 2021 at 09:52:11AM -0300, Jason Gunthorpe wrote:
> On Tue, Mar 16, 2021 at 04:33:03PM +0100, Daniel Vetter wrote:
> > Both kvm (in bd2fae8da794 ("KVM: do not assume PTE is writable after
> > follow_pfn")) and vfio (in 07956b6269d3 ("vfio/type1: Use
> > follow_pte()")) have lost their callsites of follow_pfn(). All the
> > other ones have been switched over to unsafe_follow_pfn because they
> > cannot be fixed without breaking userspace api.
> > 
> > Argueably the vfio code is still racy, but that's kinda a bigger
> > picture. But since it does leak the pte beyond where it drops the pt
> > lock, without anything else like an mmu notifier guaranteeing
> > coherence, the problem is at least clearly visible in the vfio code.
> > So good enough with me.
> > 
> > I've decided to keep the explanation that after dropping the pt lock
> > you must have an mmu notifier if you keep using the pte somehow by
> > adjusting it and moving it into the kerneldoc for the new follow_pte()
> > function.
> > 
> > Cc: 3pvd@google.com
> > Cc: Jann Horn <jannh@google.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Jason Gunthorpe <jgg@nvidia.com>
> > Cc: Cornelia Huck <cohuck@redhat.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > Cc: kvm@vger.kernel.org
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > ---
> >  include/linux/mm.h |  2 --
> >  mm/memory.c        | 26 +++++---------------------
> >  mm/nommu.c         | 13 +------------
> >  3 files changed, 6 insertions(+), 35 deletions(-)
> 
> I think this is the right thing to do.

Was just about to smash this into the topic branch for testing in
linux-next. Feel like an ack on the series, or at least the two mm
patches?
-Daniel

> 
> Alex is working on fixing VFIO and while kvm is still racy using
> follow pte, I think they are working on it too?
> 
> Jason
Daniel Vetter March 25, 2021, 9:33 p.m. UTC | #2
On Wed, Mar 24, 2021 at 08:17:10PM +0100, Daniel Vetter wrote:
> On Wed, Mar 24, 2021 at 09:52:11AM -0300, Jason Gunthorpe wrote:
> > On Tue, Mar 16, 2021 at 04:33:03PM +0100, Daniel Vetter wrote:
> > > Both kvm (in bd2fae8da794 ("KVM: do not assume PTE is writable after
> > > follow_pfn")) and vfio (in 07956b6269d3 ("vfio/type1: Use
> > > follow_pte()")) have lost their callsites of follow_pfn(). All the
> > > other ones have been switched over to unsafe_follow_pfn because they
> > > cannot be fixed without breaking userspace api.
> > > 
> > > Argueably the vfio code is still racy, but that's kinda a bigger
> > > picture. But since it does leak the pte beyond where it drops the pt
> > > lock, without anything else like an mmu notifier guaranteeing
> > > coherence, the problem is at least clearly visible in the vfio code.
> > > So good enough with me.
> > > 
> > > I've decided to keep the explanation that after dropping the pt lock
> > > you must have an mmu notifier if you keep using the pte somehow by
> > > adjusting it and moving it into the kerneldoc for the new follow_pte()
> > > function.
> > > 
> > > Cc: 3pvd@google.com
> > > Cc: Jann Horn <jannh@google.com>
> > > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > Cc: Jason Gunthorpe <jgg@nvidia.com>
> > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > Cc: Peter Xu <peterx@redhat.com>
> > > Cc: Alex Williamson <alex.williamson@redhat.com>
> > > Cc: linux-mm@kvack.org
> > > Cc: linux-arm-kernel@lists.infradead.org
> > > Cc: linux-samsung-soc@vger.kernel.org
> > > Cc: linux-media@vger.kernel.org
> > > Cc: kvm@vger.kernel.org
> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > ---
> > >  include/linux/mm.h |  2 --
> > >  mm/memory.c        | 26 +++++---------------------
> > >  mm/nommu.c         | 13 +------------
> > >  3 files changed, 6 insertions(+), 35 deletions(-)
> > 
> > I think this is the right thing to do.
> 
> Was just about to smash this into the topic branch for testing in
> linux-next. Feel like an ack on the series, or at least the two mm
> patches?

Pushed them to my topic branch for a bit of testing in linux-next,
hopefully goes all fine for a pull for 5.13.
-Daniel
Jason Gunthorpe March 29, 2021, 1:31 p.m. UTC | #3
On Tue, Mar 16, 2021 at 04:33:03PM +0100, Daniel Vetter wrote:
> Both kvm (in bd2fae8da794 ("KVM: do not assume PTE is writable after
> follow_pfn")) and vfio (in 07956b6269d3 ("vfio/type1: Use
> follow_pte()")) have lost their callsites of follow_pfn(). All the
> other ones have been switched over to unsafe_follow_pfn because they
> cannot be fixed without breaking userspace api.
> 
> Argueably the vfio code is still racy, but that's kinda a bigger

vfio and kvm

> picture. But since it does leak the pte beyond where it drops the pt
> lock, without anything else like an mmu notifier guaranteeing
> coherence, the problem is at least clearly visible in the vfio code.
> So good enough with me.
> 
> I've decided to keep the explanation that after dropping the pt lock
> you must have an mmu notifier if you keep using the pte somehow by
> adjusting it and moving it into the kerneldoc for the new follow_pte()
> function.
> 
> Cc: 3pvd@google.com
> Cc: Jann Horn <jannh@google.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Jason Gunthorpe <jgg@nvidia.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: kvm@vger.kernel.org
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> --- 
>  include/linux/mm.h |  2 --
>  mm/memory.c        | 26 +++++---------------------
>  mm/nommu.c         | 13 +------------
>  3 files changed, 6 insertions(+), 35 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason
Daniel Vetter April 8, 2021, 10:05 a.m. UTC | #4
On Mon, Mar 29, 2021 at 10:31:01AM -0300, Jason Gunthorpe wrote:
> On Tue, Mar 16, 2021 at 04:33:03PM +0100, Daniel Vetter wrote:
> > Both kvm (in bd2fae8da794 ("KVM: do not assume PTE is writable after
> > follow_pfn")) and vfio (in 07956b6269d3 ("vfio/type1: Use
> > follow_pte()")) have lost their callsites of follow_pfn(). All the
> > other ones have been switched over to unsafe_follow_pfn because they
> > cannot be fixed without breaking userspace api.
> > 
> > Argueably the vfio code is still racy, but that's kinda a bigger
> 
> vfio and kvm

Hm I thought kvm is non-racy due to the mmu notifier catch races?

> 
> > picture. But since it does leak the pte beyond where it drops the pt
> > lock, without anything else like an mmu notifier guaranteeing
> > coherence, the problem is at least clearly visible in the vfio code.
> > So good enough with me.
> > 
> > I've decided to keep the explanation that after dropping the pt lock
> > you must have an mmu notifier if you keep using the pte somehow by
> > adjusting it and moving it into the kerneldoc for the new follow_pte()
> > function.
> > 
> > Cc: 3pvd@google.com
> > Cc: Jann Horn <jannh@google.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Jason Gunthorpe <jgg@nvidia.com>
> > Cc: Cornelia Huck <cohuck@redhat.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > Cc: kvm@vger.kernel.org
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > --- 
> >  include/linux/mm.h |  2 --
> >  mm/memory.c        | 26 +++++---------------------
> >  mm/nommu.c         | 13 +------------
> >  3 files changed, 6 insertions(+), 35 deletions(-)
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Thanks for your r-b tags, I'll add them.
-Daniel

> 
> Jason
Paolo Bonzini April 8, 2021, 11:40 a.m. UTC | #5
On 08/04/21 12:05, Daniel Vetter wrote:
> On Mon, Mar 29, 2021 at 10:31:01AM -0300, Jason Gunthorpe wrote:
>> On Tue, Mar 16, 2021 at 04:33:03PM +0100, Daniel Vetter wrote:
>>> Both kvm (in bd2fae8da794 ("KVM: do not assume PTE is writable after
>>> follow_pfn")) and vfio (in 07956b6269d3 ("vfio/type1: Use
>>> follow_pte()")) have lost their callsites of follow_pfn(). All the
>>> other ones have been switched over to unsafe_follow_pfn because they
>>> cannot be fixed without breaking userspace api.
>>>
>>> Argueably the vfio code is still racy, but that's kinda a bigger
>>
>> vfio and kvm
> 
> Hm I thought kvm is non-racy due to the mmu notifier catch races?

No, but the plan is indeed to have some struct for each page that uses 
follow_pfn and update it from the MMU notifiers.

Paolo

>>
>>> picture. But since it does leak the pte beyond where it drops the pt
>>> lock, without anything else like an mmu notifier guaranteeing
>>> coherence, the problem is at least clearly visible in the vfio code.
>>> So good enough with me.
>>>
>>> I've decided to keep the explanation that after dropping the pt lock
>>> you must have an mmu notifier if you keep using the pte somehow by
>>> adjusting it and moving it into the kerneldoc for the new follow_pte()
>>> function.
>>>
>>> Cc: 3pvd@google.com
>>> Cc: Jann Horn <jannh@google.com>
>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>> Cc: Jason Gunthorpe <jgg@nvidia.com>
>>> Cc: Cornelia Huck <cohuck@redhat.com>
>>> Cc: Peter Xu <peterx@redhat.com>
>>> Cc: Alex Williamson <alex.williamson@redhat.com>
>>> Cc: linux-mm@kvack.org
>>> Cc: linux-arm-kernel@lists.infradead.org
>>> Cc: linux-samsung-soc@vger.kernel.org
>>> Cc: linux-media@vger.kernel.org
>>> Cc: kvm@vger.kernel.org
>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>>> ---
>>>   include/linux/mm.h |  2 --
>>>   mm/memory.c        | 26 +++++---------------------
>>>   mm/nommu.c         | 13 +------------
>>>   3 files changed, 6 insertions(+), 35 deletions(-)
>>
>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> Thanks for your r-b tags, I'll add them.
> -Daniel
> 
>>
>> Jason
>
Daniel Vetter April 8, 2021, 11:44 a.m. UTC | #6
On Thu, Apr 08, 2021 at 01:40:59PM +0200, Paolo Bonzini wrote:
> On 08/04/21 12:05, Daniel Vetter wrote:
> > On Mon, Mar 29, 2021 at 10:31:01AM -0300, Jason Gunthorpe wrote:
> > > On Tue, Mar 16, 2021 at 04:33:03PM +0100, Daniel Vetter wrote:
> > > > Both kvm (in bd2fae8da794 ("KVM: do not assume PTE is writable after
> > > > follow_pfn")) and vfio (in 07956b6269d3 ("vfio/type1: Use
> > > > follow_pte()")) have lost their callsites of follow_pfn(). All the
> > > > other ones have been switched over to unsafe_follow_pfn because they
> > > > cannot be fixed without breaking userspace api.
> > > > 
> > > > Argueably the vfio code is still racy, but that's kinda a bigger
> > > 
> > > vfio and kvm
> > 
> > Hm I thought kvm is non-racy due to the mmu notifier catch races?
> 
> No, but the plan is indeed to have some struct for each page that uses
> follow_pfn and update it from the MMU notifiers.

Thanks for clarifying, I've fixed the commit message to mention both vfio
and kvm as Jason suggested. I didn't know that the follow_pte usage in kvm
still has some gaps wrt what's invalidated with mmu notifiers.

Thanks, Daniel

> 
> Paolo
> 
> > > 
> > > > picture. But since it does leak the pte beyond where it drops the pt
> > > > lock, without anything else like an mmu notifier guaranteeing
> > > > coherence, the problem is at least clearly visible in the vfio code.
> > > > So good enough with me.
> > > > 
> > > > I've decided to keep the explanation that after dropping the pt lock
> > > > you must have an mmu notifier if you keep using the pte somehow by
> > > > adjusting it and moving it into the kerneldoc for the new follow_pte()
> > > > function.
> > > > 
> > > > Cc: 3pvd@google.com
> > > > Cc: Jann Horn <jannh@google.com>
> > > > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > > Cc: Jason Gunthorpe <jgg@nvidia.com>
> > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > Cc: Peter Xu <peterx@redhat.com>
> > > > Cc: Alex Williamson <alex.williamson@redhat.com>
> > > > Cc: linux-mm@kvack.org
> > > > Cc: linux-arm-kernel@lists.infradead.org
> > > > Cc: linux-samsung-soc@vger.kernel.org
> > > > Cc: linux-media@vger.kernel.org
> > > > Cc: kvm@vger.kernel.org
> > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > > ---
> > > >   include/linux/mm.h |  2 --
> > > >   mm/memory.c        | 26 +++++---------------------
> > > >   mm/nommu.c         | 13 +------------
> > > >   3 files changed, 6 insertions(+), 35 deletions(-)
> > > 
> > > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > 
> > Thanks for your r-b tags, I'll add them.
> > -Daniel
> > 
> > > 
> > > Jason
> > 
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
diff mbox series

Patch

diff --git a/include/linux/mm.h b/include/linux/mm.h
index caec8b25d66f..304588e2f829 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1693,8 +1693,6 @@  int follow_invalidate_pte(struct mm_struct *mm, unsigned long address,
 			  pmd_t **pmdpp, spinlock_t **ptlp);
 int follow_pte(struct mm_struct *mm, unsigned long address,
 	       pte_t **ptepp, spinlock_t **ptlp);
-int follow_pfn(struct vm_area_struct *vma, unsigned long address,
-	unsigned long *pfn);
 int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
 		      unsigned long *pfn);
 int follow_phys(struct vm_area_struct *vma, unsigned long address,
diff --git a/mm/memory.c b/mm/memory.c
index e8a145505b69..317e653c8aeb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4724,7 +4724,10 @@  int follow_invalidate_pte(struct mm_struct *mm, unsigned long address,
  * should be taken for read.
  *
  * KVM uses this function.  While it is arguably less bad than ``follow_pfn``,
- * it is not a good general-purpose API.
+ * it is not a good general-purpose API: If callers use the pte after they've
+ * unlocked @ptlp they must ensure coherency with pte updates by using a
+ * &mmu_notifier to follow updates. Any caller not following these requirements
+ * must use unsafe_follow_pfn() instead.
  *
  * Return: zero on success, -ve otherwise.
  */
@@ -4735,25 +4738,7 @@  int follow_pte(struct mm_struct *mm, unsigned long address,
 }
 EXPORT_SYMBOL_GPL(follow_pte);
 
-/**
- * follow_pfn - look up PFN at a user virtual address
- * @vma: memory mapping
- * @address: user virtual address
- * @pfn: location to store found PFN
- *
- * Only IO mappings and raw PFN mappings are allowed. Note that callers must
- * ensure coherency with pte updates by using a &mmu_notifier to follow updates.
- * If this is not feasible, or the access to the @pfn is only very short term,
- * use follow_pte_pmd() instead and hold the pagetable lock for the duration of
- * the access instead. Any caller not following these requirements must use
- * unsafe_follow_pfn() instead.
- *
- * This function does not allow the caller to read the permissions
- * of the PTE.  Do not use it.
- *
- * Return: zero and the pfn at @pfn on success, -ve otherwise.
- */
-int follow_pfn(struct vm_area_struct *vma, unsigned long address,
+static int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 	unsigned long *pfn)
 {
 	int ret = -EINVAL;
@@ -4770,7 +4755,6 @@  int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 	pte_unmap_unlock(ptep, ptl);
 	return 0;
 }
-EXPORT_SYMBOL_GPL(follow_pfn);
 
 /**
  * unsafe_follow_pfn - look up PFN at a user virtual address
diff --git a/mm/nommu.c b/mm/nommu.c
index 1dc983f50e2c..cee29d0791b3 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -111,17 +111,7 @@  unsigned int kobjsize(const void *objp)
 	return page_size(page);
 }
 
-/**
- * follow_pfn - look up PFN at a user virtual address
- * @vma: memory mapping
- * @address: user virtual address
- * @pfn: location to store found PFN
- *
- * Only IO mappings and raw PFN mappings are allowed.
- *
- * Returns zero and the pfn at @pfn on success, -ve otherwise.
- */
-int follow_pfn(struct vm_area_struct *vma, unsigned long address,
+static int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 	unsigned long *pfn)
 {
 	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
@@ -130,7 +120,6 @@  int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 	*pfn = address >> PAGE_SHIFT;
 	return 0;
 }
-EXPORT_SYMBOL_GPL(follow_pfn);
 
 /**
  * unsafe_follow_pfn - look up PFN at a user virtual address