Message ID | 20230308221932.1548827-1-axelrasmussen@google.com |
---|---|
Headers | show |
Series | mm: userfaultfd: refactor and add UFFDIO_CONTINUE_MODE_WP | expand |
On Wed, Mar 08, 2023 at 02:19:29PM -0800, Axel Rasmussen wrote: > The basic problem is, over time we've added new userfaultfd ioctls, and > we've refactored the code so functions which used to handle only one > case are now re-used to deal with several cases. While this happened, we > didn't bother to rename the functions. > > Similarly, as we added new functions, we cargo-culted pieces of the > now-inconsistent naming scheme, so those functions too ended up with > names that don't make a lot of sense. > > A key point here is, "copy" in most userfaultfd code refers specifically > to UFFDIO_COPY, where we allocate a new page and copy its contents from > userspace. There are many functions with "copy" in the name that don't > actually do this (at least in some cases). > > So, rename things into a consistent scheme. The high level idea is that > the call stack for userfaultfd ioctls becomes: > > userfaultfd_ioctl > -> userfaultfd_(particular ioctl) > -> mfill_atomic_(particular kind of fill operation) > -> mfill_atomic /* loops over pages in range */ > -> mfill_atomic_pte /* deals with single pages */ > -> mfill_atomic_pte_(particular kind of fill operation) > -> mfill_atomic_install_pte > > There are of course some special cases (shmem, hugetlb), but this is the > general structure which all function names now adhere to. > > Acked-by: Peter Xu <peterx@redhat.com> > Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> > --- > fs/userfaultfd.c | 18 +++---- > include/linux/hugetlb.h | 30 +++++------ > include/linux/userfaultfd_k.h | 18 +++---- > mm/hugetlb.c | 20 +++---- > mm/userfaultfd.c | 98 +++++++++++++++++------------------ > 5 files changed, 92 insertions(+), 92 deletions(-) > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c > index 44d1ee429eb0..365bf00dd8dd 100644 > --- a/fs/userfaultfd.c > +++ b/fs/userfaultfd.c > @@ -1741,9 +1741,9 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx, > if (uffdio_copy.mode & ~(UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP)) > goto out; > if (mmget_not_zero(ctx->mm)) { > - ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src, > - uffdio_copy.len, &ctx->mmap_changing, > - uffdio_copy.mode); > + ret = mfill_atomic_copy(ctx->mm, uffdio_copy.dst, uffdio_copy.src, > + uffdio_copy.len, &ctx->mmap_changing, > + uffdio_copy.mode); > mmput(ctx->mm); > } else { > return -ESRCH; > @@ -1793,9 +1793,9 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx, > goto out; > > if (mmget_not_zero(ctx->mm)) { > - ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start, > - uffdio_zeropage.range.len, > - &ctx->mmap_changing); > + ret = mfill_atomic_zeropage(ctx->mm, uffdio_zeropage.range.start, > + uffdio_zeropage.range.len, > + &ctx->mmap_changing); > mmput(ctx->mm); > } else { > return -ESRCH; > @@ -1903,9 +1903,9 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg) > goto out; > > if (mmget_not_zero(ctx->mm)) { > - ret = mcopy_continue(ctx->mm, uffdio_continue.range.start, > - uffdio_continue.range.len, > - &ctx->mmap_changing); > + ret = mfill_atomic_continue(ctx->mm, uffdio_continue.range.start, > + uffdio_continue.range.len, > + &ctx->mmap_changing); > mmput(ctx->mm); > } else { > return -ESRCH; > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > index 7c977d234aba..8f0467bf1cbd 100644 > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -158,13 +158,13 @@ unsigned long hugetlb_total_pages(void); > vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, > unsigned long address, unsigned int flags); > #ifdef CONFIG_USERFAULTFD > -int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte, > - struct vm_area_struct *dst_vma, > - unsigned long dst_addr, > - unsigned long src_addr, > - enum mcopy_atomic_mode mode, > - struct page **pagep, > - bool wp_copy); > +int hugetlb_mfill_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte, > + struct vm_area_struct *dst_vma, > + unsigned long dst_addr, > + unsigned long src_addr, > + enum mcopy_atomic_mode mode, > + struct page **pagep, > + bool wp_copy); > #endif /* CONFIG_USERFAULTFD */ > bool hugetlb_reserve_pages(struct inode *inode, long from, long to, > struct vm_area_struct *vma, > @@ -393,14 +393,14 @@ static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb, > } > > #ifdef CONFIG_USERFAULTFD > -static inline int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, > - pte_t *dst_pte, > - struct vm_area_struct *dst_vma, > - unsigned long dst_addr, > - unsigned long src_addr, > - enum mcopy_atomic_mode mode, > - struct page **pagep, > - bool wp_copy) > +static inline int hugetlb_mfill_atomic_pte(struct mm_struct *dst_mm, > + pte_t *dst_pte, > + struct vm_area_struct *dst_vma, > + unsigned long dst_addr, > + unsigned long src_addr, > + enum mcopy_atomic_mode mode, > + struct page **pagep, > + bool wp_copy) > { > BUG(); > return 0; > diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h > index 3767f18114ef..468080125612 100644 > --- a/include/linux/userfaultfd_k.h > +++ b/include/linux/userfaultfd_k.h > @@ -61,15 +61,15 @@ extern int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd, > unsigned long dst_addr, struct page *page, > bool newly_allocated, bool wp_copy); > > -extern ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start, > - unsigned long src_start, unsigned long len, > - atomic_t *mmap_changing, __u64 mode); > -extern ssize_t mfill_zeropage(struct mm_struct *dst_mm, > - unsigned long dst_start, > - unsigned long len, > - atomic_t *mmap_changing); > -extern ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long dst_start, > - unsigned long len, atomic_t *mmap_changing); > +extern ssize_t mfill_atomic_copy(struct mm_struct *dst_mm, unsigned long dst_start, > + unsigned long src_start, unsigned long len, > + atomic_t *mmap_changing, __u64 mode); > +extern ssize_t mfill_atomic_zeropage(struct mm_struct *dst_mm, > + unsigned long dst_start, > + unsigned long len, > + atomic_t *mmap_changing); > +extern ssize_t mfill_atomic_continue(struct mm_struct *dst_mm, unsigned long dst_start, > + unsigned long len, atomic_t *mmap_changing); > extern int mwriteprotect_range(struct mm_struct *dst_mm, > unsigned long start, unsigned long len, > bool enable_wp, atomic_t *mmap_changing); > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 07abcb6eb203..4c9276549394 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -6154,17 +6154,17 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, > > #ifdef CONFIG_USERFAULTFD > /* > - * Used by userfaultfd UFFDIO_COPY. Based on mcopy_atomic_pte with > - * modifications for huge pages. > + * Used by userfaultfd UFFDIO_* ioctls. Based on userfaultfd's mfill_atomic_pte > + * with modifications for hugetlb pages. > */ > -int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, > - pte_t *dst_pte, > - struct vm_area_struct *dst_vma, > - unsigned long dst_addr, > - unsigned long src_addr, > - enum mcopy_atomic_mode mode, > - struct page **pagep, > - bool wp_copy) > +int hugetlb_mfill_atomic_pte(struct mm_struct *dst_mm, > + pte_t *dst_pte, > + struct vm_area_struct *dst_vma, > + unsigned long dst_addr, > + unsigned long src_addr, > + enum mcopy_atomic_mode mode, > + struct page **pagep, > + bool wp_copy) > { > bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE); > struct hstate *h = hstate_vma(dst_vma); > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c > index 53c3d916ff66..84db5b2fad3a 100644 > --- a/mm/userfaultfd.c > +++ b/mm/userfaultfd.c > @@ -127,13 +127,13 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd, > return ret; > } > > -static int mcopy_atomic_pte(struct mm_struct *dst_mm, > - pmd_t *dst_pmd, > - struct vm_area_struct *dst_vma, > - unsigned long dst_addr, > - unsigned long src_addr, > - struct page **pagep, > - bool wp_copy) > +static int mfill_atomic_pte_copy(struct mm_struct *dst_mm, > + pmd_t *dst_pmd, > + struct vm_area_struct *dst_vma, > + unsigned long dst_addr, > + unsigned long src_addr, > + struct page **pagep, > + bool wp_copy) > { > void *page_kaddr; > int ret; > @@ -204,10 +204,10 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm, > goto out; > } > > -static int mfill_zeropage_pte(struct mm_struct *dst_mm, > - pmd_t *dst_pmd, > - struct vm_area_struct *dst_vma, > - unsigned long dst_addr) > +static int mfill_atomic_pte_zeropage(struct mm_struct *dst_mm, > + pmd_t *dst_pmd, > + struct vm_area_struct *dst_vma, > + unsigned long dst_addr) > { > pte_t _dst_pte, *dst_pte; > spinlock_t *ptl; > @@ -240,11 +240,11 @@ static int mfill_zeropage_pte(struct mm_struct *dst_mm, > } > > /* Handles UFFDIO_CONTINUE for all shmem VMAs (shared or private). */ > -static int mcontinue_atomic_pte(struct mm_struct *dst_mm, > - pmd_t *dst_pmd, > - struct vm_area_struct *dst_vma, > - unsigned long dst_addr, > - bool wp_copy) > +static int mfill_atomic_pte_continue(struct mm_struct *dst_mm, > + pmd_t *dst_pmd, > + struct vm_area_struct *dst_vma, > + unsigned long dst_addr, > + bool wp_copy) > { > struct inode *inode = file_inode(dst_vma->vm_file); > pgoff_t pgoff = linear_page_index(dst_vma, dst_addr); > @@ -307,10 +307,10 @@ static pmd_t *mm_alloc_pmd(struct mm_struct *mm, unsigned long address) > > #ifdef CONFIG_HUGETLB_PAGE > /* > - * __mcopy_atomic processing for HUGETLB vmas. Note that this routine is > + * mfill_atomic processing for HUGETLB vmas. Note that this routine is > * called with mmap_lock held, it will release mmap_lock before returning. > */ > -static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, > +static __always_inline ssize_t mfill_atomic_hugetlb(struct mm_struct *dst_mm, > struct vm_area_struct *dst_vma, > unsigned long dst_start, > unsigned long src_start, > @@ -411,7 +411,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, > goto out_unlock; > } > > - err = hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma, > + err = hugetlb_mfill_atomic_pte(dst_mm, dst_pte, dst_vma, > dst_addr, src_addr, mode, &page, > wp_copy); > > @@ -463,7 +463,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, > } > #else /* !CONFIG_HUGETLB_PAGE */ > /* fail at build time if gcc attempts to use this */ > -extern ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, > +extern ssize_t mfill_atomic_hugetlb(struct mm_struct *dst_mm, > struct vm_area_struct *dst_vma, > unsigned long dst_start, > unsigned long src_start, > @@ -484,8 +484,8 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm, > ssize_t err; > > if (mode == MCOPY_ATOMIC_CONTINUE) { > - return mcontinue_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr, > - wp_copy); > + return mfill_atomic_pte_continue(dst_mm, dst_pmd, dst_vma, > + dst_addr, wp_copy); > } > > /* > @@ -500,11 +500,11 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm, > */ > if (!(dst_vma->vm_flags & VM_SHARED)) { > if (mode == MCOPY_ATOMIC_NORMAL) > - err = mcopy_atomic_pte(dst_mm, dst_pmd, dst_vma, > - dst_addr, src_addr, page, > - wp_copy); > + err = mfill_atomic_pte_copy(dst_mm, dst_pmd, dst_vma, > + dst_addr, src_addr, page, > + wp_copy); > else > - err = mfill_zeropage_pte(dst_mm, dst_pmd, > + err = mfill_atomic_pte_zeropage(dst_mm, dst_pmd, > dst_vma, dst_addr); > } else { > err = shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma, > @@ -516,13 +516,13 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm, > return err; > } > > -static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, > - unsigned long dst_start, > - unsigned long src_start, > - unsigned long len, > - enum mcopy_atomic_mode mcopy_mode, > - atomic_t *mmap_changing, > - __u64 mode) > +static __always_inline ssize_t mfill_atomic(struct mm_struct *dst_mm, > + unsigned long dst_start, > + unsigned long src_start, > + unsigned long len, > + enum mcopy_atomic_mode mcopy_mode, > + atomic_t *mmap_changing, > + __u64 mode) > { > struct vm_area_struct *dst_vma; > ssize_t err; > @@ -588,9 +588,9 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, > * If this is a HUGETLB vma, pass off to appropriate routine > */ > if (is_vm_hugetlb_page(dst_vma)) > - return __mcopy_atomic_hugetlb(dst_mm, dst_vma, dst_start, > - src_start, len, mcopy_mode, > - wp_copy); > + return mfill_atomic_hugetlb(dst_mm, dst_vma, dst_start, > + src_start, len, mcopy_mode, > + wp_copy); > > if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma)) > goto out_unlock; > @@ -688,26 +688,26 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, > return copied ? copied : err; > } > > -ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start, > - unsigned long src_start, unsigned long len, > - atomic_t *mmap_changing, __u64 mode) > +ssize_t mfill_atomic_copy(struct mm_struct *dst_mm, unsigned long dst_start, > + unsigned long src_start, unsigned long len, > + atomic_t *mmap_changing, __u64 mode) > { > - return __mcopy_atomic(dst_mm, dst_start, src_start, len, > - MCOPY_ATOMIC_NORMAL, mmap_changing, mode); > + return mfill_atomic(dst_mm, dst_start, src_start, len, > + MCOPY_ATOMIC_NORMAL, mmap_changing, mode); > } > > -ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start, > - unsigned long len, atomic_t *mmap_changing) > +ssize_t mfill_atomic_zeropage(struct mm_struct *dst_mm, unsigned long start, > + unsigned long len, atomic_t *mmap_changing) > { > - return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_ZEROPAGE, > - mmap_changing, 0); > + return mfill_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_ZEROPAGE, > + mmap_changing, 0); > } > > -ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long start, > - unsigned long len, atomic_t *mmap_changing) > +ssize_t mfill_atomic_continue(struct mm_struct *dst_mm, unsigned long start, > + unsigned long len, atomic_t *mmap_changing) > { > - return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_CONTINUE, > - mmap_changing, 0); > + return mfill_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_CONTINUE, > + mmap_changing, 0); > } > > long uffd_wp_range(struct mm_struct *dst_mm, struct vm_area_struct *dst_vma, > -- > 2.40.0.rc1.284.g88254d51c5-goog >
On Wed, Mar 08, 2023 at 02:19:32PM -0800, Axel Rasmussen wrote: > UFFDIO_COPY already has UFFDIO_COPY_MODE_WP, so when installing a new > PTE to resolve a missing fault, one can install a write-protected one. > This is useful when using UFFDIO_REGISTER_MODE_{MISSING,WP} in > combination. > > This was motivated by testing HugeTLB HGM [1], and in particular its > interaction with userfaultfd features. Existing userfaultfd code > supports using WP and MINOR modes together (i.e. you can register an > area with both enabled), but without this CONTINUE flag the combination > is in practice unusable. > > So, add an analogous UFFDIO_CONTINUE_MODE_WP, which does the same thing > as UFFDIO_COPY_MODE_WP, but for *minor* faults. > > Update the selftest to do some very basic exercising of the new flag. > > [1]: https://patchwork.kernel.org/project/linux-mm/cover/20230218002819.1486479-1-jthoughton@google.com/ > > Acked-by: Peter Xu <peterx@redhat.com> > Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> > --- > fs/userfaultfd.c | 8 ++++++-- > include/linux/userfaultfd_k.h | 3 ++- > include/uapi/linux/userfaultfd.h | 7 +++++++ > mm/userfaultfd.c | 5 +++-- > tools/testing/selftests/mm/userfaultfd.c | 4 ++++ > 5 files changed, 22 insertions(+), 5 deletions(-) > > diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h > index 005e5e306266..14059a0861bf 100644 > --- a/include/uapi/linux/userfaultfd.h > +++ b/include/uapi/linux/userfaultfd.h > @@ -297,6 +297,13 @@ struct uffdio_writeprotect { > struct uffdio_continue { > struct uffdio_range range; > #define UFFDIO_CONTINUE_MODE_DONTWAKE ((__u64)1<<0) > + /* > + * UFFDIO_CONTINUE_MODE_WP will map the page write protected on > + * the fly. UFFDIO_CONTINUE_MODE_WP is available only if the > + * write protected ioctl is implemented for the range > + * according to the uffdio_register.ioctls. > + */ > +#define UFFDIO_CONTINUE_MODE_WP ((__u64)1<<1) Please add the description of the new flag to Documentation/ and to the userfaultfd man pages. > __u64 mode; > > /*
On Thu, Mar 9, 2023 at 1:11 AM Mike Rapoport <rppt@kernel.org> wrote: > > On Wed, Mar 08, 2023 at 02:19:32PM -0800, Axel Rasmussen wrote: > > UFFDIO_COPY already has UFFDIO_COPY_MODE_WP, so when installing a new > > PTE to resolve a missing fault, one can install a write-protected one. > > This is useful when using UFFDIO_REGISTER_MODE_{MISSING,WP} in > > combination. > > > > This was motivated by testing HugeTLB HGM [1], and in particular its > > interaction with userfaultfd features. Existing userfaultfd code > > supports using WP and MINOR modes together (i.e. you can register an > > area with both enabled), but without this CONTINUE flag the combination > > is in practice unusable. > > > > So, add an analogous UFFDIO_CONTINUE_MODE_WP, which does the same thing > > as UFFDIO_COPY_MODE_WP, but for *minor* faults. > > > > Update the selftest to do some very basic exercising of the new flag. > > > > [1]: https://patchwork.kernel.org/project/linux-mm/cover/20230218002819.1486479-1-jthoughton@google.com/ > > > > Acked-by: Peter Xu <peterx@redhat.com> > > Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> > > Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> > > > --- > > fs/userfaultfd.c | 8 ++++++-- > > include/linux/userfaultfd_k.h | 3 ++- > > include/uapi/linux/userfaultfd.h | 7 +++++++ > > mm/userfaultfd.c | 5 +++-- > > tools/testing/selftests/mm/userfaultfd.c | 4 ++++ > > 5 files changed, 22 insertions(+), 5 deletions(-) > > > > diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h > > index 005e5e306266..14059a0861bf 100644 > > --- a/include/uapi/linux/userfaultfd.h > > +++ b/include/uapi/linux/userfaultfd.h > > @@ -297,6 +297,13 @@ struct uffdio_writeprotect { > > struct uffdio_continue { > > struct uffdio_range range; > > #define UFFDIO_CONTINUE_MODE_DONTWAKE ((__u64)1<<0) > > + /* > > + * UFFDIO_CONTINUE_MODE_WP will map the page write protected on > > + * the fly. UFFDIO_CONTINUE_MODE_WP is available only if the > > + * write protected ioctl is implemented for the range > > + * according to the uffdio_register.ioctls. > > + */ > > +#define UFFDIO_CONTINUE_MODE_WP ((__u64)1<<1) > > Please add the description of the new flag to Documentation/ and to the > userfaultfd man pages. Funny enough, neither flag is mentioned in Documentation/ today - I'll add a short passage about both. Happy to update the man pages as well, I'll send that patch separately. Thanks for reviewing! > > > __u64 mode; > > > > /* > > -- > Sincerely yours, > Mike.