Message ID | 20250428182904.93989-1-npache@redhat.com |
---|---|
Series | mm: introduce THP deferred setting |
On Tue, Apr 29, 2025 at 7:49 AM Zi Yan <ziy@nvidia.com> wrote:
>
> On 28 Apr 2025, at 14:29, Nico Pache wrote:
>
> > setting /transparent_hugepages/enabled=always allows applications
> > to benefit from THPs without having to madvise. However, the pf handler
>
> s/pf/page fault
>
> > takes very few considerations to decide weather or not to actually use a
>
> s/weather/whether
>
> > THP. This can lead to a lot of wasted memory. khugepaged only operates
> > on memory that was either allocated with enabled=always or MADV_HUGEPAGE.
> >
> > Introduce the ability to set enabled=defer, which will prevent THPs from
> > being allocated by the page fault handler unless madvise is set,
> > leaving it up to khugepaged to decide which allocations will collapse to a
> > THP. This should allow applications to benefits from THPs, while curbing
> > some of the memory waste.
> >
> > Co-developed-by: Rafael Aquini <raquini@redhat.com>
> > Signed-off-by: Rafael Aquini <raquini@redhat.com>
> > Signed-off-by: Nico Pache <npache@redhat.com>
> > ---
> >  include/linux/huge_mm.h | 15 +++++++++++++--
> >  mm/huge_memory.c        | 31 +++++++++++++++++++++++++++----
> >  2 files changed, 40 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index e3d15c737008..57e6c962afb1 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -48,6 +48,7 @@ enum transparent_hugepage_flag {
> >  	TRANSPARENT_HUGEPAGE_UNSUPPORTED,
> >  	TRANSPARENT_HUGEPAGE_FLAG,
> >  	TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
> > +	TRANSPARENT_HUGEPAGE_DEFER_PF_INST_FLAG,
>
> What does INST mean here? Can you add one sentence on this new flag
> in the commit log to explain what it is short for?

"INSERT". Someone else commented on the length of this FLAG name. I
forgot to update it. I can shorten it to something like ..DEFER_FLAG
or DEFER_PF_FLAG

>
> >  	TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG,
> >  	TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG,
> >  	TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG,
> > @@ -186,6 +187,7 @@ static inline bool hugepage_global_enabled(void)
> >  {
> >  	return transparent_hugepage_flags &
> >  		((1<<TRANSPARENT_HUGEPAGE_FLAG) |
> > +		(1<<TRANSPARENT_HUGEPAGE_DEFER_PF_INST_FLAG) |
> >  		(1<<TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG));
> >  }
> >
> > @@ -195,6 +197,12 @@ static inline bool hugepage_global_always(void)
> >  		(1<<TRANSPARENT_HUGEPAGE_FLAG);
> >  }
> >
> > +static inline bool hugepage_global_defer(void)
> > +{
> > +	return transparent_hugepage_flags &
> > +		(1<<TRANSPARENT_HUGEPAGE_DEFER_PF_INST_FLAG);
> > +}
> > +
> >  static inline int highest_order(unsigned long orders)
> >  {
> >  	return fls_long(orders) - 1;
> > @@ -291,13 +299,16 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> >  				       unsigned long tva_flags,
> >  				       unsigned long orders)
> >  {
> > +	if ((tva_flags & TVA_IN_PF) && hugepage_global_defer() &&
> > +			!(vm_flags & VM_HUGEPAGE))
> > +		return 0;
> > +
> >  	/* Optimization to check if required orders are enabled early. */
> >  	if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> >  		unsigned long mask = READ_ONCE(huge_anon_orders_always);
> > -
>
> This newline should stay, right?

Yes, I can fix that.

>
> The rest looks good to me. Thanks.
Acked-by: Zi Yan <ziy@nvidia.com>

Thank you!
-- Nico

>
> Best Regards,
> Yan, Zi
>
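The early return added to thp_vma_allowable_orders() can be modelled in userspace to check which callers lose THP orders under enabled=defer. This is a simplified sketch, not kernel code: the enum keeps only the flags quoted in the hunk (so the bit positions differ from the kernel's), THP_DEFER_PF_FLAG borrows the shorter name Nico floats above rather than any merged identifier, and MODEL_VM_HUGEPAGE is a hypothetical stand-in for VM_HUGEPAGE.

```c
#include <stdbool.h>

/*
 * Userspace model of the checks in the patch above. Only the enum entries
 * quoted in the hunk are kept, so bit positions are NOT the kernel's.
 * THP_DEFER_PF_FLAG is a hypothetical short name, not a kernel identifier.
 */
enum thp_flag {
	THP_UNSUPPORTED,
	THP_FLAG,          /* enabled=always */
	THP_REQ_MADV_FLAG, /* enabled=madvise */
	THP_DEFER_PF_FLAG, /* enabled=defer */
};

#define TVA_IN_PF         (1 << 1)   /* page fault caller, as in the kernel */
#define MODEL_VM_HUGEPAGE (1UL << 0) /* hypothetical stand-in for VM_HUGEPAGE */

static unsigned long thp_flags;

/* defer counts as "enabled", so non-fault paths still see THP as on. */
static bool global_enabled(void)
{
	return thp_flags & ((1UL << THP_FLAG) |
			    (1UL << THP_DEFER_PF_FLAG) |
			    (1UL << THP_REQ_MADV_FLAG));
}

static bool global_defer(void)
{
	return thp_flags & (1UL << THP_DEFER_PF_FLAG);
}

/*
 * Models only the new early exit: under defer, a page fault gets THP
 * orders only if the VMA was madvised. Later sysfs checks are omitted.
 */
static unsigned long allowable_orders(unsigned long vm_flags,
				      unsigned long tva_flags,
				      unsigned long orders)
{
	if ((tva_flags & TVA_IN_PF) && global_defer() &&
	    !(vm_flags & MODEL_VM_HUGEPAGE))
		return 0;
	return orders;
}
```

With only the defer bit set, global_enabled() still returns true, a fault in a VMA without MODEL_VM_HUGEPAGE gets 0 back, and the same call without TVA_IN_PF (as a non-fault path such as khugepaged would make) keeps the requested orders.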
On 28 Apr 2025, at 14:29, Nico Pache wrote:

> Now that we have defer to globally disable THPs at fault time, lets add
> a defer setting to the mTHP options. This will allow khugepaged to
> operate at that order, while avoiding it at PF time.
>
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
>  include/linux/huge_mm.h |  5 +++++
>  mm/huge_memory.c        | 38 +++++++++++++++++++++++++++++++++-----
>  mm/khugepaged.c         |  8 ++++----
>  3 files changed, 42 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 57e6c962afb1..a877c59bea67 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -96,6 +96,7 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
>  #define TVA_SMAPS		(1 << 0)	/* Will be used for procfs */
>  #define TVA_IN_PF		(1 << 1)	/* Page fault handler */
>  #define TVA_ENFORCE_SYSFS	(1 << 2)	/* Obey sysfs configuration */
> +#define TVA_IN_KHUGEPAGE	((1 << 2) | (1 << 3))	/* Khugepaged defer support */

Why is TVA_IN_KHUGEPAGE a superset of TVA_ENFORCE_SYSFS? Because khugepaged
also obeys sysfs configuration?

I wonder if explicitly coding the behavior is better. For example,
in __thp_vma_allowable_orders(), enforce_sysfs = tva_flags & (TVA_ENFORCE_SYSFS | TVA_IN_KHUGEPAGE).

>
>  #define thp_vma_allowable_order(vma, vm_flags, tva_flags, order) \
>  	(!!thp_vma_allowable_orders(vma, vm_flags, tva_flags, BIT(order)))
> @@ -182,6 +183,7 @@ extern unsigned long transparent_hugepage_flags;
>  extern unsigned long huge_anon_orders_always;
>  extern unsigned long huge_anon_orders_madvise;
>  extern unsigned long huge_anon_orders_inherit;
> +extern unsigned long huge_anon_orders_defer;
>
>  static inline bool hugepage_global_enabled(void)
>  {
> @@ -306,6 +308,9 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
>  	/* Optimization to check if required orders are enabled early. */
>  	if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {

And code here becomes tva_flags & (TVA_ENFORCE_SYSFS | TVA_IN_KHUGEPAGE).

Otherwise, LGTM.
Reviewed-by: Zi Yan <ziy@nvidia.com>

--
Best Regards,
Yan, Zi
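Zi Yan's question comes down to bit overlap. The userspace sketch below uses the flag values quoted in the patch to show that both ways of computing enforce_sysfs agree for the existing callers, that only the explicit form would keep working if TVA_IN_KHUGEPAGE were redefined as the lone bit (1 << 3), and one pitfall of the superset encoding: tva_flags & TVA_IN_KHUGEPAGE is nonzero for any plain TVA_ENFORCE_SYSFS caller, so detecting khugepaged needs a full-mask comparison. The helper function names are hypothetical.

```c
#include <stdbool.h>

/* Flag values as quoted in the patch above. */
#define TVA_SMAPS         (1 << 0)
#define TVA_IN_PF         (1 << 1)
#define TVA_ENFORCE_SYSFS (1 << 2)
#define TVA_IN_KHUGEPAGE  ((1 << 2) | (1 << 3)) /* superset of ENFORCE_SYSFS */

/* Implicit form: covers khugepaged only because bit 2 is shared. */
static bool enforce_sysfs_implicit(unsigned long tva_flags)
{
	return tva_flags & TVA_ENFORCE_SYSFS;
}

/* Zi Yan's explicit form: does not depend on the overlap. */
static bool enforce_sysfs_explicit(unsigned long tva_flags)
{
	return tva_flags & (TVA_ENFORCE_SYSFS | TVA_IN_KHUGEPAGE);
}

/*
 * A plain "tva_flags & TVA_IN_KHUGEPAGE" would also be true for a
 * TVA_ENFORCE_SYSFS-only caller, so require every bit of the mask.
 */
static bool in_khugepaged(unsigned long tva_flags)
{
	return (tva_flags & TVA_IN_KHUGEPAGE) == TVA_IN_KHUGEPAGE;
}
```

Under the current definition, a khugepaged caller passing TVA_IN_KHUGEPAGE satisfies both enforce_sysfs helpers, and a fault caller passing TVA_IN_PF | TVA_ENFORCE_SYSFS is not mistaken for khugepaged by in_khugepaged().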
On Wed, Apr 30, 2025 at 2:34 PM Zi Yan <ziy@nvidia.com> wrote:
>
> On 28 Apr 2025, at 14:29, Nico Pache wrote:
>
> > Now that we have defer to globally disable THPs at fault time, lets add
> > a defer setting to the mTHP options. This will allow khugepaged to
> > operate at that order, while avoiding it at PF time.
> >
> > Signed-off-by: Nico Pache <npache@redhat.com>
> > ---
> >  include/linux/huge_mm.h |  5 +++++
> >  mm/huge_memory.c        | 38 +++++++++++++++++++++++++++++++++-----
> >  mm/khugepaged.c         |  8 ++++----
> >  3 files changed, 42 insertions(+), 9 deletions(-)
> >
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index 57e6c962afb1..a877c59bea67 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -96,6 +96,7 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
> >  #define TVA_SMAPS		(1 << 0)	/* Will be used for procfs */
> >  #define TVA_IN_PF		(1 << 1)	/* Page fault handler */
> >  #define TVA_ENFORCE_SYSFS	(1 << 2)	/* Obey sysfs configuration */
> > +#define TVA_IN_KHUGEPAGE	((1 << 2) | (1 << 3))	/* Khugepaged defer support */
>
> Why is TVA_IN_KHUGEPAGE a superset of TVA_ENFORCE_SYSFS? Because khugepaged
> also obeys sysfs configuration?

Correct! The need for a TVA_IN_KHUGEPAGED is to isolate the "deferred"
mTHPs from being "allowed" unless we are in khugepaged.

>
> I wonder if explicitly coding the behavior is better. For example,
> in __thp_vma_allowable_orders(), enforce_sysfs = tva_flags & (TVA_ENFORCE_SYSFS | TVA_IN_KHUGEPAGE).

I'm rather indifferent about either approach. If you (or any others)
have a strong preference for an explicit (non-supersetted) TVA flag, I
can make the change.

>
> >
> >  #define thp_vma_allowable_order(vma, vm_flags, tva_flags, order) \
> >  	(!!thp_vma_allowable_orders(vma, vm_flags, tva_flags, BIT(order)))
> > @@ -182,6 +183,7 @@ extern unsigned long transparent_hugepage_flags;
> >  extern unsigned long huge_anon_orders_always;
> >  extern unsigned long huge_anon_orders_madvise;
> >  extern unsigned long huge_anon_orders_inherit;
> > +extern unsigned long huge_anon_orders_defer;
> >
> >  static inline bool hugepage_global_enabled(void)
> >  {
> > @@ -306,6 +308,9 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> >  	/* Optimization to check if required orders are enabled early. */
> >  	if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>
> And code here becomes tva_flags & (TVA_ENFORCE_SYSFS | TVA_IN_KHUGEPAGE).

or just (enforce_sysfs & vma_is_anon) like you mentioned. Then we check
for the TVA_IN_KHUGEPAGED before appending the defer bits.

>
> Otherwise, LGTM. Reviewed-by: Zi Yan <ziy@nvidia.com>

Thanks!

>
> --
> Best Regards,
> Yan, Zi
>
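Nico's plan (compute enforce_sysfs once, keep the anonymous-VMA fast path, and OR in the defer orders only when the caller is khugepaged) can be sketched in userspace as follows. This is a hypothetical reading of the thread, not the code from the series: the per-size sysfs settings are modelled as plain order bitmasks (bit N = order N) under the made-up names orders_always, orders_madvise, and orders_defer, READ_ONCE() is dropped, MODEL_VM_HUGEPAGE stands in for VM_HUGEPAGE, and the huge_anon_orders_inherit handling of the real function is left out.

```c
#include <stdbool.h>

#define TVA_IN_PF         (1 << 1)
#define TVA_ENFORCE_SYSFS (1 << 2)
#define TVA_IN_KHUGEPAGE  ((1 << 2) | (1 << 3))
#define MODEL_VM_HUGEPAGE (1UL << 0) /* hypothetical stand-in for VM_HUGEPAGE */

/* Hypothetical per-size settings as order bitmasks (bit N = order N). */
static unsigned long orders_always, orders_madvise, orders_defer;

/*
 * Early anonymous-VMA check with the defer step added. Inherit handling
 * is omitted; the defer step is an assumption based on the discussion.
 */
static unsigned long early_orders(unsigned long vm_flags,
				  unsigned long tva_flags,
				  bool vma_is_anon, unsigned long orders)
{
	bool enforce_sysfs = tva_flags & TVA_ENFORCE_SYSFS;

	if (enforce_sysfs && vma_is_anon) {
		unsigned long mask = orders_always;

		if (vm_flags & MODEL_VM_HUGEPAGE)
			mask |= orders_madvise;
		/* Defer orders are only visible to khugepaged, never to faults. */
		if ((tva_flags & TVA_IN_KHUGEPAGE) == TVA_IN_KHUGEPAGE)
			mask |= orders_defer;

		orders &= mask;
	}
	return orders;
}
```

For example, with only order 4 set to defer, a khugepaged-style call (TVA_IN_KHUGEPAGE) on an anonymous VMA keeps order 4, while a fault-style call (TVA_IN_PF | TVA_ENFORCE_SYSFS) gets no orders from this early check.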