From patchwork Mon Sep 23 14:15:42 2013
X-Patchwork-Submitter: Steve Capper <steve.capper@linaro.org>
X-Patchwork-Id: 20532
From: Steve Capper <steve.capper@linaro.org>
To: linux-arm-kernel@lists.infradead.org
Cc: linux@arm.linux.org.uk, nico@linaro.org, linaro-kernel@lists.linaro.org,
	patches@linaro.org, Steve Capper <steve.capper@linaro.org>
Subject: [RESEND RFC V2] ARM: mm: make UACCESS_WITH_MEMCPY huge page aware
Date: Mon, 23 Sep 2013 15:15:42 +0100
Message-Id: <1379945742-9457-1-git-send-email-steve.capper@linaro.org>
X-Mailer: git-send-email 1.7.10.4

Resending, as I omitted a few important CCs.
---
The memory pinning code in uaccess_with_memcpy.c does not check for
HugeTLB or THP pmds, and will enter an infinite loop should a
__copy_to_user or __clear_user occur against a huge page.

This patch adds detection code for huge pages to pin_page_for_write.
As this code can be executed in a fast path it refers to the actual
pmds rather than the vma. If a HugeTLB or THP is found (they have the
same pmd representation on ARM), the page table spinlock is taken to
prevent modification whilst the page is pinned.

On ARM, huge pages are only represented as pmds, thus no huge pud
checks are performed. (For huge puds one would lock the page table in
a similar manner as in the pmd case.)

Two helper functions are introduced: pmd_thp_or_huge will check
whether or not a page is huge or transparent huge (which have the same
pmd layout on ARM), and pmd_hugewillfault will detect whether or not a
page fault will occur on write to the page.

Changes since first RFC:
 * The page mask is widened for hugepages to reduce the number of
   potential locks/unlocks.
   (A knobbled /dev/zero with its latency reduction chunks removed
   shows a 2x data rate boost with hugepages backing:
   dd if=/dev/zero of=/dev/null bs=10M count=1024 )
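To make the effect of the wider mask concrete, here is a small stand-alone
sketch of the tocopy arithmetic (illustration only, not part of the patch;
it assumes 4 KiB small pages and the 2 MiB huge pages covered by a single
LPAE pmd):

#include <stdio.h>

#define SMALL_PAGE_MASK	(~0xfffUL)	/* 4 KiB pages */
#define HUGE_PAGE_MASK	(~0x1fffffUL)	/* assumed 2 MiB huge pages (one pmd) */

/* (~addr & ~mask) + 1 == number of bytes from addr to the end of its page */
static unsigned long bytes_to_page_end(unsigned long addr, unsigned long mask)
{
	return (~addr & ~mask) + 1;
}

int main(void)
{
	unsigned long addr = 0x12345678UL;

	printf("small: %lu\n", bytes_to_page_end(addr, SMALL_PAGE_MASK)); /* 2440 */
	printf("huge:  %lu\n", bytes_to_page_end(addr, HUGE_PAGE_MASK));  /* 764296 */
	return 0;
}

With page_mask set to HPAGE_MASK the copy/clear loops can cover up to 2 MiB
per pin_page_for_write() call instead of 4 KiB, so a large hugepage-backed
buffer needs far fewer lock/unlock round trips.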
Signed-off-by: Steve Capper <steve.capper@linaro.org>
---
 arch/arm/include/asm/pgtable-3level.h |  3 ++
 arch/arm/lib/uaccess_with_memcpy.c    | 57 ++++++++++++++++++++++++++++++-----
 2 files changed, 52 insertions(+), 8 deletions(-)

diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 5689c18..39c54cf 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -206,6 +206,9 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
 #define __HAVE_ARCH_PMD_WRITE
 #define pmd_write(pmd)		(!(pmd_val(pmd) & PMD_SECT_RDONLY))
 
+#define pmd_hugewillfault(pmd)	(!pmd_young(pmd) || !pmd_write(pmd))
+#define pmd_thp_or_huge(pmd)	(pmd_huge(pmd) || pmd_trans_huge(pmd))
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #define pmd_trans_huge(pmd)	(pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT))
 #define pmd_trans_splitting(pmd) (pmd_val(pmd) & PMD_SECT_SPLITTING)
diff --git a/arch/arm/lib/uaccess_with_memcpy.c b/arch/arm/lib/uaccess_with_memcpy.c
index 025f742..78756db 100644
--- a/arch/arm/lib/uaccess_with_memcpy.c
+++ b/arch/arm/lib/uaccess_with_memcpy.c
@@ -18,11 +18,13 @@
 #include <linux/hardirq.h> /* for in_atomic() */
 #include <linux/gfp.h>
 #include <linux/highmem.h>
+#include <linux/hugetlb.h>
 #include <asm/current.h>
 #include <asm/page.h>
 
 static int
-pin_page_for_write(const void __user *_addr, pte_t **ptep, spinlock_t **ptlp)
+pin_page_for_write(const void __user *_addr, pte_t **ptep, spinlock_t **ptlp,
+	unsigned long *page_mask)
 {
 	unsigned long addr = (unsigned long)_addr;
 	pgd_t *pgd;
@@ -40,7 +42,36 @@ pin_page_for_write(const void __user *_addr, pte_t **ptep, spinlock_t **ptlp)
 		return 0;
 
 	pmd = pmd_offset(pud, addr);
-	if (unlikely(pmd_none(*pmd) || pmd_bad(*pmd)))
+	if (unlikely(pmd_none(*pmd)))
+		return 0;
+
+	/*
+	 * A pmd can be bad if it refers to a HugeTLB or THP page.
+	 *
+	 * Both THP and HugeTLB pages have the same pmd layout
+	 * and should not be manipulated by the pte functions.
+	 *
+	 * Lock the page table for the destination and check
+	 * to see that it's still huge and whether or not we will
+	 * need to fault on write, or if we have a splitting THP.
+	 */
+	if (unlikely(pmd_thp_or_huge(*pmd))) {
+		ptl = &current->mm->page_table_lock;
+		spin_lock(ptl);
+		if (unlikely(!pmd_thp_or_huge(*pmd)
+			|| pmd_hugewillfault(*pmd)
+			|| pmd_trans_splitting(*pmd))) {
+			spin_unlock(ptl);
+			return 0;
+		}
+
+		*ptep = NULL;
+		*ptlp = ptl;
+		*page_mask = HPAGE_MASK;
+		return 1;
+	}
+
+	if (unlikely(pmd_bad(*pmd)))
 		return 0;
 
 	pte = pte_offset_map_lock(current->mm, pmd, addr, &ptl);
@@ -52,6 +83,7 @@ pin_page_for_write(const void __user *_addr, pte_t **ptep, spinlock_t **ptlp)
 
 	*ptep = pte;
 	*ptlp = ptl;
+	*page_mask = PAGE_MASK;
 
 	return 1;
 }
@@ -60,6 +92,7 @@ static unsigned long noinline
 __copy_to_user_memcpy(void __user *to, const void *from, unsigned long n)
 {
 	int atomic;
+	unsigned long page_mask;
 
 	if (unlikely(segment_eq(get_fs(), KERNEL_DS))) {
 		memcpy((void *)to, from, n);
@@ -76,7 +109,7 @@ __copy_to_user_memcpy(void __user *to, const void *from, unsigned long n)
 		spinlock_t *ptl;
 		int tocopy;
 
-		while (!pin_page_for_write(to, &pte, &ptl)) {
+		while (!pin_page_for_write(to, &pte, &ptl, &page_mask)) {
 			if (!atomic)
 				up_read(&current->mm->mmap_sem);
 			if (__put_user(0, (char __user *)to))
@@ -85,7 +118,7 @@ __copy_to_user_memcpy(void __user *to, const void *from, unsigned long n)
 			down_read(&current->mm->mmap_sem);
 		}
 
-		tocopy = (~(unsigned long)to & ~PAGE_MASK) + 1;
+		tocopy = (~(unsigned long)to & ~page_mask) + 1;
 		if (tocopy > n)
 			tocopy = n;
 
@@ -94,7 +127,10 @@ __copy_to_user_memcpy(void __user *to, const void *from, unsigned long n)
 		from += tocopy;
 		n -= tocopy;
 
-		pte_unmap_unlock(pte, ptl);
+		if (pte)
+			pte_unmap_unlock(pte, ptl);
+		else
+			spin_unlock(ptl);
 	}
 	if (!atomic)
 		up_read(&current->mm->mmap_sem);
@@ -121,6 +157,8 @@ __copy_to_user(void __user *to, const void *from, unsigned long n)
 static unsigned long noinline
 __clear_user_memset(void __user *addr, unsigned long n)
 {
+	unsigned long page_mask;
+
 	if (unlikely(segment_eq(get_fs(), KERNEL_DS))) {
 		memset((void *)addr, 0, n);
 		return 0;
@@ -132,14 +170,14 @@ __clear_user_memset(void __user *addr, unsigned long n)
 		spinlock_t *ptl;
 		int tocopy;
 
-		while (!pin_page_for_write(addr, &pte, &ptl)) {
+		while (!pin_page_for_write(addr, &pte, &ptl, &page_mask)) {
 			up_read(&current->mm->mmap_sem);
 			if (__put_user(0, (char __user *)addr))
 				goto out;
 			down_read(&current->mm->mmap_sem);
 		}
 
-		tocopy = (~(unsigned long)addr & ~PAGE_MASK) + 1;
+		tocopy = (~(unsigned long)addr & ~page_mask) + 1;
 		if (tocopy > n)
 			tocopy = n;
 
@@ -147,7 +185,10 @@ __clear_user_memset(void __user *addr, unsigned long n)
 		addr += tocopy;
 		n -= tocopy;
 
-		pte_unmap_unlock(pte, ptl);
+		if (pte)
+			pte_unmap_unlock(pte, ptl);
+		else
+			spin_unlock(ptl);
 	}
 	up_read(&current->mm->mmap_sem);
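For anyone wanting to exercise the huge page path from user space, a minimal
sketch along these lines should do it (again illustration only, not part of
the patch; it assumes a 2 MiB default huge page size and at least one page
reserved in the hugetlb pool, e.g. echo 8 > /proc/sys/vm/nr_hugepages).
Reading from /dev/zero should end up in __clear_user(), which is one of the
routines touched here:

/* Illustrative test sketch -- not part of the patch. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

#define LEN	(2UL * 1024 * 1024)	/* one 2 MiB huge page (assumed size) */

int main(void)
{
	int fd;
	char *buf;

	buf = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap(MAP_HUGETLB)");
		return 1;
	}
	memset(buf, 0xa5, LEN);		/* fault the huge page in, writable */

	fd = open("/dev/zero", O_RDONLY);
	if (fd < 0) {
		perror("open(/dev/zero)");
		return 1;
	}

	/* read() from /dev/zero clears the user buffer, so with
	 * CONFIG_UACCESS_WITH_MEMCPY this should go through
	 * __clear_user_memset() and pin_page_for_write() on a huge page. */
	if (read(fd, buf, LEN) != (ssize_t)LEN)
		fprintf(stderr, "short read from /dev/zero\n");

	close(fd);
	munmap(buf, LEN);
	return 0;
}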