From patchwork Fri Nov 6 23:28:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 321558 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D69E1C5DF9D for ; Fri, 6 Nov 2020 23:30:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9DE5120867 for ; Fri, 6 Nov 2020 23:30:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728939AbgKFX3d (ORCPT ); Fri, 6 Nov 2020 18:29:33 -0500 Received: from mga04.intel.com ([192.55.52.120]:36107 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729211AbgKFX3b (ORCPT ); Fri, 6 Nov 2020 18:29:31 -0500 IronPort-SDR: Gl4blOXiuS6J03W3b37a0fr/OwprQOOahYeYrMho7jAzsWYV60oA/7M7HXBVaG11DoxqzLllaj sstFG23KcFlA== X-IronPort-AV: E=McAfee;i="6000,8403,9797"; a="167025595" X-IronPort-AV: E=Sophos;i="5.77,457,1596524400"; d="scan'208";a="167025595" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Nov 2020 15:29:22 -0800 IronPort-SDR: 3/S5cCn+EWPD7piEWFg8bT+4+Ras3ej4Ko9LX4eBZ2eVf2PEVSU+5ghE3Ehxw1wKc5JbOk4Cci aWGFIu8Y1pWg== X-IronPort-AV: E=Sophos;i="5.77,457,1596524400"; d="scan'208";a="529966019" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Nov 2020 15:29:22 -0800 From: ira.weiny@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra , Dave Hansen Cc: Ira Weiny , x86@kernel.org, linux-kernel@vger.kernel.org, Andrew Morton , Fenghua Yu , linux-doc@vger.kernel.org, linux-nvdimm@lists.01.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Dan Williams , Greg KH Subject: [PATCH V3 01/10] x86/pkeys: Create pkeys_common.h Date: Fri, 6 Nov 2020 15:28:59 -0800 Message-Id: <20201106232908.364581-2-ira.weiny@intel.com> X-Mailer: git-send-email 2.28.0.rc0.12.gb6a658bd00c9 In-Reply-To: <20201106232908.364581-1-ira.weiny@intel.com> References: <20201106232908.364581-1-ira.weiny@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Ira Weiny Protection Keys User (PKU) and Protection Keys Supervisor (PKS) work in similar fashions and can share common defines. Specifically PKS and PKU each have: 1. A single control register 2. The same number of keys 3. The same number of bits in the register per key 4. Access and Write disable in the same bit locations That means that we can share all the macros that synthesize and manipulate register values between the two features. Normally, these macros would be put in asm/pkeys.h to be used internally and externally to the arch code. However, the defines are required in pgtable.h and inclusion of pkeys.h in that header creates complex dependencies which are best resolved in a separate header. Share these defines by moving them into a new header, change their names to reflect the common use, and include the header where needed. Reviewed-by: Dave Hansen Signed-off-by: Ira Weiny --- NOTE: The initialization of init_pkru_value cause checkpatch errors because of the space after the '(' in the macros. We leave this as is because it is more readable in this format. And it was existing code. --- Changes from RFC V3 Per Dave Hansen Update commit message Add comment to PKR_AD_KEY macro --- arch/x86/include/asm/pgtable.h | 13 ++++++------- arch/x86/include/asm/pkeys.h | 2 ++ arch/x86/include/asm/pkeys_common.h | 15 +++++++++++++++ arch/x86/kernel/fpu/xstate.c | 8 ++++---- arch/x86/mm/pkeys.c | 14 ++++++-------- 5 files changed, 33 insertions(+), 19 deletions(-) create mode 100644 arch/x86/include/asm/pkeys_common.h diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index a02c67291cfc..bfbfb951fe65 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1360,9 +1360,7 @@ static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd) } #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ -#define PKRU_AD_BIT 0x1 -#define PKRU_WD_BIT 0x2 -#define PKRU_BITS_PER_PKEY 2 +#include #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS extern u32 init_pkru_value; @@ -1372,18 +1370,19 @@ extern u32 init_pkru_value; static inline bool __pkru_allows_read(u32 pkru, u16 pkey) { - int pkru_pkey_bits = pkey * PKRU_BITS_PER_PKEY; - return !(pkru & (PKRU_AD_BIT << pkru_pkey_bits)); + int pkru_pkey_bits = pkey * PKR_BITS_PER_PKEY; + + return !(pkru & (PKR_AD_BIT << pkru_pkey_bits)); } static inline bool __pkru_allows_write(u32 pkru, u16 pkey) { - int pkru_pkey_bits = pkey * PKRU_BITS_PER_PKEY; + int pkru_pkey_bits = pkey * PKR_BITS_PER_PKEY; /* * Access-disable disables writes too so we need to check * both bits here. */ - return !(pkru & ((PKRU_AD_BIT|PKRU_WD_BIT) << pkru_pkey_bits)); + return !(pkru & ((PKR_AD_BIT|PKR_WD_BIT) << pkru_pkey_bits)); } static inline u16 pte_flags_pkey(unsigned long pte_flags) diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h index 2ff9b98812b7..f9feba80894b 100644 --- a/arch/x86/include/asm/pkeys.h +++ b/arch/x86/include/asm/pkeys.h @@ -2,6 +2,8 @@ #ifndef _ASM_X86_PKEYS_H #define _ASM_X86_PKEYS_H +#include + #define ARCH_DEFAULT_PKEY 0 /* diff --git a/arch/x86/include/asm/pkeys_common.h b/arch/x86/include/asm/pkeys_common.h new file mode 100644 index 000000000000..737d916f476c --- /dev/null +++ b/arch/x86/include/asm/pkeys_common.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_PKEYS_INTERNAL_H +#define _ASM_X86_PKEYS_INTERNAL_H + +#define PKR_AD_BIT 0x1 +#define PKR_WD_BIT 0x2 +#define PKR_BITS_PER_PKEY 2 + +/* + * Generate an Access-Disable mask for the given pkey. Several of these can be + * OR'd together to generate pkey register values. + */ +#define PKR_AD_KEY(pkey) (PKR_AD_BIT << ((pkey) * PKR_BITS_PER_PKEY)) + +#endif /*_ASM_X86_PKEYS_INTERNAL_H */ diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index 5d8047441a0a..a99afc70cc0a 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -995,7 +995,7 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, unsigned long init_val) { u32 old_pkru; - int pkey_shift = (pkey * PKRU_BITS_PER_PKEY); + int pkey_shift = (pkey * PKR_BITS_PER_PKEY); u32 new_pkru_bits = 0; /* @@ -1014,16 +1014,16 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, /* Set the bits we need in PKRU: */ if (init_val & PKEY_DISABLE_ACCESS) - new_pkru_bits |= PKRU_AD_BIT; + new_pkru_bits |= PKR_AD_BIT; if (init_val & PKEY_DISABLE_WRITE) - new_pkru_bits |= PKRU_WD_BIT; + new_pkru_bits |= PKR_WD_BIT; /* Shift the bits in to the correct place in PKRU for pkey: */ new_pkru_bits <<= pkey_shift; /* Get old PKRU and mask off any old bits in place: */ old_pkru = read_pkru(); - old_pkru &= ~((PKRU_AD_BIT|PKRU_WD_BIT) << pkey_shift); + old_pkru &= ~((PKR_AD_BIT|PKR_WD_BIT) << pkey_shift); /* Write old part along with new part: */ write_pkru(old_pkru | new_pkru_bits); diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index 8873ed1438a9..f5efb4007e74 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -111,19 +111,17 @@ int __arch_override_mprotect_pkey(struct vm_area_struct *vma, int prot, int pkey return vma_pkey(vma); } -#define PKRU_AD_KEY(pkey) (PKRU_AD_BIT << ((pkey) * PKRU_BITS_PER_PKEY)) - /* * Make the default PKRU value (at execve() time) as restrictive * as possible. This ensures that any threads clone()'d early * in the process's lifetime will not accidentally get access * to data which is pkey-protected later on. */ -u32 init_pkru_value = PKRU_AD_KEY( 1) | PKRU_AD_KEY( 2) | PKRU_AD_KEY( 3) | - PKRU_AD_KEY( 4) | PKRU_AD_KEY( 5) | PKRU_AD_KEY( 6) | - PKRU_AD_KEY( 7) | PKRU_AD_KEY( 8) | PKRU_AD_KEY( 9) | - PKRU_AD_KEY(10) | PKRU_AD_KEY(11) | PKRU_AD_KEY(12) | - PKRU_AD_KEY(13) | PKRU_AD_KEY(14) | PKRU_AD_KEY(15); +u32 init_pkru_value = PKR_AD_KEY( 1) | PKR_AD_KEY( 2) | PKR_AD_KEY( 3) | + PKR_AD_KEY( 4) | PKR_AD_KEY( 5) | PKR_AD_KEY( 6) | + PKR_AD_KEY( 7) | PKR_AD_KEY( 8) | PKR_AD_KEY( 9) | + PKR_AD_KEY(10) | PKR_AD_KEY(11) | PKR_AD_KEY(12) | + PKR_AD_KEY(13) | PKR_AD_KEY(14) | PKR_AD_KEY(15); /* * Called from the FPU code when creating a fresh set of FPU @@ -173,7 +171,7 @@ static ssize_t init_pkru_write_file(struct file *file, * up immediately if someone attempts to disable access * or writes to pkey 0. */ - if (new_init_pkru & (PKRU_AD_BIT|PKRU_WD_BIT)) + if (new_init_pkru & (PKR_AD_BIT|PKR_WD_BIT)) return -EINVAL; WRITE_ONCE(init_pkru_value, new_init_pkru); From patchwork Fri Nov 6 23:29:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 321562 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 79322C4741F for ; Fri, 6 Nov 2020 23:29:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 36FA921556 for ; Fri, 6 Nov 2020 23:29:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728766AbgKFX3Z (ORCPT ); Fri, 6 Nov 2020 18:29:25 -0500 Received: from mga12.intel.com ([192.55.52.136]:55504 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728220AbgKFX3Z (ORCPT ); Fri, 6 Nov 2020 18:29:25 -0500 IronPort-SDR: as/5zFG6huIOUo21dE1SihoE0Cyb3X1Bj4kBPiWMGtA8n6NIK6lmZRQEE2EdZuCwasnaUJa8vq O0t75kCctF6A== X-IronPort-AV: E=McAfee;i="6000,8403,9797"; a="148893598" X-IronPort-AV: E=Sophos;i="5.77,457,1596524400"; d="scan'208";a="148893598" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Nov 2020 15:29:25 -0800 IronPort-SDR: rbUZpwuvk6xYCkoYTkH1Bt7ETdx8XmgGvmGGIOdL7uYGh08RC67YOXBkNPJElhbQggJOK0KJwk uJGPtviVWExw== X-IronPort-AV: E=Sophos;i="5.77,457,1596524400"; d="scan'208";a="354907170" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Nov 2020 15:29:24 -0800 From: ira.weiny@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra , Dave Hansen Cc: Ira Weiny , x86@kernel.org, linux-kernel@vger.kernel.org, Andrew Morton , Fenghua Yu , linux-doc@vger.kernel.org, linux-nvdimm@lists.01.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Dan Williams , Greg KH Subject: [PATCH V3 02/10] x86/fpu: Refactor arch_set_user_pkey_access() for PKS support Date: Fri, 6 Nov 2020 15:29:00 -0800 Message-Id: <20201106232908.364581-3-ira.weiny@intel.com> X-Mailer: git-send-email 2.28.0.rc0.12.gb6a658bd00c9 In-Reply-To: <20201106232908.364581-1-ira.weiny@intel.com> References: <20201106232908.364581-1-ira.weiny@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Ira Weiny Define a helper, update_pkey_val(), which will be used to support both Protection Key User (PKU) and the new Protection Key for Supervisor (PKS) in subsequent patches. Co-developed-by: Peter Zijlstra Signed-off-by: Peter Zijlstra Signed-off-by: Ira Weiny --- Changes from RFC V3: Per Dave Hansen Update and add comments per Dave's review Per Peter Correct attribution --- arch/x86/include/asm/pkeys.h | 2 ++ arch/x86/kernel/fpu/xstate.c | 22 ++++------------------ arch/x86/mm/pkeys.c | 23 +++++++++++++++++++++++ 3 files changed, 29 insertions(+), 18 deletions(-) diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h index f9feba80894b..4526245b03e5 100644 --- a/arch/x86/include/asm/pkeys.h +++ b/arch/x86/include/asm/pkeys.h @@ -136,4 +136,6 @@ static inline int vma_pkey(struct vm_area_struct *vma) return (vma->vm_flags & vma_pkey_mask) >> VM_PKEY_SHIFT; } +u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags); + #endif /*_ASM_X86_PKEYS_H */ diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index a99afc70cc0a..a3bca3211eba 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -994,9 +994,7 @@ const void *get_xsave_field_ptr(int xfeature_nr) int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, unsigned long init_val) { - u32 old_pkru; - int pkey_shift = (pkey * PKR_BITS_PER_PKEY); - u32 new_pkru_bits = 0; + u32 pkru; /* * This check implies XSAVE support. OSPKE only gets @@ -1012,21 +1010,9 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, */ WARN_ON_ONCE(pkey >= arch_max_pkey()); - /* Set the bits we need in PKRU: */ - if (init_val & PKEY_DISABLE_ACCESS) - new_pkru_bits |= PKR_AD_BIT; - if (init_val & PKEY_DISABLE_WRITE) - new_pkru_bits |= PKR_WD_BIT; - - /* Shift the bits in to the correct place in PKRU for pkey: */ - new_pkru_bits <<= pkey_shift; - - /* Get old PKRU and mask off any old bits in place: */ - old_pkru = read_pkru(); - old_pkru &= ~((PKR_AD_BIT|PKR_WD_BIT) << pkey_shift); - - /* Write old part along with new part: */ - write_pkru(old_pkru | new_pkru_bits); + pkru = read_pkru(); + pkru = update_pkey_val(pkru, pkey, init_val); + write_pkru(pkru); return 0; } diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index f5efb4007e74..d1dfe743e79f 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -208,3 +208,26 @@ static __init int setup_init_pkru(char *opt) return 1; } __setup("init_pkru=", setup_init_pkru); + +/* + * Replace disable bits for @pkey with values from @flags + * + * Kernel users use the same flags as user space: + * PKEY_DISABLE_ACCESS + * PKEY_DISABLE_WRITE + */ +u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags) +{ + int pkey_shift = pkey * PKR_BITS_PER_PKEY; + + /* Mask out old bit values */ + pk_reg &= ~(((1 << PKR_BITS_PER_PKEY) - 1) << pkey_shift); + + /* Or in new values */ + if (flags & PKEY_DISABLE_ACCESS) + pk_reg |= PKR_AD_BIT << pkey_shift; + if (flags & PKEY_DISABLE_WRITE) + pk_reg |= PKR_WD_BIT << pkey_shift; + + return pk_reg; +} From patchwork Fri Nov 6 23:29:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 321559 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D7881C2D0A3 for ; Fri, 6 Nov 2020 23:30:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 94D2520867 for ; Fri, 6 Nov 2020 23:30:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729091AbgKFXaP (ORCPT ); Fri, 6 Nov 2020 18:30:15 -0500 Received: from mga11.intel.com ([192.55.52.93]:33110 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728245AbgKFX3d (ORCPT ); Fri, 6 Nov 2020 18:29:33 -0500 IronPort-SDR: HF5sF+Tsgce4BkFLuf12xuR7o/elVm0mfEmZ5D1873rykJJaB/PaDTyjm9VXVyO4Nn4pFbkWp0 NkRTgyrcLfKQ== X-IronPort-AV: E=McAfee;i="6000,8403,9797"; a="166102923" X-IronPort-AV: E=Sophos;i="5.77,457,1596524400"; d="scan'208";a="166102923" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Nov 2020 15:29:32 -0800 IronPort-SDR: 7OXTxCuiKkfwSFExCxTUeymmQTDe7INfh86kVHcjc1y9qrZLrZ/kkJqkRhOnu/5xZMA39DYe9E Dqmpyit1/jwA== X-IronPort-AV: E=Sophos;i="5.77,457,1596524400"; d="scan'208";a="364338328" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Nov 2020 15:29:32 -0800 From: ira.weiny@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra , Dave Hansen Cc: Ira Weiny , x86@kernel.org, linux-kernel@vger.kernel.org, Andrew Morton , Fenghua Yu , linux-doc@vger.kernel.org, linux-nvdimm@lists.01.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Dan Williams , Greg KH Subject: [PATCH V3 06/10] x86/entry: Preserve PKRS MSR across exceptions Date: Fri, 6 Nov 2020 15:29:04 -0800 Message-Id: <20201106232908.364581-7-ira.weiny@intel.com> X-Mailer: git-send-email 2.28.0.rc0.12.gb6a658bd00c9 In-Reply-To: <20201106232908.364581-1-ira.weiny@intel.com> References: <20201106232908.364581-1-ira.weiny@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Ira Weiny The PKRS MSR is not managed by XSAVE. It is preserved through a context switch but this support leaves exception handling code open to memory accesses during exceptions. 2 possible places for preserving this state were considered, irqentry_state_t or pt_regs.[1] pt_regs was much more complicated and was potentially fraught with unintended consequences.[2] irqentry_state_t was already an object being used in the exception handling and is straightforward. It is also easy for any number of nested states to be tracked and eventually can be enhanced to store the reference counting required to support PKS through kmap reentry Preserve the current task's PKRS values in irqentry_state_t on exception entry and restoring them on exception exit. Each nested exception is further saved allowing for any number of levels of exception handling. Peter and Thomas both suggested parts of the patch, IDT and NMI respectively. [1] https://lore.kernel.org/lkml/CALCETrVe1i5JdyzD_BcctxQJn+ZE3T38EFPgjxN1F577M36g+w@mail.gmail.com/ [2] https://lore.kernel.org/lkml/874kpxx4jf.fsf@nanos.tec.linutronix.de/#t Cc: Dave Hansen Cc: Andy Lutomirski Suggested-by: Peter Zijlstra Suggested-by: Thomas Gleixner Signed-off-by: Ira Weiny --- Changes from V1 remove redundant irq_state->pkrs This value is only needed for the global tracking. So it should be included in that patch and not in this one. Changes from RFC V3 Standardize on 'irq_state' variable name Per Dave Hansen irq_save_pkrs() -> irq_save_set_pkrs() Rebased based on clean up patch by Thomas Gleixner This includes moving irq_[save_set|restore]_pkrs() to the core as well. --- arch/x86/entry/common.c | 38 +++++++++++++++++++++++++++++ arch/x86/include/asm/pkeys_common.h | 5 ++-- arch/x86/mm/pkeys.c | 2 +- include/linux/entry-common.h | 13 ++++++++++ kernel/entry/common.c | 14 +++++++++-- 5 files changed, 67 insertions(+), 5 deletions(-) diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c index 87dea56a15d2..1b6a419a6fac 100644 --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -19,6 +19,7 @@ #include #include #include +#include #ifdef CONFIG_XEN_PV #include @@ -209,6 +210,41 @@ SYSCALL_DEFINE0(ni_syscall) return -ENOSYS; } +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS +/* + * PKRS is a per-logical-processor MSR which overlays additional protection for + * pages which have been mapped with a protection key. + * + * The register is not maintained with XSAVE so we have to maintain the MSR + * value in software during context switch and exception handling. + * + * Context switches save the MSR in the task struct thus taking that value to + * other processors if necessary. + * + * To protect against exceptions having access to this memory we save the + * current running value and set the PKRS value for the duration of the + * exception. Thus preventing exception handlers from having the elevated + * access of the interrupted task. + */ +noinstr void irq_save_set_pkrs(irqentry_state_t *irq_state, u32 val) +{ + if (!cpu_feature_enabled(X86_FEATURE_PKS)) + return; + + irq_state->thread_pkrs = current->thread.saved_pkrs; + write_pkrs(INIT_PKRS_VALUE); +} + +noinstr void irq_restore_pkrs(irqentry_state_t *irq_state) +{ + if (!cpu_feature_enabled(X86_FEATURE_PKS)) + return; + + write_pkrs(irq_state->thread_pkrs); + current->thread.saved_pkrs = irq_state->thread_pkrs; +} +#endif /* CONFIG_ARCH_HAS_SUPERVISOR_PKEYS */ + #ifdef CONFIG_XEN_PV #ifndef CONFIG_PREEMPTION /* @@ -272,6 +308,8 @@ __visible noinstr void xen_pv_evtchn_do_upcall(struct pt_regs *regs) inhcall = get_and_clear_inhcall(); if (inhcall && !WARN_ON_ONCE(irq_state.exit_rcu)) { + /* Normally called by irqentry_exit, we must restore pkrs here */ + irq_restore_pkrs(&irq_state); instrumentation_begin(); irqentry_exit_cond_resched(); instrumentation_end(); diff --git a/arch/x86/include/asm/pkeys_common.h b/arch/x86/include/asm/pkeys_common.h index 801a75615209..11a95e6efd2d 100644 --- a/arch/x86/include/asm/pkeys_common.h +++ b/arch/x86/include/asm/pkeys_common.h @@ -27,9 +27,10 @@ PKR_AD_KEY(13) | PKR_AD_KEY(14) | PKR_AD_KEY(15)) #ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS -void write_pkrs(u32 new_pkrs); +DECLARE_PER_CPU(u32, pkrs_cache); +noinstr void write_pkrs(u32 new_pkrs); #else -static inline void write_pkrs(u32 new_pkrs) { } +static __always_inline void write_pkrs(u32 new_pkrs) { } #endif #endif /*_ASM_X86_PKEYS_INTERNAL_H */ diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index 76a62419c446..6892d4524868 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -248,7 +248,7 @@ DEFINE_PER_CPU(u32, pkrs_cache); * until all prior executions of WRPKRU have completed execution * and updated the PKRU register. */ -void write_pkrs(u32 new_pkrs) +noinstr void write_pkrs(u32 new_pkrs) { u32 *pkrs; diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h index 1193a70bcf1b..21eae007061d 100644 --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -348,6 +348,8 @@ void irqentry_exit_to_user_mode(struct pt_regs *regs); #ifndef irqentry_state /** * struct irqentry_state - Opaque object for exception state storage + * @thread_pkrs: Thread Supervisor Pkey value to be restored when exception is + * complete. * @exit_rcu: Used exclusively in the irqentry_*() calls; signals whether the * exit path has to invoke rcu_irq_exit(). * @lockdep: Used exclusively in the irqentry_nmi_*() calls; ensures that @@ -362,6 +364,9 @@ void irqentry_exit_to_user_mode(struct pt_regs *regs); * the maintenance of the irqentry_*() functions. */ typedef struct irqentry_state { +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS + u32 thread_pkrs; +#endif union { bool exit_rcu; bool lockdep; @@ -369,6 +374,14 @@ typedef struct irqentry_state { } irqentry_state_t; #endif +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS +noinstr void irq_save_set_pkrs(irqentry_state_t *irq_state, u32 val); +noinstr void irq_restore_pkrs(irqentry_state_t *irq_state); +#else +static __always_inline void irq_save_set_pkrs(irqentry_state_t *irq_state, u32 val) { } +static __always_inline void irq_restore_pkrs(irqentry_state_t *irq_state) { } +#endif + /** * irqentry_enter - Handle state tracking on ordinary interrupt entries * @regs: Pointer to pt_regs of interrupted context diff --git a/kernel/entry/common.c b/kernel/entry/common.c index 5ed9d45fb41b..3590f5bb5870 100644 --- a/kernel/entry/common.c +++ b/kernel/entry/common.c @@ -334,7 +334,7 @@ noinstr void irqentry_enter(struct pt_regs *regs, irqentry_state_t *irq_state) instrumentation_end(); irq_state->exit_rcu = true; - return; + goto done; } /* @@ -348,6 +348,9 @@ noinstr void irqentry_enter(struct pt_regs *regs, irqentry_state_t *irq_state) rcu_irq_enter_check_tick(); trace_hardirqs_off_finish(); instrumentation_end(); + +done: + irq_save_set_pkrs(irq_state, INIT_PKRS_VALUE); } void irqentry_exit_cond_resched(void) @@ -369,7 +372,12 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t *irq_state) /* Check whether this returns to user mode */ if (user_mode(regs)) { irqentry_exit_to_user_mode(regs); - } else if (!regs_irqs_disabled(regs)) { + return; + } + + irq_restore_pkrs(irq_state); + + if (!regs_irqs_disabled(regs)) { /* * If RCU was not watching on entry this needs to be done * carefully and needs the same ordering of lockdep/tracing @@ -415,10 +423,12 @@ void noinstr irqentry_nmi_enter(struct pt_regs *regs, irqentry_state_t *irq_stat trace_hardirqs_off_finish(); ftrace_nmi_enter(); instrumentation_end(); + irq_save_set_pkrs(irq_state, INIT_PKRS_VALUE); } void noinstr irqentry_nmi_exit(struct pt_regs *regs, irqentry_state_t *irq_state) { + irq_restore_pkrs(irq_state); instrumentation_begin(); ftrace_nmi_exit(); if (irq_state->lockdep) { From patchwork Fri Nov 6 23:29:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 321561 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 957E0C6379D for ; Fri, 6 Nov 2020 23:29:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 50E86221FA for ; Fri, 6 Nov 2020 23:29:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729277AbgKFX3j (ORCPT ); Fri, 6 Nov 2020 18:29:39 -0500 Received: from mga09.intel.com ([134.134.136.24]:6401 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728971AbgKFX3h (ORCPT ); Fri, 6 Nov 2020 18:29:37 -0500 IronPort-SDR: g3xz5ewhkGPgV2heYBE9ywq+ikQlSZd2ZYcVI63+AHxl9mE6KIIpxPhXHTGiEA55jXE9Qt03ss 3DTJSQfOQd3g== X-IronPort-AV: E=McAfee;i="6000,8403,9797"; a="169761084" X-IronPort-AV: E=Sophos;i="5.77,457,1596524400"; d="scan'208";a="169761084" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Nov 2020 15:29:36 -0800 IronPort-SDR: oJ4qP16GqiVLol5S/l+vImE4qLgOR+i9YlkkBN9/5avQg3oj0gbnAgdl4iA235XVX4UjkAEu+l zCDEturXBT3Q== X-IronPort-AV: E=Sophos;i="5.77,457,1596524400"; d="scan'208";a="337730609" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Nov 2020 15:29:35 -0800 From: ira.weiny@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra , Dave Hansen Cc: Fenghua Yu , Ira Weiny , x86@kernel.org, linux-kernel@vger.kernel.org, Andrew Morton , linux-doc@vger.kernel.org, linux-nvdimm@lists.01.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Dan Williams , Greg KH Subject: [PATCH V3 08/10] x86/pks: Add PKS kernel API Date: Fri, 6 Nov 2020 15:29:06 -0800 Message-Id: <20201106232908.364581-9-ira.weiny@intel.com> X-Mailer: git-send-email 2.28.0.rc0.12.gb6a658bd00c9 In-Reply-To: <20201106232908.364581-1-ira.weiny@intel.com> References: <20201106232908.364581-1-ira.weiny@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Fenghua Yu PKS allows kernel users to define domains of page mappings which have additional protections beyond the paging protections. Add an API to allocate, use, and free a protection key which identifies such a domain. Export 5 new symbols pks_key_alloc(), pks_mknoaccess(), pks_mkread(), pks_mkrdwr(), and pks_key_free(). Add 2 new macros; PAGE_KERNEL_PKEY(key) and _PAGE_PKEY(pkey). Update the protection key documentation to cover pkeys on supervisor pages. Co-developed-by: Ira Weiny Signed-off-by: Ira Weiny Signed-off-by: Fenghua Yu --- Changes from V2 From Greg KH Replace all WARN_ON_ONCE() uses with pr_err() From Dan Williams Add __must_check to pks_key_alloc() to help ensure users are using the API correctly Changes from V1 Per Dave Hansen Add flags to pks_key_alloc() to help future proof the interface if/when the key space is exhausted. Changes from RFC V3 Per Dave Hansen Put WARN_ON_ONCE in pks_key_free() s/pks_mknoaccess/pks_mk_noaccess/ s/pks_mkread/pks_mk_readonly/ s/pks_mkrdwr/pks_mk_readwrite/ Change return pks_key_alloc() to EOPNOTSUPP when not supported or configured Per Peter Zijlstra Remove unneeded preempt disable/enable --- Documentation/core-api/protection-keys.rst | 102 +++++++++++++--- arch/x86/include/asm/pgtable_types.h | 12 ++ arch/x86/include/asm/pkeys.h | 11 ++ arch/x86/include/asm/pkeys_common.h | 4 + arch/x86/mm/pkeys.c | 128 +++++++++++++++++++++ include/linux/pgtable.h | 4 + include/linux/pkeys.h | 24 ++++ 7 files changed, 267 insertions(+), 18 deletions(-) diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst index ec575e72d0b2..c4e6c480562f 100644 --- a/Documentation/core-api/protection-keys.rst +++ b/Documentation/core-api/protection-keys.rst @@ -4,25 +4,33 @@ Memory Protection Keys ====================== -Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature -which is found on Intel's Skylake (and later) "Scalable Processor" -Server CPUs. It will be available in future non-server Intel parts -and future AMD processors. - -For anyone wishing to test or use this feature, it is available in -Amazon's EC2 C5 instances and is known to work there using an Ubuntu -17.04 image. - Memory Protection Keys provides a mechanism for enforcing page-based protections, but without requiring modification of the page tables -when an application changes protection domains. It works by -dedicating 4 previously ignored bits in each page table entry to a -"protection key", giving 16 possible keys. +when an application changes protection domains. + +PKeys Userspace (PKU) is a feature which is found on Intel's Skylake "Scalable +Processor" Server CPUs and later. And It will be available in future +non-server Intel parts and future AMD processors. + +Future Intel processors will support Protection Keys for Supervisor pages +(PKS). + +For anyone wishing to test or use user space pkeys, it is available in Amazon's +EC2 C5 instances and is known to work there using an Ubuntu 17.04 image. + +pkeys work by dedicating 4 previously Reserved bits in each page table entry to +a "protection key", giving 16 possible keys. User and Supervisor pages are +treated separately. + +Protections for each page are controlled with per CPU registers for each type +of page User and Supervisor. Each of these 32 bit register stores two separate +bits (Access Disable and Write Disable) for each key. -There is also a new user-accessible register (PKRU) with two separate -bits (Access Disable and Write Disable) for each key. Being a CPU -register, PKRU is inherently thread-local, potentially giving each -thread a different set of protections from every other thread. +For Userspace the register is user-accessible (rdpkru/wrpkru). For +Supervisor, the register (MSR_IA32_PKRS) is accessible only to the kernel. + +Being a CPU register, pkeys are inherently thread-local, potentially giving +each thread an independent set of protections from every other thread. There are two new instructions (RDPKRU/WRPKRU) for reading and writing to the new register. The feature is only available in 64-bit mode, @@ -30,8 +38,11 @@ even though there is theoretically space in the PAE PTEs. These permissions are enforced on data access only and have no effect on instruction fetches. -Syscalls -======== +For kernel space rdmsr/wrmsr are used to access the kernel MSRs. + + +Syscalls for user space keys +============================ There are 3 system calls which directly interact with pkeys:: @@ -98,3 +109,58 @@ with a read():: The kernel will send a SIGSEGV in both cases, but si_code will be set to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when the plain mprotect() permissions are violated. + + +Kernel API for PKS support +========================== + +The following interface is used to allocate, use, and free a pkey which defines +a 'protection domain' within the kernel. Setting a pkey value in a supervisor +mapping adds that mapping to the protection domain. + + int pks_key_alloc(const char * const pkey_user, int flags); + #define PAGE_KERNEL_PKEY(pkey) + #define _PAGE_KEY(pkey) + void pks_mk_noaccess(int pkey); + void pks_mk_readonly(int pkey); + void pks_mk_readwrite(int pkey); + void pks_key_free(int pkey); + +pks_key_alloc() allocates keys dynamically to allow better use of the limited +key space. 'flags' alter the allocation based on the users need. Currently +they can request an exclusive key. + +Callers of pks_key_alloc() _must_ be prepared for it to fail and take +appropriate action. This is due mainly to the fact that PKS may not be +available on all arch's. Failure to check the return of pks_key_alloc() and +using any of the rest of the API is undefined. + +Kernel users must set the PTE permissions in the page table entries for the +mappings they want to protect. This can be done with PAGE_KERNEL_PKEY() or +_PAGE_KEY(). + +The pks_mk*() family of calls allows kernel users the ability to change the +protections for the domain identified by the pkey specified. 3 states are +available pks_mk_noaccess(), pks_mk_readonly(), and pks_mk_readwrite() which +set the access to none, read, and read/write respectively. + +Finally, pks_key_free() allows a user to return the key to the allocator for +use by others. + +The interface maintains pks_mk_noaccess() (Access Disabled (AD=1)) for all keys +not currently allocated. Therefore, the user can depend on access being +disabled when pks_key_alloc() returns a key and the user should remove mappings +from the domain (remove the pkey from the PTE) prior to calling pks_key_free(). + +It should be noted that the underlying WRMSR(MSR_IA32_PKRS) is not serializing +but still maintains ordering properties similar to WRPKRU. Thus it is safe to +immediately use a mapping when the pks_mk*() functions returns. + +The current SDM section on PKRS needs updating but should be the same as that +of WRPKRU. So to quote from the WRPKRU text: + + WRPKRU will never execute transiently. Memory accesses + affected by PKRU register will not execute (even transiently) + until all prior executions of WRPKRU have completed execution + and updated the PKRU register. + diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index 816b31c68550..c9fdfbdcbbfb 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -73,6 +73,12 @@ _PAGE_PKEY_BIT2 | \ _PAGE_PKEY_BIT3) +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS +#define _PAGE_PKEY(pkey) (_AT(pteval_t, pkey) << _PAGE_BIT_PKEY_BIT0) +#else +#define _PAGE_PKEY(pkey) (_AT(pteval_t, 0)) +#endif + #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE) #define _PAGE_KNL_ERRATUM_MASK (_PAGE_DIRTY | _PAGE_ACCESSED) #else @@ -229,6 +235,12 @@ enum page_cache_mode { #define PAGE_KERNEL_IO __pgprot_mask(__PAGE_KERNEL_IO) #define PAGE_KERNEL_IO_NOCACHE __pgprot_mask(__PAGE_KERNEL_IO_NOCACHE) +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS +#define PAGE_KERNEL_PKEY(pkey) __pgprot_mask(__PAGE_KERNEL | _PAGE_PKEY(pkey)) +#else +#define PAGE_KERNEL_PKEY(pkey) PAGE_KERNEL +#endif + #endif /* __ASSEMBLY__ */ /* xwr */ diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h index 4526245b03e5..990fe9c4787c 100644 --- a/arch/x86/include/asm/pkeys.h +++ b/arch/x86/include/asm/pkeys.h @@ -3,6 +3,7 @@ #define _ASM_X86_PKEYS_H #include +#include #define ARCH_DEFAULT_PKEY 0 @@ -138,4 +139,14 @@ static inline int vma_pkey(struct vm_area_struct *vma) u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags); +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS +__must_check int pks_key_alloc(const char *const pkey_user, int flags); +void pks_key_free(int pkey); + +void pks_mk_noaccess(int pkey); +void pks_mk_readonly(int pkey); +void pks_mk_readwrite(int pkey); + +#endif /* CONFIG_ARCH_HAS_SUPERVISOR_PKEYS */ + #endif /*_ASM_X86_PKEYS_H */ diff --git a/arch/x86/include/asm/pkeys_common.h b/arch/x86/include/asm/pkeys_common.h index 11a95e6efd2d..f921c58793f9 100644 --- a/arch/x86/include/asm/pkeys_common.h +++ b/arch/x86/include/asm/pkeys_common.h @@ -26,6 +26,10 @@ PKR_AD_KEY(10) | PKR_AD_KEY(11) | PKR_AD_KEY(12) | \ PKR_AD_KEY(13) | PKR_AD_KEY(14) | PKR_AD_KEY(15)) +/* PKS supports 16 keys. Key 0 is reserved for the kernel. */ +#define PKS_KERN_DEFAULT_KEY 0 +#define PKS_NUM_KEYS 16 + #ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS DECLARE_PER_CPU(u32, pkrs_cache); noinstr void write_pkrs(u32 new_pkrs); diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index 6892d4524868..57718716cc70 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -3,6 +3,9 @@ * Intel Memory Protection Keys management * Copyright (c) 2015, Intel Corporation. */ +#undef pr_fmt +#define pr_fmt(fmt) "x86/pkeys: " fmt + #include /* debugfs_create_u32() */ #include /* mm_struct, vma, etc... */ #include /* PKEY_* */ @@ -231,6 +234,7 @@ u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags) return pk_reg; } +EXPORT_SYMBOL_GPL(update_pkey_val); DEFINE_PER_CPU(u32, pkrs_cache); @@ -262,3 +266,127 @@ noinstr void write_pkrs(u32 new_pkrs) } put_cpu_ptr(pkrs); } +EXPORT_SYMBOL_GPL(write_pkrs); + +/** + * Do not call this directly, see pks_mk*() below. + * + * @pkey: Key for the domain to change + * @protection: protection bits to be used + * + * Protection utilizes the same protection bits specified for User pkeys + * PKEY_DISABLE_ACCESS + * PKEY_DISABLE_WRITE + * + */ +static inline void pks_update_protection(int pkey, unsigned long protection) +{ + current->thread.saved_pkrs = update_pkey_val(current->thread.saved_pkrs, + pkey, protection); + write_pkrs(current->thread.saved_pkrs); +} + +/** + * PKS access control functions + * + * Change the access of the domain specified by the pkey. These are global + * updates. They only affects the current running thread. It is undefined and + * a bug for users to call this without having allocated a pkey and using it as + * pkey here. + * + * pks_mk_noaccess() + * Disable all access to the domain + * pks_mk_readonly() + * Make the domain Read only + * pks_mk_readwrite() + * Make the domain Read/Write + * + * @pkey the pkey for which the access should change. + * + */ +void pks_mk_noaccess(int pkey) +{ + pks_update_protection(pkey, PKEY_DISABLE_ACCESS); +} +EXPORT_SYMBOL_GPL(pks_mk_noaccess); + +void pks_mk_readonly(int pkey) +{ + pks_update_protection(pkey, PKEY_DISABLE_WRITE); +} +EXPORT_SYMBOL_GPL(pks_mk_readonly); + +void pks_mk_readwrite(int pkey) +{ + pks_update_protection(pkey, 0); +} +EXPORT_SYMBOL_GPL(pks_mk_readwrite); + +static const char pks_key_user0[] = "kernel"; + +/* Store names of allocated keys for debug. Key 0 is reserved for the kernel. */ +static const char *pks_key_users[PKS_NUM_KEYS] = { + pks_key_user0 +}; + +/* + * Each key is represented by a bit. Bit 0 is set for key 0 and reserved for + * its use. We use ulong for the bit operations but only 16 bits are used. + */ +static unsigned long pks_key_allocation_map = 1 << PKS_KERN_DEFAULT_KEY; + +/* + * pks_key_alloc - Allocate a PKS key + * @pkey_user: String stored for debugging of key exhaustion. The caller is + * responsible to maintain this memory until pks_key_free(). + * @flags: Flags to modify behavior: see pks_alloc_flags + * + * Returns: pkey if success + * -EOPNOTSUPP if pks is not supported or not enabled + * -ENOSPC if no keys are available (even for sharing) + */ +__must_check int pks_key_alloc(const char * const pkey_user, int flags) +{ + int nr; + + if (!cpu_feature_enabled(X86_FEATURE_PKS)) + return -EOPNOTSUPP; + + while (1) { + nr = find_first_zero_bit(&pks_key_allocation_map, PKS_NUM_KEYS); + if (nr >= PKS_NUM_KEYS) { + pr_info("Cannot allocate supervisor key for %s.\n", + pkey_user); + return -ENOSPC; + } + if (!test_and_set_bit_lock(nr, &pks_key_allocation_map)) + break; + } + + /* for debugging key exhaustion */ + pks_key_users[nr] = pkey_user; + + return nr; +} +EXPORT_SYMBOL_GPL(pks_key_alloc); + +/* + * pks_key_free - Free a previously allocate PKS key + * @pkey: Key to be free'ed + */ +void pks_key_free(int pkey) +{ + if (!cpu_feature_enabled(X86_FEATURE_PKS)) + return; + + if (pkey >= PKS_NUM_KEYS || pkey <= PKS_KERN_DEFAULT_KEY) { + pr_err("Invalid PKey value: %d\n", pkey); + return; + } + + /* Restore to default of no access */ + pks_mk_noaccess(pkey); + pks_key_users[pkey] = NULL; + __clear_bit(pkey, &pks_key_allocation_map); +} +EXPORT_SYMBOL_GPL(pks_key_free); diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 38c33eabea89..cd72d73e8e1c 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1402,6 +1402,10 @@ static inline bool arch_has_pfn_modify_check(void) # define PAGE_KERNEL_EXEC PAGE_KERNEL #endif +#ifndef PAGE_KERNEL_PKEY +#define PAGE_KERNEL_PKEY(pkey) PAGE_KERNEL +#endif + /* * Page Table Modification bits for pgtbl_mod_mask. * diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h index 2955ba976048..bed0e293f13b 100644 --- a/include/linux/pkeys.h +++ b/include/linux/pkeys.h @@ -50,4 +50,28 @@ static inline void copy_init_pkru_to_fpregs(void) #endif /* ! CONFIG_ARCH_HAS_PKEYS */ +#define PKS_FLAG_EXCLUSIVE 0x00 + +#ifndef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS +static inline __must_check int pks_key_alloc(const char * const pkey_user, int flags) +{ + return -EOPNOTSUPP; +} +static inline void pks_key_free(int pkey) +{ +} +static inline void pks_mk_noaccess(int pkey) +{ + pr_err("%s is not valid without PKS support\n", __func__); +} +static inline void pks_mk_readonly(int pkey) +{ + pr_err("%s is not valid without PKS support\n", __func__); +} +static inline void pks_mk_readwrite(int pkey) +{ + pr_err("%s is not valid without PKS support\n", __func__); +} +#endif /* ! CONFIG_ARCH_HAS_SUPERVISOR_PKEYS */ + #endif /* _LINUX_PKEYS_H */ From patchwork Fri Nov 6 23:29:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 321560 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7BC89C55ABD for ; Fri, 6 Nov 2020 23:30:14 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3534320867 for ; Fri, 6 Nov 2020 23:30:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729271AbgKFX3i (ORCPT ); Fri, 6 Nov 2020 18:29:38 -0500 Received: from mga11.intel.com ([192.55.52.93]:33110 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728270AbgKFX3i (ORCPT ); Fri, 6 Nov 2020 18:29:38 -0500 IronPort-SDR: 3VLyvhvDiNTVAXyPdpnx72z1LGlQFa9PnZDpMhB5Yzh5YHXwnNIKrDoYBZK3OrnHxPHTahq9o8 zmSNHVP0fgvA== X-IronPort-AV: E=McAfee;i="6000,8403,9797"; a="166102931" X-IronPort-AV: E=Sophos;i="5.77,457,1596524400"; d="scan'208";a="166102931" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Nov 2020 15:29:37 -0800 IronPort-SDR: k4byyvrseYAeoBugGywJ+QOjnx9bAxJF6P0x70TOG2wyk/Qhysfn8ADhVz292RTBJTIDkyng9d GrFVwhEgV3Mw== X-IronPort-AV: E=Sophos;i="5.77,457,1596524400"; d="scan'208";a="307352972" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Nov 2020 15:29:37 -0800 From: ira.weiny@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra , Dave Hansen Cc: Fenghua Yu , Ira Weiny , x86@kernel.org, linux-kernel@vger.kernel.org, Andrew Morton , linux-doc@vger.kernel.org, linux-nvdimm@lists.01.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Dan Williams , Greg KH Subject: [PATCH V3 09/10] x86/pks: Enable Protection Keys Supervisor (PKS) Date: Fri, 6 Nov 2020 15:29:07 -0800 Message-Id: <20201106232908.364581-10-ira.weiny@intel.com> X-Mailer: git-send-email 2.28.0.rc0.12.gb6a658bd00c9 In-Reply-To: <20201106232908.364581-1-ira.weiny@intel.com> References: <20201106232908.364581-1-ira.weiny@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Fenghua Yu Protection Keys for Supervisor pages (PKS) enables fast, hardware thread specific, manipulation of permission restrictions on supervisor page mappings. It uses the same mechanism of Protection Keys as those on User mappings but applies that mechanism to supervisor mappings using a supervisor specific MSR. Kernel users can thus defines 'domains' of page mappings which have an extra level of protection beyond those specified in the supervisor page table entries. Enable PKS on supported CPUS. Co-developed-by: Ira Weiny Signed-off-by: Ira Weiny Signed-off-by: Fenghua Yu --- Changes from V2 From Thomas: Make this patch last so PKS is not enabled until all the PKS mechanisms are in place. Specifically: 1) Modify setup_pks() to call write_pkrs() to properly set up the initial value when enabled. 2) Split this patch into two. 1) a precursor patch with the required defines/config options and 2) this patch which actually enables feature on CPUs which support it. Changes since RFC V3 Per Dave Hansen Update comment Add X86_FEATURE_PKS to disabled-features.h Rebase based on latest TIP tree --- arch/x86/include/asm/disabled-features.h | 6 +++++- arch/x86/kernel/cpu/common.c | 15 +++++++++++++++ 2 files changed, 20 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index 164587177152..82540f0c5b6c 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -44,7 +44,11 @@ # define DISABLE_OSPKE (1<<(X86_FEATURE_OSPKE & 31)) #endif /* CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS */ -#define DISABLE_PKS (1<<(X86_FEATURE_PKS & 31)) +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS +# define DISABLE_PKS 0 +#else +# define DISABLE_PKS (1<<(X86_FEATURE_PKS & 31)) +#endif #ifdef CONFIG_X86_5LEVEL # define DISABLE_LA57 0 diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 35ad8480c464..f8929a557d72 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -58,6 +58,7 @@ #include #include #include +#include #include "cpu.h" @@ -1494,6 +1495,19 @@ static void validate_apic_and_package_id(struct cpuinfo_x86 *c) #endif } +/* + * PKS is independent of PKU and either or both may be supported on a CPU. + * Configure PKS if the CPU supports the feature. + */ +static void setup_pks(void) +{ + if (!cpu_feature_enabled(X86_FEATURE_PKS)) + return; + + write_pkrs(INIT_PKRS_VALUE); + cr4_set_bits(X86_CR4_PKS); +} + /* * This does the hard work of actually picking apart the CPU stuff... */ @@ -1591,6 +1605,7 @@ static void identify_cpu(struct cpuinfo_x86 *c) x86_init_rdrand(c); setup_pku(c); + setup_pks(); /* * Clear/Set all flags overridden by options, need do it