From patchwork Mon Sep 13 20:01:21 2021
X-Patchwork-Submitter: Sohil Mehta
X-Patchwork-Id: 510003
From: Sohil Mehta
To: x86@kernel.org
Cc: Tony Luck, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    "H. Peter Anvin", Andy Lutomirski, Jens Axboe, Christian Brauner,
    Peter Zijlstra, Shuah Khan, Arnd Bergmann, Jonathan Corbet, Ashok Raj,
    Jacob Pan, Gayatri Kammela, Zeng Guang, Dan Williams, Randy E Witt,
    Ravi V Shankar, Ramesh Thomas, linux-api@vger.kernel.org,
    linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-kselftest@vger.kernel.org
Subject: [RFC PATCH 02/13] Documentation/x86: Add documentation for User Interrupts
Date: Mon, 13 Sep 2021 13:01:21 -0700
Message-Id: <20210913200132.3396598-3-sohil.mehta@intel.com>
In-Reply-To: <20210913200132.3396598-1-sohil.mehta@intel.com>
References: <20210913200132.3396598-1-sohil.mehta@intel.com>

For now, include just the hardware and software architecture summary.

Signed-off-by: Sohil Mehta
---
 Documentation/x86/index.rst           |   1 +
 Documentation/x86/user-interrupts.rst | 107 ++++++++++++++++++++++++++
 2 files changed, 108 insertions(+)
 create mode 100644 Documentation/x86/user-interrupts.rst

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 383048396336..0d416b02131b 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -31,6 +31,7 @@ x86-specific Documentation
    tsx_async_abort
    buslock
    usb-legacy-support
+   user-interrupts
    i386/index
    x86_64/index
    sva
diff --git a/Documentation/x86/user-interrupts.rst b/Documentation/x86/user-interrupts.rst
new file mode 100644
index 000000000000..bc90251d6c2e
--- /dev/null
+++ b/Documentation/x86/user-interrupts.rst
@@ -0,0 +1,107 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================
+User Interrupts (UINTR)
+=======================
+
+Overview
+========
+User Interrupts provide a low-latency event delivery and inter-process
+communication mechanism. These events can be delivered directly to userspace
+without a transition through the kernel.
+
+In the User Interrupts architecture, a receiver is always expected to be a
+userspace task. However, a user interrupt can be sent by another userspace
+task, the kernel, or an external source (like a device). The feature that
+allows another task to send an interrupt is referred to as User IPI.
+
+Hardware Summary
+================
+User Interrupts is a posted interrupt delivery mechanism. The interrupts are
+first posted to a memory location and then delivered to the receiver when it
+is running with CPL=3.
+
+Kernel managed architectural data structures
+--------------------------------------------
+UPID: User Posted Interrupt Descriptor - Holds receiver interrupt vector
+information and notification state (like an ongoing notification, suppressed
+notifications).
+
+UITT: User Interrupt Target Table - Stores UPID pointer and vector
+information for interrupt routing on the sender side. Referenced by the
+SENDUIPI instruction.
+
+The interrupt state of each task is referenced via MSRs which are saved and
+restored by the kernel during context switch.
+
+Instructions
+------------
+senduipi - Send a user IPI to a target task based on the UITT index.
+
+clui - Mask user interrupts by clearing UIF (User Interrupt Flag).
+
+stui - Unmask user interrupts by setting UIF.
+
+testui - Test the current value of UIF.
+
+uiret - Return from a user interrupt handler.
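+
+A minimal masking sketch (assuming a compiler that exposes these
+instructions as intrinsics, for example GCC's -muintr with
+x86gprintrin.h; illustrative only, not part of the kernel ABI)::
+
+	unsigned char uif = _testui();	/* testui: read current UIF */
+	_clui();			/* clui: mask user interrupts */
+	/* ... no user interrupt handler can run in this window ... */
+	if (uif)
+		_stui();		/* stui: unmask again */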
+
+User IPI
+--------
+When a User IPI sender executes 'senduipi <uipi_index>', the hardware looks
+up the UITT entry referenced by the index and posts the interrupt vector
+into the receiver's UPID.
+
+If the receiver is running, the sender's cpu sends a physical IPI to the
+receiver's cpu. On the receiver side this IPI is detected as a User
+Interrupt. The User Interrupt handler for the receiver is invoked and the
+vector number is pushed onto the stack.
+
+Upon execution of 'uiret' in the interrupt handler, control is transferred
+back to the instruction that was interrupted.
+
+Refer to the Intel Software Developer's Manual for more details.
+
+Software Architecture
+=====================
+User Interrupts (Uintr) is an opt-in feature (unlike signals). Applications
+wanting to use Uintr are expected to register themselves with the kernel
+using the Uintr related system calls. A Uintr receiver is always a userspace
+task. A Uintr sender can be another userspace task, the kernel or a device.
+
+1) A receiver can register/unregister an interrupt handler using the Uintr
+receiver related syscalls.
+		uintr_register_handler(handler, flags)
+
+2) A syscall also allows a receiver to register a vector and create a user
+interrupt file descriptor - uintr_fd.
+		uintr_fd = uintr_create_fd(vector, flags)
+
+Uintr can be useful in some of the scenarios where eventfd or signals are
+used for frequent userspace event notifications. The semantics of uintr_fd
+are somewhat similar to an eventfd() or the write end of a pipe.
+
+3) Any sender with access to uintr_fd can use it to deliver events (in this
+case - interrupts) to a receiver. A sender task can manage its connection
+with the receiver using the sender related syscalls based on uintr_fd.
+		uipi_index = uintr_register_sender(uintr_fd, flags)
+
+Using an FD abstraction provides a secure mechanism to connect with a
+receiver. The FD sharing and isolation mechanisms put in place by the
+kernel would extend to Uintr as well.
+
+4a) After the initial setup, a sender task can use the SENDUIPI instruction
+to generate user IPIs without any kernel intervention.
+		SENDUIPI <uipi_index>
+
+If the receiver is running (CPL=3), then the user interrupt is delivered
+directly without a kernel transition. If the receiver isn't running the
+interrupt is delivered when the receiver gets context switched back. If the
+receiver is blocked in the kernel, the user interrupt is delivered to the
+kernel which then unblocks the intended receiver to deliver the interrupt.
+
+4b) If the sender is the kernel or a device, the uintr_fd can be passed on
+to the related kernel entity to allow it to set up a connection and then
+generate a user interrupt for event delivery.
+
+Refer to the Uintr man-pages for details on the syscall interface. A
+combined usage sketch follows below.
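+
+Example
+=======
+A hypothetical end-to-end flow, as a sketch only (the handler attribute
+and the _senduipi() intrinsic assume compiler support such as GCC's
+-muintr; the syscall wrappers are the pseudo-wrappers named above, not a
+defined libc interface)::
+
+	/* Receiver */
+	void __attribute__((interrupt, target("uintr")))
+	ui_handler(struct __uintr_frame *frame, unsigned long long vector)
+	{
+		/* handle the event; the compiler emits uiret on return */
+	}
+
+	uintr_register_handler(ui_handler, 0);
+	uintr_fd = uintr_create_fd(vector, 0);
+	/* hand uintr_fd to the sender, e.g. across fork() or SCM_RIGHTS */
+
+	/* Sender */
+	uipi_index = uintr_register_sender(uintr_fd, 0);
+	_senduipi(uipi_index);		/* SENDUIPI <uipi_index> */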
From patchwork Mon Sep 13 20:01:23 2021
X-Patchwork-Submitter: Sohil Mehta
X-Patchwork-Id: 510002
From: Sohil Mehta
To: x86@kernel.org
Subject: [RFC PATCH 04/13] x86/fpu/xstate: Enumerate User Interrupts supervisor state
Date: Mon, 13 Sep 2021 13:01:23 -0700
Message-Id: <20210913200132.3396598-5-sohil.mehta@intel.com>
In-Reply-To: <20210913200132.3396598-1-sohil.mehta@intel.com>
References: <20210913200132.3396598-1-sohil.mehta@intel.com>

Enable xstate supervisor support for User Interrupts by default. The user
interrupt state for a task consists of the MSR state and the User Interrupt
Flag (UIF) value. XSAVES and XRSTORS handle saving and restoring both of
these states.
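For reference, the 48-byte claim in the comment on the structure added
below can be cross-checked at build time; a minimal sketch (illustration
only, not part of this diff):

	/* 8+8+4+1+1+1+1+8+8+8 == 48 bytes; static_assert() is available
	 * via <linux/build_bug.h> */
	static_assert(sizeof(struct uintr_state) == 48,
		      "XFEATURE_UINTR component size mismatch");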
Signed-off-by: Sohil Mehta
---
 arch/x86/include/asm/fpu/types.h  | 20 +++++++++++++++++++-
 arch/x86/include/asm/fpu/xstate.h |  3 ++-
 arch/x86/kernel/cpu/common.c      |  6 ++++++
 arch/x86/kernel/fpu/xstate.c      | 20 +++++++++++++++++---
 4 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index f5a38a5f3ae1..b614f1416bea 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -118,7 +118,7 @@ enum xfeature {
 	XFEATURE_RSRVD_COMP_11,
 	XFEATURE_RSRVD_COMP_12,
 	XFEATURE_RSRVD_COMP_13,
-	XFEATURE_RSRVD_COMP_14,
+	XFEATURE_UINTR,
 	XFEATURE_LBR,
 
 	XFEATURE_MAX,
@@ -135,6 +135,7 @@ enum xfeature {
 #define XFEATURE_MASK_PT	(1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR)
 #define XFEATURE_MASK_PKRU	(1 << XFEATURE_PKRU)
 #define XFEATURE_MASK_PASID	(1 << XFEATURE_PASID)
+#define XFEATURE_MASK_UINTR	(1 << XFEATURE_UINTR)
 #define XFEATURE_MASK_LBR	(1 << XFEATURE_LBR)
 
 #define XFEATURE_MASK_FPSSE	(XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
@@ -237,6 +238,23 @@ struct pkru_state {
 	u32			pad;
 } __packed;
 
+/*
+ * State component 14 is supervisor state used for User Interrupts state.
+ * The size of this state is 48 bytes.
+ */
+struct uintr_state {
+	u64 handler;
+	u64 stack_adjust;
+	u32 uitt_size;
+	u8  uinv;
+	u8  pad1;
+	u8  pad2;
+	u8  uif_pad3;		/* bit 7 - UIF, bits 6:0 - reserved */
+	u64 upid_addr;
+	u64 uirr;
+	u64 uitt_addr;
+} __packed;
+
 /*
  * State component 15: Architectural LBR configuration state.
  * The size of Arch LBR state depends on the number of LBRs (lbr_depth).
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 109dfcc75299..4dd4e83c0c9d 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -44,7 +44,8 @@
 	(XFEATURE_MASK_USER_SUPPORTED & ~XFEATURE_MASK_PKRU)
 
 /* All currently supported supervisor features */
-#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID)
+#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID | \
+					    XFEATURE_MASK_UINTR)
 
 /*
  * A supervisor state component may not always contain valuable information,
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 55fee930b6d1..3a0a3f5cfe0f 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -334,6 +334,12 @@ static __always_inline void setup_uintr(struct cpuinfo_x86 *c)
 	if (!cpu_has(c, X86_FEATURE_UINTR))
 		goto disable_uintr;
 
+	/* Confirm XSAVE support for UINTR is present. */
+	if (!cpu_has_xfeatures(XFEATURE_MASK_UINTR, NULL)) {
+		pr_info_once("x86: User Interrupts (UINTR) not enabled. XSAVE support for UINTR is missing.\n");
+		goto clear_uintr_cap;
+	}
+
 	/*
 	 * User Interrupts currently doesn't support PTI. For processors that
 	 * support User interrupts PTI in auto mode will default to off. Need
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c8def1b7f8fb..ab19403effb0 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -38,6 +38,10 @@ static const char *xfeature_names[] =
 	"Processor Trace (unused)",
 	"Protection Keys User registers",
 	"PASID state",
+	"unknown xstate feature 11",
+	"unknown xstate feature 12",
+	"unknown xstate feature 13",
+	"User Interrupts registers",
 	"unknown xstate feature",
 };
 
@@ -53,6 +57,10 @@ static short xsave_cpuid_features[] __initdata = {
 	X86_FEATURE_INTEL_PT,
 	X86_FEATURE_PKU,
 	X86_FEATURE_ENQCMD,
+	-1, /* Unknown 11 */
+	-1, /* Unknown 12 */
+	-1, /* Unknown 13 */
+	X86_FEATURE_UINTR,
 };
 
 /*
@@ -236,6 +244,7 @@ static void __init print_xstate_features(void)
 	print_xstate_feature(XFEATURE_MASK_Hi16_ZMM);
 	print_xstate_feature(XFEATURE_MASK_PKRU);
 	print_xstate_feature(XFEATURE_MASK_PASID);
+	print_xstate_feature(XFEATURE_MASK_UINTR);
 }
 
 /*
@@ -372,7 +381,8 @@ static void __init print_xstate_offset_size(void)
 			 XFEATURE_MASK_PKRU | \
 			 XFEATURE_MASK_BNDREGS | \
 			 XFEATURE_MASK_BNDCSR | \
-			 XFEATURE_MASK_PASID)
+			 XFEATURE_MASK_PASID | \
+			 XFEATURE_MASK_UINTR)
 
 /*
  * setup the xstate image representing the init state
@@ -532,6 +542,7 @@ static void check_xstate_against_struct(int nr)
 	XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM, struct avx_512_hi16_state);
 	XCHECK_SZ(sz, nr, XFEATURE_PKRU, struct pkru_state);
 	XCHECK_SZ(sz, nr, XFEATURE_PASID, struct ia32_pasid_state);
+	XCHECK_SZ(sz, nr, XFEATURE_UINTR, struct uintr_state);
 
 	/*
 	 * Make *SURE* to add any feature numbers in below if
@@ -539,9 +550,12 @@ static void check_xstate_against_struct(int nr)
 	 * numbers.
 	 */
 	if ((nr < XFEATURE_YMM) ||
-	    (nr >= XFEATURE_MAX) ||
 	    (nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) ||
-	    ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_LBR))) {
+	    (nr == XFEATURE_RSRVD_COMP_11) ||
+	    (nr == XFEATURE_RSRVD_COMP_12) ||
+	    (nr == XFEATURE_RSRVD_COMP_13) ||
+	    (nr == XFEATURE_LBR) ||
+	    (nr >= XFEATURE_MAX)) {
 		WARN_ONCE(1, "no structure for xstate: %d\n", nr);
 		XSTATE_WARN_ON(1);
 	}

From patchwork Mon Sep 13 20:01:25 2021
X-Patchwork-Submitter: Sohil Mehta
X-Patchwork-Id: 510001
From: Sohil Mehta
To: x86@kernel.org
Subject: [RFC PATCH 06/13] x86/uintr: Introduce uintr receiver syscalls
Date: Mon, 13 Sep 2021 13:01:25 -0700
Message-Id: <20210913200132.3396598-7-sohil.mehta@intel.com>
In-Reply-To: <20210913200132.3396598-1-sohil.mehta@intel.com>
References: <20210913200132.3396598-1-sohil.mehta@intel.com>

Any application that wants to receive a user interrupt needs to register
an interrupt handler with the kernel. Add a registration syscall that sets
up the interrupt handler and the related kernel structures for the task
that makes this syscall. Only one interrupt handler per task can be
registered with the kernel/hardware. Each task has its own private
interrupt vector space of 64 vectors. The vector registration and the
related FD management are covered later.

Also add an unregister syscall to let a task unregister the interrupt
handler.

The UPID for each receiver task needs to be updated whenever a task gets
context switched or moves from one cpu to another. This will also be
covered later. The system calls haven't been wired up yet, so no real harm
is done if we don't update the UPID right now.
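For illustration, the intended userspace contract is roughly the
following (a sketch only; since the syscalls are not wired up yet, the
__NR_uintr_register_handler number below is a placeholder, not an
assigned syscall number):

	#include <errno.h>
	#include <unistd.h>
	#include <sys/syscall.h>

	#define __NR_uintr_register_handler 471	/* placeholder, RFC only */

	void check_contract(void *handler)
	{
		syscall(__NR_uintr_register_handler, handler, 1); /* EINVAL: nonzero flags */
		syscall(__NR_uintr_register_handler, NULL, 0);    /* EFAULT: NULL handler */
		syscall(__NR_uintr_register_handler, handler, 0); /* 0: registered */
		syscall(__NR_uintr_register_handler, handler, 0); /* EBUSY: one per task */
	}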
Signed-off-by: Jacob Pan
Signed-off-by: Sohil Mehta
---
 arch/x86/include/asm/processor.h |   6 +
 arch/x86/include/asm/uintr.h     |  13 ++
 arch/x86/kernel/Makefile         |   1 +
 arch/x86/kernel/uintr_core.c     | 240 +++++++++++++++++++++++++++++++
 arch/x86/kernel/uintr_fd.c       |  58 ++++++++
 5 files changed, 318 insertions(+)
 create mode 100644 arch/x86/include/asm/uintr.h
 create mode 100644 arch/x86/kernel/uintr_core.c
 create mode 100644 arch/x86/kernel/uintr_fd.c

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 9ad2acaaae9b..d229bfac8b4f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -9,6 +9,7 @@ struct task_struct;
 struct mm_struct;
 struct io_bitmap;
 struct vm86;
+struct uintr_receiver;
 
 #include
 #include
@@ -529,6 +530,11 @@ struct thread_struct {
 	 */
 	u32			pkru;
 
+#ifdef CONFIG_X86_USER_INTERRUPTS
+	/* User Interrupt state */
+	struct uintr_receiver	*ui_recv;
+#endif
+
 	/* Floating point and extended processor state */
 	struct fpu		fpu;
 	/*
diff --git a/arch/x86/include/asm/uintr.h b/arch/x86/include/asm/uintr.h
new file mode 100644
index 000000000000..4f35bd8bd4e0
--- /dev/null
+++ b/arch/x86/include/asm/uintr.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_UINTR_H
+#define _ASM_X86_UINTR_H
+
+#ifdef CONFIG_X86_USER_INTERRUPTS
+
+bool uintr_arch_enabled(void);
+int do_uintr_register_handler(u64 handler);
+int do_uintr_unregister_handler(void);
+
+#endif /* CONFIG_X86_USER_INTERRUPTS */
+
+#endif /* _ASM_X86_UINTR_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 8f4e8fa6ed75..060ca9f23e23 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -140,6 +140,7 @@ obj-$(CONFIG_UPROBES)			+= uprobes.o
 obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
 obj-$(CONFIG_TRACING)			+= tracepoint.o
 obj-$(CONFIG_SCHED_MC_PRIO)		+= itmt.o
+obj-$(CONFIG_X86_USER_INTERRUPTS)	+= uintr_fd.o uintr_core.o
 obj-$(CONFIG_X86_UMIP)			+= umip.o
 obj-$(CONFIG_UNWINDER_ORC)		+= unwind_orc.o
diff --git a/arch/x86/kernel/uintr_core.c b/arch/x86/kernel/uintr_core.c
new file mode 100644
index 000000000000..2c6042a6840a
--- /dev/null
+++ b/arch/x86/kernel/uintr_core.c
@@ -0,0 +1,240 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2021, Intel Corporation.
+ *
+ * Sohil Mehta
+ * Jacob Pan
+ */
+#define pr_fmt(fmt)	"uintr: " fmt
+
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+/* User Posted Interrupt Descriptor (UPID) */
+struct uintr_upid {
+	struct {
+		u8 status;	/* bit 0: ON, bit 1: SN, bits 2-7: reserved */
+		u8 reserved1;	/* Reserved */
+		u8 nv;		/* Notification vector */
+		u8 reserved2;	/* Reserved */
+		u32 ndst;	/* Notification destination */
+	} nc __packed;		/* Notification control */
+	u64 puir;		/* Posted user interrupt requests */
+} __aligned(64);
+
+/* UPID Notification control status bits */
+#define UPID_ON		0x0	/* Outstanding notification */
+#define UPID_SN		0x1	/* Suppressed notification */
+
+struct uintr_upid_ctx {
+	struct uintr_upid *upid;
+	refcount_t refs;
+};
+
+struct uintr_receiver {
+	struct uintr_upid_ctx *upid_ctx;
+};
+
+inline bool uintr_arch_enabled(void)
+{
+	return static_cpu_has(X86_FEATURE_UINTR);
+}
+
+static inline bool is_uintr_receiver(struct task_struct *t)
+{
+	return !!t->thread.ui_recv;
+}
+
+static inline u32 cpu_to_ndst(int cpu)
+{
+	u32 apicid = (u32)apic->cpu_present_to_apicid(cpu);
+
+	WARN_ON_ONCE(apicid == BAD_APICID);
+
+	if (!x2apic_enabled())
+		return (apicid << 8) & 0xFF00;
+
+	return apicid;
+}
+
+static void free_upid(struct uintr_upid_ctx *upid_ctx)
+{
+	kfree(upid_ctx->upid);
+	upid_ctx->upid = NULL;
+	kfree(upid_ctx);
+}
+
+/* TODO: UPID needs to be allocated by a KPTI compatible allocator */
+static struct uintr_upid_ctx *alloc_upid(void)
+{
+	struct uintr_upid_ctx *upid_ctx;
+	struct uintr_upid *upid;
+
+	upid_ctx = kzalloc(sizeof(*upid_ctx), GFP_KERNEL);
+	if (!upid_ctx)
+		return NULL;
+
+	upid = kzalloc(sizeof(*upid), GFP_KERNEL);
+	if (!upid) {
+		kfree(upid_ctx);
+		return NULL;
+	}
+
+	upid_ctx->upid = upid;
+	refcount_set(&upid_ctx->refs, 1);
+
+	return upid_ctx;
+}
+
+static void put_upid_ref(struct uintr_upid_ctx *upid_ctx)
+{
+	if (refcount_dec_and_test(&upid_ctx->refs))
+		free_upid(upid_ctx);
+}
+
+int do_uintr_unregister_handler(void)
+{
+	struct task_struct *t = current;
+	struct fpu *fpu = &t->thread.fpu;
+	struct uintr_receiver *ui_recv;
+	u64 msr64;
+
+	if (!is_uintr_receiver(t))
+		return -EINVAL;
+
+	pr_debug("recv: Unregister handler and clear MSRs for task=%d\n",
+		 t->pid);
+
+	/*
+	 * TODO: Evaluate usage of fpregs_lock() and get_xsave_addr(). Bugs
+	 * have been reported recently for PASID and WRPKRU.
+	 *
+	 * UPID and ui_recv will be referenced during context switch. Need to
+	 * disable preemption while modifying the MSRs, UPID and ui_recv
+	 * thread struct.
+	 */
+	fpregs_lock();
+
+	/* Clear only the receiver specific state. Sender related state is not modified */
+	if (fpregs_state_valid(fpu, smp_processor_id())) {
+		/* Modify only the relevant bits of the MISC MSR */
+		rdmsrl(MSR_IA32_UINTR_MISC, msr64);
+		msr64 &= ~GENMASK_ULL(39, 32);
+		wrmsrl(MSR_IA32_UINTR_MISC, msr64);
+		wrmsrl(MSR_IA32_UINTR_PD, 0ULL);
+		wrmsrl(MSR_IA32_UINTR_RR, 0ULL);
+		wrmsrl(MSR_IA32_UINTR_STACKADJUST, 0ULL);
+		wrmsrl(MSR_IA32_UINTR_HANDLER, 0ULL);
+	} else {
+		struct uintr_state *p;
+
+		p = get_xsave_addr(&fpu->state.xsave, XFEATURE_UINTR);
+		if (p) {
+			p->handler = 0;
+			p->stack_adjust = 0;
+			p->upid_addr = 0;
+			p->uinv = 0;
+			p->uirr = 0;
+		}
+	}
+
+	ui_recv = t->thread.ui_recv;
+	/*
+	 * Suppress notifications so that no further interrupts are generated
+	 * based on this UPID.
+	 */
+	set_bit(UPID_SN, (unsigned long *)&ui_recv->upid_ctx->upid->nc.status);
+
+	put_upid_ref(ui_recv->upid_ctx);
+	kfree(ui_recv);
+	t->thread.ui_recv = NULL;
+
+	fpregs_unlock();
+
+	return 0;
+}
+
+int do_uintr_register_handler(u64 handler)
+{
+	struct uintr_receiver *ui_recv;
+	struct uintr_upid *upid;
+	struct task_struct *t = current;
+	struct fpu *fpu = &t->thread.fpu;
+	u64 misc_msr;
+	int cpu;
+
+	if (is_uintr_receiver(t))
+		return -EBUSY;
+
+	ui_recv = kzalloc(sizeof(*ui_recv), GFP_KERNEL);
+	if (!ui_recv)
+		return -ENOMEM;
+
+	ui_recv->upid_ctx = alloc_upid();
+	if (!ui_recv->upid_ctx) {
+		kfree(ui_recv);
+		pr_debug("recv: alloc upid failed for task=%d\n", t->pid);
+		return -ENOMEM;
+	}
+
+	/*
+	 * TODO: Evaluate usage of fpregs_lock() and get_xsave_addr(). Bugs
+	 * have been reported recently for PASID and WRPKRU.
+	 *
+	 * UPID and ui_recv will be referenced during context switch. Need to
+	 * disable preemption while modifying the MSRs, UPID and ui_recv
+	 * thread struct.
+	 */
+	fpregs_lock();
+
+	cpu = smp_processor_id();
+	upid = ui_recv->upid_ctx->upid;
+	upid->nc.nv = UINTR_NOTIFICATION_VECTOR;
+	upid->nc.ndst = cpu_to_ndst(cpu);
+
+	t->thread.ui_recv = ui_recv;
+
+	if (fpregs_state_valid(fpu, cpu)) {
+		wrmsrl(MSR_IA32_UINTR_HANDLER, handler);
+		wrmsrl(MSR_IA32_UINTR_PD, (u64)ui_recv->upid_ctx->upid);
+
+		/* Set value as size of ABI redzone */
+		wrmsrl(MSR_IA32_UINTR_STACKADJUST, 128);
+
+		/* Modify only the relevant bits of the MISC MSR */
+		rdmsrl(MSR_IA32_UINTR_MISC, misc_msr);
+		misc_msr |= (u64)UINTR_NOTIFICATION_VECTOR << 32;
+		wrmsrl(MSR_IA32_UINTR_MISC, misc_msr);
+	} else {
+		struct xregs_state *xsave;
+		struct uintr_state *p;
+
+		xsave = &fpu->state.xsave;
+		xsave->header.xfeatures |= XFEATURE_MASK_UINTR;
+		p = get_xsave_addr(&fpu->state.xsave, XFEATURE_UINTR);
+		if (p) {
+			p->handler = handler;
+			p->upid_addr = (u64)ui_recv->upid_ctx->upid;
+			p->stack_adjust = 128;
+			p->uinv = UINTR_NOTIFICATION_VECTOR;
+		}
+	}
+
+	fpregs_unlock();
+
+	pr_debug("recv: task=%d register handler=%llx upid %px\n",
+		 t->pid, handler, upid);
+
+	return 0;
+}
diff --git a/arch/x86/kernel/uintr_fd.c b/arch/x86/kernel/uintr_fd.c
new file mode 100644
index 000000000000..a1a9c105fdab
--- /dev/null
+++ b/arch/x86/kernel/uintr_fd.c
@@ -0,0 +1,58 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2021, Intel Corporation.
+ *
+ * Sohil Mehta
+ */
+#define pr_fmt(fmt)	"uintr: " fmt
+
+#include
+#include
+
+#include
+
+/*
+ * sys_uintr_register_handler - setup user interrupt handler for receiver.
+ */
+SYSCALL_DEFINE2(uintr_register_handler, u64 __user *, handler, unsigned int, flags)
+{
+	int ret;
+
+	if (!uintr_arch_enabled())
+		return -EOPNOTSUPP;
+
+	if (flags)
+		return -EINVAL;
+
+	/* TODO: Validate the handler address */
+	if (!handler)
+		return -EFAULT;
+
+	ret = do_uintr_register_handler((u64)handler);
+
+	pr_debug("recv: register handler task=%d flags %d handler %lx ret %d\n",
+		 current->pid, flags, (unsigned long)handler, ret);
+
+	return ret;
+}
+
+/*
+ * sys_uintr_unregister_handler - Teardown user interrupt handler for receiver.
+ */
+SYSCALL_DEFINE1(uintr_unregister_handler, unsigned int, flags)
+{
+	int ret;
+
+	if (!uintr_arch_enabled())
+		return -EOPNOTSUPP;
+
+	if (flags)
+		return -EINVAL;
+
+	ret = do_uintr_unregister_handler();
+
+	pr_debug("recv: unregister handler task=%d flags %d ret %d\n",
+		 current->pid, flags, ret);
+
+	return ret;
+}

From patchwork Mon Sep 13 20:01:26 2021
X-Patchwork-Submitter: Sohil Mehta
X-Patchwork-Id: 510000
From: Sohil Mehta
To: x86@kernel.org
Subject: [RFC PATCH 07/13] x86/process/64: Add uintr task context switch support
Date: Mon, 13 Sep 2021 13:01:26 -0700
Message-Id: <20210913200132.3396598-8-sohil.mehta@intel.com>
In-Reply-To: <20210913200132.3396598-1-sohil.mehta@intel.com>
References: <20210913200132.3396598-1-sohil.mehta@intel.com>

User interrupt state is saved and restored using xstate supervisor feature
support. This includes the MSR state and the User Interrupt Flag (UIF)
value.

During context switch, update the UPID for a uintr task to reflect the
current state of the task; namely whether the task should receive
interrupt notifications and which cpu the task is currently running on.

XSAVES clears the notification vector (UINV) in the MISC MSR to prevent
interrupts from being recognized in the UIRR MSR while the task is being
context switched. The UINV is restored when the kernel does an XRSTORS.

However, this conflicts with the kernel's lazy restore optimization, which
skips an XRSTORS if the kernel is scheduling the same user task back and
the underlying MSR state hasn't been modified. Special handling is needed
for a uintr task in the context switch path to keep using this
optimization.
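For reference, the recognition flow this patch works around can be
sketched as follows (pseudocode paraphrasing the behavior described in
the documentation patch, not kernel code; send_physical_ipi() is a stand-in
for the hardware notification step):

	/* What SENDUIPI does, conceptually: */
	set_bit(uvec, &upid->puir);		/* post the request */
	if (!test_bit(UPID_SN, &upid->status) &&
	    !test_and_set_bit(UPID_ON, &upid->status))
		send_physical_ipi(upid->ndst, upid->nv);  /* nv == UINV */

	/* On the receiver CPU, the arriving IPI is recognized as a user
	 * interrupt only while UINV in the MISC MSR matches upid->nc.nv.
	 * With UINV cleared by XSAVES, no further user interrupts are
	 * detected until UINV is programmed again. */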
Signed-off-by: Jacob Pan
Signed-off-by: Sohil Mehta
---
 arch/x86/include/asm/entry-common.h |  4 ++
 arch/x86/include/asm/uintr.h        |  9 ++++
 arch/x86/kernel/fpu/core.c          |  8 +++
 arch/x86/kernel/process_64.c        |  4 ++
 arch/x86/kernel/uintr_core.c        | 75 +++++++++++++++++++++++++++++
 5 files changed, 100 insertions(+)

diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index 14ebd2196569..4e6c4d0912a5 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -8,6 +8,7 @@
 #include
 #include
 #include
+#include
 
 /* Check that the stack and regs on entry from user mode are sane. */
 static __always_inline void arch_check_user_regs(struct pt_regs *regs)
@@ -57,6 +58,9 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
 	if (unlikely(ti_work & _TIF_NEED_FPU_LOAD))
 		switch_fpu_return();
 
+	if (static_cpu_has(X86_FEATURE_UINTR))
+		switch_uintr_return();
+
 #ifdef CONFIG_COMPAT
 	/*
 	 * Compat syscalls set TS_COMPAT. Make sure we clear it before
diff --git a/arch/x86/include/asm/uintr.h b/arch/x86/include/asm/uintr.h
index 4f35bd8bd4e0..f7ccb67014b8 100644
--- a/arch/x86/include/asm/uintr.h
+++ b/arch/x86/include/asm/uintr.h
@@ -8,6 +8,15 @@ bool uintr_arch_enabled(void);
 int do_uintr_register_handler(u64 handler);
 int do_uintr_unregister_handler(void);
 
+/* TODO: Inline the context switch related functions */
+void switch_uintr_prepare(struct task_struct *prev);
+void switch_uintr_return(void);
+
+#else /* !CONFIG_X86_USER_INTERRUPTS */
+
+static inline void switch_uintr_prepare(struct task_struct *prev) {}
+static inline void switch_uintr_return(void) {}
+
 #endif /* CONFIG_X86_USER_INTERRUPTS */
 
 #endif /* _ASM_X86_UINTR_H */
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 7ada7bd03a32..e30588bf7ce9 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -95,6 +95,14 @@ EXPORT_SYMBOL(irq_fpu_usable);
  * over the place.
  *
  * FXSAVE and all XSAVE variants preserve the FPU register state.
+ *
+ * When XSAVES is called with XFEATURE_UINTR enabled it
+ * saves the FPU state and clears the interrupt notification
+ * vector byte of the MISC MSR [bits 39:32]. This is required
+ * to stop detecting additional User Interrupts after we
+ * have saved the FPU state. Before going back to userspace
+ * we correct this and program only the byte that was
+ * cleared.
  */
 void save_fpregs_to_fpstate(struct fpu *fpu)
 {
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index ec0d836a13b1..62b82137db9c 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -53,6 +53,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #ifdef CONFIG_IA32_EMULATION
@@ -565,6 +566,9 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ENTRY) &&
 		     this_cpu_read(hardirq_stack_inuse));
 
+	if (static_cpu_has(X86_FEATURE_UINTR))
+		switch_uintr_prepare(prev_p);
+
 	if (!test_thread_flag(TIF_NEED_FPU_LOAD))
 		switch_fpu_prepare(prev_fpu, cpu);
 
diff --git a/arch/x86/kernel/uintr_core.c b/arch/x86/kernel/uintr_core.c
index 2c6042a6840a..7a29888050ad 100644
--- a/arch/x86/kernel/uintr_core.c
+++ b/arch/x86/kernel/uintr_core.c
@@ -238,3 +238,78 @@ int do_uintr_register_handler(u64 handler)
 
 	return 0;
 }
+
+/* Suppress notifications since this task is being context switched out */
+void switch_uintr_prepare(struct task_struct *prev)
+{
+	struct uintr_upid *upid;
+
+	if (is_uintr_receiver(prev)) {
+		upid = prev->thread.ui_recv->upid_ctx->upid;
+		set_bit(UPID_SN, (unsigned long *)&upid->nc.status);
+	}
+}
+
+/*
+ * Do this right before we are going back to userspace after the FPU has been
+ * reloaded i.e. TIF_NEED_FPU_LOAD is clear.
+ * Called from arch_exit_to_user_mode_prepare() with interrupts disabled.
+ */
+void switch_uintr_return(void)
+{
+	struct uintr_upid *upid;
+	u64 misc_msr;
+
+	if (is_uintr_receiver(current)) {
+		/*
+		 * The XSAVES instruction clears the UINTR notification
+		 * vector (UINV) in the UINTR_MISC MSR when user context
+		 * gets saved. Before going back to userspace we need to
+		 * restore the notification vector. XRSTORS would
+		 * automatically restore the notification but we can't be
+		 * sure that XRSTORS will always be called when going back
+		 * to userspace. Also, if XSAVES gets called twice the UINV
+		 * stored in the xstate buffer will be overwritten.
+		 * Therefore, before going back to userspace we always check
+		 * if the UINV is set and reprogram it if needed.
+		 *
+		 * Alternatively, we could combine this with
+		 * switch_fpu_return() and program the MSR whenever we are
+		 * skipping the XRSTORS. We need special precaution to make
+		 * sure the UINV value in the xstate buffer doesn't get
+		 * overwritten by calling XSAVES twice.
+		 */
+		WARN_ON_ONCE(test_thread_flag(TIF_NEED_FPU_LOAD));
+
+		/* Modify only the relevant bits of the MISC MSR */
+		rdmsrl(MSR_IA32_UINTR_MISC, misc_msr);
+		if (!(misc_msr & GENMASK_ULL(39, 32))) {
+			misc_msr |= (u64)UINTR_NOTIFICATION_VECTOR << 32;
+			wrmsrl(MSR_IA32_UINTR_MISC, misc_msr);
+		}
+
+		/*
+		 * It is necessary to clear the SN bit after we set UINV and
+		 * NDST to avoid incorrect interrupt routing.
+		 */
+		upid = current->thread.ui_recv->upid_ctx->upid;
+		upid->nc.ndst = cpu_to_ndst(smp_processor_id());
+		clear_bit(UPID_SN, (unsigned long *)&upid->nc.status);
+
+		/*
+		 * Interrupts might have accumulated in the UPID while the
+		 * thread was preempted. In this case invoke the hardware
+		 * detection sequence manually by sending a self IPI with
+		 * UINV. Since UINV is set and SN is cleared, any new UINTR
+		 * notifications due to the self IPI or otherwise would
+		 * result in the hardware updating the UIRR directly.
+		 * No real interrupt would be generated as a result of this.
+		 *
+		 * The alternative is to atomically read and clear the UPID
+		 * and program the UIRR. In that case the kernel would need
+		 * to carefully manage the race with the hardware if the
+		 * UPID gets updated after the read.
+		 */
+		if (READ_ONCE(upid->puir))
+			apic->send_IPI_self(UINTR_NOTIFICATION_VECTOR);
+	}
+}

From patchwork Mon Sep 13 20:01:27 2021
X-Patchwork-Submitter: Sohil Mehta
X-Patchwork-Id: 509999
From: Sohil Mehta
To: x86@kernel.org
Subject: [RFC PATCH 08/13] x86/process/64: Clean up uintr task fork and exit paths
Date: Mon, 13 Sep 2021 13:01:27 -0700
Message-Id: <20210913200132.3396598-9-sohil.mehta@intel.com>
In-Reply-To: <20210913200132.3396598-1-sohil.mehta@intel.com>
References: <20210913200132.3396598-1-sohil.mehta@intel.com>

The user interrupt MSRs and the user interrupt state are task specific.
During task fork and exit, clear the task state, clear the MSRs, and drop
the task's references to the shared resources.

Some of the memory resources like the UPID are referenced in the file
descriptor and could be in use while the uintr_fd is still valid. Instead
of freeing up the UPID, just drop the task's reference to it. Eventually,
when every user has released its reference, the memory resource is freed
up.
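The intended reference lifetime, in sketch form (this mirrors the
existing get_upid_ref()/put_upid_ref() helpers rather than adding
anything new):

	/* Each holder takes one reference: the receiver task at
	 * registration, and each uintr_fd created for a vector. */
	upid_ctx = get_upid_ref(ui_recv->upid_ctx);

	/* Task exit and fd release each drop one reference; only the
	 * final put actually frees the UPID memory: */
	static void put_upid_ref(struct uintr_upid_ctx *upid_ctx)
	{
		if (refcount_dec_and_test(&upid_ctx->refs))
			free_upid(upid_ctx);
	}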
Signed-off-by: Jacob Pan
Signed-off-by: Sohil Mehta
---
 arch/x86/include/asm/uintr.h |  3 ++
 arch/x86/kernel/fpu/core.c   |  9 ++++++
 arch/x86/kernel/process.c    |  9 ++++++
 arch/x86/kernel/uintr_core.c | 55 ++++++++++++++++++++++++++++++
 4 files changed, 76 insertions(+)

diff --git a/arch/x86/include/asm/uintr.h b/arch/x86/include/asm/uintr.h
index f7ccb67014b8..cef4dd81d40e 100644
--- a/arch/x86/include/asm/uintr.h
+++ b/arch/x86/include/asm/uintr.h
@@ -8,12 +8,15 @@ bool uintr_arch_enabled(void);
 int do_uintr_register_handler(u64 handler);
 int do_uintr_unregister_handler(void);
 
+void uintr_free(struct task_struct *task);
+
 /* TODO: Inline the context switch related functions */
 void switch_uintr_prepare(struct task_struct *prev);
 void switch_uintr_return(void);
 
 #else /* !CONFIG_X86_USER_INTERRUPTS */
 
+static inline void uintr_free(struct task_struct *task) {}
 static inline void switch_uintr_prepare(struct task_struct *prev) {}
 static inline void switch_uintr_return(void) {}
 
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index e30588bf7ce9..c0a54f7aaa2a 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -260,6 +260,7 @@ int fpu_clone(struct task_struct *dst)
 {
 	struct fpu *src_fpu = &current->thread.fpu;
 	struct fpu *dst_fpu = &dst->thread.fpu;
+	struct uintr_state *uintr_state;
 
 	/* The new task's FPU state cannot be valid in the hardware. */
 	dst_fpu->last_cpu = -1;
@@ -284,6 +285,14 @@ int fpu_clone(struct task_struct *dst)
 	else
 		save_fpregs_to_fpstate(dst_fpu);
 
+	/* UINTR state is not expected to be inherited (in the current design). */
+	if (static_cpu_has(X86_FEATURE_UINTR)) {
+		uintr_state = get_xsave_addr(&dst_fpu->state.xsave, XFEATURE_UINTR);
+		if (uintr_state)
+			memset(uintr_state, 0, sizeof(*uintr_state));
+	}
+
 	fpregs_unlock();
 
 	set_tsk_thread_flag(dst, TIF_NEED_FPU_LOAD);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 1d9463e3096b..83677f76bd7b 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -87,6 +88,12 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 #ifdef CONFIG_VM86
 	dst->thread.vm86 = NULL;
 #endif
+
+#ifdef CONFIG_X86_USER_INTERRUPTS
+	/* User Interrupt state is unique for each task */
+	dst->thread.ui_recv = NULL;
+#endif
+
 	return fpu_clone(dst);
 }
 
@@ -103,6 +110,8 @@ void exit_thread(struct task_struct *tsk)
 
 	free_vm86(t);
 
+	uintr_free(tsk);
+
 	fpu__drop(fpu);
 }
 
diff --git a/arch/x86/kernel/uintr_core.c b/arch/x86/kernel/uintr_core.c
index 7a29888050ad..a2a13f890139 100644
--- a/arch/x86/kernel/uintr_core.c
+++ b/arch/x86/kernel/uintr_core.c
@@ -313,3 +313,58 @@ void switch_uintr_return(void)
 			apic->send_IPI_self(UINTR_NOTIFICATION_VECTOR);
 	}
 }
+
+/*
+ * This should only be called from exit_thread().
+ * exit_thread() can happen in current context when the current thread is
+ * exiting or it can happen for a new thread that is being created.
+ * For new threads is_uintr_receiver() should fail.
+ */
+void uintr_free(struct task_struct *t)
+{
+	struct uintr_receiver *ui_recv;
+	struct fpu *fpu;
+
+	if (!static_cpu_has(X86_FEATURE_UINTR) || !is_uintr_receiver(t))
+		return;
+
+	if (WARN_ON_ONCE(t != current))
+		return;
+
+	fpu = &t->thread.fpu;
+
+	fpregs_lock();
+
+	if (fpregs_state_valid(fpu, smp_processor_id())) {
+		wrmsrl(MSR_IA32_UINTR_MISC, 0ULL);
+		wrmsrl(MSR_IA32_UINTR_PD, 0ULL);
+		wrmsrl(MSR_IA32_UINTR_RR, 0ULL);
+		wrmsrl(MSR_IA32_UINTR_STACKADJUST, 0ULL);
+		wrmsrl(MSR_IA32_UINTR_HANDLER, 0ULL);
+	} else {
+		struct uintr_state *p;
+
+		p = get_xsave_addr(&fpu->state.xsave, XFEATURE_UINTR);
+		if (p) {
+			p->handler = 0;
+			p->uirr = 0;
+			p->upid_addr = 0;
+			p->stack_adjust = 0;
+			p->uinv = 0;
+		}
+	}
+
+	/* Check: Can a thread be context switched while it is exiting? */
+	ui_recv = t->thread.ui_recv;
+
+	/*
+	 * Suppress notifications so that no further interrupts are
+	 * generated based on this UPID.
+	 */
+	set_bit(UPID_SN, (unsigned long *)&ui_recv->upid_ctx->upid->nc.status);
+	put_upid_ref(ui_recv->upid_ctx);
+	kfree(ui_recv);
+	t->thread.ui_recv = NULL;
+
+	fpregs_unlock();
+}

From patchwork Mon Sep 13 20:01:29 2021
X-Patchwork-Submitter: Sohil Mehta
X-Patchwork-Id: 509998
From: Sohil Mehta
To: x86@kernel.org
Subject: [RFC PATCH 10/13] x86/uintr: Introduce user IPI sender syscalls
Date: Mon, 13 Sep 2021 13:01:29 -0700
Message-Id: <20210913200132.3396598-11-sohil.mehta@intel.com>
In-Reply-To: <20210913200132.3396598-1-sohil.mehta@intel.com>
References: <20210913200132.3396598-1-sohil.mehta@intel.com>

Add a registration syscall for a task to register itself as a user
interrupt sender using the uintr_fd generated by the receiver. A task can
register multiple uintr_fds. Each unique successful connection creates a
new entry in the User Interrupt Target Table (UITT).

Each entry in the UITT table is referred to by the UITT index
(uipi_index). The uipi_index returned during the registration syscall lets
a sender generate a user IPI using the 'SENDUIPI <uipi_index>' instruction.

Also add a sender unregister syscall to unregister a particular task from
the uintr_fd. Calling close on the uintr_fd will disconnect all threads in
a sender process from that FD.

Currently, the UITT size is arbitrarily chosen as 256 entries
corresponding to a 4KB page. Based on feedback and usage data this can
either be increased/decreased or made dynamic later.

Architecturally, the UITT table can be unique for each thread or shared
across threads of the same thread group. The current implementation keeps
the UITT unique for each thread. This makes the kernel implementation
relatively simple and only threads that use uintr get set up with the
related structures. However, this means that the uipi_index for each
thread would be inconsistent with respect to other threads. (Executing
'SENDUIPI 2' on threads of the same process could generate different user
interrupts.)

Alternatively, the benefit of sharing the UITT table is that all threads
would see the same view of the UITT. Also, the kernel UITT memory
allocation would be more efficient if multiple threads connect to the same
uintr_fd. However, this would mean the kernel needs to keep the UITT table
size in the MISC MSR in sync across these threads. Also, the UPID/UITT
teardown flows might need additional consideration.
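A consequence of the per-thread UITT noted above, in sketch form (the
syscall wrapper and the _senduipi() intrinsic are illustrative
assumptions, as elsewhere in this series):

	/* Thread A and thread B of the same process register with the
	 * same uintr_fd, but each gets an index into its *own* UITT: */
	int idx_a = uintr_register_sender(uintr_fd, 0);	/* e.g. 0, on A */
	int idx_b = uintr_register_sender(uintr_fd, 0);	/* e.g. 0, on B */

	/* An index is only meaningful on the thread that registered it;
	 * idx_a must not be handed to thread B. */
	_senduipi(idx_a);	/* valid only when executed on thread A */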
Signed-off-by: Sohil Mehta --- arch/x86/include/asm/processor.h | 2 + arch/x86/include/asm/uintr.h | 15 ++ arch/x86/kernel/process.c | 1 + arch/x86/kernel/uintr_core.c | 355 ++++++++++++++++++++++++++++++- arch/x86/kernel/uintr_fd.c | 133 ++++++++++++ 5 files changed, 495 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index d229bfac8b4f..3482c3182e39 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -10,6 +10,7 @@ struct mm_struct; struct io_bitmap; struct vm86; struct uintr_receiver; +struct uintr_sender; #include #include @@ -533,6 +534,7 @@ struct thread_struct { #ifdef CONFIG_X86_USER_INTERRUPTS /* User Interrupt state*/ struct uintr_receiver *ui_recv; + struct uintr_sender *ui_send; #endif /* Floating point and extended processor state */ diff --git a/arch/x86/include/asm/uintr.h b/arch/x86/include/asm/uintr.h index 1f00e2a63da4..ef3521dd7fb9 100644 --- a/arch/x86/include/asm/uintr.h +++ b/arch/x86/include/asm/uintr.h @@ -8,6 +8,7 @@ struct uintr_upid_ctx { struct task_struct *task; /* Receiver task */ struct uintr_upid *upid; refcount_t refs; + bool receiver_active; /* Flag for UPID being mapped to a receiver */ }; struct uintr_receiver_info { @@ -16,12 +17,26 @@ struct uintr_receiver_info { u64 uvec; /* Vector number */ }; +struct uintr_sender_info { + struct list_head node; + struct uintr_uitt_ctx *uitt_ctx; + struct task_struct *task; + struct uintr_upid_ctx *r_upid_ctx; /* Receiver's UPID context */ + struct callback_head twork; /* Task work head */ + unsigned int uitt_index; +}; + bool uintr_arch_enabled(void); int do_uintr_register_handler(u64 handler); int do_uintr_unregister_handler(void); int do_uintr_register_vector(struct uintr_receiver_info *r_info); void do_uintr_unregister_vector(struct uintr_receiver_info *r_info); +int do_uintr_register_sender(struct uintr_receiver_info *r_info, + struct uintr_sender_info *s_info); +void do_uintr_unregister_sender(struct uintr_receiver_info *r_info, + struct uintr_sender_info *s_info); + void uintr_free(struct task_struct *task); /* TODO: Inline the context switch related functions */ diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 83677f76bd7b..9db33e467b30 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -92,6 +92,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src) #ifdef CONFIG_X86_USER_INTERRUPTS /* User Interrupt state is unique for each task */ dst->thread.ui_recv = NULL; + dst->thread.ui_send = NULL; #endif return fpu_clone(dst); diff --git a/arch/x86/kernel/uintr_core.c b/arch/x86/kernel/uintr_core.c index 9dcb9f60e5bc..8f331c5fe0cf 100644 --- a/arch/x86/kernel/uintr_core.c +++ b/arch/x86/kernel/uintr_core.c @@ -21,6 +21,11 @@ #include #include +/* + * Each UITT entry is 16 bytes in size. 
+ * Current UITT table size is set as 4KB (256 * 16 bytes) + */ +#define UINTR_MAX_UITT_NR 256 #define UINTR_MAX_UVEC_NR 64 /* User Posted Interrupt Descriptor (UPID) */ @@ -44,6 +49,27 @@ struct uintr_receiver { u64 uvec_mask; /* track active vector per bit */ }; +/* User Interrupt Target Table Entry (UITTE) */ +struct uintr_uitt_entry { + u8 valid; /* bit 0: valid, bit 1-7: reserved */ + u8 user_vec; + u8 reserved[6]; + u64 target_upid_addr; +} __packed __aligned(16); + +struct uintr_uitt_ctx { + struct uintr_uitt_entry *uitt; + /* Protect UITT */ + spinlock_t uitt_lock; + refcount_t refs; +}; + +struct uintr_sender { + struct uintr_uitt_ctx *uitt_ctx; + /* track active uitt entries per bit */ + u64 uitt_mask[BITS_TO_U64(UINTR_MAX_UITT_NR)]; +}; + inline bool uintr_arch_enabled(void) { return static_cpu_has(X86_FEATURE_UINTR); @@ -54,6 +80,36 @@ static inline bool is_uintr_receiver(struct task_struct *t) return !!t->thread.ui_recv; } +static inline bool is_uintr_sender(struct task_struct *t) +{ + return !!t->thread.ui_send; +} + +static inline bool is_uintr_task(struct task_struct *t) +{ + return(is_uintr_receiver(t) || is_uintr_sender(t)); +} + +static inline bool is_uitt_empty(struct task_struct *t) +{ + return !!bitmap_empty((unsigned long *)t->thread.ui_send->uitt_mask, + UINTR_MAX_UITT_NR); +} + +/* + * No lock is needed to read the active flag. Writes only happen from + * r_info->task that owns the UPID. Everyone else would just read this flag. + * + * This only provides a static check. The receiver may become inactive right + * after this check. The primary reason to have this check is to prevent future + * senders from connecting with this UPID, since the receiver task has already + * made this UPID inactive. + */ +static bool uintr_is_receiver_active(struct uintr_receiver_info *r_info) +{ + return r_info->upid_ctx->receiver_active; +} + static inline u32 cpu_to_ndst(int cpu) { u32 apicid = (u32)apic->cpu_present_to_apicid(cpu); @@ -94,6 +150,7 @@ static struct uintr_upid_ctx *alloc_upid(void) upid_ctx->upid = upid; refcount_set(&upid_ctx->refs, 1); upid_ctx->task = get_task_struct(current); + upid_ctx->receiver_active = true; return upid_ctx; } @@ -110,6 +167,64 @@ static struct uintr_upid_ctx *get_upid_ref(struct uintr_upid_ctx *upid_ctx) return upid_ctx; } +static void free_uitt(struct uintr_uitt_ctx *uitt_ctx) +{ + unsigned long flags; + + spin_lock_irqsave(&uitt_ctx->uitt_lock, flags); + kfree(uitt_ctx->uitt); + uitt_ctx->uitt = NULL; + spin_unlock_irqrestore(&uitt_ctx->uitt_lock, flags); + + kfree(uitt_ctx); +} + +/* TODO: Replace UITT allocation with KPTI compatible memory allocator */ +static struct uintr_uitt_ctx *alloc_uitt(void) +{ + struct uintr_uitt_ctx *uitt_ctx; + struct uintr_uitt_entry *uitt; + + uitt_ctx = kzalloc(sizeof(*uitt_ctx), GFP_KERNEL); + if (!uitt_ctx) + return NULL; + + uitt = kzalloc(sizeof(*uitt) * UINTR_MAX_UITT_NR, GFP_KERNEL); + if (!uitt) { + kfree(uitt_ctx); + return NULL; + } + + uitt_ctx->uitt = uitt; + spin_lock_init(&uitt_ctx->uitt_lock); + refcount_set(&uitt_ctx->refs, 1); + + return uitt_ctx; +} + +static void put_uitt_ref(struct uintr_uitt_ctx *uitt_ctx) +{ + if (refcount_dec_and_test(&uitt_ctx->refs)) + free_uitt(uitt_ctx); +} + +static struct uintr_uitt_ctx *get_uitt_ref(struct uintr_uitt_ctx *uitt_ctx) +{ + refcount_inc(&uitt_ctx->refs); + return uitt_ctx; +} + +static inline void mark_uitte_invalid(struct uintr_sender_info *s_info) +{ + struct uintr_uitt_entry *uitte; + unsigned long flags; + + 
spin_lock_irqsave(&s_info->uitt_ctx->uitt_lock, flags); + uitte = &s_info->uitt_ctx->uitt[s_info->uitt_index]; + uitte->valid = 0; + spin_unlock_irqrestore(&s_info->uitt_ctx->uitt_lock, flags); +} + static void __clear_vector_from_upid(u64 uvec, struct uintr_upid *upid) { clear_bit(uvec, (unsigned long *)&upid->puir); @@ -175,6 +290,210 @@ static void receiver_clear_uvec(struct callback_head *head) kfree(r_info); } +static void teardown_uitt(void) +{ + struct task_struct *t = current; + struct fpu *fpu = &t->thread.fpu; + u64 msr64; + + put_uitt_ref(t->thread.ui_send->uitt_ctx); + kfree(t->thread.ui_send); + t->thread.ui_send = NULL; + + fpregs_lock(); + + if (fpregs_state_valid(fpu, smp_processor_id())) { + /* Modify only the relevant bits of the MISC MSR */ + rdmsrl(MSR_IA32_UINTR_MISC, msr64); + msr64 &= GENMASK_ULL(63, 32); + wrmsrl(MSR_IA32_UINTR_MISC, msr64); + wrmsrl(MSR_IA32_UINTR_TT, 0ULL); + } else { + struct uintr_state *p; + + p = get_xsave_addr(&fpu->state.xsave, XFEATURE_UINTR); + if (p) { + p->uitt_size = 0; + p->uitt_addr = 0; + } + } + + fpregs_unlock(); +} + +static int init_uitt(void) +{ + struct task_struct *t = current; + struct fpu *fpu = &t->thread.fpu; + struct uintr_sender *ui_send; + u64 msr64; + + ui_send = kzalloc(sizeof(*t->thread.ui_send), GFP_KERNEL); + if (!ui_send) + return -ENOMEM; + + ui_send->uitt_ctx = alloc_uitt(); + if (!ui_send->uitt_ctx) { + pr_debug("send: Alloc UITT failed for task=%d\n", t->pid); + kfree(ui_send); + return -ENOMEM; + } + + fpregs_lock(); + + if (fpregs_state_valid(fpu, smp_processor_id())) { + wrmsrl(MSR_IA32_UINTR_TT, (u64)ui_send->uitt_ctx->uitt | 1); + /* Modify only the relevant bits of the MISC MSR */ + rdmsrl(MSR_IA32_UINTR_MISC, msr64); + msr64 &= GENMASK_ULL(63, 32); + msr64 |= UINTR_MAX_UITT_NR; + wrmsrl(MSR_IA32_UINTR_MISC, msr64); + } else { + struct xregs_state *xsave; + struct uintr_state *p; + + xsave = &fpu->state.xsave; + xsave->header.xfeatures |= XFEATURE_MASK_UINTR; + p = get_xsave_addr(&fpu->state.xsave, XFEATURE_UINTR); + if (p) { + p->uitt_size = UINTR_MAX_UITT_NR; + p->uitt_addr = (u64)ui_send->uitt_ctx->uitt | 1; + } + } + + fpregs_unlock(); + + pr_debug("send: Setup a new UITT=%px for task=%d with size %d\n", + ui_send->uitt_ctx->uitt, t->pid, UINTR_MAX_UITT_NR * 16); + + t->thread.ui_send = ui_send; + + return 0; +} + +static void __free_uitt_entry(unsigned int entry) +{ + struct task_struct *t = current; + unsigned long flags; + + if (entry >= UINTR_MAX_UITT_NR) + return; + + if (!is_uintr_sender(t)) + return; + + pr_debug("send: Freeing UITTE entry %d for task=%d\n", entry, t->pid); + + spin_lock_irqsave(&t->thread.ui_send->uitt_ctx->uitt_lock, flags); + memset(&t->thread.ui_send->uitt_ctx->uitt[entry], 0, + sizeof(struct uintr_uitt_entry)); + spin_unlock_irqrestore(&t->thread.ui_send->uitt_ctx->uitt_lock, flags); + + clear_bit(entry, (unsigned long *)t->thread.ui_send->uitt_mask); + + if (is_uitt_empty(t)) { + pr_debug("send: UITT mask is empty. 
+
+static void __free_uitt_entry(unsigned int entry)
+{
+       struct task_struct *t = current;
+       unsigned long flags;
+
+       if (entry >= UINTR_MAX_UITT_NR)
+               return;
+
+       if (!is_uintr_sender(t))
+               return;
+
+       pr_debug("send: Freeing UITTE entry %d for task=%d\n", entry, t->pid);
+
+       spin_lock_irqsave(&t->thread.ui_send->uitt_ctx->uitt_lock, flags);
+       memset(&t->thread.ui_send->uitt_ctx->uitt[entry], 0,
+              sizeof(struct uintr_uitt_entry));
+       spin_unlock_irqrestore(&t->thread.ui_send->uitt_ctx->uitt_lock, flags);
+
+       clear_bit(entry, (unsigned long *)t->thread.ui_send->uitt_mask);
+
+       if (is_uitt_empty(t)) {
+               pr_debug("send: UITT mask is empty, tearing down UITT\n");
+               teardown_uitt();
+       }
+}
+
+static void sender_free_uitte(struct callback_head *head)
+{
+       struct uintr_sender_info *s_info;
+
+       s_info = container_of(head, struct uintr_sender_info, twork);
+
+       __free_uitt_entry(s_info->uitt_index);
+       put_uitt_ref(s_info->uitt_ctx);
+       put_upid_ref(s_info->r_upid_ctx);
+       put_task_struct(s_info->task);
+       kfree(s_info);
+}
+
+void do_uintr_unregister_sender(struct uintr_receiver_info *r_info,
+                               struct uintr_sender_info *s_info)
+{
+       int ret;
+
+       /*
+        * Invalidate the UITT entry first so that any new senduipi results
+        * in a #GP fault. The task work may take a non-zero amount of time
+        * to kick the process out.
+        */
+       mark_uitte_invalid(s_info);
+
+       pr_debug("send: Adding Free UITTE %d task work for task=%d\n",
+                s_info->uitt_index, s_info->task->pid);
+
+       init_task_work(&s_info->twork, sender_free_uitte);
+       ret = task_work_add(s_info->task, &s_info->twork, true);
+       if (ret) {
+               /*
+                * The sender task has already exited, so drop the UITT and
+                * UPID references here instead.
+                */
+               pr_debug("send: Free UITTE %d task=%d has already exited\n",
+                        s_info->uitt_index, s_info->task->pid);
+               put_upid_ref(s_info->r_upid_ctx);
+               put_uitt_ref(s_info->uitt_ctx);
+               put_task_struct(s_info->task);
+               kfree(s_info);
+       }
+}
+
+int do_uintr_register_sender(struct uintr_receiver_info *r_info,
+                            struct uintr_sender_info *s_info)
+{
+       struct uintr_uitt_entry *uitte = NULL;
+       struct uintr_sender *ui_send;
+       struct task_struct *t = current;
+       unsigned long flags;
+       int entry;
+       int ret;
+
+       /*
+        * Only a static check. The receiver could exit any time after this
+        * check. This check only prevents connections using this uintr_fd
+        * after the receiver has already exited/unregistered.
+        */
+       if (!uintr_is_receiver_active(r_info))
+               return -ESHUTDOWN;
+
+       if (is_uintr_sender(t)) {
+               entry = find_first_zero_bit((unsigned long *)t->thread.ui_send->uitt_mask,
+                                           UINTR_MAX_UITT_NR);
+               if (entry >= UINTR_MAX_UITT_NR)
+                       return -ENOSPC;
+       } else {
+               BUILD_BUG_ON(UINTR_MAX_UITT_NR < 1);
+               entry = 0;
+               ret = init_uitt();
+               if (ret)
+                       return ret;
+       }
+
+       ui_send = t->thread.ui_send;
+
+       set_bit(entry, (unsigned long *)ui_send->uitt_mask);
+
+       spin_lock_irqsave(&ui_send->uitt_ctx->uitt_lock, flags);
+       uitte = &ui_send->uitt_ctx->uitt[entry];
+       pr_debug("send: sender=%d receiver=%d UITTE entry %d address %px\n",
+                current->pid, r_info->upid_ctx->task->pid, entry, uitte);
+
+       uitte->user_vec = r_info->uvec;
+       uitte->target_upid_addr = (u64)r_info->upid_ctx->upid;
+       uitte->valid = 1;
+       spin_unlock_irqrestore(&ui_send->uitt_ctx->uitt_lock, flags);
+
+       s_info->r_upid_ctx = get_upid_ref(r_info->upid_ctx);
+       s_info->uitt_ctx = get_uitt_ref(ui_send->uitt_ctx);
+       s_info->task = get_task_struct(current);
+       s_info->uitt_index = entry;
+
+       return 0;
+}
+
 int do_uintr_unregister_handler(void)
 {
        struct task_struct *t = current;
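The unregister path above relies on a common kernel idiom: invalidate the
entry immediately (so any concurrent senduipi takes a #GP) and defer the
actual free to the owning task via task_work, so it runs in that task's
context. Stripped of the UINTR specifics, the idiom looks roughly like this;
struct conn and its fields are hypothetical:

/*
 * Hypothetical illustration of the defer-to-owner idiom used by
 * do_uintr_unregister_sender(); not part of this patch.
 */
struct conn {
       struct callback_head twork;
       struct task_struct *task;       /* owner of the resource */
};

static void conn_free(struct callback_head *head)
{
       struct conn *c = container_of(head, struct conn, twork);

       /* Runs in c->task's context on its next return to user mode */
       kfree(c);
}

static void conn_teardown(struct conn *c)
{
       init_task_work(&c->twork, conn_free);

       /* task_work_add() fails if the owner task has already exited */
       if (task_work_add(c->task, &c->twork, true))
               kfree(c);       /* free in the caller's context instead */
}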
@@ -222,6 +541,8 @@ int do_uintr_unregister_handler(void)
        }
 
        ui_recv = t->thread.ui_recv;
+       ui_recv->upid_ctx->receiver_active = false;
+
        /*
         * Suppress notifications so that no further interrupts are generated
         * based on this UPID.
@@ -437,14 +758,14 @@ void switch_uintr_return(void)
  * This should only be called from exit_thread().
  * exit_thread() can happen in current context when the current thread is
  * exiting or it can happen for a new thread that is being created.
- * For new threads is_uintr_receiver() should fail.
+ * For new threads is_uintr_task() should fail.
  */
 void uintr_free(struct task_struct *t)
 {
        struct uintr_receiver *ui_recv;
        struct fpu *fpu;
 
-       if (!static_cpu_has(X86_FEATURE_UINTR) || !is_uintr_receiver(t))
+       if (!static_cpu_has(X86_FEATURE_UINTR) || !is_uintr_task(t))
                return;
 
        if (WARN_ON_ONCE(t != current))
@@ -456,6 +777,7 @@ void uintr_free(struct task_struct *t)
 
        if (fpregs_state_valid(fpu, smp_processor_id())) {
                wrmsrl(MSR_IA32_UINTR_MISC, 0ULL);
+               wrmsrl(MSR_IA32_UINTR_TT, 0ULL);
                wrmsrl(MSR_IA32_UINTR_PD, 0ULL);
                wrmsrl(MSR_IA32_UINTR_RR, 0ULL);
                wrmsrl(MSR_IA32_UINTR_STACKADJUST, 0ULL);
@@ -470,20 +792,31 @@ void uintr_free(struct task_struct *t)
                        p->upid_addr = 0;
                        p->stack_adjust = 0;
                        p->uinv = 0;
+                       p->uitt_addr = 0;
+                       p->uitt_size = 0;
                }
        }
 
        /* Check: Can a thread be context switched while it is exiting? */
-       ui_recv = t->thread.ui_recv;
+       if (is_uintr_receiver(t)) {
+               ui_recv = t->thread.ui_recv;
 
-       /*
-        * Suppress notifications so that no further interrupts are
-        * generated based on this UPID.
-        */
-       set_bit(UPID_SN, (unsigned long *)&ui_recv->upid_ctx->upid->nc.status);
-       put_upid_ref(ui_recv->upid_ctx);
-       kfree(ui_recv);
-       t->thread.ui_recv = NULL;
+               /*
+                * Suppress notifications so that no further interrupts are
+                * generated based on this UPID.
+                */
+               set_bit(UPID_SN, (unsigned long *)&ui_recv->upid_ctx->upid->nc.status);
+               ui_recv->upid_ctx->receiver_active = false;
+               put_upid_ref(ui_recv->upid_ctx);
+               kfree(ui_recv);
+               t->thread.ui_recv = NULL;
+       }
 
        fpregs_unlock();
+
+       if (is_uintr_sender(t)) {
+               put_uitt_ref(t->thread.ui_send->uitt_ctx);
+               kfree(t->thread.ui_send);
+               t->thread.ui_send = NULL;
+       }
 }
diff --git a/arch/x86/kernel/uintr_fd.c b/arch/x86/kernel/uintr_fd.c
index f0548bbac776..3c82c032c0b9 100644
--- a/arch/x86/kernel/uintr_fd.c
+++ b/arch/x86/kernel/uintr_fd.c
@@ -15,6 +15,9 @@
 struct uintrfd_ctx {
        struct uintr_receiver_info *r_info;
+       /* Protect sender_list */
+       spinlock_t sender_lock;
+       struct list_head sender_list;
 };
 
 #ifdef CONFIG_PROC_FS
@@ -30,11 +33,20 @@ static void uintrfd_show_fdinfo(struct seq_file *m, struct file *file)
 static int uintrfd_release(struct inode *inode, struct file *file)
 {
        struct uintrfd_ctx *uintrfd_ctx = file->private_data;
+       struct uintr_sender_info *s_info, *tmp;
+       unsigned long flags;
 
        pr_debug("recv: Release uintrfd for r_task %d uvec %llu\n",
                 uintrfd_ctx->r_info->upid_ctx->task->pid,
                 uintrfd_ctx->r_info->uvec);
 
+       spin_lock_irqsave(&uintrfd_ctx->sender_lock, flags);
+       list_for_each_entry_safe(s_info, tmp, &uintrfd_ctx->sender_list, node) {
+               list_del(&s_info->node);
+               do_uintr_unregister_sender(uintrfd_ctx->r_info, s_info);
+       }
+       spin_unlock_irqrestore(&uintrfd_ctx->sender_lock, flags);
+
        do_uintr_unregister_vector(uintrfd_ctx->r_info);
        kfree(uintrfd_ctx);
 
@@ -81,6 +93,9 @@ SYSCALL_DEFINE2(uintr_create_fd, u64, vector, unsigned int, flags)
                goto out_free_ctx;
        }
 
+       INIT_LIST_HEAD(&uintrfd_ctx->sender_list);
+       spin_lock_init(&uintrfd_ctx->sender_lock);
+
        /* TODO: Get user input for flags - UFD_CLOEXEC */
        /* Check: Do we need O_NONBLOCK? */
        uintrfd = anon_inode_getfd("[uintrfd]", &uintrfd_fops, uintrfd_ctx,
@@ -150,3 +165,121 @@ SYSCALL_DEFINE1(uintr_unregister_handler, unsigned int, flags)
 
        return ret;
 }
+
+/*
+ * sys_uintr_register_sender - Set up a user inter-processor interrupt sender.
+ */
+SYSCALL_DEFINE2(uintr_register_sender, int, uintrfd, unsigned int, flags)
+{
+       struct uintr_sender_info *s_info;
+       struct uintrfd_ctx *uintrfd_ctx;
+       unsigned long lock_flags;
+       struct file *uintr_f;
+       struct fd f;
+       int ret = 0;
+
+       if (!uintr_arch_enabled())
+               return -EOPNOTSUPP;
+
+       if (flags)
+               return -EINVAL;
+
+       f = fdget(uintrfd);
+       uintr_f = f.file;
+       if (!uintr_f)
+               return -EBADF;
+
+       if (uintr_f->f_op != &uintrfd_fops) {
+               ret = -EOPNOTSUPP;
+               goto out_fdput;
+       }
+
+       uintrfd_ctx = (struct uintrfd_ctx *)uintr_f->private_data;
+
+       spin_lock_irqsave(&uintrfd_ctx->sender_lock, lock_flags);
+       list_for_each_entry(s_info, &uintrfd_ctx->sender_list, node) {
+               if (s_info->task == current) {
+                       ret = -EISCONN;
+                       break;
+               }
+       }
+       spin_unlock_irqrestore(&uintrfd_ctx->sender_lock, lock_flags);
+
+       if (ret)
+               goto out_fdput;
+
+       s_info = kzalloc(sizeof(*s_info), GFP_KERNEL);
+       if (!s_info) {
+               ret = -ENOMEM;
+               goto out_fdput;
+       }
+
+       ret = do_uintr_register_sender(uintrfd_ctx->r_info, s_info);
+       if (ret) {
+               kfree(s_info);
+               goto out_fdput;
+       }
+
+       spin_lock_irqsave(&uintrfd_ctx->sender_lock, lock_flags);
+       list_add(&s_info->node, &uintrfd_ctx->sender_list);
+       spin_unlock_irqrestore(&uintrfd_ctx->sender_lock, lock_flags);
+
+       ret = s_info->uitt_index;
+
+out_fdput:
+       pr_debug("send: register sender task=%d flags %d ret(uipi_id)=%d\n",
+                current->pid, flags, ret);
+
+       fdput(f);
+       return ret;
+}
+
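From userspace, the return value of uintr_register_sender() doubles as the
UITT index (the "uipi_id" in the debug output above) to hand to senduipi. A
minimal sender-side sketch, assuming GCC's -muintr intrinsics from
<x86gprintrin.h>; the __NR_uintr_register_sender number is whatever this
series assigns and is illustrative here, as is the IPC by which the uintr_fd
was obtained:

/* Compile with: gcc -muintr ... Illustrative only. */
#include <syscall.h>
#include <unistd.h>
#include <x86gprintrin.h>

int become_sender(int uintr_fd)
{
       /* On success the return value is the UITT index for this connection */
       int uipi_index = syscall(__NR_uintr_register_sender, uintr_fd, 0);

       if (uipi_index < 0)
               return -1;      /* e.g. receiver already gone (-ESHUTDOWN) */

       _senduipi(uipi_index);  /* post the vector to the receiver's UPID */
       return uipi_index;
}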
+/*
+ * sys_uintr_unregister_sender - Unregister a user inter-processor interrupt
+ * sender.
+ */
+SYSCALL_DEFINE2(uintr_unregister_sender, int, uintrfd, unsigned int, flags)
+{
+       struct uintr_sender_info *s_info;
+       struct uintrfd_ctx *uintrfd_ctx;
+       struct file *uintr_f;
+       unsigned long lock_flags;
+       struct fd f;
+       int ret;
+
+       if (!uintr_arch_enabled())
+               return -EOPNOTSUPP;
+
+       if (flags)
+               return -EINVAL;
+
+       f = fdget(uintrfd);
+       uintr_f = f.file;
+       if (!uintr_f)
+               return -EBADF;
+
+       if (uintr_f->f_op != &uintrfd_fops) {
+               ret = -EOPNOTSUPP;
+               goto out_fdput;
+       }
+
+       uintrfd_ctx = (struct uintrfd_ctx *)uintr_f->private_data;
+
+       ret = -EINVAL;
+       spin_lock_irqsave(&uintrfd_ctx->sender_lock, lock_flags);
+       list_for_each_entry(s_info, &uintrfd_ctx->sender_list, node) {
+               if (s_info->task == current) {
+                       ret = 0;
+                       list_del(&s_info->node);
+                       do_uintr_unregister_sender(uintrfd_ctx->r_info, s_info);
+                       break;
+               }
+       }
+       spin_unlock_irqrestore(&uintrfd_ctx->sender_lock, lock_flags);
+
+       pr_debug("send: unregister sender uintrfd %d for task=%d ret %d\n",
+                uintrfd, current->pid, ret);
+
+out_fdput:
+       fdput(f);
+       return ret;
+}
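Putting the receiver and sender halves together, a plausible end-to-end flow
for a two-thread process might look like the sketch below. The __NR_* syscall
numbers, the vector choice, and the exact handler attributes are assumptions
for illustration; the _senduipi()/_stui() intrinsics and user-interrupt
handler support come from compiling with gcc -muintr.

/* Illustrative end-to-end flow; not part of this patch. */
#include <pthread.h>
#include <stdio.h>
#include <syscall.h>
#include <unistd.h>
#include <x86gprintrin.h>

static volatile int uintr_received;

/* User interrupt handler; the CPU pushes the vector for uiret to consume */
void __attribute__((interrupt))
ui_handler(struct __uintr_frame *frame, unsigned long long vector)
{
       uintr_received = 1;
}

static void *sender_thread(void *arg)
{
       int uintrfd = *(int *)arg;
       int uipi_index;

       /* Connect to the receiver through its uintr_fd */
       uipi_index = syscall(__NR_uintr_register_sender, uintrfd, 0);
       if (uipi_index >= 0)
               _senduipi(uipi_index);  /* deliver the user IPI */

       return NULL;
}

int main(void)
{
       pthread_t pt;
       int uintrfd;

       if (syscall(__NR_uintr_register_handler, ui_handler, 0))
               return 1;

       uintrfd = syscall(__NR_uintr_create_fd, 0 /* vector */, 0);
       if (uintrfd < 0)
               return 1;

       _stui();        /* unmask user interrupts for this thread */

       pthread_create(&pt, NULL, sender_thread, &uintrfd);
       while (!uintr_received)
               ;       /* receiver must be running (CPL=3) to be notified */

       pthread_join(pt, NULL);
       printf("received user interrupt\n");
       return 0;
}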