Message ID: 1510942921-12564-1-git-send-email-will.deacon@arm.com
Series: arm64: Unmap the kernel whilst running in userspace (KAISER)
On 11/17, Will Deacon wrote:
> Hi all,
>
> This patch series implements something along the lines of KAISER for arm64:
>
>   https://gruss.cc/files/kaiser.pdf
>
> although I wrote this from scratch because the paper has some funny
> assumptions about how the architecture works. There is a patch series
> in review for x86, which follows a similar approach:
>
>   http://lkml.kernel.org/r/<20171110193058.BECA7D88@viggo.jf.intel.com>
>
> and the topic was recently covered by LWN (currently subscriber-only):
>
>   https://lwn.net/Articles/738975/
>
> The basic idea is that transitions to and from userspace are proxied
> through a trampoline page which is mapped into a separate page table and
> can switch the full kernel mapping in and out on exception entry and
> exit respectively. This is a valuable defence against various KASLR and
> timing attacks, particularly as the trampoline page is at a fixed virtual
> address and therefore the kernel text can be randomized independently.
>
> The major consequences of the trampoline are:
>
>   * We can no longer make use of global mappings for kernel space, so
>     each task is assigned two ASIDs: one for user mappings and one for
>     kernel mappings
>
>   * Our ASID moves into TTBR1 so that we can quickly switch between the
>     trampoline and kernel page tables
>
>   * Switching TTBR0 always requires use of the zero page, so we can
>     dispense with some of our errata workaround code
>
>   * entry.S gets more complicated to read
>
> The performance hit from this series isn't as bad as I feared: things
> like cyclictest and kernbench seem to be largely unaffected, although
> syscall micro-benchmarks appear to show that syscall overhead is roughly
> doubled, and this has an impact on things like hackbench, which exhibits
> a ~10% hit due to its heavy context-switching.

Do you have performance benchmark numbers on CPUs with the Falkor
errata? I'm interested to see how much the TLB invalidate hurts
heavy context-switching workloads on these CPUs.

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
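[The entry path the cover letter describes can be sketched roughly as
follows. This is an illustration only, not code from the series: the
label tramp_entry is hypothetical, and the constants are borrowed from
the proposal quoted later in the thread; the real entry code differs in
detail.]

        // Illustrative sketch of exception entry via the trampoline.
        // While in userspace, TTBR1 points at a minimal page table
        // containing little more than this trampoline page; on entry
        // we install the full kernel page table and the kernel ASID,
        // then branch to the real vectors via an absolute address.
tramp_entry:
        msr     tpidrro_el0, x30                // free up a scratch register
        mrs     x30, ttbr1_el1                  // trampoline page table
        sub     x30, x30, #SWAPPER_DIR_SIZE     // step to the full kernel tables
        bic     x30, x30, #USER_ASID_FLAG       // select the kernel ASID
        msr     ttbr1_el1, x30
        isb                                     // new translation regime visible
        ldr     x30, =vectors                   // absolute address of the vectors
        br      x30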
On 17 November 2017 at 18:21, Will Deacon <will.deacon@arm.com> wrote:
> Hi all,
>
> This patch series implements something along the lines of KAISER for arm64:
>
> [...]
>
> Patches based on 4.14 and also pushed here:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git kaiser
>
> Feedback welcome,
>
> Will

Very nice! I am quite pleased, because this makes KASLR much more
useful than it is now.

My main question is why we need a separate trampoline vector table: it
seems to me that with some minor surgery (as proposed below), we can
make the kernel_ventry macro instantiations tolerant of being loaded
somewhere in the fixmap (which I think is a better place for this than
at the base of the VMALLOC space), removing the need to change
vbar_el1 back and forth. The only downside is that exceptions taken
from EL1 will also use absolute addressing, but I don't think that is
a huge price to pay.
-------------->8------------------ diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index f8ce4cdd3bb5..7f89ebc690b1 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -71,6 +71,20 @@ .macro kernel_ventry, el, label, regsize = 64 .align 7 +alternative_if_not ARM64_MAP_KERNEL_AT_EL0 + .if \regsize == 64 + msr tpidrro_el0, x30 // preserve x30 + .endif + .if \el == 0 + mrs x30, ttbr1_el1 + sub x30, x30, #(SWAPPER_DIR_SIZE + RESERVED_TTBR0_SIZE) + bic x30, x30, #USER_ASID_FLAG + msr ttbr1_el1, x30 + isb + .endif + ldr x30, =el\()\el\()_\label +alternative_else_nop_endif + sub sp, sp, #S_FRAME_SIZE #ifdef CONFIG_VMAP_STACK /* @@ -82,7 +96,11 @@ tbnz x0, #THREAD_SHIFT, 0f sub x0, sp, x0 // x0'' = sp' - x0' = (sp + x0) - sp = x0 sub sp, sp, x0 // sp'' = sp' - x0 = (sp + x0) - x0 = sp +alternative_if_not ARM64_MAP_KERNEL_AT_EL0 + br x30 +alternative_else b el\()\el\()_\label +alternative_endif 0: /* @@ -91,6 +109,10 @@ * userspace, and can clobber EL0 registers to free up GPRs. */ +alternative_if_not ARM64_MAP_KERNEL_AT_EL0 + mrs x30, tpidrro_el0 // restore x30 +alternative_else_nop_endif + /* Stash the original SP (minus S_FRAME_SIZE) in tpidr_el0. */ msr tpidr_el0, x0 @@ -98,8 +120,11 @@ sub x0, sp, x0 msr tpidrro_el0, x0 - /* Switch to the overflow stack */ - adr_this_cpu sp, overflow_stack + OVERFLOW_STACK_SIZE, x0 + /* Switch to the overflow stack of this CPU */ + ldr x0, =overflow_stack + OVERFLOW_STACK_SIZE + mov sp, x0 + mrs x0, tpidr_el1 + add sp, sp, x0 /* * Check whether we were already on the overflow stack. This may happen @@ -108,19 +133,30 @@ mrs x0, tpidr_el0 // sp of interrupted context sub x0, sp, x0 // delta with top of overflow stack tst x0, #~(OVERFLOW_STACK_SIZE - 1) // within range? - b.ne __bad_stack // no? -> bad stack pointer + b.eq 1f + ldr x0, =__bad_stack // no? -> bad stack pointer + br x0 /* We were already on the overflow stack. Restore sp/x0 and carry on. */ - sub sp, sp, x0 +1: sub sp, sp, x0 mrs x0, tpidrro_el0 #endif +alternative_if_not ARM64_MAP_KERNEL_AT_EL0 + br x30 +alternative_else b el\()\el\()_\label +alternative_endif .endm - .macro kernel_entry, el, regsize = 64 + .macro kernel_entry, el, regsize = 64, restore_x30 = 1 .if \regsize == 32 mov w0, w0 // zero upper 32 bits of x0 .endif + .if \restore_x30 +alternative_if_not ARM64_MAP_KERNEL_AT_EL0 + mrs x30, tpidrro_el0 // restore x30 +alternative_else_nop_endif + .endif stp x0, x1, [sp, #16 * 0] stp x2, x3, [sp, #16 * 1] stp x4, x5, [sp, #16 * 2] @@ -363,7 +399,7 @@ tsk .req x28 // current thread_info */ .pushsection ".entry.text", "ax" - .align 11 + .align PAGE_SHIFT ENTRY(vectors) kernel_ventry 1, sync_invalid // Synchronous EL1t kernel_ventry 1, irq_invalid // IRQ EL1t @@ -391,6 +427,8 @@ ENTRY(vectors) kernel_ventry 0, fiq_invalid, 32 // FIQ 32-bit EL0 kernel_ventry 0, error_invalid, 32 // Error 32-bit EL0 #endif + .ltorg + .align PAGE_SHIFT END(vectors) #ifdef CONFIG_VMAP_STACK @@ -408,7 +446,7 @@ __bad_stack: * S_FRAME_SIZE) was stashed in tpidr_el0 by kernel_ventry. */ sub sp, sp, #S_FRAME_SIZE - kernel_entry 1 + kernel_entry 1, restore_x30=0 mrs x0, tpidr_el0 add x0, x0, #S_FRAME_SIZE str x0, [sp, #S_SP]
On Fri, Nov 17, 2017 at 04:19:35PM -0800, Stephen Boyd wrote:
> On 11/17, Will Deacon wrote:
> > This patch series implements something along the lines of KAISER for arm64:
> >
> > [...]
>
> Do you have performance benchmark numbers on CPUs with the Falkor
> errata? I'm interested to see how much the TLB invalidate hurts
> heavy context-switching workloads on these CPUs.

I don't, but I'm also not sure what I can do about it if it's an issue.

Will
Hi Ard,

Cheers for having a look.

On Sat, Nov 18, 2017 at 03:25:06PM +0000, Ard Biesheuvel wrote:
> On 17 November 2017 at 18:21, Will Deacon <will.deacon@arm.com> wrote:
> > This patch series implements something along the lines of KAISER for arm64:
>
> Very nice! I am quite pleased, because this makes KASLR much more
> useful than it is now.

Agreed. I might actually start enabling it now ;)

> My main question is why we need a separate trampoline vector table: it
> seems to me that with some minor surgery (as proposed below), we can
> make the kernel_ventry macro instantiations tolerant of being loaded
> somewhere in the fixmap (which I think is a better place for this than
> at the base of the VMALLOC space), removing the need to change
> vbar_el1 back and forth. The only downside is that exceptions taken
> from EL1 will also use absolute addressing, but I don't think that is
> a huge price to pay.

I think there are two aspects to this:

  1. Moving the vectors to the fixmap
  2. Avoiding the vbar toggle

I think (1) is a good idea, so I'll hack that up for v2. The vbar toggle
isn't as obvious: avoiding it adds some overhead to EL1 irq entry, because
we're writing tpidrro_el0 as well as loading from the literal pool. I also
think it makes the code more difficult to reason about, because we'd have
to make sure we don't try to use the fixmap mapping before it's actually
mapped, which I think would mean we'd need a set of early vectors that we
then switch away from in a CPU hotplug notifier or something.

I'll see if I can measure the cost of the current vbar switching to get
an idea of the potential performance available.

Will
On 20 November 2017 at 18:06, Will Deacon <will.deacon@arm.com> wrote:
> Hi Ard,
>
> Cheers for having a look.
>
> [...]
>
> Agreed. I might actually start enabling it now ;)

I think it makes more sense to have it enabled on your phone than on
the devboard on your desk.

> I think there are two aspects to this:
>
>   1. Moving the vectors to the fixmap
>   2. Avoiding the vbar toggle
>
> I think (1) is a good idea, so I'll hack that up for v2. The vbar toggle
> isn't as obvious: avoiding it adds some overhead to EL1 irq entry, because
> we're writing tpidrro_el0 as well as loading from the literal pool.

Yeah, but in what workloads are interrupts taken while running in the
kernel a dominant factor?

> I also think it makes the code more difficult to reason about, because
> we'd have to make sure we don't try to use the fixmap mapping before
> it's actually mapped, which I think would mean we'd need a set of early
> vectors that we then switch away from in a CPU hotplug notifier or
> something.

I don't think this is necessary. The vector page with absolute
addressing would tolerate being accessed via its natural mapping inside
the kernel image as well as via the mapping in the fixmap region.

> I'll see if I can measure the cost of the current vbar switching to get
> an idea of the potential performance available.

Yeah, makes sense. If the bulk of the performance hit is elsewhere,
there's no point in focusing on this bit.
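[To make the absolute-addressing point concrete, a hand-written
comparison, not code from the patch: a PC-relative branch encodes an
offset from wherever the instruction executes, while the literal-pool
form used in the proposal above loads the link-time address, so it
works from either alias of the vector page.]

        // PC-relative: the target is an offset from the *current* PC,
        // so executed from the fixmap alias this would land at the
        // same offset within the fixmap, not in the kernel image.
        b       el1_irq

        // Absolute: the literal pool holds the link-time address of
        // el1_irq, which is correct regardless of which mapping of
        // the vector page we are executing from.
        ldr     x30, =el1_irq
        br      x30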
On 11/17/2017 10:21 AM, Will Deacon wrote:
> Hi all,
>
> This patch series implements something along the lines of KAISER for arm64:
>
> [...]
>
> Feedback welcome,
>
> Will

Passed some basic tests on Hikey Android and my Mustang box. I'll
leave the Mustang building kernels for a few days. You're welcome
to add Tested-by, or I can re-test on v2.

Thanks,
Laura
Hi!

> This patch series implements something along the lines of KAISER for arm64:
>
> [...]
>
> The basic idea is that transitions to and from userspace are proxied
> through a trampoline page which is mapped into a separate page table and
> can switch the full kernel mapping in and out on exception entry and
> exit respectively. This is a valuable defence against various KASLR and
> timing attacks, particularly as the trampoline page is at a fixed virtual
> address and therefore the kernel text can be randomized independently.

If I'm willing to do timing attacks to defeat KASLR... what prevents
me from using CPU caches to do that?

There was a Black Hat talk about exactly that, IIRC...

								Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Mon, Nov 20, 2017 at 02:50:58PM -0800, Laura Abbott wrote:
> On 11/17/2017 10:21 AM, Will Deacon wrote:
> > This patch series implements something along the lines of KAISER for arm64:
>
> Passed some basic tests on Hikey Android and my Mustang box. I'll
> leave the Mustang building kernels for a few days. You're welcome
> to add Tested-by, or I can re-test on v2.

Cheers, Laura. I've got a few changes for v2 based on Ard's feedback, so
if you could retest when I post it, that would be much appreciated.

Will
On Wed, Nov 22, 2017 at 05:19:14PM +0100, Pavel Machek wrote:
> > This patch series implements something along the lines of KAISER for arm64:
> >
> > [...]
>
> If I'm willing to do timing attacks to defeat KASLR... what prevents
> me from using CPU caches to do that?

Is that a rhetorical question? If not, then I'm probably not the best
person to answer it. All I'm doing here is protecting against a class
of attacks on KASLR that make use of the TLB/page-table walker to
determine where the kernel is mapped.

> There was a Black Hat talk about exactly that, IIRC...

Got a link? I'd be interested to see how the idea works in case there's
an orthogonal defence against it.

Will
On Mon, Nov 20, 2017 at 06:20:39PM +0000, Ard Biesheuvel wrote:
> On 20 November 2017 at 18:06, Will Deacon <will.deacon@arm.com> wrote:
> > I'll see if I can measure the cost of the current vbar switching to get
> > an idea of the potential performance available.
>
> Yeah, makes sense. If the bulk of the performance hit is elsewhere,
> there's no point in focusing on this bit.

I had a go at implementing a variant on your suggestion where we avoid
swizzling the vbar on exception entry/exit, but I couldn't reliably
measure a difference in performance. It appears that the ISB needed by
the TTBR change is dominant, so the vbar write is insignificant.

Will
On 22 November 2017 at 16:19, Pavel Machek <pavel@ucw.cz> wrote:
> Hi!
>
>> This patch series implements something along the lines of KAISER for arm64:
>>
>> [...]
>
> If I'm willing to do timing attacks to defeat KASLR... what prevents
> me from using CPU caches to do that?

Because it is impossible to get a cache hit on an access to an
unmapped address?

> There was a Black Hat talk about exactly that, IIRC...
On Wed 2017-11-22 21:19:28, Ard Biesheuvel wrote:
> On 22 November 2017 at 16:19, Pavel Machek <pavel@ucw.cz> wrote:
> > If I'm willing to do timing attacks to defeat KASLR... what prevents
> > me from using CPU caches to do that?
>
> Because it is impossible to get a cache hit on an access to an
> unmapped address?

Um, no, I don't need to be able to directly access kernel addresses. I
just put some data in the _same place in the cache where kernel data
would go_, then do a syscall and look whether my data are still cached.
Caches don't have infinite associativity.

								Pavel
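[A minimal userspace sketch of the prime+probe idea being described
here, for illustration only. The assumptions are made up: the cache
geometry is invented, one line per set is primed rather than a full
eviction set, and clock_gettime() is far too coarse to classify a
single reload, so a real attack would average over many thousands of
samples.]

/* Minimal prime+probe sketch (illustrative only). Real attacks need
 * per-set eviction sets that fill all the ways, a fine-grained timer
 * and statistics over many runs. */
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define LINE    64                      /* assumed cache line size */
#define SETS    (32 * 1024 / LINE / 8)  /* assumed 32K, 8-way L1D */

static uint64_t now_ns(void)
{
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

int main(void)
{
        static uint8_t buf[32 * 1024] __attribute__((aligned(4096)));
        volatile uint8_t sink;

        for (int set = 0; set < SETS; set++) {
                /* Prime: our line now occupies one way of this set. */
                sink = buf[set * LINE];

                /* Cheap syscall: the kernel runs, touching its own data. */
                getpid();

                /* Probe: a slow reload suggests the kernel evicted us,
                 * i.e. it touched data mapping to the same set. */
                uint64_t t0 = now_ns();
                sink = buf[set * LINE];
                printf("set %3d: %llu ns\n", set,
                       (unsigned long long)(now_ns() - t0));
        }
        return 0;
}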
On Wed 2017-11-22 19:37:14, Will Deacon wrote:
> On Wed, Nov 22, 2017 at 05:19:14PM +0100, Pavel Machek wrote:
> > If I'm willing to do timing attacks to defeat KASLR... what prevents
> > me from using CPU caches to do that?
>
> Is that a rhetorical question? If not, then I'm probably not the best
> person to answer it. All I'm doing here is protecting against a class
> of attacks on KASLR that make use of the TLB/page-table walker to
> determine where the kernel is mapped.

Yeah. What I'm saying is that I can use cache effects to probe where
the kernel is mapped (and what it is doing).

> > There was a Black Hat talk about exactly that, IIRC...
>
> Got a link? I'd be interested to see how the idea works in case there's
> an orthogonal defence against it.

https://www.youtube.com/watch?v=9KsnFWejpQg

(Tell me if it is not the right one.)

As for defenses... yes, "maxcpus=1" and flushing the caches on every
switch to usermode would do the trick :-). Ok, so that was sarcastic.
I'm not sure a good defense exists. ARM is better than i386 because
reading the time and flushing the cache are privileged operations,
but...

								Pavel
> On 22 Nov 2017, at 22:33, Pavel Machek <pavel@ucw.cz> wrote:
>
> Um, no, I don't need to be able to directly access kernel addresses. I
> just put some data in the _same place in the cache where kernel data
> would go_, then do a syscall and look whether my data are still cached.
> Caches don't have infinite associativity.

Ah ok. Interesting.

But how does that leak address bits that are covered by the tag?
Hi!

> > Um, no, I don't need to be able to directly access kernel addresses. I
> > just put some data in the _same place in the cache where kernel data
> > would go_, then do a syscall and look whether my data are still cached.
> > Caches don't have infinite associativity.
>
> Ah ok. Interesting.
>
> But how does that leak address bits that are covered by the tag?

Same as leaking any other address bits? Caches are "virtually indexed",
and the tag does not come into play...

Maybe this explains it?

https://www.youtube.com/watch?v=9KsnFWejpQg

								Pavel
> On 22 Nov 2017, at 23:37, Pavel Machek <pavel@ucw.cz> wrote:
>
> Same as leaking any other address bits? Caches are "virtually
> indexed",

Not on arm64, although I don't see how that is relevant if you are
trying to defeat KASLR.

> and the tag does not come into play...

Well, I must be missing something then, because I don't see how
knowledge about which userland address shares a cache set with a
kernel address can leak anything beyond the bits that make up the
index (i.e., which cache set is being shared).

> Maybe this explains it?

No, not really. It explains how cache timing can be used as a side
channel, not how it defeats KASLR.

Thanks,
Ard.
Hi!

> > Same as leaking any other address bits? Caches are "virtually
> > indexed",
>
> Not on arm64, although I don't see how that is relevant if you are
> trying to defeat KASLR.
>
> Well, I must be missing something then, because I don't see how
> knowledge about which userland address shares a cache set with a
> kernel address can leak anything beyond the bits that make up the
> index (i.e., which cache set is being shared).

Well, KASLR is about keeping bits of the kernel's virtual addresses
secret from userland. Leaking them through a cache side channel means
KASLR is defeated.

> No, not really. It explains how cache timing can be used as a side
> channel, not how it defeats KASLR.

Ok, look at this one:

https://www.blackhat.com/docs/us-16/materials/us-16-Jang-Breaking-Kernel-Address-Space-Layout-Randomization-KASLR-With-Intel-TSX-wp.pdf

You can use timing instead of TSX, right?

								Pavel
On 23 November 2017 at 09:07, Pavel Machek <pavel@ucw.cz> wrote:
> Well, KASLR is about keeping bits of the kernel's virtual addresses
> secret from userland. Leaking them through a cache side channel means
> KASLR is defeated.

Yes, that is what you claim. But you are not explaining how any of the
bits that we do want to keep secret can be discovered by making
inferences from which lines in a primed cache were evicted during a
syscall.

The cache index maps to low-order bits. You can use this, e.g., to
attack table-based AES, because there is only ~4 KB worth of tables,
and you are interested in finding out which exact entries of the table
were read by the process under attack.

You are saying the same approach will help you discover 30 high-order
bits of a virtual kernel address, by observing the cache evictions in
a physically indexed, physically tagged cache. How?

> Ok, look at this one:
>
> https://www.blackhat.com/docs/us-16/materials/us-16-Jang-Breaking-Kernel-Address-Space-Layout-Randomization-KASLR-With-Intel-TSX-wp.pdf
>
> You can use timing instead of TSX, right?

The TSX attack is TLB-based, not cache-based.
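[To make the index/tag distinction concrete, a small worked example
with an illustrative cache geometry, not any particular CPU's.]

/* For a physically indexed, physically tagged cache, only the
 * low-order address bits select the set; everything above them is
 * compared as the tag and never influences *which* set collides.
 * Illustrative geometry: 64-byte lines, 256 sets. */
#define LINE_BITS 6   /* bits [5:0]  = byte offset within the line */
#define SET_BITS  8   /* bits [13:6] = set index */

unsigned int cache_set(unsigned long addr)
{
        return (addr >> LINE_BITS) & ((1u << SET_BITS) - 1);
}

/* A prime+probe collision therefore reveals at most bits [13:6] of
 * the colliding kernel address -- nothing about the high-order bits
 * that KASLR randomizes. */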
On Thu 2017-11-23 09:23:02, Ard Biesheuvel wrote:
> The cache index maps to low-order bits. You can use this, e.g., to
> attack table-based AES, because there is only ~4 KB worth of tables,
> and you are interested in finding out which exact entries of the table
> were read by the process under attack.
>
> You are saying the same approach will help you discover 30 high-order
> bits of a virtual kernel address, by observing the cache evictions in
> a physically indexed, physically tagged cache. How?

I assumed the high bits are hashed into the cache index. I might have
been wrong. Anyway, page tables are about the same size as the AES
tables. So...:

http://cve.circl.lu/cve/CVE-2017-5927

								Pavel
On 23 November 2017 at 10:46, Pavel Machek <pavel@ucw.cz> wrote:
> I assumed the high bits are hashed into the cache index. I might have
> been wrong. Anyway, page tables are about the same size as the AES
> tables. So...:
>
> http://cve.circl.lu/cve/CVE-2017-5927

Very interesting paper. Can you explain why you think its findings can
be extrapolated to apply to attacks across address spaces? Because
that is what would be required for it to be able to defeat KASLR.
On Thu 2017-11-23 11:38:52, Ard Biesheuvel wrote:
> Very interesting paper. Can you explain why you think its findings can
> be extrapolated to apply to attacks across address spaces? Because
> that is what would be required for it to be able to defeat KASLR.

Can you explain why not?

You clearly understand that AES tables can be attacked
cross-address-space, and there's no reason page tables could not be
attacked the same way. I'm not saying that's the best way to launch
the attack, but it certainly looks possible to me.

								Pavel
On 23 November 2017 at 17:54, Pavel Machek <pavel@ucw.cz> wrote:
> Can you explain why not?
>
> You clearly understand that AES tables can be attacked
> cross-address-space, and there's no reason page tables could not be
> attacked the same way. I'm not saying that's the best way to launch
> the attack, but it certainly looks possible to me.

There are two sides to this:

- on the one hand, a round trip into the kernel is quite likely to
  result in many more cache evictions than the ones from which you
  will be able to infer what address was being resolved by the page
  table walker, adding noise to the signal;

- on the other hand, the kernel mappings are deliberately coarse
  grained, so that they can be cached in the TLB with literally only a
  handful of entries, so it is not guaranteed that a TLB miss will
  occur that results in a page table walk you are interested in.

Given the statistical approach, it may simply mean taking more
samples, but how many more? 10x? 100000x? Given that the current
attack takes tens of seconds to mount, that is a significant
limitation. For the TLB side, it may help to mount an additional
attack to prime the TLB, but that itself is likely to add noise to the
cache state measurements.
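[To put rough numbers on the "handful of entries" point, illustrative
arithmetic only, not a measurement of any particular kernel.]

/* Why coarse-grained kernel mappings leave little for a TLB-miss
 * attack to observe: the same region needs vastly fewer TLB entries
 * when mapped with blocks, so page table walks become rare. */
#include <stdio.h>

int main(void)
{
        unsigned long region = 2ul << 30;       /* 2 GiB of kernel mappings */

        printf("4 KiB pages : %lu TLB entries\n", region >> 12); /* 524288 */
        printf("2 MiB blocks: %lu TLB entries\n", region >> 21); /*   1024 */
        printf("1 GiB blocks: %lu TLB entries\n", region >> 30); /*      2 */
        return 0;
}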