Message ID | 20240325100359.17001-1-brgl@bgdev.pl |
---|---|
Headers | show |
Series | firmware: qcom: qseecom: convert to using the TZ allocator | expand |
On Thu, Mar 28, 2024 at 5:50 PM Maximilian Luz <luzmaximilian@gmail.com> wrote: > > On 3/25/24 11:03 AM, Bartosz Golaszewski wrote: > > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> > > > > SCM calls that take memory buffers as arguments require that they be > > page-aligned, physically continuous and non-cachable. The same > > requirements apply to the buffer used to pass additional arguments to SCM > > calls that take more than 4. > > > > To that end drivers typically use dma_alloc_coherent() to allocate memory > > of suitable format which is slow and inefficient space-wise. > > > > SHM Bridge is a safety mechanism that - once enabled - will only allow > > passing buffers to the TrustZone that have been explicitly marked as > > shared. It improves the overall system safety with SCM calls and is > > required by the upcoming scminvoke functionality. > > > > The end goal of this series is to enable SHM bridge support for those > > architectures that support it but to that end we first need to unify the > > way memory for SCM calls is allocated. This in itself is beneficial as > > the current approach of using dma_alloc_coherent() in most places is quite > > slow. > > > > First let's add a new TZ Memory allocator that allows users to create > > dynamic memory pools of format suitable for sharing with the TrustZone. > > Make it ready for implementing multiple build-time modes. > > > > Convert all relevant drivers to using it. Add separate pools for SCM core > > and for qseecom. > > > > Finally add support for SHM bridge and make it the default mode of > > operation with the generic allocator as fallback for the platforms that > > don't support SHM bridge. > > > > Tested on db410c, RB5, sm8550-qrd. Previous iteration tested also on > > sa8775p-ride and lenovo X13s (please do retest on those platforms if you > > can). > > Not sure in which version things changed (I haven't really kept up with > that, sorry), but this version (with the generic allocator selected in > the config) fails reading EFI vars on my Surface Pro X (sc8180x): > > [ 2.381020] BUG: scheduling while atomic: mount/258/0x00000002 > [ 2.383356] Modules linked in: > [ 2.385669] Preemption disabled at: > [ 2.385672] [<ffff800080f7868c>] qcom_tzmem_alloc+0x7c/0x224 > [ 2.390325] CPU: 1 PID: 258 Comm: mount Not tainted 6.8.0-1-surface-dev #2 > [ 2.392632] Hardware name: Microsoft Corporation Surface Pro X/Surface Pro X, BIOS 7.620.140 08/11/2023 > [ 2.394955] Call trace: > [ 2.397269] dump_backtrace+0x94/0x114 > [ 2.399583] show_stack+0x18/0x24 > [ 2.401883] dump_stack_lvl+0x48/0x60 > [ 2.404181] dump_stack+0x18/0x24 > [ 2.406476] __schedule_bug+0x84/0xa0 > [ 2.408770] __schedule+0x6f4/0x7fc > [ 2.411051] schedule+0x30/0xf0 > [ 2.413323] synchronize_rcu_expedited+0x158/0x1ec > [ 2.415594] lru_cache_disable+0x28/0x74 > [ 2.417853] __alloc_contig_migrate_range+0x68/0x210 > [ 2.420122] alloc_contig_range+0x140/0x280 > [ 2.422384] cma_alloc+0x128/0x404 > [ 2.424643] cma_alloc_aligned+0x44/0x6c > [ 2.426881] dma_alloc_contiguous+0x30/0x44 > [ 2.429111] __dma_direct_alloc_pages.isra.0+0x60/0x20c > [ 2.431343] dma_direct_alloc+0x6c/0x2ec > [ 2.433569] dma_alloc_attrs+0x80/0xf4 > [ 2.435786] qcom_tzmem_pool_add_memory+0x8c/0x178 > [ 2.438008] qcom_tzmem_alloc+0xe8/0x224 > [ 2.440232] qsee_uefi_get_next_variable+0x78/0x2cc > [ 2.442443] qcuefi_get_next_variable+0x50/0x94 > [ 2.444643] efivar_get_next_variable+0x20/0x2c > [ 2.446832] efivar_init+0x8c/0x29c > [ 2.449009] efivarfs_fill_super+0xd4/0xec > [ 2.451182] get_tree_single+0x74/0xbc > [ 2.453349] efivarfs_get_tree+0x18/0x24 > [ 2.455513] vfs_get_tree+0x28/0xe8 > [ 2.457680] vfs_cmd_create+0x5c/0xf4 > [ 2.459840] __do_sys_fsconfig+0x458/0x598 > [ 2.461993] __arm64_sys_fsconfig+0x24/0x30 > [ 2.464143] invoke_syscall+0x48/0x110 > [ 2.466281] el0_svc_common.constprop.0+0x40/0xe0 > [ 2.468415] do_el0_svc+0x1c/0x28 > [ 2.470546] el0_svc+0x34/0xb4 > [ 2.472669] el0t_64_sync_handler+0x120/0x12c > [ 2.474793] el0t_64_sync+0x190/0x194 > > and subsequently > > [ 2.477613] DEBUG_LOCKS_WARN_ON(val > preempt_count()) > [ 2.477618] WARNING: CPU: 4 PID: 258 at kernel/sched/core.c:5889 preempt_count_sub+0x90/0xd4 > [ 2.478682] Modules linked in: > [ 2.479214] CPU: 4 PID: 258 Comm: mount Tainted: G W 6.8.0-1-surface-dev #2 > [ 2.479752] Hardware name: Microsoft Corporation Surface Pro X/Surface Pro X, BIOS 7.620.140 08/11/2023 > [ 2.480296] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > [ 2.480839] pc : preempt_count_sub+0x90/0xd4 > [ 2.481380] lr : preempt_count_sub+0x90/0xd4 > [ 2.481917] sp : ffff8000857cbb00 > [ 2.482450] x29: ffff8000857cbb00 x28: ffff8000806b759c x27: 8000000000000005 > [ 2.482988] x26: 0000000000000000 x25: ffff0000802cbaa0 x24: ffff0000809228b0 > [ 2.483525] x23: 0000000000000000 x22: ffff800082b462f0 x21: 0000000000001000 > [ 2.484062] x20: ffff80008363d000 x19: ffff000080922880 x18: fffffffffffc9660 > [ 2.484600] x17: 0000000000000020 x16: 0000000000000000 x15: 0000000000000038 > [ 2.485137] x14: 0000000000000000 x13: ffff800082649258 x12: 00000000000006db > [ 2.485674] x11: 0000000000000249 x10: ffff8000826fc930 x9 : ffff800082649258 > [ 2.486207] x8 : 00000000ffffdfff x7 : ffff8000826f9258 x6 : 0000000000000249 > [ 2.486738] x5 : 0000000000000000 x4 : 40000000ffffe249 x3 : 0000000000000000 > [ 2.487267] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000841fa300 > [ 2.487792] Call trace: > [ 2.488311] preempt_count_sub+0x90/0xd4 > [ 2.488831] _raw_spin_unlock_irqrestore+0x1c/0x44 > [ 2.489352] qcom_tzmem_alloc+0x1cc/0x224 > [ 2.489873] qsee_uefi_get_next_variable+0x78/0x2cc > [ 2.490390] qcuefi_get_next_variable+0x50/0x94 > [ 2.490907] efivar_get_next_variable+0x20/0x2c > [ 2.491420] efivar_init+0x8c/0x29c > [ 2.491931] efivarfs_fill_super+0xd4/0xec > [ 2.492440] get_tree_single+0x74/0xbc > [ 2.492948] efivarfs_get_tree+0x18/0x24 > [ 2.493453] vfs_get_tree+0x28/0xe8 > [ 2.493957] vfs_cmd_create+0x5c/0xf4 > [ 2.494459] __do_sys_fsconfig+0x458/0x598 > [ 2.494963] __arm64_sys_fsconfig+0x24/0x30 > [ 2.495468] invoke_syscall+0x48/0x110 > [ 2.495972] el0_svc_common.constprop.0+0x40/0xe0 > [ 2.496475] do_el0_svc+0x1c/0x28 > [ 2.496976] el0_svc+0x34/0xb4 > [ 2.497475] el0t_64_sync_handler+0x120/0x12c > [ 2.497975] el0t_64_sync+0x190/0x194 > [ 2.498466] ---[ end trace 0000000000000000 ]--- > [ 2.507347] qcom_scm firmware:scm: qseecom: scm call failed with error -22 > [ 2.507813] efivars: get_next_variable: status=8000000000000007 > > If I understand correctly, it enters an atomic section in > qcom_tzmem_alloc() and then tries to schedule somewhere down the line. > So this shouldn't be qseecom specific. > > I should probably also say that I'm currently testing this on a patched > v6.8 kernel, so there's a chance that it's my fault. However, as far as > I understand, it enters an atomic section in qcom_tzmem_alloc() and then > later tries to expand the pool memory with dma_alloc_coherent(). Which > AFAIK is allowed to sleep with GFP_KERNEL (and I guess that that's the > issue here). > > I've also tried the shmem allocator option, but that seems to get stuck > quite early at boot, before I even have usb-serial access to get any > logs. If I can find some more time, I'll try to see if I can get some > useful output for that. > Ah, I think it happens here: + guard(spinlock_irqsave)(&pool->lock); + +again: + vaddr = gen_pool_alloc(pool->genpool, size); + if (!vaddr) { + if (qcom_tzmem_try_grow_pool(pool, size, gfp)) + goto again; We were called with GFP_KERNEL so this is what we pass on to qcom_tzmem_try_grow_pool() but we're now holding the spinlock. I need to revisit it. Thanks for the catch! Bart
On Thu, Mar 28, 2024 at 7:55 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote: > > On Thu, Mar 28, 2024 at 5:50 PM Maximilian Luz <luzmaximilian@gmail.com> wrote: > > > > If I understand correctly, it enters an atomic section in > > qcom_tzmem_alloc() and then tries to schedule somewhere down the line. > > So this shouldn't be qseecom specific. > > > > I should probably also say that I'm currently testing this on a patched > > v6.8 kernel, so there's a chance that it's my fault. However, as far as > > I understand, it enters an atomic section in qcom_tzmem_alloc() and then > > later tries to expand the pool memory with dma_alloc_coherent(). Which > > AFAIK is allowed to sleep with GFP_KERNEL (and I guess that that's the > > issue here). > > > > I've also tried the shmem allocator option, but that seems to get stuck > > quite early at boot, before I even have usb-serial access to get any > > logs. If I can find some more time, I'll try to see if I can get some > > useful output for that. > > > > Ah, I think it happens here: > > + guard(spinlock_irqsave)(&pool->lock); > + > +again: > + vaddr = gen_pool_alloc(pool->genpool, size); > + if (!vaddr) { > + if (qcom_tzmem_try_grow_pool(pool, size, gfp)) > + goto again; > > We were called with GFP_KERNEL so this is what we pass on to > qcom_tzmem_try_grow_pool() but we're now holding the spinlock. I need > to revisit it. Thanks for the catch! > > Bart Can you try the following tree? https://git.codelinaro.org/bartosz_golaszewski/linux.git topic/shm-bridge-v10 gen_pool_alloc() and gen_pool_add_virt() can be used without external serialization. We only really need to protect the list of areas in the pool when adding a new element. We could possibly even use list_add_tail_rcu() as it updates the pointers atomically and go lockless. Bart
On 3/29/24 11:22 AM, Bartosz Golaszewski wrote: > On Thu, Mar 28, 2024 at 7:55 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote: >> >> On Thu, Mar 28, 2024 at 5:50 PM Maximilian Luz <luzmaximilian@gmail.com> wrote: >>> >>> If I understand correctly, it enters an atomic section in >>> qcom_tzmem_alloc() and then tries to schedule somewhere down the line. >>> So this shouldn't be qseecom specific. >>> >>> I should probably also say that I'm currently testing this on a patched >>> v6.8 kernel, so there's a chance that it's my fault. However, as far as >>> I understand, it enters an atomic section in qcom_tzmem_alloc() and then >>> later tries to expand the pool memory with dma_alloc_coherent(). Which >>> AFAIK is allowed to sleep with GFP_KERNEL (and I guess that that's the >>> issue here). >>> >>> I've also tried the shmem allocator option, but that seems to get stuck >>> quite early at boot, before I even have usb-serial access to get any >>> logs. If I can find some more time, I'll try to see if I can get some >>> useful output for that. >>> >> >> Ah, I think it happens here: >> >> + guard(spinlock_irqsave)(&pool->lock); >> + >> +again: >> + vaddr = gen_pool_alloc(pool->genpool, size); >> + if (!vaddr) { >> + if (qcom_tzmem_try_grow_pool(pool, size, gfp)) >> + goto again; >> >> We were called with GFP_KERNEL so this is what we pass on to >> qcom_tzmem_try_grow_pool() but we're now holding the spinlock. I need >> to revisit it. Thanks for the catch! >> >> Bart > > Can you try the following tree? > > https://git.codelinaro.org/bartosz_golaszewski/linux.git > topic/shm-bridge-v10 > > gen_pool_alloc() and gen_pool_add_virt() can be used without external > serialization. We only really need to protect the list of areas in the > pool when adding a new element. We could possibly even use > list_add_tail_rcu() as it updates the pointers atomically and go > lockless. Thanks! That fixes the allocations for CONFIG_QCOM_TZMEM_MODE_GENERIC=y. Unfortunately, with the shmbridge mode it still gets stuck at boot (and I haven't had the time to look into it yet). And for more bad news: It looks like the new allocator now fully exposes a bug that I've been tracking down the last couple of days. In short, uefisecapp doesn't seem to be happy when we split the allocations for request and response into two, causing commands to fail. Instead it wants a single buffer for both. Before, it seemed to be fairly sporadic (likely because kzalloc in sequence just returned consecutive memory almost all of the time) but now it's basically every call that fails. I have a fix for that almost ready and I'll likely post it in the next hour. But that means that you'll probably have to rebase this series on top of it... Best regards, Max
On 3/29/24 7:53 PM, Maximilian Luz wrote: > On 3/29/24 11:22 AM, Bartosz Golaszewski wrote: >> On Thu, Mar 28, 2024 at 7:55 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote: >>> >>> On Thu, Mar 28, 2024 at 5:50 PM Maximilian Luz <luzmaximilian@gmail.com> wrote: >>>> >>>> If I understand correctly, it enters an atomic section in >>>> qcom_tzmem_alloc() and then tries to schedule somewhere down the line. >>>> So this shouldn't be qseecom specific. >>>> >>>> I should probably also say that I'm currently testing this on a patched >>>> v6.8 kernel, so there's a chance that it's my fault. However, as far as >>>> I understand, it enters an atomic section in qcom_tzmem_alloc() and then >>>> later tries to expand the pool memory with dma_alloc_coherent(). Which >>>> AFAIK is allowed to sleep with GFP_KERNEL (and I guess that that's the >>>> issue here). >>>> >>>> I've also tried the shmem allocator option, but that seems to get stuck >>>> quite early at boot, before I even have usb-serial access to get any >>>> logs. If I can find some more time, I'll try to see if I can get some >>>> useful output for that. >>>> >>> >>> Ah, I think it happens here: >>> >>> + guard(spinlock_irqsave)(&pool->lock); >>> + >>> +again: >>> + vaddr = gen_pool_alloc(pool->genpool, size); >>> + if (!vaddr) { >>> + if (qcom_tzmem_try_grow_pool(pool, size, gfp)) >>> + goto again; >>> >>> We were called with GFP_KERNEL so this is what we pass on to >>> qcom_tzmem_try_grow_pool() but we're now holding the spinlock. I need >>> to revisit it. Thanks for the catch! >>> >>> Bart >> >> Can you try the following tree? >> >> https://git.codelinaro.org/bartosz_golaszewski/linux.git >> topic/shm-bridge-v10 >> >> gen_pool_alloc() and gen_pool_add_virt() can be used without external >> serialization. We only really need to protect the list of areas in the >> pool when adding a new element. We could possibly even use >> list_add_tail_rcu() as it updates the pointers atomically and go >> lockless. > > Thanks! That fixes the allocations for CONFIG_QCOM_TZMEM_MODE_GENERIC=y. > Unfortunately, with the shmbridge mode it still gets stuck at boot (and > I haven't had the time to look into it yet). > > And for more bad news: It looks like the new allocator now fully exposes > a bug that I've been tracking down the last couple of days. In short, > uefisecapp doesn't seem to be happy when we split the allocations for > request and response into two, causing commands to fail. Instead it > wants a single buffer for both. Before, it seemed to be fairly sporadic > (likely because kzalloc in sequence just returned consecutive memory > almost all of the time) but now it's basically every call that fails. > > I have a fix for that almost ready and I'll likely post it in the next > hour. But that means that you'll probably have to rebase this series > on top of it... Forgot to mention: I tested it with the fix and this series, and that works. > Best regards, > Max >
On Fri, Mar 29, 2024 at 7:56 PM Maximilian Luz <luzmaximilian@gmail.com> wrote: > > > > On 3/29/24 7:53 PM, Maximilian Luz wrote: > > On 3/29/24 11:22 AM, Bartosz Golaszewski wrote: > >> On Thu, Mar 28, 2024 at 7:55 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote: > >>> > >>> On Thu, Mar 28, 2024 at 5:50 PM Maximilian Luz <luzmaximilian@gmail.com> wrote: > >>>> > >>>> If I understand correctly, it enters an atomic section in > >>>> qcom_tzmem_alloc() and then tries to schedule somewhere down the line. > >>>> So this shouldn't be qseecom specific. > >>>> > >>>> I should probably also say that I'm currently testing this on a patched > >>>> v6.8 kernel, so there's a chance that it's my fault. However, as far as > >>>> I understand, it enters an atomic section in qcom_tzmem_alloc() and then > >>>> later tries to expand the pool memory with dma_alloc_coherent(). Which > >>>> AFAIK is allowed to sleep with GFP_KERNEL (and I guess that that's the > >>>> issue here). > >>>> > >>>> I've also tried the shmem allocator option, but that seems to get stuck > >>>> quite early at boot, before I even have usb-serial access to get any > >>>> logs. If I can find some more time, I'll try to see if I can get some > >>>> useful output for that. > >>>> > >>> > >>> Ah, I think it happens here: > >>> > >>> + guard(spinlock_irqsave)(&pool->lock); > >>> + > >>> +again: > >>> + vaddr = gen_pool_alloc(pool->genpool, size); > >>> + if (!vaddr) { > >>> + if (qcom_tzmem_try_grow_pool(pool, size, gfp)) > >>> + goto again; > >>> > >>> We were called with GFP_KERNEL so this is what we pass on to > >>> qcom_tzmem_try_grow_pool() but we're now holding the spinlock. I need > >>> to revisit it. Thanks for the catch! > >>> > >>> Bart > >> > >> Can you try the following tree? > >> > >> https://git.codelinaro.org/bartosz_golaszewski/linux.git > >> topic/shm-bridge-v10 > >> > >> gen_pool_alloc() and gen_pool_add_virt() can be used without external > >> serialization. We only really need to protect the list of areas in the > >> pool when adding a new element. We could possibly even use > >> list_add_tail_rcu() as it updates the pointers atomically and go > >> lockless. > > > > Thanks! That fixes the allocations for CONFIG_QCOM_TZMEM_MODE_GENERIC=y. > > Unfortunately, with the shmbridge mode it still gets stuck at boot (and > > I haven't had the time to look into it yet). > > > > And for more bad news: It looks like the new allocator now fully exposes > > a bug that I've been tracking down the last couple of days. In short, > > uefisecapp doesn't seem to be happy when we split the allocations for > > request and response into two, causing commands to fail. Instead it > > wants a single buffer for both. Before, it seemed to be fairly sporadic > > (likely because kzalloc in sequence just returned consecutive memory > > almost all of the time) but now it's basically every call that fails. > > > > I have a fix for that almost ready and I'll likely post it in the next > > hour. But that means that you'll probably have to rebase this series > > on top of it... > > Forgot to mention: I tested it with the fix and this series, and that > works. > Both with and without SHM bridge? If so, please Cc me on the fix. Bart
On 3/29/24 8:07 PM, Bartosz Golaszewski wrote: > On Fri, Mar 29, 2024 at 7:56 PM Maximilian Luz <luzmaximilian@gmail.com> wrote: >> >> >> >> On 3/29/24 7:53 PM, Maximilian Luz wrote: >>> On 3/29/24 11:22 AM, Bartosz Golaszewski wrote: >>>> On Thu, Mar 28, 2024 at 7:55 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote: >>>>> >>>>> On Thu, Mar 28, 2024 at 5:50 PM Maximilian Luz <luzmaximilian@gmail.com> wrote: >>>>>> >>>>>> If I understand correctly, it enters an atomic section in >>>>>> qcom_tzmem_alloc() and then tries to schedule somewhere down the line. >>>>>> So this shouldn't be qseecom specific. >>>>>> >>>>>> I should probably also say that I'm currently testing this on a patched >>>>>> v6.8 kernel, so there's a chance that it's my fault. However, as far as >>>>>> I understand, it enters an atomic section in qcom_tzmem_alloc() and then >>>>>> later tries to expand the pool memory with dma_alloc_coherent(). Which >>>>>> AFAIK is allowed to sleep with GFP_KERNEL (and I guess that that's the >>>>>> issue here). >>>>>> >>>>>> I've also tried the shmem allocator option, but that seems to get stuck >>>>>> quite early at boot, before I even have usb-serial access to get any >>>>>> logs. If I can find some more time, I'll try to see if I can get some >>>>>> useful output for that. >>>>>> >>>>> >>>>> Ah, I think it happens here: >>>>> >>>>> + guard(spinlock_irqsave)(&pool->lock); >>>>> + >>>>> +again: >>>>> + vaddr = gen_pool_alloc(pool->genpool, size); >>>>> + if (!vaddr) { >>>>> + if (qcom_tzmem_try_grow_pool(pool, size, gfp)) >>>>> + goto again; >>>>> >>>>> We were called with GFP_KERNEL so this is what we pass on to >>>>> qcom_tzmem_try_grow_pool() but we're now holding the spinlock. I need >>>>> to revisit it. Thanks for the catch! >>>>> >>>>> Bart >>>> >>>> Can you try the following tree? >>>> >>>> https://git.codelinaro.org/bartosz_golaszewski/linux.git >>>> topic/shm-bridge-v10 >>>> >>>> gen_pool_alloc() and gen_pool_add_virt() can be used without external >>>> serialization. We only really need to protect the list of areas in the >>>> pool when adding a new element. We could possibly even use >>>> list_add_tail_rcu() as it updates the pointers atomically and go >>>> lockless. >>> >>> Thanks! That fixes the allocations for CONFIG_QCOM_TZMEM_MODE_GENERIC=y. >>> Unfortunately, with the shmbridge mode it still gets stuck at boot (and >>> I haven't had the time to look into it yet). >>> >>> And for more bad news: It looks like the new allocator now fully exposes >>> a bug that I've been tracking down the last couple of days. In short, >>> uefisecapp doesn't seem to be happy when we split the allocations for >>> request and response into two, causing commands to fail. Instead it >>> wants a single buffer for both. Before, it seemed to be fairly sporadic >>> (likely because kzalloc in sequence just returned consecutive memory >>> almost all of the time) but now it's basically every call that fails. >>> >>> I have a fix for that almost ready and I'll likely post it in the next >>> hour. But that means that you'll probably have to rebase this series >>> on top of it... >> >> Forgot to mention: I tested it with the fix and this series, and that >> works. >> > > Both with and without SHM bridge? With CONFIG_QCOM_TZMEM_MODE_GENERIC=y (and the upcoming fix) everything works. With CONFIG_QCOM_TZMEM_MODE_SHMBRIDGE=y things unfortunately still get stuck at boot (regardless of the fix). I think that's happening even before anything efivar related should come up. > If so, please Cc me on the fix. Sure, will do. Best regards, Max
On Fri, 29 Mar 2024 at 20:22, Maximilian Luz <luzmaximilian@gmail.com> wrote: > > On 3/29/24 8:07 PM, Bartosz Golaszewski wrote: > > > > Both with and without SHM bridge? > > With CONFIG_QCOM_TZMEM_MODE_GENERIC=y (and the upcoming fix) everything > works. With CONFIG_QCOM_TZMEM_MODE_SHMBRIDGE=y things unfortunately > still get stuck at boot (regardless of the fix). I think that's > happening even before anything efivar related should come up. > This is on X13s? I will get one in 3 weeks. Can you get the bootlog somehow? Does the laptop have any serial console? Bart
On 3/29/24 8:26 PM, Bartosz Golaszewski wrote: > On Fri, 29 Mar 2024 at 20:22, Maximilian Luz <luzmaximilian@gmail.com> wrote: >> >> On 3/29/24 8:07 PM, Bartosz Golaszewski wrote: >>> >>> Both with and without SHM bridge? >> >> With CONFIG_QCOM_TZMEM_MODE_GENERIC=y (and the upcoming fix) everything >> works. With CONFIG_QCOM_TZMEM_MODE_SHMBRIDGE=y things unfortunately >> still get stuck at boot (regardless of the fix). I think that's >> happening even before anything efivar related should come up. >> > > This is on X13s? I will get one in 3 weeks. Can you get the bootlog > somehow? Does the laptop have any serial console? Surface Pro X (sc8180x), but it should be similar enough to the X13s in that regard. At least from what people with access to the X13s told me, the qseecom stuff seems to behave the same. Unfortunately I don't have a direct serial console. Best I have is USB-serial, but it's not even getting there. I'll have to try and see if I can get some more info on the screen. Best regards, Max
On Fri, 29 Mar 2024 at 20:39, Maximilian Luz <luzmaximilian@gmail.com> wrote: > > On 3/29/24 8:26 PM, Bartosz Golaszewski wrote: > > On Fri, 29 Mar 2024 at 20:22, Maximilian Luz <luzmaximilian@gmail.com> wrote: > >> > >> On 3/29/24 8:07 PM, Bartosz Golaszewski wrote: > >>> > >>> Both with and without SHM bridge? > >> > >> With CONFIG_QCOM_TZMEM_MODE_GENERIC=y (and the upcoming fix) everything > >> works. With CONFIG_QCOM_TZMEM_MODE_SHMBRIDGE=y things unfortunately > >> still get stuck at boot (regardless of the fix). I think that's > >> happening even before anything efivar related should come up. > >> > > > > This is on X13s? I will get one in 3 weeks. Can you get the bootlog > > somehow? Does the laptop have any serial console? > > Surface Pro X (sc8180x), but it should be similar enough to the X13s in > that regard. At least from what people with access to the X13s told me, > the qseecom stuff seems to behave the same. > > Unfortunately I don't have a direct serial console. Best I have is > USB-serial, but it's not even getting there. I'll have to try and see if > I can get some more info on the screen. > I have access to a sc8180x-primus board, does it make sense to test with this one? If so, could you give me instructions on how to do it? Bart
On 3/29/24 8:46 PM, Bartosz Golaszewski wrote: > On Fri, 29 Mar 2024 at 20:39, Maximilian Luz <luzmaximilian@gmail.com> wrote: >> >> On 3/29/24 8:26 PM, Bartosz Golaszewski wrote: >>> On Fri, 29 Mar 2024 at 20:22, Maximilian Luz <luzmaximilian@gmail.com> wrote: >>>> >>>> On 3/29/24 8:07 PM, Bartosz Golaszewski wrote: >>>>> >>>>> Both with and without SHM bridge? >>>> >>>> With CONFIG_QCOM_TZMEM_MODE_GENERIC=y (and the upcoming fix) everything >>>> works. With CONFIG_QCOM_TZMEM_MODE_SHMBRIDGE=y things unfortunately >>>> still get stuck at boot (regardless of the fix). I think that's >>>> happening even before anything efivar related should come up. >>>> >>> >>> This is on X13s? I will get one in 3 weeks. Can you get the bootlog >>> somehow? Does the laptop have any serial console? >> >> Surface Pro X (sc8180x), but it should be similar enough to the X13s in >> that regard. At least from what people with access to the X13s told me, >> the qseecom stuff seems to behave the same. >> >> Unfortunately I don't have a direct serial console. Best I have is >> USB-serial, but it's not even getting there. I'll have to try and see if >> I can get some more info on the screen. >> > > I have access to a sc8180x-primus board, does it make sense to test > with this one? If so, could you give me instructions on how to do it? I guess it's worth a shot. From what I can tell, there shouldn't be any patches in my tree that would conflict with it. So I guess it should just be building it with CONFIG_QCOM_TZMEM_MODE_SHMBRIDGE=y and booting. I am currently testing it on top of a patched v6.8 tree though (but that should just contain patches to get the Pro X running). You can find the full tree at https://github.com/linux-surface/kernel/tree/spx/v6.8 The last commit is the fix I mentioned, so you might want to revert that, since the shmem issue triggers regardless of that and it prevents your series from applying cleanly. Best regards, Max
On Fri, 29 Mar 2024 20:57:52 +0100, Maximilian Luz <luzmaximilian@gmail.com> said: > On 3/29/24 8:46 PM, Bartosz Golaszewski wrote: >> On Fri, 29 Mar 2024 at 20:39, Maximilian Luz <luzmaximilian@gmail.com> wrote: >>> >>> On 3/29/24 8:26 PM, Bartosz Golaszewski wrote: >>>> On Fri, 29 Mar 2024 at 20:22, Maximilian Luz <luzmaximilian@gmail.com> wrote: >>>>> >>>>> On 3/29/24 8:07 PM, Bartosz Golaszewski wrote: >>>>>> >>>>>> Both with and without SHM bridge? >>>>> >>>>> With CONFIG_QCOM_TZMEM_MODE_GENERIC=y (and the upcoming fix) everything >>>>> works. With CONFIG_QCOM_TZMEM_MODE_SHMBRIDGE=y things unfortunately >>>>> still get stuck at boot (regardless of the fix). I think that's >>>>> happening even before anything efivar related should come up. >>>>> >>>> >>>> This is on X13s? I will get one in 3 weeks. Can you get the bootlog >>>> somehow? Does the laptop have any serial console? >>> >>> Surface Pro X (sc8180x), but it should be similar enough to the X13s in >>> that regard. At least from what people with access to the X13s told me, >>> the qseecom stuff seems to behave the same. >>> >>> Unfortunately I don't have a direct serial console. Best I have is >>> USB-serial, but it's not even getting there. I'll have to try and see if >>> I can get some more info on the screen. >>> >> >> I have access to a sc8180x-primus board, does it make sense to test >> with this one? If so, could you give me instructions on how to do it? > > I guess it's worth a shot. > > From what I can tell, there shouldn't be any patches in my tree that > would conflict with it. So I guess it should just be building it with > CONFIG_QCOM_TZMEM_MODE_SHMBRIDGE=y and booting. > > I am currently testing it on top of a patched v6.8 tree though (but that > should just contain patches to get the Pro X running). You can find the > full tree at > > https://github.com/linux-surface/kernel/tree/spx/v6.8 > > The last commit is the fix I mentioned, so you might want to revert > that, since the shmem issue triggers regardless of that and it prevents > your series from applying cleanly. > > Best regards, > Max > sc8180x-primus' support upstream is quite flaky. The board boots 50% of time. However it's true that with SHM bridge it gets to: mount: mounting efivarfs on /sys/firmware/efi/efivars failed: Operation not supported and stops 100% of the time. Without SHM bridge I cannot boot it either because I suppose I need the patch you sent yesterday. I haven't had the time to rebase it yet, it's quite intrusive to my series. I can confirm that with that patch the board still boots but still 50% of the time. Bart
On 4/3/24 9:47 AM, Bartosz Golaszewski wrote: > On Tue, Apr 2, 2024 at 10:44 AM Bartosz Golaszewski <brgl@bgdev.pl> wrote: >> >> On Sat, Mar 30, 2024 at 8:16 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote: >>> >>> On Fri, 29 Mar 2024 20:57:52 +0100, Maximilian Luz <luzmaximilian@gmail.com> said: >>>> On 3/29/24 8:46 PM, Bartosz Golaszewski wrote: >>>>> On Fri, 29 Mar 2024 at 20:39, Maximilian Luz <luzmaximilian@gmail.com> wrote: >>>>>> >>>>>> On 3/29/24 8:26 PM, Bartosz Golaszewski wrote: >>>>>>> On Fri, 29 Mar 2024 at 20:22, Maximilian Luz <luzmaximilian@gmail.com> wrote: >>>>>>>> >>>>>>>> On 3/29/24 8:07 PM, Bartosz Golaszewski wrote: >>>>>>>>> >>>>>>>>> Both with and without SHM bridge? >>>>>>>> >>>>>>>> With CONFIG_QCOM_TZMEM_MODE_GENERIC=y (and the upcoming fix) everything >>>>>>>> works. With CONFIG_QCOM_TZMEM_MODE_SHMBRIDGE=y things unfortunately >>>>>>>> still get stuck at boot (regardless of the fix). I think that's >>>>>>>> happening even before anything efivar related should come up. >>>>>>>> >>>>>>> >>>>>>> This is on X13s? I will get one in 3 weeks. Can you get the bootlog >>>>>>> somehow? Does the laptop have any serial console? >>>>>> >>>>>> Surface Pro X (sc8180x), but it should be similar enough to the X13s in >>>>>> that regard. At least from what people with access to the X13s told me, >>>>>> the qseecom stuff seems to behave the same. >>>>>> >>>>>> Unfortunately I don't have a direct serial console. Best I have is >>>>>> USB-serial, but it's not even getting there. I'll have to try and see if >>>>>> I can get some more info on the screen. >>>>>> >>>>> >>>>> I have access to a sc8180x-primus board, does it make sense to test >>>>> with this one? If so, could you give me instructions on how to do it? >>>> >>>> I guess it's worth a shot. >>>> >>>> From what I can tell, there shouldn't be any patches in my tree that >>>> would conflict with it. So I guess it should just be building it with >>>> CONFIG_QCOM_TZMEM_MODE_SHMBRIDGE=y and booting. >>>> >>>> I am currently testing it on top of a patched v6.8 tree though (but that >>>> should just contain patches to get the Pro X running). You can find the >>>> full tree at >>>> >>>> https://github.com/linux-surface/kernel/tree/spx/v6.8 >>>> >>>> The last commit is the fix I mentioned, so you might want to revert >>>> that, since the shmem issue triggers regardless of that and it prevents >>>> your series from applying cleanly. >>>> >>>> Best regards, >>>> Max >>>> >>> >>> sc8180x-primus' support upstream is quite flaky. The board boots 50% of time. >>> However it's true that with SHM bridge it gets to: >>> >>> mount: mounting efivarfs on /sys/firmware/efi/efivars failed: Operation not supported >>> >>> and stops 100% of the time. Without SHM bridge I cannot boot it either because >>> I suppose I need the patch you sent yesterday. I haven't had the time to >>> rebase it yet, it's quite intrusive to my series. >>> >>> I can confirm that with that patch the board still boots but still 50% of the >>> time. >>> >>> Bart >> >> Hi! >> >> I was under the impression that until v8, the series worked on sc8180x >> but I'm seeing that even v7 has the same issue with SHM Bridge on >> sc8180x-primus. Could you confirm? Because I'm not sure if I should >> track the differences or the whole thing was broken for this platform >> from the beginning. >> >> Bart > > Interestingly, it doesn't seem like a problem with qseecom - even if I > disable the driver, the board still freezes after the first SCM call > using SHM bridge. I suspect - and am trying to clarify that with qcom > - that this architecture doesn't support SHM bridge but doesn't report > it either unlike other older platforms. Or maybe there's some quirk > somewhere. Anyway, I'm on it. Awesome, thanks! Best regards, Max
From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> SCM calls that take memory buffers as arguments require that they be page-aligned, physically continuous and non-cachable. The same requirements apply to the buffer used to pass additional arguments to SCM calls that take more than 4. To that end drivers typically use dma_alloc_coherent() to allocate memory of suitable format which is slow and inefficient space-wise. SHM Bridge is a safety mechanism that - once enabled - will only allow passing buffers to the TrustZone that have been explicitly marked as shared. It improves the overall system safety with SCM calls and is required by the upcoming scminvoke functionality. The end goal of this series is to enable SHM bridge support for those architectures that support it but to that end we first need to unify the way memory for SCM calls is allocated. This in itself is beneficial as the current approach of using dma_alloc_coherent() in most places is quite slow. First let's add a new TZ Memory allocator that allows users to create dynamic memory pools of format suitable for sharing with the TrustZone. Make it ready for implementing multiple build-time modes. Convert all relevant drivers to using it. Add separate pools for SCM core and for qseecom. Finally add support for SHM bridge and make it the default mode of operation with the generic allocator as fallback for the platforms that don't support SHM bridge. Tested on db410c, RB5, sm8550-qrd. Previous iteration tested also on sa8775p-ride and lenovo X13s (please do retest on those platforms if you can). v8 -> v9: - split the qseecom driver rework into two parts: first convert it to using the __free() helper and then make it switch to tzmem - use QCOM_SCM_PERM_RW instead of (QCOM_SCM_PERM_WRITE | QCOM_SCM_PERM_READ) - add the TZMEM MAINTAINERS entry in correct alphabetical order - add a missing break; in a switch case in the tzmem module v7 -> v8: - make the pool size dynamic and add different policies for pool growth - improve commit messages and the cover letter: describe what the SHM bridge is and why do we need it and the new allocator, explain why it's useful to merge these changes already, independently from scminvoke - improve kerneldoc format - improve the comment on the PIL SCM calls - fix license tags, drop "or-later" for GPL v2 - add lockdep and sleeping asserts - minor tweaks and improvements v6 -> v7: - fix a Kconfig issue: TZMEM must select GENERIC_ALLOCATOR v5 -> v6: Fixed two issues reported by autobuilders: - add a fix for memory leaks in the qseecom driver as the first patch for easier backporting to the v6.6.y branch - explicitly cast the bus address stored in a variable of type dma_addr_t to phys_addr_t expected by the genpool API v4 -> v5: - fix the return value from qcom_tzmem_init() if SHM Bridge is not supported - remove a comment that's no longer useful - collect tags v3 -> v4: - include linux/sizes.h for SZ_X macros - use dedicated RCU APIs to dereference radix tree slots - fix kerneldocs - fix the comment in patch 14/15: it's the hypervisor, not the TrustZone that creates the SHM bridge v2 -> v3: - restore pool management and use separate pools for different users - don't use the new allocator in qcom_scm_pas_init_image() as the TrustZone will create an SHM bridge for us here - rewrite the entire series again for most part v1 -> v2: - too many changes to list, it's a complete rewrite as explained above Bartosz Golaszewski (13): firmware: qcom: add a dedicated TrustZone buffer allocator firmware: qcom: scm: enable the TZ mem allocator firmware: qcom: scm: smc: switch to using the SCM allocator firmware: qcom: scm: make qcom_scm_assign_mem() use the TZ allocator firmware: qcom: scm: make qcom_scm_ice_set_key() use the TZ allocator firmware: qcom: scm: make qcom_scm_lmh_dcvsh() use the TZ allocator firmware: qcom: scm: make qcom_scm_qseecom_app_get_id() use the TZ allocator firmware: qcom: qseecom: convert to using the cleanup helpers firmware: qcom: qseecom: convert to using the TZ allocator firmware: qcom: scm: add support for SHM bridge operations firmware: qcom: tzmem: enable SHM Bridge support firmware: qcom: scm: clarify the comment in qcom_scm_pas_init_image() arm64: defconfig: enable SHM Bridge support for the TZ memory allocator MAINTAINERS | 8 + arch/arm64/configs/defconfig | 1 + drivers/firmware/qcom/Kconfig | 31 ++ drivers/firmware/qcom/Makefile | 1 + .../firmware/qcom/qcom_qseecom_uefisecapp.c | 285 ++++------- drivers/firmware/qcom/qcom_scm-smc.c | 30 +- drivers/firmware/qcom/qcom_scm.c | 182 ++++--- drivers/firmware/qcom/qcom_scm.h | 6 + drivers/firmware/qcom/qcom_tzmem.c | 455 ++++++++++++++++++ drivers/firmware/qcom/qcom_tzmem.h | 13 + include/linux/firmware/qcom/qcom_qseecom.h | 4 +- include/linux/firmware/qcom/qcom_scm.h | 6 + include/linux/firmware/qcom/qcom_tzmem.h | 56 +++ 13 files changed, 813 insertions(+), 265 deletions(-) create mode 100644 drivers/firmware/qcom/qcom_tzmem.c create mode 100644 drivers/firmware/qcom/qcom_tzmem.h create mode 100644 include/linux/firmware/qcom/qcom_tzmem.h