[v9,00/13] firmware: qcom: qseecom: convert to using the TZ allocator

Message ID	20240325100359.17001-1-brgl@bgdev.pl
Headers	show Received: from mail-wr1-f42.google.com (mail-wr1-f42.google.com [209.85.221.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3D826181812 for <linux-arm-msm@vger.kernel.org>; Mon, 25 Mar 2024 10:04:06 +0000 (UTC) From: Bartosz Golaszewski <brgl@bgdev.pl> To: Andy Gross <agross@kernel.org>, Bjorn Andersson <andersson@kernel.org>, Konrad Dybcio <konrad.dybcio@linaro.org>, Elliot Berman <quic_eberman@quicinc.com>, Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>, Guru Das Srinagesh <quic_gurus@quicinc.com>, Andrew Halaney <ahalaney@redhat.com>, Maximilian Luz <luzmaximilian@gmail.com>, Alex Elder <elder@linaro.org>, Srini Kandagatla <srinivas.kandagatla@linaro.org>, Arnd Bergmann <arnd@arndb.de> Cc: linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kernel@quicinc.com, Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Subject: [PATCH v9 00/13] firmware: qcom: qseecom: convert to using the TZ allocator Date: Mon, 25 Mar 2024 11:03:46 +0100 Message-Id: <20240325100359.17001-1-brgl@bgdev.pl> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	firmware: qcom: qseecom: convert to using the TZ allocator \| expand [v9,00/13] firmware: qcom: qseecom: convert to using the TZ allocator [v9,02/13] firmware: qcom: scm: enable the TZ mem allocator [v9,04/13] firmware: qcom: scm: make qcom_scm_assign_mem() use the TZ allocator [v9,06/13] firmware: qcom: scm: make qcom_scm_lmh_dcvsh() use the TZ allocator [v9,08/13] firmware: qcom: qseecom: convert to using the cleanup helpers [v9,12/13] firmware: qcom: scm: clarify the comment in qcom_scm_pas_init_image() [v9,13/13] arm64: defconfig: enable SHM Bridge support for the TZ memory allocator

Bartosz Golaszewski March 25, 2024, 10:03 a.m. UTC

From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

SCM calls that take memory buffers as arguments require that they be
page-aligned, physically continuous and non-cachable. The same
requirements apply to the buffer used to pass additional arguments to SCM
calls that take more than 4.

To that end drivers typically use dma_alloc_coherent() to allocate memory
of suitable format which is slow and inefficient space-wise.

SHM Bridge is a safety mechanism that - once enabled - will only allow
passing buffers to the TrustZone that have been explicitly marked as
shared. It improves the overall system safety with SCM calls and is
required by the upcoming scminvoke functionality.

The end goal of this series is to enable SHM bridge support for those
architectures that support it but to that end we first need to unify the
way memory for SCM calls is allocated. This in itself is beneficial as
the current approach of using dma_alloc_coherent() in most places is quite
slow.

First let's add a new TZ Memory allocator that allows users to create
dynamic memory pools of format suitable for sharing with the TrustZone.
Make it ready for implementing multiple build-time modes.

Convert all relevant drivers to using it. Add separate pools for SCM core
and for qseecom.

Finally add support for SHM bridge and make it the default mode of
operation with the generic allocator as fallback for the platforms that
don't support SHM bridge.

Tested on db410c, RB5, sm8550-qrd. Previous iteration tested also on
sa8775p-ride and lenovo X13s (please do retest on those platforms if you
can).

v8 -> v9:
- split the qseecom driver rework into two parts: first convert it to using
  the __free() helper and then make it switch to tzmem
- use QCOM_SCM_PERM_RW instead of (QCOM_SCM_PERM_WRITE | QCOM_SCM_PERM_READ)
- add the TZMEM MAINTAINERS entry in correct alphabetical order
- add a missing break; in a switch case in the tzmem module

v7 -> v8:
- make the pool size dynamic and add different policies for pool growth
- improve commit messages and the cover letter: describe what the SHM
  bridge is and why do we need it and the new allocator, explain why it's
  useful to merge these changes already, independently from scminvoke
- improve kerneldoc format
- improve the comment on the PIL SCM calls
- fix license tags, drop "or-later" for GPL v2
- add lockdep and sleeping asserts
- minor tweaks and improvements

v6 -> v7:
- fix a Kconfig issue: TZMEM must select GENERIC_ALLOCATOR

v5 -> v6:
Fixed two issues reported by autobuilders:
- add a fix for memory leaks in the qseecom driver as the first patch for
  easier backporting to the v6.6.y branch
- explicitly cast the bus address stored in a variable of type dma_addr_t
  to phys_addr_t expected by the genpool API

v4 -> v5:
- fix the return value from qcom_tzmem_init() if SHM Bridge is not supported
- remove a comment that's no longer useful
- collect tags

v3 -> v4:
- include linux/sizes.h for SZ_X macros
- use dedicated RCU APIs to dereference radix tree slots
- fix kerneldocs
- fix the comment in patch 14/15: it's the hypervisor, not the TrustZone
  that creates the SHM bridge

v2 -> v3:
- restore pool management and use separate pools for different users
- don't use the new allocator in qcom_scm_pas_init_image() as the
  TrustZone will create an SHM bridge for us here
- rewrite the entire series again for most part

v1 -> v2:
- too many changes to list, it's a complete rewrite as explained above

Bartosz Golaszewski (13):
  firmware: qcom: add a dedicated TrustZone buffer allocator
  firmware: qcom: scm: enable the TZ mem allocator
  firmware: qcom: scm: smc: switch to using the SCM allocator
  firmware: qcom: scm: make qcom_scm_assign_mem() use the TZ allocator
  firmware: qcom: scm: make qcom_scm_ice_set_key() use the TZ allocator
  firmware: qcom: scm: make qcom_scm_lmh_dcvsh() use the TZ allocator
  firmware: qcom: scm: make qcom_scm_qseecom_app_get_id() use the TZ
    allocator
  firmware: qcom: qseecom: convert to using the cleanup helpers
  firmware: qcom: qseecom: convert to using the TZ allocator
  firmware: qcom: scm: add support for SHM bridge operations
  firmware: qcom: tzmem: enable SHM Bridge support
  firmware: qcom: scm: clarify the comment in qcom_scm_pas_init_image()
  arm64: defconfig: enable SHM Bridge support for the TZ memory
    allocator

 MAINTAINERS                                   |   8 +
 arch/arm64/configs/defconfig                  |   1 +
 drivers/firmware/qcom/Kconfig                 |  31 ++
 drivers/firmware/qcom/Makefile                |   1 +
 .../firmware/qcom/qcom_qseecom_uefisecapp.c   | 285 ++++-------
 drivers/firmware/qcom/qcom_scm-smc.c          |  30 +-
 drivers/firmware/qcom/qcom_scm.c              | 182 ++++---
 drivers/firmware/qcom/qcom_scm.h              |   6 +
 drivers/firmware/qcom/qcom_tzmem.c            | 455 ++++++++++++++++++
 drivers/firmware/qcom/qcom_tzmem.h            |  13 +
 include/linux/firmware/qcom/qcom_qseecom.h    |   4 +-
 include/linux/firmware/qcom/qcom_scm.h        |   6 +
 include/linux/firmware/qcom/qcom_tzmem.h      |  56 +++
 13 files changed, 813 insertions(+), 265 deletions(-)
 create mode 100644 drivers/firmware/qcom/qcom_tzmem.c
 create mode 100644 drivers/firmware/qcom/qcom_tzmem.h
 create mode 100644 include/linux/firmware/qcom/qcom_tzmem.h

Bartosz Golaszewski March 28, 2024, 6:55 p.m. UTC | #1

On Thu, Mar 28, 2024 at 5:50 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
>
> On 3/25/24 11:03 AM, Bartosz Golaszewski wrote:
> > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> >
> > SCM calls that take memory buffers as arguments require that they be
> > page-aligned, physically continuous and non-cachable. The same
> > requirements apply to the buffer used to pass additional arguments to SCM
> > calls that take more than 4.
> >
> > To that end drivers typically use dma_alloc_coherent() to allocate memory
> > of suitable format which is slow and inefficient space-wise.
> >
> > SHM Bridge is a safety mechanism that - once enabled - will only allow
> > passing buffers to the TrustZone that have been explicitly marked as
> > shared. It improves the overall system safety with SCM calls and is
> > required by the upcoming scminvoke functionality.
> >
> > The end goal of this series is to enable SHM bridge support for those
> > architectures that support it but to that end we first need to unify the
> > way memory for SCM calls is allocated. This in itself is beneficial as
> > the current approach of using dma_alloc_coherent() in most places is quite
> > slow.
> >
> > First let's add a new TZ Memory allocator that allows users to create
> > dynamic memory pools of format suitable for sharing with the TrustZone.
> > Make it ready for implementing multiple build-time modes.
> >
> > Convert all relevant drivers to using it. Add separate pools for SCM core
> > and for qseecom.
> >
> > Finally add support for SHM bridge and make it the default mode of
> > operation with the generic allocator as fallback for the platforms that
> > don't support SHM bridge.
> >
> > Tested on db410c, RB5, sm8550-qrd. Previous iteration tested also on
> > sa8775p-ride and lenovo X13s (please do retest on those platforms if you
> > can).
>
> Not sure in which version things changed (I haven't really kept up with
> that, sorry), but this version (with the generic allocator selected in
> the config) fails reading EFI vars on my Surface Pro X (sc8180x):
>
> [    2.381020] BUG: scheduling while atomic: mount/258/0x00000002
> [    2.383356] Modules linked in:
> [    2.385669] Preemption disabled at:
> [    2.385672] [<ffff800080f7868c>] qcom_tzmem_alloc+0x7c/0x224
> [    2.390325] CPU: 1 PID: 258 Comm: mount Not tainted 6.8.0-1-surface-dev #2
> [    2.392632] Hardware name: Microsoft Corporation Surface Pro X/Surface Pro X, BIOS 7.620.140 08/11/2023
> [    2.394955] Call trace:
> [    2.397269]  dump_backtrace+0x94/0x114
> [    2.399583]  show_stack+0x18/0x24
> [    2.401883]  dump_stack_lvl+0x48/0x60
> [    2.404181]  dump_stack+0x18/0x24
> [    2.406476]  __schedule_bug+0x84/0xa0
> [    2.408770]  __schedule+0x6f4/0x7fc
> [    2.411051]  schedule+0x30/0xf0
> [    2.413323]  synchronize_rcu_expedited+0x158/0x1ec
> [    2.415594]  lru_cache_disable+0x28/0x74
> [    2.417853]  __alloc_contig_migrate_range+0x68/0x210
> [    2.420122]  alloc_contig_range+0x140/0x280
> [    2.422384]  cma_alloc+0x128/0x404
> [    2.424643]  cma_alloc_aligned+0x44/0x6c
> [    2.426881]  dma_alloc_contiguous+0x30/0x44
> [    2.429111]  __dma_direct_alloc_pages.isra.0+0x60/0x20c
> [    2.431343]  dma_direct_alloc+0x6c/0x2ec
> [    2.433569]  dma_alloc_attrs+0x80/0xf4
> [    2.435786]  qcom_tzmem_pool_add_memory+0x8c/0x178
> [    2.438008]  qcom_tzmem_alloc+0xe8/0x224
> [    2.440232]  qsee_uefi_get_next_variable+0x78/0x2cc
> [    2.442443]  qcuefi_get_next_variable+0x50/0x94
> [    2.444643]  efivar_get_next_variable+0x20/0x2c
> [    2.446832]  efivar_init+0x8c/0x29c
> [    2.449009]  efivarfs_fill_super+0xd4/0xec
> [    2.451182]  get_tree_single+0x74/0xbc
> [    2.453349]  efivarfs_get_tree+0x18/0x24
> [    2.455513]  vfs_get_tree+0x28/0xe8
> [    2.457680]  vfs_cmd_create+0x5c/0xf4
> [    2.459840]  __do_sys_fsconfig+0x458/0x598
> [    2.461993]  __arm64_sys_fsconfig+0x24/0x30
> [    2.464143]  invoke_syscall+0x48/0x110
> [    2.466281]  el0_svc_common.constprop.0+0x40/0xe0
> [    2.468415]  do_el0_svc+0x1c/0x28
> [    2.470546]  el0_svc+0x34/0xb4
> [    2.472669]  el0t_64_sync_handler+0x120/0x12c
> [    2.474793]  el0t_64_sync+0x190/0x194
>
> and subsequently
>
> [    2.477613] DEBUG_LOCKS_WARN_ON(val > preempt_count())
> [    2.477618] WARNING: CPU: 4 PID: 258 at kernel/sched/core.c:5889 preempt_count_sub+0x90/0xd4
> [    2.478682] Modules linked in:
> [    2.479214] CPU: 4 PID: 258 Comm: mount Tainted: G        W          6.8.0-1-surface-dev #2
> [    2.479752] Hardware name: Microsoft Corporation Surface Pro X/Surface Pro X, BIOS 7.620.140 08/11/2023
> [    2.480296] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [    2.480839] pc : preempt_count_sub+0x90/0xd4
> [    2.481380] lr : preempt_count_sub+0x90/0xd4
> [    2.481917] sp : ffff8000857cbb00
> [    2.482450] x29: ffff8000857cbb00 x28: ffff8000806b759c x27: 8000000000000005
> [    2.482988] x26: 0000000000000000 x25: ffff0000802cbaa0 x24: ffff0000809228b0
> [    2.483525] x23: 0000000000000000 x22: ffff800082b462f0 x21: 0000000000001000
> [    2.484062] x20: ffff80008363d000 x19: ffff000080922880 x18: fffffffffffc9660
> [    2.484600] x17: 0000000000000020 x16: 0000000000000000 x15: 0000000000000038
> [    2.485137] x14: 0000000000000000 x13: ffff800082649258 x12: 00000000000006db
> [    2.485674] x11: 0000000000000249 x10: ffff8000826fc930 x9 : ffff800082649258
> [    2.486207] x8 : 00000000ffffdfff x7 : ffff8000826f9258 x6 : 0000000000000249
> [    2.486738] x5 : 0000000000000000 x4 : 40000000ffffe249 x3 : 0000000000000000
> [    2.487267] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000841fa300
> [    2.487792] Call trace:
> [    2.488311]  preempt_count_sub+0x90/0xd4
> [    2.488831]  _raw_spin_unlock_irqrestore+0x1c/0x44
> [    2.489352]  qcom_tzmem_alloc+0x1cc/0x224
> [    2.489873]  qsee_uefi_get_next_variable+0x78/0x2cc
> [    2.490390]  qcuefi_get_next_variable+0x50/0x94
> [    2.490907]  efivar_get_next_variable+0x20/0x2c
> [    2.491420]  efivar_init+0x8c/0x29c
> [    2.491931]  efivarfs_fill_super+0xd4/0xec
> [    2.492440]  get_tree_single+0x74/0xbc
> [    2.492948]  efivarfs_get_tree+0x18/0x24
> [    2.493453]  vfs_get_tree+0x28/0xe8
> [    2.493957]  vfs_cmd_create+0x5c/0xf4
> [    2.494459]  __do_sys_fsconfig+0x458/0x598
> [    2.494963]  __arm64_sys_fsconfig+0x24/0x30
> [    2.495468]  invoke_syscall+0x48/0x110
> [    2.495972]  el0_svc_common.constprop.0+0x40/0xe0
> [    2.496475]  do_el0_svc+0x1c/0x28
> [    2.496976]  el0_svc+0x34/0xb4
> [    2.497475]  el0t_64_sync_handler+0x120/0x12c
> [    2.497975]  el0t_64_sync+0x190/0x194
> [    2.498466] ---[ end trace 0000000000000000 ]---
> [    2.507347] qcom_scm firmware:scm: qseecom: scm call failed with error -22
> [    2.507813] efivars: get_next_variable: status=8000000000000007
>
> If I understand correctly, it enters an atomic section in
> qcom_tzmem_alloc() and then tries to schedule somewhere down the line.
> So this shouldn't be qseecom specific.
>
> I should probably also say that I'm currently testing this on a patched
> v6.8 kernel, so there's a chance that it's my fault. However, as far as
> I understand, it enters an atomic section in qcom_tzmem_alloc() and then
> later tries to expand the pool memory with dma_alloc_coherent(). Which
> AFAIK is allowed to sleep with GFP_KERNEL (and I guess that that's the
> issue here).
>
> I've also tried the shmem allocator option, but that seems to get stuck
> quite early at boot, before I even have usb-serial access to get any
> logs. If I can find some more time, I'll try to see if I can get some
> useful output for that.
>

Ah, I think it happens here:

+       guard(spinlock_irqsave)(&pool->lock);
+
+again:
+       vaddr = gen_pool_alloc(pool->genpool, size);
+       if (!vaddr) {
+               if (qcom_tzmem_try_grow_pool(pool, size, gfp))
+                       goto again;

We were called with GFP_KERNEL so this is what we pass on to
qcom_tzmem_try_grow_pool() but we're now holding the spinlock. I need
to revisit it. Thanks for the catch!

Bart

Bartosz Golaszewski March 29, 2024, 10:22 a.m. UTC | #2

On Thu, Mar 28, 2024 at 7:55 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
>
> On Thu, Mar 28, 2024 at 5:50 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
> >
> > If I understand correctly, it enters an atomic section in
> > qcom_tzmem_alloc() and then tries to schedule somewhere down the line.
> > So this shouldn't be qseecom specific.
> >
> > I should probably also say that I'm currently testing this on a patched
> > v6.8 kernel, so there's a chance that it's my fault. However, as far as
> > I understand, it enters an atomic section in qcom_tzmem_alloc() and then
> > later tries to expand the pool memory with dma_alloc_coherent(). Which
> > AFAIK is allowed to sleep with GFP_KERNEL (and I guess that that's the
> > issue here).
> >
> > I've also tried the shmem allocator option, but that seems to get stuck
> > quite early at boot, before I even have usb-serial access to get any
> > logs. If I can find some more time, I'll try to see if I can get some
> > useful output for that.
> >
>
> Ah, I think it happens here:
>
> +       guard(spinlock_irqsave)(&pool->lock);
> +
> +again:
> +       vaddr = gen_pool_alloc(pool->genpool, size);
> +       if (!vaddr) {
> +               if (qcom_tzmem_try_grow_pool(pool, size, gfp))
> +                       goto again;
>
> We were called with GFP_KERNEL so this is what we pass on to
> qcom_tzmem_try_grow_pool() but we're now holding the spinlock. I need
> to revisit it. Thanks for the catch!
>
> Bart

Can you try the following tree?

    https://git.codelinaro.org/bartosz_golaszewski/linux.git
topic/shm-bridge-v10

gen_pool_alloc() and gen_pool_add_virt() can be used without external
serialization. We only really need to protect the list of areas in the
pool when adding a new element. We could possibly even use
list_add_tail_rcu() as it updates the pointers atomically and go
lockless.

Bart

Maximilian Luz March 29, 2024, 6:53 p.m. UTC | #3

On 3/29/24 11:22 AM, Bartosz Golaszewski wrote:
> On Thu, Mar 28, 2024 at 7:55 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
>>
>> On Thu, Mar 28, 2024 at 5:50 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
>>>
>>> If I understand correctly, it enters an atomic section in
>>> qcom_tzmem_alloc() and then tries to schedule somewhere down the line.
>>> So this shouldn't be qseecom specific.
>>>
>>> I should probably also say that I'm currently testing this on a patched
>>> v6.8 kernel, so there's a chance that it's my fault. However, as far as
>>> I understand, it enters an atomic section in qcom_tzmem_alloc() and then
>>> later tries to expand the pool memory with dma_alloc_coherent(). Which
>>> AFAIK is allowed to sleep with GFP_KERNEL (and I guess that that's the
>>> issue here).
>>>
>>> I've also tried the shmem allocator option, but that seems to get stuck
>>> quite early at boot, before I even have usb-serial access to get any
>>> logs. If I can find some more time, I'll try to see if I can get some
>>> useful output for that.
>>>
>>
>> Ah, I think it happens here:
>>
>> +       guard(spinlock_irqsave)(&pool->lock);
>> +
>> +again:
>> +       vaddr = gen_pool_alloc(pool->genpool, size);
>> +       if (!vaddr) {
>> +               if (qcom_tzmem_try_grow_pool(pool, size, gfp))
>> +                       goto again;
>>
>> We were called with GFP_KERNEL so this is what we pass on to
>> qcom_tzmem_try_grow_pool() but we're now holding the spinlock. I need
>> to revisit it. Thanks for the catch!
>>
>> Bart
> 
> Can you try the following tree?
> 
>      https://git.codelinaro.org/bartosz_golaszewski/linux.git
> topic/shm-bridge-v10
> 
> gen_pool_alloc() and gen_pool_add_virt() can be used without external
> serialization. We only really need to protect the list of areas in the
> pool when adding a new element. We could possibly even use
> list_add_tail_rcu() as it updates the pointers atomically and go
> lockless.

Thanks! That fixes the allocations for CONFIG_QCOM_TZMEM_MODE_GENERIC=y.
Unfortunately, with the shmbridge mode it still gets stuck at boot (and
I haven't had the time to look into it yet).

And for more bad news: It looks like the new allocator now fully exposes
a bug that I've been tracking down the last couple of days. In short,
uefisecapp doesn't seem to be happy when we split the allocations for
request and response into two, causing commands to fail. Instead it
wants a single buffer for both. Before, it seemed to be fairly sporadic
(likely because kzalloc in sequence just returned consecutive memory
almost all of the time) but now it's basically every call that fails.

I have a fix for that almost ready and I'll likely post it in the next
hour. But that means that you'll probably have to rebase this series
on top of it...

Best regards,
Max

Maximilian Luz March 29, 2024, 6:56 p.m. UTC | #4

On 3/29/24 7:53 PM, Maximilian Luz wrote:
> On 3/29/24 11:22 AM, Bartosz Golaszewski wrote:
>> On Thu, Mar 28, 2024 at 7:55 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
>>>
>>> On Thu, Mar 28, 2024 at 5:50 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
>>>>
>>>> If I understand correctly, it enters an atomic section in
>>>> qcom_tzmem_alloc() and then tries to schedule somewhere down the line.
>>>> So this shouldn't be qseecom specific.
>>>>
>>>> I should probably also say that I'm currently testing this on a patched
>>>> v6.8 kernel, so there's a chance that it's my fault. However, as far as
>>>> I understand, it enters an atomic section in qcom_tzmem_alloc() and then
>>>> later tries to expand the pool memory with dma_alloc_coherent(). Which
>>>> AFAIK is allowed to sleep with GFP_KERNEL (and I guess that that's the
>>>> issue here).
>>>>
>>>> I've also tried the shmem allocator option, but that seems to get stuck
>>>> quite early at boot, before I even have usb-serial access to get any
>>>> logs. If I can find some more time, I'll try to see if I can get some
>>>> useful output for that.
>>>>
>>>
>>> Ah, I think it happens here:
>>>
>>> +       guard(spinlock_irqsave)(&pool->lock);
>>> +
>>> +again:
>>> +       vaddr = gen_pool_alloc(pool->genpool, size);
>>> +       if (!vaddr) {
>>> +               if (qcom_tzmem_try_grow_pool(pool, size, gfp))
>>> +                       goto again;
>>>
>>> We were called with GFP_KERNEL so this is what we pass on to
>>> qcom_tzmem_try_grow_pool() but we're now holding the spinlock. I need
>>> to revisit it. Thanks for the catch!
>>>
>>> Bart
>>
>> Can you try the following tree?
>>
>>      https://git.codelinaro.org/bartosz_golaszewski/linux.git
>> topic/shm-bridge-v10
>>
>> gen_pool_alloc() and gen_pool_add_virt() can be used without external
>> serialization. We only really need to protect the list of areas in the
>> pool when adding a new element. We could possibly even use
>> list_add_tail_rcu() as it updates the pointers atomically and go
>> lockless.
> 
> Thanks! That fixes the allocations for CONFIG_QCOM_TZMEM_MODE_GENERIC=y.
> Unfortunately, with the shmbridge mode it still gets stuck at boot (and
> I haven't had the time to look into it yet).
> 
> And for more bad news: It looks like the new allocator now fully exposes
> a bug that I've been tracking down the last couple of days. In short,
> uefisecapp doesn't seem to be happy when we split the allocations for
> request and response into two, causing commands to fail. Instead it
> wants a single buffer for both. Before, it seemed to be fairly sporadic
> (likely because kzalloc in sequence just returned consecutive memory
> almost all of the time) but now it's basically every call that fails.
> 
> I have a fix for that almost ready and I'll likely post it in the next
> hour. But that means that you'll probably have to rebase this series
> on top of it...

Forgot to mention: I tested it with the fix and this series, and that
works.

> Best regards,
> Max
>

Bartosz Golaszewski March 29, 2024, 7:07 p.m. UTC | #5

On Fri, Mar 29, 2024 at 7:56 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
>
>
>
> On 3/29/24 7:53 PM, Maximilian Luz wrote:
> > On 3/29/24 11:22 AM, Bartosz Golaszewski wrote:
> >> On Thu, Mar 28, 2024 at 7:55 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
> >>>
> >>> On Thu, Mar 28, 2024 at 5:50 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
> >>>>
> >>>> If I understand correctly, it enters an atomic section in
> >>>> qcom_tzmem_alloc() and then tries to schedule somewhere down the line.
> >>>> So this shouldn't be qseecom specific.
> >>>>
> >>>> I should probably also say that I'm currently testing this on a patched
> >>>> v6.8 kernel, so there's a chance that it's my fault. However, as far as
> >>>> I understand, it enters an atomic section in qcom_tzmem_alloc() and then
> >>>> later tries to expand the pool memory with dma_alloc_coherent(). Which
> >>>> AFAIK is allowed to sleep with GFP_KERNEL (and I guess that that's the
> >>>> issue here).
> >>>>
> >>>> I've also tried the shmem allocator option, but that seems to get stuck
> >>>> quite early at boot, before I even have usb-serial access to get any
> >>>> logs. If I can find some more time, I'll try to see if I can get some
> >>>> useful output for that.
> >>>>
> >>>
> >>> Ah, I think it happens here:
> >>>
> >>> +       guard(spinlock_irqsave)(&pool->lock);
> >>> +
> >>> +again:
> >>> +       vaddr = gen_pool_alloc(pool->genpool, size);
> >>> +       if (!vaddr) {
> >>> +               if (qcom_tzmem_try_grow_pool(pool, size, gfp))
> >>> +                       goto again;
> >>>
> >>> We were called with GFP_KERNEL so this is what we pass on to
> >>> qcom_tzmem_try_grow_pool() but we're now holding the spinlock. I need
> >>> to revisit it. Thanks for the catch!
> >>>
> >>> Bart
> >>
> >> Can you try the following tree?
> >>
> >>      https://git.codelinaro.org/bartosz_golaszewski/linux.git
> >> topic/shm-bridge-v10
> >>
> >> gen_pool_alloc() and gen_pool_add_virt() can be used without external
> >> serialization. We only really need to protect the list of areas in the
> >> pool when adding a new element. We could possibly even use
> >> list_add_tail_rcu() as it updates the pointers atomically and go
> >> lockless.
> >
> > Thanks! That fixes the allocations for CONFIG_QCOM_TZMEM_MODE_GENERIC=y.
> > Unfortunately, with the shmbridge mode it still gets stuck at boot (and
> > I haven't had the time to look into it yet).
> >
> > And for more bad news: It looks like the new allocator now fully exposes
> > a bug that I've been tracking down the last couple of days. In short,
> > uefisecapp doesn't seem to be happy when we split the allocations for
> > request and response into two, causing commands to fail. Instead it
> > wants a single buffer for both. Before, it seemed to be fairly sporadic
> > (likely because kzalloc in sequence just returned consecutive memory
> > almost all of the time) but now it's basically every call that fails.
> >
> > I have a fix for that almost ready and I'll likely post it in the next
> > hour. But that means that you'll probably have to rebase this series
> > on top of it...
>
> Forgot to mention: I tested it with the fix and this series, and that
> works.
>

Both with and without SHM bridge?

If so, please Cc me on the fix.

Bart

Maximilian Luz March 29, 2024, 7:22 p.m. UTC | #6

On 3/29/24 8:07 PM, Bartosz Golaszewski wrote:
> On Fri, Mar 29, 2024 at 7:56 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
>>
>>
>>
>> On 3/29/24 7:53 PM, Maximilian Luz wrote:
>>> On 3/29/24 11:22 AM, Bartosz Golaszewski wrote:
>>>> On Thu, Mar 28, 2024 at 7:55 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
>>>>>
>>>>> On Thu, Mar 28, 2024 at 5:50 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
>>>>>>
>>>>>> If I understand correctly, it enters an atomic section in
>>>>>> qcom_tzmem_alloc() and then tries to schedule somewhere down the line.
>>>>>> So this shouldn't be qseecom specific.
>>>>>>
>>>>>> I should probably also say that I'm currently testing this on a patched
>>>>>> v6.8 kernel, so there's a chance that it's my fault. However, as far as
>>>>>> I understand, it enters an atomic section in qcom_tzmem_alloc() and then
>>>>>> later tries to expand the pool memory with dma_alloc_coherent(). Which
>>>>>> AFAIK is allowed to sleep with GFP_KERNEL (and I guess that that's the
>>>>>> issue here).
>>>>>>
>>>>>> I've also tried the shmem allocator option, but that seems to get stuck
>>>>>> quite early at boot, before I even have usb-serial access to get any
>>>>>> logs. If I can find some more time, I'll try to see if I can get some
>>>>>> useful output for that.
>>>>>>
>>>>>
>>>>> Ah, I think it happens here:
>>>>>
>>>>> +       guard(spinlock_irqsave)(&pool->lock);
>>>>> +
>>>>> +again:
>>>>> +       vaddr = gen_pool_alloc(pool->genpool, size);
>>>>> +       if (!vaddr) {
>>>>> +               if (qcom_tzmem_try_grow_pool(pool, size, gfp))
>>>>> +                       goto again;
>>>>>
>>>>> We were called with GFP_KERNEL so this is what we pass on to
>>>>> qcom_tzmem_try_grow_pool() but we're now holding the spinlock. I need
>>>>> to revisit it. Thanks for the catch!
>>>>>
>>>>> Bart
>>>>
>>>> Can you try the following tree?
>>>>
>>>>       https://git.codelinaro.org/bartosz_golaszewski/linux.git
>>>> topic/shm-bridge-v10
>>>>
>>>> gen_pool_alloc() and gen_pool_add_virt() can be used without external
>>>> serialization. We only really need to protect the list of areas in the
>>>> pool when adding a new element. We could possibly even use
>>>> list_add_tail_rcu() as it updates the pointers atomically and go
>>>> lockless.
>>>
>>> Thanks! That fixes the allocations for CONFIG_QCOM_TZMEM_MODE_GENERIC=y.
>>> Unfortunately, with the shmbridge mode it still gets stuck at boot (and
>>> I haven't had the time to look into it yet).
>>>
>>> And for more bad news: It looks like the new allocator now fully exposes
>>> a bug that I've been tracking down the last couple of days. In short,
>>> uefisecapp doesn't seem to be happy when we split the allocations for
>>> request and response into two, causing commands to fail. Instead it
>>> wants a single buffer for both. Before, it seemed to be fairly sporadic
>>> (likely because kzalloc in sequence just returned consecutive memory
>>> almost all of the time) but now it's basically every call that fails.
>>>
>>> I have a fix for that almost ready and I'll likely post it in the next
>>> hour. But that means that you'll probably have to rebase this series
>>> on top of it...
>>
>> Forgot to mention: I tested it with the fix and this series, and that
>> works.
>>
> 
> Both with and without SHM bridge?

With CONFIG_QCOM_TZMEM_MODE_GENERIC=y (and the upcoming fix) everything
works. With CONFIG_QCOM_TZMEM_MODE_SHMBRIDGE=y things unfortunately
still get stuck at boot (regardless of the fix). I think that's
happening even before anything efivar related should come up.

> If so, please Cc me on the fix.

Sure, will do.

Best regards,
Max

Bartosz Golaszewski March 29, 2024, 7:26 p.m. UTC | #7

On Fri, 29 Mar 2024 at 20:22, Maximilian Luz <luzmaximilian@gmail.com> wrote:
>
> On 3/29/24 8:07 PM, Bartosz Golaszewski wrote:
> >
> > Both with and without SHM bridge?
>
> With CONFIG_QCOM_TZMEM_MODE_GENERIC=y (and the upcoming fix) everything
> works. With CONFIG_QCOM_TZMEM_MODE_SHMBRIDGE=y things unfortunately
> still get stuck at boot (regardless of the fix). I think that's
> happening even before anything efivar related should come up.
>

This is on X13s? I will get one in 3 weeks. Can you get the bootlog
somehow? Does the laptop have any serial console?

Bart

Maximilian Luz March 29, 2024, 7:38 p.m. UTC | #8

On 3/29/24 8:26 PM, Bartosz Golaszewski wrote:
> On Fri, 29 Mar 2024 at 20:22, Maximilian Luz <luzmaximilian@gmail.com> wrote:
>>
>> On 3/29/24 8:07 PM, Bartosz Golaszewski wrote:
>>>
>>> Both with and without SHM bridge?
>>
>> With CONFIG_QCOM_TZMEM_MODE_GENERIC=y (and the upcoming fix) everything
>> works. With CONFIG_QCOM_TZMEM_MODE_SHMBRIDGE=y things unfortunately
>> still get stuck at boot (regardless of the fix). I think that's
>> happening even before anything efivar related should come up.
>>
> 
> This is on X13s? I will get one in 3 weeks. Can you get the bootlog
> somehow? Does the laptop have any serial console?

Surface Pro X (sc8180x), but it should be similar enough to the X13s in
that regard. At least from what people with access to the X13s told me,
the qseecom stuff seems to behave the same.

Unfortunately I don't have a direct serial console. Best I have is
USB-serial, but it's not even getting there. I'll have to try and see if
I can get some more info on the screen.

Best regards,
Max

Bartosz Golaszewski March 29, 2024, 7:46 p.m. UTC | #9

On Fri, 29 Mar 2024 at 20:39, Maximilian Luz <luzmaximilian@gmail.com> wrote:
>
> On 3/29/24 8:26 PM, Bartosz Golaszewski wrote:
> > On Fri, 29 Mar 2024 at 20:22, Maximilian Luz <luzmaximilian@gmail.com> wrote:
> >>
> >> On 3/29/24 8:07 PM, Bartosz Golaszewski wrote:
> >>>
> >>> Both with and without SHM bridge?
> >>
> >> With CONFIG_QCOM_TZMEM_MODE_GENERIC=y (and the upcoming fix) everything
> >> works. With CONFIG_QCOM_TZMEM_MODE_SHMBRIDGE=y things unfortunately
> >> still get stuck at boot (regardless of the fix). I think that's
> >> happening even before anything efivar related should come up.
> >>
> >
> > This is on X13s? I will get one in 3 weeks. Can you get the bootlog
> > somehow? Does the laptop have any serial console?
>
> Surface Pro X (sc8180x), but it should be similar enough to the X13s in
> that regard. At least from what people with access to the X13s told me,
> the qseecom stuff seems to behave the same.
>
> Unfortunately I don't have a direct serial console. Best I have is
> USB-serial, but it's not even getting there. I'll have to try and see if
> I can get some more info on the screen.
>

I have access to a sc8180x-primus board, does it make sense to test
with this one? If so, could you give me instructions on how to do it?

Bart

Maximilian Luz March 29, 2024, 7:57 p.m. UTC | #10

On 3/29/24 8:46 PM, Bartosz Golaszewski wrote:
> On Fri, 29 Mar 2024 at 20:39, Maximilian Luz <luzmaximilian@gmail.com> wrote:
>>
>> On 3/29/24 8:26 PM, Bartosz Golaszewski wrote:
>>> On Fri, 29 Mar 2024 at 20:22, Maximilian Luz <luzmaximilian@gmail.com> wrote:
>>>>
>>>> On 3/29/24 8:07 PM, Bartosz Golaszewski wrote:
>>>>>
>>>>> Both with and without SHM bridge?
>>>>
>>>> With CONFIG_QCOM_TZMEM_MODE_GENERIC=y (and the upcoming fix) everything
>>>> works. With CONFIG_QCOM_TZMEM_MODE_SHMBRIDGE=y things unfortunately
>>>> still get stuck at boot (regardless of the fix). I think that's
>>>> happening even before anything efivar related should come up.
>>>>
>>>
>>> This is on X13s? I will get one in 3 weeks. Can you get the bootlog
>>> somehow? Does the laptop have any serial console?
>>
>> Surface Pro X (sc8180x), but it should be similar enough to the X13s in
>> that regard. At least from what people with access to the X13s told me,
>> the qseecom stuff seems to behave the same.
>>
>> Unfortunately I don't have a direct serial console. Best I have is
>> USB-serial, but it's not even getting there. I'll have to try and see if
>> I can get some more info on the screen.
>>
> 
> I have access to a sc8180x-primus board, does it make sense to test
> with this one? If so, could you give me instructions on how to do it?

I guess it's worth a shot.

 From what I can tell, there shouldn't be any patches in my tree that
would conflict with it. So I guess it should just be building it with
CONFIG_QCOM_TZMEM_MODE_SHMBRIDGE=y and booting.

I am currently testing it on top of a patched v6.8 tree though (but that
should just contain patches to get the Pro X running). You can find the
full tree at

     https://github.com/linux-surface/kernel/tree/spx/v6.8

The last commit is the fix I mentioned, so you might want to revert
that, since the shmem issue triggers regardless of that and it prevents
your series from applying cleanly.

Best regards,
Max

Bartosz Golaszewski March 30, 2024, 7:16 p.m. UTC | #11

On Fri, 29 Mar 2024 20:57:52 +0100, Maximilian Luz
<luzmaximilian@gmail.com> said:
> On 3/29/24 8:46 PM, Bartosz Golaszewski wrote:
>> On Fri, 29 Mar 2024 at 20:39, Maximilian Luz <luzmaximilian@gmail.com> wrote:
>>>
>>> On 3/29/24 8:26 PM, Bartosz Golaszewski wrote:
>>>> On Fri, 29 Mar 2024 at 20:22, Maximilian Luz <luzmaximilian@gmail.com> wrote:
>>>>>
>>>>> On 3/29/24 8:07 PM, Bartosz Golaszewski wrote:
>>>>>>
>>>>>> Both with and without SHM bridge?
>>>>>
>>>>> With CONFIG_QCOM_TZMEM_MODE_GENERIC=y (and the upcoming fix) everything
>>>>> works. With CONFIG_QCOM_TZMEM_MODE_SHMBRIDGE=y things unfortunately
>>>>> still get stuck at boot (regardless of the fix). I think that's
>>>>> happening even before anything efivar related should come up.
>>>>>
>>>>
>>>> This is on X13s? I will get one in 3 weeks. Can you get the bootlog
>>>> somehow? Does the laptop have any serial console?
>>>
>>> Surface Pro X (sc8180x), but it should be similar enough to the X13s in
>>> that regard. At least from what people with access to the X13s told me,
>>> the qseecom stuff seems to behave the same.
>>>
>>> Unfortunately I don't have a direct serial console. Best I have is
>>> USB-serial, but it's not even getting there. I'll have to try and see if
>>> I can get some more info on the screen.
>>>
>>
>> I have access to a sc8180x-primus board, does it make sense to test
>> with this one? If so, could you give me instructions on how to do it?
>
> I guess it's worth a shot.
>
>  From what I can tell, there shouldn't be any patches in my tree that
> would conflict with it. So I guess it should just be building it with
> CONFIG_QCOM_TZMEM_MODE_SHMBRIDGE=y and booting.
>
> I am currently testing it on top of a patched v6.8 tree though (but that
> should just contain patches to get the Pro X running). You can find the
> full tree at
>
>      https://github.com/linux-surface/kernel/tree/spx/v6.8
>
> The last commit is the fix I mentioned, so you might want to revert
> that, since the shmem issue triggers regardless of that and it prevents
> your series from applying cleanly.
>
> Best regards,
> Max
>

sc8180x-primus' support upstream is quite flaky. The board boots 50% of time.
However it's true that with SHM bridge it gets to:

mount: mounting efivarfs on /sys/firmware/efi/efivars failed:
Operation not supported

and stops 100% of the time. Without SHM bridge I cannot boot it either because
I suppose I need the patch you sent yesterday. I haven't had the time to
rebase it yet, it's quite intrusive to my series.

I can confirm that with that patch the board still boots but still 50% of the
time.

Bart

Maximilian Luz April 4, 2024, 6:37 p.m. UTC | #12

On 4/3/24 9:47 AM, Bartosz Golaszewski wrote:
> On Tue, Apr 2, 2024 at 10:44 AM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
>>
>> On Sat, Mar 30, 2024 at 8:16 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
>>>
>>> On Fri, 29 Mar 2024 20:57:52 +0100, Maximilian Luz <luzmaximilian@gmail.com> said:
>>>> On 3/29/24 8:46 PM, Bartosz Golaszewski wrote:
>>>>> On Fri, 29 Mar 2024 at 20:39, Maximilian Luz <luzmaximilian@gmail.com> wrote:
>>>>>>
>>>>>> On 3/29/24 8:26 PM, Bartosz Golaszewski wrote:
>>>>>>> On Fri, 29 Mar 2024 at 20:22, Maximilian Luz <luzmaximilian@gmail.com> wrote:
>>>>>>>>
>>>>>>>> On 3/29/24 8:07 PM, Bartosz Golaszewski wrote:
>>>>>>>>>
>>>>>>>>> Both with and without SHM bridge?
>>>>>>>>
>>>>>>>> With CONFIG_QCOM_TZMEM_MODE_GENERIC=y (and the upcoming fix) everything
>>>>>>>> works. With CONFIG_QCOM_TZMEM_MODE_SHMBRIDGE=y things unfortunately
>>>>>>>> still get stuck at boot (regardless of the fix). I think that's
>>>>>>>> happening even before anything efivar related should come up.
>>>>>>>>
>>>>>>>
>>>>>>> This is on X13s? I will get one in 3 weeks. Can you get the bootlog
>>>>>>> somehow? Does the laptop have any serial console?
>>>>>>
>>>>>> Surface Pro X (sc8180x), but it should be similar enough to the X13s in
>>>>>> that regard. At least from what people with access to the X13s told me,
>>>>>> the qseecom stuff seems to behave the same.
>>>>>>
>>>>>> Unfortunately I don't have a direct serial console. Best I have is
>>>>>> USB-serial, but it's not even getting there. I'll have to try and see if
>>>>>> I can get some more info on the screen.
>>>>>>
>>>>>
>>>>> I have access to a sc8180x-primus board, does it make sense to test
>>>>> with this one? If so, could you give me instructions on how to do it?
>>>>
>>>> I guess it's worth a shot.
>>>>
>>>>   From what I can tell, there shouldn't be any patches in my tree that
>>>> would conflict with it. So I guess it should just be building it with
>>>> CONFIG_QCOM_TZMEM_MODE_SHMBRIDGE=y and booting.
>>>>
>>>> I am currently testing it on top of a patched v6.8 tree though (but that
>>>> should just contain patches to get the Pro X running). You can find the
>>>> full tree at
>>>>
>>>>       https://github.com/linux-surface/kernel/tree/spx/v6.8
>>>>
>>>> The last commit is the fix I mentioned, so you might want to revert
>>>> that, since the shmem issue triggers regardless of that and it prevents
>>>> your series from applying cleanly.
>>>>
>>>> Best regards,
>>>> Max
>>>>
>>>
>>> sc8180x-primus' support upstream is quite flaky. The board boots 50% of time.
>>> However it's true that with SHM bridge it gets to:
>>>
>>> mount: mounting efivarfs on /sys/firmware/efi/efivars failed: Operation not supported
>>>
>>> and stops 100% of the time. Without SHM bridge I cannot boot it either because
>>> I suppose I need the patch you sent yesterday. I haven't had the time to
>>> rebase it yet, it's quite intrusive to my series.
>>>
>>> I can confirm that with that patch the board still boots but still 50% of the
>>> time.
>>>
>>> Bart
>>
>> Hi!
>>
>> I was under the impression that until v8, the series worked on sc8180x
>> but I'm seeing that even v7 has the same issue with SHM Bridge on
>> sc8180x-primus. Could you confirm? Because I'm not sure if I should
>> track the differences or the whole thing was broken for this platform
>> from the beginning.
>>
>> Bart
> 
> Interestingly, it doesn't seem like a problem with qseecom - even if I
> disable the driver, the board still freezes after the first SCM call
> using SHM bridge. I suspect - and am trying to clarify that with qcom
> - that this architecture doesn't support SHM bridge but doesn't report
> it either unlike other older platforms. Or maybe there's some quirk
> somewhere. Anyway, I'm on it.

Awesome, thanks!

Best regards,
Max

[v9,00/13] firmware: qcom: qseecom: convert to using the TZ allocator

Message

Comments