diff mbox series

efi: don't map the entire mokvar table to determine its size

Message ID 20250226201839.2374631-1-pjones@redhat.com
State Accepted
Commit 2b90e7ace79774a3540ce569e000388f8d22c9e0
Headers show
Series efi: don't map the entire mokvar table to determine its size | expand

Commit Message

Peter Jones Feb. 26, 2025, 8:18 p.m. UTC
Currently when validating the mokvar table, we (re)map the entire table
on each iteration of the loop, adding space as we discover new entries.
If the table grows over a certain size, this fails due to limitations of
early_memmap(), and we get a failure and traceback:

  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 0 at mm/early_ioremap.c:139 __early_ioremap+0xef/0x220
  Modules linked in:
  CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.12.15-200.fc41.x86_64 #1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20250221-6.copr8698600 02/21/2025
  RIP: 0010:__early_ioremap+0xef/0x220
  Code: e5 00 f0 ff ff 48 81 e5 00 f0 ff ff 4c 89 6c 24 08 41 81 e4 ff 0f 00 00 4c 29 ed 48 89 e8 48 c1 e8 0c 41 89 c7 83 f8 40 76 04 <0f> 0b eb 82 45 6b ee c0 41 81 c5 ff 05 00 00 45 85 ff 74 36 83 3d
  RSP: 0000:ffffffff96803dd8 EFLAGS: 00010002 ORIG_RAX: 0000000000000000
  RAX: 0000000000000041 RBX: 0000000000000001 RCX: ffffffff97768250
  RDX: 8000000000000163 RSI: 0000000000000001 RDI: 000000007c4c3000
  RBP: 0000000000041000 R08: ffffffffff201630 R09: 0000000000000030
  R10: 000000007c4c3000 R11: ffffffff96803e20 R12: 0000000000000000
  R13: 000000007c4c3000 R14: 0000000000000001 R15: 0000000000000041
  FS:  0000000000000000(0000) GS:ffffffff97291000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffff9f1d8000040e CR3: 00000000653a4000 CR4: 00000000000000f0
  Call Trace:
   <TASK>
   ? __early_ioremap+0xef/0x220
   ? __warn.cold+0x93/0xfa
   ? __early_ioremap+0xef/0x220
   ? report_bug+0xff/0x140
   ? early_fixup_exception+0x5d/0xb0
   ? early_idt_handler_common+0x2f/0x3a
   ? __early_ioremap+0xef/0x220
   ? efi_mokvar_table_init+0xce/0x1d0
   ? setup_arch+0x864/0xc10
   ? start_kernel+0x6b/0xa10
   ? x86_64_start_reservations+0x24/0x30
   ? x86_64_start_kernel+0xed/0xf0
   ? common_startup_64+0x13e/0x141
   </TASK>
  ---[ end trace 0000000000000000 ]---
  mokvar: Failed to map EFI MOKvar config table pa=0x7c4c3000, size=265187.

Mapping the entire structure isn't actually necessary, as we don't ever
need more than one entry header mapped at once.

This patch changes efi_mokvar_table_init() to only map each entry
header, not the entire table, when determining the table size.  Since
we're not mapping any data past the variable name, it also changes the
code to enforce that each variable name is NUL terminated, rather than
attempting to verify it in place.

Signed-off-by: Peter Jones <pjones@redhat.com>
---
 drivers/firmware/efi/mokvar-table.c | 41 +++++++++--------------------
 1 file changed, 13 insertions(+), 28 deletions(-)

Comments

Ard Biesheuvel Feb. 27, 2025, 7:50 a.m. UTC | #1
On Wed, 26 Feb 2025 at 21:18, Peter Jones <pjones@redhat.com> wrote:
>
> Currently when validating the mokvar table, we (re)map the entire table
> on each iteration of the loop, adding space as we discover new entries.
> If the table grows over a certain size, this fails due to limitations of
> early_memmap(), and we get a failure and traceback:
>
>   ------------[ cut here ]------------
>   WARNING: CPU: 0 PID: 0 at mm/early_ioremap.c:139 __early_ioremap+0xef/0x220
>   Modules linked in:
>   CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.12.15-200.fc41.x86_64 #1
>   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20250221-6.copr8698600 02/21/2025
>   RIP: 0010:__early_ioremap+0xef/0x220
>   Code: e5 00 f0 ff ff 48 81 e5 00 f0 ff ff 4c 89 6c 24 08 41 81 e4 ff 0f 00 00 4c 29 ed 48 89 e8 48 c1 e8 0c 41 89 c7 83 f8 40 76 04 <0f> 0b eb 82 45 6b ee c0 41 81 c5 ff 05 00 00 45 85 ff 74 36 83 3d
>   RSP: 0000:ffffffff96803dd8 EFLAGS: 00010002 ORIG_RAX: 0000000000000000
>   RAX: 0000000000000041 RBX: 0000000000000001 RCX: ffffffff97768250
>   RDX: 8000000000000163 RSI: 0000000000000001 RDI: 000000007c4c3000
>   RBP: 0000000000041000 R08: ffffffffff201630 R09: 0000000000000030
>   R10: 000000007c4c3000 R11: ffffffff96803e20 R12: 0000000000000000
>   R13: 000000007c4c3000 R14: 0000000000000001 R15: 0000000000000041
>   FS:  0000000000000000(0000) GS:ffffffff97291000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: ffff9f1d8000040e CR3: 00000000653a4000 CR4: 00000000000000f0
>   Call Trace:
>    <TASK>
>    ? __early_ioremap+0xef/0x220
>    ? __warn.cold+0x93/0xfa
>    ? __early_ioremap+0xef/0x220
>    ? report_bug+0xff/0x140
>    ? early_fixup_exception+0x5d/0xb0
>    ? early_idt_handler_common+0x2f/0x3a
>    ? __early_ioremap+0xef/0x220
>    ? efi_mokvar_table_init+0xce/0x1d0
>    ? setup_arch+0x864/0xc10
>    ? start_kernel+0x6b/0xa10
>    ? x86_64_start_reservations+0x24/0x30
>    ? x86_64_start_kernel+0xed/0xf0
>    ? common_startup_64+0x13e/0x141
>    </TASK>
>   ---[ end trace 0000000000000000 ]---
>   mokvar: Failed to map EFI MOKvar config table pa=0x7c4c3000, size=265187.
>
> Mapping the entire structure isn't actually necessary, as we don't ever
> need more than one entry header mapped at once.
>
> This patch changes efi_mokvar_table_init() to only map each entry
> header, not the entire table, when determining the table size.  Since
> we're not mapping any data past the variable name, it also changes the
> code to enforce that each variable name is NUL terminated, rather than
> attempting to verify it in place.
>
> Signed-off-by: Peter Jones <pjones@redhat.com>
> ---
>  drivers/firmware/efi/mokvar-table.c | 41 +++++++++--------------------
>  1 file changed, 13 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/firmware/efi/mokvar-table.c b/drivers/firmware/efi/mokvar-table.c
> index 5ed0602c2f7..66eb83a0f12 100644
> --- a/drivers/firmware/efi/mokvar-table.c
> +++ b/drivers/firmware/efi/mokvar-table.c
> @@ -103,7 +103,6 @@ void __init efi_mokvar_table_init(void)
>         void *va = NULL;
>         unsigned long cur_offset = 0;
>         unsigned long offset_limit;
> -       unsigned long map_size = 0;
>         unsigned long map_size_needed = 0;
>         unsigned long size;
>         struct efi_mokvar_table_entry *mokvar_entry;
> @@ -134,48 +133,34 @@ void __init efi_mokvar_table_init(void)
>          */
>         err = -EINVAL;
>         while (cur_offset + sizeof(*mokvar_entry) <= offset_limit) {
> -               mokvar_entry = va + cur_offset;
> -               map_size_needed = cur_offset + sizeof(*mokvar_entry);
> -               if (map_size_needed > map_size) {
> -                       if (va)
> -                               early_memunmap(va, map_size);
> -                       /*
> -                        * Map a little more than the fixed size entry
> -                        * header, anticipating some data. It's safe to
> -                        * do so as long as we stay within current memory
> -                        * descriptor.
> -                        */
> -                       map_size = min(map_size_needed + 2*EFI_PAGE_SIZE,
> -                                      offset_limit);
> -                       va = early_memremap(efi.mokvar_table, map_size);
> -                       if (!va) {
> -                               pr_err("Failed to map EFI MOKvar config table pa=0x%lx, size=%lu.\n",
> -                                      efi.mokvar_table, map_size);
> -                               return;
> -                       }
> -                       mokvar_entry = va + cur_offset;
> +               if (va)
> +                       early_memunmap(va, sizeof(*mokvar_entry));
> +               va = early_memremap(efi.mokvar_table + cur_offset, sizeof(*mokvar_entry));
> +               if (!va) {
> +                       pr_err("Failed to map EFI MOKvar config table pa=0x%lx, size=%zu.\n",
> +                              efi.mokvar_table + cur_offset, sizeof(*mokvar_entry));
> +                       return;
>                 }
> +               mokvar_entry = va;
>
>                 /* Check for last sentinel entry */
>                 if (mokvar_entry->name[0] == '\0') {
>                         if (mokvar_entry->data_size != 0)
>                                 break;
>                         err = 0;
> +                       map_size_needed = cur_offset + sizeof(*mokvar_entry);
>                         break;
>                 }
>
> -               /* Sanity check that the name is null terminated */
> -               size = strnlen(mokvar_entry->name,
> -                              sizeof(mokvar_entry->name));
> -               if (size >= sizeof(mokvar_entry->name))
> -                       break;
> +               /* Enforce that the name is null terminated */
> +               mokvar_entry->name[sizeof(mokvar_entry->name)-1] = '\0';
>
>                 /* Advance to the next entry */
> -               cur_offset = map_size_needed + mokvar_entry->data_size;
> +               cur_offset += sizeof(*mokvar_entry) + mokvar_entry->data_size;
>         }
>
>         if (va)
> -               early_memunmap(va, map_size);
> +               early_memunmap(va, sizeof(*mokvar_entry));
>         if (err) {
>                 pr_err("EFI MOKvar config table is not valid\n");
>                 return;

Thanks for the fix.

Should we add something like the below to avoid mapping the same page
over and over again? Or is this premature optimization?


--- a/drivers/firmware/efi/mokvar-table.c
+++ b/drivers/firmware/efi/mokvar-table.c
@@ -99,13 +99,13 @@
  */
 void __init efi_mokvar_table_init(void)
 {
+       struct efi_mokvar_table_entry __aligned(1) *mokvar_entry, *next_entry;
        efi_memory_desc_t md;
        void *va = NULL;
        unsigned long cur_offset = 0;
        unsigned long offset_limit;
        unsigned long map_size_needed = 0;
        unsigned long size;
-       struct efi_mokvar_table_entry *mokvar_entry;
        int err;

        if (!efi_enabled(EFI_MEMMAP))
@@ -142,7 +142,7 @@
                        return;
                }
                mokvar_entry = va;
-
+next:
                /* Check for last sentinel entry */
                if (mokvar_entry->name[0] == '\0') {
                        if (mokvar_entry->data_size != 0)
@@ -156,7 +156,19 @@ void __init efi_mokvar_table_init(void)
                mokvar_entry->name[sizeof(mokvar_entry->name) - 1] = '\0';

                /* Advance to the next entry */
-               cur_offset += sizeof(*mokvar_entry) + mokvar_entry->data_size;
+               size = sizeof(*mokvar_entry) + mokvar_entry->data_size;
+               cur_offset += size;
+
+               /*
+                * Don't bother remapping if the current entry header and the
+                * next one end on the same page.
+                */
+               next_entry = (void *)((unsigned long)mokvar_entry + size);
+               if (((((unsigned long)(mokvar_entry + 1) - 1) ^
+                     ((unsigned long)(next_entry + 1) - 1)) &
PAGE_MASK) == 0) {
+                       mokvar_entry = next_entry;
+                       goto next;
+               }
        }

        if (va)
Peter Jones Feb. 27, 2025, 5:13 p.m. UTC | #2
On Thu, Feb 27, 2025 at 08:50:08AM +0100, Ard Biesheuvel wrote:
> 
> Should we add something like the below to avoid mapping the same page
> over and over again? Or is this premature optimization?
> 

I can't honestly say I'm sure either way, but I'm leaning towards
thinking it's probably worthwhile.  On my development tree the number of
these we wind up doing in the maximal case is 31, and but in a typical
case it's more like 20, with a series that looks something like the
below list of sizes and relative addresses from the first entry.  (I
generated this with ls and awk, so it's not quite exact but it's
fairly representative.)  I've marked which ones could be eliminated.

map 264 at 0x0
unmap 264 at 0x0    <-- gone
map 264 at 0x146    <-- gone
unmap 264 at 0x146  <-- gone
map 264 at 0x2a2    <-- gone
unmap 264 at 0x2a2  <-- gone
map 264 at 0x43e    <-- gone
unmap 264 at 0x43e  <-- gone
map 264 at 0x548    <-- gone
unmap 264 at 0x548  <-- gone
map 264 at 0x660    <-- gone
unmap 264 at 0x660  <-- gone
map 264 at 0x84d    <-- gone
unmap 264 at 0x84d
map 264 at 0x191f
unmap 264 at 0x191f <-- gone
map 264 at 0x1a73   <-- gone
unmap 264 at 0x1a73 <-- gone
map 264 at 0x1b7c   <-- gone
unmap 264 at 0x1b7c <-- gone
map 264 at 0x1cd0   <-- gone
unmap 264 at 0x1cd0
map 264 at 0x21a8
unmap 264 at 0x21a8 <-- gone
map 264 at 0x22c2   <-- gone
unmap 264 at 0x22c2 <-- gone
map 264 at 0x23cb   <-- gone
unmap 264 at 0x23cb <-- gone
map 264 at 0x24d4   <-- gone
unmap 264 at 0x24d4 <-- gone
map 264 at 0x263c   <-- gone
unmap 264 at 0x263c <-- gone
map 264 at 0x2746   <-- gone
unmap 264 at 0x2746
map 264 at 0x4043
unmap 264 at 0x4043
map 264 at 0x86f7
unmap 264 at 0x86f7

So going from 19 map/unmap pairs to 5.  Seems like it can't hurt, but
it's a small number either way.

Anyway, I tried your patch and it works for me:

Tested-By: Peter Jones <pjones@redhat.com>
Ard Biesheuvel Feb. 27, 2025, 5:20 p.m. UTC | #3
On Thu, 27 Feb 2025 at 18:13, Peter Jones <pjones@redhat.com> wrote:
>
> On Thu, Feb 27, 2025 at 08:50:08AM +0100, Ard Biesheuvel wrote:
> >
> > Should we add something like the below to avoid mapping the same page
> > over and over again? Or is this premature optimization?
> >
>
> I can't honestly say I'm sure either way, but I'm leaning towards
> thinking it's probably worthwhile.  On my development tree the number of
> these we wind up doing in the maximal case is 31, and but in a typical
> case it's more like 20, with a series that looks something like the
> below list of sizes and relative addresses from the first entry.  (I
> generated this with ls and awk, so it's not quite exact but it's
> fairly representative.)  I've marked which ones could be eliminated.
>
> map 264 at 0x0
> unmap 264 at 0x0    <-- gone
> map 264 at 0x146    <-- gone
> unmap 264 at 0x146  <-- gone
> map 264 at 0x2a2    <-- gone
> unmap 264 at 0x2a2  <-- gone
> map 264 at 0x43e    <-- gone
> unmap 264 at 0x43e  <-- gone
> map 264 at 0x548    <-- gone
> unmap 264 at 0x548  <-- gone
> map 264 at 0x660    <-- gone
> unmap 264 at 0x660  <-- gone
> map 264 at 0x84d    <-- gone
> unmap 264 at 0x84d
> map 264 at 0x191f
> unmap 264 at 0x191f <-- gone
> map 264 at 0x1a73   <-- gone
> unmap 264 at 0x1a73 <-- gone
> map 264 at 0x1b7c   <-- gone
> unmap 264 at 0x1b7c <-- gone
> map 264 at 0x1cd0   <-- gone
> unmap 264 at 0x1cd0
> map 264 at 0x21a8
> unmap 264 at 0x21a8 <-- gone
> map 264 at 0x22c2   <-- gone
> unmap 264 at 0x22c2 <-- gone
> map 264 at 0x23cb   <-- gone
> unmap 264 at 0x23cb <-- gone
> map 264 at 0x24d4   <-- gone
> unmap 264 at 0x24d4 <-- gone
> map 264 at 0x263c   <-- gone
> unmap 264 at 0x263c <-- gone
> map 264 at 0x2746   <-- gone
> unmap 264 at 0x2746
> map 264 at 0x4043
> unmap 264 at 0x4043
> map 264 at 0x86f7
> unmap 264 at 0x86f7
>
> So going from 19 map/unmap pairs to 5.  Seems like it can't hurt, but
> it's a small number either way.
>

So this would go from 19 to 3 on a 16k pages kernel. So I'm leaning to
applying it as well.

BTW these results confirm my suspicion that these headers may appear
misaligned, hence the __aligned(1)

> Anyway, I tried your patch and it works for me:
>
> Tested-By: Peter Jones <pjones@redhat.com>
>

Thanks. I'll apply it as a separate patch, and only tag your patch cc:stable
diff mbox series

Patch

diff --git a/drivers/firmware/efi/mokvar-table.c b/drivers/firmware/efi/mokvar-table.c
index 5ed0602c2f7..66eb83a0f12 100644
--- a/drivers/firmware/efi/mokvar-table.c
+++ b/drivers/firmware/efi/mokvar-table.c
@@ -103,7 +103,6 @@  void __init efi_mokvar_table_init(void)
 	void *va = NULL;
 	unsigned long cur_offset = 0;
 	unsigned long offset_limit;
-	unsigned long map_size = 0;
 	unsigned long map_size_needed = 0;
 	unsigned long size;
 	struct efi_mokvar_table_entry *mokvar_entry;
@@ -134,48 +133,34 @@  void __init efi_mokvar_table_init(void)
 	 */
 	err = -EINVAL;
 	while (cur_offset + sizeof(*mokvar_entry) <= offset_limit) {
-		mokvar_entry = va + cur_offset;
-		map_size_needed = cur_offset + sizeof(*mokvar_entry);
-		if (map_size_needed > map_size) {
-			if (va)
-				early_memunmap(va, map_size);
-			/*
-			 * Map a little more than the fixed size entry
-			 * header, anticipating some data. It's safe to
-			 * do so as long as we stay within current memory
-			 * descriptor.
-			 */
-			map_size = min(map_size_needed + 2*EFI_PAGE_SIZE,
-				       offset_limit);
-			va = early_memremap(efi.mokvar_table, map_size);
-			if (!va) {
-				pr_err("Failed to map EFI MOKvar config table pa=0x%lx, size=%lu.\n",
-				       efi.mokvar_table, map_size);
-				return;
-			}
-			mokvar_entry = va + cur_offset;
+		if (va)
+			early_memunmap(va, sizeof(*mokvar_entry));
+		va = early_memremap(efi.mokvar_table + cur_offset, sizeof(*mokvar_entry));
+		if (!va) {
+			pr_err("Failed to map EFI MOKvar config table pa=0x%lx, size=%zu.\n",
+			       efi.mokvar_table + cur_offset, sizeof(*mokvar_entry));
+			return;
 		}
+		mokvar_entry = va;
 
 		/* Check for last sentinel entry */
 		if (mokvar_entry->name[0] == '\0') {
 			if (mokvar_entry->data_size != 0)
 				break;
 			err = 0;
+			map_size_needed = cur_offset + sizeof(*mokvar_entry);
 			break;
 		}
 
-		/* Sanity check that the name is null terminated */
-		size = strnlen(mokvar_entry->name,
-			       sizeof(mokvar_entry->name));
-		if (size >= sizeof(mokvar_entry->name))
-			break;
+		/* Enforce that the name is null terminated */
+		mokvar_entry->name[sizeof(mokvar_entry->name)-1] = '\0';
 
 		/* Advance to the next entry */
-		cur_offset = map_size_needed + mokvar_entry->data_size;
+		cur_offset += sizeof(*mokvar_entry) + mokvar_entry->data_size;
 	}
 
 	if (va)
-		early_memunmap(va, map_size);
+		early_memunmap(va, sizeof(*mokvar_entry));
 	if (err) {
 		pr_err("EFI MOKvar config table is not valid\n");
 		return;