mbox series

[bpf-next,v2,0/3] bpf: Fix array bounds error with may_goto and add selftest

Message ID 20250213131214.164982-1-mrpre@163.com
Headers show
Series bpf: Fix array bounds error with may_goto and add selftest | expand

Message

Jiayuan Chen Feb. 13, 2025, 1:12 p.m. UTC
Syzbot caught an array out-of-bounds bug [1]. It turns out that when the
BPF program runs through do_misc_fixups(), it allocates an extra 8 bytes
on the call stack, which eventually causes stack_depth to exceed 512.

I was able to reproduce this issue probabilistically by enabling
CONFIG_UBSAN=y and disabling CONFIG_BPF_JIT_ALWAYS_ON with the selfttest
I provide in second patch(although it doesn't happen every time - I didn't
dig deeper into why UBSAN behaves this way).

Furthermore, if I set /proc/sys/net/core/bpf_jit_enable to 0 to disable
the jit, a panic occurs, and the reason is the same, that bpf_func is
assigned an incorrect address.

[---[ end trace ]---
[Oops: general protection fault, probably for non-canonical address
0x100f0e0e0d090808: 0000 [#1] PREEMPT SMP NOPTI
[Tainted: [W]=WARN, [O]=OOT_MODULE
[RIP: 0010:bpf_test_run+0x1d2/0x360
[RSP: 0018:ffffafc7955178a0 EFLAGS: 00010246
[RAX: 100f0e0e0d090808 RBX: ffff8e9fdb2c4100 RCX: 0000000000000018
[RDX: 00000000002b5b18 RSI: ffffafc780497048 RDI: ffff8ea04d601700
[RBP: ffffafc780497000 R08: ffffafc795517a0c R09: 0000000000000000
[R10: 0000000000000000 R11: fefefefefefefeff R12: ffff8ea04d601700
[R13: ffffafc795517928 R14: ffffafc795517928 R15: 0000000000000000
[CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[CR2: 00007f181c064648 CR3: 00000001aa2be003 CR4: 0000000000770ef0
[DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[PKRU: 55555554
[Call Trace:
[ <TASK>
[ ? die_addr+0x36/0x90
[ ? exc_general_protection+0x237/0x430
[ ? asm_exc_general_protection+0x26/0x30
[ ? bpf_test_run+0x1d2/0x360
[ ? bpf_test_run+0x10d/0x360
[ ? __link_object+0x12a/0x1e0
[ ? slab_build_skb+0x23/0x130
[ ? kmem_cache_alloc_noprof+0x2ea/0x3f0
[ ? sk_prot_alloc+0xc2/0x120
[ bpf_prog_test_run_skb+0x21b/0x590
[ __sys_bpf+0x340/0xa80
[ __x64_sys_bpf+0x1e/0x30

---
v1 -> v2:
Directly reject loading programs with a stack size greater than 512 when
jit disabled.(Suggested by Alexei Starovoitov)
https://lore.kernel.org/bpf/20250212135251.85487-1-mrpre@163.com/T/#u

---
Jiayuan Chen (3):
  bpf: Fix array bounds error with may_goto
  selftests/bpf: Allow the program to select specific modes for testing
  selftests/bpf: Add selftest for may_goto

 kernel/bpf/core.c                             | 18 +++++--
 kernel/bpf/verifier.c                         |  7 +++
 tools/testing/selftests/bpf/progs/bpf_misc.h  |  2 +
 .../selftests/bpf/progs/verifier_stack_ptr.c  | 50 +++++++++++++++++++
 tools/testing/selftests/bpf/test_loader.c     | 27 ++++++++++
 5 files changed, 100 insertions(+), 4 deletions(-)

Comments

Alexei Starovoitov Feb. 14, 2025, 12:58 a.m. UTC | #1
On Thu, Feb 13, 2025 at 8:42 AM Jiayuan Chen <mrpre@163.com> wrote:
>
> On Thu, Feb 13, 2025 at 08:04:05AM -0800, Alexei Starovoitov wrote:
> > On Thu, Feb 13, 2025 at 5:13 AM Jiayuan Chen <mrpre@163.com> wrote:
> > >
> > > Add test cases to ensure the maximum stack size can be properly limited to
> > > 512.
> > >
> > > Test result:
> > > echo "0" > /proc/sys/net/core/bpf_jit_enable
> > > ./test_progs -t verifier_stack_ptr
> > > verifier_stack_ptr/PTR_TO_STACK stack size 512 with may_goto with jit:SKIP
> > > verifier_stack_ptr/PTR_TO_STACK stack size 512 with may_goto without jit:OK
> > >
> > > echo "1" > /proc/sys/net/core/bpf_jit_enable
> > > verifier_stack_ptr/PTR_TO_STACK stack size 512 with may_goto with jit:OK
> > > verifier_stack_ptr/PTR_TO_STACK stack size 512 with may_goto without jit:SKIP
> >
> > echo '0|1' is not longer necessary ?
> > The commit log seems obsolete?
> >
> > pw-bot: cr
>
> It looks like the problem only arises when CONFIG_BPF_JIT_ALWAYS_ON is
> turned off, and we're only restricting the stack size when
> prog->jit_requested is false. To test this, I simulated different
> scenarios by echoing '0' or '1' to see how the program would behave when
> jit_requested is enabled or disabled.
>
> As expected, when I echoed '0', the program failed verification, and when
> I echoed '1', it ran smoothly.

I misunderstood the tags in patch 2. I thought:

+#define __use_jit() __attribute__((btf_decl_tag("comment:run_mode=jit")))
+#define __use_interp()
__attribute__((btf_decl_tag("comment:run_mode=interpreter")))

"use jit" actually means use jit.

while what it's doing is different:

+ if ((jit_enabled && spec->run_mode & INTERP) ||
+    (!jit_enabled && spec->run_mode & JIT)) {
+     test__skip();
+     return;
+ }
+

The tags should probably be named __load_if_JITed and __load_if_interpreted
or something like that.