[RFC,0/2] mips: bpf: An eBPF JIT implementation for 32-bit MIPS

Message ID 20210720231036.3740924-1-johan.almbladh@anyfinetworks.com

Johan Almbladh July 20, 2021, 11:10 p.m. UTC
Hello!

I have been working on this JIT during the last couple of weeks, following
my initial questions and thoughts around this in April ("Completing eBPF
JIT support for MIPS32"). Perhaps I should have been clearer that I
intended to add the missing functionality, but when I received no response,
no activity on the subject since 2019, and with MIPS the company switching
to RISC-V, I frankly did not think anyone else was interested. I was not
aware that Tony was working on the same thing. Anyway, here it goes.

This is an implementation of an eBPF JIT for MIPS I-V and MIPS32. The
implementation supports all 32-bit and 64-bit ALU and JMP operations,
including the recently-added atomics. 64-bit div/mod and 64-bit atomics
are implemented using function calls to math64 and atomic64 functions,
respectively. All 32-bit operations are implemented natively by the JIT.

The implementation is intended to provide good ALU32 performance, and
completeness for ALU64 instructions so it never has to fall back to the
interpreter. Care has also been taken to make the code as simple and
clean as possible. Complex and input-sensitive logic that is hard to
test has intentionally been avoided, especially for ALU64 operations.
The JIT relies on the verifier to do more complex analysis such as
explicit zero-extension.

Relation to the MIPS64 JIT
==========================
The decision to not extend the existing MIPS64 JIT with 32-bit support
was made for the following reasons.

First, the 64-bit JIT is already very complex. It contains its own static
analyzer for doing zero- and sign-extensions on 32-bit values. That is
complexity not needed for the 32-bit JIT.

Second, the 32-bit JIT has more in common with other 32-bit JITs, say, ARM,
than MIPS64. The register mapping will be different. ALU32 operations are
different. ALU64 operations are different. JMP/JMP32 operations are different.
What is native word size and easy on one is emulated and difficult on the
other, and vice-versa.

There may of course be utility code that can be shared between the two
JITs, but as a whole the 32-bit and 64-bit JITs are likely easier to
test and maintain as separate, dedicated implementations rather than as
one big JIT that needs to handle a superset of the combined complexity.

Register mapping
================
All 64-bit eBPF registers are mapped to native 32-bit MIPS register pairs,
and no stack scratch space is used for register swapping. This means
that all eBPF register data is kept in CPU registers all the time, which
is good for performance of course. It also simplifies the register management
a lot and reduces the hunger for temporary registers since we do not have
to move data around.

Native register pairs are ordered according to CPU endianness, following the
O32 calling convention for passing 64-bit arguments and return values. The
eBPF return value, arguments and callee-saved registers are mapped to their
native MIPS equivalents.
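
As an illustration only (not code from the patch), the endianness-dependent
ordering can be sketched as a pair of accessors. Under O32, the
lower-numbered register of a pair holds the low word on little endian and
the high word on big endian. The names reg_pair, pair_lo and pair_hi are
hypothetical, and endianness is passed as a runtime flag here only to keep
the sketch easy to test.

```c
#include <stdbool.h>
#include <stdint.h>

/* MIPS general-purpose register numbers for $v0 and $v1. */
enum { MIPS_V0 = 2, MIPS_V1 = 3 };

/*
 * Hypothetical pair descriptor: regs[0] is the higher-numbered register,
 * regs[1] the lower-numbered one, written in the same order as
 * "R0 - $v1, $v0".
 */
struct reg_pair {
	uint8_t regs[2];
};

/* Register holding the least significant 32 bits of the pair. */
static uint8_t pair_lo(const struct reg_pair *p, bool big_endian)
{
	return big_endian ? p->regs[0] : p->regs[1];
}

/* Register holding the most significant 32 bits of the pair. */
static uint8_t pair_hi(const struct reg_pair *p, bool big_endian)
{
	return big_endian ? p->regs[1] : p->regs[0];
}
```

With this layout a 64-bit value returned by a native helper is already in
the registers backing R0, so no shuffling is needed after a call.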

Since the 32 highest bits in the eBPF FP (frame pointer) register are
always zero, only one general-purpose register is actually needed for the
mapping. The MIPS fp register is used for this purpose. The high bits are
mapped to MIPS register r0. This saves us one CPU register, which is much
needed for temporaries, while still allowing us to treat the R10 (FP)
register just like any other eBPF register in the JIT.

The MIPS gp (global pointer) and at (assembler temporary) registers are
used as internal temporary registers for constant blinding. CPU registers
t6-t9 are used internally by the JIT when constructing more complex 64-bit
operations. This is precisely what is needed - two registers to store an
immediate operand value, and two more as scratch registers to perform the
operation.
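
To give a feel for what a "more complex 64-bit operation" looks like on a
32-bit CPU, here is a minimal sketch of a 64-bit add built from 32-bit
halves with an explicit carry. It is plain C rather than emitted MIPS code,
and the names u64_pair and add64 are hypothetical. The comments note the
MIPS instructions each step would roughly correspond to.

```c
#include <stdint.h>

/* A 64-bit value split into two 32-bit words, like a register pair. */
struct u64_pair {
	uint32_t lo;
	uint32_t hi;
};

/* dst += src, using only 32-bit operations. */
static void add64(struct u64_pair *dst, const struct u64_pair *src)
{
	uint32_t lo = dst->lo + src->lo;	/* addu: wraps modulo 2^32 */
	uint32_t carry = lo < src->lo;		/* sltu: 1 if the low add wrapped */

	dst->hi = dst->hi + src->hi + carry;	/* addu, addu */
	dst->lo = lo;
}
```

The carry needs one scratch register, and an immediate operand needs up to
two more registers once it has been expanded to 64 bits.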

The register mapping is shown below.

    R0 - $v1, $v0   return value
    R1 - $a1, $a0   argument 1, passed in registers
    R2 - $a3, $a2   argument 2, passed in registers
    R3 - $t1, $t0   argument 3, passed on stack
    R4 - $t3, $t2   argument 4, passed on stack
    R5 - $t5, $t4   argument 5, passed on stack
    R6 - $s1, $s0   callee-saved
    R7 - $s3, $s2   callee-saved
    R8 - $s5, $s4   callee-saved
    R9 - $s7, $s6   callee-saved
    FP - $r0, $fp   32-bit frame pointer
    AX - $gp, $at   constant-blinding
         $t6 - $t9  unallocated, JIT temporaries
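
For illustration, the mapping can be written down as a C table together
with a small consistency check. This is a sketch, not the table from the
patch: the identifiers bpf2mips and mapping_mask are hypothetical, and R5
is assumed to occupy $t5/$t4 so that no native register backs two eBPF
words. The register numbers are the standard MIPS encodings.

```c
#include <stdbool.h>
#include <stdint.h>

enum mips_reg {
	MIPS_ZERO = 0, MIPS_AT = 1, MIPS_V0 = 2, MIPS_V1 = 3,
	MIPS_A0 = 4, MIPS_A1 = 5, MIPS_A2 = 6, MIPS_A3 = 7,
	MIPS_T0 = 8, MIPS_T1 = 9, MIPS_T2 = 10, MIPS_T3 = 11,
	MIPS_T4 = 12, MIPS_T5 = 13, MIPS_T6 = 14, MIPS_T7 = 15,
	MIPS_S0 = 16, MIPS_S1 = 17, MIPS_S2 = 18, MIPS_S3 = 19,
	MIPS_S4 = 20, MIPS_S5 = 21, MIPS_S6 = 22, MIPS_S7 = 23,
	MIPS_T8 = 24, MIPS_T9 = 25, MIPS_GP = 28, MIPS_FP = 30,
};

enum bpf_reg {
	BPF_R0, BPF_R1, BPF_R2, BPF_R3, BPF_R4, BPF_R5,
	BPF_R6, BPF_R7, BPF_R8, BPF_R9, BPF_FP, BPF_AX,
	BPF_NUM_REGS,
};

/* Same column order as the listing above. */
static const uint8_t bpf2mips[BPF_NUM_REGS][2] = {
	[BPF_R0] = { MIPS_V1, MIPS_V0 },
	[BPF_R1] = { MIPS_A1, MIPS_A0 },
	[BPF_R2] = { MIPS_A3, MIPS_A2 },
	[BPF_R3] = { MIPS_T1, MIPS_T0 },
	[BPF_R4] = { MIPS_T3, MIPS_T2 },
	[BPF_R5] = { MIPS_T5, MIPS_T4 },	/* assumption, see lead-in */
	[BPF_R6] = { MIPS_S1, MIPS_S0 },
	[BPF_R7] = { MIPS_S3, MIPS_S2 },
	[BPF_R8] = { MIPS_S5, MIPS_S4 },
	[BPF_R9] = { MIPS_S7, MIPS_S6 },
	[BPF_FP] = { MIPS_ZERO, MIPS_FP },	/* high word always reads 0 */
	[BPF_AX] = { MIPS_GP, MIPS_AT },
};

/*
 * Collect a bitmask of the native registers used by the mapping.
 * Returns false if any register other than $zero is used twice.
 */
static bool mapping_mask(uint32_t *mask_out)
{
	uint32_t mask = 0;
	int i, j;

	for (i = 0; i < BPF_NUM_REGS; i++) {
		for (j = 0; j < 2; j++) {
			uint8_t r = bpf2mips[i][j];

			if (r == MIPS_ZERO)
				continue;
			if (mask & (1u << r))
				return false;
			mask |= 1u << r;
		}
	}
	*mask_out = mask;
	return true;
}
```

A check like this makes it easy to confirm that $t6-$t9 stay free for the
JIT and that only FP borrows the hardwired zero register.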

Jump offsets
============
The JIT tries to map all conditional JMP operations to MIPS conditional
PC-relative branches. The MIPS branch offset field is 16 bits counted in
32-bit instruction words, or 18 bits in bytes, which is equivalent to the
eBPF 16-bit instruction offset. However, since
the JIT may emit more than one CPU instruction per eBPF instruction, the
value may overflow the field width. If that happens, the JIT converts the
long conditional jump to a short PC-relative branch with the condition
inverted, jumping over a long unconditional absolute jmp (ja).
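
The range check behind this decision can be sketched as follows. This is a
minimal illustration, not the patch's code: the function and enum names are
hypothetical. It relies only on the architectural facts that a MIPS branch
carries a signed 16-bit word offset and that the offset is relative to the
instruction in the branch delay slot.

```c
#include <stdbool.h>
#include <stdint.h>

/* How a conditional eBPF jump is emitted (hypothetical names). */
enum jmp_form {
	JMP_SHORT_BRANCH,		/* one PC-relative branch */
	JMP_INVERTED_OVER_JUMP,		/* inverted branch over a long jump */
};

/*
 * True if byte_off, measured from the delay slot instruction, can be
 * encoded in a MIPS branch: word aligned and within a signed 16-bit
 * word offset, i.e. -131072..131068 bytes.
 */
static bool branch_offset_fits(int32_t byte_off)
{
	return byte_off % 4 == 0 &&
	       byte_off >= -0x20000 && byte_off <= 0x1fffc;
}

/* The 16-bit immediate for an offset that fits. */
static uint16_t branch_imm(int32_t byte_off)
{
	return (uint16_t)(byte_off / 4);
}

/* Pick the short form when possible, else the converted long form. */
static enum jmp_form choose_jmp_form(int32_t byte_off)
{
	return branch_offset_fits(byte_off) ? JMP_SHORT_BRANCH
					    : JMP_INVERTED_OVER_JUMP;
}
```

In the converted form, the short branch only has to skip the jump that
follows it, so its own offset always fits.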

This conversion will change the instruction offset mapping used for jumps,
and may in turn result in more branch offset overflows. The JIT therefore
dry-runs the translation until no more branches are converted and the
offsets do not change anymore. There is an upper bound on this of course,
and if the JIT hits that limit, the last two iterations are run with all
branches being converted.
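
The dry-run loop can be modelled with a toy layout pass. Everything in this
sketch is an assumption made for illustration: the struct, the instruction
sizes, the pass limit and the max_reach parameter are hypothetical, and the
fallback is simplified to a single final pass with all jumps converted. It
only shows why converting one branch can push another out of range, and why
the loop stops once a pass converts nothing.

```c
#include <stdbool.h>

#define MAX_PASSES 5	/* hypothetical upper bound on dry runs */

struct insn {
	int target;	/* index of the jump target, or -1 if not a jump */
	bool converted;	/* true once the long form has been chosen */
};

/* Hypothetical sizes, counted in native instructions. */
static int insn_size(const struct insn *in)
{
	if (in->target < 0)
		return 1;
	return in->converted ? 4 : 2;
}

/* Fill offsets[0..n] with the start offset of each instruction. */
static void compute_offsets(const struct insn *prog, int n, int *offsets)
{
	int i;

	offsets[0] = 0;
	for (i = 0; i < n; i++)
		offsets[i + 1] = offsets[i] + insn_size(&prog[i]);
}

/*
 * Repeat the layout until no jump needs converting, or until the pass
 * limit is reached. Returns the number of passes that were run.
 */
static int layout(struct insn *prog, int n, int *offsets, int max_reach)
{
	int pass, i;

	for (pass = 1; pass <= MAX_PASSES; pass++) {
		bool changed = false;

		compute_offsets(prog, n, offsets);
		for (i = 0; i < n; i++) {
			int dist;

			if (prog[i].target < 0 || prog[i].converted)
				continue;
			dist = offsets[prog[i].target] - offsets[i + 1];
			if (dist > max_reach || dist < -max_reach) {
				prog[i].converted = true;
				changed = true;
			}
		}
		if (!changed)
			return pass;	/* offsets are stable */
	}

	/* Limit reached: convert every jump and lay out once more. */
	for (i = 0; i < n; i++)
		if (prog[i].target >= 0)
			prog[i].converted = true;
	compute_offsets(prog, n, offsets);
	return MAX_PASSES + 1;
}
```

Because a jump is never converted back to the short form, each pass either
converts at least one more jump or terminates the loop.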

Testing
=======
The implementation has been verified with the BPF test suite on QEMU,
emulating MIPS32r2 in big and little endian configurations. It has also
been verified on a MIPS 24Kc CPU (MT7628 SoC, little endian). The MIPS
I-V variants that exist for some operations have been verified "manually"
by forcing fallback to pre-r1 instructions only. As of this writing, the
BPF test suite JITs all tests successfully.

    test_bpf: Summary: 378 PASSED, 0 FAILED, [366/366 JIT'ed]

During the development of this JIT, several new tests were added to the
test suite in order to test corner cases inherent to 32-bit JITs, tail
calls and also to actually trigger the branch conversion handling. That
is another patch set, though.

Cheers,
Johan

Johan Almbladh (2):
  mips: bpf: add eBPF JIT for 32-bit MIPS
  mips: bpf: enable 32-bit eBPF JIT

 arch/mips/Kconfig           |    5 +-
 arch/mips/net/Makefile      |    7 +-
 arch/mips/net/ebpf_jit_32.c | 2207 +++++++++++++++++++++++++++++++++++
 3 files changed, 2216 insertions(+), 3 deletions(-)
 create mode 100644 arch/mips/net/ebpf_jit_32.c