Message ID | 20240510052408.2173579-1-thiago.bauermann@linaro.org |
---|---|
Series | Add support for AArch64 MOPS instructions |
On 2024-05-10 06:24, Thiago Jung Bauermann wrote:
> Hello,
>
> This version is to adapt to Luis' clarification that MOPS instructions
> don't need to be treated as atomic sequences and can be single-stepped.
> If the OS reschedules the inferior to a different CPU while a main or
> epilogue instruction is executed, it will reset the sequence back to the
> prologue instruction.

Curious -- if you single step each of the instructions, then there will
be kernel code executed on the CPU in between each of the instructions
in the sequence, and other userspace code (of other tasks too, like the
debugger itself, potentially).  So the kernel is free to context switch
in between the instructions in the sequence, and _only_ restarts the sequence
when the task is moved to another CPU?  Weird that it can context switch
without losing state on the same CPU but not to a different CPU.

But then again, I have no idea what the instructions themselves do.  :-)

Pedro Alves <pedro@palves.net> writes:

> On 2024-05-10 06:24, Thiago Jung Bauermann wrote:
>> Hello,
>>
>> This version is to adapt to Luis' clarification that MOPS instructions
>> don't need to be treated as atomic sequences and can be single-stepped.
>> If the OS reschedules the inferior to a different CPU while a main or
>> epilogue instruction is executed, it will reset the sequence back to the
>> prologue instruction.
>
> Curious -- if you single step each of the instructions, then there will
> be kernel code executed on the CPU in between each of the instructions
> in the sequence, and other userspace code (of other tasks too, like the
> debugger itself, potentially).  So the kernel is free to context switch
> in between the instructions in the sequence, and _only_ restarts the sequence
> when the task is moved to another CPU?  Weird that it can context switch
> without losing state on the same CPU but not to a different CPU.

The kernel commits implementing this behaviour actually have a good
explanation of this:

$ git log --reverse -n2 8cd076a67dc8
commit 8536ceaa747174ded7983f13906b225e0c33ac51
Author:     Kristina Martsenko <kristina.martsenko@arm.com>
AuthorDate: Tue May 9 15:22:31 2023 +0100
Commit:     Catalin Marinas <catalin.marinas@arm.com>
CommitDate: Mon Jun 5 17:05:41 2023 +0100

    arm64: mops: handle MOPS exceptions

    The memory copy/set instructions added as part of FEAT_MOPS can take an
    exception (e.g. page fault) part-way through their execution and resume
    execution afterwards. If however the task is re-scheduled and execution
    resumes on a different CPU, then the CPU may take a new type of
    exception to indicate this. This is because the architecture allows two
    options (Option A and Option B) to implement the instructions and a
    heterogeneous system can have different implementations between CPUs.

    In this case the OS has to reset the registers and restart execution
    from the prologue instruction. The algorithm for doing this is provided
    as part of the Arm ARM.

    Add an exception handler for the new exception and wire it up for
    userspace tasks.

    Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
    Link: https://lore.kernel.org/r/20230509142235.3284028-8-kristina.martsenko@arm.com
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

commit 8cd076a67dc8eac5d613b3258f656efa7a54412e
Author:     Kristina Martsenko <kristina.martsenko@arm.com>
AuthorDate: Tue May 9 15:22:32 2023 +0100
Commit:     Catalin Marinas <catalin.marinas@arm.com>
CommitDate: Mon Jun 5 17:05:41 2023 +0100

    arm64: mops: handle single stepping after MOPS exception

    When a MOPS main or epilogue instruction is being executed, the task may
    get scheduled on a different CPU and restart execution from the prologue
    instruction. If the main or epilogue instruction is being single stepped
    then it makes sense to finish the step and take the step exception
    before starting to execute the next (prologue) instruction. So
    fast-forward the single step state machine when taking a MOPS exception.

    This means that if a main or epilogue instruction is single stepped with
    ptrace, the debugger will sometimes observe the PC moving back to the
    prologue instruction. (As already mentioned, this should be rare as it
    only happens when the task is scheduled to another CPU during the step.)

    This also ensures that perf breakpoints count prologue instructions
    consistently (i.e. every time they are executed), rather than skipping
    them when there also happens to be a breakpoint on a main or epilogue
    instruction.

    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
    Link: https://lore.kernel.org/r/20230509142235.3284028-9-kristina.martsenko@arm.com
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

> But then again, I have no idea what the instructions themselves do.  :-)

From the Arm ARM:

"CPYP performs some preconditioning of the arguments suitable for using
the CPYM instruction, and performs an IMPLEMENTATION DEFINED amount of
the memory copy. CPYM performs an IMPLEMENTATION DEFINED amount of the
memory copy. CPYE performs the last part of the memory copy."

Ditto for other kinds of prologue, main and epilogue instructions.

I would say that the prologue instruction copies some poorly aligned
bytes at the beginning of the memory region, then the main instruction
copies the memory in chunks that are convenient for the processor
implementation, then the epilogue instruction copies the remaining poorly
aligned bytes at the end.
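To make the prologue/main/epilogue split described above a bit more concrete, here is a minimal Python toy model. It is a sketch under stated assumptions, not the architectural semantics: the chunk size `CHUNK`, the `CopyState` class and the `prologue`/`main_phase`/`epilogue`/`run_copy` helpers are all hypothetical, "memory" is a `bytearray` indexed by integer addresses, the buffers are assumed not to overlap, and the way the phases divide the work simply follows the interpretation given above (on real hardware the amounts are IMPLEMENTATION DEFINED). Because all progress lives in the destination, source and size "registers", re-running the prologue part-way through, loosely mimicking the kernel's restart after a CPU migration, still completes the copy.

```python
"""Toy model of a prologue/main/epilogue memory copy.

Hypothetical illustration only; this is not the architectural
CPYP/CPYM/CPYE semantics.
"""

CHUNK = 16  # hypothetical; real per-phase amounts are IMPLEMENTATION DEFINED


class CopyState:
    """Stand-in for the registers the sequence updates: dst, src and size."""

    def __init__(self, mem: bytearray, dst: int, src: int, n: int) -> None:
        self.mem, self.dst, self.src, self.n = mem, dst, src, n

    def copy(self, k: int) -> None:
        """Copy k bytes (buffers assumed not to overlap) and advance."""
        self.mem[self.dst:self.dst + k] = self.mem[self.src:self.src + k]
        self.dst += k
        self.src += k
        self.n -= k


def prologue(s: CopyState) -> None:
    """Copy the poorly aligned head so that dst becomes CHUNK-aligned."""
    head = (-s.dst) % CHUNK
    s.copy(min(head, s.n))


def main_phase(s: CopyState, max_chunks: int = 1) -> None:
    """Copy whole chunks; returning after max_chunks models an interruption."""
    for _ in range(max_chunks):
        if s.n < CHUNK:
            break
        s.copy(CHUNK)


def epilogue(s: CopyState) -> None:
    """Copy the remaining tail, shorter than CHUNK once main is done."""
    s.copy(s.n)


def run_copy(mem: bytearray, dst: int, src: int, n: int,
             migrate: bool = False) -> CopyState:
    """Run the three phases. With migrate=True, restart at the prologue once
    after the first main step, loosely mimicking a CPU migration."""
    s = CopyState(mem, dst, src, n)
    prologue(s)
    main_phase(s)          # first main step, then "interrupted"
    if migrate:
        prologue(s)        # restart from the prologue with the current state
    while s.n >= CHUNK:
        main_phase(s)      # re-execute main until only a tail remains
    epilogue(s)
    return s


if __name__ == "__main__":
    for migrate in (False, True):
        mem = bytearray(256)
        mem[0:100] = bytes(range(100))
        run_copy(mem, 131, 0, 100, migrate)
        print(migrate, mem[131:231] == bytes(range(100)))
```

In the real architecture the restart is less trivial: Option A and Option B implementations encode the in-progress state differently, which is why the kernel has to reset the registers using the algorithm from the Arm ARM before resuming at the prologue. This toy model has only one state format, so rerunning `prologue` on the current state is enough.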