Message ID: 20241114160131.48616-1-richard.henderson@linaro.org
Series: accel/tcg: Convert victim tlb to IntervalTree
On 11/14/24 08:00, Richard Henderson wrote:
> v1: 20241009150855.804605-1-richard.henderson@linaro.org
>
> The initial idea was: how much can we do with an intelligent data
> structure for the same cost as a linear search through an array?
>
> r~
>
> Richard Henderson (54):
>   util/interval-tree: Introduce interval_tree_free_nodes
>   accel/tcg: Split out tlbfast_flush_locked
>   accel/tcg: Split out tlbfast_{index,entry}
>   accel/tcg: Split out tlbfast_flush_range_locked
>   accel/tcg: Fix flags usage in mmu_lookup1, atomic_mmu_lookup
>   accel/tcg: Assert non-zero length in tlb_flush_range_by_mmuidx*
>   accel/tcg: Assert bits in range in tlb_flush_range_by_mmuidx*
>   accel/tcg: Flush entire tlb when a masked range wraps
>   accel/tcg: Add IntervalTreeRoot to CPUTLBDesc
>   accel/tcg: Populate IntervalTree in tlb_set_page_full
>   accel/tcg: Remove IntervalTree entry in tlb_flush_page_locked
>   accel/tcg: Remove IntervalTree entries in tlb_flush_range_locked
>   accel/tcg: Process IntervalTree entries in tlb_reset_dirty
>   accel/tcg: Process IntervalTree entries in tlb_set_dirty
>   accel/tcg: Use tlb_hit_page in victim_tlb_hit
>   accel/tcg: Pass full addr to victim_tlb_hit
>   accel/tcg: Replace victim_tlb_hit with tlbtree_hit
>   accel/tcg: Remove the victim tlb
>   accel/tcg: Remove tlb_n_used_entries_inc
>   include/exec/tlb-common: Move CPUTLBEntryFull from hw/core/cpu.h
>   accel/tcg: Delay plugin adjustment in probe_access_internal
>   accel/tcg: Call cpu_ld*_code_mmu from cpu_ld*_code
>   accel/tcg: Check original prot bits for read in atomic_mmu_lookup
>   accel/tcg: Preserve tlb flags in tlb_set_compare
>   accel/tcg: Return CPUTLBEntryFull not pointer in probe_access_full_mmu
>   accel/tcg: Return CPUTLBEntryFull not pointer in probe_access_full
>   accel/tcg: Return CPUTLBEntryFull not pointer in probe_access_internal
>   accel/tcg: Introduce tlb_lookup
>   accel/tcg: Partially unify MMULookupPageData and TLBLookupOutput
>   accel/tcg: Merge mmu_lookup1 into mmu_lookup
>   accel/tcg: Always use IntervalTree for code lookups
>   accel/tcg: Link CPUTLBEntry to CPUTLBEntryTree
>   accel/tcg: Remove CPUTLBDesc.fulltlb
>   target/alpha: Convert to TCGCPUOps.tlb_fill_align
>   target/avr: Convert to TCGCPUOps.tlb_fill_align
>   target/i386: Convert to TCGCPUOps.tlb_fill_align
>   target/loongarch: Convert to TCGCPUOps.tlb_fill_align
>   target/m68k: Convert to TCGCPUOps.tlb_fill_align
>   target/m68k: Do not call tlb_set_page in helper_ptest
>   target/microblaze: Convert to TCGCPUOps.tlb_fill_align
>   target/mips: Convert to TCGCPUOps.tlb_fill_align
>   target/openrisc: Convert to TCGCPUOps.tlb_fill_align
>   target/ppc: Convert to TCGCPUOps.tlb_fill_align
>   target/riscv: Convert to TCGCPUOps.tlb_fill_align
>   target/rx: Convert to TCGCPUOps.tlb_fill_align
>   target/s390x: Convert to TCGCPUOps.tlb_fill_align
>   target/sh4: Convert to TCGCPUOps.tlb_fill_align
>   target/sparc: Convert to TCGCPUOps.tlb_fill_align
>   target/tricore: Convert to TCGCPUOps.tlb_fill_align
>   target/xtensa: Convert to TCGCPUOps.tlb_fill_align
>   accel/tcg: Drop TCGCPUOps.tlb_fill
>   accel/tcg: Unexport tlb_set_page*
>   accel/tcg: Merge tlb_fill_align into callers
>   accel/tcg: Return CPUTLBEntryTree from tlb_set_page_full
>
>  include/exec/cpu-all.h | 3 +
>  include/exec/exec-all.h | 65 +-
>  include/exec/tlb-common.h | 68 +-
>  include/hw/core/cpu.h | 75 +-
>  include/hw/core/tcg-cpu-ops.h | 10 -
>  include/qemu/interval-tree.h | 11 +
>  target/alpha/cpu.h | 6 +-
>  target/avr/cpu.h | 7 +-
>  target/i386/tcg/helper-tcg.h | 6 +-
>  target/loongarch/internals.h | 7 +-
>  target/m68k/cpu.h | 7 +-
>  target/microblaze/cpu.h | 7 +-
>  target/mips/tcg/tcg-internal.h | 6 +-
>  target/openrisc/cpu.h | 8 +-
>  target/ppc/internal.h | 7 +-
>  target/riscv/cpu.h | 8 +-
>  target/s390x/s390x-internal.h | 7 +-
>  target/sh4/cpu.h | 8 +-
>  target/sparc/cpu.h | 8 +-
>  target/tricore/cpu.h | 7 +-
>  target/xtensa/cpu.h | 8 +-
>  accel/tcg/cputlb.c | 994 +++++++++++++--------------
>  target/alpha/cpu.c | 2 +-
>  target/alpha/helper.c | 23 +-
>  target/arm/ptw.c | 10 +-
>  target/arm/tcg/helper-a64.c | 4 +-
>  target/arm/tcg/mte_helper.c | 15 +-
>  target/arm/tcg/sve_helper.c | 6 +-
>  target/avr/cpu.c | 2 +-
>  target/avr/helper.c | 19 +-
>  target/i386/tcg/sysemu/excp_helper.c | 36 +-
>  target/i386/tcg/tcg-cpu.c | 2 +-
>  target/loongarch/cpu.c | 2 +-
>  target/loongarch/tcg/tlb_helper.c | 17 +-
>  target/m68k/cpu.c | 2 +-
>  target/m68k/helper.c | 32 +-
>  target/microblaze/cpu.c | 2 +-
>  target/microblaze/helper.c | 33 +-
>  target/mips/cpu.c | 2 +-
>  target/mips/tcg/sysemu/tlb_helper.c | 29 +-
>  target/openrisc/cpu.c | 2 +-
>  target/openrisc/mmu.c | 39 +-
>  target/ppc/cpu_init.c | 2 +-
>  target/ppc/mmu_helper.c | 21 +-
>  target/riscv/cpu_helper.c | 22 +-
>  target/riscv/tcg/tcg-cpu.c | 2 +-
>  target/rx/cpu.c | 19 +-
>  target/s390x/cpu.c | 4 +-
>  target/s390x/tcg/excp_helper.c | 23 +-
>  target/sh4/cpu.c | 2 +-
>  target/sh4/helper.c | 24 +-
>  target/sparc/cpu.c | 2 +-
>  target/sparc/mmu_helper.c | 44 +-
>  target/tricore/cpu.c | 2 +-
>  target/tricore/helper.c | 19 +-
>  target/xtensa/cpu.c | 2 +-
>  target/xtensa/helper.c | 28 +-
>  util/interval-tree.c | 20 +
>  util/selfmap.c | 13 +-
>  59 files changed, 938 insertions(+), 923 deletions(-)

I tested this change by booting a debian x86_64 image, it works as expected.

I noticed that this change does not come for free (64s before, 82s after - 1.3x). Is that
acceptable?

Pierrick
On 11/14/24 11:56, Pierrick Bouvier wrote:
> I tested this change by booting a debian x86_64 image, it works as expected.
>
> I noticed that this change does not come for free (64s before, 82s after - 1.3x).
> Is that acceptable?

Well, no. But I didn't notice any change during boot tests. I used hyperfine over
'make check-functional'.

I would only expect benefits to be seen with longer-lived VMs, since a boot test
doesn't run applications long enough to see TLB entries accumulate. I have not
attempted to create a reproducible test for that so far.

r~
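For reference, a hyperfine comparison along those lines might be driven with something
like the sketch below; the build directory names, run count and warmup are illustrative,
not taken from the thread:

    # Compare 'make check-functional' wall-clock time across two QEMU builds;
    # build-before/ and build-after/ are placeholder build directories.
    hyperfine --warmup 1 --runs 3 \
        'make -C build-before check-functional' \
        'make -C build-after check-functional'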
On 11/14/24 12:58, Richard Henderson wrote:
> On 11/14/24 11:56, Pierrick Bouvier wrote:
>> I tested this change by booting a debian x86_64 image, it works as expected.
>>
>> I noticed that this change does not come for free (64s before, 82s after - 1.3x).
>> Is that acceptable?
>
> Well, no. But I didn't notice any change during boot tests. I used hyperfine over
> 'make check-functional'.
>
> I would only expect benefits to be seen with longer-lived VMs, since a boot test
> doesn't run applications long enough to see TLB entries accumulate. I have not
> attempted to create a reproducible test for that so far.

I didn't use check-functional either.
I used a vanilla debian bookworm install, with a modified /etc/rc.local calling
poweroff, and ran 3 times with/without the change, with turbo disabled on my CPU.

> r~
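For reference, that boot-to-poweroff measurement could be scripted roughly as in the
sketch below; the no_turbo knob is specific to the intel_pstate driver, and the QEMU
binary, memory size and image path are placeholders:

    # Disable turbo so runs are comparable (intel_pstate systems only).
    echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo

    # The guest's /etc/rc.local ends with 'poweroff', so each run exits on
    # its own; the elapsed wall-clock time is the figure being compared.
    for i in 1 2 3; do
        /usr/bin/time -f '%e s' \
            ./qemu-system-x86_64 -accel tcg -m 4G -nographic \
            -drive file=bookworm.qcow2,if=virtio,format=qcow2
    done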
Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:

> On 11/14/24 12:58, Richard Henderson wrote:
>> On 11/14/24 11:56, Pierrick Bouvier wrote:
>>> I tested this change by booting a debian x86_64 image, it works as expected.
>>>
>>> I noticed that this change does not come for free (64s before, 82s after - 1.3x).
>>> Is that acceptable?
>> Well, no. But I didn't notice any change during boot tests. I used hyperfine over
>> 'make check-functional'.
>> I would only expect benefits to be seen with longer-lived VMs, since a boot test
>> doesn't run applications long enough to see TLB entries accumulate. I have not
>> attempted to create a reproducible test for that so far.
>
> I didn't use check-functional either.
> I used a vanilla debian bookworm install, with a modified /etc/rc.local calling
> poweroff, and ran 3 times with/without the change, with turbo disabled on my CPU.

If you want to really stress the VM handling you should use stress-ng to
exercise page faulting and recovery. Wrap it up in a systemd unit for a
reproducible test:

cat /etc/systemd/system/benchmark-stress-ng.service
# A benchmark target
#
# This shuts down once the boot has completed

[Unit]
Description=Default
Requires=basic.target
After=basic.target
AllowIsolate=yes

[Service]
Type=oneshot
ExecStart=stress-ng --perf --iomix 4 --vm 2 --timeout 10s
ExecStartPost=/sbin/poweroff

[Install]
WantedBy=multi-user.target

and then call with something like:

-append "root=/dev/sda2 console=ttyAMA0 systemd.unit=benchmark-stress-ng.service"

>
>> r~
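For reference, a complete invocation wrapping that -append line might look like the
sketch below; the machine model, kernel, initrd and disk image are placeholders, and
the drive is attached via virtio-scsi so that it shows up as /dev/sda in the guest to
match root=/dev/sda2:

    ./qemu-system-aarch64 -M virt -cpu max -accel tcg -m 4G -nographic \
        -kernel vmlinuz -initrd initrd.img \
        -drive file=bookworm.qcow2,format=qcow2,if=none,id=hd \
        -device virtio-scsi-pci -device scsi-hd,drive=hd \
        -append 'root=/dev/sda2 console=ttyAMA0 systemd.unit=benchmark-stress-ng.service'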
On 11/15/24 03:43, Alex Bennée wrote:
> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>
>> On 11/14/24 12:58, Richard Henderson wrote:
>>> On 11/14/24 11:56, Pierrick Bouvier wrote:
>>>> I tested this change by booting a debian x86_64 image, it works as expected.
>>>>
>>>> I noticed that this change does not come for free (64s before, 82s after - 1.3x).
>>>> Is that acceptable?
>>> Well, no. But I didn't notice any change during boot tests. I used hyperfine over
>>> 'make check-functional'.
>>> I would only expect benefits to be seen with longer-lived VMs, since a boot test
>>> doesn't run applications long enough to see TLB entries accumulate. I have not
>>> attempted to create a reproducible test for that so far.
>>>
>>
>> I didn't use check-functional either.
>> I used a vanilla debian bookworm install, with a modified /etc/rc.local calling
>> poweroff, and ran 3 times with/without the change, with turbo disabled on my CPU.
>
> If you want to really stress the VM handling you should use stress-ng to
> exercise page faulting and recovery. Wrap it up in a systemd unit for a
> reproducible test:
>
> cat /etc/systemd/system/benchmark-stress-ng.service
> # A benchmark target
> #
> # This shuts down once the boot has completed
>
> [Unit]
> Description=Default
> Requires=basic.target
> After=basic.target
> AllowIsolate=yes
>
> [Service]
> Type=oneshot
> ExecStart=stress-ng --perf --iomix 4 --vm 2 --timeout 10s
> ExecStartPost=/sbin/poweroff
>
> [Install]
> WantedBy=multi-user.target
>
> and then call with something like:
>
> -append "root=/dev/sda2 console=ttyAMA0 systemd.unit=benchmark-stress-ng.service"

Thanks for the advice.

>>
>>> r~