Message ID | 20201013132535.3599453-1-f4bug@amsat.org |
---|---|
Series | target/mips: Make the number of TLB entries a CPU property |
On 10/13/20 6:25 AM, Philippe Mathieu-Daudé wrote:
> Yocto developers have expressed interest in running MIPS32
> CPU with custom number of TLB:
> https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg03428.html
>
> Help them by making the number of TLB entries a CPU property,
> keeping our set of CPU definitions in sync with real hardware.

You mean keeping the 34kf model within qemu in sync, rather than creating a nonsense model that doesn't exist.

Question: is this cpu parameter useful for anything else?

Because the ideal solution for a CI loop is to use one of the mips cpu models that has the hw page table walker (CP0C3_PW). Having qemu able to refill the tlb itself is massively faster.

We do not currently implement a mips cpu that has the PW. When I downloaded the document set in 2014, I only got the mips64-pra and neglected to get the mips32-pra. So I don't actually know if the PW applies to mips32. I do know that there's nothing in the kernel that ifdefs around it.

So:

(1) Does anyone know if the PW is incompatible with mips32?
(2) If not, was there any mips32 hw built with PW that we could model?
(3) If not, would a cpu parameter to force-enable PW for any r2+ cpu be more useful than frobbing the number of tlb entries?

r~
Hi Philippe,

Thank you very much for looking at this. I gave your 3-patch series a spin in the original setup, and as expected, with the '-cpu 34Kf,tlb-entries=64' option it works great.

If nobody objects and your patches can be merged, we would greatly appreciate it.

Thanks,
Victor
On 10/13/20 4:11 PM, Richard Henderson wrote:
> On 10/13/20 6:25 AM, Philippe Mathieu-Daudé wrote:
>> Yocto developers have expressed interest in running MIPS32
>> CPU with custom number of TLB:
>> https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg03428.html
>>
>> Help them by making the number of TLB entries a CPU property,
>> keeping our set of CPU definitions in sync with real hardware.
>
> You mean keeping the 34kf model within qemu in sync, rather than creating a
> nonsense model that doesn't exist.
>
> Question: is this cpu parameter useful for anything else?
>
> Because the ideal solution for a CI loop is to use one of the mips cpu models
> that has the hw page table walker (CP0C3_PW). Having qemu able to refill
> the tlb itself is massively faster.
>
> We do not currently implement a mips cpu that has the PW. When I downloaded

Bah, "mips32 cpu".

We do have the P5600, which does have it, though the code is wrapped up in TARGET_MIPS64. I'll also note that the code could be better placed [*].

> (1) Does anyone know if the PW is incompatible with mips32?

I've since found a copy of the mips32-pra in the Wayback Machine and have answered this as "no": PW is documented for mips32.

> (2) If not, was there any mips32 hw built with PW
> that we could model?

But I still don't know about this.

A further question for the Yocto folks: could you make use of a 64-bit kernel in testing a 32-bit userspace?

And I guess maybe we should update our recommendations in the docs. Thoughts on this, Phil?

r~

[*] Where it is now, it can't be used for gdb (mips_cpu_get_phys_page_debug). When used there, we should not modify cpu state, i.e. actually insert the PTE into the MIPS TLB, but we could still make use of the information available.
Hi Richard,

Please forgive my cumbersome mail client at work. Please look inline for 'victor>'.
On Wed, 2020-10-14 at 01:36 +0000, Victor Kamensky (kamensky) wrote:
> Thank you very much for looking at this. I gave your 3-patch series
> a spin in the original setup, and as expected, with the
> '-cpu 34Kf,tlb-entries=64' option it works great.
>
> If nobody objects and your patches can be merged, we
> would greatly appreciate it.

Speaking as one of the Yocto Project maintainers, this is really helpful for us, thanks!

qemumips is one of our slowest platforms for automated testing, so this performance improvement helps a lot.

Cheers,

Richard
On Tue, 2020-10-13 at 19:22 -0700, Richard Henderson wrote:
> [...]
> A further question for the Yocto folks: could you make use of a
> 64-bit kernel in testing a 32-bit userspace?

We run testing of a 64-bit kernel with 64-bit userspace and a 32-bit kernel with 32-bit userspace; we've tested that for years. I know some of our users do use 64-bit kernels with 32-bit userspace, and we do limited testing of that for multilib support.

I think we did try testing an R2 setup but found little performance change, and I think it may have been unreliable, so we didn't make the switch. We did move to 34kf relatively recently, as that performed marginally better and matched qemu's recommendations.

We've also run into a lot of problems with 32-bit mips in general if we go over 256MB of memory, since that seems to trigger highmem and hangs regularly for us. We're working on infrastructure to save out those hung VMs to help us debug such issues, but don't have that yet. It's not regular enough, and unfortunately we don't yet have the expertise to debug it properly.

There is a question of how valid a 32-bit kernel is these days, particularly given the memory issues we seem to run into there with larger images.

Cheers,

Richard
On 2020/10/13 21:25, Philippe Mathieu-Daudé wrote:
> Allow changing the number of TLB entries for
> testing/tuning purposes.
>
> Example to force a 34Kf cpu with 64 TLB:
>
> $ qemu-system-mipsel -cpu 34Kf,tlb-entries=64 ...
>
> This is helpful for developers of the Yocto Project [*]:
>
> Yocto Project uses qemu-system-mips 34Kf cpu model, to run 32bit
> MIPS CI loop. It was observed that in this case CI test execution
> time was almost twice longer than 64bit MIPS variant that runs
> under MIPS64R2-generic model. It was investigated and concluded
> that the difference in number of TLBs 16 in 34Kf case vs 64 in
> MIPS64R2-generic is responsible for most of CI real time execution
> difference. Because with 16 TLBs linux user-land thrashes TLB more
> and it needs to execute more instructions in TLB refill handler
> calls, as result it runs much longer.
>
> [*] https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg03428.html
>
> Buglink: https://bugzilla.yoctoproject.org/show_bug.cgi?id=13992
> Reported-by: Victor Kamensky <kamensky@cisco.com>
> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> ---

Hi Philippe,

Could we name this property vtlb-entries?

MIPS R2 introduced the dual-TLB feature: the entries specified here are the VTLB entries, while the FTLB is another set of entries with a fixed page size.

Although MIPS TCG hasn't implemented dual-TLB yet, the name could prevent future confusion.

Thanks.

- Jiaxun
On 10/14/20 12:20 PM, Jiaxun Yang wrote:
> On 2020/10/13 21:25, Philippe Mathieu-Daudé wrote:
>> Allow changing the number of TLB entries for
>> testing/tuning purposes.
>> [...]
> Hi Philippe,
>
> Could we name this property vtlb-entries?
>
> MIPS R2 introduced the dual-TLB feature: the entries specified here
> are the VTLB entries, while the FTLB is another set of entries with
> a fixed page size.
>
> Although MIPS TCG hasn't implemented dual-TLB yet, the name could
> prevent future confusion.

Sure, good idea. I'll follow Richard's suggestion first and have a look at the P5600.
On 10/14/20 9:14 AM, Richard Purdie wrote:
> On Wed, 2020-10-14 at 01:36 +0000, Victor Kamensky (kamensky) wrote:
>> Thank you very much for looking at this. I gave your 3-patch series
>> a spin in the original setup, and as expected, with the
>> '-cpu 34Kf,tlb-entries=64' option it works great.
>>
>> If nobody objects and your patches can be merged, we
>> would greatly appreciate it.
>
> Speaking as one of the Yocto Project maintainers, this is really
> helpful for us, thanks!
>
> qemumips is one of our slowest platforms for automated testing, so this
> performance improvement helps a lot.

Could you try Richard's suggestion, using '-cpu P5600' instead? It has been available in Linux since v5.8.
Just to keep this on the same thread, here is a piece of information I found:

I looked at the "MIPS32® 34Kf™ Processor Core Datasheet" [1].

Page 8, in the "Joint TLB (JTLB)" section, says:

"The JTLB is a fully associative TLB cache containing 16, 32, or 64-dual-entries mapping up to 128 virtual pages to their corresponding physical addresses."

So the "34Kf-64tlb" cpu model I proposed turns out not to be "fictitious" after all: having 64 TLB entries is within this CPU's spec. It is not clear why the original 34Kf model chose the worst-case TLB count; the commit log where 34Kf was introduced does not have much detail.

So IMO, on the 34Kf route we have the following choices:

1) I can rephrase the commit message and resubmit the commit for the "34Kf-64tlb" cpu model, if it could be merged.

2) We can bump the number of TLB entries to 64 in the existing 34Kf model, since that is within the spec.

3) Use Phil's series; the tlb-entries cpu parameter would cover all 3 variants (16, 32, 64 TLB entries) allowed by the 34Kf datasheet.

Please see inline regarding the requested '-cpu P5600' testing. Look for 'victor2>'.

[1] https://s3-eu-west-1.amazonaws.com/downloads-mips/documents/MD00419-2B-34Kf-DTS-01.20.pdf
On Wed, Oct 14, 2020 at 1:20 PM Victor Kamensky (kamensky) <kamensky@cisco.com> wrote:
>
> Just to keep this on the same thread, here is a piece of information
> I found:
>
> I looked at the "MIPS32® 34Kf™ Processor Core Datasheet" [1].
>
> Page 8, in the "Joint TLB (JTLB)" section, says:
>
> "The JTLB is a fully associative TLB cache containing 16, 32,
> or 64-dual-entries mapping up to 128 virtual pages to their
> corresponding physical addresses."
>
> So the "34Kf-64tlb" cpu model I proposed turns out not to be "fictitious"
> after all: having 64 TLB entries is within this CPU's spec. It is not clear
> why the original 34Kf model chose the worst-case TLB count; the commit log
> where 34Kf was introduced does not have much detail.

Thanks for digging this information out of the CPU specs. It seems using 64 TLB entries as the default might be a good option for 34Kf, then.

>
> So IMO, on the 34Kf route we have the following choices:
>
> 1) I can rephrase the commit message and resubmit the commit for the
> "34Kf-64tlb" cpu model, if it could be merged.
>
> 2) We can bump the number of TLB entries to 64 in the existing 34Kf model,
> since that is within the spec.

This looks like a good option, since it is within spec and is backward compatible.

>
> 3) Use Phil's series; the tlb-entries cpu parameter would cover all

I agree.

> 3 variants (16, 32, 64 TLB entries) allowed by the 34Kf datasheet.
>
> Please see inline regarding the requested '-cpu P5600' testing.
> Look for 'victor2>'.
>
> [1] https://s3-eu-west-1.amazonaws.com/downloads-mips/documents/MD00419-2B-34Kf-DTS-01.20.pdf
>
> ________________________________________
> From: Philippe Mathieu-Daudé <philippe.mathieu.daude@gmail.com> on behalf of Philippe Mathieu-Daudé <f4bug@amsat.org>
> Sent: Wednesday, October 14, 2020 7:53 AM
> To: Richard Purdie; Victor Kamensky (kamensky); qemu-devel@nongnu.org
> Cc: Aleksandar Rikalo; Khem Raj; Aleksandar Markovic; Aurelien Jarno; Richard Henderson
> Subject: Re: [RFC PATCH 0/3] target/mips: Make the number of TLB entries a CPU property
>
> On 10/14/20 9:14 AM, Richard Purdie wrote:
> [...]
>
> Could you try Richard's suggestion, using '-cpu P5600' instead?
> It has been available in Linux since v5.8.
>
> victor2> I've tried the exact image that works on the 34Kf and 34Kf-64tlb
> victor2> models with '-cpu P5600'. It does not boot: it dies in init (systemd).
> victor2> I can look under gdb with qemu's -s -S options to see what is going
> victor2> on there, but it will take time.
> victor2> If someone has some clues about what might cause it, please let
> victor2> me know.
> victor2> Here is high-level information about the setup:
> victor2> - qemu version is 5.1.0
> victor2> - kernel base version is 5.8.9
> victor2> - systemd version is 1_246.6
> victor2> - user-land CPU-related build options "-meb -meb -mabi=32 -mhard-float -march=mips32r2 -mllsc -mips32r2"
>
> Thanks,
> Victor
Hi Guys,

I looked at the issue with the P5600 machine by debugging the kernel under gdb.

arch_check_elf from arch/mips/kernel/elf.c rejects our sysroot binaries with -ENOEXEC, since our binaries do not have the EF_MIPS_NAN2008 ELF header flag set and this CPU model does not have cpu_has_nan_legacy, i.e. mips_use_nan_legacy=false. So at a minimum we would need to change our user-land ABI compilation flags to match the EF_MIPS_NAN2008 requirement. I am not sure whether that is an option, or how it would impact older CPUs.

For the experiment's sake I added the ieee754=relaxed kernel option to override the mips_use_nan_legacy setting, and the system shows some signs of life after that, but then it hangs further down the road. I briefly tried to look at this, but it is not clear what is going on. At first glance the system seems to be thrashing on nested do_page_fault calls. It might be that something is missing in our kernel config, or we are hitting a kernel bug, or a bug in the P5600 qemu model. It is hard to tell right now.

Is it fair to say that we have put enough effort into exploring the P5600 route, and that it does not work for us without substantial additional work?

Is it possible to come back to the 34Kf route, with a very small, localized, well-defined change that bumps the number of TLB entries for a model that we know works well for us?

Since we figured out that the 34Kf spec allows 16, 32, or 64 TLB entries, my first preference would be to use Phil's patch series with the review comments addressed. Additionally, it would be great to set the number of 34Kf TLB entries to 64 by default. If anyone out there (IMO unlikely) depends on the model having only 16 TLB entries, they can use the cpu parameter to set it back to 16.

My second choice is to accept the 34Kf-64tlb model, after I rephrase the commit message.

Thanks,
Victor
On 10/15/20 11:56 AM, Victor Kamensky (kamensky) wrote:
> Is it possible to come back to the 34Kf route, with a very small,
> localized, well-defined change that bumps the number of TLB
> entries for a model that we know works well for us?

Yes, thanks for testing.

I think we should also add a property to enable Config3.PW for any cpu, and see how that gets on. But simply adjusting the number of tlb entries is a good start, and is the only thing that will work for older kernels.

r~