Message ID | 20170805001159.12769-1-jeremy.linton@arm.com |
---|---|
Headers | show |
Series | Parse ACPI/PPTT for cache information | expand |
+Cc Xiongfeng (who is also working on the PPTT but focusing on CPU topology) Hi Jeremy, On 2017/8/5 8:11, Jeremy Linton wrote: > ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is > used to describe the processor and cache topologies. Ideally it is > used to extend/override information provided by the hardware, but > right now ARM64 is entirely dependent on firmware provided tables. > > This patch parses the table for the cache topology only. Its quite > trivial to add processor/cluster/???/socket level parsing as well, > but that information isn't as useful as the already provided NUMA > SRAT/SLIT information which provides relative distances. The one > useful thing, is the number of physical sockets but due to the > way arm64 considers "clusters" to be sockets, a larger discussion > is required here. I think we need the socket to represent the true topology of the SoC, which means that considering clusters to be sockets is wrong on ARM64 server platforms, a "socket" needs to be a memory controller attached I think. Take D05 for example, there are two physical SoC sockets on the board but with two CPU DIE (with memory controller) on each physical socket, and 4 clusters on each CPU DIE. When considering clusters as sockets (that's the code for now), there are 16 "sockets" to represent to OS for schedule input, but only 4 NUMA nodes, which are confusing the scheduler a lot... Xiongfeng was working on the CPU topology based on PPTT, and the code is under internal review, if it's OK for you, we can send them out for review comments to see if we can join our effort together, or we can work on top of your patches, as you like :) > > An example of lstopo with this patch: > > [root@mammon-juno-rh ~]# lstopo-no-graphics > Machine (8072MB) > Package L#0 + L2 L#0 (1024KB) > L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0) > L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1) > L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2) > L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3) > Package L#1 + L2 L#1 (2048KB) > L1d L#4 (32KB) + L1i L#4 (48KB) + Core L#4 + PU L#4 (P#4) > L1d L#5 (32KB) + L1i L#5 (48KB) + Core L#5 + PU L#5 (P#5) > HostBridge L#0 > PCIBridge > PCIBridge > PCIBridge > PCI 1095:3132 > Block(Disk) L#0 "sda" > PCIBridge > PCI 11ab:4380 > Net L#1 "enp8s0" > > Jeremy Linton (4): > drivers: base: cacheinfo: Add support for ACPI based firmware tables > arm64: cacheinfo: Add support for ACPI/PPTT generated topology > ACPI/PPTT: Add Processor Properties Topology Table parsing > ACPI: Enable PPTT support on ARM64 > > arch/arm64/kernel/cacheinfo.c | 23 ++- > drivers/acpi/arm64/Kconfig | 3 + > drivers/acpi/arm64/Makefile | 1 + > drivers/acpi/arm64/pptt.c | 389 ++++++++++++++++++++++++++++++++++++++++++ I think PPTT is not ARM64 only, can be used for x86 too, shall we locate them on drivers/acpi? Rafael was working a lot on the PPTT proposal for the spec, I think he can comment on this :) Rafael, what do you think? Thanks Hanjun -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 8/7/2017 4:20 AM, Hanjun Guo wrote: > +Cc Xiongfeng (who is also working on the PPTT but focusing on > CPU topology) > > Hi Jeremy, > > On 2017/8/5 8:11, Jeremy Linton wrote: >> ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is >> used to describe the processor and cache topologies. Ideally it is >> used to extend/override information provided by the hardware, but >> right now ARM64 is entirely dependent on firmware provided tables. >> >> This patch parses the table for the cache topology only. Its quite >> trivial to add processor/cluster/???/socket level parsing as well, >> but that information isn't as useful as the already provided NUMA >> SRAT/SLIT information which provides relative distances. The one >> useful thing, is the number of physical sockets but due to the >> way arm64 considers "clusters" to be sockets, a larger discussion >> is required here. > > I think we need the socket to represent the true topology of > the SoC, which means that considering clusters to be sockets is > wrong on ARM64 server platforms, a "socket" needs to be a memory > controller attached I think. > > Take D05 for example, there are two physical SoC sockets on > the board but with two CPU DIE (with memory controller) on each > physical socket, and 4 clusters on each CPU DIE. > > When considering clusters as sockets (that's the code for now), > there are 16 "sockets" to represent to OS for schedule input, > but only 4 NUMA nodes, which are confusing the scheduler a lot... > > Xiongfeng was working on the CPU topology based on PPTT, and the code > is under internal review, if it's OK for you, we can send them out > for review comments to see if we can join our effort together, or > we can work on top of your patches, as you like :) > >> >> An example of lstopo with this patch: >> >> [root@mammon-juno-rh ~]# lstopo-no-graphics >> Machine (8072MB) >> Package L#0 + L2 L#0 (1024KB) >> L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0) >> L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1) >> L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2) >> L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3) >> Package L#1 + L2 L#1 (2048KB) >> L1d L#4 (32KB) + L1i L#4 (48KB) + Core L#4 + PU L#4 (P#4) >> L1d L#5 (32KB) + L1i L#5 (48KB) + Core L#5 + PU L#5 (P#5) >> HostBridge L#0 >> PCIBridge >> PCIBridge >> PCIBridge >> PCI 1095:3132 >> Block(Disk) L#0 "sda" >> PCIBridge >> PCI 11ab:4380 >> Net L#1 "enp8s0" >> >> Jeremy Linton (4): >> drivers: base: cacheinfo: Add support for ACPI based firmware tables >> arm64: cacheinfo: Add support for ACPI/PPTT generated topology >> ACPI/PPTT: Add Processor Properties Topology Table parsing >> ACPI: Enable PPTT support on ARM64 >> >> arch/arm64/kernel/cacheinfo.c | 23 ++- >> drivers/acpi/arm64/Kconfig | 3 + >> drivers/acpi/arm64/Makefile | 1 + >> drivers/acpi/arm64/pptt.c | 389 >> ++++++++++++++++++++++++++++++++++++++++++ > > I think PPTT is not ARM64 only, can be used for x86 too, > shall we locate them on drivers/acpi? Austin and I have been working on the CPU topology. Sounds like we have some overlap with your work. We have a working prototype at https://source.codeaurora.org/quic/server/kernel/log/?h=jhugo/pptt but are still doing some cleanup and fixes for our needs. Drivers/acpi makes sense to us, and was our working assumption. > > Rafael was working a lot on the PPTT proposal for the > spec, I think he can comment on this :) > > Rafael, what do you think? > > Thanks > Hanjun > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel -- Jeffrey Hugo Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project. -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, On 08/07/2017 05:20 AM, Hanjun Guo wrote: > +Cc Xiongfeng (who is also working on the PPTT but focusing on > CPU topology) > > Hi Jeremy, > > On 2017/8/5 8:11, Jeremy Linton wrote: >> ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is >> used to describe the processor and cache topologies. Ideally it is >> used to extend/override information provided by the hardware, but >> right now ARM64 is entirely dependent on firmware provided tables. >> >> This patch parses the table for the cache topology only. Its quite >> trivial to add processor/cluster/???/socket level parsing as well, >> but that information isn't as useful as the already provided NUMA >> SRAT/SLIT information which provides relative distances. The one >> useful thing, is the number of physical sockets but due to the >> way arm64 considers "clusters" to be sockets, a larger discussion >> is required here. > > I think we need the socket to represent the true topology of > the SoC, which means that considering clusters to be sockets is > wrong on ARM64 server platforms, a "socket" needs to be a memory > controller attached I think. If I understand correctly, your suggesting that the socket isn't really the physical socket, but a grouping at the memory controller level? My general take was that thread/core/socket is now insufficient as even x86 designs now have more levels of hierarchy than that. Mapping those layers, and the numa weighting into something meaningful for linux, well that was more than I wanted to start with. Particularly since the PPTT spec is silent about memory controller attachments at particular node levels, as well as a few other possible short comings. > > Take D05 for example, there are two physical SoC sockets on > the board but with two CPU DIE (with memory controller) on each > physical socket, and 4 clusters on each CPU DIE. > > When considering clusters as sockets (that's the code for now), > there are 16 "sockets" to represent to OS for schedule input, > but only 4 NUMA nodes, which are confusing the scheduler a lot... > > Xiongfeng was working on the CPU topology based on PPTT, and the code > is under internal review, if it's OK for you, we can send them out > for review comments to see if we can join our effort together, or > we can work on top of your patches, as you like :) > >> >> An example of lstopo with this patch: >> >> [root@mammon-juno-rh ~]# lstopo-no-graphics >> Machine (8072MB) >> Package L#0 + L2 L#0 (1024KB) >> L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0) >> L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1) >> L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2) >> L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3) >> Package L#1 + L2 L#1 (2048KB) >> L1d L#4 (32KB) + L1i L#4 (48KB) + Core L#4 + PU L#4 (P#4) >> L1d L#5 (32KB) + L1i L#5 (48KB) + Core L#5 + PU L#5 (P#5) >> HostBridge L#0 >> PCIBridge >> PCIBridge >> PCIBridge >> PCI 1095:3132 >> Block(Disk) L#0 "sda" >> PCIBridge >> PCI 11ab:4380 >> Net L#1 "enp8s0" >> >> Jeremy Linton (4): >> drivers: base: cacheinfo: Add support for ACPI based firmware tables >> arm64: cacheinfo: Add support for ACPI/PPTT generated topology >> ACPI/PPTT: Add Processor Properties Topology Table parsing >> ACPI: Enable PPTT support on ARM64 >> >> arch/arm64/kernel/cacheinfo.c | 23 ++- >> drivers/acpi/arm64/Kconfig | 3 + >> drivers/acpi/arm64/Makefile | 1 + >> drivers/acpi/arm64/pptt.c | 389 >> ++++++++++++++++++++++++++++++++++++++++++ > > I think PPTT is not ARM64 only, can be used for x86 too, > shall we locate them on drivers/acpi? Sure.. But, I was using the assumption that the table would only really be useful on arm64. On x86 the table is unnecessary and generally would have to be dynamically generated by the firmware (as it should be on arm) anyway. Put another way, does anyone want to use it on another platform? > > Rafael was working a lot on the PPTT proposal for the > spec, I think he can comment on this :) > > Rafael, what do you think? > > Thanks > Hanjun -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
+Cc John Gary (who is working on the PPTT with me) Hi Jeffrey, On 2017/8/8 1:10, Jeffrey Hugo wrote: > On 8/7/2017 4:20 AM, Hanjun Guo wrote: >> +Cc Xiongfeng (who is also working on the PPTT but focusing on >> CPU topology) >> >> Hi Jeremy, >> >> On 2017/8/5 8:11, Jeremy Linton wrote: >>> ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is >>> used to describe the processor and cache topologies. Ideally it is >>> used to extend/override information provided by the hardware, but >>> right now ARM64 is entirely dependent on firmware provided tables. >>> >>> This patch parses the table for the cache topology only. Its quite >>> trivial to add processor/cluster/???/socket level parsing as well, >>> but that information isn't as useful as the already provided NUMA >>> SRAT/SLIT information which provides relative distances. The one >>> useful thing, is the number of physical sockets but due to the >>> way arm64 considers "clusters" to be sockets, a larger discussion >>> is required here. >> >> I think we need the socket to represent the true topology of >> the SoC, which means that considering clusters to be sockets is >> wrong on ARM64 server platforms, a "socket" needs to be a memory >> controller attached I think. >> >> Take D05 for example, there are two physical SoC sockets on >> the board but with two CPU DIE (with memory controller) on each >> physical socket, and 4 clusters on each CPU DIE. >> >> When considering clusters as sockets (that's the code for now), >> there are 16 "sockets" to represent to OS for schedule input, >> but only 4 NUMA nodes, which are confusing the scheduler a lot... >> >> Xiongfeng was working on the CPU topology based on PPTT, and the code >> is under internal review, if it's OK for you, we can send them out >> for review comments to see if we can join our effort together, or >> we can work on top of your patches, as you like :) >> >>> >>> An example of lstopo with this patch: >>> >>> [root@mammon-juno-rh ~]# lstopo-no-graphics >>> Machine (8072MB) >>> Package L#0 + L2 L#0 (1024KB) >>> L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0) >>> L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1) >>> L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2) >>> L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3) >>> Package L#1 + L2 L#1 (2048KB) >>> L1d L#4 (32KB) + L1i L#4 (48KB) + Core L#4 + PU L#4 (P#4) >>> L1d L#5 (32KB) + L1i L#5 (48KB) + Core L#5 + PU L#5 (P#5) >>> HostBridge L#0 >>> PCIBridge >>> PCIBridge >>> PCIBridge >>> PCI 1095:3132 >>> Block(Disk) L#0 "sda" >>> PCIBridge >>> PCI 11ab:4380 >>> Net L#1 "enp8s0" >>> >>> Jeremy Linton (4): >>> drivers: base: cacheinfo: Add support for ACPI based firmware tables >>> arm64: cacheinfo: Add support for ACPI/PPTT generated topology >>> ACPI/PPTT: Add Processor Properties Topology Table parsing >>> ACPI: Enable PPTT support on ARM64 >>> >>> arch/arm64/kernel/cacheinfo.c | 23 ++- >>> drivers/acpi/arm64/Kconfig | 3 + >>> drivers/acpi/arm64/Makefile | 1 + >>> drivers/acpi/arm64/pptt.c | 389 ++++++++++++++++++++++++++++++++++++++++++ >> >> I think PPTT is not ARM64 only, can be used for x86 too, >> shall we locate them on drivers/acpi? > > Austin and I have been working on the CPU topology. Sounds like we have some overlap with your work. We have a working prototype at https://source.codeaurora.org/quic/server/kernel/log/?h=jhugo/pptt but are still doing some cleanup and fixes for our needs. > > Drivers/acpi makes sense to us, and was our working assumption. > John and I have also been working on the CPU topology. Our code can be got at https://github.com/fenghusthu/acpi_pptt . I have tested it on the D05 board and it is OK. Maybe we can work together, combine our codes, and make a better code structure. >> >> Rafael was working a lot on the PPTT proposal for the >> spec, I think he can comment on this :) >> >> Rafael, what do you think? >> >> Thanks >> Hanjun >> >> _______________________________________________ >> linux-arm-kernel mailing list >> linux-arm-kernel@lists.infradead.org >> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel > > -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Hanjun On 2017/8/9 22:08, Hanjun Guo wrote: > Hi Jeremy, > > On 8 August 2017 at 01:33, Jeremy Linton <jeremy.linton@arm.com <mailto:jeremy.linton@arm.com>> wrote: >> >> Hi, >> >> On 08/07/2017 05:20 AM, Hanjun Guo wrote: >>> >>> +Cc Xiongfeng (who is also working on the PPTT but focusing on >>> CPU topology) >>> >>> Hi Jeremy, >>> >>> On 2017/8/5 8:11, Jeremy Linton wrote: >>>> >>>> ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is >>>> used to describe the processor and cache topologies. Ideally it is >>>> used to extend/override information provided by the hardware, but >>>> right now ARM64 is entirely dependent on firmware provided tables. >>>> >>>> This patch parses the table for the cache topology only. Its quite >>>> trivial to add processor/cluster/???/socket level parsing as well, >>>> but that information isn't as useful as the already provided NUMA >>>> SRAT/SLIT information which provides relative distances. The one >>>> useful thing, is the number of physical sockets but due to the >>>> way arm64 considers "clusters" to be sockets, a larger discussion >>>> is required here. >>> >>> >>> I think we need the socket to represent the true topology of >>> the SoC, which means that considering clusters to be sockets is >>> wrong on ARM64 server platforms, a "socket" needs to be a memory >>> controller attached I think. >> I agree with you that clusters should not be considered as sockets. Cores in a cluster may share a L2 cache and should stay in a same local sched_domain for better performance. This is done in function cpu_coregroup_mask(), which use cpu_topology[cpu].core_sibling to build a sched_domain. The core_sibling information is also used in sysfs and considered as cores in a socket by "lscpu". I think we may need to add members 'physical_package_id' and 'cluster_sibling' in struct cpu_topology. So that cores in a cluster represented by 'core_sibling' are composed in a same local sched_domain, and cores in a package represented by 'cluster_sibling' are considered as a socket by 'lscpu'. Thanks, Xiongfeng Wang -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html