Message ID | 170568485801.1008395.12244787918793980621.stgit@djiang5-mobl3 |
---|---|
Series | cxl: Add support to report region access coordinates to numa nodes |
Dave Jiang wrote:
> Refactor the common code of combining coordinates in order to reduce code.
> Create a new function cxl_cooordinates_combine() it combine two 'struct
> access_coordinate'.
>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
>  drivers/cxl/core/cdat.c | 32 +++++++++++++++++++++++---------
>  drivers/cxl/core/port.c | 18 ++----------------
>  drivers/cxl/cxl.h       |  4 ++++
>  3 files changed, 29 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
> index cd84d87f597a..4d542627d02d 100644
> --- a/drivers/cxl/core/cdat.c
> +++ b/drivers/cxl/core/cdat.c
> @@ -183,15 +183,7 @@ static int cxl_port_perf_data_calculate(struct cxl_port *port,
>  	xa_for_each(dsmas_xa, index, dent) {
>  		int qos_class;
>
> -		dent->coord.read_latency = dent->coord.read_latency +
> -					   c.read_latency;
> -		dent->coord.write_latency = dent->coord.write_latency +
> -					    c.write_latency;
> -		dent->coord.read_bandwidth = min_t(int, c.read_bandwidth,
> -						   dent->coord.read_bandwidth);
> -		dent->coord.write_bandwidth = min_t(int, c.write_bandwidth,
> -						    dent->coord.write_bandwidth);
> -
> +		cxl_coordinates_combine(&dent->coord, &dent->coord, &c);
>  		dent->entries = 1;
>  		rc = cxl_root->ops->qos_class(root_port, &dent->coord, 1, &qos_class);
>  		if (rc != 1)
> @@ -514,4 +506,26 @@ void cxl_switch_parse_cdat(struct cxl_port *port)
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_switch_parse_cdat, CXL);
>
> +/**
> + * cxl_coordinates_combine - Combine the two input coordinates into the first
> + *
> + * @c1: first coordinate, to be written to
> + * @c2: second coordinate
> + */
> +void cxl_coordinates_combine(struct access_coordinate *out,
> +			     struct access_coordinate *c1,
> +			     struct access_coordinate *c2)
> +{
> +	if (c2->write_bandwidth)
> +		out->write_bandwidth = min(c1->write_bandwidth,
> +					   c2->write_bandwidth);
> +	out->write_latency = c1->write_latency + c2->write_latency;
> +
> +	if (c2->read_bandwidth)
> +		out->read_bandwidth = min(c1->read_bandwidth,
> +					  c2->read_bandwidth);
> +	out->read_latency = c1->read_latency + c2->read_latency;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_coordinates_combine, CXL);

There is no need for EXPORT_SYMBOL() when the definition and the only
caller exist within the same compilation unit, cxl_core.o.

However, given there is nothing "CXL" about this function it likely wants
to move out of cxl_core.o if another caller ever arrives.
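
For readers following the combine semantics discussed above, here is a minimal userspace sketch, not the kernel code. It assumes a stand-in `struct coord` in place of the kernel's `struct access_coordinate`, and the helper names `min_uint()` and `coord_combine()` are hypothetical. It mirrors the rule in the quoted hunk: latencies add along the path, bandwidth is capped by the slower side, and a zero bandwidth in the second input leaves the output bandwidth untouched.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's struct access_coordinate (assumption). */
struct coord {
	unsigned int read_bandwidth;
	unsigned int write_bandwidth;
	unsigned int read_latency;
	unsigned int write_latency;
};

/* Hypothetical helper; the kernel patch uses the min() macro instead. */
static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Combine c1 and c2 into out. Latencies accumulate; bandwidth is limited
 * by the smaller of the two. As in the quoted patch, a zero bandwidth in
 * c2 means "no data", so out's bandwidth is left as it was.
 * out may alias c1, which is how the quoted caller uses it.
 */
static void coord_combine(struct coord *out, const struct coord *c1,
			  const struct coord *c2)
{
	assert(out && c1 && c2);

	if (c2->write_bandwidth)
		out->write_bandwidth = min_uint(c1->write_bandwidth,
						c2->write_bandwidth);
	out->write_latency = c1->write_latency + c2->write_latency;

	if (c2->read_bandwidth)
		out->read_bandwidth = min_uint(c1->read_bandwidth,
					       c2->read_bandwidth);
	out->read_latency = c1->read_latency + c2->read_latency;
}
```

Calling `coord_combine(&a, &a, &b)` updates `a` in place, matching the `cxl_coordinates_combine(&dent->coord, &dent->coord, &c)` call pattern in the quoted diff.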
On Fri, Jan 19, 2024 at 10:24:11AM -0700, Dave Jiang wrote:
> For the numa nodes that are not created by SRAT, no memory_target is
> allocated and is not managed by the HMAT_REPORTING code. Therefore
> hmat_callback() memory hotplug notifier will exit early on those NUMA
> nodes. The CXL memory hotplug notifier will need to call
> node_set_perf_attrs() directly in order to setup the access sysfs
> attributes.
>
> In acpi_numa_init(), the last proximity domain (pxm) id created by SRAT is
> stored. Add a helper function acpi_node_backed_by_real_pxm() in order to
> check if a NUMA node id is defined by SRAT or created by CFMWS or some
> other methods.

I'm thinking the 'or some other methods' can be dropped.

In chat, we mentioned emulated nodes, but they don't make PXM
assignments. Maybe I misunderstand, but I thought NUMA emulation can
only be enabled when there is no physical NUMA architecture.

Aside from clearing up the emulated or other nodes story...LGTM.

Reviewed-by: Alison Schofield <alison.schofield@intel.com>

>
> node_set_perf_attrs() symbol is exported to allow update of perf attribs
> for a node. The sysfs path of
> /sys/devices/system/node/nodeX/access0/initiators/* is created by
> ndoe_set_perf_attrs() for the various attributes where nodeX is matched
> to the NUMA node of the CXL region.
>
> Cc: Rafael J. Wysocki <rafael@kernel.org>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
>  drivers/acpi/numa/srat.c  | 11 +++++++++++
>  drivers/base/node.c       |  1 +
>  drivers/cxl/core/cdat.c   |  5 +++++
>  drivers/cxl/core/core.h   |  1 +
>  drivers/cxl/core/region.c |  7 ++++++-
>  include/linux/acpi.h      |  1 +
>  6 files changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
> index 12f330b0eac0..2f6f15b3891d 100644
> --- a/drivers/acpi/numa/srat.c
> +++ b/drivers/acpi/numa/srat.c
> @@ -29,6 +29,8 @@ static int node_to_pxm_map[MAX_NUMNODES]
>  unsigned char acpi_srat_revision __initdata;
>  static int acpi_numa __initdata;
>
> +static int last_real_pxm;
> +
>  void __init disable_srat(void)
>  {
>  	acpi_numa = -1;
> @@ -536,6 +538,7 @@ int __init acpi_numa_init(void)
>  		if (node_to_pxm_map[i] > fake_pxm)
>  			fake_pxm = node_to_pxm_map[i];
>  	}
> +	last_real_pxm = fake_pxm;
>  	fake_pxm++;
>  	acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, acpi_parse_cfmws,
>  			      &fake_pxm);
> @@ -547,6 +550,14 @@ int __init acpi_numa_init(void)
>  	return 0;
>  }
>
> +bool acpi_node_backed_by_real_pxm(int nid)
> +{
> +	int pxm = node_to_pxm(nid);
> +
> +	return pxm <= last_real_pxm;
> +}
> +EXPORT_SYMBOL_GPL(acpi_node_backed_by_real_pxm);
> +
>  static int acpi_get_pxm(acpi_handle h)
>  {
>  	unsigned long long pxm;
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index b4a449f07f2a..8d0b09769b77 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -215,6 +215,7 @@ void node_set_perf_attrs(unsigned int nid, struct access_coordinate *coord,
>  		}
>  	}
>  }
> +EXPORT_SYMBOL_GPL(node_set_perf_attrs);
>
>  /**
>   * struct node_cache_info - Internal tracking for memory node caches
> diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
> index 3556c897ece4..7d7163f999e8 100644
> --- a/drivers/cxl/core/cdat.c
> +++ b/drivers/cxl/core/cdat.c
> @@ -626,3 +626,8 @@ int cxl_update_hmat_access_coordinates(int nid, struct cxl_region *cxlr,
>  {
>  	return hmat_update_target_coordinates(nid, &cxlr->coord[access], access);
>  }
> +
> +bool cxl_need_node_perf_attrs_update(int nid)
> +{
> +	return !acpi_node_backed_by_real_pxm(nid);
> +}
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index e19800a7ce06..bc5a95665aa0 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -92,5 +92,6 @@ long cxl_pci_get_latency(struct pci_dev *pdev);
>
>  int cxl_update_hmat_access_coordinates(int nid, struct cxl_region *cxlr,
>  				       enum access_coordinate_class access);
> +bool cxl_need_node_perf_attrs_update(int nid);
>
>  #endif /* __CXL_CORE_H__ */
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index ae1f34e1cd05..66f126067bda 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -3084,7 +3084,12 @@ static bool cxl_region_update_coordinates(struct cxl_region *cxlr, int nid)
>
>  	for (int i = 0; i < ACCESS_COORDINATE_MAX; i++) {
>  		if (cxlr->coord[i].read_bandwidth) {
> -			rc = cxl_update_hmat_access_coordinates(nid, cxlr, i);
> +			rc = 0;
> +			if (cxl_need_node_perf_attrs_update(nid))
> +				node_set_perf_attrs(nid, &cxlr->coord[i], i);
> +			else
> +				rc = cxl_update_hmat_access_coordinates(nid, cxlr, i);
> +
>  			if (rc == 0)
>  				cset++;
>  		}
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index 1c664948b2ae..3067c6aad431 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -447,6 +447,7 @@ static inline int hmat_update_target_coordinates(int nid,
>  #ifdef CONFIG_ACPI_NUMA
>  int acpi_map_pxm_to_node(int pxm);
>  int acpi_get_node(acpi_handle handle);
> +bool acpi_node_backed_by_real_pxm(int nid);
>
>  /**
>   * pxm_to_online_node - Map proximity ID to online node
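
The "last real PXM" bookkeeping in the quoted srat.c hunks can be illustrated with a small userspace sketch. Everything here is hypothetical scaffolding (the `sketch_` names, the fixed table size, the `-1` sentinel); it only models the idea that proximity domains seen up to the end of SRAT parsing count as real, and anything assigned afterwards (for example for a CFMWS window) counts as fake.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_NODES 8
#define SKETCH_PXM_INVAL (-1)	/* sentinel for "no PXM assigned" (assumption) */

/* Hypothetical stand-ins for node_to_pxm_map[] and last_real_pxm. */
static int sketch_node_to_pxm[SKETCH_MAX_NODES];
static int sketch_last_real_pxm;

/* Clear the table so that no node has a proximity domain yet. */
static void sketch_reset(void)
{
	for (int i = 0; i < SKETCH_MAX_NODES; i++)
		sketch_node_to_pxm[i] = SKETCH_PXM_INVAL;
	sketch_last_real_pxm = SKETCH_PXM_INVAL;
}

/*
 * Call once the "SRAT" entries are in the table: remember the highest
 * PXM seen so far and return the first PXM id available for fake nodes.
 */
static int sketch_srat_done(void)
{
	int highest = SKETCH_PXM_INVAL;

	for (int i = 0; i < SKETCH_MAX_NODES; i++)
		if (sketch_node_to_pxm[i] > highest)
			highest = sketch_node_to_pxm[i];

	sketch_last_real_pxm = highest;
	return highest + 1;
}

/*
 * True when the node's PXM was present at the end of SRAT parsing.
 * Note: in this sketch a node with no PXM at all also compares as
 * "real", because the sentinel is below every valid PXM.
 */
static bool sketch_node_backed_by_real_pxm(int nid)
{
	assert(nid >= 0 && nid < SKETCH_MAX_NODES);
	return sketch_node_to_pxm[nid] <= sketch_last_real_pxm;
}
```

A caller in the spirit of `cxl_need_node_perf_attrs_update()` would simply negate the result: nodes that are not backed by a real PXM are the ones that need `node_set_perf_attrs()` called directly.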
On 1/19/24 17:35, Dan Williams wrote:
> Dave Jiang wrote:
>> Refactor the common code of combining coordinates in order to reduce code.
>> Create a new function cxl_cooordinates_combine() it combine two 'struct
>> access_coordinate'.
>>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>> ---
>>  drivers/cxl/core/cdat.c | 32 +++++++++++++++++++++++---------
>>  drivers/cxl/core/port.c | 18 ++----------------
>>  drivers/cxl/cxl.h       |  4 ++++
>>  3 files changed, 29 insertions(+), 25 deletions(-)
>>
>> diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
>> index cd84d87f597a..4d542627d02d 100644
>> --- a/drivers/cxl/core/cdat.c
>> +++ b/drivers/cxl/core/cdat.c
>> @@ -183,15 +183,7 @@ static int cxl_port_perf_data_calculate(struct cxl_port *port,
>>  	xa_for_each(dsmas_xa, index, dent) {
>>  		int qos_class;
>>
>> -		dent->coord.read_latency = dent->coord.read_latency +
>> -					   c.read_latency;
>> -		dent->coord.write_latency = dent->coord.write_latency +
>> -					    c.write_latency;
>> -		dent->coord.read_bandwidth = min_t(int, c.read_bandwidth,
>> -						   dent->coord.read_bandwidth);
>> -		dent->coord.write_bandwidth = min_t(int, c.write_bandwidth,
>> -						    dent->coord.write_bandwidth);
>> -
>> +		cxl_coordinates_combine(&dent->coord, &dent->coord, &c);
>>  		dent->entries = 1;
>>  		rc = cxl_root->ops->qos_class(root_port, &dent->coord, 1, &qos_class);
>>  		if (rc != 1)
>> @@ -514,4 +506,26 @@ void cxl_switch_parse_cdat(struct cxl_port *port)
>>  }
>>  EXPORT_SYMBOL_NS_GPL(cxl_switch_parse_cdat, CXL);
>>
>> +/**
>> + * cxl_coordinates_combine - Combine the two input coordinates into the first
>> + *
>> + * @c1: first coordinate, to be written to
>> + * @c2: second coordinate
>> + */
>> +void cxl_coordinates_combine(struct access_coordinate *out,
>> +			     struct access_coordinate *c1,
>> +			     struct access_coordinate *c2)
>> +{
>> +	if (c2->write_bandwidth)
>> +		out->write_bandwidth = min(c1->write_bandwidth,
>> +					   c2->write_bandwidth);
>> +	out->write_latency = c1->write_latency + c2->write_latency;
>> +
>> +	if (c2->read_bandwidth)
>> +		out->read_bandwidth = min(c1->read_bandwidth,
>> +					  c2->read_bandwidth);
>> +	out->read_latency = c1->read_latency + c2->read_latency;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_coordinates_combine, CXL);
>
> There is no need for EXPORT_SYMBOL() when the definition and the only
> caller exist within the same compilation unit, cxl_core.o.
>
> However, given there is nothing "CXL" about this function it likely wants
> to move out of cxl_core.o if another caller ever arrives.

It's mostly used by core/cdat.c but eventually also used by core/port.c.
So it's all within the core.
On 1/30/24 19:22, Wonjae Lee wrote:
> On Fri, Jan 19, 2024 at 10:23:52AM -0700, Dave Jiang wrote:
>> Calculate and store the performance data for a CXL region. Find the worst
>> read and write latency for all the included ranges from each of the devices
>> that attributes to the region and designate that as the latency data. Sum
>> all the read and write bandwidth data for each of the device region and
>> that is the total bandwidth for the region.
>>
>> The perf list is expected to be constructed before the endpoint decoders
>> are registered and thus there should be no early reading of the entries
>> from the region assemble action. The calling of the region qos calculate
>> function is under the protection of cxl_dpa_rwsem and will ensure that
>> all DPA associated work has completed.
>>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>> ---
>> v4:
>> - Calculate access classes 0 and 1 by retrieving host bridge coords
>> - Add lockdep assert for cxl_dpa_rwsem (Dan)
>> - Clarify that HMAT code is HMEM_REPORTING code. (Dan)
>> ---
>> drivers/cxl/core/cdat.c 74 +++++++++++++++++++++++++++++++++++++++++++++
>> drivers/cxl/core/region.c 2 +
>> drivers/cxl/cxl.h 4 ++
>> 3 files changed, 80 insertions(+)
>>
>> diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
>> index 6e3998723aaa..7acb5837afad 100644
>> --- a/drivers/cxl/core/cdat.c
>> +++ b/drivers/cxl/core/cdat.c
>> @@ -8,6 +8,7 @@
>>  #include "cxlpci.h"
>>  #include "cxlmem.h"
>>  #include "cxl.h"
>> +#include "core.h"
>>
>>  struct dsmas_entry {
>>  	struct range dpa_range;
>> @@ -546,3 +547,76 @@ void cxl_coordinates_combine(struct access_coordinate *out,
>>  EXPORT_SYMBOL_NS_GPL(cxl_coordinates_combine, CXL);
>>
>>  MODULE_IMPORT_NS(CXL);
>> +
>> +void cxl_region_perf_data_calculate(struct cxl_region *cxlr,
>> +				    struct cxl_endpoint_decoder *cxled)
>> +{
>> +	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
>> +	struct cxl_port *port = cxlmd->endpoint;
>> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
>> +	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
>> +	struct access_coordinate hb_coord[ACCESS_COORDINATE_MAX];
>> +	struct access_coordinate coord;
>> +	struct range dpa = {
>> +		.start = cxled->dpa_res->start,
>> +		.end = cxled->dpa_res->end,
>> +	};
>> +	struct list_head *perf_list;
>> +	struct cxl_dpa_perf *perf;
>> +	bool found = false;
>> +	int rc;
>> +
>> +	switch (cxlr->mode) {
>> +	case CXL_DECODER_RAM:
>> +		perf_list = &mds->ram_perf_list;
>> +		break;
>> +	case CXL_DECODER_PMEM:
>> +		perf_list = &mds->pmem_perf_list;
>> +		break;
>> +	default:
>> +		return;
>> +	}
>> +
>> +	lockdep_assert_held(&cxl_dpa_rwsem);
>> +
>> +	list_for_each_entry(perf, perf_list, list) {
>> +		if (range_contains(&perf->dpa_range, &dpa)) {
>> +			found = true;
>> +			break;
>> +		}
>> +	}
>> +
>> +	if (!found)
>> +		return;
>> +
>> +	rc = cxl_hb_get_perf_coordinates(port, hb_coord);
>> +	if (rc) {
>> +		dev_dbg(&port->dev, "Failed to retrieve hb perf coordinates.\n");
>> +		return;
>> +	}
>> +
>> +	for (int i = 0; i < ACCESS_COORDINATE_MAX; i++) {
>> +		/* Pickup the host bridge coords */
>> +		cxl_coordinates_combine(&coord, &hb_coord[i], &perf->coord);
>> +
>> +		/* Get total bandwidth and the worst latency for the cxl region */
>> +		cxlr->coord[i].read_latency = max_t(unsigned int,
>> +						    cxlr->coord[i].read_latency,
>> +						    coord.read_latency);
>> +		cxlr->coord[i].write_latency = max_t(unsigned int,
>> +						     cxlr->coord[i].write_latency,
>> +						     coord.write_latency);
>> +		cxlr->coord[i].read_bandwidth += coord.read_bandwidth;
>> +		cxlr->coord[i].write_bandwidth += coord.write_bandwidth;
>> +
>> +		/*
>> +		 * Convert latency to nanosec from picosec to be consistent
>> +		 * with the resulting latency coordinates computed by the
>> +		 * HMAT_REPORTING code.
>> +		 */
>> +		cxlr->coord[i].read_latency =
>> +			DIV_ROUND_UP(cxlr->coord[i].read_latency, 1000);
>> +		cxlr->coord[i].write_latency =
>> +			DIV_ROUND_UP(cxlr->coord[i].write_latency, 1000);
>
> Hello,
>
> I ran into a bit of confusion and have a question while validating CDAT
> behaviour with physical CXL devices. (I'm not sure if this is the right
> thread to ask this question, sorry if it isn't.)
>
> IIUC, the raw data of latency is in picosec, but the comments on the
> struct access_coordinate say that the latency units are in nanosec:
> * @read_latency: Read latency in nanoseconds
> * @write_latency: Write latency in nanoseconds
>
> This was a bit confusing at first, as the raw data of latency are in
> ps, and the structure that stores the latency expects units of ns.

Right. The numbers stored with the HMAT_REPORTING code and eventually
NUMA nodes are normalized to nanoseconds, even though the raw data is in
picoseconds. For CXL, I left the CDAT and computed numbers as raw numbers
(picoseconds) until the final step when I calculate the latency for the
entire region. Then they get converted to nanoseconds in order to write
back to the memory_target for HMAT_REPORTING. The numbers we retrieve
from HMAT_REPORTING for the generic target are already in nanoseconds.

>
> I saw that you have already had a discussion with Brice about the
> pico/nanosecond unit conversion. My question is, are there any plans to
> store latency number of cxl port in nanoseconds or change the comments
> of coords structure?

The numbers for the coords structure will remain in nanoseconds as they
always have been.

>
> Thanks,
> Wonjae
>
>> +	}
>> +}
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index 57a5901d5a60..7f19b533c5ae 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -1722,6 +1722,8 @@ static int cxl_region_attach(struct cxl_region *cxlr,
>>  		return -EINVAL;
>>  	}
>>
>> +	cxl_region_perf_data_calculate(cxlr, cxled);
>> +
>>  	if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
>>  		int i;
>>
>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
>> index 80e6bd294e18..f6637fa33113 100644
>> --- a/drivers/cxl/cxl.h
>> +++ b/drivers/cxl/cxl.h
>> @@ -519,6 +519,7 @@ struct cxl_region_params {
>>   * @cxlr_pmem: (for pmem regions) cached copy of the nvdimm bridge
>>   * @flags: Region state flags
>>   * @params: active + config params for the region
>> + * @coord: QoS access coordinates for the region
>>   */
>>  struct cxl_region {
>>  	struct device dev;
>> @@ -529,6 +530,7 @@ struct cxl_region {
>>  	struct cxl_pmem_region *cxlr_pmem;
>>  	unsigned long flags;
>>  	struct cxl_region_params params;
>> +	struct access_coordinate coord[ACCESS_COORDINATE_MAX];
>>  };
>>
>>  struct cxl_nvdimm_bridge {
>> @@ -880,6 +882,8 @@ int cxl_endpoint_get_perf_coordinates(struct cxl_port *port,
>>  				       struct access_coordinate *coord);
>>  int cxl_hb_get_perf_coordinates(struct cxl_port *port,
>>  				 struct access_coordinate *coord);
>> +void cxl_region_perf_data_calculate(struct cxl_region *cxlr,
>> +				    struct cxl_endpoint_decoder *cxled);
>>
>>  void cxl_coordinates_combine(struct access_coordinate *out,
>>  			     struct access_coordinate *c1,
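
To make the unit handling described in this exchange concrete, here is a minimal userspace sketch. It assumes a stand-in `struct perf_coord` with latencies held in picoseconds while accumulating, and the names `region_accumulate()`, `region_latency_to_ns()` and `SKETCH_DIV_ROUND_UP` are hypothetical. It follows the stated policy (worst latency, summed bandwidth, round up when converting picoseconds to nanoseconds). Note that the sketch converts once, after all endpoints have been folded in; that ordering is a choice made for this example, whereas the quoted hunk performs the conversion inside `cxl_region_perf_data_calculate()` itself.

```c
#include <assert.h>

/* Round-up integer division, equivalent in spirit to the kernel's DIV_ROUND_UP(). */
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Userspace stand-in for struct access_coordinate (assumption). */
struct perf_coord {
	unsigned int read_bandwidth;
	unsigned int write_bandwidth;
	unsigned int read_latency;	/* picoseconds while accumulating */
	unsigned int write_latency;	/* picoseconds while accumulating */
};

static unsigned int max_uint(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/* Fold one endpoint into the region: worst latency wins, bandwidth sums. */
static void region_accumulate(struct perf_coord *region,
			      const struct perf_coord *ep)
{
	assert(region && ep);

	region->read_latency = max_uint(region->read_latency, ep->read_latency);
	region->write_latency = max_uint(region->write_latency, ep->write_latency);
	region->read_bandwidth += ep->read_bandwidth;
	region->write_bandwidth += ep->write_bandwidth;
}

/* Convert the accumulated latencies from picoseconds to nanoseconds, rounding up. */
static void region_latency_to_ns(struct perf_coord *region)
{
	assert(region);

	region->read_latency = SKETCH_DIV_ROUND_UP(region->read_latency, 1000);
	region->write_latency = SKETCH_DIV_ROUND_UP(region->write_latency, 1000);
}
```

With two endpoints whose read latencies are 150500 ps and 120000 ps, the region keeps 150500 ps as the worst case and reports 151 ns after conversion, since the division rounds up rather than truncating.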