Message ID | 20200623195229.26411-1-saeedm@mellanox.com |
---|---|
Headers | show |
Series | mlx5 updates 2020-06-23 | expand |
On 6/24/2020 11:30 PM, Jakub Kicinski wrote: > On Wed, 24 Jun 2020 20:15:14 +0000 Saeed Mahameed wrote: >> On Wed, 2020-06-24 at 10:22 -0700, Jakub Kicinski wrote: >>> On Wed, 24 Jun 2020 10:34:40 +0300 Aya Levin wrote: >>>>>> I think Michal will rightly complain that this does not belong >>>>>> in >>>>>> private flags any more. As (/if?) ARM deployments take a >>>>>> foothold >>>>>> in DC this will become a common setting for most NICs. >>>>> >>>>> Initially we used pcie_relaxed_ordering_enabled() to >>>>> programmatically enable this on/off on boot but this seems to >>>>> introduce some degradation on some Intel CPUs since the Intel >>>>> Faulty >>>>> CPUs list is not up to date. Aya is discussing this with Bjorn. >>>> Adding Bjorn Helgaas >>> >>> I see. Simply using pcie_relaxed_ordering_enabled() and blacklisting >>> bad CPUs seems far nicer from operational perspective. Perhaps Bjorn >>> will chime in. Pushing the validation out to the user is not a great >>> solution IMHO. >> >> Can we move on with this patch for now ? since we are going to keep the >> user knob anyway, what is missing is setting the default value >> automatically but this can't be done until we >> fix pcie_relaxed_ordering_enabled() > > If this patch was just adding a chicken bit that'd be fine, but opt in > I'm not hugely comfortable with. Seems like Bjorn has provided some > assistance already on the defaults but there doesn't appear to be much > progress being made. Hi Jakub, Dave Assuming the discussions with Bjorn will conclude in a well-trusted API that ensures relaxed ordering in enabled, I'd still like a method to turn off relaxed ordering for performance debugging sake. Bjorn highlighted the fact that the PCIe sub system can only offer a query method. Even if theoretically a set API will be provided, this will not fit a netdev debugging - I wonder if CPU vendors even support relaxed ordering set/unset... On the driver's side relaxed ordering is an attribute of the mkey and should be available for configuration (similar to number of CPU vs. number of channels). Based on the above, and binding the driver's default relaxed ordering to the return value from pcie_relaxed_ordering_enabled(), may I continue with previous direction of a private-flag to control the client side (my driver) ? Aya. >
On Mon, 6 Jul 2020 16:00:59 +0300 Aya Levin wrote: > Assuming the discussions with Bjorn will conclude in a well-trusted API > that ensures relaxed ordering in enabled, I'd still like a method to > turn off relaxed ordering for performance debugging sake. > Bjorn highlighted the fact that the PCIe sub system can only offer a > query method. Even if theoretically a set API will be provided, this > will not fit a netdev debugging - I wonder if CPU vendors even support > relaxed ordering set/unset... > On the driver's side relaxed ordering is an attribute of the mkey and > should be available for configuration (similar to number of CPU vs. > number of channels). > Based on the above, and binding the driver's default relaxed ordering to > the return value from pcie_relaxed_ordering_enabled(), may I continue > with previous direction of a private-flag to control the client side (my > driver) ? That's fine with me, chicken bit seems reasonable as long as the default is dictated by the PCI subsystem. I have no particularly strong feeling on the API used for the chicken bit, but others may.
From: Aya Levin <ayal@mellanox.com> Date: Mon, 6 Jul 2020 16:00:59 +0300 > Assuming the discussions with Bjorn will conclude in a well-trusted > API that ensures relaxed ordering in enabled, I'd still like a method > to turn off relaxed ordering for performance debugging sake. > Bjorn highlighted the fact that the PCIe sub system can only offer a > query method. Even if theoretically a set API will be provided, this > will not fit a netdev debugging - I wonder if CPU vendors even support > relaxed ordering set/unset... > On the driver's side relaxed ordering is an attribute of the mkey and > should be available for configuration (similar to number of CPU > vs. number of channels). > Based on the above, and binding the driver's default relaxed ordering > to the return value from pcie_relaxed_ordering_enabled(), may I > continue with previous direction of a private-flag to control the > client side (my driver) ? I don't like this situation at all. If RO is so dodgy that it potentially needs to be disabled, that is going to be an issue not just with networking devices but also with storage and other device types as well. Will every device type have a custom way to disable RO, thus inconsistently, in order to accomodate this? That makes no sense and is a terrible user experience. That's why the knob belongs generically in PCI or similar.
On Sun, Jul 08, 2040 at 11:22:12AM +0300, Aya Levin wrote: > On 7/6/2020 10:49 PM, David Miller wrote: > > From: Aya Levin <ayal@mellanox.com> > > Date: Mon, 6 Jul 2020 16:00:59 +0300 > > > > > Assuming the discussions with Bjorn will conclude in a well-trusted > > > API that ensures relaxed ordering in enabled, I'd still like a method > > > to turn off relaxed ordering for performance debugging sake. > > > Bjorn highlighted the fact that the PCIe sub system can only offer a > > > query method. Even if theoretically a set API will be provided, this > > > will not fit a netdev debugging - I wonder if CPU vendors even support > > > relaxed ordering set/unset... > > > On the driver's side relaxed ordering is an attribute of the mkey and > > > should be available for configuration (similar to number of CPU > > > vs. number of channels). > > > Based on the above, and binding the driver's default relaxed ordering > > > to the return value from pcie_relaxed_ordering_enabled(), may I > > > continue with previous direction of a private-flag to control the > > > client side (my driver) ? > > > > I don't like this situation at all. > > > > If RO is so dodgy that it potentially needs to be disabled, that is > > going to be an issue not just with networking devices but also with > > storage and other device types as well. > > > > Will every device type have a custom way to disable RO, thus > > inconsistently, in order to accomodate this? > > > > That makes no sense and is a terrible user experience. > > > > That's why the knob belongs generically in PCI or similar. > > > Hi Bjorn, > > Mellanox NIC supports relaxed ordering operation over DMA buffers. > However for debug prepossess we must have a chicken bit to disable > relaxed ordering on a specific system without effecting others in > run-time. In order to meet this requirement, I added a netdev > private-flag to ethtool for set RO API. > > Dave raised a concern regarding embedding relaxed ordering set API > per system (networking, storage and others). We need the ability to > manage relaxed ordering in a unify manner. Could you please define a > PCI sub-system solution to meet this requirement? I agree, this is definitely a mess. Let me just outline what I think we have today and what we're missing. - On the hardware side, device use of Relaxed Ordering is controlled by the Enable Relaxed Ordering bit in the PCIe Device Control register (or the PCI-X Command register). If set, the device is allowed but not required to set the Relaxed Ordering bit in transactions it initiates (PCIe r5.0, sec 7.5.3.4; PCI-X 2.0, sec 7.2.3). I suspect there may be device-specific controls, too, because [1] claims to enable/disable Relaxed Ordering but doesn't touch the PCIe Device Control register. Device-specific controls are certainly allowed, but of course it would be up to the driver, and the device cannot generate TLPs with Relaxed Ordering unless the architected PCIe Enable Relaxed Ordering bit is *also* set. - Platform firmware can enable Relaxed Ordering for a device either before handoff to the OS or via the _HPX ACPI method. - The PCI core never enables Relaxed Ordering itself except when applying _HPX. - At enumeration-time, the PCI core disables Relaxed Ordering in pci_configure_relaxed_ordering() if the device is below a Root Port that has a quirk indicating an erratum. This quirk currently includes many Intel Root Ports, but not all, and is an ongoing maintenance problem. - The PCI core provides pcie_relaxed_ordering_enabled() which tells you whether Relaxed Ordering is enabled. Only used by cxgb4 and csio, which use that information to fill in Ingress Queue Commands. - The PCI core does not provide a driver interface to enable or disable Relaxed Ordering. - Some drivers disable Relaxed Ordering themselves: mtip32xx, netup_unidvb, tg3, myri10ge (oddly, only if CONFIG_MYRI10GE_DCA), tsi721, kp2000_pcie. - Some drivers enable Relaxed Ordering themselves: niu, tegra. What are we missing and what should the PCI core do? - Currently the Enable Relaxed Ordering bit depends on what firmware did. Maybe the PCI core should always clear it during enumeration? - The PCI core should probably have a driver interface like pci_set_relaxed_ordering(dev, enable) that can set or clear the architected PCI-X or PCIe Enable Relaxed Ordering bit. - Maybe there should be a kernel command-line parameter like "pci=norelax" that disables Relaxed Ordering for every device and prevents pci_set_relaxed_ordering() from enabling it. I'm mixed on this because these tend to become folklore about how to "fix" problems and we end up with systems that don't work unless you happen to find the option on the web. For debugging issues, it might be enough to disable Relaxed Ordering using setpci, e.g., "setpci -s02:00.0 CAP_EXP+8.w=0" [1] https://lore.kernel.org/netdev/20200623195229.26411-11-saeedm@mellanox.com/
On Wed, Jul 08, 2020 at 06:16:30PM -0500, Bjorn Helgaas wrote: > I suspect there may be device-specific controls, too, because [1] > claims to enable/disable Relaxed Ordering but doesn't touch the > PCIe Device Control register. Device-specific controls are > certainly allowed, but of course it would be up to the driver, and > the device cannot generate TLPs with Relaxed Ordering unless the > architected PCIe Enable Relaxed Ordering bit is *also* set. Yes, at least on RDMA relaxed ordering can be set on a per transaction basis and is something userspace can choose to use or not at a fine granularity. This is because we have to support historical applications that make assumptions that data arrives in certain orders. I've been thinking of doing the same as this patch but for RDMA kernel ULPs and just globally turn it on if the PCI CAP is enabled as none of our in-kernel uses have the legacy data ordering problem. There are reports that using relaxed ordering is a *huge* speed up in certain platforms/configurations/etc. > issues, it might be enough to disable Relaxed Ordering using > setpci, e.g., "setpci -s02:00.0 CAP_EXP+8.w=0" For the purposes of occasional performance testing I think this should be good enough? Aya? Jason
On Wed, Jul 08, 2020 at 08:26:02PM -0300, Jason Gunthorpe wrote: > On Wed, Jul 08, 2020 at 06:16:30PM -0500, Bjorn Helgaas wrote: > > I suspect there may be device-specific controls, too, because [1] > > claims to enable/disable Relaxed Ordering but doesn't touch the > > PCIe Device Control register. Device-specific controls are > > certainly allowed, but of course it would be up to the driver, and > > the device cannot generate TLPs with Relaxed Ordering unless the > > architected PCIe Enable Relaxed Ordering bit is *also* set. > > Yes, at least on RDMA relaxed ordering can be set on a per transaction > basis and is something userspace can choose to use or not at a fine > granularity. This is because we have to support historical > applications that make assumptions that data arrives in certain > orders. > > I've been thinking of doing the same as this patch but for RDMA kernel > ULPs and just globally turn it on if the PCI CAP is enabled as none of > our in-kernel uses have the legacy data ordering problem. If I'm following this correctly - there are two different controls being discussed here: 1) having the driver request PCI relaxed ordering, which may or may not be granted, based on other system settings, and 2) having the driver set RO on the transactions it initiates, which are honored iff the PCI bit is set. It seems that in addition to the PCI core changes, there still is a need for driver controls? Unless the driver always enables RO if it's capable? -- Jonathan
On Thu, Jul 09, 2020 at 10:35:50AM -0700, Jonathan Lemon wrote: > On Wed, Jul 08, 2020 at 08:26:02PM -0300, Jason Gunthorpe wrote: > > On Wed, Jul 08, 2020 at 06:16:30PM -0500, Bjorn Helgaas wrote: > > > I suspect there may be device-specific controls, too, because [1] > > > claims to enable/disable Relaxed Ordering but doesn't touch the > > > PCIe Device Control register. Device-specific controls are > > > certainly allowed, but of course it would be up to the driver, and > > > the device cannot generate TLPs with Relaxed Ordering unless the > > > architected PCIe Enable Relaxed Ordering bit is *also* set. > > > > Yes, at least on RDMA relaxed ordering can be set on a per transaction > > basis and is something userspace can choose to use or not at a fine > > granularity. This is because we have to support historical > > applications that make assumptions that data arrives in certain > > orders. > > > > I've been thinking of doing the same as this patch but for RDMA kernel > > ULPs and just globally turn it on if the PCI CAP is enabled as none of > > our in-kernel uses have the legacy data ordering problem. > > If I'm following this correctly - there are two different controls being > discussed here: > > 1) having the driver request PCI relaxed ordering, which may or may > not be granted, based on other system settings, and This is what Bjorn was thinking about, yes, it is some PCI layer function to control the global config space bit. > 2) having the driver set RO on the transactions it initiates, which > are honored iff the PCI bit is set. > > It seems that in addition to the PCI core changes, there still is a need > for driver controls? Unless the driver always enables RO if it's capable? I think the PCI spec imagined that when the config space RO bit was enabled the PCI device would just start using RO packets, in an appropriate and device specific way. So the fine grained control in #2 is something done extra by some devices. IMHO if the driver knows it is functionally correct with RO then it should enable it fully on the device when the config space bit is set. I'm not sure there is a reason to allow users to finely tune RO, at least I haven't heard of cases where RO is a degredation depending on workload. If some platform doesn't work when RO is turned on then it should be globally black listed like is already done in some cases. If the devices has bugs and uses RO wrong, or the driver has bugs and is only stable with !RO and Intel, then the driver shouldn't turn it on at all. In all of these cases it is not a user tunable. Development and testing reasons, like 'is my crash from a RO bug?' to tune should be met by the device global setpci, I think. Jason
On Thu, 9 Jul 2020 15:20:11 -0300 Jason Gunthorpe wrote: > > 2) having the driver set RO on the transactions it initiates, which > > are honored iff the PCI bit is set. > > > > It seems that in addition to the PCI core changes, there still is a need > > for driver controls? Unless the driver always enables RO if it's capable? > > I think the PCI spec imagined that when the config space RO bit was > enabled the PCI device would just start using RO packets, in an > appropriate and device specific way. > > So the fine grained control in #2 is something done extra by some > devices. > > IMHO if the driver knows it is functionally correct with RO then it > should enable it fully on the device when the config space bit is set. > > I'm not sure there is a reason to allow users to finely tune RO, at > least I haven't heard of cases where RO is a degredation depending on > workload. > > If some platform doesn't work when RO is turned on then it should be > globally black listed like is already done in some cases. > > If the devices has bugs and uses RO wrong, or the driver has bugs and > is only stable with !RO and Intel, then the driver shouldn't turn it > on at all. > > In all of these cases it is not a user tunable. > > Development and testing reasons, like 'is my crash from a RO bug?' to > tune should be met by the device global setpci, I think. +1
On Thu, Jul 09, 2020 at 03:20:11PM -0300, Jason Gunthorpe wrote: > On Thu, Jul 09, 2020 at 10:35:50AM -0700, Jonathan Lemon wrote: > > On Wed, Jul 08, 2020 at 08:26:02PM -0300, Jason Gunthorpe wrote: > > > On Wed, Jul 08, 2020 at 06:16:30PM -0500, Bjorn Helgaas wrote: > > > > I suspect there may be device-specific controls, too, because [1] > > > > claims to enable/disable Relaxed Ordering but doesn't touch the > > > > PCIe Device Control register. Device-specific controls are > > > > certainly allowed, but of course it would be up to the driver, and > > > > the device cannot generate TLPs with Relaxed Ordering unless the > > > > architected PCIe Enable Relaxed Ordering bit is *also* set. > > > > > > Yes, at least on RDMA relaxed ordering can be set on a per transaction > > > basis and is something userspace can choose to use or not at a fine > > > granularity. This is because we have to support historical > > > applications that make assumptions that data arrives in certain > > > orders. > > > > > > I've been thinking of doing the same as this patch but for RDMA kernel > > > ULPs and just globally turn it on if the PCI CAP is enabled as none of > > > our in-kernel uses have the legacy data ordering problem. > > > > If I'm following this correctly - there are two different controls being > > discussed here: > > > > 1) having the driver request PCI relaxed ordering, which may or may > > not be granted, based on other system settings, and > > This is what Bjorn was thinking about, yes, it is some PCI layer > function to control the global config space bit. > > > 2) having the driver set RO on the transactions it initiates, which > > are honored iff the PCI bit is set. > > > > It seems that in addition to the PCI core changes, there still is a need > > for driver controls? Unless the driver always enables RO if it's capable? > > I think the PCI spec imagined that when the config space RO bit was > enabled the PCI device would just start using RO packets, in an > appropriate and device specific way. > > So the fine grained control in #2 is something done extra by some > devices. > > IMHO if the driver knows it is functionally correct with RO then it > should enable it fully on the device when the config space bit is set. Sounds reasonable to me. -- Jonathan
On Thu, 2020-07-09 at 12:47 -0700, Jakub Kicinski wrote: > On Thu, 9 Jul 2020 15:20:11 -0300 Jason Gunthorpe wrote: > > > 2) having the driver set RO on the transactions it initiates, > > > which > > > are honored iff the PCI bit is set. > > > > > > It seems that in addition to the PCI core changes, there still is > > > a need > > > for driver controls? Unless the driver always enables RO if it's > > > capable? > > > > I think the PCI spec imagined that when the config space RO bit was > > enabled the PCI device would just start using RO packets, in an > > appropriate and device specific way. > > > > So the fine grained control in #2 is something done extra by some > > devices. > > > > IMHO if the driver knows it is functionally correct with RO then it > > should enable it fully on the device when the config space bit is > > set. > > > > I'm not sure there is a reason to allow users to finely tune RO, at > > least I haven't heard of cases where RO is a degredation depending > > on > > workload. > > > > If some platform doesn't work when RO is turned on then it should > > be > > globally black listed like is already done in some cases. > > > > If the devices has bugs and uses RO wrong, or the driver has bugs > > and > > is only stable with !RO and Intel, then the driver shouldn't turn > > it > > on at all. > > > > In all of these cases it is not a user tunable. > > > > Development and testing reasons, like 'is my crash from a RO bug?' > > to > > tune should be met by the device global setpci, I think. > > +1 Be careful though to load driver with RO on and then setpci RO off.. not sure what the side effects are, unstable driver maybe ? And not sure what should be the procedure then ? reload driver ? FW will get a notification from PCI ?
On Fri, Jul 10, 2020 at 02:18:02AM +0000, Saeed Mahameed wrote: > Be careful though to load driver with RO on and then setpci RO off.. > not sure what the side effects are, unstable driver maybe ? According to the PCI spec HW should stop doing RO immediately once the config space bit is cleared. In any event continuing to issue RO won't harm anything. > And not sure what should be the procedure then ? reload driver ? FW > will get a notification from PCI ? At worst you'd have to reload the driver - continuing to use RO if the driver starts with RO off is seriously broken and probably won't work with the quirks to disable RO on buggy platforms. But as above, the RO config space bit should have immedaite effect on the device and it should stop using RO. The device HW itself has to enforce this to be spec compliant. Jason
On 7/9/2020 2:16 AM, Bjorn Helgaas wrote: > On Sun, Jul 08, 2040 at 11:22:12AM +0300, Aya Levin wrote: >> On 7/6/2020 10:49 PM, David Miller wrote: >>> From: Aya Levin <ayal@mellanox.com> >>> Date: Mon, 6 Jul 2020 16:00:59 +0300 >>> >>>> Assuming the discussions with Bjorn will conclude in a well-trusted >>>> API that ensures relaxed ordering in enabled, I'd still like a method >>>> to turn off relaxed ordering for performance debugging sake. >>>> Bjorn highlighted the fact that the PCIe sub system can only offer a >>>> query method. Even if theoretically a set API will be provided, this >>>> will not fit a netdev debugging - I wonder if CPU vendors even support >>>> relaxed ordering set/unset... >>>> On the driver's side relaxed ordering is an attribute of the mkey and >>>> should be available for configuration (similar to number of CPU >>>> vs. number of channels). >>>> Based on the above, and binding the driver's default relaxed ordering >>>> to the return value from pcie_relaxed_ordering_enabled(), may I >>>> continue with previous direction of a private-flag to control the >>>> client side (my driver) ? >>> >>> I don't like this situation at all. >>> >>> If RO is so dodgy that it potentially needs to be disabled, that is >>> going to be an issue not just with networking devices but also with >>> storage and other device types as well. >>> >>> Will every device type have a custom way to disable RO, thus >>> inconsistently, in order to accomodate this? >>> >>> That makes no sense and is a terrible user experience. >>> >>> That's why the knob belongs generically in PCI or similar. >>> >> Hi Bjorn, >> >> Mellanox NIC supports relaxed ordering operation over DMA buffers. >> However for debug prepossess we must have a chicken bit to disable >> relaxed ordering on a specific system without effecting others in >> run-time. In order to meet this requirement, I added a netdev >> private-flag to ethtool for set RO API. >> >> Dave raised a concern regarding embedding relaxed ordering set API >> per system (networking, storage and others). We need the ability to >> manage relaxed ordering in a unify manner. Could you please define a >> PCI sub-system solution to meet this requirement? > > I agree, this is definitely a mess. Let me just outline what I think > we have today and what we're missing. > > - On the hardware side, device use of Relaxed Ordering is controlled > by the Enable Relaxed Ordering bit in the PCIe Device Control > register (or the PCI-X Command register). If set, the device is > allowed but not required to set the Relaxed Ordering bit in > transactions it initiates (PCIe r5.0, sec 7.5.3.4; PCI-X 2.0, sec > 7.2.3). > > I suspect there may be device-specific controls, too, because [1] > claims to enable/disable Relaxed Ordering but doesn't touch the > PCIe Device Control register. Device-specific controls are > certainly allowed, but of course it would be up to the driver, and > the device cannot generate TLPs with Relaxed Ordering unless the > architected PCIe Enable Relaxed Ordering bit is *also* set. > > - Platform firmware can enable Relaxed Ordering for a device either > before handoff to the OS or via the _HPX ACPI method. > > - The PCI core never enables Relaxed Ordering itself except when > applying _HPX. > > - At enumeration-time, the PCI core disables Relaxed Ordering in > pci_configure_relaxed_ordering() if the device is below a Root > Port that has a quirk indicating an erratum. This quirk currently > includes many Intel Root Ports, but not all, and is an ongoing > maintenance problem. > > - The PCI core provides pcie_relaxed_ordering_enabled() which tells > you whether Relaxed Ordering is enabled. Only used by cxgb4 and > csio, which use that information to fill in Ingress Queue > Commands. > > - The PCI core does not provide a driver interface to enable or > disable Relaxed Ordering. > > - Some drivers disable Relaxed Ordering themselves: mtip32xx, > netup_unidvb, tg3, myri10ge (oddly, only if CONFIG_MYRI10GE_DCA), > tsi721, kp2000_pcie. > > - Some drivers enable Relaxed Ordering themselves: niu, tegra. > > What are we missing and what should the PCI core do? > > - Currently the Enable Relaxed Ordering bit depends on what firmware > did. Maybe the PCI core should always clear it during > enumeration? > > - The PCI core should probably have a driver interface like > pci_set_relaxed_ordering(dev, enable) that can set or clear the > architected PCI-X or PCIe Enable Relaxed Ordering bit. > > - Maybe there should be a kernel command-line parameter like > "pci=norelax" that disables Relaxed Ordering for every device and > prevents pci_set_relaxed_ordering() from enabling it. > > I'm mixed on this because these tend to become folklore about how > to "fix" problems and we end up with systems that don't work > unless you happen to find the option on the web. For debugging > issues, it might be enough to disable Relaxed Ordering using > setpci, e.g., "setpci -s02:00.0 CAP_EXP+8.w=0" > > [1] https://lore.kernel.org/netdev/20200623195229.26411-11-saeedm@mellanox.com/ > Hi Bjorn, Thanks for the detailed reply. From initial testing I can say that turning off the relaxed ordering on the PCI (setpci -s02:00.0 CAP_EXP+8.w=0) is the chicken bit I was looking for. This lower the risk of depending on pcie_relaxed_ordering_enabled(). I will update my patch and resubmit. Thanks, Aya
On 7/8/2040 1:22 AM, Aya Levin wrote: > > > On 7/6/2020 10:49 PM, David Miller wrote: >> From: Aya Levin <ayal@mellanox.com> >> Date: Mon, 6 Jul 2020 16:00:59 +0300 >> >>> Assuming the discussions with Bjorn will conclude in a well-trusted >>> API that ensures relaxed ordering in enabled, I'd still like a method >>> to turn off relaxed ordering for performance debugging sake. >>> Bjorn highlighted the fact that the PCIe sub system can only offer a >>> query method. Even if theoretically a set API will be provided, this >>> will not fit a netdev debugging - I wonder if CPU vendors even support >>> relaxed ordering set/unset... >>> On the driver's side relaxed ordering is an attribute of the mkey and >>> should be available for configuration (similar to number of CPU >>> vs. number of channels). >>> Based on the above, and binding the driver's default relaxed ordering >>> to the return value from pcie_relaxed_ordering_enabled(), may I >>> continue with previous direction of a private-flag to control the >>> client side (my driver) ? >> >> I don't like this situation at all. >> >> If RO is so dodgy that it potentially needs to be disabled, that is >> going to be an issue not just with networking devices but also with >> storage and other device types as well. >> >> Will every device type have a custom way to disable RO, thus >> inconsistently, in order to accomodate this? >> >> That makes no sense and is a terrible user experience. >> >> That's why the knob belongs generically in PCI or similar. >> > Hi Bjorn, > > Mellanox NIC supports relaxed ordering operation over DMA buffers. > However for debug prepossess we must have a chicken bit to disable > relaxed ordering on a specific system without effecting others in > run-time. In order to meet this requirement, I added a netdev > private-flag to ethtool for set RO API. > > Dave raised a concern regarding embedding relaxed ordering set API per > system (networking, storage and others). We need the ability to manage > relaxed ordering in a unify manner. Could you please define a PCI > sub-system solution to meet this requirement? > > Aya. Isn't there a relaxed ordering bit in the PCIe configuration space? Couldn't you use that as a global indication of if you can support relaxed ordering or not? Reading through the spec it seems like that is kind of the point of the config space bit in the Device Control register. If the bit is not set there then you shouldn't be able to use relaxed ordering in the device. Then it is just a matter of using setpci to enable/disable it. Thanks. - Alex
On 7/6/2020 10:49 PM, David Miller wrote: > From: Aya Levin <ayal@mellanox.com> > Date: Mon, 6 Jul 2020 16:00:59 +0300 > >> Assuming the discussions with Bjorn will conclude in a well-trusted >> API that ensures relaxed ordering in enabled, I'd still like a method >> to turn off relaxed ordering for performance debugging sake. >> Bjorn highlighted the fact that the PCIe sub system can only offer a >> query method. Even if theoretically a set API will be provided, this >> will not fit a netdev debugging - I wonder if CPU vendors even support >> relaxed ordering set/unset... >> On the driver's side relaxed ordering is an attribute of the mkey and >> should be available for configuration (similar to number of CPU >> vs. number of channels). >> Based on the above, and binding the driver's default relaxed ordering >> to the return value from pcie_relaxed_ordering_enabled(), may I >> continue with previous direction of a private-flag to control the >> client side (my driver) ? > > I don't like this situation at all. > > If RO is so dodgy that it potentially needs to be disabled, that is > going to be an issue not just with networking devices but also with > storage and other device types as well. > > Will every device type have a custom way to disable RO, thus > inconsistently, in order to accomodate this? > > That makes no sense and is a terrible user experience. > > That's why the knob belongs generically in PCI or similar. > Hi Bjorn, Mellanox NIC supports relaxed ordering operation over DMA buffers. However for debug prepossess we must have a chicken bit to disable relaxed ordering on a specific system without effecting others in run-time. In order to meet this requirement, I added a netdev private-flag to ethtool for set RO API. Dave raised a concern regarding embedding relaxed ordering set API per system (networking, storage and others). We need the ability to manage relaxed ordering in a unify manner. Could you please define a PCI sub-system solution to meet this requirement? Aya.