Message ID | 20200927075015.1417714-1-idosch@idosch.org |
---|---|
Series | mlxsw: Expose transceiver overheat counter |
From: Ido Schimmel <idosch@idosch.org>
Date: Sun, 27 Sep 2020 10:50:05 +0300

> From: Ido Schimmel <idosch@nvidia.com>
>
> Amit says:
>
> An overheated transceiver can be the root cause of various network
> problems such as link flapping. Counting the number of times a
> transceiver's temperature was higher than its configured threshold can
> therefore help in debugging such issues.
>
> This patch set exposes a transceiver overheat counter via ethtool. This
> is achieved by configuring the Spectrum ASIC to generate events whenever
> a transceiver is overheated. The temperature thresholds are queried from
> the transceiver (if available) and set to the default otherwise.
>
> Example:
>
> # ethtool -S swp1
> ...
> transceiver_overheat: 2
>
> Patch set overview:
>
> Patches #1-#3 add required device registers
> Patches #4-#5 add required infrastructure in mlxsw to configure and
> count overheat events
> Patches #6-#9 gradually add support for the transceiver overheat counter
> Patch #10 exposes the transceiver overheat counter via ethtool

Series applied, thanks.
From: Ido Schimmel <idosch@nvidia.com>

Amit says:

An overheated transceiver can be the root cause of various network
problems such as link flapping. Counting the number of times a
transceiver's temperature was higher than its configured threshold can
therefore help in debugging such issues.

This patch set exposes a transceiver overheat counter via ethtool. This
is achieved by configuring the Spectrum ASIC to generate events whenever
a transceiver is overheated. The temperature thresholds are queried from
the transceiver (if available) and set to the default otherwise.

Example:

# ethtool -S swp1
...
transceiver_overheat: 2

Patch set overview:

Patches #1-#3 add required device registers
Patches #4-#5 add required infrastructure in mlxsw to configure and
count overheat events
Patches #6-#9 gradually add support for the transceiver overheat counter
Patch #10 exposes the transceiver overheat counter via ethtool

Amit Cohen (10):
  mlxsw: reg: Add Management Temperature Warning Event Register
  mlxsw: reg: Add Port Module Plug/Unplug Event Register
  mlxsw: reg: Add Ports Module Administrative and Operational Status Register
  mlxsw: core_hwmon: Query MTMP before writing to set only relevant fields
  mlxsw: core: Add an infrastructure to track transceiver overheat counter
  mlxsw: Update transceiver_overheat counter according to MTWE
  mlxsw: Enable temperature event for all supported port module sensors
  mlxsw: spectrum: Initialize netdev's module overheat counter
  mlxsw: Update module's settings when module is plugged in
  mlxsw: spectrum_ethtool: Expose transceiver_overheat counter

 drivers/net/ethernet/mellanox/mlxsw/core.c    |  27 ++
 drivers/net/ethernet/mellanox/mlxsw/core.h    |   5 +
 .../net/ethernet/mellanox/mlxsw/core_env.c    | 368 ++++++++++++++++++
 .../net/ethernet/mellanox/mlxsw/core_env.h    |   6 +
 .../net/ethernet/mellanox/mlxsw/core_hwmon.c  |  21 +-
 drivers/net/ethernet/mellanox/mlxsw/reg.h     | 132 +++++++
 .../net/ethernet/mellanox/mlxsw/spectrum.c    |  44 +++
 .../net/ethernet/mellanox/mlxsw/spectrum.h    |   1 +
 .../mellanox/mlxsw/spectrum_ethtool.c         |  57 ++-
 drivers/net/ethernet/mellanox/mlxsw/trap.h    |   4 +
 10 files changed, 660 insertions(+), 5 deletions(-)
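
The counting behaviour described in the cover letter could be sketched roughly as follows. This is a standalone userspace illustration, not the mlxsw code: the struct, the function names, the millidegree units, the placeholder default threshold and the choice to count only transitions into the overheated state are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder default, in millidegrees Celsius. Hypothetical value,
 * not taken from the driver.
 */
#define SKETCH_DEFAULT_THRESH_MC 70000

/* Hypothetical per-transceiver state. */
struct sketch_module {
	bool has_threshold;      /* transceiver reports its own threshold */
	int threshold_mc;        /* valid only if has_threshold is true */
	bool is_overheat;        /* state seen at the previous event */
	uint64_t overheat_count; /* value a counter like transceiver_overheat would show */
};

/* Use the transceiver's threshold if available, the default otherwise. */
static int sketch_module_threshold(const struct sketch_module *m)
{
	assert(m);
	return m->has_threshold ? m->threshold_mc : SKETCH_DEFAULT_THRESH_MC;
}

/* Handle one temperature event. The counter is bumped only when the
 * module goes from "not overheated" to "overheated" (an assumption of
 * this sketch), so repeated events during one hot period count once.
 */
static void sketch_module_temp_event(struct sketch_module *m, int temp_mc)
{
	bool hot;

	assert(m);
	hot = temp_mc > sketch_module_threshold(m);
	if (hot && !m->is_overheat)
		m->overheat_count++;
	m->is_overheat = hot;
}
```

With a zero-initialized struct sketch_module, feeding temperatures that cross the threshold, drop back below it and cross it again leaves overheat_count at 2, which is the kind of value shown in the ethtool example above.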