diff mbox series

[net-next] net: tighten the definition of interface statistics

Message ID 20200903020336.2302858-1-kuba@kernel.org
State New
Headers show
Series [net-next] net: tighten the definition of interface statistics | expand

Commit Message

Jakub Kicinski Sept. 3, 2020, 2:03 a.m. UTC
This patch is born out of an investigation into which IEEE statistics
correspond to which struct rtnl_link_stats64 members. Turns out that
there seems to be reasonable consensus on the matter, among many drivers.
To save others the time (and it took more time than I'm comfortable
admitting) I'm adding comments referring to IEEE attributes to
struct rtnl_link_stats64.

Up until now we had two forms of documentation for stats - in
Documentation/ABI/testing/sysfs-class-net-statistics and the comments
on struct rtnl_link_stats64 itself. While the former is very cautious
in defining the expected behavior, the latter feel quite dated and
may not be easy to understand for modern day driver author
(e.g. rx_over_errors). At the same time modern systems are far more
complex and once obvious definitions lost their clarity. For example
- does rx_packet count at the MAC layer (aFramesReceivedOK)?
packets processed correctly by hardware? received by the driver?
or maybe received by the stack?

I tried to clarify the expectations, further clarifications from
others are very welcome.

The part hardest to untangle is rx_over_errors vs rx_fifo_errors
vs rx_missed_errors. After much deliberation I concluded that for
modern HW only two of the counters will make sense. The distinction
between internal FIFO overflow and packets dropped due to back-pressure
from the host is likely too implementation (driver and device) specific
to expose in the standard stats.

Now - which two of those counters we select to use is anyone's pick:

sysfs documentation suggests rx_over_errors counts packets which
did not fit into buffers due to MTU being too small, which I reused.
There don't seem to be many modern drivers using it (well, CAN drivers
seem to love this statistic).

Of the remaining two I picked rx_missed_errors to report device drops.
bnxt reports it and it's folded into "drop"s in procfs (while
rx_fifo_errors is an error, and modern devices usually receive the frame
OK, they just can't admit it into the pipeline).

Of the drivers I looked at only AMD Lance-like and NS8390-like use all
three of these counters. rx_missed_errors counts missed frames,
rx_over_errors counts overflow events, and rx_fifo_errors counts frames
which were truncated because they didn't fit into buffers. This suggests
that rx_fifo_errors may be the correct stat for truncated packets, but
I'd think a FIFO stat counting truncated packets would be very confusing
to a modern reader.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 Documentation/networking/index.rst      |   1 +
 Documentation/networking/statistics.rst | 123 +++++++++++++++
 include/uapi/linux/if_link.h            | 198 ++++++++++++++++++++++--
 3 files changed, 305 insertions(+), 17 deletions(-)
 create mode 100644 Documentation/networking/statistics.rst

Comments

Florian Fainelli Sept. 3, 2020, 8:48 p.m. UTC | #1
On 9/3/2020 1:45 PM, Jakub Kicinski wrote:
> On Thu, 3 Sep 2020 09:29:22 -0700 Edwin Peer wrote:
>> On Wed, Sep 2, 2020 at 7:03 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>> +Drivers should report all statistics which have a matching member in
>>> +:c:type:`struct rtnl_link_stats64 <rtnl_link_stats64>` exclusively
>>> +via `.ndo_get_stats64`. Reporting such standard stats via ethtool
>>> +or debugfs will not be accepted.
>>
>> Should existing drivers that currently duplicate standard stats in the
>> ethtool list be revised also?
> 
> That's probably considered uAPI land. I've removed the stats from the
> nfp awhile back, and nobody complained, but I'm thinking to leave the
> decision to individual maintainers.
> 
> Funnily enough number of 10G and 40G drivers report tx_heartbeat_errors
> in their ethtool stats (always as 0). Explainer what the statistic
> counts for a contemporary reader:
> 
> http://www.ethermanage.com/ethernet/sqe/sqe.html
> 
>>> + * @rx_packets: Number of good packets received by the interface.
>>> + *   For hardware interfaces counts all good packets seen by the host,
>>> + *   including packets which host had to drop at various stages of processing
>>> + *   (even in the driver).
>>
>> This is perhaps a bit ambiguous. I think you mean to say packets received from
>> the device, but I could also interpret the above to mean received by the device
>> if 'host' is read to be the whole physical machine (ie. including NIC hardware)
>> instead of the part that is apart from the NIC from the NIC's perspective.
> 
> How about:
> 
>    For hardware interfaces counts all good packets received from the
>    device by the host, including packets which host had to drop...
> 
>>> + * @rx_bytes: Number of good incoming bytes, corresponding to @rx_packets.
>>> + * @tx_bytes: Number of good incoming bytes, corresponding to @tx_packets.
>>
>> Including or excluding FCS?
> 
> Good point, no FCS is probably a reasonable expectation.
> 
> I'm not sure what to say about pad. I'm tempted to also mention that
> for tx we shouldn't count pad, no? (IOW Ethernet Frame - FCS - pad)

It depends I would say, if the driver needed to add padding in order to 
get the frame to be transmitted because the Ethernet MAC cannot pad 
automatically then it would seem natural to count the added padding. If 
you implement BQL that is what you will be reporting because that how 
much travels on the wire. What do you think?

> 
>>> + *   For Ethernet devices this counter may be equivalent to:
>>> + *
>>> + *    - 30.3.1.1.21 aMulticastFramesReceivedOK
>>
>> You mention the IEEE standard in your commit message, but I don't think this
>> document properly cites what you are referring to here? It might be an idea to
>> say "IEEE 30.3.1.1.21 aMulticastFramesReceivedOK" here and provide an
>> appropriate citation reference at the end, or perhaps a link.
> 
> How about I replace Ethernet with IEEE 802.3:
> 
>    For IEEE 802.3 devices this counter may be equivalent to:
> 
>     - 30.3.1.1.21 aMulticastFramesReceivedOK
>
David Ahern Sept. 3, 2020, 10:40 p.m. UTC | #2
On 9/2/20 8:03 PM, Jakub Kicinski wrote:
> diff --git a/Documentation/networking/statistics.rst b/Documentation/networking/statistics.rst
> new file mode 100644
> index 000000000000..487b17c166e8
> --- /dev/null
> +++ b/Documentation/networking/statistics.rst

...

> +
> +sysfs
> +-----
> +
> +Each device directory in sysfs contains a `statistics` directory (e.g.
> +`/sys/class/net/lo/statistics/`) with files corresponding to
> +members of :c:type:`struct rtnl_link_stats64 <rtnl_link_stats64>`.
> +
> +This simple interface is convenient especially in constrained/embedded
> +environments without access to tools. However, it's sightly inefficient

sightly seems like the wrong word. Did you mean 'highly inefficient'?

> +when reading multiple stats as it internally performs a full dump of
> +:c:type:`struct rtnl_link_stats64 <rtnl_link_stats64>`
> +and reports only the stat corresponding to the accessed file.
> +
> +Sysfs files are documented in
> +`Documentation/ABI/testing/sysfs-class-net-statistics`.
> +
> +
> +netlink
> +-------
> +
> +`rtnetlink` (`NETLINK_ROUTE`) is the preferred method of accessing
> +:c:type:`struct rtnl_link_stats64 <rtnl_link_stats64>` stats.
> +
> +Statistics are reported both in the responses to link information
> +requests (`RTM_GETLINK`) and statistic requests (`RTM_GETSTATS`,
> +when `IFLA_STATS_LINK_64` bit is set in the `.filter_mask` of the request).
> +
> +ethtool
> +-------
> +
> +Ethtool IOCTL interface allows drivers to report implementation
> +specific statistics.

an example here would be helpful. e.g., I use `ethool -S` primarily for
per queue stats which show more details than the other APIs which show
aggregated stats.



> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index 7fba4de511de..6ea0fb48739e 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -40,26 +40,191 @@ struct rtnl_link_stats {
>  	__u32	rx_nohandler;		/* dropped, no handler found	*/
>  };
>  
> -/* The main device statistics structure */
> +/**
> + * struct rtnl_link_stats64 - The main device statistics structure.
> + *
> + * @rx_packets: Number of good packets received by the interface.
> + *   For hardware interfaces counts all good packets seen by the host,> + *   including packets which host had to drop at various stages of
processing
> + *   (even in the driver).
> + *
> + * @tx_packets: Number of packets successfully transmitted.
> + *   For hardware interfaces counts packets which host was able to successfully
> + *   hand over to the device, which does not necessarily mean that packets
> + *   had been successfully transmitted out of the device, only that device
> + *   acknowledged it copied them out of host memory.
> + *
> + * @rx_bytes: Number of good incoming bytes, corresponding to @rx_packets.

s/incoming/received/?

> + *
> + * @tx_bytes: Number of good incoming bytes, corresponding to @tx_packets.

s/incoming/transmitted/

Thanks for taking the time to work on this; I'm sure you spent a LOT of
hours going through all of the drivers and APIs.
Jakub Kicinski Sept. 3, 2020, 11:03 p.m. UTC | #3
On Thu, 3 Sep 2020 16:02:28 -0700 Jakub Kicinski wrote:
> > > +This simple interface is convenient especially in constrained/embedded
> > > +environments without access to tools. However, it's sightly inefficient    
> > 
> > sightly seems like the wrong word. Did you mean 'highly inefficient'?  
> 
> Indeed, I'll drop the "slightly". Hopefully the info below is clear
> enough for users to understand what's happening. 

Ah, I realized now I was missing the 'l'. Anyway, I dropped the word,
it didn't add much.
diff mbox series

Patch

diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index c29496fff81c..4167acc5c076 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -93,6 +93,7 @@  Linux Networking Documentation
    sctp
    secid
    seg6-sysctl
+   statistics
    strparser
    switchdev
    tc-actions-env-rules
diff --git a/Documentation/networking/statistics.rst b/Documentation/networking/statistics.rst
new file mode 100644
index 000000000000..487b17c166e8
--- /dev/null
+++ b/Documentation/networking/statistics.rst
@@ -0,0 +1,123 @@ 
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+Interface statistics
+====================
+
+This document is a guide to Linux network interface statistics.
+
+There are two main sources of interface statistics in Linux:
+
+ - standard interface statistics based on
+   :c:type:`struct rtnl_link_stats64 <rtnl_link_stats64>`; and
+ - driver-defined statistics available via ethtool.
+
+There are multiple interfaces to reach the former. Most commonly used
+is the `ip` command from `iproute2`::
+
+  $ ip -s -s link show dev ens4u1u1
+  6: ens4u1u1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
+    link/ether 48:2a:e3:4c:b1:d1 brd ff:ff:ff:ff:ff:ff
+    RX: bytes  packets  errors  dropped overrun mcast
+    74327665117 69016965 0       0       0       0
+    RX errors: length   crc     frame   fifo    missed
+               0        0       0       0       0
+    TX: bytes  packets  errors  dropped carrier collsns
+    21405556176 44608960 0       0       0       0
+    TX errors: aborted  fifo   window heartbeat transns
+               0        0       0       0       128
+    altname enp58s0u1u1
+
+Note that `-s` has been specified twice to see all members of
+:c:type:`struct rtnl_link_stats64 <rtnl_link_stats64>`.
+If `-s` is specified once the detailed errors won't be shown.
+
+`ip` supports JSON formatting via the `-j` option.
+
+Ethtool statistics can be dumped using `ethtool -S $ifc`, e.g.::
+
+  $ ethtool -S ens4u1u1
+  NIC statistics:
+     tx_single_collisions: 0
+     tx_multi_collisions: 0
+
+uAPIs
+=====
+
+procfs
+------
+
+The historical `/proc/net/dev` text interface gives access to the list
+of interfaces as well as their statistics.
+
+Note that even though this interface is using
+:c:type:`struct rtnl_link_stats64 <rtnl_link_stats64>`
+internally it combines some of the fields.
+
+sysfs
+-----
+
+Each device directory in sysfs contains a `statistics` directory (e.g.
+`/sys/class/net/lo/statistics/`) with files corresponding to
+members of :c:type:`struct rtnl_link_stats64 <rtnl_link_stats64>`.
+
+This simple interface is convenient especially in constrained/embedded
+environments without access to tools. However, it's sightly inefficient
+when reading multiple stats as it internally performs a full dump of
+:c:type:`struct rtnl_link_stats64 <rtnl_link_stats64>`
+and reports only the stat corresponding to the accessed file.
+
+Sysfs files are documented in
+`Documentation/ABI/testing/sysfs-class-net-statistics`.
+
+
+netlink
+-------
+
+`rtnetlink` (`NETLINK_ROUTE`) is the preferred method of accessing
+:c:type:`struct rtnl_link_stats64 <rtnl_link_stats64>` stats.
+
+Statistics are reported both in the responses to link information
+requests (`RTM_GETLINK`) and statistic requests (`RTM_GETSTATS`,
+when `IFLA_STATS_LINK_64` bit is set in the `.filter_mask` of the request).
+
+ethtool
+-------
+
+Ethtool IOCTL interface allows drivers to report implementation
+specific statistics.
+
+Statistics and their string identifiers are retrieved separately.
+Identifiers via `ETHTOOL_GSTRINGS` with `string_set` set to `ETH_SS_STATS`,
+and values via `ETHTOOL_GSTATS`. User space should use `ETHTOOL_GDRVINFO`
+to retrieve the number of statistics (`.n_stats`).
+
+debugfs
+-------
+
+Some drivers expose extra statistics via `debugfs`.
+
+struct rtnl_link_stats64
+========================
+
+.. kernel-doc:: include/uapi/linux/if_link.h
+    :identifiers: rtnl_link_stats64
+
+Notes for driver authors
+========================
+
+Drivers should report all statistics which have a matching member in
+:c:type:`struct rtnl_link_stats64 <rtnl_link_stats64>` exclusively
+via `.ndo_get_stats64`. Reporting such standard stats via ethtool
+or debugfs will not be accepted.
+
+Drivers must ensure best possible compliance with
+:c:type:`struct rtnl_link_stats64 <rtnl_link_stats64>`.
+Please note for example that detailed error statistics must be
+added into the general `rx_error` / `tx_error` counters.
+
+The `.ndo_get_stats64` callback can not sleep because of accesses
+via `/proc/net/dev`. If driver may sleep when retrieving the statistics
+from the device it should do so periodically asynchronously and only return
+a recent copy from `.ndo_get_stats64`. Ethtool interrupt coalescing interface
+allows setting the frequency of refreshing statistics, if needed.
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 7fba4de511de..6ea0fb48739e 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -40,26 +40,191 @@  struct rtnl_link_stats {
 	__u32	rx_nohandler;		/* dropped, no handler found	*/
 };
 
-/* The main device statistics structure */
+/**
+ * struct rtnl_link_stats64 - The main device statistics structure.
+ *
+ * @rx_packets: Number of good packets received by the interface.
+ *   For hardware interfaces counts all good packets seen by the host,
+ *   including packets which host had to drop at various stages of processing
+ *   (even in the driver).
+ *
+ * @tx_packets: Number of packets successfully transmitted.
+ *   For hardware interfaces counts packets which host was able to successfully
+ *   hand over to the device, which does not necessarily mean that packets
+ *   had been successfully transmitted out of the device, only that device
+ *   acknowledged it copied them out of host memory.
+ *
+ * @rx_bytes: Number of good incoming bytes, corresponding to @rx_packets.
+ *
+ * @tx_bytes: Number of good incoming bytes, corresponding to @tx_packets.
+ *
+ * @rx_errors: Total number of bad packets received on this network device.
+ *   This counter must include events counted by @rx_length_errors,
+ *   @rx_crc_errors, @rx_frame_errors and other errors not otherwise
+ *   counted.
+ *
+ * @tx_errors: Total number of transmit problems.
+ *   This counter must include events counter by @tx_aborted_errors,
+ *   @tx_carrier_errors, @tx_fifo_errors, @tx_heartbeat_errors,
+ *   @tx_window_errors and other errors not otherwise counted.
+ *
+ * @rx_dropped: Number of packets received but not processed,
+ *   e.g. due to lack of resources or unsupported protocol.
+ *   For hardware interfaces this counter should not include packets
+ *   dropped by the device which are counted separately in
+ *   @rx_missed_errors (since procfs folds those two counters together).
+ *
+ * @tx_dropped: Number of packets dropped on their way to transmission,
+ *   e.g. due to lack of resources.
+ *
+ * @multicast: Multicast packets received.
+ *   For hardware interfaces this statistic is commonly calculated
+ *   at the device level (unlike @rx_packets) and therefore may include
+ *   packets which did not reach the host.
+ *
+ *   For Ethernet devices this counter may be equivalent to:
+ *
+ *    - 30.3.1.1.21 aMulticastFramesReceivedOK
+ *
+ * @collisions: Number of collisions during packet transmissions.
+ *
+ * @rx_length_errors: Number of packets dropped due to invalid length.
+ *   Part of aggregate "frame" errors in `/proc/net/dev`.
+ *
+ *   For Ethernet devices this counter should be equivalent to a sum
+ *   of the following attributes:
+ *
+ *    - 30.3.1.1.23 aInRangeLengthErrors
+ *    - 30.3.1.1.24 aOutOfRangeLengthField
+ *    - 30.3.1.1.25 aFrameTooLongErrors
+ *
+ * @rx_over_errors: Receiver FIFO overflow event counter.
+ *
+ *   Historically the count of overflow events. Such events may be
+ *   reported in the receive descriptors or via interrupts, and may
+ *   not correspond one-to-one with dropped packets.
+ *
+ *   The recommended interpretation for high speed interfaces is -
+ *   number of packets dropped because they did not fit into buffers
+ *   provided by the host, e.g. packets larger than MTU or next buffer
+ *   in the ring was not available for a scatter transfer.
+ *
+ *   Part of aggregate "frame" errors in `/proc/net/dev`.
+ *
+ *   This statistics was historically used interchangeably with
+ *   @rx_fifo_errors.
+ *
+ *   This statistic corresponds to hardware events and is not commonly used
+ *   on software devices.
+ *
+ * @rx_crc_errors: Number of packets received with a CRC error.
+ *   Part of aggregate "frame" errors in `/proc/net/dev`.
+ *
+ *   For Ethernet devices this counter must be equivalent to:
+ *
+ *    - 30.3.1.1.6 aFrameCheckSequenceErrors
+ *
+ * @rx_frame_errors: Receiver frame alignment errors.
+ *   Part of aggregate "frame" errors in `/proc/net/dev`.
+ *
+ *   For Ethernet devices this counter should be equivalent to:
+ *
+ *    - 30.3.1.1.7 aAlignmentErrors
+ *
+ * @rx_fifo_errors: Receiver FIFO error counter.
+ *
+ *   Historically the count of overflow events. Those events may be
+ *   reported in the receive descriptors or via interrupts, and may
+ *   not correspond one-to-one with dropped packets.
+ *
+ *   This statistics was used interchangeably with @rx_over_errors.
+ *   Not recommended for use in drivers for high speed interfaces.
+ *
+ *   This statistic is used on software devices, e.g. to count software
+ *   packet queue overflow (can) or sequencing errors (GRE).
+ *
+ * @rx_missed_errors: Count of packets missed by the host.
+ *   Folded into the "drop" counter in `/proc/net/dev`.
+ *
+ *   Counts number of packets dropped by the device due to lack
+ *   of buffer space. This usually indicates that the host interface
+ *   is slower than the network interface, or host is not keeping up
+ *   with the receive packet rate.
+ *
+ *   This statistic corresponds to hardware events and is not used
+ *   on software devices.
+ *
+ * @tx_aborted_errors:
+ *   Part of aggregate "carrier" errors in `/proc/net/dev`.
+ *   For Ethernet devices capable of half-duplex operation this counter
+ *   must be equivalent to:
+ *
+ *    - 30.3.1.1.11 aFramesAbortedDueToXSColls
+ *
+ *   High speed interfaces may use this counter as a general device
+ *   discard counter.
+ *
+ * @tx_carrier_errors: Number of frame transmission errors due to loss
+ *   of carrier during transmission.
+ *   Part of aggregate "carrier" errors in `/proc/net/dev`.
+ *
+ *   For Ethernet devices this counter must be equivalent to:
+ *
+ *    - 30.3.1.1.13 aCarrierSenseErrors
+ *
+ * @tx_fifo_errors: Number of frame transmission errors due to device
+ *   FIFO underrun / underflow. This condition occurs when the device
+ *   begins transmission of a frame but is unable to deliver the
+ *   entire frame to the transmitter in time for transmission.
+ *   Part of aggregate "carrier" errors in `/proc/net/dev`.
+ *
+ * @tx_heartbeat_errors: Number of Heartbeat / SQE Test errors for
+ *   old half-duplex Ethernet.
+ *   Part of aggregate "carrier" errors in `/proc/net/dev`.
+ *
+ *   Possibly equivalent to:
+ *
+ *    - 30.3.2.1.4 aSQETestErrors
+ *
+ * @tx_window_errors: Number of frame transmission errors due
+ *   to late collisions (for Ethernet - after the first 64B of the packet).
+ *   Part of aggregate "carrier" errors in `/proc/net/dev`.
+ *
+ *   For Ethernet devices this counter must be equivalent to:
+ *
+ *    - 30.3.1.1.10 aLateCollisions
+ *
+ * @rx_compressed: Number of correctly received compressed packets.
+ *   This counters is only meaningful for interfaces which support
+ *   packet compression (e.g. CSLIP, PPP).
+ *
+ * @tx_compressed: Number of transmitted compressed packets.
+ *   This counters is only meaningful for interfaces which support
+ *   packet compression (e.g. CSLIP, PPP).
+ *
+ * @rx_nohandler: Number of packets received on the interface
+ *   but dropped by the networking stack because the device is
+ *   not designated to receive packets (e.g. backup link in a bond).
+ */
 struct rtnl_link_stats64 {
-	__u64	rx_packets;		/* total packets received	*/
-	__u64	tx_packets;		/* total packets transmitted	*/
-	__u64	rx_bytes;		/* total bytes received 	*/
-	__u64	tx_bytes;		/* total bytes transmitted	*/
-	__u64	rx_errors;		/* bad packets received		*/
-	__u64	tx_errors;		/* packet transmit problems	*/
-	__u64	rx_dropped;		/* no space in linux buffers	*/
-	__u64	tx_dropped;		/* no space available in linux	*/
-	__u64	multicast;		/* multicast packets received	*/
+	__u64	rx_packets;
+	__u64	tx_packets;
+	__u64	rx_bytes;
+	__u64	tx_bytes;
+	__u64	rx_errors;
+	__u64	tx_errors;
+	__u64	rx_dropped;
+	__u64	tx_dropped;
+	__u64	multicast;
 	__u64	collisions;
 
 	/* detailed rx_errors: */
 	__u64	rx_length_errors;
-	__u64	rx_over_errors;		/* receiver ring buff overflow	*/
-	__u64	rx_crc_errors;		/* recved pkt with crc error	*/
-	__u64	rx_frame_errors;	/* recv'd frame alignment error */
-	__u64	rx_fifo_errors;		/* recv'r fifo overrun		*/
-	__u64	rx_missed_errors;	/* receiver missed packet	*/
+	__u64	rx_over_errors;
+	__u64	rx_crc_errors;
+	__u64	rx_frame_errors;
+	__u64	rx_fifo_errors;
+	__u64	rx_missed_errors;
 
 	/* detailed tx_errors */
 	__u64	tx_aborted_errors;
@@ -71,8 +236,7 @@  struct rtnl_link_stats64 {
 	/* for cslip etc */
 	__u64	rx_compressed;
 	__u64	tx_compressed;
-
-	__u64	rx_nohandler;		/* dropped, no handler found	*/
+	__u64	rx_nohandler;
 };
 
 /* The struct should be in sync with struct ifmap */