Message ID | 20250106135606.9704-1-yoong.siang.song@intel.com |
---|---|
State | Superseded |
Series | [bpf-next,v4,1/4] xsk: Add launch time hardware offload support to XDP Tx metadata |
On Wednesday, January 8, 2025 12:50 AM, Stanislav Fomichev <stfomichev@gmail.com> wrote:
>On 01/06, Song Yoong Siang wrote:
>> Extend the XDP Tx metadata framework so that user can requests launch time
>> hardware offload, where the Ethernet device will schedule the packet for
>> transmission at a pre-determined time called launch time. The value of
>> launch time is communicated from user space to Ethernet driver via
>> launch_time field of struct xsk_tx_metadata.
>>
>> Suggested-by: Stanislav Fomichev <sdf@google.com>

Hi Stanislav Fomichev,

Thanks for your review comments.
I notice that you have two emails:
sdf@google.com & stfomichev@gmail.com

Which one should I use in the Suggested-by tag?

>> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
>> ---
>>  Documentation/netlink/specs/netdev.yaml      |  4 ++
>>  Documentation/networking/xsk-tx-metadata.rst | 64 ++++++++++++++++++++
>>  include/net/xdp_sock.h                       | 10 +++
>>  include/net/xdp_sock_drv.h                   |  1 +
>>  include/uapi/linux/if_xdp.h                  | 10 +++
>>  include/uapi/linux/netdev.h                  |  3 +
>>  net/core/netdev-genl.c                       |  2 +
>>  net/xdp/xsk.c                                |  3 +
>>  tools/include/uapi/linux/if_xdp.h            | 10 +++
>>  tools/include/uapi/linux/netdev.h            |  3 +
>>  10 files changed, 110 insertions(+)
>>
>> diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
>> index cbb544bd6c84..e59c8a14f7d1 100644
>> --- a/Documentation/netlink/specs/netdev.yaml
>> +++ b/Documentation/netlink/specs/netdev.yaml
>> @@ -70,6 +70,10 @@ definitions:
>>          name: tx-checksum
>>          doc:
>>            L3 checksum HW offload is supported by the driver.
>> +      -
>> +        name: tx-launch-time
>> +        doc:
>> +          Launch time HW offload is supported by the driver.
>>    -
>>      name: queue-type
>>      type: enum
>> diff --git a/Documentation/networking/xsk-tx-metadata.rst b/Documentation/networking/xsk-tx-metadata.rst
>> index e76b0cfc32f7..3cec089747ce 100644
>> --- a/Documentation/networking/xsk-tx-metadata.rst
>> +++ b/Documentation/networking/xsk-tx-metadata.rst
>> @@ -50,6 +50,10 @@ The flags field enables the particular offload:
>>    checksum. ``csum_start`` specifies byte offset of where the checksumming
>>    should start and ``csum_offset`` specifies byte offset where the
>>    device should store the computed checksum.
>> +- ``XDP_TXMD_FLAGS_LAUNCH_TIME``: requests the device to schedule the
>> +  packet for transmission at a pre-determined time called launch time. The
>> +  value of launch time is indicated by ``launch_time`` field of
>> +  ``union xsk_tx_metadata``.
>>
>>  Besides the flags above, in order to trigger the offloads, the first
>>  packet's ``struct xdp_desc`` descriptor should set ``XDP_TX_METADATA``
>> @@ -65,6 +69,65 @@ In this case, when running in ``XDK_COPY`` mode, the TX checksum
>>  is calculated on the CPU. Do not enable this option in production because
>>  it will negatively affect performance.
>>
>> +Launch Time
>> +===========
>> +
>> +The value of the requested launch time should be based on the device's PTP
>> +Hardware Clock (PHC) to ensure accuracy. AF_XDP takes a different data path
>> +compared to the ETF queuing discipline, which organizes packets and delays
>> +their transmission. Instead, AF_XDP immediately hands off the packets to
>> +the device driver without rearranging their order or holding them prior to
>> +transmission. In scenarios where the launch time offload feature is
>> +disabled, the device driver is expected to disregard the launch time
>> +request. For correct interpretation and meaningful operation, the launch
>> +time should never be set to a value larger than the farthest programmable
>> +time in the future (the horizon). Different devices have different hardware
>> +limitations on the launch time offload feature.
>> +
>> +stmmac driver
>> +-------------
>> +
>> +For stmmac, TSO and launch time (TBS) features are mutually exclusive for
>> +each individual Tx Queue. By default, the driver configures Tx Queue 0 to
>> +support TSO and the rest of the Tx Queues to support TBS. The launch time
>> +hardware offload feature can be enabled or disabled by using the tc-etf
>> +command to call the driver's ndo_setup_tc() callback.
>> +
>> +The value of the launch time that is programmed in the Enhanced Normal
>> +Transmit Descriptors is a 32-bit value, where the most significant 8 bits
>> +represent the time in seconds and the remaining 24 bits represent the time
>> +in 256 ns increments. The programmed launch time is compared against the
>> +PTP time (bits[39:8]) and rolls over after 256 seconds. Therefore, the
>> +horizon of the launch time for dwmac4 and dwxlgmac2 is 128 seconds in the
>> +future.
>> +
>> +The stmmac driver maintains FIFO behavior and does not perform packet
>> +reordering. This means that a packet with a launch time request will block
>> +other packets in the same Tx Queue until it is transmitted.
>> +
>> +igc driver
>> +----------
>> +
>> +For igc, all four Tx Queues support the launch time feature. The launch
>> +time hardware offload feature can be enabled or disabled by using the
>> +tc-etf command to call the driver's ndo_setup_tc() callback. When entering
>> +TSN mode, the igc driver will reset the device and create a default Qbv
>> +schedule with a 1-second cycle time, with all Tx Queues open at all times.
>> +
>> +The value of the launch time that is programmed in the Advanced Transmit
>> +Context Descriptor is a relative offset to the starting time of the Qbv
>> +transmission window of the queue. The Frst flag of the descriptor can be
>> +set to schedule the packet for the next Qbv cycle. Therefore, the horizon
>> +of the launch time for i225 and i226 is the ending time of the next cycle
>> +of the Qbv transmission window of the queue. For example, when the Qbv
>> +cycle time is set to 1 second, the horizon of the launch time ranges
>> +from 1 second to 2 seconds, depending on where the Qbv cycle is currently
>> +running.
>> +
>> +The igc driver maintains FIFO behavior and does not perform packet
>> +reordering. This means that a packet with a launch time request will block
>> +other packets in the same Tx Queue until it is transmitted.
>
>Since two devices we initially support are using FIFO mode, should we more
>explicitly target this case? Maybe even call netdev features
>tx-launch-time-fifo? In the future, if/when we get support timing-wheel-like
>queues, we can export another tx-launch-time-wheel?
>
>It seems important for the userspace to know which mode it's running.
>In a fifo mode, it might make sense to allocate separate queues
>for scheduling things far into the future/etc.

You are right: users should isolate one queue for scheduling things far into
the future and use other queues for normal traffic.

>
>Thoughts? No code changes required, just more explicitly state the
>expectations.

Agreed. I will change the name from tx-launch-time to tx-launch-time-fifo
to state the FIFO behavior explicitly.

Thanks & Regards
Siang
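The head-of-line blocking behind the tx-launch-time-fifo rename, and the advice to dedicate a queue to far-future packets, can be illustrated with a toy model. This is a minimal sketch under stated assumptions: a single FIFO Tx queue, every packet already queued at t = 0, zero transmission time, and a launch time of 0 meaning "as soon as possible". fifo_departures() is a hypothetical helper, not stmmac or igc code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model of one FIFO Tx queue with launch time (hypothetical helper, not
 * stmmac or igc code). All packets are queued at t = 0, transmission takes
 * no time, and launch_ns[i] == 0 means "send as soon as possible".
 */
static void fifo_departures(const uint64_t *launch_ns, uint64_t *depart_ns,
			    size_t n)
{
	uint64_t prev = 0; /* departure time of the packet ahead */

	for (size_t i = 0; i < n; i++) {
		/* No reordering: a packet never leaves before the one ahead
		 * of it, even if its own launch time is earlier.
		 */
		uint64_t t = launch_ns[i] > prev ? launch_ns[i] : prev;

		depart_ns[i] = t;
		prev = t;
	}
}
```

Feeding {0, 5 s, 0, 0} through one queue yields departures {0, 5 s, 5 s, 5 s}: the two best-effort packets behind the far-future one are held for 5 seconds. Putting the 5 s packet on its own queue and the best-effort packets on another gives {0, 0, 0} for the latter, which is the separation discussed in the thread.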
On 01/09, Song, Yoong Siang wrote:
> On Wednesday, January 8, 2025 12:50 AM, Stanislav Fomichev <stfomichev@gmail.com> wrote:
> >On 01/06, Song Yoong Siang wrote:
> >> Extend the XDP Tx metadata framework so that user can requests launch time
> >> hardware offload, where the Ethernet device will schedule the packet for
> >> transmission at a pre-determined time called launch time. The value of
> >> launch time is communicated from user space to Ethernet driver via
> >> launch_time field of struct xsk_tx_metadata.
> >>
> >> Suggested-by: Stanislav Fomichev <sdf@google.com>
>
> Hi Stanislav Fomichev,
>
> Thanks for your review comments.
> I notice that you have two emails:
> sdf@google.com & stfomichev@gmail.com
>
> Which one I should use in the suggested-by tag?

google.com should be bouncing now. sdf@fomichev.me is preferred.
diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index cbb544bd6c84..e59c8a14f7d1 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -70,6 +70,10 @@ definitions:
         name: tx-checksum
         doc:
           L3 checksum HW offload is supported by the driver.
+      -
+        name: tx-launch-time
+        doc:
+          Launch time HW offload is supported by the driver.
   -
     name: queue-type
     type: enum
diff --git a/Documentation/networking/xsk-tx-metadata.rst b/Documentation/networking/xsk-tx-metadata.rst
index e76b0cfc32f7..3cec089747ce 100644
--- a/Documentation/networking/xsk-tx-metadata.rst
+++ b/Documentation/networking/xsk-tx-metadata.rst
@@ -50,6 +50,10 @@ The flags field enables the particular offload:
   checksum. ``csum_start`` specifies byte offset of where the checksumming
   should start and ``csum_offset`` specifies byte offset where the
   device should store the computed checksum.
+- ``XDP_TXMD_FLAGS_LAUNCH_TIME``: requests the device to schedule the
+  packet for transmission at a pre-determined time called launch time. The
+  value of launch time is indicated by ``launch_time`` field of
+  ``union xsk_tx_metadata``.

 Besides the flags above, in order to trigger the offloads, the first
 packet's ``struct xdp_desc`` descriptor should set ``XDP_TX_METADATA``
@@ -65,6 +69,65 @@ In this case, when running in ``XDK_COPY`` mode, the TX checksum
 is calculated on the CPU. Do not enable this option in production because
 it will negatively affect performance.

+Launch Time
+===========
+
+The value of the requested launch time should be based on the device's PTP
+Hardware Clock (PHC) to ensure accuracy. AF_XDP takes a different data path
+compared to the ETF queuing discipline, which organizes packets and delays
+their transmission. Instead, AF_XDP immediately hands off the packets to
+the device driver without rearranging their order or holding them prior to
+transmission. In scenarios where the launch time offload feature is
+disabled, the device driver is expected to disregard the launch time
+request. For correct interpretation and meaningful operation, the launch
+time should never be set to a value larger than the farthest programmable
+time in the future (the horizon). Different devices have different hardware
+limitations on the launch time offload feature.
+
+stmmac driver
+-------------
+
+For stmmac, TSO and launch time (TBS) features are mutually exclusive for
+each individual Tx Queue. By default, the driver configures Tx Queue 0 to
+support TSO and the rest of the Tx Queues to support TBS. The launch time
+hardware offload feature can be enabled or disabled by using the tc-etf
+command to call the driver's ndo_setup_tc() callback.
+
+The value of the launch time that is programmed in the Enhanced Normal
+Transmit Descriptors is a 32-bit value, where the most significant 8 bits
+represent the time in seconds and the remaining 24 bits represent the time
+in 256 ns increments. The programmed launch time is compared against the
+PTP time (bits[39:8]) and rolls over after 256 seconds. Therefore, the
+horizon of the launch time for dwmac4 and dwxlgmac2 is 128 seconds in the
+future.
+
+The stmmac driver maintains FIFO behavior and does not perform packet
+reordering. This means that a packet with a launch time request will block
+other packets in the same Tx Queue until it is transmitted.
+
+igc driver
+----------
+
+For igc, all four Tx Queues support the launch time feature. The launch
+time hardware offload feature can be enabled or disabled by using the
+tc-etf command to call the driver's ndo_setup_tc() callback. When entering
+TSN mode, the igc driver will reset the device and create a default Qbv
+schedule with a 1-second cycle time, with all Tx Queues open at all times.
+
+The value of the launch time that is programmed in the Advanced Transmit
+Context Descriptor is a relative offset to the starting time of the Qbv
+transmission window of the queue. The Frst flag of the descriptor can be
+set to schedule the packet for the next Qbv cycle. Therefore, the horizon
+of the launch time for i225 and i226 is the ending time of the next cycle
+of the Qbv transmission window of the queue. For example, when the Qbv
+cycle time is set to 1 second, the horizon of the launch time ranges
+from 1 second to 2 seconds, depending on where the Qbv cycle is currently
+running.
+
+The igc driver maintains FIFO behavior and does not perform packet
+reordering. This means that a packet with a launch time request will block
+other packets in the same Tx Queue until it is transmitted.
+
 Querying Device Capabilities
 ============================

@@ -74,6 +137,7 @@ Refer to ``xsk-flags`` features bitmask in

 - ``tx-timestamp``: device supports ``XDP_TXMD_FLAGS_TIMESTAMP``
 - ``tx-checksum``: device supports ``XDP_TXMD_FLAGS_CHECKSUM``
+- ``tx-launch-time``: device supports ``XDP_TXMD_FLAGS_LAUNCH_TIME``

 See ``tools/net/ynl/samples/netdev.c`` on how to query this information.

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index bfe625b55d55..a58ae7589d12 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -110,11 +110,16 @@ struct xdp_sock {
  *     indicates position where checksumming should start.
  *     csum_offset indicates position where checksum should be stored.
  *
+ * void (*tmo_request_launch_time)(u64 launch_time, void *priv)
+ *     Called when AF_XDP frame requested launch time HW offload support.
+ *     launch_time indicates the PTP time at which the device can schedule the
+ *     packet for transmission.
  */
 struct xsk_tx_metadata_ops {
 	void	(*tmo_request_timestamp)(void *priv);
 	u64	(*tmo_fill_timestamp)(void *priv);
 	void	(*tmo_request_checksum)(u16 csum_start, u16 csum_offset, void *priv);
+	void	(*tmo_request_launch_time)(u64 launch_time, void *priv);
 };

 #ifdef CONFIG_XDP_SOCKETS
@@ -162,6 +167,11 @@ static inline void xsk_tx_metadata_request(const struct xsk_tx_metadata *meta,
 	if (!meta)
 		return;

+	if (ops->tmo_request_launch_time)
+		if (meta->flags & XDP_TXMD_FLAGS_LAUNCH_TIME)
+			ops->tmo_request_launch_time(meta->request.launch_time,
+						     priv);
+
 	if (ops->tmo_request_timestamp)
 		if (meta->flags & XDP_TXMD_FLAGS_TIMESTAMP)
 			ops->tmo_request_timestamp(priv);
diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 40085afd9160..78af371bc002 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -198,6 +198,7 @@ static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr)
 #define XDP_TXMD_FLAGS_VALID ( \
 		XDP_TXMD_FLAGS_TIMESTAMP | \
 		XDP_TXMD_FLAGS_CHECKSUM | \
+		XDP_TXMD_FLAGS_LAUNCH_TIME | \
 	0)

 static inline bool xsk_buff_valid_tx_metadata(struct xsk_tx_metadata *meta)
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index 42ec5ddaab8d..42869770776e 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -127,6 +127,12 @@ struct xdp_options {
  */
 #define XDP_TXMD_FLAGS_CHECKSUM			(1 << 1)

+/* Request launch time hardware offload. The device will schedule the packet for
+ * transmission at a pre-determined time called launch time. The value of
+ * launch time is communicated via launch_time field of struct xsk_tx_metadata.
+ */
+#define XDP_TXMD_FLAGS_LAUNCH_TIME		(1 << 2)
+
 /* AF_XDP offloads request. 'request' union member is consumed by the driver
  * when the packet is being transmitted. 'completion' union member is
  * filled by the driver when the transmit completion arrives.
@@ -142,6 +148,10 @@ struct xsk_tx_metadata {
 			__u16 csum_start;
 			/* Offset from csum_start where checksum should be stored. */
 			__u16 csum_offset;
+
+			/* XDP_TXMD_FLAGS_LAUNCH_TIME */
+			/* Launch time in nanosecond against the PTP HW Clock */
+			__u64 launch_time;
 		} request;

 		struct {
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index e4be227d3ad6..5ab85f4af009 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -59,10 +59,13 @@ enum netdev_xdp_rx_metadata {
  *   by the driver.
  * @NETDEV_XSK_FLAGS_TX_CHECKSUM: L3 checksum HW offload is supported by the
  *   driver.
+ * @NETDEV_XSK_FLAGS_LAUNCH_TIME: Launch Time HW offload is supported by the
+ *   driver.
  */
 enum netdev_xsk_flags {
 	NETDEV_XSK_FLAGS_TX_TIMESTAMP = 1,
 	NETDEV_XSK_FLAGS_TX_CHECKSUM = 2,
+	NETDEV_XSK_FLAGS_LAUNCH_TIME = 4,
 };

 enum netdev_queue_type {
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 9527dd46e4dc..e2515cf9190f 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -52,6 +52,8 @@ XDP_METADATA_KFUNC_xxx
 			xsk_features |= NETDEV_XSK_FLAGS_TX_TIMESTAMP;
 		if (netdev->xsk_tx_metadata_ops->tmo_request_checksum)
 			xsk_features |= NETDEV_XSK_FLAGS_TX_CHECKSUM;
+		if (netdev->xsk_tx_metadata_ops->tmo_request_launch_time)
+			xsk_features |= NETDEV_XSK_FLAGS_LAUNCH_TIME;
 	}

 	if (nla_put_u32(rsp, NETDEV_A_DEV_IFINDEX, netdev->ifindex) ||
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 3fa70286c846..8feaa0e86f07 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -743,6 +743,9 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
 						goto free_err;
 				}
 			}
+
+			if (meta->flags & XDP_TXMD_FLAGS_LAUNCH_TIME)
+				skb->skb_mstamp_ns = meta->request.launch_time;
 		}
 	}

diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
index 2f082b01ff22..67719f8966c2 100644
--- a/tools/include/uapi/linux/if_xdp.h
+++ b/tools/include/uapi/linux/if_xdp.h
@@ -127,6 +127,12 @@ struct xdp_options {
  */
 #define XDP_TXMD_FLAGS_CHECKSUM			(1 << 1)

+/* Request launch time hardware offload. The device will schedule the packet for
+ * transmission at a pre-determined time called launch time. The value of
+ * launch time is communicated via launch_time field of struct xsk_tx_metadata.
+ */
+#define XDP_TXMD_FLAGS_LAUNCH_TIME		(1 << 2)
+
 /* AF_XDP offloads request. 'request' union member is consumed by the driver
  * when the packet is being transmitted. 'completion' union member is
  * filled by the driver when the transmit completion arrives.
@@ -142,6 +148,10 @@ struct xsk_tx_metadata {
 			__u16 csum_start;
 			/* Offset from csum_start where checksum should be stored. */
 			__u16 csum_offset;
+
+			/* XDP_TXMD_FLAGS_LAUNCH_TIME */
+			/* Launch time in nanosecond against the PTP HW Clock */
+			__u64 launch_time;
 		} request;

 		struct {
diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h
index e4be227d3ad6..5ab85f4af009 100644
--- a/tools/include/uapi/linux/netdev.h
+++ b/tools/include/uapi/linux/netdev.h
@@ -59,10 +59,13 @@ enum netdev_xdp_rx_metadata {
  *   by the driver.
  * @NETDEV_XSK_FLAGS_TX_CHECKSUM: L3 checksum HW offload is supported by the
  *   driver.
+ * @NETDEV_XSK_FLAGS_LAUNCH_TIME: Launch Time HW offload is supported by the
+ *   driver.
  */
 enum netdev_xsk_flags {
 	NETDEV_XSK_FLAGS_TX_TIMESTAMP = 1,
 	NETDEV_XSK_FLAGS_TX_CHECKSUM = 2,
+	NETDEV_XSK_FLAGS_LAUNCH_TIME = 4,
 };

 enum netdev_queue_type {
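As a worked example of the stmmac paragraph in the documentation hunk, the sketch packs a PHC time in nanoseconds into the 32-bit layout that paragraph describes (8 bits of seconds, 24 bits of 256 ns units) and applies the 128-second horizon. stmmac_pack_launch_time() and stmmac_within_horizon() are hypothetical helpers written from the prose only; the real driver may split the value across descriptor words differently, and the low 8 bits of the nanoseconds are simply dropped here.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
/* Horizon given in the documentation for dwmac4 and dwxlgmac2. */
#define STMMAC_LT_HORIZON_NS (128ULL * NSEC_PER_SEC)

/*
 * Hypothetical helper: pack a PHC time in nanoseconds into the 32-bit layout
 * described in the documentation:
 *   bits [31:24] seconds, modulo 256 (hence the 256 s rollover)
 *   bits [23:0]  nanoseconds in 256 ns units
 */
static uint32_t stmmac_pack_launch_time(uint64_t phc_ns)
{
	uint32_t sec = (uint32_t)((phc_ns / NSEC_PER_SEC) & 0xff);
	uint32_t nsec = (uint32_t)(phc_ns % NSEC_PER_SEC);

	/* nsec < 10^9 < 2^30, so nsec >> 8 always fits in 24 bits. */
	return (sec << 24) | (nsec >> 8);
}

/*
 * Hypothetical helper: only checks the upper bound stated in the text, i.e.
 * that the launch time is no more than 128 s after the current PHC time.
 */
static bool stmmac_within_horizon(uint64_t launch_ns, uint64_t now_ns)
{
	return launch_ns <= now_ns + STMMAC_LT_HORIZON_NS;
}
```

For example, 1.5 s packs to 0x011DCD65, while 1 s and 257 s both pack to 0x01000000, which is the 256-second rollover the documentation mentions. The 128-second horizon is half of that wrap period.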
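The igc horizon in the same documentation hunk can be checked with a similarly simplified model. This sketch assumes a fixed Qbv cycle that started at base_ns, that now_ns >= base_ns, and that the queue's window is open for the whole cycle (the default schedule described for igc). A plain next_cycle flag stands in for the descriptor's Frst flag. struct qbv_sched, igc_horizon() and igc_launch_offset() are hypothetical names, and the real driver may treat edge cases, such as launch times already in the past, differently.

```c
#include <stdint.h>

/*
 * Simplified model of the igc launch time horizon (hypothetical names, not
 * igc driver code). Assumes a fixed Qbv cycle that started at base_ns, that
 * now_ns >= base_ns, and that the queue's window is open for the whole cycle.
 * All times are PHC nanoseconds.
 */
struct qbv_sched {
	uint64_t base_ns;  /* start of the first Qbv cycle */
	uint64_t cycle_ns; /* Qbv cycle time, e.g. 1 s */
};

/* Start of the Qbv cycle that contains now_ns. */
static uint64_t qbv_cycle_start(const struct qbv_sched *s, uint64_t now_ns)
{
	return s->base_ns + (now_ns - s->base_ns) / s->cycle_ns * s->cycle_ns;
}

/* Horizon: the end of the cycle after the current one. */
static uint64_t igc_horizon(const struct qbv_sched *s, uint64_t now_ns)
{
	return qbv_cycle_start(s, now_ns) + 2 * s->cycle_ns;
}

/*
 * Express launch_ns as an offset into the current cycle, or into the next
 * one with *next_cycle set (standing in for the descriptor's Frst flag).
 * Returns -1 if launch_ns is before the current cycle or at/after the
 * horizon.
 */
static int igc_launch_offset(const struct qbv_sched *s, uint64_t now_ns,
			     uint64_t launch_ns, uint64_t *offset_ns,
			     int *next_cycle)
{
	uint64_t start = qbv_cycle_start(s, now_ns);

	if (launch_ns < start || launch_ns >= igc_horizon(s, now_ns))
		return -1;

	*next_cycle = launch_ns >= start + s->cycle_ns;
	*offset_ns = launch_ns - start - (*next_cycle ? s->cycle_ns : 0);
	return 0;
}
```

With a 1 s cycle starting at 0, igc_horizon() returns 2 s when now is 0.25 s (1.75 s of headroom) and 3 s when now is exactly 1 s (2 s of headroom). This is the "1 second to 2 seconds" range given in the documentation.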
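The kernel-side hunks follow one pattern:

- A driver opts in by filling in tmo_request_launch_time.
- netdev-genl turns that op into the NETDEV_XSK_FLAGS_LAUNCH_TIME feature bit.
- xsk_buff_valid_tx_metadata() rejects flag bits outside XDP_TXMD_FLAGS_VALID.
- xsk_tx_metadata_request() calls the op only when the flag is set.

The userspace C model mirrors that flow with demo_-prefixed stand-ins. It is an illustrative sketch that does not use kernel headers, and demo_drv_launch_time() is a hypothetical driver op that only records the value it receives.

```c
#include <stdbool.h>
#include <stdint.h>

/* demo_* names mirror, but are not, the kernel definitions in the diff. */
#define DEMO_TXMD_FLAGS_TIMESTAMP   (1u << 0)
#define DEMO_TXMD_FLAGS_CHECKSUM    (1u << 1)
#define DEMO_TXMD_FLAGS_LAUNCH_TIME (1u << 2)
#define DEMO_TXMD_FLAGS_VALID (DEMO_TXMD_FLAGS_TIMESTAMP | \
			       DEMO_TXMD_FLAGS_CHECKSUM | \
			       DEMO_TXMD_FLAGS_LAUNCH_TIME)

#define DEMO_XSK_FLAGS_TX_TIMESTAMP 1u /* NETDEV_XSK_FLAGS_TX_TIMESTAMP */
#define DEMO_XSK_FLAGS_LAUNCH_TIME  4u /* NETDEV_XSK_FLAGS_LAUNCH_TIME */

struct demo_meta {
	uint64_t flags;
	uint64_t launch_time; /* ns, against the PHC */
};

struct demo_ops {
	void (*request_timestamp)(void *priv);
	void (*request_launch_time)(uint64_t launch_time, void *priv);
};

/* Like xsk_buff_valid_tx_metadata(): reject any unknown flag bit. */
static bool demo_valid(const struct demo_meta *m)
{
	return !(m->flags & ~(uint64_t)DEMO_TXMD_FLAGS_VALID);
}

/* Like the netdev-genl hunk: a feature is advertised iff its op exists. */
static uint32_t demo_features(const struct demo_ops *ops)
{
	uint32_t f = 0;

	if (ops->request_timestamp)
		f |= DEMO_XSK_FLAGS_TX_TIMESTAMP;
	if (ops->request_launch_time)
		f |= DEMO_XSK_FLAGS_LAUNCH_TIME;
	return f;
}

/* Like xsk_tx_metadata_request(): call an op only if set and requested. */
static void demo_request(const struct demo_meta *m, const struct demo_ops *ops,
			 void *priv)
{
	if (!m)
		return;
	if (ops->request_launch_time &&
	    (m->flags & DEMO_TXMD_FLAGS_LAUNCH_TIME))
		ops->request_launch_time(m->launch_time, priv);
	if (ops->request_timestamp && (m->flags & DEMO_TXMD_FLAGS_TIMESTAMP))
		ops->request_timestamp(priv);
}

/* Hypothetical driver op: just records the requested launch time. */
static void demo_drv_launch_time(uint64_t launch_time, void *priv)
{
	*(uint64_t *)priv = launch_time;
}
```

A driver that leaves request_launch_time unset never advertises the feature and never sees the request, which is why the capability query is a reliable signal for user space.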
Extend the XDP Tx metadata framework so that users can request launch time
hardware offload, where the Ethernet device will schedule the packet for
transmission at a pre-determined time called launch time. The value of
launch time is communicated from user space to the Ethernet driver via the
launch_time field of struct xsk_tx_metadata.

Suggested-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
---
 Documentation/netlink/specs/netdev.yaml      |  4 ++
 Documentation/networking/xsk-tx-metadata.rst | 64 ++++++++++++++++++++
 include/net/xdp_sock.h                       | 10 +++
 include/net/xdp_sock_drv.h                   |  1 +
 include/uapi/linux/if_xdp.h                  | 10 +++
 include/uapi/linux/netdev.h                  |  3 +
 net/core/netdev-genl.c                       |  2 +
 net/xdp/xsk.c                                |  3 +
 tools/include/uapi/linux/if_xdp.h            | 10 +++
 tools/include/uapi/linux/netdev.h            |  3 +
 10 files changed, 110 insertions(+)
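From the application side, the commit message combined with the existing XDP_TX_METADATA rule in xsk-tx-metadata.rst comes down to three steps:

1. Write the launch time (in PHC nanoseconds) into request.launch_time.
2. Set XDP_TXMD_FLAGS_LAUNCH_TIME in the metadata flags.
3. Set XDP_TX_METADATA in the descriptor options.

This sketch assumes the UMEM was registered with tx_metadata_len bytes of metadata space in front of each packet. It uses demo_-prefixed local mirrors of the uapi definitions so it builds against headers that predate this series; demo_request_launch_time() is a hypothetical helper. Reading the PHC (for example with clock_gettime() on the clock of the /dev/ptpN device) and ring handling are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of definitions from include/uapi/linux/if_xdp.h (with this
 * series applied), so the sketch builds against older headers.
 */
#define DEMO_XDP_TXMD_FLAGS_LAUNCH_TIME (1 << 2) /* this patch */
#define DEMO_XDP_TX_METADATA            (1 << 1) /* xdp_desc.options bit */

struct demo_xsk_tx_metadata {
	uint64_t flags;
	union {
		struct {
			uint16_t csum_start;
			uint16_t csum_offset;
			uint64_t launch_time; /* ns, against the device PHC */
		} request;
		struct {
			uint64_t tx_timestamp;
		} completion;
	};
};

struct demo_xdp_desc {
	uint64_t addr;
	uint32_t len;
	uint32_t options;
};

/*
 * Hypothetical helper: request a launch time for the packet at desc->addr.
 * Assumes the UMEM was registered with tx_metadata_len >=
 * sizeof(struct demo_xsk_tx_metadata), so the metadata sits at
 * desc->addr - tx_metadata_len inside the UMEM area.
 */
static void demo_request_launch_time(void *umem, struct demo_xdp_desc *desc,
				     size_t tx_metadata_len, uint64_t launch_ns)
{
	struct demo_xsk_tx_metadata *meta = (struct demo_xsk_tx_metadata *)
		((uint8_t *)umem + desc->addr - tx_metadata_len);

	memset(meta, 0, sizeof(*meta));
	meta->flags = DEMO_XDP_TXMD_FLAGS_LAUNCH_TIME;
	meta->request.launch_time = launch_ns;
	/* Tell the kernel this descriptor carries valid Tx metadata. */
	desc->options |= DEMO_XDP_TX_METADATA;
}
```

On a kernel and headers that include this series, the demo_ mirrors would be replaced by struct xsk_tx_metadata, struct xdp_desc, XDP_TXMD_FLAGS_LAUNCH_TIME and XDP_TX_METADATA from <linux/if_xdp.h>.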