Message ID | 20210811213749.3276687-1-kuba@kernel.org
---|---
Series | bnxt: Tx NAPI disabling resiliency improvements
On Wed, 11 Aug 2021 15:36:34 -0700 Michael Chan wrote:
> On Wed, Aug 11, 2021 at 2:38 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > ---
> > v2: - netdev_warn() -> netif_warn() [Edwin]
> >     - use correct prod value [Michael]
> > ---
> >  drivers/net/ethernet/broadcom/bnxt/bnxt.c | 36 +++++++++++++++--------
> >  drivers/net/ethernet/broadcom/bnxt/bnxt.h |  1 +
> >  2 files changed, 25 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > index 52f5c8405e76..79bbd6ec7ef7 100644
> > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > @@ -72,7 +72,8 @@
> >  #include "bnxt_debugfs.h"
> >
> >  #define BNXT_TX_TIMEOUT        (5 * HZ)
> > -#define BNXT_DEF_MSG_ENABLE    (NETIF_MSG_DRV | NETIF_MSG_HW)
> > +#define BNXT_DEF_MSG_ENABLE    (NETIF_MSG_DRV | NETIF_MSG_HW | \
> > +                                NETIF_MSG_TX_ERR)
> >
> >  MODULE_LICENSE("GPL");
> >  MODULE_DESCRIPTION("Broadcom BCM573xx network driver");
> > @@ -367,6 +368,13 @@ static u16 bnxt_xmit_get_cfa_action(struct sk_buff *skb)
> >          return md_dst->u.port_info.port_id;
> >  }
> >
> > +static void bnxt_txr_db_kick(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
> > +                             u16 prod)
> > +{
> > +        bnxt_db_write(bp, &txr->tx_db, prod);
> > +        txr->kick_pending = 0;
> > +}
> > +
> >  static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  {
> >          struct bnxt *bp = netdev_priv(dev);
> > @@ -396,6 +404,8 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >          free_size = bnxt_tx_avail(bp, txr);
> >          if (unlikely(free_size < skb_shinfo(skb)->nr_frags + 2)) {
> >                  netif_tx_stop_queue(txq);
> > +                if (net_ratelimit() && txr->kick_pending)
> > +                        netif_warn(bp, tx_err, dev, "bnxt: ring busy!\n");
>
> You forgot to remove this.

I changed my mind. I added the && txr->kick_pending to the condition,
if there is a race and napi starts the queue unnecessarily the kick
can't be pending.

> >                  return NETDEV_TX_BUSY;
> >          }
> >
> > @@ -516,21 +526,16 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  normal_tx:
> >          if (length < BNXT_MIN_PKT_SIZE) {
> >                  pad = BNXT_MIN_PKT_SIZE - length;
> > -                if (skb_pad(skb, pad)) {
> > +                if (skb_pad(skb, pad))
> >                          /* SKB already freed. */
> > -                        tx_buf->skb = NULL;
> > -                        return NETDEV_TX_OK;
> > -                }
> > +                        goto tx_kick_pending;
> >                  length = BNXT_MIN_PKT_SIZE;
> >          }
> >
> >          mapping = dma_map_single(&pdev->dev, skb->data, len, DMA_TO_DEVICE);
> >
> > -        if (unlikely(dma_mapping_error(&pdev->dev, mapping))) {
> > -                dev_kfree_skb_any(skb);
> > -                tx_buf->skb = NULL;
> > -                return NETDEV_TX_OK;
> > -        }
> > +        if (unlikely(dma_mapping_error(&pdev->dev, mapping)))
> > +                goto tx_free;
> >
> >          dma_unmap_addr_set(tx_buf, mapping, mapping);
> >          flags = (len << TX_BD_LEN_SHIFT) | TX_BD_TYPE_LONG_TX_BD |
> > @@ -617,13 +622,15 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >          txr->tx_prod = prod;
> >
> >          if (!netdev_xmit_more() || netif_xmit_stopped(txq))
> > -                bnxt_db_write(bp, &txr->tx_db, prod);
> > +                bnxt_txr_db_kick(bp, txr, prod);
> > +        else
> > +                txr->kick_pending = 1;
> >
> >  tx_done:
> >
> >          if (unlikely(bnxt_tx_avail(bp, txr) <= MAX_SKB_FRAGS + 1)) {
> >                  if (netdev_xmit_more() && !tx_buf->is_push)
> > -                        bnxt_db_write(bp, &txr->tx_db, prod);
> > +                        bnxt_txr_db_kick(bp, txr, prod);
> >
> >                  netif_tx_stop_queue(txq);
> >
> > @@ -661,7 +668,12 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >                                    PCI_DMA_TODEVICE);
> >          }
> >
> > +tx_free:
> >          dev_kfree_skb_any(skb);
> > +tx_kick_pending:
> > +        tx_buf->skb = NULL;
>
> I think we should remove the setting of tx_buf->skb to NULL in the
> tx_dma_error path since we are setting it here now.

But tx_buf gets moved IIRC - if we hit tx_dma_error tx_buf will be one
of the fragment bufs at this point. It should be legal to clear the skb
pointer on those AFAICT.

Are you suggesting to do something along the lines of:

        txr->tx_buf_ring[txr->tx_prod].skb = NULL;

?

> > +        if (txr->kick_pending)
> > +                bnxt_txr_db_kick(bp, txr, txr->tx_prod);
> >          return NETDEV_TX_OK;
> >  }
> >
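The cleanup being negotiated here (clear the skb slot at the producer index, then flush any doorbell left pending by an earlier xmit_more() batch) can be modeled in plain C. This is a hypothetical userspace sketch, not driver code: `struct tx_ring`, `txr_db_kick()` and `tx_drop()` are invented stand-ins for the bnxt structures and helpers quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's Tx ring state. */
struct tx_ring {
	unsigned short prod;       /* producer index (tx_prod) */
	int kick_pending;          /* doorbell deferred by xmit_more() */
	unsigned short db_written; /* last value "written" to the doorbell */
	void *buf_ring[16];        /* skb pointers, indexed by prod */
};

/* Mirrors bnxt_txr_db_kick(): write the doorbell, clear the flag. */
static void txr_db_kick(struct tx_ring *txr, unsigned short prod)
{
	txr->db_written = prod;
	txr->kick_pending = 0;
}

/* Error path: clear the slot at the producer index (the
 * "txr->tx_buf_ring[txr->tx_prod].skb = NULL" suggestion), then
 * flush a doorbell left pending by an earlier batch. */
static int tx_drop(struct tx_ring *txr)
{
	txr->buf_ring[txr->prod] = NULL;
	if (txr->kick_pending)
		txr_db_kick(txr, txr->prod);
	return 0; /* NETDEV_TX_OK */
}
```

The point of the sketch is the ordering: the slot is cleared unconditionally, but the doorbell is only rung when a prior xmit_more() packet actually deferred one.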
On Wed, Aug 11, 2021 at 3:44 PM Jakub Kicinski <kuba@kernel.org> wrote:
> On Wed, 11 Aug 2021 15:36:34 -0700 Michael Chan wrote:
> > On Wed, Aug 11, 2021 at 2:38 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > @@ -367,6 +368,13 @@ static u16 bnxt_xmit_get_cfa_action(struct sk_buff *skb)
> > >          return md_dst->u.port_info.port_id;
> > >  }
> > >
> > > +static void bnxt_txr_db_kick(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
> > > +                             u16 prod)
> > > +{
> > > +        bnxt_db_write(bp, &txr->tx_db, prod);
> > > +        txr->kick_pending = 0;
> > > +}
> > > +
> > >  static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > >  {
> > >          struct bnxt *bp = netdev_priv(dev);
> > > @@ -396,6 +404,8 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > >          free_size = bnxt_tx_avail(bp, txr);
> > >          if (unlikely(free_size < skb_shinfo(skb)->nr_frags + 2)) {
> > >                  netif_tx_stop_queue(txq);
> > > +                if (net_ratelimit() && txr->kick_pending)
> > > +                        netif_warn(bp, tx_err, dev, "bnxt: ring busy!\n");
> >
> > You forgot to remove this.
>
> I changed my mind. I added the && txr->kick_pending to the condition,
> if there is a race and napi starts the queue unnecessarily the kick
> can't be pending.

I don't understand. The queue should be stopped if we have <=
MAX_SKB_FRAGS + 1 descriptors left. If there is a race and the queue
is awake, the first TX packet may slip through if
skb_shinfo(skb)->nr_frags is small and we have enough descriptors for
it. Let's say xmit_more is set for this packet and so kick is
pending. The next packet may not fit anymore and it will hit this
check here.

> > >                  return NETDEV_TX_BUSY;
> > >          }
> > >
> > > @@ -516,21 +526,16 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > >  normal_tx:
> > >          if (length < BNXT_MIN_PKT_SIZE) {
> > >                  pad = BNXT_MIN_PKT_SIZE - length;
> > > -                if (skb_pad(skb, pad)) {
> > > +                if (skb_pad(skb, pad))
> > >                          /* SKB already freed. */
> > > -                        tx_buf->skb = NULL;
> > > -                        return NETDEV_TX_OK;
> > > -                }
> > > +                        goto tx_kick_pending;
> > >                  length = BNXT_MIN_PKT_SIZE;
> > >          }
> > >
> > >          mapping = dma_map_single(&pdev->dev, skb->data, len, DMA_TO_DEVICE);
> > >
> > > -        if (unlikely(dma_mapping_error(&pdev->dev, mapping))) {
> > > -                dev_kfree_skb_any(skb);
> > > -                tx_buf->skb = NULL;
> > > -                return NETDEV_TX_OK;
> > > -        }
> > > +        if (unlikely(dma_mapping_error(&pdev->dev, mapping)))
> > > +                goto tx_free;
> > >
> > >          dma_unmap_addr_set(tx_buf, mapping, mapping);
> > >          flags = (len << TX_BD_LEN_SHIFT) | TX_BD_TYPE_LONG_TX_BD |
> > > @@ -617,13 +622,15 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > >          txr->tx_prod = prod;
> > >
> > >          if (!netdev_xmit_more() || netif_xmit_stopped(txq))
> > > -                bnxt_db_write(bp, &txr->tx_db, prod);
> > > +                bnxt_txr_db_kick(bp, txr, prod);
> > > +        else
> > > +                txr->kick_pending = 1;
> > >
> > >  tx_done:
> > >
> > >          if (unlikely(bnxt_tx_avail(bp, txr) <= MAX_SKB_FRAGS + 1)) {
> > >                  if (netdev_xmit_more() && !tx_buf->is_push)
> > > -                        bnxt_db_write(bp, &txr->tx_db, prod);
> > > +                        bnxt_txr_db_kick(bp, txr, prod);
> > >
> > >                  netif_tx_stop_queue(txq);
> > >
> > > @@ -661,7 +668,12 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > >                                    PCI_DMA_TODEVICE);
> > >          }
> > >
> > > +tx_free:
> > >          dev_kfree_skb_any(skb);
> > > +tx_kick_pending:
> > > +        tx_buf->skb = NULL;
> >
> > I think we should remove the setting of tx_buf->skb to NULL in the
> > tx_dma_error path since we are setting it here now.
>
> But tx_buf gets moved IIRC - if we hit tx_dma_error tx_buf will be one
> of the fragment bufs at this point. It should be legal to clear the skb
> pointer on those AFAICT.

Ah, you're right.

> Are you suggesting to do something along the lines of:
>
>         txr->tx_buf_ring[txr->tx_prod].skb = NULL;

Yeah, I like this the best.

> ?
> > > +        if (txr->kick_pending)
> > > +                bnxt_txr_db_kick(bp, txr, txr->tx_prod);
> > >          return NETDEV_TX_OK;
> > >  }
> > >
On Wed, Aug 11, 2021 at 4:17 PM Jakub Kicinski <kuba@kernel.org> wrote:
> On Wed, 11 Aug 2021 16:00:52 -0700 Michael Chan wrote:
> > On Wed, Aug 11, 2021 at 3:44 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > On Wed, 11 Aug 2021 15:36:34 -0700 Michael Chan wrote:
> > > > On Wed, Aug 11, 2021 at 2:38 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > > @@ -367,6 +368,13 @@ static u16 bnxt_xmit_get_cfa_action(struct sk_buff *skb)
> > > > >          return md_dst->u.port_info.port_id;
> > > > >  }
> > > > >
> > > > > +static void bnxt_txr_db_kick(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
> > > > > +                             u16 prod)
> > > > > +{
> > > > > +        bnxt_db_write(bp, &txr->tx_db, prod);
> > > > > +        txr->kick_pending = 0;
> > > > > +}
> > > > > +
> > > > >  static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > > >  {
> > > > >          struct bnxt *bp = netdev_priv(dev);
> > > > > @@ -396,6 +404,8 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > > >          free_size = bnxt_tx_avail(bp, txr);
> > > > >          if (unlikely(free_size < skb_shinfo(skb)->nr_frags + 2)) {
> > > > >                  netif_tx_stop_queue(txq);
> > > > > +                if (net_ratelimit() && txr->kick_pending)
> > > > > +                        netif_warn(bp, tx_err, dev, "bnxt: ring busy!\n");
> > > >
> > > > You forgot to remove this.
> > >
> > > I changed my mind. I added the && txr->kick_pending to the condition,
> > > if there is a race and napi starts the queue unnecessarily the kick
> > > can't be pending.
> >
> > I don't understand. The queue should be stopped if we have <=
> > MAX_SKB_FRAGS + 1 descriptors left. If there is a race and the queue
> > is awake, the first TX packet may slip through if
> > skb_shinfo(skb)->nr_frags is small and we have enough descriptors for
> > it. Let's say xmit_more is set for this packet and so kick is
> > pending. The next packet may not fit anymore and it will hit this
> > check here.
>
> But even if we slip past this check we can only do it once, the check
> at the end of start_xmit() will see we have fewer slots than MAX_FRAGS
> + 2, ring the doorbell and stop.

Yeah, I think you're right.
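The "can only slip through once" argument rests on the xmit_more() batching decision quoted earlier: every transmit either rings the doorbell immediately or marks a kick pending, and the end-of-xmit occupancy check stops the queue. A hypothetical userspace sketch of that decision (names invented, not the driver's code):

```c
#include <assert.h>

/* Invented stand-in for the batching-relevant ring state. */
struct batch_txr {
	unsigned short prod;       /* producer index */
	int kick_pending;          /* doorbell deferred by xmit_more() */
	unsigned short db_written; /* last doorbell value */
};

static void db_kick(struct batch_txr *txr, unsigned short prod)
{
	txr->db_written = prod;
	txr->kick_pending = 0;
}

/* Mirrors the hunk above: defer the doorbell while the stack promises
 * more packets (xmit_more) and the queue is running; otherwise kick. */
static void tx_finish(struct batch_txr *txr, int xmit_more, int stopped)
{
	txr->prod++;
	if (!xmit_more || stopped)
		db_kick(txr, txr->prod);
	else
		txr->kick_pending = 1;
}
```

So a batch leaves at most one kick pending, and the final packet of the batch (or a stopped queue) always flushes it, which is why a second slipped-through packet cannot occur.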
On Wed, Aug 11, 2021 at 2:38 PM Jakub Kicinski <kuba@kernel.org> wrote:
> @@ -396,6 +404,8 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
>          free_size = bnxt_tx_avail(bp, txr);
>          if (unlikely(free_size < skb_shinfo(skb)->nr_frags + 2)) {
>                  netif_tx_stop_queue(txq);
> +                if (net_ratelimit() && txr->kick_pending)
> +                        netif_warn(bp, tx_err, dev, "bnxt: ring busy!\n");

I think there is one more problem here. Now that it is possible to get
here, we can race with bnxt_tx_int() again here. We can call
netif_tx_stop_queue() here after bnxt_tx_int() has already cleaned the
entire TX ring. So I think we need to call bnxt_tx_wake_queue() here
again if descriptors have become available.

>                  return NETDEV_TX_BUSY;
>          }
>
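The fix Michael is describing — re-checking ring occupancy after netif_tx_stop_queue() so a completion that ran in between cannot leave the queue stopped forever — is a well-known stop/wake pattern. A hypothetical userspace sketch (the queue struct and helpers are invented stand-ins for bnxt_tx_avail()/bnxt_tx_wake_queue(); the real driver also needs a memory barrier between the stop and the re-read):

```c
#include <assert.h>

#define MAX_SKB_FRAGS 17

/* Invented stand-in: 'avail' may be bumped by the completion path
 * (bnxt_tx_int()) at any point before tx_busy_path() runs. */
struct sim_queue {
	int avail;   /* free descriptors */
	int stopped; /* netif_tx_{stop,wake}_queue() stand-in */
};

static void tx_stop_queue(struct sim_queue *q) { q->stopped = 1; }
static void tx_wake_queue(struct sim_queue *q) { q->stopped = 0; }

/* The pattern under discussion: stop first, then re-check availability.
 * If the cleaner freed descriptors between our first check and the
 * stop, we wake the queue ourselves instead of stalling forever. */
static int tx_busy_path(struct sim_queue *q, int needed)
{
	tx_stop_queue(q);
	/* In the driver, smp_mb() would sit here so the re-read of the
	 * consumer index is ordered against the stop. */
	if (q->avail >= needed)
		tx_wake_queue(q);
	return 1; /* NETDEV_TX_BUSY */
}
```

Without the re-check, a completion that empties the ring just before `tx_stop_queue()` sees the queue still running, finds nothing to wake, and the queue stays stopped with no future completion to rescue it.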
From: Jakub Kicinski
> Sent: 11 August 2021 22:38
>
> skbs are freed on error and not put on the ring. We may, however,
> be in a situation where we're freeing the last skb of a batch,
> and there is a doorbell ring pending because of xmit_more() being
> true earlier. Make sure we ring the door bell in such situations.
>
> Since errors are rare don't pay attention to xmit_more() and just
> always flush the pending frames.
>
...
> +tx_free:
>         dev_kfree_skb_any(skb);
> +tx_kick_pending:
> +        tx_buf->skb = NULL;
> +        if (txr->kick_pending)
> +                bnxt_txr_db_kick(bp, txr, txr->tx_prod);
>         return NETDEV_TX_OK;

Is this case actually so unlikely that the 'kick' can be done
unconditionally? Then all the conditionals can be removed from
the hot path.

        David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)