Message ID | 20210811213749.3276687-1-kuba@kernel.org
---|---
Series | bnxt: Tx NAPI disabling resiliency improvements
On Wed, 11 Aug 2021 15:36:34 -0700 Michael Chan wrote:
> On Wed, Aug 11, 2021 at 2:38 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > ---
> > v2: - netdev_warn() -> netif_warn() [Edwin]
> >     - use correct prod value [Michael]
> > ---
> >  drivers/net/ethernet/broadcom/bnxt/bnxt.c | 36 +++++++++++++++--------
> >  drivers/net/ethernet/broadcom/bnxt/bnxt.h |  1 +
> >  2 files changed, 25 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > index 52f5c8405e76..79bbd6ec7ef7 100644
> > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > @@ -72,7 +72,8 @@
> >  #include "bnxt_debugfs.h"
> >
> >  #define BNXT_TX_TIMEOUT        (5 * HZ)
> > -#define BNXT_DEF_MSG_ENABLE    (NETIF_MSG_DRV | NETIF_MSG_HW)
> > +#define BNXT_DEF_MSG_ENABLE    (NETIF_MSG_DRV | NETIF_MSG_HW | \
> > +                                NETIF_MSG_TX_ERR)
> >
> >  MODULE_LICENSE("GPL");
> >  MODULE_DESCRIPTION("Broadcom BCM573xx network driver");
> > @@ -367,6 +368,13 @@ static u16 bnxt_xmit_get_cfa_action(struct sk_buff *skb)
> >          return md_dst->u.port_info.port_id;
> >  }
> >
> > +static void bnxt_txr_db_kick(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
> > +                             u16 prod)
> > +{
> > +        bnxt_db_write(bp, &txr->tx_db, prod);
> > +        txr->kick_pending = 0;
> > +}
> > +
> >  static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  {
> >          struct bnxt *bp = netdev_priv(dev);
> > @@ -396,6 +404,8 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >          free_size = bnxt_tx_avail(bp, txr);
> >          if (unlikely(free_size < skb_shinfo(skb)->nr_frags + 2)) {
> >                  netif_tx_stop_queue(txq);
> > +                if (net_ratelimit() && txr->kick_pending)
> > +                        netif_warn(bp, tx_err, dev, "bnxt: ring busy!\n");
>
> You forgot to remove this.

I changed my mind. I added the && txr->kick_pending to the condition,
if there is a race and napi starts the queue unnecessarily the kick
can't be pending.

> >                  return NETDEV_TX_BUSY;
> >          }
> >
> > @@ -516,21 +526,16 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  normal_tx:
> >          if (length < BNXT_MIN_PKT_SIZE) {
> >                  pad = BNXT_MIN_PKT_SIZE - length;
> > -                if (skb_pad(skb, pad)) {
> > +                if (skb_pad(skb, pad))
> >                          /* SKB already freed. */
> > -                        tx_buf->skb = NULL;
> > -                        return NETDEV_TX_OK;
> > -                }
> > +                        goto tx_kick_pending;
> >                  length = BNXT_MIN_PKT_SIZE;
> >          }
> >
> >          mapping = dma_map_single(&pdev->dev, skb->data, len, DMA_TO_DEVICE);
> >
> > -        if (unlikely(dma_mapping_error(&pdev->dev, mapping))) {
> > -                dev_kfree_skb_any(skb);
> > -                tx_buf->skb = NULL;
> > -                return NETDEV_TX_OK;
> > -        }
> > +        if (unlikely(dma_mapping_error(&pdev->dev, mapping)))
> > +                goto tx_free;
> >
> >          dma_unmap_addr_set(tx_buf, mapping, mapping);
> >          flags = (len << TX_BD_LEN_SHIFT) | TX_BD_TYPE_LONG_TX_BD |
> > @@ -617,13 +622,15 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >          txr->tx_prod = prod;
> >
> >          if (!netdev_xmit_more() || netif_xmit_stopped(txq))
> > -                bnxt_db_write(bp, &txr->tx_db, prod);
> > +                bnxt_txr_db_kick(bp, txr, prod);
> > +        else
> > +                txr->kick_pending = 1;
> >
> >  tx_done:
> >
> >          if (unlikely(bnxt_tx_avail(bp, txr) <= MAX_SKB_FRAGS + 1)) {
> >                  if (netdev_xmit_more() && !tx_buf->is_push)
> > -                        bnxt_db_write(bp, &txr->tx_db, prod);
> > +                        bnxt_txr_db_kick(bp, txr, prod);
> >
> >                  netif_tx_stop_queue(txq);
> >
> > @@ -661,7 +668,12 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >                                    PCI_DMA_TODEVICE);
> >          }
> >
> > +tx_free:
> >          dev_kfree_skb_any(skb);
> > +tx_kick_pending:
> > +        tx_buf->skb = NULL;
>
> I think we should remove the setting of tx_buf->skb to NULL in the
> tx_dma_error path since we are setting it here now.

But tx_buf gets moved IIRC - if we hit tx_dma_error tx_buf will be one
of the fragment bufs at this point. It should be legal to clear the skb
pointer on those AFAICT.

Are you suggesting to do something along the lines of:

        txr->tx_buf_ring[txr->tx_prod].skb = NULL;

?

> > +        if (txr->kick_pending)
> > +                bnxt_txr_db_kick(bp, txr, txr->tx_prod);
> >          return NETDEV_TX_OK;
> >  }
> >
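The cleanup being negotiated here (clear the skb slot at the producer index, then flush any doorbell left pending by an earlier xmit_more() batch) can be modeled in plain C. This is a hypothetical userspace sketch, not driver code: `struct tx_ring`, `txr_db_kick()` and `tx_drop()` are invented stand-ins for the bnxt structures and helpers quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's Tx ring state. */
struct tx_ring {
	unsigned short prod;       /* producer index (tx_prod) */
	int kick_pending;          /* doorbell deferred by xmit_more() */
	unsigned short db_written; /* last value "written" to the doorbell */
	void *buf_ring[16];        /* skb pointers, indexed by prod */
};

/* Mirrors bnxt_txr_db_kick(): write the doorbell, clear the flag. */
static void txr_db_kick(struct tx_ring *txr, unsigned short prod)
{
	txr->db_written = prod;
	txr->kick_pending = 0;
}

/* Error path: clear the slot at the producer index (the
 * "txr->tx_buf_ring[txr->tx_prod].skb = NULL" suggestion), then
 * flush a doorbell left pending by an earlier batch. */
static int tx_drop(struct tx_ring *txr)
{
	txr->buf_ring[txr->prod] = NULL;
	if (txr->kick_pending)
		txr_db_kick(txr, txr->prod);
	return 0; /* NETDEV_TX_OK */
}
```

The point of the sketch is the ordering: the slot is cleared unconditionally, but the doorbell is only rung when a prior xmit_more() packet actually deferred one.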
On Wed, Aug 11, 2021 at 3:44 PM Jakub Kicinski <kuba@kernel.org> wrote:
> On Wed, 11 Aug 2021 15:36:34 -0700 Michael Chan wrote:
> > On Wed, Aug 11, 2021 at 2:38 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > @@ -367,6 +368,13 @@ static u16 bnxt_xmit_get_cfa_action(struct sk_buff *skb)
> > >          return md_dst->u.port_info.port_id;
> > >  }
> > >
> > > +static void bnxt_txr_db_kick(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
> > > +                             u16 prod)
> > > +{
> > > +        bnxt_db_write(bp, &txr->tx_db, prod);
> > > +        txr->kick_pending = 0;
> > > +}
> > > +
> > >  static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > >  {
> > >          struct bnxt *bp = netdev_priv(dev);
> > > @@ -396,6 +404,8 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > >          free_size = bnxt_tx_avail(bp, txr);
> > >          if (unlikely(free_size < skb_shinfo(skb)->nr_frags + 2)) {
> > >                  netif_tx_stop_queue(txq);
> > > +                if (net_ratelimit() && txr->kick_pending)
> > > +                        netif_warn(bp, tx_err, dev, "bnxt: ring busy!\n");
> >
> > You forgot to remove this.
>
> I changed my mind. I added the && txr->kick_pending to the condition,
> if there is a race and napi starts the queue unnecessarily the kick
> can't be pending.

I don't understand. The queue should be stopped if we have <=
MAX_SKB_FRAGS + 1 descriptors left. If there is a race and the queue
is awake, the first TX packet may slip through if
skb_shinfo(skb)->nr_frags is small and we have enough descriptors for
it. Let's say xmit_more is set for this packet and so kick is
pending. The next packet may not fit anymore and it will hit this
check here.

> > >                  return NETDEV_TX_BUSY;
> > >          }
> > >
> > > @@ -516,21 +526,16 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > >  normal_tx:
> > >          if (length < BNXT_MIN_PKT_SIZE) {
> > >                  pad = BNXT_MIN_PKT_SIZE - length;
> > > -                if (skb_pad(skb, pad)) {
> > > +                if (skb_pad(skb, pad))
> > >                          /* SKB already freed. */
> > > -                        tx_buf->skb = NULL;
> > > -                        return NETDEV_TX_OK;
> > > -                }
> > > +                        goto tx_kick_pending;
> > >                  length = BNXT_MIN_PKT_SIZE;
> > >          }
> > >
> > >          mapping = dma_map_single(&pdev->dev, skb->data, len, DMA_TO_DEVICE);
> > >
> > > -        if (unlikely(dma_mapping_error(&pdev->dev, mapping))) {
> > > -                dev_kfree_skb_any(skb);
> > > -                tx_buf->skb = NULL;
> > > -                return NETDEV_TX_OK;
> > > -        }
> > > +        if (unlikely(dma_mapping_error(&pdev->dev, mapping)))
> > > +                goto tx_free;
> > >
> > >          dma_unmap_addr_set(tx_buf, mapping, mapping);
> > >          flags = (len << TX_BD_LEN_SHIFT) | TX_BD_TYPE_LONG_TX_BD |
> > > @@ -617,13 +622,15 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > >          txr->tx_prod = prod;
> > >
> > >          if (!netdev_xmit_more() || netif_xmit_stopped(txq))
> > > -                bnxt_db_write(bp, &txr->tx_db, prod);
> > > +                bnxt_txr_db_kick(bp, txr, prod);
> > > +        else
> > > +                txr->kick_pending = 1;
> > >
> > >  tx_done:
> > >
> > >          if (unlikely(bnxt_tx_avail(bp, txr) <= MAX_SKB_FRAGS + 1)) {
> > >                  if (netdev_xmit_more() && !tx_buf->is_push)
> > > -                        bnxt_db_write(bp, &txr->tx_db, prod);
> > > +                        bnxt_txr_db_kick(bp, txr, prod);
> > >
> > >                  netif_tx_stop_queue(txq);
> > >
> > > @@ -661,7 +668,12 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > >                                    PCI_DMA_TODEVICE);
> > >          }
> > >
> > > +tx_free:
> > >          dev_kfree_skb_any(skb);
> > > +tx_kick_pending:
> > > +        tx_buf->skb = NULL;
> >
> > I think we should remove the setting of tx_buf->skb to NULL in the
> > tx_dma_error path since we are setting it here now.
>
> But tx_buf gets moved IIRC - if we hit tx_dma_error tx_buf will be one
> of the fragment bufs at this point. It should be legal to clear the skb
> pointer on those AFAICT.

Ah, you're right.

> Are you suggesting to do something along the lines of:
>
>         txr->tx_buf_ring[txr->tx_prod].skb = NULL;

Yeah, I like this the best.

> ?
> > > +        if (txr->kick_pending)
> > > +                bnxt_txr_db_kick(bp, txr, txr->tx_prod);
> > >          return NETDEV_TX_OK;
> > >  }
> > >
On Wed, Aug 11, 2021 at 4:17 PM Jakub Kicinski <kuba@kernel.org> wrote:
> On Wed, 11 Aug 2021 16:00:52 -0700 Michael Chan wrote:
> > On Wed, Aug 11, 2021 at 3:44 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > On Wed, 11 Aug 2021 15:36:34 -0700 Michael Chan wrote:
> > > > On Wed, Aug 11, 2021 at 2:38 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > > @@ -367,6 +368,13 @@ static u16 bnxt_xmit_get_cfa_action(struct sk_buff *skb)
> > > > >          return md_dst->u.port_info.port_id;
> > > > >  }
> > > > >
> > > > > +static void bnxt_txr_db_kick(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
> > > > > +                             u16 prod)
> > > > > +{
> > > > > +        bnxt_db_write(bp, &txr->tx_db, prod);
> > > > > +        txr->kick_pending = 0;
> > > > > +}
> > > > > +
> > > > >  static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > > >  {
> > > > >          struct bnxt *bp = netdev_priv(dev);
> > > > > @@ -396,6 +404,8 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > > >          free_size = bnxt_tx_avail(bp, txr);
> > > > >          if (unlikely(free_size < skb_shinfo(skb)->nr_frags + 2)) {
> > > > >                  netif_tx_stop_queue(txq);
> > > > > +                if (net_ratelimit() && txr->kick_pending)
> > > > > +                        netif_warn(bp, tx_err, dev, "bnxt: ring busy!\n");
> > > >
> > > > You forgot to remove this.
> > >
> > > I changed my mind. I added the && txr->kick_pending to the condition,
> > > if there is a race and napi starts the queue unnecessarily the kick
> > > can't be pending.
> >
> > I don't understand. The queue should be stopped if we have <=
> > MAX_SKB_FRAGS + 1 descriptors left. If there is a race and the queue
> > is awake, the first TX packet may slip through if
> > skb_shinfo(skb)->nr_frags is small and we have enough descriptors for
> > it. Let's say xmit_more is set for this packet and so kick is
> > pending. The next packet may not fit anymore and it will hit this
> > check here.
>
> But even if we slip past this check we can only do it once, the check
> at the end of start_xmit() will see we have fewer slots than MAX_FRAGS
> + 2, ring the doorbell and stop.

Yeah, I think you're right.
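The "can only slip through once" argument rests on the xmit_more() batching decision quoted earlier: every transmit either rings the doorbell immediately or marks a kick pending, and the end-of-xmit occupancy check stops the queue. A hypothetical userspace sketch of that decision (names invented, not the driver's code):

```c
#include <assert.h>

/* Invented stand-in for the batching-relevant ring state. */
struct batch_txr {
	unsigned short prod;       /* producer index */
	int kick_pending;          /* doorbell deferred by xmit_more() */
	unsigned short db_written; /* last doorbell value */
};

static void db_kick(struct batch_txr *txr, unsigned short prod)
{
	txr->db_written = prod;
	txr->kick_pending = 0;
}

/* Mirrors the hunk above: defer the doorbell while the stack promises
 * more packets (xmit_more) and the queue is running; otherwise kick. */
static void tx_finish(struct batch_txr *txr, int xmit_more, int stopped)
{
	txr->prod++;
	if (!xmit_more || stopped)
		db_kick(txr, txr->prod);
	else
		txr->kick_pending = 1;
}
```

So a batch leaves at most one kick pending, and the final packet of the batch (or a stopped queue) always flushes it, which is why a second slipped-through packet cannot occur.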
On Wed, Aug 11, 2021 at 2:38 PM Jakub Kicinski <kuba@kernel.org> wrote:
> @@ -396,6 +404,8 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
>          free_size = bnxt_tx_avail(bp, txr);
>          if (unlikely(free_size < skb_shinfo(skb)->nr_frags + 2)) {
>                  netif_tx_stop_queue(txq);
> +                if (net_ratelimit() && txr->kick_pending)
> +                        netif_warn(bp, tx_err, dev, "bnxt: ring busy!\n");

I think there is one more problem here. Now that it is possible to get
here, we can race with bnxt_tx_int() again here. We can call
netif_tx_stop_queue() here after bnxt_tx_int() has already cleaned the
entire TX ring. So I think we need to call bnxt_tx_wake_queue() here
again if descriptors have become available.

>                  return NETDEV_TX_BUSY;
>          }
>
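The fix Michael is describing — re-checking ring occupancy after netif_tx_stop_queue() so a completion that ran in between cannot leave the queue stopped forever — is a well-known stop/wake pattern. A hypothetical userspace sketch (the queue struct and helpers are invented stand-ins for bnxt_tx_avail()/bnxt_tx_wake_queue(); the real driver also needs a memory barrier between the stop and the re-read):

```c
#include <assert.h>

#define MAX_SKB_FRAGS 17

/* Invented stand-in: 'avail' may be bumped by the completion path
 * (bnxt_tx_int()) at any point before tx_busy_path() runs. */
struct sim_queue {
	int avail;   /* free descriptors */
	int stopped; /* netif_tx_{stop,wake}_queue() stand-in */
};

static void tx_stop_queue(struct sim_queue *q) { q->stopped = 1; }
static void tx_wake_queue(struct sim_queue *q) { q->stopped = 0; }

/* The pattern under discussion: stop first, then re-check availability.
 * If the cleaner freed descriptors between our first check and the
 * stop, we wake the queue ourselves instead of stalling forever. */
static int tx_busy_path(struct sim_queue *q, int needed)
{
	tx_stop_queue(q);
	/* In the driver, smp_mb() would sit here so the re-read of the
	 * consumer index is ordered against the stop. */
	if (q->avail >= needed)
		tx_wake_queue(q);
	return 1; /* NETDEV_TX_BUSY */
}
```

Without the re-check, a completion that empties the ring just before `tx_stop_queue()` sees the queue still running, finds nothing to wake, and the queue stays stopped with no future completion to rescue it.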
From: Jakub Kicinski
> Sent: 11 August 2021 22:38
>
> skbs are freed on error and not put on the ring. We may, however,
> be in a situation where we're freeing the last skb of a batch,
> and there is a doorbell ring pending because of xmit_more() being
> true earlier. Make sure we ring the door bell in such situations.
>
> Since errors are rare don't pay attention to xmit_more() and just
> always flush the pending frames.
>
...
> +tx_free:
>         dev_kfree_skb_any(skb);
> +tx_kick_pending:
> +        tx_buf->skb = NULL;
> +        if (txr->kick_pending)
> +                bnxt_txr_db_kick(bp, txr, txr->tx_prod);
>         return NETDEV_TX_OK;

Is this case actually so unlikely that the 'kick' can be done
unconditionally? Then all the conditionals can be removed from
the hot path.

        David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)