Message ID | 1482127758-4904-2-git-send-email-jianbo.liu@linaro.org |
---|---|
State | Superseded |
On Mon, Dec 19, 2016 at 11:39:18AM +0530, Jianbo Liu wrote:

Hi Jianbo,

> vPMD will check 4 descriptors in one time, but the statuses are not consistent
> because the memory allocated for RX descriptors is cacheable hugepage.

Is it different in the x86 case? i.e., is x86 creating non-cacheable hugepages?
I am just looking at what it takes to fix similar issues for all drivers with respect to armv8.

Are you able to reproduce this issue on any armv8 platform? If so, could
you please share the platform details and the commands to reproduce it?

> This patch is to calculate the number of received packets by scanning DD bit
> sequentially, and stops when meeting the first packet with DD bit unset.
>
> Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
> ---
>  drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
> index f96cc85..0b1338d 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
> @@ -196,7 +196,6 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
>  	struct ixgbe_rx_entry *sw_ring;
>  	uint16_t nb_pkts_recd;
>  	int pos;
> -	uint64_t var;
>  	uint8x16_t shuf_msk = {
>  		0xFF, 0xFF,
>  		0xFF, 0xFF, /* skip 32 bits pkt_type */
> @@ -255,6 +254,7 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
>  		uint64x2_t mbp1, mbp2;
>  		uint8x16_t staterr;
>  		uint16x8_t tmp;
> +		uint32_t var = 0;
>  		uint32_t stat;
>  
>  		/* B.1 load 1 mbuf point */
> @@ -349,11 +349,19 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
>  		vst1q_u8((uint8_t *)&rx_pkts[pos]->rx_descriptor_fields1,
>  			 pkt_mb1);
>  
> +		stat &= IXGBE_VPMD_DESC_DD_MASK;
> +
>  		/* C.4 calc avaialbe number of desc */
> -		var = __builtin_popcount(stat & IXGBE_VPMD_DESC_DD_MASK);
> -		nb_pkts_recd += var;
> -		if (likely(var != RTE_IXGBE_DESCS_PER_LOOP))
> +		if (likely(var != IXGBE_VPMD_DESC_DD_MASK)) {
> +			while (stat & 0x01) {
> +				++var;
> +				stat = stat >> 8;
> +			}
> +			nb_pkts_recd += var;
>  			break;
> +		} else {
> +			nb_pkts_recd += RTE_IXGBE_DESCS_PER_LOOP;
> +		}
>  	}
>  
>  	/* Update our internal tail pointer */
> --
> 2.4.11
>

On Wed, Dec 21, 2016 at 03:38:51PM +0530, Jerin Jacob wrote:
> On Mon, Dec 19, 2016 at 11:39:18AM +0530, Jianbo Liu wrote:
>
> Hi Jianbo,
>
> > vPMD will check 4 descriptors in one time, but the statuses are not consistent
> > because the memory allocated for RX descriptors is cacheable hugepage.
> Is it different in the x86 case? i.e., is x86 creating non-cacheable hugepages?

This is not a problem on IA, because the instruction ordering rules on
IA guarantee that the reads will be done in the correct program order,
and we never get stale cache data.

/Bruce

Hi Jerin,

On 21 December 2016 at 18:08, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> On Mon, Dec 19, 2016 at 11:39:18AM +0530, Jianbo Liu wrote:
>
> Hi Jianbo,
>
>> vPMD will check 4 descriptors in one time, but the statuses are not consistent
>> because the memory allocated for RX descriptors is cacheable hugepage.
> Is it different in the x86 case? i.e., is x86 creating non-cacheable hugepages?
> I am just looking at what it takes to fix similar issues for all drivers with respect to armv8.
>
> Are you able to reproduce this issue on any armv8 platform? If so, could
> you please share the platform details and the commands to reproduce it?
>

I have tested on Huawei D03 and Softiron with Intel X540, and I see the same issue on both of them.
The setup is very simple: loop back 2 ports, then run testpmd.

On 21 December 2016 at 19:03, Bruce Richardson <bruce.richardson@intel.com> wrote:
> On Wed, Dec 21, 2016 at 03:38:51PM +0530, Jerin Jacob wrote:
>> On Mon, Dec 19, 2016 at 11:39:18AM +0530, Jianbo Liu wrote:
>>
>> Hi Jianbo,
>>
>> > vPMD will check 4 descriptors in one time, but the statuses are not consistent
>> > because the memory allocated for RX descriptors is cacheable hugepage.
>> Is it different in the x86 case? i.e., is x86 creating non-cacheable hugepages?
>
> This is not a problem on IA, because the instruction ordering rules on
> IA guarantee that the reads will be done in the correct program order,
> and we never get stale cache data.
>

Yes, I think it's an issue for the ARM architecture. It's because more than one cache line of data
(4/8 descriptors can span two cache lines) is read at one time in bulk-alloc RX or vPMD.
The same issue exists for i40e; I'll send the same patch for it later.
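
To make the "4/8 descriptors can be in two cachelines" remark concrete, here is a minimal sketch that counts how many cache lines a group of descriptors touches. The 16-byte descriptor size and the 64-byte cache line size are assumptions for illustration, not values stated in this thread, and `cache_lines_spanned` is a hypothetical helper rather than a DPDK function.

```c
#include <stdint.h>

/* Assumed sizes for illustration only; neither is stated in the thread. */
#define DESC_SIZE       16u /* bytes per RX descriptor (assumption) */
#define CACHE_LINE_SIZE 64u /* bytes per cache line (assumption) */

/*
 * Hypothetical helper: number of cache lines touched when reading
 * n_desc (>= 1) consecutive descriptors starting at ring index `first`,
 * for a ring whose first descriptor lives at address ring_base.
 */
static unsigned int
cache_lines_spanned(uintptr_t ring_base, unsigned int first,
		    unsigned int n_desc)
{
	uintptr_t start = ring_base + (uintptr_t)first * DESC_SIZE;
	uintptr_t end = start + (uintptr_t)n_desc * DESC_SIZE - 1;

	return (unsigned int)(end / CACHE_LINE_SIZE -
			      start / CACHE_LINE_SIZE) + 1;
}
```

Under these assumptions, four descriptors fit in one line only when the group starts on a line boundary, and eight descriptors always touch at least two lines, so a single bulk read can combine data from lines that were fetched at different times.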

diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
index f96cc85..0b1338d 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
@@ -196,7 +196,6 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 	struct ixgbe_rx_entry *sw_ring;
 	uint16_t nb_pkts_recd;
 	int pos;
-	uint64_t var;
 	uint8x16_t shuf_msk = {
 		0xFF, 0xFF,
 		0xFF, 0xFF, /* skip 32 bits pkt_type */
@@ -255,6 +254,7 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		uint64x2_t mbp1, mbp2;
 		uint8x16_t staterr;
 		uint16x8_t tmp;
+		uint32_t var = 0;
 		uint32_t stat;
 
 		/* B.1 load 1 mbuf point */
@@ -349,11 +349,19 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		vst1q_u8((uint8_t *)&rx_pkts[pos]->rx_descriptor_fields1,
 			 pkt_mb1);
 
+		stat &= IXGBE_VPMD_DESC_DD_MASK;
+
 		/* C.4 calc avaialbe number of desc */
-		var = __builtin_popcount(stat & IXGBE_VPMD_DESC_DD_MASK);
-		nb_pkts_recd += var;
-		if (likely(var != RTE_IXGBE_DESCS_PER_LOOP))
+		if (likely(var != IXGBE_VPMD_DESC_DD_MASK)) {
+			while (stat & 0x01) {
+				++var;
+				stat = stat >> 8;
+			}
+			nb_pkts_recd += var;
 			break;
+		} else {
+			nb_pkts_recd += RTE_IXGBE_DESCS_PER_LOOP;
+		}
 	}
 
 	/* Update our internal tail pointer */
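
Reading the last hunk as posted, `var` appears to be set to 0 at the top of each iteration and is then compared with `IXGBE_VPMD_DESC_DD_MASK`, so the first branch looks like it would always be taken and the loop would exit after a single group of four descriptors. The sketch below is one possible reading of the intended control flow, comparing the masked `stat` instead. It is not the code from the patch: `count_received` and `group_stat` are hypothetical names, and the mask value `0x01010101` is an assumption chosen to match the `stat >> 8` step.

```c
#include <stdint.h>

#define DESCS_PER_LOOP 4u          /* descriptors examined per iteration */
#define DD_MASK        0x01010101u /* assumed: one DD bit per status byte */

/*
 * group_stat[i] is a hypothetical stand-in for the packed status word
 * that the NEON code builds for the i-th group of four descriptors.
 * Returns the number of descriptors completed in ring order.
 */
static uint16_t
count_received(const uint32_t *group_stat, unsigned int n_groups)
{
	uint16_t nb_pkts_recd = 0;

	for (unsigned int i = 0; i < n_groups; i++) {
		uint32_t stat = group_stat[i] & DD_MASK;

		if (stat != DD_MASK) {
			/* Partial group: count the leading run of DD bits, then stop. */
			while (stat & 0x01) {
				++nb_pkts_recd;
				stat >>= 8;
			}
			break;
		}
		/* Full group: all four descriptors are done, keep scanning. */
		nb_pkts_recd += DESCS_PER_LOOP;
	}
	return nb_pkts_recd;
}
```

With this shape, a full group lets the loop continue to the next four descriptors, while any group with a gap ends the burst at the first descriptor whose DD bit is clear.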

vPMD will check 4 descriptors in one time, but the statuses are not consistent
because the memory allocated for RX descriptors is cacheable hugepage.
This patch is to calculate the number of received packets by scanning DD bit
sequentially, and stops when meeting the first packet with DD bit unset.

Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>

---
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

--
2.4.11
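
The difference between the two counting strategies shows up on a single status word. This is a small sketch, assuming the DD bit sits in bit 0 of each status byte (mask `0x01010101`, an assumption consistent with the `stat >> 8` step in the patch); `dd_popcount` and `dd_leading_run` are hypothetical names for the old and new approaches.

```c
#include <stdint.h>

#define DD_MASK 0x01010101u /* assumed: DD bit in bit 0 of each status byte */

/* Old approach: count every DD bit that is set, wherever it sits. */
static unsigned int
dd_popcount(uint32_t stat)
{
	return (unsigned int)__builtin_popcount(stat & DD_MASK);
}

/* New approach: count DD bits from descriptor 0 up to the first unset one. */
static unsigned int
dd_leading_run(uint32_t stat)
{
	unsigned int n = 0;

	stat &= DD_MASK;
	while (stat & 0x01) {
		++n;
		stat >>= 8;
	}
	return n;
}
```

For a word such as `0x01000001`, where descriptors 0 and 3 look done but 1 and 2 do not, the popcount reports 2 while the sequential scan reports 1. Only the scan result corresponds to descriptors that are complete in ring order.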