Message ID | 8002c31a-60b9-58f1-f0dd-8fd07239917f@gmail.com |
---|---|
State | New |
Headers | show |
Series | [net] r8169: work around short packet hw bug on RTL8125 | expand |
On Tue, 3 Nov 2020 18:52:18 +0100 Heiner Kallweit wrote: > Network problems with RTL8125B have been reported [0] and with help > from Realtek it turned out that this chip version has a hw problem > with short packets (similar to RTL8168evl). Having said that activate > the same workaround as for RTL8168evl. > Realtek suggested to activate the workaround for RTL8125A too, even > though they're not 100% sure yet which RTL8125 versions are affected. > > [0] https://bugzilla.kernel.org/show_bug.cgi?id=209839 > > Fixes: 0439297be951 ("r8169: add support for RTL8125B") > Reported-by: Maxim Plotnikov <wgh@torlan.ru> > Tested-by: Maxim Plotnikov <wgh@torlan.ru> > Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Applied, thanks! > @@ -4125,7 +4133,7 @@ static bool rtl8169_tso_csum_v2(struct rtl8169_private *tp, > > opts[1] |= transport_offset << TCPHO_SHIFT; > } else { > - if (unlikely(rtl_test_hw_pad_bug(tp, skb))) > + if (unlikely(skb->len < ETH_ZLEN && rtl_test_hw_pad_bug(tp))) > return !eth_skb_pad(skb); > } But looks like we may have another bug here - looks like this function treas skb_cow_head() and eth_skb_pad() failures the same, but former doesn't free the skb on error, while the latter does.
On 05.11.2020 02:45, Jakub Kicinski wrote: > On Tue, 3 Nov 2020 18:52:18 +0100 Heiner Kallweit wrote: >> Network problems with RTL8125B have been reported [0] and with help >> from Realtek it turned out that this chip version has a hw problem >> with short packets (similar to RTL8168evl). Having said that activate >> the same workaround as for RTL8168evl. >> Realtek suggested to activate the workaround for RTL8125A too, even >> though they're not 100% sure yet which RTL8125 versions are affected. >> >> [0] https://bugzilla.kernel.org/show_bug.cgi?id=209839 >> >> Fixes: 0439297be951 ("r8169: add support for RTL8125B") >> Reported-by: Maxim Plotnikov <wgh@torlan.ru> >> Tested-by: Maxim Plotnikov <wgh@torlan.ru> >> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> > > Applied, thanks! > >> @@ -4125,7 +4133,7 @@ static bool rtl8169_tso_csum_v2(struct rtl8169_private *tp, >> >> opts[1] |= transport_offset << TCPHO_SHIFT; >> } else { >> - if (unlikely(rtl_test_hw_pad_bug(tp, skb))) >> + if (unlikely(skb->len < ETH_ZLEN && rtl_test_hw_pad_bug(tp))) >> return !eth_skb_pad(skb); >> } > > But looks like we may have another bug here - looks like this function > treas skb_cow_head() and eth_skb_pad() failures the same, but former > doesn't free the skb on error, while the latter does. > Thanks for the hint, indeed we have an issue. The caller of rtl8169_tso_csum_v2() also frees the skb if false is returned, therefore we have a double free if eth_skb_pad() fails. When checking eth_skb_pad() I saw that it uses kfree_skb() to free the skb on error. Kernel documentation say about ndo_start_xmit context: Process with BHs disabled or BH (timer), will be called with interrupts disabled by netconsole. Is it safe to use kfree_skb() if interrupts are disabled? I'm asking because dev_kfree_skb_any() uses the irq path if irqs_disabled() is true.
On Thu, 5 Nov 2020 11:31:48 +0100 Heiner Kallweit wrote: > >> @@ -4125,7 +4133,7 @@ static bool rtl8169_tso_csum_v2(struct rtl8169_private *tp, > >> > >> opts[1] |= transport_offset << TCPHO_SHIFT; > >> } else { > >> - if (unlikely(rtl_test_hw_pad_bug(tp, skb))) > >> + if (unlikely(skb->len < ETH_ZLEN && rtl_test_hw_pad_bug(tp))) > >> return !eth_skb_pad(skb); > >> } > > > > But looks like we may have another bug here - looks like this function > > treas skb_cow_head() and eth_skb_pad() failures the same, but former > > doesn't free the skb on error, while the latter does. > > > Thanks for the hint, indeed we have an issue. The caller of > rtl8169_tso_csum_v2() also frees the skb if false is returned, therefore > we have a double free if eth_skb_pad() fails. > > When checking eth_skb_pad() I saw that it uses kfree_skb() to free > the skb on error. Kernel documentation say about ndo_start_xmit context: > > Process with BHs disabled or BH (timer), > will be called with interrupts disabled by netconsole. > > Is it safe to use kfree_skb() if interrupts are disabled? > I'm asking because dev_kfree_skb_any() uses the irq path if > irqs_disabled() is true. It is not, although in practice it only matters if skb has a socket attached, and AFAIR netconsole doesn't, so no-one has ever seen the WARN() trigger. But yes, I think it's best if we fix it.
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index 7e0947e29..07d197141 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -4051,9 +4051,17 @@ static int rtl8169_xmit_frags(struct rtl8169_private *tp, struct sk_buff *skb, return -EIO; } -static bool rtl_test_hw_pad_bug(struct rtl8169_private *tp, struct sk_buff *skb) +static bool rtl_test_hw_pad_bug(struct rtl8169_private *tp) { - return skb->len < ETH_ZLEN && tp->mac_version == RTL_GIGA_MAC_VER_34; + switch (tp->mac_version) { + case RTL_GIGA_MAC_VER_34: + case RTL_GIGA_MAC_VER_60: + case RTL_GIGA_MAC_VER_61: + case RTL_GIGA_MAC_VER_63: + return true; + default: + return false; + } } static void rtl8169_tso_csum_v1(struct sk_buff *skb, u32 *opts) @@ -4125,7 +4133,7 @@ static bool rtl8169_tso_csum_v2(struct rtl8169_private *tp, opts[1] |= transport_offset << TCPHO_SHIFT; } else { - if (unlikely(rtl_test_hw_pad_bug(tp, skb))) + if (unlikely(skb->len < ETH_ZLEN && rtl_test_hw_pad_bug(tp))) return !eth_skb_pad(skb); }