Message ID | 0eaG8xtbtKY1dEKCTKUBubGiC9QawGgB3tVZtNqVdY@cp4-web-030.plabs.ch |
---|---|
State | New |
Headers | show |
Series | [v2,net] net: udp: fix Fast/frag0 UDP GRO | expand |
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Mon, 9 Nov 2020 18:37:36 +0100 > On 11/9/20 5:56 PM, Alexander Lobakin wrote: >> While testing UDP GSO fraglists forwarding through driver that uses >> Fast GRO (via napi_gro_frags()), I was observing lots of out-of-order >> iperf packets: >> >> [ ID] Interval Transfer Bitrate Jitter >> [SUM] 0.0-40.0 sec 12106 datagrams received out-of-order >> >> Simple switch to napi_gro_receive() any other method without frag0 >> shortcut completely resolved them. >> >> I've found that UDP GRO uses udp_hdr(skb) in its .gro_receive() >> callback. While it's probably OK for non-frag0 paths (when all >> headers or even the entire frame are already in skb->data), this >> inline points to junk when using Fast GRO (napi_gro_frags() or >> napi_gro_receive() with only Ethernet header in skb->data and all >> the rest in shinfo->frags) and breaks GRO packet compilation and >> the packet flow itself. >> To support both modes, skb_gro_header_fast() + skb_gro_header_slow() >> are typically used. UDP even has an inline helper that makes use of >> them, udp_gro_udphdr(). Use that instead of troublemaking udp_hdr() >> to get rid of the out-of-order delivers. >> >> Present since the introduction of plain UDP GRO in 5.0-rc1. >> >> Since v1 [1]: >> - added a NULL pointer check for "uh" as suggested by Willem. >> >> [1] https://lore.kernel.org/netdev/YazU6GEzBdpyZMDMwJirxDX7B4sualpDG68ADZYvJI@cp4-web-034.plabs.ch >> >> Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.") >> Signed-off-by: Alexander Lobakin <alobakin@pm.me> >> --- >> net/ipv4/udp_offload.c | 7 ++++++- >> 1 file changed, 6 insertions(+), 1 deletion(-) >> >> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c >> index e67a66fbf27b..7f6bd221880a 100644 >> --- a/net/ipv4/udp_offload.c >> +++ b/net/ipv4/udp_offload.c >> @@ -366,13 +366,18 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb, >> static struct sk_buff *udp_gro_receive_segment(struct list_head *head, >> struct sk_buff *skb) >> { >> - struct udphdr *uh = udp_hdr(skb); >> + struct udphdr *uh = udp_gro_udphdr(skb); >> struct sk_buff *pp = NULL; >> struct udphdr *uh2; >> struct sk_buff *p; >> unsigned int ulen; >> int ret = 0; >> >> + if (unlikely(!uh)) { > > How uh could be NULL here ? > > My understanding is that udp_gro_receive() is called > only after udp4_gro_receive() or udp6_gro_receive() > validated that udp_gro_udphdr(skb) was not NULL. Right, but only after udp{4,6}_lib_lookup_skb() in certain cases. I don't know for sure if their logic can actually edit skb->data, so it's better to check from my point of view. >> + NAPI_GRO_CB(skb)->flush = 1; >> + return NULL; >> + } >> + >> /* requires non zero csum, for symmetry with GSO */ >> if (!uh->check) { >> NAPI_GRO_CB(skb)->flush = 1; >> > >Why uh2 is left unchanged ? > > uh2 = udp_hdr(p); > >... Packets from list_head *head have their headers already pulled to skb->data in 100% cases, no need to change anything here. I double-checked that udp_hdr(p) always returns the same pointer as "p->data + network offset" and left it as it is. Thanks, Al
On Mon, Nov 9, 2020 at 12:37 PM Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > > On 11/9/20 5:56 PM, Alexander Lobakin wrote: > > While testing UDP GSO fraglists forwarding through driver that uses > > Fast GRO (via napi_gro_frags()), I was observing lots of out-of-order > > iperf packets: > > > > [ ID] Interval Transfer Bitrate Jitter > > [SUM] 0.0-40.0 sec 12106 datagrams received out-of-order > > > > Simple switch to napi_gro_receive() any other method without frag0 > > shortcut completely resolved them. > > > > I've found that UDP GRO uses udp_hdr(skb) in its .gro_receive() > > callback. While it's probably OK for non-frag0 paths (when all > > headers or even the entire frame are already in skb->data), this > > inline points to junk when using Fast GRO (napi_gro_frags() or > > napi_gro_receive() with only Ethernet header in skb->data and all > > the rest in shinfo->frags) and breaks GRO packet compilation and > > the packet flow itself. > > To support both modes, skb_gro_header_fast() + skb_gro_header_slow() > > are typically used. UDP even has an inline helper that makes use of > > them, udp_gro_udphdr(). Use that instead of troublemaking udp_hdr() > > to get rid of the out-of-order delivers. > > > > Present since the introduction of plain UDP GRO in 5.0-rc1. > > > > Since v1 [1]: > > - added a NULL pointer check for "uh" as suggested by Willem. > > > > [1] https://lore.kernel.org/netdev/YazU6GEzBdpyZMDMwJirxDX7B4sualpDG68ADZYvJI@cp4-web-034.plabs.ch > > > > Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.") > > Signed-off-by: Alexander Lobakin <alobakin@pm.me> > > --- > > net/ipv4/udp_offload.c | 7 ++++++- > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c > > index e67a66fbf27b..7f6bd221880a 100644 > > --- a/net/ipv4/udp_offload.c > > +++ b/net/ipv4/udp_offload.c > > @@ -366,13 +366,18 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb, > > static struct sk_buff *udp_gro_receive_segment(struct list_head *head, > > struct sk_buff *skb) > > { > > - struct udphdr *uh = udp_hdr(skb); > > + struct udphdr *uh = udp_gro_udphdr(skb); > > struct sk_buff *pp = NULL; > > struct udphdr *uh2; > > struct sk_buff *p; > > unsigned int ulen; > > int ret = 0; > > > > + if (unlikely(!uh)) { > > How uh could be NULL here ? > > My understanding is that udp_gro_receive() is called > only after udp4_gro_receive() or udp6_gro_receive() > validated that udp_gro_udphdr(skb) was not NULL. Oh indeed. This has already been checked before. > > + NAPI_GRO_CB(skb)->flush = 1; > > + return NULL; > > + } > > + > > /* requires non zero csum, for symmetry with GSO */ > > if (!uh->check) { > > NAPI_GRO_CB(skb)->flush = 1; > > > > Why uh2 is left unchanged ? > > uh2 = udp_hdr(p); Isn't that the same as th2 = tcp_hdr(p) in tcp_gro_receive? no frag0 optimization to worry about for packets on the list. > ... >
On 11/9/20 7:28 PM, Willem de Bruijn wrote: > On Mon, Nov 9, 2020 at 12:37 PM Eric Dumazet <eric.dumazet@gmail.com> wrote: >> >> >> >> On 11/9/20 5:56 PM, Alexander Lobakin wrote: >>> While testing UDP GSO fraglists forwarding through driver that uses >>> Fast GRO (via napi_gro_frags()), I was observing lots of out-of-order >>> iperf packets: >>> >>> [ ID] Interval Transfer Bitrate Jitter >>> [SUM] 0.0-40.0 sec 12106 datagrams received out-of-order >>> >>> Simple switch to napi_gro_receive() any other method without frag0 >>> shortcut completely resolved them. >>> >>> I've found that UDP GRO uses udp_hdr(skb) in its .gro_receive() >>> callback. While it's probably OK for non-frag0 paths (when all >>> headers or even the entire frame are already in skb->data), this >>> inline points to junk when using Fast GRO (napi_gro_frags() or >>> napi_gro_receive() with only Ethernet header in skb->data and all >>> the rest in shinfo->frags) and breaks GRO packet compilation and >>> the packet flow itself. >>> To support both modes, skb_gro_header_fast() + skb_gro_header_slow() >>> are typically used. UDP even has an inline helper that makes use of >>> them, udp_gro_udphdr(). Use that instead of troublemaking udp_hdr() >>> to get rid of the out-of-order delivers. >>> >>> Present since the introduction of plain UDP GRO in 5.0-rc1. >>> >>> Since v1 [1]: >>> - added a NULL pointer check for "uh" as suggested by Willem. >>> >>> [1] https://lore.kernel.org/netdev/YazU6GEzBdpyZMDMwJirxDX7B4sualpDG68ADZYvJI@cp4-web-034.plabs.ch >>> >>> Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.") >>> Signed-off-by: Alexander Lobakin <alobakin@pm.me> >>> --- >>> net/ipv4/udp_offload.c | 7 ++++++- >>> 1 file changed, 6 insertions(+), 1 deletion(-) >>> >>> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c >>> index e67a66fbf27b..7f6bd221880a 100644 >>> --- a/net/ipv4/udp_offload.c >>> +++ b/net/ipv4/udp_offload.c >>> @@ -366,13 +366,18 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb, >>> static struct sk_buff *udp_gro_receive_segment(struct list_head *head, >>> struct sk_buff *skb) >>> { >>> - struct udphdr *uh = udp_hdr(skb); >>> + struct udphdr *uh = udp_gro_udphdr(skb); >>> struct sk_buff *pp = NULL; >>> struct udphdr *uh2; >>> struct sk_buff *p; >>> unsigned int ulen; >>> int ret = 0; >>> >>> + if (unlikely(!uh)) { >> >> How uh could be NULL here ? >> >> My understanding is that udp_gro_receive() is called >> only after udp4_gro_receive() or udp6_gro_receive() >> validated that udp_gro_udphdr(skb) was not NULL. > > Oh indeed. This has already been checked before. > >>> + NAPI_GRO_CB(skb)->flush = 1; >>> + return NULL; >>> + } >>> + >>> /* requires non zero csum, for symmetry with GSO */ >>> if (!uh->check) { >>> NAPI_GRO_CB(skb)->flush = 1; >>> >> >> Why uh2 is left unchanged ? >> >> uh2 = udp_hdr(p); > > Isn't that the same as th2 = tcp_hdr(p) in tcp_gro_receive? no frag0 > optimization to worry about for packets on the list. My feeling was that tcp_gro_receive() is terminal in the GRO stack. While UDP could be encapsulated in UDP :) I guess we do not support this yet. Years ago we made sure to propagate the current header offset into GRO stack (when we added SIT/IPIP/GRE support to GRO) 299603e8370a93dd5d8e8d800f0dff1ce2c53d36 ("net-gro: Prepare GRO stack for the upcoming tunneling support") udp_hdr() is using transport header, which is unique for skb "on the list"
On Mon, Nov 9, 2020 at 1:54 PM Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > > On 11/9/20 7:28 PM, Willem de Bruijn wrote: > > On Mon, Nov 9, 2020 at 12:37 PM Eric Dumazet <eric.dumazet@gmail.com> wrote: > >> > >> > >> > >> On 11/9/20 5:56 PM, Alexander Lobakin wrote: > >>> While testing UDP GSO fraglists forwarding through driver that uses > >>> Fast GRO (via napi_gro_frags()), I was observing lots of out-of-order > >>> iperf packets: > >>> > >>> [ ID] Interval Transfer Bitrate Jitter > >>> [SUM] 0.0-40.0 sec 12106 datagrams received out-of-order > >>> > >>> Simple switch to napi_gro_receive() any other method without frag0 > >>> shortcut completely resolved them. > >>> > >>> I've found that UDP GRO uses udp_hdr(skb) in its .gro_receive() > >>> callback. While it's probably OK for non-frag0 paths (when all > >>> headers or even the entire frame are already in skb->data), this > >>> inline points to junk when using Fast GRO (napi_gro_frags() or > >>> napi_gro_receive() with only Ethernet header in skb->data and all > >>> the rest in shinfo->frags) and breaks GRO packet compilation and > >>> the packet flow itself. > >>> To support both modes, skb_gro_header_fast() + skb_gro_header_slow() > >>> are typically used. UDP even has an inline helper that makes use of > >>> them, udp_gro_udphdr(). Use that instead of troublemaking udp_hdr() > >>> to get rid of the out-of-order delivers. > >>> > >>> Present since the introduction of plain UDP GRO in 5.0-rc1. > >>> > >>> Since v1 [1]: > >>> - added a NULL pointer check for "uh" as suggested by Willem. > >>> > >>> [1] https://lore.kernel.org/netdev/YazU6GEzBdpyZMDMwJirxDX7B4sualpDG68ADZYvJI@cp4-web-034.plabs.ch > >>> > >>> Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.") > >>> Signed-off-by: Alexander Lobakin <alobakin@pm.me> > >>> --- > >>> net/ipv4/udp_offload.c | 7 ++++++- > >>> 1 file changed, 6 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c > >>> index e67a66fbf27b..7f6bd221880a 100644 > >>> --- a/net/ipv4/udp_offload.c > >>> +++ b/net/ipv4/udp_offload.c > >>> @@ -366,13 +366,18 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb, > >>> static struct sk_buff *udp_gro_receive_segment(struct list_head *head, > >>> struct sk_buff *skb) > >>> { > >>> - struct udphdr *uh = udp_hdr(skb); > >>> + struct udphdr *uh = udp_gro_udphdr(skb); > >>> struct sk_buff *pp = NULL; > >>> struct udphdr *uh2; > >>> struct sk_buff *p; > >>> unsigned int ulen; > >>> int ret = 0; > >>> > >>> + if (unlikely(!uh)) { > >> > >> How uh could be NULL here ? > >> > >> My understanding is that udp_gro_receive() is called > >> only after udp4_gro_receive() or udp6_gro_receive() > >> validated that udp_gro_udphdr(skb) was not NULL. > > > > Oh indeed. This has already been checked before. > > > >>> + NAPI_GRO_CB(skb)->flush = 1; > >>> + return NULL; > >>> + } > >>> + > >>> /* requires non zero csum, for symmetry with GSO */ > >>> if (!uh->check) { > >>> NAPI_GRO_CB(skb)->flush = 1; > >>> > >> > >> Why uh2 is left unchanged ? > >> > >> uh2 = udp_hdr(p); > > > > Isn't that the same as th2 = tcp_hdr(p) in tcp_gro_receive? no frag0 > > optimization to worry about for packets on the list. > > My feeling was that tcp_gro_receive() is terminal in the GRO stack. > > While UDP could be encapsulated in UDP :) > > I guess we do not support this yet. > > Years ago we made sure to propagate the current header offset into GRO stack > (when we added SIT/IPIP/GRE support to GRO) > 299603e8370a93dd5d8e8d800f0dff1ce2c53d36 ("net-gro: Prepare GRO stack for the upcoming tunneling support") On which note, and Alexander's mention of udp4_lib_lookup_skb(), that performs a standard ip_hdr on the possibly frag0 optimized incoming packet: struct sock *udp4_lib_lookup_skb(struct sk_buff *skb, __be16 sport, __be16 dport) { const struct iphdr *iph = ip_hdr(skb); return __udp4_lib_lookup(dev_net(skb->dev), iph->saddr, sport, iph->daddr, dport, inet_iif(skb), inet_sdif(skb), &udp_table, NULL); } This should use skb_gro_header_.. too, then?
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Date: Mon, 9 Nov 2020 14:06:07 -0500 > On Mon, Nov 9, 2020 at 1:54 PM Eric Dumazet <eric.dumazet@gmail.com> wrote: >> >> >> >> On 11/9/20 7:28 PM, Willem de Bruijn wrote: >>> On Mon, Nov 9, 2020 at 12:37 PM Eric Dumazet <eric.dumazet@gmail.com> wrote: >>>> >>>> >>>> >>>> On 11/9/20 5:56 PM, Alexander Lobakin wrote: >>>>> While testing UDP GSO fraglists forwarding through driver that uses >>>>> Fast GRO (via napi_gro_frags()), I was observing lots of out-of-order >>>>> iperf packets: >>>>> >>>>> [ ID] Interval Transfer Bitrate Jitter >>>>> [SUM] 0.0-40.0 sec 12106 datagrams received out-of-order >>>>> >>>>> Simple switch to napi_gro_receive() any other method without frag0 >>>>> shortcut completely resolved them. >>>>> >>>>> I've found that UDP GRO uses udp_hdr(skb) in its .gro_receive() >>>>> callback. While it's probably OK for non-frag0 paths (when all >>>>> headers or even the entire frame are already in skb->data), this >>>>> inline points to junk when using Fast GRO (napi_gro_frags() or >>>>> napi_gro_receive() with only Ethernet header in skb->data and all >>>>> the rest in shinfo->frags) and breaks GRO packet compilation and >>>>> the packet flow itself. >>>>> To support both modes, skb_gro_header_fast() + skb_gro_header_slow() >>>>> are typically used. UDP even has an inline helper that makes use of >>>>> them, udp_gro_udphdr(). Use that instead of troublemaking udp_hdr() >>>>> to get rid of the out-of-order delivers. >>>>> >>>>> Present since the introduction of plain UDP GRO in 5.0-rc1. >>>>> >>>>> Since v1 [1]: >>>>> - added a NULL pointer check for "uh" as suggested by Willem. >>>>> >>>>> [1] https://lore.kernel.org/netdev/YazU6GEzBdpyZMDMwJirxDX7B4sualpDG68ADZYvJI@cp4-web-034.plabs.ch >>>>> >>>>> Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.") >>>>> Signed-off-by: Alexander Lobakin <alobakin@pm.me> >>>>> --- >>>>> net/ipv4/udp_offload.c | 7 ++++++- >>>>> 1 file changed, 6 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c >>>>> index e67a66fbf27b..7f6bd221880a 100644 >>>>> --- a/net/ipv4/udp_offload.c >>>>> +++ b/net/ipv4/udp_offload.c >>>>> @@ -366,13 +366,18 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb, >>>>> static struct sk_buff *udp_gro_receive_segment(struct list_head *head, >>>>> struct sk_buff *skb) >>>>> { >>>>> - struct udphdr *uh = udp_hdr(skb); >>>>> + struct udphdr *uh = udp_gro_udphdr(skb); >>>>> struct sk_buff *pp = NULL; >>>>> struct udphdr *uh2; >>>>> struct sk_buff *p; >>>>> unsigned int ulen; >>>>> int ret = 0; >>>>> >>>>> + if (unlikely(!uh)) { >>>> >>>> How uh could be NULL here ? >>>> >>>> My understanding is that udp_gro_receive() is called >>>> only after udp4_gro_receive() or udp6_gro_receive() >>>> validated that udp_gro_udphdr(skb) was not NULL. >>> >>> Oh indeed. This has already been checked before. >>> >>>>> + NAPI_GRO_CB(skb)->flush = 1; >>>>> + return NULL; >>>>> + } >>>>> + >>>>> /* requires non zero csum, for symmetry with GSO */ >>>>> if (!uh->check) { >>>>> NAPI_GRO_CB(skb)->flush = 1; >>>>> >>>> >>>> Why uh2 is left unchanged ? >>>> >>>> uh2 = udp_hdr(p); >>> >>> Isn't that the same as th2 = tcp_hdr(p) in tcp_gro_receive? no frag0 >>> optimization to worry about for packets on the list. >> >> My feeling was that tcp_gro_receive() is terminal in the GRO stack. >> >> While UDP could be encapsulated in UDP :) >> >> I guess we do not support this yet. >> >> Years ago we made sure to propagate the current header offset into GRO stack >> (when we added SIT/IPIP/GRE support to GRO) >> 299603e8370a93dd5d8e8d800f0dff1ce2c53d36 ("net-gro: Prepare GRO stack for the upcoming tunneling support") > > On which note, and Alexander's mention of udp4_lib_lookup_skb(), that > performs a standard ip_hdr on the possibly frag0 optimized incoming > packet: > > struct sock *udp4_lib_lookup_skb(struct sk_buff *skb, > __be16 sport, __be16 dport) > { > const struct iphdr *iph = ip_hdr(skb); > > return __udp4_lib_lookup(dev_net(skb->dev), iph->saddr, sport, > iph->daddr, dport, inet_iif(skb), > inet_sdif(skb), &udp_table, NULL); > } > > This should use skb_gro_header_.. too, then? Well, both of udp{4,6}_lib_lookup_skb() and a couple of neighbour functions use ip_hdr(skb). I didn't check this path before as my test case was NAT/forwarding with no sk. I suspect they should use skb_gro_network_header() to obtain an IP header. I'll fix them in v3, thanks for pointing this out! Al
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c index e67a66fbf27b..7f6bd221880a 100644 --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -366,13 +366,18 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb, static struct sk_buff *udp_gro_receive_segment(struct list_head *head, struct sk_buff *skb) { - struct udphdr *uh = udp_hdr(skb); + struct udphdr *uh = udp_gro_udphdr(skb); struct sk_buff *pp = NULL; struct udphdr *uh2; struct sk_buff *p; unsigned int ulen; int ret = 0; + if (unlikely(!uh)) { + NAPI_GRO_CB(skb)->flush = 1; + return NULL; + } + /* requires non zero csum, for symmetry with GSO */ if (!uh->check) { NAPI_GRO_CB(skb)->flush = 1;
While testing UDP GSO fraglists forwarding through driver that uses Fast GRO (via napi_gro_frags()), I was observing lots of out-of-order iperf packets: [ ID] Interval Transfer Bitrate Jitter [SUM] 0.0-40.0 sec 12106 datagrams received out-of-order Simple switch to napi_gro_receive() any other method without frag0 shortcut completely resolved them. I've found that UDP GRO uses udp_hdr(skb) in its .gro_receive() callback. While it's probably OK for non-frag0 paths (when all headers or even the entire frame are already in skb->data), this inline points to junk when using Fast GRO (napi_gro_frags() or napi_gro_receive() with only Ethernet header in skb->data and all the rest in shinfo->frags) and breaks GRO packet compilation and the packet flow itself. To support both modes, skb_gro_header_fast() + skb_gro_header_slow() are typically used. UDP even has an inline helper that makes use of them, udp_gro_udphdr(). Use that instead of troublemaking udp_hdr() to get rid of the out-of-order delivers. Present since the introduction of plain UDP GRO in 5.0-rc1. Since v1 [1]: - added a NULL pointer check for "uh" as suggested by Willem. [1] https://lore.kernel.org/netdev/YazU6GEzBdpyZMDMwJirxDX7B4sualpDG68ADZYvJI@cp4-web-034.plabs.ch Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.") Signed-off-by: Alexander Lobakin <alobakin@pm.me> --- net/ipv4/udp_offload.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)