[net,0/4] udp: more FRAGLIST fixes

Message ID	cover.1620223174.git.pabeni@redhat.com
Headers	show Return-Path: <netdev-owner@kernel.org> From: Paolo Abeni <pabeni@redhat.com> To: netdev@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>, Steffen Klassert <steffen.klassert@secunet.com>, Willem de Bruijn <willemb@google.com>, Miaohe Lin <linmiaohe@huawei.com> Subject: [PATCH net 0/4] udp: more FRAGLIST fixes Date: Wed, 5 May 2021 17:35:00 +0200 Message-Id: <cover.1620223174.git.pabeni@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	udp: more FRAGLIST fixes \| expand [net,0/4] udp: more FRAGLIST fixes [net,2/4] udp: fix out-of-bound at segmentation time [net,4/4] selftests: more UDP GRO tests

Paolo Abeni May 5, 2021, 3:35 p.m. UTC

This series includes a bunch of fixes for SKB_GSO_FRAGLIST
packets, popped out while enabling the mentioned GRO type
in non trivial forwarding scenarios.

The last patch extends the existing UDP GRO self-tests to
cover the currently bugged cases.

Paolo Abeni (4):
  net: fix double-free on fraglist GSO skbs
  udp: fix out-of-bound at segmentation time
  udp: fix outer header csum for SKB_GSO_FRAGLIST over UDP tunnel
  selftests: more UDP GRO tests

 net/core/skbuff.c                             |  6 +++
 net/ipv4/udp.c                                |  2 +-
 net/ipv4/udp_offload.c                        | 45 +++++++++++-----
 tools/testing/selftests/net/set_sysfs_attr.sh | 15 ++++++
 tools/testing/selftests/net/udpgro_fwd.sh     | 54 +++++++++++++++++++
 5 files changed, 107 insertions(+), 15 deletions(-)
 create mode 100755 tools/testing/selftests/net/set_sysfs_attr.sh

Willem de Bruijn May 5, 2021, 4:13 p.m. UTC | #1

On Wed, May 5, 2021 at 11:37 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> While segmenting a SKB_GSO_FRAGLIST GSO packet, if the destructor
> callback is available, the skb destructor is invoked on each
> aggregated packet via skb_release_head_state().
>
> Such field (and the pairer skb->sk) is left untouched, so the same
> destructor is invoked again when the segmented skbs are freed, leading
> to double-free/UaF of the relevant socket.

Similar to skb_segment, should the destructor be swapped with the last
segment and callback delayed, instead of called immediately as part of
segmentation?

        /* Following permits correct backpressure, for protocols
         * using skb_set_owner_w().
         * Idea is to tranfert ownership from head_skb to last segment.
         */
        if (head_skb->destructor == sock_wfree) {
                swap(tail->truesize, head_skb->truesize);
                swap(tail->destructor, head_skb->destructor);
                swap(tail->sk, head_skb->sk);
        }

Paolo Abeni May 5, 2021, 5:28 p.m. UTC | #2

On Wed, 2021-05-05 at 12:13 -0400, Willem de Bruijn wrote:
> On Wed, May 5, 2021 at 11:37 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > While segmenting a SKB_GSO_FRAGLIST GSO packet, if the destructor
> > callback is available, the skb destructor is invoked on each
> > aggregated packet via skb_release_head_state().
> > 
> > Such field (and the pairer skb->sk) is left untouched, so the same
> > destructor is invoked again when the segmented skbs are freed, leading
> > to double-free/UaF of the relevant socket.
> 
> Similar to skb_segment, should the destructor be swapped with the last
> segment and callback delayed, instead of called immediately as part of
> segmentation?
> 
>         /* Following permits correct backpressure, for protocols
>          * using skb_set_owner_w().
>          * Idea is to tranfert ownership from head_skb to last segment.
>          */
>         if (head_skb->destructor == sock_wfree) {
>                 swap(tail->truesize, head_skb->truesize);
>                 swap(tail->destructor, head_skb->destructor);
>                 swap(tail->sk, head_skb->sk);
>         }

My understanding is that one assumption in the original
SKB_GSO_FRAGLIST implementation was that SKB_GSO_FRAGLIST skbs are not
owned by any socket. 

AFAICS the above assumption was true until:

commit c75fb320d482a5ce6e522378d137fd2c3bf79225
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Fri Apr 9 13:04:37 2021 +0200

    veth: use skb_orphan_partial instead of skb_orphan

after that, if the skb is owned, skb->destructor is sock_efree(), so
the above code should not trigger.

More importantly SKB_GSO_FRAGLIST can only be applied if the inner-
most protocol is UDP, so
commit 432c856fcf45c468fffe2e5029cb3f95c7dc9475
and d6a4a10411764cf1c3a5dad4f06c5ebe5194488b should not be relevant. 

Thanks!

Paolo

Paolo Abeni May 6, 2021, 11:06 a.m. UTC | #3

On Wed, 2021-05-05 at 13:30 -0400, Willem de Bruijn wrote:
> On Wed, May 5, 2021 at 1:28 PM Paolo Abeni <pabeni@redhat.com> wrote:

> > On Wed, 2021-05-05 at 12:13 -0400, Willem de Bruijn wrote:

> > > On Wed, May 5, 2021 at 11:37 AM Paolo Abeni <pabeni@redhat.com> wrote:

> > > > While segmenting a SKB_GSO_FRAGLIST GSO packet, if the destructor

> > > > callback is available, the skb destructor is invoked on each

> > > > aggregated packet via skb_release_head_state().

> > > > 

> > > > Such field (and the pairer skb->sk) is left untouched, so the same

> > > > destructor is invoked again when the segmented skbs are freed, leading

> > > > to double-free/UaF of the relevant socket.

> > > 

> > > Similar to skb_segment, should the destructor be swapped with the last

> > > segment and callback delayed, instead of called immediately as part of

> > > segmentation?

> > > 

> > >         /* Following permits correct backpressure, for protocols

> > >          * using skb_set_owner_w().

> > >          * Idea is to tranfert ownership from head_skb to last segment.

> > >          */

> > >         if (head_skb->destructor == sock_wfree) {

> > >                 swap(tail->truesize, head_skb->truesize);

> > >                 swap(tail->destructor, head_skb->destructor);

> > >                 swap(tail->sk, head_skb->sk);

> > >         }

> > 

> > My understanding is that one assumption in the original

> > SKB_GSO_FRAGLIST implementation was that SKB_GSO_FRAGLIST skbs are not

> > owned by any socket.

> > 

> > AFAICS the above assumption was true until:

> > 

> > commit c75fb320d482a5ce6e522378d137fd2c3bf79225

> > Author: Paolo Abeni <pabeni@redhat.com>

> > Date:   Fri Apr 9 13:04:37 2021 +0200

> > 

> >     veth: use skb_orphan_partial instead of skb_orphan

> > 

> > after that, if the skb is owned, skb->destructor is sock_efree(), so

> > the above code should not trigger.

> 

> Okay, great.

> 

> > More importantly SKB_GSO_FRAGLIST can only be applied if the inner-

> > most protocol is UDP, so

> > commit 432c856fcf45c468fffe2e5029cb3f95c7dc9475

> > and d6a4a10411764cf1c3a5dad4f06c5ebe5194488b should not be relevant.

> 

> I think the first does apply, as it applies to any protocol that uses

> sock_wfree, not just tcp_wfree? Anyway, the point is moot indeed.


If we want to be safe about future possible sock_wfree users, I think
the approach here should be different: in skb_segment(), tail-
>destructor is expected to be NULL, while skb_segment_list(), all the

list skbs can be owned by the same socket. Possibly we could open-
code skb_release_head_state(), omitting the skb orphaning part
for sock_wfree() destructor.

Note that the this is not currently needed - sock_wfree destructor
can't reach there.

Given all the above, I'm unsure if you are fine with (or at least do
not oppose to) the code proposed in this patch?

Thanks,

Paolo

Willem de Bruijn May 6, 2021, 2:32 p.m. UTC | #4

On Thu, May 6, 2021 at 7:07 AM Paolo Abeni <pabeni@redhat.com> wrote:
>

> On Wed, 2021-05-05 at 13:30 -0400, Willem de Bruijn wrote:

> > On Wed, May 5, 2021 at 1:28 PM Paolo Abeni <pabeni@redhat.com> wrote:

> > > On Wed, 2021-05-05 at 12:13 -0400, Willem de Bruijn wrote:

> > > > On Wed, May 5, 2021 at 11:37 AM Paolo Abeni <pabeni@redhat.com> wrote:

> > > > > While segmenting a SKB_GSO_FRAGLIST GSO packet, if the destructor

> > > > > callback is available, the skb destructor is invoked on each

> > > > > aggregated packet via skb_release_head_state().

> > > > >

> > > > > Such field (and the pairer skb->sk) is left untouched, so the same

> > > > > destructor is invoked again when the segmented skbs are freed, leading

> > > > > to double-free/UaF of the relevant socket.

> > > >

> > > > Similar to skb_segment, should the destructor be swapped with the last

> > > > segment and callback delayed, instead of called immediately as part of

> > > > segmentation?

> > > >

> > > >         /* Following permits correct backpressure, for protocols

> > > >          * using skb_set_owner_w().

> > > >          * Idea is to tranfert ownership from head_skb to last segment.

> > > >          */

> > > >         if (head_skb->destructor == sock_wfree) {

> > > >                 swap(tail->truesize, head_skb->truesize);

> > > >                 swap(tail->destructor, head_skb->destructor);

> > > >                 swap(tail->sk, head_skb->sk);

> > > >         }

> > >

> > > My understanding is that one assumption in the original

> > > SKB_GSO_FRAGLIST implementation was that SKB_GSO_FRAGLIST skbs are not

> > > owned by any socket.

> > >

> > > AFAICS the above assumption was true until:

> > >

> > > commit c75fb320d482a5ce6e522378d137fd2c3bf79225

> > > Author: Paolo Abeni <pabeni@redhat.com>

> > > Date:   Fri Apr 9 13:04:37 2021 +0200

> > >

> > >     veth: use skb_orphan_partial instead of skb_orphan

> > >

> > > after that, if the skb is owned, skb->destructor is sock_efree(), so

> > > the above code should not trigger.

> >

> > Okay, great.

> >

> > > More importantly SKB_GSO_FRAGLIST can only be applied if the inner-

> > > most protocol is UDP, so

> > > commit 432c856fcf45c468fffe2e5029cb3f95c7dc9475

> > > and d6a4a10411764cf1c3a5dad4f06c5ebe5194488b should not be relevant.

> >

> > I think the first does apply, as it applies to any protocol that uses

> > sock_wfree, not just tcp_wfree? Anyway, the point is moot indeed.

>

> If we want to be safe about future possible sock_wfree users, I think

> the approach here should be different: in skb_segment(), tail-

> >destructor is expected to be NULL, while skb_segment_list(), all the

> list skbs can be owned by the same socket. Possibly we could open-

> code skb_release_head_state(), omitting the skb orphaning part

> for sock_wfree() destructor.

>

> Note that the this is not currently needed - sock_wfree destructor

> can't reach there.

>

> Given all the above, I'm unsure if you are fine with (or at least do

> not oppose to) the code proposed in this patch?


Yes. Thanks for clarifying, Paolo.

Paolo Abeni May 6, 2021, 3:55 p.m. UTC | #5

On Thu, 2021-05-06 at 10:32 -0400, Willem de Bruijn wrote:
> On Thu, May 6, 2021 at 7:07 AM Paolo Abeni <pabeni@redhat.com> wrote:

> > On Wed, 2021-05-05 at 13:30 -0400, Willem de Bruijn wrote:

> > > On Wed, May 5, 2021 at 1:28 PM Paolo Abeni <pabeni@redhat.com> wrote:

> > > > On Wed, 2021-05-05 at 12:13 -0400, Willem de Bruijn wrote:

> > > > > On Wed, May 5, 2021 at 11:37 AM Paolo Abeni <pabeni@redhat.com> wrote:

> > > > > > While segmenting a SKB_GSO_FRAGLIST GSO packet, if the destructor

> > > > > > callback is available, the skb destructor is invoked on each

> > > > > > aggregated packet via skb_release_head_state().

> > > > > > 

> > > > > > Such field (and the pairer skb->sk) is left untouched, so the same

> > > > > > destructor is invoked again when the segmented skbs are freed, leading

> > > > > > to double-free/UaF of the relevant socket.

> > > > > 

> > > > > Similar to skb_segment, should the destructor be swapped with the last

> > > > > segment and callback delayed, instead of called immediately as part of

> > > > > segmentation?

> > > > > 

> > > > >         /* Following permits correct backpressure, for protocols

> > > > >          * using skb_set_owner_w().

> > > > >          * Idea is to tranfert ownership from head_skb to last segment.

> > > > >          */

> > > > >         if (head_skb->destructor == sock_wfree) {

> > > > >                 swap(tail->truesize, head_skb->truesize);

> > > > >                 swap(tail->destructor, head_skb->destructor);

> > > > >                 swap(tail->sk, head_skb->sk);

> > > > >         }

> > > > 

> > > > My understanding is that one assumption in the original

> > > > SKB_GSO_FRAGLIST implementation was that SKB_GSO_FRAGLIST skbs are not

> > > > owned by any socket.

> > > > 

> > > > AFAICS the above assumption was true until:

> > > > 

> > > > commit c75fb320d482a5ce6e522378d137fd2c3bf79225

> > > > Author: Paolo Abeni <pabeni@redhat.com>

> > > > Date:   Fri Apr 9 13:04:37 2021 +0200

> > > > 

> > > >     veth: use skb_orphan_partial instead of skb_orphan

> > > > 

> > > > after that, if the skb is owned, skb->destructor is sock_efree(), so

> > > > the above code should not trigger.

> > > 

> > > Okay, great.

> > > 

> > > > More importantly SKB_GSO_FRAGLIST can only be applied if the inner-

> > > > most protocol is UDP, so

> > > > commit 432c856fcf45c468fffe2e5029cb3f95c7dc9475

> > > > and d6a4a10411764cf1c3a5dad4f06c5ebe5194488b should not be relevant.

> > > 

> > > I think the first does apply, as it applies to any protocol that uses

> > > sock_wfree, not just tcp_wfree? Anyway, the point is moot indeed.

> > 

> > If we want to be safe about future possible sock_wfree users, I think

> > the approach here should be different: in skb_segment(), tail-

> > > destructor is expected to be NULL, while skb_segment_list(), all the

> > list skbs can be owned by the same socket. Possibly we could open-

> > code skb_release_head_state(), omitting the skb orphaning part

> > for sock_wfree() destructor.

> > 

> > Note that the this is not currently needed - sock_wfree destructor

> > can't reach there.

> > 

> > Given all the above, I'm unsure if you are fine with (or at least do

> > not oppose to) the code proposed in this patch?

> 

> Yes. Thanks for clarifying, Paolo.


Thank you for reviewing!

@David, @Jakub: I see this series is already archived as "change
requested", should I repost?

Thanks!

Paolo

Jakub Kicinski May 6, 2021, 9:17 p.m. UTC | #6

On Thu, 06 May 2021 17:55:36 +0200 Paolo Abeni wrote:
> On Thu, 2021-05-06 at 10:32 -0400, Willem de Bruijn wrote:

> > On Thu, May 6, 2021 at 7:07 AM Paolo Abeni <pabeni@redhat.com> wrote:  

> > > If we want to be safe about future possible sock_wfree users, I think

> > > the approach here should be different: in skb_segment(), tail-  

> > > > destructor is expected to be NULL, while skb_segment_list(), all the  

> > > list skbs can be owned by the same socket. Possibly we could open-

> > > code skb_release_head_state(), omitting the skb orphaning part

> > > for sock_wfree() destructor.

> > > 

> > > Note that the this is not currently needed - sock_wfree destructor

> > > can't reach there.

> > > 

> > > Given all the above, I'm unsure if you are fine with (or at least do

> > > not oppose to) the code proposed in this patch?  

> > 

> > Yes. Thanks for clarifying, Paolo.  

> 

> Thank you for reviewing!

> 

> @David, @Jakub: I see this series is already archived as "change

> requested", should I repost?


Yes, please. Patch 2 adds two new sparse warnings. 

I think you need csum_unfold() to go from __sum16 to __wsum.

Paolo Abeni May 7, 2021, 8:46 a.m. UTC | #7

On Thu, 2021-05-06 at 14:17 -0700, Jakub Kicinski wrote:
> On Thu, 06 May 2021 17:55:36 +0200 Paolo Abeni wrote:

> > On Thu, 2021-05-06 at 10:32 -0400, Willem de Bruijn wrote:

> > > On Thu, May 6, 2021 at 7:07 AM Paolo Abeni <pabeni@redhat.com> wrote:  

> > > > If we want to be safe about future possible sock_wfree users, I think

> > > > the approach here should be different: in skb_segment(), tail-  

> > > > > destructor is expected to be NULL, while skb_segment_list(), all the  

> > > > list skbs can be owned by the same socket. Possibly we could open-

> > > > code skb_release_head_state(), omitting the skb orphaning part

> > > > for sock_wfree() destructor.

> > > > 

> > > > Note that the this is not currently needed - sock_wfree destructor

> > > > can't reach there.

> > > > 

> > > > Given all the above, I'm unsure if you are fine with (or at least do

> > > > not oppose to) the code proposed in this patch?  

> > > 

> > > Yes. Thanks for clarifying, Paolo.  

> > 

> > Thank you for reviewing!

> > 

> > @David, @Jakub: I see this series is already archived as "change

> > requested", should I repost?

> 

> Yes, please. Patch 2 adds two new sparse warnings. 

> 

> I think you need csum_unfold() to go from __sum16 to __wsum.


Yes, indeed. I'll send a v2 with such change, thanks!

Paolo
>

Paolo Abeni May 10, 2021, 3:37 p.m. UTC | #8

On Fri, 2021-05-07 at 10:46 +0200, Paolo Abeni wrote:
> On Thu, 2021-05-06 at 14:17 -0700, Jakub Kicinski wrote:

> > On Thu, 06 May 2021 17:55:36 +0200 Paolo Abeni wrote:

> > > On Thu, 2021-05-06 at 10:32 -0400, Willem de Bruijn wrote:

> > > > On Thu, May 6, 2021 at 7:07 AM Paolo Abeni <pabeni@redhat.com> wrote:  

> > > > > If we want to be safe about future possible sock_wfree users, I think

> > > > > the approach here should be different: in skb_segment(), tail-  

> > > > > > destructor is expected to be NULL, while skb_segment_list(), all the  

> > > > > list skbs can be owned by the same socket. Possibly we could open-

> > > > > code skb_release_head_state(), omitting the skb orphaning part

> > > > > for sock_wfree() destructor.

> > > > > 

> > > > > Note that the this is not currently needed - sock_wfree destructor

> > > > > can't reach there.

> > > > > 

> > > > > Given all the above, I'm unsure if you are fine with (or at least do

> > > > > not oppose to) the code proposed in this patch?  

> > > > 

> > > > Yes. Thanks for clarifying, Paolo.  

> > > 

> > > Thank you for reviewing!

> > > 

> > > @David, @Jakub: I see this series is already archived as "change

> > > requested", should I repost?

> > 

> > Yes, please. Patch 2 adds two new sparse warnings. 

> > 

> > I think you need csum_unfold() to go from __sum16 to __wsum.

> 

> Yes, indeed. I'll send a v2 with such change, thanks!

It's taking [much] more than expected, as it turned out that thare are
still a number of case where the tx csum is uncorrect.

If the traffic comes from a veth we don't have a valid th->csum value
at GRO time, setting ip_summed to CHECKSUM_UNNECESSARY - as the current
code does - looks wrong.
@Steffen: I see in the original discussion about GRO_FRAGLIST
introduction that you wanted the GRO packets to be CHECKSUM_UNNECESSARY
to avoid csum modification in fwd path. I guess that choice was mostily
due performance reasons, to avoid touching the aggregated pkts header
at gso_segment_list time, but it looks like it's quite bug prone. If
so, I'm unsure the performance gain is worty. I propose to switch to
CHECKSUM_PARTIAL. Would you be ok with that?

Thanks,

Paolo

Steffen Klassert May 11, 2021, 9:39 a.m. UTC | #9

On Mon, May 10, 2021 at 05:37:58PM +0200, Paolo Abeni wrote:
> 

> It's taking [much] more than expected, as it turned out that thare are

> still a number of case where the tx csum is uncorrect.

> 

> If the traffic comes from a veth we don't have a valid th->csum value

> at GRO time, setting ip_summed to CHECKSUM_UNNECESSARY - as the current

> code does - looks wrong.

> @Steffen: I see in the original discussion about GRO_FRAGLIST

> introduction that you wanted the GRO packets to be CHECKSUM_UNNECESSARY

> to avoid csum modification in fwd path. I guess that choice was mostily

> due performance reasons, to avoid touching the aggregated pkts header

> at gso_segment_list time, but it looks like it's quite bug prone. If

> so, I'm unsure the performance gain is worty.


Yes, that was for performance reasons. We don't mangle the packets
with fraglist GRO, so the checksum should be still correct when
doing GSO.

> I propose to switch to

> CHECKSUM_PARTIAL. Would you be ok with that?


If there are cases where CHECKSUM_UNNECESSARY is problematic,
then yes, let's switch to CHECKSUM_PARTIAL.

Thanks for doing this Paolo!

[net,0/4] udp: more FRAGLIST fixes

Message

Comments