[bpf-next,v5,0/3] bpf: misc performance improvements for cgroup hooks

Message ID 20210108180333.180906-1-sdf@google.com

Message

Stanislav Fomichev Jan. 8, 2021, 6:03 p.m. UTC
First patch adds a custom getsockopt hook for TCP_ZEROCOPY_RECEIVE
to remove kmalloc and lock_sock overhead from the data path.
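
A minimal sketch of the bypass, assuming the hook name used in the
series (tcp_bpf_bypass_getsockopt, wired into struct proto for TCP):

bool tcp_bpf_bypass_getsockopt(int level, int optname)
{
	/* TCP_ZEROCOPY_RECEIVE runs the BPF hook itself from
	 * do_tcp_getsockopt() with on-stack data, so the generic
	 * kmalloc + lock_sock path can be skipped for it.
	 */
	return level == SOL_TCP && optname == TCP_ZEROCOPY_RECEIVE;
}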

Second patch removes kzalloc/kfree from getsockopt for the common cases.
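
Rough sketch of the idea (buffer size per the v2 note below; the
helper signature here is simplified): small optvals are served from
an on-stack buffer, and kzalloc is only a fallback for large ones:

struct bpf_sockopt_buf {
	u8 data[64];
};

static void *sockopt_alloc_buf(int max_optlen, struct bpf_sockopt_buf *buf)
{
	/* Common case: the option fits on the stack, no allocation. */
	if (max_optlen <= sizeof(buf->data))
		return buf->data;

	/* Rare case: a large optval still needs a heap buffer. */
	return kzalloc(max_optlen, GFP_USER);
}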

Third patch switches cgroup_bpf_enabled to be per attach type so
that overhead is added only for the cgroup attach types actually
used on the system.
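
Sketch of the per-attach-type check, assuming a static-key array
indexed by attach type (names follow the series):

extern struct static_key_false cgroup_bpf_enabled_key[MAX_BPF_ATTACH_TYPE];

#define cgroup_bpf_enabled(type) \
	static_branch_unlikely(&cgroup_bpf_enabled_key[type])

This way, e.g., getsockopt only pays the hook cost when a
BPF_CGROUP_GETSOCKOPT program is attached somewhere on the system.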

No user-visible changes.

v5:
- reorder patches to reduce the churn (Martin KaFai Lau)

v4:
- update performance numbers
- bypass_bpf_getsockopt (Martin KaFai Lau)

v3:
- remove extra newline, add comment about sizeof tcp_zerocopy_receive
  (Martin KaFai Lau)
- add another patch to remove lock_sock overhead from
  TCP_ZEROCOPY_RECEIVE; technically, this makes patch #1 obsolete,
  but I'd still prefer to keep it to help with other socket
  options

v2:
- perf numbers for getsockopt kmalloc reduction (Song Liu)
- (sk) in BPF_CGROUP_PRE_CONNECT_ENABLED (Song Liu)
- 128 -> 64 buffer size, BUILD_BUG_ON (Martin KaFai Lau)

Stanislav Fomichev (3):
  bpf: remove extra lock_sock for TCP_ZEROCOPY_RECEIVE
  bpf: try to avoid kzalloc in cgroup/{s,g}etsockopt
  bpf: split cgroup_bpf_enabled per attach type

 include/linux/bpf-cgroup.h                    |  61 ++++++----
 include/linux/filter.h                        |   5 +
 include/net/sock.h                            |   2 +
 include/net/tcp.h                             |   1 +
 kernel/bpf/cgroup.c                           | 104 +++++++++++++++---
 net/ipv4/af_inet.c                            |   9 +-
 net/ipv4/tcp.c                                |  14 +++
 net/ipv4/tcp_ipv4.c                           |   1 +
 net/ipv4/udp.c                                |   7 +-
 net/ipv6/af_inet6.c                           |   9 +-
 net/ipv6/tcp_ipv6.c                           |   1 +
 net/ipv6/udp.c                                |   7 +-
 .../selftests/bpf/prog_tests/sockopt_sk.c     |  22 ++++
 .../testing/selftests/bpf/progs/sockopt_sk.c  |  15 +++
 14 files changed, 206 insertions(+), 52 deletions(-)

Comments

Stanislav Fomichev Jan. 8, 2021, 6:26 p.m. UTC | #1
On Fri, Jan 8, 2021 at 10:10 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Fri, Jan 8, 2021 at 7:03 PM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > Add custom implementation of getsockopt hook for TCP_ZEROCOPY_RECEIVE.
> > We skip generic hooks for TCP_ZEROCOPY_RECEIVE and have a custom
> > call in do_tcp_getsockopt using the on-stack data. This removes
> > 3% overhead for locking/unlocking the socket.
> >
> > Without this patch:
> >      3.38%     0.07%  tcp_mmap  [kernel.kallsyms]  [k] __cgroup_bpf_run_filter_getsockopt
> >             |
> >              --3.30%--__cgroup_bpf_run_filter_getsockopt
> >                        |
> >                         --0.81%--__kmalloc
> >
> > With the patch applied:
> >      0.52%     0.12%  tcp_mmap  [kernel.kallsyms]  [k] __cgroup_bpf_run_filter_getsockopt_kern
> >
>
>
> OK but we are adding yet another indirect call.
>
> Can you add a patch on top of it adding INDIRECT_CALL_INET() avoidance ?
Sure, but do you think it will bring any benefit?
We don't have any indirect avoidance in __sys_getsockopt for the
sock->ops->getsockopt() call.
If we add it for this new bpf_bypass_getsockopt, we might as well add
it for sock->ops->getsockopt?
And we need some new INDIRECT_CALL_INET2 such that f2 doesn't get
disabled when ipv6 is disabled :-/
Eric Dumazet Jan. 8, 2021, 6:40 p.m. UTC | #2
On Fri, Jan 8, 2021 at 7:26 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> On Fri, Jan 8, 2021 at 10:10 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Fri, Jan 8, 2021 at 7:03 PM Stanislav Fomichev <sdf@google.com> wrote:
> > >
> > > Add custom implementation of getsockopt hook for TCP_ZEROCOPY_RECEIVE.
> > > We skip generic hooks for TCP_ZEROCOPY_RECEIVE and have a custom
> > > call in do_tcp_getsockopt using the on-stack data. This removes
> > > 3% overhead for locking/unlocking the socket.
> > >
> > > Without this patch:
> > >      3.38%     0.07%  tcp_mmap  [kernel.kallsyms]  [k] __cgroup_bpf_run_filter_getsockopt
> > >             |
> > >              --3.30%--__cgroup_bpf_run_filter_getsockopt
> > >                        |
> > >                         --0.81%--__kmalloc
> > >
> > > With the patch applied:
> > >      0.52%     0.12%  tcp_mmap  [kernel.kallsyms]  [k] __cgroup_bpf_run_filter_getsockopt_kern
> > >
> >
> >
> > OK but we are adding yet another indirect call.
> >
> > Can you add a patch on top of it adding INDIRECT_CALL_INET() avoidance ?
> Sure, but do you think it will bring any benefit?

Sure, avoiding an indirect call might be the same gain as the
lock_sock() avoidance :)

> We don't have any indirect avoidance in __sys_getsockopt for the
> sock->ops->getsockopt() call.
> If we add it for this new bpf_bypass_getsockopt, we might as well add
> it for sock->ops->getsockopt?

Well, that is orthogonal to this patch.
As you may know, Google kernels do have a mitigation there already and
Brian may upstream it.

> And we need some new INDIRECT_CALL_INET2 such that f2 doesn't get
> disabled when ipv6 is disabled :-/

The same handler is called for IPv4 and IPv6, so you need the variant
with only one known handler (tcp_bpf_bypass_getsockopt).

It only needs to make sure CONFIG_INET is enabled.
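
For reference, this suggestion maps onto the existing single-target
wrapper INDIRECT_CALL_1() from include/linux/indirect_call_wrapper.h;
a sketch (the CONFIG_INET guard placement and the macro name
BPF_BYPASS_GETSOCKOPT here are illustrative, not from the series):

#if IS_ENABLED(CONFIG_INET)
#define BPF_BYPASS_GETSOCKOPT(sk, level, optname)			\
	INDIRECT_CALL_1((sk)->sk_prot->bpf_bypass_getsockopt,		\
			tcp_bpf_bypass_getsockopt, level, optname)
#else
#define BPF_BYPASS_GETSOCKOPT(sk, level, optname)			\
	(sk)->sk_prot->bpf_bypass_getsockopt(level, optname)
#endif

When the function pointer matches tcp_bpf_bypass_getsockopt (the only
handler, shared by IPv4 and IPv6), the call is devirtualized into a
direct call and the retpoline-era indirect-call cost is avoided.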