Message ID: 20210609135537.1460244-1-joamaki@gmail.com
Series: XDP bonding support
On Thu, Jun 10, 2021 at 7:24 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Wed, Jun 9, 2021 at 6:55 AM Jussi Maki <joamaki@gmail.com> wrote:
> >
> > This patchset introduces XDP support to the bonding driver.
> >
> > Patch 1 contains the implementation, including support for
> > the recently introduced EXCLUDE_INGRESS. Patch 2 contains a
> > performance fix to the roundrobin mode which switches rr_tx_counter
> > to be per-cpu. Patch 3 contains the test suite for the implementation
> > using a pair of veth devices.
> >
> > The vmtest.sh is modified to enable the bonding module and install
> > modules. The config change should probably be done in the libbpf
> > repository. Andrii: How would you like this done properly?
>
> I think vmtest.sh and CI setup doesn't support modules (not easily at
> least). Can we just compile that driver in? Then you can submit a PR
> against libbpf Github repo to adjust the config. We have also kernel
> CI repo where we'll need to make this change.

Unfortunately the mode and xmit_policy options of the bonding driver
are module params, so it'll need to be a module so the different modes
can be tested. I already modified vmtest.sh [1] to "make
modules_install" into the rootfs and enable the bonding module via
scripts/config, but a cleaner approach would probably be to, as you
suggested, update latest.config in the libbpf repo and get the
"modules_install" change into vmtest.sh separately (if you're happy
with this approach). What do you think?

[1] https://lore.kernel.org/netdev/20210609135537.1460244-1-joamaki@gmail.com/T/#maaf15ecd6b7c3af764558589118a3c6213e0af81
Jussi Maki <joamaki@gmail.com> wrote:
> On Thu, Jun 10, 2021 at 7:24 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
>>
>> On Wed, Jun 9, 2021 at 6:55 AM Jussi Maki <joamaki@gmail.com> wrote:
>> >
>> > This patchset introduces XDP support to the bonding driver.
>> >
>> > Patch 1 contains the implementation, including support for
>> > the recently introduced EXCLUDE_INGRESS. Patch 2 contains a
>> > performance fix to the roundrobin mode which switches rr_tx_counter
>> > to be per-cpu. Patch 3 contains the test suite for the implementation
>> > using a pair of veth devices.
>> >
>> > The vmtest.sh is modified to enable the bonding module and install
>> > modules. The config change should probably be done in the libbpf
>> > repository. Andrii: How would you like this done properly?
>>
>> I think vmtest.sh and CI setup doesn't support modules (not easily at
>> least). Can we just compile that driver in? Then you can submit a PR
>> against libbpf Github repo to adjust the config. We have also kernel
>> CI repo where we'll need to make this change.
>
> Unfortunately the mode and xmit_policy options of the bonding driver
> are module params, so it'll need to be a module so the different modes
> can be tested. I already modified vmtest.sh [1] to "make
> modules_install" into the rootfs and enable the bonding module via
> scripts/config, but a cleaner approach would probably be to, as you
> suggested, update latest.config in the libbpf repo and get the
> "modules_install" change into vmtest.sh separately (if you're happy
> with this approach). What do you think?

	The bonding mode and xmit_hash_policy (and any other option) can
be changed via "ip link"; no module parameter needed, e.g.,

	ip link set dev bond0 type bond xmit_hash_policy layer2

	-J

> [1] https://lore.kernel.org/netdev/20210609135537.1460244-1-joamaki@gmail.com/T/#maaf15ecd6b7c3af764558589118a3c6213e0af81

---
	-Jay Vosburgh, jay.vosburgh@canonical.com
On Mon, Jun 14, 2021 at 5:25 AM Jussi Maki <joamaki@gmail.com> wrote:
>
> On Thu, Jun 10, 2021 at 7:24 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Wed, Jun 9, 2021 at 6:55 AM Jussi Maki <joamaki@gmail.com> wrote:
> > >
> > > This patchset introduces XDP support to the bonding driver.
> > >
> > > Patch 1 contains the implementation, including support for
> > > the recently introduced EXCLUDE_INGRESS. Patch 2 contains a
> > > performance fix to the roundrobin mode which switches rr_tx_counter
> > > to be per-cpu. Patch 3 contains the test suite for the implementation
> > > using a pair of veth devices.
> > >
> > > The vmtest.sh is modified to enable the bonding module and install
> > > modules. The config change should probably be done in the libbpf
> > > repository. Andrii: How would you like this done properly?
> >
> > I think vmtest.sh and CI setup doesn't support modules (not easily at
> > least). Can we just compile that driver in? Then you can submit a PR
> > against libbpf Github repo to adjust the config. We have also kernel
> > CI repo where we'll need to make this change.
>
> Unfortunately the mode and xmit_policy options of the bonding driver
> are module params, so it'll need to be a module so the different modes
> can be tested. I already modified vmtest.sh [1] to "make
> modules_install" into the rootfs and enable the bonding module via
> scripts/config, but a cleaner approach would probably be to, as you
> suggested, update latest.config in the libbpf repo and get the
> "modules_install" change into vmtest.sh separately (if you're happy
> with this approach). What do you think?

If we can make modules work in vmtest.sh then it's great, regardless of
whether you still need it or not. It's not supported right now because
no one did the work to support modules, not because we explicitly didn't
want modules in CI.

> [1] https://lore.kernel.org/netdev/20210609135537.1460244-1-joamaki@gmail.com/T/#maaf15ecd6b7c3af764558589118a3c6213e0af81
From: Jussi Maki <joamaki@gmail.com>
This patchset introduces XDP support to the bonding driver.
The motivation for this change is to enable the use of bonding (and
802.3ad) in hairpinning L4 load-balancers such as [1] implemented with
XDP, and also to transparently support bond devices in projects that
use XDP, given that most modern NICs have dual-port adapters. An
alternative to this approach would be to implement 802.3ad in user
space and implement the bonding load-balancing in the XDP program
itself, but this is a rather cumbersome endeavor in terms of slave
device management (e.g. by watching netlink) and requires separate
programs for the native vs. bond cases for the orchestrator. A native
in-kernel implementation overcomes these issues and provides more
flexibility.
Below are benchmark results from two machines with 100Gbit Intel E810
(ice) NICs, with a 32-core 3970X on the sending machine and a 16-core
3950X on the receiving machine. 64-byte packets were sent with
pktgen-dpdk at full rate. Two issues [2, 3] were identified with the
ice driver, so the tests were performed with iommu=off and patch [2]
applied. Additionally, the bonding round-robin algorithm was modified
to use per-CPU TX counters, as the shared rr_tx_counter caused high
CPU load (50% vs 10%) and a high rate of cache misses; a rough sketch
of this change follows after the benchmark table. The fix has already
been merged into net-next. The statistics were collected using
"sar -n dev -u 1 10".
-----------------------|  CPU   |  rxpck/s  |  txpck/s  |----
without patch (1 dev):
  XDP_DROP:            3.15%     48.6Mpps
  XDP_TX:              3.12%     18.3Mpps    18.3Mpps
  XDP_DROP (RSS):      9.47%    116.5Mpps
  XDP_TX (RSS):        9.67%     25.3Mpps    24.2Mpps
-----------------------
with patch, bond (1 dev):
  XDP_DROP:            3.14%     46.7Mpps
  XDP_TX:              3.15%     13.9Mpps    13.9Mpps
  XDP_DROP (RSS):     10.33%    117.2Mpps
  XDP_TX (RSS):       10.64%     25.1Mpps    24.0Mpps
-----------------------
with patch, bond (2 devs):
  XDP_DROP:            6.27%     92.7Mpps
  XDP_TX:              6.26%     17.6Mpps    17.5Mpps
  XDP_DROP (RSS):     11.38%    117.2Mpps
  XDP_TX (RSS):       14.30%     28.7Mpps    27.4Mpps
--------------------------------------------------------------
RSS: Receive Side Scaling, i.e. the packets were sent to a range of
destination IPs so that they were spread across multiple RX queues.
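For reference, the per-CPU TX counter change mentioned above amounts to
roughly the following sketch (simplified; the code merged into net-next
may differ in details):

  /* The shared atomic rr_tx_counter is replaced by a per-CPU counter
   * (u32 __percpu *rr_tx_counter in struct bonding, allocated with
   * alloc_percpu()), so concurrent transmitters no longer bounce the
   * same cache line between CPUs: */
  static u32 bond_rr_gen_slave_id(struct bonding *bond)
  {
          /* Round-robin becomes per-CPU rather than global, an
           * acceptable trade-off for load distribution. */
          return this_cpu_inc_return(*bond->rr_tx_counter);
  }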
[1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb
[2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t
[3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/
Patch 1 prepares bond_xmit_hash for hashing xdp_buffs.
Patch 2 adds hooks to implement redirection after the BPF program run
(a rough sketch of the hook follows below).
Patch 3 implements the hooks in the bonding driver.
Patch 4 modifies devmap to properly handle EXCLUDE_INGRESS with a slave device.
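A rough sketch of the redirection hook from patches 2 and 3, simplified
here for illustration (bond_xdp_xmit_slave_get() is a hypothetical
stand-in for the per-mode slave selection; see the patches themselves
for the exact code and locking rules):

  /* New ndo op on the master device: map an xdp_buff to the slave that
   * should transmit it, so XDP_TX/XDP_REDIRECT targeting the bond can
   * be bounced to a real device. */
  struct net_device *(*ndo_xdp_get_xmit_slave)(struct net_device *master,
                                               struct xdp_buff *xdp);

  /* Bonding implements it by applying the configured xmit hash policy
   * (layer2/layer23/layer34) to the xdp_buff contents: */
  static struct net_device *
  bond_xdp_get_xmit_slave(struct net_device *master, struct xdp_buff *xdp)
  {
          struct bonding *bond = netdev_priv(master);
          struct slave *slave = bond_xdp_xmit_slave_get(bond, xdp);

          return slave ? slave->dev : NULL;
  }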
v1->v2:
- Split up into smaller, easier-to-review patches and address cosmetic
  review comments.
- Drop the INDIRECT_CALL optimization as it showed little improvement in tests.
- Drop the rr_tx_counter patch as it has already been merged into net-next.
- Separate the test suite into another patch set. This will follow later once a
  patch set from Magnus Karlsson is merged, providing test utilities that can
  be reused for XDP bonding tests. v2 contains no major functional changes and
  was tested with the test suite included in v1
  (https://lore.kernel.org/bpf/202106221509.kwNvAAZg-lkp@intel.com/T/#m464146d47299125d5868a08affd6d6ce526dfad1).
---
Jussi Maki (4):
  net: bonding: Refactor bond_xmit_hash for use with xdp_buff
  net: core: Add support for XDP redirection to slave device
  net: bonding: Add XDP support to the bonding driver
  devmap: Exclude XDP broadcast to master device

 drivers/net/bonding/bond_main.c | 431 +++++++++++++++++++++++++++-----
 include/linux/filter.h          |  13 +-
 include/linux/netdevice.h       |   5 +
 include/net/bonding.h           |   1 +
 kernel/bpf/devmap.c             |  34 ++-
 net/core/filter.c               |  25 ++
 6 files changed, 445 insertions(+), 64 deletions(-)
On Mon, Aug 2, 2021 at 6:24 AM <joamaki@gmail.com> wrote:
>
> From: Jussi Maki <joamaki@gmail.com>
>
> Add a test suite to test XDP bonding implementation
> over a pair of veth devices.
>
> Signed-off-by: Jussi Maki <joamaki@gmail.com>
> ---

Was there any reason not to use BPF skeleton in your new tests? And
also bpf_link-based XDP attachment instead of netlink-based?

>  .../selftests/bpf/prog_tests/xdp_bonding.c | 467 ++++++++++++++++++
>  1 file changed, 467 insertions(+)
>

[...]
On Tue, Aug 3, 2021 at 2:19 AM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Aug 2, 2021 at 6:24 AM <joamaki@gmail.com> wrote:
> >
> > From: Jussi Maki <joamaki@gmail.com>
> >
> > Add a test suite to test XDP bonding implementation
> > over a pair of veth devices.
> >
> > Signed-off-by: Jussi Maki <joamaki@gmail.com>
> > ---
>
> Was there any reason not to use BPF skeleton in your new tests? And
> also bpf_link-based XDP attachment instead of netlink-based?

Not really. I used the existing xdp_redirect_multi test as a basis and
that used this approach. I'll have a go at changing this to use the
BPF skeletons.
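For illustration, a minimal sketch of the skeleton plus bpf_link-based
attach being suggested might look like the following (assuming a
skeleton generated from xdp_tx.c via "bpftool gen skeleton"; libbpf
NULL-on-error semantics assumed, error logging trimmed):

  #include <net/if.h>
  #include "xdp_tx.skel.h"

  static struct bpf_link *attach_xdp_tx(const char *iface, struct xdp_tx **skel_out)
  {
          struct bpf_link *link;
          struct xdp_tx *skel;
          int ifindex;

          ifindex = if_nametoindex(iface);
          if (!ifindex)
                  return NULL;

          skel = xdp_tx__open_and_load();
          if (!skel)
                  return NULL;

          /* bpf_link-based XDP attach instead of netlink */
          link = bpf_program__attach_xdp(skel->progs.xdp_tx, ifindex);
          if (!link) {
                  xdp_tx__destroy(skel);
                  return NULL;
          }

          *skel_out = skel;
          return link;
  }

  /* Teardown: bpf_link__destroy(link) detaches and closes the link,
   * then xdp_tx__destroy(skel) frees the skeleton. */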
On Wed, Aug 4, 2021 at 5:45 AM Jussi Maki <joamaki@gmail.com> wrote:
>
> Add a test suite to test XDP bonding implementation
> over a pair of veth devices.
>
> Signed-off-by: Jussi Maki <joamaki@gmail.com>
> ---
>  .../selftests/bpf/prog_tests/xdp_bonding.c | 533 ++++++++++++++++++
>  1 file changed, 533 insertions(+)
>

[...]

> +
> +static int xdp_attach(struct skeletons *skeletons, struct bpf_program *prog, char *iface)
> +{
> +	struct bpf_link *link;
> +	int ifindex;
> +
> +	ifindex = if_nametoindex(iface);
> +	if (!ASSERT_GT(ifindex, 0, "get ifindex"))
> +		return -1;
> +
> +	if (!ASSERT_LE(skeletons->nlinks, MAX_BPF_LINKS, "too many XDP programs attached"))

If nlinks is already equal to MAX_BPF_LINKS this check still passes,
and then you'll bump nlinks below one more time and write beyond the
array boundaries?

> +		return -1;
> +
> +	link = bpf_program__attach_xdp(prog, ifindex);
> +	if (!ASSERT_OK_PTR(link, "attach xdp program"))
> +		return -1;
> +
> +	skeletons->links[skeletons->nlinks++] = link;
> +	return 0;
> +}
> +

[...]

> +
> +static void bonding_cleanup(struct skeletons *skeletons)
> +{
> +	restore_root_netns();
> +	while (skeletons->nlinks) {
> +		skeletons->nlinks--;
> +		bpf_link__detach(skeletons->links[skeletons->nlinks]);

You want bpf_link__destroy, not bpf_link__detach (detach will leave the
underlying BPF link FD open and ensure that bpf_link__destroy() won't
do anything with it, it just frees memory).

> +	}
> +	ASSERT_OK(system("ip link delete bond1"), "delete bond1");
> +	ASSERT_OK(system("ip link delete veth1_1"), "delete veth1_1");
> +	ASSERT_OK(system("ip link delete veth1_2"), "delete veth1_2");
> +	ASSERT_OK(system("ip netns delete ns_dst"), "delete ns_dst");
> +}
> +
> +out:
> +	bonding_cleanup(skeletons);
> +}
> +
> +

nit: extra line

> +/* Test the broadcast redirection using xdp_redirect_map_multi_prog and adding
> + * all the interfaces to it and checking that broadcasting won't send the packet
> + * to neither the ingress bond device (bond2) or its slave (veth2_1).
> + */
> +void test_xdp_bonding_redirect_multi(struct skeletons *skeletons)
> +{
> +	static const char * const ifaces[] = {"bond2", "veth2_1", "veth2_2"};
> +	int veth1_1_rx, veth1_2_rx;
> +	int err;
> +
> +	if (!test__start_subtest("xdp_bonding_redirect_multi"))
> +		return;
> +
> +	if (bonding_setup(skeletons, BOND_MODE_ROUNDROBIN, BOND_XMIT_POLICY_LAYER23,
> +			  BOND_ONE_NO_ATTACH))
> +		goto out;
> +
> +

nit: another extra empty line, please check if there are more

> +	if (!ASSERT_OK(setns_by_name("ns_dst"), "could not set netns to ns_dst"))
> +		goto out;
> +

[...]

> +	/* enslaving with a XDP program loaded fails */
> +	link = bpf_program__attach_xdp(skeletons->xdp_dummy->progs.xdp_dummy_prog, veth);
> +	if (!ASSERT_OK_PTR(link, "attach program to veth"))
> +		goto out;
> +
> +	err = system("ip link set veth master bond");
> +	if (!ASSERT_NEQ(err, 0, "attaching slave with xdp program expected to fail"))
> +		goto out;
> +
> +	bpf_link__detach(link);

same here and in a few more places, you need destroy

> +	link = NULL;
> +
> +	err = system("ip link set veth master bond");
> +	if (!ASSERT_OK(err, "set veth master"))
> +		goto out;
> +
> +	/* attaching to slave when master has no program is allowed */
> +	link = bpf_program__attach_xdp(skeletons->xdp_dummy->progs.xdp_dummy_prog, veth);
> +	if (!ASSERT_OK_PTR(link, "attach program to slave when enslaved"))
> +		goto out;
> +
> +	/* attaching to master not allowed when slave has program loaded */
> +	link2 = bpf_program__attach_xdp(skeletons->xdp_dummy->progs.xdp_dummy_prog, bond);
> +	if (!ASSERT_ERR_PTR(link2, "attach program to master when slave has program"))
> +		goto out;
> +
> +	bpf_link__detach(link);
> +	link = NULL;
> +
> +	/* attaching XDP program to master allowed when slave has no program */
> +	link = bpf_program__attach_xdp(skeletons->xdp_dummy->progs.xdp_dummy_prog, bond);
> +	if (!ASSERT_OK_PTR(link, "attach program to master"))
> +		goto out;
> +
> +	/* attaching to slave not allowed when master has program loaded */
> +	link2 = bpf_program__attach_xdp(skeletons->xdp_dummy->progs.xdp_dummy_prog, bond);
> +	ASSERT_ERR_PTR(link2, "attach program to slave when master has program");
> +
> +out:
> +	if (link)
> +		bpf_link__detach(link);
> +	if (link2)
> +		bpf_link__detach(link2);

bpf_link__destroy() handles NULLs just fine, you don't have to do the
extra checks

> +
> +	system("ip link del veth");
> +	system("ip link del bond");
> +}
> +
> +static int libbpf_debug_print(enum libbpf_print_level level,
> +			      const char *format, va_list args)
> +{
> +	if (level != LIBBPF_WARN)
> +		vprintf(format, args);
> +	return 0;
> +}
> +
> +struct bond_test_case {
> +	char *name;
> +	int mode;
> +	int xmit_policy;
> +};
> +
> +static struct bond_test_case bond_test_cases[] = {
> +	{ "xdp_bonding_roundrobin", BOND_MODE_ROUNDROBIN, BOND_XMIT_POLICY_LAYER23, },
> +	{ "xdp_bonding_activebackup", BOND_MODE_ACTIVEBACKUP, BOND_XMIT_POLICY_LAYER23 },
> +
> +	{ "xdp_bonding_xor_layer2", BOND_MODE_XOR, BOND_XMIT_POLICY_LAYER2, },
> +	{ "xdp_bonding_xor_layer23", BOND_MODE_XOR, BOND_XMIT_POLICY_LAYER23, },
> +	{ "xdp_bonding_xor_layer34", BOND_MODE_XOR, BOND_XMIT_POLICY_LAYER34, },
> +};
> +
> +void test_xdp_bonding(void)

this should be the only non-static function in this file, please fix
all the functions above

> +{
> +	libbpf_print_fn_t old_print_fn;
> +	struct skeletons skeletons = {};
> +	int i;
> +
> +	old_print_fn = libbpf_set_print(libbpf_debug_print);
> +
> +	root_netns_fd = open("/proc/self/ns/net", O_RDONLY);
> +	if (!ASSERT_GE(root_netns_fd, 0, "open /proc/self/ns/net"))
> +		goto out;
> +
> +	skeletons.xdp_dummy = xdp_dummy__open_and_load();
> +	if (!ASSERT_OK_PTR(skeletons.xdp_dummy, "xdp_dummy__open_and_load"))
> +		goto out;
> +
> +	skeletons.xdp_tx = xdp_tx__open_and_load();
> +	if (!ASSERT_OK_PTR(skeletons.xdp_tx, "xdp_tx__open_and_load"))
> +		goto out;
> +
> +	skeletons.xdp_redirect_multi_kern = xdp_redirect_multi_kern__open_and_load();
> +	if (!ASSERT_OK_PTR(skeletons.xdp_redirect_multi_kern,
> +			   "xdp_redirect_multi_kern__open_and_load"))
> +		goto out;
> +
> +	test_xdp_bonding_attach(&skeletons);

check for errors

> +
> +	for (i = 0; i < ARRAY_SIZE(bond_test_cases); i++) {
> +		struct bond_test_case *test_case = &bond_test_cases[i];
> +
> +		test_xdp_bonding_with_mode(
> +			&skeletons,
> +			test_case->name,
> +			test_case->mode,
> +			test_case->xmit_policy);
> +	}
> +
> +	test_xdp_bonding_redirect_multi(&skeletons);
> +
> +out:
> +	if (skeletons.xdp_dummy)
> +		xdp_dummy__destroy(skeletons.xdp_dummy);
> +	if (skeletons.xdp_tx)
> +		xdp_tx__destroy(skeletons.xdp_tx);
> +	if (skeletons.xdp_redirect_multi_kern)
> +		xdp_redirect_multi_kern__destroy(skeletons.xdp_redirect_multi_kern);

similarly, all libbpf destructors handle NULL and error pointers
cleanly, no need for the extra ifs

> +
> +	libbpf_set_print(old_print_fn);
> +	if (root_netns_fd)
> +		close(root_netns_fd);
> +}
> --
> 2.17.1
>
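Putting these review comments together, corrected versions of the
attach and cleanup helpers might look roughly like this (a sketch
assuming the struct skeletons and MAX_BPF_LINKS definitions from the
xdp_bonding.c under review above):

  static int xdp_attach(struct skeletons *skeletons, struct bpf_program *prog,
                        char *iface)
  {
          struct bpf_link *link;
          int ifindex;

          ifindex = if_nametoindex(iface);
          if (!ASSERT_GT(ifindex, 0, "get ifindex"))
                  return -1;

          /* Strict bound: links[] holds at most MAX_BPF_LINKS entries,
           * so nlinks must be strictly below it before the write. */
          if (!ASSERT_LT(skeletons->nlinks, MAX_BPF_LINKS, "too many XDP programs attached"))
                  return -1;

          link = bpf_program__attach_xdp(prog, ifindex);
          if (!ASSERT_OK_PTR(link, "attach xdp program"))
                  return -1;

          skeletons->links[skeletons->nlinks++] = link;
          return 0;
  }

  static void links_cleanup(struct skeletons *skeletons)
  {
          /* bpf_link__destroy() detaches, closes the FD and frees the
           * link; it also handles NULL, so no extra checks needed. */
          while (skeletons->nlinks)
                  bpf_link__destroy(skeletons->links[--skeletons->nlinks]);
  }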
On Wed, Aug 4, 2021 at 5:45 AM Jussi Maki <joamaki@gmail.com> wrote:
>
> The program type cannot be deduced from 'tx' which causes an invalid
> argument error when trying to load xdp_tx.o using the skeleton.
> Rename the section name to "xdp/tx" so that libbpf can deduce the type.
>
> Signed-off-by: Jussi Maki <joamaki@gmail.com>
> ---
>  tools/testing/selftests/bpf/progs/xdp_tx.c   | 2 +-
>  tools/testing/selftests/bpf/test_xdp_veth.sh | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/progs/xdp_tx.c b/tools/testing/selftests/bpf/progs/xdp_tx.c
> index 94e6c2b281cb..ece1fbbc0984 100644
> --- a/tools/testing/selftests/bpf/progs/xdp_tx.c
> +++ b/tools/testing/selftests/bpf/progs/xdp_tx.c
> @@ -3,7 +3,7 @@
>  #include <linux/bpf.h>
>  #include <bpf/bpf_helpers.h>
>
> -SEC("tx")
> +SEC("xdp/tx")

please use just SEC("xdp")

>  int xdp_tx(struct xdp_md *xdp)
>  {
>  	return XDP_TX;
>  }
> diff --git a/tools/testing/selftests/bpf/test_xdp_veth.sh b/tools/testing/selftests/bpf/test_xdp_veth.sh
> index ba8ffcdaac30..c8e0b7d36f56 100755
> --- a/tools/testing/selftests/bpf/test_xdp_veth.sh
> +++ b/tools/testing/selftests/bpf/test_xdp_veth.sh
> @@ -108,7 +108,7 @@ ip link set dev veth2 xdp pinned $BPF_DIR/progs/redirect_map_1
>  ip link set dev veth3 xdp pinned $BPF_DIR/progs/redirect_map_2
>
>  ip -n ns1 link set dev veth11 xdp obj xdp_dummy.o sec xdp_dummy
> -ip -n ns2 link set dev veth22 xdp obj xdp_tx.o sec tx
> +ip -n ns2 link set dev veth22 xdp obj xdp_tx.o sec xdp/tx
>  ip -n ns3 link set dev veth33 xdp obj xdp_dummy.o sec xdp_dummy
>
>  trap cleanup EXIT
> --
> 2.17.1
>
On Thu, Aug 5, 2021 at 9:10 AM Jussi Maki <joamaki@gmail.com> wrote:
>
> The program type cannot be deduced from 'tx' which causes an invalid
> argument error when trying to load xdp_tx.o using the skeleton.
> Rename the section name to "xdp" so that libbpf can deduce the type.
>
> Signed-off-by: Jussi Maki <joamaki@gmail.com>
> ---

LGTM.

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  tools/testing/selftests/bpf/progs/xdp_tx.c   | 2 +-
>  tools/testing/selftests/bpf/test_xdp_veth.sh | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/progs/xdp_tx.c b/tools/testing/selftests/bpf/progs/xdp_tx.c
> index 94e6c2b281cb..5f725c720e00 100644
> --- a/tools/testing/selftests/bpf/progs/xdp_tx.c
> +++ b/tools/testing/selftests/bpf/progs/xdp_tx.c
> @@ -3,7 +3,7 @@
>  #include <linux/bpf.h>
>  #include <bpf/bpf_helpers.h>
>
> -SEC("tx")
> +SEC("xdp")
>  int xdp_tx(struct xdp_md *xdp)
>  {
>  	return XDP_TX;
>  }
> diff --git a/tools/testing/selftests/bpf/test_xdp_veth.sh b/tools/testing/selftests/bpf/test_xdp_veth.sh
> index ba8ffcdaac30..995278e684b6 100755
> --- a/tools/testing/selftests/bpf/test_xdp_veth.sh
> +++ b/tools/testing/selftests/bpf/test_xdp_veth.sh
> @@ -108,7 +108,7 @@ ip link set dev veth2 xdp pinned $BPF_DIR/progs/redirect_map_1
>  ip link set dev veth3 xdp pinned $BPF_DIR/progs/redirect_map_2
>
>  ip -n ns1 link set dev veth11 xdp obj xdp_dummy.o sec xdp_dummy
> -ip -n ns2 link set dev veth22 xdp obj xdp_tx.o sec tx
> +ip -n ns2 link set dev veth22 xdp obj xdp_tx.o sec xdp
>  ip -n ns3 link set dev veth33 xdp obj xdp_dummy.o sec xdp_dummy
>
>  trap cleanup EXIT
> --
> 2.17.1
>
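After the rename, the whole program is tiny; a reconstruction of the
fixed xdp_tx.c for reference (the license variable name here is an
assumption and may differ from the actual file):

  // SPDX-License-Identifier: GPL-2.0
  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  SEC("xdp")
  int xdp_tx(struct xdp_md *xdp)
  {
          /* Bounce every packet back out the interface it arrived on. */
          return XDP_TX;
  }

  char _license[] SEC("license") = "GPL";

libbpf maps the "xdp" section name to BPF_PROG_TYPE_XDP, which is what
lets the skeleton load the object without specifying the program type
explicitly.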