Message ID | 160593039663.2604.1374502006916871573.stgit@localhost.localdomain |
---|---|
State | New |
Headers | show |
Series | [net] tcp: Set ECT0 bit in tos/tclass for synack when BPF needs ECN | expand |
On Fri, 20 Nov 2020 19:47:44 -0800 Alexander Duyck wrote: > From: Alexander Duyck <alexanderduyck@fb.com> > > When a BPF program is used to select between a type of TCP congestion > control algorithm that uses either ECN or not there is a case where the > synack for the frame was coming up without the ECT0 bit set. A bit of > research found that this was due to the final socket being configured to > dctcp while the listener socket was staying in cubic. > > To reproduce it all that is needed is to monitor TCP traffic while running > the sample bpf program "samples/bpf/tcp_cong_kern.c". What is observed, > assuming tcp_dctcp module is loaded or compiled in and the traffic matches > the rules in the sample file, is that for all frames with the exception of > the synack the ECT0 bit is set. > > To address that it is necessary to make one additional call to > tcp_bpf_ca_needs_ecn using the request socket and then use the output of > that to set the ECT0 bit for the tos/tclass of the packet. > > Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control") > Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Applied, thank you!
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index c5f8b686aa82..d20d779b18f3 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -980,13 +980,17 @@ static int tcp_v4_send_synack(const struct sock *sk, struct dst_entry *dst, skb = tcp_make_synack(sk, dst, req, foc, synack_type, syn_skb); - tos = sock_net(sk)->ipv4.sysctl_tcp_reflect_tos ? - tcp_rsk(req)->syn_tos & ~INET_ECN_MASK : - inet_sk(sk)->tos; - if (skb) { __tcp_v4_send_check(skb, ireq->ir_loc_addr, ireq->ir_rmt_addr); + tos = sock_net(sk)->ipv4.sysctl_tcp_reflect_tos ? + tcp_rsk(req)->syn_tos & ~INET_ECN_MASK : + inet_sk(sk)->tos; + + if (!INET_ECN_is_capable(tos) && + tcp_bpf_ca_needs_ecn((struct sock *)req)) + tos |= INET_ECN_ECT_0; + rcu_read_lock(); err = ip_build_and_send_pkt(skb, sk, ireq->ir_loc_addr, ireq->ir_rmt_addr, diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 3d49e8d0afee..e2b360e9bec4 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -527,11 +527,16 @@ static int tcp_v6_send_synack(const struct sock *sk, struct dst_entry *dst, if (np->repflow && ireq->pktopts) fl6->flowlabel = ip6_flowlabel(ipv6_hdr(ireq->pktopts)); - rcu_read_lock(); - opt = ireq->ipv6_opt; tclass = sock_net(sk)->ipv4.sysctl_tcp_reflect_tos ? tcp_rsk(req)->syn_tos & ~INET_ECN_MASK : np->tclass; + + if (!INET_ECN_is_capable(tclass) && + tcp_bpf_ca_needs_ecn((struct sock *)req)) + tclass |= INET_ECN_ECT_0; + + rcu_read_lock(); + opt = ireq->ipv6_opt; if (!opt) opt = rcu_dereference(np->opt); err = ip6_xmit(sk, skb, fl6, sk->sk_mark, opt,