[net-next,v11,00/23] Introducing OpenVPN Data Channel Offload

Message ID	20241029-b4-ovpn-v11-0-de4698c73a25@openvpn.net
Headers	show Received: from mail-wm1-f47.google.com (mail-wm1-f47.google.com [209.85.128.47]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3B794205145 for <linux-kselftest@vger.kernel.org>; Tue, 29 Oct 2024 10:48:00 +0000 (UTC) From: Antonio Quartulli <antonio@openvpn.net> Subject: [PATCH net-next v11 00/23] Introducing OpenVPN Data Channel Offload Date: Tue, 29 Oct 2024 11:47:13 +0100 Message-Id: <20241029-b4-ovpn-v11-0-de4698c73a25@openvpn.net> Precedence: bulk MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit To: Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>, Donald Hunter <donald.hunter@gmail.com>, Antonio Quartulli <antonio@openvpn.net>, Shuah Khan <shuah@kernel.org>, donald.hunter@gmail.com, sd@queasysnail.net, ryazanov.s.a@gmail.com, Andrew Lunn <andrew@lunn.ch> Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, steffen.klassert@secunet.com, antony.antony@secunet.com, Shuah Khan <skhan@linuxfoundation.org>
Series	Introducing OpenVPN Data Channel Offload \| expand [net-next,v11,00/23] Introducing OpenVPN Data Channel Offload [net-next,v11,02/23] net: introduce OpenVPN Data Channel Offload (ovpn) [net-next,v11,03/23] ovpn: add basic netlink support [net-next,v11,07/23] ovpn: introduce the ovpn_socket object [net-next,v11,08/23] ovpn: implement basic TX path (UDP) [net-next,v11,11/23] ovpn: store tunnel and transport statistics [net-next,v11,12/23] ovpn: implement TCP transport [net-next,v11,13/23] ovpn: implement multi-peer support [net-next,v11,15/23] ovpn: implement keepalive mechanism [net-next,v11,19/23] ovpn: implement key add/get/del/swap via netlink [net-next,v11,20/23] ovpn: kill key and notify userspace in case of IV exhaustion [net-next,v11,22/23] ovpn: add basic ethtool support

Antonio Quartulli Oct. 29, 2024, 10:47 a.m. UTC

Notable changes from v10:
* extended commit message of 23/23 with brief description of the output
* Link to v10: https://lore.kernel.org/r/20241025-b4-ovpn-v10-0-b87530777be7@openvpn.net

Please note that some patches were already reviewed by Andre Lunn,
Donald Hunter and Shuah Khan. They have retained the Reviewed-by tag
since no major code modification has happened since the review.

The latest code can also be found at:

https://github.com/OpenVPN/linux-kernel-ovpn

Thanks a lot!
Best Regards,

Antonio Quartulli
OpenVPN Inc.

---
Antonio Quartulli (23):
      netlink: add NLA_POLICY_MAX_LEN macro
      net: introduce OpenVPN Data Channel Offload (ovpn)
      ovpn: add basic netlink support
      ovpn: add basic interface creation/destruction/management routines
      ovpn: keep carrier always on
      ovpn: introduce the ovpn_peer object
      ovpn: introduce the ovpn_socket object
      ovpn: implement basic TX path (UDP)
      ovpn: implement basic RX path (UDP)
      ovpn: implement packet processing
      ovpn: store tunnel and transport statistics
      ovpn: implement TCP transport
      ovpn: implement multi-peer support
      ovpn: implement peer lookup logic
      ovpn: implement keepalive mechanism
      ovpn: add support for updating local UDP endpoint
      ovpn: add support for peer floating
      ovpn: implement peer add/get/dump/delete via netlink
      ovpn: implement key add/get/del/swap via netlink
      ovpn: kill key and notify userspace in case of IV exhaustion
      ovpn: notify userspace when a peer is deleted
      ovpn: add basic ethtool support
      testing/selftests: add test tool and scripts for ovpn module

 Documentation/netlink/specs/ovpn.yaml              |  362 +++
 MAINTAINERS                                        |   11 +
 drivers/net/Kconfig                                |   14 +
 drivers/net/Makefile                               |    1 +
 drivers/net/ovpn/Makefile                          |   22 +
 drivers/net/ovpn/bind.c                            |   54 +
 drivers/net/ovpn/bind.h                            |  117 +
 drivers/net/ovpn/crypto.c                          |  214 ++
 drivers/net/ovpn/crypto.h                          |  145 ++
 drivers/net/ovpn/crypto_aead.c                     |  386 ++++
 drivers/net/ovpn/crypto_aead.h                     |   33 +
 drivers/net/ovpn/io.c                              |  462 ++++
 drivers/net/ovpn/io.h                              |   25 +
 drivers/net/ovpn/main.c                            |  337 +++
 drivers/net/ovpn/main.h                            |   24 +
 drivers/net/ovpn/netlink-gen.c                     |  212 ++
 drivers/net/ovpn/netlink-gen.h                     |   41 +
 drivers/net/ovpn/netlink.c                         | 1135 ++++++++++
 drivers/net/ovpn/netlink.h                         |   18 +
 drivers/net/ovpn/ovpnstruct.h                      |   61 +
 drivers/net/ovpn/packet.h                          |   40 +
 drivers/net/ovpn/peer.c                            | 1201 ++++++++++
 drivers/net/ovpn/peer.h                            |  165 ++
 drivers/net/ovpn/pktid.c                           |  130 ++
 drivers/net/ovpn/pktid.h                           |   87 +
 drivers/net/ovpn/proto.h                           |  104 +
 drivers/net/ovpn/skb.h                             |   56 +
 drivers/net/ovpn/socket.c                          |  178 ++
 drivers/net/ovpn/socket.h                          |   55 +
 drivers/net/ovpn/stats.c                           |   21 +
 drivers/net/ovpn/stats.h                           |   47 +
 drivers/net/ovpn/tcp.c                             |  506 +++++
 drivers/net/ovpn/tcp.h                             |   44 +
 drivers/net/ovpn/udp.c                             |  406 ++++
 drivers/net/ovpn/udp.h                             |   26 +
 include/net/netlink.h                              |    1 +
 include/uapi/linux/if_link.h                       |   15 +
 include/uapi/linux/ovpn.h                          |  109 +
 include/uapi/linux/udp.h                           |    1 +
 tools/net/ynl/ynl-gen-c.py                         |    4 +-
 tools/testing/selftests/Makefile                   |    1 +
 tools/testing/selftests/net/ovpn/.gitignore        |    2 +
 tools/testing/selftests/net/ovpn/Makefile          |   17 +
 tools/testing/selftests/net/ovpn/config            |   10 +
 tools/testing/selftests/net/ovpn/data64.key        |    5 +
 tools/testing/selftests/net/ovpn/ovpn-cli.c        | 2370 ++++++++++++++++++++
 tools/testing/selftests/net/ovpn/tcp_peers.txt     |    5 +
 .../testing/selftests/net/ovpn/test-chachapoly.sh  |    9 +
 tools/testing/selftests/net/ovpn/test-float.sh     |    9 +
 tools/testing/selftests/net/ovpn/test-tcp.sh       |    9 +
 tools/testing/selftests/net/ovpn/test.sh           |  183 ++
 tools/testing/selftests/net/ovpn/udp_peers.txt     |    5 +
 52 files changed, 9494 insertions(+), 1 deletion(-)
---
base-commit: ab101c553bc1f76a839163d1dc0d1e715ad6bb4e
change-id: 20241002-b4-ovpn-eeee35c694a2

Best regards,

Sabrina Dubroca Oct. 30, 2024, 4:37 p.m. UTC | #1

2024-10-29, 11:47:19 +0100, Antonio Quartulli wrote:
> +static void ovpn_peer_release(struct ovpn_peer *peer)
> +{
> +	ovpn_bind_reset(peer, NULL);
> +
> +	dst_cache_destroy(&peer->dst_cache);

Is it safe to destroy the cache at this time? In the same function, we
use rcu to free the peer, but AFAICT the dst_cache will be freed
immediately:

void dst_cache_destroy(struct dst_cache *dst_cache)
{
[...]
	free_percpu(dst_cache->cache);
}

(probably no real issue because ovpn_udp_send_skb gets called while we
hold a reference to the peer?)

> +	netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker);
> +	kfree_rcu(peer, rcu);
> +}


[...]
> +static int ovpn_peer_del_p2p(struct ovpn_peer *peer,
> +			     enum ovpn_del_peer_reason reason)
> +	__must_hold(&peer->ovpn->lock)
> +{
> +	struct ovpn_peer *tmp;
> +
> +	tmp = rcu_dereference_protected(peer->ovpn->peer,
> +					lockdep_is_held(&peer->ovpn->lock));
> +	if (tmp != peer) {
> +		DEBUG_NET_WARN_ON_ONCE(1);
> +		if (tmp)
> +			ovpn_peer_put(tmp);

Does peer->ovpn->peer need to be set to NULL here as well? Or is it
going to survive this _put?

> +
> +		return -ENOENT;
> +	}
> +
> +	tmp->delete_reason = reason;
> +	RCU_INIT_POINTER(peer->ovpn->peer, NULL);
> +	ovpn_peer_put(tmp);
> +
> +	return 0;
> +}

Antonio Quartulli Oct. 30, 2024, 8:47 p.m. UTC | #2

On 30/10/2024 17:37, Sabrina Dubroca wrote:
> 2024-10-29, 11:47:19 +0100, Antonio Quartulli wrote:
>> +static void ovpn_peer_release(struct ovpn_peer *peer)
>> +{
>> +	ovpn_bind_reset(peer, NULL);
>> +
>> +	dst_cache_destroy(&peer->dst_cache);
> 
> Is it safe to destroy the cache at this time? In the same function, we
> use rcu to free the peer, but AFAICT the dst_cache will be freed
> immediately:
> 
> void dst_cache_destroy(struct dst_cache *dst_cache)
> {
> [...]
> 	free_percpu(dst_cache->cache);
> }
> 
> (probably no real issue because ovpn_udp_send_skb gets called while we
> hold a reference to the peer?)

Right.
That was my assumption: release happens on refcnt = 0 only, therefore no 
field should be in use anymore.
Anything that may still be in use will have its own refcounter.

> 
>> +	netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker);
>> +	kfree_rcu(peer, rcu);
>> +}
> 
> 
> [...]
>> +static int ovpn_peer_del_p2p(struct ovpn_peer *peer,
>> +			     enum ovpn_del_peer_reason reason)
>> +	__must_hold(&peer->ovpn->lock)
>> +{
>> +	struct ovpn_peer *tmp;
>> +
>> +	tmp = rcu_dereference_protected(peer->ovpn->peer,
>> +					lockdep_is_held(&peer->ovpn->lock));
>> +	if (tmp != peer) {
>> +		DEBUG_NET_WARN_ON_ONCE(1);
>> +		if (tmp)
>> +			ovpn_peer_put(tmp);
> 
> Does peer->ovpn->peer need to be set to NULL here as well? Or is it
> going to survive this _put?

First of all consider that this is truly something that we don't expect 
to happen (hence the WARN_ON).
If this is happening it's because we are trying to delete a peer that is 
not the one we are connected to (unexplainable scenario in p2p mode).

Still, should we hit this case (I truly can't see how), I'd say "leave 
everything as is - maybe this call was just a mistake".

Cheers,

> 
>> +
>> +		return -ENOENT;
>> +	}
>> +
>> +	tmp->delete_reason = reason;
>> +	RCU_INIT_POINTER(peer->ovpn->peer, NULL);
>> +	ovpn_peer_put(tmp);
>> +
>> +	return 0;
>> +}
>

Antonio Quartulli Oct. 31, 2024, 10 a.m. UTC | #3

It seems some little changes to ovpn.yaml were not reflected in the 
generated files I committed.

Specifically I changed some U32 to BE32 (IPv4 addresses) and files were 
not regenerated before committing.

(I saw the failure in patchwork about this)

It seems I'll have to send v12 after all.

Cheers,

On 29/10/2024 11:47, Antonio Quartulli wrote:
> Notable changes from v10:
> * extended commit message of 23/23 with brief description of the output
> * Link to v10: https://lore.kernel.org/r/20241025-b4-ovpn-v10-0-b87530777be7@openvpn.net
> 
> Please note that some patches were already reviewed by Andre Lunn,
> Donald Hunter and Shuah Khan. They have retained the Reviewed-by tag
> since no major code modification has happened since the review.
> 
> The latest code can also be found at:
> 
> https://github.com/OpenVPN/linux-kernel-ovpn
> 
> Thanks a lot!
> Best Regards,
> 
> Antonio Quartulli
> OpenVPN Inc.
> 
> ---
> Antonio Quartulli (23):
>        netlink: add NLA_POLICY_MAX_LEN macro
>        net: introduce OpenVPN Data Channel Offload (ovpn)
>        ovpn: add basic netlink support
>        ovpn: add basic interface creation/destruction/management routines
>        ovpn: keep carrier always on
>        ovpn: introduce the ovpn_peer object
>        ovpn: introduce the ovpn_socket object
>        ovpn: implement basic TX path (UDP)
>        ovpn: implement basic RX path (UDP)
>        ovpn: implement packet processing
>        ovpn: store tunnel and transport statistics
>        ovpn: implement TCP transport
>        ovpn: implement multi-peer support
>        ovpn: implement peer lookup logic
>        ovpn: implement keepalive mechanism
>        ovpn: add support for updating local UDP endpoint
>        ovpn: add support for peer floating
>        ovpn: implement peer add/get/dump/delete via netlink
>        ovpn: implement key add/get/del/swap via netlink
>        ovpn: kill key and notify userspace in case of IV exhaustion
>        ovpn: notify userspace when a peer is deleted
>        ovpn: add basic ethtool support
>        testing/selftests: add test tool and scripts for ovpn module
> 
>   Documentation/netlink/specs/ovpn.yaml              |  362 +++
>   MAINTAINERS                                        |   11 +
>   drivers/net/Kconfig                                |   14 +
>   drivers/net/Makefile                               |    1 +
>   drivers/net/ovpn/Makefile                          |   22 +
>   drivers/net/ovpn/bind.c                            |   54 +
>   drivers/net/ovpn/bind.h                            |  117 +
>   drivers/net/ovpn/crypto.c                          |  214 ++
>   drivers/net/ovpn/crypto.h                          |  145 ++
>   drivers/net/ovpn/crypto_aead.c                     |  386 ++++
>   drivers/net/ovpn/crypto_aead.h                     |   33 +
>   drivers/net/ovpn/io.c                              |  462 ++++
>   drivers/net/ovpn/io.h                              |   25 +
>   drivers/net/ovpn/main.c                            |  337 +++
>   drivers/net/ovpn/main.h                            |   24 +
>   drivers/net/ovpn/netlink-gen.c                     |  212 ++
>   drivers/net/ovpn/netlink-gen.h                     |   41 +
>   drivers/net/ovpn/netlink.c                         | 1135 ++++++++++
>   drivers/net/ovpn/netlink.h                         |   18 +
>   drivers/net/ovpn/ovpnstruct.h                      |   61 +
>   drivers/net/ovpn/packet.h                          |   40 +
>   drivers/net/ovpn/peer.c                            | 1201 ++++++++++
>   drivers/net/ovpn/peer.h                            |  165 ++
>   drivers/net/ovpn/pktid.c                           |  130 ++
>   drivers/net/ovpn/pktid.h                           |   87 +
>   drivers/net/ovpn/proto.h                           |  104 +
>   drivers/net/ovpn/skb.h                             |   56 +
>   drivers/net/ovpn/socket.c                          |  178 ++
>   drivers/net/ovpn/socket.h                          |   55 +
>   drivers/net/ovpn/stats.c                           |   21 +
>   drivers/net/ovpn/stats.h                           |   47 +
>   drivers/net/ovpn/tcp.c                             |  506 +++++
>   drivers/net/ovpn/tcp.h                             |   44 +
>   drivers/net/ovpn/udp.c                             |  406 ++++
>   drivers/net/ovpn/udp.h                             |   26 +
>   include/net/netlink.h                              |    1 +
>   include/uapi/linux/if_link.h                       |   15 +
>   include/uapi/linux/ovpn.h                          |  109 +
>   include/uapi/linux/udp.h                           |    1 +
>   tools/net/ynl/ynl-gen-c.py                         |    4 +-
>   tools/testing/selftests/Makefile                   |    1 +
>   tools/testing/selftests/net/ovpn/.gitignore        |    2 +
>   tools/testing/selftests/net/ovpn/Makefile          |   17 +
>   tools/testing/selftests/net/ovpn/config            |   10 +
>   tools/testing/selftests/net/ovpn/data64.key        |    5 +
>   tools/testing/selftests/net/ovpn/ovpn-cli.c        | 2370 ++++++++++++++++++++
>   tools/testing/selftests/net/ovpn/tcp_peers.txt     |    5 +
>   .../testing/selftests/net/ovpn/test-chachapoly.sh  |    9 +
>   tools/testing/selftests/net/ovpn/test-float.sh     |    9 +
>   tools/testing/selftests/net/ovpn/test-tcp.sh       |    9 +
>   tools/testing/selftests/net/ovpn/test.sh           |  183 ++
>   tools/testing/selftests/net/ovpn/udp_peers.txt     |    5 +
>   52 files changed, 9494 insertions(+), 1 deletion(-)
> ---
> base-commit: ab101c553bc1f76a839163d1dc0d1e715ad6bb4e
> change-id: 20241002-b4-ovpn-eeee35c694a2
> 
> Best regards,

Sabrina Dubroca Oct. 31, 2024, 11:29 a.m. UTC | #4

2024-10-29, 11:47:22 +0100, Antonio Quartulli wrote:
> +static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
> +{
[...]
> +	opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr));
> +	if (unlikely(opcode != OVPN_DATA_V2)) {
> +		/* DATA_V1 is not supported */
> +		if (opcode == OVPN_DATA_V1)

The TCP encap code passes everything that's not V2 to userspace. Why
not do that with UDP as well?

> +			goto drop;
> +
> +		/* unknown or control packet: let it bubble up to userspace */
> +		return 1;
> +	}
> +
> +	peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr));
> +	/* some OpenVPN server implementations send data packets with the
> +	 * peer-id set to undef. In this case we skip the peer lookup by peer-id
> +	 * and we try with the transport address
> +	 */
> +	if (peer_id != OVPN_PEER_ID_UNDEF) {
> +		peer = ovpn_peer_get_by_id(ovpn, peer_id);
> +		if (!peer) {
> +			net_err_ratelimited("%s: received data from unknown peer (id: %d)\n",
> +					    __func__, peer_id);
> +			goto drop;
> +		}
> +	}
> +
> +	if (!peer) {

nit: that could be an "else" combined with the previous case?

> +		/* data packet with undef peer-id */
> +		peer = ovpn_peer_get_by_transp_addr(ovpn, skb);
> +		if (unlikely(!peer)) {
> +			net_dbg_ratelimited("%s: received data with undef peer-id from unknown source\n",
> +					    __func__);
> +			goto drop;
> +		}
> +	}

Antonio Quartulli Oct. 31, 2024, 1:04 p.m. UTC | #5

On 31/10/2024 12:29, Sabrina Dubroca wrote:
> 2024-10-29, 11:47:22 +0100, Antonio Quartulli wrote:
>> +static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
>> +{
> [...]
>> +	opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr));
>> +	if (unlikely(opcode != OVPN_DATA_V2)) {
>> +		/* DATA_V1 is not supported */
>> +		if (opcode == OVPN_DATA_V1)
> 
> The TCP encap code passes everything that's not V2 to userspace. Why
> not do that with UDP as well?

If that's the case, then this is a bug in the TCP code.

DATA_Vx packets are part of the data channel and userspace can't do 
anything with them (userspace handles the control channel only when the 
ovpn module is in use).

I'll go check the TCP code then, because sending DATA_V1 to userspace is 
not expected. Thanks for noticing this discrepancy.

> 
>> +			goto drop;
>> +
>> +		/* unknown or control packet: let it bubble up to userspace */
>> +		return 1;
>> +	}
>> +
>> +	peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr));
>> +	/* some OpenVPN server implementations send data packets with the
>> +	 * peer-id set to undef. In this case we skip the peer lookup by peer-id
>> +	 * and we try with the transport address
>> +	 */
>> +	if (peer_id != OVPN_PEER_ID_UNDEF) {
>> +		peer = ovpn_peer_get_by_id(ovpn, peer_id);
>> +		if (!peer) {
>> +			net_err_ratelimited("%s: received data from unknown peer (id: %d)\n",
>> +					    __func__, peer_id);
>> +			goto drop;
>> +		}
>> +	}
>> +
>> +	if (!peer) {
> 
> nit: that could be an "else" combined with the previous case?

mhh that's true. Then I can combine the two "if (!peer)" in one block only.


Thanks!
Regards,

> 
>> +		/* data packet with undef peer-id */
>> +		peer = ovpn_peer_get_by_transp_addr(ovpn, skb);
>> +		if (unlikely(!peer)) {
>> +			net_dbg_ratelimited("%s: received data with undef peer-id from unknown source\n",
>> +					    __func__);
>> +			goto drop;
>> +		}
>> +	}
>

patchwork-bot+netdevbpf@kernel.org Nov. 1, 2024, 2:20 a.m. UTC | #6

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 29 Oct 2024 11:47:13 +0100 you wrote:
> Notable changes from v10:
> * extended commit message of 23/23 with brief description of the output
> * Link to v10: https://lore.kernel.org/r/20241025-b4-ovpn-v10-0-b87530777be7@openvpn.net
> 
> Please note that some patches were already reviewed by Andre Lunn,
> Donald Hunter and Shuah Khan. They have retained the Reviewed-by tag
> since no major code modification has happened since the review.
> 
> [...]

Here is the summary with links:
  - [net-next,v11,01/23] netlink: add NLA_POLICY_MAX_LEN macro
    https://git.kernel.org/netdev/net-next/c/4138e9ec0093
  - [net-next,v11,02/23] net: introduce OpenVPN Data Channel Offload (ovpn)
    (no matching commit)
  - [net-next,v11,03/23] ovpn: add basic netlink support
    (no matching commit)
  - [net-next,v11,04/23] ovpn: add basic interface creation/destruction/management routines
    (no matching commit)
  - [net-next,v11,05/23] ovpn: keep carrier always on
    (no matching commit)
  - [net-next,v11,06/23] ovpn: introduce the ovpn_peer object
    (no matching commit)
  - [net-next,v11,07/23] ovpn: introduce the ovpn_socket object
    (no matching commit)
  - [net-next,v11,08/23] ovpn: implement basic TX path (UDP)
    (no matching commit)
  - [net-next,v11,09/23] ovpn: implement basic RX path (UDP)
    (no matching commit)
  - [net-next,v11,10/23] ovpn: implement packet processing
    (no matching commit)
  - [net-next,v11,11/23] ovpn: store tunnel and transport statistics
    (no matching commit)
  - [net-next,v11,12/23] ovpn: implement TCP transport
    (no matching commit)
  - [net-next,v11,13/23] ovpn: implement multi-peer support
    (no matching commit)
  - [net-next,v11,14/23] ovpn: implement peer lookup logic
    (no matching commit)
  - [net-next,v11,15/23] ovpn: implement keepalive mechanism
    (no matching commit)
  - [net-next,v11,16/23] ovpn: add support for updating local UDP endpoint
    (no matching commit)
  - [net-next,v11,17/23] ovpn: add support for peer floating
    (no matching commit)
  - [net-next,v11,18/23] ovpn: implement peer add/get/dump/delete via netlink
    (no matching commit)
  - [net-next,v11,19/23] ovpn: implement key add/get/del/swap via netlink
    (no matching commit)
  - [net-next,v11,20/23] ovpn: kill key and notify userspace in case of IV exhaustion
    (no matching commit)
  - [net-next,v11,21/23] ovpn: notify userspace when a peer is deleted
    (no matching commit)
  - [net-next,v11,22/23] ovpn: add basic ethtool support
    (no matching commit)
  - [net-next,v11,23/23] testing/selftests: add test tool and scripts for ovpn module
    (no matching commit)

You are awesome, thank you!

Sabrina Dubroca Nov. 4, 2024, 11:24 a.m. UTC | #7

2024-10-29, 11:47:30 +0100, Antonio Quartulli wrote:
> +static int ovpn_peer_reset_sockaddr(struct ovpn_peer *peer,
> +				    const struct sockaddr_storage *ss,
> +				    const u8 *local_ip)
> +	__must_hold(&peer->lock)
> +{
> +	struct ovpn_bind *bind;
> +	size_t ip_len;
> +
> +	/* create new ovpn_bind object */
> +	bind = ovpn_bind_from_sockaddr(ss);
> +	if (IS_ERR(bind))
> +		return PTR_ERR(bind);
> +
> +	if (local_ip) {
> +		if (ss->ss_family == AF_INET) {
> +			ip_len = sizeof(struct in_addr);
> +		} else if (ss->ss_family == AF_INET6) {
> +			ip_len = sizeof(struct in6_addr);
> +		} else {
> +			netdev_dbg(peer->ovpn->dev, "%s: invalid family for remote endpoint\n",
> +				   __func__);

ratelimited since that can be triggered from packet processing?


[...]
> +void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb)
> +{
[...]
> +
> +	switch (family) {
> +	case AF_INET:
> +		sa = (struct sockaddr_in *)&ss;
> +		sa->sin_family = AF_INET;
> +		sa->sin_addr.s_addr = ip_hdr(skb)->saddr;
> +		sa->sin_port = udp_hdr(skb)->source;
> +		salen = sizeof(*sa);
> +		break;
> +	case AF_INET6:
> +		sa6 = (struct sockaddr_in6 *)&ss;
> +		sa6->sin6_family = AF_INET6;
> +		sa6->sin6_addr = ipv6_hdr(skb)->saddr;
> +		sa6->sin6_port = udp_hdr(skb)->source;
> +		sa6->sin6_scope_id = ipv6_iface_scope_id(&ipv6_hdr(skb)->saddr,
> +							 skb->skb_iif);
> +		salen = sizeof(*sa6);
> +		break;
> +	default:
> +		goto unlock;
> +	}
> +
> +	netdev_dbg(peer->ovpn->dev, "%s: peer %d floated to %pIScp", __func__,

                                              %u for peer->id?

and ratelimited too, probably.

(also in ovpn_peer_update_local_endpoint in the previous patch)

> +		   peer->id, &ss);
> +	ovpn_peer_reset_sockaddr(peer, (struct sockaddr_storage *)&ss,
> +				 local_ip);

skip the rehash if this fails? peer->bind will still be the old one so
moving it to the new hash chain won't help (the lookup will fail).

Sabrina Dubroca Nov. 4, 2024, 11:26 a.m. UTC | #8

2024-10-29, 11:47:27 +0100, Antonio Quartulli wrote:
>  struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn,
>  					       struct sk_buff *skb)
>  {
> -	struct ovpn_peer *peer = NULL;
> +	struct ovpn_peer *tmp, *peer = NULL;
>  	struct sockaddr_storage ss = { 0 };
> +	struct hlist_nulls_head *nhead;
> +	struct hlist_nulls_node *ntmp;
> +	size_t sa_len;
>  
>  	if (unlikely(!ovpn_peer_skb_to_sockaddr(skb, &ss)))
>  		return NULL;
>  
>  	if (ovpn->mode == OVPN_MODE_P2P)
> -		peer = ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss);
> +		return ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss);
> +
> +	switch (ss.ss_family) {
> +	case AF_INET:
> +		sa_len = sizeof(struct sockaddr_in);
> +		break;
> +	case AF_INET6:
> +		sa_len = sizeof(struct sockaddr_in6);
> +		break;
> +	default:
> +		return NULL;
> +	}

You could get rid of that switch by having ovpn_peer_skb_to_sockaddr
also set sa_len (or return 0/the size).

> +
> +	nhead = ovpn_get_hash_head(ovpn->peers->by_transp_addr, &ss, sa_len);
> +
> +	rcu_read_lock();
> +	hlist_nulls_for_each_entry_rcu(tmp, ntmp, nhead,
> +				       hash_entry_transp_addr) {

I think that's missing the retry in case we ended up in the wrong
bucket due to a peer rehash?

> +		if (!ovpn_peer_transp_match(tmp, &ss))
> +			continue;
> +
> +		if (!ovpn_peer_hold(tmp))
> +			continue;
> +
> +		peer = tmp;
> +		break;
> +	}
> +	rcu_read_unlock();
>  
>  	return peer;
>  }

Sabrina Dubroca Nov. 4, 2024, 3:14 p.m. UTC | #9

2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote:
> +static int ovpn_nl_peer_precheck(struct ovpn_struct *ovpn,
> +				 struct genl_info *info,
> +				 struct nlattr **attrs)
> +{
> +	if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs,
> +			      OVPN_A_PEER_ID))
> +		return -EINVAL;
> +
> +	if (attrs[OVPN_A_PEER_REMOTE_IPV4] && attrs[OVPN_A_PEER_REMOTE_IPV6]) {
> +		NL_SET_ERR_MSG_MOD(info->extack,
> +				   "cannot specify both remote IPv4 or IPv6 address");
> +		return -EINVAL;
> +	}
> +
> +	if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
> +	    !attrs[OVPN_A_PEER_REMOTE_IPV6] && attrs[OVPN_A_PEER_REMOTE_PORT]) {
> +		NL_SET_ERR_MSG_MOD(info->extack,
> +				   "cannot specify remote port without IP address");
> +		return -EINVAL;
> +	}
> +
> +	if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
> +	    attrs[OVPN_A_PEER_LOCAL_IPV4]) {
> +		NL_SET_ERR_MSG_MOD(info->extack,
> +				   "cannot specify local IPv4 address without remote");
> +		return -EINVAL;
> +	}
> +
> +	if (!attrs[OVPN_A_PEER_REMOTE_IPV6] &&
> +	    attrs[OVPN_A_PEER_LOCAL_IPV6]) {

I think these consistency checks should account for v4mapped
addresses. With remote=v4mapped and local=v6 we'll end up with an
incorrect ipv4 "local" address (taken out of the ipv6 address's first
4B by ovpn_peer_reset_sockaddr). With remote=ipv6 and local=v4mapped,
we'll pass the last 4B of OVPN_A_PEER_LOCAL_IPV6 to
ovpn_peer_reset_sockaddr and try to read 16B (the full ipv6 address)
out of that.

> +		NL_SET_ERR_MSG_MOD(info->extack,
> +				   "cannot specify local IPV6 address without remote");
> +		return -EINVAL;
> +	}


[...]
>  int ovpn_nl_peer_set_doit(struct sk_buff *skb, struct genl_info *info)
>  {
[...]
> +	ret = ovpn_nl_peer_modify(peer, info, attrs);
> +	if (ret < 0) {
> +		ovpn_peer_put(peer);
> +		return ret;
> +	}
> +
> +	/* ret == 1 means that VPN IPv4/6 has been modified and rehashing
> +	 * is required
> +	 */
> +	if (ret > 0) {

&& mode == MP ?

I don't see ovpn_nl_peer_modify checking that before returning 1, and
in P2P mode ovpn->peers will be NULL.

> +		spin_lock_bh(&ovpn->peers->lock);
> +		ovpn_peer_hash_vpn_ip(peer);
> +		spin_unlock_bh(&ovpn->peers->lock);
> +	}
> +
> +	ovpn_peer_put(peer);
> +
> +	return 0;
> +}

>  int ovpn_nl_peer_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
>  {
[...]
> +	} else {
> +		rcu_read_lock();
> +		hash_for_each_rcu(ovpn->peers->by_id, bkt, peer,
> +				  hash_entry_id) {
> +			/* skip already dumped peers that were dumped by
> +			 * previous invocations
> +			 */
> +			if (last_idx > 0) {
> +				last_idx--;
> +				continue;
> +			}

If a peer that was dumped during a previous invocation is removed in
between, we'll miss one that's still present in the overall dump. I
don't know how much it matters (I guses it depends on how the results
of this dump are used by userspace), so I'll let you decide if this
needs to be fixed immediately or if it can be ignored for now.

> +
> +			if (ovpn_nl_send_peer(skb, info, peer,
> +					      NETLINK_CB(cb->skb).portid,
> +					      cb->nlh->nlmsg_seq,
> +					      NLM_F_MULTI) < 0)
> +				break;
> +
> +			/* count peers being dumped during this invocation */
> +			dumped++;
> +		}
> +		rcu_read_unlock();
> +	}
> +
> +out:
> +	netdev_put(ovpn->dev, &ovpn->dev_tracker);
> +
> +	/* sum up peers dumped in this message, so that at the next invocation
> +	 * we can continue from where we left
> +	 */
> +	cb->args[1] += dumped;
> +	return skb->len;
>  }
>  
>  int ovpn_nl_peer_del_doit(struct sk_buff *skb, struct genl_info *info)
>  {
> -	return -EOPNOTSUPP;
> +	struct nlattr *attrs[OVPN_A_PEER_MAX + 1];
> +	struct ovpn_struct *ovpn = info->user_ptr[0];
> +	struct ovpn_peer *peer;
> +	u32 peer_id;
> +	int ret;
> +
> +	if (GENL_REQ_ATTR_CHECK(info, OVPN_A_PEER))
> +		return -EINVAL;
> +
> +	ret = nla_parse_nested(attrs, OVPN_A_PEER_MAX, info->attrs[OVPN_A_PEER],
> +			       ovpn_peer_nl_policy, info->extack);
> +	if (ret)
> +		return ret;
> +
> +	if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs,
> +			      OVPN_A_PEER_ID))
> +		return -EINVAL;
> +
> +	peer_id = nla_get_u32(attrs[OVPN_A_PEER_ID]);
> +
> +	peer = ovpn_peer_get_by_id(ovpn, peer_id);
> +	if (!peer)

maybe c/p the extack from ovpn_nl_peer_get_doit?

> +		return -ENOENT;
> +
> +	netdev_dbg(ovpn->dev, "%s: peer id=%u\n", __func__, peer->id);
> +	ret = ovpn_peer_del(peer, OVPN_DEL_PEER_REASON_USERSPACE);
> +	ovpn_peer_put(peer);
> +
> +	return ret;
>  }

Sabrina Dubroca Nov. 5, 2024, 1:12 p.m. UTC | #10

2024-10-30, 21:47:58 +0100, Antonio Quartulli wrote:
> On 30/10/2024 17:37, Sabrina Dubroca wrote:
> > 2024-10-29, 11:47:19 +0100, Antonio Quartulli wrote:
> > > +static void ovpn_peer_release(struct ovpn_peer *peer)
> > > +{
> > > +	ovpn_bind_reset(peer, NULL);
> > > +
> > > +	dst_cache_destroy(&peer->dst_cache);
> > 
> > Is it safe to destroy the cache at this time? In the same function, we
> > use rcu to free the peer, but AFAICT the dst_cache will be freed
> > immediately:
> > 
> > void dst_cache_destroy(struct dst_cache *dst_cache)
> > {
> > [...]
> > 	free_percpu(dst_cache->cache);
> > }
> > 
> > (probably no real issue because ovpn_udp_send_skb gets called while we
> > hold a reference to the peer?)
> 
> Right.
> That was my assumption: release happens on refcnt = 0 only, therefore no
> field should be in use anymore.
> Anything that may still be in use will have its own refcounter.

My worry is that code changes over time, assumptions are forgotten,
and we end up with code that was a bit odd but safe not being safe
anymore.

> > 
> > > +	netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker);
> > > +	kfree_rcu(peer, rcu);
> > > +}
> > 
> > 
> > [...]
> > > +static int ovpn_peer_del_p2p(struct ovpn_peer *peer,
> > > +			     enum ovpn_del_peer_reason reason)
> > > +	__must_hold(&peer->ovpn->lock)
> > > +{
> > > +	struct ovpn_peer *tmp;
> > > +
> > > +	tmp = rcu_dereference_protected(peer->ovpn->peer,
> > > +					lockdep_is_held(&peer->ovpn->lock));
> > > +	if (tmp != peer) {
> > > +		DEBUG_NET_WARN_ON_ONCE(1);
> > > +		if (tmp)
> > > +			ovpn_peer_put(tmp);
> > 
> > Does peer->ovpn->peer need to be set to NULL here as well? Or is it
> > going to survive this _put?
> 
> First of all consider that this is truly something that we don't expect to
> happen (hence the WARN_ON).
> If this is happening it's because we are trying to delete a peer that is not
> the one we are connected to (unexplainable scenario in p2p mode).
>
> Still, should we hit this case (I truly can't see how), I'd say "leave
> everything as is - maybe this call was just a mistake".

Yeah, true, let's leave it. Thanks.

Sergey Ryazanov Nov. 6, 2024, 1:18 a.m. UTC | #11

Hi Antonio,

On 29.10.2024 12:47, Antonio Quartulli wrote:
> Notable changes from v10:
> * extended commit message of 23/23 with brief description of the output
> * Link to v10: https://lore.kernel.org/r/20241025-b4-ovpn-v10-0-b87530777be7@openvpn.net
> 
> Please note that some patches were already reviewed by Andre Lunn,
> Donald Hunter and Shuah Khan. They have retained the Reviewed-by tag
> since no major code modification has happened since the review.
> 
> The latest code can also be found at:
> 
> https://github.com/OpenVPN/linux-kernel-ovpn

As I promised many months ago I am starting publishing some nit picks 
regarding the series. The review was started when series was V3 
"rebasing" the review to every next version to publish it at once. But I 
lost this race to the new version releasing velocity :) So, I am going 
to publish it patch-by-patch.

Anyway you and all participants have done a great progress toward making 
accelerator part of the kernel. Most of considerable things already 
resolved so do not wait me please to finish picking every nit.

Regarding "big" topics I have only two concerns: link creation using 
RTNL and a switch statement usage. In the corresponding thread, I asked 
Jiri to clarify that "should" regarding .newlink implementation. Hope he 
will have a chance to find a time to reply.

For the 'switch' statement, I see a repeating pattern of handling 
mode-or family-specific cases like this:

int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer)
{
   switch (ovpn->mode) {
   case OVPN_MODE_MP:
     return ovpn_peer_add_mp(ovpn, peer);
   case OVPN_MODE_P2P:
     return ovpn_peer_add_p2p(ovpn, peer);
   default:
     return -EOPNOTSUPP;
   }
}

or

void ovpn_encrypt_post(void *data, int ret)
{
   ...
   switch (peer->sock->sock->sk->sk_protocol) {
   case IPPROTO_UDP:
     ovpn_udp_send_skb(peer->ovpn, peer, skb);
     break;
   case IPPROTO_TCP:
     ovpn_tcp_send_skb(peer, skb);
     break;
   default:
     /* no transport configured yet */
     goto err;
   }
   ...
}

or

void ovpn_peer_keepalive_work(...)
{
   ...
   switch (ovpn->mode) {
   case OVPN_MODE_MP:
     next_run = ovpn_peer_keepalive_work_mp(ovpn, now);
     break;
   case OVPN_MODE_P2P:
     next_run = ovpn_peer_keepalive_work_p2p(ovpn, now);
     break;
   }
   ...
}

Did you consider to implement mode specific operations as a set of 
operations like this:

ovpn_ops {
   int (*peer_add)(struct ovpn_struct *ovpn, struct ovpn_peer *peer);
   int (*peer_del)(struct ovpn_peer *peer, enum ovpn_del_peer_reason 
reason);
   void (*send_skb)(struct ovpn_peer *peer, struct sk_buff *skb);
   time64_t (*keepalive_work)(...);
};

Initialize them during the interface creation and invoke these 
operations indirectly. E.g.

int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer)
{
   return ovpn->ops->peer_add(ovpn, peer);
}

void ovpn_encrypt_post(void *data, int ret)
{
   ...
   ovpn->ops->send_skb(peer, skb);
   ...
}

void ovpn_peer_keepalive_work(...)
{
   ...
   next_run = ovpn->ops->keepalive_work(ovpn, now);
   ...
}

Anyway the module has all these option values in advance during the 
network interface creation phase and I believe replacing 'switch' 
statements with indirect calls can make code easy to read.

--
Sergey

Sergey Ryazanov Nov. 9, 2024, 1:01 a.m. UTC | #12

On 29.10.2024 12:47, Antonio Quartulli wrote:
> Add basic infrastructure for handling ovpn interfaces.
> 
> Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
> ---
>   drivers/net/ovpn/main.c       | 115 ++++++++++++++++++++++++++++++++++++++++--
>   drivers/net/ovpn/main.h       |   7 +++
>   drivers/net/ovpn/ovpnstruct.h |   8 +++
>   drivers/net/ovpn/packet.h     |  40 +++++++++++++++
>   include/uapi/linux/if_link.h  |  15 ++++++
>   5 files changed, 180 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
> index d5bdb0055f4dd3a6e32dc6e792bed1e7fd59e101..eead7677b8239eb3c48bb26ca95492d88512b8d4 100644
> --- a/drivers/net/ovpn/main.c
> +++ b/drivers/net/ovpn/main.c
> @@ -10,18 +10,52 @@
>   #include <linux/genetlink.h>
>   #include <linux/module.h>
>   #include <linux/netdevice.h>
> +#include <linux/inetdevice.h>
> +#include <net/ip.h>
>   #include <net/rtnetlink.h>
> -#include <uapi/linux/ovpn.h>
> +#include <uapi/linux/if_arp.h>
>   
>   #include "ovpnstruct.h"
>   #include "main.h"
>   #include "netlink.h"
>   #include "io.h"
> +#include "packet.h"
>   
>   /* Driver info */
>   #define DRV_DESCRIPTION	"OpenVPN data channel offload (ovpn)"
>   #define DRV_COPYRIGHT	"(C) 2020-2024 OpenVPN, Inc."
>   
> +static void ovpn_struct_free(struct net_device *net)
> +{
> +}

nit: since this handler is not mandatory, its introduction can be moved 
to the later patch, which actually fills it with meaningful operations.

> +static int ovpn_net_open(struct net_device *dev)
> +{
> +	netif_tx_start_all_queues(dev);
> +	return 0;
> +}
> +
> +static int ovpn_net_stop(struct net_device *dev)
> +{
> +	netif_tx_stop_all_queues(dev);
> +	return 0;
> +}
> +
> +static const struct net_device_ops ovpn_netdev_ops = {
> +	.ndo_open		= ovpn_net_open,
> +	.ndo_stop		= ovpn_net_stop,
> +	.ndo_start_xmit		= ovpn_net_xmit,
> +};
> +
> +static const struct device_type ovpn_type = {
> +	.name = OVPN_FAMILY_NAME,

nit: same question here regarding name deriviation. Are you sure that 
the device type name is the same as the GENL family name?

> +};
> +
> +static const struct nla_policy ovpn_policy[IFLA_OVPN_MAX + 1] = {
> +	[IFLA_OVPN_MODE] = NLA_POLICY_RANGE(NLA_U8, OVPN_MODE_P2P,
> +					    OVPN_MODE_MP),
> +};
> +
>   /**
>    * ovpn_dev_is_valid - check if the netdevice is of type 'ovpn'
>    * @dev: the interface to check
> @@ -33,16 +67,76 @@ bool ovpn_dev_is_valid(const struct net_device *dev)
>   	return dev->netdev_ops->ndo_start_xmit == ovpn_net_xmit;
>   }
>   
> +static void ovpn_setup(struct net_device *dev)
> +{
> +	/* compute the overhead considering AEAD encryption */
> +	const int overhead = sizeof(u32) + NONCE_WIRE_SIZE + 16 +

Where are these magic sizeof(u32) and '16' came from?

> +			     sizeof(struct udphdr) +
> +			     max(sizeof(struct ipv6hdr), sizeof(struct iphdr));
> +
> +	netdev_features_t feat = NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_RXCSUM |
> +				 NETIF_F_GSO | NETIF_F_GSO_SOFTWARE |
> +				 NETIF_F_HIGHDMA;
> +
> +	dev->needs_free_netdev = true;
> +
> +	dev->pcpu_stat_type = NETDEV_PCPU_STAT_TSTATS;
> +
> +	dev->netdev_ops = &ovpn_netdev_ops;
> +
> +	dev->priv_destructor = ovpn_struct_free;
> +
> +	dev->hard_header_len = 0;
> +	dev->addr_len = 0;
> +	dev->mtu = ETH_DATA_LEN - overhead;
> +	dev->min_mtu = IPV4_MIN_MTU;
> +	dev->max_mtu = IP_MAX_MTU - overhead;
> +
> +	dev->type = ARPHRD_NONE;
> +	dev->flags = IFF_POINTOPOINT | IFF_NOARP;
> +	dev->priv_flags |= IFF_NO_QUEUE;
> +
> +	dev->lltx = true;
> +	dev->features |= feat;
> +	dev->hw_features |= feat;
> +	dev->hw_enc_features |= feat;
> +
> +	dev->needed_headroom = OVPN_HEAD_ROOM;
> +	dev->needed_tailroom = OVPN_MAX_PADDING;
> +
> +	SET_NETDEV_DEVTYPE(dev, &ovpn_type);
> +}
> +
>   static int ovpn_newlink(struct net *src_net, struct net_device *dev,
>   			struct nlattr *tb[], struct nlattr *data[],
>   			struct netlink_ext_ack *extack)
>   {
> -	return -EOPNOTSUPP;
> +	struct ovpn_struct *ovpn = netdev_priv(dev);
> +	enum ovpn_mode mode = OVPN_MODE_P2P;
> +
> +	if (data && data[IFLA_OVPN_MODE]) {
> +		mode = nla_get_u8(data[IFLA_OVPN_MODE]);
> +		netdev_dbg(dev, "setting device mode: %u\n", mode);
> +	}
> +
> +	ovpn->dev = dev;
> +	ovpn->mode = mode;
> +
> +	/* turn carrier explicitly off after registration, this way state is
> +	 * clearly defined
> +	 */
> +	netif_carrier_off(dev);
> +
> +	return register_netdevice(dev);
>   }
>   
>   static struct rtnl_link_ops ovpn_link_ops = {
>   	.kind = OVPN_FAMILY_NAME,
>   	.netns_refund = false,
> +	.priv_size = sizeof(struct ovpn_struct),
> +	.setup = ovpn_setup,
> +	.policy = ovpn_policy,
> +	.maxtype = IFLA_OVPN_MAX,
>   	.newlink = ovpn_newlink,
>   	.dellink = unregister_netdevice_queue,
>   };
> @@ -51,26 +145,37 @@ static int ovpn_netdev_notifier_call(struct notifier_block *nb,
>   				     unsigned long state, void *ptr)
>   {
>   	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
> +	struct ovpn_struct *ovpn;
>   
>   	if (!ovpn_dev_is_valid(dev))
>   		return NOTIFY_DONE;
>   
> +	ovpn = netdev_priv(dev);

nit: netdev_priv() returns only a pointer, it is safe to fetch the 
pointer in advance, but do not dereference it until we are sure that an 
event references the desired interface type. Takin this into 
consideration, the assignment of private data pointer can be moved above 
to the variable declaration. Just to make code couple of lines shorter.

> +
>   	switch (state) {
>   	case NETDEV_REGISTER:
> -		/* add device to internal list for later destruction upon
> -		 * unregistration
> -		 */
> +		ovpn->registered = true;
>   		break;
>   	case NETDEV_UNREGISTER:
> +		/* twiddle thumbs on netns device moves */
> +		if (dev->reg_state != NETREG_UNREGISTERING)
> +			break;
> +
>   		/* can be delivered multiple times, so check registered flag,
>   		 * then destroy the interface
>   		 */
> +		if (!ovpn->registered)
> +			return NOTIFY_DONE;
> +
> +		netif_carrier_off(dev);
> +		ovpn->registered = false;
>   		break;
>   	case NETDEV_POST_INIT:
>   	case NETDEV_GOING_DOWN:
>   	case NETDEV_DOWN:
>   	case NETDEV_UP:
>   	case NETDEV_PRE_UP:
> +		break;
>   	default:
>   		return NOTIFY_DONE;
>   	}
> diff --git a/drivers/net/ovpn/main.h b/drivers/net/ovpn/main.h
> index a3215316c49bfcdf2496590bac878f145b8b27fd..0740a05070a817e0daea7b63a1f4fcebd274eb37 100644
> --- a/drivers/net/ovpn/main.h
> +++ b/drivers/net/ovpn/main.h
> @@ -12,4 +12,11 @@
>   
>   bool ovpn_dev_is_valid(const struct net_device *dev);
>   
> +#define SKB_HEADER_LEN                                       \
> +	(max(sizeof(struct iphdr), sizeof(struct ipv6hdr)) + \
> +	 sizeof(struct udphdr) + NET_SKB_PAD)
> +
> +#define OVPN_HEAD_ROOM ALIGN(16 + SKB_HEADER_LEN, 4)

Where is this magic '16' came from?

> +#define OVPN_MAX_PADDING 16
> +
>   #endif /* _NET_OVPN_MAIN_H_ */
> diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h
> index e3e4df6418b081436378fc51d98db5bd7b5d1fbe..211df871538d34fdff90d182f21a0b0fb11b28ad 100644
> --- a/drivers/net/ovpn/ovpnstruct.h
> +++ b/drivers/net/ovpn/ovpnstruct.h
> @@ -11,15 +11,23 @@
>   #define _NET_OVPN_OVPNSTRUCT_H_
>   
>   #include <net/net_trackers.h>
> +#include <uapi/linux/if_link.h>
> +#include <uapi/linux/ovpn.h>
>   
>   /**
>    * struct ovpn_struct - per ovpn interface state
>    * @dev: the actual netdev representing the tunnel
>    * @dev_tracker: reference tracker for associated dev
> + * @registered: whether dev is still registered with netdev or not
> + * @mode: device operation mode (i.e. p2p, mp, ..)
> + * @dev_list: entry for the module wide device list
>    */
>   struct ovpn_struct {
>   	struct net_device *dev;
>   	netdevice_tracker dev_tracker;
> +	bool registered;
> +	enum ovpn_mode mode;
> +	struct list_head dev_list;

dev_list is no more used and should be deleted.

>   };
>   
>   #endif /* _NET_OVPN_OVPNSTRUCT_H_ */
> diff --git a/drivers/net/ovpn/packet.h b/drivers/net/ovpn/packet.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..7ed146f5932a25f448af6da58738a7eae81007fe
> --- /dev/null
> +++ b/drivers/net/ovpn/packet.h
> @@ -0,0 +1,40 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*  OpenVPN data channel offload
> + *
> + *  Copyright (C) 2020-2024 OpenVPN, Inc.
> + *
> + *  Author:	Antonio Quartulli <antonio@openvpn.net>
> + *		James Yonan <james@openvpn.net>
> + */
> +
> +#ifndef _NET_OVPN_PACKET_H_
> +#define _NET_OVPN_PACKET_H_
> +
> +/* When the OpenVPN protocol is ran in AEAD mode, use
> + * the OpenVPN packet ID as the AEAD nonce:
> + *
> + *    00000005 521c3b01 4308c041
> + *    [seq # ] [  nonce_tail   ]
> + *    [     12-byte full IV    ] -> NONCE_SIZE
> + *    [4-bytes                   -> NONCE_WIRE_SIZE
> + *    on wire]
> + */

Nice diagram! Can we go futher and define the OpenVPN packet header as a 
stucture? Referencing the structure instead of using magic sizes and 
offsets can greatly improve the code readability. Especially when it 
comes to header construction/parsing in the encryption/decryption code.

E.g. define a structures like this:

struct ovpn_pkt_hdr {
   __be32 op;
   __be32 pktid;
   u8 auth[];
} __attribute__((packed));

struct ovpn_aead_iv {
   __be32 pktid;
   u8 nonce[OVPN_NONCE_TAIL_SIZE];
} __attribute__((packed));

> +
> +/* OpenVPN nonce size */
> +#define NONCE_SIZE 12

nit: is using the common 'OVPN_' prefix here and for other constants any 
good idea? E.g. OVPN_NONCE_SIZE. It can give some hints where it comes 
from for a code reader.

And another one question. Could you clarify in the comment to this 
constant where it came from? AFAIU, these 12 bytes is the expectation of 
the nonce size of AEAD crypto protocol, rigth?

> +
> +/* OpenVPN nonce size reduced by 8-byte nonce tail -- this is the
> + * size of the AEAD Associated Data (AD) sent over the wire
> + * and is normally the head of the IV
> + */
> +#define NONCE_WIRE_SIZE (NONCE_SIZE - sizeof(struct ovpn_nonce_tail))

If the headers and IV are defined as structures, we no more need this 
constant since the header construction will be done by a compiler 
according to the structure layout.

> +/* Last 8 bytes of AEAD nonce
> + * Provided by userspace and usually derived from
> + * key material generated during TLS handshake
> + */
> +struct ovpn_nonce_tail {
> +	u8 u8[OVPN_NONCE_TAIL_SIZE];
> +};

Why do you need a dadicated structure for this array? Can we declare the 
corresponding fields like this:

u8 nonce_tail_xmit[OVPN_NONCE_TAIL_SIZE];
u8 nonce_tail_recv[OVPN_NONCE_TAIL_SIZE];

> +#endif /* _NET_OVPN_PACKET_H_ */
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index 8516c1ccd57a7c7634a538fe3ac16c858f647420..84d294aab20b79b8e9cb9b736a074105c99338f3 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -1975,4 +1975,19 @@ enum {
>   
>   #define IFLA_DSA_MAX	(__IFLA_DSA_MAX - 1)
>   
> +/* OVPN section */
> +
> +enum ovpn_mode {
> +	OVPN_MODE_P2P,
> +	OVPN_MODE_MP,
> +};

Mode min/max values can be defined here and the netlink policy can 
reference these values:

enum ovpn_mode {
   OVPN_MODE_P2P,
   OVPN_MODE_MP,
   __OVPN_MODE_MAX
};

#define OVPN_MODE_MIN OVPN_MODE_P2P
#define OVPN_MODE_MAX (__OVPN_MODE_MAX - 1)

... = NLA_POLICY_RANGE(NLA_U8, OVPN_MODE_MIN, OVPN_MODE_MAX)

> +
> +enum {
> +	IFLA_OVPN_UNSPEC,
> +	IFLA_OVPN_MODE,
> +	__IFLA_OVPN_MAX,
> +};
> +
> +#define IFLA_OVPN_MAX	(__IFLA_OVPN_MAX - 1)
> +
>   #endif /* _UAPI_LINUX_IF_LINK_H */
> 

--
Sergey

Sergey Ryazanov Nov. 9, 2024, 1:11 a.m. UTC | #13

On 29.10.2024 12:47, Antonio Quartulli wrote:
> An ovpn interface will keep carrier always on and let the user
> decide when an interface should be considered disconnected.
> 
> This way, even if an ovpn interface is not connected to any peer,
> it can still retain all IPs and routes and thus prevent any data
> leak.
> 
> Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
> Reviewed-by: Andrew Lunn <andrew@lunn.ch>
> ---
>   drivers/net/ovpn/main.c | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
> index eead7677b8239eb3c48bb26ca95492d88512b8d4..eaa83a8662e4ac2c758201008268f9633643c0b6 100644
> --- a/drivers/net/ovpn/main.c
> +++ b/drivers/net/ovpn/main.c
> @@ -31,6 +31,13 @@ static void ovpn_struct_free(struct net_device *net)
>   
>   static int ovpn_net_open(struct net_device *dev)
>   {
> +	/* ovpn keeps the carrier always on to avoid losing IP or route
> +	 * configuration upon disconnection. This way it can prevent leaks
> +	 * of traffic outside of the VPN tunnel.
> +	 * The user may override this behaviour by tearing down the interface
> +	 * manually.
> +	 */
> +	netif_carrier_on(dev);

If a user cares about the traffic leaking, then he can create a 
blackhole route with huge metric:

# ip route add blackhole default metric 10000

Why the network interface should implicitly provide this functionality? 
And on another hand, how a routing daemon can learn a topology change 
without indication from the interface?

>   	netif_tx_start_all_queues(dev);
>   	return 0;
>   }
>

Sergey Ryazanov Nov. 10, 2024, 1:38 p.m. UTC | #14

On 29.10.2024 12:47, Antonio Quartulli wrote:
> An ovpn_peer object holds the whole status of a remote peer
> (regardless whether it is a server or a client).
> 
> This includes status for crypto, tx/rx buffers, napi, etc.
> 
> Only support for one peer is introduced (P2P mode).
> Multi peer support is introduced with a later patch.

Reviewing the peer creation/destroying code I came to a generic 
question. Did you consider keeping a single P2P peer in the peers table 
as well?

Looks like such approach can greatly simply the code by dropping all 
these 'switch (ovpn->mode)' checks and implementing a unified peer 
management. The 'peer' field in the main private data structure can be 
kept to accelerate lookups, still using peers table for management tasks 
like removing all the peers on the interface teardown.

> Along with the ovpn_peer, also the ovpn_bind object is introcued
> as the two are strictly related.
> An ovpn_bind object wraps a sockaddr representing the local
> coordinates being used to talk to a specific peer.
> 
> Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
> ---
>   drivers/net/ovpn/Makefile     |   2 +
>   drivers/net/ovpn/bind.c       |  58 +++++++
>   drivers/net/ovpn/bind.h       | 117 ++++++++++++++

Why do we need these bind.c/bind.h files? They contains a minimum of 
code and still anyway references the peer object. Can we merge these 
definitions and code into peer.c/peer.h?

>   drivers/net/ovpn/main.c       |  11 ++
>   drivers/net/ovpn/main.h       |   2 +
>   drivers/net/ovpn/ovpnstruct.h |   4 +
>   drivers/net/ovpn/peer.c       | 354 ++++++++++++++++++++++++++++++++++++++++++
>   drivers/net/ovpn/peer.h       |  79 ++++++++++
>   8 files changed, 627 insertions(+)

[...]

> diff --git a/drivers/net/ovpn/bind.c b/drivers/net/ovpn/bind.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b4d2ccec2ceddf43bc445b489cc62a578ef0ad0a
> --- /dev/null
> +++ b/drivers/net/ovpn/bind.c
> @@ -0,0 +1,58 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*  OpenVPN data channel offload
> + *
> + *  Copyright (C) 2012-2024 OpenVPN, Inc.
> + *
> + *  Author:	James Yonan <james@openvpn.net>
> + *		Antonio Quartulli <antonio@openvpn.net>
> + */
> +
> +#include <linux/netdevice.h>
> +#include <linux/socket.h>
> +
> +#include "ovpnstruct.h"
> +#include "bind.h"
> +#include "peer.h"
> +
> +/**
> + * ovpn_bind_from_sockaddr - retrieve binding matching sockaddr
> + * @ss: the sockaddr to match
> + *
> + * Return: the bind matching the passed sockaddr if found, NULL otherwise

The function returns ERR_PTR() in case of error, the comment should be 
updated.

> + */
> +struct ovpn_bind *ovpn_bind_from_sockaddr(const struct sockaddr_storage *ss)
> +{
> +	struct ovpn_bind *bind;
> +	size_t sa_len;
> +
> +	if (ss->ss_family == AF_INET)
> +		sa_len = sizeof(struct sockaddr_in);
> +	else if (ss->ss_family == AF_INET6)
> +		sa_len = sizeof(struct sockaddr_in6);
> +	else
> +		return ERR_PTR(-EAFNOSUPPORT);
> +
> +	bind = kzalloc(sizeof(*bind), GFP_ATOMIC);
> +	if (unlikely(!bind))
> +		return ERR_PTR(-ENOMEM);
> +
> +	memcpy(&bind->remote, ss, sa_len);
> +
> +	return bind;
> +}
> +
> +/**
> + * ovpn_bind_reset - assign new binding to peer
> + * @peer: the peer whose binding has to be replaced
> + * @new: the new bind to assign
> + */
> +void ovpn_bind_reset(struct ovpn_peer *peer, struct ovpn_bind *new)
> +{
> +	struct ovpn_bind *old;
> +
> +	spin_lock_bh(&peer->lock);
> +	old = rcu_replace_pointer(peer->bind, new, true);
> +	spin_unlock_bh(&peer->lock);

Locking will be removed from this function in the subsequent patch. 
Should we move the peer->lock usage to ovpn_peer_release() now?

> +
> +	kfree_rcu(old, rcu);
> +}
> diff --git a/drivers/net/ovpn/bind.h b/drivers/net/ovpn/bind.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..859213d5040deb36c416eafcf5c6ab31c4d52c7a
> --- /dev/null
> +++ b/drivers/net/ovpn/bind.h
> @@ -0,0 +1,117 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*  OpenVPN data channel offload
> + *
> + *  Copyright (C) 2012-2024 OpenVPN, Inc.
> + *
> + *  Author:	James Yonan <james@openvpn.net>
> + *		Antonio Quartulli <antonio@openvpn.net>
> + */
> +
> +#ifndef _NET_OVPN_OVPNBIND_H_
> +#define _NET_OVPN_OVPNBIND_H_
> +
> +#include <net/ip.h>
> +#include <linux/in.h>
> +#include <linux/in6.h>
> +#include <linux/rcupdate.h>
> +#include <linux/skbuff.h>
> +#include <linux/spinlock.h>
> +
> +struct ovpn_peer;
> +
> +/**
> + * union ovpn_sockaddr - basic transport layer address

Why do we need this dedicated named union? Can we merge this union into 
the ovpn_bind struct as already done for the local address?

> + * @in4: IPv4 address
> + * @in6: IPv6 address
> + */
> +union ovpn_sockaddr {

Family type can be putted here as a dedicated element to make address 
type check simple:

        unsigned short int sa_family;

> +	struct sockaddr_in in4;
> +	struct sockaddr_in6 in6;
> +};> +
> +/**
> + * struct ovpn_bind - remote peer binding
> + * @remote: the remote peer sockaddress
> + * @local: local endpoint used to talk to the peer
> + * @local.ipv4: local IPv4 used to talk to the peer
> + * @local.ipv6: local IPv6 used to talk to the peer
> + * @rcu: used to schedule RCU cleanup job
> + */
> +struct ovpn_bind {
> +	union ovpn_sockaddr remote;  /* remote sockaddr */
> +
> +	union {
> +		struct in_addr ipv4;
> +		struct in6_addr ipv6;
> +	} local;
> +
> +	struct rcu_head rcu;
> +};
> +
> +/**
> + * skb_protocol_to_family - translate skb->protocol to AF_INET or AF_INET6
> + * @skb: the packet sk_buff to inspect
> + *
> + * Return: AF_INET, AF_INET6 or 0 in case of unknown protocol
> + */
> +static inline unsigned short skb_protocol_to_family(const struct sk_buff *skb)

The function called outside the peer.c file only in ovpn_decrypt_post() 
and that call is debatable. Considering this, I belive, the 
skb_protocol_to_family() should be moved inside peer.c as static 
non-inlined function.

> +{
> +	switch (skb->protocol) {
> +	case htons(ETH_P_IP):
> +		return AF_INET;
> +	case htons(ETH_P_IPV6):
> +		return AF_INET6;
> +	default:
> +		return 0;
> +	}
> +}
> +
> +/**
> + * ovpn_bind_skb_src_match - match packet source with binding
> + * @bind: the binding to match
> + * @skb: the packet to match
> + *
> + * Return: true if the packet source matches the remote peer sockaddr
> + * in the binding
> + */
> +static inline bool ovpn_bind_skb_src_match(const struct ovpn_bind *bind,
> +					   const struct sk_buff *skb)

The function is called only from ovpn_peer_float() and probably should 
be moved into peer.c and un-inlined.

> +{
> +	const unsigned short family = skb_protocol_to_family(skb);
> +	const union ovpn_sockaddr *remote;
> +
> +	if (unlikely(!bind))
> +		return false;

The caller ovpn_peer_float() function already verified bind object 
pointer, why we should redo the same check here?

> +
> +	remote = &bind->remote;
> +
> +	if (unlikely(remote->in4.sin_family != family))
> +		return false;
> +
> +	switch (family) {
> +	case AF_INET:
> +		if (unlikely(remote->in4.sin_addr.s_addr != ip_hdr(skb)->saddr))
> +			return false;
> +
> +		if (unlikely(remote->in4.sin_port != udp_hdr(skb)->source))
> +			return false;
> +		break;
> +	case AF_INET6:
> +		if (unlikely(!ipv6_addr_equal(&remote->in6.sin6_addr,
> +					      &ipv6_hdr(skb)->saddr)))
> +			return false;
> +
> +		if (unlikely(remote->in6.sin6_port != udp_hdr(skb)->source))
> +			return false;
> +		break;
> +	default:
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +struct ovpn_bind *ovpn_bind_from_sockaddr(const struct sockaddr_storage *sa);
> +void ovpn_bind_reset(struct ovpn_peer *peer, struct ovpn_bind *bind);
> +
> +#endif /* _NET_OVPN_OVPNBIND_H_ */
> diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
> index eaa83a8662e4ac2c758201008268f9633643c0b6..5492ce07751d135c1484fe1ed8227c646df94969 100644
> --- a/drivers/net/ovpn/main.c
> +++ b/drivers/net/ovpn/main.c
> @@ -20,6 +20,7 @@
>   #include "netlink.h"
>   #include "io.h"
>   #include "packet.h"
> +#include "peer.h"
>   
>   /* Driver info */
>   #define DRV_DESCRIPTION	"OpenVPN data channel offload (ovpn)"
> @@ -29,6 +30,11 @@ static void ovpn_struct_free(struct net_device *net)
>   {
>   }
>   
> +static int ovpn_net_init(struct net_device *dev)
> +{
> +	return 0;
> +}

The function is not required. Can we move its introduction to 'implement 
basic RX path (UDP)' patch, where its content will be added and where 
counterpart ovpn_net_uninit() function will be introduced?

> +
>   static int ovpn_net_open(struct net_device *dev)
>   {
>   	/* ovpn keeps the carrier always on to avoid losing IP or route
> @@ -49,6 +55,7 @@ static int ovpn_net_stop(struct net_device *dev)
>   }
>   
>   static const struct net_device_ops ovpn_netdev_ops = {
> +	.ndo_init		= ovpn_net_init,
>   	.ndo_open		= ovpn_net_open,
>   	.ndo_stop		= ovpn_net_stop,
>   	.ndo_start_xmit		= ovpn_net_xmit,
> @@ -128,6 +135,7 @@ static int ovpn_newlink(struct net *src_net, struct net_device *dev,
>   
>   	ovpn->dev = dev;
>   	ovpn->mode = mode;
> +	spin_lock_init(&ovpn->lock);
>   
>   	/* turn carrier explicitly off after registration, this way state is
>   	 * clearly defined
> @@ -176,6 +184,9 @@ static int ovpn_netdev_notifier_call(struct notifier_block *nb,
>   
>   		netif_carrier_off(dev);
>   		ovpn->registered = false;
> +
> +		if (ovpn->mode == OVPN_MODE_P2P)
> +			ovpn_peer_release_p2p(ovpn);
>   		break;
>   	case NETDEV_POST_INIT:
>   	case NETDEV_GOING_DOWN:
> diff --git a/drivers/net/ovpn/main.h b/drivers/net/ovpn/main.h
> index 0740a05070a817e0daea7b63a1f4fcebd274eb37..28e5c44816e110974333a7a6a9cf18bd15ae84e6 100644
> --- a/drivers/net/ovpn/main.h
> +++ b/drivers/net/ovpn/main.h
> @@ -19,4 +19,6 @@ bool ovpn_dev_is_valid(const struct net_device *dev);
>   #define OVPN_HEAD_ROOM ALIGN(16 + SKB_HEADER_LEN, 4)
>   #define OVPN_MAX_PADDING 16
>   
> +#define OVPN_QUEUE_LEN 1024

This macro is unused, should we drop it?

> +
>   #endif /* _NET_OVPN_MAIN_H_ */
> diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h
> index 211df871538d34fdff90d182f21a0b0fb11b28ad..a22c5083381c131db01a28c0f51e661d690d4998 100644
> --- a/drivers/net/ovpn/ovpnstruct.h
> +++ b/drivers/net/ovpn/ovpnstruct.h
> @@ -20,6 +20,8 @@
>    * @dev_tracker: reference tracker for associated dev
>    * @registered: whether dev is still registered with netdev or not
>    * @mode: device operation mode (i.e. p2p, mp, ..)
> + * @lock: protect this object
> + * @peer: in P2P mode, this is the only remote peer
>    * @dev_list: entry for the module wide device list
>    */
>   struct ovpn_struct {
> @@ -27,6 +29,8 @@ struct ovpn_struct {
>   	netdevice_tracker dev_tracker;
>   	bool registered;
>   	enum ovpn_mode mode;
> +	spinlock_t lock; /* protect writing to the ovpn_struct object */
> +	struct ovpn_peer __rcu *peer;
>   	struct list_head dev_list;
>   };
>   
> diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..d9788a0cc99b5839c466c35d1b2266cc6b95fb72
> --- /dev/null
> +++ b/drivers/net/ovpn/peer.c
> @@ -0,0 +1,354 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*  OpenVPN data channel offload
> + *
> + *  Copyright (C) 2020-2024 OpenVPN, Inc.
> + *
> + *  Author:	James Yonan <james@openvpn.net>
> + *		Antonio Quartulli <antonio@openvpn.net>
> + */
> +
> +#include <linux/skbuff.h>
> +#include <linux/list.h>
> +
> +#include "ovpnstruct.h"
> +#include "bind.h"
> +#include "io.h"
> +#include "main.h"
> +#include "netlink.h"
> +#include "peer.h"
> +
> +/**
> + * ovpn_peer_new - allocate and initialize a new peer object
> + * @ovpn: the openvpn instance inside which the peer should be created
> + * @id: the ID assigned to this peer
> + *
> + * Return: a pointer to the new peer on success or an error code otherwise
> + */
> +struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id)
> +{
> +	struct ovpn_peer *peer;
> +	int ret;
> +
> +	/* alloc and init peer object */
> +	peer = kzalloc(sizeof(*peer), GFP_KERNEL);
> +	if (!peer)
> +		return ERR_PTR(-ENOMEM);
> +
> +	peer->id = id;
> +	peer->halt = false;
> +	peer->ovpn = ovpn;
> +
> +	peer->vpn_addrs.ipv4.s_addr = htonl(INADDR_ANY);
> +	peer->vpn_addrs.ipv6 = in6addr_any;
> +
> +	RCU_INIT_POINTER(peer->bind, NULL);
> +	spin_lock_init(&peer->lock);
> +	kref_init(&peer->refcount);
> +
> +	ret = dst_cache_init(&peer->dst_cache, GFP_KERNEL);
> +	if (ret < 0) {
> +		netdev_err(ovpn->dev, "%s: cannot initialize dst cache\n",
> +			   __func__);
> +		kfree(peer);
> +		return ERR_PTR(ret);
> +	}
> +
> +	netdev_hold(ovpn->dev, &ovpn->dev_tracker, GFP_KERNEL);
> +
> +	return peer;
> +}
> +
> +/**
> + * ovpn_peer_release - release peer private members
> + * @peer: the peer to release
> + */
> +static void ovpn_peer_release(struct ovpn_peer *peer)
> +{
> +	ovpn_bind_reset(peer, NULL);
> +
> +	dst_cache_destroy(&peer->dst_cache);
> +	netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker);
> +	kfree_rcu(peer, rcu);
> +}
> +
> +/**
> + * ovpn_peer_release_kref - callback for kref_put
> + * @kref: the kref object belonging to the peer
> + */
> +void ovpn_peer_release_kref(struct kref *kref)
> +{
> +	struct ovpn_peer *peer = container_of(kref, struct ovpn_peer, refcount);
> +
> +	ovpn_peer_release(peer);
> +}
> +
> +/**
> + * ovpn_peer_skb_to_sockaddr - fill sockaddr with skb source address
> + * @skb: the packet to extract data from
> + * @ss: the sockaddr to fill
> + *
> + * Return: true on success or false otherwise
> + */
> +static bool ovpn_peer_skb_to_sockaddr(struct sk_buff *skb,
> +				      struct sockaddr_storage *ss)
> +{
> +	struct sockaddr_in6 *sa6;
> +	struct sockaddr_in *sa4;
> +
> +	ss->ss_family = skb_protocol_to_family(skb);

Why do we need skb_protocol_to_family() call? Can we directly use 
skb->protocol and ETH_P_IP/ETH_P_IPV6 in the switch?

> +	switch (ss->ss_family) {
> +	case AF_INET:
> +		sa4 = (struct sockaddr_in *)ss;
> +		sa4->sin_family = AF_INET;
> +		sa4->sin_addr.s_addr = ip_hdr(skb)->saddr;
> +		sa4->sin_port = udp_hdr(skb)->source;
> +		break;
> +	case AF_INET6:
> +		sa6 = (struct sockaddr_in6 *)ss;
> +		sa6->sin6_family = AF_INET6;
> +		sa6->sin6_addr = ipv6_hdr(skb)->saddr;
> +		sa6->sin6_port = udp_hdr(skb)->source;
> +		break;
> +	default:
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +/**
> + * ovpn_peer_transp_match - check if sockaddr and peer binding match
> + * @peer: the peer to get the binding from
> + * @ss: the sockaddr to match
> + *
> + * Return: true if sockaddr and binding match or false otherwise
> + */
> +static bool ovpn_peer_transp_match(const struct ovpn_peer *peer,
> +				   const struct sockaddr_storage *ss)
> +{
> +	struct ovpn_bind *bind = rcu_dereference(peer->bind);
> +	struct sockaddr_in6 *sa6;
> +	struct sockaddr_in *sa4;
> +
> +	if (unlikely(!bind))
> +		return false;
> +
> +	if (ss->ss_family != bind->remote.in4.sin_family)

nit: if the dedicated 'sa_family' element is added into the union, the 
check can be more straighforward (without 'in4' access):

         if (ss->ss_family != bind->remote.sa_family)

> +		return false;
> +
> +	switch (ss->ss_family) {
> +	case AF_INET:
> +		sa4 = (struct sockaddr_in *)ss;
> +		if (sa4->sin_addr.s_addr != bind->remote.in4.sin_addr.s_addr)
> +			return false;
> +		if (sa4->sin_port != bind->remote.in4.sin_port)
> +			return false;
> +		break;
> +	case AF_INET6:
> +		sa6 = (struct sockaddr_in6 *)ss;
> +		if (!ipv6_addr_equal(&sa6->sin6_addr,
> +				     &bind->remote.in6.sin6_addr))
> +			return false;
> +		if (sa6->sin6_port != bind->remote.in6.sin6_port)
> +			return false;
> +		break;
> +	default:
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +/**
> + * ovpn_peer_get_by_transp_addr_p2p - get peer by transport address in a P2P
> + *                                    instance
> + * @ovpn: the openvpn instance to search
> + * @ss: the transport socket address
> + *
> + * Return: the peer if found or NULL otherwise
> + */
> +static struct ovpn_peer *
> +ovpn_peer_get_by_transp_addr_p2p(struct ovpn_struct *ovpn,
> +				 struct sockaddr_storage *ss)
> +{
> +	struct ovpn_peer *tmp, *peer = NULL;
> +
> +	rcu_read_lock();
> +	tmp = rcu_dereference(ovpn->peer);
> +	if (likely(tmp && ovpn_peer_transp_match(tmp, ss) &&
> +		   ovpn_peer_hold(tmp)))
> +		peer = tmp;
> +	rcu_read_unlock();
> +
> +	return peer;
> +}
> +
> +/**
> + * ovpn_peer_get_by_transp_addr - retrieve peer by transport address
> + * @ovpn: the openvpn instance to search
> + * @skb: the skb to retrieve the source transport address from
> + *
> + * Return: a pointer to the peer if found or NULL otherwise
> + */
> +struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn,
> +					       struct sk_buff *skb)
> +{
> +	struct ovpn_peer *peer = NULL;
> +	struct sockaddr_storage ss = { 0 };

nit: reverse x-mas tree order, please.

> +
> +	if (unlikely(!ovpn_peer_skb_to_sockaddr(skb, &ss)))
> +		return NULL;
> +
> +	if (ovpn->mode == OVPN_MODE_P2P)
> +		peer = ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss);
> +
> +	return peer;
> +}
> +
> +/**
> + * ovpn_peer_get_by_id_p2p - get peer by ID in a P2P instance
> + * @ovpn: the openvpn instance to search
> + * @peer_id: the ID of the peer to find
> + *
> + * Return: the peer if found or NULL otherwise
> + */
> +static struct ovpn_peer *ovpn_peer_get_by_id_p2p(struct ovpn_struct *ovpn,
> +						 u32 peer_id)
> +{
> +	struct ovpn_peer *tmp, *peer = NULL;
> +
> +	rcu_read_lock();
> +	tmp = rcu_dereference(ovpn->peer);
> +	if (likely(tmp && tmp->id == peer_id && ovpn_peer_hold(tmp)))
> +		peer = tmp;
> +	rcu_read_unlock();
> +
> +	return peer;
> +}
> +
> +/**
> + * ovpn_peer_get_by_id - retrieve peer by ID
> + * @ovpn: the openvpn instance to search
> + * @peer_id: the unique peer identifier to match
> + *
> + * Return: a pointer to the peer if found or NULL otherwise
> + */
> +struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id)
> +{
> +	struct ovpn_peer *peer = NULL;
> +
> +	if (ovpn->mode == OVPN_MODE_P2P)
> +		peer = ovpn_peer_get_by_id_p2p(ovpn, peer_id);
> +
> +	return peer;
> +}
> +
> +/**
> + * ovpn_peer_add_p2p - add peer to related tables in a P2P instance
> + * @ovpn: the instance to add the peer to
> + * @peer: the peer to add
> + *
> + * Return: 0 on success or a negative error code otherwise
> + */
> +static int ovpn_peer_add_p2p(struct ovpn_struct *ovpn, struct ovpn_peer *peer)
> +{
> +	struct ovpn_peer *tmp;
> +
> +	spin_lock_bh(&ovpn->lock);
> +	/* in p2p mode it is possible to have a single peer only, therefore the
> +	 * old one is released and substituted by the new one
> +	 */
> +	tmp = rcu_dereference_protected(ovpn->peer,
> +					lockdep_is_held(&ovpn->lock));
> +	if (tmp) {
> +		tmp->delete_reason = OVPN_DEL_PEER_REASON_TEARDOWN;
> +		ovpn_peer_put(tmp);
> +	}
> +
> +	rcu_assign_pointer(ovpn->peer, peer);

nit: the rcu_dereference_protected() + rcu_assign_pointer() calls can be 
replaced with a single rcu_replace_pointer() call.

> +	spin_unlock_bh(&ovpn->lock);
> +
> +	return 0;
> +}
> +
> +/**
> + * ovpn_peer_add - add peer to the related tables
> + * @ovpn: the openvpn instance the peer belongs to
> + * @peer: the peer object to add
> + *
> + * Assume refcounter was increased by caller
> + *
> + * Return: 0 on success or a negative error code otherwise
> + */
> +int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer)
> +{
> +	switch (ovpn->mode) {
> +	case OVPN_MODE_P2P:
> +		return ovpn_peer_add_p2p(ovpn, peer);
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +}
> +
> +/**
> + * ovpn_peer_del_p2p - delete peer from related tables in a P2P instance
> + * @peer: the peer to delete
> + * @reason: reason why the peer was deleted (sent to userspace)
> + *
> + * Return: 0 on success or a negative error code otherwise
> + */
> +static int ovpn_peer_del_p2p(struct ovpn_peer *peer,
> +			     enum ovpn_del_peer_reason reason)
> +	__must_hold(&peer->ovpn->lock)
> +{
> +	struct ovpn_peer *tmp;
> +
> +	tmp = rcu_dereference_protected(peer->ovpn->peer,
> +					lockdep_is_held(&peer->ovpn->lock));
> +	if (tmp != peer) {
> +		DEBUG_NET_WARN_ON_ONCE(1);

nit: the above two lines can be simplified to:

         if (DEBUG_NET_WARN_ON_ONCE(tmp != peer)) {

> +		if (tmp)
> +			ovpn_peer_put(tmp);
> +
> +		return -ENOENT;
> +	}
> +
> +	tmp->delete_reason = reason;
> +	RCU_INIT_POINTER(peer->ovpn->peer, NULL);
> +	ovpn_peer_put(tmp);
> +
> +	return 0;
> +}
> +
> +/**
> + * ovpn_peer_release_p2p - release peer upon P2P device teardown
> + * @ovpn: the instance being torn down
> + */
> +void ovpn_peer_release_p2p(struct ovpn_struct *ovpn)
> +{
> +	struct ovpn_peer *tmp;
> +
> +	spin_lock_bh(&ovpn->lock);
> +	tmp = rcu_dereference_protected(ovpn->peer,
> +					lockdep_is_held(&ovpn->lock));
> +	if (tmp)
> +		ovpn_peer_del_p2p(tmp, OVPN_DEL_PEER_REASON_TEARDOWN);
> +	spin_unlock_bh(&ovpn->lock);
> +}
> +
> +/**
> + * ovpn_peer_del - delete peer from related tables
> + * @peer: the peer object to delete
> + * @reason: reason for deleting peer (will be sent to userspace)
> + *
> + * Return: 0 on success or a negative error code otherwise
> + */
> +int ovpn_peer_del(struct ovpn_peer *peer, enum ovpn_del_peer_reason reason)
> +{
> +	switch (peer->ovpn->mode) {
> +	case OVPN_MODE_P2P:
> +		return ovpn_peer_del_p2p(peer, reason);
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +}
> diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..6e0c6b14559de886d0677117f5a7ae029214e1f8
> --- /dev/null
> +++ b/drivers/net/ovpn/peer.h
> @@ -0,0 +1,79 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* OpenVPN data channel offload
> + *
> + *  Copyright (C) 2020-2024 OpenVPN, Inc.
> + *
> + *  Author:	James Yonan <james@openvpn.net>
> + *		Antonio Quartulli <antonio@openvpn.net>
> + */
> +
> +#ifndef _NET_OVPN_OVPNPEER_H_
> +#define _NET_OVPN_OVPNPEER_H_
> +
> +#include <net/dst_cache.h>
> +
> +/**
> + * struct ovpn_peer - the main remote peer object
> + * @ovpn: main openvpn instance this peer belongs to
> + * @id: unique identifier
> + * @vpn_addrs: IP addresses assigned over the tunnel
> + * @vpn_addrs.ipv4: IPv4 assigned to peer on the tunnel
> + * @vpn_addrs.ipv6: IPv6 assigned to peer on the tunnel
> + * @dst_cache: cache for dst_entry used to send to peer
> + * @bind: remote peer binding
> + * @halt: true if ovpn_peer_mark_delete was called
> + * @delete_reason: why peer was deleted (i.e. timeout, transport error, ..)
> + * @lock: protects binding to peer (bind)
> + * @refcount: reference counter
> + * @rcu: used to free peer in an RCU safe way
> + * @delete_work: deferred cleanup work, used to notify userspace
> + */
> +struct ovpn_peer {
> +	struct ovpn_struct *ovpn;
> +	u32 id;
> +	struct {
> +		struct in_addr ipv4;
> +		struct in6_addr ipv6;
> +	} vpn_addrs;
> +	struct dst_cache dst_cache;
> +	struct ovpn_bind __rcu *bind;
> +	bool halt;
> +	enum ovpn_del_peer_reason delete_reason;
> +	spinlock_t lock; /* protects bind */
> +	struct kref refcount;
> +	struct rcu_head rcu;
> +	struct work_struct delete_work;
> +};
> +
> +/**
> + * ovpn_peer_hold - increase reference counter
> + * @peer: the peer whose counter should be increased
> + *
> + * Return: true if the counter was increased or false if it was zero already
> + */
> +static inline bool ovpn_peer_hold(struct ovpn_peer *peer)
> +{
> +	return kref_get_unless_zero(&peer->refcount);
> +}
> +
> +void ovpn_peer_release_kref(struct kref *kref);
> +
> +/**
> + * ovpn_peer_put - decrease reference counter
> + * @peer: the peer whose counter should be decreased
> + */
> +static inline void ovpn_peer_put(struct ovpn_peer *peer)
> +{
> +	kref_put(&peer->refcount, ovpn_peer_release_kref);
> +}
> +
> +struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id);
> +int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer);
> +int ovpn_peer_del(struct ovpn_peer *peer, enum ovpn_del_peer_reason reason);
> +void ovpn_peer_release_p2p(struct ovpn_struct *ovpn);
> +
> +struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn,
> +					       struct sk_buff *skb);
> +struct ovpn_peer *ovpn_peer_get_by_id(struct ovpn_struct *ovpn, u32 peer_id);
> +
> +#endif /* _NET_OVPN_OVPNPEER_H_ */
>

Sergey Ryazanov Nov. 10, 2024, 7:52 p.m. UTC | #15

On 29.10.2024 12:47, Antonio Quartulli wrote:

[...]

> +static void ovpn_peer_release(struct ovpn_peer *peer)
> +{
> +	ovpn_bind_reset(peer, NULL);
> +

nit: this empty line after ovpn_bind_reset() is removed in the 
'implement basic TX path (UDP)' patch. What tricks git and it produces a 
sensless diff with 'ovpn_bind_reset(...)' line beeing removed and then 
introduced again. If you do not like this empty line then remove it 
here, please :)

> +	dst_cache_destroy(&peer->dst_cache);
> +	netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker);
> +	kfree_rcu(peer, rcu);
> +}

--
Sergey

Sergey Ryazanov Nov. 10, 2024, 8:42 p.m. UTC | #16

Missed the most essential note regarding this patch :)

On 29.10.2024 12:47, Antonio Quartulli wrote:
> +static int ovpn_net_open(struct net_device *dev)
> +{
> +	netif_tx_start_all_queues(dev);
> +	return 0;
> +}
> +
> +static int ovpn_net_stop(struct net_device *dev)
> +{
> +	netif_tx_stop_all_queues(dev);

Here we stop a user generated traffic in downlink. Shall we take care 
about other kinds of traffic: keepalive, uplink?

I believe we should remove all the peers here or at least stop the 
keepalive generation. But peers removing is better since 
administratively down is administratively down, meaning user expected 
full traffic stop in any direction. And even if we only stop the 
keepalive generation then peer(s) anyway will destroy the tunnel on 
their side.

This way we even should not care about peers removing on the device 
unregistering. What do you think?

> +	return 0;
> +}

Sergey Ryazanov Nov. 11, 2024, 1:54 a.m. UTC | #17

On 29.10.2024 12:47, Antonio Quartulli wrote:
> Packets received over the socket are forwarded to the user device.
> 
> Implementation is UDP only. TCP will be added by a later patch.
> 
> Note: no decryption/decapsulation exists yet, packets are forwarded as
> they arrive without much processing.
> 
> Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
> ---
>   drivers/net/ovpn/io.c         |  66 ++++++++++++++++++++++++++-
>   drivers/net/ovpn/io.h         |   2 +
>   drivers/net/ovpn/main.c       |  13 +++++-
>   drivers/net/ovpn/ovpnstruct.h |   3 ++
>   drivers/net/ovpn/proto.h      |  75 ++++++++++++++++++++++++++++++
>   drivers/net/ovpn/socket.c     |  24 ++++++++++
>   drivers/net/ovpn/udp.c        | 104 +++++++++++++++++++++++++++++++++++++++++-
>   drivers/net/ovpn/udp.h        |   3 +-
>   8 files changed, 286 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
> index 77ba4d33ae0bd2f52e8bd1c06a182d24285297b4..791a1b117125118b179cb13cdfd5fbab6523a360 100644
> --- a/drivers/net/ovpn/io.c
> +++ b/drivers/net/ovpn/io.c
> @@ -9,15 +9,79 @@
>   
>   #include <linux/netdevice.h>
>   #include <linux/skbuff.h>
> +#include <net/gro_cells.h>
>   #include <net/gso.h>
>   
> -#include "io.h"
>   #include "ovpnstruct.h"
>   #include "peer.h"
> +#include "io.h"
> +#include "netlink.h"
> +#include "proto.h"
>   #include "udp.h"
>   #include "skb.h"
>   #include "socket.h"
>   
> +/* Called after decrypt to write the IP packet to the device.
> + * This method is expected to manage/free the skb.
> + */
> +static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff *skb)
> +{
> +	unsigned int pkt_len;
> +
> +	/* we can't guarantee the packet wasn't corrupted before entering the
> +	 * VPN, therefore we give other layers a chance to check that
> +	 */
> +	skb->ip_summed = CHECKSUM_NONE;
> +
> +	/* skb hash for transport packet no longer valid after decapsulation */
> +	skb_clear_hash(skb);
> +
> +	/* post-decrypt scrub -- prepare to inject encapsulated packet onto the
> +	 * interface, based on __skb_tunnel_rx() in dst.h
> +	 */
> +	skb->dev = peer->ovpn->dev;
> +	skb_set_queue_mapping(skb, 0);
> +	skb_scrub_packet(skb, true);
> +

The skb->protocol field is going to be updated in the upcoming patch in 
the caller (ovpn_decrypt_post). Shall we put a comment here clarifying, 
why do not touch the protocol field here?

> +	skb_reset_network_header(skb);

ovpn_decrypt_post() already reseted the network header. Why do we need 
it here again?

> +	skb_reset_transport_header(skb);
> +	skb_probe_transport_header(skb);
> +	skb_reset_inner_headers(skb);
> +
> +	memset(skb->cb, 0, sizeof(skb->cb));

Why do we need to zero the control buffer here?

> +	/* cause packet to be "received" by the interface */
> +	pkt_len = skb->len;
> +	if (likely(gro_cells_receive(&peer->ovpn->gro_cells,
> +				     skb) == NET_RX_SUCCESS))
> +		/* update RX stats with the size of decrypted packet */
> +		dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len);
> +}
> +
> +static void ovpn_decrypt_post(struct sk_buff *skb, int ret)
> +{
> +	struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer;
> +
> +	if (unlikely(ret < 0))
> +		goto drop;
> +
> +	ovpn_netdev_write(peer, skb);
> +	/* skb is passed to upper layer - don't free it */
> +	skb = NULL;
> +drop:
> +	if (unlikely(skb))
> +		dev_core_stats_rx_dropped_inc(peer->ovpn->dev);
> +	ovpn_peer_put(peer);
> +	kfree_skb(skb);
> +}
> +
> +/* pick next packet from RX queue, decrypt and forward it to the device */

The function now receives packets from externel callers. Should we 
update the above comment?

> +void ovpn_recv(struct ovpn_peer *peer, struct sk_buff *skb)
> +{
> +	ovpn_skb_cb(skb)->peer = peer;
> +	ovpn_decrypt_post(skb, 0);
> +}
> +
>   static void ovpn_encrypt_post(struct sk_buff *skb, int ret)
>   {
>   	struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer;
> diff --git a/drivers/net/ovpn/io.h b/drivers/net/ovpn/io.h
> index aa259be66441f7b0262f39da12d6c3dce0a9b24c..9667a0a470e0b4b427524fffb5b9b395007e5a2f 100644
> --- a/drivers/net/ovpn/io.h
> +++ b/drivers/net/ovpn/io.h
> @@ -12,4 +12,6 @@
>   
>   netdev_tx_t ovpn_net_xmit(struct sk_buff *skb, struct net_device *dev);
>   
> +void ovpn_recv(struct ovpn_peer *peer, struct sk_buff *skb);
> +
>   #endif /* _NET_OVPN_OVPN_H_ */
> diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
> index 5492ce07751d135c1484fe1ed8227c646df94969..73348765a8cf24321aa6be78e75f607d6dbffb1d 100644
> --- a/drivers/net/ovpn/main.c
> +++ b/drivers/net/ovpn/main.c
> @@ -11,6 +11,7 @@
>   #include <linux/module.h>
>   #include <linux/netdevice.h>
>   #include <linux/inetdevice.h>
> +#include <net/gro_cells.h>
>   #include <net/ip.h>
>   #include <net/rtnetlink.h>
>   #include <uapi/linux/if_arp.h>
> @@ -32,7 +33,16 @@ static void ovpn_struct_free(struct net_device *net)
>   
>   static int ovpn_net_init(struct net_device *dev)
>   {
> -	return 0;
> +	struct ovpn_struct *ovpn = netdev_priv(dev);
> +
> +	return gro_cells_init(&ovpn->gro_cells, dev);
> +}
> +
> +static void ovpn_net_uninit(struct net_device *dev)
> +{
> +	struct ovpn_struct *ovpn = netdev_priv(dev);
> +
> +	gro_cells_destroy(&ovpn->gro_cells);
>   }
>   
>   static int ovpn_net_open(struct net_device *dev)
> @@ -56,6 +66,7 @@ static int ovpn_net_stop(struct net_device *dev)
>   
>   static const struct net_device_ops ovpn_netdev_ops = {
>   	.ndo_init		= ovpn_net_init,
> +	.ndo_uninit		= ovpn_net_uninit,
>   	.ndo_open		= ovpn_net_open,
>   	.ndo_stop		= ovpn_net_stop,
>   	.ndo_start_xmit		= ovpn_net_xmit,
> diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ovpnstruct.h
> index a22c5083381c131db01a28c0f51e661d690d4998..4a48fc048890ab1cda78bc104fe3034b4a49d226 100644
> --- a/drivers/net/ovpn/ovpnstruct.h
> +++ b/drivers/net/ovpn/ovpnstruct.h
> @@ -10,6 +10,7 @@
>   #ifndef _NET_OVPN_OVPNSTRUCT_H_
>   #define _NET_OVPN_OVPNSTRUCT_H_
>   
> +#include <net/gro_cells.h>
>   #include <net/net_trackers.h>
>   #include <uapi/linux/if_link.h>
>   #include <uapi/linux/ovpn.h>
> @@ -23,6 +24,7 @@
>    * @lock: protect this object
>    * @peer: in P2P mode, this is the only remote peer
>    * @dev_list: entry for the module wide device list
> + * @gro_cells: pointer to the Generic Receive Offload cell
>    */
>   struct ovpn_struct {
>   	struct net_device *dev;
> @@ -32,6 +34,7 @@ struct ovpn_struct {
>   	spinlock_t lock; /* protect writing to the ovpn_struct object */
>   	struct ovpn_peer __rcu *peer;
>   	struct list_head dev_list;
> +	struct gro_cells gro_cells;
>   };
>   
>   #endif /* _NET_OVPN_OVPNSTRUCT_H_ */
> diff --git a/drivers/net/ovpn/proto.h b/drivers/net/ovpn/proto.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..69604cf26bbf82539ee5cd5a7ac9c23920f555de
> --- /dev/null
> +++ b/drivers/net/ovpn/proto.h
> @@ -0,0 +1,75 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*  OpenVPN data channel offload
> + *
> + *  Copyright (C) 2020-2024 OpenVPN, Inc.
> + *
> + *  Author:	Antonio Quartulli <antonio@openvpn.net>
> + *		James Yonan <james@openvpn.net>
> + */
> +
> +#ifndef _NET_OVPN_OVPNPROTO_H_
> +#define _NET_OVPN_OVPNPROTO_H_
> +
> +#include "main.h"
> +
> +#include <linux/skbuff.h>
> +
> +/* Methods for operating on the initial command
> + * byte of the OpenVPN protocol.
> + */
> +
> +/* packet opcode (high 5 bits) and key-id (low 3 bits) are combined in
> + * one byte
> + */
> +#define OVPN_KEY_ID_MASK 0x07
> +#define OVPN_OPCODE_SHIFT 3
> +#define OVPN_OPCODE_MASK 0x1F

Instead of defining mask(s) and shift(s), we can define only masks and 
use bitfield API (see below).

> +/* upper bounds on opcode and key ID */
> +#define OVPN_KEY_ID_MAX (OVPN_KEY_ID_MASK + 1)
> +#define OVPN_OPCODE_MAX (OVPN_OPCODE_MASK + 1)
> +/* packet opcodes of interest to us */
> +#define OVPN_DATA_V1 6 /* data channel V1 packet */
> +#define OVPN_DATA_V2 9 /* data channel V2 packet */
> +/* size of initial packet opcode */
> +#define OVPN_OP_SIZE_V1 1
> +#define OVPN_OP_SIZE_V2	4
> +#define OVPN_PEER_ID_MASK 0x00FFFFFF
> +#define OVPN_PEER_ID_UNDEF 0x00FFFFFF
> +/* first byte of keepalive message */
> +#define OVPN_KEEPALIVE_FIRST_BYTE 0x2a
> +/* first byte of exit message */
> +#define OVPN_EXPLICIT_EXIT_NOTIFY_FIRST_BYTE 0x28

 From the above list of macros, OVPN_KEY_ID_MAX, OVPN_OPCODE_MAX, 
OVPN_OP_SIZE_V1, OVPN_KEEPALIVE_FIRST_BYTE, and 
OVPN_EXPLICIT_EXIT_NOTIFY_FIRST_BYTE are unused and looks like should be 
removed.

> +/**
> + * ovpn_opcode_from_skb - extract OP code from skb at specified offset
> + * @skb: the packet to extract the OP code from
> + * @offset: the offset in the data buffer where the OP code is located
> + *
> + * Note: this function assumes that the skb head was pulled enough
> + * to access the first byte.
> + *
> + * Return: the OP code
> + */
> +static inline u8 ovpn_opcode_from_skb(const struct sk_buff *skb, u16 offset)
> +{
> +	u8 byte = *(skb->data + offset);
> +
> +	return byte >> OVPN_OPCODE_SHIFT;

For example here, the shift can be replaced with bitfield macro:

#define OVPN_OPCODE_PKTTYPE_MSK  0xf8000000
#define OVPN_OPCODE_KEYID_MSK    0x07000000
#define OVPN_OPCODE_PEERID_MSK   0x00ffffff

static inline u8 ovpn_opcode_from_skb(...)
{
     u32 opcode = be32_to_cpu(*(__be32 *)(skb->data + offset));

     return FIELD_GET(OVPN_OPCODE_PKTTYPE_MSK, opcode);
}

And the upcoming ovpn_opcode_compose() can be implemented like this:

static inline u32 ovpn_opcode_compose(u8 opcode, u8 key_id, u32 peer_id)
{
     return FIELD_PREP(OVPN_OPCODE_PKTTYPE_MSK, opcode) |
            FIELD_PREP(OVPN_OPCODE_KEYID_MSK, key_id) |
            FIELD_PREP(OVPN_OPCODE_PEERID_MSK, peer_id);
}

And with this size can be even embedded into ovpn_aead_encrypt() to make 
the header composing more clear.

> +}
> +
> +/**
> + * ovpn_peer_id_from_skb - extract peer ID from skb at specified offset
> + * @skb: the packet to extract the OP code from
> + * @offset: the offset in the data buffer where the OP code is located
> + *
> + * Note: this function assumes that the skb head was pulled enough
> + * to access the first 4 bytes.
> + *
> + * Return: the peer ID.
> + */
> +static inline u32 ovpn_peer_id_from_skb(const struct sk_buff *skb, u16 offset)
> +{
> +	return ntohl(*(__be32 *)(skb->data + offset)) & OVPN_PEER_ID_MASK;
> +}
> +
> +#endif /* _NET_OVPN_OVPNPROTO_H_ */
> diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c
> index 090a3232ab0ec19702110f1a90f45c7f10889f6f..964b566de69f4132806a969a455cec7f6059a0bd 100644
> --- a/drivers/net/ovpn/socket.c
> +++ b/drivers/net/ovpn/socket.c
> @@ -22,6 +22,9 @@ static void ovpn_socket_detach(struct socket *sock)
>   	if (!sock)
>   		return;
>   
> +	if (sock->sk->sk_protocol == IPPROTO_UDP)
> +		ovpn_udp_socket_detach(sock);
> +
>   	sockfd_put(sock);
>   }
>   
> @@ -71,6 +74,27 @@ static int ovpn_socket_attach(struct socket *sock, struct ovpn_peer *peer)
>   	return ret;
>   }
>   
> +/* Retrieve the corresponding ovpn object from a UDP socket
> + * rcu_read_lock must be held on entry
> + */
> +struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk)
> +{
> +	struct ovpn_socket *ovpn_sock;
> +
> +	if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) != UDP_ENCAP_OVPNINUDP))
> +		return NULL;
> +
> +	ovpn_sock = rcu_dereference_sk_user_data(sk);
> +	if (unlikely(!ovpn_sock))
> +		return NULL;
> +
> +	/* make sure that sk matches our stored transport socket */
> +	if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk))
> +		return NULL;
> +
> +	return ovpn_sock->ovpn;

Now, returning of this pointer is safe. But the following TCP transport 
support calls the socket release via a scheduled work. What extends 
socket lifetime and makes it possible to receive a UDP packet way after 
the interface private data release. Is it correct assumption?

If the above is right then shall we set ->ovpn = NULL before scheduling 
the socket releasing work or somehow else mark the socket as half-destroyed?

> +}
> +
>   /**
>    * ovpn_socket_new - create a new socket and initialize it
>    * @sock: the kernel socket to embed
> diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c
> index d26d7566e9c8dfe91fa77f49c34fb179a9fb2239..d1e88ae83843f02d591e67a7995f2d6868720695 100644
> --- a/drivers/net/ovpn/udp.c
> +++ b/drivers/net/ovpn/udp.c
> @@ -21,9 +21,95 @@
>   #include "bind.h"
>   #include "io.h"
>   #include "peer.h"
> +#include "proto.h"
>   #include "socket.h"
>   #include "udp.h"
>   
> +/**
> + * ovpn_udp_encap_recv - Start processing a received UDP packet.
> + * @sk: socket over which the packet was received
> + * @skb: the received packet
> + *
> + * If the first byte of the payload is DATA_V2, the packet is further processed,
> + * otherwise it is forwarded to the UDP stack for delivery to user space.
> + *
> + * Return:
> + *  0 if skb was consumed or dropped
> + * >0 if skb should be passed up to userspace as UDP (packet not consumed)
> + * <0 if skb should be resubmitted as proto -N (packet not consumed)
> + */
> +static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
> +{
> +	struct ovpn_peer *peer = NULL;
> +	struct ovpn_struct *ovpn;
> +	u32 peer_id;
> +	u8 opcode;
> +
> +	ovpn = ovpn_from_udp_sock(sk);
> +	if (unlikely(!ovpn)) {
> +		net_err_ratelimited("%s: cannot obtain ovpn object from UDP socket\n",
> +				    __func__);

Probably we should zero ovpn pointer in the ovpn_sock to survive 
scheduled socket release (see comment in ovpn_from_udp_sock). So, this 
print should be removed to avoid printing misguiding errors.

> +		goto drop_noovpn;
> +	}
> +
> +	/* Make sure the first 4 bytes of the skb data buffer after the UDP
> +	 * header are accessible.
> +	 * They are required to fetch the OP code, the key ID and the peer ID.
> +	 */
> +	if (unlikely(!pskb_may_pull(skb, sizeof(struct udphdr) +
> +				    OVPN_OP_SIZE_V2))) {
> +		net_dbg_ratelimited("%s: packet too small\n", __func__);
> +		goto drop;
> +	}
> +
> +	opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr));
> +	if (unlikely(opcode != OVPN_DATA_V2)) {
> +		/* DATA_V1 is not supported */
> +		if (opcode == OVPN_DATA_V1)
> +			goto drop;

This packet dropping makes protocol accelerator, intendent to speed up 
the data packets processing, a protocol enforcement entity, isn't it? 
Shall we follow the principle of beeing liberal in what we accept and 
just forward everything besides data packets upstream to a userspace 
application?

> +
> +		/* unknown or control packet: let it bubble up to userspace */
> +		return 1;
> +	}
> +
> +	peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr));
> +	/* some OpenVPN server implementations send data packets with the
> +	 * peer-id set to undef. In this case we skip the peer lookup by peer-id
> +	 * and we try with the transport address
> +	 */
> +	if (peer_id != OVPN_PEER_ID_UNDEF) {
> +		peer = ovpn_peer_get_by_id(ovpn, peer_id);
> +		if (!peer) {
> +			net_err_ratelimited("%s: received data from unknown peer (id: %d)\n",
> +					    __func__, peer_id);

Why do we consider a peer sending us garbage our problem? Meaning, this 
peer miss can be not our fault but a malformed packet from a 3rd party 
side. E.g. nowdays I can see a lot of traces of these "active probers" 
in my OpenVPN logs. Shall remove this message or at least make it debug 
to avoid bothering users with garbage traveling Internet? Anyway we can 
not do anything regarding incoming traffic.

> +			goto drop;
> +		}
> +	}
> +
> +	if (!peer) {

AFAIU, this condition can true only in case of peer_id beeing equal to 
OVPN_PEER_ID_UNDEF, right? In this case the condition check can be 
replaced by simple 'else' statement.

And to make code more corresponding to the above comment regarding 
implementations that send undefined peer-id, can we swap sides of the 
lookup method selection? E.g.

/* Comment about fancy implementations sending undefined peer-id */
if (peer_id == OVPN_PEER_ID_UNDEF) {
   /* Do transport address based loockup */
} else {
   /* Do peer-id based loockup */
}

> +		/* data packet with undef peer-id */
> +		peer = ovpn_peer_get_by_transp_addr(ovpn, skb);
> +		if (unlikely(!peer)) {
> +			net_dbg_ratelimited("%s: received data with undef peer-id from unknown source\n",
> +					    __func__);
> +			goto drop;
> +		}
> +	}
> +
> +	/* pop off outer UDP header */
> +	__skb_pull(skb, sizeof(struct udphdr));
> +	ovpn_recv(peer, skb);
> +	return 0;
> +
> +drop:
> +	if (peer)
> +		ovpn_peer_put(peer);

AFAIU, the peer is alway NULL here. Shall we remove the above check?

> +	dev_core_stats_rx_dropped_inc(ovpn->dev);
> +drop_noovpn:
> +	kfree_skb(skb);
> +	return 0;
> +}
> +
>   /**
>    * ovpn_udp4_output - send IPv4 packet over udp socket
>    * @ovpn: the openvpn instance
> @@ -259,8 +345,12 @@ void ovpn_udp_send_skb(struct ovpn_struct *ovpn, struct ovpn_peer *peer,
>    */
>   int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn)
>   {
> +	struct udp_tunnel_sock_cfg cfg = {
> +		.encap_type = UDP_ENCAP_OVPNINUDP,
> +		.encap_rcv = ovpn_udp_encap_recv,
> +	};
>   	struct ovpn_socket *old_data;
> -	int ret = 0;
> +	int ret;
>   
>   	/* sanity check */
>   	if (sock->sk->sk_protocol != IPPROTO_UDP) {
> @@ -274,6 +364,7 @@ int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn)
>   	if (!old_data) {
>   		/* socket is currently unused - we can take it */
>   		rcu_read_unlock();
> +		setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg);
>   		return 0;
>   	}
>   
> @@ -302,3 +393,14 @@ int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn)
>   
>   	return ret;
>   }
> +
> +/**
> + * ovpn_udp_socket_detach - clean udp-tunnel status for this socket
> + * @sock: the socket to clean
> + */
> +void ovpn_udp_socket_detach(struct socket *sock)
> +{
> +	struct udp_tunnel_sock_cfg cfg = { };
> +
> +	setup_udp_tunnel_sock(sock_net(sock->sk), sock, &cfg);
> +}
> diff --git a/drivers/net/ovpn/udp.h b/drivers/net/ovpn/udp.h
> index e60f8cd2b4ac8f910aabcf8ed546af59d6ca4be4..fecb68464896bc1228315faf268453f9005e693d 100644
> --- a/drivers/net/ovpn/udp.h
> +++ b/drivers/net/ovpn/udp.h
> @@ -18,8 +18,9 @@ struct sk_buff;
>   struct socket;
>   
>   int ovpn_udp_socket_attach(struct socket *sock, struct ovpn_struct *ovpn);
> -
> +void ovpn_udp_socket_detach(struct socket *sock);
>   void ovpn_udp_send_skb(struct ovpn_struct *ovpn, struct ovpn_peer *peer,
>   		       struct sk_buff *skb);
> +struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk);
>   
>   #endif /* _NET_OVPN_UDP_H_ */
>

Sabrina Dubroca Nov. 11, 2024, 3:41 p.m. UTC | #18

2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote:
> +static int ovpn_nl_peer_modify(struct ovpn_peer *peer, struct genl_info *info,
> +			       struct nlattr **attrs)
> +{
> +	struct sockaddr_storage ss = {};
> +	u32 sockfd, interv, timeout;
> +	struct socket *sock = NULL;
> +	u8 *local_ip = NULL;
> +	bool rehash = false;
> +	int ret;
> +
> +	if (attrs[OVPN_A_PEER_SOCKET]) {
> +		/* lookup the fd in the kernel table and extract the socket
> +		 * object
> +		 */
> +		sockfd = nla_get_u32(attrs[OVPN_A_PEER_SOCKET]);
> +		/* sockfd_lookup() increases sock's refcounter */
> +		sock = sockfd_lookup(sockfd, &ret);
> +		if (!sock) {
> +			NL_SET_ERR_MSG_FMT_MOD(info->extack,
> +					       "cannot lookup peer socket (fd=%u): %d",
> +					       sockfd, ret);
> +			return -ENOTSOCK;
> +		}
> +
> +		/* Only when using UDP as transport protocol the remote endpoint
> +		 * can be configured so that ovpn knows where to send packets
> +		 * to.
> +		 *
> +		 * In case of TCP, the socket is connected to the peer and ovpn
> +		 * will just send bytes over it, without the need to specify a
> +		 * destination.
> +		 */
> +		if (sock->sk->sk_protocol != IPPROTO_UDP &&
> +		    (attrs[OVPN_A_PEER_REMOTE_IPV4] ||
> +		     attrs[OVPN_A_PEER_REMOTE_IPV6])) {
> +			NL_SET_ERR_MSG_FMT_MOD(info->extack,
> +					       "unexpected remote IP address for non UDP socket");
> +			sockfd_put(sock);
> +			return -EINVAL;
> +		}
> +
> +		if (peer->sock)
> +			ovpn_socket_put(peer->sock);
> +
> +		peer->sock = ovpn_socket_new(sock, peer);

I don't see anything preventing concurrent updates of peer->sock. I
think peer->lock should be taken from the start of
ovpn_nl_peer_modify. Concurrent changes to peer->vpn_addrs and
peer->keepalive_* are also not prevented with the current code.


> +		if (IS_ERR(peer->sock)) {
> +			NL_SET_ERR_MSG_FMT_MOD(info->extack,
> +					       "cannot encapsulate socket: %ld",
> +					       PTR_ERR(peer->sock));
> +			sockfd_put(sock);
> +			peer->sock = NULL;
> +			return -ENOTSOCK;
> +		}
> +	}
> +
> +	if (ovpn_nl_attr_sockaddr_remote(attrs, &ss) != AF_UNSPEC) {
> +		/* we carry the local IP in a generic container.
> +		 * ovpn_peer_reset_sockaddr() will properly interpret it
> +		 * based on ss.ss_family
> +		 */
> +		local_ip = ovpn_nl_attr_local_ip(attrs);
> +
> +		spin_lock_bh(&peer->lock);
> +		/* set peer sockaddr */
> +		ret = ovpn_peer_reset_sockaddr(peer, &ss, local_ip);
> +		if (ret < 0) {
> +			NL_SET_ERR_MSG_FMT_MOD(info->extack,
> +					       "cannot set peer sockaddr: %d",
> +					       ret);
> +			spin_unlock_bh(&peer->lock);
> +			return ret;
> +		}
> +		spin_unlock_bh(&peer->lock);
> +	}
> +
> +	if (attrs[OVPN_A_PEER_VPN_IPV4]) {
> +		rehash = true;
> +		peer->vpn_addrs.ipv4.s_addr =
> +			nla_get_in_addr(attrs[OVPN_A_PEER_VPN_IPV4]);
> +	}
> +
> +	if (attrs[OVPN_A_PEER_VPN_IPV6]) {
> +		rehash = true;
> +		peer->vpn_addrs.ipv6 =
> +			nla_get_in6_addr(attrs[OVPN_A_PEER_VPN_IPV6]);
> +	}
> +
> +	/* when setting the keepalive, both parameters have to be configured */
> +	if (attrs[OVPN_A_PEER_KEEPALIVE_INTERVAL] &&
> +	    attrs[OVPN_A_PEER_KEEPALIVE_TIMEOUT]) {
> +		interv = nla_get_u32(attrs[OVPN_A_PEER_KEEPALIVE_INTERVAL]);
> +		timeout = nla_get_u32(attrs[OVPN_A_PEER_KEEPALIVE_TIMEOUT]);
> +		ovpn_peer_keepalive_set(peer, interv, timeout);
> +	}
> +
> +	netdev_dbg(peer->ovpn->dev,
> +		   "%s: peer id=%u endpoint=%pIScp/%s VPN-IPv4=%pI4 VPN-IPv6=%pI6c\n",
> +		   __func__, peer->id, &ss,
> +		   peer->sock->sock->sk->sk_prot_creator->name,
> +		   &peer->vpn_addrs.ipv4.s_addr, &peer->vpn_addrs.ipv6);
> +
> +	return rehash ? 1 : 0;
> +}
> +

[...]
> +void ovpn_peer_hash_vpn_ip(struct ovpn_peer *peer)
> +	__must_hold(&peer->ovpn->peers->lock)

Changes to peer->vpn_addrs are not protected by peers->lock, so those
could be getting updated while we're rehashing (and taking peer->lock
in ovpn_nl_peer_modify as I'm suggesting above also wouldn't prevent
that).

> +{
> +	struct hlist_nulls_head *nhead;
> +
> +	if (peer->vpn_addrs.ipv4.s_addr != htonl(INADDR_ANY)) {
> +		/* remove potential old hashing */
> +		hlist_nulls_del_init_rcu(&peer->hash_entry_transp_addr);
> +
> +		nhead = ovpn_get_hash_head(peer->ovpn->peers->by_vpn_addr,
> +					   &peer->vpn_addrs.ipv4,
> +					   sizeof(peer->vpn_addrs.ipv4));
> +		hlist_nulls_add_head_rcu(&peer->hash_entry_addr4, nhead);
> +	}
> +
> +	if (!ipv6_addr_any(&peer->vpn_addrs.ipv6)) {
> +		/* remove potential old hashing */
> +		hlist_nulls_del_init_rcu(&peer->hash_entry_transp_addr);
> +
> +		nhead = ovpn_get_hash_head(peer->ovpn->peers->by_vpn_addr,
> +					   &peer->vpn_addrs.ipv6,
> +					   sizeof(peer->vpn_addrs.ipv6));
> +		hlist_nulls_add_head_rcu(&peer->hash_entry_addr6, nhead);
> +	}
> +}

Sergey Ryazanov Nov. 12, 2024, 12:16 a.m. UTC | #19

On 29.10.2024 12:47, Antonio Quartulli wrote:
> +static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff *skb)
> +{
> +	unsigned int pkt_len;
> +
> +	/* we can't guarantee the packet wasn't corrupted before entering the
> +	 * VPN, therefore we give other layers a chance to check that
> +	 */
> +	skb->ip_summed = CHECKSUM_NONE;
> +
> +	/* skb hash for transport packet no longer valid after decapsulation */
> +	skb_clear_hash(skb);
> +
> +	/* post-decrypt scrub -- prepare to inject encapsulated packet onto the
> +	 * interface, based on __skb_tunnel_rx() in dst.h
> +	 */
> +	skb->dev = peer->ovpn->dev;
> +	skb_set_queue_mapping(skb, 0);
> +	skb_scrub_packet(skb, true);
> +
> +	skb_reset_network_header(skb);
> +	skb_reset_transport_header(skb);
> +	skb_probe_transport_header(skb);
> +	skb_reset_inner_headers(skb);
> +
> +	memset(skb->cb, 0, sizeof(skb->cb));
> +
> +	/* cause packet to be "received" by the interface */
> +	pkt_len = skb->len;
> +	if (likely(gro_cells_receive(&peer->ovpn->gro_cells,
> +				     skb) == NET_RX_SUCCESS))

nit: to improve readability, the packet delivery call can be composed 
like this:

       pkt_len = skb->len;
       res = gro_cells_receive(&peer->ovpn->gro_cells, skb);
       if (likely(res == NET_RX_SUCCESS))

> +		/* update RX stats with the size of decrypted packet */
> +		dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len);
> +}

Sergey Ryazanov Nov. 12, 2024, 1:18 a.m. UTC | #20

On 04.11.2024 13:26, Sabrina Dubroca wrote:
> 2024-10-29, 11:47:27 +0100, Antonio Quartulli wrote:
>>   struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn,
>>   					       struct sk_buff *skb)
>>   {
>> -	struct ovpn_peer *peer = NULL;
>> +	struct ovpn_peer *tmp, *peer = NULL;
>>   	struct sockaddr_storage ss = { 0 };
>> +	struct hlist_nulls_head *nhead;
>> +	struct hlist_nulls_node *ntmp;
>> +	size_t sa_len;
>>   
>>   	if (unlikely(!ovpn_peer_skb_to_sockaddr(skb, &ss)))
>>   		return NULL;
>>   
>>   	if (ovpn->mode == OVPN_MODE_P2P)
>> -		peer = ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss);
>> +		return ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss);
>> +
>> +	switch (ss.ss_family) {
>> +	case AF_INET:
>> +		sa_len = sizeof(struct sockaddr_in);
>> +		break;
>> +	case AF_INET6:
>> +		sa_len = sizeof(struct sockaddr_in6);
>> +		break;
>> +	default:
>> +		return NULL;
>> +	}
> 
> You could get rid of that switch by having ovpn_peer_skb_to_sockaddr
> also set sa_len (or return 0/the size).
> 
>> +
>> +	nhead = ovpn_get_hash_head(ovpn->peers->by_transp_addr, &ss, sa_len);
>> +
>> +	rcu_read_lock();
>> +	hlist_nulls_for_each_entry_rcu(tmp, ntmp, nhead,
>> +				       hash_entry_transp_addr) {
> 
> I think that's missing the retry in case we ended up in the wrong
> bucket due to a peer rehash?

Nice catch! I am also wondering why the 'nulls' variant was selected, 
but there are no nulls value verification with the search respin.

Since we started discussing the list API, why the 'nulls' variant is 
used for address hash tables and the normal variant is used for the 
peer-id lookup?

> 
>> +		if (!ovpn_peer_transp_match(tmp, &ss))
>> +			continue;
>> +
>> +		if (!ovpn_peer_hold(tmp))
>> +			continue;
>> +
>> +		peer = tmp;
>> +		break;
>> +	}
>> +	rcu_read_unlock();
>>   
>>   	return peer;
>>   }

--
Sergey

Antonio Quartulli Nov. 12, 2024, 10:12 a.m. UTC | #21

On 05/11/2024 14:12, Sabrina Dubroca wrote:
> 2024-10-30, 21:47:58 +0100, Antonio Quartulli wrote:
>> On 30/10/2024 17:37, Sabrina Dubroca wrote:
>>> 2024-10-29, 11:47:19 +0100, Antonio Quartulli wrote:
>>>> +static void ovpn_peer_release(struct ovpn_peer *peer)
>>>> +{
>>>> +	ovpn_bind_reset(peer, NULL);
>>>> +
>>>> +	dst_cache_destroy(&peer->dst_cache);
>>>
>>> Is it safe to destroy the cache at this time? In the same function, we
>>> use rcu to free the peer, but AFAICT the dst_cache will be freed
>>> immediately:
>>>
>>> void dst_cache_destroy(struct dst_cache *dst_cache)
>>> {
>>> [...]
>>> 	free_percpu(dst_cache->cache);
>>> }
>>>
>>> (probably no real issue because ovpn_udp_send_skb gets called while we
>>> hold a reference to the peer?)
>>
>> Right.
>> That was my assumption: release happens on refcnt = 0 only, therefore no
>> field should be in use anymore.
>> Anything that may still be in use will have its own refcounter.
> 
> My worry is that code changes over time, assumptions are forgotten,
> and we end up with code that was a bit odd but safe not being safe
> anymore.

Yeah, makes sense.
I'll move the call to dst_cache_destroy() and to kfree(peer) in a RCU 
callback.

Thanks!

> 
>>>
>>>> +	netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker);
>>>> +	kfree_rcu(peer, rcu);
>>>> +}
>>>
>>>
>>> [...]
>>>> +static int ovpn_peer_del_p2p(struct ovpn_peer *peer,
>>>> +			     enum ovpn_del_peer_reason reason)
>>>> +	__must_hold(&peer->ovpn->lock)
>>>> +{
>>>> +	struct ovpn_peer *tmp;
>>>> +
>>>> +	tmp = rcu_dereference_protected(peer->ovpn->peer,
>>>> +					lockdep_is_held(&peer->ovpn->lock));
>>>> +	if (tmp != peer) {
>>>> +		DEBUG_NET_WARN_ON_ONCE(1);
>>>> +		if (tmp)
>>>> +			ovpn_peer_put(tmp);
>>>
>>> Does peer->ovpn->peer need to be set to NULL here as well? Or is it
>>> going to survive this _put?
>>
>> First of all consider that this is truly something that we don't expect to
>> happen (hence the WARN_ON).
>> If this is happening it's because we are trying to delete a peer that is not
>> the one we are connected to (unexplainable scenario in p2p mode).
>>
>> Still, should we hit this case (I truly can't see how), I'd say "leave
>> everything as is - maybe this call was just a mistake".
> 
> Yeah, true, let's leave it. Thanks.
>

Sabrina Dubroca Nov. 12, 2024, 10:56 a.m. UTC | #22

2024-10-29, 11:47:30 +0100, Antonio Quartulli wrote:
> diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
> index 63c140138bf98e5d1df79a2565b666d86513323d..0e8a6f2c76bc7b2ccc287ad1187cf50f033bf261 100644
> --- a/drivers/net/ovpn/io.c
> +++ b/drivers/net/ovpn/io.c
> @@ -135,6 +135,15 @@ void ovpn_decrypt_post(void *data, int ret)
>  	/* keep track of last received authenticated packet for keepalive */
>  	peer->last_recv = ktime_get_real_seconds();
>  
> +	if (peer->sock->sock->sk->sk_protocol == IPPROTO_UDP) {

What prevents peer->sock from being replaced and released
concurrently?

Or possibly reading the error value that ovpn_socket_new can return
before peer->sock is reset to NULL, just noticed this in
ovpn_nl_peer_modify:

	if (attrs[OVPN_A_PEER_SOCKET]) {
		// ...
		peer->sock = ovpn_socket_new(sock, peer);
		if (IS_ERR(peer->sock)) {
			// ...
			peer->sock = NULL;


(ovpn_encrypt_post has a similar check on
peer->sock->sock->sk->sk_protocol that I don't think is safe either)


> +		/* check if this peer changed it's IP address and update
> +		 * state
> +		 */
> +		ovpn_peer_float(peer, skb);
> +		/* update source endpoint for this peer */
> +		ovpn_peer_update_local_endpoint(peer, skb);

Why not do both in the same function? They're not called anywhere else
(at least in this version of the series). They both modify peer->bind
depending on skb_protocol_to_family(skb), and operate under
peer->lock.


> +void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb)
> +{
> +	struct hlist_nulls_head *nhead;
> +	struct sockaddr_storage ss;
> +	const u8 *local_ip = NULL;
> +	struct sockaddr_in6 *sa6;
> +	struct sockaddr_in *sa;
> +	struct ovpn_bind *bind;
> +	sa_family_t family;
> +	size_t salen;
> +
> +	rcu_read_lock();
> +	bind = rcu_dereference(peer->bind);
> +	if (unlikely(!bind)) {
> +		rcu_read_unlock();
> +		return;
> +	}
> +
> +	spin_lock_bh(&peer->lock);

You could take the lock from the start, instead of using rcu_read_lock
to get peer->bind. It would guarantee that the bind we got isn't
already being replaced just as we wait to update it. And same in
ovpn_peer_update_local_endpoint, it would make sure we're updating the
local IP for the active bind.

(sorry I didn't think about that last time we discussed this)

> +	if (likely(ovpn_bind_skb_src_match(bind, skb)))
> +		goto unlock;
> +
> +	family = skb_protocol_to_family(skb);
> +

Antonio Quartulli Nov. 12, 2024, 12:32 p.m. UTC | #23

On 12/11/2024 02:18, Sergey Ryazanov wrote:
> On 04.11.2024 13:26, Sabrina Dubroca wrote:
>> 2024-10-29, 11:47:27 +0100, Antonio Quartulli wrote:
>>>   struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct 
>>> *ovpn,
>>>                              struct sk_buff *skb)
>>>   {
>>> -    struct ovpn_peer *peer = NULL;
>>> +    struct ovpn_peer *tmp, *peer = NULL;
>>>       struct sockaddr_storage ss = { 0 };
>>> +    struct hlist_nulls_head *nhead;
>>> +    struct hlist_nulls_node *ntmp;
>>> +    size_t sa_len;
>>>       if (unlikely(!ovpn_peer_skb_to_sockaddr(skb, &ss)))
>>>           return NULL;
>>>       if (ovpn->mode == OVPN_MODE_P2P)
>>> -        peer = ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss);
>>> +        return ovpn_peer_get_by_transp_addr_p2p(ovpn, &ss);
>>> +
>>> +    switch (ss.ss_family) {
>>> +    case AF_INET:
>>> +        sa_len = sizeof(struct sockaddr_in);
>>> +        break;
>>> +    case AF_INET6:
>>> +        sa_len = sizeof(struct sockaddr_in6);
>>> +        break;
>>> +    default:
>>> +        return NULL;
>>> +    }
>>
>> You could get rid of that switch by having ovpn_peer_skb_to_sockaddr
>> also set sa_len (or return 0/the size).

Yeah, makes sense. Thanks!

>>
>>> +
>>> +    nhead = ovpn_get_hash_head(ovpn->peers->by_transp_addr, &ss, 
>>> sa_len);
>>> +
>>> +    rcu_read_lock();
>>> +    hlist_nulls_for_each_entry_rcu(tmp, ntmp, nhead,
>>> +                       hash_entry_transp_addr) {
>>
>> I think that's missing the retry in case we ended up in the wrong
>> bucket due to a peer rehash?

Oh, for some reason I convinced myself that this is handled behind the 
scene, but indeed the lookup must be explicitly restarted.

will fix it, thanks for pointing this out!


> 
> Nice catch! I am also wondering why the 'nulls' variant was selected, 
> but there are no nulls value verification with the search respin.
> 
> Since we started discussing the list API, why the 'nulls' variant is 
> used for address hash tables and the normal variant is used for the 
> peer-id lookup?

Because the nulls variant is used only for tables where a re-hash can 
happen.

The peer-id table does not expect its objected to be re-used or 
re-hashed since the ID of a peer cannot change throughout its lifetime.


Regards,


> 
>>
>>> +        if (!ovpn_peer_transp_match(tmp, &ss))
>>> +            continue;
>>> +
>>> +        if (!ovpn_peer_hold(tmp))
>>> +            continue;
>>> +
>>> +        peer = tmp;
>>> +        break;
>>> +    }
>>> +    rcu_read_unlock();
>>>       return peer;
>>>   }
> 
> -- 
> Sergey
>

Antonio Quartulli Nov. 12, 2024, 1:52 p.m. UTC | #24

On 04/11/2024 12:24, Sabrina Dubroca wrote:
> 2024-10-29, 11:47:30 +0100, Antonio Quartulli wrote:
>> +static int ovpn_peer_reset_sockaddr(struct ovpn_peer *peer,
>> +				    const struct sockaddr_storage *ss,
>> +				    const u8 *local_ip)
>> +	__must_hold(&peer->lock)
>> +{
>> +	struct ovpn_bind *bind;
>> +	size_t ip_len;
>> +
>> +	/* create new ovpn_bind object */
>> +	bind = ovpn_bind_from_sockaddr(ss);
>> +	if (IS_ERR(bind))
>> +		return PTR_ERR(bind);
>> +
>> +	if (local_ip) {
>> +		if (ss->ss_family == AF_INET) {
>> +			ip_len = sizeof(struct in_addr);
>> +		} else if (ss->ss_family == AF_INET6) {
>> +			ip_len = sizeof(struct in6_addr);
>> +		} else {
>> +			netdev_dbg(peer->ovpn->dev, "%s: invalid family for remote endpoint\n",
>> +				   __func__);
> 
> ratelimited since that can be triggered from packet processing?

ACK

> 
> 
> [...]
>> +void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb)
>> +{
> [...]
>> +
>> +	switch (family) {
>> +	case AF_INET:
>> +		sa = (struct sockaddr_in *)&ss;
>> +		sa->sin_family = AF_INET;
>> +		sa->sin_addr.s_addr = ip_hdr(skb)->saddr;
>> +		sa->sin_port = udp_hdr(skb)->source;
>> +		salen = sizeof(*sa);
>> +		break;
>> +	case AF_INET6:
>> +		sa6 = (struct sockaddr_in6 *)&ss;
>> +		sa6->sin6_family = AF_INET6;
>> +		sa6->sin6_addr = ipv6_hdr(skb)->saddr;
>> +		sa6->sin6_port = udp_hdr(skb)->source;
>> +		sa6->sin6_scope_id = ipv6_iface_scope_id(&ipv6_hdr(skb)->saddr,
>> +							 skb->skb_iif);
>> +		salen = sizeof(*sa6);
>> +		break;
>> +	default:
>> +		goto unlock;
>> +	}
>> +
>> +	netdev_dbg(peer->ovpn->dev, "%s: peer %d floated to %pIScp", __func__,
> 
>                                                %u for peer->id?
> 
> and ratelimited too, probably.
> 
> (also in ovpn_peer_update_local_endpoint in the previous patch)

Technically we don't expect that frequent float/endpoint updates, but 
should they happen..better to be protected.

ACK

> 
>> +		   peer->id, &ss);
>> +	ovpn_peer_reset_sockaddr(peer, (struct sockaddr_storage *)&ss,
>> +				 local_ip);
> 
> skip the rehash if this fails? peer->bind will still be the old one so
> moving it to the new hash chain won't help (the lookup will fail).

Yeah, it makes sense.

Thanks a lot.
Regards,

Antonio Quartulli Nov. 12, 2024, 2:03 p.m. UTC | #25

On 12/11/2024 11:56, Sabrina Dubroca wrote:
> 2024-10-29, 11:47:30 +0100, Antonio Quartulli wrote:
>> diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
>> index 63c140138bf98e5d1df79a2565b666d86513323d..0e8a6f2c76bc7b2ccc287ad1187cf50f033bf261 100644
>> --- a/drivers/net/ovpn/io.c
>> +++ b/drivers/net/ovpn/io.c
>> @@ -135,6 +135,15 @@ void ovpn_decrypt_post(void *data, int ret)
>>   	/* keep track of last received authenticated packet for keepalive */
>>   	peer->last_recv = ktime_get_real_seconds();
>>   
>> +	if (peer->sock->sock->sk->sk_protocol == IPPROTO_UDP) {
> 
> What prevents peer->sock from being replaced and released
> concurrently?

Technically nothing.
Userspace currently does not even support updating a peer socket at 
runtime, but I wanted ovpn to be flexible enough from the beginning.

One approach might be to go back to peer->sock being unmutable and 
forget about this.

OTOH, if we want to keep this flexibility (which I think is nice), I 
think I should make peer->sock an RCU pointer and access it accordingly.
Does it make sense?

> 
> Or possibly reading the error value that ovpn_socket_new can return
> before peer->sock is reset to NULL, just noticed this in
> ovpn_nl_peer_modify:
> 
> 	if (attrs[OVPN_A_PEER_SOCKET]) {
> 		// ...
> 		peer->sock = ovpn_socket_new(sock, peer);
> 		if (IS_ERR(peer->sock)) {
> 			// ...
> 			peer->sock = NULL;
> 
> 
> (ovpn_encrypt_post has a similar check on
> peer->sock->sock->sk->sk_protocol that I don't think is safe either)

Yap, agreed.

> 
> 
>> +		/* check if this peer changed it's IP address and update
>> +		 * state
>> +		 */
>> +		ovpn_peer_float(peer, skb);
>> +		/* update source endpoint for this peer */
>> +		ovpn_peer_update_local_endpoint(peer, skb);
> 
> Why not do both in the same function? They're not called anywhere else
> (at least in this version of the series). They both modify peer->bind
> depending on skb_protocol_to_family(skb), and operate under
> peer->lock.

I never considered to do so as I just always assumed the two to be two 
separate features/routines.

I think it's a good idea and I would get rid of a few common 
instructions (along with acquiring the lock twice). Thanks!

> 
> 
>> +void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb)
>> +{
>> +	struct hlist_nulls_head *nhead;
>> +	struct sockaddr_storage ss;
>> +	const u8 *local_ip = NULL;
>> +	struct sockaddr_in6 *sa6;
>> +	struct sockaddr_in *sa;
>> +	struct ovpn_bind *bind;
>> +	sa_family_t family;
>> +	size_t salen;
>> +
>> +	rcu_read_lock();
>> +	bind = rcu_dereference(peer->bind);
>> +	if (unlikely(!bind)) {
>> +		rcu_read_unlock();
>> +		return;
>> +	}
>> +
>> +	spin_lock_bh(&peer->lock);
> 
> You could take the lock from the start, instead of using rcu_read_lock
> to get peer->bind. It would guarantee that the bind we got isn't
> already being replaced just as we wait to update it. And same in
> ovpn_peer_update_local_endpoint, it would make sure we're updating the
> local IP for the active bind.
> 
> (sorry I didn't think about that last time we discussed this)

no worries :) and I like the idea. will do that, thanks.

> 
>> +	if (likely(ovpn_bind_skb_src_match(bind, skb)))
>> +		goto unlock;
>> +
>> +	family = skb_protocol_to_family(skb);
>> +
>

Antonio Quartulli Nov. 12, 2024, 2:19 p.m. UTC | #26

On 04/11/2024 16:14, Sabrina Dubroca wrote:
> 2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote:
>> +static int ovpn_nl_peer_precheck(struct ovpn_struct *ovpn,
>> +				 struct genl_info *info,
>> +				 struct nlattr **attrs)
>> +{
>> +	if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs,
>> +			      OVPN_A_PEER_ID))
>> +		return -EINVAL;
>> +
>> +	if (attrs[OVPN_A_PEER_REMOTE_IPV4] && attrs[OVPN_A_PEER_REMOTE_IPV6]) {
>> +		NL_SET_ERR_MSG_MOD(info->extack,
>> +				   "cannot specify both remote IPv4 or IPv6 address");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
>> +	    !attrs[OVPN_A_PEER_REMOTE_IPV6] && attrs[OVPN_A_PEER_REMOTE_PORT]) {
>> +		NL_SET_ERR_MSG_MOD(info->extack,
>> +				   "cannot specify remote port without IP address");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
>> +	    attrs[OVPN_A_PEER_LOCAL_IPV4]) {
>> +		NL_SET_ERR_MSG_MOD(info->extack,
>> +				   "cannot specify local IPv4 address without remote");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (!attrs[OVPN_A_PEER_REMOTE_IPV6] &&
>> +	    attrs[OVPN_A_PEER_LOCAL_IPV6]) {
> 
> I think these consistency checks should account for v4mapped
> addresses. With remote=v4mapped and local=v6 we'll end up with an
> incorrect ipv4 "local" address (taken out of the ipv6 address's first
> 4B by ovpn_peer_reset_sockaddr). With remote=ipv6 and local=v4mapped,
> we'll pass the last 4B of OVPN_A_PEER_LOCAL_IPV6 to
> ovpn_peer_reset_sockaddr and try to read 16B (the full ipv6 address)
> out of that.

Right, a v4mapped address would fool this check.
How about checking if both or none addresses are v4mapped? This way we 
should prevent such cases.


> 
>> +		NL_SET_ERR_MSG_MOD(info->extack,
>> +				   "cannot specify local IPV6 address without remote");
>> +		return -EINVAL;
>> +	}
> 
> 
> [...]
>>   int ovpn_nl_peer_set_doit(struct sk_buff *skb, struct genl_info *info)
>>   {
> [...]
>> +	ret = ovpn_nl_peer_modify(peer, info, attrs);
>> +	if (ret < 0) {
>> +		ovpn_peer_put(peer);
>> +		return ret;
>> +	}
>> +
>> +	/* ret == 1 means that VPN IPv4/6 has been modified and rehashing
>> +	 * is required
>> +	 */
>> +	if (ret > 0) {
> 
> && mode == MP ?
> 
> I don't see ovpn_nl_peer_modify checking that before returning 1, and
> in P2P mode ovpn->peers will be NULL.

Right.
I was wondering if it's better to add the check on the return statement 
of ovpn_nl_peer_modify...but I think it's more functional to add it 
here, as per your suggestion.

> 
>> +		spin_lock_bh(&ovpn->peers->lock);
>> +		ovpn_peer_hash_vpn_ip(peer);
>> +		spin_unlock_bh(&ovpn->peers->lock);
>> +	}
>> +
>> +	ovpn_peer_put(peer);
>> +
>> +	return 0;
>> +}
> 
>>   int ovpn_nl_peer_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
>>   {
> [...]
>> +	} else {
>> +		rcu_read_lock();
>> +		hash_for_each_rcu(ovpn->peers->by_id, bkt, peer,
>> +				  hash_entry_id) {
>> +			/* skip already dumped peers that were dumped by
>> +			 * previous invocations
>> +			 */
>> +			if (last_idx > 0) {
>> +				last_idx--;
>> +				continue;
>> +			}
> 
> If a peer that was dumped during a previous invocation is removed in
> between, we'll miss one that's still present in the overall dump. I
> don't know how much it matters (I guses it depends on how the results
> of this dump are used by userspace), so I'll let you decide if this
> needs to be fixed immediately or if it can be ignored for now.

True, this is a risk I assumed.
Not extremely important if you ask me, but do you have any suggestion 
how to avoid this in an elegant and lockless way?

IIRC I got inspired by the station dump in the mac80211 code, which 
probably assumes the same risk.

> 
>> +
>> +			if (ovpn_nl_send_peer(skb, info, peer,
>> +					      NETLINK_CB(cb->skb).portid,
>> +					      cb->nlh->nlmsg_seq,
>> +					      NLM_F_MULTI) < 0)
>> +				break;
>> +
>> +			/* count peers being dumped during this invocation */
>> +			dumped++;
>> +		}
>> +		rcu_read_unlock();
>> +	}
>> +
>> +out:
>> +	netdev_put(ovpn->dev, &ovpn->dev_tracker);
>> +
>> +	/* sum up peers dumped in this message, so that at the next invocation
>> +	 * we can continue from where we left
>> +	 */
>> +	cb->args[1] += dumped;
>> +	return skb->len;
>>   }
>>   
>>   int ovpn_nl_peer_del_doit(struct sk_buff *skb, struct genl_info *info)
>>   {
>> -	return -EOPNOTSUPP;
>> +	struct nlattr *attrs[OVPN_A_PEER_MAX + 1];
>> +	struct ovpn_struct *ovpn = info->user_ptr[0];
>> +	struct ovpn_peer *peer;
>> +	u32 peer_id;
>> +	int ret;
>> +
>> +	if (GENL_REQ_ATTR_CHECK(info, OVPN_A_PEER))
>> +		return -EINVAL;
>> +
>> +	ret = nla_parse_nested(attrs, OVPN_A_PEER_MAX, info->attrs[OVPN_A_PEER],
>> +			       ovpn_peer_nl_policy, info->extack);
>> +	if (ret)
>> +		return ret;
>> +
>> +	if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs,
>> +			      OVPN_A_PEER_ID))
>> +		return -EINVAL;
>> +
>> +	peer_id = nla_get_u32(attrs[OVPN_A_PEER_ID]);
>> +
>> +	peer = ovpn_peer_get_by_id(ovpn, peer_id);
>> +	if (!peer)
> 
> maybe c/p the extack from ovpn_nl_peer_get_doit?

Yes, will do.

Thanks a lot.
Regards,

> 
>> +		return -ENOENT;
>> +
>> +	netdev_dbg(ovpn->dev, "%s: peer id=%u\n", __func__, peer->id);
>> +	ret = ovpn_peer_del(peer, OVPN_DEL_PEER_REASON_USERSPACE);
>> +	ovpn_peer_put(peer);
>> +
>> +	return ret;
>>   }
>

Antonio Quartulli Nov. 12, 2024, 2:26 p.m. UTC | #27

On 11/11/2024 16:41, Sabrina Dubroca wrote:
> 2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote:
>> +static int ovpn_nl_peer_modify(struct ovpn_peer *peer, struct genl_info *info,
>> +			       struct nlattr **attrs)
>> +{
>> +	struct sockaddr_storage ss = {};
>> +	u32 sockfd, interv, timeout;
>> +	struct socket *sock = NULL;
>> +	u8 *local_ip = NULL;
>> +	bool rehash = false;
>> +	int ret;
>> +
>> +	if (attrs[OVPN_A_PEER_SOCKET]) {
>> +		/* lookup the fd in the kernel table and extract the socket
>> +		 * object
>> +		 */
>> +		sockfd = nla_get_u32(attrs[OVPN_A_PEER_SOCKET]);
>> +		/* sockfd_lookup() increases sock's refcounter */
>> +		sock = sockfd_lookup(sockfd, &ret);
>> +		if (!sock) {
>> +			NL_SET_ERR_MSG_FMT_MOD(info->extack,
>> +					       "cannot lookup peer socket (fd=%u): %d",
>> +					       sockfd, ret);
>> +			return -ENOTSOCK;
>> +		}
>> +
>> +		/* Only when using UDP as transport protocol the remote endpoint
>> +		 * can be configured so that ovpn knows where to send packets
>> +		 * to.
>> +		 *
>> +		 * In case of TCP, the socket is connected to the peer and ovpn
>> +		 * will just send bytes over it, without the need to specify a
>> +		 * destination.
>> +		 */
>> +		if (sock->sk->sk_protocol != IPPROTO_UDP &&
>> +		    (attrs[OVPN_A_PEER_REMOTE_IPV4] ||
>> +		     attrs[OVPN_A_PEER_REMOTE_IPV6])) {
>> +			NL_SET_ERR_MSG_FMT_MOD(info->extack,
>> +					       "unexpected remote IP address for non UDP socket");
>> +			sockfd_put(sock);
>> +			return -EINVAL;
>> +		}
>> +
>> +		if (peer->sock)
>> +			ovpn_socket_put(peer->sock);
>> +
>> +		peer->sock = ovpn_socket_new(sock, peer);
> 
> I don't see anything preventing concurrent updates of peer->sock. I
> think peer->lock should be taken from the start of
> ovpn_nl_peer_modify. Concurrent changes to peer->vpn_addrs and
> peer->keepalive_* are also not prevented with the current code.

Yeah, this came up to my mind as well when checking the keepalive worker 
code.

I'll make sure all updates happen under lock.

> 
> 
>> +		if (IS_ERR(peer->sock)) {
>> +			NL_SET_ERR_MSG_FMT_MOD(info->extack,
>> +					       "cannot encapsulate socket: %ld",
>> +					       PTR_ERR(peer->sock));
>> +			sockfd_put(sock);
>> +			peer->sock = NULL;
>> +			return -ENOTSOCK;
>> +		}
>> +	}
>> +
>> +	if (ovpn_nl_attr_sockaddr_remote(attrs, &ss) != AF_UNSPEC) {
>> +		/* we carry the local IP in a generic container.
>> +		 * ovpn_peer_reset_sockaddr() will properly interpret it
>> +		 * based on ss.ss_family
>> +		 */
>> +		local_ip = ovpn_nl_attr_local_ip(attrs);
>> +
>> +		spin_lock_bh(&peer->lock);
>> +		/* set peer sockaddr */
>> +		ret = ovpn_peer_reset_sockaddr(peer, &ss, local_ip);
>> +		if (ret < 0) {
>> +			NL_SET_ERR_MSG_FMT_MOD(info->extack,
>> +					       "cannot set peer sockaddr: %d",
>> +					       ret);
>> +			spin_unlock_bh(&peer->lock);
>> +			return ret;
>> +		}
>> +		spin_unlock_bh(&peer->lock);
>> +	}
>> +
>> +	if (attrs[OVPN_A_PEER_VPN_IPV4]) {
>> +		rehash = true;
>> +		peer->vpn_addrs.ipv4.s_addr =
>> +			nla_get_in_addr(attrs[OVPN_A_PEER_VPN_IPV4]);
>> +	}
>> +
>> +	if (attrs[OVPN_A_PEER_VPN_IPV6]) {
>> +		rehash = true;
>> +		peer->vpn_addrs.ipv6 =
>> +			nla_get_in6_addr(attrs[OVPN_A_PEER_VPN_IPV6]);
>> +	}
>> +
>> +	/* when setting the keepalive, both parameters have to be configured */
>> +	if (attrs[OVPN_A_PEER_KEEPALIVE_INTERVAL] &&
>> +	    attrs[OVPN_A_PEER_KEEPALIVE_TIMEOUT]) {
>> +		interv = nla_get_u32(attrs[OVPN_A_PEER_KEEPALIVE_INTERVAL]);
>> +		timeout = nla_get_u32(attrs[OVPN_A_PEER_KEEPALIVE_TIMEOUT]);
>> +		ovpn_peer_keepalive_set(peer, interv, timeout);
>> +	}
>> +
>> +	netdev_dbg(peer->ovpn->dev,
>> +		   "%s: peer id=%u endpoint=%pIScp/%s VPN-IPv4=%pI4 VPN-IPv6=%pI6c\n",
>> +		   __func__, peer->id, &ss,
>> +		   peer->sock->sock->sk->sk_prot_creator->name,
>> +		   &peer->vpn_addrs.ipv4.s_addr, &peer->vpn_addrs.ipv6);
>> +
>> +	return rehash ? 1 : 0;
>> +}
>> +
> 
> [...]
>> +void ovpn_peer_hash_vpn_ip(struct ovpn_peer *peer)
>> +	__must_hold(&peer->ovpn->peers->lock)
> 
> Changes to peer->vpn_addrs are not protected by peers->lock, so those
> could be getting updated while we're rehashing (and taking peer->lock
> in ovpn_nl_peer_modify as I'm suggesting above also wouldn't prevent
> that).
> 

/me screams :-D

Indeed peers->lock is only about protecting the lists, not the content 
of the listed objects.

How about acquiring the peers->lock before calling ovpn_nl_peer_modify()?
This way we prevent concurrent updates to interfere with each other, 
while at the same time we avoid concurrent adds/dels of the peer (the 
second part should already be protected as of today).

None of them is time critical and the lock should avoid the issue you 
mentioned.


Thanks a lot.

Regards,

>> +{
>> +	struct hlist_nulls_head *nhead;
>> +
>> +	if (peer->vpn_addrs.ipv4.s_addr != htonl(INADDR_ANY)) {
>> +		/* remove potential old hashing */
>> +		hlist_nulls_del_init_rcu(&peer->hash_entry_transp_addr);
>> +
>> +		nhead = ovpn_get_hash_head(peer->ovpn->peers->by_vpn_addr,
>> +					   &peer->vpn_addrs.ipv4,
>> +					   sizeof(peer->vpn_addrs.ipv4));
>> +		hlist_nulls_add_head_rcu(&peer->hash_entry_addr4, nhead);
>> +	}
>> +
>> +	if (!ipv6_addr_any(&peer->vpn_addrs.ipv6)) {
>> +		/* remove potential old hashing */
>> +		hlist_nulls_del_init_rcu(&peer->hash_entry_transp_addr);
>> +
>> +		nhead = ovpn_get_hash_head(peer->ovpn->peers->by_vpn_addr,
>> +					   &peer->vpn_addrs.ipv6,
>> +					   sizeof(peer->vpn_addrs.ipv6));
>> +		hlist_nulls_add_head_rcu(&peer->hash_entry_addr6, nhead);
>> +	}
>> +}
>

Sabrina Dubroca Nov. 12, 2024, 4:47 p.m. UTC | #28

2024-11-09, 03:01:21 +0200, Sergey Ryazanov wrote:
> On 29.10.2024 12:47, Antonio Quartulli wrote:
> > +/* When the OpenVPN protocol is ran in AEAD mode, use
> > + * the OpenVPN packet ID as the AEAD nonce:
> > + *
> > + *    00000005 521c3b01 4308c041
> > + *    [seq # ] [  nonce_tail   ]
> > + *    [     12-byte full IV    ] -> NONCE_SIZE
> > + *    [4-bytes                   -> NONCE_WIRE_SIZE
> > + *    on wire]
> > + */
> 
> Nice diagram! Can we go futher and define the OpenVPN packet header as a
> stucture? Referencing the structure instead of using magic sizes and offsets
> can greatly improve the code readability. Especially when it comes to header
> construction/parsing in the encryption/decryption code.
> 
> E.g. define a structures like this:
> 
> struct ovpn_pkt_hdr {
>   __be32 op;
>   __be32 pktid;
>   u8 auth[];
> } __attribute__((packed));
> 
> struct ovpn_aead_iv {
>   __be32 pktid;
>   u8 nonce[OVPN_NONCE_TAIL_SIZE];
> } __attribute__((packed));

__attribute__((packed)) should not be needed here as the fields in
both structs look properly aligned, and IIRC using packed can cause
the compiler to generate worse code.


> > diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> > index 8516c1ccd57a7c7634a538fe3ac16c858f647420..84d294aab20b79b8e9cb9b736a074105c99338f3 100644
> > --- a/include/uapi/linux/if_link.h
> > +++ b/include/uapi/linux/if_link.h
> > @@ -1975,4 +1975,19 @@ enum {
> >   #define IFLA_DSA_MAX	(__IFLA_DSA_MAX - 1)
> > +/* OVPN section */
> > +
> > +enum ovpn_mode {
> > +	OVPN_MODE_P2P,
> > +	OVPN_MODE_MP,
> > +};
> 
> Mode min/max values can be defined here and the netlink policy can reference
> these values:
> 
> enum ovpn_mode {
>   OVPN_MODE_P2P,
>   OVPN_MODE_MP,
>   __OVPN_MODE_MAX
> };
> 
> #define OVPN_MODE_MIN OVPN_MODE_P2P
> #define OVPN_MODE_MAX (__OVPN_MODE_MAX - 1)
> 
> ... = NLA_POLICY_RANGE(NLA_U8, OVPN_MODE_MIN, OVPN_MODE_MAX)

I don't think there's much benefit to that, other than making the diff
smaller on a (very unlikely) patch that would add a new mode in the
future. It even looks more inconvenient to me when reading the code
("ok what are _MIN and _MAX?  the code is using _P2P and _MP, do they
match?").

Sabrina Dubroca Nov. 12, 2024, 5:31 p.m. UTC | #29

2024-11-10, 15:38:27 +0200, Sergey Ryazanov wrote:
> On 29.10.2024 12:47, Antonio Quartulli wrote:
> > An ovpn_peer object holds the whole status of a remote peer
> > (regardless whether it is a server or a client).
> > 
> > This includes status for crypto, tx/rx buffers, napi, etc.
> > 
> > Only support for one peer is introduced (P2P mode).
> > Multi peer support is introduced with a later patch.
> 
> Reviewing the peer creation/destroying code I came to a generic question.
> Did you consider keeping a single P2P peer in the peers table as well?
> 
> Looks like such approach can greatly simply the code by dropping all these
> 'switch (ovpn->mode)' checks and implementing a unified peer management. The
> 'peer' field in the main private data structure can be kept to accelerate
> lookups, still using peers table for management tasks like removing all the
> peers on the interface teardown.

It would save a few 'switch(mode)', but force every client to allocate
the hashtable for no reason at all. That tradeoff doesn't look very
beneficial to me, the P2P-specific code is really simple. And if you
keep ovpn->peer to make lookups faster, you're not removing that many
'switch(mode)'.

Sergey Ryazanov Nov. 12, 2024, 11:56 p.m. UTC | #30

On 12.11.2024 18:47, Sabrina Dubroca wrote:
> 2024-11-09, 03:01:21 +0200, Sergey Ryazanov wrote:
>> On 29.10.2024 12:47, Antonio Quartulli wrote:
>>> +/* When the OpenVPN protocol is ran in AEAD mode, use
>>> + * the OpenVPN packet ID as the AEAD nonce:
>>> + *
>>> + *    00000005 521c3b01 4308c041
>>> + *    [seq # ] [  nonce_tail   ]
>>> + *    [     12-byte full IV    ] -> NONCE_SIZE
>>> + *    [4-bytes                   -> NONCE_WIRE_SIZE
>>> + *    on wire]
>>> + */
>>
>> Nice diagram! Can we go futher and define the OpenVPN packet header as a
>> stucture? Referencing the structure instead of using magic sizes and offsets
>> can greatly improve the code readability. Especially when it comes to header
>> construction/parsing in the encryption/decryption code.
>>
>> E.g. define a structures like this:
>>
>> struct ovpn_pkt_hdr {
>>    __be32 op;
>>    __be32 pktid;
>>    u8 auth[];
>> } __attribute__((packed));
>>
>> struct ovpn_aead_iv {
>>    __be32 pktid;
>>    u8 nonce[OVPN_NONCE_TAIL_SIZE];
>> } __attribute__((packed));
> 
> __attribute__((packed)) should not be needed here as the fields in
> both structs look properly aligned, and IIRC using packed can cause
> the compiler to generate worse code.

True, the fields are pretty good aligned and from code generation 
perspective packed indication is unneeded. I suggested to mark structs 
as packed mostly as a documentation to clearly state that these 
structures represent specific memory layout.

>>> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
>>> index 8516c1ccd57a7c7634a538fe3ac16c858f647420..84d294aab20b79b8e9cb9b736a074105c99338f3 100644
>>> --- a/include/uapi/linux/if_link.h
>>> +++ b/include/uapi/linux/if_link.h
>>> @@ -1975,4 +1975,19 @@ enum {
>>>    #define IFLA_DSA_MAX	(__IFLA_DSA_MAX - 1)
>>> +/* OVPN section */
>>> +
>>> +enum ovpn_mode {
>>> +	OVPN_MODE_P2P,
>>> +	OVPN_MODE_MP,
>>> +};
>>
>> Mode min/max values can be defined here and the netlink policy can reference
>> these values:
>>
>> enum ovpn_mode {
>>    OVPN_MODE_P2P,
>>    OVPN_MODE_MP,
>>    __OVPN_MODE_MAX
>> };
>>
>> #define OVPN_MODE_MIN OVPN_MODE_P2P
>> #define OVPN_MODE_MAX (__OVPN_MODE_MAX - 1)
>>
>> ... = NLA_POLICY_RANGE(NLA_U8, OVPN_MODE_MIN, OVPN_MODE_MAX)
> 
> I don't think there's much benefit to that, other than making the diff
> smaller on a (very unlikely) patch that would add a new mode in the
> future. It even looks more inconvenient to me when reading the code
> ("ok what are _MIN and _MAX?  the code is using _P2P and _MP, do they
> match?").

I would answer yes. Just prefer to trust these kind of statements until 
it crashes badly. Honestly, I never thought that referring to a max 
value might raise such a question. Can you give an example why it should 
be meaningful to know exact min/max values of an unordered set?

I suggested to define boundaries indeed for documentation purpose. Diff 
reduction is also desirable, but as you already mentioned, here it is 
not the case. Using specific values in a range declaration assigns them 
with extra semantic. Like, MODE_P2P is also a minimal possible value 
while MODE_MP has this extra meaning of minimal possible value. And we 
can learn this only from the policy which is specified far way from the 
modes declarations. I also see policies declaration as referring to 
already defined information rather than creating new meanings. On 
another hand the NL policy is the only user, so maybe we should left it 
as-is for the sake of simplicity.

--
Sergey

Sergey Ryazanov Nov. 13, 2024, 1:37 a.m. UTC | #31

On 12.11.2024 19:31, Sabrina Dubroca wrote:
> 2024-11-10, 15:38:27 +0200, Sergey Ryazanov wrote:
>> On 29.10.2024 12:47, Antonio Quartulli wrote:
>>> An ovpn_peer object holds the whole status of a remote peer
>>> (regardless whether it is a server or a client).
>>>
>>> This includes status for crypto, tx/rx buffers, napi, etc.
>>>
>>> Only support for one peer is introduced (P2P mode).
>>> Multi peer support is introduced with a later patch.
>>
>> Reviewing the peer creation/destroying code I came to a generic question.
>> Did you consider keeping a single P2P peer in the peers table as well?
>>
>> Looks like such approach can greatly simply the code by dropping all these
>> 'switch (ovpn->mode)' checks and implementing a unified peer management. The
>> 'peer' field in the main private data structure can be kept to accelerate
>> lookups, still using peers table for management tasks like removing all the
>> peers on the interface teardown.
> 
> It would save a few 'switch(mode)', but force every client to allocate
> the hashtable for no reason at all. That tradeoff doesn't look very
> beneficial to me, the P2P-specific code is really simple. And if you
> keep ovpn->peer to make lookups faster, you're not removing that many
> 'switch(mode)'.

Looking at the done review, I can retrospectively conclude that I 
personally do not like short 'switch' statements and special handlers :)

Seriously, this module has a highest density of switches per KLOC from 
what I have seen before and a major part of it dedicated to handle the 
special case of P2P connection. What together look too unusual, so it 
feels like a flaw in the design. I racked my brains to come up with a 
better solution and failed. So I took a different approach, inviting 
people to discuss item pieces of the code to find a solution 
collectively or to realize that there is no better solution for now.

The problem is that all these hash tables become inefficient with the 
single entry (P2P case). I was thinking about allocating a table with a 
single bin, but it still requires hash function run to access the 
indexed entry.

And back to the hashtable(s) size for the MP mode. 8k-bins table looks a 
good choice for a normal server with 1-2Gb uplink serving up to 1k 
connections. But it sill unclear, how this choice can affect 
installations with a bigger number of connections? Or is this module 
applicable for embedded solutions? E.g. running a couple of VPN servers 
on a home router with a few actual connections looks like a waste of 
RAM. I was about to suggest to use rhashtable due to its dynamic sizing 
feature, but the module needs three tables. Any better idea?

--
Sergey

Sabrina Dubroca Nov. 13, 2024, 10:03 a.m. UTC | #32

2024-11-13, 03:37:13 +0200, Sergey Ryazanov wrote:
> On 12.11.2024 19:31, Sabrina Dubroca wrote:
> > 2024-11-10, 15:38:27 +0200, Sergey Ryazanov wrote:
> > > On 29.10.2024 12:47, Antonio Quartulli wrote:
> > > > An ovpn_peer object holds the whole status of a remote peer
> > > > (regardless whether it is a server or a client).
> > > > 
> > > > This includes status for crypto, tx/rx buffers, napi, etc.
> > > > 
> > > > Only support for one peer is introduced (P2P mode).
> > > > Multi peer support is introduced with a later patch.
> > > 
> > > Reviewing the peer creation/destroying code I came to a generic question.
> > > Did you consider keeping a single P2P peer in the peers table as well?
> > > 
> > > Looks like such approach can greatly simply the code by dropping all these
> > > 'switch (ovpn->mode)' checks and implementing a unified peer management. The
> > > 'peer' field in the main private data structure can be kept to accelerate
> > > lookups, still using peers table for management tasks like removing all the
> > > peers on the interface teardown.
> > 
> > It would save a few 'switch(mode)', but force every client to allocate
> > the hashtable for no reason at all. That tradeoff doesn't look very
> > beneficial to me, the P2P-specific code is really simple. And if you
> > keep ovpn->peer to make lookups faster, you're not removing that many
> > 'switch(mode)'.
> 
> Looking at the done review, I can retrospectively conclude that I personally
> do not like short 'switch' statements and special handlers :)
> 
> Seriously, this module has a highest density of switches per KLOC from what
> I have seen before and a major part of it dedicated to handle the special
> case of P2P connection.

I think it's fine. Either way there will be two implementations of
whatever mode-dependent operation needs to be done. switch doesn't
make it more complex than an ops structure.

If you're reading the current version and find ovpn_peer_add, you see
directly that it'll do either ovpn_peer_add_mp or
ovpn_peer_add_p2p. With an ops structure, you'd have a call to
ovpn->ops->peer_add, and you'd have to look up all possible ops
structures to know that it can be either ovpn_peer_add_mp or
ovpn_peer_add_p2p. If there's an undefined number of implementations
living in different modules (like net_device_ops, or L4 protocols),
you don't have a choice.

xfrm went the opposite way to what you're proposing a few years ago
(see commit 0c620e97b349 ("xfrm: remove output indirection from
xfrm_mode") and others), and it made the code simpler.

> What together look too unusual, so it feels like a
> flaw in the design.

I don't think it's a flaw in the design, maybe just different needs
from other code you've seen (but similar in some ways to xfrm).

> I racked my brains to come up with a better solution and
> failed. So I took a different approach, inviting people to discuss item
> pieces of the code to find a solution collectively or to realize that there
> is no better solution for now.

Sure. And I think there is no better solution, so I'm answering this
thread to say that.

> The problem is that all these hash tables become inefficient with the single
> entry (P2P case). I was thinking about allocating a table with a single bin,
> but it still requires hash function run to access the indexed entry.

And the current implementation relies on fixed-size hashtables
(hash_for_each_safe -> HASH_SIZE -> ARRAY_SIZE -> sizeof).

> And back to the hashtable(s) size for the MP mode. 8k-bins table looks a
> good choice for a normal server with 1-2Gb uplink serving up to 1k
> connections. But it sill unclear, how this choice can affect installations
> with a bigger number of connections? Or is this module applicable for
> embedded solutions? E.g. running a couple of VPN servers on a home router
> with a few actual connections looks like a waste of RAM. I was about to
> suggest to use rhashtable due to its dynamic sizing feature, but the module
> needs three tables. Any better idea?

For this initial implementation I think it's fine. Sure, converting to
rhashtable (or some other type of dynamically-sized hashtable, if
rhashtable doesn't fit) in the future would make sense. But I don't
think it's necessary to get the patches into net-next.

Sabrina Dubroca Nov. 13, 2024, 11:05 a.m. UTC | #33

2024-11-12, 15:26:59 +0100, Antonio Quartulli wrote:
> On 11/11/2024 16:41, Sabrina Dubroca wrote:
> > 2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote:
> > > +void ovpn_peer_hash_vpn_ip(struct ovpn_peer *peer)
> > > +	__must_hold(&peer->ovpn->peers->lock)
> > 
> > Changes to peer->vpn_addrs are not protected by peers->lock, so those
> > could be getting updated while we're rehashing (and taking peer->lock
> > in ovpn_nl_peer_modify as I'm suggesting above also wouldn't prevent
> > that).
> > 
> 
> /me screams :-D

Sorry :)

> Indeed peers->lock is only about protecting the lists, not the content of
> the listed objects.
> 
> How about acquiring the peers->lock before calling ovpn_nl_peer_modify()?

It seems like it would work. Maybe a bit weird to have conditional
locking (MP mode only), but ok. You already have this lock ordering
(hold peers->lock before taking peer->lock) in
ovpn_peer_keepalive_work_mp, so there should be no deadlock from doing
the same thing in the netlink code.

Then I would also do that in ovpn_peer_float to protect that rehash.

It feels like peers->lock is turning into a duplicate of
ovpn->lock. ovpn->lock used for P2P mode, peers->lock used
equivalently for MP mode. You might consider merging them (but I
wouldn't see it as necessary for merging the series unless there's a
locking issue with the current proposal).

Sabrina Dubroca Nov. 13, 2024, 11:25 a.m. UTC | #34

2024-11-12, 15:03:00 +0100, Antonio Quartulli wrote:
> On 12/11/2024 11:56, Sabrina Dubroca wrote:
> > 2024-10-29, 11:47:30 +0100, Antonio Quartulli wrote:
> > > diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
> > > index 63c140138bf98e5d1df79a2565b666d86513323d..0e8a6f2c76bc7b2ccc287ad1187cf50f033bf261 100644
> > > --- a/drivers/net/ovpn/io.c
> > > +++ b/drivers/net/ovpn/io.c
> > > @@ -135,6 +135,15 @@ void ovpn_decrypt_post(void *data, int ret)
> > >   	/* keep track of last received authenticated packet for keepalive */
> > >   	peer->last_recv = ktime_get_real_seconds();
> > > +	if (peer->sock->sock->sk->sk_protocol == IPPROTO_UDP) {
> > 
> > What prevents peer->sock from being replaced and released
> > concurrently?
> 
> Technically nothing.
> Userspace currently does not even support updating a peer socket at runtime,
> but I wanted ovpn to be flexible enough from the beginning.

Is there a reason to do that? With TCP the peer would have to
reconnect, and I guess fully restart the whole process (become a new
peer with a new ID etc). With UDP, do you need to replace the socket?

> One approach might be to go back to peer->sock being unmutable and forget
> about this.
> 
> OTOH, if we want to keep this flexibility (which I think is nice), I think I
> should make peer->sock an RCU pointer and access it accordingly.

You already use kfree_rcu for ovpn_socket, so the only difference
would be the __rcu annotation and helpers? (+ rcu_read_lock/unlock in
a few places)

Adding rcu_read_lock for peer->sock in ovpn_tcp_tx_work looks
painful... (another place that I missed where things could go bad if
the socket was updated in the current implementation, btw)

Maybe save that for later since you don't have a use case for it yet?

Sabrina Dubroca Nov. 13, 2024, 4:56 p.m. UTC | #35

2024-11-12, 15:19:50 +0100, Antonio Quartulli wrote:
> On 04/11/2024 16:14, Sabrina Dubroca wrote:
> > 2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote:
> > > +static int ovpn_nl_peer_precheck(struct ovpn_struct *ovpn,
> > > +				 struct genl_info *info,
> > > +				 struct nlattr **attrs)
> > > +{
> > > +	if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs,
> > > +			      OVPN_A_PEER_ID))
> > > +		return -EINVAL;
> > > +
> > > +	if (attrs[OVPN_A_PEER_REMOTE_IPV4] && attrs[OVPN_A_PEER_REMOTE_IPV6]) {
> > > +		NL_SET_ERR_MSG_MOD(info->extack,
> > > +				   "cannot specify both remote IPv4 or IPv6 address");
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
> > > +	    !attrs[OVPN_A_PEER_REMOTE_IPV6] && attrs[OVPN_A_PEER_REMOTE_PORT]) {
> > > +		NL_SET_ERR_MSG_MOD(info->extack,
> > > +				   "cannot specify remote port without IP address");
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
> > > +	    attrs[OVPN_A_PEER_LOCAL_IPV4]) {
> > > +		NL_SET_ERR_MSG_MOD(info->extack,
> > > +				   "cannot specify local IPv4 address without remote");
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	if (!attrs[OVPN_A_PEER_REMOTE_IPV6] &&
> > > +	    attrs[OVPN_A_PEER_LOCAL_IPV6]) {
> > 
> > I think these consistency checks should account for v4mapped
> > addresses. With remote=v4mapped and local=v6 we'll end up with an
> > incorrect ipv4 "local" address (taken out of the ipv6 address's first
> > 4B by ovpn_peer_reset_sockaddr). With remote=ipv6 and local=v4mapped,
> > we'll pass the last 4B of OVPN_A_PEER_LOCAL_IPV6 to
> > ovpn_peer_reset_sockaddr and try to read 16B (the full ipv6 address)
> > out of that.
> 
> Right, a v4mapped address would fool this check.
> How about checking if both or none addresses are v4mapped? This way we
> should prevent such cases.

I don't know when userspace would use v4mapped addresses, but treating
a v4mapped address as a "proper" ipv4 address should work with the
rest of the code, since you already have the conversion in
ovpn_nl_attr_local_ip and ovpn_nl_attr_sockaddr_remote. So maybe you
could do something like (rough idea and completely untested):

    static int get_family(attr_v4, attr_v6)
    {
       if (attr_v4)
           return AF_INET;
       if (attr_v6) {
           if (ipv6_addr_v4mapped(attr_v6)
               return AF_INET;
           return AF_INET6;
       }
       return AF_UNSPEC;
    }


    // in _precheck:
    // keep the   attrs[OVPN_A_PEER_REMOTE_IPV4] && attrs[OVPN_A_PEER_REMOTE_IPV6]  check
    // maybe add a similar one for   LOCAL_IPV4 && LOCAL_IPV6

    remote_family = get_family(attrs[OVPN_A_PEER_REMOTE_IPV4], attrs[OVPN_A_PEER_REMOTE_IPV6]);
    local_family = get_family(attrs[OVPN_A_PEER_LOCAL_IPV4], attrs[OVPN_A_PEER_LOCAL_IPV6]);
    if (remote_family != local_family) {
        extack "incompatible address families";
        return -EINVAL;
    }

That would mirror the conversion that
ovpn_nl_attr_local_ip/ovpn_nl_attr_sockaddr_remote do.

> > >   int ovpn_nl_peer_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
> > >   {
> > [...]
> > > +	} else {
> > > +		rcu_read_lock();
> > > +		hash_for_each_rcu(ovpn->peers->by_id, bkt, peer,
> > > +				  hash_entry_id) {
> > > +			/* skip already dumped peers that were dumped by
> > > +			 * previous invocations
> > > +			 */
> > > +			if (last_idx > 0) {
> > > +				last_idx--;
> > > +				continue;
> > > +			}
> > 
> > If a peer that was dumped during a previous invocation is removed in
> > between, we'll miss one that's still present in the overall dump. I
> > don't know how much it matters (I guses it depends on how the results
> > of this dump are used by userspace), so I'll let you decide if this
> > needs to be fixed immediately or if it can be ignored for now.
> 
> True, this is a risk I assumed.
> Not extremely important if you ask me, but do you have any suggestion how to
> avoid this in an elegant and lockless way?

No, inconsistent dumps are an old problem with netlink, so I'm just
mentioning it as something to be aware of. You can add
genl_dump_check_consistent to let userspace know that it may have
gotten incorrect information (you'll need to keep a counter and
increment it when a peer is added/removed). On a very busy server you
may never manage to get a consistent dump, if peers are going up and
down very fast.

There's been some progress for dumping netdevices in commit
759ab1edb56c ("net: store netdevs in an xarray"), but that can still
return incorrect data.

Antonio Quartulli Nov. 14, 2024, 8:07 a.m. UTC | #36

On 12/11/2024 17:47, Sabrina Dubroca wrote:
> 2024-11-09, 03:01:21 +0200, Sergey Ryazanov wrote:
>> On 29.10.2024 12:47, Antonio Quartulli wrote:
>>> +/* When the OpenVPN protocol is ran in AEAD mode, use
>>> + * the OpenVPN packet ID as the AEAD nonce:
>>> + *
>>> + *    00000005 521c3b01 4308c041
>>> + *    [seq # ] [  nonce_tail   ]
>>> + *    [     12-byte full IV    ] -> NONCE_SIZE
>>> + *    [4-bytes                   -> NONCE_WIRE_SIZE
>>> + *    on wire]
>>> + */
>>
>> Nice diagram! Can we go futher and define the OpenVPN packet header as a
>> stucture? Referencing the structure instead of using magic sizes and offsets
>> can greatly improve the code readability. Especially when it comes to header
>> construction/parsing in the encryption/decryption code.
>>
>> E.g. define a structures like this:
>>
>> struct ovpn_pkt_hdr {
>>    __be32 op;
>>    __be32 pktid;
>>    u8 auth[];
>> } __attribute__((packed));
>>
>> struct ovpn_aead_iv {
>>    __be32 pktid;
>>    u8 nonce[OVPN_NONCE_TAIL_SIZE];
>> } __attribute__((packed));
> 
> __attribute__((packed)) should not be needed here as the fields in
> both structs look properly aligned, and IIRC using packed can cause
> the compiler to generate worse code.

Agreed. Using packed will make certain architecture read every field 
byte by byte (I remember David M. biting us on this in batman-adv :))

This said, I like the idea of using a struct, but I don't feel confident 
enough to change the code now that we are hitting v12.
This kind of change will be better implemented later and tested 
carefully. (and patches are always welcome! :))

> 
> 
>>> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
>>> index 8516c1ccd57a7c7634a538fe3ac16c858f647420..84d294aab20b79b8e9cb9b736a074105c99338f3 100644
>>> --- a/include/uapi/linux/if_link.h
>>> +++ b/include/uapi/linux/if_link.h
>>> @@ -1975,4 +1975,19 @@ enum {
>>>    #define IFLA_DSA_MAX	(__IFLA_DSA_MAX - 1)
>>> +/* OVPN section */
>>> +
>>> +enum ovpn_mode {
>>> +	OVPN_MODE_P2P,
>>> +	OVPN_MODE_MP,
>>> +};
>>
>> Mode min/max values can be defined here and the netlink policy can reference
>> these values:
>>
>> enum ovpn_mode {
>>    OVPN_MODE_P2P,
>>    OVPN_MODE_MP,
>>    __OVPN_MODE_MAX
>> };
>>
>> #define OVPN_MODE_MIN OVPN_MODE_P2P
>> #define OVPN_MODE_MAX (__OVPN_MODE_MAX - 1)
>>
>> ... = NLA_POLICY_RANGE(NLA_U8, OVPN_MODE_MIN, OVPN_MODE_MAX)
> 
> I don't think there's much benefit to that, other than making the diff
> smaller on a (very unlikely) patch that would add a new mode in the
> future. It even looks more inconvenient to me when reading the code
> ("ok what are _MIN and _MAX?  the code is using _P2P and _MP, do they
> match?").

I agree with Sabrina here.
I also initially thought about having MIN/MAX, but it wouldn't make 
things simpler for the ovpn_mode.

Regards,

Antonio Quartulli Nov. 14, 2024, 8:26 a.m. UTC | #37

On 13/11/2024 12:25, Sabrina Dubroca wrote:
> 2024-11-12, 15:03:00 +0100, Antonio Quartulli wrote:
>> On 12/11/2024 11:56, Sabrina Dubroca wrote:
>>> 2024-10-29, 11:47:30 +0100, Antonio Quartulli wrote:
>>>> diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
>>>> index 63c140138bf98e5d1df79a2565b666d86513323d..0e8a6f2c76bc7b2ccc287ad1187cf50f033bf261 100644
>>>> --- a/drivers/net/ovpn/io.c
>>>> +++ b/drivers/net/ovpn/io.c
>>>> @@ -135,6 +135,15 @@ void ovpn_decrypt_post(void *data, int ret)
>>>>    	/* keep track of last received authenticated packet for keepalive */
>>>>    	peer->last_recv = ktime_get_real_seconds();
>>>> +	if (peer->sock->sock->sk->sk_protocol == IPPROTO_UDP) {
>>>
>>> What prevents peer->sock from being replaced and released
>>> concurrently?
>>
>> Technically nothing.
>> Userspace currently does not even support updating a peer socket at runtime,
>> but I wanted ovpn to be flexible enough from the beginning.
> 
> Is there a reason to do that? With TCP the peer would have to
> reconnect, and I guess fully restart the whole process (become a new
> peer with a new ID etc). With UDP, do you need to replace the socket?

At the moment userspace won't try to do that, but I can foresee some 
future use cases: i.e. a peer that switches to a different interface and 
needs to open a new socket to keep sending data.

Moreover, in userspace we're currently working on multisocket support 
(theoretically server side only), therefore I can imagine a peer 
floating from one socket to the other while keeping the session alive.

This is all work in progress, but not that far in the future.

For TCP, you're right, although at some point we may even implement 
transport reconnections without losing the VPN state (this is not even 
planned, just a brain dump).

> 
>> One approach might be to go back to peer->sock being unmutable and forget
>> about this.
>>
>> OTOH, if we want to keep this flexibility (which I think is nice), I think I
>> should make peer->sock an RCU pointer and access it accordingly.
> 
> You already use kfree_rcu for ovpn_socket, so the only difference
> would be the __rcu annotation and helpers? (+ rcu_read_lock/unlock in
> a few places)
> 
> Adding rcu_read_lock for peer->sock in ovpn_tcp_tx_work looks
> painful... (another place that I missed where things could go bad if
> the socket was updated in the current implementation, btw)
> 
> Maybe save that for later since you don't have a use case for it yet?

I agree with you. I'll make the socket unmutable again and I'll work on 
this later on.

Thanks a lot for digging with me into this.

Regards,

Antonio Quartulli Nov. 14, 2024, 9:21 a.m. UTC | #38

On 13/11/2024 17:56, Sabrina Dubroca wrote:
> 2024-11-12, 15:19:50 +0100, Antonio Quartulli wrote:
>> On 04/11/2024 16:14, Sabrina Dubroca wrote:
>>> 2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote:
>>>> +static int ovpn_nl_peer_precheck(struct ovpn_struct *ovpn,
>>>> +				 struct genl_info *info,
>>>> +				 struct nlattr **attrs)
>>>> +{
>>>> +	if (NL_REQ_ATTR_CHECK(info->extack, info->attrs[OVPN_A_PEER], attrs,
>>>> +			      OVPN_A_PEER_ID))
>>>> +		return -EINVAL;
>>>> +
>>>> +	if (attrs[OVPN_A_PEER_REMOTE_IPV4] && attrs[OVPN_A_PEER_REMOTE_IPV6]) {
>>>> +		NL_SET_ERR_MSG_MOD(info->extack,
>>>> +				   "cannot specify both remote IPv4 or IPv6 address");
>>>> +		return -EINVAL;
>>>> +	}
>>>> +
>>>> +	if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
>>>> +	    !attrs[OVPN_A_PEER_REMOTE_IPV6] && attrs[OVPN_A_PEER_REMOTE_PORT]) {
>>>> +		NL_SET_ERR_MSG_MOD(info->extack,
>>>> +				   "cannot specify remote port without IP address");
>>>> +		return -EINVAL;
>>>> +	}
>>>> +
>>>> +	if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
>>>> +	    attrs[OVPN_A_PEER_LOCAL_IPV4]) {
>>>> +		NL_SET_ERR_MSG_MOD(info->extack,
>>>> +				   "cannot specify local IPv4 address without remote");
>>>> +		return -EINVAL;
>>>> +	}
>>>> +
>>>> +	if (!attrs[OVPN_A_PEER_REMOTE_IPV6] &&
>>>> +	    attrs[OVPN_A_PEER_LOCAL_IPV6]) {
>>>
>>> I think these consistency checks should account for v4mapped
>>> addresses. With remote=v4mapped and local=v6 we'll end up with an
>>> incorrect ipv4 "local" address (taken out of the ipv6 address's first
>>> 4B by ovpn_peer_reset_sockaddr). With remote=ipv6 and local=v4mapped,
>>> we'll pass the last 4B of OVPN_A_PEER_LOCAL_IPV6 to
>>> ovpn_peer_reset_sockaddr and try to read 16B (the full ipv6 address)
>>> out of that.
>>
>> Right, a v4mapped address would fool this check.
>> How about checking if both or none addresses are v4mapped? This way we
>> should prevent such cases.
> 
> I don't know when userspace would use v4mapped addresses, 

It happens when listening on [::] with a v6 socket that has no 
"IPV6_V6ONLY" set to true (you can check ipv6(7) for more details).
This socket can receive IPv4 connections, which are implemented using 
v4mapped addresses. In this case both remote and local are going to be 
v4mapped.
However, the sanity check should make sure nobody can inject bogus 
combinations.

> but treating
> a v4mapped address as a "proper" ipv4 address should work with the
> rest of the code, since you already have the conversion in
> ovpn_nl_attr_local_ip and ovpn_nl_attr_sockaddr_remote. So maybe you
> could do something like (rough idea and completely untested):
> 
>      static int get_family(attr_v4, attr_v6)
>      {
>         if (attr_v4)
>             return AF_INET;
>         if (attr_v6) {
>             if (ipv6_addr_v4mapped(attr_v6)
>                 return AF_INET;
>             return AF_INET6;
>         }
>         return AF_UNSPEC;
>      }
> 
> 
>      // in _precheck:
>      // keep the   attrs[OVPN_A_PEER_REMOTE_IPV4] && attrs[OVPN_A_PEER_REMOTE_IPV6]  check
>      // maybe add a similar one for   LOCAL_IPV4 && LOCAL_IPV6

the latter is already covered by:

  192         if (!attrs[OVPN_A_PEER_REMOTE_IPV4] &&
  193             attrs[OVPN_A_PEER_LOCAL_IPV4]) {
  194                 NL_SET_ERR_MSG_MOD(info->extack,
  195                                    "cannot specify local IPv4 
address without remote");
  196                 return -EINVAL;
  197         }
  198
  199         if (!attrs[OVPN_A_PEER_REMOTE_IPV6] &&
  200             attrs[OVPN_A_PEER_LOCAL_IPV6]) {
  201                 NL_SET_ERR_MSG_MOD(info->extack,
  202                                    "cannot specify local IPV6 
address without remote");
  203                 return -EINVAL;
  204         }


> 
>      remote_family = get_family(attrs[OVPN_A_PEER_REMOTE_IPV4], attrs[OVPN_A_PEER_REMOTE_IPV6]);
>      local_family = get_family(attrs[OVPN_A_PEER_LOCAL_IPV4], attrs[OVPN_A_PEER_LOCAL_IPV6]);
>      if (remote_family != local_family) {
>          extack "incompatible address families";
>          return -EINVAL;
>      }
> 
> That would mirror the conversion that
> ovpn_nl_attr_local_ip/ovpn_nl_attr_sockaddr_remote do.

Yeah, pretty much what I was suggested, but in a more explicit manner.
I like it.

> 
>>>>    int ovpn_nl_peer_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
>>>>    {
>>> [...]
>>>> +	} else {
>>>> +		rcu_read_lock();
>>>> +		hash_for_each_rcu(ovpn->peers->by_id, bkt, peer,
>>>> +				  hash_entry_id) {
>>>> +			/* skip already dumped peers that were dumped by
>>>> +			 * previous invocations
>>>> +			 */
>>>> +			if (last_idx > 0) {
>>>> +				last_idx--;
>>>> +				continue;
>>>> +			}
>>>
>>> If a peer that was dumped during a previous invocation is removed in
>>> between, we'll miss one that's still present in the overall dump. I
>>> don't know how much it matters (I guses it depends on how the results
>>> of this dump are used by userspace), so I'll let you decide if this
>>> needs to be fixed immediately or if it can be ignored for now.
>>
>> True, this is a risk I assumed.
>> Not extremely important if you ask me, but do you have any suggestion how to
>> avoid this in an elegant and lockless way?
> 
> No, inconsistent dumps are an old problem with netlink, so I'm just
> mentioning it as something to be aware of. You can add
> genl_dump_check_consistent to let userspace know that it may have
> gotten incorrect information (you'll need to keep a counter and
> increment it when a peer is added/removed). On a very busy server you
> may never manage to get a consistent dump, if peers are going up and
> down very fast.
> 
> There's been some progress for dumping netdevices in commit
> 759ab1edb56c ("net: store netdevs in an xarray"), but that can still
> return incorrect data.

Got it. I'll keep it as it is for now, since this is not critical.

Thanks a lot.

Regards,

>

Antonio Quartulli Nov. 14, 2024, 10:32 a.m. UTC | #39

On 13/11/2024 12:05, Sabrina Dubroca wrote:
> 2024-11-12, 15:26:59 +0100, Antonio Quartulli wrote:
>> On 11/11/2024 16:41, Sabrina Dubroca wrote:
>>> 2024-10-29, 11:47:31 +0100, Antonio Quartulli wrote:
>>>> +void ovpn_peer_hash_vpn_ip(struct ovpn_peer *peer)
>>>> +	__must_hold(&peer->ovpn->peers->lock)
>>>
>>> Changes to peer->vpn_addrs are not protected by peers->lock, so those
>>> could be getting updated while we're rehashing (and taking peer->lock
>>> in ovpn_nl_peer_modify as I'm suggesting above also wouldn't prevent
>>> that).
>>>
>>
>> /me screams :-D
> 
> Sorry :)
> 
>> Indeed peers->lock is only about protecting the lists, not the content of
>> the listed objects.
>>
>> How about acquiring the peers->lock before calling ovpn_nl_peer_modify()?
> 
> It seems like it would work. Maybe a bit weird to have conditional
> locking (MP mode only), but ok. You already have this lock ordering
> (hold peers->lock before taking peer->lock) in
> ovpn_peer_keepalive_work_mp, so there should be no deadlock from doing
> the same thing in the netlink code.

Yeah.

> 
> Then I would also do that in ovpn_peer_float to protect that rehash.

I am not extremely comfortable with this, because it means acquiring 
peers->lock on every packet (right now we do so only on peer->lock) and 
it may defeat the advantage of the RCU locking on the hashtables.
Wouldn't you agree?

An alternative would be to hold peer->lock for the entire function, but 
this will lead to dead locks...no go either.

> 
> It feels like peers->lock is turning into a duplicate of
> ovpn->lock. ovpn->lock used for P2P mode, peers->lock used
> equivalently for MP mode. You might consider merging them (but I
> wouldn't see it as necessary for merging the series unless there's a
> locking issue with the current proposal).

I agree: ovpn->lock was introduced to protect ovpn's fields, but 
actually the only one e protect is peer.

They are truly the same and I could therefore get rid of 
ovpn->peers->lock and always use ovpn->lock.

Will see how invasive this is and decide whether to commit it to v12 or not.

Thanks!

Regards,

Antonio Quartulli Nov. 14, 2024, 2:55 p.m. UTC | #40

On 10/11/2024 20:52, Sergey Ryazanov wrote:
> On 29.10.2024 12:47, Antonio Quartulli wrote:
> 
> [...]
> 
>> +static void ovpn_peer_release(struct ovpn_peer *peer)
>> +{
>> +    ovpn_bind_reset(peer, NULL);
>> +
> 
> nit: this empty line after ovpn_bind_reset() is removed in the 
> 'implement basic TX path (UDP)' patch. What tricks git and it produces a 
> sensless diff with 'ovpn_bind_reset(...)' line beeing removed and then 
> introduced again. If you do not like this empty line then remove it 
> here, please :)

Thanks! will make sure it won't be introduced at all.

Regards,

> 
>> +    dst_cache_destroy(&peer->dst_cache);
>> +    netdev_put(peer->ovpn->dev, &peer->ovpn->dev_tracker);
>> +    kfree_rcu(peer, rcu);
>> +}
> 
> -- 
> Sergey

Antonio Quartulli Nov. 14, 2024, 3:33 p.m. UTC | #41

On 06/11/2024 02:18, Sergey Ryazanov wrote:
> Hi Antonio,
> 
> On 29.10.2024 12:47, Antonio Quartulli wrote:
>> Notable changes from v10:
>> * extended commit message of 23/23 with brief description of the output
>> * Link to v10: https://lore.kernel.org/r/20241025-b4-ovpn-v10-0- 
>> b87530777be7@openvpn.net
>>
>> Please note that some patches were already reviewed by Andre Lunn,
>> Donald Hunter and Shuah Khan. They have retained the Reviewed-by tag
>> since no major code modification has happened since the review.
>>
>> The latest code can also be found at:
>>
>> https://github.com/OpenVPN/linux-kernel-ovpn
> 
> As I promised many months ago I am starting publishing some nit picks 
> regarding the series. 

Thanks and welcome back!

> The review was started when series was V3 
> "rebasing" the review to every next version to publish it at once. But I 
> lost this race to the new version releasing velocity :) So, I am going 
> to publish it patch-by-patch.
> 
> Anyway you and all participants have done a great progress toward making 
> accelerator part of the kernel. Most of considerable things already 
> resolved so do not wait me please to finish picking every nit.

I'll go through them all and judge what's meaningful to add to v12 and 
what can be postponed for later improvements.

> 
> Regarding "big" topics I have only two concerns: link creation using 
> RTNL and a switch statement usage. In the corresponding thread, I asked 
> Jiri to clarify that "should" regarding .newlink implementation. Hope he 
> will have a chance to find a time to reply.

True, but to be honest at this point I am fine with sticking to RTNL, 
also because we will soon introduce the ability to create 'persistent' 
ifaces, which a user should be able to create before starting openvpn.

Going through RTNL for this is the best choice IMHO, therefore we have 
an extra use case in favour of this approach (next to what Jiri already 
mentioned).

> 
> For the 'switch' statement, I see a repeating pattern of handling mode- 
> or family-specific cases like this:
> 
> int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer)
> {
>    switch (ovpn->mode) {
>    case OVPN_MODE_MP:
>      return ovpn_peer_add_mp(ovpn, peer);
>    case OVPN_MODE_P2P:
>      return ovpn_peer_add_p2p(ovpn, peer);
>    default:
>      return -EOPNOTSUPP;
>    }
> }
> 
> or
> 
> void ovpn_encrypt_post(void *data, int ret)
> {
>    ...
>    switch (peer->sock->sock->sk->sk_protocol) {
>    case IPPROTO_UDP:
>      ovpn_udp_send_skb(peer->ovpn, peer, skb);
>      break;
>    case IPPROTO_TCP:
>      ovpn_tcp_send_skb(peer, skb);
>      break;
>    default:
>      /* no transport configured yet */
>      goto err;
>    }
>    ...
> }
> 
> or
> 
> void ovpn_peer_keepalive_work(...)
> {
>    ...
>    switch (ovpn->mode) {
>    case OVPN_MODE_MP:
>      next_run = ovpn_peer_keepalive_work_mp(ovpn, now);
>      break;
>    case OVPN_MODE_P2P:
>      next_run = ovpn_peer_keepalive_work_p2p(ovpn, now);
>      break;
>    }
>    ...
> }
> 
> Did you consider to implement mode specific operations as a set of 
> operations like this:
> 
> ovpn_ops {
>    int (*peer_add)(struct ovpn_struct *ovpn, struct ovpn_peer *peer);
>    int (*peer_del)(struct ovpn_peer *peer, enum ovpn_del_peer_reason 
> reason);
>    void (*send_skb)(struct ovpn_peer *peer, struct sk_buff *skb);
>    time64_t (*keepalive_work)(...);
> };
> 
> Initialize them during the interface creation and invoke these 
> operations indirectly. E.g.
> 
> int ovpn_peer_add(struct ovpn_struct *ovpn, struct ovpn_peer *peer)
> {
>    return ovpn->ops->peer_add(ovpn, peer);
> }
> 
> void ovpn_encrypt_post(void *data, int ret)
> {
>    ...
>    ovpn->ops->send_skb(peer, skb);
>    ...
> }
> 
> void ovpn_peer_keepalive_work(...)
> {
>    ...
>    next_run = ovpn->ops->keepalive_work(ovpn, now);
>    ...
> }
> 
> Anyway the module has all these option values in advance during the 
> network interface creation phase and I believe replacing 'switch' 
> statements with indirect calls can make code easy to read.

I see this was already discussed with Sabrina under another patch and I 
have the same opinion.

To me the switch/case approach looks cleaner and I truly like it, 
especially when enums are involved.

ops/callbacks are fine when they can be redefined at runtime (i.e. a 
proto that can be registered by another module), but this is not the 
case here.
I also feel that with ops it's not easy to understand what call is truly 
being made by just looking at the caller context and reading can be harder.

So I truly prefer to stick to this schema.

Thanks a lot for sharing your point though.

Regards,

Sergey Ryazanov Nov. 14, 2024, 10:10 p.m. UTC | #42

On 14.11.2024 17:33, Antonio Quartulli wrote:
> On 06/11/2024 02:18, Sergey Ryazanov wrote:
>> Regarding "big" topics I have only two concerns: link creation using 
>> RTNL and a switch statement usage. In the corresponding thread, I 
>> asked Jiri to clarify that "should" regarding .newlink implementation. 
>> Hope he will have a chance to find a time to reply.
> 
> True, but to be honest at this point I am fine with sticking to RTNL, 
> also because we will soon introduce the ability to create 'persistent' 
> ifaces, which a user should be able to create before starting openvpn.

Could you share the use case for this functionality?

> Going through RTNL for this is the best choice IMHO, therefore we have 
> an extra use case in favour of this approach (next to what Jiri already 
> mentioned).

In absence of arguments it's hard to understand, what's the "best" 
meaning. So, I'm still not sure is it worth to split uAPI between two 
interfaces. Anyway, it's up to maintainers to decide is it mergeable in 
this form or not. I just shared some arguments for the full management 
interface in GENL.

--
Sergey

Sergey Ryazanov Nov. 14, 2024, 10:57 p.m. UTC | #43

On 14.11.2024 10:07, Antonio Quartulli wrote:
> On 12/11/2024 17:47, Sabrina Dubroca wrote:
>> 2024-11-09, 03:01:21 +0200, Sergey Ryazanov wrote:
>>> On 29.10.2024 12:47, Antonio Quartulli wrote:
>>>> +/* When the OpenVPN protocol is ran in AEAD mode, use
>>>> + * the OpenVPN packet ID as the AEAD nonce:
>>>> + *
>>>> + *    00000005 521c3b01 4308c041
>>>> + *    [seq # ] [  nonce_tail   ]
>>>> + *    [     12-byte full IV    ] -> NONCE_SIZE
>>>> + *    [4-bytes                   -> NONCE_WIRE_SIZE
>>>> + *    on wire]
>>>> + */
>>>
>>> Nice diagram! Can we go futher and define the OpenVPN packet header as a
>>> stucture? Referencing the structure instead of using magic sizes and 
>>> offsets
>>> can greatly improve the code readability. Especially when it comes to 
>>> header
>>> construction/parsing in the encryption/decryption code.
>>>
>>> E.g. define a structures like this:
>>>
>>> struct ovpn_pkt_hdr {
>>>    __be32 op;
>>>    __be32 pktid;
>>>    u8 auth[];
>>> } __attribute__((packed));
>>>
>>> struct ovpn_aead_iv {
>>>    __be32 pktid;
>>>    u8 nonce[OVPN_NONCE_TAIL_SIZE];
>>> } __attribute__((packed));
>>
>> __attribute__((packed)) should not be needed here as the fields in
>> both structs look properly aligned, and IIRC using packed can cause
>> the compiler to generate worse code.
> 
> Agreed. Using packed will make certain architecture read every field 
> byte by byte (I remember David M. biting us on this in batman-adv :))

Still curious to see an example of that strange architecture/compiler 
combination. Anyway, as Sabrina mentioned, the header is already pretty 
aligned. So it's up to you how to document the structure.

> This said, I like the idea of using a struct, but I don't feel confident 
> enough to change the code now that we are hitting v12.
> This kind of change will be better implemented later and tested 
> carefully. (and patches are always welcome! :))

The main reason behind the structure introduction is to improve the code 
readability. To reduce a shadow, where bugs can reside. I wonder how 
many people have invested their time to dig through the encryption 
preparation function?

As for risk of breaking something I should say that it can be addressed 
by connecting the kernel implementation to pure usespace implementation, 
which can be assumed the reference. And, I believe, it worth the benefit 
of merging easy to understand code.

--
Sergey

Antonio Quartulli Nov. 15, 2024, 1 p.m. UTC | #44

On 09/11/2024 02:01, Sergey Ryazanov wrote:
> On 29.10.2024 12:47, Antonio Quartulli wrote:
>> Add basic infrastructure for handling ovpn interfaces.
>>
>> Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
>> ---
>>   drivers/net/ovpn/main.c       | 115 ++++++++++++++++++++++++++++++++ 
>> ++++++++--
>>   drivers/net/ovpn/main.h       |   7 +++
>>   drivers/net/ovpn/ovpnstruct.h |   8 +++
>>   drivers/net/ovpn/packet.h     |  40 +++++++++++++++
>>   include/uapi/linux/if_link.h  |  15 ++++++
>>   5 files changed, 180 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
>> index 
>> d5bdb0055f4dd3a6e32dc6e792bed1e7fd59e101..eead7677b8239eb3c48bb26ca95492d88512b8d4 100644
>> --- a/drivers/net/ovpn/main.c
>> +++ b/drivers/net/ovpn/main.c
>> @@ -10,18 +10,52 @@
>>   #include <linux/genetlink.h>
>>   #include <linux/module.h>
>>   #include <linux/netdevice.h>
>> +#include <linux/inetdevice.h>
>> +#include <net/ip.h>
>>   #include <net/rtnetlink.h>
>> -#include <uapi/linux/ovpn.h>
>> +#include <uapi/linux/if_arp.h>
>>   #include "ovpnstruct.h"
>>   #include "main.h"
>>   #include "netlink.h"
>>   #include "io.h"
>> +#include "packet.h"
>>   /* Driver info */
>>   #define DRV_DESCRIPTION    "OpenVPN data channel offload (ovpn)"
>>   #define DRV_COPYRIGHT    "(C) 2020-2024 OpenVPN, Inc."
>> +static void ovpn_struct_free(struct net_device *net)
>> +{
>> +}
> 
> nit: since this handler is not mandatory, its introduction can be moved 
> to the later patch, which actually fills it with meaningful operations.


ehmm sure I will move it



> 
>> +static int ovpn_net_open(struct net_device *dev)
>> +{
>> +    netif_tx_start_all_queues(dev);
>> +    return 0;
>> +}
>> +
>> +static int ovpn_net_stop(struct net_device *dev)
>> +{
>> +    netif_tx_stop_all_queues(dev);
>> +    return 0;
>> +}
>> +
>> +static const struct net_device_ops ovpn_netdev_ops = {
>> +    .ndo_open        = ovpn_net_open,
>> +    .ndo_stop        = ovpn_net_stop,
>> +    .ndo_start_xmit        = ovpn_net_xmit,
>> +};
>> +
>> +static const struct device_type ovpn_type = {
>> +    .name = OVPN_FAMILY_NAME,
> 
> nit: same question here regarding name deriviation. Are you sure that 
> the device type name is the same as the GENL family name?

Like I said in the previous patch, I want all representative strings to 
be "ovpn", that is already the netlink family name.
But I can create another constant to document this explicitly.


> 
>> +};
>> +
>> +static const struct nla_policy ovpn_policy[IFLA_OVPN_MAX + 1] = {
>> +    [IFLA_OVPN_MODE] = NLA_POLICY_RANGE(NLA_U8, OVPN_MODE_P2P,
>> +                        OVPN_MODE_MP),
>> +};
>> +
>>   /**
>>    * ovpn_dev_is_valid - check if the netdevice is of type 'ovpn'
>>    * @dev: the interface to check
>> @@ -33,16 +67,76 @@ bool ovpn_dev_is_valid(const struct net_device *dev)
>>       return dev->netdev_ops->ndo_start_xmit == ovpn_net_xmit;
>>   }
>> +static void ovpn_setup(struct net_device *dev)
>> +{
>> +    /* compute the overhead considering AEAD encryption */
>> +    const int overhead = sizeof(u32) + NONCE_WIRE_SIZE + 16 +
> 
> Where are these magic sizeof(u32) and '16' came from?

It's in the "nice diagram" you commented later in this patch :-)
But I can extend the comment.

[...]


>> @@ -51,26 +145,37 @@ static int ovpn_netdev_notifier_call(struct 
>> notifier_block *nb,
>>                        unsigned long state, void *ptr)
>>   {
>>       struct net_device *dev = netdev_notifier_info_to_dev(ptr);
>> +    struct ovpn_struct *ovpn;
>>       if (!ovpn_dev_is_valid(dev))
>>           return NOTIFY_DONE;
>> +    ovpn = netdev_priv(dev);
> 
> nit: netdev_priv() returns only a pointer, it is safe to fetch the 
> pointer in advance, but do not dereference it until we are sure that an 
> event references the desired interface type. Takin this into 
> consideration, the assignment of private data pointer can be moved above 
> to the variable declaration. Just to make code couple of lines shorter.

I do it here because it seems more "logically correct" to retrieve the 
priv pointer after having confirmed that this is a ovpn interface with 
ovpn_dev_is_valid().

Moving it above kinda says "I already know there is a ovpn object here", 
but this is not the case until after the valid() check. So I prefer to 
keep it here.

[...]

>> --- a/drivers/net/ovpn/main.h
>> +++ b/drivers/net/ovpn/main.h
>> @@ -12,4 +12,11 @@
>>   bool ovpn_dev_is_valid(const struct net_device *dev);
>> +#define SKB_HEADER_LEN                                       \
>> +    (max(sizeof(struct iphdr), sizeof(struct ipv6hdr)) + \
>> +     sizeof(struct udphdr) + NET_SKB_PAD)
>> +
>> +#define OVPN_HEAD_ROOM ALIGN(16 + SKB_HEADER_LEN, 4)
> 
> Where is this magic '16' came from?

should be the same 16 af the over head above (it's the auth tag len)
Will make this more explicit with a comment.

> 
>> +#define OVPN_MAX_PADDING 16
>> +
>>   #endif /* _NET_OVPN_MAIN_H_ */
>> diff --git a/drivers/net/ovpn/ovpnstruct.h b/drivers/net/ovpn/ 
>> ovpnstruct.h
>> index 
>> e3e4df6418b081436378fc51d98db5bd7b5d1fbe..211df871538d34fdff90d182f21a0b0fb11b28ad 100644
>> --- a/drivers/net/ovpn/ovpnstruct.h
>> +++ b/drivers/net/ovpn/ovpnstruct.h
>> @@ -11,15 +11,23 @@
>>   #define _NET_OVPN_OVPNSTRUCT_H_
>>   #include <net/net_trackers.h>
>> +#include <uapi/linux/if_link.h>
>> +#include <uapi/linux/ovpn.h>
>>   /**
>>    * struct ovpn_struct - per ovpn interface state
>>    * @dev: the actual netdev representing the tunnel
>>    * @dev_tracker: reference tracker for associated dev
>> + * @registered: whether dev is still registered with netdev or not
>> + * @mode: device operation mode (i.e. p2p, mp, ..)
>> + * @dev_list: entry for the module wide device list
>>    */
>>   struct ovpn_struct {
>>       struct net_device *dev;
>>       netdevice_tracker dev_tracker;
>> +    bool registered;
>> +    enum ovpn_mode mode;
>> +    struct list_head dev_list;
> 
> dev_list is no more used and should be deleted.

ACK

[...]

>> +
>> +/* OpenVPN nonce size */
>> +#define NONCE_SIZE 12
> 
> nit: is using the common 'OVPN_' prefix here and for other constants any 
> good idea? E.g. OVPN_NONCE_SIZE. It can give some hints where it comes 
> from for a code reader.

ACK

> 
> And another one question. Could you clarify in the comment to this 
> constant where it came from? AFAIU, these 12 bytes is the expectation of 
> the nonce size of AEAD crypto protocol, rigth?

Correct: 12bytes/96bits. Will extend the comment.

> 
>> +
>> +/* OpenVPN nonce size reduced by 8-byte nonce tail -- this is the
>> + * size of the AEAD Associated Data (AD) sent over the wire
>> + * and is normally the head of the IV
>> + */
>> +#define NONCE_WIRE_SIZE (NONCE_SIZE - sizeof(struct ovpn_nonce_tail))
> 
> If the headers and IV are defined as structures, we no more need this 
> constant since the header construction will be done by a compiler 
> according to the structure layout.

yap yap. Will do this later as explained in the other email.

> 
>> +/* Last 8 bytes of AEAD nonce
>> + * Provided by userspace and usually derived from
>> + * key material generated during TLS handshake
>> + */
>> +struct ovpn_nonce_tail {
>> +    u8 u8[OVPN_NONCE_TAIL_SIZE];
>> +};
> 
> Why do you need a dadicated structure for this array? Can we declare the 
> corresponding fields like this:
> 
> u8 nonce_tail_xmit[OVPN_NONCE_TAIL_SIZE];
> u8 nonce_tail_recv[OVPN_NONCE_TAIL_SIZE];
> 

I think the original reason was to have something to pass to sizeof() 
without making it harder for the reader.

At some point I also wanted to get rid of the struct,but something 
stopped me. Not sure what was though. Will give it a try.


Thanks a lot.
Regards,

Antonio Quartulli Nov. 15, 2024, 1:45 p.m. UTC | #45

On 14/11/2024 23:57, Sergey Ryazanov wrote:
> On 14.11.2024 10:07, Antonio Quartulli wrote:
>> On 12/11/2024 17:47, Sabrina Dubroca wrote:
>>> 2024-11-09, 03:01:21 +0200, Sergey Ryazanov wrote:
>>>> On 29.10.2024 12:47, Antonio Quartulli wrote:
>>>>> +/* When the OpenVPN protocol is ran in AEAD mode, use
>>>>> + * the OpenVPN packet ID as the AEAD nonce:
>>>>> + *
>>>>> + *    00000005 521c3b01 4308c041
>>>>> + *    [seq # ] [  nonce_tail   ]
>>>>> + *    [     12-byte full IV    ] -> NONCE_SIZE
>>>>> + *    [4-bytes                   -> NONCE_WIRE_SIZE
>>>>> + *    on wire]
>>>>> + */
>>>>
>>>> Nice diagram! Can we go futher and define the OpenVPN packet header 
>>>> as a
>>>> stucture? Referencing the structure instead of using magic sizes and 
>>>> offsets
>>>> can greatly improve the code readability. Especially when it comes 
>>>> to header
>>>> construction/parsing in the encryption/decryption code.
>>>>
>>>> E.g. define a structures like this:
>>>>
>>>> struct ovpn_pkt_hdr {
>>>>    __be32 op;
>>>>    __be32 pktid;
>>>>    u8 auth[];
>>>> } __attribute__((packed));
>>>>
>>>> struct ovpn_aead_iv {
>>>>    __be32 pktid;
>>>>    u8 nonce[OVPN_NONCE_TAIL_SIZE];
>>>> } __attribute__((packed));
>>>
>>> __attribute__((packed)) should not be needed here as the fields in
>>> both structs look properly aligned, and IIRC using packed can cause
>>> the compiler to generate worse code.
>>
>> Agreed. Using packed will make certain architecture read every field 
>> byte by byte (I remember David M. biting us on this in batman-adv :))
> 
> Still curious to see an example of that strange architecture/compiler 
> combination. Anyway, as Sabrina mentioned, the header is already pretty 
> aligned. So it's up to you how to document the structure.

IIRC MIPS was one of those, but don't take my word for granted.

> 
>> This said, I like the idea of using a struct, but I don't feel 
>> confident enough to change the code now that we are hitting v12.
>> This kind of change will be better implemented later and tested 
>> carefully. (and patches are always welcome! :))
> 
> The main reason behind the structure introduction is to improve the code 
> readability. To reduce a shadow, where bugs can reside. I wonder how 
> many people have invested their time to dig through the encryption 
> preparation function?
> 
> As for risk of breaking something I should say that it can be addressed 
> by connecting the kernel implementation to pure usespace implementation, 
> which can be assumed the reference. And, I believe, it worth the benefit 
> of merging easy to understand code.
> 

I understand your point, but this is something I need to spend time on 
because the openvpn packet format is not "very stable", as in "it can 
vary depending on negotiated features".

When implementing ovpn I decided what was the supported set of features 
so to create a stable packet header, but this may change moving forward 
(there is already some work going on in userspace regarding new features 
that ovpn will have to support).
Therefore I want to take some time thinking about what's best.

Regards,

Antonio Quartulli Nov. 15, 2024, 2:03 p.m. UTC | #46

On 10/11/2024 21:42, Sergey Ryazanov wrote:
> Missed the most essential note regarding this patch :)
> 
> On 29.10.2024 12:47, Antonio Quartulli wrote:
>> +static int ovpn_net_open(struct net_device *dev)
>> +{
>> +    netif_tx_start_all_queues(dev);
>> +    return 0;
>> +}
>> +
>> +static int ovpn_net_stop(struct net_device *dev)
>> +{
>> +    netif_tx_stop_all_queues(dev);
> 
> Here we stop a user generated traffic in downlink. Shall we take care 
> about other kinds of traffic: keepalive, uplink?

Keepalive is "metadata" and should continue to flow, regardless of 
whether the user interface is brought down.

Uplink traffic directed to *this* device should just be dropped at 
delivery time.

Incoming traffic directed to other peers will continue to work.

> 
> I believe we should remove all the peers here or at least stop the 
> keepalive generation. But peers removing is better since 
> administratively down is administratively down, meaning user expected 
> full traffic stop in any direction. And even if we only stop the 
> keepalive generation then peer(s) anyway will destroy the tunnel on 
> their side.

Uhm, I don't think the user expects all "protocol" traffic (and client 
to client) to stop by simply bringing down the interface.

> 
> This way we even should not care about peers removing on the device 
> unregistering. What do you think?

I think you are now mixing data plane and control plane.

The fact that the user is stopping payload traffic does not imply we 
want to stop the VPN.
The user may just be doing something with the interface (and on an MP 
node client-to-client traffic will still continue to flow).

This would also be a non-negligible (and user faving) change in 
behaviour compared to the current openvpn implementation.

Thanks for your input though, I can imagine coming from different angles 
things may look not the same.

Regards,

> 
>> +    return 0;
>> +}

Antonio Quartulli Nov. 15, 2024, 2:13 p.m. UTC | #47

On 09/11/2024 02:11, Sergey Ryazanov wrote:
> On 29.10.2024 12:47, Antonio Quartulli wrote:
>> An ovpn interface will keep carrier always on and let the user
>> decide when an interface should be considered disconnected.
>>
>> This way, even if an ovpn interface is not connected to any peer,
>> it can still retain all IPs and routes and thus prevent any data
>> leak.
>>
>> Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
>> Reviewed-by: Andrew Lunn <andrew@lunn.ch>
>> ---
>>   drivers/net/ovpn/main.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
>> index 
>> eead7677b8239eb3c48bb26ca95492d88512b8d4..eaa83a8662e4ac2c758201008268f9633643c0b6 100644
>> --- a/drivers/net/ovpn/main.c
>> +++ b/drivers/net/ovpn/main.c
>> @@ -31,6 +31,13 @@ static void ovpn_struct_free(struct net_device *net)
>>   static int ovpn_net_open(struct net_device *dev)
>>   {
>> +    /* ovpn keeps the carrier always on to avoid losing IP or route
>> +     * configuration upon disconnection. This way it can prevent leaks
>> +     * of traffic outside of the VPN tunnel.
>> +     * The user may override this behaviour by tearing down the 
>> interface
>> +     * manually.
>> +     */
>> +    netif_carrier_on(dev);
> 
> If a user cares about the traffic leaking, then he can create a 
> blackhole route with huge metric:
> 
> # ip route add blackhole default metric 10000
> 
> Why the network interface should implicitly provide this functionality? 
> And on another hand, how a routing daemon can learn a topology change 
> without indication from the interface?

This was discussed loooong ago with Andrew. Here my last response:

https://lore.kernel.org/all/d896bbd8-2709-4834-a637-f982fc51fc57@openvpn.net/


Regards,

> 
>>       netif_tx_start_all_queues(dev);
>>       return 0;
>>   }
>>
>

Antonio Quartulli Nov. 15, 2024, 3:02 p.m. UTC | #48

On 11/11/2024 02:54, Sergey Ryazanov wrote:
[...]

>> +/* Called after decrypt to write the IP packet to the device.
>> + * This method is expected to manage/free the skb.
>> + */
>> +static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff 
>> *skb)
>> +{
>> +    unsigned int pkt_len;
>> +
>> +    /* we can't guarantee the packet wasn't corrupted before entering 
>> the
>> +     * VPN, therefore we give other layers a chance to check that
>> +     */
>> +    skb->ip_summed = CHECKSUM_NONE;
>> +
>> +    /* skb hash for transport packet no longer valid after 
>> decapsulation */
>> +    skb_clear_hash(skb);
>> +
>> +    /* post-decrypt scrub -- prepare to inject encapsulated packet 
>> onto the
>> +     * interface, based on __skb_tunnel_rx() in dst.h
>> +     */
>> +    skb->dev = peer->ovpn->dev;
>> +    skb_set_queue_mapping(skb, 0);
>> +    skb_scrub_packet(skb, true);
>> +
> 
> The skb->protocol field is going to be updated in the upcoming patch in 
> the caller (ovpn_decrypt_post). Shall we put a comment here clarifying, 
> why do not touch the protocol field here?

Well, I would personally not document missing details in a partly 
implemented code path.

> 
>> +    skb_reset_network_header(skb);
> 
> ovpn_decrypt_post() already reseted the network header. Why do we need 
> it here again?

yeah, I think this can be removed.

> 
>> +    skb_reset_transport_header(skb);
>> +    skb_probe_transport_header(skb);
>> +    skb_reset_inner_headers(skb);
>> +
>> +    memset(skb->cb, 0, sizeof(skb->cb));
> 
> Why do we need to zero the control buffer here?

To avoid the next layer to assume the cb is clean while it is not.
Other drivers do the same as well.

I think this was recommended by Sabrina as well.

> 
>> +    /* cause packet to be "received" by the interface */
>> +    pkt_len = skb->len;
>> +    if (likely(gro_cells_receive(&peer->ovpn->gro_cells,
>> +                     skb) == NET_RX_SUCCESS))
>> +        /* update RX stats with the size of decrypted packet */
>> +        dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len);
>> +}
>> +
>> +static void ovpn_decrypt_post(struct sk_buff *skb, int ret)
>> +{
>> +    struct ovpn_peer *peer = ovpn_skb_cb(skb)->peer;
>> +
>> +    if (unlikely(ret < 0))
>> +        goto drop;
>> +
>> +    ovpn_netdev_write(peer, skb);
>> +    /* skb is passed to upper layer - don't free it */
>> +    skb = NULL;
>> +drop:
>> +    if (unlikely(skb))
>> +        dev_core_stats_rx_dropped_inc(peer->ovpn->dev);
>> +    ovpn_peer_put(peer);
>> +    kfree_skb(skb);
>> +}
>> +
>> +/* pick next packet from RX queue, decrypt and forward it to the 
>> device */
> 
> The function now receives packets from externel callers. Should we 
> update the above comment?

yap will do.

[...]

>> --- /dev/null
>> +++ b/drivers/net/ovpn/proto.h
>> @@ -0,0 +1,75 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*  OpenVPN data channel offload
>> + *
>> + *  Copyright (C) 2020-2024 OpenVPN, Inc.
>> + *
>> + *  Author:    Antonio Quartulli <antonio@openvpn.net>
>> + *        James Yonan <james@openvpn.net>
>> + */
>> +
>> +#ifndef _NET_OVPN_OVPNPROTO_H_
>> +#define _NET_OVPN_OVPNPROTO_H_
>> +
>> +#include "main.h"
>> +
>> +#include <linux/skbuff.h>
>> +
>> +/* Methods for operating on the initial command
>> + * byte of the OpenVPN protocol.
>> + */
>> +
>> +/* packet opcode (high 5 bits) and key-id (low 3 bits) are combined in
>> + * one byte
>> + */
>> +#define OVPN_KEY_ID_MASK 0x07
>> +#define OVPN_OPCODE_SHIFT 3
>> +#define OVPN_OPCODE_MASK 0x1F
> 
> Instead of defining mask(s) and shift(s), we can define only masks and 
> use bitfield API (see below).
> 
>> +/* upper bounds on opcode and key ID */
>> +#define OVPN_KEY_ID_MAX (OVPN_KEY_ID_MASK + 1)
>> +#define OVPN_OPCODE_MAX (OVPN_OPCODE_MASK + 1)
>> +/* packet opcodes of interest to us */
>> +#define OVPN_DATA_V1 6 /* data channel V1 packet */
>> +#define OVPN_DATA_V2 9 /* data channel V2 packet */
>> +/* size of initial packet opcode */
>> +#define OVPN_OP_SIZE_V1 1
>> +#define OVPN_OP_SIZE_V2    4
>> +#define OVPN_PEER_ID_MASK 0x00FFFFFF
>> +#define OVPN_PEER_ID_UNDEF 0x00FFFFFF
>> +/* first byte of keepalive message */
>> +#define OVPN_KEEPALIVE_FIRST_BYTE 0x2a
>> +/* first byte of exit message */
>> +#define OVPN_EXPLICIT_EXIT_NOTIFY_FIRST_BYTE 0x28
> 
>  From the above list of macros, OVPN_KEY_ID_MAX, OVPN_OPCODE_MAX, 
> OVPN_OP_SIZE_V1, OVPN_KEEPALIVE_FIRST_BYTE, and 
> OVPN_EXPLICIT_EXIT_NOTIFY_FIRST_BYTE are unused and looks like should be 
> removed.

ACK

> 
>> +/**
>> + * ovpn_opcode_from_skb - extract OP code from skb at specified offset
>> + * @skb: the packet to extract the OP code from
>> + * @offset: the offset in the data buffer where the OP code is located
>> + *
>> + * Note: this function assumes that the skb head was pulled enough
>> + * to access the first byte.
>> + *
>> + * Return: the OP code
>> + */
>> +static inline u8 ovpn_opcode_from_skb(const struct sk_buff *skb, u16 
>> offset)
>> +{
>> +    u8 byte = *(skb->data + offset);
>> +
>> +    return byte >> OVPN_OPCODE_SHIFT;
> 
> For example here, the shift can be replaced with bitfield macro:
> 
> #define OVPN_OPCODE_PKTTYPE_MSK  0xf8000000
> #define OVPN_OPCODE_KEYID_MSK    0x07000000
> #define OVPN_OPCODE_PEERID_MSK   0x00ffffff
> 
> static inline u8 ovpn_opcode_from_skb(...)
> {
>      u32 opcode = be32_to_cpu(*(__be32 *)(skb->data + offset));
> 
>      return FIELD_GET(OVPN_OPCODE_PKTTYPE_MSK, opcode);
> }
> 
> And the upcoming ovpn_opcode_compose() can be implemented like this:
> 
> static inline u32 ovpn_opcode_compose(u8 opcode, u8 key_id, u32 peer_id)
> {
>      return FIELD_PREP(OVPN_OPCODE_PKTTYPE_MSK, opcode) |
>             FIELD_PREP(OVPN_OPCODE_KEYID_MSK, key_id) |
>             FIELD_PREP(OVPN_OPCODE_PEERID_MSK, peer_id);
> }
> 
> And with this size can be even embedded into ovpn_aead_encrypt() to make 
> the header composing more clear.

I wasn't aware of the bitfield API.

Yeah, it looks cleaner and gives a better definition of the first 4 
bytes of the header.

There is also GENMASK() that helps with creating MASKs instead of 
hardcofing the bits in hex.

Will give it a try, thanks!

> 
>> +}
>> +
>> +/**
>> + * ovpn_peer_id_from_skb - extract peer ID from skb at specified offset
>> + * @skb: the packet to extract the OP code from
>> + * @offset: the offset in the data buffer where the OP code is located
>> + *
>> + * Note: this function assumes that the skb head was pulled enough
>> + * to access the first 4 bytes.
>> + *
>> + * Return: the peer ID.
>> + */
>> +static inline u32 ovpn_peer_id_from_skb(const struct sk_buff *skb, 
>> u16 offset)
>> +{
>> +    return ntohl(*(__be32 *)(skb->data + offset)) & OVPN_PEER_ID_MASK;
>> +}
>> +
>> +#endif /* _NET_OVPN_OVPNPROTO_H_ */
>> diff --git a/drivers/net/ovpn/socket.c b/drivers/net/ovpn/socket.c
>> index 
>> 090a3232ab0ec19702110f1a90f45c7f10889f6f..964b566de69f4132806a969a455cec7f6059a0bd 100644
>> --- a/drivers/net/ovpn/socket.c
>> +++ b/drivers/net/ovpn/socket.c
>> @@ -22,6 +22,9 @@ static void ovpn_socket_detach(struct socket *sock)
>>       if (!sock)
>>           return;
>> +    if (sock->sk->sk_protocol == IPPROTO_UDP)
>> +        ovpn_udp_socket_detach(sock);
>> +
>>       sockfd_put(sock);
>>   }
>> @@ -71,6 +74,27 @@ static int ovpn_socket_attach(struct socket *sock, 
>> struct ovpn_peer *peer)
>>       return ret;
>>   }
>> +/* Retrieve the corresponding ovpn object from a UDP socket
>> + * rcu_read_lock must be held on entry
>> + */
>> +struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk)
>> +{
>> +    struct ovpn_socket *ovpn_sock;
>> +
>> +    if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) != 
>> UDP_ENCAP_OVPNINUDP))
>> +        return NULL;
>> +
>> +    ovpn_sock = rcu_dereference_sk_user_data(sk);
>> +    if (unlikely(!ovpn_sock))
>> +        return NULL;
>> +
>> +    /* make sure that sk matches our stored transport socket */
>> +    if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk))
>> +        return NULL;
>> +
>> +    return ovpn_sock->ovpn;
> 
> Now, returning of this pointer is safe. But the following TCP transport 
> support calls the socket release via a scheduled work. What extends 
> socket lifetime and makes it possible to receive a UDP packet way after 
> the interface private data release. Is it correct assumption?

Sorry you lost me when sayng "following *TCP* transp[ort support calls".
This function is invoked only in UDP context.
Was that a typ0?

> 
> If the above is right then shall we set ->ovpn = NULL before scheduling 
> the socket releasing work or somehow else mark the socket as half- 
> destroyed?
> 
>> +}
>> +
>>   /**
>>    * ovpn_socket_new - create a new socket and initialize it
>>    * @sock: the kernel socket to embed
>> diff --git a/drivers/net/ovpn/udp.c b/drivers/net/ovpn/udp.c
>> index 
>> d26d7566e9c8dfe91fa77f49c34fb179a9fb2239..d1e88ae83843f02d591e67a7995f2d6868720695 100644
>> --- a/drivers/net/ovpn/udp.c
>> +++ b/drivers/net/ovpn/udp.c
>> @@ -21,9 +21,95 @@
>>   #include "bind.h"
>>   #include "io.h"
>>   #include "peer.h"
>> +#include "proto.h"
>>   #include "socket.h"
>>   #include "udp.h"
>> +/**
>> + * ovpn_udp_encap_recv - Start processing a received UDP packet.
>> + * @sk: socket over which the packet was received
>> + * @skb: the received packet
>> + *
>> + * If the first byte of the payload is DATA_V2, the packet is further 
>> processed,
>> + * otherwise it is forwarded to the UDP stack for delivery to user 
>> space.
>> + *
>> + * Return:
>> + *  0 if skb was consumed or dropped
>> + * >0 if skb should be passed up to userspace as UDP (packet not 
>> consumed)
>> + * <0 if skb should be resubmitted as proto -N (packet not consumed)
>> + */
>> +static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
>> +{
>> +    struct ovpn_peer *peer = NULL;
>> +    struct ovpn_struct *ovpn;
>> +    u32 peer_id;
>> +    u8 opcode;
>> +
>> +    ovpn = ovpn_from_udp_sock(sk);
>> +    if (unlikely(!ovpn)) {
>> +        net_err_ratelimited("%s: cannot obtain ovpn object from UDP 
>> socket\n",
>> +                    __func__);
> 
> Probably we should zero ovpn pointer in the ovpn_sock to survive 
> scheduled socket release (see comment in ovpn_from_udp_sock). So, this 
> print should be removed to avoid printing misguiding errors.

I am also not following this. ovpn is already NULL if we are entering 
this branch, no?

And I think this condition is quite improbable as well.

> 
>> +        goto drop_noovpn;
>> +    }
>> +
>> +    /* Make sure the first 4 bytes of the skb data buffer after the UDP
>> +     * header are accessible.
>> +     * They are required to fetch the OP code, the key ID and the 
>> peer ID.
>> +     */
>> +    if (unlikely(!pskb_may_pull(skb, sizeof(struct udphdr) +
>> +                    OVPN_OP_SIZE_V2))) {
>> +        net_dbg_ratelimited("%s: packet too small\n", __func__);
>> +        goto drop;
>> +    }
>> +
>> +    opcode = ovpn_opcode_from_skb(skb, sizeof(struct udphdr));
>> +    if (unlikely(opcode != OVPN_DATA_V2)) {
>> +        /* DATA_V1 is not supported */
>> +        if (opcode == OVPN_DATA_V1)
>> +            goto drop;
> 
> This packet dropping makes protocol accelerator, intendent to speed up 
> the data packets processing, a protocol enforcement entity, isn't it? 
> Shall we follow the principle of beeing liberal in what we accept and 
> just forward everything besides data packets upstream to a userspace 
> application?

'ovpn' only supports DATA_V2. When ovpn is in use userspace does nto 
expect any DATA packet to bubble up as it would not know what to do with it.

So any decision regarding data packets should stay in 'ovpn'.

We just decided to support the modern DATA_V2 (DATA_V1 is seldomly used 
nowadays).

Moreover, it's quite impossible that a peer will send us DATA_V1 if it 
passed userspace handshake and negotiation.

> 
>> +
>> +        /* unknown or control packet: let it bubble up to userspace */
>> +        return 1;
>> +    }
>> +
>> +    peer_id = ovpn_peer_id_from_skb(skb, sizeof(struct udphdr));
>> +    /* some OpenVPN server implementations send data packets with the
>> +     * peer-id set to undef. In this case we skip the peer lookup by 
>> peer-id
>> +     * and we try with the transport address
>> +     */
>> +    if (peer_id != OVPN_PEER_ID_UNDEF) {
>> +        peer = ovpn_peer_get_by_id(ovpn, peer_id);
>> +        if (!peer) {
>> +            net_err_ratelimited("%s: received data from unknown peer 
>> (id: %d)\n",
>> +                        __func__, peer_id);
> 
> Why do we consider a peer sending us garbage our problem? Meaning, this 
> peer miss can be not our fault but a malformed packet from a 3rd party 
> side. E.g. nowdays I can see a lot of traces of these "active probers" 
> in my OpenVPN logs. Shall remove this message or at least make it debug 
> to avoid bothering users with garbage traveling Internet? Anyway we can 
> not do anything regarding incoming traffic.

It could also be a peer that believes to be connected while 'ovpn' 
dropped it earlier on. So this message would help the admin/user 
understanding what's going on. no?

Maybe make it an info/notice instead of error?

> 
>> +            goto drop;
>> +        }
>> +    }
>> +
>> +    if (!peer) {
> 
> AFAIU, this condition can true only in case of peer_id beeing equal to 
> OVPN_PEER_ID_UNDEF, right? In this case the condition check can be 
> replaced by simple 'else' statement.
> 

This part was actually rewritten already, so better wait for v12 before 
further discussing.

> And to make code more corresponding to the above comment regarding 
> implementations that send undefined peer-id, can we swap sides of the 
> lookup method selection? E.g.
> 
> /* Comment about fancy implementations sending undefined peer-id */
> if (peer_id == OVPN_PEER_ID_UNDEF) {
>    /* Do transport address based loockup */
> } else {
>    /* Do peer-id based loockup */
> }
> 
>> +        /* data packet with undef peer-id */
>> +        peer = ovpn_peer_get_by_transp_addr(ovpn, skb);
>> +        if (unlikely(!peer)) {
>> +            net_dbg_ratelimited("%s: received data with undef peer-id 
>> from unknown source\n",
>> +                        __func__);
>> +            goto drop;
>> +        }
>> +    }
>> +
>> +    /* pop off outer UDP header */
>> +    __skb_pull(skb, sizeof(struct udphdr));
>> +    ovpn_recv(peer, skb);
>> +    return 0;
>> +
>> +drop:
>> +    if (peer)
>> +        ovpn_peer_put(peer);
> 
> AFAIU, the peer is alway NULL here. Shall we remove the above check?

yeah simplified as well already.

Thanks!

Regards,

Antonio Quartulli Nov. 15, 2024, 3:05 p.m. UTC | #49

On 12/11/2024 01:16, Sergey Ryazanov wrote:
> On 29.10.2024 12:47, Antonio Quartulli wrote:
>> +static void ovpn_netdev_write(struct ovpn_peer *peer, struct sk_buff 
>> *skb)
>> +{
>> +    unsigned int pkt_len;
>> +
>> +    /* we can't guarantee the packet wasn't corrupted before entering 
>> the
>> +     * VPN, therefore we give other layers a chance to check that
>> +     */
>> +    skb->ip_summed = CHECKSUM_NONE;
>> +
>> +    /* skb hash for transport packet no longer valid after 
>> decapsulation */
>> +    skb_clear_hash(skb);
>> +
>> +    /* post-decrypt scrub -- prepare to inject encapsulated packet 
>> onto the
>> +     * interface, based on __skb_tunnel_rx() in dst.h
>> +     */
>> +    skb->dev = peer->ovpn->dev;
>> +    skb_set_queue_mapping(skb, 0);
>> +    skb_scrub_packet(skb, true);
>> +
>> +    skb_reset_network_header(skb);
>> +    skb_reset_transport_header(skb);
>> +    skb_probe_transport_header(skb);
>> +    skb_reset_inner_headers(skb);
>> +
>> +    memset(skb->cb, 0, sizeof(skb->cb));
>> +
>> +    /* cause packet to be "received" by the interface */
>> +    pkt_len = skb->len;
>> +    if (likely(gro_cells_receive(&peer->ovpn->gro_cells,
>> +                     skb) == NET_RX_SUCCESS))
> 
> nit: to improve readability, the packet delivery call can be composed 
> like this:
> 
>        pkt_len = skb->len;
>        res = gro_cells_receive(&peer->ovpn->gro_cells, skb);
>        if (likely(res == NET_RX_SUCCESS))
> 

hm, you don't like calls on two lines? :-)

ok, will change it.

Regards,

>> +        /* update RX stats with the size of decrypted packet */
>> +        dev_sw_netstats_rx_add(peer->ovpn->dev, pkt_len);
>> +}

Antonio Quartulli Nov. 15, 2024, 3:08 p.m. UTC | #50

On 14/11/2024 23:10, Sergey Ryazanov wrote:
> On 14.11.2024 17:33, Antonio Quartulli wrote:
>> On 06/11/2024 02:18, Sergey Ryazanov wrote:
>>> Regarding "big" topics I have only two concerns: link creation using 
>>> RTNL and a switch statement usage. In the corresponding thread, I 
>>> asked Jiri to clarify that "should" regarding .newlink 
>>> implementation. Hope he will have a chance to find a time to reply.
>>
>> True, but to be honest at this point I am fine with sticking to RTNL, 
>> also because we will soon introduce the ability to create 'persistent' 
>> ifaces, which a user should be able to create before starting openvpn.
> 
> Could you share the use case for this functionality?

This is better asked to the users mailing list, but, for example, we 
recently had a discussion about this with the VyOS guys, where they want 
to create the interface and have it fit the various 
firewall/routing/chachacha logic, before any daemon is started.

In OpenVPN userspace we already support the --mktun directive to help 
with this specific use case, so it's a long existing use case.

> 
>> Going through RTNL for this is the best choice IMHO, therefore we have 
>> an extra use case in favour of this approach (next to what Jiri 
>> already mentioned).
> 
> In absence of arguments it's hard to understand, what's the "best" 
> meaning. 

well, that's why I added "IMHO" :)

> So, I'm still not sure is it worth to split uAPI between two 
> interfaces. Anyway, it's up to maintainers to decide is it mergeable in 
> this form or not. I just shared some arguments for the full management 
> interface in GENL.

well, doing it differently from all other virtual drivers also requires 
some important reason IMHO.

Anyway, I like the idea that iproute2 can be used to create interfaces, 
without the need to have another userspace tool for that.

Regards,

Sergey Ryazanov Nov. 19, 2024, 3:08 a.m. UTC | #51

On 15.11.2024 16:03, Antonio Quartulli wrote:
> On 10/11/2024 21:42, Sergey Ryazanov wrote:
>> Missed the most essential note regarding this patch :)
>>
>> On 29.10.2024 12:47, Antonio Quartulli wrote:
>>> +static int ovpn_net_open(struct net_device *dev)
>>> +{
>>> +    netif_tx_start_all_queues(dev);
>>> +    return 0;
>>> +}
>>> +
>>> +static int ovpn_net_stop(struct net_device *dev)
>>> +{
>>> +    netif_tx_stop_all_queues(dev);
>>
>> Here we stop a user generated traffic in downlink. Shall we take care 
>> about other kinds of traffic: keepalive, uplink?
> 
> Keepalive is "metadata" and should continue to flow, regardless of 
> whether the user interface is brought down.
> 
> Uplink traffic directed to *this* device should just be dropped at 
> delivery time.
> 
> Incoming traffic directed to other peers will continue to work.

How it's possible? AFAIU, the module uses the kernel IP routing 
subsystem. Putting the interface down will effectively block a 
client-to-client packet to reenter the interface.

>> I believe we should remove all the peers here or at least stop the 
>> keepalive generation. But peers removing is better since 
>> administratively down is administratively down, meaning user expected 
>> full traffic stop in any direction. And even if we only stop the 
>> keepalive generation then peer(s) anyway will destroy the tunnel on 
>> their side.
> 
> Uhm, I don't think the user expects all "protocol" traffic (and client 
> to client) to stop by simply bringing down the interface.
> 
>>
>> This way we even should not care about peers removing on the device 
>> unregistering. What do you think?
> 
> I think you are now mixing data plane and control plane.
> 
> The fact that the user is stopping payload traffic does not imply we 
> want to stop the VPN.
> The user may just be doing something with the interface (and on an MP 
> node client-to-client traffic will still continue to flow).
> 
> This would also be a non-negligible (and user faving) change in 
> behaviour compared to the current openvpn implementation.

It's not about previous implementation, it's about the interface 
management procedures. I just cannot image how the proposed approach can 
be aligned with RFC 2863 section 3.1.13. IfAdminStatus and IfOperStatus.

And if we are talking about a user experience, I cannot imagine my WLAN 
interface maintaining a connection to the access point after shutting it 
down. Or even better, a WLAN interface in the AP mode still forwarding 
traffic between wireless clients. Or a bridge interface switching 
traffic between ports and sending STP frames.

> Thanks for your input though, I can imagine coming from different angles 
> things may look not the same.

I believe nobody will mind if a userspace service will do a failover to 
continue serving connected clients. But from the kernel perspective, 
when user says 'ip link set down' the party is over.

--
Sergey

Antonio Quartulli Nov. 19, 2024, 8:45 a.m. UTC | #52

On 19/11/2024 04:08, Sergey Ryazanov wrote:
> On 15.11.2024 16:03, Antonio Quartulli wrote:
>> On 10/11/2024 21:42, Sergey Ryazanov wrote:
>>> Missed the most essential note regarding this patch :)
>>>
>>> On 29.10.2024 12:47, Antonio Quartulli wrote:
>>>> +static int ovpn_net_open(struct net_device *dev)
>>>> +{
>>>> +    netif_tx_start_all_queues(dev);
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static int ovpn_net_stop(struct net_device *dev)
>>>> +{
>>>> +    netif_tx_stop_all_queues(dev);
>>>
>>> Here we stop a user generated traffic in downlink. Shall we take care 
>>> about other kinds of traffic: keepalive, uplink?
>>
>> Keepalive is "metadata" and should continue to flow, regardless of 
>> whether the user interface is brought down.
>>
>> Uplink traffic directed to *this* device should just be dropped at 
>> delivery time.
>>
>> Incoming traffic directed to other peers will continue to work.
> 
> How it's possible? AFAIU, the module uses the kernel IP routing 
> subsystem. Putting the interface down will effectively block a client- 
> to-client packet to reenter the interface.

True.
At least part of the traffic is stopped (traffic directed to the VPN IP 
of a peer will still flow as it does not require a routing table lookup).

I circled this discussion through the other devs to see what perspective 
they would bring and we also agree that if something is stopping, better 
stop the entire infra.

Also, if a user is fumbling with the link state, they are probably 
trying to bring the VPN down.

I will go that way and basically perform the same cleanup as if the 
interface is being deleted.

"the party is over"[cit.] :)

Regards,

Sergey Ryazanov Nov. 20, 2024, 11:22 p.m. UTC | #53

On 13.11.2024 12:03, Sabrina Dubroca wrote:
> 2024-11-13, 03:37:13 +0200, Sergey Ryazanov wrote:
>> On 12.11.2024 19:31, Sabrina Dubroca wrote:
>>> 2024-11-10, 15:38:27 +0200, Sergey Ryazanov wrote:
>>>> On 29.10.2024 12:47, Antonio Quartulli wrote:
>>>>> An ovpn_peer object holds the whole status of a remote peer
>>>>> (regardless whether it is a server or a client).
>>>>>
>>>>> This includes status for crypto, tx/rx buffers, napi, etc.
>>>>>
>>>>> Only support for one peer is introduced (P2P mode).
>>>>> Multi peer support is introduced with a later patch.
>>>>
>>>> Reviewing the peer creation/destroying code I came to a generic question.
>>>> Did you consider keeping a single P2P peer in the peers table as well?
>>>>
>>>> Looks like such approach can greatly simply the code by dropping all these
>>>> 'switch (ovpn->mode)' checks and implementing a unified peer management. The
>>>> 'peer' field in the main private data structure can be kept to accelerate
>>>> lookups, still using peers table for management tasks like removing all the
>>>> peers on the interface teardown.
>>>
>>> It would save a few 'switch(mode)', but force every client to allocate
>>> the hashtable for no reason at all. That tradeoff doesn't look very
>>> beneficial to me, the P2P-specific code is really simple. And if you
>>> keep ovpn->peer to make lookups faster, you're not removing that many
>>> 'switch(mode)'.
>>
>> Looking at the done review, I can retrospectively conclude that I personally
>> do not like short 'switch' statements and special handlers :)
>>
>> Seriously, this module has a highest density of switches per KLOC from what
>> I have seen before and a major part of it dedicated to handle the special
>> case of P2P connection.
> 
> I think it's fine. Either way there will be two implementations of
> whatever mode-dependent operation needs to be done. switch doesn't
> make it more complex than an ops structure.
> 
> If you're reading the current version and find ovpn_peer_add, you see
> directly that it'll do either ovpn_peer_add_mp or
> ovpn_peer_add_p2p. With an ops structure, you'd have a call to
> ovpn->ops->peer_add, and you'd have to look up all possible ops
> structures to know that it can be either ovpn_peer_add_mp or
> ovpn_peer_add_p2p. If there's an undefined number of implementations
> living in different modules (like net_device_ops, or L4 protocols),
> you don't have a choice.
> 
> xfrm went the opposite way to what you're proposing a few years ago
> (see commit 0c620e97b349 ("xfrm: remove output indirection from
> xfrm_mode") and others), and it made the code simpler.

I checked this. Florian did a nice rework. And the way of implementation 
looks reasonable since there are more than two encapsulation modes and 
handling is more complex than just selecting a function to call.

What I don't like about switches, that it requires extra lines of code 
and pushes an author to introduce a default case with error handling. It 
was mentioned that the module unlikely going to support more than two 
modes. In this context shall we consider ternary operator usage. E.g.:

next_run = ovpn->mode == OVPN_MODE_P2P ?
            ovpn_peer_keepalive_work_p2p(...) :
            ovpn_peer_keepalive_work_mp(...);

>> And back to the hashtable(s) size for the MP mode. 8k-bins table looks a
>> good choice for a normal server with 1-2Gb uplink serving up to 1k
>> connections. But it sill unclear, how this choice can affect installations
>> with a bigger number of connections? Or is this module applicable for
>> embedded solutions? E.g. running a couple of VPN servers on a home router
>> with a few actual connections looks like a waste of RAM. I was about to
>> suggest to use rhashtable due to its dynamic sizing feature, but the module
>> needs three tables. Any better idea?
> 
> For this initial implementation I think it's fine. Sure, converting to
> rhashtable (or some other type of dynamically-sized hashtable, if
> rhashtable doesn't fit) in the future would make sense. But I don't
> think it's necessary to get the patches into net-next.

Make sense. Thanks for sharing these thoughts.

--
Sergey

[net-next,v11,00/23] Introducing OpenVPN Data Channel Offload

Message

Comments