diff mbox series

[net,1/2] selftests: net: lib: fix broken ping with coreutils ping util

Message ID 20241130113314.6488-1-ansuelsmth@gmail.com
State New
Headers show
Series [net,1/2] selftests: net: lib: fix broken ping with coreutils ping util | expand

Commit Message

Christian Marangi Nov. 30, 2024, 11:33 a.m. UTC
If the coreutils variant of ping is used instead of the busybox one, the
ping_do() command is broken. This comes by the fact that for coreutils
ping, the ping IP needs to be the very last elements.

To handle this, reorder the ping args and make $dip last element.

The use of coreutils ping might be useful for case where busybox is not
compiled with float interval support and ping command doesn't support
0.1 interval. (in such case a dedicated ping utility is installed
instead)

Cc: stable@vger.kernel.org
Fixes: 73bae6736b6b ("selftests: forwarding: Add initial testing framework")
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
---
 tools/testing/selftests/net/forwarding/lib.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Vladimir Oltean Dec. 2, 2024, 11:52 a.m. UTC | #1
On Sat, Nov 30, 2024 at 12:33:10PM +0100, Christian Marangi wrote:
> It seems real hardware requires some time to stabilize and actually
> works after an 'ip link up'. This is not the case for veth as everything
> is simulated but this is a requirement for real hardware to permit
> receiving packet.
> 
> Without this the very fist test for unicast always fails on real
> hardware. With the introduced sleep of one second after mc_route_prepare,
> the test corretly pass as the packet can correctly be delivered.

I think the analysis is not very convincing for the following reason.

To wait after "ip link up", setup_wait() calls setup_wait_dev_with_timeout()
which waits until "ip link show dev $dev up" reports 'state UP'.
This comes from IFLA_OPERSTATE, set by linkwatch.

I remember having this conversation with Danielle Ratson a few years ago:
https://lore.kernel.org/netdev/20210624151515.794224-1-danieller@nvidia.com/
but the bottom line should be that, since commit facd15dfd691 ("net:
core: synchronize link-watch when carrier is queried") AFAIU, an operstate
of UP really means that the net device is ready of passing traffic. Failure
to do so should be a device-side problem.

Then I thought that maybe tcpdump needs some time to set up its filters
on the receving net device. But tcpdump_start() already has "sleep 1" in it.
I admit, that was purely empirical and there's no guarantee that tcpdump
has finished setting up even after 1 second. If you increase it to 2,
does it also solve your problem?

Or do you really have to place the sleep call after the mc_route_prepare() calls,
and any earlier won't help? In that case, it isolates the sleeping
requirement to the multicast routes themselves?
Christian Marangi Dec. 2, 2024, 8:39 p.m. UTC | #2
On Sat, Nov 30, 2024 at 05:48:40PM +0200, Vladimir Oltean wrote:
> On Sat, Nov 30, 2024 at 04:46:14PM +0100, Christian Marangi wrote:
> > On Sat, Nov 30, 2024 at 05:43:07PM +0200, Vladimir Oltean wrote:
> > > On Sat, Nov 30, 2024 at 12:33:09PM +0100, Christian Marangi wrote:
> > > > If the coreutils variant of ping is used instead of the busybox one, the
> > > > ping_do() command is broken. This comes by the fact that for coreutils
> > > > ping, the ping IP needs to be the very last elements.
> > > > 
> > > > To handle this, reorder the ping args and make $dip last element.
> > > > 
> > > > The use of coreutils ping might be useful for case where busybox is not
> > > > compiled with float interval support and ping command doesn't support
> > > > 0.1 interval. (in such case a dedicated ping utility is installed
> > > > instead)
> > > > 
> > > > Cc: stable@vger.kernel.org
> > > > Fixes: 73bae6736b6b ("selftests: forwarding: Add initial testing framework")
> > > > Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
> > > > ---
> > > 
> > > Do you mean the other way around? that the busybox ping is the broken one?
> > > And by coreutils ping, you actually mean iputils ping, right?
> > 
> > Mhh no busybox ping utility is problematic only if FLOAT INTERVAL is not
> > enabled (aka 0.1 interval are not supported)
> > 
> > Yes I'm referring to iputils ping. With that I notice args are wrongly
> > parsed... especially with the -c option.
> 
> But isn't iputils ping what everybody else uses? I'm confused. I have
> this version and the current syntax is not problematic for me.
> 
> $ ping -V
> ping from iputils 20240905
> libcap: yes, IDN: yes, NLS: no, error.h: yes, getrandom(): yes, __fpending(): yes

Mhh the problem seems to be -c 

Let me post some outputs...

root@OpenWrt:~# ping -V
ping from iputils 20240117
libcap: no, IDN: no, NLS: no, error.h: no, getrandom(): yes, __fpending(): yes
root@OpenWrt:~# ping -c 10 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.102 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=0.084 ms
64 bytes from 192.168.1.1: icmp_seq=3 ttl=64 time=0.236 ms
^C
--- 192.168.1.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2080ms
rtt min/avg/max/mdev = 0.084/0.140/0.236/0.067 ms
root@OpenWrt:~# ping 192.168.1.1 -c 10
ping: -c: Name does not resolve

As you can see swapping the ip cause this "Name does not resolve" error.
Vladimir Oltean Dec. 2, 2024, 9:24 p.m. UTC | #3
On Mon, Dec 02, 2024 at 09:39:15PM +0100, Christian Marangi wrote:
> Mhh the problem seems to be -c 
> 
> Let me post some outputs...
> 
> root@OpenWrt:~# ping -V
> ping from iputils 20240117
> libcap: no, IDN: no, NLS: no, error.h: no, getrandom(): yes, __fpending(): yes
> root@OpenWrt:~# ping -c 10 192.168.1.1
> PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
> 64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.102 ms
> 64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=0.084 ms
> 64 bytes from 192.168.1.1: icmp_seq=3 ttl=64 time=0.236 ms
> ^C
> --- 192.168.1.1 ping statistics ---
> 3 packets transmitted, 3 received, 0% packet loss, time 2080ms
> rtt min/avg/max/mdev = 0.084/0.140/0.236/0.067 ms
> root@OpenWrt:~# ping 192.168.1.1 -c 10
> ping: -c: Name does not resolve
> 
> As you can see swapping the ip cause this "Name does not resolve" error.

Ok, I opened the iputils source code and there isn't any relevant recent
change there. But it uses getopt(3), and that seems to be implemented
more simplistically for musl libc:
https://wiki.musl-libc.org/functional-differences-from-glibc.html
"musl and the POSIX standard getopt stop processing options at the first
non-option argument with no permutation."

On GNU libc:
$ ping 192.168.1.1 -c 1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.696 ms

--- 192.168.1.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.696/0.696/0.696/0.000 ms
Christian Marangi Dec. 2, 2024, 9:28 p.m. UTC | #4
On Mon, Dec 02, 2024 at 11:24:29PM +0200, Vladimir Oltean wrote:
> On Mon, Dec 02, 2024 at 09:39:15PM +0100, Christian Marangi wrote:
> > Mhh the problem seems to be -c 
> > 
> > Let me post some outputs...
> > 
> > root@OpenWrt:~# ping -V
> > ping from iputils 20240117
> > libcap: no, IDN: no, NLS: no, error.h: no, getrandom(): yes, __fpending(): yes
> > root@OpenWrt:~# ping -c 10 192.168.1.1
> > PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
> > 64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.102 ms
> > 64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=0.084 ms
> > 64 bytes from 192.168.1.1: icmp_seq=3 ttl=64 time=0.236 ms
> > ^C
> > --- 192.168.1.1 ping statistics ---
> > 3 packets transmitted, 3 received, 0% packet loss, time 2080ms
> > rtt min/avg/max/mdev = 0.084/0.140/0.236/0.067 ms
> > root@OpenWrt:~# ping 192.168.1.1 -c 10
> > ping: -c: Name does not resolve
> > 
> > As you can see swapping the ip cause this "Name does not resolve" error.
> 
> Ok, I opened the iputils source code and there isn't any relevant recent
> change there. But it uses getopt(3), and that seems to be implemented
> more simplistically for musl libc:
> https://wiki.musl-libc.org/functional-differences-from-glibc.html
> "musl and the POSIX standard getopt stop processing options at the first
> non-option argument with no permutation."
> 
> On GNU libc:
> $ ping 192.168.1.1 -c 1
> PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
> 64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.696 ms
> 
> --- 192.168.1.1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.696/0.696/0.696/0.000 ms

Well it's definitely that... As we use musl as glibc is BIIIG and won't
ever fit 4mb of flash ahahha

Also I just notice msend suffer the very same problem...

root@OpenWrt:~# ip vrf exec vlan1 msend -g ff2e::0102:0304 -I lan1 -c 1
Now sending to multicast group: [ff2e::0102:0304]:4444
sendto: Address family not supported by protocol
root@OpenWrt:~# ip vrf exec vlan1 msend -I lan1 -c -g 1ff2e::0102:0304
Now sending to multicast group: [224.1.1.1]:4444
Sending msg 1, TTL 1, to [224.1.1.1]:4444:
Sending msg 2, TTL 1, to [224.1.1.1]:4444:
Christian Marangi Dec. 2, 2024, 9:29 p.m. UTC | #5
On Mon, Dec 02, 2024 at 10:28:16PM +0100, Christian Marangi wrote:
> On Mon, Dec 02, 2024 at 11:24:29PM +0200, Vladimir Oltean wrote:
> > On Mon, Dec 02, 2024 at 09:39:15PM +0100, Christian Marangi wrote:
> > > Mhh the problem seems to be -c 
> > > 
> > > Let me post some outputs...
> > > 
> > > root@OpenWrt:~# ping -V
> > > ping from iputils 20240117
> > > libcap: no, IDN: no, NLS: no, error.h: no, getrandom(): yes, __fpending(): yes
> > > root@OpenWrt:~# ping -c 10 192.168.1.1
> > > PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
> > > 64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.102 ms
> > > 64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=0.084 ms
> > > 64 bytes from 192.168.1.1: icmp_seq=3 ttl=64 time=0.236 ms
> > > ^C
> > > --- 192.168.1.1 ping statistics ---
> > > 3 packets transmitted, 3 received, 0% packet loss, time 2080ms
> > > rtt min/avg/max/mdev = 0.084/0.140/0.236/0.067 ms
> > > root@OpenWrt:~# ping 192.168.1.1 -c 10
> > > ping: -c: Name does not resolve
> > > 
> > > As you can see swapping the ip cause this "Name does not resolve" error.
> > 
> > Ok, I opened the iputils source code and there isn't any relevant recent
> > change there. But it uses getopt(3), and that seems to be implemented
> > more simplistically for musl libc:
> > https://wiki.musl-libc.org/functional-differences-from-glibc.html
> > "musl and the POSIX standard getopt stop processing options at the first
> > non-option argument with no permutation."
> > 
> > On GNU libc:
> > $ ping 192.168.1.1 -c 1
> > PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
> > 64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.696 ms
> > 
> > --- 192.168.1.1 ping statistics ---
> > 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> > rtt min/avg/max/mdev = 0.696/0.696/0.696/0.000 ms
> 
> Well it's definitely that... As we use musl as glibc is BIIIG and won't
> ever fit 4mb of flash ahahha
> 
> Also I just notice msend suffer the very same problem...
> 
> root@OpenWrt:~# ip vrf exec vlan1 msend -g ff2e::0102:0304 -I lan1 -c 1
> Now sending to multicast group: [ff2e::0102:0304]:4444
> sendto: Address family not supported by protocol
> root@OpenWrt:~# ip vrf exec vlan1 msend -I lan1 -c -g 1ff2e::0102:0304
> Now sending to multicast group: [224.1.1.1]:4444
> Sending msg 1, TTL 1, to [224.1.1.1]:4444:
> Sending msg 2, TTL 1, to [224.1.1.1]:4444:
>

Ignore the last part about msend... just me messing around...
Vladimir Oltean Dec. 3, 2024, 11:46 a.m. UTC | #6
On Mon, Dec 02, 2024 at 10:29:31PM +0100, Christian Marangi wrote:
> > Also I just notice msend suffer the very same problem...
> > 
> > root@OpenWrt:~# ip vrf exec vlan1 msend -g ff2e::0102:0304 -I lan1 -c 1
> > Now sending to multicast group: [ff2e::0102:0304]:4444
> > sendto: Address family not supported by protocol
> > root@OpenWrt:~# ip vrf exec vlan1 msend -I lan1 -c -g 1ff2e::0102:0304
> > Now sending to multicast group: [224.1.1.1]:4444
> > Sending msg 1, TTL 1, to [224.1.1.1]:4444:
> > Sending msg 2, TTL 1, to [224.1.1.1]:4444:
> >
> 
> Ignore the last part about msend... just me messing around...

As discussed privately, "Address family not supported by protocol" is an
mtools bug introduced in v3.1. It has been fixed by Joachim today:
https://github.com/troglobit/mtools/issues/8#issuecomment-2514087499
Please pull and test with mtools tag v3.2.
diff mbox series

Patch

diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index c992e385159c..2060f95d5c62 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -1473,8 +1473,8 @@  ping_do()
 
 	vrf_name=$(master_name_get $if_name)
 	ip vrf exec $vrf_name \
-		$PING $args $dip -c $PING_COUNT -i 0.1 \
-		-w $PING_TIMEOUT &> /dev/null
+		$PING $args -c $PING_COUNT -i 0.1 \
+		-w $PING_TIMEOUT $dip &> /dev/null
 }
 
 ping_test()
@@ -1504,8 +1504,8 @@  ping6_do()
 
 	vrf_name=$(master_name_get $if_name)
 	ip vrf exec $vrf_name \
-		$PING6 $args $dip -c $PING_COUNT -i 0.1 \
-		-w $PING_TIMEOUT &> /dev/null
+		$PING6 $args -c $PING_COUNT -i 0.1 \
+		-w $PING_TIMEOUT $dip &> /dev/null
 }
 
 ping6_test()