Message ID | 20241130113314.6488-1-ansuelsmth@gmail.com |
---|---|
State | New |
Headers | show |
Series | [net,1/2] selftests: net: lib: fix broken ping with coreutils ping util | expand |
On Sat, Nov 30, 2024 at 12:33:10PM +0100, Christian Marangi wrote: > It seems real hardware requires some time to stabilize and actually > works after an 'ip link up'. This is not the case for veth as everything > is simulated but this is a requirement for real hardware to permit > receiving packet. > > Without this the very fist test for unicast always fails on real > hardware. With the introduced sleep of one second after mc_route_prepare, > the test corretly pass as the packet can correctly be delivered. I think the analysis is not very convincing for the following reason. To wait after "ip link up", setup_wait() calls setup_wait_dev_with_timeout() which waits until "ip link show dev $dev up" reports 'state UP'. This comes from IFLA_OPERSTATE, set by linkwatch. I remember having this conversation with Danielle Ratson a few years ago: https://lore.kernel.org/netdev/20210624151515.794224-1-danieller@nvidia.com/ but the bottom line should be that, since commit facd15dfd691 ("net: core: synchronize link-watch when carrier is queried") AFAIU, an operstate of UP really means that the net device is ready of passing traffic. Failure to do so should be a device-side problem. Then I thought that maybe tcpdump needs some time to set up its filters on the receving net device. But tcpdump_start() already has "sleep 1" in it. I admit, that was purely empirical and there's no guarantee that tcpdump has finished setting up even after 1 second. If you increase it to 2, does it also solve your problem? Or do you really have to place the sleep call after the mc_route_prepare() calls, and any earlier won't help? In that case, it isolates the sleeping requirement to the multicast routes themselves?
On Sat, Nov 30, 2024 at 05:48:40PM +0200, Vladimir Oltean wrote: > On Sat, Nov 30, 2024 at 04:46:14PM +0100, Christian Marangi wrote: > > On Sat, Nov 30, 2024 at 05:43:07PM +0200, Vladimir Oltean wrote: > > > On Sat, Nov 30, 2024 at 12:33:09PM +0100, Christian Marangi wrote: > > > > If the coreutils variant of ping is used instead of the busybox one, the > > > > ping_do() command is broken. This comes by the fact that for coreutils > > > > ping, the ping IP needs to be the very last elements. > > > > > > > > To handle this, reorder the ping args and make $dip last element. > > > > > > > > The use of coreutils ping might be useful for case where busybox is not > > > > compiled with float interval support and ping command doesn't support > > > > 0.1 interval. (in such case a dedicated ping utility is installed > > > > instead) > > > > > > > > Cc: stable@vger.kernel.org > > > > Fixes: 73bae6736b6b ("selftests: forwarding: Add initial testing framework") > > > > Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> > > > > --- > > > > > > Do you mean the other way around? that the busybox ping is the broken one? > > > And by coreutils ping, you actually mean iputils ping, right? > > > > Mhh no busybox ping utility is problematic only if FLOAT INTERVAL is not > > enabled (aka 0.1 interval are not supported) > > > > Yes I'm referring to iputils ping. With that I notice args are wrongly > > parsed... especially with the -c option. > > But isn't iputils ping what everybody else uses? I'm confused. I have > this version and the current syntax is not problematic for me. > > $ ping -V > ping from iputils 20240905 > libcap: yes, IDN: yes, NLS: no, error.h: yes, getrandom(): yes, __fpending(): yes Mhh the problem seems to be -c Let me post some outputs... root@OpenWrt:~# ping -V ping from iputils 20240117 libcap: no, IDN: no, NLS: no, error.h: no, getrandom(): yes, __fpending(): yes root@OpenWrt:~# ping -c 10 192.168.1.1 PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data. 64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.102 ms 64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=0.084 ms 64 bytes from 192.168.1.1: icmp_seq=3 ttl=64 time=0.236 ms ^C --- 192.168.1.1 ping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 2080ms rtt min/avg/max/mdev = 0.084/0.140/0.236/0.067 ms root@OpenWrt:~# ping 192.168.1.1 -c 10 ping: -c: Name does not resolve As you can see swapping the ip cause this "Name does not resolve" error.
On Mon, Dec 02, 2024 at 09:39:15PM +0100, Christian Marangi wrote: > Mhh the problem seems to be -c > > Let me post some outputs... > > root@OpenWrt:~# ping -V > ping from iputils 20240117 > libcap: no, IDN: no, NLS: no, error.h: no, getrandom(): yes, __fpending(): yes > root@OpenWrt:~# ping -c 10 192.168.1.1 > PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data. > 64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.102 ms > 64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=0.084 ms > 64 bytes from 192.168.1.1: icmp_seq=3 ttl=64 time=0.236 ms > ^C > --- 192.168.1.1 ping statistics --- > 3 packets transmitted, 3 received, 0% packet loss, time 2080ms > rtt min/avg/max/mdev = 0.084/0.140/0.236/0.067 ms > root@OpenWrt:~# ping 192.168.1.1 -c 10 > ping: -c: Name does not resolve > > As you can see swapping the ip cause this "Name does not resolve" error. Ok, I opened the iputils source code and there isn't any relevant recent change there. But it uses getopt(3), and that seems to be implemented more simplistically for musl libc: https://wiki.musl-libc.org/functional-differences-from-glibc.html "musl and the POSIX standard getopt stop processing options at the first non-option argument with no permutation." On GNU libc: $ ping 192.168.1.1 -c 1 PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data. 64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.696 ms --- 192.168.1.1 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.696/0.696/0.696/0.000 ms
On Mon, Dec 02, 2024 at 11:24:29PM +0200, Vladimir Oltean wrote: > On Mon, Dec 02, 2024 at 09:39:15PM +0100, Christian Marangi wrote: > > Mhh the problem seems to be -c > > > > Let me post some outputs... > > > > root@OpenWrt:~# ping -V > > ping from iputils 20240117 > > libcap: no, IDN: no, NLS: no, error.h: no, getrandom(): yes, __fpending(): yes > > root@OpenWrt:~# ping -c 10 192.168.1.1 > > PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data. > > 64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.102 ms > > 64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=0.084 ms > > 64 bytes from 192.168.1.1: icmp_seq=3 ttl=64 time=0.236 ms > > ^C > > --- 192.168.1.1 ping statistics --- > > 3 packets transmitted, 3 received, 0% packet loss, time 2080ms > > rtt min/avg/max/mdev = 0.084/0.140/0.236/0.067 ms > > root@OpenWrt:~# ping 192.168.1.1 -c 10 > > ping: -c: Name does not resolve > > > > As you can see swapping the ip cause this "Name does not resolve" error. > > Ok, I opened the iputils source code and there isn't any relevant recent > change there. But it uses getopt(3), and that seems to be implemented > more simplistically for musl libc: > https://wiki.musl-libc.org/functional-differences-from-glibc.html > "musl and the POSIX standard getopt stop processing options at the first > non-option argument with no permutation." > > On GNU libc: > $ ping 192.168.1.1 -c 1 > PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data. > 64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.696 ms > > --- 192.168.1.1 ping statistics --- > 1 packets transmitted, 1 received, 0% packet loss, time 0ms > rtt min/avg/max/mdev = 0.696/0.696/0.696/0.000 ms Well it's definitely that... As we use musl as glibc is BIIIG and won't ever fit 4mb of flash ahahha Also I just notice msend suffer the very same problem... root@OpenWrt:~# ip vrf exec vlan1 msend -g ff2e::0102:0304 -I lan1 -c 1 Now sending to multicast group: [ff2e::0102:0304]:4444 sendto: Address family not supported by protocol root@OpenWrt:~# ip vrf exec vlan1 msend -I lan1 -c -g 1ff2e::0102:0304 Now sending to multicast group: [224.1.1.1]:4444 Sending msg 1, TTL 1, to [224.1.1.1]:4444: Sending msg 2, TTL 1, to [224.1.1.1]:4444:
On Mon, Dec 02, 2024 at 10:28:16PM +0100, Christian Marangi wrote: > On Mon, Dec 02, 2024 at 11:24:29PM +0200, Vladimir Oltean wrote: > > On Mon, Dec 02, 2024 at 09:39:15PM +0100, Christian Marangi wrote: > > > Mhh the problem seems to be -c > > > > > > Let me post some outputs... > > > > > > root@OpenWrt:~# ping -V > > > ping from iputils 20240117 > > > libcap: no, IDN: no, NLS: no, error.h: no, getrandom(): yes, __fpending(): yes > > > root@OpenWrt:~# ping -c 10 192.168.1.1 > > > PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data. > > > 64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.102 ms > > > 64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=0.084 ms > > > 64 bytes from 192.168.1.1: icmp_seq=3 ttl=64 time=0.236 ms > > > ^C > > > --- 192.168.1.1 ping statistics --- > > > 3 packets transmitted, 3 received, 0% packet loss, time 2080ms > > > rtt min/avg/max/mdev = 0.084/0.140/0.236/0.067 ms > > > root@OpenWrt:~# ping 192.168.1.1 -c 10 > > > ping: -c: Name does not resolve > > > > > > As you can see swapping the ip cause this "Name does not resolve" error. > > > > Ok, I opened the iputils source code and there isn't any relevant recent > > change there. But it uses getopt(3), and that seems to be implemented > > more simplistically for musl libc: > > https://wiki.musl-libc.org/functional-differences-from-glibc.html > > "musl and the POSIX standard getopt stop processing options at the first > > non-option argument with no permutation." > > > > On GNU libc: > > $ ping 192.168.1.1 -c 1 > > PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data. > > 64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.696 ms > > > > --- 192.168.1.1 ping statistics --- > > 1 packets transmitted, 1 received, 0% packet loss, time 0ms > > rtt min/avg/max/mdev = 0.696/0.696/0.696/0.000 ms > > Well it's definitely that... As we use musl as glibc is BIIIG and won't > ever fit 4mb of flash ahahha > > Also I just notice msend suffer the very same problem... > > root@OpenWrt:~# ip vrf exec vlan1 msend -g ff2e::0102:0304 -I lan1 -c 1 > Now sending to multicast group: [ff2e::0102:0304]:4444 > sendto: Address family not supported by protocol > root@OpenWrt:~# ip vrf exec vlan1 msend -I lan1 -c -g 1ff2e::0102:0304 > Now sending to multicast group: [224.1.1.1]:4444 > Sending msg 1, TTL 1, to [224.1.1.1]:4444: > Sending msg 2, TTL 1, to [224.1.1.1]:4444: > Ignore the last part about msend... just me messing around...
On Mon, Dec 02, 2024 at 10:29:31PM +0100, Christian Marangi wrote: > > Also I just notice msend suffer the very same problem... > > > > root@OpenWrt:~# ip vrf exec vlan1 msend -g ff2e::0102:0304 -I lan1 -c 1 > > Now sending to multicast group: [ff2e::0102:0304]:4444 > > sendto: Address family not supported by protocol > > root@OpenWrt:~# ip vrf exec vlan1 msend -I lan1 -c -g 1ff2e::0102:0304 > > Now sending to multicast group: [224.1.1.1]:4444 > > Sending msg 1, TTL 1, to [224.1.1.1]:4444: > > Sending msg 2, TTL 1, to [224.1.1.1]:4444: > > > > Ignore the last part about msend... just me messing around... As discussed privately, "Address family not supported by protocol" is an mtools bug introduced in v3.1. It has been fixed by Joachim today: https://github.com/troglobit/mtools/issues/8#issuecomment-2514087499 Please pull and test with mtools tag v3.2.
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh index c992e385159c..2060f95d5c62 100644 --- a/tools/testing/selftests/net/forwarding/lib.sh +++ b/tools/testing/selftests/net/forwarding/lib.sh @@ -1473,8 +1473,8 @@ ping_do() vrf_name=$(master_name_get $if_name) ip vrf exec $vrf_name \ - $PING $args $dip -c $PING_COUNT -i 0.1 \ - -w $PING_TIMEOUT &> /dev/null + $PING $args -c $PING_COUNT -i 0.1 \ + -w $PING_TIMEOUT $dip &> /dev/null } ping_test() @@ -1504,8 +1504,8 @@ ping6_do() vrf_name=$(master_name_get $if_name) ip vrf exec $vrf_name \ - $PING6 $args $dip -c $PING_COUNT -i 0.1 \ - -w $PING_TIMEOUT &> /dev/null + $PING6 $args -c $PING_COUNT -i 0.1 \ + -w $PING_TIMEOUT $dip &> /dev/null } ping6_test()
If the coreutils variant of ping is used instead of the busybox one, the ping_do() command is broken. This comes by the fact that for coreutils ping, the ping IP needs to be the very last elements. To handle this, reorder the ping args and make $dip last element. The use of coreutils ping might be useful for case where busybox is not compiled with float interval support and ping command doesn't support 0.1 interval. (in such case a dedicated ping utility is installed instead) Cc: stable@vger.kernel.org Fixes: 73bae6736b6b ("selftests: forwarding: Add initial testing framework") Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> --- tools/testing/selftests/net/forwarding/lib.sh | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)