Message ID | 20200918181801.2571-1-mathieu.desnoyers@efficios.com |
---|---|
Headers | show |
Series | l3mdev icmp error route lookup fixes | expand |
On 9/18/20 12:17 PM, Mathieu Desnoyers wrote: > Hi, > > Here is an updated series of fixes for ipv4 and ipv6 which which ensure > the route lookup is performed on the right routing table in VRF > configurations when sending TTL expired icmp errors (useful for > traceroute). > > It includes tests for both ipv4 and ipv6. > > These fixes address specifically address the code paths involved in > sending TTL expired icmp errors. As detailed in the individual commit > messages, those fixes do not address similar issues related to network > namespaces and unreachable / fragmentation needed messages, which appear > to use different code paths. > New selftests are failing: TEST: Ping received ICMP frag needed [FAIL] Both IPv4 and IPv6 versions are failing.
----- On Sep 21, 2020, at 2:36 PM, David Ahern dsahern@gmail.com wrote: > On 9/18/20 12:17 PM, Mathieu Desnoyers wrote: >> Hi, >> >> Here is an updated series of fixes for ipv4 and ipv6 which which ensure >> the route lookup is performed on the right routing table in VRF >> configurations when sending TTL expired icmp errors (useful for >> traceroute). >> >> It includes tests for both ipv4 and ipv6. >> >> These fixes address specifically address the code paths involved in >> sending TTL expired icmp errors. As detailed in the individual commit >> messages, those fixes do not address similar issues related to network >> namespaces and unreachable / fragmentation needed messages, which appear >> to use different code paths. >> > > New selftests are failing: > TEST: Ping received ICMP frag needed [FAIL] > > Both IPv4 and IPv6 versions are failing. Indeed, this situation is discussed in each patch commit message: ipv4: [ It has also been pointed out that a similar issue exists with unreachable / fragmentation needed messages, which can be triggered by changing the MTU of eth1 in r1 to 1400 and running: ip netns exec h1 ping -s 1450 -Mdo -c1 172.16.2.2 Some investigation points to raw_icmp_error() and raw_err() as being involved in this last scenario. The focus of this patch is TTL expired ICMP messages, which go through icmp_route_lookup. Investigation of failure modes related to raw_icmp_error() is beyond this investigation's scope. ] ipv6: [ Testing shows that similar issues exist with ipv6 unreachable / fragmentation needed messages. However, investigation of this additional failure mode is beyond this investigation's scope. ] I do not have the time to investigate further unfortunately, so I thought it best to post what I have. Note that network namespaces also probably have the same problem, but those are not covered by the test cases. Thanks, Mathieu
On 9/21/20 12:44 PM, Mathieu Desnoyers wrote: > ----- On Sep 21, 2020, at 2:36 PM, David Ahern dsahern@gmail.com wrote: > >> On 9/18/20 12:17 PM, Mathieu Desnoyers wrote: >>> Hi, >>> >>> Here is an updated series of fixes for ipv4 and ipv6 which which ensure >>> the route lookup is performed on the right routing table in VRF >>> configurations when sending TTL expired icmp errors (useful for >>> traceroute). >>> >>> It includes tests for both ipv4 and ipv6. >>> >>> These fixes address specifically address the code paths involved in >>> sending TTL expired icmp errors. As detailed in the individual commit >>> messages, those fixes do not address similar issues related to network >>> namespaces and unreachable / fragmentation needed messages, which appear >>> to use different code paths. >>> >> >> New selftests are failing: >> TEST: Ping received ICMP frag needed [FAIL] >> >> Both IPv4 and IPv6 versions are failing. > > Indeed, this situation is discussed in each patch commit message: > > ipv4: > > [ It has also been pointed out that a similar issue exists with > unreachable / fragmentation needed messages, which can be triggered by > changing the MTU of eth1 in r1 to 1400 and running: > > ip netns exec h1 ping -s 1450 -Mdo -c1 172.16.2.2 > > Some investigation points to raw_icmp_error() and raw_err() as being > involved in this last scenario. The focus of this patch is TTL expired > ICMP messages, which go through icmp_route_lookup. > Investigation of failure modes related to raw_icmp_error() is beyond > this investigation's scope. ] > > ipv6: > > [ Testing shows that similar issues exist with ipv6 unreachable / > fragmentation needed messages. However, investigation of this > additional failure mode is beyond this investigation's scope. ] > > I do not have the time to investigate further unfortunately, so I > thought it best to post what I have. > the test setup is bad. You have r1 dropping the MTU in VRF red, but not telling VRF red how to send back the ICMP. e.g., for IPv4 add: ip -netns r1 ro add vrf red 172.16.1.0/24 dev blue do the same for v6. Also, I do not see a reason for r2; I suggest dropping it. What you are testing is icmp crossing VRF with route leaking, so there should not be a need for r2 which leads to asymmetrical routing (172.16.1.0 via r1 and the return via r2).
----- On Sep 21, 2020, at 3:11 PM, David Ahern dsahern@gmail.com wrote: > On 9/21/20 12:44 PM, Mathieu Desnoyers wrote: >> ----- On Sep 21, 2020, at 2:36 PM, David Ahern dsahern@gmail.com wrote: >> >>> On 9/18/20 12:17 PM, Mathieu Desnoyers wrote: >>>> Hi, >>>> >>>> Here is an updated series of fixes for ipv4 and ipv6 which which ensure >>>> the route lookup is performed on the right routing table in VRF >>>> configurations when sending TTL expired icmp errors (useful for >>>> traceroute). >>>> >>>> It includes tests for both ipv4 and ipv6. >>>> >>>> These fixes address specifically address the code paths involved in >>>> sending TTL expired icmp errors. As detailed in the individual commit >>>> messages, those fixes do not address similar issues related to network >>>> namespaces and unreachable / fragmentation needed messages, which appear >>>> to use different code paths. >>>> >>> >>> New selftests are failing: >>> TEST: Ping received ICMP frag needed [FAIL] >>> >>> Both IPv4 and IPv6 versions are failing. >> >> Indeed, this situation is discussed in each patch commit message: >> >> ipv4: >> >> [ It has also been pointed out that a similar issue exists with >> unreachable / fragmentation needed messages, which can be triggered by >> changing the MTU of eth1 in r1 to 1400 and running: >> >> ip netns exec h1 ping -s 1450 -Mdo -c1 172.16.2.2 >> >> Some investigation points to raw_icmp_error() and raw_err() as being >> involved in this last scenario. The focus of this patch is TTL expired >> ICMP messages, which go through icmp_route_lookup. >> Investigation of failure modes related to raw_icmp_error() is beyond >> this investigation's scope. ] >> >> ipv6: >> >> [ Testing shows that similar issues exist with ipv6 unreachable / >> fragmentation needed messages. However, investigation of this >> additional failure mode is beyond this investigation's scope. ] >> >> I do not have the time to investigate further unfortunately, so I >> thought it best to post what I have. >> > > the test setup is bad. You have r1 dropping the MTU in VRF red, but not > telling VRF red how to send back the ICMP. e.g., for IPv4 add: > > ip -netns r1 ro add vrf red 172.16.1.0/24 dev blue > > do the same for v6. > > Also, I do not see a reason for r2; I suggest dropping it. What you are > testing is icmp crossing VRF with route leaking, so there should not be > a need for r2 which leads to asymmetrical routing (172.16.1.0 via r1 and > the return via r2). CCing Michael Jeanson, author of the selftests patch. Thanks for your feedback, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
----- On 21 Sep, 2020, at 15:33, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote: > ----- On Sep 21, 2020, at 3:11 PM, David Ahern dsahern@gmail.com wrote: > >> On 9/21/20 12:44 PM, Mathieu Desnoyers wrote: >>> ----- On Sep 21, 2020, at 2:36 PM, David Ahern dsahern@gmail.com wrote: >>> >>>> On 9/18/20 12:17 PM, Mathieu Desnoyers wrote: >>>>> Hi, >>>>> >>>>> Here is an updated series of fixes for ipv4 and ipv6 which which ensure >>>>> the route lookup is performed on the right routing table in VRF >>>>> configurations when sending TTL expired icmp errors (useful for >>>>> traceroute). >>>>> >>>>> It includes tests for both ipv4 and ipv6. >>>>> >>>>> These fixes address specifically address the code paths involved in >>>>> sending TTL expired icmp errors. As detailed in the individual commit >>>>> messages, those fixes do not address similar issues related to network >>>>> namespaces and unreachable / fragmentation needed messages, which appear >>>>> to use different code paths. >>>>> >>>> >>>> New selftests are failing: >>>> TEST: Ping received ICMP frag needed [FAIL] >>>> >>>> Both IPv4 and IPv6 versions are failing. >>> >>> Indeed, this situation is discussed in each patch commit message: >>> >>> ipv4: >>> >>> [ It has also been pointed out that a similar issue exists with >>> unreachable / fragmentation needed messages, which can be triggered by >>> changing the MTU of eth1 in r1 to 1400 and running: >>> >>> ip netns exec h1 ping -s 1450 -Mdo -c1 172.16.2.2 >>> >>> Some investigation points to raw_icmp_error() and raw_err() as being >>> involved in this last scenario. The focus of this patch is TTL expired >>> ICMP messages, which go through icmp_route_lookup. >>> Investigation of failure modes related to raw_icmp_error() is beyond >>> this investigation's scope. ] >>> >>> ipv6: >>> >>> [ Testing shows that similar issues exist with ipv6 unreachable / >>> fragmentation needed messages. However, investigation of this >>> additional failure mode is beyond this investigation's scope. ] >>> >>> I do not have the time to investigate further unfortunately, so I >>> thought it best to post what I have. >>> >> >> the test setup is bad. You have r1 dropping the MTU in VRF red, but not >> telling VRF red how to send back the ICMP. e.g., for IPv4 add: >> >> ip -netns r1 ro add vrf red 172.16.1.0/24 dev blue >> >> do the same for v6. >> >> Also, I do not see a reason for r2; I suggest dropping it. What you are >> testing is icmp crossing VRF with route leaking, so there should not be >> a need for r2 which leads to asymmetrical routing (172.16.1.0 via r1 and >> the return via r2). The objective of the test was to replicate a clients environment where packets are crossing from a VRF which has a route back to the source to one which doesn't while reaching a ttl of 0. If the route lookup for the icmp error is done on the interface in the first VRF, it can be routed to the source but not on the interface in the second VRF which is the current behaviour for icmp errors generated while crossing between VRFs. There may be a better test case that doesn't involve asymmetric routing to test this but it's the only way I found to replicate this.
On 9/22/20 7:52 AM, Michael Jeanson wrote: >>> >>> the test setup is bad. You have r1 dropping the MTU in VRF red, but not >>> telling VRF red how to send back the ICMP. e.g., for IPv4 add: >>> >>> ip -netns r1 ro add vrf red 172.16.1.0/24 dev blue >>> >>> do the same for v6. >>> >>> Also, I do not see a reason for r2; I suggest dropping it. What you are >>> testing is icmp crossing VRF with route leaking, so there should not be >>> a need for r2 which leads to asymmetrical routing (172.16.1.0 via r1 and >>> the return via r2). > > The objective of the test was to replicate a clients environment where > packets are crossing from a VRF which has a route back to the source to > one which doesn't while reaching a ttl of 0. If the route lookup for the > icmp error is done on the interface in the first VRF, it can be routed to > the source but not on the interface in the second VRF which is the > current behaviour for icmp errors generated while crossing between VRFs. > > There may be a better test case that doesn't involve asymmetric routing > to test this but it's the only way I found to replicate this. > It should work without asymmetric routing; adding the return route to the second vrf as I mentioned above fixes the FRAG_NEEDED problem. It should work for TTL as well. Adding a second pass on the tests with the return through r2 is fine, but add a first pass for the more typical case.
On 2020-09-22 21 h 59, David Ahern wrote: > On 9/22/20 7:52 AM, Michael Jeanson wrote: >>>> >>>> the test setup is bad. You have r1 dropping the MTU in VRF red, but not >>>> telling VRF red how to send back the ICMP. e.g., for IPv4 add: >>>> >>>> ip -netns r1 ro add vrf red 172.16.1.0/24 dev blue >>>> >>>> do the same for v6. >>>> >>>> Also, I do not see a reason for r2; I suggest dropping it. What you are >>>> testing is icmp crossing VRF with route leaking, so there should not be >>>> a need for r2 which leads to asymmetrical routing (172.16.1.0 via r1 and >>>> the return via r2). >> >> The objective of the test was to replicate a clients environment where >> packets are crossing from a VRF which has a route back to the source to >> one which doesn't while reaching a ttl of 0. If the route lookup for the >> icmp error is done on the interface in the first VRF, it can be routed to >> the source but not on the interface in the second VRF which is the >> current behaviour for icmp errors generated while crossing between VRFs. >> >> There may be a better test case that doesn't involve asymmetric routing >> to test this but it's the only way I found to replicate this. >> > > It should work without asymmetric routing; adding the return route to > the second vrf as I mentioned above fixes the FRAG_NEEDED problem. It > should work for TTL as well. > > Adding a second pass on the tests with the return through r2 is fine, > but add a first pass for the more typical case. Hi, Before writing new tests I just want to make sure we are trying to fix the same issue. If I add a return route to the red VRF then we don't need this patchset because whether the ICMP error are routed using the table from the source or destination interface they will reach the source host. The issue for which this patchset was sent only happens when the destination interface's VRF doesn't have a route back to the source host. I guess we might question if this is actually a bug or not. So the question really is, when a packet is forwarded between VRFs through route leaking and an icmp error is generated, which table should be used for the route lookup? And does it depend on the type of icmp error? (e.g. TTL=1 happens before forwarding, but fragmentation needed happens after when on the destination interface) Cheers, Michael
On 2020-09-23 12 h 04, Michael Jeanson wrote: >> It should work without asymmetric routing; adding the return route to >> the second vrf as I mentioned above fixes the FRAG_NEEDED problem. It >> should work for TTL as well. >> >> Adding a second pass on the tests with the return through r2 is fine, >> but add a first pass for the more typical case. > > Hi, > > Before writing new tests I just want to make sure we are trying to fix > the same issue. If I add a return route to the red VRF then we don't > need this patchset because whether the ICMP error are routed using the > table from the source or destination interface they will reach the > source host. > > The issue for which this patchset was sent only happens when the > destination interface's VRF doesn't have a route back to the source > host. I guess we might question if this is actually a bug or not. > > So the question really is, when a packet is forwarded between VRFs > through route leaking and an icmp error is generated, which table should > be used for the route lookup? And does it depend on the type of icmp > error? (e.g. TTL=1 happens before forwarding, but fragmentation needed > happens after when on the destination interface) As a side note, I don't mind reworking the tests as you requested even if the patchset as a whole ends up not being needed and if you think they are still useful. I just wanted to make sure we understood each other. Cheers, Michael
On 9/23/20 11:03 AM, Michael Jeanson wrote: > On 2020-09-23 12 h 04, Michael Jeanson wrote: >>> It should work without asymmetric routing; adding the return route to >>> the second vrf as I mentioned above fixes the FRAG_NEEDED problem. It >>> should work for TTL as well. >>> >>> Adding a second pass on the tests with the return through r2 is fine, >>> but add a first pass for the more typical case. >> >> Hi, >> >> Before writing new tests I just want to make sure we are trying to fix >> the same issue. If I add a return route to the red VRF then we don't >> need this patchset because whether the ICMP error are routed using the >> table from the source or destination interface they will reach the >> source host. >> >> The issue for which this patchset was sent only happens when the >> destination interface's VRF doesn't have a route back to the source >> host. I guess we might question if this is actually a bug or not. >> >> So the question really is, when a packet is forwarded between VRFs >> through route leaking and an icmp error is generated, which table >> should be used for the route lookup? And does it depend on the type of >> icmp error? (e.g. TTL=1 happens before forwarding, but fragmentation >> needed happens after when on the destination interface) > > As a side note, I don't mind reworking the tests as you requested even > if the patchset as a whole ends up not being needed and if you think > they are still useful. I just wanted to make sure we understood each other. > if you are leaking from VRF 1 to VRF 2 and you do not configure VRF 2 with how to send to errors back to source - MTU or TTL - then I will argue that is a configuration problem, not a bug. Now the TTL problem is interesting. You need the FIB lookup to know that the packet is forwarded, and by the time of the ttl check in ip_forward skb->dev points to the ingress VRF and dst points to the egress VRF. So I think the change is warranted. Let's do this for the tests: 1 pass through all of the tests (TTL and MTU, v4 and v6) with symmetric routing configured and make sure they all pass. ie., keep all of them and make sure all tests pass. No sense losing the tests and the thoughts behind them. Add a second pass with the asymmetric routing per the customer setup since it motivated the investigation. Rename the test to something like vrf_route_leaking.sh. It can be expanded with more tests related to route leaking as they come up.
On 2020-09-23 14 h 46, David Ahern wrote: > On 9/23/20 11:03 AM, Michael Jeanson wrote: >> On 2020-09-23 12 h 04, Michael Jeanson wrote: >>>> It should work without asymmetric routing; adding the return route to >>>> the second vrf as I mentioned above fixes the FRAG_NEEDED problem. It >>>> should work for TTL as well. >>>> >>>> Adding a second pass on the tests with the return through r2 is fine, >>>> but add a first pass for the more typical case. >>> >>> Hi, >>> >>> Before writing new tests I just want to make sure we are trying to fix >>> the same issue. If I add a return route to the red VRF then we don't >>> need this patchset because whether the ICMP error are routed using the >>> table from the source or destination interface they will reach the >>> source host. >>> >>> The issue for which this patchset was sent only happens when the >>> destination interface's VRF doesn't have a route back to the source >>> host. I guess we might question if this is actually a bug or not. >>> >>> So the question really is, when a packet is forwarded between VRFs >>> through route leaking and an icmp error is generated, which table >>> should be used for the route lookup? And does it depend on the type of >>> icmp error? (e.g. TTL=1 happens before forwarding, but fragmentation >>> needed happens after when on the destination interface) >> >> As a side note, I don't mind reworking the tests as you requested even >> if the patchset as a whole ends up not being needed and if you think >> they are still useful. I just wanted to make sure we understood each other. >> > > if you are leaking from VRF 1 to VRF 2 and you do not configure VRF 2 > with how to send to errors back to source - MTU or TTL - then I will > argue that is a configuration problem, not a bug. > > Now the TTL problem is interesting. You need the FIB lookup to know that > the packet is forwarded, and by the time of the ttl check in ip_forward > skb->dev points to the ingress VRF and dst points to the egress VRF. So > I think the change is warranted. > > Let's do this for the tests: > 1 pass through all of the tests (TTL and MTU, v4 and v6) with symmetric > routing configured and make sure they all pass. ie., keep all of them > and make sure all tests pass. No sense losing the tests and the thoughts > behind them. > > Add a second pass with the asymmetric routing per the customer setup > since it motivated the investigation. > > Rename the test to something like vrf_route_leaking.sh. It can be > expanded with more tests related to route leaking as they come up. > Just a final clarification, the asymmetric setup would have no return route in VRF 2 and only test the TTL case since the others would fail?
On 9/23/20 1:12 PM, Michael Jeanson wrote: > > Just a final clarification, the asymmetric setup would have no return > route in VRF 2 and only test the TTL case since the others would fail? correct. add a statement about it representing a customer setup so it is clear such a config is a 1-off