Skip to content

gnrc/nib: gnrc_ipv6_nib_get_next_hop_l2addr(): only assume neighbor cache entries to always be on-link#16671

Merged
miri64 merged 4 commits intoRIOT-OS:masterfrom
benpicco:gnrc/nib-_nib_onl_get-fix
Aug 11, 2021
Merged

gnrc/nib: gnrc_ipv6_nib_get_next_hop_l2addr(): only assume neighbor cache entries to always be on-link#16671
miri64 merged 4 commits intoRIOT-OS:masterfrom
benpicco:gnrc/nib-_nib_onl_get-fix

Conversation

@benpicco
Copy link
Copy Markdown
Contributor

@benpicco benpicco commented Jul 21, 2021

Contribution description

Consider the following: A node tries to forward a packet to another host for which it does not know the route yet. It assumes it to be on-link and starts a neighbor solicitation, putting the node address in the destination cache.

Later the route is known (via a second hop) but the host is still in the NIB.

The result is that gnrc_ipv6_nib_get_next_hop_l2addr() ends up in the "nib: %s is in NC or on-link, start address resolution" case and does not attempt to resolve the route.

This results in the host remaining unreachable even though now a route is present.

The fix should make sense since only neighbor cache entries are guaranteed to be on-link.

Testing procedure

Issues/PRs references

@benpicco benpicco added the Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) label Jul 21, 2021
@github-actions github-actions bot added Area: network Area: Networking Area: sys Area: System labels Jul 21, 2021
@benpicco benpicco force-pushed the gnrc/nib-_nib_onl_get-fix branch from 437c37c to e33f429 Compare July 21, 2021 22:34
Copy link
Copy Markdown
Member

@miri64 miri64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH it looks to me like you want to remove the neighbor cache altogether with this PR and I don't really understand how this makes sense. Even with your explanation.

ipv6_addr_to_str(addr_str, dst, sizeof(addr_str)));
DEBUG("nib: %s is %s, start address resolution\n",
ipv6_addr_to_str(addr_str, dst, sizeof(addr_str)),
node ? "in NC" : "on-link");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an entry is in the NC it is by definition on-link. Not sure why the distinction is made in this debug message...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just found it useful for debugging to know whether _on_link() returned true or if node was not NULL.

@benpicco
Copy link
Copy Markdown
Contributor Author

benpicco commented Jul 22, 2021

Ok, so I think the 'original patch' would be

--- a/sys/net/gnrc/network_layer/ipv6/nib/nib.c
+++ b/sys/net/gnrc/network_layer/ipv6/nib/nib.c
@@ -220,7 +220,7 @@ int gnrc_ipv6_nib_get_next_hop_l2addr(const ipv6_addr_t *dst,
         /* consider neighbor cache entries first */
         unsigned iface = (node == NULL) ? 0 : _nib_onl_get_if(node);
 
-        if ((node != NULL) || _on_link(dst, &iface)) {
+        if (((node != NULL) && (nce->mode & _NC)) || _on_link(dst, &iface)) {
             DEBUG("nib: %s is on-link or in NC, start address resolution\n",
                   ipv6_addr_to_str(addr_str, dst, sizeof(addr_str)));
             /* on-link prefixes return their interface */

Then I was wondering if it even makes sense for _nib_onl_get() to return non-NC entries as all on-link entries should also be in the neighbor cache.

If I'm wrong I can also add a _nib_onl_nc_get() and simplify some of the code that manually checks for the _NC flag now.

@miri64
Copy link
Copy Markdown
Member

miri64 commented Jul 22, 2021

Then I was wondering if it even makes sense for _nib_onl_get() to return non-NC entries as all on-link entries should also be in the neighbor cache.

Yes, have a look at the original model. _onl_entry_ts (not to confuse with on-link entries like the term is defined in RFC 4861) are just an abstract component in the NIB. They may represent a neighbor cache (when the _NC flag is set) but they also can just be the next-hop component of a route or prefix list entry. This was done to safe space as most of the times those next hops at some point also end up in the neighbor cache (but don't have to, e.g. if address resolution for that next hop is never triggered). That those kind of entries are not returned should be guaranteed since they should be _EMPTY, so using non-_NC here would definitely be wrong (as this might then return a next-hop for which reachability can not be guaranteed yet e.g.).

Regarding that particular line, it might be the case, that we only want the neighbor cache entry here, indeed, but not non-NC entries...

@benpicco benpicco force-pushed the gnrc/nib-_nib_onl_get-fix branch 2 times, most recently from 419007d to 53efcd4 Compare July 22, 2021 11:36
benpicco added 3 commits July 22, 2021 13:36
Consider the following: A node tries to forward a packet to another
host for which it does not know the route yet. It assumes it to be
on-link and starts a neighbor solicitation, putting the node address
in the destinatio cache.

Later the route is known (via a second hop) but the host is still in
the NIB.

The result is that gnrc_ipv6_nib_get_next_hop_l2addr() ends up in the
"nib: %s is in NC or on-link, start address resolution" case and does
not attempt to resolve the route.

This results in the host remaining unreachable even though now a route
is present.
@benpicco benpicco force-pushed the gnrc/nib-_nib_onl_get-fix branch from 53efcd4 to 5ad58b1 Compare July 22, 2021 11:37
@benpicco
Copy link
Copy Markdown
Contributor Author

Ok I kept _nib_onl_get() as is and added _nib_onl_nc_get() instead.

If too controversial, I can drop the last two commits 😉

@miri64
Copy link
Copy Markdown
Member

miri64 commented Jul 22, 2021

It looks alright, but I am still wondering what type of on-link entry you get that this happens. Did you check (you could just output the flags in a debug line)?

The result is that gnrc_ipv6_nib_get_next_hop_l2addr() ends up in the "nib: %s is in NC or on-link, start address resolution" case and does not attempt to resolve the route.

This results in the host remaining unreachable even though now a route is present.

Also why wasn't

nce = _nib_onl_get(&ipv6->src, netif->pid);
/* NIB is full */
if (nce == NULL) {
return SIXLOWPAN_ND_STATUS_NC_FULL;
}

updated?

@miri64
Copy link
Copy Markdown
Member

miri64 commented Jul 22, 2021

I guess the PR title should also be updated now ;-).

@benpicco
Copy link
Copy Markdown
Contributor Author

It looks alright, but I am still wondering what type of on-link entry you get that this happens

I suspect it's _DC, but will check

Also why wasn't RIOT/sys/net/gnrc/network_layer/ipv6/nib/_nib-6lr.c updated?

It's not checking for nce->mode & _NC, that's what I was searching for - but maybe it should?

@benpicco
Copy link
Copy Markdown
Contributor Author

It's _DST:

2021-07-22 15:18:34,837 # nib: Getting on-link node entry (addr = 2001:16b8:45d7:59fb:a43e:9dff:fe26:7a2e, iface = 0)
2021-07-22 15:18:34,837 #   Found 0x2000224c
2021-07-22 15:18:34,838 # nib: 2001:16b8:45d7:59fb:a43e:9dff:fe26:7a2e is in NC, start address resolution
2021-07-22 15:18:34,838 # nce->mode: 40
2021-07-22 15:18:34,839 # nib: Getting on-link node entry (addr = 2001:16b8:45d7:59fb:a43e:9dff:fe26:7a2e, iface = 7)
2021-07-22 15:18:34,839 #   Found 0x2000224c

@benpicco benpicco changed the title gnrc/nib: only consider neighbor cache entries in _nib_onl_get() gnrc/nib: only consider neighbor cache entries in gnrc_ipv6_nib_get_next_hop_l2addr() Jul 22, 2021
@miri64
Copy link
Copy Markdown
Member

miri64 commented Jul 22, 2021

Ah right! That mode is used when an _offl_entry_t is referring to that, so I think after all, your change is valid.

benpicco changed the title ~gnrc/nib: only consider neighbor cache entries in _nib_onl_get() ~ gnrc/nib: only consider neighbor cache entries in gnrc_ipv6_nib_get_next_hop_l2addr() 1 hour ago

Just from the title I know wonder: so no FIB lookup happens now in gnrc_ipv6_nib_get_next_hop_l2addr()? I think you need to clarify ;-)

@benpicco benpicco changed the title gnrc/nib: only consider neighbor cache entries in gnrc_ipv6_nib_get_next_hop_l2addr() gnrc/nib: only consider neighbor cache entries as on-link in gnrc_ipv6_nib_get_next_hop_l2addr() Jul 22, 2021
@miri64
Copy link
Copy Markdown
Member

miri64 commented Jul 22, 2021

On the other hand: why is there a global address as next hop in the first place? Typically, (while not necessarily) they are link-local.

@benpicco
Copy link
Copy Markdown
Contributor Author

On the other hand: why is there a global address as next hop in the first place? Typically, (while not necessarily) they are link-local.

#16665 👀

@miri64
Copy link
Copy Markdown
Member

miri64 commented Jul 22, 2021

On the other hand: why is there a global address as next hop in the first place? Typically, (while not necessarily) they are link-local.

#16665 eyes

I don't see the relation between source address selection and next-hops of routes...

@benpicco
Copy link
Copy Markdown
Contributor Author

If the node takes the source address of the packet it got from the next hop as it's next-hop-node, then it will do just that.

@benpicco benpicco added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Jul 27, 2021
@miri64
Copy link
Copy Markdown
Member

miri64 commented Jul 29, 2021

If the node takes the source address of the packet it got from the next hop as it's next-hop-node, then it will do just that.

Not really sure how to parse this sentence? Why would a packet come from the next hop? My understanding of routing is that packets go to a next hop. And why does this decide the source address of a packet sent by the host at hand? Forwarding a packet should not change the addresses of a packet.

@benpicco
Copy link
Copy Markdown
Contributor Author

This is with #16568. If the Router Advertisement contains a RIO, the node that receives it will use the source address of the RA as the next hop to reach the the network advertised in the RIO.

This however is wholly unrelated to this PR.

@miri64
Copy link
Copy Markdown
Member

miri64 commented Jul 29, 2021

This is with #16568. If the Router Advertisement contains a RIO, the node that receives it will use the source address of the RA as the next hop to reach the the network advertised in the RIO.

Then those next hops should not be global, as RA source addresses MUST be link-local. If that's not the case, this is the bug that needs fixing.

@miri64
Copy link
Copy Markdown
Member

miri64 commented Jul 29, 2021

That a router advertisement has a link-local address as source should be assured by

if (src == NULL) {
/* get address from source selection algorithm.
* Only link local addresses may be used (RFC 4861 section 4.1) */
src = gnrc_netif_ipv6_addr_best_src(netif, dst, true);
if (src == NULL) {
DEBUG("ndp rtr: no VALID source address found for RA\n");
break;
}
}
unless src was initialized on call of course, which isn't the case in current master:

$ git grep "rtr_adv_send" | grep -v "^tests"
sys/include/net/gnrc/ndp.h:void gnrc_ndp_rtr_adv_send(gnrc_netif_t *netif, const ipv6_addr_t *src,
sys/net/gnrc/network_layer/ipv6/nib/_nib-router.c:    gnrc_ndp_rtr_adv_send(netif, NULL, dst, final, ext_opts);
sys/net/gnrc/network_layer/ndp/gnrc_ndp.c:void gnrc_ndp_rtr_adv_send(gnrc_netif_t *netif, const ipv6_addr_t *src,

@miri64
Copy link
Copy Markdown
Member

miri64 commented Jul 29, 2021

RAs with non-link-local addresses should also be rejected via this check

if (!(ipv6_addr_is_link_local(&ipv6->src)) ||
(ipv6->hl != NDP_HOP_LIMIT) || (rtr_adv->code != 0U) ||
(icmpv6_len < sizeof(ndp_rtr_adv_t)) ||
(!gnrc_netif_is_6ln(netif) &&
(byteorder_ntohs(rtr_adv->ltime) > NDP_RTR_ADV_LTIME_SEC_MAX))) {
DEBUG("nib: Received router advertisement is invalid. "
"Discarding silently\n");
DEBUG(" - IP Hop Limit: %u (should be %u)\n", ipv6->hl,
NDP_HOP_LIMIT);
DEBUG(" - ICMP code: %u (should be 0)\n", rtr_adv->code);
DEBUG(" - ICMP length: %u (should > %u)\n", (unsigned)icmpv6_len,
(unsigned)sizeof(ndp_rtr_adv_t));
DEBUG(" - Source address: %s (should be link-local)\n",
ipv6_addr_to_str(addr_str, &ipv6->src, sizeof(addr_str)));
DEBUG(" - Router lifetime: %u (should be <= 9000 on non-6LN)\n",
byteorder_ntohs(rtr_adv->ltime));
return;
}

@benpicco
Copy link
Copy Markdown
Contributor Author

tbh I'm not sure anymore with which state of my PRs I got to this case, however this PR should be useful regardless as it only does two things:

  • adds a missing node->mode & _NC check in gnrc_ipv6_nib_get_next_hop_l2addr() to ensure the entry is on-link
  • adds a _nib_onl_nc_get() helper function to get rid of all those sprawling (entry != NULL) && (entry->mode & _NC) checks that are easy to forget / get wrong

@miri64
Copy link
Copy Markdown
Member

miri64 commented Jul 29, 2021

  • adds a missing node->mode & _NC check in gnrc_ipv6_nib_get_next_hop_l2addr() to ensure the entry is on-link

Not sure this is the right way to go. Only because an entry is not in the neighbor cache (yet), does not mean it is not on-link. A routing protocol (or for that matter a user using the nib route command) could have added that entry and the only thing missing for that entry to become a neighbor cache entry might be address resolution which would be triggered by sending to that neighbor.

  • adds a _nib_onl_nc_get() helper function to get rid of all those sprawling (entry != NULL) && (entry->mode & _NC) checks that are easy to forget / get wrong

That I don't argue with :-).

@miri64
Copy link
Copy Markdown
Member

miri64 commented Jul 29, 2021

Another reason could be, that the node simply just cached out from the neighbor cache (but still is kept in the _onl_entry_t list for other purposes) e.g. because the sending node tries to reach a lot of neighbors in a short time frame, leading to a round robin situation in the neighbor cache.

@benpicco
Copy link
Copy Markdown
Contributor Author

benpicco commented Jul 29, 2021

But in that case we would still do the _on_link(dst, &iface) check. (see 11e9b7c)
The problem is that the _on_link(dst, &iface) check is not performed if the destination is already an _offl_entry_t.

@miri64
Copy link
Copy Markdown
Member

miri64 commented Jul 29, 2021

But in that case we would still do the _on_link(dst, &iface) check.
The problem is that the _on_link(dst, &iface) check is not performed if the destination is already an _offl_entry_t.

This somehow contradicts the title of this PR, but maybe only that needs adaption. Apart from that, I think the claim is valid. If there is no NC, there is no link-layer address.

@benpicco benpicco changed the title gnrc/nib: only consider neighbor cache entries as on-link in gnrc_ipv6_nib_get_next_hop_l2addr() gnrc/nib: only consider neighbor cache entries as guaranteed on-link in gnrc_ipv6_nib_get_next_hop_l2addr() Jul 29, 2021
@benpicco
Copy link
Copy Markdown
Contributor Author

benpicco commented Aug 2, 2021

Does this make for a better PR title? 😉

@benpicco benpicco changed the title gnrc/nib: only consider neighbor cache entries as guaranteed on-link in gnrc_ipv6_nib_get_next_hop_l2addr() gnrc/nib: gnrc_ipv6_nib_get_next_hop_l2addr(): only consider neighbor cache entries as guaranteed on-link Aug 4, 2021
@benpicco benpicco changed the title gnrc/nib: gnrc_ipv6_nib_get_next_hop_l2addr(): only consider neighbor cache entries as guaranteed on-link gnrc/nib: gnrc_ipv6_nib_get_next_hop_l2addr(): only consider neighbor cache entries to be guaranteed on-link Aug 4, 2021
@benpicco benpicco changed the title gnrc/nib: gnrc_ipv6_nib_get_next_hop_l2addr(): only consider neighbor cache entries to be guaranteed on-link gnrc/nib: gnrc_ipv6_nib_get_next_hop_l2addr(): only consider neighbor cache entries to always be on-link Aug 4, 2021
@benpicco benpicco changed the title gnrc/nib: gnrc_ipv6_nib_get_next_hop_l2addr(): only consider neighbor cache entries to always be on-link gnrc/nib: gnrc_ipv6_nib_get_next_hop_l2addr(): only assume neighbor cache entries to always be on-link Aug 4, 2021
@benpicco
Copy link
Copy Markdown
Contributor Author

@miri64 ping 😉

Copy link
Copy Markdown
Member

@miri64 miri64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK, changes make sense now.

@miri64 miri64 merged commit de3e5f9 into RIOT-OS:master Aug 11, 2021
@benpicco benpicco deleted the gnrc/nib-_nib_onl_get-fix branch August 11, 2021 10:17
@benpicco benpicco added this to the Release 2021.10 milestone Oct 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: network Area: Networking Area: sys Area: System CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants