Skip to content

pkg/datapath: Assign correct IPv6 router IP to cilium_host interface#16044

Closed
eyanulis wants to merge 1 commit intocilium:masterfrom
eyanulis:gh16042-cilium-host-ipv6
Closed

pkg/datapath: Assign correct IPv6 router IP to cilium_host interface#16044
eyanulis wants to merge 1 commit intocilium:masterfrom
eyanulis:gh16042-cilium-host-ipv6

Conversation

@eyanulis
Copy link
Copy Markdown
Contributor

@eyanulis eyanulis commented May 7, 2021

Cilium generates an IPv4 and IPv6 router address from the node CIDRs,
which should then be passed to bpf/init.sh for assignment to the
cilium_host veth. However, due to a wrong function call, the IPv6
address being assigned to cilium_host was the node IP, rather than the
router IP.

This patch replaces the erroneous call to node.GetIPv6() with a call to
node.GetIPv6Router().

Fixes: #16042

Cilium generates an IPv4 and IPv6 router address from the node CIDRs,
which should then be passed to bpf/init.sh for assignment to the
cilium_host veth. However, due to a wrong function call, the IPv6
address being assigned to cilium_host was the node IP, rather than the
router IP.

This patch replaces the erroneous call to node.GetIPv6() with a call to
node.GetIPv6Router().

Fixes: #16042

Signed-off-by: Eric M. Yanulis <eric@eyanulis.net>
@eyanulis eyanulis requested a review from a team as a code owner May 7, 2021 04:46
@eyanulis eyanulis requested a review from aditighag May 7, 2021 04:46
@maintainer-s-little-helper maintainer-s-little-helper Bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 7, 2021
@eyanulis
Copy link
Copy Markdown
Contributor Author

eyanulis commented May 7, 2021

A possible additional point for consideration: I suspect the origin of this bug is from somewhat misleading variable naming. To a first time reader, the intended function of initArgIPv4NodeIP and initArgIPv6NodeIP is not clear - they read as if they are supposed to be the k8s node IPs, but after reviewing bpf/init.sh it becomes clear that they are actually supposed to be the derived router IPs.

Should we consider renaming these variables such that their purpose is more clear? The PR as it stands now is the minimal change needed to resolve the underlying issue (fix the function call) but I'd be happy to also do a sweep through and rename the vars as well if folks deem it useful.

@brb
Copy link
Copy Markdown
Member

brb commented May 7, 2021

I think assigning NodeIP instead of RouterIP is the intended behavior. Asked @borkmann to review this PR, as he mentioned the context for doing that in this commit msg: 7792ffc.

@eyanulis
Copy link
Copy Markdown
Contributor Author

eyanulis commented May 7, 2021

@brb Thanks for the pointer to context. I had poked through git history but didn't manage to come across the commit you linked.

For full context on where this came from, it originates from #16019 : having the same IP on multiple interfaces runs afoul of some assumptions in the auto-interface-derivation code (golang map unique-key semantics) in kube-proxy-replacement, and the end result is that KPR can end up picking cilium_host as a valid nodeport/direct routing interface. Details here: #16019 (comment)

Coming in without full context, the use of the node IP on cilium_host seemed like a red flag given how IPv4 was handled - but if it is intended then happy to defer to that. Perhaps the actual fix for #16019 would be in the nodeport derivation code instead.

That said, I'm not sure I fully follow the comment in 7792ffc which states re: v6 router IP:

[...] but adding it to the cilium_host interface is not an option since it's going to be scoped globally and is therefore public

I grok the issue regarding asymmetric routing, but I'm not quite sure I understand the problem if the ROUTER_IP were a globally scoped but non-node IP - since the entire pod CIDR on the node is assumed to be globally routable anyway. I assume I'm overlooking something as to why that's undesirable.

@brb
Copy link
Copy Markdown
Member

brb commented May 10, 2021

Perhaps the actual fix for #16019 would be in the nodeport derivation code instead.

Yep. I think the potential fix is to filter out ifaces which name start with cilium_ in detectDevices() function.

That said, I'm not sure I fully follow the comment in 7792ffc which states re: v6 router IP:

I guess @borkmann can comment the best here.

@eyanulis
Copy link
Copy Markdown
Contributor Author

Seems reasonable. I'm working on a PR against #16019 to skip cilium_* devices. Definitely still curious to hear from @borkmann in this context however.

@borkmann
Copy link
Copy Markdown
Member

@eyanulis I definitely agree that filtering out cilium_ in detectDevices() is the quick & needed fix to go either way.

It's been a really long time; the cilium_host addressing dates back to the early days of Cilium ... the commit which fixed the routing said wrt the current state of art:

   Some more context and background: for v6 (direct routing or tunnel
   mode), the cilium_host device has a globally scoped v6 address which
   matches the one of the public interface. This is due to the fact that
   we cannot add a private address with link scope like in v4 case where
   kernel can ensure that no traffic is leaked to the public:

   # ip a
   [...]
   2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    inet6 2a02:120b:2c12:c120:ec4:7aff:fa13:f90f/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 262sec preferred_lft 22sec
   [...]
   12: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet6 2a02:120b:2c12:c120:ec4:7aff:fa13:f90f/128 scope global
       valid_lft forever preferred_lft forever
   [...]

   # ip -6 r
   [...]
   2001:db1:3::d899 dev cilium_host metric 1024 pref medium
   2001:db1:3::/112 via 2001:db1:3::d899 dev cilium_host src 2a02:120b:2c12:c120:ec4:7aff:fa13:f90f metric 1024 mtu 1450 pref medium
   2001:db8:5::/112 via 2001:db1:3::d899 dev cilium_host src 2a02:120b:2c12:c120:ec4:7aff:fa13:f90f metric 1024 mtu 1450 pref medium
   [...]

   This means that cilium_host effectively has two v6 addresses: HOST_IP
   which above is 2a02:120b:2c12:c120:ec4:7aff:fa13:f90f and ROUTER_IP
   from our v6 allocation pool which is 2001:db1:3::d899.

I'm a bit puzzled how this change is not breaking routing here? Maybe it would make sense to have a design doc which explains/analyses the current v6 implementation under the aspect of addressing/routing, describes the new proposed behavior in detail with its implications and why it would be safe.

@eyanulis
Copy link
Copy Markdown
Contributor Author

eyanulis commented May 12, 2021

@borkmann I've filed PR #16104 to cover the "quick/dirty" fix (#16019) w/r/t ignoring cilium_* interfaces.

No specific design goal on this PR - this came from what was ultimately a somewhat ignorant investigation into the source of the behavior in #16019. To an unfamiliar eye the differing treatment of IPv4 vs. IPv6 on the cilium_host interface appeared to be a red flag.

In any case - I've done more testing and this does break routing in non-legacy mode (enable-endpoint-routes disabled - the pod cidr routes to cilium_host never get installed. I didn't initially notice this since I was running with endpoint routes enabled) so it's clearly not the right way to go about this.

I'll drop this back into draft state since it definitely isn't useful in its current state (and ultimately maybe the best way forward is to close it).


From an outside perspective it would be useful to have a document explaining the purpose of cilium_host and particularly the choices made with respect to addressing it (in both address families). I definitely don't understand the datapath well enough to write such a document, however.

In any case - apologies for wasting time with the uninformed initial attempt, and thanks for humoring me while I got my head better wrapped around this.

@eyanulis
Copy link
Copy Markdown
Contributor Author

Closing. Per above, not a bug, nothing to fix here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dont-merge/needs-release-note-label The author needs to describe the release impact of these changes.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cilium_host IPv6 address is set to node IP instead of generated IPv6 router IP

4 participants