pkg/datapath: Assign correct IPv6 router IP to cilium_host interface#16044
pkg/datapath: Assign correct IPv6 router IP to cilium_host interface#16044eyanulis wants to merge 1 commit intocilium:masterfrom eyanulis:gh16042-cilium-host-ipv6
Conversation
Cilium generates an IPv4 and IPv6 router address from the node CIDRs, which should then be passed to bpf/init.sh for assignment to the cilium_host veth. However, due to a wrong function call, the IPv6 address being assigned to cilium_host was the node IP, rather than the router IP. This patch replaces the erroneous call to node.GetIPv6() with a call to node.GetIPv6Router(). Fixes: #16042 Signed-off-by: Eric M. Yanulis <eric@eyanulis.net>
|
A possible additional point for consideration: I suspect the origin of this bug is from somewhat misleading variable naming. To a first time reader, the intended function of Should we consider renaming these variables such that their purpose is more clear? The PR as it stands now is the minimal change needed to resolve the underlying issue (fix the function call) but I'd be happy to also do a sweep through and rename the vars as well if folks deem it useful. |
|
@brb Thanks for the pointer to context. I had poked through git history but didn't manage to come across the commit you linked. For full context on where this came from, it originates from #16019 : having the same IP on multiple interfaces runs afoul of some assumptions in the auto-interface-derivation code (golang map unique-key semantics) in kube-proxy-replacement, and the end result is that KPR can end up picking cilium_host as a valid nodeport/direct routing interface. Details here: #16019 (comment) Coming in without full context, the use of the node IP on cilium_host seemed like a red flag given how IPv4 was handled - but if it is intended then happy to defer to that. Perhaps the actual fix for #16019 would be in the nodeport derivation code instead. That said, I'm not sure I fully follow the comment in 7792ffc which states re: v6 router IP:
I grok the issue regarding asymmetric routing, but I'm not quite sure I understand the problem if the ROUTER_IP were a globally scoped but non-node IP - since the entire pod CIDR on the node is assumed to be globally routable anyway. I assume I'm overlooking something as to why that's undesirable. |
Yep. I think the potential fix is to filter out ifaces which name start with
I guess @borkmann can comment the best here. |
|
@eyanulis I definitely agree that filtering out It's been a really long time; the cilium_host addressing dates back to the early days of Cilium ... the commit which fixed the routing said wrt the current state of art: I'm a bit puzzled how this change is not breaking routing here? Maybe it would make sense to have a design doc which explains/analyses the current v6 implementation under the aspect of addressing/routing, describes the new proposed behavior in detail with its implications and why it would be safe. |
|
@borkmann I've filed PR #16104 to cover the "quick/dirty" fix (#16019) w/r/t ignoring No specific design goal on this PR - this came from what was ultimately a somewhat ignorant investigation into the source of the behavior in #16019. To an unfamiliar eye the differing treatment of IPv4 vs. IPv6 on the cilium_host interface appeared to be a red flag. In any case - I've done more testing and this does break routing in non-legacy mode ( I'll drop this back into draft state since it definitely isn't useful in its current state (and ultimately maybe the best way forward is to close it). From an outside perspective it would be useful to have a document explaining the purpose of cilium_host and particularly the choices made with respect to addressing it (in both address families). I definitely don't understand the datapath well enough to write such a document, however. In any case - apologies for wasting time with the uninformed initial attempt, and thanks for humoring me while I got my head better wrapped around this. |
|
Closing. Per above, not a bug, nothing to fix here. |
Cilium generates an IPv4 and IPv6 router address from the node CIDRs,
which should then be passed to bpf/init.sh for assignment to the
cilium_hostveth. However, due to a wrong function call, the IPv6address being assigned to
cilium_hostwas the node IP, rather than therouter IP.
This patch replaces the erroneous call to node.GetIPv6() with a call to
node.GetIPv6Router().
Fixes: #16042