Skip to content

tunnel:agent: Error out in case of disabled underlayProtocol#42861

Closed
smagnani96 wants to merge 2 commits intomainfrom
pr/smagnani96/agent-error-on-wrong-underlay
Closed

tunnel:agent: Error out in case of disabled underlayProtocol#42861
smagnani96 wants to merge 2 commits intomainfrom
pr/smagnani96/agent-error-on-wrong-underlay

Conversation

@smagnani96
Copy link
Copy Markdown
Contributor

@smagnani96 smagnani96 commented Nov 18, 2025

Prior to this commit, it was possible to select a disabled underlay protocol, causing potential MTU calculation errors. For example, in an IPv6-only cluster, the defaul value for underlayProtocol is IPv4, and nothing prevents the agent from running with this incorrect information. That leads to error in computing the MTU, accounting for 50B of overhead rather than the 70B for the IPv6 underlay case.

The above problems lead to fragmentation, and in some configuration to packet drops and no pod-to-pod connectivity. Let's just error in case of a disabled selected underlay protocol, so that at least the user is well aware that something was wrong during Cilium installation.

See comment #23917 (comment).

For previous IPv6-only cluster in native routing, this will not cause troubles, given the previous commit: config enablers/validators are run first, and immediately return a disabled config in case it is not needed.

@smagnani96 smagnani96 self-assigned this Nov 18, 2025
@smagnani96 smagnani96 added kind/enhancement This would improve or streamline existing functionality. release-note/misc This PR makes changes that have no direct user impact. area/agent Cilium agent related. labels Nov 18, 2025
@julianwiedmann
Copy link
Copy Markdown
Member

👍 Although the failing CI workflow tells us that we might be breaking IPv6-only users on upgrade ... should we maybe just forcibly override the configured underlay family in such a case?

(This essentially addresses my comment in #40877 (comment))

@smagnani96
Copy link
Copy Markdown
Contributor Author

... should we maybe just forcibly override the configured underlay family in such a case?

Given it's already well stated in the doc https://docs.cilium.io/en/stable/network/concepts/routing/#configuration, I'm more lean to return an error. But that's not a strong preference 😅

@julianwiedmann
Copy link
Copy Markdown
Member

... should we maybe just forcibly override the configured underlay family in such a case?

Given it's already well stated in the doc https://docs.cilium.io/en/stable/network/concepts/routing/#configuration, I'm more lean to return an error. But that's not a strong preference 😅

I'd agree, we haven't good past experience with silently over-riding config options.

Just saying that IPv6-only with native-routing has been a valid config for a long time, and all those folks would need to fix-up their config now (the underlay-protocol config option only exists since v1.18)

@julianwiedmann julianwiedmann added the feature/ipv6-only Relates to single-stack IPv6 support. label Nov 19, 2025
@smagnani96 smagnani96 force-pushed the pr/smagnani96/agent-error-on-wrong-underlay branch 3 times, most recently from 30ed4b7 to 497076b Compare November 19, 2025 14:16
@smagnani96
Copy link
Copy Markdown
Contributor Author

Just saying that IPv6-only with native-routing has been a valid config for a long time, and all those folks would need to fix-up their config now (the underlay-protocol config option only exists since v1.18)

I see what you mean, thanks. Just updated the PR, added a commit to move the Enablers check at the very beginning, so that we won't run the parsing logic for all those configs in which TunnelConfig is not needed (e.g., native routing) 🙏🏼

@smagnani96
Copy link
Copy Markdown
Contributor Author

/test

@julianwiedmann
Copy link
Copy Markdown
Member

Just saying that IPv6-only with native-routing has been a valid config for a long time, and all those folks would need to fix-up their config now (the underlay-protocol config option only exists since v1.18)

I see what you mean, thanks. Just updated the PR, added a commit to move the Enablers check at the very beginning, so that we won't run the parsing logic for all those configs in which TunnelConfig is not needed (e.g., native routing) 🙏🏼

I imagine we don't have way of nicely expressing whether a feature actually requires "tunneling" to be set up correctly? Thinking EGW, or DSR-Geneve, or IPsec (which relies on the tunnel_endpoint to be populated) ?

@smagnani96
Copy link
Copy Markdown
Contributor Author

I imagine we don't have way of nicely expressing whether a feature actually requires "tunneling" to be set up correctly? Thinking EGW, or DSR-Geneve, or IPsec (which relies on the tunnel_endpoint to be populated) ?

The closest I'm seeing is declaring config Enablers:

  • primary routing mode is tunnel, here
  • KPR + DSR Geneve, here
  • EGW enabled, here

I don't see others (e.g., IPSec), but should be pretty easy to add using this method though.

Concerning this PR, I'll adjust integration tests and give it another try. WDYT?

This commit moves the logic to check for enablers and validators in the
function newConfig for tunnel config at the very beginning.
There is no reason to validate tunnel-related parameters if the config
is not needed, i.e. when non of the enablers/validators match.

With this commit, if the tunnel config is not enabled (e.g., native routing),
inputs are simply ignored and the default tunnel disabled config is returned.

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
Prior to this commit, it was possible to select a disabled underlay
protocol, causing potential MTU calculation errors. For example,
in an IPv6-only cluster, the defaul value for underlayProtocol is IPv4,
and nothing prevents the agent from running with this incorrect information.
That leads to error in computing the MTU, accounting for 50B of overhead
rather than the 70B for the IPv6 underlay case.

The above problems lead to fragmentation, and in some configuration to
packet drops and no pod-to-pod connectivity. Let's just error in case
of a disabled selected underlay protocol, so that at least the user is
well aware that something was wrong during Cilium installation.

See comment #23917 (comment).

For previous IPv6-only cluster in native routing, this will not cause
troubles, given the previous commit: config enablers/validators are run
first, and immediately return a disabled config in case it is not needed.

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
@smagnani96 smagnani96 force-pushed the pr/smagnani96/agent-error-on-wrong-underlay branch from 497076b to 41736c8 Compare November 20, 2025 11:17
@smagnani96
Copy link
Copy Markdown
Contributor Author

/test

@smagnani96
Copy link
Copy Markdown
Contributor Author

Closing this, will be superseeded by #43057.

@smagnani96 smagnani96 closed this Dec 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/agent Cilium agent related. feature/ipv6-only Relates to single-stack IPv6 support. kind/enhancement This would improve or streamline existing functionality. release-note/misc This PR makes changes that have no direct user impact.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants