Skip to content

Fix --exclude-local-address with eBPF Host-Routing#41275

Merged
pchaigno merged 1 commit intocilium:mainfrom
DataDog:ai/fix-exclude-local-addr
Aug 22, 2025
Merged

Fix --exclude-local-address with eBPF Host-Routing#41275
pchaigno merged 1 commit intocilium:mainfrom
DataDog:ai/fix-exclude-local-addr

Conversation

@antonipp
Copy link
Copy Markdown
Contributor

Description

The problem is described in detail in the original issue #41241

This PR adds error-handling logic that fixes routing for local addresses which have been passed in --exclude-local-address. The routing code will always attempt a FIB lookup for packets destined to these addresses and it will always fail with BPF_FIB_LKUP_RET_NOT_FWDED because the addresses are local. The routing code will now remediate this by passing the packet to the kernel's routing stack when encountering this scenario.

Another idea I originally had was to keep track of excluded local addresses in a separate map ( DataDog@844f1c8 ) but this obviously introduces more complexity that the current PR.

Testing

Setup: deploy a pod with NLD bound to 169.254.20.10. Set --exclude-local-address="169.254.20.10/31" and enable eBPF Host-Routing on the Cilium Agent.

Before the PR:

# dig google.com
;; communications error to 169.254.20.10#53: timed out
;; communications error to 169.254.20.10#53: timed out
[...]

After the PR:

# dig google.com
; <<>> DiG 9.18.30-0ubuntu0.20.04.2-Ubuntu <<>> google.com
;; global options: +cmd
;; Got answer:
[...]

Fixes: #41241

Fix --exclude-local-address with eBPF Host-Routing

@antonipp antonipp requested a review from a team as a code owner August 19, 2025 14:07
@antonipp antonipp requested a review from jrife August 19, 2025 14:07
@antonipp antonipp force-pushed the ai/fix-exclude-local-addr branch from 7a058c4 to 2c0afdf Compare August 19, 2025 15:43
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Aug 19, 2025
@antonipp
Copy link
Copy Markdown
Contributor Author

/test

@pchaigno pchaigno added area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/bug This PR fixes an issue in a previous release of Cilium. needs-backport/1.18 This PR / issue needs backporting to the v1.18 branch labels Aug 19, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Aug 19, 2025
@pchaigno pchaigno requested a review from borkmann August 19, 2025 17:45
@pchaigno
Copy link
Copy Markdown
Member

/test

@pchaigno
Copy link
Copy Markdown
Member

/test

@antonipp It's strange that you couldn't trigger tests. Have you had this issue before?

Copy link
Copy Markdown
Contributor

@jrife jrife left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should something similar be done in handle_ipv6_from_lxc? Also just a heads up, you might want to rebase and test with #37725 which just got merged. AFAIK it shouldn't impact anything, but slightly changes how the fib lookup happens with the addition of the BPF_FIB_LOOKUP_SKIP_NEIGH flag.

@antonipp antonipp force-pushed the ai/fix-exclude-local-addr branch from 2c0afdf to 74637cf Compare August 20, 2025 09:34
@antonipp
Copy link
Copy Markdown
Contributor Author

Thanks for the reviews, addressed the comments 👍

It's strange that you couldn't trigger tests. Have you had this issue before?

Yep, I did notice that this was quite flaky recently. For example in this PR, I was not able to run them consistently: #41154

I also noticed that the current PR has more failed E2E tests than usual 🤔 https://github.com/cilium/cilium/actions/runs/17077534336/job/48422818273

And this BPF complexity verifier failure looks legit as well: https://github.com/cilium/cilium/actions/runs/17077535162/job/48422742276 (need to look further into it)

@antonipp
Copy link
Copy Markdown
Contributor Author

/test

@pchaigno
Copy link
Copy Markdown
Member

pchaigno commented Aug 20, 2025

/test

And this BPF complexity verifier failure looks legit as well: https://github.com/cilium/cilium/actions/runs/17077535162/job/48422742276 (need to look further into it)

Ouch, that's going to be tricky to work around. I'm guessing we're probably already quite "close" to the complexity limit and, since your change is adding new branches, it's enough to push complexity over the limit.

I also noticed that the current PR has more failed E2E tests than usual 🤔 https://github.com/cilium/cilium/actions/runs/17077534336/job/48422818273

Probably related to the complexity issue, no?

@pchaigno
Copy link
Copy Markdown
Member

My bad, it's not a complexity issue but a good old verifier reject:

; if (ret == DROP_NO_FIB && *ext_err == BPF_FIB_LKUP_RET_NOT_FWDED) {
1849: (79) r1 = *(u64 *)(r10 -144)    ; R1_w=scalar() R10=fp0
; __u8 new_ttl, ttl = ip4->ttl;
1850: (71) r3 = *(u8 *)(r1 +22)
R1 invalid mem access 'scalar'
verification time 74405 usec

Are we sure ext_err is always defined on that code path?

@antonipp antonipp force-pushed the ai/fix-exclude-local-addr branch from 74637cf to e25cfd4 Compare August 20, 2025 12:29
@antonipp
Copy link
Copy Markdown
Contributor Author

/ci-verifier

@antonipp antonipp force-pushed the ai/fix-exclude-local-addr branch from e25cfd4 to 4976737 Compare August 20, 2025 12:53
@jrife
Copy link
Copy Markdown
Contributor

jrife commented Aug 20, 2025

/ci-verifier

@antonipp antonipp force-pushed the ai/fix-exclude-local-addr branch 2 times, most recently from b2fa68b to f9b5a55 Compare August 21, 2025 12:07
@antonipp
Copy link
Copy Markdown
Contributor Author

/ci-verifier

@antonipp
Copy link
Copy Markdown
Contributor Author

/test

@antonipp antonipp force-pushed the ai/fix-exclude-local-addr branch from f9b5a55 to e5d4267 Compare August 21, 2025 14:59
@antonipp
Copy link
Copy Markdown
Contributor Author

/test

This change adds error-handling logic that fixes routing for local addresses which have been passed in --exclude-local-address.
Previously, the routing code would always attempt a FIB lookup for packets destined to these addresses which would always fail with BPF_FIB_LKUP_RET_NOT_FWDED because the addresses are local.

The routing code will now remediate this by passing the packet to the kernel's routing stack when encountering this scenario.

The additional revalidate_data check on the ipv4 pointer is needed because it can get invalidated in fib_redirect_v4 so it should be checked before being passed further

Signed-off-by: Anton Ippolitov <anton.ippolitov@datadoghq.com>
@antonipp antonipp force-pushed the ai/fix-exclude-local-addr branch from e5d4267 to 367db3e Compare August 22, 2025 08:22
@antonipp
Copy link
Copy Markdown
Contributor Author

/test

@jrife
Copy link
Copy Markdown
Contributor

jrife commented Aug 22, 2025

@jrife
Copy link
Copy Markdown
Contributor

jrife commented Aug 22, 2025

/ci-ipsec-e2e

@jrife
Copy link
Copy Markdown
Contributor

jrife commented Aug 22, 2025

/ci-eks

@jrife
Copy link
Copy Markdown
Contributor

jrife commented Aug 22, 2025

/ci-clustermesh

@pchaigno pchaigno enabled auto-merge August 22, 2025 18:22
@pchaigno pchaigno added this pull request to the merge queue Aug 22, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Aug 22, 2025
Merged via the queue into cilium:main with commit 4415e13 Aug 22, 2025
68 checks passed
@HadrienPatte HadrienPatte deleted the ai/fix-exclude-local-addr branch August 22, 2025 18:55
@pippolo84 pippolo84 mentioned this pull request Aug 25, 2025
17 tasks
@pippolo84 pippolo84 added backport-pending/1.18 The backport for Cilium 1.18.x for this PR is in progress. and removed needs-backport/1.18 This PR / issue needs backporting to the v1.18 branch labels Aug 25, 2025
@github-actions github-actions bot added backport-done/1.18 The backport for Cilium 1.18.x for this PR is done. and removed backport-pending/1.18 The backport for Cilium 1.18.x for this PR is in progress. labels Sep 1, 2025
@cilium-release-bot cilium-release-bot bot moved this to Released in cilium v1.19.0 Feb 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. backport-done/1.18 The backport for Cilium 1.18.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.

Projects

No open projects
Status: Released

Development

Successfully merging this pull request may close these issues.

--exclude-local-address incompatible with eBPF Host Routing

6 participants