
bpf: nodeport: use hairpin redirect for L7 LB on bridge devices #44658

Merged
julianwiedmann merged 1 commit into main from pr/smagnani96/fix-l7lb
Mar 12, 2026


@smagnani96
Contributor

@smagnani96 smagnani96 commented Mar 6, 2026

l7lb: bpf: use hairpin redirect for L7 LB on bridge devices
Fix L7 proxy traffic disruption (DROP_REASON_NOSOCKET) on nodes where
the BPF program is attached to a Linux bridge device and the
br_netfilter module is loaded with net.bridge.bridge-nf-call-iptables=1.
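
To check whether a node meets the second condition, one option is to read the sysctl directly. The sketch below is a minimal, hedged example: the function name and the `proc_sys` parameter are hypothetical, and it relies on the `net/bridge/bridge-nf-call-iptables` entry under `/proc/sys` being present only while br_netfilter is loaded.

```python
from pathlib import Path


def bridge_nf_call_iptables_enabled(proc_sys: str = "/proc/sys") -> bool:
    """Return True when net.bridge.bridge-nf-call-iptables reads 1.

    Hypothetical helper for illustration; it is not part of Cilium.
    """
    knob = Path(proc_sys) / "net" / "bridge" / "bridge-nf-call-iptables"
    try:
        return knob.read_text().strip() == "1"
    except OSError:
        # The entry is missing when br_netfilter is not loaded.
        return False
```

Calling `bridge_nf_call_iptables_enabled()` on a node covers only the sysctl; whether the BPF program is attached to a bridge device has to be checked separately.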

With #36383, we changed the non-BPF-TPROXY L7 LB redirect path to mark
packets with MARK_MAGIC_TO_PROXY and punt them to the stack, relying on
an iptables TPROXY rule in PREROUTING to deliver the packet to the
proxy. This optimization avoids a hairpin redirect through cilium_net.

However, when br_netfilter is active on a bridge device, the kernel's
ip_sabotage_in() function interferes:

  1. Packet arrives on bridge port -> br_netfilter evaluates PREROUTING
    (1st time, BPF hasn't run yet -> TPROXY rule doesn't match)
  2. ip_sabotage_in() sets NF_HOOK_STATE_SABOTAGED on the packet
  3. tc-ingress BPF runs, sets MARK_MAGIC_TO_PROXY, punts to stack
  4. IP stack calls PREROUTING again, but ip_sabotage causes it to be
    SKIPPED entirely
  5. TPROXY rule never fires -> no listening socket on VIP ->
    DROP_REASON_NOSOCKET
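
The five steps above can be traced with a small model. This is a simplified sketch of the sequence as described in this PR, not kernel or Cilium code: the `Packet` class, the `skip_prerouting` flag, the function names, and the mark value are all assumptions made for illustration. The hairpin branch models the claim that redirecting via cilium_net bypasses ip_sabotage_in().

```python
from dataclasses import dataclass

# Placeholder value for illustration only; not taken from the PR.
MARK_MAGIC_TO_PROXY = 0x0200


@dataclass
class Packet:
    mark: int = 0
    # Models the effect the PR attributes to ip_sabotage_in().
    skip_prerouting: bool = False


def tproxy_rule_matches(pkt: Packet) -> bool:
    # The TPROXY rule only matches packets that carry the proxy mark.
    return pkt.mark == MARK_MAGIC_TO_PROXY


def l7lb_outcome(on_bridge: bool, bridge_nf_call_iptables: bool, hairpin: bool) -> str:
    pkt = Packet()

    # Steps 1-2: br_netfilter evaluates PREROUTING before BPF has set the
    # mark, so the rule cannot match; later PREROUTING runs are skipped.
    if on_bridge and bridge_nf_call_iptables:
        assert not tproxy_rule_matches(pkt)
        pkt.skip_prerouting = True

    # Step 3: tc-ingress BPF sets the mark.
    pkt.mark = MARK_MAGIC_TO_PROXY

    # Hairpin path: the packet re-enters the stack via cilium_net, which is
    # modelled here as a fresh PREROUTING evaluation.
    if hairpin:
        pkt.skip_prerouting = False

    # Steps 4-5: if PREROUTING is skipped, the TPROXY rule never fires.
    if pkt.skip_prerouting or not tproxy_rule_matches(pkt):
        return "DROP_REASON_NOSOCKET"
    return "delivered to proxy"
```

In this model, `l7lb_outcome(True, True, hairpin=False)` returns `"DROP_REASON_NOSOCKET"`, while the same call with `hairpin=True`, or any call with `on_bridge=False`, returns `"delivered to proxy"`.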

We therefore add a new datapath config variable that will be set to true
only when compiling the Netdev datapath program (cil_{from,to}_netdev)
and the network device is a bridge. This allows us to fall back to the
pre-#36383 hairpin redirect via cilium_net for L7 LB traffic, bypassing the
ip_sabotage_in() function and allowing the TPROXY rule to match as expected.

Non-bridge devices continue using the optimized punt-to-stack path.
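
The selection rule described above can be sketched as follows. This is an illustration under stated assumptions, not the loader's actual code: the function names and return strings are hypothetical, and bridge detection uses the `bridge` directory that Linux exposes under `/sys/class/net/<device>/` for bridge devices.

```python
import os

# Entry points of the Netdev datapath program named in the description.
NETDEV_PROGRAMS = {"cil_from_netdev", "cil_to_netdev"}


def is_bridge(ifname: str, sysfs_root: str = "/sys/class/net") -> bool:
    # Bridge devices expose a "bridge" directory in sysfs.
    return os.path.isdir(os.path.join(sysfs_root, ifname, "bridge"))


def l7lb_redirect_path(program: str, ifname: str,
                       sysfs_root: str = "/sys/class/net") -> str:
    # Hypothetical stand-in for the new config variable: true only for the
    # Netdev program attached to a bridge device.
    use_hairpin = program in NETDEV_PROGRAMS and is_bridge(ifname, sysfs_root)
    return "hairpin via cilium_net" if use_hairpin else "punt to stack"
```

Keeping the decision per device means only bridge attachments pay for the extra hop through cilium_net.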

@smagnani96 smagnani96 self-assigned this Mar 6, 2026
@smagnani96 smagnani96 added kind/bug This is a bug in the Cilium logic. area/loader Impacts the loading of BPF programs into the kernel. area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. area/proxy Impacts proxy components, including DNS, Kafka, Envoy and/or XDS servers. release-note/bug This PR fixes an issue in a previous release of Cilium. backport/author The backport will be carried out by the author of the PR. needs-backport/1.17 This PR / issue needs backporting to the v1.17 branch needs-backport/1.18 This PR / issue needs backporting to the v1.18 branch needs-backport/1.19 This PR / issue needs backporting to the v1.19 branch labels Mar 6, 2026
@smagnani96 smagnani96 force-pushed the pr/smagnani96/fix-l7lb branch 2 times, most recently from d8538a7 to 0ba47cd Compare March 9, 2026 10:02
@smagnani96
Contributor Author

/test

Member

@julianwiedmann julianwiedmann left a comment


Thank you! Just a quick first look ...

@smagnani96 smagnani96 force-pushed the pr/smagnani96/fix-l7lb branch from 0ba47cd to 473a9e6 Compare March 10, 2026 23:18
@smagnani96
Contributor Author

I've created a follow-up PR, #44719, for potentially migrating ENABLE_TPROXY.
I kept this PR as simple as possible and scoped it to the use case we need to fix.

@smagnani96 smagnani96 force-pushed the pr/smagnani96/fix-l7lb branch from 473a9e6 to 19b1d6a Compare March 11, 2026 00:08
Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
@smagnani96 smagnani96 force-pushed the pr/smagnani96/fix-l7lb branch from 19b1d6a to 5a588ae Compare March 11, 2026 11:02
@smagnani96
Contributor Author

@julianwiedmann I rebased on top of #44649.
I'm still introducing a separate variable as you suggested, but I'm not 100% sure about it: it has the same meaning as enable_tproxy, but it is local to the program and only used in bpf_host.
It will, however, ease the backport.
Otherwise, reusing enable_tproxy would require (1) making sure other code paths are not affected and (2) also backporting #44649.

@smagnani96
Contributor Author

/test

@julianwiedmann
Member

> @julianwiedmann I rebased on top of #44649. I'm still introducing a separate variable as you suggested, but I'm not 100% sure about it: it has the same meaning as enable_tproxy, but it is local to the program and only used in bpf_host. It will, however, ease the backport. Otherwise, reusing enable_tproxy would require (1) making sure other code paths are not affected and (2) also backporting #44649.

+100 on not re-using enable_tproxy, that would get very confusing imo.

@julianwiedmann julianwiedmann self-requested a review March 11, 2026 13:12
@smagnani96 smagnani96 marked this pull request as ready for review March 11, 2026 21:21
@smagnani96 smagnani96 requested review from a team as code owners March 11, 2026 21:21
@smagnani96 smagnani96 requested a review from christarazi March 11, 2026 21:21
@julianwiedmann julianwiedmann added area/loadbalancing Impacts load-balancing and Kubernetes service implementations area/kpr Anything related to our kube-proxy replacement. labels Mar 12, 2026
Member

@julianwiedmann julianwiedmann left a comment


Thank you!

@maintainer-s-little-helper maintainer-s-little-helper bot added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Mar 12, 2026
@julianwiedmann julianwiedmann added this pull request to the merge queue Mar 12, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 12, 2026
@julianwiedmann julianwiedmann added this pull request to the merge queue Mar 12, 2026
Merged via the queue into main with commit 52f35a9 Mar 12, 2026
711 of 724 checks passed
@julianwiedmann julianwiedmann deleted the pr/smagnani96/fix-l7lb branch March 12, 2026 07:50
@julianwiedmann julianwiedmann added backport-pending/1.17 The backport for Cilium 1.17.x for this PR is in progress. and removed needs-backport/1.17 This PR / issue needs backporting to the v1.17 branch labels Mar 12, 2026
@github-actions github-actions bot added the backport-done/1.19 The backport for Cilium 1.19.x for this PR is done. label Mar 13, 2026
@julianwiedmann julianwiedmann added backport-pending/1.18 The backport for Cilium 1.18.x for this PR is in progress. and removed needs-backport/1.18 This PR / issue needs backporting to the v1.18 branch needs-backport/1.19 This PR / issue needs backporting to the v1.19 branch labels Mar 13, 2026
@github-actions github-actions bot added backport-done/1.18 The backport for Cilium 1.18.x for this PR is done. backport-done/1.17 The backport for Cilium 1.17.x for this PR is done. and removed backport-pending/1.18 The backport for Cilium 1.18.x for this PR is in progress. backport-pending/1.17 The backport for Cilium 1.17.x for this PR is in progress. labels Mar 13, 2026

Labels

area/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
area/kpr: Anything related to our kube-proxy replacement.
area/loadbalancing: Impacts load-balancing and Kubernetes service implementations.
area/loader: Impacts the loading of BPF programs into the kernel.
area/proxy: Impacts proxy components, including DNS, Kafka, Envoy and/or XDS servers.
backport/author: The backport will be carried out by the author of the PR.
backport-done/1.17: The backport for Cilium 1.17.x for this PR is done.
backport-done/1.18: The backport for Cilium 1.18.x for this PR is done.
backport-done/1.19: The backport for Cilium 1.19.x for this PR is done.
kind/bug: This is a bug in the Cilium logic.
ready-to-merge: This PR has passed all tests and received consensus from code owners to merge.
release-note/bug: This PR fixes an issue in a previous release of Cilium.

4 participants