bpf: nodeport: forward L7 svc traffic straight to proxy #36383
Merged
julianwiedmann merged 1 commit into cilium:main on Dec 6, 2024
/test
I'm thinking whether this will change anything from a netfilter-visibility / "asymmetric path" perspective ...
jschwinger233 approved these changes Dec 6, 2024
L7 svc requests are currently redirected to cilium_net / cilium_host, where they get marked with MARK_MAGIC_TO_PROXY and then get rerouted to the proxy.

Simplify this by marking the packet in from-netdev and from-overlay, and letting it pass up the stack. This avoids the dependency on skb->cb surviving the transfer from the native program to cilium_net.

The BPF TPROXY path needs further investigation, so only apply this change when BPF TPROXY is disabled.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
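The change described above can be pictured with a small standalone sketch. It is not the actual Cilium datapath code: the names `sketch_pkt`, `sketch_verdict` and `sketch_l7_to_proxy` are hypothetical, the mark value is illustrative, and encoding the proxy port in the upper 16 bits of the mark is an assumption made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative value only; the real MARK_MAGIC_TO_PROXY is defined in
 * Cilium's BPF headers and is not quoted in this PR. */
#define SKETCH_MARK_TO_PROXY 0x0200u

/* Hypothetical verdicts standing in for "let the packet continue up the
 * stack" and "redirect via cilium_net / cilium_host". */
enum sketch_verdict {
	SKETCH_PASS_TO_STACK,
	SKETCH_REDIRECT_HAIRPIN,
};

/* Minimal stand-in for the packet context: only the mark matters here. */
struct sketch_pkt {
	uint32_t mark;
};

/* Decide how an L7 service packet reaches the proxy.
 * With BPF TPROXY enabled the older redirect path is kept unchanged.
 * Otherwise the packet is marked in place and passed up the stack.
 * Assumption: the proxy port travels in the upper 16 bits of the mark. */
static enum sketch_verdict
sketch_l7_to_proxy(struct sketch_pkt *pkt, uint16_t proxy_port,
		   bool bpf_tproxy_enabled)
{
	if (bpf_tproxy_enabled)
		return SKETCH_REDIRECT_HAIRPIN;

	pkt->mark = SKETCH_MARK_TO_PROXY | ((uint32_t)proxy_port << 16);
	return SKETCH_PASS_TO_STACK;
}
```

Because the mark is set directly on the packet in the ingress program, nothing has to be carried in skb->cb across a device hop, which is the dependency the commit message wants to avoid.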
Force-pushed from 9cbb9c5 to 0b80b4f
/test
jrajahalme approved these changes Dec 6, 2024
brb approved these changes Dec 6, 2024
smagnani96 added a commit that referenced this pull request Mar 6, 2026

Fix L7 proxy traffic disruption (DROP_REASON_NOSOCKET) on nodes where the BPF program is attached to a Linux bridge device and the br_netfilter module is loaded with net.bridge.bridge-nf-call-iptables=1.

With #36383, we changed the non-BPF-TPROXY L7 LB redirect path to mark packets with MARK_MAGIC_TO_PROXY and punt them to the stack, relying on an iptables TPROXY rule in PREROUTING to deliver the packet to the proxy. This optimization avoids a hairpin redirect through cilium_net.

However, when br_netfilter is active on a bridge device, the kernel's ip_sabotage_in() function interferes:

1. Packet arrives on bridge port -> br_netfilter evaluates PREROUTING (1st time, BPF hasn't run yet -> TPROXY rule doesn't match)
2. ip_sabotage_in() sets NF_HOOK_STATE_SABOTAGED on the packet
3. tc-ingress BPF runs, sets MARK_MAGIC_TO_PROXY, punts to stack
4. IP stack calls PREROUTING again, but ip_sabotage causes it to be SKIPPED entirely
5. TPROXY rule never fires -> no listening socket on VIP -> DROP_REASON_NOSOCKET

In previous commits, we added a datapath config variable so that each program is aware of its attached link type. With this, if the device is a bridge, the L7 LB path falls back to the pre-#36383 hairpin redirect via cilium_net, bypassing the punt-to-stack path that ip_sabotage would break. Non-bridge devices continue using the optimized punt-to-stack path.

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
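The five-step failure sequence can be replayed in a toy model. This is only a sketch of the ordering problem, not kernel or Cilium code: `toy_pkt`, `toy_prerouting` and `toy_receive` are hypothetical names, the mark value is illustrative, and the "sabotaged" state is reduced to a single boolean.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative mark value; not taken from the Cilium headers. */
#define TOY_MARK_TO_PROXY 0x0200u

struct toy_pkt {
	uint32_t mark;             /* packet mark set by the BPF program */
	bool prerouting_sabotaged; /* later PREROUTING runs are skipped */
	bool tproxy_matched;       /* did the TPROXY rule fire? */
};

/* Toy PREROUTING hook: the TPROXY rule matches only marked packets. */
static void toy_prerouting(struct toy_pkt *p)
{
	if (p->mark & TOY_MARK_TO_PROXY)
		p->tproxy_matched = true;
}

/* Replay the ingress sequence. Returns true if the packet would be
 * delivered to the proxy, false if it would end up without a socket. */
static bool toy_receive(bool bridge_with_br_netfilter)
{
	struct toy_pkt p = {0};

	if (bridge_with_br_netfilter) {
		toy_prerouting(&p);            /* step 1: mark not set yet */
		p.prerouting_sabotaged = true; /* step 2 */
	}

	p.mark |= TOY_MARK_TO_PROXY;           /* step 3: BPF marks, punts */

	if (!p.prerouting_sabotaged)           /* step 4: skipped if sabotaged */
		toy_prerouting(&p);

	return p.tproxy_matched;               /* step 5: false => no socket */
}
```

In this model `toy_receive(false)` reaches the proxy, while `toy_receive(true)` does not, because the only PREROUTING pass that actually runs happens before the mark exists.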
smagnani96 added a commit that referenced this pull request Mar 7, 2026
smagnani96 added a commit that referenced this pull request Mar 9, 2026
smagnani96 added a commit that referenced this pull request Mar 10, 2026

[ upstream commit 0ba47cd ]

[ backporter's notes:
  * Added bits into bpf/tests/pktgen.h without backporting whole unrelated PRs
  * adjusted LB4_SERVICES_MAP_V2, LB4_REVERSE_NAT_MAP, and their respective IPv6 versions references, different than in upstream
  * added `ENABLE_SERVICE_PROTOCOL_DIFFERENTIATION` in the new test, alongside with some different infra to tail call the ingress program, and macros to check redirect interface
]

Fix L7 proxy traffic disruption (DROP_REASON_NOSOCKET) on nodes where the BPF program is attached to a Linux bridge device and the br_netfilter module is loaded with net.bridge.bridge-nf-call-iptables=1.

With #36383, we changed the non-BPF-TPROXY L7 LB redirect path to mark packets with MARK_MAGIC_TO_PROXY and punt them to the stack, relying on an iptables TPROXY rule in PREROUTING to deliver the packet to the proxy. This optimization avoids a hairpin redirect through cilium_net.

However, when br_netfilter is active on a bridge device, the kernel's ip_sabotage_in() function interferes:

1. Packet arrives on bridge port -> br_netfilter evaluates PREROUTING (1st time, BPF hasn't run yet -> TPROXY rule doesn't match)
2. ip_sabotage_in() sets NF_HOOK_STATE_SABOTAGED on the packet
3. tc-ingress BPF runs, sets MARK_MAGIC_TO_PROXY, punts to stack
4. IP stack calls PREROUTING again, but ip_sabotage causes it to be SKIPPED entirely
5. TPROXY rule never fires -> no listening socket on VIP -> DROP_REASON_NOSOCKET

In previous commits, we added a datapath config variable so that each program is aware of its attached link type. With this, if the device is a bridge, the L7 LB path falls back to the pre-#36383 hairpin redirect via cilium_net, bypassing the punt-to-stack path that ip_sabotage would break. Non-bridge devices continue using the optimized punt-to-stack path.

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
smagnani96 added a commit that referenced this pull request Mar 10, 2026
smagnani96 added a commit that referenced this pull request Mar 10, 2026

Fix L7 proxy traffic disruption (DROP_REASON_NOSOCKET) on nodes where the BPF program is attached to a Linux bridge device and the br_netfilter module is loaded with net.bridge.bridge-nf-call-iptables=1.

With #36383, we changed the non-BPF-TPROXY L7 LB redirect path to mark packets with MARK_MAGIC_TO_PROXY and punt them to the stack, relying on an iptables TPROXY rule in PREROUTING to deliver the packet to the proxy. This optimization avoids a hairpin redirect through cilium_net.

However, when br_netfilter is active on a bridge device, the kernel's ip_sabotage_in() function interferes:

1. Packet arrives on bridge port -> br_netfilter evaluates PREROUTING (1st time, BPF hasn't run yet -> TPROXY rule doesn't match)
2. ip_sabotage_in() sets NF_HOOK_STATE_SABOTAGED on the packet
3. tc-ingress BPF runs, sets MARK_MAGIC_TO_PROXY, punts to stack
4. IP stack calls PREROUTING again, but ip_sabotage causes it to be SKIPPED entirely
5. TPROXY rule never fires -> no listening socket on VIP -> DROP_REASON_NOSOCKET

We therefore add a new datapath config variable that will be set to true only when compiling the Netdev datapath program (cil_{from,to}_netdev) and the network device is a bridge. This allows us to fall back to the pre-#36383 hairpin redirect via cilium_net for L7 LB traffic, bypassing the ip_sabotage_in() function and allowing the TPROXY rule to match as expected. Non-bridge devices continue using the optimized punt-to-stack path.

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
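The gating described in this commit message has two halves: the loader decides the value of the config variable, and the datapath picks a path based on it. The sketch below illustrates that split under stated assumptions. The names `demo_prog`, `demo_bridge_flag` and `demo_l7_lb_path` are hypothetical and do not come from the Cilium tree, and BPF TPROXY is assumed to be disabled throughout.

```c
#include <stdbool.h>

/* Hypothetical program kinds; only the netdev program gets the flag. */
enum demo_prog {
	DEMO_PROG_NETDEV,  /* stands in for cil_{from,to}_netdev */
	DEMO_PROG_OVERLAY,
	DEMO_PROG_LXC,
};

/* Hypothetical outcomes for L7 LB traffic. */
enum demo_path {
	DEMO_PUNT_TO_STACK,
	DEMO_HAIRPIN_VIA_CILIUM_NET,
};

/* Loader side: the flag is true only for the netdev program and only
 * when the device it attaches to is a bridge. */
static bool demo_bridge_flag(enum demo_prog prog, bool link_is_bridge)
{
	return prog == DEMO_PROG_NETDEV && link_is_bridge;
}

/* Datapath side: bridge devices fall back to the hairpin redirect,
 * everything else keeps the punt-to-stack path. */
static enum demo_path demo_l7_lb_path(bool bridge_flag)
{
	return bridge_flag ? DEMO_HAIRPIN_VIA_CILIUM_NET
			   : DEMO_PUNT_TO_STACK;
}
```

Keeping the decision in a per-program config value means non-bridge devices pay no cost for the fallback.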
smagnani96 added a commit that referenced this pull request Mar 10, 2026

[ upstream commit 473a9e6 ]

[ backporter's notes: fixed conflicts:
  * adapting to the old loader implementation, adding changes only to the Netdev program attached to network devices.
  * added bits into bpf/tests/pktgen.h without backporting whole unrelated PRs
  * adjusted LB4_SERVICES_MAP_V2, LB4_REVERSE_NAT_MAP, and their respective IPv6 versions references, different than in upstream
  * added `ENABLE_SERVICE_PROTOCOL_DIFFERENTIATION` in the new test, alongside with some different infra to tail call the ingress program, and macros to check redirect interface
  * changed check in bpf/tests/l7_lb_hairpin.c to check for HOST_IFINDEX instead of cilium_net, as the latter is not used in this backport
]

Fix L7 proxy traffic disruption (DROP_REASON_NOSOCKET) on nodes where the BPF program is attached to a Linux bridge device and the br_netfilter module is loaded with net.bridge.bridge-nf-call-iptables=1.

With #36383, we changed the non-BPF-TPROXY L7 LB redirect path to mark packets with MARK_MAGIC_TO_PROXY and punt them to the stack, relying on an iptables TPROXY rule in PREROUTING to deliver the packet to the proxy. This optimization avoids a hairpin redirect through cilium_net.

However, when br_netfilter is active on a bridge device, the kernel's ip_sabotage_in() function interferes:

1. Packet arrives on bridge port -> br_netfilter evaluates PREROUTING (1st time, BPF hasn't run yet -> TPROXY rule doesn't match)
2. ip_sabotage_in() sets NF_HOOK_STATE_SABOTAGED on the packet
3. tc-ingress BPF runs, sets MARK_MAGIC_TO_PROXY, punts to stack
4. IP stack calls PREROUTING again, but ip_sabotage causes it to be SKIPPED entirely
5. TPROXY rule never fires -> no listening socket on VIP -> DROP_REASON_NOSOCKET

We therefore add a new datapath config variable that will be set to true only when compiling the Netdev datapath program (cil_{from,to}_netdev) and the network device is a bridge. This allows us to fall back to the pre-#36383 hairpin redirect via cilium_net for L7 LB traffic, bypassing the ip_sabotage_in() function and allowing the TPROXY rule to match as expected. Non-bridge devices continue using the optimized punt-to-stack path.

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
smagnani96 added a commit that referenced this pull request Mar 11, 2026
smagnani96 added a commit that referenced this pull request Mar 11, 2026

Fix L7 proxy traffic disruption (DROP_REASON_NOSOCKET) on nodes where the BPF program is attached to a Linux bridge device and the br_netfilter module is loaded with net.bridge.bridge-nf-call-iptables=1.

With #36383, we changed the non-BPF-TPROXY L7 LB redirect path to mark packets with MARK_MAGIC_TO_PROXY and punt them to the stack, relying on an iptables TPROXY rule in PREROUTING to deliver the packet to the proxy. This optimization avoids a hairpin redirect through cilium_net.

However, when br_netfilter is active on a bridge device, the kernel's ip_sabotage_in() function interferes:

1. Packet arrives on bridge port -> br_netfilter evaluates PREROUTING (1st time, BPF hasn't run yet -> TPROXY rule doesn't match)
2. ip_sabotage_in() sets NF_HOOK_STATE_SABOTAGED on the packet
3. tc-ingress BPF runs, sets MARK_MAGIC_TO_PROXY, punts to stack
4. IP stack calls PREROUTING again, but ip_sabotage causes it to be SKIPPED entirely
5. TPROXY rule never fires -> no listening socket on VIP -> DROP_REASON_NOSOCKET

We therefore add a new datapath config variable that will be set to true only when compiling the Netdev datapath program (cil_{from,to}_netdev) and the network device is a bridge. This allows us to fall back to the pre-#36383 hairpin redirect via cilium_net for L7 LB traffic, bypassing the ip_sabotage_in() function and allowing the TPROXY rule to match as expected. Non-bridge devices continue using the optimized punt-to-stack path.

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
smagnani96 added a commit that referenced this pull request Mar 11, 2026
github-merge-queue bot pushed a commit that referenced this pull request Mar 12, 2026
smagnani96 added a commit that referenced this pull request Mar 12, 2026

[ upstream commit 52f35a9 ]

[ backporter's notes: fixed conflicts:
  * adapting to the old loader implementation, adding changes only to the Netdev program attached to network devices.
  * added bits into bpf/tests/pktgen.h without backporting whole unrelated PRs
  * adjusted LB4_SERVICES_MAP_V2, LB4_REVERSE_NAT_MAP, and their respective IPv6 versions references, different than in upstream
  * added `ENABLE_SERVICE_PROTOCOL_DIFFERENTIATION` in the new test, alongside with some different infra to tail call the ingress program, and macros to check redirect interface
  * changed check in bpf/tests/l7_lb_hairpin.c to check for HOST_IFINDEX instead of cilium_net, as the latter is not used in this backport
]

Fix L7 proxy traffic disruption (DROP_REASON_NOSOCKET) on nodes where the BPF program is attached to a Linux bridge device and the br_netfilter module is loaded with net.bridge.bridge-nf-call-iptables=1.

With #36383, we changed the non-BPF-TPROXY L7 LB redirect path to mark packets with MARK_MAGIC_TO_PROXY and punt them to the stack, relying on an iptables TPROXY rule in PREROUTING to deliver the packet to the proxy. This optimization avoids a hairpin redirect through cilium_net.

However, when br_netfilter is active on a bridge device, the kernel's ip_sabotage_in() function interferes:

1. Packet arrives on bridge port -> br_netfilter evaluates PREROUTING (1st time, BPF hasn't run yet -> TPROXY rule doesn't match)
2. ip_sabotage_in() sets NF_HOOK_STATE_SABOTAGED on the packet
3. tc-ingress BPF runs, sets MARK_MAGIC_TO_PROXY, punts to stack
4. IP stack calls PREROUTING again, but ip_sabotage causes it to be SKIPPED entirely
5. TPROXY rule never fires -> no listening socket on VIP -> DROP_REASON_NOSOCKET

We therefore add a new datapath config variable that will be set to true only when compiling the Netdev datapath program (cil_{from,to}_netdev) and the network device is a bridge. This allows us to fall back to the pre-#36383 hairpin redirect via cilium_net for L7 LB traffic, bypassing the ip_sabotage_in() function and allowing the TPROXY rule to match as expected. Non-bridge devices continue using the optimized punt-to-stack path.

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
smagnani96 added a commit that referenced this pull request Mar 12, 2026
smagnani96 added a commit that referenced this pull request Mar 12, 2026

[ upstream commit 52f35a9 ]

[ backporter's notes: fixed conflicts:
  * adapting to the old loader implementation, adding changes only to the Netdev program attached to network devices.
  * added bits into bpf/tests/pktgen.h without backporting whole unrelated PRs
  * changed check in bpf/tests/l7_lb_hairpin.c to check for CILIUM_NET_IFINDEX instead of CONFIG(cilium_net_ifindex), as the latter is not used in this backport
]

Fix L7 proxy traffic disruption (DROP_REASON_NOSOCKET) on nodes where the BPF program is attached to a Linux bridge device and the br_netfilter module is loaded with net.bridge.bridge-nf-call-iptables=1.

With #36383, we changed the non-BPF-TPROXY L7 LB redirect path to mark packets with MARK_MAGIC_TO_PROXY and punt them to the stack, relying on an iptables TPROXY rule in PREROUTING to deliver the packet to the proxy. This optimization avoids a hairpin redirect through cilium_net.

However, when br_netfilter is active on a bridge device, the kernel's ip_sabotage_in() function interferes:

1. Packet arrives on bridge port -> br_netfilter evaluates PREROUTING (1st time, BPF hasn't run yet -> TPROXY rule doesn't match)
2. ip_sabotage_in() sets NF_HOOK_STATE_SABOTAGED on the packet
3. tc-ingress BPF runs, sets MARK_MAGIC_TO_PROXY, punts to stack
4. IP stack calls PREROUTING again, but ip_sabotage causes it to be SKIPPED entirely
5. TPROXY rule never fires -> no listening socket on VIP -> DROP_REASON_NOSOCKET

We therefore add a new datapath config variable that will be set to true only when compiling the Netdev datapath program (cil_{from,to}_netdev) and the network device is a bridge. This allows us to fall back to the pre-#36383 hairpin redirect via cilium_net for L7 LB traffic, bypassing the ip_sabotage_in() function and allowing the TPROXY rule to match as expected. Non-bridge devices continue using the optimized punt-to-stack path.

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
smagnani96 added a commit that referenced this pull request Mar 12, 2026

[ upstream commit 52f35a9 ]

[ backporter's notes: fixed conflicts:
  * adapting to the old loader implementation, adding changes only to the Netdev program attached to network devices.
  * changed check in bpf/tests/l7_lb_hairpin.c to check for CILIUM_NET_IFINDEX instead of CONFIG(cilium_net_ifindex), as the latter is not used in this backport
]

Fix L7 proxy traffic disruption (DROP_REASON_NOSOCKET) on nodes where the BPF program is attached to a Linux bridge device and the br_netfilter module is loaded with net.bridge.bridge-nf-call-iptables=1.

With #36383, we changed the non-BPF-TPROXY L7 LB redirect path to mark packets with MARK_MAGIC_TO_PROXY and punt them to the stack, relying on an iptables TPROXY rule in PREROUTING to deliver the packet to the proxy. This optimization avoids a hairpin redirect through cilium_net.

However, when br_netfilter is active on a bridge device, the kernel's ip_sabotage_in() function interferes:

1. Packet arrives on bridge port -> br_netfilter evaluates PREROUTING (1st time, BPF hasn't run yet -> TPROXY rule doesn't match)
2. ip_sabotage_in() sets NF_HOOK_STATE_SABOTAGED on the packet
3. tc-ingress BPF runs, sets MARK_MAGIC_TO_PROXY, punts to stack
4. IP stack calls PREROUTING again, but ip_sabotage causes it to be SKIPPED entirely
5. TPROXY rule never fires -> no listening socket on VIP -> DROP_REASON_NOSOCKET

We therefore add a new datapath config variable that will be set to true only when compiling the Netdev datapath program (cil_{from,to}_netdev) and the network device is a bridge. This allows us to fall back to the pre-#36383 hairpin redirect via cilium_net for L7 LB traffic, bypassing the ip_sabotage_in() function and allowing the TPROXY rule to match as expected. Non-bridge devices continue using the optimized punt-to-stack path.

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
github-merge-queue bot pushed a commit that referenced this pull request Mar 13, 2026
julianwiedmann pushed a commit that referenced this pull request Mar 13, 2026

[ upstream commit 52f35a9 ]

[ backporter's notes: fixed conflicts:
  * adapting to the old loader implementation, adding changes only to the Netdev program attached to network devices.
  * added bits into bpf/tests/pktgen.h without backporting whole unrelated PRs
  * changed check in bpf/tests/l7_lb_hairpin.c to check for CILIUM_NET_IFINDEX instead of CONFIG(cilium_net_ifindex), as the latter is not used in this backport
]

Fix L7 proxy traffic disruption (DROP_REASON_NOSOCKET) on nodes where the BPF program is attached to a Linux bridge device and the br_netfilter module is loaded with net.bridge.bridge-nf-call-iptables=1.

With #36383, we changed the non-BPF-TPROXY L7 LB redirect path to mark packets with MARK_MAGIC_TO_PROXY and punt them to the stack, relying on an iptables TPROXY rule in PREROUTING to deliver the packet to the proxy. This optimization avoids a hairpin redirect through cilium_net.

However, when br_netfilter is active on a bridge device, the kernel's ip_sabotage_in() function interferes:

1. Packet arrives on bridge port -> br_netfilter evaluates PREROUTING (1st time, BPF hasn't run yet -> TPROXY rule doesn't match)
2. ip_sabotage_in() sets NF_HOOK_STATE_SABOTAGED on the packet
3. tc-ingress BPF runs, sets MARK_MAGIC_TO_PROXY, punts to stack
4. IP stack calls PREROUTING again, but ip_sabotage causes it to be SKIPPED entirely
5. TPROXY rule never fires -> no listening socket on VIP -> DROP_REASON_NOSOCKET

We therefore add a new datapath config variable that will be set to true only when compiling the Netdev datapath program (cil_{from,to}_netdev) and the network device is a bridge. This allows us to fall back to the pre-#36383 hairpin redirect via cilium_net for L7 LB traffic, bypassing the ip_sabotage_in() function and allowing the TPROXY rule to match as expected. Non-bridge devices continue using the optimized punt-to-stack path.

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
julianwiedmann
pushed a commit
that referenced
this pull request
Mar 13, 2026
[ upstream commit 52f35a9 ]

[ backporter's notes: fixed conflicts:
* adapting to the old loader implementation, adding changes only to the Netdev program attached to network devices.
* added bits into bpf/tests/pktgen.h without backporting whole unrelated PRs
* adjusted LB4_SERVICES_MAP_V2, LB4_REVERSE_NAT_MAP, and their respective IPv6 versions references, different than in upstream
* added `ENABLE_SERVICE_PROTOCOL_DIFFERENTIATION` in the new test, alongside with some different infra to tail call the ingress program, and macros to check redirect interface
* changed check in bpf/tests/l7_lb_hairpin.c to check for HOST_IFINDEX instead of cilium_net, as the latter is not used in this backport ]

Fix L7 proxy traffic disruption (DROP_REASON_NOSOCKET) on nodes where the BPF program is attached to a Linux bridge device and the br_netfilter module is loaded with net.bridge.bridge-nf-call-iptables=1.

With #36383, we changed the non-BPF-TPROXY L7 LB redirect path to mark packets with MARK_MAGIC_TO_PROXY and punt them to the stack, relying on an iptables TPROXY rule in PREROUTING to deliver the packet to the proxy. This optimization avoids a hairpin redirect through cilium_net.

However, when br_netfilter is active on a bridge device, the kernel's ip_sabotage_in() function interferes:

1. Packet arrives on bridge port -> br_netfilter evaluates PREROUTING (1st time, BPF hasn't run yet -> TPROXY rule doesn't match)
2. ip_sabotage_in() sets NF_HOOK_STATE_SABOTAGED on the packet
3. tc-ingress BPF runs, sets MARK_MAGIC_TO_PROXY, punts to stack
4. IP stack calls PREROUTING again, but ip_sabotage causes it to be SKIPPED entirely
5. TPROXY rule never fires -> no listening socket on VIP -> DROP_REASON_NOSOCKET

We therefore add a new datapath config variable that will be set to true only when compiling the Netdev datapath program (cil_{from,to}_netdev) and the network device is a bridge. This allows us to fall back to the pre-#36383 hairpin redirect via cilium_net for L7 LB traffic, bypassing the ip_sabotage_in() function and allowing the TPROXY rule to match as expected. Non-bridge devices continue using the optimized punt-to-stack path.

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
github-merge-queue bot
pushed a commit
that referenced
this pull request
Mar 13, 2026
[ upstream commit 52f35a9 ]

[ backporter's notes: fixed conflicts:
* adapting to the old loader implementation, adding changes only to the Netdev program attached to network devices.
* added bits into bpf/tests/pktgen.h without backporting whole unrelated PRs
* changed check in bpf/tests/l7_lb_hairpin.c to check for CILIUM_NET_IFINDEX instead of CONFIG(cilium_net_ifindex), as the latter is not used in this backport ]

Fix L7 proxy traffic disruption (DROP_REASON_NOSOCKET) on nodes where the BPF program is attached to a Linux bridge device and the br_netfilter module is loaded with net.bridge.bridge-nf-call-iptables=1.

With #36383, we changed the non-BPF-TPROXY L7 LB redirect path to mark packets with MARK_MAGIC_TO_PROXY and punt them to the stack, relying on an iptables TPROXY rule in PREROUTING to deliver the packet to the proxy. This optimization avoids a hairpin redirect through cilium_net.

However, when br_netfilter is active on a bridge device, the kernel's ip_sabotage_in() function interferes:

1. Packet arrives on bridge port -> br_netfilter evaluates PREROUTING (1st time, BPF hasn't run yet -> TPROXY rule doesn't match)
2. ip_sabotage_in() sets NF_HOOK_STATE_SABOTAGED on the packet
3. tc-ingress BPF runs, sets MARK_MAGIC_TO_PROXY, punts to stack
4. IP stack calls PREROUTING again, but ip_sabotage causes it to be SKIPPED entirely
5. TPROXY rule never fires -> no listening socket on VIP -> DROP_REASON_NOSOCKET

We therefore add a new datapath config variable that will be set to true only when compiling the Netdev datapath program (cil_{from,to}_netdev) and the network device is a bridge. This allows us to fall back to the pre-#36383 hairpin redirect via cilium_net for L7 LB traffic, bypassing the ip_sabotage_in() function and allowing the TPROXY rule to match as expected. Non-bridge devices continue using the optimized punt-to-stack path.

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
github-merge-queue bot
pushed a commit
that referenced
this pull request
Mar 13, 2026
[ upstream commit 52f35a9 ]

[ backporter's notes: fixed conflicts:
* adapting to the old loader implementation, adding changes only to the Netdev program attached to network devices.
* added bits into bpf/tests/pktgen.h without backporting whole unrelated PRs
* adjusted LB4_SERVICES_MAP_V2, LB4_REVERSE_NAT_MAP, and their respective IPv6 versions references, different than in upstream
* added `ENABLE_SERVICE_PROTOCOL_DIFFERENTIATION` in the new test, alongside with some different infra to tail call the ingress program, and macros to check redirect interface
* changed check in bpf/tests/l7_lb_hairpin.c to check for HOST_IFINDEX instead of cilium_net, as the latter is not used in this backport ]

Fix L7 proxy traffic disruption (DROP_REASON_NOSOCKET) on nodes where the BPF program is attached to a Linux bridge device and the br_netfilter module is loaded with net.bridge.bridge-nf-call-iptables=1.

With #36383, we changed the non-BPF-TPROXY L7 LB redirect path to mark packets with MARK_MAGIC_TO_PROXY and punt them to the stack, relying on an iptables TPROXY rule in PREROUTING to deliver the packet to the proxy. This optimization avoids a hairpin redirect through cilium_net.

However, when br_netfilter is active on a bridge device, the kernel's ip_sabotage_in() function interferes:

1. Packet arrives on bridge port -> br_netfilter evaluates PREROUTING (1st time, BPF hasn't run yet -> TPROXY rule doesn't match)
2. ip_sabotage_in() sets NF_HOOK_STATE_SABOTAGED on the packet
3. tc-ingress BPF runs, sets MARK_MAGIC_TO_PROXY, punts to stack
4. IP stack calls PREROUTING again, but ip_sabotage causes it to be SKIPPED entirely
5. TPROXY rule never fires -> no listening socket on VIP -> DROP_REASON_NOSOCKET

We therefore add a new datapath config variable that will be set to true only when compiling the Netdev datapath program (cil_{from,to}_netdev) and the network device is a bridge. This allows us to fall back to the pre-#36383 hairpin redirect via cilium_net for L7 LB traffic, bypassing the ip_sabotage_in() function and allowing the TPROXY rule to match as expected. Non-bridge devices continue using the optimized punt-to-stack path.

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
javiercardona-work
pushed a commit
to javiercardona-work/cilium
that referenced
this pull request
Mar 18, 2026
Fix L7 proxy traffic disruption (DROP_REASON_NOSOCKET) on nodes where the BPF program is attached to a Linux bridge device and the br_netfilter module is loaded with net.bridge.bridge-nf-call-iptables=1.

With cilium#36383, we changed the non-BPF-TPROXY L7 LB redirect path to mark packets with MARK_MAGIC_TO_PROXY and punt them to the stack, relying on an iptables TPROXY rule in PREROUTING to deliver the packet to the proxy. This optimization avoids a hairpin redirect through cilium_net.

However, when br_netfilter is active on a bridge device, the kernel's ip_sabotage_in() function interferes:

1. Packet arrives on bridge port -> br_netfilter evaluates PREROUTING (1st time, BPF hasn't run yet -> TPROXY rule doesn't match)
2. ip_sabotage_in() sets NF_HOOK_STATE_SABOTAGED on the packet
3. tc-ingress BPF runs, sets MARK_MAGIC_TO_PROXY, punts to stack
4. IP stack calls PREROUTING again, but ip_sabotage causes it to be SKIPPED entirely
5. TPROXY rule never fires -> no listening socket on VIP -> DROP_REASON_NOSOCKET

We therefore add a new datapath config variable that will be set to true only when compiling the Netdev datapath program (cil_{from,to}_netdev) and the network device is a bridge. This allows us to fall back to the pre-cilium#36383 hairpin redirect via cilium_net for L7 LB traffic, bypassing the ip_sabotage_in() function and allowing the TPROXY rule to match as expected. Non-bridge devices continue using the optimized punt-to-stack path.

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
L7 svc requests are currently redirected to cilium_net / cilium_host, where they get marked with MARK_MAGIC_TO_PROXY and then get rerouted to the proxy.
Simplify this by marking the packet in from-netdev and from-overlay, and letting it pass up the stack. This avoids the dependency on skb->cb surviving the transfer from the native program to cilium_net.
The BPF TPROXY path needs further investigation, so only apply this change when BPF TPROXY is disabled.