bpf: nodeport: use hairpin redirect for L7 LB on bridge devices #44658
julianwiedmann merged 1 commit into main
/test
julianwiedmann left a comment
Thank you! Just a quick first look ...
I've created a follow-up PR for potentially migrating ENABLE_TPROXY: #44719.
Fix L7 proxy traffic disruption (DROP_REASON_NOSOCKET) on nodes where
the BPF program is attached to a Linux bridge device and the
br_netfilter module is loaded with net.bridge.bridge-nf-call-iptables=1.

With #36383, we changed the non-BPF-TPROXY L7 LB redirect path to mark
packets with MARK_MAGIC_TO_PROXY and punt them to the stack, relying on
an iptables TPROXY rule in PREROUTING to deliver the packet to the
proxy. This optimization avoids a hairpin redirect through cilium_net.

However, when br_netfilter is active on a bridge device, the kernel's
ip_sabotage_in() function interferes:

1. Packet arrives on bridge port -> br_netfilter evaluates PREROUTING
   (1st time, BPF hasn't run yet -> TPROXY rule doesn't match)
2. ip_sabotage_in() sets NF_HOOK_STATE_SABOTAGED on the packet
3. tc-ingress BPF runs, sets MARK_MAGIC_TO_PROXY, punts to stack
4. IP stack calls PREROUTING again, but ip_sabotage causes it to be
   SKIPPED entirely
5. TPROXY rule never fires -> no listening socket on VIP ->
   DROP_REASON_NOSOCKET

We therefore add a new datapath config variable that will be set to true
only when compiling the Netdev datapath program (cil_{from,to}_netdev)
and the network device is a bridge. This allows us to fall back to the
pre-#36383 hairpin redirect via cilium_net for L7 LB traffic, bypassing the
ip_sabotage_in() function and allowing the TPROXY rule to match as expected.

Non-bridge devices continue using the optimized punt-to-stack path.

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
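For readers who want to trace the failure sequence, here is a toy userspace model of the five steps in the description. It is only a sketch: it is not kernel or Cilium code, and every name in it (`struct pkt`, `prerouting`, `deliver_via_stack`) is hypothetical. The boolean fields merely stand in for MARK_MAGIC_TO_PROXY and for the state that ip_sabotage_in() is described as setting.

```c
#include <stdbool.h>

/* Toy model only; all names are hypothetical and not taken from the PR. */
struct pkt {
    bool marked_to_proxy; /* stands in for MARK_MAGIC_TO_PROXY */
    bool sabotaged;       /* stands in for the state set by ip_sabotage_in() */
    bool tproxy_matched;  /* did the iptables TPROXY rule fire? */
};

/* PREROUTING: the TPROXY rule matches only packets that carry the mark. */
static void prerouting(struct pkt *p)
{
    if (p->marked_to_proxy)
        p->tproxy_matched = true;
}

/* Models the punt-to-stack path. Returns true if the packet would reach
 * the proxy, false if it would end in DROP_REASON_NOSOCKET. */
static bool deliver_via_stack(bool br_netfilter_active)
{
    struct pkt p = { false, false, false };

    if (br_netfilter_active) {
        prerouting(&p);       /* step 1: runs before BPF, mark not set yet */
        p.sabotaged = true;   /* step 2: packet is flagged */
    }

    p.marked_to_proxy = true; /* step 3: tc-ingress BPF marks and punts */

    if (!p.sabotaged)
        prerouting(&p);       /* step 4: skipped for flagged packets */

    return p.tproxy_matched;  /* step 5: no match means no socket */
}
```

In this model `deliver_via_stack(false)` returns true and `deliver_via_stack(true)` returns false, which mirrors why only bridge devices with br_netfilter active are affected.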
@julianwiedmann I rebased on top of #44649.
/test
+100 on not re-using |
l7lb: bpf: fix use hairpin redirect for L7 LB on bridge devices
Fix L7 proxy traffic disruption (DROP_REASON_NOSOCKET) on nodes where
the BPF program is attached to a Linux bridge device and the
br_netfilter module is loaded with net.bridge.bridge-nf-call-iptables=1.
With #36383, we changed the non-BPF-TPROXY L7 LB redirect path to mark
packets with MARK_MAGIC_TO_PROXY and punt them to the stack, relying on
an iptables TPROXY rule in PREROUTING to deliver the packet to the
proxy. This optimization avoids a hairpin redirect through cilium_net.
However, when br_netfilter is active on a bridge device, the kernel's
ip_sabotage_in() function interferes:
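1. Packet arrives on bridge port -> br_netfilter evaluates PREROUTING
   (1st time, BPF hasn't run yet -> TPROXY rule doesn't match)
2. ip_sabotage_in() sets NF_HOOK_STATE_SABOTAGED on the packet
3. tc-ingress BPF runs, sets MARK_MAGIC_TO_PROXY, punts to stack
4. IP stack calls PREROUTING again, but ip_sabotage causes it to be
   SKIPPED entirely
5. TPROXY rule never fires -> no listening socket on VIP ->
   DROP_REASON_NOSOCKET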
We therefore add a new datapath config variable that will be set to true
only when compiling the Netdev datapath program (cil_{from,to}_netdev)
and the network device is a bridge. This allows us to fall back to the
pre-#36383 hairpin redirect via cilium_net for L7 LB traffic, bypassing the
ip_sabotage_in() function and allowing the TPROXY rule to match as expected.
Non-bridge devices continue using the optimized punt-to-stack path.
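
A rough sketch of the selection the commit message describes for the non-BPF-TPROXY path might look like the following. This is an illustration under assumptions, not the code from the PR: the enum, the function name `l7lb_select_path`, and the `netdev_is_bridge` parameter are all hypothetical stand-ins for the new datapath config variable, whose real name is not shown here.

```c
#include <stdbool.h>

/* Hypothetical names; the real config variable and helpers may differ. */
enum l7lb_path {
    L7LB_PUNT_TO_STACK, /* mark with MARK_MAGIC_TO_PROXY, rely on iptables TPROXY */
    L7LB_HAIRPIN        /* pre-#36383 behaviour: redirect via cilium_net */
};

/* netdev_is_bridge stands in for the new config variable, which is described
 * as true only for cil_{from,to}_netdev programs compiled for a bridge. */
static enum l7lb_path l7lb_select_path(bool netdev_is_bridge)
{
    if (netdev_is_bridge)
        return L7LB_HAIRPIN;   /* avoids the skipped PREROUTING pass */

    return L7LB_PUNT_TO_STACK; /* optimized path for non-bridge devices */
}
```

Because the value is described as fixed at compile time per device, a real datapath would presumably resolve this branch when the program is built rather than per packet; the runtime `if` above is only for readability.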