Make XDP Nodeport Acceleration compatible with EgressGW#20837
Make XDP Nodeport Acceleration compatible with EgressGW#20837joamaki merged 8 commits intocilium:masterfrom
Conversation
038e57e to
e3eec05
Compare
|
/test |
e3eec05 to
cec7283
Compare
|
/test |
cec7283 to
6f482d8
Compare
|
/test |
6f482d8 to
c7780d3
Compare
|
/test |
0879be7 to
164433f
Compare
|
/test |
164433f to
016611c
Compare
|
/test |
7d62320 to
0620f43
Compare
|
/test |
00c4961 to
8dde957
Compare
|
/test |
8dde957 to
b84d68c
Compare
|
/test |
b84d68c to
9f5f0d4
Compare
|
[rebase to pick up the reshuffled test matrix] |
|
/test Job 'Cilium-PR-K8s-1.16-kernel-4.9' failed: Click to show.Test NameFailure OutputIf it is a flake and a GitHub issue doesn't already exist to track it, comment |
|
/test-1.16-4.9 |
NikAleksandrov
left a comment
There was a problem hiding this comment.
Overall looks good to me, the only 2 minor nits I have are the XFER_FLAGS naming because it looks like they are mutually exclusive and the hardcoded offset validation in ctx_move_xfer(), but these are not a big deal.
ack, we'll need a proper design once we start adding more xfer flags/data. Also a smarter way of doing |
qmonnet
left a comment
There was a problem hiding this comment.
Looks good! Just a few minor comments below.
Ta! Will pick up those small changes when resolving the |
bpf/lib/overloadable_xdp.h
Outdated
There was a problem hiding this comment.
Nit, feel free to ignore it.
I'm finding this asymmetry:
- XDP: set these properties with
ctx_store_meta()besideXFER_PKT_ENCAP, which is set withctx_set_xfer() - TC: read all these properties with
ctx_get_xfer()
a bit confusing, maybe we can switch all the xfer properties to ctx_set_xfer() here (and have an additional helper to set the XFER_MARKER)?
There was a problem hiding this comment.
Heh, for me it's actually ctx_set_xfer() in XDP what's causing the confusion. As it's not touching the meta_data area, that feels misplaced and should imho be a plain ctx_store_meta(ctx, XFER_MARKER, ...) (without the wrapping).
9f5f0d4 to
7f89ca2
Compare
When the encap code can't redirect traffic straight to the encap interface (because it needs IPSec handling), it returns IPSEC_ENDPOINT to the caller. In the future we want to make use of this behaviour beyond just the IPSec case, so replace the special IPSEC_ENDPOINT return code with its actual value (CTX_ACT_OK). No functional change. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
The encap helpers are currently enabled if ENCAP_IFINDEX is available. For XDP we can't redirect to the encap interface, so we don't want visibility of the actual ifindex. But we still want to call into the common helper infrastructure from generic code (ie. nodeport.h). So add a HAVE_ENCAP macro to decide whether we need the encap helpers, and hide the tunnel-key details in an skb-specific helper. The XDP variant should not be called by anyone yet. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
When redirecting a packet to the encap iface, set the target ifindex in the ctx-specific part of the code path. This avoids exposing the ENCAP_IFINDEX to XDP programs (where it can't be used), and in the future allows us to redirect to an arbitrary interface (eg. if XDP already added the encap headers manually). Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Callers of __encap_with_nodeid() currently assume that they can redirect the packet afterwards. Make this explicit by returning CTX_ACT_REDIRECT, and having the immediate callers handle it. This later allows us to return CTX_ACT_OK from XDP context, and have the caller pass the packet to the stack instead - like we do for IPSec today. In the high-level paths, it's enough to teach the rev_nodeport_*() parts in nodeport.h about CTX_ACT_OK. All the other callers are certainly skb-only, and some already know how to handle CTX_ACT_OK as they enter through an IPSec-enabled path. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Our XDP and TC programs use custom areas to store their metadata (XDP uses a per-CPU scratchpad, TC uses __sk_buff->cb[]). So when XDP passes a packet up to TC, we explicitly need to copy any relevant metadata into xdp_md->data_meta. Extract a helper for this copy operation, so that it can be easily called from shared code. Also extend ctx_get_xfer() so that the TC layer can access additional xfer data. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
When XDP needs to tunnel-encap a packet, pass it up the stack and redirect it to the tunnel interface from the netdev's TC ingress. In contrast to a pure XDP solution, this lets us re-use all the existing tunnel infrastructure in the kernel and host. Achieve this by having ctx_set_encap_info() store the tunnel information into our internal metadata, and transfer it over when going from XDP to TC. For now we only use this for the case of EgressGW reply traffic - so the change in tail_rev_nodeport_lb6() is just for commonality, we never hit the ret != CTX_ACT_REDIRECT case there. But it should be possible to extend this for the general case of running XDP with TUNNEL_MODE. The relevant code paths just need to learn how to handle CTX_ACT_OK and CTX_ACT_REDIRECT return values when calling encap helpers. Note that if all the xfer data becomes too much, we can probably also just have a second tunnel look-up in cil_from_netdev() to get all that stuff. Fixes: cilium#19717 Fixes: cilium#19624 Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Add a combination with Nodeport-Acceleration, Direct-Routing and EgressGW. Also move ENABLE_EGRESS_GATEWAY into the MAX_BASE_OPTIONS, so that MAX_XDP_OPTIONS can pick it up. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Test the new EgressGW functionality in XDP, which flags EgressGW reply packets that need to go via the tunnel so that TC-Ingress can redirect them. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
7f89ca2 to
93f6c4f
Compare
|
/test Job 'Cilium-PR-K8s-1.16-kernel-4.9' failed: Click to show.Test NameFailure OutputIf it is a flake and a GitHub issue doesn't already exist to track it, comment |
|
[Test suite passed previously, and the rebase was trivial. But let's make sure of it ...] |
|
Caught the |
When combining XDP-based Nodeport Acceleration (-> Direct-routing) and Egress Gateway, the XDP program can't redirect the EgressGW reply traffic straight to the
cilium_vxlaninterface. So reply traffic that needs to be forwarded to another node gets passed up to the kernel, where it needs to pass through iptables (and is potentially dropped) before being routed tocilium_vxlan.This PR introduces a fallback mechanism, where the XDP program adds a
needs-encapflag (+ relevant tunnel info) to the packet before it gets passed up to the kernel. We pick up this flag in thefrom-netdevprogram @TC-Ingress, and perform the redirect tocilium_vxlanfrom there.This currently only gets enabled for EgressGW reply traffic, but could get extended to also cover
TUNNEL_MODE(so that Nodeport-Acceleration can be used without direct-routing). We already lay some groundwork for this.A second add-on would be to add support for building the encap headers into the XDP program itself. In that case this fallback mechanism would still be taken if eg. the FIB lookup fails, or we only want to use
XDP_TX(and defer any case where we would needXDP_REDIRECT).Fixes: #19717
Fixes: #19624