Skip to content

Make XDP Nodeport Acceleration compatible with EgressGW#20837

Merged
joamaki merged 8 commits intocilium:masterfrom
julianwiedmann:xdp-tunnel-egressgw
Sep 7, 2022
Merged

Make XDP Nodeport Acceleration compatible with EgressGW#20837
joamaki merged 8 commits intocilium:masterfrom
julianwiedmann:xdp-tunnel-egressgw

Conversation

@julianwiedmann
Copy link
Copy Markdown
Member

@julianwiedmann julianwiedmann commented Aug 9, 2022

When combining XDP-based Nodeport Acceleration (-> Direct-routing) and Egress Gateway, the XDP program can't redirect the EgressGW reply traffic straight to the cilium_vxlan interface. So reply traffic that needs to be forwarded to another node gets passed up to the kernel, where it needs to pass through iptables (and is potentially dropped) before being routed to cilium_vxlan.

This PR introduces a fallback mechanism, where the XDP program adds a needs-encap flag (+ relevant tunnel info) to the packet before it gets passed up to the kernel. We pick up this flag in the from-netdev program @ TC-Ingress, and perform the redirect to cilium_vxlan from there.

This currently only gets enabled for EgressGW reply traffic, but could get extended to also cover TUNNEL_MODE (so that Nodeport-Acceleration can be used without direct-routing). We already lay some groundwork for this.

A second add-on would be to add support for building the encap headers into the XDP program itself. In that case this fallback mechanism would still be taken if eg. the FIB lookup fails, or we only want to use XDP_TX (and defer any case where we would need XDP_REDIRECT).

Fixes: #19717
Fixes: #19624

When combining XDP Nodeport Acceleration with Egress Gateway, forwarding the EgressGW reply traffic no longer requires a specific iptables configuration on the Gateway node.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Aug 9, 2022
@julianwiedmann julianwiedmann force-pushed the xdp-tunnel-egressgw branch 2 times, most recently from 038e57e to e3eec05 Compare August 9, 2022 14:10
@julianwiedmann
Copy link
Copy Markdown
Member Author

/test

@julianwiedmann
Copy link
Copy Markdown
Member Author

/test

@julianwiedmann
Copy link
Copy Markdown
Member Author

/test

@julianwiedmann
Copy link
Copy Markdown
Member Author

/test

@julianwiedmann julianwiedmann force-pushed the xdp-tunnel-egressgw branch 3 times, most recently from 0879be7 to 164433f Compare August 10, 2022 15:42
@julianwiedmann
Copy link
Copy Markdown
Member Author

/test

@julianwiedmann
Copy link
Copy Markdown
Member Author

/test

@julianwiedmann julianwiedmann force-pushed the xdp-tunnel-egressgw branch 5 times, most recently from 7d62320 to 0620f43 Compare August 24, 2022 08:26
@julianwiedmann
Copy link
Copy Markdown
Member Author

/test

@julianwiedmann
Copy link
Copy Markdown
Member Author

/test

@julianwiedmann
Copy link
Copy Markdown
Member Author

/test

@julianwiedmann julianwiedmann added area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. kind/feature This introduces new functionality. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. and removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Aug 25, 2022
@julianwiedmann
Copy link
Copy Markdown
Member Author

[rebase to pick up the reshuffled test matrix]

@julianwiedmann
Copy link
Copy Markdown
Member Author

julianwiedmann commented Aug 29, 2022

/test

Job 'Cilium-PR-K8s-1.16-kernel-4.9' failed:

Click to show.

Test Name

K8sPolicyTest Namespaces policies Tests the same Policy in different namespaces

Failure Output

FAIL: testapp pods are not ready after timeout in namspace "2022082906302k8spolicytestnamespacespoliciesteststhesamepolicyi"

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.16-kernel-4.9 so I can create one.

@julianwiedmann
Copy link
Copy Markdown
Member Author

/test-1.16-4.9

NikAleksandrov
NikAleksandrov approved these changes Aug 30, 2022
Copy link
Copy Markdown

@NikAleksandrov NikAleksandrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me, the only 2 minor nits I have are the XFER_FLAGS naming because it looks like they are mutually exclusive and the hardcoded offset validation in ctx_move_xfer(), but these are not a big deal.

@julianwiedmann
Copy link
Copy Markdown
Member Author

Overall looks good to me, the only 2 minor nits I have are the XFER_FLAGS naming because it looks like they are mutually exclusive and the hardcoded offset validation in ctx_move_xfer(), but these are not a big deal.

ack, we'll need a proper design once we start adding more xfer flags/data. Also a smarter way of doing ctx_move_xfer(), instead of sprinkling it over paths that "might" run in XDP context. But right now I didn't want to over-engineer it.

Copy link
Copy Markdown
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just a few minor comments below.

@qmonnet qmonnet added the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Sep 2, 2022
@julianwiedmann
Copy link
Copy Markdown
Member Author

Looks good! Just a few minor comments below.

Ta! Will pick up those small changes when resolving the needs-rebase.

Copy link
Copy Markdown
Member

@jibi jibi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! 🚀 🚀

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, feel free to ignore it.

I'm finding this asymmetry:

  • XDP: set these properties with ctx_store_meta() beside XFER_PKT_ENCAP, which is set with ctx_set_xfer()
  • TC: read all these properties with ctx_get_xfer()

a bit confusing, maybe we can switch all the xfer properties to ctx_set_xfer() here (and have an additional helper to set the XFER_MARKER)?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, for me it's actually ctx_set_xfer() in XDP what's causing the confusion. As it's not touching the meta_data area, that feels misplaced and should imho be a plain ctx_store_meta(ctx, XFER_MARKER, ...) (without the wrapping).

When the encap code can't redirect traffic straight to the encap interface
(because it needs IPSec handling), it returns IPSEC_ENDPOINT to the caller.

In the future we want to make use of this behaviour beyond just the IPSec
case, so replace the special IPSEC_ENDPOINT return code with its actual
value (CTX_ACT_OK).

No functional change.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
The encap helpers are currently enabled if ENCAP_IFINDEX is available. For
XDP we can't redirect to the encap interface, so we don't want visibility
of the actual ifindex. But we still want to call into the common helper
infrastructure from generic code (ie. nodeport.h).

So add a HAVE_ENCAP macro to decide whether we need the encap helpers, and
hide the tunnel-key details in an skb-specific helper. The XDP variant
should not be called by anyone yet.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
When redirecting a packet to the encap iface, set the target ifindex in the
ctx-specific part of the code path. This avoids exposing the ENCAP_IFINDEX
to XDP programs (where it can't be used), and in the future allows us to
redirect to an arbitrary interface (eg. if XDP already added the encap
headers manually).

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Callers of __encap_with_nodeid() currently assume that they can redirect
the packet afterwards. Make this explicit by returning CTX_ACT_REDIRECT,
and having the immediate callers handle it.

This later allows us to return CTX_ACT_OK from XDP context, and have the
caller pass the packet to the stack instead - like we do for IPSec today.

In the high-level paths, it's enough to teach the rev_nodeport_*() parts in
nodeport.h about CTX_ACT_OK. All the other callers are certainly skb-only,
and some already know how to handle CTX_ACT_OK as they enter through an
IPSec-enabled path.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Our XDP and TC programs use custom areas to store their metadata
(XDP uses a per-CPU scratchpad, TC uses __sk_buff->cb[]). So when XDP
passes a packet up to TC, we explicitly need to copy any relevant metadata
into xdp_md->data_meta.

Extract a helper for this copy operation, so that it can be easily called
from shared code.

Also extend ctx_get_xfer() so that the TC layer can access additional
xfer data.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
When XDP needs to tunnel-encap a packet, pass it up the stack and redirect
it to the tunnel interface from the netdev's TC ingress. In contrast to a
pure XDP solution, this lets us re-use all the existing tunnel
infrastructure in the kernel and host.

Achieve this by having ctx_set_encap_info() store the tunnel information
into our internal metadata, and transfer it over when going from XDP to TC.

For now we only use this for the case of EgressGW reply traffic - so the
change in tail_rev_nodeport_lb6() is just for commonality, we never hit the
ret != CTX_ACT_REDIRECT case there.

But it should be possible to extend this for the general case of running
XDP with TUNNEL_MODE. The relevant code paths just need to learn how to
handle CTX_ACT_OK and CTX_ACT_REDIRECT return values when calling encap
helpers.

Note that if all the xfer data becomes too much, we can probably also just
have a second tunnel look-up in cil_from_netdev() to get all that stuff.

Fixes: cilium#19717
Fixes: cilium#19624
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Add a combination with Nodeport-Acceleration, Direct-Routing and EgressGW.

Also move ENABLE_EGRESS_GATEWAY into the MAX_BASE_OPTIONS, so that
MAX_XDP_OPTIONS can pick it up.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Test the new EgressGW functionality in XDP, which flags EgressGW reply
packets that need to go via the tunnel so that TC-Ingress can redirect
them.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
@julianwiedmann julianwiedmann removed the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Sep 5, 2022
@julianwiedmann
Copy link
Copy Markdown
Member Author

julianwiedmann commented Sep 6, 2022

/test

Job 'Cilium-PR-K8s-1.16-kernel-4.9' failed:

Click to show.

Test Name

K8sConformance Portmap Chaining Check connectivity-check compliance with portmap chaining

Failure Output

FAIL: connectivity-check pods are not ready after timeout

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.16-kernel-4.9 so I can create one.

@julianwiedmann
Copy link
Copy Markdown
Member Author

[Test suite passed previously, and the rebase was trivial. But let's make sure of it ...]

@julianwiedmann
Copy link
Copy Markdown
Member Author

Caught the test-1.16-4.9 failure that's already been fixed in #21207. Otherwise all good.

@julianwiedmann julianwiedmann added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Sep 6, 2022
@joamaki joamaki merged commit 605e422 into cilium:master Sep 7, 2022
@julianwiedmann julianwiedmann deleted the xdp-tunnel-egressgw branch September 7, 2022 08:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. feature/egress-gateway Impacts the egress IP gateway feature. kind/feature This introduces new functionality. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Egress gateway partially incompatible with LB XDP acceleration Add support to bpf_xdp for redirecting packets to a tunnel device

7 participants