Add WireGuard host2host and LB encryption #19401
Merged
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Martynas Pumputis <m@lambda.lt>
This commit completely changes the WireGuard integration in the
datapath to enable host2host encryption (as well as pod2host and
host2pod).
Previously, we supported only the pod2pod case. This was implemented by
marking each packet that needed encryption, and then letting an IP rule
installed in the host netns route the packet to the WireGuard tunnel
device "cilium_wg0" for encryption, as shown below:
┌─────────────────┐
│   Pod A netns   │
│   ┌────────┐    │
│   │  eth0  │    │
└───┴────┬───┴────┘
    ┌────┴──────────┐
    │ bpf_lxc@veth0 │ (host netns)
    └────┬──────────┘
         │1. "from-container" in bpf_lxc sets MARK_MAGIC_ENCRYPT
         │2. ip rule matches the mark and routes packet to WG netdev
         │       ┌───────────┐
         └──────►│cilium_wg0 │
                 └────┬──────┘
                      │
                  ┌───▼───┐
                  │ eth0  │
                  └───────┘
This worked fine for the pod2pod case (with the one danger that a
sysadmin could remove the rule, causing packets to bypass the WG dev).
However, this approach could not support the host2host case, because a
packet originating from the host netns is never handled by bpf_lxc.
Thus, we needed to change the datapath.
To encrypt a host2host packet, we need to attach bpf_host to the outgoing
device connecting the cluster nodes, which in the picture is "eth0". Then
the "to-netdev" program from bpf_host can forward the packet to the WG dev.
Once encrypted, the packet hits the same bpf_host program again. To
prevent the packet from looping forever, we configure the WG netdev to
set the skb mark after encryption. The program then skips the
redirection to the WG netdev if the mark is set.
The flow below shows the new integration.
┌─────────────────┐
│   Pod A netns   │
│   ┌────────┐    │
│   │  eth0  │    │
└───┴────┬───┴────┘
    ┌────┴──────────┐
    │ bpf_lxc@veth0 │ (host netns)
    └────┬──────────┘
         │
    ┌────▼───────────┐ 1. "to-netdev" does redirect     ┌───────────┐
    │ bpf_host@eth0  │─────────────────────────────────►│cilium_wg0 │
    └─┬─────────▲────┘                                  └──────┬────┘
      │         │                                              │
      │         │ 2. encrypt and set MARK_MAGIC_ENCRYPTED      │
      │         └──────────────────────────────────────────────┘
      │
      │ 3. output the encrypted packet
      │
      ▼
The same flow is used for the host2host, host2pod and pod2host cases.
Another advantage of this change is that WireGuard can be used together
with the L7 proxy (the mutual exclusion check is removed in a subsequent
commit).
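The loop-avoidance logic described above can be sketched as a small simulation. This is only an illustrative model, not the datapath code: the `Packet` class, the helper names and the numeric mark value are assumptions made for this example; only the name MARK_MAGIC_ENCRYPTED comes from the diagram.

```python
from dataclasses import dataclass

# Hypothetical value for illustration; the real constant is defined in the datapath.
MARK_MAGIC_ENCRYPTED = 0x1E00


@dataclass
class Packet:
    dst_needs_encryption: bool
    mark: int = 0
    encrypted: bool = False


def to_netdev(pkt: Packet) -> str:
    """Model of the bpf_host egress decision on eth0."""
    if pkt.mark == MARK_MAGIC_ENCRYPTED:
        # Already went through the WG netdev: do not redirect again.
        return "wire"
    if pkt.dst_needs_encryption:
        return "cilium_wg0"
    return "wire"


def wg_xmit(pkt: Packet) -> None:
    """Model of the WG netdev: encrypt, then set the skb mark."""
    pkt.encrypted = True
    pkt.mark = MARK_MAGIC_ENCRYPTED


def send(pkt: Packet) -> list:
    """Follow the packet until it leaves on the wire; return the hops taken."""
    hops = []
    while True:
        nxt = to_netdev(pkt)
        hops.append(nxt)
        if nxt == "wire":
            return hops
        wg_xmit(pkt)  # the encrypted packet hits to_netdev again
```

Without the mark check in `to_netdev`, the `send` loop in this model would never terminate for a packet that needs encryption, which mirrors the forwarding loop the commit message describes.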
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
This commit changes the agent code to support the new WireGuard
integration described in the previous commit. The most important changes:
1. Configure the WG netdev to add the skb mark.
2. Add NodeIP to allowed-ips when --encrypt-node=true.
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
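The allowed-ips change can be illustrated with a short sketch. The function name, its parameters, and the assumption that a peer's pod prefixes are already part of allowed-ips are hypothetical and only serve the example; the agent's actual code is not shown in this PR description.

```python
import ipaddress


def allowed_ips(pod_prefixes, node_ips, encrypt_node):
    """Build the allowed-ips list for one WireGuard peer.

    pod_prefixes: prefixes routed to pods on the peer (assumed input).
    node_ips:     the peer's node IPs, added only when node encryption is on.
    """
    nets = [ipaddress.ip_network(p) for p in pod_prefixes]
    if encrypt_node:
        # A bare address becomes a host prefix (/32 for IPv4, /128 for IPv6).
        nets += [ipaddress.ip_network(ip) for ip in node_ips]
    return [str(n) for n in nets]
```

With `encrypt_node=False` the node IPs stay out of the list, so in this model traffic addressed to the node itself would not be accepted through the tunnel.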
We need to detect a direct routing dev (= one which is used to connect
K8s Nodes) in order to attach bpf_host when WG is enabled, as bpf_host is
responsible for redirecting packets to the WG netdev for encryption.
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
There is no longer an skb mark conflict with the L7 proxy, so we can drop
the check. This means that the L7 proxy can work together with the WG
integration.
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Before the change to the WG integration's behavior, when running in
tunneling mode, pod2pod traffic to a remote node bypassed bpf_overlay's
tunneling and was encapsulated only once, by the WG tunnel netdev. To stay
compatible with this pre-v1.13 behavior, this commit adds the redirect to
the WG tunnel to the __encap_and_redirect_with_nodeid() function, which is
eventually called in the pod2pod packet path.
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
This commit attaches bpf_host's "from-netdev" section to Cilium's
WireGuard tunnel netdev ("cilium_wg0").
This is needed to enable the encryption of the KPR traffic. In
particular, we encrypt the N/S KPR requests which will be forwarded to
a remote node running a selected service endpoint.
IMPORTANT: this encrypts KPR traffic only when running in the
non-tunneling mode.
For the request path no changes are required. The existing datapath
configuration already handles it, as shown in the following:
1. The "from-netdev" attached to eth0 is invoked for the NodePort
request.
2. A remote service endpoint is selected, the DNAT and SNAT translations
are performed.
3. The translated request is redirected to eth0.
4. The "to-netdev" section on eth0 is invoked. It detects that the
packet needs to be encrypted, so it redirects it to cilium_wg0.
For the reply path, only minimal changes were required. After the WG netdev
has decrypted the reply packet, the packet is returned to the networking
stack. Because the networking stack is not aware of the connection, the
reply packet is dropped. To avoid that, we attach the "from-netdev"
section to the WG netdev, so that the following can be performed:
1. Reverse SNAT and DNAT translations are applied to the reply.
2. The reply packet is redirected to the outgoing interface.
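The request and reply steps listed above can be modeled with a toy NAT table. This is a deliberately simplified sketch: the `Flow` and `NatTable` names, the addresses and the ports are all made up for illustration and do not correspond to Cilium's actual NAT or conntrack maps.

```python
from typing import NamedTuple, Optional


class Flow(NamedTuple):
    src: str
    sport: int
    dst: str
    dport: int


class NatTable:
    """Remembers how requests were translated so replies can be reversed."""

    def __init__(self):
        self._reverse = {}

    def translate_request(self, req: Flow, backend, node_ip: str, snat_port: int) -> Flow:
        # DNAT: service address -> selected remote backend.
        # SNAT: client address  -> this node's address.
        out = Flow(node_ip, snat_port, backend[0], backend[1])
        # Key the reverse entry on the tuple the reply will carry.
        reply_key = Flow(out.dst, out.dport, out.src, out.sport)
        self._reverse[reply_key] = Flow(req.dst, req.dport, req.src, req.sport)
        return out

    def reverse_reply(self, reply: Flow) -> Optional[Flow]:
        # None models a reply that nothing in the path knows how to handle.
        return self._reverse.get(reply)
```

In this model, the role of "from-netdev" on the WG netdev is to call something like `reverse_reply` on the decrypted packet before the networking stack sees it; a lookup that returns `None` corresponds to the dropped reply described above.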
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Currently, the bpf_overlay prog doesn't redirect a packet to the WG netdev
for encryption (this will be addressed in a follow-up PR). So, in order for
the tests to pass, we need to enable host2host encryption.
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
This commit introduces a new command-line option to specify a label
selector that lets nodes opt out of node-to-node encryption.
The default label selector matches kubeadm control-plane nodes (i.e.
the nodes hosting kube-apiserver). This ensures that all Cilium-managed
nodes are able to reach the kube-apiserver running on such a node
regardless of encryption status. This is important, because we want to
ensure that nodes can change their public keys when they re-join the
cluster.
Nodes that opt out of node-to-node encryption still perform
encryption for pod-to-pod traffic.
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
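The opt-out rule can be sketched as follows. This is a simplified model under stated assumptions: the selector is reduced to a list of label keys that must all be present, the label key shown is the one kubeadm sets on control-plane nodes, and the function names are hypothetical rather than taken from the agent.

```python
# Assumption: kubeadm labels control-plane nodes with this key (empty value).
CONTROL_PLANE_LABEL = "node-role.kubernetes.io/control-plane"


def opts_out(node_labels, selector_keys=(CONTROL_PLANE_LABEL,)):
    """True if the node's labels satisfy the (simplified) opt-out selector.

    Note: like a Kubernetes label selector, an empty selector matches every node.
    """
    return all(key in node_labels for key in selector_keys)


def should_encrypt(traffic, node_labels, selector_keys=(CONTROL_PLANE_LABEL,)):
    """traffic is 'pod-to-pod' or 'node-to-node' (names chosen for this sketch)."""
    if traffic == "pod-to-pod":
        # Opted-out nodes still encrypt pod-to-pod traffic.
        return True
    return not opts_out(node_labels, selector_keys)
```

A full label selector also supports value matches and set-based expressions; those are left out here to keep the sketch short.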
This adds a new section about node-to-node encryption and removes some
obsolete limitations.
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
It was hidden because it's currently not supported by IPsec. But with the
previous commits, we do now support node-to-node encryption via WireGuard.
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
The encryption tests were introduced in cilium/cilium-cli#1308.
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Rebased on master to resolve merge conflict. Re-running CI.
pchaigno
approved these changes
Jan 23, 2023
Got reviews from the majority of folks. Marking as ready-to-merge.
This PR adds support for node-to-node encryption to WireGuard. To achieve this, we've completely changed the WireGuard integration in the datapath. Previously, WireGuard support was implemented by marking packets to be encrypted in "from-container" and redirecting them to the WireGuard tunnel via a hostns IP rule. This worked fine for traffic originating in pods, but for node-to-node traffic we need to redirect the packets on the outgoing network interface. Thus, the new implementation attaches bpf_host to the outgoing device and redirects packets to the WireGuard tunnel from there. See commit descriptions for more details.
On the agent side, there are also changes to the implementation. Previously, the datapath assumed that any IPCache entry with an associated tunnel endpoint would need encryption. To determine if we need to encrypt traffic to a remote endpoint, we now rely on the encrypt_key field instead. This allows us to more precisely track if traffic to a particular destination needs to be encrypted, and allows certain nodes to opt out of encryption (see below). The agent code has been updated to populate the CiliumEndpoint and CiliumNode CRDs with a static non-zero EncryptKey value if encryption for those resources is enabled.
Additional points worth noting:
ensure worker nodes are always able to communicate with the kube-apiserver node, in order to be able to manage their own encryption key. See docs for more details.
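The switch from the tunnel-endpoint heuristic to the encrypt_key field can be illustrated with a minimal model. The `IPCacheEntry` class and both function names are assumptions made for this sketch; only the tunnel endpoint and encrypt_key fields are mentioned in the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IPCacheEntry:
    tunnel_endpoint: Optional[str]  # node hosting the destination, if any
    encrypt_key: int                # 0 means "do not encrypt"


def needs_encryption_old(entry: IPCacheEntry) -> bool:
    """Previous assumption: any entry with a tunnel endpoint is encrypted."""
    return entry.tunnel_endpoint is not None


def needs_encryption_new(entry: IPCacheEntry) -> bool:
    """New rule: rely on a non-zero encrypt_key instead."""
    return entry.encrypt_key != 0
```

For example, an entry for a node that opted out of encryption could carry a tunnel endpoint but an encrypt_key of 0; the old check would encrypt traffic to it, while the new one would not.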
✔️ https://github.com/cilium/cilium/actions/runs/3884806887/jobs/6627856819
ℹ️ Please see commit messages for many more details
Joint work between @gandro and @brb
Follow-ups, to be done in separate PRs: