Skip to content

datapath: Do not SNAT replies to outside#17168

Merged
kaworu merged 3 commits intomasterfrom
pr/brb/ct-snat-bpf
Sep 10, 2021
Merged

datapath: Do not SNAT replies to outside#17168
kaworu merged 3 commits intomasterfrom
pr/brb/ct-snat-bpf

Conversation

@brb
Copy link
Copy Markdown
Member

@brb brb commented Aug 16, 2021

See commit msgs.

Fix: #12544

@brb brb added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Aug 16, 2021
@brb brb force-pushed the pr/brb/ct-snat-bpf branch 2 times, most recently from 49b6f6b to 6107b87 Compare August 16, 2021 12:03
@brb
Copy link
Copy Markdown
Member Author

brb commented Aug 16, 2021

test-net-next

@brb brb force-pushed the pr/brb/ct-snat-bpf branch 3 times, most recently from 76ad256 to b9a4d2d Compare August 16, 2021 13:42
@brb brb added needs-backport/1.10 area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. labels Aug 16, 2021
@brb brb changed the title datapath: Do not SNAT replies datapath: Do not SNAT replies to outside Aug 16, 2021
@brb
Copy link
Copy Markdown
Member Author

brb commented Aug 16, 2021

test-me-please

@brb
Copy link
Copy Markdown
Member Author

brb commented Aug 16, 2021

test-1.19-5.4

@brb
Copy link
Copy Markdown
Member Author

brb commented Aug 16, 2021

test-1.20-4.19

1 similar comment
@brb
Copy link
Copy Markdown
Member Author

brb commented Aug 16, 2021

test-1.20-4.19

@brb
Copy link
Copy Markdown
Member Author

brb commented Aug 17, 2021

4.19 is hitting the complexity issue 👀

@brb
Copy link
Copy Markdown
Member Author

brb commented Aug 18, 2021

test-1.20-4.19

@brb
Copy link
Copy Markdown
Member Author

brb commented Aug 18, 2021

test-me-please

@brb brb force-pushed the pr/brb/ct-snat-bpf branch from 12656d7 to c6764f1 Compare August 23, 2021 17:20
@brb
Copy link
Copy Markdown
Member Author

brb commented Aug 23, 2021

test-me-please

@brb brb marked this pull request as ready for review August 23, 2021 17:20
@brb brb requested a review from a team August 23, 2021 17:20
@brb brb requested a review from a team as a code owner August 23, 2021 17:20
@brb brb requested review from a team, kkourt and nebril August 23, 2021 17:20
@brb brb requested a review from borkmann August 23, 2021 17:26
@pchaigno pchaigno added the dont-merge/bad-bot To prevent MLH from marking ready-to-merge. label Sep 6, 2021
@brb brb force-pushed the pr/brb/ct-snat-bpf branch from c6764f1 to b24f05b Compare September 7, 2021 07:46
@brb
Copy link
Copy Markdown
Member Author

brb commented Sep 7, 2021

test-me-please

@brb brb force-pushed the pr/brb/ct-snat-bpf branch from b24f05b to d567075 Compare September 7, 2021 08:11
brb added 3 commits September 7, 2021 10:14
Previously, the BPF-based masquerading (--enable-bpf-masquerade=true)
was wrongly masquerading replies from a pod to an outside when the
outside had initiated a connection. This was possible when e.g., the
outside had a route to the pod cidr.

To fix this, we introduce a lightweight CT lookup function
ct_is_reply4() which checks whether a given flow is a reply. The lookup
function is called in snat_v4_needed().

As a side note, I've tried to move the port extraction to a separate
function, but unfortunately it hits complexity issues on the 4.19
kernel in the "K8sDatapathConfig AutoDirectNodeRoutes Check direct
connectivity with per endpoint routes" suite:

    BPF program is too large. Processed 131073 insn
    libbpf: failed to load program 'handle_to_container'
    libbpf: failed to load object '624_next/bpf_lxc.o'

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Previously, they were failing due to our datapath masquerading replies
from pod to outside. As it got fixed in the previous commit, we can
enable BPF-based masquerading.

This will also gives us some coverage for the fix.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
@brb
Copy link
Copy Markdown
Member Author

brb commented Sep 7, 2021

test-me-please

Job 'Cilium-PR-K8s-1.16-net-next' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sDemosTest Tests Star Wars Demo

Failure Output

FAIL: DNS entry is not ready after timeout

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.16-net-next so I can create a new GitHub issue to track it.

@brb
Copy link
Copy Markdown
Member Author

brb commented Sep 8, 2021

test-net-next

@christarazi christarazi removed dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. dont-merge/bad-bot To prevent MLH from marking ready-to-merge. labels Sep 8, 2021
@christarazi
Copy link
Copy Markdown
Member

CI has passed and we have reviewer approval, marking ready to merge.

@christarazi christarazi added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Sep 8, 2021
@kaworu kaworu merged commit 4b92d2d into master Sep 10, 2021
@kaworu kaworu deleted the pr/brb/ct-snat-bpf branch September 10, 2021 15:17
brb added a commit that referenced this pull request Jul 11, 2024
Previously, we required socketLB to be enabled in order for BPF
masquerade to properly function. The reasoning was outlined in [1] and
[2].

As pointed by Julian Wiedmann, [3] resolved the following NAT reply
issue:

    On the remote node, the reply (dst=the client node IP) gets
    masqueraded by the BPF-masq feature, because we masquerade pod ->
    remote host IP in the tunnel mode (see comment in the
    "snat_v4_needed()" for the reason), and currently we don't consult
    the CT map to see whether a packet is reply.

Thus, we can remove the check.

[1]: #15437
[2]: 50e59c3
[3]: #17168

Signed-off-by: Martynas Pumputis <m@lambda.lt>
brb added a commit that referenced this pull request Jul 11, 2024
Previously, we required socketLB to be enabled in order for BPF
masquerade to properly function. The reasoning was outlined in [1] and
[2].

As pointed by Julian Wiedmann, [3] resolved the following NAT reply
issue:

    On the remote node, the reply (dst=the client node IP) gets
    masqueraded by the BPF-masq feature, because we masquerade pod ->
    remote host IP in the tunnel mode (see comment in the
    "snat_v4_needed()" for the reason), and currently we don't consult
    the CT map to see whether a packet is reply.

Thus, we can remove the check.

[1]: #15437
[2]: 50e59c3
[3]: #17168

Signed-off-by: Martynas Pumputis <m@lambda.lt>
github-merge-queue Bot pushed a commit that referenced this pull request Jul 15, 2024
Previously, we required socketLB to be enabled in order for BPF
masquerade to properly function. The reasoning was outlined in [1] and
[2].

As pointed by Julian Wiedmann, [3] resolved the following NAT reply
issue:

    On the remote node, the reply (dst=the client node IP) gets
    masqueraded by the BPF-masq feature, because we masquerade pod ->
    remote host IP in the tunnel mode (see comment in the
    "snat_v4_needed()" for the reason), and currently we don't consult
    the CT map to see whether a packet is reply.

Thus, we can remove the check.

[1]: #15437
[2]: 50e59c3
[3]: #17168

Signed-off-by: Martynas Pumputis <m@lambda.lt>
sayboras pushed a commit that referenced this pull request Jul 16, 2024
[ upstream commit 3de8537 ]

Previously, we required socketLB to be enabled in order for BPF
masquerade to properly function. The reasoning was outlined in [1] and
[2].

As pointed by Julian Wiedmann, [3] resolved the following NAT reply
issue:

    On the remote node, the reply (dst=the client node IP) gets
    masqueraded by the BPF-masq feature, because we masquerade pod ->
    remote host IP in the tunnel mode (see comment in the
    "snat_v4_needed()" for the reason), and currently we don't consult
    the CT map to see whether a packet is reply.

Thus, we can remove the check.

[1]: #15437
[2]: 50e59c3
[3]: #17168

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
github-merge-queue Bot pushed a commit that referenced this pull request Jul 16, 2024
[ upstream commit 3de8537 ]

Previously, we required socketLB to be enabled in order for BPF
masquerade to properly function. The reasoning was outlined in [1] and
[2].

As pointed by Julian Wiedmann, [3] resolved the following NAT reply
issue:

    On the remote node, the reply (dst=the client node IP) gets
    masqueraded by the BPF-masq feature, because we masquerade pod ->
    remote host IP in the tunnel mode (see comment in the
    "snat_v4_needed()" for the reason), and currently we don't consult
    the CT map to see whether a packet is reply.

Thus, we can remove the check.

[1]: #15437
[2]: 50e59c3
[3]: #17168

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BPF-masq unexpectedly SNATs replies to outside

8 participants