bpf: nat: skip snat when using iptables masquerade#36631
bpf: nat: skip snat when using iptables masquerade#36631gyutaeb wants to merge 1 commit intocilium:mainfrom
Conversation
3d57c2b to
dde7574
Compare
julianwiedmann
left a comment
There was a problem hiding this comment.
It is no longer correct to check the NAT_MIN_EGRESS condition. This is because target->from_local_endpoint is now being checked. In the case of !target->from_local_endpoint, source NAT is not required,
I don't follow. Can you please elaborate much more on the suggested change. Why is source NAT not required in the case you're describing? Also please add this in-depth explanation to the patch description, not just the PR.
dde7574 to
f75e97c
Compare
|
@julianwiedmann @gentoo-root Thank you for review! When using iptables masquerade instead of bpf masquerade, SNAT is not required. I plan to enhance the |
f75e97c to
2649ebb
Compare
2649ebb to
3ce2466
Compare
|
@julianwiedmann @gentoo-root @ti-mo Thank you for your patience. The previous changes had some ambiguities. To clarify the bug and the fix, I added Lines 2416 to 2426 in 5a4fdc9 |
When using iptables masquerade, the source address is already nat-ed by iptables. So, we should skip snat when using iptables masquerade. If we don't skip snat, the source address is nat-ed twice. It also unnecessarily occupies entry in the SNAT_MAPPING map. And if it exceeds SNAT_COLLISION_RETRIES, it leads to connection failures. Fixes: cilium#36572 Signed-off-by: Gyutae Bae <gyu.8ae@gmail.com>
3ce2466 to
2a6d867
Compare
|
*Update Branch(Rebase) |
There was a problem hiding this comment.
I don't see how this can be correct for the case of using BPF Nodeport without BPF masquerade. Please elaborate a lot more.
We're intentionally tracking connections in the SNAT engine for this case, and applying SNAT as needed to avoid conflicts between LB-to-remote-backend connections and host-originating / iptables-masqueraded connections.
@julianwiedmann Thank you for your detailed explanation. I agree with your opinion, After reviewing the case you explained, I realized that skipping SNAT in iptables-masquerade cannot prevent the conflict with LB-to-remote-backend. I hadn't considered that scenario, so thank you for pointing it out. This PR should not be merged. To elaborate, during my attempts to reproduce this issue, I observed a rare case where the SNAT engine performed SNAT on an egress packet, but the reverse NAT for the response packet failed. This occurred when the |
Thank you for the feedback! Looking forward to what you come up with :). I'll flip this PR to |
@julianwiedmann Hello, I'd like to share the root cause of the issue that caused REV NAT to fail, as mentioned earlier. This issue was mitigated by #35304, which introduced the idea of recreating evicted reverse entries during NAT, using the otuple. I think this is a fantastic approach! |
Sorry, I don't think I'll have the cycles to co-design a solution for the problem you encountered. But maybe you can connect with @sugangli and discuss this in more detail - hack up a minimal reproducer that demonstrates the problem etc? |
|
Hi @gyutaeb, IIUC, retrieving the original entry from the rev nat path could be a bit tricky. Could you give an example on how it could be done? Feel free to tag me in the new PR once you are ready. |
|
Hi @julianwiedmann @sugangli. First of all, recreating the reverse entry in The issue occurs as follows:
This issue can still occur, and recreating the original entry in Additionally, it would be helpful to provide a tool that allows users to determine whether their bpf-nat-global-max value is appropriate. Normally, the max value is not reached, but in cases where keepalive is not used, it can reach the limit. Moreover, when LRU eviction occurs, it can take time for users to determine whether it was caused by LRU. Therefore, it would be beneficial to develop a tool like the one proposed in kubectl-cilium. There are two possible approaches: adding this functionality to the cilium CLI within cilium-agent, or, integrating it into kubectl-cilium. |
|
This pull request has been automatically marked as stale because it |
|
This pull request has been automatically marked as stale because it |
|
This pull request has not seen any activity since it was marked stale. |
Please ensure your pull request adheres to the following guidelines:
description and a
Fixes: #XXXline if the commit addresses a particularGitHub issue.
Fixes: <commit-id>tag, thenplease add the commit author[s] as reviewer[s] to this issue.
When using iptables masquerade, the source address is already nat-ed by iptables. So, we should skip snat when using iptables masquerade. If we don't skip snat, the source address is nat-ed twice. It also unnecessarily occupies entry in the SNAT_MAPPING map. And if it exceeds SNAT_COLLISION_RETRIES, it leads to connection failures.
Fixes: #36572