datapath/iptables: Masquerade hairpin traffic that traversed the stack#10928
datapath/iptables: Masquerade hairpin traffic that traversed the stack#10928
Conversation
|
test-me-please |
|
test-me-please Edit: provisioning failure |
|
test-me-please |
|
test-me-please Hit #10118 |
|
test-me-please |
borkmann
left a comment
There was a problem hiding this comment.
Given we haven't seen this in non-chaining scenarios so far, should we gate this to avoid adding the rule for native Cilium setups?
qmonnet
left a comment
There was a problem hiding this comment.
Looks good to me, with minor nits.
This can happen in direct routing mode as well if |
6986f4b to
5483585
Compare
|
test-me-please IP frag test failed: |
joestringer
left a comment
There was a problem hiding this comment.
Change looks good, do we have a small repro we can integrate into the CI? This type of bug seems easy to break by just rearranging rules a bit.
Packets to a host IP are currently redirected via cilium_host/cilium_net. The reason for this is mostly historic. For other packets where routing by the kernel routing tables is desired, packets are already passed on via TC_ACT_OK to the stack directly. The two cases where this redirection is needed are: * For proxy redirection due to a kernel limitation on passing the routing tables multiple times. This case is left untouched. * For the HOST_REDIRECT_TO_INGRESS case, e.g. flannel integration. This case is left untouched. The IPv4 and IPv6 case is brought in line to not accidently lose this logic later on. A side effect of this is that the skb gets scrubbed including the skb->mark. The presence of the identity in the skb->mark is being relied on in a follow-up fix however. Therfore, pass packets via the stack using TC_ACT_OK. This is faster, simpler, and allows for the identity to be carried in the mark. Fixes: #9784 Signed-off-by: Thomas Graf <thomas@cilium.io>
The traffic is sent to the stack and hairpin'ed back into a local pod after a component on the stack has applied a DNAT rule, the traffic must be SNATed to ensure the reverse NAT can take place. This can happen if portmap or kiam is being used and redirection happens to a local destination. The masquerade filter must be limited as not all DNAT traffic may be affected. NodePort traffic from a non-local source must remain unmasqueraded in order for trafficPolicy=local to continue working. Also, when EnableEndpointRoutes is enabled, traffic always traverses the stack and must not be masqueraded either. Fixes: #9784 Signed-off-by: Thomas Graf <thomas@cilium.io>
Add checks for multi-node and intra-node connectivity via hostport. These tests will remain unready if hostport/portmap is not enabled. Signed-off-by: Thomas Graf <thomas@cilium.io>
Requires the following tests to pass: ``` NAME READY STATUS RESTARTS AGE echo-a-5f555bbc8b-9cxv9 1/1 Running 0 41s echo-b-659766fb56-zw2wl 1/1 Running 0 41s echo-b-host-65d7db76d8-5wmhm 1/1 Running 0 41s host-to-b-multi-node-clusterip-c7557d4f8-gv6ws 1/1 Running 0 41s host-to-b-multi-node-headless-5dfcdf9b76-9hcqn 1/1 Running 0 41s pod-to-a-6cf58894b7-mqg67 1/1 Running 0 40s pod-to-a-allowed-cnp-5898f7d8c9-bdfxz 1/1 Running 0 41s pod-to-a-external-1111-5779fb7cb9-tgdlh 1/1 Running 0 39s pod-to-a-l3-denied-cnp-74b9566cc7-zjhhh 1/1 Running 0 41s pod-to-b-intra-node-77b485d996-xfv45 1/1 Running 0 40s pod-to-b-intra-node-hostport-6c55bf8459-vddt2 1/1 Running 0 40s pod-to-b-multi-node-clusterip-75f5c78f68-2lk8x 1/1 Running 0 40s pod-to-b-multi-node-headless-5df88f9bd4-f5jlt 1/1 Running 0 40s pod-to-b-multi-node-hostport-d7c8d659f-4xqpt 1/1 Running 0 39s pod-to-external-fqdn-allow-google-cnp-74466b4c6f-dxvlq 1/1 Running 0 39s ``` Signed-off-by: Thomas Graf <thomas@cilium.io>
5483585 to
02115cf
Compare
Good point. I've extended the connectivity-check YAML to include hostport and added it to the CI. This uncovered one more case that needed fixing which required an additional commit. |
|
test-me-please One test failed with: GKE and -with-kernel tests all passed |
| imagePullPolicy: IfNotPresent | ||
| ports: | ||
| - containerPort: 80 | ||
| hostPort: 40000 |
There was a problem hiding this comment.
Also added one here for services (ebd7392) but only for the BPF case so far. I wonder if we somewhat could consolidate both, just a thought; not needed for this series in here.
|
Tests were interrupted, retrying: |
|
restart-ginkgo |
1 similar comment
|
restart-ginkgo |
This PR builds on #10926
The traffic is sent to the stack and hairpin'ed back into a local pod
after a component on the stack has applied a DNAT rule, the traffic must
be SNATed to ensure the reverse NAT can take place. This can happen if
portmap or kiam is being used and redirection happens to a local
destination.
The masquerade filter must be limited as not all DNAT traffic may be
affected. NodePort traffic from a non-local source must remain
unmasqueraded in order for trafficPolicy=local to continue working.
Also, when EnableEndpointRoutes is enabled, traffic always traverses the
stack and must not be masqueraded either.
Fixes: #9784
Requires: #10926