nodeinit: Disable default ip-masq-agent jumps #11782

Merged: nebril merged 3 commits into cilium:master from planetscale:disable-masq-agent-jump on Aug 11, 2020

Conversation

@dctrwatson
Contributor

@dctrwatson dctrwatson commented May 29, 2020

Noticed some surprising behavior where disabling Cilium masquerade does not actually disable masquerading in GKE.

In GKE, even if the IP masquerade agent is not enabled, the instance configure script adds some default masquerade rules.

This PR removes the jump from POSTROUTING to the IP-MASQ chain.
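
For context, a minimal sketch of the kind of rule involved; the exact match and comment text in GKE's configure script (and in this PR's nodeinit change) may differ:

# Delete the default jump that GKE's instance configure script installs, so
# Cilium's masquerade setting is the only one in effect. "-w" waits for the
# xtables lock; "|| true" tolerates the rule already being absent.
iptables -w -t nat -D POSTROUTING \
  -m comment --comment "ip-masq: ensure nat POSTROUTING directs all non-LOCAL destination traffic to our custom IP-MASQ chain" \
  -m addrtype ! --dst-type LOCAL \
  -j IP-MASQ || true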

Signed-off-by: John Watson <johnw@planetscale.com>
@dctrwatson dctrwatson requested a review from a team as a code owner May 29, 2020 19:35
@dctrwatson dctrwatson requested a review from a team May 29, 2020 19:35
@maintainer-s-little-helper

Please set the appropriate release note label.

@coveralls

coveralls commented May 29, 2020

Coverage Status

Coverage remained the same at 36.919% when pulling 9a06f30 on planetscale:disable-masq-agent-jump into 38404a8 on cilium:master.

Signed-off-by: John Watson <johnw@planetscale.com>
@dctrwatson
Contributor Author

Never mind about AKS: I see AKS itself has added Cilium as a supported network plugin provider, which fixes this going forward. Just updating the AKS docs can fix it now.

@nebril
Member

nebril commented Jun 2, 2020

test-gke

@aanm aanm added the kind/bug, needs-backport/1.8, and release-note/bug labels Jun 3, 2020
Member

@aanm left a comment

@tgraf can you take a look at this PR? Is it related to #11819?

@aanm aanm requested a review from tgraf June 3, 2020 16:23
@stale

stale Bot commented Jul 3, 2020

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale Bot added the stale label Jul 3, 2020
{{- end }}

{{- if .Values.global.gke.enabled }}
# If the IP-MASQ chain exists, add back jump rule

Contributor

This needs a more descriptive comment on why this is needed instead of what the code does. Why is this needed? Why does Cilium need to manipulate k8s iptables rules?

Contributor Author

@tgraf Added more info to the comment in the install script portion.

@stale stale Bot removed the stale label Jul 6, 2020
@dctrwatson dctrwatson requested a review from a team as a code owner July 7, 2020 01:57
@maintainer-s-little-helper

Commit 80b4ac05cf42d85ad575a1ad1e8086c2ac4eaea9 does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper Bot added the dont-merge/needs-sign-off label Jul 7, 2020
Signed-off-by: John Watson <johnw@planetscale.com>
@dctrwatson dctrwatson force-pushed the disable-masq-agent-jump branch from 80b4ac0 to 9a06f30 Compare July 7, 2020 01:57

@qmonnet qmonnet removed the dont-merge/needs-sign-off label Jul 14, 2020
@qmonnet
Member

qmonnet commented Jul 14, 2020

test-gke

@qmonnet qmonnet requested a review from tgraf July 14, 2020 10:39
@nathanjsweet
Member

retest-netnext

@dctrwatson
Contributor Author

Can you point me to the links for these failure logs?

https://jenkins.cilium.io/job/Cilium-PR-K8s-GKE/2061/consoleFull is where I pulled the excerpt from.

@fristonio
Member

Sorry, but I think this has been fixed as a hotfix for GKE builds in #12725.
@nebril is currently working on some patches that will fix the remaining issues with GKE builds. We can retest once those are merged.

@fristonio
Member

retest-gke

@fristonio
Member

retest-gke
The cluster was not provisioned in the previous run: https://jenkins.cilium.io/job/Cilium-PR-K8s-GKE/2131/

@dctrwatson
Contributor Author

Looks like it's just this over and over:

07:41:17  Error from server (AlreadyExists): namespaces "cilium-ci-lock" already exists
07:41:18  
07:41:18  #15 [stage-3 2/9] COPY --from=builder /tmp/install /
07:41:18  #15 DONE 0.6s
07:41:18  Error from server (AlreadyExists): error when creating "/home/jenkins/workspace/Cilium-PR-K8s-GKE@2/src/github.com/cilium/cilium/test/gke/lock.yaml": deployments.apps "lock" already exists
07:41:18  error: --overwrite is false but found the following declared annotation(s): 'lock' already has a value (1)
07:41:18  1
[...]
07:41:23  Error from server (AlreadyExists): error when creating "/home/jenkins/workspace/Cilium-PR-K8s-GKE@2/src/github.com/cilium/cilium/test/gke/lock.yaml": deployments.apps "lock" already exists

@joestringer
Member

joestringer commented Aug 6, 2020

@dctrwatson have you been able to manually test this in your own GKE environment?

The GKE CI job is not required, but this PR is specifically about GKE, so the main thing we're looking for is reassurance that it works. We're working through some CI infrastructure provisioning issues with the GKE-based CI, but as long as you've done due diligence, this seems reasonable to merge.

EDIT: I should maybe add that seeing connectivity-check output that's successful with this patch would be a useful datapoint as a baseline.

@dctrwatson
Contributor Author

@dctrwatson have you been able to manually test this in your own GKE environment?

Yes, I've been using this with Cilium 1.7.x since a couple of days before submitting this PR.

EDIT: I should maybe add that seeing connectivity-check output that's successful with this patch would be a useful datapoint as a baseline.

GKE Node Version: v1.16.13-gke.1
Kernel Version: 4.19.112+
Cilium Version: 1.7.7


NAME                                                     READY   STATUS    RESTARTS   AGE
echo-a-58dd59998d-rk8w4                                  1/1     Running   0          72s
echo-b-865969889d-nw4qb                                  1/1     Running   0          71s
echo-b-host-659c674bb6-9qg6m                             1/1     Running   0          72s
host-to-b-multi-node-clusterip-6fb94d9df6-2nmw9          1/1     Running   0          71s
host-to-b-multi-node-headless-7c4ff79cd-k5ccz            1/1     Running   0          71s
pod-to-a-5c8dcf69f7-692j4                                1/1     Running   0          70s
pod-to-a-allowed-cnp-75684d58cc-kntqj                    1/1     Running   0          71s
pod-to-a-external-1111-669ccfb85f-m55hf                  1/1     Running   0          71s
pod-to-a-l3-denied-cnp-7b8bfcb66c-flfcr                  1/1     Running   0          70s
pod-to-b-intra-node-74997967f8-dlrlx                     1/1     Running   0          70s
pod-to-b-intra-node-nodeport-5fd7b6df5-kw566             1/1     Running   0          70s
pod-to-b-multi-node-clusterip-587678cbc4-5cq7q           1/1     Running   0          70s
pod-to-b-multi-node-headless-574d9f5894-t5cg8            1/1     Running   0          69s
pod-to-b-multi-node-nodeport-f544b6765-qhttw             1/1     Running   0          69s
pod-to-external-fqdn-allow-google-cnp-6dd57bc859-nz5jd   1/1     Running   0          69s

I used: https://raw.githubusercontent.com/cilium/cilium/v1.7/examples/kubernetes/connectivity-check/connectivity-check.yaml

Though I had to fix the port for the 2 nodeport pods:

--- connectivity-check.yaml.orig	2020-08-07 18:47:32.146661105 -0700
+++ connectivity-check.yaml	2020-08-07 18:45:53.970029385 -0700
@@ -365,13 +365,13 @@
         - -c
         - sleep 1000000000
         image: docker.io/byrnedo/alpine-curl:0.1.8
-        imagePullPolicy: IfNotPresent
+        imagePullPolicy: IfNotPresent 
         livenessProbe:
           exec:
-            command: [ "curl", "-sS", "--fail", "-o", "/dev/null", "echo-b-host-headless:41000" ]
+            command: [ "curl", "-sS", "--fail", "-o", "/dev/null", "echo-b-host-headless:31313" ]
         readinessProbe:
           exec:
-            command: [ "curl", "-sS", "--fail", "-o", "/dev/null", "echo-b-host-headless:41000" ]
+            command: [ "curl", "-sS", "--fail", "-o", "/dev/null", "echo-b-host-headless:31313" ]
         name: pod-to-b-intra-node-hostport
 ---
 apiVersion: apps/v1
@@ -506,10 +506,10 @@
         imagePullPolicy: IfNotPresent
         livenessProbe:
           exec:
-            command: [ "curl", "-sS", "--fail", "-o", "/dev/null", "echo-b-host-headless:41000" ]
+            command: [ "curl", "-sS", "--fail", "-o", "/dev/null", "echo-b-host-headless:31313" ]
         readinessProbe:
           exec:
-            command: [ "curl", "-sS", "--fail", "-o", "/dev/null", "echo-b-host-headless:41000" ]
+            command: [ "curl", "-sS", "--fail", "-o", "/dev/null", "echo-b-host-headless:31313" ]
         name: pod-to-b-multi-node-nodeport

@joestringer
Member

Though I had to fix the port for the 2 nodeport pods:

Hmm, I don't actually see a nodeport declaration in that v1.7 version of the connectivity-check. Port 41000 actually seems to be a host application listening on that port, so it seems like we need to fix that up. If you compare with the v1.8 YAML, there's a nodeport configured on port 31313, so no changes to the connectivity check should be necessary. Instead, we should extend the YAML to include the nodeport like this:

---
apiVersion: v1
kind: Service
metadata:
  name: echo-b
spec:
  type: NodePort
  ports:
  - port: 80
    nodePort: 31313
  selector:
    name: echo-b
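
(Assuming the extended YAML is saved as connectivity-check.yaml, a quick sanity check of the NodePort could look like the following sketch; the node-address lookup is illustrative:)

# Apply the manifest, confirm the Service, then hit the NodePort directly.
kubectl apply -f connectivity-check.yaml
kubectl get svc echo-b
NODE_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
curl -sS --fail -o /dev/null "${NODE_IP}:31313"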

@nebril
Member

nebril commented Aug 10, 2020

test-gke

@dctrwatson
Contributor Author

I just tried with the v1.8 connectivity-check.yaml and it worked out of the box!

NAME                                                     READY   STATUS    RESTARTS   AGE
echo-a-58dd59998d-l7svd                                  1/1     Running   0          35s
echo-b-865969889d-hlxf4                                  1/1     Running   0          34s
echo-b-host-659c674bb6-mrm2z                             1/1     Running   0          35s
host-to-b-multi-node-clusterip-6fb94d9df6-hd5hv          1/1     Running   0          34s
host-to-b-multi-node-headless-7c4ff79cd-vjklg            1/1     Running   0          34s
pod-to-a-5c8dcf69f7-qhhhr                                1/1     Running   0          33s
pod-to-a-allowed-cnp-75684d58cc-zvd2k                    1/1     Running   0          34s
pod-to-a-external-1111-669ccfb85f-c9vh6                  1/1     Running   0          34s
pod-to-a-l3-denied-cnp-7b8bfcb66c-27fkr                  1/1     Running   0          33s
pod-to-b-intra-node-74997967f8-fkkkv                     1/1     Running   0          33s
pod-to-b-intra-node-nodeport-775f967f47-5ss46            1/1     Running   0          33s
pod-to-b-multi-node-clusterip-587678cbc4-hzlpc           1/1     Running   0          32s
pod-to-b-multi-node-headless-574d9f5894-86rnc            1/1     Running   0          32s
pod-to-b-multi-node-nodeport-7944d9f9fc-bbj7m            1/1     Running   0          32s
pod-to-external-fqdn-allow-google-cnp-6dd57bc859-nts9v   1/1     Running   0          32s

@nebril
Member

nebril commented Aug 11, 2020

test-gke

@nebril
Member

nebril commented Aug 11, 2020

The GKE failure seems to be a known flake; let's merge.

@nebril nebril merged commit 487313d into cilium:master Aug 11, 2020
@joestringer joestringer added the upgrade-impact label Aug 19, 2020
@joestringer
Member

joestringer commented Aug 19, 2020

Discussed during SIG-datapath meeting:

This fixes BPF masq (including ip-masq-agent) in v1.8. However, the way it applies right now, if BPF masquerade is disabled, it could break the ip-masq-agent configuration in GKE.

The default Cilium configuration should be to enable BPF masquerade (on new enough kernels), in which case we want the logic introduced here (delete the rule). The concern we have is that when BPF masquerade is disabled, this could cause unintended SNAT. The check switches on the gke.enabled helm flag right now, but it seems like there should be a check for config.bpfMasquerade as well.

@dctrwatson
Contributor Author

dctrwatson commented Aug 19, 2020

Oh, interesting.

This was originally against v1.7, so config.bpfMasquerade didn't exist yet.

The intention is that once Cilium is in use, ip-masq-agent is no longer used, making global.masquerade actually accurate.

I see 3 configurations:

  1. config.bpfMasquerade=false + global.masquerade=true, use cilium iptables to masquerade
  2. config.bpfMasquerade=true + global.masquerade=true, use cilium bpf to masquerade
  3. config.bpfMasquerade=false + global.masquerade=false, no masquerading whatsoever

So I think another flag would have to be added to explicitly opt back into the old behavior of using ip-masq-agent?

@Weil0ng
Contributor

Weil0ng commented Aug 19, 2020

FWIW, on GKE option 3 means we should just let GKE's ip-masq-agent do its thing, so the current behavior might be an issue in this case?

@joestringer
Member

joestringer commented Aug 19, 2020

@dctrwatson right now the check is just for whether GKE is enabled:

{{- if .Values.global.gke.enabled }}

Perhaps a check for global.masquerade == false would be the right thing to add here to address case (3)?
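
For illustration only, a sketch of such a guard; this is not merged code, it just combines the flags discussed in this thread:

{{- /* Remove the default ip-masq-agent jump only when Cilium itself is
       masquerading; otherwise leave GKE's ip-masq-agent rules alone. */}}
{{- if and .Values.global.gke.enabled .Values.global.masquerade }}
# ... existing nodeinit logic that deletes the IP-MASQ jump ...
{{- end }}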

@dctrwatson
Contributor Author

dctrwatson commented Aug 20, 2020

FWIW, on GKE option 3 means we should just let GKE's ip-masq-agent do its thing, so the current behavior might be an issue in this case?

@Weil0ng that's the behavior that I found to be surprising and why I made this PR. My expectation was that Cilium manages all aspects of networking once installed, overriding kube-proxy, ip-masq-agent, etc. (on new enough kernels).

I'm not opposed to the old behavior of falling back to ip-masq-agent if the docs are updated to reflect that, in GKE, there are default masquerade rules that will always apply. So actually disabling masquerade would require three flags (sketched after this list):

  • global.masquerade=false
  • global.gke.enabled=true
  • global.gke.bypassIpMasqAgent=true (this is the new flag, name subject to change)
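
A hypothetical helm values snippet for that combination (again, bypassIpMasqAgent is only a proposal in this thread, name subject to change):

global:
  masquerade: false
  gke:
    enabled: true
    bypassIpMasqAgent: true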

Perhaps a check for global.masquerade == false would be the right thing to add here to address case (3)?

@joestringer that would satisfy my intention for case (3). But as @Weil0ng points out, maybe not everyone wants to bypass ip-masq-agent rules?

@David0922

David0922 commented Aug 21, 2020

Bypassing/replacing ip-masq-agent or not is more of a strategic decision. There are users who have custom maps in ip-masq-agent, so it might be beneficial to preserve the old behavior (defaulting back to ip-masq-agent when config.bpfMasquerade=false + global.masquerade=false, or with a new flag?) unless specific flags are set.

@dctrwatson
Contributor Author

@Weil0ng @David0922 @joestringer I think #12952 handles the unexpected behavior this PR introduced.


Labels

kind/bug: This is a bug in the Cilium logic.
release-note/bug: This PR fixes an issue in a previous release of Cilium.
upgrade-impact: This PR has potential upgrade or downgrade impact.
