Skip to content

Fix GKE Helm options for CI and docs.#12087

Merged
joestringer merged 9 commits intomasterfrom
pr/jrajahalme/gke-fixes
Jun 18, 2020
Merged

Fix GKE Helm options for CI and docs.#12087
joestringer merged 9 commits intomasterfrom
pr/jrajahalme/gke-fixes

Conversation

@jrajahalme
Copy link
Copy Markdown
Member

@jrajahalme jrajahalme commented Jun 15, 2020

GKE install guide specifies the "ipam.config=kubernetes" option while
CI gkeHelmOverrides do not. Adding this option to CI gkeHelmOverrides
allowed Ginkgo runs in GKE to succeed.

Helm name for the native routing CIDR is
"global.nativeRoutingCIDR". Cilium agent command line option name is
"native-routing-cidr".

Fixes: #12053

@jrajahalme jrajahalme added area/documentation Impacts the documentation, including textual changes, sphinx, or other doc generation code. kind/bug/CI This is a bug in the testing code. release-note/misc This PR makes changes that have no direct user impact. needs-backport/1.8 labels Jun 15, 2020
@jrajahalme jrajahalme requested review from a team as code owners June 15, 2020 23:39
@jrajahalme
Copy link
Copy Markdown
Member Author

test-gke

@jrajahalme
Copy link
Copy Markdown
Member Author

test-me-please

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we need this particular change on v1.7 as well (but not the ipam option). Given you're on backports this week, probably easiest for you to make sure the right commit from this PR goes back.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it looks like we backported this change to the docs but never backported the helm side interpretation so it has been a no-op the entire time 🤦

@coveralls
Copy link
Copy Markdown

coveralls commented Jun 16, 2020

Coverage Status

Coverage decreased (-0.05%) to 37.1% when pulling af11191 on pr/jrajahalme/gke-fixes into a1cc34d on master.

@jrajahalme
Copy link
Copy Markdown
Member Author

test-gke

@jrajahalme
Copy link
Copy Markdown
Member Author

Pushed one more commit only touching the GKE jenkinsfile after all other CI jobs already succeeded.

@jrajahalme
Copy link
Copy Markdown
Member Author

test-gke

1 similar comment
@jrajahalme
Copy link
Copy Markdown
Member Author

test-gke

@b3a-dev b3a-dev mentioned this pull request Jun 16, 2020
20 tasks
@jrajahalme
Copy link
Copy Markdown
Member Author

@nebril added one more commit to pass Hubble-relay image with the proper tag to Ginkgo, please have a look!

@jrajahalme
Copy link
Copy Markdown
Member Author

test-gke

@jrajahalme
Copy link
Copy Markdown
Member Author

This needs #12076, waiting it to get merged before rebasing.

@jrajahalme jrajahalme force-pushed the pr/jrajahalme/gke-fixes branch from 92a1fc9 to f7fbf0a Compare June 16, 2020 18:35
@jrajahalme
Copy link
Copy Markdown
Member Author

rebased, retesting..

@jrajahalme
Copy link
Copy Markdown
Member Author

test-gke

@jrajahalme
Copy link
Copy Markdown
Member Author

@jrajahalme
Copy link
Copy Markdown
Member Author

test-gke

@jrajahalme
Copy link
Copy Markdown
Member Author

GKE pipeline failed but now with some more data: https://jenkins.cilium.io/job/Cilium-PR-K8s-GKE/1688/

@jrajahalme jrajahalme force-pushed the pr/jrajahalme/gke-fixes branch from f7fbf0a to 2b4e618 Compare June 17, 2020 07:01
@jrajahalme
Copy link
Copy Markdown
Member Author

test-gke

Helm name for the native routing CIDR is
"global.nativeRoutingCIDR". Cilium agent command line option name is
"native-routing-cidr".

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
GKE install guide specifies the "ipam.config=kubernetes" option while
CI gkeHelmOverrides do not. Adding this option to CI gkeHelmOverrides
allowed Ginkgo runs in GKE to succeed.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Without this override we'd get the following in the generated Cilium yaml:

$ diff -C3 good-cilium.yaml cilium-1619730510c46898.yaml
*** good-cilium.yaml	2020-06-17 14:07:44.000000000 -0700
--- cilium-1619730510c46898.yaml	2020-06-17 14:46:51.000000000 -0700
***************
*** 224,229 ****
--- 224,232 ----
    install-iptables-rules: "true"
    auto-direct-node-routes: "false"
    native-routing-cidr: 10.0.0.0/8
+   # List of devices used to attach bpf_host.o (implements BPF NodePort,
+   # host-firewall and BPF masquerading)
+   devices: "eth0 eth0\neth0"
    kube-proxy-replacement:  "probe"
    node-port-mode: "snat"
    node-port-bind-protection: "true"

Fixes: #11969
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Using the "fat" cilium-operator image fails with:

  State:          Waiting
    Reason:       CrashLoopBackOff
  Last State:     Terminated
    Reason:       ContainerCannotRun
    Message:      OCI runtime create failed: container_linux.go:345: starting container process caused "exec: \"cilium-operator-generic\": executable file not found in $PATH": unknown
    Exit Code:    127

Looks like the "cilium-operator" image tries to exec "cilium-operator-generic"?

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Pass the full image name to Ginkgo in the GKE Jenkinsfile, as
otherwise Ginkgo tries to use the default tag (latest), which is not
available for the test setup.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
…docs

Basic regular expressions in sed do not support '+', while the command
line arguments for extended regular expressions are different for GNU
sed and Mac OSX. Use (unquoted) '*' instead, as there is no harm
replacing '[]' with '[]'. This form work both on Linux and Mac OSX.

Add a note to the e2e testing docs so that users know to run the
script if namespaces get stuck at 'Terminating'.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
GKE can time out on apply when installing Cilium, use a longer timeout.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Create the cluster lock in namespace 'cilium-ci-lock' instead of
'default' so that it survives the CI deleting all resources in the
default namespace:

15:57:58  22:57:57 STEP: Deleting rs [lock-6f4fb6cdfb] in namespace default
15:57:58  22:57:57 STEP: Waiting for 1 deletes to return (lock-6f4fb6cdfb)
15:57:58  22:57:57 STEP: Deleting deployment [lock] in namespace default
15:57:58  22:57:57 STEP: Waiting for 1 deletes to return (lock)

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
@jrajahalme jrajahalme force-pushed the pr/jrajahalme/gke-fixes branch from 0214ce8 to af11191 Compare June 18, 2020 15:43
@jrajahalme
Copy link
Copy Markdown
Member Author

test-me-please

@jrajahalme
Copy link
Copy Markdown
Member Author

test-gke

@jrajahalme
Copy link
Copy Markdown
Member Author

rebased, no changes

@jrajahalme
Copy link
Copy Markdown
Member Author

test-gke

1 similar comment
@jrajahalme
Copy link
Copy Markdown
Member Author

test-gke

@jrajahalme
Copy link
Copy Markdown
Member Author

The only change here that can affect non-GKE tests is the doubling of the Apply timeout from 30 seconds to 60 seconds. Due to the cluster locking bug all GKE test runs can step on each other at any time until this PR is merged and all PRs are rebased to include the locking fix.

There has been many successful GKE CI runs, but AFAIK all of them have been on cluster cilium-ci-13. The only other cluster I've seen selected (cilium-ci-12) has always failed tests.

Given the above this PR is ready merge.

@maintainer-s-little-helper maintainer-s-little-helper Bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 18, 2020
@joestringer joestringer merged commit 16907c4 into master Jun 18, 2020
@joestringer joestringer deleted the pr/jrajahalme/gke-fixes branch June 18, 2020 21:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/documentation Impacts the documentation, including textual changes, sphinx, or other doc generation code. kind/bug/CI This is a bug in the testing code. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CI/GKE: cilium pre-flight checks failed

8 participants