CI: bring back k8s3 and make net-next kubeproxy-free #9901

Merged
aanm merged 17 commits into master from pr/brb/third-host on Jan 27, 2020

@brb
Member

@brb brb commented Jan 20, 2020

This PR:

  • Re-introduces the third host (aka k8s3), which was previously disabled in "ginkgo.Jenkinsfile: set k8s nodes back to 2" (#9908), but this time it is leaner (1 vCPU, 2048MB RAM).
  • Introduces the NO_CILIUM_ON_NODE env var, which disables scheduling of cilium-agent and other pods on the node.
  • Disables kube-proxy in the k8s1.11 net-next CI job.
  • Adds a couple of test fixes which were discovered when running with the configuration above.

Reviewable per commit.

Run CI tests with kube-proxy being disabled

@brb brb added the wip label Jan 20, 2020
@maintainer-s-little-helper

Release note label not set, please set the appropriate release note.

@brb
Member Author

brb commented Jan 20, 2020

test-me-please

@coveralls

coveralls commented Jan 20, 2020

Coverage decreased (-0.01%) to 45.932% when pulling 646239b on pr/brb/third-host into e6df048 on master.

@brb
Member Author

brb commented Jan 20, 2020

test-me-please

@brb
Member Author

brb commented Jan 21, 2020

test-me-please

@brb brb force-pushed the pr/brb/third-host branch 3 times, most recently from c7899fc to c9a2608 Compare January 21, 2020 10:52
@brb
Member Author

brb commented Jan 21, 2020

test-me-please

@brb brb force-pushed the pr/brb/third-host branch from c9a2608 to 78dbb77 Compare January 21, 2020 10:56
@brb
Member Author

brb commented Jan 21, 2020

test-me-please

@brb brb force-pushed the pr/brb/third-host branch from 8959324 to 9f0185e Compare January 21, 2020 15:13
@brb brb changed the title from "WIP: CI: Fix various issues for k8s3" to "CI: bring back k8s3" Jan 21, 2020
@brb brb added the release-note/ci (This PR makes changes to the CI.) label Jan 21, 2020
@brb brb added area/CI (Continuous Integration testing issue or flake) and dont-merge/needs-release-note labels Jan 21, 2020
@brb
Member Author

brb commented Jan 21, 2020

test-me-please

@brb brb marked this pull request as ready for review January 21, 2020 15:18
Comment thread test/helpers/utils.go Outdated
Comment thread test/helpers/utils.go Outdated
@brb brb force-pushed the pr/brb/third-host branch from eb29fbb to f86a583 Compare January 27, 2020 10:08
@brb
Member Author

brb commented Jan 27, 2020

test-me-please

@brb brb requested a review from aanm January 27, 2020 10:09
@brb
Member Author

brb commented Jan 27, 2020

@aanm Thanks for reviewing. PTAL.

brb added 17 commits January 27, 2020 15:42
The commit 4ac28ce ("bpf: make bpf_sock REMOTE_NODE_ID aware")
introduced the following optimization in NodePort service translation:

    If a client which runs on any node managed by Cilium sends a request to
    a NodePort service via another node (i.e. dst IP != the client node
    IP), then the NodePort service lookup and translation will happen on
    the client-side instead of the destination node.

Unfortunately, the optimization complicates some test cases of
test/k8sT/Services.go suite, in which we expect that the lookup and
translation will happen on the destination node (we want to test
bpf_netdev.c instead of bpf_sock.c).

To avoid this, we prevent cilium-agent from being scheduled on the k8s3
node by setting an affinity.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
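
The optimization quoted in the commit message above can be pictured with a small Go sketch. It is illustrative only and is not the bpf_sock.c logic: the function name, its parameters, and the IP addresses are hypothetical, and the same-node case is simplified (client and destination node coincide there).

```go
package main

import "fmt"

// translationSite reports where the NodePort service lookup and translation
// would take place under the optimization described in the commit message.
// Hypothetical helper, for illustration only.
func translationSite(clientHasCilium bool, clientNodeIP, dstNodeIP string) string {
	// A client on a Cilium-managed node that targets another node gets
	// translated on the client side.
	if clientHasCilium && clientNodeIP != dstNodeIP {
		return "client"
	}
	// Otherwise the destination node has to do the lookup and translation.
	return "destination"
}

func main() {
	// Made-up addresses: a client on a node with cilium-agent.
	fmt.Println(translationSite(true, "192.0.2.11", "192.0.2.12")) // client
	// A client on a node without cilium-agent, such as k8s3 in this PR.
	fmt.Println(translationSite(false, "192.0.2.13", "192.0.2.12")) // destination
}
```

This is why keeping cilium-agent off k8s3 helps: requests sent from that node are always translated on the destination node, which is the path the Services suite wants to exercise.
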
To avoid scheduling any pods which would fail due to cilium-agent
not running on that host (missing CNI).

Signed-off-by: Martynas Pumputis <m@lambda.lt>
The env var prevents scheduling cilium-agent and any pods running in
non-host netns on that host.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
No cilium-agent or other pods are going to be scheduled on that VM
(except for the log-gatherer), so reduce memory by half (to 2048MB)
and set vCPU to 1.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
This commit prevents kube-proxy from being provisioned in the k8s1.11
net-next job.

Also, it introduces a way to control provisioning of kube-proxy via
the KUBEPROXY env var (set to "0" in the job).

Signed-off-by: Martynas Pumputis <m@lambda.lt>
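
A minimal Go sketch of how a KUBEPROXY toggle like the one above could be interpreted. The commit only states that the job sets the variable to "0"; treating an unset variable or any other value as "provision kube-proxy" is an assumption, and the helper name is hypothetical (the actual provisioning scripts are not shown in this PR page).

```go
package main

import (
	"fmt"
	"os"
)

// kubeProxyEnabled interprets the value of the KUBEPROXY env var.
// Assumption: only the exact value "0" disables provisioning; an unset
// variable or any other value keeps kube-proxy enabled.
func kubeProxyEnabled(value string) bool {
	return value != "0"
}

func main() {
	if kubeProxyEnabled(os.Getenv("KUBEPROXY")) {
		fmt.Println("provisioning kube-proxy")
	} else {
		fmt.Println("skipping kube-proxy")
	}
}
```
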
This commit fixes two issues:

- Previously k8s1 was used to determine k8s2{Name,IP}.
- Due to the bpf_sock.c optimization, the k8s2 ->
  k8s1:test-nodeport-local-k8s2 did not fail.

Fixes: 11bb75d ("CI: Add GetNodeNameByLabel helpers")
Signed-off-by: Martynas Pumputis <m@lambda.lt>
This commit removes testing of MetalLB from in-cluster, as a request
sent from a node to itself via LoadBalancer IP is expected to fail
(we run MetalLB in L2 mode, so the node will get an ARP response for
LoadBalancer IP with its MAC addr).

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Previously, the test case which checked whether the DSR NAT entry was
successfully evicted expected the cmd ".... | grep $NAT_ENTRY_PORT"
to succeed. That is wrong: grep exits with a non-zero status when no line
matches, which is exactly the case once the entry has been evicted.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
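
The corrected expectation can be sketched in Go without shelling out to grep. This is not the test helper from the PR: the function name and the dump lines are made up for illustration, and the check mirrors grep's plain substring matching.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// natEntryEvicted reports whether no line of the dump mentions the port,
// i.e. the situation in which "grep $NAT_ENTRY_PORT" exits non-zero.
// Hypothetical helper, for illustration only.
func natEntryEvicted(dump, port string) bool {
	scanner := bufio.NewScanner(strings.NewReader(dump))
	for scanner.Scan() {
		if strings.Contains(scanner.Text(), port) {
			return false // a matching line means the entry is still present
		}
	}
	return true
}

func main() {
	// Made-up dump lines; a real NAT table listing looks different.
	dump := "entry-a :40001\nentry-b :40002\n"
	fmt.Println(natEntryEvicted(dump, ":40003")) // true: nothing matched
	fmt.Println(natEntryEvicted(dump, ":40001")) // false: entry still listed
}
```

A test asserting eviction therefore has to expect the grep pipeline to fail, not to succeed.
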
The test fetches an old version of the cilium DS from GH, which does not
have the affinity that prevents cilium-agent from being scheduled on the
third host.

Also, v1.5 (=stable) doesn't implement the kube-proxy functionality.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
The "NO_CILIUM_ON_NODE" node does not have cilium-agent running, so
a log-gatherer on this node won't be able to access kube-dns server.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Currently, Transparent Encryption and NodePort BPF are mutually
exclusive features.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
This helps to prevent etcd-operator from being scheduled on the
NO_CILIUM_ON_NODE host in CI.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
It seems that per-endpoint routes do not work for IPv6, and kube-proxy
was hiding that fact.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
The commit b2c9b07 ("bpf, sock: reduce xlation for externalTrafficPolicy=Local to host_id")
has fixed the externalTrafficPolicy=Local behaviour, so adjust the test
accordingly.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Previously, in the ManagedEtcd test case, cilium-operator used to add
the "$NO_CILIUM_ON_NODE" node into the kvstore, which caused the node to be
added to the node list of each cilium-agent. Consequently, health
checks failed on each cilium-agent, and thus the test case failed.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
The iface is used to communicate with other cluster nodes.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
@brb brb force-pushed the pr/brb/third-host branch from f86a583 to 646239b Compare January 27, 2020 14:42
@brb
Member Author

brb commented Jan 27, 2020

test-me-please

Labels

  • area/CI: Continuous Integration testing issue or flake
  • area/k8s: Impacts the kubernetes API, or kubernetes -> cilium internals translation layers.
  • release-note/ci: This PR makes changes to the CI.
