Skip to content

v1.8 backports 2020-07-07#12442

Merged
qmonnet merged 15 commits intov1.8from
pr/v1.8-backport-2020-07-07
Jul 15, 2020
Merged

v1.8 backports 2020-07-07#12442
qmonnet merged 15 commits intov1.8from
pr/v1.8-backport-2020-07-07

Conversation

@vadorovsky
Copy link
Copy Markdown
Member

@vadorovsky vadorovsky commented Jul 7, 2020

Once this PR is merged, you can update the PR labels via:

$ for pr in 12291 12301 12270 12342 12379 12366 12403 11236 12398 12395 12426 12433 12417 11794; do contrib/backporting/set-labels.py $pr done 1.8; done

aanm and others added 14 commits July 7, 2020 11:54
[ upstream commit 8f624a7 ]

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
[ upstream commit 269c985 ]

If the cluster has just been recreated, it can have no nodepool yet.
Don't choose such clusters because scaling it will fail.

Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
[ upstream commit 82f0cd0 ]

We are hitting a lot of issues caused by state left over from test run.

This change deletes the cluster after the test run and relies on our
infra recreating the cluster for subsequent builds.

Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
[ upstream commit 0868c6c ]

Remove the generated temp files after they are no longer needed.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
[ upstream commit 425fbd7 ]

We already have the test where the request fails, so add one for the
case where it must succeed from outside.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
[ upstream commit 5382f2a ]

Hubble v0.5 used to display FQDN names without a trailing dot for
absolute paths, i.e. `cilium.io` instead of `cilium.io.`.

We accidentally changed this in Hubble v0.6 when we started accessing
the Cilium DNS cache directly. This in in turn broke filtering on FQDNs
(--{from-,to-,}fqdn), as the filtering logic does not assume trailing
dots, same as Hubble UI, the CLI and metrics.

This commit restores the old behavior of stripping trailing dots from
the source and destination name.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
[ upstream commit 00fc18f ]

This performs an additional `hubble observe` invocation when testing
DNS-based policies. We expect Hubble to annotate the `destination_names`
field of the test flows with the observed DNS name exactly as stated
in the test.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
[ upstream commit 86729fb ]

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
[ upstream commit edff374 ]

This PR adds in an option 'enable-health-check-nodeport' to the
cilium-agent for disabling 'HealthCheckNodePort' based on the
KubeProxyReplacement configuration.
It also adds in an opton 'enableHealthCheckNodePort' to the helm
charts and documents the new config options with impacts when
kubeProxyReplacement is set to 'partial'.

Fixes: #11168
Signed-off-by: Swaminathan Vasudevan <svasudevan@suse.com>
Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
[ upstream commit 082e879 ]

This change removes reliance on hardcoded node names when retrieving
pods in several tests, which caused GKE tests (where we don't control
node names) to fail.

Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
[ upstream commit 54275be ]

Signed-off-by: Vlad Ungureanu <vladu@palantir.com>
Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
[ upstream commit d131945 ]

It's no longer required to specify podCIDR [1] when provisioning
k8s with kubeadm for Cilium. Removing this allows us to simplify
the guide by getting rid of passing the ipam mode to helm which
was introduced by #12246.

[1]: kubernetes/website#21432 (comment)

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
[ upstream commit fd56ee9 ]

Noticed we didn't mention anything about it yet, so state it explicitly here.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
[ upstream commit f4ea3f6 ]

Also, bump tests to 1.18.5, 1.17.8 and 1.16.12

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
@vadorovsky vadorovsky requested a review from a team as a code owner July 7, 2020 10:20
@maintainer-s-little-helper maintainer-s-little-helper Bot added backport/1.8 kind/backports This PR provides functionality previously merged into master. labels Jul 7, 2020
@vadorovsky
Copy link
Copy Markdown
Member Author

test-backport-1.8

@vadorovsky
Copy link
Copy Markdown
Member Author

In all failed jobs it's the same test failing

13:12:48  • Failure [39.489 seconds]
13:12:48  K8sPolicyTest
13:12:48  /home/jenkins/workspace/Cilium-PR-K8s-oldest-net-next/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:478
13:12:48    Basic Test
13:12:48    /home/jenkins/workspace/Cilium-PR-K8s-oldest-net-next/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:478
13:12:48      Traffic redirections to proxy
13:12:48      /home/jenkins/workspace/Cilium-PR-K8s-oldest-net-next/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:478
13:12:48        Tests DNS proxy visibility without policy [It]
13:12:48        /home/jenkins/workspace/Cilium-PR-K8s-oldest-net-next/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:514
13:12:48  
13:12:48        Expected
[2020-07-07T11:12:48.199Z]           <string>: vagrant-cache.ci.cilium.io
[2020-07-07T11:12:48.199Z]       to be an element of
[2020-07-07T11:12:48.199Z]           <[]interface {} | len:1, cap:1>: [
[2020-07-07T11:12:48.199Z]               [
[2020-07-07T11:12:48.199Z]                   vagrant-cache.ci.cilium.io,
[2020-07-07T11:12:48.199Z]               ],
[2020-07-07T11:12:48.199Z]           ]
13:12:48  
13:12:48        /home/jenkins/workspace/Cilium-PR-K8s-oldest-net-next/src/github.com/cilium/cilium/test/k8sT/Policies.go:1014
13:12:48  ------------------------------

Comment thread test/k8sT/Policies.go
res := kubectl.HubbleObserve(helpers.CiliumNamespace, ciliumPod,
fmt.Sprintf("--last 1 --from-pod %s/%s --to-fqdn %q",
namespaceForTest, appPods[helpers.App2], "*.cilium.io"))
res.ExpectContainsFilterLine("{.destination_names[0]}", "vagrant-cache.ci.cilium.io")
Copy link
Copy Markdown
Member

@joestringer joestringer Jul 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this newly added line is the culprit of the test failures. Tracing a bit of git blame it seems like this may rely on #12332, but that PR was already backported to v1.8. @christarazi I think you did the backport of that PR, do you recall anything unusual about the backport of that one?

Anyway, if it's so reliably failing in every job then this shouldn't be a huge effort to reproduce locally & iterate to locate/fix the issue.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I'm not seeing which PR you're referring to. I see the backport PR I've done, but not the upstream PR.

I don't recall any conflicts when doing 1.8 backports. Also, the tests passed the first time in the backport PR (#12332), so it seems we have a new failure here, rather than in the previous.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I quoted the wrong PR, I meant #12252 . However if there weren't any conflicts then this won't provide us any simple answers anyway, something else must be missing.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have absolutely no idea what could be wrong here, since the full stdout is:

{"time":"2020-07-13T11:21:23.607974208Z","verdict":"FORWARDED","ethernet":{"source":"82:6a:80:39:5f:4b","destination":"5a:7e:bc:be:37:ab"},"IP":{"source":"10.0.0.131","destination":"147.75.38.95","ipVersion":"IPv4"},"l4":{"TCP":{"source_port":56212,"destination_port":80,"flags":{"ACK":true}}},"source":{"ID":277,"identity":25122,"namespace":"202007131320k8spolicytestbasictesttrafficredirectionstoproxytes","labels":["k8s:appSecond=true","k8s:id=app2","k8s:io.cilium.k8s.policy.cluster=default","k8s:io.cilium.k8s.policy.serviceaccount=app2-account","k8s:io.kubernetes.pod.namespace=202007131320k8spolicytestbasictesttrafficredirectionstoproxytes","k8s:zgroup=testapp"],"pod_name":"app2-88b7f8c4b-l784f"},"destination":{"identity":2,"labels":["reserved:world"]},"Type":"L3_L4","node_name":"k8s1","destination_names":["vagrant-cache.ci.cilium.io"],"event_type":{"type":4,"sub_type":3},"traffic_direction":"EGRESS","Summary":"TCP Flags: ACK"}

so the .destination[0] selector should clearly return vagrant-cache.ci.cilium.io.

The output of the failing test also seems weird:

[2020-07-07T11:12:48.199Z]           <string>: vagrant-cache.ci.cilium.io
[2020-07-07T11:12:48.199Z]       to be an element of
[2020-07-07T11:12:48.199Z]           <[]interface {} | len:1, cap:1>: [
[2020-07-07T11:12:48.199Z]               [
[2020-07-07T11:12:48.199Z]                   vagrant-cache.ci.cilium.io,
[2020-07-07T11:12:48.199Z]               ],
[2020-07-07T11:12:48.199Z]           ]

So... basically... "I could not find vagrant-cache.ci.cilium.io in [ vagrant-cache.ci.cilium.io, ]." Huh?

I suspect that maybe the backported ExpectContainsFilterLine function #12252 is now working properly, but still, I have no idea why it could be. Anyway, I'm going to check it with Delve and see if it leads me anywhere.

Copy link
Copy Markdown
Member

@gandro gandro Jul 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So... basically... "I could not find vagrant-cache.ci.cilium.io in [ vagrant-cache.ci.cilium.io, ]." Huh?

Sounds a bit like a type issue,i.e. the needle is of type string, but the haystack is of type[]string, so there is never a match. We had a similar issue before: #11794 -- but not sure why this would only happen on the v1.8 branch.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! So it looks like #11794 apparently was merged after we branched off to v1.8. In that case, I think #11794 needs to be backported!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks a lot!

[ upstream commit a20dcd1 ]

Current matchers implementation makes policy related specs always pass
even when value is present in the output. This patch updates
ExpectContainsFilterLine and ExpectDoesNotContainFilterLine matchers
to match expected value against string slice rather than slice of
FilterBuffer. This patch also updates policy specs to use
ExpectContainsFilterLine matcher where applicable. Minor consistency
changes were also added to policy specs.

Signed-off-by: Arthur Evstifeev <aevstifeev@gitlab.com>
Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
@vadorovsky
Copy link
Copy Markdown
Member Author

test-backport-1.8

@brb
Copy link
Copy Markdown
Member

brb commented Jul 15, 2020

The CI is still failing with:

/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-K8s/1.13-gopath/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:461
hubble-relay was not able to get into ready state
Expected
    <*errors.errorString | 0xc00130a5f0>: {
        s: "timed out waiting for pods with filter -l k8s-app=hubble-relay to be ready: 10m0s timeout expired",
    }
to be nil
/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-K8s/1.13-gopath/src/github.com/cilium/cilium/test/k8sT/hubble.go:124

which is due to

	   Warning  Failed                  9m8s (x4 over 10m)   kubelet, k8s2      Failed to pull image "147.75.68.105/cilium/hubble-relay:latest": rpc error: code = Unknown desc = Error response from daemon: Get http://147.75.68.105/v2/: dial tcp 147.75.68.105:80: connect: no route to host
	   Warning  Failed                  9m8s (x4 over 10m)   kubelet, k8s2      Error: ErrImagePull
	   Warning  Failed                  69s (x39 over 10m)   kubelet, k8s2      Error: ImagePullBackOff

https://jenkins.cilium.io/job/Cilium-PR-Ginkgo-Tests-K8s/3305/testReport/junit/Suite-k8s-1/13/K8sHubbleTest_Hubble_Observe_Test_L3_L4_Flow/

@gandro Any pointers which backport might be missing?

@gandro
Copy link
Copy Markdown
Member

gandro commented Jul 15, 2020

	   Warning  Failed                  9m8s (x4 over 10m)   kubelet, k8s2      Failed to pull image "147.75.68.105/cilium/hubble-relay:latest": rpc error: code = Unknown desc = Error response from daemon: Get http://147.75.68.105/v2/: dial tcp 147.75.68.105:80: connect: no route to host
	   Warning  Failed                  9m8s (x4 over 10m)   kubelet, k8s2      Error: ErrImagePull
	   Warning  Failed                  69s (x39 over 10m)   kubelet, k8s2      Error: ImagePullBackOff

@gandro Any pointers which backport might be missing?

"connect: no route to host" looks like a connectivity problem from the vagrant machine to the image registry. Nothing specific to Hubble as far as I can tell, as there are other non-Hubble failures as well. Packet having connectivity issues?

@brb
Copy link
Copy Markdown
Member

brb commented Jul 15, 2020

test-backport-1.8

@maintainer-s-little-helper maintainer-s-little-helper Bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 15, 2020
@qmonnet qmonnet merged commit 79d801b into v1.8 Jul 15, 2020
@qmonnet qmonnet deleted the pr/v1.8-backport-2020-07-07 branch July 15, 2020 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.

Projects

None yet

Development

Successfully merging this pull request may close these issues.