v1.7 backports 2020-06-04 #11906
|
test-backport-1.7 |
dbcb14a to 9f81d41
|
test-backport-1.7 |
9f81d41 to bcc2d65
|
test-backport-1.7 |
bcc2d65 to c4fe341
|
test-backport-1.7 |
|
test-backport-1.7 |
|
test-missed-k8s Edit: timed out |
|
restart-ginkgo Edit: timed out |
|
restart-ginkgo |
|
test-missed-k8s |
|
restart-ginkgo |
|
test-missed-k8s |
[ upstream commit 03602e3 ]
Due to a bug in Jenkins, nesting a timeout in a retry block causes the build to abort. Work around this by using a shell-based timeout.
Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
|
The ManagedEtcd tests were failing legitimately: the anti-affinity changes left the cilium-etcd-operator stuck in the pending scheduling state, because the anti-affinity is set as a Helm [...] When these rules were applied, either the Cilium daemonset or the cilium-etcd-operator deployment got stuck waiting to be scheduled, depending on which one won the race to be deployed on a node. The other would then get stuck pending, ultimately causing the test to time out. Working on a fix.
Update: the fix is in commit 1c6bf9f |
e3de12f to ed1b51b
|
test-backport-1.7 |
|
restart-ginkgo |
|
test-focus K8sDatapathConfig.Encapsulation Check connectivity. |
|
test-focus K8sDatapathConfig.Encapsulation.(Check connectivity with sockops|Check connectivity with VXLAN|Check connectivity with Geneve) Edit: quay hit 502: https://jenkins.cilium.io/job/Cilium-PR-Ginkgo-Tests-Validated-Focus/248/console |
|
test-focus K8sDatapathConfig.Encapsulation.(Check connectivity with sockops|Check connectivity with VXLAN|Check connectivity with Geneve) Edit: quay might be down :( ... |
|
test-focus K8sDatapathConfig.Encapsulation.(Check connectivity with sockops|Check connectivity with VXLAN|Check connectivity with Geneve) Edit: still failing on the above tests: https://jenkins.cilium.io/job/Cilium-PR-Ginkgo-Tests-Validated-Focus/250/ The same tests have passed locally 35 times in a row... not sure what's going on. |
|
Temporarily reverting #11863 to see if it causes the encryption tests to fail (shot in the dark). |
|
test-focus K8sDatapathConfig.Encapsulation.(Check connectivity with sockops|Check connectivity with VXLAN|Check connectivity with Geneve) Edit: looks like it has failed again (1.11 netnext), but passes for 1.17 |
46460ae to 27d4d4b
|
test-backport-1.7 Edit: trying without net-next label |
|
test-backport-1.7 |
|
From the last |
|
Closed by accident... apologies. |
|
test-backport-1.7 |
|
Ah, finally was able to reproduce locally. The difference between local and CI was that CI runs 3 K8s nodes, while I was only running 2. Looking closer into a fix now.
[ upstream commit f7b0378 ]
This fixes an issue with the `HealthCheckNodePort` server where it would non-deterministically return a non-zero `localEndpoints` count on nodes which do not have local endpoints.
Because Cilium internally creates a service object per frontend IP, we end up with multiple services sharing the same name. In the case where a `LoadBalancer` service has `externalTrafficPolicy=Local` with no local backends, Cilium will still create a `ClusterIP` sibling service which retains the non-local backends. In that case, we must take care not to incorporate the `ClusterIP` backends into the `localEndpoints` count intended for external traffic.
The final count is dependent on the order in which services are added to the service manager, which explains why the occurrence of this bug was non-deterministic.
This commit fixes this issue by checking that the service may only contain local backends before its count is added to the `HealthCheckNodePort` server.
Fixes: #11043
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
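For readers skimming the backport, the check described in commit f7b0378 can be sketched roughly as below. This is not Cilium's actual service manager code: the `Service` type, the `OnlyLocalBackends` field, and `LocalEndpointCounts` are hypothetical names made up for illustration. The sketch only shows why gating on "may only contain local backends" makes the result independent of the order in which same-named services are added.

```go
package main

import "fmt"

// Service is a hypothetical, simplified stand-in for one per-frontend
// service object. It is NOT Cilium's real type.
type Service struct {
	Name string
	// OnlyLocalBackends is assumed to be true when the frontend is
	// restricted to node-local backends (e.g. a LoadBalancer frontend
	// with externalTrafficPolicy=Local).
	OnlyLocalBackends bool
	// Backends holds the backends retained for this frontend.
	Backends []string
}

// LocalEndpointCounts returns, per service name, the number of local
// endpoints to report to a health-check server. Services that may
// contain non-local backends (such as a ClusterIP sibling) are skipped,
// so they can never overwrite the count of a same-named service.
func LocalEndpointCounts(services []Service) map[string]int {
	counts := make(map[string]int)
	for _, svc := range services {
		if !svc.OnlyLocalBackends {
			continue // sibling may hold non-local backends; ignore it
		}
		counts[svc.Name] = len(svc.Backends)
	}
	return counts
}

func main() {
	// LoadBalancer frontend with no local backends on this node.
	lb := Service{Name: "default/web", OnlyLocalBackends: true}
	// ClusterIP sibling sharing the name and retaining remote backends.
	clusterIP := Service{Name: "default/web", Backends: []string{"10.0.1.5", "10.0.2.7"}}

	// Both insertion orders yield the same count for "default/web".
	fmt.Println(LocalEndpointCounts([]Service{lb, clusterIP}))
	fmt.Println(LocalEndpointCounts([]Service{clusterIP, lb}))
}
```

Without the `OnlyLocalBackends` guard, whichever service was processed last would determine the count, which matches the order dependence the commit message describes.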
[ upstream commit 7d26df1 ]
This commit is an attempt to add retry logic to Helm operations in the Kubernetes test suite.
Signed-off-by: Chris Tarazi <chris@isovalent.com>
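The commit message for 7d26df1 does not say how the retry is implemented, so the following is only a generic sketch of retry logic around a flaky operation. The `retry` helper, `errTransient`, and the fake install function are all hypothetical and are not taken from the Cilium test suite or from Helm.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient is a hypothetical error standing in for a flaky failure.
var errTransient = errors.New("transient failure")

// retry calls op up to attempts times, sleeping for delay between
// failed tries. It returns nil on the first success, or the last error
// wrapped with the attempt count once all attempts are exhausted.
func retry(attempts int, delay time.Duration, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 1; i <= attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if i < attempts {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// Stand-in for a Helm operation that fails twice, then succeeds.
	fakeInstall := func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	}

	err := retry(5, 10*time.Millisecond, fakeInstall)
	fmt.Println("calls:", calls, "err:", err)
}
```

A fixed delay keeps the sketch short; a real test suite might prefer a backoff or an overall deadline, which this example does not attempt to model.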
27d4d4b to ec47f62
|
test-backport-1.7 EDIT(@joestringer): This didn't seem to trigger for some reason, retrying. |
|
test-backport-1.7 |
|
Looks like the previous regression is gone and the only failures are known flakes (#10442). This is probably good to go; @joestringer, please double-check.
|
I agree that these are caused by the known flake. Two tests fail specifically with the symptoms and another two tests fail in the Merging. |
Skipped due to non-trivial conflicts:
Skipped as it depends on #11766:
* #11804 -- fix(datarace): Fix possible nil pointer dereference (@sayboras)
The above PR doesn't need to be backported, see here. Removing.
Skipped as it will be handled by original author:
Once this PR is merged, you can update the PR labels via: