Skip to content

v1.8 backports 2020-08-25#12963

Merged
joestringer merged 21 commits intov1.8from
pr/brb/backport-v1.8-20200825
Aug 26, 2020
Merged

v1.8 backports 2020-08-25#12963
joestringer merged 21 commits intov1.8from
pr/brb/backport-v1.8-20200825

Conversation

@brb
Copy link
Copy Markdown
Member

@brb brb commented Aug 25, 2020

v1.8 backports 2020-08-25

NOT INCLUDED DUE TO MERGE CONFLICTS:

Once this PR is merged, you can update the PR labels via:

$ for pr in 12809 12852 12841 12940 12947; do contrib/backporting/set-labels.py $pr done 1.8; done

@brb brb added pending-review kind/backports This PR provides functionality previously merged into master. labels Aug 25, 2020
@brb brb requested a review from a team as a code owner August 25, 2020 11:20
@brb
Copy link
Copy Markdown
Member Author

brb commented Aug 25, 2020

Tests are failing because #12852 has to be backported too.

aanm and others added 8 commits August 25, 2020 16:38
[ upstream commit 268f406 ]

If the client does not receive a keep alive from the server, that
connection should be closed so the etcd client library does proper
round robin for the other available endpoints.

This might be a little bit aggressive in a larger environment if all
clients perform a keep alive requests to the etcd servers. Some
testing could be done to verify if there is a large overhead of
doing these keep alive requests.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit a4a1df0 ]

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 62a4b6e ]

The struct is used to pass params to (*service).UpsertService(). The
main advantage of using the struct instead of listing individual params
is that a new param can be added without much pain.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 696e3ec ]

Major changes:
    - Remove unused multi-svc type feature
    - Introduce struct to represent NewSvcFlag() params.

Misc:
    - Remove redundant logging (pkg/service is verbose enough).
    - Add a flag to be used by LB src range check.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 0818740 ]

This commit re-enables LoadBalancer service tests, and switches from
MetalLB to dummylb [1]. The main reason for switching is that the former
was operating in L2 mode which interfered with the vbox VM bridge
resulting in the test flakes.

The new lb only assigns LoadBalancerIP addr to a service (specified by a
user in .spec.loadBalancerIP). Therefore, making the LB IP addr reachable
from other nodes is up to a user. This can be achieved by installing a
route on a client node (as we do in the test).

[1]: https://github.com/cilium/dummylb.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 03ef3e4 ]

Instead of returning (*CmdRes, error) make the method to return *CmdRes
by setting error in the CmdRes.err field.

This makes assertions of functions which depend on the method simpler.
A notable example is ExecInHostNetNS() which previously required to
assertions.

Finally, extend the *CMDRes success matcher by adding a check whether
CmdRes.err is not nil. This will help us to catch bugs when CmdRes was
considered as successful, although an error was set.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit a7c2130 ]

- Invoke cmds for a given node instead of returning cmd itself - reduces
  boilerplate by quite a bit.
- Changes method names form to verb + noun.

Suggested-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit acd8a7d ]

Otherwise, GKE LB might fail to assign the IP addr to other LB svcs
which results in the following failure:

  <*errors.errorString | 0xc0002a6410>: {
              s: "could not get service LoadBalancer IP addr: 30s
              timeout expired",
  }

Signed-off-by: Martynas Pumputis <m@lambda.lt>
@brb brb force-pushed the pr/brb/backport-v1.8-20200825 branch 2 times, most recently from 8919a93 to 82bd63c Compare August 25, 2020 15:34
brb added 13 commits August 25, 2020 17:36
[ upstream commit c815406 ]

Currently, the flag is noop.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 3195681 ]

It's going to be used by the loadBalancerSourceRanges check.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 945a852 ]

The maps are going to be used by the LB source range check. The check is
going to be enabled for those services which type is LoadBlancer AND
.spec.loadBalancerSourceRanges is not empty. Such service flags have the
check source range bit being set. This allows the datapath to avoid
doing a lookup in the maps for each request sent to a service of the
LoadBalancer type.

The maps are of the LPM type, and a key is a tuple containing LPM
prefix, SVC rev_nat_id (aka svc ID), some padding and source IPv{4,6}
addr.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 34fae55 ]

The service manager gets updates from the k8s service watcher. If
--enable-loadbalancer-source-range=true AND svc type = LB AND
len(.spec.loadBalancerSourceRanges) != 0, it will populate the
corresponding BPF maps via lbmap.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 8ad79dc ]

The check is performed only if the SVC_FLAG_CHECK_SRC bit is set of a
given service. The check is based on a lookup of the
LB{4,6}_SRC_RANGE_MAP BPF maps which are of the LPM type. If an entry
is not found for a given src IP addr + rev_nat_index, then such request
is dropped.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit f903c17 ]

Both structs are identical, and the latter has been used everywhere.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit dd135f5 ]

The check is going to be used by the LB src range check, so reuse a
result from a previous check if available.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit ae71d89 ]

Disable it when running in the non-strict mode if the full LPM is
missing, or fail hard if running in the strict mode.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit bd6600f ]

Rename to --enable-svc-source-range-check, as in the future we might
enable (e.g. via annotations) the checks for services which type !=
LoadBalancer.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit c142e5e ]

Currently, the test contains some sleeps for waiting until cilium-agents
have updated their LB src range maps. In the future, it will be replaced
by checking whether "cilium bpf source-list list" (TODO) contains
relevant entries.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 3ec7b35 ]

The config option can be used to disable the (LB) source range check
feature (--enable-svc-source-range-check).

Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 871080b ]

Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 2a30cc1 ]

Signed-off-by: Martynas Pumputis <m@lambda.lt>
@brb
Copy link
Copy Markdown
Member Author

brb commented Aug 25, 2020

test-me-please

@brb
Copy link
Copy Markdown
Member Author

brb commented Aug 25, 2020

test-backport-1.8

@brb
Copy link
Copy Markdown
Member Author

brb commented Aug 26, 2020

4.9 CI failed due to:

k8s1-1.19: sudo: kubeadm: command not found
[2020-08-25T16:48:41.713Z] The SSH command responded with a non-zero exit status. Vagrant

@brb
Copy link
Copy Markdown
Member Author

brb commented Aug 26, 2020

retest-4.9

@brb
Copy link
Copy Markdown
Member Author

brb commented Aug 26, 2020

test-upstream-k8s

@brb
Copy link
Copy Markdown
Member Author

brb commented Aug 26, 2020

retest-gke

@joestringer
Copy link
Copy Markdown
Member

k8s-1.19-kernel-4.9 target failed provisioning. Will retry.

https://jenkins.cilium.io/job/Cilium-PR-K8s-1.19-kernel-4.9/296/flowGraphTable/

@joestringer
Copy link
Copy Markdown
Member

test-1.19-4.9

@joestringer
Copy link
Copy Markdown
Member

Hmm, it did nothing, but I also realised that there's already k8s-1.18-kernel-4.9 which tested a similar set of options and that one passed:

https://jenkins.cilium.io/job/Cilium-PR-K8s-1.18-kernel-4.9/1335/

Furthermore, AFAIK GKE is not yet fully fixed & enabled on the latest code let alone v1.8 branch so I'm assuming that the GKE failure here is a CI issue:

https://jenkins.cilium.io/job/Cilium-PR-K8s-GKE/2305/

Copy link
Copy Markdown
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked over the commits not authored by you since I figure you have the best view into the LB ones to properly resolve any conflicts. 👍

@joestringer joestringer merged commit 1a69583 into v1.8 Aug 26, 2020
@joestringer joestringer deleted the pr/brb/backport-v1.8-20200825 branch August 26, 2020 19:13
@aanm aanm mentioned this pull request Sep 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/backports This PR provides functionality previously merged into master.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants