
v1.8 backports 2020-06-18 #12173

Merged
borkmann merged 80 commits into v1.8 from pr/v1.8-backport-2020-06-18
Jun 18, 2020


@jrajahalme
Member

@jrajahalme jrajahalme commented Jun 18, 2020

Added next batch into this branch:

Plus:

Once this PR is merged, you can update the PR labels via:

$ for pr in 12085 12076 12088 12101 12121 12090 12114 12106 12099 11905 12109 12089 12110 12126 11979 11969 12143 12150 10899 12156 12157 12148 12139 12149 12097 12146 12172 12164 12166 12108 12033 12174 12171 12155 12151 12134 12183 12179 12180; do contrib/backporting/set-labels.py $pr done 1.8; done

glibsm and others added 30 commits June 18, 2020 08:17
[ upstream commit f6994de ]

This not only adds a much needed enum, but also creates room
to fix up the zero-value trace point in the API.

Signed-off-by: Glib Smaga <code@gsmaga.com>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
[ upstream commit 331aab4 ]

Signed-off-by: Glib Smaga <code@gsmaga.com>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
[ upstream commit 496624b ]

'hubble-relay.image' was changed from a string to a map of strings by
commit bbf5377 ("Fix values scoping for newly implemented
sub-chart hubble-relay"). Fix the test helpers accordingly.

Without this fix, helm panics when the ginkgo option
'-cilium.hubble-relay-image="docker.io/cilium/hubble-relay:latest"' is
passed, because the helm command line then includes both
'hubble-relay.image' and 'hubble-relay.image.tag', like so:

$ helm template install/kubernetes/cilium --namespace=cilium  --set hubble-relay.image=docker.io/cilium/hubble-relay:latest  --set hubble-relay.image.tag=latest
panic: interface conversion: interface {} is string, not map[string]interface {}

goroutine 1 [running]:
helm.sh/helm/v3/pkg/strvals.(*parser).key(0xc0004c5ac0, 0xc00031a3c0, 0xc00067f800, 0xc)
	/private/tmp/helm-20200608-50972-gq0j1j/src/helm.sh/helm/pkg/strvals/parser.go:211 +0xdea

Fixes: bbf5377
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
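
The panic above is an unchecked Go type assertion failing when a key already holds a string and a nested key is then set under it. The following is a minimal sketch of that failure mode and of the checked alternative. It is not Helm's parser code: `setNested` and the flat `image` key are hypothetical names used only for illustration.

```go
package main

import "fmt"

// setNested is a hypothetical helper (not part of Helm). It stores value
// under values[key][sub] and returns an error, instead of panicking, when
// values[key] already holds a non-map value such as a plain string.
func setNested(values map[string]interface{}, key, sub, value string) error {
	existing, present := values[key]
	if !present {
		values[key] = map[string]interface{}{sub: value}
		return nil
	}
	// The comma-ok form avoids the "interface conversion" panic that an
	// unchecked existing.(map[string]interface{}) raises for a string.
	nested, ok := existing.(map[string]interface{})
	if !ok {
		return fmt.Errorf("%q is %T, not a map", key, existing)
	}
	nested[sub] = value
	return nil
}

func main() {
	values := map[string]interface{}{
		// Stands in for: --set hubble-relay.image=<string>
		"image": "docker.io/cilium/hubble-relay:latest",
	}
	// Stands in for: --set hubble-relay.image.tag=latest
	if err := setNested(values, "image", "tag", "latest"); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

The fix in the commit takes the other route: the test helpers stop passing the string form, so the conflicting pair of `--set` options is never produced.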
[ upstream commit 3f96392 ]

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
[ upstream commit dc4f549 ]

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
[ upstream commit df97ca1 ]

It seems that we try to validate an endpoint without setting its
allocator, which causes Cilium to panic while restoring old endpoints:

```
goroutine 259 [running]:
github.com/cilium/cilium/pkg/endpoint.(*Endpoint).identityLabelsChanged(0xc000b122c0, 0x280e920, 0xc0009d2840, 0x1, 0x0, 0x0, 0x0)
        /go/src/github.com/cilium/cilium/pkg/endpoint/endpoint.go:1818 +0x45c
github.com/cilium/cilium/pkg/endpoint.(*Endpoint).runIdentityResolver(0xc000b122c0, 0x280e920, 0xc0009d2840, 0x1, 0x1, 0xc000cb2d80)
        /go/src/github.com/cilium/cilium/pkg/endpoint/endpoint.go:1748 +0x3db
github.com/cilium/cilium/pkg/endpoint.(*Endpoint).UpdateLabels(0xc000b122c0, 0x280e920, 0xc0009d2840, 0xc000d2ca50, 0xc000d2cab0, 0x1, 0x6)
        /go/src/github.com/cilium/cilium/pkg/endpoint/endpoint.go:1709 +0x3d0
github.com/cilium/cilium/pkg/endpoint.(*Endpoint).RunMetadataResolver.func2(0x280e920, 0xc0009d2840, 0x3b4b200, 0xc000ce31b8)
        /go/src/github.com/cilium/cilium/pkg/endpoint/endpoint.go:1621 +0x46a
github.com/cilium/cilium/pkg/controller.(*Controller).runController(0xc0000d6800)
        /go/src/github.com/cilium/cilium/pkg/controller/controller.go:205 +0xa2a
created by github.com/cilium/cilium/pkg/controller.(*Manager).updateController
        /go/src/github.com/cilium/cilium/pkg/controller/manager.go:120 +0xb09
```

With this change we set the allocator in the endpoint before we validate
the endpoint, which avoids the panic.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
[ upstream commit 8191b16 ]

This uses `-D__NR_CPUS__=$(nproc --all)` (or `GetNumPossibleCPUs` when
invoked from Go) to compile the datapath.

This fixes an issue where cilium monitor fails to report any events
on AKS, due to the `perf_event_array` map duplicates being created
with different max_entries sizes, presumably causing the datapath
to write to the first one, while the agent is reading from the second
one.

This bug occurs for example on AKS due to the present/possible cpuset on
the VMs. The default Standard_D2s_v3 node size has 2 present CPUs, but
128 possible CPUs in /sys/devices/system/cpu.

Fixes: #12070

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
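
To illustrate the present/possible mismatch described above: the kernel reports CPU sets as range lists such as `0-127`. The sketch below counts the CPUs in such a list. `countCPUs` is a hypothetical helper written for this note, not Cilium's `GetNumPossibleCPUs`, and the input strings are assumed from the CPU counts quoted in the commit message.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// countCPUs counts the CPUs named in a kernel cpulist string such as
// "0-127" or "0-1,4". Hypothetical helper for illustration only.
func countCPUs(list string) (int, error) {
	total := 0
	for _, part := range strings.Split(strings.TrimSpace(list), ",") {
		bounds := strings.SplitN(part, "-", 2)
		first, err := strconv.Atoi(bounds[0])
		if err != nil {
			return 0, err
		}
		last := first // a single CPU number is a range of length one
		if len(bounds) == 2 {
			if last, err = strconv.Atoi(bounds[1]); err != nil {
				return 0, err
			}
		}
		if last < first {
			return 0, fmt.Errorf("invalid range %q", part)
		}
		total += last - first + 1
	}
	return total, nil
}

func main() {
	// Assumed list strings for 2 present and 128 possible CPUs.
	for _, list := range []string{"0-1", "0-127"} {
		n, err := countCPUs(list)
		fmt.Println(list, n, err)
	}
}
```

If one component sizes a per-CPU map from the first count and another from the second, the two maps end up with different `max_entries`, which is the mismatch this commit removes by always compiling with the possible CPU count.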
[ upstream commit f3f583b ]

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
[ upstream commit 5148463 ]

Signed-off-by: Zang Li <zangli@google.com>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
[ upstream commit ae19a9d ]

Having an init function to initialize all structures does not initialize
the different fields of 'CNPCRV' in case this variable is accessed
outside the 'v2/client' package. Replacing the 'init' function with
dedicated functions that initialize those fields allows 'CNPCRV' to have
its fields properly initialized.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
[ upstream commit 35501c7 ]

The CCNP validation is different from the CNP validation so we need to
validate the CCNP with the right schema validation.

Fixes: 9b0ae85 ("k8s: Fix CCNP for host policies")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
[ upstream commit cb045a4 ]

This will allow us to read config variables in ginkgo-ext package before
ginkgo creates a test tree.

Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
[ upstream commit 30fa820 ]

Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
[ upstream commit 5f3549c ]

Parsing cli flags in the init func of our test package caused flag values
to be unavailable when ginkgo creates the test tree, which caused the
`cilium.RunQuarantined` option to always be false.

This change moves flag parsing to a point before the ginkgo test tree is
created.

Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
[ upstream commit 471fe63 ]

Skip Istio test if running cilium-istioctl is not supported for the
current Go runtime.

Support running Istio test from OSX by downloading the osx version of
cilium-istioctl if the test suite is running in OSX. This allows
running the Istio test on a remote cluster (e.g., GKE) when Ginkgo is
running on OSX.

On Windows the test is skipped, even though the cilium-istioctl binary
is released also for Windows, but this has not been tested yet.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
[ upstream commit 12ef7d1 ]

Only use the Ginkgo runtime OS for determining which cilium-istioctl
binary to download if the command executor is local, otherwise default
to "linux". This supports Ginkgo running in OSX both with local and
SSH Executors.

Fixes: #11905
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
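
The selection rule described in this commit can be sketched as follows. `istioctlOS` and its parameters are hypothetical names chosen for this note; the real test helper may be structured differently.

```go
package main

import (
	"fmt"
	"runtime"
)

// istioctlOS returns the OS name to use when picking a cilium-istioctl
// download. Hypothetical helper: the Ginkgo runtime OS only matters when
// commands run locally; a remote (e.g. SSH) executor is assumed to be Linux.
func istioctlOS(localExecutor bool, goos string) string {
	if localExecutor {
		return goos
	}
	return "linux"
}

func main() {
	fmt.Println("local executor: ", istioctlOS(true, runtime.GOOS))
	fmt.Println("remote executor:", istioctlOS(false, runtime.GOOS))
}
```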
[ upstream commit 0eab60d ]

The Ginkgo --focus option is a regular expression that matches a
substring without any trailing wildcard. Simplify all --focus parameters
accordingly. The --focus parameter is also not repeatable: the last
one takes effect if several are given. Fix the docs for this.

Update the document to the current default K8S_VERSION (1.18).

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
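
A short sketch of why the trailing wildcard is redundant, assuming the focus pattern is applied the way Go's `regexp` package matches unanchored patterns. `focusMatches` and the test description are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// focusMatches reports whether a focus pattern selects a test description.
// An unanchored regular expression already matches anywhere in the string,
// so no trailing ".*" is needed. Hypothetical helper, not Ginkgo's code.
func focusMatches(focus, description string) bool {
	return regexp.MustCompile(focus).MatchString(description)
}

func main() {
	desc := "K8sExample checks connectivity" // hypothetical test description
	fmt.Println(focusMatches("K8sExample", desc))   // plain substring pattern
	fmt.Println(focusMatches("K8sExample.*", desc)) // same result, just longer
	fmt.Println(focusMatches("Runtime", desc))      // a pattern that is absent
}
```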
[ upstream commit 305093a ]

Add generated or downloaded files to .gitignore and avoid pulling
images mentioned in generated logs or yamls.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
[ upstream commit 1f8cc15 ]

This commit makes it possible to do the following:

```
$ ./contrib/k8s/k8s-cilium-exec.sh bash -c "cilium status && hostname && echo"
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
[ upstream commit 8191855 ]

This commit adds a few misc. changes such as defining a new helper
function and using double-quotes around variables for consistency.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
[ upstream commit 502b225 ]

* Move the detailed flow deep dive into a sub section
* Mark Cluster Scope / Pool as the default IPAM
* Mark old host scope as legacy

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
[ upstream commit 76ebdf9 ]

* Improve structure
* Add overview diagram

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
[ upstream commit 3d0e805 ]

This guide has been unused for a while and is outdated.

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
[ upstream commit 0326703 ]

* Mention all involved components
* Use the component overview for a high-level description. More
  detailed descriptions will be added in dedicated concepts chapters.

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
[ upstream commit d7dbd6e ]

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
[ upstream commit f7d9f4d ]

The different routing modes have been spread across multiple different
chapters. Consolidate it all in a single place: Concepts -> Networking
-> Routing.

Also split out multi-cluster to extend it later.

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
[ upstream commit 71cfb26 ]

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
[ upstream commit c7208d9 ]

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
[ upstream commit 742fa95 ]

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
[ upstream commit e1b9658 ]

Also removes the outdated Kubernetes section.

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
joestringer and others added 13 commits June 18, 2020 17:20
[ upstream commit c847327 ]

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit 0793cb9 ]

Just to help people to avoid a situation where they think policy is
enabled but it is not.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit ab3f96c ]

- move `az aks create` to a separate file
- remove `--generate-ssh-keys` as it's not required
- use long flag names consistently
- ensure each version creates 2-node clusters
- ensure each version calls `az group create`
- avoid asking the user to copy-paste API credentials,
  utilise the existing dependency on `jq`

Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit cf76e2b ]

Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit d28767b ]

Specifying the service name by itself as the DNS is sufficient since
hubble-relay and hubble-ui get deployed to the same namespace.

Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit 6ea5d2a ]

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit 753d315 ]

The current command will complain with:

  $ sudo amazon-linux-extras install -y ethtool kernel-ng
  Topic ethtool is not found.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit 82e36cd ]

The status field should be ignored when comparing 2 CNPs as this field
does not matter to the policy enforcement of the CNP.

This fixes a bug introduced by 134fdb5, which made Cilium process
all CNP events from k8s, including the ones where the status was the only
field modified. In a cluster with 2 or more nodes, each node concurrently
tried to update its own status in the CNP, causing the other nodes to
receive and process the resulting CNP event.

Fixes: 134fdb5 ("k8s/watchers: fix missing missing CNP/CCNP updates")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
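
The comparison described above can be sketched as follows. The `policy` type and `equalIgnoringStatus` are simplified, hypothetical stand-ins and not Cilium's CNP types or comparison code.

```go
package main

import (
	"fmt"
	"reflect"
)

// policy is a simplified, hypothetical stand-in for a CNP.
type policy struct {
	Name   string
	Spec   []string          // rules that drive policy enforcement
	Status map[string]string // per-node status, irrelevant to enforcement
}

// equalIgnoringStatus compares two policies on every field except Status.
// a and b are copies, so clearing Status here does not affect the caller.
func equalIgnoringStatus(a, b policy) bool {
	a.Status, b.Status = nil, nil
	return reflect.DeepEqual(a, b)
}

func main() {
	before := policy{Name: "rule1", Spec: []string{"allow app=a"}}
	statusOnly := policy{
		Name:   "rule1",
		Spec:   []string{"allow app=a"},
		Status: map[string]string{"node-1": "enforcing"},
	}
	specChange := policy{Name: "rule1", Spec: []string{"allow app=b"}}

	fmt.Println(equalIgnoringStatus(before, statusOnly)) // status-only update
	fmt.Println(equalIgnoringStatus(before, specChange)) // real policy change
}
```

With a comparison of this shape, an update event that only touches the status compares equal to the previous object and can be skipped, which breaks the cycle of nodes reacting to each other's status writes.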
[ upstream commit 6482860 ]

This will be used for a new Getting Started with Hubble guide.
While here, fix a few typos.

Co-authored-by: Michi Mutsuzaki <michi@isovalent.com>
Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>
Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit 64905db ]

Co-authored-by: Michi Mutsuzaki <michi@isovalent.com>
Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>
Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit 6a20868 ]

This adds a quick how-to for Hubble UI to the Hubble Getting Started
Guide. It deploys the Star Wars demo app from the policy tutorials.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit 6d55cbf ]

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit a1cc34d ]

Performing a "greater than 16" comparison causes the logic to be true when
the cluster reports its minor version as "16+", which is incorrect.
With this commit, the priorityClass will only be set when the cluster
version is "greater than or equal to 17".

Fixes: ce99a99 ("Set priorityClassName outside kube-system in k8s 1.17+")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
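
One way a "greater than 16" check can misfire on "16+" is a plain string comparison. Whether the chart compared strings is an assumption here, and `minorAtLeast` is a hypothetical helper, not the Helm template from this commit. The sketch shows the pitfall and a numeric "greater than or equal to" check that strips the suffix first.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minorAtLeast reports whether a Kubernetes minor version string such as
// "17" or "16+" is numerically greater than or equal to want.
// Hypothetical helper for illustration only.
func minorAtLeast(minor string, want int) (bool, error) {
	n, err := strconv.Atoi(strings.TrimSuffix(minor, "+"))
	if err != nil {
		return false, err
	}
	return n >= want, nil
}

func main() {
	// Compared as strings, "16+" sorts after "16".
	fmt.Println("string compare:", "16+" > "16")

	for _, minor := range []string{"16", "16+", "17", "17+"} {
		ok, err := minorAtLeast(minor, 17)
		fmt.Println(minor, ok, err)
	}
}
```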
@borkmann
Member

test-me-please

Member

@christarazi christarazi left a comment


LGTM for my changes

brb and others added 9 commits June 18, 2020 21:44
[ upstream commit f0ef604 ]

- Add session affinity to auto-{enable,disable} msgs.
- Make more clear that auto-enabling does not guarantee that the listed
  features will be enabled.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit da6ed27 ]

And move all related helpers from cmd/daemon_main.go into
cmd/kube_proxy_replacement.go to avoid bloating the former.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit e601e8f ]

Previously, initKubeProxyReplacementOptions() was called after
initMaps(), which, when running with --kube-proxy-replacement=probe and
--enable-session-affinity=false, resulted in the session-affinity-related
BPF maps not being created.

An example error log message which illustrated that:

    level=warning msg="Unable to add entry to
    affinity match map" backendID=5 error="Unable to get object
    /sys/fs/bpf/tc/globals/cilium_lb_affinity_match: no such file or
    directory" serviceID=11 subsys=service

Fixes: bcdcf9b ("daemon: Move kubeProxyReplacement init after connect to k8s")
Reported-by: Jarno Rajahalme <jarno@covalent.io>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit 3ab3d68 ]

The documentation about the BPF map scale and limitation is currently
duplicated. The version in the "Introduction" section (intro.rst) was
updated with the most recent changes, while the version in the "Maps"
section (which was split out in #11979 into maps.rst) contains an
outdated version.  Move all the up-to-date info to maps.rst.
Incidentally, this also makes the last sentence in the previous section
"Below we show the following possible flows..." make sense again since
it refers to the figure in section "Kubernetes Integration".

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit 90adda6 ]

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit 6d44f4c ]

As Cilium depends on the CiliumNode information being up to date, the
Cilium Operator should sync those nodes into the KVStore instead of the
k8s nodes. Using k8s nodes is not reliable, as some of the fields set in
these structures are not up to date with the fields set in the
CiliumNodes.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
…fail

[ upstream commit 03c39a0 ]

Fixes: e7d4f5c ("daemon: validate IPv4NativeRoutingCIDR value in DaemonConfig")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit 658f9db ]

Fixes: c496e25 ("eni: Support masquerading")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit fc94aa1 ]

As we are currently running our CI with a CIDR from the Cilium-Operator,
which is "10.0.0.0/16", we should set it as part of our
'nativeRoutingCIDR'.

Fixes: ace902d ("helm: Enable BPF masquerading by default")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
@borkmann
Member

test-backport-1.8

@borkmann
Member

The 4.19 run still has the single bookinfo test failing. As discussed with @aanm and @joestringer, it should not block the PR, and the fix is coming in the next batch in #12190. Once Cilium-PR-Ginkgo-Tests-K8s is done and green, we can merge.

@borkmann borkmann merged commit cb7667a into v1.8 Jun 18, 2020
@borkmann borkmann deleted the pr/v1.8-backport-2020-06-18 branch June 18, 2020 23:49

Labels

kind/backports This PR provides functionality previously merged into master.
