v1.8 backports 2020-07-15 (#12536)
Conversation
[ upstream commit a091ba1 ] This avoids complications of the compiler reordering the fields which we also use from Go code, and also allows the use of masks in checks. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 414cadf ] Instead of just a boolean, return the reason of failure in isConnectedAndHasQuorum() Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt>
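The pattern described above — returning the failure reason instead of a bare boolean — can be sketched as below. This is a minimal illustration, not Cilium's actual code; the `client` type and error values are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// Illustrative error values a health check might surface instead of "false".
var (
	errNotConnected = errors.New("not connected to etcd")
	errNoQuorum     = errors.New("no etcd quorum")
)

// client is a hypothetical stand-in for the kvstore client state.
type client struct {
	connected bool
	hasQuorum bool
}

// isConnectedAndHasQuorum returns nil when healthy, or the reason for failure,
// so callers can report it in status output rather than a bare "not ok".
func (c *client) isConnectedAndHasQuorum() error {
	if !c.connected {
		return errNotConnected
	}
	if !c.hasQuorum {
		return errNoQuorum
	}
	return nil
}

func main() {
	c := &client{connected: true, hasQuorum: false}
	if err := c.isConnectedAndHasQuorum(); err != nil {
		fmt.Println("unhealthy:", err) // prints "unhealthy: no etcd quorum"
	}
}
```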
[ upstream commit 5875b77 ] In general, the etcd client library should fail over to a healthy etcd endpoint and quorum errors should automatically resolve. If this does not happen for some reason, report unhealthy status to trigger a restart of the agent. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit b95650b ] Shuffling the list of etcd endpoints on each bootstrap has two positive effects: * Agents in the cluster will connect to different etcd members. * In case etcd fails to fail over due to a bug, a Cilium agent restart has a chance to fail over to a healthy etcd member. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt>
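A bootstrap-time shuffle of the configured endpoints could look like the following sketch. The function name is illustrative, not Cilium's actual code.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// shuffleEndpoints returns a randomly permuted copy of the configured etcd
// endpoints, so that different agents connect to different etcd members and
// an agent restart has a chance to land on a healthy member.
func shuffleEndpoints(eps []string) []string {
	r := rand.New(rand.NewSource(time.Now().UnixNano()))
	shuffled := make([]string, len(eps))
	copy(shuffled, eps)
	r.Shuffle(len(shuffled), func(i, j int) {
		shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
	})
	return shuffled
}

func main() {
	eps := []string{"https://etcd-0:2379", "https://etcd-1:2379", "https://etcd-2:2379"}
	fmt.Println(shuffleEndpoints(eps))
}
```

Shuffling a copy (rather than in place) keeps the original configuration slice untouched for later reuse.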
[ upstream commit 53c17cd ] Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit d7e2076 ] Previously, BPF masquerading could not be performed when running with non-hybrid DSR due to a call to nodeport_nat_fwd() being compiled out. The latter is responsible for SNATing packets from pods to world. Fix: 0962c02 ("datapath: Enable BPF MASQ for veth mode in IPv4") Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 31592c5 ] Host Policies can break Kubernetes if applied incorrectly. If a user unintentionally applies an overly aggressive host-level network policy, it will break access to their nodes and to the Kubernetes control plane. Given the blast radius of a mistake and the undocumented fix, I discussed an initial solution in the docs with @pchaigno and we agreed that a good start is preventing an example policy from loading until the user changes example values. This has the benefit of explicitly requiring some thought from users before applying a policy that could lead to a k8s outage. Happy to discuss alternative ideas here, but the goal is to protect operators from breaking their environments while testing this feature. Signed-off-by: Jed Salazar <jed@isovalent.com> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 9fe5b33 ] Signed-off-by: JieJhih Jhang <jiejhihjhang@gmail.com> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit f6ffe0d ] Name it generically and add another bool to control whether the dst port must be inside or outside of the NodePort range. For the current users, it is set so that they match on the configured NodePort range. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 48bf28a ] Prepare the socket LB for a hostport-related wildcard lookup. The NodePort and HostPort range are exclusive, so the lookup is still fast given we can bail out early w/o map lookup in sock{4,6}_wildcard_lookup(). Aside from the range, we must match on HOST_ID dst address for the hostport lookup. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martynas Pumputis <m@lambda.lt>
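The early bail-out described above can be sketched as follows — in Go for readability, while the real check lives in the BPF C code of sock{4,6}_wildcard_lookup(). Because the NodePort and HostPort ranges are exclusive, a wildcard lookup can return early without any map lookup when the destination port falls outside the range it cares about. All names here are illustrative.

```go
package main

import "fmt"

// portInRange reports whether port lies within [lo, hi].
func portInRange(port, lo, hi uint16) bool {
	return port >= lo && port <= hi
}

// needsLookup reports whether the service map must be consulted at all:
// NodePort lookups only match inside the configured NodePort range, while
// HostPort entries live outside of it, so either way we can bail out early
// for ports in the wrong range.
func needsLookup(dstPort, npMin, npMax uint16, hostPort bool) bool {
	in := portInRange(dstPort, npMin, npMax)
	if hostPort {
		return !in
	}
	return in
}

func main() {
	fmt.Println(needsLookup(31000, 30000, 32767, false)) // NodePort candidate: true
	fmt.Println(needsLookup(8080, 30000, 32767, false))  // bail out early: false
	fmt.Println(needsLookup(8080, 30000, 32767, true))   // HostPort candidate: true
}
```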
[ upstream commit 7555632 ] Ensure that we don't process such packets from outside if they ever land on our node. Instead of doing a more expensive memcmp() on the address, just include this into the test for the underlying type. The latter is still present for the case where the agent is restarted & reconfigured and map entries are still stale before k8s resync. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 81353c6 ] In IPv6 case we already use net.IPv6zero in LoadBalancerNodeAddresses(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 7651ecd ] In order to be compatible with portmap's HostPort mode, we need to expose the host port on all addresses for the case where no hostIP was specified. This also includes the loopback address. For the case where hostIP was specified, we only expose the user-provided IP address under HostPort. A typical portmap rule set looks like (removing CNI-HOSTPORT-SETMARK from here):

```
[...]
[6:408] -A PREROUTING -m addrtype --dst-type LOCAL -j CNI-HOSTPORT-DNAT
[...]
[0:0] -A CNI-DN-1ba85aa0447ede8ba6dab -d 8.8.8.8/32 -p tcp -m tcp --dport 999 -j DNAT --to-destination 10.0.0.217:80
[0:0] -A CNI-DN-489a67438eed03d272897 -d 127.0.0.1/32 -p tcp -m tcp --dport 298 -j DNAT --to-destination 10.0.0.27:80
[2:120] -A CNI-DN-526ac7f749f8260cb0d9f -p tcp -m tcp --dport 666 -j DNAT --to-destination 10.0.0.188:80
[2:120] -A CNI-DN-81de08afe40acdfde3fac -d 192.168.178.29/32 -p tcp -m tcp --dport 777 -j DNAT --to-destination 10.0.0.138:80
[2:120] -A CNI-HOSTPORT-DNAT -p tcp -m comment --comment "dnat name: \"portmap\" id: \"dd111758b3cc7d46e0c027725f4c1cc3d5ac349c6cb1d6ebb919a5f433f155aa\"" -m multiport --dports 666 -j CNI-DN-526ac7f749f8260cb0d9f
[...]
```

In the above example, hostPort 666 has no hostIP specified. hostPort 777 has the hostIP 192.168.178.29 specified; in such a case, access via 127.0.0.1:777 is not possible. hostPort 298 has 127.0.0.1 as hostIP, so it is only available via loopback but not from the outside world. And hostPort 999 has 8.8.8.8 as hostIP; that is not forbidden, but it won't match/xlate if 8.8.8.8 is not a local address on the node. Implement these semantics also for the BPF-based replacement by reworking the k8s watcher's genServiceMappings(). Fixes: #12116 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martynas Pumputis <m@lambda.lt>
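The exposure semantics above can be condensed into a small sketch. This is not Cilium's actual genServiceMappings() implementation, just an illustration of the rule it implements: no hostIP means "all node addresses plus loopback"; an explicit hostIP means "only that address".

```go
package main

import "fmt"

// hostPortFrontends returns the frontend addresses a hostPort should be
// exposed on, following the portmap-compatible semantics: with no hostIP,
// expose on loopback and all node addresses; with a hostIP, expose only on
// that address (even if it is not local to the node, in which case traffic
// simply never matches).
func hostPortFrontends(hostIP string, nodeAddrs []string) []string {
	if hostIP != "" {
		return []string{hostIP}
	}
	return append([]string{"127.0.0.1"}, nodeAddrs...)
}

func main() {
	// Like hostPort 666 above: no hostIP, exposed everywhere including loopback.
	fmt.Println(hostPortFrontends("", []string{"192.168.178.29", "10.0.0.1"}))
	// Like hostPort 777: pinned to one address, so 127.0.0.1:777 won't work.
	fmt.Println(hostPortFrontends("192.168.178.29", nil))
}
```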
[ upstream commit 1550a36 ] Final one to enforce drop of surrogate entries for external traffic. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 56e0e72 ] Extend the batch of hostPort tests to improve coverage. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 651199c ] Some keys can be reused in between GC function calls. To avoid them being GCed we should not mark them as stale keys and wait for the next GC call to be executed. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit fe9ca34 ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 0892086 ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit ea57e36 ] To prevent bursts of security identity deletions in the KVStore, which could cause the Cilium agent to have high CPU usage due to policy calculation, this commit adds a rate limiter for such KVStore deletes. For example, if there are 1000 identities to be GCed, the operator will delete 250 every minute until all 1000 identities are GCed. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 93e4845 ] Simplify the code which handles HostPort's frontend/backend generation to make it less complex. Suggested-by: Andre Martins <andre@cilium.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 3a311ab ] This commit fixes a bug introduced by 9975bba where we were accidentally writing into the object metadata of the local pod store, which should never happen. Doing so could cause k8s pod update events to hand the Update function handlers an old pod structure with those fields modified. Fixes: 9975bba ("pkg/endpoint: fetch pod and namespace labels from local stores") Reported-by: Deepesh Pathak <deepshpathak@gmail.com> Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 657295c ] Since we address NodePort in k8s2 using the DNS proxy port of k8s2 as the source port from k8s1, one round is enough regardless of the backend selection, as in both cases the replies are reverse NATted at k8s2 (where the port conflict was happening before it was fixed by #12248). Fixes: #12336 Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit d88d5f9 ] The linuxNodeHandler.neighByNode map is protected by linuxNodeHandler.mutex in all other code paths. Protect access to it in (*linuxNodeHandler).NodeValidateImplementation as well. Fixes #12460 Fixes: 6c06c51 ("node: Remove permanent ARP entry when remote node is deleted") Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 84dda5f ] We missed this step during the v1.8 release cycle. Clarify the required steps to hopefully reduce the likelihood we skip past it. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 5e9c427 ] 0a9685e introduced the reserved:unmanaged identity, but did not introduce the associated entity. This commit fixes it and allows the following in policies: - fromEntity: - unmanaged The goal is to allow policies to work even before all pods are restarted and managed by Cilium. In particular, this can be an issue with host policies since the host is more likely than pods to need to talk with unmanaged pods (e.g., on GKE right after deploying Cilium). Fixes: 0a9685e ("identity: Introduce reserved:unmanaged identity") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 7e6a7bb ] Do not convert the loopback address into a surrogate, but instead, if explicitly selected, ignore it and dump a warning message to the agent log. Converting to surrogate is not comparable given this would also allow _local_ Pods to connect via the Node's IP. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 2c0da38 ] Document the expected behavior with regards to addressing. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 4d831cd ] Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit cc162e5 ] "Backend not initialized" does not mean much to users. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit a751039 ] The initial status message of the etcd subsystem is:

```
KVStore: Ok   No connection to etcd
```

This can be misleading as it does not indicate whether the etcd session was ever established or not. Clarify this:

```
KVStore: Ok   Waiting for initial connection to be established
```

Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 03b3c10 ] Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 0b0dd55 ] When releasing the etcd connection, an attempt is made to revoke sessions. If the etcd connection is unhealthy, the operation will fail and eventually time out, which can take a long time. Instead of blocking, release the resources in the background. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt>
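The non-blocking release described above follows a common Go pattern: run the revocation in a goroutine with its own timeout so an unhealthy endpoint cannot stall the caller. This is a minimal sketch with illustrative names, not Cilium's actual API.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// releaseInBackground revokes sessions in a goroutine with a bounded
// timeout, so the caller returns immediately even if etcd is unhealthy
// and the revoke would otherwise block until it fails.
func releaseInBackground(revoke func(ctx context.Context) error) {
	go func() {
		ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
		defer cancel()
		if err := revoke(ctx); err != nil {
			fmt.Println("session revoke failed:", err)
		}
	}()
}

// demoRelease shows that the caller is not blocked: it only waits on the
// channel to observe that the background revoke actually ran.
func demoRelease() string {
	done := make(chan struct{})
	releaseInBackground(func(ctx context.Context) error {
		close(done)
		return nil
	})
	<-done
	return "released"
}

func main() {
	fmt.Println(demoRelease())
}
```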
[ upstream commit 3e951e1 ] Good condition:

```
cluster2: ready, 4 nodes, 3 identities, 1 services, 0 failures (last: never)
```

Bad condition:

```
cluster2: not-ready, 0 nodes, 0 identities, 0 services, 1 failures (last: 9s ago)
```

Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt>
[ upstream commit 21783d5 ] Previously, regenerating an endpoint manually via the CLI resulted in compilation errors as seen below. These errors occurred because the endpoint headerfile (ep_config.h or lxc_config.h) was not written to the `<endpoint ID>_next` directory (referred to as "next" below), where the BPF compilation takes place. The reason the headerfile was not written to the "next" directory was that Cilium only wrote the headerfile if it had changed (via hash). However, a manual regeneration triggered through the API sets the regeneration level to "compile+load" (a full regeneration), where Cilium expects all relevant files, including the headerfile, to be present in the "next" directory (`<endpoint ID>_next`). Hence the compilation errors. This commit fixes the issue by checking whether a full regeneration has been requested, in which case the headerfile is written to the "next" directory.

```
root@k8s2:/var/run/cilium/state# clang -emit-llvm -O2 -target bpf -std=gnu89 -nostdinc -D__NR_CPUS__=2 -Wall -Wextra -Werror -Wshadow -Wno-address-of-packed-member -Wno-unknown-warning-option -Wno-gnu-variable-sized-type-not-at-end -Wdeclaration-after-statement -I/var/run/cilium/state/globals -I1285_next -I/var/lib/cilium/bpf -I/var/lib/cilium/bpf/include -c /var/lib/cilium/bpf/bpf_lxc.c -o /tmp/c
In file included from /var/lib/cilium/bpf/bpf_lxc.c:22:
/var/lib/cilium/bpf/lib/icmp6.h:50:29: error: use of undeclared identifier 'NODE_MAC'
        union macaddr smac, dmac = NODE_MAC;
                                   ^
/var/lib/cilium/bpf/lib/icmp6.h:359:30: error: use of undeclared identifier 'NODE_MAC'
        union macaddr router_mac = NODE_MAC;
                                   ^
In file included from /var/lib/cilium/bpf/bpf_lxc.c:26:
/var/lib/cilium/bpf/lib/lxc.h:67:29: error: use of undeclared identifier 'NODE_MAC'
        union macaddr router_mac = NODE_MAC;
                                   ^
/var/lib/cilium/bpf/bpf_lxc.c:159:10: error: implicit declaration of function 'lookup_ip6_remote_endpoint' [-Werror,-Wimplicit-function-declaration]
        info = lookup_ip6_remote_endpoint(&orig_dip);
               ^
/var/lib/cilium/bpf/bpf_lxc.c:159:10: note: did you mean 'lookup_ip6_endpoint'?
/var/lib/cilium/bpf/lib/eps.h:13:1: note: 'lookup_ip6_endpoint' declared here
lookup_ip6_endpoint(struct ipv6hdr *ip6)
^
/var/lib/cilium/bpf/bpf_lxc.c:159:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion]
        info = lookup_ip6_remote_endpoint(&orig_dip);
             ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/var/lib/cilium/bpf/bpf_lxc.c:529:10: error: implicit declaration of function 'lookup_ip4_remote_endpoint' [-Werror,-Wimplicit-function-declaration]
        info = lookup_ip4_remote_endpoint(orig_dip);
               ^
/var/lib/cilium/bpf/bpf_lxc.c:529:10: note: did you mean 'lookup_ip4_endpoint'?
/var/lib/cilium/bpf/lib/eps.h:35:1: note: 'lookup_ip4_endpoint' declared here
lookup_ip4_endpoint(const struct iphdr *ip4)
^
/var/lib/cilium/bpf/bpf_lxc.c:529:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion]
        info = lookup_ip4_remote_endpoint(orig_dip);
             ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/var/lib/cilium/bpf/bpf_lxc.c:1027:10: error: implicit declaration of function 'lookup_ip6_remote_endpoint' [-Werror,-Wimplicit-function-declaration]
        info = lookup_ip6_remote_endpoint(src);
               ^
/var/lib/cilium/bpf/bpf_lxc.c:1027:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion]
        info = lookup_ip6_remote_endpoint(src);
             ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/var/lib/cilium/bpf/bpf_lxc.c:1256:10: error: implicit declaration of function 'lookup_ip4_remote_endpoint' [-Werror,-Wimplicit-function-declaration]
        info = lookup_ip4_remote_endpoint(ip4->saddr);
               ^
/var/lib/cilium/bpf/bpf_lxc.c:1256:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion]
        info = lookup_ip4_remote_endpoint(ip4->saddr);
             ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
11 errors generated.
```

Fixes #10630 Fixes #12005 Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Martynas Pumputis <m@lambda.lt>
Hit also #12555 in the k8s-1.13 run. https://jenkins.cilium.io/job/Cilium-PR-Ginkgo-Tests-K8s/3312/

test-missed-k8s

I believe once we pick up the backport of #12559, the remaining k8s CI test should begin passing again.

(all green except
[ upstream commit 037bef5 ] Previously, we did not take into account 'from-network' sources in the monitor aggregation logic check in `send_trace_notify()`, which was fine because we rarely ever sent such events (limited to ipsec for instance). However, since commit c470e28 we also use this in bpf_host which suddenly means that any and all traffic from the network will trigger monitor events, flooding the monitor output. Fixes: 7a4b0be ("bpf: Add MonitorAggregation option") Fixes: c470e28 ("Adds TRACE_TO_NETWORK obs label and trace pkts in to-netdev prog.") Fixes: #12555 Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt>
test-backport-1.8

@joestringer I've included #12559 into this backport.
qmonnet left a comment:
Looks all good to me, let's wait for the tests and we're good to go!
retest-missed-k8s

Pulling images has failed. E.g.

test-missed-k8s

test-missed-k8s

test-missed-k8s (Damn, the GH UI didn't show that it had already been merged.)