testing: final conversion 4.19 CI to kube-proxy free #12045
Merged
Please set the appropriate release note label.
Author (member): retest-4.19
Branch updated from 65c3fe4 to 994fe15.
Branch updated from 0bd37b3 to 248b222.
Author (member): test-focus K8sUpdates
Author (member): test-me-please
Author (member): test-focus K8sUpdates
Author (member): test-me-please
pchaigno reviewed on Jun 13, 2020.
Branch updated from fa455e4 to ba58905.
In XDP, we can only xmit via XDP_TX, so a multi-device setup would otherwise be broken. Tell the user to specify the device manually in case there were multiple devices detected. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
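The device rule in this commit can be sketched in Go. This is a minimal illustration, not Cilium's actual code: selectXDPDevice and its parameters are hypothetical, and the zero-device branch is an assumption added only to make the function total. It shows the intended behavior: an explicitly configured device wins, a single auto-detected device is accepted, and multiple detected devices produce an error asking the user to pick one.

```go
package main

import (
	"errors"
	"fmt"
)

// selectXDPDevice is a hypothetical helper, not Cilium's actual code.
// XDP can only transmit via XDP_TX, i.e. back out of the receiving device,
// so NodePort acceleration cannot forward between several devices.
func selectXDPDevice(configured string, detected []string) (string, error) {
	// A device set explicitly by the user always takes precedence.
	if configured != "" {
		return configured, nil
	}
	switch len(detected) {
	case 1:
		// Exactly one auto-detected device: safe to use for XDP.
		return detected[0], nil
	case 0:
		// Assumption for illustration: nothing usable was found.
		return "", errors.New("no device detected, please specify one manually")
	default:
		// Several candidates: ask the user to pick one instead of guessing.
		return "", fmt.Errorf("multiple devices detected %v, please specify the XDP device manually", detected)
	}
}

func main() {
	if _, err := selectXDPDevice("", []string{"eth0", "eth1"}); err != nil {
		fmt.Println("error:", err)
	}
	dev, _ := selectXDPDevice("eth1", []string{"eth0", "eth1"})
	fmt.Println("using", dev)
}
```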
Otherwise it is just confusing for users to figure out whether they need to act on this or not. Only warn if we cannot auto-derive it from the k8s Node IP, since that case would be unexpected. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add a few notes on how to deal with nodes that have multiple auto-detected devices. Also make it clear that this is a limitation. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add a small paragraph to help users verify whether it is enabled or not. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Make users aware that they might need to tweak their map sizes and link to the guide which has more details on it. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Commit b2c9b07 ("bpf, sock: reduce xlation for externalTrafficPolicy=Local to host_id") skips service translation for remote externalTrafficPolicy=Local services (i.e. those not in HOST_ID ipcache scope). However, we later relaxed bpf_lxc to compile legacy service xlation back in for services with externalIPs via 6426c86 ("bpf, external ip: fix service xlation for containers"), which then tries to xlate such services locally. The issue is that this path doesn't differentiate between service types, so it attempts to xlate all of them. Since we don't populate the service maps with remote backends, that traffic gets blackholed in bpf_lxc. Instead, only xlate externalIP services at that layer and skip the rest, so that externalTrafficPolicy=Local destinations pass through and get xlated at the remote LB. We /already/ have this behavior when netns cookies are available, since the bpf_lxc xlation is compiled out entirely there, but not on older kernels; this fix makes both consistent. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
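The before/after decision in bpf_lxc can be modeled in a few lines of Go. This is a sketch under simplifying assumptions: the real logic is BPF C, and ServiceKind, xlateBefore and xlateAfter are hypothetical names that only cover services reaching the legacy per-packet path on kernels without netns cookies.

```go
package main

import "fmt"

// ServiceKind is a hypothetical simplification of the service flags the
// datapath keeps; ClusterIP is left out because it is not part of this fix.
type ServiceKind int

const (
	NodePort ServiceKind = iota
	LoadBalancer
	ExternalIP
)

func (k ServiceKind) String() string {
	return [...]string{"NodePort", "LoadBalancer", "ExternalIP"}[k]
}

// xlateBefore models the buggy behavior: every service that reached the
// legacy bpf_lxc path was translated, even when only remote backends exist.
func xlateBefore(ServiceKind) bool { return true }

// xlateAfter models the fix: only externalIP services are translated in
// bpf_lxc; the rest pass through and are translated at the remote LB.
func xlateAfter(k ServiceKind) bool { return k == ExternalIP }

func main() {
	for _, k := range []ServiceKind{NodePort, LoadBalancer, ExternalIP} {
		fmt.Printf("%-12s before=%v after=%v\n", k, xlateBefore(k), xlateAfter(k))
	}
}
```

Passing traffic through untouched is what lets an externalTrafficPolicy=Local service reach the node that actually holds its local backends.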
On Linux 4.19 (and any kernel with 4096 BPF instructions limit), when
running with kube-proxy-replacement=strict, IPv4, IPv6, fragment support
(and in debug mode), Cilium fails to load bpf_overlay and bpf_host with:
Prog section '2/19' rejected: Argument list too long (7)
- Type: 3
- Attach Type: 0
- Instructions: 4203 (107 over limit)
- License: GPL
Error filling program arrays!
2/19 is section CILIUM_CALL_NODEPORT_REVNAT. We can reduce the number of
instructions in that section by splitting the IPv4 and IPv6 cases into
separate tail calls. Instead of introducing a new tail call on each path,
we can turn CILIUM_CALL_NODEPORT_REVNAT into CILIUM_CALL_IPV{4,6}_NODEPORT_REVNAT.
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
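To make the split concrete, here is a Go model (not BPF C) of a prog-array dispatch. The slot numbers, the packet type and the handler bodies are hypothetical; only the 4096-instruction limit and the 4203-instruction figure come from the log above. Each tail-call target is loaded and verified as its own program, so dispatching per address family keeps each program under the limit instead of verifying both paths in one.

```go
package main

import "fmt"

// Instruction limit on 4.19-era kernels, as quoted in the loader error.
const maxInsns = 4096

// overLimit reproduces the arithmetic in the loader error.
func overLimit(insns int) int { return insns - maxInsns }

// Hypothetical tail-call slot numbers; the real CILIUM_CALL_* values are
// defined in Cilium's BPF headers and are not reproduced here.
const (
	callIPv4NodePortRevNAT = 1
	callIPv6NodePortRevNAT = 2
)

type packet struct{ ipv6 bool }

// progArray mimics a BPF_MAP_TYPE_PROG_ARRAY: each slot holds a separately
// verified program.
var progArray = map[int]func(packet) string{
	callIPv4NodePortRevNAT: func(packet) string { return "ipv4 revnat" },
	callIPv6NodePortRevNAT: func(packet) string { return "ipv6 revnat" },
}

// tailCall jumps into the program stored at slot idx; an empty slot means
// the tail call fails and the caller has to handle it.
func tailCall(idx int, p packet) (string, bool) {
	prog, ok := progArray[idx]
	if !ok {
		return "", false
	}
	return prog(p), true
}

// dispatchRevNAT selects the per-family program instead of running one
// combined program that handles both IPv4 and IPv6.
func dispatchRevNAT(p packet) (string, bool) {
	if p.ipv6 {
		return tailCall(callIPv6NodePortRevNAT, p)
	}
	return tailCall(callIPv4NodePortRevNAT, p)
}

func main() {
	fmt.Println("over limit by", overLimit(4203))
	res, _ := dispatchRevNAT(packet{ipv6: true})
	fmt.Println(res)
}
```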
Now that v4/v6 have been split off into different tail calls, we can significantly bump the collision retries on older kernels. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
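The commit does not spell out the mechanism, so this Go sketch rests on two labeled assumptions: that "collision retries" refers to searching for a free source port during SNAT, and that on older kernels without bounded-loop support the retry loop is unrolled, so every extra retry costs instructions that the v4/v6 split freed up. allocPort and its parameters are hypothetical.

```go
package main

import "fmt"

// allocPort is a hypothetical model of a bounded collision-retry loop:
// starting at start, it probes ports in [min, max] (wrapping around) and
// returns the first one not in use, giving up after `retries` attempts.
func allocPort(inUse map[uint16]bool, start, min, max uint16, retries int) (uint16, bool) {
	port := start
	for i := 0; i < retries; i++ {
		if !inUse[port] {
			return port, true
		}
		// Advance to the next candidate, wrapping at the top of the range.
		if port == max {
			port = min
		} else {
			port++
		}
	}
	return 0, false
}

func main() {
	busy := map[uint16]bool{1000: true, 1001: true, 1002: true, 1003: true}
	// A small budget runs out before reaching a free port.
	_, ok := allocPort(busy, 1000, 1000, 2000, 4)
	fmt.Println("4 retries:", ok)
	// A larger budget finds one.
	p, ok := allocPort(busy, 1000, 1000, 2000, 8)
	fmt.Println("8 retries:", p, ok)
}
```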
... as otherwise it won't work on multi-device nodes. Revert to auto-detection after the XDP tests have run. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Re-enable it such that we can run CI with kube-proxy-free BPF code alongside kube-proxy. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Prior tests have been run with the latest Cilium image. Using the old one for the initial cleanup is broken since it will not clean up everything. For example, the 1.7 cleanup method tries to remove tc-attached filters named bpf_netdev.o, but on 1.8/latest they are named bpf_netdev_<dev>.o. Those filters are left behind, which breaks connectivity after all files are flushed from the BPF fs: the tail call maps are purged, and missed tail calls put us into drop-all. Run the new (and old) cleanup to make sure no quirks from the old version are left behind that the new one didn't catch. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
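The naming mismatch can be illustrated with plain string matching. In this Go sketch, matchesOldCleanup and matchesNewCleanup are hypothetical stand-ins for the 1.7 and 1.8/latest cleanup logic; only the two filter naming schemes come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesOldCleanup mimics a 1.7-style cleanup that only looks for the
// single object name bpf_netdev.o.
func matchesOldCleanup(name string) bool { return name == "bpf_netdev.o" }

// matchesNewCleanup also accepts the per-device names bpf_netdev_<dev>.o.
func matchesNewCleanup(name string) bool {
	if name == "bpf_netdev.o" {
		return true
	}
	return strings.HasPrefix(name, "bpf_netdev_") && strings.HasSuffix(name, ".o")
}

// leftovers returns the filters that a given cleanup would fail to remove.
func leftovers(filters []string, match func(string) bool) []string {
	var missed []string
	for _, f := range filters {
		if !match(f) {
			missed = append(missed, f)
		}
	}
	return missed
}

func main() {
	attached := []string{"bpf_netdev_eth0.o", "bpf_netdev_eth1.o"}
	fmt.Println("old cleanup misses:", leftovers(attached, matchesOldCleanup))
	fmt.Println("new cleanup misses:", leftovers(attached, matchesNewCleanup))
}
```

Filters that survive the old cleanup keep pointing into tail call maps that the subsequent BPF fs flush removes, which is why running the new cleanup first matters.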
... and instead rely on kube-proxy's strict replacement. Also disable tests that are not working on 4.19 due to missing kernel features on NodePort side. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Branch updated from 78e6f32 to beb8ee0.
Author (member): test-me-please
Author (member): retest-4.9
Author (member): 4.19 has a single flaky test (K8sDatapathConfig Etcd Check connectivity). It was green on the prior test run.
qmonnet added a commit that referenced this pull request on Jun 17, 2020.
The test for fragment tracking support got a fix with commit 0e772e7 ("test: Fix fragment tracking test under KUBEPROXY=1"), where the pattern for searching entries in the Conntrack table accounts for DNAT not happening if kube-proxy is present. Following recent changes in the datapath and tests for the Linux 4.19 kernel, DNAT is now used even with kube-proxy, provided bpf_sock is in use. This led to CI failures, and the test was disabled for 4.19 kernels with commit 1120aed ("test/K8sServices: disable fragment tracking test for kernel 4.19"). Now that complexity issues are fixed (see #11977 and #12045), let's enable the test on 4.19 again. Ignore DNAT only if kube-proxy is present and bpf_sock (host-reachable services) is not in use. This is also the case for net-next kernels (this didn't fail in CI before because we do not test with kube-proxy on net-next). Note that (as far as I know) both 4.19 and net-next always use bpf_sock in CI runs, so the check on hostReachableServices is currently superfluous. Let's have it all the same, in case something changes in the future, to avoid unexpected breakage. Signed-off-by: Quentin Monnet <quentin@isovalent.com>
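The new test condition reduces to a small boolean rule. This Go sketch uses hypothetical parameter names (kubeProxy, hostReachableServices) and is not the actual Ginkgo test code; it only encodes the rule stated in the commit message.

```go
package main

import "fmt"

// expectDNAT reports whether conntrack entries in the fragment-tracking
// test should show DNAT. Following the commit message, DNAT is ignored
// only when kube-proxy is present and bpf_sock (host-reachable services)
// is not in use; otherwise DNAT is expected.
func expectDNAT(kubeProxy, hostReachableServices bool) bool {
	ignoreDNAT := kubeProxy && !hostReachableServices
	return !ignoreDNAT
}

func main() {
	for _, kp := range []bool{false, true} {
		for _, hrs := range []bool{false, true} {
			fmt.Printf("kubeProxy=%v hostReachableServices=%v expectDNAT=%v\n",
				kp, hrs, expectDNAT(kp, hrs))
		}
	}
}
```

Keeping the hostReachableServices term, even though CI currently always uses bpf_sock on these kernels, is what protects the test if that configuration changes.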
qmonnet added a commit that referenced this pull request on Jun 18, 2020.
aanm pushed a commit that referenced this pull request on Jun 21, 2020.
aanm pushed a commit that referenced this pull request on Jun 21, 2020.
[ upstream commit 5b9503f ] The test for fragment tracking support got a fix with commit 0e772e7 ("test: Fix fragment tracking test under KUBEPROXY=1"), where the pattern for searching entries in the Conntrack table accounts for DNAT not happening if kube-proxy is present. Following recent changes in the datapath and tests for the Linux 4.19 kernel, DNAT is now used even with kube-proxy, provided bpf_sock is in use. This led to CI failures, and the test was disabled for 4.19 kernels with commit 1120aed ("test/K8sServices: disable fragment tracking test for kernel 4.19"). Now that complexity issues are fixed (see #11977 and #12045), let's enable the test on 4.19 again. Ignore DNAT only if kube-proxy is present and bpf_sock (host-reachable services) is not in use. This is also the case for net-next kernels (this didn't fail in CI before because we do not test with kube-proxy on net-next). Note that (as far as I know) both 4.19 and net-next always use bpf_sock in CI runs, so the check on hostReachableServices is currently superfluous. Let's have it all the same, in case something changes in the future, to avoid unexpected breakage. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: André Martins <andre@cilium.io>
aanm pushed a commit that referenced this pull request on Jun 22, 2020.
Re-enabling probe mode on 4.19.
Fixes: #11175