
testing: final conversion 4.19 CI to kube-proxy free#12045

Merged
borkmann merged 14 commits into master from pr/fix-4.19-verifier2
Jun 16, 2020

Conversation

@borkmann
Member

@borkmann borkmann commented Jun 12, 2020

Re-enabling probe mode on 4.19.

Fixes: #11175

@maintainer-s-little-helper

Please set the appropriate release note label.

@borkmann
Member Author

retest-4.19

@borkmann borkmann changed the title from [no not merge] testing only to testing: final conversion 4.19 CI to kube-proxy free Jun 12, 2020
@borkmann borkmann force-pushed the pr/fix-4.19-verifier2 branch from 65c3fe4 to 994fe15 on June 12, 2020 14:05
@borkmann borkmann added the release-note/misc This PR makes changes that have no direct user impact. label Jun 12, 2020
@borkmann borkmann added area/CI Continuous Integration testing issue or flake and removed pending-review labels Jun 12, 2020
@coveralls

coveralls commented Jun 12, 2020

Coverage Status

Coverage decreased (-0.04%) to 36.999% when pulling beb8ee0 on pr/fix-4.19-verifier2 into e4bd54c on master.

@borkmann borkmann force-pushed the pr/fix-4.19-verifier2 branch 4 times, most recently from 0bd37b3 to 248b222 on June 12, 2020 22:17
@borkmann
Member Author

test-focus K8sUpdates

@borkmann borkmann requested review from joestringer and pchaigno June 12, 2020 22:24
@borkmann
Member Author

test-me-please

@borkmann borkmann added needs-backport/1.8 area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. labels Jun 12, 2020
@borkmann
Member Author

test-focus K8sUpdates

@borkmann
Member Author

test-me-please

Comment thread bpf/bpf_host.c
borkmann and others added 12 commits June 16, 2020 13:11
In XDP, we can only xmit via XDP_TX, so a multi-device setup would
otherwise be broken. Tell the user to specify the device manually
in case there were multiple devices detected.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
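The guard described above can be modeled with a small sketch. This is an illustrative Go helper, not Cilium's actual device-detection code; the function name and the `--device` flag wording are assumptions for the example. The point it demonstrates: since XDP can only transmit via XDP_TX on the same device, auto-detection must refuse to silently pick one of several candidates.

```go
package main

import (
	"errors"
	"fmt"
)

// selectXDPDevice sketches the rule from the commit message: with a single
// detected device, use it; with several, bail out and ask the user to pick
// one explicitly, since XDP_TX can only xmit on the attached device.
// (Hypothetical helper, not Cilium's real API.)
func selectXDPDevice(detected []string) (string, error) {
	switch len(detected) {
	case 0:
		return "", errors.New("no device detected; please specify one via --device")
	case 1:
		return detected[0], nil
	default:
		return "", fmt.Errorf("multiple devices detected (%v); XDP can only xmit via XDP_TX on a single device, please specify one via --device", detected)
	}
}

func main() {
	if _, err := selectXDPDevice([]string{"eth0", "eth1"}); err != nil {
		fmt.Println("error:", err)
	}
	dev, _ := selectXDPDevice([]string{"eth0"})
	fmt.Println("using", dev)
}
```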
Otherwise it is just confusing for users to figure out whether they need
to react to this or not. Only really warn if we cannot auto-derive it
from the k8s Node IP, since that case would be unexpected.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add a few notes on how to deal with nodes that have multiple auto-detected
devices. Also document this as a limitation.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add a small paragraph to help users verify whether it is enabled or not.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Make users aware that they might need to tweak their map sizes and
link to the guide which has more details on it.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Commit b2c9b07 ("bpf, sock: reduce xlation for externalTrafficPolicy=Local
to host_id") skips service translation for remote externalTrafficPolicy=Local
services (i.e., those not in HOST_ID ipcache scope).

However, we later relaxed bpf_lxc to compile legacy service xlation back in
for services with externalIPs via 6426c86 ("bpf, external ip: fix service
xlation for containers"), which would then go and try to xlate such services
locally.

The issue is that we don't differentiate between the different service types
here, so we would attempt to xlate regardless. Given we don't populate service
maps with remote backends, that traffic gets blackholed in bpf_lxc.

Instead, only xlate externalIP services at that layer and skip the rest, so we
let externalTrafficPolicy=Local destinations pass through and be xlated at the
remote LB. We /already/ have this kind of behavior when netns cookies are
available, since the bpf_lxc xlation is compiled out entirely there, but not
on older kernels; thus, fix it to make both consistent.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
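The dispatch rule from the commit message above can be sketched as a small decision function. This is a hedged Go model with invented names (`svcFlags`, `xlateInLXC`), not Cilium's BPF C code: at the bpf_lxc layer only externalIP services are translated, while externalTrafficPolicy=Local destinations pass through so the remote LB can translate them.

```go
package main

import "fmt"

// svcFlags models the two service properties the commit message
// distinguishes (illustrative type, not Cilium's real datapath state).
type svcFlags struct {
	hasExternalIP bool // service carries externalIPs
	extLocal      bool // externalTrafficPolicy=Local with a remote backend
}

// xlateInLXC decides whether bpf_lxc should perform service translation.
func xlateInLXC(f svcFlags) bool {
	// Translating extLocal services here would blackhole the traffic,
	// since the local service maps hold no remote backends for them;
	// let them pass through and be xlated at the remote LB instead.
	if f.extLocal {
		return false
	}
	return f.hasExternalIP
}

func main() {
	fmt.Println(xlateInLXC(svcFlags{hasExternalIP: true}))
	fmt.Println(xlateInLXC(svcFlags{hasExternalIP: true, extLocal: true}))
}
```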
On Linux 4.19 (and any kernel with 4096 BPF instructions limit), when
running with kube-proxy-replacement=strict, IPv4, IPv6, fragment support
(and in debug mode), Cilium fails to load bpf_overlay and bpf_host with:

  Prog section '2/19' rejected: Argument list too long (7)
   - Type:         3
   - Attach Type:  0
   - Instructions: 4203 (107 over limit)
   - License:      GPL
  Error filling program arrays!

2/19 is section CILIUM_CALL_NODEPORT_REVNAT. We can reduce the number of
instructions in that section by breaking the IPv4 and IPv6 cases with
tail calls. Instead of introducing a new tail call on each path, we can
turn CILIUM_CALL_NODEPORT_REVNAT into CILIUM_CALL_IPV{4,6}_NODEPORT_REVNAT.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
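The tail-call split above can be modeled in miniature. This is an illustrative Go sketch only; the real change lives in Cilium's BPF C, where `bpf_tail_call()` jumps through a program array. The idea it shows: instead of one combined CILIUM_CALL_NODEPORT_REVNAT program that exceeds the 4096-instruction limit, the caller dispatches by protocol into two smaller per-protocol programs.

```go
package main

import "fmt"

// Indices into the (modeled) tail-call program array, mirroring the split
// of CILIUM_CALL_NODEPORT_REVNAT into per-protocol sections.
const (
	callIPv4NodeportRevnat = iota
	callIPv6NodeportRevnat
)

// progArray stands in for the BPF tail-call program array; each entry is
// now a smaller program that stays under the verifier's instruction limit.
var progArray = map[int]func(pkt string) string{
	callIPv4NodeportRevnat: func(pkt string) string { return "rev-NAT v4: " + pkt },
	callIPv6NodeportRevnat: func(pkt string) string { return "rev-NAT v6: " + pkt },
}

// dispatch plays the role of the caller doing bpf_tail_call() on the
// protocol-specific section instead of one combined one.
func dispatch(isIPv6 bool, pkt string) string {
	if isIPv6 {
		return progArray[callIPv6NodeportRevnat](pkt)
	}
	return progArray[callIPv4NodeportRevnat](pkt)
}

func main() {
	fmt.Println(dispatch(false, "10.0.0.1->svc"))
	fmt.Println(dispatch(true, "fd00::1->svc"))
}
```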
Given v4/v6 have been split off into different tail calls, this enables
us to significantly bump the collision retries on older kernels.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
... as otherwise it won't work on a multi-device node. Revert back to
auto-detection after the XDP tests have run.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Re-enable it such that we can run CI with kube-proxy-free BPF code alongside
kube-proxy.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Prior tests have been run with the latest Cilium image. Using the old one
for the initial cleanup is broken since it will not clean up everything.

For example, the 1.7 cleanup method will try to remove tc-attached filters
named bpf_netdev.o, but for 1.8/latest they are named bpf_netdev_<dev>.o.
This then breaks connectivity after flushing all files from the BPF fs, as
tail call maps are purged, which gets us into drop-all due to missed tail
calls.

Run the new (and old) cleanup to make sure there are no quirks left behind
from the old version that the new one didn't catch.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
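The naming mismatch described above can be sketched with a tiny hypothetical helper (the function name is invented; only the object-file names come from the commit message): a 1.7-era cleanup that only matches the old single object name leaves the per-device objects behind.

```go
package main

import "fmt"

// netdevObjName models the tc filter object naming: Cilium 1.7 attached a
// single "bpf_netdev.o", while 1.8/latest uses one object per device,
// "bpf_netdev_<dev>.o". (Hypothetical helper for illustration.)
func netdevObjName(version, dev string) string {
	if version == "1.7" {
		return "bpf_netdev.o"
	}
	return fmt.Sprintf("bpf_netdev_%s.o", dev)
}

func main() {
	oldName := netdevObjName("1.7", "eth0")
	newName := netdevObjName("1.8", "eth0")
	// A cleanup matching only oldName misses newName; once the BPF fs is
	// flushed anyway, the surviving filter's tail call maps are gone and
	// packets hit missed-tail-call drops.
	fmt.Println(oldName, newName, "cleanup misses new object:", oldName != newName)
}
```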
... and instead rely on kube-proxy's strict replacement.

Also disable tests that are not working on 4.19 due to missing kernel
features on NodePort side.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
@borkmann borkmann force-pushed the pr/fix-4.19-verifier2 branch from 78e6f32 to beb8ee0 on June 16, 2020 11:14
@borkmann
Member Author

test-me-please

@borkmann
Member Author

retest-4.9

@borkmann
Member Author

4.19 has a single test that is flaking (K8sDatapathConfig Etcd Check connectivity). Was green on prior test run.

@borkmann borkmann merged commit 0fa8ac7 into master Jun 16, 2020
@borkmann borkmann deleted the pr/fix-4.19-verifier2 branch June 16, 2020 13:17
qmonnet added a commit that referenced this pull request Jun 17, 2020
The test for fragment tracking support got a fix with commit
0e772e7 ("test: Fix fragment tracking test under KUBEPROXY=1"),
where the pattern for searching entries in the Conntrack table accounts
for DNAT not happening if kube-proxy is present.

Following recent changes in the datapath and tests for the Linux 4.19
kernel, DNAT is now used even with kube-proxy, provided bpf_sock is in
use. This led to CI failures, and the test was disabled for 4.19 kernels
with commit 1120aed ("test/K8sServices: disable fragment tracking
test for kernel 4.19").

Now that complexity issues are fixed (see #11977 and #12045), let's
enable the test on 4.19 again. Ignore DNAT only if kube-proxy is present
and bpf_sock (host-reachable services) is not in use. This is also the
case for net-next kernels (this didn't fail in CI before because we do
not test with kube-proxy on net-next).

Note that (as far as I know) both 4.19 and net-next always use bpf_sock
in CI runs, so the check on hostReachableServices is currently
superfluous. Let's have it all the same, in case something changes in
the future, to avoid unexpected breakage.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
qmonnet added a commit that referenced this pull request Jun 18, 2020
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
aanm pushed a commit that referenced this pull request Jun 21, 2020
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
aanm pushed a commit that referenced this pull request Jun 21, 2020
[ upstream commit 5b9503f ]

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: André Martins <andre@cilium.io>
aanm pushed a commit that referenced this pull request Jun 22, 2020
[ upstream commit 5b9503f ]

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: André Martins <andre@cilium.io>

Labels

area/CI Continuous Integration testing issue or flake
area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
release-note/misc This PR makes changes that have no direct user impact.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Reduce BPF program complexity for kernel 4.19 (ipv4+ipv6+debug=false+vxlan+kubeProxyReplacement=strict VS. fragment support)

4 participants