Skip to content

bpf: Fix 4.19 program size and complexity issues#11915

Closed
pchaigno wants to merge 10 commits intomasterfrom
pr/pchaigno/fix-4.19-bpf-program-size
Closed

bpf: Fix 4.19 program size and complexity issues#11915
pchaigno wants to merge 10 commits intomasterfrom
pr/pchaigno/fix-4.19-bpf-program-size

Conversation

@pchaigno
Copy link
Copy Markdown
Member

@pchaigno pchaigno commented Jun 5, 2020

This PR does mainly two things, implemented by the 10 following commits.

Fix program size and complexity issues:

  • Enforce host policies before BPF-based SNAT to allow for following commit fix.
  • Add tail call to fix program size issue.
  • Simplify and extend tail_call.cocci.
  • Remove relax_verifier() that increases complexity.
  • Tail call even if IPv6 is disabled to fix complexity issue.

Enable kube-proxy-replacement=strict in 4.19 CI build:

  • Disable host firewall if BPF-based egress LB is enabled.
  • Add RunsOn419Kernel() helper
  • Helm option for fragment tracking to be able to disable it in CI.
  • Disable fragment tracking in 4.19 CI.
  • Finally: disable kube-proxy in 4.19 CI and use kube-proxy-replacement=strict.

Fixes: #11175.

/cc @brb

@pchaigno pchaigno added the release-note/misc This PR makes changes that have no direct user impact. label Jun 5, 2020
@coveralls
Copy link
Copy Markdown

coveralls commented Jun 5, 2020

Coverage Status

Coverage decreased (-0.007%) to 37.003% when pulling 6b14bad on pr/pchaigno/fix-4.19-bpf-program-size into 55fccf6 on master.

@pchaigno pchaigno force-pushed the pr/pchaigno/fix-4.19-bpf-program-size branch from 03bbcaa to 90c8fa2 Compare June 5, 2020 18:49
@pchaigno pchaigno changed the title bpf: Testing partial host firewall fix bpf: Fix 4.19 program size and complexity issues Jun 5, 2020
@pchaigno pchaigno force-pushed the pr/pchaigno/fix-4.19-bpf-program-size branch from 16150cb to 62e8b9e Compare June 5, 2020 21:30
@michi-covalent
Copy link
Copy Markdown
Contributor

hey @pchaigno i'm still hitting the same issue following https://docs.cilium.io/en/latest/gettingstarted/minikube/

% kubectl exec -it -n kube-system cilium-4rwtz -- cilium version
Client: 1.8.90 62e8b9e9a 2020-06-05T23:20:08+02:00 go version go1.14.4 linux/amd64
Daemon: 1.8.90 62e8b9e9a 2020-06-05T23:20:08+02:00 go version go1.14.4 linux/amd64

level=warning subsys=datapath-loader
level=warning msg="from 1084 to 1093: R0=inv(id=0) R1=inv0 R2=inv1500 R3=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R4=inv128 R6=ctx(id=0,off=0,imm=0) R7=inv3 R8=inv(id=0) R9=inv3 R10=fp0,call_-1 fp-144=0 fp-152=0 fp-192=0 fp-200=0 fp-208=0 fp-216=0 fp-224=map_ptr fp-256=0 fp-272=0" subsys=datapath-loader
level=warning msg="1093: (7b) *(u64 *)(r10 -264) = r2" subsys=datapath-loader
level=warning msg="1094: (7b) *(u64 *)(r10 -224) = r9" subsys=datapath-loader
level=warning msg="1095: (79) r1 = *(u64 *)(r10 -240)" subsys=datapath-loader
level=warning msg="1096: (63) *(u32 *)(r10 -88) = r1" subsys=datapath-loader
level=warning msg="1097: (18) r1 = 0x100000000000040" subsys=datapath-loader
level=warning msg="1099: (7b) *(u64 *)(r10 -96) = r1" subsys=datapath-loader
level=warning msg="1100: (b7) r2 = 0" subsys=datapath-loader
level=warning msg="1101: (7b) *(u64 *)(r10 -80) = r2" subsys=datapath-loader
level=warning msg="1102: (b7) r1 = 0" subsys=datapath-loader
level=warning msg="1103: (7b) *(u64 *)(r10 -240) = r1" subsys=datapath-loader
level=warning msg="1104: (63) *(u32 *)(r10 -84) = r2" subsys=datapath-loader
level=warning msg="1105: (bf) r2 = r10" subsys=datapath-loader
level=warning msg="BPF program is too large. Processed 131073 insn" subsys=datapath-loader

@joestringer joestringer added the needs/e2e-test This issue is not covered by existing CI tests, but should be. label Jun 6, 2020
@pchaigno pchaigno force-pushed the pr/pchaigno/fix-4.19-bpf-program-size branch 3 times, most recently from 12950fd to 04f0c98 Compare June 8, 2020 05:50
pchaigno added 2 commits June 8, 2020 13:41
Signed-off-by: Paul Chaignon <paul@cilium.io>
This fix is required to get the host policies to work properly with
BPF-backed NodePort services. It is not a sufficient fix, but required
to resolve the 4.19 program size issue in subsequent commits, so sending
now.

Fixes: 88bf291 ("bpf: Enforce host policies for IPv4")
Signed-off-by: Paul Chaignon <paul@cilium.io>
@pchaigno pchaigno force-pushed the pr/pchaigno/fix-4.19-bpf-program-size branch from 04f0c98 to 390afe7 Compare June 8, 2020 11:46
@pchaigno
Copy link
Copy Markdown
Member Author

pchaigno commented Jun 8, 2020

test-me-please

@pchaigno
Copy link
Copy Markdown
Member Author

pchaigno commented Jun 8, 2020

  • K8s-1.11-Kernel-netnext fails because the K8sUpdates test fails and seems to bring subsequent tests down with it. From a quick look at all changes, I don't see what could cause this.
  • K8s-1.17-Kernel-4.19 fails because of other complexity issues. The good news is that all complexity issues seem to be confined to section 2/10 (CILIUM_CALL_IPV6_FROM_LXC) of bpf_lxc. I think I've only seen two different program sizes reported, so it could be just two different option combinations that are failing.

I suspect that the relax_verifier() that I removed because it was increasing complexity is maybe sometimes needed...

Copy link
Copy Markdown
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All changes so far look good to me (minor nits inline below), thanks a lot for this work!

Comment thread contrib/coccinelle/tail_calls.cocci Outdated
Comment thread bpf/lib/nodeport.h Outdated
Comment thread install/kubernetes/quick-install.yaml
Comment thread test/k8sT/Services.go
@michi-covalent
Copy link
Copy Markdown
Contributor

@pchaigno i tested this branch on minikube, and it does fix the issue i was seeing 💯 👍

Comment thread bpf/lib/common.h
Comment thread test/helpers/utils.go
Comment on lines +505 to +508
// DoesNotRunOn419Kernel is the complement function of RunsOn419Kernel.
func DoesNotRunOn419Kernel() bool {
return !RunsOn419Kernel()
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can't we just open-code the !RunsOn419Kernel() in the callers?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, probably. I was following the pattern established by other, similar helpers.

Comment thread install/kubernetes/quick-install.yaml
pchaigno added 8 commits June 9, 2020 08:00
On Linux 4.19 (and any kernel with 4096 BPF instructions limit), when
running with kube-proxy-replacement=strict, IPv4, IPv6, fragment support
and in debug mode, Cilium fails to load bpf_overlay and bpf_host with:

  Prog section '2/19' rejected: Argument list too long (7)
   - Type:         3
   - Attach Type:  0
   - Instructions: 4203 (107 over limit)
   - License:      GPL
  Error filling program arrays!

2/19 is section CILIUM_CALL_NODEPORT_REVNAT. We can reduce the number of
instructions in that section by breaking the IPv4 and IPv6 cases with
tail calls. Instead of introducing a new tail call on each path, we can
turn CILIUM_CALL_NODEPORT_REVNAT into CILIUM_CALL_IPV{4,6}_NODEPORT_REVNAT.

Signed-off-by: Paul Chaignon <paul@cilium.io>
This commit extends tail_call.cocci to:
- recognize invoke_tailcall_if() calls followed by 'return ret;', as
  used in nodeport.h;
- refactor the code a bit using disjunctions;
- ensure the return statement we are trying to match (e.g., return
  DROP_MISSED_TAIL_CALL or return send_drop_notify()) is not preceeded by
  another return statement.

Fixes: f02bd2a ("cocci: Detect unlogged missed tail calls")
Signed-off-by: Paul Chaignon <paul@cilium.io>
relax_verifier() is supposed to help the verifier prune branches and
therefore decrease the reported complexity. On 4.19 however, the call to
relax_verifier in CILIUM_CALL_IPV6_FROM_LXC actually increases the
complexity because it adds instructions in complexity-hot paths.

After a quick check, it doesn't seem to be required for 4.9 either, so
let's see if we can remove it.

Signed-off-by: Paul Chaignon <paul@cilium.io>
When IPv6 is disabled (e.g., default quick-install), the verifier rejects
the from-container section of bpf_lxc with a complexity error. This
error happens because when one of IPv6 and IPv4 is disabled, we don't
use a tail call to call tail_handle_ipv{4,6} and from-container gets
just long enough to cause a complexity error.

This commit fixes it by always using a tail call to execute
tail_handle_ipv{4,6}.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
We disable it to avoid hitting complexity issues on that kernel. The
fragment tracking test disables debug mode instead.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
@pchaigno pchaigno force-pushed the pr/pchaigno/fix-4.19-bpf-program-size branch from 390afe7 to 6b14bad Compare June 9, 2020 06:09
@pchaigno
Copy link
Copy Markdown
Member Author

pchaigno commented Jun 9, 2020

retest-net-next

brb added a commit that referenced this pull request Jun 10, 2020
Disable the kube-proxy replacement on the CI 4.19 job until
has been merged #11915, but
keep bpf_sock to avoid bpf_lxc complexity issues.

Also, get rid of non-working kernel vsn setting which complicates
passing of the KERNEL env var to ginkgo test runner.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
brb added a commit that referenced this pull request Jun 11, 2020
Disable the kube-proxy replacement on the CI 4.19 job until
has been merged #11915, but
keep bpf_sock to avoid bpf_lxc complexity issues.

Also, get rid of non-working kernel vsn setting which complicates
passing of the KERNEL env var to ginkgo test runner.

Finally, disable ipsec + vxlan test on 4.19, as it is broken with the
contemporary 4.19 setup (TODO to investigate it).

Signed-off-by: Martynas Pumputis <m@lambda.lt>
aanm pushed a commit that referenced this pull request Jun 11, 2020
Disable the kube-proxy replacement on the CI 4.19 job until
has been merged #11915, but
keep bpf_sock to avoid bpf_lxc complexity issues.

Also, get rid of non-working kernel vsn setting which complicates
passing of the KERNEL env var to ginkgo test runner.

Finally, disable ipsec + vxlan test on 4.19, as it is broken with the
contemporary 4.19 setup (TODO to investigate it).

Signed-off-by: Martynas Pumputis <m@lambda.lt>
aanm pushed a commit that referenced this pull request Jun 11, 2020
[ upstream commit 6beb6b3 ]

Disable the kube-proxy replacement on the CI 4.19 job until
has been merged #11915, but
keep bpf_sock to avoid bpf_lxc complexity issues.

Also, get rid of non-working kernel vsn setting which complicates
passing of the KERNEL env var to ginkgo test runner.

Finally, disable ipsec + vxlan test on 4.19, as it is broken with the
contemporary 4.19 setup (TODO to investigate it).

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: André Martins <andre@cilium.io>
joestringer pushed a commit that referenced this pull request Jun 12, 2020
[ upstream commit 6beb6b3 ]

Disable the kube-proxy replacement on the CI 4.19 job until
has been merged #11915, but
keep bpf_sock to avoid bpf_lxc complexity issues.

Also, get rid of non-working kernel vsn setting which complicates
passing of the KERNEL env var to ginkgo test runner.

Finally, disable ipsec + vxlan test on 4.19, as it is broken with the
contemporary 4.19 setup (TODO to investigate it).

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: André Martins <andre@cilium.io>
@pchaigno
Copy link
Copy Markdown
Member Author

Closing in favor of #11977 and #12045.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs/e2e-test This issue is not covered by existing CI tests, but should be. release-note/misc This PR makes changes that have no direct user impact.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Reduce BPF program complexity for kernel 4.19 (ipv4+ipv6+debug=false+vxlan+kubeProxyReplacement=strict VS. fragment support)

6 participants