Skip to content

Reduce BPF program complexity for kernel 4.19 (ipv4+ipv6+debug=false+vxlan+kubeProxyReplacement=strict VS. fragment support) #11175

@qmonnet

Description

@qmonnet

@brb reported that enabling a large set of features, even with debug=false, would have BPF from_overlay() fail to pass the verifier because it is bigger than the 4096-instruction limit.

From #11085:

Also, validated on 4.19.57 with ipv4+ipv6+debug=false+vxlan+kubeProxyReplacement=strict - had to disable the ipv4 frags feature to make the verifier happy (otherwise, from-overlay is 26 insn over the limit

Slack thread (#sig-datapath)

martynas Apr 17th

We are trying to enable all tests for test-with-kernel build which runs on 4.19.57 kernel (4k instr limit). There are problems with the BPF programs complexity though. By default, all tests run with --enable-debug=true which has been broken when running with ipv4+ipv6+nodeport+snat combo for quite some time (270 instr over the limit in from-overlay of bpf_overlay.c; I haven't found a commit which would not hit limit yet). If we disable the debug mode, we are still over the limit (14 instr) which is due to the recent ipv4 frag feature (/cc @qmo). So, I'm suggesting to run test-with-kernel with the debug mode disabled for now. The ipv4 frag issue should be fixed though. WDYT?

Joe

Yeah, I think getting coverage on this is better than getting debug messages although I'm sure it'll be painful to debug 😕

qmo

Not sure how to remove insns for fragment support. But I can take a look.

Joe

does --enable-debug both enable the datapath debug config as well as control the cilium-agent logs? I'd want to make sure we at least still get the agent logs about the verifier failures

martynas

I believe that it enables both

Joe

that's my impression as well. But actually maybe it doesn't matter too much for this specific case because any verifier failure should actually print out the warning, shouldn't it?

martynas

yeah, it will crash the agent process anyway

Joe

would need to double-check that we print the verifier output if debug logs are disabled. I'm pretty sure we do if the load comes from bpf/init.sh because of the way the bash tracing works, but I'm not sure about the endpoint bpf loads where we invoke TC directly from Go

martynas

the failures are logged with level=warning, so it should print them before crashing

pchaigno

Is that 14 insns on 4096 that have to be found and removed? Or 14 on the other limit, 128k?

pchaigno

I'm guessing 14/4096. Then maybe we could get rid of the bit shift insns Clang generates?

r0 <<= 32;
r0 s>>= 32;

Almost all of those come from casting the return type of BPF helpers to int. If we keep u64 and bitmatch the MSB instead, we can probably reduce by 1 insn per helper call (for those where we actually check the return code), no?

martynas

it's 14/4096

daniel

Could we do the following: --enable-debug would be split to have the options of --enable-debug=all|datapath|agentso we can select if we want all as today, or only in datapath or only in agent . Then we can run bpf-next with all and 4.19 with agent so we still have debug messages there but avoid hitting the limit for BPF code where debug is not run in user's prod settings anyway.

daniel

And this also has the advantage that we actually test the BPF code with/without debug, so I think it would be a good idea.

martynas

yeah, makes sense

Joe

> the failures are logged with level=warning, so it should print them before crashing

The warning itself, yes. I'm also concerned about the output from TC which IIRC is actually just printed to stdout, but might be gated by a conditional check for the log level


We should check if this issue can still be reproduced now that #11089 has been merged.

If it still occurs, some options to explore to save instructions could be:

  • Save instructions in fragment handling somehow? (Not sure how)
  • Use u64s instead of u32s in code to avoid shifts by clang for zeroing upper half-registers?
  • Split IPv4/IPv6 cases with tail calls.
  • Other?

Metadata

Metadata

Labels

area/datapathImpacts bpf/ or low-level forwarding details, including map management and monitor messages.kind/complexity-issueRelates to BPF complexity or program size issueskind/enhancementThis would improve or streamline existing functionality.needs/triageThis issue requires triaging to establish severity and next steps.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions