Conversation
Relates: cilium/image-tools#281 Signed-off-by: Tam Mach <tam.mach@cilium.io>
Relates: cilium/image-tools#281 Signed-off-by: Tam Mach <tam.mach@cilium.io>
Relates: cilium/image-tools#281 Signed-off-by: Tam Mach <tam.mach@cilium.io>
eeac455 to
9226802
Compare
|
Do you have output for the test failure? |
Yes, it can be found here https://github.com/cilium/cilium/actions/runs/9589413585 |
9226802 to
a946c12
Compare
|
What is the ubuntu base image used for? It's pretty bad if the verifier test fails with a complexity error, so I'm hesitant to just keep the old version to "fix" it. cc @ti-mo seems like maybe we're using a different clang now? i was under the impression that we bake in our own clang? |
a946c12 to
a843c76
Compare
Relates: cilium/image-tools#281 Signed-off-by: Tam Mach <tam.mach@cilium.io>
Relates: cilium/image-tools#281 Signed-off-by: Tam Mach <tam.mach@cilium.io>
Relates: cilium/image-tools#281 Signed-off-by: Tam Mach <tam.mach@cilium.io>
Relates: cilium/image-tools#281 Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: Tam Mach <tam.mach@cilium.io>
a843c76 to
7a6e713
Compare
This reverts commit 11cfa1a to avoid the below verifier issue in cilium/cilium upstream ``` --- FAIL: TestVerifier/bpf_host (111.35s) --- FAIL: TestVerifier/bpf_host/1 (24.05s) --- FAIL: TestVerifier/bpf_host/2 (21.32s) ... ``` Sample error log for TestVerifier/bpf_host/1 ``` verifier_test.go:244: Error: program tail_nodeport_nat_ingress_ipv4: load program: argument list too long: BPF program is too large. Processed 1000001 insn (1445 line(s) omitted) Verifier error tail: load program: argument list too long: (1436 line(s) omitted) ; struct ipv4_ct_tuple icmp_tuple = { 1201: (63) *(u32 *)(r10 -12) = r1 ; .flags = tuple->flags | TUPLE_F_RELATED, 1202: (71) r1 = *(u8 *)(r10 -123) ; .flags = tuple->flags | TUPLE_F_RELATED, 1203: (44) w1 |= 2 ; struct ipv4_ct_tuple icmp_tuple = { 1204: (73) *(u8 *)(r10 -3) = r1 BPF program is too large. Processed 1000001 insn processed 1000001 insns (limit 1000000) max_states_per_insn 17 total_states 55755 peak_states 1015 mark_read 59 ``` Sample error log for TestVerifier/bpf_host/2 ``` verifier_test.go:244: Error: program tail_nodeport_nat_ingress_ipv4: load program: argument list too long: BPF program is too large. Processed 1000001 insn (1445 line(s) omitted) Verifier error tail: load program: argument list too long: (1436 line(s) omitted) ; struct ipv4_ct_tuple icmp_tuple = { 1201: (63) *(u32 *)(r10 -12) = r1 ; .flags = tuple->flags | TUPLE_F_RELATED, 1202: (71) r1 = *(u8 *)(r10 -123) ; .flags = tuple->flags | TUPLE_F_RELATED, 1203: (44) w1 |= 2 ; struct ipv4_ct_tuple icmp_tuple = { 1204: (73) *(u8 *)(r10 -3) = r1 BPF program is too large. Processed 1000001 insn processed 1000001 insns (limit 1000000) max_states_per_insn 17 total_states 55755 peak_states 1015 mark_read 59 ```
7a6e713 to
ab1f370
Compare
Relates: cilium/image-tools#281 Signed-off-by: Tam Mach <tam.mach@cilium.io>
The failure is actually due to #279. I think the best way is to revert llvm version bump, and proceed with ubuntu upgrade. |
That sucks, I don't like the idea of blindly revering the compiler version, since quite a lot of time has gone into that upgrade. I would like to involve @gentoo-root first before we do that revert. |
Yeah, after I found out about clang version, knowing Max's good work, I would have expected that Max was trying to upgrade in cilium/cilium as per below (which is having same failures as my testing). IMO, it's pretty safe to revert LLVM 1.18 commit, and tackle the verifer issue later. |
|
Since a previous LLVM version seems to work, I'm not sure about this report: it suggests that |
There was a problem hiding this comment.
I would have expected that Max was trying to upgrade in cilium/cilium as per below (which is having same failures as my testing). IMO, it's pretty safe to revert LLVM 1.18 commit, and tackle the verifer issue later.
Ah, I was of the believe that we already were on Clang 18 on cilium/cilium, if thats not the case than we can indeed revert to 17 here, for now. Sorry for the noise
We actually have the error log >1mil statements. Additionally, net-next is actually passed, only 5.10, 5.15 and 6.1 failed. I did a little bit investigation, it seems to be related to ENABLE_IPV6 flag in cilium bpf code, which triggers some complexity issue. Full error log can be found in the below GHA. |
|
I looked for similar issues in the tracker and this is what I found:
Interestingly,
|
|
I created https://github.com/cilium/image-tools/tree/ubuntu-22.04 to start the fork from before we merge this, so future fixes for Cilium v1.16 or older can go into that branch. From v1.17 onwards we can start adopting this. |
Relates: cilium/image-tools#281 Signed-off-by: Tam Mach <tam.mach@cilium.io>
Thanks for your input, I would defer this investigation to datapath experts (e.g. @gentoo-root or other datapath team member), as it might take more time for newbie like myself 😅. PS: Kudos to LVH team with awesome steps, local feedback loop is superb. |
Relates: cilium/image-tools#281 Signed-off-by: Tam Mach <tam.mach@cilium.io>
Testing was done in cilium/cilium#33264