workflows/tests-datapath-verifier: Test mcpu v3 with RHEL 8.6 kernel#40390

Merged
dylandreimerink merged 3 commits into main from pr/dylandreimerink/rhel-mcpu-v3 on Jul 9, 2025

@dylandreimerink
Member

@dylandreimerink dylandreimerink commented Jul 7, 2025

Currently, when compiling the verifier tests, we select an mcpu value based on a kernel version number passed in by the test matrix. For RHEL 8.6 this was set to 54. For 54 specifically, we compiled with mcpu=v2 instead of the mcpu=v3 we use for all other kernels. However, while working on #40367, I discovered that the probe we use at runtime to detect the mcpu version reports that RHEL 8.6 should use v3 as well.
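To make the mismatch concrete, here is a minimal sketch of the selection rule described above, assuming the matrix passes the kernel value as a string. The function name `mcpu_version_for_kernel` is hypothetical; the real CI wiring (Makefile, workflow matrix) is not shown in this PR.

```c
#include <string.h>

/* Hypothetical sketch of the old rule: the KERNEL value from the test
 * matrix picks the -mcpu version for the verifier test programs.
 * "54" was only used for RHEL 8.6 and was special-cased to v2, while
 * the runtime probe reports v3 for that kernel. */
static int mcpu_version_for_kernel(const char *kernel)
{
    if (strcmp(kernel, "54") == 0)
        return 2;
    /* Every other kernel in the matrix is compiled with v3. */
    return 3;
}
```

Once the RHEL 8.6 matrix entry stops passing `54`, it falls through to v3 and CI matches what the agent selects at runtime.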

This means our CI is not aligned with what we actually run. Furthermore, changing the value (the first commit in this PR) breaks the complexity tests. The other two commits tweak the code so that, under mcpu=v3, the compiler emits bytecode that passes the RHEL 8.6 verifier. See the commit messages for details.

make verifier complexity tests on RHEL 8.6 run with mcpu=v3

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jul 7, 2025
@dylandreimerink
Member Author

/ci-verifier

When compiling the BPF programs for RHEL 8.6 with mcpu=v3, clang produces
bytecode containing a path on which the `info` pointer passed
to `nodeport_add_tunnel_encap` is `NULL`. This shouldn't happen,
because `tunnel_endpoint` can only be non-zero if `info` is non-NULL.
It seems that only on RHEL 8.6 is the verifier unable to track this
relation between `info` and `tunnel_endpoint`, and we get the
following verifier error:

```
parent didn't have regs=2 stack=0 marks
last_idx 1049 first_idx 1033
regs=2 stack=0 before 1049: (71) r1 = *(u8 *)(r1 +0)
; return ctx_store_bytes(ctx, off + ETH_ALEN + ETH_ALEN,
1071: (79) r9 = *(u64 *)(r10 -272)
; if (info->flag_ipv6_tunnel_ep)
1072: (71) r1 = *(u8 *)(r9 +23)
R9 invalid mem access 'inv'
```

Adding an additional NULL check for `info` fixes the issue.
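A minimal sketch of the pattern, using simplified stand-ins: `struct endpoint_info_sketch`, `add_tunnel_encap_sketch()` and `maybe_encap()` are hypothetical and only mirror the `info`/`tunnel_endpoint` relationship described above, not Cilium's actual types or signatures. The extra `info &&` is logically redundant in C, which is why it changes nothing for correctness but gives the verifier an explicit non-NULL check on the path it explores.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the endpoint info; layout is illustrative. */
struct endpoint_info_sketch {
    uint32_t tunnel_endpoint;
    bool flag_ipv6_tunnel_ep;
};

/* Hypothetical stand-in for nodeport_add_tunnel_encap(): it
 * dereferences info, so info must be non-NULL here. */
static int add_tunnel_encap_sketch(const struct endpoint_info_sketch *info)
{
    return info->flag_ipv6_tunnel_ep ? 6 : 4;
}

/* Returns the encap family (4 or 6), or 0 if no encap is needed. */
static int maybe_encap(const struct endpoint_info_sketch *info)
{
    uint32_t tunnel_endpoint = 0;

    if (info)
        tunnel_endpoint = info->tunnel_endpoint;

    /* tunnel_endpoint != 0 already implies info != NULL, but after
     * clang reorders the code an older verifier may lose that link.
     * Checking info again makes the non-NULL fact explicit. */
    if (info && tunnel_endpoint)
        return add_tunnel_encap_sketch(info);
    return 0;
}
```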

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
@dylandreimerink
Member Author

/ci-verifier

When compiling with -mcpu=v3, the compiler generated bytecode that
was rejected by the verifier on RHEL 8.6. The error seems to be
unrelated to the mcpu value itself; rather, it appears to be a compiler
bug that only gets triggered by this specific combination of settings.

The error was:
```
; if (IS_ERR(ret))
2038: (61) r1 = *(u32 *)(r10 -192)
2039: (56) if w1 != 0x0 goto pc-248
    R7=invP0
2040: (05) goto pc-514
; if (a->d1 != b->d1)
1527: (71) r1 = *(u8 *)(r7 +32)
R7 invalid mem access 'inv'
```
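The hex value in parentheses in these logs is the instruction's opcode byte. Per the eBPF instruction encoding, the low three bits give the instruction class, and for loads bits 3 and 4 give the access size. The sketch below re-declares those constants (names are hypothetical) instead of using the `<linux/bpf_common.h>` macros, so it builds without kernel headers. It shows, for example, that `(56)` (`if w1 != 0x0`) is a JMP32 instruction, a class that clang emits when targeting mcpu=v3 but not v2.

```c
#include <stdbool.h>
#include <stdint.h>

/* eBPF instruction classes: the low 3 bits of the opcode byte. */
enum bpf_class_sketch {
    CLS_LD = 0x00, CLS_LDX = 0x01, CLS_ST = 0x02, CLS_STX = 0x03,
    CLS_ALU = 0x04, CLS_JMP = 0x05, CLS_JMP32 = 0x06, CLS_ALU64 = 0x07,
};

static enum bpf_class_sketch insn_class(uint8_t op)
{
    return (enum bpf_class_sketch)(op & 0x07);
}

/* For load/store classes, bits 3-4 encode the access size in bytes. */
static int mem_size_bytes(uint8_t op)
{
    switch (op & 0x18) {
    case 0x00: return 4; /* W:  u32 */
    case 0x08: return 2; /* H:  u16 */
    case 0x10: return 1; /* B:  u8  */
    default:   return 8; /* DW: u64 (0x18) */
    }
}

/* JMP32 compares the 32-bit subregister (w1 instead of r1). */
static bool is_jmp32(uint8_t op)
{
    return insn_class(op) == CLS_JMP32;
}
```

Applied to the logs: `(61)` is a 4-byte LDX (`*(u32 *)`), `(71)` a 1-byte LDX (`*(u8 *)`), `(79)` an 8-byte LDX (`*(u64 *)`), and `(05)` a plain JMP-class `goto`.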

This is very odd: the verifier says R7 is NULL (`invP0`, a scalar known
to be 0), yet the program reads offset 32 from it, which is the nexthdr
of the IPv6 header `iphdr` in `snat_v6_rev_nat_handle_icmp_pkt_toobig`.
However, before it is passed to the function where the read happens, we
copy that field into `struct ipv6_ct_tuple tuple`, which is
stack-allocated.

The compiler seems to track the chain of assignments and completely
eliminate the step of copying the packet data onto the stack. It seems
the verifier can then still find a branch where the program attempts to
read from the ctx->data pointer while it is 0.

The only way I found to stop the compiler from doing this optimization
is to insert an `asm volatile` statement that does nothing but take
the memory address of the tuple. This forces the compiler to keep the
tuple on the stack instead of optimizing it away.
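A minimal sketch of that workaround, assuming a simplified tuple layout. The macro `KEEP_ON_STACK`, the struct `ct_tuple_sketch` and both helpers are hypothetical stand-ins, not the code from the commit. The `"memory"` clobber is this sketch's choice: it tells the compiler the asm may read the tuple through the pointer it receives, so the copy must be completed before the asm and later reads must come from the stack copy rather than being forwarded to the packet. The asm emits no instructions, so it only constrains the optimizer.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct ipv6_ct_tuple. */
struct ct_tuple_sketch {
    uint8_t daddr[16];
    uint8_t saddr[16];
    uint8_t nexthdr;
};

/* Empty asm that takes the object's address; with the "memory"
 * clobber the compiler must materialise obj in memory first. */
#define KEEP_ON_STACK(obj) \
    __asm__ __volatile__("" : : "r"(&(obj)) : "memory")

/* Stand-in for the function that compares tuples field by field. */
static int tuple_matches(const struct ct_tuple_sketch *a,
                         const struct ct_tuple_sketch *b)
{
    return a->nexthdr == b->nexthdr &&
           memcmp(a->daddr, b->daddr, sizeof(a->daddr)) == 0 &&
           memcmp(a->saddr, b->saddr, sizeof(a->saddr)) == 0;
}

/* hdr stands in for header data read from the packet. */
static int lookup_from_header(const struct ct_tuple_sketch *hdr,
                              const struct ct_tuple_sketch *entry)
{
    struct ct_tuple_sketch tuple;

    memcpy(&tuple, hdr, sizeof(tuple));
    /* Without this, the compiler may read *hdr directly instead. */
    KEEP_ON_STACK(tuple);
    return tuple_matches(&tuple, entry);
}
```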

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
@dylandreimerink dylandreimerink force-pushed the pr/dylandreimerink/rhel-mcpu-v3 branch from 2c81f2d to 45a9a21 Compare July 8, 2025 13:13
@dylandreimerink dylandreimerink added the release-note/ci This PR makes changes to the CI. label Jul 8, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jul 8, 2025
@dylandreimerink
Member Author

/test

@dylandreimerink dylandreimerink marked this pull request as ready for review July 8, 2025 14:00
@dylandreimerink dylandreimerink requested review from a team as code owners July 8, 2025 14:00
Member

@YutaroHayakawa YutaroHayakawa left a comment


This was a fun one. Thanks!

@dylandreimerink dylandreimerink added this pull request to the merge queue Jul 9, 2025
Merged via the queue into main with commit 50c319d Jul 9, 2025
374 of 376 checks passed
@dylandreimerink dylandreimerink deleted the pr/dylandreimerink/rhel-mcpu-v3 branch July 9, 2025 09:36
dylandreimerink added a commit that referenced this pull request Jul 9, 2025
In v1.18 we changed the minimum supported kernel version to v5.10.
For a while we still were testing RHEL 8.6 against v5.4, but that is
no longer the case since #40390.

So we can remove the compile permutations for v5.4.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
dylandreimerink added a commit that referenced this pull request Jul 15, 2025
In v1.18 we changed the minimum supported kernel version to v5.10.
For a while we still were testing RHEL 8.6 against v5.4, but that is
no longer the case since #40390.

So we can remove the compile permutations for v5.4.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
github-merge-queue bot pushed a commit that referenced this pull request Jul 15, 2025
In v1.18 we changed the minimum supported kernel version to v5.10.
For a while we still were testing RHEL 8.6 against v5.4, but that is
no longer the case since #40390.

So we can remove the compile permutations for v5.4.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
github-merge-queue bot pushed a commit that referenced this pull request Jul 16, 2025
In v1.18 we changed the minimum supported kernel version to v5.10.
For a while we still were testing RHEL 8.6 against v5.4, but that is
no longer the case since #40390.

So we can remove the compile permutations for v5.4.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
rabelmervin pushed a commit to rabelmervin/cilium that referenced this pull request Aug 18, 2025
In v1.18 we changed the minimum supported kernel version to v5.10.
For a while we still were testing RHEL 8.6 against v5.4, but that is
no longer the case since cilium#40390.

So we can remove the compile permutations for v5.4.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
julianwiedmann added a commit that referenced this pull request Sep 1, 2025
#40390 removed the last usage of
KERNEL=54 in CI, so there is no need to special-case it any longer.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
github-merge-queue bot pushed a commit that referenced this pull request Sep 4, 2025
#40390 removed the last usage of
KERNEL=54 in CI, so there is no need to special-case it any longer.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
@cilium-release-bot cilium-release-bot bot moved this to Released in cilium v1.19.0 Feb 3, 2026

Labels

release-note/ci This PR makes changes to the CI.

Projects

No open projects
Status: Released

3 participants