
fix(driver/bpf): fix misc issues with legacy ebpf and clang20 #2728

Merged
poiana merged 5 commits into falcosecurity:master from
iurly:legacy_ebpf_clang20
Dec 19, 2025

Conversation

@iurly
Contributor

@iurly iurly commented Nov 25, 2025

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind bug

/kind cleanup

/kind design

/kind documentation

/kind failing-test

/kind test

/kind feature

/kind sync

Any specific area of the project related to this PR?

Uncomment one (or more) /area <> lines:

/area API-version

/area build

/area CI

/area driver-kmod

/area driver-bpf

/area driver-modern-bpf

/area libscap-engine-bpf

/area libscap-engine-gvisor

/area libscap-engine-kmod

/area libscap-engine-modern-bpf

/area libscap-engine-nodriver

/area libscap-engine-noop

/area libscap-engine-source-plugin

/area libscap-engine-savefile

/area libscap

/area libpman

/area libsinsp

/area tests

/area proposals

Does this PR require a change in the driver versions?

/version driver-API-version-major

/version driver-API-version-minor

/version driver-API-version-patch

/version driver-SCHEMA-version-major

/version driver-SCHEMA-version-minor

/version driver-SCHEMA-version-patch

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:
This pull request focuses on improving code safety, maintainability, and clarity in the BPF driver and user-space components. The main changes include stricter type usage for syscall argument indices, improved scratch buffer bounds checking, and enhanced error reporting for memory mapping failures.

Type safety and code consistency:

  • Changed all syscall argument index parameters from int to unsigned int in several BPF helper functions, such as bpf_syscall_get_argument_from_args, bpf_syscall_get_argument_from_ctx, bpf_syscall_get_socketcall_arg, and bpf_syscall_get_argument, to ensure type safety and prevent potential negative index issues. [1] [2] [3]
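
As a rough illustration of why an unsigned index helps, here is a minimal user-space sketch. It is not the driver code: `sketch_args`, `sketch_get_argument` and `SKETCH_MAX_ARGS` are hypothetical names introduced only for this example. With an unsigned index a single upper-bound comparison rejects every invalid value, including a `-1` passed by mistake.

```c
#include <stddef.h>

/* Hypothetical stand-in for the syscall argument slots; not a driver type. */
#define SKETCH_MAX_ARGS 6u

struct sketch_args {
	unsigned long args[SKETCH_MAX_ARGS];
};

/* With an unsigned index, one comparison covers the whole invalid range:
 * a caller passing -1 ends up with UINT_MAX, which fails the check. */
static unsigned long sketch_get_argument(const struct sketch_args *a, unsigned int idx) {
	if(idx >= SKETCH_MAX_ARGS) {
		return 0;
	}
	return a->args[idx];
}
```

With a signed `int` the same function would need an extra `idx < 0` test to be safe.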

Scratch buffer bounds and verifier appeasement:

  • Improved scratch buffer size checks in bpf_poll_parse_fds by performing early bounds checking and masking read_size to stay within SCRATCH_SIZE_MAX, reducing verifier complaints and preventing buffer overflows.
  • Updated push_evt_frame to use a local variable for event frame length, mask it for safety, and consistently apply bounds checks before emitting events. [1] [2]

Volatile usage for BPF verifier compatibility:

  • Marked env_len and args_len as volatile in relevant functions to appease the BPF verifier and ensure correct handling of potentially spilled variables. [1] [2]

User-space error reporting improvements:

  • Enhanced scap_bpf.c error messages for mmap failures by including the requested size and address, making debugging easier. [1] [2]
  • Avoided overwriting detailed error messages from perf_event_mmap by returning a generic failure code instead of a new error string in scap_bpf_load.

Does this PR introduce a user-facing change?:

NONE

@github-actions

github-actions bot commented Nov 25, 2025

Please double check driver/API_VERSION file. See versioning.

/hold

@codecov

codecov bot commented Nov 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.56%. Comparing base (34c404a) to head (6976f45).
⚠️ Report is 43 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2728      +/-   ##
==========================================
- Coverage   76.90%   74.56%   -2.35%     
==========================================
  Files         296      292       -4     
  Lines       30875    29998     -877     
  Branches     4693     4651      -42     
==========================================
- Hits        23745    22367    -1378     
- Misses       7130     7631     +501     
Flag Coverage Δ
libsinsp 74.56% <ø> (-2.35%) ⬇️


@ekoops ekoops modified the milestones: 0.23.0, next-driver Nov 25, 2025
@terror96
Contributor

@iurly please fix the code formatting issues. @ekoops do we need to adjust driver/SCHEMA_VERSION in this case?

@ekoops
Contributor

ekoops commented Nov 26, 2025

> @iurly please fix the code formatting issues. @ekoops do we need to adjust driver/SCHEMA_VERSION in this case?

Hey @terror96, no need to update SCHEMA_VERSION, as this does not touch the schema in any way.
Let me remove the do-not-merge/hold label... https://github.com/falcosecurity/libs/actions/runs/19675922462/job/56356932149?pr=2728 is a false positive.
/remove-hold

EDIT: I'll need to re-do it once @iurly repushes with the formatting fix 😂

On RHEL8 with clang-20, the verifier complains about unbounded values
for R2 (the read size).

After lots of (fruitless) experimenting with argument constraining, it
became apparent that, for ia32 syscalls, the current implementation
actually reads 32 bits into the 64-bit value to be returned.
For some reason this makes clang 20 generate code with lots of spilling
just to preserve the other 32 bits (or so it seems).

Just use a different (scoped) variable of the appropriate type for each
of the two execution branches, which is what _READ_USER() actually does.

Signed-off-by: Gerlando Falauto <gerlando.falauto@sysdig.com>
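
The idea in this commit message can be sketched in plain user-space C. This is only an illustration under assumptions: `sketch_read_arg` is a hypothetical helper, and `memcpy` stands in for the driver's user-memory read. Each branch owns a variable that is exactly as wide as the data it reads, so no 32-bit read ever lands inside a 64-bit variable.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read one syscall argument from a raw buffer.
 * memcpy() is a stand-in for the real user-memory read. */
static unsigned long sketch_read_arg(const void *src, bool is_ia32) {
	if(is_ia32) {
		uint32_t v32 = 0; /* exactly as wide as the data being read */
		memcpy(&v32, src, sizeof(v32));
		return v32; /* zero-extended when converted to the return type */
	} else {
		unsigned long v64 = 0; /* native-width value for the other branch */
		memcpy(&v64, src, sizeof(v64));
		return v64;
	}
}
```

Because neither variable outlives its branch, the compiler has no partially written upper half to preserve.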

clang20 tends to ignore (or optimize away) the checks we're performing
on read_size. The trick that seems to be working here is to first
perform our sanity check, then (uselessly) enforce data validity
by performing the AND operation, and finally (uselessly) create a
new exit path.

Notice how it's also important to move the reading of val to
the very beginning, away from the read_size arithmetic.

Signed-off-by: Gerlando Falauto <gerlando.falauto@sysdig.com>
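
The check, mask, re-check ordering described above looks roughly like this. It is a user-space sketch, not the code from the PR: the `SKETCH_` macros and `sketch_bounded_read_size` are placeholders, and the `1 << 18` size is an assumption chosen to match the `vs=262144` map value size visible in the verifier log further down. A native compiler is free to drop steps 2 and 3, so the sketch only shows the order of operations.

```c
#include <stddef.h>

/* Assumed sizes, for illustration only. */
#define SKETCH_SCRATCH_SIZE (1u << 18)
#define SKETCH_SCRATCH_SIZE_MAX (SKETCH_SCRATCH_SIZE - 1)

/* Returns the number of bytes that may be read, or -1 if rejected. */
static long sketch_bounded_read_size(unsigned long nfds, unsigned long entry_size) {
	unsigned long read_size = nfds * entry_size;

	/* 1. The real sanity check. */
	if(read_size > SKETCH_SCRATCH_SIZE_MAX) {
		return -1;
	}

	/* 2. Redundant mask: a no-op after step 1, but it states the bound
	 *    in the 'var &= const' form the verifier asks for. */
	read_size &= SKETCH_SCRATCH_SIZE_MAX;

	/* 3. Redundant exit path: cannot trigger, kept to mirror the pattern. */
	if(read_size > SKETCH_SCRATCH_SIZE_MAX) {
		return -1;
	}

	return (long)read_size;
}
```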

On RHEL8 with clang-20, the verifier fails with the following error
when loading sys_generic:

457: (85) call bpf_perf_event_output#25
 R0=inv(id=19,umax_value=65523,var_off=(0x0; 0xffffffff)) R1_w=ctx(id=0,off=0,imm=0) R2_w=map_ptr(id=0,off=0,ks=4,vs=4,imm=0) R3_w=inv4294967295 R4_w=map_value(id=0,off=0,ks=4,vs=262144,imm=0) R5_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=201,imm=0) R8=map_value(id=0,off=0,ks=4,vs=262144,imm=0) R9=inv4294967291 R10=fp0 fp-8=mmmmmmmm fp-16=map_value fp-24=mmmmmmmm fp-32=invP fp-40=mmmmmmmm
R5 unbounded memory access, use 'var &= const' or 'if (var < const)'
processed 448 insns (limit 1000000) max_states_per_insn 0 total_states 12 peak_states 12 mark_read 6

R5 unbounded memory access, use 'var &= const' or 'if (var < const)'

The actual value (data->state->tail_ctx.len) appears to be read from
memory every single time, which increases register pressure.

Using a local variable for the value instead (and bounding it with the
appropriate limits) helps the compiler assign it to its final register,
which the verifier can then analyze properly.

Signed-off-by: Gerlando Falauto <gerlando.falauto@sysdig.com>
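
In the log above, R5 is the size argument of bpf_perf_event_output, and its upper bound is the full 32-bit range. A minimal user-space sketch of the "read once into a local, then bound the local" approach follows. The struct, the macro and `sketch_frame_len` are hypothetical names, and the `1 << 18` size is an assumption taken from the `vs=262144` value in the log.

```c
/* Assumed bound, for illustration only. */
#define SKETCH_SCRATCH_SIZE_MAX ((1ul << 18) - 1)

/* Hypothetical stand-in for the per-event state holding the frame length. */
struct sketch_state {
	unsigned long len;
};

/* Returns the bounded length that would be handed to the output helper. */
static unsigned long sketch_frame_len(const struct sketch_state *state) {
	/* Read the field exactly once; every later use goes through the local. */
	unsigned long len = state->len;

	/* Bound the local. For a valid frame this mask changes nothing. */
	len &= SKETCH_SCRATCH_SIZE_MAX;

	return len;
}
```

The point of the local is that the bounded value and the value finally passed on are the same variable, rather than two separate loads of the same field.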
@iurly iurly force-pushed the legacy_ebpf_clang20 branch from 00be16e to 660c182 Compare November 26, 2025 14:17
@ekoops
Contributor

ekoops commented Nov 26, 2025

/remove-hold

}

/* Now appease the verifier by enforcing and checking what we already know. */
read_size &= SCRATCH_SIZE_MAX;
Contributor


Is it indeed with clang20 that we need to re-verify after the bit-operation that the value is something that it can't be? This code looks strange...

Also, if we have already masked the value here, do we really need to keep the masks on lines 436 and 438?

@ekoops
Contributor

ekoops commented Dec 4, 2025

Hey. I generated the kernels/distros testing matrix here: https://github.com/falcosecurity/libs/actions/runs/19925174793.
It looks like we are breaking something 😔

X64

KERNEL CMAKE-CONFIGURE KMOD BUILD KMOD SCAP-OPEN BPF-PROBE BUILD BPF-PROBE SCAP-OPEN MODERN-BPF SCAP-OPEN
amazonlinux2-5.10 🟢 🟢 🟢 🟢 🟢 🟢
amazonlinux2-5.15 🟢 🟢 🟢 🟢 🟢 🟢
amazonlinux2-5.4 🟢 🟢 🟢 🟢 🟢 🟡
amazonlinux2022-5.15 🟢 🟢 🟢 🟢 🟢 🟢
amazonlinux2023-6.1 🟢 🟢 🟢 🟢 🟢 🟢
archlinux-6.0 🟢 🟢 🟢 🟢 🟢 🟢
archlinux-6.7 🟢 🟢 🟢 🟢 🟢 🟢
centos-3.10 🟢 🟢 🟢 🟡 🟡 🟡
centos-4.18 🟢 🟢 🟢 🟢 🟢 🟢
centos-5.14 🟢 🟢 🟢 🟢 🟢 🟢
fedora-5.17 🟢 🟢 🟢 🟢 🟢 🟢
fedora-5.8 🟢 🟢 🟢 🟢 🟢 🟢
fedora-6.2 🟢 🟢 🟢 🟢 🟢 🟢
oraclelinux-4.14 🟢 🟢 🟢 🟢 🟡
oraclelinux-5.15 🟢 🟢 🟢 🟢 🟢 🟢
oraclelinux-5.4 🟢 🟢 🟢 🟢 🟢 🟡
ubuntu-5.8 🟢 🟢 🟢 🟢 🟢 🟡
ubuntu-6.5 🟢 🟢 🟢 🟢 🟢 🟢

ARM64

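| KERNEL | CMAKE-CONFIGURE | KMOD BUILD | KMOD SCAP-OPEN | BPF-PROBE BUILD | BPF-PROBE SCAP-OPEN | MODERN-BPF SCAP-OPEN |
|---|---|---|---|---|---|---|
| amazonlinux2-5.4 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 |
| amazonlinux2022-5.15 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| fedora-6.2 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| oraclelinux-4.14 | 🟢 | 🟢 | 🟢 | 🟡 | 🟡 | 🟡 |
| oraclelinux-5.15 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| ubuntu-6.5 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |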

You can find more details by downloading this artifact: https://github.com/falcosecurity/libs/actions/runs/19925174793/artifacts/4762505476

Recent versions of clang tend to emit the following warning:

In file included from /usr/src/draios-agent-14.2.4/bpf/probe.c:26:
/usr/src/draios-agent-14.2.4/bpf/fillers.h:2311:48: warning: passing 'volatile long *' to parameter of type 'long *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
 2311 |                 res = bpf_accumulate_argv_or_env(data, argv, &args_len);
      |                                                              ^~~~~~~~~
/usr/src/draios-agent-14.2.4/bpf/fillers.h:1921:61: note: passing argument to parameter 'args_len' here
 1921 |                                                       long *args_len) {
      |                                                             ^

Just make the (other) variable and the argument volatile.

Signed-off-by: Gerlando Falauto <gerlando.falauto@sysdig.com>
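
The fix for this warning can be reproduced in a few lines of ordinary C. The function names below are hypothetical and only loosely modeled on the call in the warning. Once the parameter carries the same `volatile` qualifier as the caller's variable, taking its address no longer discards anything.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical accumulator. The parameter is volatile-qualified so that it
 * matches the qualifier on the caller's variable. */
static int sketch_accumulate(const char *const *strs, volatile long *total_len) {
	for(size_t i = 0; strs[i] != NULL; i++) {
		*total_len += (long)strlen(strs[i]) + 1; /* +1 for the terminator */
	}
	return 0;
}

static long sketch_total_len(const char *const *strs) {
	volatile long args_len = 0;
	sketch_accumulate(strs, &args_len); /* no qualifier is discarded here */
	return args_len;
}
```

Declaring the parameter as plain `long *` while keeping `args_len` volatile would bring back the diagnostic shown above.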

Right now, if perf_event_mmap() fails, it will buffer some diagnostic
info using scap_errprintf(), spelling out which of the two mmap()
calls failed. However, upon detecting a failure, the calling function
will also call scap_errprintf() and therefore overwrite the previous log
line. This change:

a) adds the actual values passed to mmap() to the log line, so we get a bit
more context

b) suppresses the subsequent scap_errprintf() invocation so as to surface
the original log line instead.

Signed-off-by: Gerlando Falauto <gerlando.falauto@sysdig.com>
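
A minimal sketch of both points, assuming a plain error buffer rather than the real scap_errprintf() machinery. All `sketch_` names and the message format are invented for this example. The inner helper records what was passed to mmap(), and the caller returns a bare failure code so that message survives.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define SKETCH_ERR_SIZE 256
#define SKETCH_SUCCESS 0
#define SKETCH_FAILURE (-1)

/* Inner helper: on failure, record the values that were passed to mmap(). */
static int sketch_map_ring(char *err, int fd, size_t size, void **out) {
	void *hint = NULL;
	void *p = mmap(hint, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if(p == MAP_FAILED) {
		snprintf(err, SKETCH_ERR_SIZE, "mmap failed (addr=%p, size=%zu, fd=%d): %s",
		         hint, size, fd, strerror(errno));
		return SKETCH_FAILURE;
	}
	*out = p;
	return SKETCH_SUCCESS;
}

/* Caller: propagate the failure code and leave the error buffer alone,
 * so the detailed message written above is what gets surfaced. */
static int sketch_load(char *err, int fd, size_t size, void **out) {
	if(sketch_map_ring(err, fd, size, out) != SKETCH_SUCCESS) {
		return SKETCH_FAILURE;
	}
	return SKETCH_SUCCESS;
}
```

Calling `sketch_load(err, -1, 4096, &ring)` fails because the file descriptor is invalid, and `err` then still holds the inner message including `size=4096`.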
@iurly iurly force-pushed the legacy_ebpf_clang20 branch from 660c182 to 6976f45 Compare December 15, 2025 13:09
@github-actions

Perf diff from master - unit tests

     4.64%    +12.12%  [.] std::__shared_ptr<sinsp_threadinfo, (__gnu_cxx::_Lock_policy)2>::__shared_ptr(std::__weak_ptr<sinsp_threadinfo, (__gnu_cxx::_Lock_policy)2> const&, std::nothrow_t)
     3.63%    +12.11%  [.] std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_add_ref_lock_nothrow()
     4.34%     +9.20%  [.] sinsp_threadinfo::get_main_thread()
     3.71%     +8.27%  [.] std::__shared_count<(__gnu_cxx::_Lock_policy)2>::_M_get_use_count() const
     6.22%     +6.98%  [.] std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()
     3.26%     +6.91%  [.] sinsp_threadinfo::get_fd_table()
     2.17%     +5.61%  [.] std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count(std::__weak_count<(__gnu_cxx::_Lock_policy)2> const&, std::nothrow_t)
     3.47%     +5.60%  [.] sinsp_threadinfo::update_main_fdtable()
    11.86%     -5.39%  [.] sinsp::next(sinsp_evt**)
     1.90%     +3.84%  [.] thread_group_info::get_first_thread() const

Heap diff from master - unit tests

peak heap memory consumption: 27.68K
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Heap diff from master - scap file

peak heap memory consumption: 0B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Benchmarks diff from master

Comparing gbench_data.json to /root/actions-runner/_work/libs/libs/build/gbench_data.json
Benchmark                                                         Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------
BM_sinsp_split_mean                                            -0.0443         -0.0443           248           237           247           236
BM_sinsp_split_median                                          -0.0461         -0.0462           248           236           247           236
BM_sinsp_split_stddev                                          -0.1921         -0.2277             1             1             1             1
BM_sinsp_split_cv                                              -0.1546         -0.1919             0             0             0             0
BM_sinsp_concatenate_paths_relative_path_mean                  +0.0619         +0.0619            70            74            70            74
BM_sinsp_concatenate_paths_relative_path_median                +0.0717         +0.0716            70            74            69            74
BM_sinsp_concatenate_paths_relative_path_stddev                +1.0724         +1.0750             1             2             1             2
BM_sinsp_concatenate_paths_relative_path_cv                    +0.9515         +0.9542             0             0             0             0
BM_sinsp_concatenate_paths_empty_path_mean                     -0.0582         -0.0582            44            41            44            41
BM_sinsp_concatenate_paths_empty_path_median                   -0.0770         -0.0770            44            40            44            40
BM_sinsp_concatenate_paths_empty_path_stddev                   +1.0807         +1.0799             1             2             1             2
BM_sinsp_concatenate_paths_empty_path_cv                       +1.2094         +1.2085             0             0             0             0
BM_sinsp_concatenate_paths_absolute_path_mean                  +0.0233         +0.0232            71            73            71            73
BM_sinsp_concatenate_paths_absolute_path_median                +0.0215         +0.0212            71            73            71            73
BM_sinsp_concatenate_paths_absolute_path_stddev                +1.1897         +1.2332             0             0             0             0
BM_sinsp_concatenate_paths_absolute_path_cv                    +1.1397         +1.1826             0             0             0             0

@ekoops
Contributor

ekoops commented Dec 19, 2025

Kernel/distro testing matrix in the making: https://github.com/falcosecurity/libs/actions/runs/20365099008

@ekoops
Contributor

ekoops commented Dec 19, 2025

You did it -> all green (and yellow)! 😄

X64

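| KERNEL | CMAKE-CONFIGURE | KMOD BUILD | KMOD SCAP-OPEN | BPF-PROBE BUILD | BPF-PROBE SCAP-OPEN | MODERN-BPF SCAP-OPEN |
|---|---|---|---|---|---|---|
| amazonlinux2-5.10 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| amazonlinux2-5.15 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| amazonlinux2-5.4 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 |
| amazonlinux2022-5.15 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| amazonlinux2023-6.1 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| archlinux-6.0 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| archlinux-6.7 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| centos-3.10 | 🟢 | 🟢 | 🟢 | 🟡 | 🟡 | 🟡 |
| centos-4.18 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| centos-5.14 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| fedora-5.17 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| fedora-5.8 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| fedora-6.2 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| oraclelinux-3.10 | 🟢 | 🟢 | 🟢 | 🟡 | 🟡 | 🟡 |
| oraclelinux-4.14 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 |
| oraclelinux-5.15 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| oraclelinux-5.4 | 🟢 | 🟢 | 🟢 | 🟡 | 🟡 | 🟡 |
| ubuntu-5.8 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 |
| ubuntu-6.5 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |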

ARM64

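| KERNEL | CMAKE-CONFIGURE | KMOD BUILD | KMOD SCAP-OPEN | BPF-PROBE BUILD | BPF-PROBE SCAP-OPEN | MODERN-BPF SCAP-OPEN |
|---|---|---|---|---|---|---|
| amazonlinux2-5.4 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 |
| amazonlinux2022-5.15 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| fedora-6.2 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
| oraclelinux-4.14 | 🟢 | 🟢 | 🟢 | 🟡 | 🟡 | 🟡 |
| oraclelinux-5.15 | 🟢 | 🟢 | 🟢 | 🟡 | 🟡 | 🟢 |
| ubuntu-6.5 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |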

@ekoops ekoops requested a review from terror96 December 19, 2025 09:38
Contributor

@ekoops ekoops left a comment


/approve

@poiana
Contributor

poiana commented Dec 19, 2025

LGTM label has been added.

Details
Git tree hash: 7b21a10fda16f635677963a4f84f8c809429c642

@poiana
Contributor

poiana commented Dec 19, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ekoops, iurly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@github-project-automation github-project-automation bot moved this from Todo to In progress in Falco Roadmap Dec 19, 2025
@poiana poiana merged commit 9b8a8e4 into falcosecurity:master Dec 19, 2025
56 of 58 checks passed
@github-project-automation github-project-automation bot moved this from In progress to Done in Falco Roadmap Dec 19, 2025
