
[8.x](backport #42032) auditbeat: system/process module backed by quark #42810

Merged
haesbaert merged 5 commits into 8.19 from mergify/bp/8.x/pr-42032
Apr 17, 2025

@mergify
Contributor

@mergify mergify bot commented Feb 20, 2025

Proposed commit message

This introduces a new provider for the system/process module on Linux.

The main motivation is to address some of the limitations of the current implementation. The gosysinfo provider sends state reports by periodically scraping /proc, so it misses all short-lived processes. Some customers would also like to have full telemetry but can't run auditd for various reasons.

As a bonus we get some extra ECS fields that were not available before.

MAIN DIFFERENCES:

  • Publishes every process in the system, regardless of lifespan.
  • Publishes exec events for an existing process (without a fork).
  • Aggregates fork+exec+exit within one event.
  • Adds event.exit_code for processes that exited (there seems to be no way to express exit_time in ECS?).
  • Includes the original process.args. sysinfo reports the args it read while parsing /proc, so a userland process can masquerade itself by rewriting them. For the initial /proc scraping we report the current value, like sysinfo, because the original value can't be recovered once it has been overwritten. If you want to have fun: https://github.com/systemd/systemd/blob/main/src/basic/argv-util.c#L165
  • Adds process.args_count.
  • Adds process.interactive and, if true, process.tty.char_device.{major,minor}.
  • Attempts to hash all processes, not just long-lived ones.
  • Hashing is no longer rate-limited; instead, digests are cached and refreshed based on metadata. The cache is an LRU keyed by path, and an entry is refreshed when the file's metadata changes, using statx(2) if the kernel supports it and stat(2) otherwise.
  • No more periodic state reports, only an initial batch.
  • No more saving the timestamp of the last state report on disk.
  • No more /proc parsing during runtime, only on boot.
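To illustrate the /proc-derived fields listed above (process.args, process.args_count, process.interactive and process.tty.char_device), here is a minimal Go sketch of what an initial /proc scrape could extract for a single process. It assumes the standard Linux layout documented in proc(5): NUL-separated arguments in /proc/<pid>/cmdline, and tty_nr as field 7 of /proc/<pid>/stat, decoded with the same bit layout as the kernel's new_decode_dev(). The helpers parseCmdline, decodeTTY and ttyNrFromStat are hypothetical names, not the quark or go-sysinfo implementation. Because it reads /proc, it only sees the current (possibly rewritten) argv, which is exactly the masquerading limitation described for sysinfo.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseCmdline splits raw /proc/<pid>/cmdline contents into argv. Arguments
// are NUL-separated and the last one is normally NUL-terminated. This is the
// *current* argv memory, which the process itself may have overwritten.
func parseCmdline(raw []byte) []string {
	raw = bytes.TrimSuffix(raw, []byte{0})
	if len(raw) == 0 {
		return nil // e.g. kernel threads have an empty cmdline
	}
	parts := bytes.Split(raw, []byte{0})
	args := make([]string, len(parts))
	for i, p := range parts {
		args[i] = string(p)
	}
	return args
}

// decodeTTY splits tty_nr into major/minor using the kernel's
// new_decode_dev() layout. tty_nr == 0 means no controlling terminal.
func decodeTTY(ttyNr uint32) (interactive bool, major, minor uint32) {
	if ttyNr == 0 {
		return false, 0, 0
	}
	major = (ttyNr & 0xfff00) >> 8
	minor = (ttyNr & 0xff) | ((ttyNr >> 12) & 0xfff00)
	return true, major, minor
}

// ttyNrFromStat extracts tty_nr from a /proc/<pid>/stat line. comm is wrapped
// in parentheses and may contain spaces or ')', so parse after the last ')'.
func ttyNrFromStat(stat string) (uint32, error) {
	end := strings.LastIndexByte(stat, ')')
	if end < 0 {
		return 0, fmt.Errorf("malformed stat line: no ')'")
	}
	// Fields after comm: state ppid pgrp session tty_nr ...
	fields := strings.Fields(stat[end+1:])
	if len(fields) < 5 {
		return 0, fmt.Errorf("malformed stat line: only %d fields after comm", len(fields))
	}
	v, err := strconv.ParseInt(fields[4], 10, 32)
	if err != nil {
		return 0, err
	}
	return uint32(v), nil
}

func main() {
	raw, err := os.ReadFile("/proc/self/cmdline")
	if err != nil {
		fmt.Println("read cmdline:", err)
		return
	}
	args := parseCmdline(raw)
	fmt.Printf("process.args=%q process.args_count=%d\n", args, len(args))

	stat, err := os.ReadFile("/proc/self/stat")
	if err != nil {
		fmt.Println("read stat:", err)
		return
	}
	ttyNr, err := ttyNrFromStat(string(stat))
	if err != nil {
		fmt.Println("parse stat:", err)
		return
	}
	interactive, major, minor := decodeTTY(ttyNr)
	fmt.Printf("process.interactive=%t", interactive)
	if interactive {
		fmt.Printf(" process.tty.char_device.major=%d minor=%d", major, minor)
	}
	fmt.Println()
}
```

Run from a terminal, the sketch should report process.interactive=true; on most systems a /dev/pts/N pseudo-terminal has major 136.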

MISSING:

  • Unify entity id with sessionview.
  • Publish metrics from quark.Stats(). Done, but naming and gauges should be discussed.
  • Docs.
  • Properly define config options and names.

EXTRA CHANGES:

  • Added statx(2) to seccomp_linux so we can properly use CachedHasher.
  • Updated quark to 0.3 so we have namespace inode numbers.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Run auditbeat on Linux with the following configuration:

auditbeat.modules:

- module: system
  datasets:
    - process
  process.backend: "kernel_tracing"

(Edit: the process.backend value was previously "quark".)

Related issues

Integrated PRs related to this

List of previous work done to minimize the size of this PR

Screenshots

Non interactive SSH

Below is a screenshot of a non-interactive SSH session, produced with ssh fc39vm /bin/echo hi from quarkio.
It shows the intermediary sshd processes up to the point where we fork the shell and echo. The interesting bit is that we can see a process that forked+execed and then execs again: sshd forks+execs mksh, which in turn execs /bin/echo without forking.
[Screenshot: ssh_nonint]

Comparison against the sysinfo provider for a long-lived process:

Here we run a long sleep and compare the events against the existing provider on 8.14.3:
[Screenshot: vs_old]

On event.type, event.action and others

I've tried to keep things as close as possible to the old provider, but this is really just a suggestion at this point, and it's likely we'll want to change things.

| event.type           | gosysinfo | quark                |
|----------------------|-----------|----------------------|
| fork                 | start     | start                |
| fork+exec            | start     | [start, change]      |
| short fork+exec+exit | N/A       | [start, change, end] |
| short fork+exit      | N/A       | [start, end]         |
| existing processes   | info      | info                 |
| exec only            | N/A       | change               |
| exec+exit            | end       | [change, end]        |

| event.action         | gosysinfo        | quark                 |
|----------------------|------------------|-----------------------|
| fork                 | process_started  | process_started       |
| fork+exec            | process_started  | process_started       |
| short fork+exec+exit | N/A              | process_ran           |
| short fork+exit      | N/A              | process_ran           |
| existing processes   | existing_process | existing_process      |
| exec only            | N/A              | process_changed_image |
| exec+exit            | end              | process_stopped       |

As you can see, expressing these combinations in event.action is not
great; life would be easier if it could be an array, and I'm very open
to suggestions. I've tried to compress more states into fewer words.
process_changed_image might look a bit weird, but it's less ambiguous
than "executed". Again, I'm really open to suggestions here and have no
strong feelings about it.
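To make the quark column of the event.type and event.action tables concrete, here is a minimal Go sketch of the classification it describes, assuming the backend knows which of fork, exec and exit it observed for a process before publishing the aggregated event. The lifecycle type and classify function are hypothetical illustrations, not quark's or auditbeat's actual API; combinations the tables do not list (such as a bare exit) are deliberately left unclassified.

```go
package main

import "fmt"

// lifecycle records which parts of a process's life were observed before
// the aggregated event was published (hypothetical type).
type lifecycle struct {
	existing bool // found by the initial /proc scrape
	fork     bool
	exec     bool
	exit     bool
}

// classify maps an aggregated lifecycle to event.type and event.action,
// following the quark column of the tables. Unlisted combinations
// yield nil and "".
func classify(l lifecycle) ([]string, string) {
	switch {
	case l.existing:
		return []string{"info"}, "existing_process"
	case l.fork && l.exec && l.exit:
		return []string{"start", "change", "end"}, "process_ran"
	case l.fork && l.exit:
		return []string{"start", "end"}, "process_ran"
	case l.fork && l.exec:
		return []string{"start", "change"}, "process_started"
	case l.fork:
		return []string{"start"}, "process_started"
	case l.exec && l.exit:
		return []string{"change", "end"}, "process_stopped"
	case l.exec:
		return []string{"change"}, "process_changed_image"
	default:
		return nil, ""
	}
}

func main() {
	cases := []lifecycle{
		{fork: true},
		{fork: true, exec: true},
		{fork: true, exec: true, exit: true},
		{fork: true, exit: true},
		{existing: true},
		{exec: true},
		{exec: true, exit: true},
	}
	for _, c := range cases {
		types, action := classify(c)
		// event.kind is always "event" now that periodic state reports are gone.
		fmt.Printf("%+v -> event.kind=event event.type=%v event.action=%s\n", c, types, action)
	}
}
```

Seen this way, event.type already carries the full combination as an array, while event.action has to squeeze several states into a single keyword, which is the tension discussed above.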

event.kind is now always event, as there are no more state reports every X seconds.
The initial state report at startup remains, but its events are also of kind event.

On the state of this PR

This doesn't include the documentation; I'd like to do that in a subsequent PR once the naming, config and so on are decided.
We should unify process.entity_id with sessionview, and we can do it in this PR. It's worth noting that the gosysinfo backend calculates it differently as well, so this is no worse than that.

I'm going on holiday, but I'm taking this PR out of draft so that we can start the discussion and interested parties can test it.


This is an automatic backport of pull request #42032 done by Mergify.

Co-authored-by: Nicholas Berlin <56366649+nicholasberlin@users.noreply.github.com>
Co-authored-by: Andrew Kroh <andrew.kroh@elastic.co>
(cherry picked from commit ce6156b)
@mergify mergify bot added the backport label Feb 20, 2025
@mergify mergify bot requested a review from a team as a code owner February 20, 2025 16:21
@mergify mergify bot requested a review from a team as a code owner February 20, 2025 16:21
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Feb 20, 2025
@github-actions github-actions bot added enhancement Team:Security-Linux Platform Linux Platform Team in Security Solution labels Feb 20, 2025
@elasticmachine
Contributor

Pinging @elastic/sec-linux-platform (Team:Security-Linux Platform)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Feb 20, 2025
@mergify
Contributor Author

mergify bot commented Feb 24, 2025

This pull request has not been merged yet. Could you please review and merge it @haesbaert? 🙏

(mergify bot repeated the same reminder on Mar 3, Mar 10, Mar 17, Mar 24, Mar 31, Apr 7, and Apr 14, 2025.)

@haesbaert
Contributor

haesbaert commented Apr 17, 2025

I'll commit this tonight after my tests finish.
Currently it's sending ~55k documents/s to Elasticsearch from a VM with 4 CPUs; the test compiles things in a loop and runs fork "bombs". I just want to make sure there are no losses or leaks.

@haesbaert haesbaert enabled auto-merge (squash) April 17, 2025 18:37
@haesbaert haesbaert merged commit 3a8b234 into 8.19 Apr 17, 2025
28 checks passed
@haesbaert haesbaert deleted the mergify/bp/8.x/pr-42032 branch April 17, 2025 19:12

Labels

backport enhancement Team:Security-Linux Platform Linux Platform Team in Security Solution

3 participants