Skip to content

[System] Add journald support for auth and syslog data streams#11618

Merged
belimawr merged 0 commit intoelastic:mainfrom
belimawr:10797-journald-for-system-integration
Dec 6, 2024
Merged

[System] Add journald support for auth and syslog data streams#11618
belimawr merged 0 commit intoelastic:mainfrom
belimawr:10797-journald-for-system-integration

Conversation

@belimawr
Copy link
Copy Markdown
Contributor

@belimawr belimawr commented Nov 1, 2024

Note for reviewers

I did my best to get events that would test all the grok expressions, I mostly tried to get similar messages to the syslog and auth dataset. The auth dataset might be missing some from PAM because I wasn't sure how to generate them.

Proposed commit message

This commit introduces support for Journald in the system integration,
adding a "journald" input for both auth and syslog data.

Conditions are used to select when to run the log input or the journal input, it
defaults to running the Journald input on Debian 12 and Amazon Linux 2023.

The minimum stack version is updated to 8.17.0 so the integration can use the
facilities filtering option from the journald input added in 8.16.0 and some extra fields
from the host provider added in 8.17.0.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

Author's Checklist

How to test this PR locally

Related issues

Screenshots

Integration configuration

2024-11-22_13-08

Dashboards with data from Journald (Debain 12)

2024-11-05_17-53
2024-11-05_17-54
2024-11-05_17-54_1
2024-11-05_17-55

Journald x Log input

I enabled both, the journald and the log input for the auth and syslog datasets on a fresh Debian 11 VM, both inputs collected the same amount of events. On the left is the journald input and on the right the log input.

2024-11-06_17-30
2024-11-06_17-30_1

@belimawr belimawr force-pushed the 10797-journald-for-system-integration branch from d8ffe53 to 77e4c83 Compare November 5, 2024 21:39
@belimawr belimawr changed the title 10797 journald for system integration [System] Add journald support for auth and syslog data streams Nov 5, 2024
@belimawr belimawr added the enhancement New feature or request label Nov 5, 2024
@belimawr belimawr self-assigned this Nov 5, 2024
@belimawr belimawr added the Team:Elastic-Agent-Data-Plane Agent Data Plane team [elastic/elastic-agent-data-plane] label Nov 5, 2024
@elastic-vault-github-plugin-prod
Copy link
Copy Markdown

elastic-vault-github-plugin-prod bot commented Nov 5, 2024

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

@belimawr belimawr marked this pull request as ready for review November 6, 2024 22:35
@belimawr belimawr requested review from a team as code owners November 6, 2024 22:35
@elasticmachine
Copy link
Copy Markdown

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@andrewkroh andrewkroh added Team:Obs-InfraObs Observability Infrastructure Monitoring team [elastic/obs-infraobs-integrations] Team:Security-Linux Platform Linux Platform Security team [elastic/sec-linux-platform] Team:Security-Windows Platform Security Windows Platform team [elastic/sec-windows-platform] labels Nov 6, 2024
@elasticmachine
Copy link
Copy Markdown

Pinging @elastic/sec-linux-platform (Team:Security-Linux Platform)

@elasticmachine
Copy link
Copy Markdown

Pinging @elastic/sec-windows-platform (Team:Security-Windows Platform)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fields removed on this file were duplicated before my PR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding input.type is the only real change from this PR on this file, the rest is indentation from elastic-package format

@belimawr belimawr requested a review from andrewkroh November 12, 2024 13:08
@cmacknz
Copy link
Copy Markdown
Member

cmacknz commented Nov 12, 2024

We should:

  1. Hide the default condition behind a "use journald" toggle, as it is likely to grow quite large over time. This also allows people to disable it and re-enable it without losing the condition.
  2. We can have an advanced option for a custom condition that is empty by default.

@leehinman
Copy link
Copy Markdown
Contributor

possible crazy thought. What if we had an autodisover module that set a variable that for "system module input" to something like "log" or "journald". Could the condition then use variable provided autodiscover?

@belimawr
Copy link
Copy Markdown
Contributor Author

possible crazy thought. What if we had an autodisover module that set a variable that for "system module input" to something like "log" or "journald". Could the condition then use variable provided autodiscover?

The biggest problem with that is that the Elastic-Agent needs to know the input type (that's defined in the integration manifest and does not need to map to a Beat input) before sending it to Filebeat. That's used to choose which OS process will run the input (e.g: one process for all journald inputs, another for all log inputs, etc), it is also used to aggregate and report status in multiple places, like the overview page for the Elastic-Agent. This also has implications on the state folder used.

Given all that, at the moment, we cannot postpone the decision about the input type, the input type needs to be known before the Elastic-Agent sends the configuration to Filebeat.

Another interesting challenge is that many things in Beats run "outside" of the input code, e.g: input.type is read/set based on the first configuration, a "proxy input" like the system-logs I wrote for supporting Debian 12 or even the container input will be handled as its own thing instead of the actual input that was instantiated and is running.

@belimawr
Copy link
Copy Markdown
Contributor Author

We should:

  1. Hide the default condition behind a "use journald" toggle, as it is likely to grow quite large over time. This also allows people to disable it and re-enable it without losing the condition.
  2. We can have an advanced option for a custom condition that is empty by default.

I like this experience @cmacknz, however as far as I know it's not possible to do that with the current features we have on Fleet-UI. Maybe @kpollich can chime in and help understand what is supported by Fleet UI.

Clarifying more on this "use journald" toggle, does that mean having only one section for system logs, adding the "use journald" toggle that will show/hide a second level of "advanced options"
journald-integ

Instead of having the two configuration blocks (currently aggregated by input type) that's currently implemented on this PR:
two-blocks-foo

@leehinman
Copy link
Copy Markdown
Contributor

The biggest problem with that is that the Elastic-Agent needs to know the input type (that's defined in the integration manifest and does not need to map to a Beat input) before sending it to Filebeat.

Could that be addressed by having a "new" input type called "autodiscover"?

@jlind23
Copy link
Copy Markdown
Contributor

jlind23 commented Nov 18, 2024

@lalit-satapathy can we get someone from your team review it too please?

@jlind23
Copy link
Copy Markdown
Contributor

jlind23 commented Nov 19, 2024

@belimawr shouldn't this version only be available from 8.17.0 as only this one will include the latest changes on Beats for the journald input?

@belimawr
Copy link
Copy Markdown
Contributor Author

@belimawr shouldn't this version only be available from 8.17.0 as only this one will include the latest changes on Beats for the journald input?

Good catch @jlind23! We're using a filtering option that was added in 8.16.0. I'll update it.

@belimawr belimawr force-pushed the 10797-journald-for-system-integration branch from fff7c2f to 4427c93 Compare November 19, 2024 15:16
Copy link
Copy Markdown
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few minor comments. I did not review in detail the contents of the new pipelines under the assumption that they are close derivatives of the existing ones.

I do think some documentation should be added to the README to explain a bit of how it works (or how it is supposed to work) w.r.t. using journald on newer OSes and that this is based on the condition.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changelog message no longer matches the current approach.

@belimawr
Copy link
Copy Markdown
Contributor Author

belimawr commented Nov 20, 2024

I do think some documentation should be added to the README to explain a bit of how it works (or how it is supposed to work) w.r.t. using journald on newer OSes and that this is based on the condition.

I added something on a9731b8, I tried to keep it generic as the current implementation will likely change a bit.

Copy link
Copy Markdown
Member

@ishleenk17 ishleenk17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code owner's approval!
README error in buildkite. Please resolve that.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if you remove this? What do older agent versions do if they don't support the new host.* parameters for the journald condition?

Doing this means users won't get bug fixes for the integration unless they upgrade to 8.16.0. I want to double check that this is actually solving a problem that can't instead be solved by documentation (e.g. that journald conditions don't work until this version).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if you remove this?

The Jouranld filters won't work, auth and syslog datastreams will both ingest the whole journal.

I'll test with an older version to see the overall behaviour.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance we can put in a condition on the agent version to get around this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I did some testing and the results are interesting:

  • The conditions silently break, so the log input runs on Debian 12 and the journald does not.
  • Because the filters do not work, both datasets (auth and syslog) ingest all the data
  • The dashboards seem to work well, because they use specific fields set by the ingest pipelines, I believe their data is accurate.

So, overall, it does not seem to bad, we duplicate data, but that's about it.
2024-11-22_12-50

Any chance we can put in a condition on the agent version to get around this?

That's a good idea, I'll check. However it's gonna be a bit unclear for the user why the journald input shows as not available.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the PR with the latest changes and the description with the new screenshot

new screenshot

2024-11-22_13-08

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, overall, it does not seem to bad, we duplicate data, but that's about it.

We duplicate data on all non-8.16.0 agents? This is going to be a very large number of agents, and this integration is installed by default, so if there's any consequence or effect to removing the version constraint like this let's put the constraint back. I'm confident we did our diligence that it has to be there.

The system integration is the first integration most users interact with, it can't have any quirks or weird behavior or duplicate anything for currently supported 8.x versions (which is more than 8.16.x).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We duplicate data on all non-8.16.0 agents?

No, that's only for Elastic-Agents where the Jouranld input ends up running, the user would have to change the conditions to get the Journald input running on agents < 8.16.0.

So, by default, there will be no issue, however this does introduce a weird behaviour, which is undocumented and un-intuitive.

I'll revert to require at least 8.16.0

@belimawr
Copy link
Copy Markdown
Contributor Author

Folks, that's the final version. We've reverted to using the conditions default value and documenting it.

Copy link
Copy Markdown
Member

@cmacknz cmacknz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only two changes I want are to put the stack version constraint back, and to show the user that the condition exists by default. Let's not hide the fact that the inputs are conditional.

I think the documentation for the condition is great, and the simplification to have the condition apply to the whole input instead of every stream makes things easier to see.

@belimawr
Copy link
Copy Markdown
Contributor Author

@cmacknz I added all changes you requested on f3bd0cfd66.

Copy link
Copy Markdown
Member

@cmacknz cmacknz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Remember that once you merge this it will be available to all 8.16+ stacks rapidly afterwards

@belimawr
Copy link
Copy Markdown
Contributor Author

Thanks!

Remember that once you merge this it will be available to all 8.16+ stacks rapidly afterwards

I'm aware of this. Is there any extra testing you'd like to do?

@belimawr
Copy link
Copy Markdown
Contributor Author

I've just realised that while our tests do a good job at ensuring the inputs and ingest pipelines work as expected, they do not test the condition we're using to decide whether to run the journald or log input :/

The main challenge is that the system tests run the Elastic-Agent in its own docker container, which limits how much OS-level information we can actually test. I'll look if there is a way to circumvent this.

@elasticmachine
Copy link
Copy Markdown

💚 Build Succeeded

History

  • 💚 Build #18707 succeeded 02b785ce6aff9a78916d82c3a9a06263ebfb9e5d
  • 💚 Build #18671 succeeded f3bd0cfd667c436219e389e7cf29616756e79a00
  • 💚 Build #18619 succeeded 1a1b11f3c34c52466c407e36f4efaa4d4f445100
  • 💚 Build #18508 succeeded 91d4fd08b084af415d9e0b41206d269d9c315aa0
  • 💔 Build #18474 failed a9731b80c2847f437e9e1b3364db65ae27722a6f
  • 💚 Build #18467 succeeded e1aab244d2ba41a350e330f5473f34a83856c29e

cc @belimawr

@elastic-sonarqube
Copy link
Copy Markdown

@belimawr belimawr merged commit c7b133c into elastic:main Dec 6, 2024
@elastic-vault-github-plugin-prod
Copy link
Copy Markdown

Package system - 1.63.0 containing this change is available at https://epr.elastic.co/package/system/1.63.0/

harnish-crest-data pushed a commit to chavdaharnish/integrations that referenced this pull request Feb 4, 2025
…ic#11618)

This commit introduces support for Journald in the system integration,
adding a "journald" input for both auth and syslog data.

Conditions are used to select when to run the log input or the journal input, it
defaults to running the Journald input on Debian 12 and Amazon Linux 2023.

The minimum stack version is updated to 8.17.0 so the integration can use the
facilities filtering option from the journald input added in 8.16.0 and some extra fields
from the host provider added in 8.17.0.
harnish-crest-data pushed a commit to chavdaharnish/integrations that referenced this pull request Feb 5, 2025
…ic#11618)

This commit introduces support for Journald in the system integration,
adding a "journald" input for both auth and syslog data.

Conditions are used to select when to run the log input or the journal input, it
defaults to running the Journald input on Debian 12 and Amazon Linux 2023.

The minimum stack version is updated to 8.17.0 so the integration can use the
facilities filtering option from the journald input added in 8.16.0 and some extra fields
from the host provider added in 8.17.0.
@belimawr belimawr deleted the 10797-journald-for-system-integration branch February 6, 2025 20:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request Integration:system System Team:Elastic-Agent-Data-Plane Agent Data Plane team [elastic/elastic-agent-data-plane] Team:Obs-InfraObs Observability Infrastructure Monitoring team [elastic/obs-infraobs-integrations] Team:Security-Linux Platform Linux Platform Security team [elastic/sec-linux-platform] Team:Security-Windows Platform Security Windows Platform team [elastic/sec-windows-platform]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Investigate the best way to decide when to read system logs from files or journald

7 participants