Skip to content

Update Metricbeat, Filebeat, libbeat with elastic-agent V2 support#32673

Merged
fearful-symmetry merged 45 commits intoelastic:feature-arch-v2from
fearful-symmetry:cm-v2-switch
Sep 6, 2022
Merged

Update Metricbeat, Filebeat, libbeat with elastic-agent V2 support#32673
fearful-symmetry merged 45 commits intoelastic:feature-arch-v2from
fearful-symmetry:cm-v2-switch

Conversation

@fearful-symmetry
Copy link
Copy Markdown
Contributor

@fearful-symmetry fearful-symmetry commented Aug 12, 2022

What does this PR do?

This PR has a few features:

  • It adds support for the V2 elastic agent client controller
  • Creates a config shim callback system to replace the V1's AST config transformation, requiring all V2 beats to define a config transformation callback
  • Adds a test suite system with a mocked V2 controller that allows for testing for individual beats
  • Removes V1 support

What are the components of this PR?

  • libbeat/cfgfile/cfgfile.go
    This prevents the beats from reading a config file when they start up in remove management mode.

  • beat-specific changes metricbeat/beater/metricbeat.go, x-pack/metricbeat/cmd/agent.go, etc.
    These changes register the reloader interfaces, and define beat-specific config transformations that act as a compatiblity shim between the V2 config structure and pre-existing beats config

  • x-pack/libbeat/management/*
    The actual V2 client code that reloads beats and accepts commands from a V2 control server.

  • x-pack/libbeat/management/tests/*
    An integration test suite that initializes an entire beat, creates a mock elastic-agent server, lets the beat write any metrics to a logfile, then verifies the metrics, fields, and configs are expected.

What's missing in this PR

Due to a combination of time and knowledge constraints, there are a number of things this PR doesn't do with regards to V2 support:

  • This PR only adds support for metricbeat and filebeat. Due to the idiosyncrasy of individual beats, support for other beats will need to come from code owners on those beats.
  • Elasticsearch header transformations. The V1 AST transformations, now gone in V2, have a InjectHeadersRule that will modify the header config of an elasticsearch output. These headers were obtained via an AgentInfo struct that currently exists in V2, but it doesn't support a Headers field.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

How to test this PR locally

Check out the tests directory in x-pack/libbeat/management. Each sub-directory contains a number of folders with integration tests for each beat. Note that until elastic/elastic-agent#850 is merged, this won't work against a real agent, and the mock server in the test suite will be required to actually test the V2 client code.

@fearful-symmetry fearful-symmetry added enhancement Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team labels Aug 12, 2022
@fearful-symmetry fearful-symmetry self-assigned this Aug 12, 2022
@fearful-symmetry fearful-symmetry requested a review from a team as a code owner August 12, 2022 19:39
@fearful-symmetry fearful-symmetry requested review from faec and rdner and removed request for a team August 12, 2022 19:39
@elasticmachine
Copy link
Copy Markdown
Contributor

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Aug 12, 2022
@fearful-symmetry fearful-symmetry requested review from a team as code owners August 12, 2022 19:55
@elasticmachine
Copy link
Copy Markdown
Contributor

elasticmachine commented Aug 12, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-09-01T22:50:10.667+0000

  • Duration: 129 min 20 sec

Test stats 🧪

Test Results
Failed 0
Passed 22753
Skipped 1947
Total 24700

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@fearful-symmetry
Copy link
Copy Markdown
Contributor Author

E2E tests to pass given the configuration needs to change for the beats->agent V2 branches to be paired together.

👍

It would also be good to test this against the V2 agent branch manually if you haven't already.

I have tested with standalone agent, and it seems to work with metricbeat and filebeat.

@blakerouse
Copy link
Copy Markdown
Contributor

I have built beats with this PR and running with the latest Elastic Agent V2 architecture I am seeing the system/metrics input crashing.

{"log.level":"info","@timestamp":"2022-08-31T11:46:47.482-0400","log.origin":{"file.name":"coordinator/coordinator.go","file.line":276},"message":"Existing component state changed","component":{"id":"system/metrics-default","state":"Healthy","message":"Healthy: communicating with pid '89296'","inputs":[{"id":"system/metrics-default-system/metrics","state":"Starting","message":"Starting: spawned pid '89296'"}],"output":{"id":"system/metrics-default","state":"Starting","message":"Starting: spawned pid '89296'"}},"ecs.version":"1.6.0"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x58 pc=0x104510e7c]

goroutine 88 [running]:
github.com/elastic/beats/v7/x-pack/libbeat/management.generateBeatConfig(0xc000cfd600?, 0x1?)
        github.com/elastic/beats/v7/x-pack/libbeat/management/generate.go:221 +0x1c
github.com/elastic/beats/v7/x-pack/libbeat/management.(*BeatV2Manager).handleInputReload(0xc00015b380, 0xc000cfd600?)
        github.com/elastic/beats/v7/x-pack/libbeat/management/managerV2.go:363 +0x173
github.com/elastic/beats/v7/x-pack/libbeat/management.(*BeatV2Manager).handleUnitReload(0xc000c12820?, 0xc000cfd600)
        github.com/elastic/beats/v7/x-pack/libbeat/management/managerV2.go:315 +0x45
created by github.com/elastic/beats/v7/x-pack/libbeat/management.(*BeatV2Manager).unitListen
        github.com/elastic/beats/v7/x-pack/libbeat/management/managerV2.go:256 +0x6ca

@fearful-symmetry
Copy link
Copy Markdown
Contributor Author

fearful-symmetry commented Aug 31, 2022

@blakerouse are there any conditions where we might expect the UnitExpectedConfig{} from Expected() to be nil? That would explain that, although I always assumed we'd be getting a valid struct of some kind.

@fearful-symmetry
Copy link
Copy Markdown
Contributor Author

Actually, if it failed at that particular line, it's more likely that the DataStream object inside the ExpectedConfig is nil, which is...weird. @blakerouse did you do anything special to get that error?

@fearful-symmetry
Copy link
Copy Markdown
Contributor Author

Added a few fixes, including one I suspect might be the issue.

@blakerouse
Copy link
Copy Markdown
Contributor

@fearful-symmetry I am just using the default configuration that is shipped with Elastic Agent. Here is the configuration I am using:

inputs:
  - type: system/metrics
    id: system/metrics

    # Namespace name must conform to the naming conventions for Elasticsearch indices, cannot contain dashes (-), and cannot exceed 100 bytes
    # For index naming restrictions, see https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html#indices-create-api-path-params
    data_stream.namespace: default
    use_output: default
    streams:
      - metricset: cpu
        # Dataset name must conform to the naming conventions for Elasticsearch indices, cannot contain dashes (-), and cannot exceed 100 bytes
        data_stream.dataset: system.cpu
      - metricset: memory
        data_stream.dataset: system.memory
      - metricset: network
        data_stream.dataset: system.network
      - metricset: filesystem
        data_stream.dataset: system.filesystem

// I'm assuming that a state STOPPED just tells us to shut down the entire beat,
// as such we don't really care about updating via a particular unit
if state == client.UnitStateStopped {
if state == client.UnitStateStopped || state == client.UnitStateStopping {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From Expected() you will either receive HEALTHY or STOPPED that is the only 2 expected states that the Elastic Agent will tell you for a Unit. Of course the actual state of the unit can be of any type.

When you get client.UnitstateStopped it is telling that only that specific unit should be stopped. That does not mean the whole beat should be stopped. Example is when Elastic Agent is running 2 filestream inputs, and then only 1 is removed. The remove unit would be changed to client.UnitStateStopped where the other unit would remain as client.UnitStateHealthy. Elastic Agent will still report that removed unit as existing until you report client.UnitStateStopped through unit.UpdateState. Once that is done the an update from the client to fully remove the unit will occur.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From Expected() you will either receive HEALTHY or STOPPED that is the only 2 expected states that the Elastic Agent will tell you for a Unit.

Ah, thanks, I was a bit confused by the logic there.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So UnitStateStopping is only used for clients to update state to the server?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UnitStateStopping is an information only state. It tells the Elastic Agent that work is being done to stop the unit. If the unit can be stopped immediately you can just report UnitStateStopped instantly (UnitStateStopping is not required to be reported).

Once UnitStateStopped is reported from Expected() it is required that the unit report as UnitStateStopped once it has completely stopped.

@fearful-symmetry
Copy link
Copy Markdown
Contributor Author

Also, the two simultaneous truths of "not every config will have data streams" and "the protobuf structs are full of pointers" didn't quite merge in my brain, and I've been testing for the maximal case of "all the config values".

Gonna add some more pointer-safe wrappers and tests for this.

@fearful-symmetry
Copy link
Copy Markdown
Contributor Author

Yah, I'm definitely paranoid about all the nil struct types that we can potentially choke on.

@blakerouse
Copy link
Copy Markdown
Contributor

I rebuilt with c942273 and I am still seeing a similar error:

{"log.level":"info","@timestamp":"2022-08-31T14:42:50.132-0400","log.origin":{"file.name":"coordinator/coordinator.go","file.line":276},"message":"Existing component state changed","component":{"id":"system/metrics-default","state":"Healthy","message":"Healthy: communicating with pid '97627'","inputs":[{"id":"system/metrics-default-system/metrics","state":"Starting","message":"Starting: spawned pid '97627'"}],"output":{"id":"system/metrics-default","state":"Starting","message":"Starting: spawned pid '97627'"}},"ecs.version":"1.6.0"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x58 pc=0x10fd77aac]

goroutine 46 [running]:
github.com/elastic/beats/v7/x-pack/libbeat/management.generateBeatConfig(0xc000ad7600?, 0x1?)
        github.com/elastic/beats/v7/x-pack/libbeat/management/generate.go:224 +0x2c
github.com/elastic/beats/v7/x-pack/libbeat/management.(*BeatV2Manager).handleInputReload(0xc00011fd00, 0xc000ad7600?)
        github.com/elastic/beats/v7/x-pack/libbeat/management/managerV2.go:363 +0x173
github.com/elastic/beats/v7/x-pack/libbeat/management.(*BeatV2Manager).handleUnitReload(0xc0005f2040?, 0xc000ad7600)
        github.com/elastic/beats/v7/x-pack/libbeat/management/managerV2.go:315 +0x45
created by github.com/elastic/beats/v7/x-pack/libbeat/management.(*BeatV2Manager).unitListen
        github.com/elastic/beats/v7/x-pack/libbeat/management/managerV2.go:256 +0x6ca
{"log.level":"info","@timestamp":"2022-08-31T14:42:50.142-0400","log.origin":{"file.name":"coordinator/coordinator.go","file.line":276},"message":"Existing component state changed","component":{"id":"system/metrics-default","state":"Failed","message":"Failed: pid '97627' exited with code '2'","inputs":[{"id":"system/metrics-default-system/metrics","state":"Failed","message":"Failed: pid '97627' exited with code '2'"}],"output":{"id":"system/metrics-default","state":"Failed","message":"Failed: pid '97627' exited with code '2'"}},"ecs.version":"1.6.0"}

@fearful-symmetry
Copy link
Copy Markdown
Contributor Author

Yup, working on it now @blakerouse , just kinda pushing to the PR as I go

@fearful-symmetry
Copy link
Copy Markdown
Contributor Author

Okay, now we should be good.

@cmacknz cmacknz self-requested a review September 1, 2022 00:39
@fearful-symmetry
Copy link
Copy Markdown
Contributor Author

Alright, rather than trying to set the DataStream fields to non-nil fields ahead of time to try to guarantee that we don't see any nil values to begin with, this just changes everything to use the Get* methods, which seems a little less footcannon-prone

Copy link
Copy Markdown
Member

@cmacknz cmacknz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One place left that we aren't using the Get field accessors that should likely be fixed, other than that I think we can likely merge this.

The tests are passing and various manual tests have confirmed Beats will run under agent using this branch. We can continue with bug fixes in follow up PRs.

@fearful-symmetry fearful-symmetry merged commit a5acefb into elastic:feature-arch-v2 Sep 6, 2022
cmacknz pushed a commit that referenced this pull request Nov 9, 2022
* Update Metricbeat, Filebeat, libbeat with elastic-agent V2 support (#32673)

* basic framework

* continued tinkering

* move away from ast code, use a struct

* get metricbeat working, starting on filebeat

* add notice update

* add basic config register

* move over processors to individual beats

* remove comments

* start to integrate V2 client changes

* finishing touches

* lint

* cleanup merge

* remove V1 controller

* stil tinkering with linter

* still fixing linter

* plz linter

* fmt x-pack files

* notice update

* fix output test

* refactor stop functions, refactor tests, some misc cleanup

* fix client version string

* add devguide

* linter

* expand filebeat test

* cleanup test

* fix docs, add tests, debuggin

* add signal handler

* fix mutex issue in register

* Fix osquerybeat configuration for V2

* clean up component registration

* spelling

* remove workaround for filebeat types

* try to fix filebeat tests

* add nil checks, fix test, fix unit stop

* continue tinkering with nil type checks

* add test for missing config datastreams, clean up nil handling

* change nil protections, use getter methods

* fix config access in output code

Co-authored-by: Aleksandr Maus <aleksandr.maus@elastic.co>

* V2 packetbeat support (#33041)

* first attempt at auditbeat support

* add license header

* initial packetbeat support

* fix bad branch

* cleanup

* typo in comment

* clean up, move around files

* add new processors to streams

* First pass at auditbeat support (#33026)

* first attempt at auditbeat support

* add license header

* cleanup

* move files around

* Add heartbeat support for V2 (#33157)

* add v2 config

* fix name

* fix doc

* fix go.mod

* fix unchecked stream_id

* fix unchecked stream_id (#33335)

* Update elastic-agent-libs for output panic fix (#33336)

* Fix errors for non-synth capable instances (#33310)

Fixes #32694 by making sure we use the lightweight wrapper code always when monitors cannot be initialized.

This also fixes an unrelated bug, where errors attached to non-summary events would not be indexed.

* [Automation] Update elastic stack version to 8.6.0-5a8d757d for testing (#33323)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* add pid awareness to file locking (#33169)

* add pid awareness to file locking

* cleanup, logic for handling restarts with the same PID

* add zombie-state awareness

* fix file naming

* add retry for unlock

* was confused by unlock code, fix, cleanup

* update notice

* fix race with file creation, update deps

* clean up tests, spelling

* hack for cgo

* add lic headers

* notice

* try to fix windows issues

* fix typos

* small fixes

* use exclusive locks

* remove feature to start with a specially named pidfile

* clean up some error handling, fix test cleanup

* forgot changelog

* Fix sample config in log rotation docs (#33306)

* Add banner to deprecate functionbeat (#33297)

* fix unchecked stream_id

* packetbeat/protos/dns: clean up package (#33286)

* avoid magic numbers
* fix hashableDNSTuple size and offsets
* avoid use of String and Error methods in formatted print calls
* remove redundant conversions
* quieten linter
* use plugin-owned logp.Logger

* update elastic-agent-libs

* Revert "fix unchecked stream_id"

This reverts commit 26ef6da.

* [Automation] Update elastic stack version to 8.6.0-40086bc7 for testing (#33339)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

Co-authored-by: Andrew Cholakian <andrewvc@elastic.co>
Co-authored-by: apmmachine <58790750+apmmachine@users.noreply.github.com>
Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>
Co-authored-by: Jaime Soriano Pastor <jaime.soriano@elastic.co>
Co-authored-by: DeDe Morton <dede.morton@elastic.co>
Co-authored-by: Dan Kortschak <90160302+efd6@users.noreply.github.com>

* update elastic-agent-client (#33552)

Co-authored-by: Aleksandr Maus <aleksandr.maus@elastic.co>
Co-authored-by: Andrew Cholakian <andrewvc@elastic.co>
Co-authored-by: apmmachine <58790750+apmmachine@users.noreply.github.com>
Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>
Co-authored-by: Jaime Soriano Pastor <jaime.soriano@elastic.co>
Co-authored-by: DeDe Morton <dede.morton@elastic.co>
Co-authored-by: Dan Kortschak <90160302+efd6@users.noreply.github.com>
chrisberkhout pushed a commit that referenced this pull request Jun 1, 2023
* Update Metricbeat, Filebeat, libbeat with elastic-agent V2 support (#32673)

* basic framework

* continued tinkering

* move away from ast code, use a struct

* get metricbeat working, starting on filebeat

* add notice update

* add basic config register

* move over processors to individual beats

* remove comments

* start to integrate V2 client changes

* finishing touches

* lint

* cleanup merge

* remove V1 controller

* stil tinkering with linter

* still fixing linter

* plz linter

* fmt x-pack files

* notice update

* fix output test

* refactor stop functions, refactor tests, some misc cleanup

* fix client version string

* add devguide

* linter

* expand filebeat test

* cleanup test

* fix docs, add tests, debuggin

* add signal handler

* fix mutex issue in register

* Fix osquerybeat configuration for V2

* clean up component registration

* spelling

* remove workaround for filebeat types

* try to fix filebeat tests

* add nil checks, fix test, fix unit stop

* continue tinkering with nil type checks

* add test for missing config datastreams, clean up nil handling

* change nil protections, use getter methods

* fix config access in output code

Co-authored-by: Aleksandr Maus <aleksandr.maus@elastic.co>

* V2 packetbeat support (#33041)

* first attempt at auditbeat support

* add license header

* initial packetbeat support

* fix bad branch

* cleanup

* typo in comment

* clean up, move around files

* add new processors to streams

* First pass at auditbeat support (#33026)

* first attempt at auditbeat support

* add license header

* cleanup

* move files around

* Add heartbeat support for V2 (#33157)

* add v2 config

* fix name

* fix doc

* fix go.mod

* fix unchecked stream_id

* fix unchecked stream_id (#33335)

* Update elastic-agent-libs for output panic fix (#33336)

* Fix errors for non-synth capable instances (#33310)

Fixes #32694 by making sure we use the lightweight wrapper code always when monitors cannot be initialized.

This also fixes an unrelated bug, where errors attached to non-summary events would not be indexed.

* [Automation] Update elastic stack version to 8.6.0-5a8d757d for testing (#33323)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

* add pid awareness to file locking (#33169)

* add pid awareness to file locking

* cleanup, logic for handling restarts with the same PID

* add zombie-state awareness

* fix file naming

* add retry for unlock

* was confused by unlock code, fix, cleanup

* update notice

* fix race with file creation, update deps

* clean up tests, spelling

* hack for cgo

* add lic headers

* notice

* try to fix windows issues

* fix typos

* small fixes

* use exclusive locks

* remove feature to start with a specially named pidfile

* clean up some error handling, fix test cleanup

* forgot changelog

* Fix sample config in log rotation docs (#33306)

* Add banner to deprecate functionbeat (#33297)

* fix unchecked stream_id

* packetbeat/protos/dns: clean up package (#33286)

* avoid magic numbers
* fix hashableDNSTuple size and offsets
* avoid use of String and Error methods in formatted print calls
* remove redundant conversions
* quieten linter
* use plugin-owned logp.Logger

* update elastic-agent-libs

* Revert "fix unchecked stream_id"

This reverts commit 26ef6da.

* [Automation] Update elastic stack version to 8.6.0-40086bc7 for testing (#33339)

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>

Co-authored-by: Andrew Cholakian <andrewvc@elastic.co>
Co-authored-by: apmmachine <58790750+apmmachine@users.noreply.github.com>
Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>
Co-authored-by: Jaime Soriano Pastor <jaime.soriano@elastic.co>
Co-authored-by: DeDe Morton <dede.morton@elastic.co>
Co-authored-by: Dan Kortschak <90160302+efd6@users.noreply.github.com>

* update elastic-agent-client (#33552)

Co-authored-by: Aleksandr Maus <aleksandr.maus@elastic.co>
Co-authored-by: Andrew Cholakian <andrewvc@elastic.co>
Co-authored-by: apmmachine <58790750+apmmachine@users.noreply.github.com>
Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>
Co-authored-by: Jaime Soriano Pastor <jaime.soriano@elastic.co>
Co-authored-by: DeDe Morton <dede.morton@elastic.co>
Co-authored-by: Dan Kortschak <90160302+efd6@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team

Projects

None yet

Development

Successfully merging this pull request may close these issues.