Skip to content

Add a docker plugin - dockerlogbeat#13761

Merged
fearful-symmetry merged 27 commits intoelastic:feature/dockerbeatfrom
fearful-symmetry:add_dockerlogbeat
Oct 9, 2019
Merged

Add a docker plugin - dockerlogbeat#13761
fearful-symmetry merged 27 commits intoelastic:feature/dockerbeatfrom
fearful-symmetry:add_dockerlogbeat

Conversation

@fearful-symmetry
Copy link
Copy Markdown
Contributor

@fearful-symmetry fearful-symmetry commented Sep 22, 2019

This is the result of discussions between myself and @henrikno about the needs of cloud monitoring. This PR adds a newbeat, dockerlogbeat, that's a docker logging plugin.

Right now, this is a functional PoC. When configured it'll send logs to elasticsearch, and because it uses libbeat, you can (mostly) pass it the same beat-cli style of config options you would any other beat: --log-opt output.elasticsearch.hosts="172.18.0.2:9200"

This plugin uses the libbeat publisher/pipeline library. When a new container starts that uses our plugin, the plugin checks the user-supplied config against a hashtable of libbeat pipelines. If the pipeline exists, it starts a new client for that pipeline and hands it to the logger. If not, it starts up a new pipeline with that config. When a container stops, it closes the client, and if there are no open clients associated with that pipeline, it closes the pipeline as well.

There's a detailed readme with instructions on building, running and debugging.

This is a draft that needs a lot of work. Here's a tentative TODO.

For an MVP:

  • We can't pass lists to --log-opts, as it's a map[string]string. If a user wants to set multiple ES hosts, they'll need to do something else.
  • Pipeline close must be async. Docker doesn't like it when it's HTTP calls hang.
  • If something upstream down for too long, the publish operation can block, preventing the FIFO queue from draining. We need more checks/parallelism to make it harder for the FIFO queue to get backed up.
  • I'm having issues with mage fmt. I can't seem to get it to add the right license headers, and it's hanging.
  • need to standardize make/mage targets and build tooling to be in line with the rest of the beat ecosystem
  • Vendor isn't commited, as I have no idea how we actually mange/setup /vendor.
  • Settle licensing questions. This has been tentatively committed with the elastic license.
  • Build system is a little weird. We need to copy beats/ to the vendor dir in order to get the docker build to work inside a container.
  • This bug, which is impacting the Close() operation to shutdown a client. [libbeat] pipeline.Close() throwing panic #13892
  • We need a test suite
  • We need to make some kind of decision about how we'll drop events. If the plugin's FIFO queues get backed up, the container's writes to stdio will block.
  • The code that reads from the FIFO pipe is a mess, and we should find a nicer way to do it.
  • Make locking more fine-grained
  • Clean up internal logging

GA release:

  • How do we want to integrate this with ingest pipelines?
  • Can we get this to send its own logs / health data to ES?
  • We want this to support file spooling, which is currently in beta. We have a few issues with file spooling as-is. Namely, the spools have no 'garbage collection' to clean up after files that are no longer used. Also, the system isn't very good at managing the multiple spool files that would accumulate with multiple libbeat pipelines.
  • some kind of integration test suite.
  • Update documentation in config.json

@fearful-symmetry fearful-symmetry added new beat Team:Integrations Label for the Integrations team labels Sep 22, 2019
@fearful-symmetry fearful-symmetry requested review from a team September 22, 2019 21:10
@fearful-symmetry fearful-symmetry self-assigned this Sep 22, 2019
@fearful-symmetry
Copy link
Copy Markdown
Contributor Author

I'm still struggling to get the build system working. Docker plugins require a bunch of docker build operations, and trying to cram in all the dependencies to get it to build inside the container seems to be a problem.

@fearful-symmetry
Copy link
Copy Markdown
Contributor Author

Okay, the build tooling stuff at least works now. Still fighting with govendor.

@fearful-symmetry
Copy link
Copy Markdown
Contributor Author

jenkins, test this

@exekias
Copy link
Copy Markdown
Contributor

exekias commented Oct 7, 2019

Great to see this progressing! let's try to open a PR against a feature branch as soon as we have something mergeable, you can solve the rest of TODOs in more PRs

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See .go-version file for current go version in the repository.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some way to access that from mage?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dev-tools/mage/settings.go exports GoVersion

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did you get this line from?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docker's example logging plugin does this, so every other plugin copied it and does the same thing. I'm guessing it's to 'reset' the reader in case of some undefined error, but I can't really make sense of it.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation: https://github.com/gogo/protobuf/blob/master/io/uint32.go#L114

It does not reset the reader, but returns an ErrShortBuffer. The generated reader reads the complete event into memory and then unmarshals it. The 2e6 is the max read buffer size.

Once you get a message that is potentially bigger, you are stuck. The length field has already been read (read pointer has advanced), and you MUST NOT continue reading from the reader once you get ErrShortBuffer. This potentially leaves you with another problem.

I think it is fair to rule out old big messages (keep memory usage in check). Just creating a new reader (as done here) without consuming any more bytes will put you in the middle of a protobuf message, giving you an invalid read. If we want a reader that can drop large messages, then we need another implementation. But the current loop will happily produce errors and reinitialize the reader over and over again after until it did find a valid length field. The way protobuf is encoded, there are lengths fields for nested structures and strings all over the place. Unmarshaling will likely fail over and over again itself.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks. I'll keep reading into this.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered an alternative non-blocking mode?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...There's a non-blocking mode?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown

@urso urso Oct 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We run CloseClientWithFile in a separate go-routine (building an unbuffered 'channel'), because we don't want the API to block if CloseClientWithFile blocks. Due to the mutex here we do allow CreateClientWithConfig to block indirectly, though. Is this ok?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. If CreateClientWithConfig bombs out while the HTTP handler is waiting for it, the container startup will fail and an error will be returned to the user via the CLI or whatever they're using to start. Otherwise a container might fail after a user a started it, or we get into some weird scenario where a container is running but the user doesn't know it's not logging. I'm less worried about the container stop passing its errors to the user.

Copy link
Copy Markdown

@urso urso Oct 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify: We are ok with CreateClientWithConfig and startLoggingHandler blocking?

The WaitClose setting (currently 0 and not exposed) could force startLoggingHandler to block for as many seconds as are configured on this setting (no matter which queue).

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want similar behavior to other Beats, consider common.NewConfigFrom(...)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

common.NewConfigFrom(...) is called after parseCfgKeys. I had to hack this in because NewConfigFrom doesn't like the map[string]string structures we get from docker. I had to coerce the string value into an an interface wrapper so it'll handle config keys like ["192.168.1.2","192.168.1.3"]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, sounds like a bug in go-ucfg. It should happily accept map[string]string.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it'll accept map[string]string, but if you pass it something like

map[string]string{
    "output.elasticsearch.hosts" : `["192.168.1.2","192.168.1.3"]`
}

You get Elasticsearch url: http://['172.18.0.3:9200,'172.18.0.2']:9200

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I see. The -E flag is implemented in libbeat/cfgfile/cfgfile.go. It uses SettingFlag from libbeat/common/flags.go. The final flag parsing is implemented here: https://github.com/elastic/go-ucfg/blob/master/flag/value.go#L48

The value parser is an internal package, but maybe we can export it for use-cases like this: CLI settings getting passed indirectly.

@fearful-symmetry
Copy link
Copy Markdown
Contributor Author

Removed the panic()s from main and just printed to stderr, since I didn't want to import another log handler just to deal with the errors we get outside of the http handler.

@fearful-symmetry fearful-symmetry changed the base branch from master to feature/dockerbeat October 8, 2019 19:38
@fearful-symmetry fearful-symmetry marked this pull request as ready for review October 9, 2019 13:53
Copy link
Copy Markdown
Contributor

@exekias exekias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments, this looks good to me!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to restart the reader after an error?

Copy link
Copy Markdown
Contributor Author

@fearful-symmetry fearful-symmetry Oct 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, Steffen and I were discussing this. This was from docker's example, but it's almost certainly a terrible idea.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move this to an issue?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly? How do debug?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, the Issues? Yah, that's getting ported.

@fearful-symmetry
Copy link
Copy Markdown
Contributor Author

Iteratively removing the vendored deps, since I discovered that if you break something with govendor, it's not easy to fix with govendor.

@fearful-symmetry fearful-symmetry merged commit 8206e04 into elastic:feature/dockerbeat Oct 9, 2019
fearful-symmetry added a commit to fearful-symmetry/beats that referenced this pull request Nov 1, 2019
fearful-symmetry added a commit that referenced this pull request Dec 13, 2019
* Add a docker plugin - Elastic Log Driver (#13761)
fearful-symmetry added a commit to fearful-symmetry/beats that referenced this pull request Dec 16, 2019
* Add a docker plugin - Elastic Log Driver (elastic#13761)

(cherry picked from commit 4a7a8c3)
fearful-symmetry added a commit that referenced this pull request Dec 16, 2019
* Add a docker plugin - Elastic Log Driver (#13761)

(cherry picked from commit 4a7a8c3)
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

new beat Team:Integrations Label for the Integrations team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants