Beats event processing and default fields by urso · Pull Request #10801 · elastic/beats

urso · 2019-02-18T18:41:46Z

This change moves the generation of the event processing into its own
distinct package, such that the actual publisher pipeline will not
define any processors anymore. A new instance of a publisher pipeline
must not add fields on its own.

With this change we convert the event processing pipeline into the 'Supporter'
pattern, which is already used for Index Management.
As different beats ask for slightly different behavior in the event
processing (e.g. normalize, default builtins and so on), the
processing.Supporter can be used for customizations.

Also fixes new fields being accidentally added to the monitoring outputs, by separating the pipeline from the processors.
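A minimal sketch of the Supporter pattern described above, assuming heavily simplified types. Every name here (`Event`, `ClientConfig`, `NewDefaultSupport`, `noFieldsSupport`) is hypothetical and not the actual libbeat API: a `SupportFactory` turns the global configuration into a `Supporter`, and the `Supporter` creates one `Processor` per client. `noFieldsSupport` illustrates, under the same assumptions, how a monitoring pipeline could be handed a supporter that adds no fields at all.

```go
package main

import "fmt"

// Event is a hypothetical, simplified stand-in for beat.Event.
type Event struct {
	Fields map[string]interface{}
}

// Processor transforms an event before it is queued.
type Processor interface {
	Run(e *Event) (*Event, error)
}

// ClientConfig is a hypothetical per-client (local) processing config.
type ClientConfig struct {
	Fields map[string]interface{}
}

// Supporter creates the processor used by a single pipeline client.
type Supporter interface {
	Create(cfg ClientConfig) (Processor, error)
}

// SupportFactory turns the global beat configuration into a Supporter.
type SupportFactory func(globalFields map[string]interface{}) (Supporter, error)

// addFields copies a fixed set of fields into every event.
type addFields struct {
	fields map[string]interface{}
}

func (p addFields) Run(e *Event) (*Event, error) {
	if e.Fields == nil {
		e.Fields = map[string]interface{}{}
	}
	for k, v := range p.fields {
		e.Fields[k] = v
	}
	return e, nil
}

// defaultSupport applies global fields, then the client's local fields
// (assumption for this sketch: local values win on conflicts).
type defaultSupport struct {
	global map[string]interface{}
}

func (s defaultSupport) Create(cfg ClientConfig) (Processor, error) {
	merged := map[string]interface{}{}
	for k, v := range s.global {
		merged[k] = v
	}
	for k, v := range cfg.Fields {
		merged[k] = v
	}
	return addFields{fields: merged}, nil
}

// NewDefaultSupport is a SupportFactory for ordinary beats (hypothetical).
func NewDefaultSupport(global map[string]interface{}) (Supporter, error) {
	return defaultSupport{global: global}, nil
}

// noFieldsSupport never adds fields, e.g. for a monitoring pipeline.
type noFieldsSupport struct{}

func (noFieldsSupport) Create(ClientConfig) (Processor, error) {
	return addFields{}, nil
}

func main() {
	var factory SupportFactory = NewDefaultSupport
	support, err := factory(map[string]interface{}{"env": "prod"})
	if err != nil {
		panic(err)
	}
	proc, _ := support.Create(ClientConfig{Fields: map[string]interface{}{"module": "system"}})
	ev, _ := proc.Run(&Event{})
	fmt.Println(ev.Fields) // map[env:prod module:system]
}
```

Because the pipeline only ever talks to a `Supporter`, each beat (or the monitoring reporter) can choose which fields get added without the pipeline itself defining any processors.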

Simplifies tests, but also adds a few test cases for dynamic fields and other settings.


kvch · 2019-02-25T10:55:22Z

libbeat/publisher/processing/processing.go

To me this name is too abstract for what the function does, like calling a type Handler. What do you think about naming these processing.ConfigApplier and processing.ConfigApplierFactory or processing.ConfigResolver and processing.ConfigResolverFactory. So the concrete functions would be named processing.NewBeatConfigApplier/processing.NewBeatConfigResolver and processing.NewObserverConfigApplier/processing.NewObserverConfigResolver. WDYT?

The names mostly mimic index management, which uses idxmgmt.Supporter and so on. I used the same name to have a consistent naming scheme for similar functionality/patterns: ilm.Supporter, idxmgmt.Supporter, and processing.Supporter. Common patterns should have somewhat common names.
Maybe a better name would have been Provider or Feature, as I see these as overridable libbeat features that were originally hard coded.

While the constructors get an input config, I'd not name them ConfigApplier or ConfigResolver. If possible, use a single noun over a multi-noun name, e.g. Configurer or Featurer. (I try not to make my names sound like Twitter messages.)

All these packages somewhat follow the strategy, abstract factory, and factory method patterns; the latter mostly by chance.

The pattern found in ilm, idxmgmt, and now processing goes like this:

The beat instance uses a SupporterFactory to provide its own subsystems with an actual feature implementation (Strategy):

The SupportFactory is fed with the beat's global configuration in order to unpack configs and prepare its own state.

Supporter is the actual feature/strategy presented to a sub-system.

The ilm, idxmgmt, and publisher pipeline subsystems themselves act as builders/constructors/factories for other subsystems, which can pass additional parameters.

Examples:

The Elasticsearch output uses the idxmgmt subsystem on connect.

The ES output uses the idxmgmt subsystem to create an index selector.

setup uses the idxmgmt subsystem with a custom ES output.

The publisher pipeline uses processing to set up the final event processing pipeline.

As these subsystems act as factories for other components, the ilm/idxmgmt/processing.Supporter follows the Factory Method pattern, e.g. BuildSelector, Manager. This is not a deliberate commonality; it happens only by chance, as the other subsystems expect factories.

The final instances generated by the Supporter(s) are the strategies that run within a subsystem's context, e.g. idxmgmt.Manager, beat.Processor.
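To make the factory-method/strategy split concrete, here is a hedged sketch in the spirit of idxmgmt. Only the name `BuildSelector` comes from the discussion; `IndexSupporter`, `prefixSupporter`, `prefixSelector`, and `connectOutput` are hypothetical, and the index-naming logic is invented purely for illustration. The supporter is prepared from global config, the output subsystem calls its factory method with an extra parameter of its own, and the returned selector is the strategy that runs inside the output.

```go
package main

import (
	"fmt"
	"strings"
)

// Selector is the strategy that runs inside the output (hypothetical shape).
type Selector interface {
	Select(fields map[string]string) string
}

// IndexSupporter is prepared from the global beat config and exposes a
// factory method the output subsystem calls later (hypothetical).
type IndexSupporter interface {
	BuildSelector(outputIndex string) (Selector, error)
}

type prefixSupporter struct{ beatName string }

func (s prefixSupporter) BuildSelector(outputIndex string) (Selector, error) {
	if outputIndex == "" {
		// Fall back to a default derived from the global config.
		outputIndex = s.beatName
	}
	return prefixSelector{prefix: outputIndex}, nil
}

type prefixSelector struct{ prefix string }

func (s prefixSelector) Select(fields map[string]string) string {
	if ds, ok := fields["dataset"]; ok {
		return s.prefix + "-" + strings.ToLower(ds)
	}
	return s.prefix
}

// connectOutput plays the subsystem: it receives the supporter and
// passes its own parameter (the output's index setting) to the factory method.
func connectOutput(sup IndexSupporter, outputIndex string) (Selector, error) {
	return sup.BuildSelector(outputIndex)
}

func main() {
	sel, _ := connectOutput(prefixSupporter{beatName: "metricbeat"}, "")
	fmt.Println(sel.Select(map[string]string{"dataset": "System"})) // metricbeat-system
}
```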

Actually, we don't really need the definitions of Supporter/SupporterFactory in these packages. Alternatively, we could remove them and move the interface definitions to the instance package. But as they are still passed around into other packages, I defined these interface types for convenience.

I'm fine with finding better names in general. How about Feature and FeatureFactory? No matter which names we choose, we will have to update the other packages in a follow-up PR as well.

Note: processing.Supporter should be an interface, not a function type.
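One reason to prefer an interface: implementations can carry state (parsed global config, loggers) and gain methods later, while a plain function can still be plugged in through an adapter in the style of `net/http`'s `HandlerFunc`. The sketch below assumes simplified types; `SupporterFunc` and `passThrough` are hypothetical helpers, not part of libbeat.

```go
package main

import (
	"errors"
	"fmt"
)

// Processor is a hypothetical, simplified event processor.
type Processor interface {
	Run(fields map[string]string) (map[string]string, error)
}

// Supporter as an interface: implementations may hold state and the
// interface can be extended without changing every caller's signature.
type Supporter interface {
	Create(local map[string]string) (Processor, error)
}

// SupporterFunc adapts a plain function to the Supporter interface
// (hypothetical helper, modeled on http.HandlerFunc).
type SupporterFunc func(local map[string]string) (Processor, error)

// Create calls the wrapped function.
func (f SupporterFunc) Create(local map[string]string) (Processor, error) { return f(local) }

// passThrough returns events unchanged.
type passThrough struct{}

func (passThrough) Run(f map[string]string) (map[string]string, error) { return f, nil }

func main() {
	var s Supporter = SupporterFunc(func(local map[string]string) (Processor, error) {
		if local == nil {
			return nil, errors.New("missing local config")
		}
		return passThrough{}, nil
	})
	p, err := s.Create(map[string]string{})
	fmt.Println(p != nil, err) // true <nil>
}
```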

Thank you for the detailed explanation. In this case I am ok with sticking with Supporter. I haven't been able to come up with anything better. :(
+1 on being consistent over packages.

The root cause has already been fixed, by accident, by elastic#10801. This selectively backports the checks to 6.7, as elastic#10801 is too much of a change.

simitt

I think in the long run it would make sense to also export the individual processors, so they are reusable. At the moment, if one needs a small adaptation, the whole Processor implementation has to be re-implemented.


simitt · 2019-02-26T13:33:04Z

libbeat/cmd/instance/beat.go

Why is the ProcessorFactory defined on the root cmd level, but the ProcessingConfig on the pipeline level? Would it make sense to move the processorFactory also to the processing config level?

Beats do not publish events directly to the queue in the publisher pipeline, but do so via a Client.
The pipeline, including global processors and fields settings, queue, and outputs, is created on startup. At that point no go-routines are collecting data yet.
Go-routines are supposed to call Connect to connect to the publisher pipeline. This returns a beat.Client (Client instances are not guaranteed to be thread safe). Each client/go-routine is allowed to configure local processors/fields/tags, which get merged with the global settings. The factory is the global entity, loading, checking, and preparing global configurations on startup. The ProcessingConfig specifies the per-go-routine local processing, which might be established during Beat module initialization or much later via autodiscovery.
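The global-versus-per-client split described above can be sketched as follows. This is a simplified model, not the libbeat pipeline: `Pipeline`, `Client`, `LocalConfig`, and the `published` slice (standing in for the queue) are hypothetical, and the precedence rules (local fields override global ones, global tags come first) are assumptions made for the sketch. Global settings are prepared once; each go-routine connects and gets its own client with the merged settings.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical, simplified event.
type Event struct {
	Fields map[string]string
	Tags   []string
}

// LocalConfig is the per-client processing config (hypothetical).
type LocalConfig struct {
	Fields map[string]string
	Tags   []string
}

// Pipeline holds global settings prepared once at startup.
type Pipeline struct {
	globalFields map[string]string
	globalTags   []string
	mu           sync.Mutex
	published    []Event // stands in for the queue
}

// Client is meant for a single go-routine; it is not safe for concurrent use.
type Client struct {
	pipeline *Pipeline
	fields   map[string]string
	tags     []string
}

// Connect merges global and local settings once, when the client is created.
func (p *Pipeline) Connect(cfg LocalConfig) *Client {
	fields := map[string]string{}
	for k, v := range p.globalFields {
		fields[k] = v
	}
	for k, v := range cfg.Fields {
		fields[k] = v // assumption: local settings override global ones
	}
	tags := append(append([]string{}, p.globalTags...), cfg.Tags...)
	return &Client{pipeline: p, fields: fields, tags: tags}
}

// Publish applies the client's merged settings and enqueues the event.
func (c *Client) Publish(e Event) {
	if e.Fields == nil {
		e.Fields = map[string]string{}
	}
	for k, v := range c.fields {
		e.Fields[k] = v
	}
	e.Tags = append(e.Tags, c.tags...)
	c.pipeline.mu.Lock()
	c.pipeline.published = append(c.pipeline.published, e)
	c.pipeline.mu.Unlock()
}

func main() {
	p := &Pipeline{globalFields: map[string]string{"env": "prod"}, globalTags: []string{"global"}}
	var wg sync.WaitGroup
	for _, module := range []string{"system", "nginx"} {
		wg.Add(1)
		go func(module string) {
			defer wg.Done()
			// Each go-routine connects and owns its client.
			client := p.Connect(LocalConfig{Fields: map[string]string{"module": module}})
			client.Publish(Event{})
		}(module)
	}
	wg.Wait()
	fmt.Println(len(p.published)) // 2
}
```

Merging at Connect time keeps the per-event work small and lets clients created much later (e.g. via autodiscovery) reuse the same global state.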

ph · 2019-02-27T19:32:23Z

OK, I went through the code here, LGTM. I also agree with @simitt's comment that aligning the naming would be great, even if the packages have different responsibilities.

ph · 2019-02-25T16:27:09Z

libbeat/publisher/processing/default.go

Should we instead use https://github.com/elastic/ecs/blob/master/code/go/ecs/version.go ?

The above comment is more about how we will keep the value in sync between releases; I think we will forget. If we rely on the version from the official package, the version will be updated every time we update the vendored libraries.

ph · 2019-02-27T19:34:58Z

I think in the long run it would make sense to also export the individual processors, so they are reusable. At the moment, if one needs a small adaptation, the whole Processor implementation has to be re-implemented.

I agree with the above.


urso · 2019-03-07T03:16:33Z

I refrained from exporting more functionality for now. The PR is already big enough. There is quite a lot of legacy code regarding processors all over the place that needs cleanup. I actually extracted the processors.AddFields/AddTags processors from this original code base a few weeks ago. If there is a need, we can export some more. The only noteworthy processors we might export in the future are generalizeProcessor and debugPrintProcessor.
Maybe the group one too, but that one overlaps with some other legacy functionality in the processors package :(

Most of the logic is actually within the builder itself.

simitt · 2019-03-07T08:40:52Z

libbeat/publisher/processing/default.go

nit: comment not updated to SupportFactory

ah, gorename fun :)

simitt · 2019-03-07T08:43:14Z

libbeat/publisher/processing/default_test.go

nit: also rename the variable p.

ph

LGTM, just a small typo.


This change moves the generation of the event processing into its own distinct package, such that the actual publisher pipeline will not define any processors anymore. A new instance of a publisher pipeline must not add fields on its own. This change converts the event processing pipeline into the 'Supporter' pattern, which is already used for Index Management. As different beats ask for slightly different behavior in the event processing (e.g. normalize, default builtins and so on), the `processing.Supporter` can be used for customizations.

urso · 2019-03-08T15:52:03Z

Travis was green, but codecov upload failed.
An unrelated metricbeat test failed with a connection problem.

This change moves the generation of the event processing into its own distinct package, such that the actual publisher pipeline will not define any processors anymore. A new instance of a publisher pipeline must not add fields on its own. This change converts the event processing pipeline into the 'Supporter' pattern, which is already used for Index Management. As different beats ask for slightly different behavior in the event processing (e.g. normalize, default builtins and so on), the `processing.Support` can be used for customizations. (cherry picked from commit 83dfb2f)

…11155) Cherry-pick of PR #10801 to 7.0 branch. Original message: This change moves the generation of the event processing into its own distinct package, such that the actual publisher pipeline will not define any processors anymore. A new instance of a publisher pipeline must not add fields on its own. With this change we convert the event processing pipeline into the 'Supporter' pattern, which is already used for Index Management. As different beats ask for slightly different behavior in the event processing (e.g. normalize, default builtins and so on), the `processing.Supporter` can be used for customizations. Also fixes new fields being accidentally added to the monitoring outputs, by separating the pipeline from the processors. Simplifies tests, but also adds a few test cases for dynamic fields and other settings.


urso added the in progress and libbeat labels Feb 18, 2019

urso requested review from a team as code owners February 18, 2019 18:41

houndci-bot reviewed Feb 18, 2019


urso force-pushed the beats-default-fields branch from a58a56c to 4043f14 Compare February 18, 2019 20:19

ph self-assigned this Feb 18, 2019

urso force-pushed the beats-default-fields branch 2 times, most recently from e8f1fef to 7a0259a Compare February 21, 2019 11:40

houndci-bot reviewed Feb 21, 2019


libbeat/publisher/processing/default.go Outdated

urso mentioned this pull request Feb 21, 2019

Revert "Introduce SkipAddHostName setting. (#10728)" #10769

Merged

houndci-bot reviewed Feb 22, 2019


urso changed the title ~~[WIP] Beats event processing and default fields~~ Beats event processing and default fields Feb 22, 2019

urso added the review label and removed the in progress label Feb 22, 2019

kvch reviewed Feb 25, 2019


kvch approved these changes Feb 25, 2019


urso mentioned this pull request Feb 25, 2019

Panic: fatal error: concurrent map iteration and map write #10824

Closed

urso mentioned this pull request Feb 25, 2019

Backport: Fix panic if user sets custom fields starting with host #10935

Merged

simitt reviewed Feb 26, 2019


ph approved these changes Feb 27, 2019


urso added the blocker label Mar 4, 2019

urso mentioned this pull request Mar 4, 2019

Bring back -d publish behavior #11053

Closed

urso force-pushed the beats-default-fields branch from 367ba9c to d9ab9e9 Compare March 7, 2019 03:16

simitt approved these changes Mar 7, 2019


ph approved these changes Mar 8, 2019


urso removed the request for review from a team March 8, 2019 14:20

urso added 13 commits March 8, 2019 15:23

fix stress runner

4c80ef3

fix functionbeat build

ccfeeac

fix default ecs fields

0c3f293

and unit tests

95dd496

Fix debug printer to respect the logging selector

d58a611

fix error return

02e216a

godoc

c6ecd82

typo

53202ac

turn processing.Supporter into interface

73895c8

review + minor cleanups

7400c52

missing renames

73a6f0a

typo

d5ea8b6

urso force-pushed the beats-default-fields branch from 84b9071 to d5ea8b6 Compare March 8, 2019 14:24

urso merged commit 83dfb2f into elastic:master Mar 8, 2019

urso mentioned this pull request Mar 8, 2019

Cherry-pick #10801 to 7.0: Beats event processing and default fields #11155

Merged

urso added the v7.0.0 label Mar 8, 2019

urso pushed a commit to urso/beats that referenced this pull request Mar 8, 2019

Add missing changelog for elastic#10801

b68ceaa

urso pushed a commit to urso/beats that referenced this pull request Mar 8, 2019

Add missing changelog for elastic#10801 (elastic#11166)

c9d35e5

simitt mentioned this pull request Mar 11, 2019

Ensure libbeat does not set host.name elastic/apm-server#1846

Closed

urso deleted the beats-default-fields branch May 9, 2019 18:48


5 participants