Beats event processing and default fields#10801
Conversation
a58a56c to
4043f14
Compare
e8f1fef to
7a0259a
Compare
There was a problem hiding this comment.
To me this name is too abstract for what the function does, like calling a type Handler. What do you think about naming these processing.ConfigApplier and processing.ConfigApplierFactory or processing.ConfigResolver and processing.ConfigResolverFactory. So the concrete functions would be named processing.NewBeatConfigApplier/processing.NewBeatConfigResolver and processing.NewObserverConfigApplier/processing.NewObserverConfigResolver. WDYT?
There was a problem hiding this comment.
The names mostly mimic index management which says idxmgmt.Supporter and so on. I used the same name for having a consistent naming scheme on similar functionality/patterns. Names ilm.Supporter, idxmgmt.Supporter, and processing.Supporter. Common patterns should have somewhat common names.
Maybe a better named would have been Provider or Feature, as I see these as overwritable libbeat features that originally have been hard coded.
While the constructors get an input config, I'd not name them ConfigApplier or ConfigResolver. If possible use a single noun over a multili-noun name. E.g. Configurer or Featurer. (I try not to make my names sound like twitter messages)
All these packages somewhat follow the strategy, abstract factory, and factory method patterns. The later mostly by chance.
The pattern found in ilm, idxmgmt, and now processing goes like this:
- beat instance use
SupporterFactoryin order to provide it's own subsystems with some actual features implementation (Strategy):- The
SupportFactoryis feed with the beats global configuration in order to unpack configs and prepare its own state.
- The
- Supporter is the actual feature/strategy presented to a sub-system.
- ilm, idxmgmt, publisher pipeline Subsystems act themselves as builders/constructors/factories for other subsystems that can pass additional parameters.
- examples:
- Elasticsearch output uses idxmgmt subsystem on connect
- ES output uses idxmgmt subsystem for creating an index selector
- setup uses idxmgmt subsystem with custom ES output
- publisher pipeline uses processing in order to setup the final event processing pipeline.
- As these subsystems act as Factories for other components the
ilm/idxmgmt/processing.Supporterfollows theFactory Methodpattern. e.g.BuildSelector,Manager. This is no commonalty, but only by chance, as the other subsystems expect Factories.
- examples:
- The final instances generated by the
Supporter(s)are the Strategies that run within a subsystems context: e.g. idxmgmt.Manager, beat.Processor.
Actually we don't really need the definitions of Supporter/SupporterFactory in these packages. Alternative we can remove them and move the interface definitions to the instance package. But as they are still passed around into other packages I defined these interface types for convenience.
I'm fine with finding better, names in general. How about Feature and FeatureFactory? No matter which names we choose, we will have to update the other packages in a followup PR as well.
Note: The processing.Supporter should be an interface, no function type.
There was a problem hiding this comment.
Thank you for the detailed explanation. In this case I am ok with sticking with Supporter. I haven't been able to come up with anything better. :(
+1 on being consistent over packages.
The root cause has been fixed by elastic#10801 by accident already. This selectively backports the checks to 6.7, as elastic#10801 is to much of a change.
simitt
left a comment
There was a problem hiding this comment.
I think in the long run it would make sense to also export the single processors, so they are reusable. At the moment, if one needs a small adaption, the whole Processor implementation would need to be re-implemented.
libbeat/cmd/instance/beat.go
Outdated
There was a problem hiding this comment.
Why is the ProcessorFactory defined on the root cmd level, but the ProcessingConfig on the pipeline level? Would it make sense to move the processorFactory also to the processing config level?
There was a problem hiding this comment.
Beats do not directly publish methods to the queue in the publisher pipeline, but do so via a Client.
The pipeline including global processors and fields settings, queue, and outputs is created on startup. There are no go-routines collecting data yet.
Go-routines are supposed to use Connect, so to connect to a publisher pipeline. This returns a beat.Client (Client instances are not guaranteed to be thread safe). Each client/go-routine is allowed to configure local processors/fields/tags, which gets merged with the global settings. The factory is the global entity, loading, checking, and preparing global configurations on startup. The ProcessingConfig specifies the per go-routine local processing, which might be eventually established due to the Beat modules initialization or much later via autodiscovery.
|
OKm I went through the code here, LGTM, I also agree with @simitt's comment that if we can align the naming that would be great even if the packages have different responsabilities. |
There was a problem hiding this comment.
Should we instead use https://github.com/elastic/ecs/blob/master/code/go/ecs/version.go ?
There was a problem hiding this comment.
The above comment is more, how we will keep the value in sync between release, I think we will forget. if we rely on the version from the official package the version will be updated every time we update the vendor libraries.
I agree with the above. |
|
I refrained from exporting more functionality for now. The PR is already big enough. There is quite some legacy code regarding processors all over the place, that needs cleanup. I actually extracted the Most of the logic is actually within the builder itself. |
367ba9c to
d9ab9e9
Compare
There was a problem hiding this comment.
nit: comment not updated to SupportFactory
There was a problem hiding this comment.
nit: also rename the variable p.
This changes moves the generation of the event processing into it's distinct package, such that the actual publisher pipeline will not define any processors anymore. A new instance of a publisher pipeline must not add fields on it's own. This change converts the event processing pipline into the 'Supporter' pattern, which is already used for Index Management. As different beats ask for slightly different behavior in the event processing (e.g. normalize, default builtins and so on), the `processing.Supporter` can be used for customizations.
84b9071 to
d5ea8b6
Compare
|
Travis was green, but codecov upload failed. |
This changes moves the generation of the event processing into it's distinct package, such that the actual publisher pipeline will not define any processors anymore. A new instance of a publisher pipeline must not add fields on it's own. This change converts the event processing pipline into the 'Supporter' pattern, which is already used for Index Management. As different beats ask for slightly different behavior in the event processing (e.g. normalize, default builtins and so on), the `processing.Support` can be used for customizations. (cherry picked from commit 83dfb2f)
…11155) Cherry-pick of PR #10801 to 7.0 branch. Original message: This changes moves the generation of the event processing into it's distinct package, such that the actual publisher pipeline will not define any processors anymore. A new instance of a publisher pipeline must not add fields on it's own. With this change we convert the event processing pipline into the 'Supporter' pattern, which is already used for Index Management. As different beats ask for slightly different behavior in the event processing (e.g. normalize, default builtins and so on), the `processing.Supporter` can be used for customizations. Also fixes new fields accidentily being added to the monitoring outputs, as it separates the pipeline and processors. Simplifies tests, but also adds a few test cases for dynamic fields and other settings.
This changes moves the generation of the event processing into it's distinct package, such that the actual publisher pipeline will not define any processors anymore. A new instance of a publisher pipeline must not add fields on it's own. This change converts the event processing pipline into the 'Supporter' pattern, which is already used for Index Management. As different beats ask for slightly different behavior in the event processing (e.g. normalize, default builtins and so on), the `processing.Support` can be used for customizations.
This changes moves the generation of the event processing into it's
distinct package, such that the actual publisher pipeline will not
define any processors anymore. A new instance of a publisher pipeline
must not add fields on it's own.
With this change we convert the event processing pipline into the 'Supporter'
pattern, which is already used for Index Management.
As different beats ask for slightly different behavior in the event
processing (e.g. normalize, default builtins and so on), the
processing.Supportercan be used for customizations.Also fixes new fields accidentily being added to the monitoring outputs, as it separates the pipeline and processors.
Simplifies tests, but also adds a few test cases for dynamic fields and other settings.