[Filebeat][httpjson] Add split_events_by config setting #19246
marc-gr merged 4 commits into elastic:master
Conversation
Pinging @elastic/siem (Team:SIEM)
P1llus
left a comment
Just a few small comments from my side; the rest looks good to me. I will test it once we have a solution that everyone agrees on.
It will surely need more reviewers than just me, though.
Testing is not really my field of expertise, but since we have one test with a key and one without, what happens if a key is empty? And what if we only have one object in the array? I am unsure whether that needs coverage or not, but I wanted to mention it.
Leaving a comment about our previous discussion; maybe @andrewkroh can pitch in on this. Do we want splitEvent to run on every single event even when splitEvent is not configured, handling the config checks inside the splitEvent method, or should we do it here instead?
Maybe renaming splitEvent to eventAsList or similar makes for a better name here? I wanted to avoid having two branches to handle the single event and the list inside the same case, but if it makes for better understanding, I am up for it.
I think splitEvent makes more sense, because it's the term that has been used over time for Logstash; then it's easier to think "yeah, split filters, just like splitEvent" :D
In terms of having two branches, the only reason I commented is that, based on my personal taste, I don't like running functions made for a specific use case when their conditions are not met.
Reading the code now, it looks as though splitEvent should always happen, until you scroll further down and see that it handles another if condition.
Let's wait and see if we can get another comment before making a decision.
FYI, the s3 input has "expand_event_list_from_field", which does something very similar; it might be worth comparing implementations.
I used a copy of this PR together with the new ATP module, and it seems to be splitting on both arrays correctly. The document had multiple events as an array under the "values" key, and each event also contained an "evidence" array of objects. One API response with 2 objects under "values", each with 2 objects under "evidence", correctly returned 4 documents, one for each evidence entry.
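The nested behavior described above can be sketched as a small recursive split. This is a hedged illustration, not the PR's actual implementation: the function, key names, and map types here are assumptions chosen for the example.

```go
package main

import "fmt"

// shallowCopy duplicates only the top level of a document; nested values
// are shared with the original.
func shallowCopy(doc map[string]interface{}) map[string]interface{} {
	cp := make(map[string]interface{}, len(doc))
	for k, v := range doc {
		cp[k] = v
	}
	return cp
}

// split recursively splits doc on the given keys, in nesting order.
// Non-array fields and exhausted key lists end the recursion.
func split(doc map[string]interface{}, keys []string) []map[string]interface{} {
	if len(keys) == 0 {
		return []map[string]interface{}{doc}
	}
	arr, ok := doc[keys[0]].([]interface{})
	if !ok {
		return []map[string]interface{}{doc}
	}
	var out []map[string]interface{}
	for _, v := range arr {
		cp := shallowCopy(doc)
		cp[keys[0]] = v
		if child, ok := v.(map[string]interface{}); ok {
			// Recurse into the nested object for the remaining keys.
			for _, n := range split(child, keys[1:]) {
				c2 := shallowCopy(cp)
				c2[keys[0]] = n
				out = append(out, c2)
			}
		} else {
			out = append(out, cp)
		}
	}
	return out
}

func main() {
	// 2 objects under "values", each with 2 entries under "evidence".
	resp := map[string]interface{}{
		"values": []interface{}{
			map[string]interface{}{"id": 1, "evidence": []interface{}{"a", "b"}},
			map[string]interface{}{"id": 2, "evidence": []interface{}{"c", "d"}},
		},
	}
	fmt.Println(len(split(resp, []string{"values", "evidence"}))) // 4
}
```

With the 2×2 input above this yields 4 documents, matching the behavior reported in the test.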
It seems that Clone() is not needed here, since the field always gets overridden below, so using m should work.
Isn't m storing the JSON object while mm stores the array inside? mm is a shallow copy, made to ensure that m is not modified while iterating over it here:
for _, split := range splitOn {
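The shallow-copy concern can be shown with plain maps. This is a hedged, minimal illustration of the pattern under discussion (not the input's actual code): range over the copy while writing only to the original, so the map being iterated is never mutated mid-iteration.

```go
package main

import "fmt"

func main() {
	m := map[string]interface{}{
		"token": "abc",
		"items": []interface{}{10, 20},
	}

	// Shallow copy: only top-level keys are duplicated; nested values
	// (like the slice under "items") are shared between m and mm.
	mm := make(map[string]interface{}, len(m))
	for k, v := range m {
		mm[k] = v
	}

	// Iterate the copy, mutate the original: the overwrite on m does not
	// disturb the range over mm.
	for k, v := range mm {
		if _, isList := v.([]interface{}); isList {
			m[k] = "replaced"
		}
	}
	fmt.Println(m["items"], mm["items"]) // replaced [10 20]
}
```

The copy still holds the original slice afterwards, which is exactly why iterating over it stays safe while m is rewritten.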
Would it be a good idea to create a split_event Filebeat processor instead of putting this just in the httpjson input, so that this functionality can be supported for other inputs as well?
Do you have any examples of other inputs that might benefit from this? I think in theory it makes sense, but maybe at a later date? The reason is that if splitting currently only works on a JSON structure, then the input should enforce that structure as well.
I agree with @P1llus, but in case we decide to do it as a separate processor, I am unsure about how to do it. The processor interface looks like:

```go
type Processor interface {
	String() string // print full processor description
	Run(in *Event) (event *Event, err error)
}
```

And for this use case we would need something like:

```go
type Processor interface {
	String() string // print full processor description
	Run(in *Event) (events []*Event, err error)
}
```

I am not sure if there is any other way of doing this that is not as a Processor, or whether to embed the list of resulting events in a single one, but that feels a bit odd to me if it is the case. WDYT?
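To make the hypothetical one-to-many interface above concrete, here is a hedged sketch of what a split processor could look like under it. The type names (`Event`, `MultiProcessor`, `SplitProcessor`) are illustrative, not Beats' actual processor types.

```go
package main

import "fmt"

// Event is a simplified stand-in for a Beats event.
type Event struct {
	Fields map[string]interface{}
}

// MultiProcessor is the hypothetical interface from the discussion:
// one input event may produce several output events.
type MultiProcessor interface {
	String() string
	Run(in *Event) ([]*Event, error)
}

// SplitProcessor emits one event per element of a slice-valued field.
type SplitProcessor struct{ Field string }

func (p SplitProcessor) String() string { return "split_event" }

func (p SplitProcessor) Run(in *Event) ([]*Event, error) {
	arr, ok := in.Fields[p.Field].([]interface{})
	if !ok {
		// Not a slice: pass the event through unchanged.
		return []*Event{in}, nil
	}
	out := make([]*Event, 0, len(arr))
	for _, v := range arr {
		// Shallow-copy the fields so sibling keys are preserved per event.
		fields := make(map[string]interface{}, len(in.Fields))
		for k, val := range in.Fields {
			fields[k] = val
		}
		fields[p.Field] = v
		out = append(out, &Event{Fields: fields})
	}
	return out, nil
}

func main() {
	p := SplitProcessor{Field: "values"}
	evts, _ := p.Run(&Event{Fields: map[string]interface{}{
		"values": []interface{}{1, 2, 3},
	}})
	fmt.Println(len(evts)) // 3
}
```

The sketch also shows why the current one-in/one-out `Run` signature cannot express this without embedding the list inside a single event.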
After you mentioned it, I do remember having looked at split in processors before. Currently a processor cannot create multiple documents, as it always runs once for each incoming event.
andrewkroh
left a comment
I think this looks good. One consideration is that the split filter in Logstash also allows for strings:
"The field being split can either be a string or an array." (https://www.elastic.co/guide/en/logstash/current/plugins-filters-split.html)
But since this isn't a generalized processor, it doesn't need to have the same features. Plus, we can add the ability to work on a []string if needed in the future.
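If string support were added later, it could look like a type switch over the split field, mirroring the Logstash split filter's string-plus-terminator semantics. This is a hedged sketch only; `splitField` and its behavior are assumptions, not code from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// splitField returns the elements to emit one event for: arrays are used
// as-is, strings are split on a terminator, anything else passes through.
func splitField(v interface{}, terminator string) []interface{} {
	switch t := v.(type) {
	case []interface{}:
		return t
	case string:
		parts := strings.Split(t, terminator)
		out := make([]interface{}, len(parts))
		for i, p := range parts {
			out[i] = p
		}
		return out
	default:
		return []interface{}{v}
	}
}

func main() {
	fmt.Println(len(splitField("a\nb\nc", "\n")))            // 3
	fmt.Println(len(splitField([]interface{}{1, 2}, "\n"))) // 2
}
```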
What does this PR do?
Adds a `split_events_by` setting to the `httpjson` input, allowing mechanics similar to the `split` filter for Logstash.
Why is it important?
There are many use cases where a list of elements is returned in an API response, and we want to turn each of them into a single event.
Checklist
`CHANGELOG.next.asciidoc` or `CHANGELOG-developer.next.asciidoc`.