
[Discussion] Adding support/helpers/processors for XML in libbeat #23366

@P1llus


This issue is to discuss potential implementations of XML support for Beats. Looking through the different open issues, there are plenty of places in which some sort of XML support would be beneficial.

However, there are pros and cons to all of them, which is why I wanted to open this discussion to get people's viewpoints.

As for XML in general: unlike JSON, the XML encoder in Go (encoding/xml) does not support unmarshalling into an interface as a built-in feature. There are libraries out there that take care of a lot of that, also in terms of performance. However, the discussion does not really need to focus on tooling, as the scope is more important at this stage.
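To make the difference concrete, here is a minimal sketch using only the Go standard library. The function names `decodeJSON` and `decodeXML` and the sample payloads are made up for illustration. It shows JSON decoding into a generic map, while the same attempt with encoding/xml returns an error because a map is not a supported target type.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
)

// decodeJSON decodes a JSON document into a generic map; no schema is needed.
func decodeJSON(data []byte) (map[string]interface{}, error) {
	var out map[string]interface{}
	err := json.Unmarshal(data, &out)
	return out, err
}

// decodeXML attempts the same with encoding/xml. A map is not a supported
// target type, so this returns an error rather than a populated map.
func decodeXML(data []byte) (map[string]interface{}, error) {
	var out map[string]interface{}
	err := xml.Unmarshal(data, &out)
	return out, err
}

func main() {
	j, jerr := decodeJSON([]byte(`{"host":"web-1","status":200}`))
	fmt.Println("json:", j, jerr)

	x, xerr := decodeXML([]byte(`<event><host>web-1</host></event>`))
	fmt.Println("xml:", x, xerr)
}
```

With encoding/xml alone, the usual alternative is to declare a struct per document type, which is what a generic helper would avoid.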

As far as I can see, there are a few places in which we can add this:

1. Adding it as a new helper in libbeat common, similar to jsontransform and plenty of others.
Pros:
This is handy because it allows input developers to use the helper instead of either rewriting XML handling each time or implementing different types of functionality.
Compared to a processor, handling the XML in the input, before the queue, is beneficial in many ways. For example, processors do not support splitting of lists, which is a very common use case when working with similar JSON structures. Other use cases would be using the keys or values for any sort of conditional tagging, parsing, or other transformations needed during ingest.

Cons:
Each input would need to manually add support for this.
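As a rough illustration of what such a helper could look like, here is a sketch built only on the standard library's token decoder. The names `xmlToMap`, `decodeElement`, and `asList`, the `@` prefix for attributes, and the sample document are all assumptions made for this example, not an existing libbeat API. Repeated child elements are collected into a list, which an input could then split into separate events before they reach the queue.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"strings"
)

// decodeElement reads tokens until the end of the current element.
// Leaf elements become strings, elements with children or attributes
// become maps, and repeated child names become lists. Text that is
// mixed with child elements or attributes is dropped in this sketch.
func decodeElement(dec *xml.Decoder, start xml.StartElement) (interface{}, error) {
	children := map[string]interface{}{}
	for _, a := range start.Attr {
		children["@"+a.Name.Local] = a.Value // "@" prefix is an arbitrary choice
	}
	var text strings.Builder
	for {
		tok, err := dec.Token()
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			child, err := decodeElement(dec, t)
			if err != nil {
				return nil, err
			}
			name := t.Name.Local
			switch existing := children[name].(type) {
			case nil:
				children[name] = child
			case []interface{}:
				children[name] = append(existing, child)
			default:
				children[name] = []interface{}{existing, child}
			}
		case xml.CharData:
			text.Write(t) // copy now; the bytes are only valid until the next Token call
		case xml.EndElement:
			if len(children) == 0 {
				return strings.TrimSpace(text.String()), nil
			}
			return children, nil
		}
	}
}

// xmlToMap converts a document into a map keyed by the root element name.
func xmlToMap(data []byte) (map[string]interface{}, error) {
	dec := xml.NewDecoder(bytes.NewReader(data))
	for {
		tok, err := dec.Token()
		if err != nil {
			return nil, err // io.EOF here means no root element was found
		}
		if start, ok := tok.(xml.StartElement); ok {
			value, err := decodeElement(dec, start)
			if err != nil {
				return nil, err
			}
			return map[string]interface{}{start.Name.Local: value}, nil
		}
	}
}

// asList normalizes a value so callers can always range over it,
// whether the element appeared once or several times.
func asList(v interface{}) []interface{} {
	if v == nil {
		return nil
	}
	if list, ok := v.([]interface{}); ok {
		return list
	}
	return []interface{}{v}
}

func main() {
	sample := `<alerts source="fw-1"><alert><id>1</id></alert><alert><id>2</id></alert></alerts>`
	doc, err := xmlToMap([]byte(sample))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	alerts := doc["alerts"].(map[string]interface{})
	// Each entry could be published as its own event by the input.
	for _, item := range asList(alerts["alert"]) {
		fmt.Println(item)
	}
}
```

A real helper would also need to decide how to handle namespaces, mixed content, and type conversion, which this sketch leaves out.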

2. Adding it as a new processor in libbeat, that allows any specific beat type to
Pros:
Anyone can use it, just as with any other processor, which makes it easy to cover a much larger scope.

Cons:
As noted in the pros of option 1, a processor does not make it possible to split or format the data beforehand.

3. Adding an XML processor for ingest pipelines
Pros:
Anyone can use it, also outside of Beats, similar to how the current Logstash XML filter functions.

Cons:
Currently, ingest pipelines do not support splitting functionality, and the overhead created by XML is large. Transforming it to JSON on the beat before sending would reduce the overhead significantly.
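If the overhead is read as payload size, a small comparison gives a feel for it. The sketch below encodes the same record as XML and as JSON with the standard library and compares the byte lengths. The `record` type, its fields, and `encodedSizes` are hypothetical and exist only for this comparison; real savings depend on the documents involved.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
)

// record is a hypothetical event shape used only for this comparison.
type record struct {
	XMLName  xml.Name `xml:"record" json:"-"`
	Hostname string   `xml:"hostname" json:"hostname"`
	Severity string   `xml:"severity" json:"severity"`
	Message  string   `xml:"message" json:"message"`
}

// encodedSizes returns the byte length of the same record encoded
// as XML and as JSON.
func encodedSizes(r record) (xmlLen, jsonLen int, err error) {
	x, err := xml.Marshal(r)
	if err != nil {
		return 0, 0, err
	}
	j, err := json.Marshal(r)
	if err != nil {
		return 0, 0, err
	}
	return len(x), len(j), nil
}

func main() {
	r := record{Hostname: "web-1", Severity: "high", Message: "login failed"}
	xmlLen, jsonLen, err := encodedSizes(r)
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Printf("xml: %d bytes, json: %d bytes\n", xmlLen, jsonLen)
}
```

The difference comes from XML repeating every element name in a closing tag, so it grows with the number of fields and the length of their names.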

My own opinion on the subject is that all three are viable and useful, and could be implemented. In order of preference, I would rank them in the same order as above, especially since the helper created in libbeat could later be used in the processor implementation as well.

Any thoughts, or thumbs up/down?


Labels: Team:Integrations, discuss, libbeat
