Adding concept of dataset to package#110

Merged: ruflin merged 10 commits into elastic:master from ruflin:input-directory on Oct 22, 2019

@ruflin (Collaborator) commented Sep 19, 2019

In the future, Metricbeat, Filebeat, and the agent will only support inputs, which makes inputs a first-class citizen in our stack. An input is essentially an agent configuration plus an ingest pipeline. At the moment, package content is organized around a single config for the Beat or agent, with all ingest pipelines kept in one place. This complicates two things:

* Knowing which ingest pipeline belongs to a specific input
* Building integrations with multiple inputs: https://github.com/elastic/integrations/pulls

Introducing this concept should simplify things: the package builder no longer has to avoid naming conflicts between ingest pipelines by inventing extra-long names. It should also simplify testing, since testing is often focused on individual inputs. With this change, all assets related to an input live together.

This PR includes an example of what such an input structure could look like. It is not meant to replace the existing location for ingest pipelines: if a user wants to build a package with only an ingest pipeline and no input, that should still be possible in the future.

The changed structure is described in the ASSET.md file.
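To make the "all assets for an input live together" idea concrete, here is a minimal Python sketch that groups a package's files by dataset directory. The `dataset/<name>/...` layout, the dataset names, and the ingest pipeline file path are hypothetical assumptions for illustration only; the real layout is the one in ASSET.md. Only the `agent/stream/config.yml` segment comes from this discussion.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical package file listing. The "dataset/<name>/..." layout and the
# pipeline file name are illustrative assumptions, not the layout from ASSET.md.
PACKAGE_FILES = [
    "manifest.yml",
    "dataset/access.log/agent/stream/config.yml",
    "dataset/access.log/elasticsearch/ingest-pipeline/default.json",
    "dataset/error.log/agent/stream/config.yml",
    "dataset/error.log/elasticsearch/ingest-pipeline/default.json",
]


def group_assets_by_dataset(paths):
    """Return {dataset_name: [asset paths relative to the dataset directory]}."""
    groups = defaultdict(list)
    for raw in paths:
        parts = PurePosixPath(raw).parts
        # Only files below "dataset/<name>/" belong to a dataset;
        # package-level files such as manifest.yml are skipped.
        if len(parts) >= 3 and parts[0] == "dataset":
            name = parts[1]
            groups[name].append(str(PurePosixPath(*parts[2:])))
    return dict(groups)


if __name__ == "__main__":
    for name, assets in sorted(group_assets_by_dataset(PACKAGE_FILES).items()):
        print(name, assets)
```

Because each pipeline sits next to the config that uses it, both datasets in this sketch can use the same short file name (`default.json`) without a naming conflict.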
@@ -0,0 +1,14 @@
# This is not an array on purpose to make sure only 1 single input is specified in this file.
Collaborator Author

In a meeting, @skh raised a good point: the term "input" has a double meaning here (it shows up twice in the path). We didn't reach a conclusion on what this should be called, but "dataset" was mentioned as an option.

@exekias @jsoriano @ph Any thoughts on this?

Collaborator Author

I have been thinking about this a bit more. A dataset is essentially a template for an input, together with all its assets. All inputs based on the access.log dataset look exactly the same in the end. An input can exist multiple times, but it is still the same dataset. So I think the name fits well here. Will rename.
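The template/instance relationship described above can be sketched in Python: a dataset is defined once, and any number of inputs are created from it with different settings. The `Dataset` and `Input` classes, their fields, the `paths` setting, and the pipeline path are hypothetical names chosen only to illustrate the idea; they are not part of the package format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    """Hypothetical template: the same config and pipeline for every input using it."""
    name: str
    config_file: str
    pipeline_file: str


@dataclass
class Input:
    """Hypothetical model of one configured instance of a dataset."""
    dataset: Dataset
    settings: dict = field(default_factory=dict)


access_log = Dataset(
    name="access.log",
    config_file="agent/stream/config.yml",
    pipeline_file="elasticsearch/ingest-pipeline/default.json",  # assumed path
)

# The same dataset instantiated twice, e.g. for two different web servers.
inputs = [
    Input(access_log, {"paths": ["/var/log/nginx/access.log"]}),
    Input(access_log, {"paths": ["/var/log/apache2/access.log"]}),
]

if __name__ == "__main__":
    # Two inputs, but both point at one and the same dataset definition.
    print(len(inputs), len({i.dataset for i in inputs}))
```

Freezing `Dataset` makes it hashable and signals that the template itself does not change per input; only the per-input `settings` differ.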

Collaborator Author

Just pushed a commit that renames it. One thing I realised is that the agent no longer has inputs, only streams, so the file above was renamed to agent/stream/config.yml. @ph Does this sound correct?

@ruflin mentioned this pull request Oct 21, 2019
type: metric

# Each input can be in its own release status
release: beta
Collaborator Author

@hbharding Some inputs can also be in beta. We probably need a design to indicate this on the "create data source" page, where inputs can be enabled or disabled.
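A minimal Python sketch of how a UI could decide which inputs get a beta marker, assuming each dataset manifest has already been parsed into a dict containing the `type` and `release` keys from the snippet above. The `beta_inputs` function, the `RELEASE_LEVELS` set (only "beta" appears in this PR), the default of "ga", and the example dataset names are all hypothetical. The single-mapping check mirrors the "not an array on purpose" comment on the new config file.

```python
# Hypothetical set of release levels; only "beta" appears in the PR itself.
RELEASE_LEVELS = {"experimental", "beta", "ga"}


def beta_inputs(manifests):
    """Return the sorted names of datasets whose manifest marks them as beta.

    `manifests` maps a dataset name to its already-parsed manifest (a dict).
    """
    flagged = []
    for name, manifest in manifests.items():
        # One mapping per file, never a list of inputs.
        if not isinstance(manifest, dict):
            raise ValueError(
                f"{name}: expected a single mapping, got {type(manifest).__name__}"
            )
        release = manifest.get("release", "ga")  # assumed default
        if release not in RELEASE_LEVELS:
            raise ValueError(f"{name}: unknown release level {release!r}")
        if release == "beta":
            flagged.append(name)
    return sorted(flagged)


if __name__ == "__main__":
    manifests = {  # hypothetical dataset names
        "cpu": {"type": "metric", "release": "ga"},
        "memory": {"type": "metric", "release": "beta"},
    }
    print(beta_inputs(manifests))
```

Keeping the release level per dataset, rather than per package, is what lets a single package mix GA and beta inputs on the same page.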

@ruflin marked this pull request as ready for review October 22, 2019 11:27
@ruflin (Collaborator, Author) commented Oct 22, 2019

I will merge this PR so that we have an example of a dataset in the repository for further discussion. There are still open questions around naming (dataset vs. input), but the basic structure should stay the same. Having it in the repository allows the EPM team to start implementing the structure and gives us feedback on whether it works as expected. All future changes will also be documented.

@ruflin merged commit e29b896 into elastic:master Oct 22, 2019
@ruflin changed the title from "Adding concept of input to package" to "Adding concept of dataset+ to package" Oct 22, 2019
@ruflin changed the title from "Adding concept of dataset+ to package" to "Adding concept of dataset to package" Oct 22, 2019