Skip to content

[Ingest] Use agent.yml template when creating a datasource stream from a package#62631

Merged
nchaulet merged 10 commits intoelastic:masterfrom
nchaulet:feature-use-agentyml-template-for-stream
Apr 15, 2020
Merged

[Ingest] Use agent.yml template when creating a datasource stream from a package#62631
nchaulet merged 10 commits intoelastic:masterfrom
nchaulet:feature-use-agentyml-template-for-stream

Conversation

@nchaulet
Copy link
Copy Markdown
Member

@nchaulet nchaulet commented Apr 6, 2020

Description

Resolves #62330
We were not using the yaml template (agent/stream.yml) from the package manager to create datasource stream from a package.

I fixed that by using the template in the POST and PUT datasources API.

Change

  • Modify the stored datasource to have a property pkg_stream that contains variable from the package.
  • Change the storedDatasourceStream to agent configurationStream, to not include variable(config) and to include pkg_stream
  • the yaml view when you create a datasource is not accurate anymore but it's going to be removed here [Ingest] Agent configuration - create data source UI #60696

@nchaulet nchaulet added the Team:Fleet Team label for Observability Data Collection Fleet team label Apr 6, 2020
@nchaulet nchaulet self-assigned this Apr 6, 2020
@elasticmachine
Copy link
Copy Markdown
Contributor

Pinging @elastic/ingest-management (Team:Ingest Management)

@nchaulet nchaulet force-pushed the feature-use-agentyml-template-for-stream branch from cfca8e1 to fdf8741 Compare April 6, 2020 17:15
@nchaulet nchaulet force-pushed the feature-use-agentyml-template-for-stream branch from fdf8741 to 8a3d9b1 Compare April 6, 2020 19:09
@nchaulet nchaulet marked this pull request as ready for review April 6, 2020 23:19
@nchaulet nchaulet requested a review from a team April 6, 2020 23:19
@nchaulet nchaulet added release_note:skip Skip the PR/issue when compiling release notes v7.8.0 v8.0.0 labels Apr 6, 2020
@nchaulet
Copy link
Copy Markdown
Member Author

nchaulet commented Apr 7, 2020

@paul-tavares this is changing a little the comportment of the create Datasource endpoint if you are using a package, maybe worth to check it

Copy link
Copy Markdown
Contributor

@jfsiii jfsiii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my phone, so I didn't get to run this. My main request is to avoid using Registry types/fetchers and only use EPM ones. Registry types are a last resort. Ideally they are never used outside EPM.

packageInfo,
config.id,
defaultOutput.id,
undefined,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we update this signature to an object? Five parameters is a lot. Especially if some are optional.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not modify this function here, just some prettier, and if you are good with I found this PR already containing a lot of change and keep it for another PR.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 it's not a priority.

@nchaulet nchaulet requested a review from jfsiii April 7, 2020 18:42
@paul-tavares
Copy link
Copy Markdown
Contributor

Thanks @nchaulet for pinging me. I think we're ok. We are only currently using the config property of the datasource (datasource.inputs[].config) and the new one you added seems to be optional, so I also don't see that having an impact on us.

@nchaulet nchaulet requested a review from a team April 8, 2020 12:18
@nchaulet
Copy link
Copy Markdown
Member Author

nchaulet commented Apr 8, 2020

@paul-tavares yes this change is only concerning streams so you should be good 👍

@nchaulet nchaulet requested a review from jen-huang April 8, 2020 15:49
Copy link
Copy Markdown
Contributor

@jen-huang jen-huang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the yaml view when you create a datasource is not accurate anymore but it's going to be removed

Will the agent config YAML tab be accurate?

@nchaulet
Copy link
Copy Markdown
Member Author

nchaulet commented Apr 8, 2020

Will the agent config YAML tab be accurate?

@jen-huang Yes it will contains the same thing that is returned to the agent

@nchaulet nchaulet requested a review from jen-huang April 9, 2020 00:48
Copy link
Copy Markdown
Contributor

@jen-huang jen-huang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

References to pkg_stream and "package stream" confuses me as it's not clear that these are config values coming from the stream template. What do you think of renaming to template_config (and referring to them as template config or template vars)?

I also tested a nginx metrics case and the results were not what I expected. My data source configuration:

image

Resulting YAML:

image

Based on the template definition here:
https://epr-staging.elastic.co/package/nginx/1.2.0/dataset/stubstatus/agent/stream/stream.yml

I would expect:

  • hosts, username, and password to appear at the stream level, not the input level

@ruflin Based on some discussion a while ago, before we consumed the templates, we decided that input-level vars would not get copied into each stream inside it (work done in #60594). This nginx metrics stream template conflicts with that decision as it expects vars from the input level.

@ph Maybe you can help weigh in on this from the agent side. For nginx datasource that only collects metrics, would you expect the agent config to look like this:

id: nginx-1
namespace: default
enabled: true
use_output: default
package:
  name: nginx
  version: 1.2.0
inputs:
  - type: nginx/metrics
    enabled: true
    hosts:
      - jen-test
      - jen-test2
    username: test-username
    password: test-password
    streams:
      - id: nginx/metrics-nginx.stubstatus
        enabled: true
        dataset: nginx.stubstatus
        period: 365d

or like this?:

id: nginx-1
namespace: default
enabled: true
use_output: default
package:
  name: nginx
  version: 1.2.0
inputs:
  - type: nginx/metrics
    enabled: true
    streams:
      - id: nginx/metrics-nginx.stubstatus
        enabled: true
        dataset: nginx.stubstatus
        period: 365d
        hosts:
          - jen-test
          - jen-test2
        username: test-username
        password: test-password

@nchaulet
Copy link
Copy Markdown
Member Author

nchaulet commented Apr 9, 2020

@jen-huang good catch I am currently not using the input level variable to populate the stream template.

In my opinion the generated config should be the second one, but that means we remove config from the input too, and only rely on the template to send data.

@ph
Copy link
Copy Markdown
Contributor

ph commented Apr 10, 2020

@jen-huang I understand the confusion, but the way @ruflin and I were thinking about key inheritance was that keys defined in the input would be applied to all the streams node. The driving force behind that opinion was linked to the standalone agent mode to reduce the number of fields a user needs to configure when configuring a data source and reduce duplication of fields/source of errors.

But, I think I should describe what are the rules in the agent in that context.

  • If a key is defined in the input level, the configuration will be copied in the stream. (with some exception id, enabled.)
  • If the same key exists in both input or streams an error will be produced.
  • If the key only exist at the stream level the agent will use that value.

But because of the above rules, this actually means that the agent would support both configurations you are describing. (@michalpristas this is correct right?).

I would prefer if we have a single way to define it, where values are defined at the input level, so its consistent with the user. But maybe the later is easier for code generation.

@ruflin
Copy link
Copy Markdown
Contributor

ruflin commented Apr 14, 2020

@jen-huang I understand the confusion here but I think what @ph describes is correct. Even though we define it on the input level in the end, the config we create is all on the stream level and very verbose as it is easiest to generate. Also not sure how otherwise we could have single stream template.

@nchaulet
Copy link
Copy Markdown
Member Author

Updated my PR to use agent config variable when building the agent stream from template

it's generate this input for nginx (disabling log error as the template is currently broken)

  - id: nginx-1 
    package:
       name: nginx
       version: 1.2.0
    namespace: default
    enabled: true
    use_output: default
    inputs:
      - type: nginx/metrics
        enabled: true
        streams:
          - enabled: true
            dataset: nginx.stubstatus
            input: nginx/metrics
            metricsets:
              - stubstatus
            period: 10s
            hosts:
              - 'http://127.0.0.1'
            username: nicolas
            password: mypassword
            id: nginx/metrics-nginx.stubstatus
      - type: logs
        enabled: true
        streams:
          - enabled: true
            dataset: nginx.access
            input: log
            paths:
              - /var/log/nginx/access.log*
            exclude_files:
              - .gz$
            processors:
              - add_locale: null
            id: logs-nginx.access

@nchaulet nchaulet requested a review from jen-huang April 14, 2020 19:10
@kibanamachine
Copy link
Copy Markdown
Contributor

💚 Build Succeeded

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@ruflin
Copy link
Copy Markdown
Contributor

ruflin commented Apr 15, 2020

I pulled this down and looked at it in the UI. LGTM so far. One thing I stumbled over is:

processors:
              - add_locale: null

But in the stream.yml it is actually

processors:
  - add_locale: ~

@ph I wonder if null instead of ~ is going to work? Also wondering why it converts ~ to null.

@nchaulet
Copy link
Copy Markdown
Member Author

Yes null and ~ should be the same, we are using js-yaml to serialize, deserialize with the default config is serializing all null values to null

Copy link
Copy Markdown
Contributor

@jfsiii jfsiii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the registry-related changes I mentioned. LGTM but I realize you're iterating with other people.

pkgVersion: pkg.version,
},
inputs
)) as TypeOf<typeof CreateDatasourceRequestSchema.body>['inputs'];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to avoid the as entirely, but if it's required can we use the TS type (which does the TypeOf<typeof) instead of the kbn-config schema?

Happy for this to happen in a follow up PR.

Edit: oh this is an input, so perhaps we don't have a type?

@nchaulet nchaulet merged commit 423a297 into elastic:master Apr 15, 2020
@nchaulet nchaulet deleted the feature-use-agentyml-template-for-stream branch April 15, 2020 13:24
nchaulet added a commit to nchaulet/kibana that referenced this pull request Apr 15, 2020
nchaulet added a commit that referenced this pull request Apr 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release_note:skip Skip the PR/issue when compiling release notes Team:Fleet Team label for Observability Data Collection Fleet team v7.8.0 v8.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Ingest] Fix stream generation from package

8 participants