Details around datasource config
The Agent config today consists of three levels: Datasources -> Inputs -> Streams. A data source is a group of inputs and allows configuring the output or namespace one level above the inputs. Initially we only had a list of inputs in the agent config, but we introduced the data source level for a few reasons:
- Align with the UI
- Make importing of configs possible
- Better error reporting to the UI
- Higher level configs to be configured for multiple inputs like output, namespace, constraints
In the following I dive into each point to see if it still matters.
Align with the UI
Using the same concept when creating a config manually and when creating it through the UI is powerful: users do not have to learn two different concepts. At the same time, it seems acceptable for the UI to offer a more convenient way to configure groupings than the agent side does.
This is especially true because the grouping of inputs is related to packages, which are not available on the agent side. If a user configures nginx manually, they need to specify a logs and a metrics input anyway, and the datasource grouping will not help them much.
Importing configs
Initially the idea was that data sources would make it possible to import configs and map them to the UI. This works if the imported config matches a package 1:1. But if the user specifies their own inputs and groups them together in some way, it no longer works. When importing, we would either put all inputs into one data source or create one data source per input. One data source per input is more likely, as otherwise the UI gets more complex (many inputs in one data source). With this, I think the import argument is no longer valid.
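The import path described above can be sketched in a few lines. This is a hypothetical illustration, not the actual Ingest Manager code; the function and the generated `imported-N` names are invented for the example. It shows why "one data source per input" is the only mapping that works for arbitrary hand-written configs:

```python
# Hypothetical sketch: importing a flat, hand-written input list into a
# UI that groups by data source. Since we cannot recover the user's
# intended grouping, each input becomes its own data source.

def import_inputs_as_datasources(inputs):
    return [
        {"name": f"imported-{i}", "inputs": [inp]}
        for i, inp in enumerate(inputs)
    ]

flat = [
    {"type": "logs", "streams": [{"paths": "/var/log/foo.log"}]},
    {"type": "system/metrics", "streams": [{"metricset": "cpu"}]},
]
print(import_inputs_as_datasources(flat))
```

The awkwardness is visible immediately: the import produces as many data sources as inputs, so the grouping adds structure without adding information.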
Better error reporting
This still holds, but I would argue we can solve it differently through metadata on each input. Each input should support additional metadata where we can add names and ids for better error reporting. When an error is reported to Fleet, we then know which data source the inputs and streams belong to. Even with datasources we would need this, as reporting an error only on a data source is not enough. Having this generic meta concept also enables better error reporting in the standalone case.
Higher Level Configs
It is convenient to configure the namespace and output on the data source level. We should still allow this on the UI side, but it is not required in the agent config. My assumption is that most users getting started use the default namespace and the default output, which means nothing has to be configured.
Users with more complex configs are likely to use some automation to build them, in which case specifying the output and namespace more often should not be a problem.
Summary
I think on the agent side, the arguments around having the data source object do not hold up anymore.
Proposed new config
Based on the above, I suggest we remove datasources from the agent config but add namespace, output, and meta information support on the input level:
inputs:
  - type: system/metrics
    namespace: default
    use_output: default
    meta:
      package.name: bar
      settings.id: foo
      hello: world
    streams:
      - metricset: cpu
        dataset: system.cpu
The part under meta is not understood by the agent but is logged/shipped in case of an error.
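The pass-through behavior of meta can be sketched as follows. This is an illustrative sketch, not the real Agent implementation; the function name and report shape are invented. The point is that the agent treats everything under meta as opaque key/value pairs and simply attaches them to error reports:

```python
# Illustrative sketch (not the actual Agent code): the agent never
# interprets the meta block; it only copies it onto error reports so
# that Fleet or a standalone operator can tell which configured input
# an error came from.

def build_error_report(input_cfg, error):
    report = {"error": str(error), "input_type": input_cfg["type"]}
    # meta is passed through untouched.
    report.update(input_cfg.get("meta", {}))
    return report

cfg = {
    "type": "system/metrics",
    "meta": {"package.name": "bar", "settings.id": "foo"},
}
print(build_error_report(cfg, "connection refused"))
```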
This new config, combined with the proposed changes to the fields used for the indexing strategy (elastic/package-registry#482), also solves the problem that there was no good way to set a different dataset.type for an input. For example, the log input could also generate metrics. The config below will send data to metrics-foo-prod:
inputs:
  - type: logs
    dataset.type: metrics
    dataset.namespace: prod
    streams:
      - paths: /var/log/foo.log
        dataset.name: foo
Removing the datasource part also makes getting started with manual configuration easier. The simplest config now looks as follows:
inputs:
  - type: logs
    streams:
      - paths: /var/log/foo.log
It is expected that the Ingest Manager in Kibana will still have a grouping of inputs available but will flatten it before shipping the config to the Agent.
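The flattening step can be sketched like this. This is a hypothetical illustration, not the actual Ingest Manager code; the function name, the `datasource.name` meta key, and the example datasource shape are all assumptions made for the example. It shows how datasource-level namespace/output settings are pushed down onto each input, while the grouping survives only as opaque meta for error reporting:

```python
# Hypothetical sketch: flattening a datasource-grouped config into the
# proposed flat input list before it is shipped to the Agent.

def flatten_datasources(datasources):
    inputs = []
    for ds in datasources:
        for inp in ds.get("inputs", []):
            flat = dict(inp)
            # Datasource-level settings move onto each input, unless
            # the input already overrides them.
            flat.setdefault("namespace", ds.get("namespace", "default"))
            flat.setdefault("use_output", ds.get("use_output", "default"))
            # Keep the grouping as opaque meta for error reporting.
            flat.setdefault("meta", {})["datasource.name"] = ds["name"]
            inputs.append(flat)
    return inputs

config = [
    {
        "name": "nginx-prod",
        "namespace": "prod",
        "inputs": [
            {"type": "logs", "streams": [{"paths": "/var/log/nginx.log"}]},
            {"type": "nginx/metrics", "streams": [{"metricset": "stubstatus"}]},
        ],
    }
]

for inp in flatten_datasources(config):
    print(inp["type"], inp["namespace"], inp["meta"]["datasource.name"])
```

Because the grouping is reduced to meta, the Agent itself stays grouping-agnostic while Fleet can still attribute errors back to the data source.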