The new indexing strategy currently uses the fields stream.type, stream.dataset and stream.namespace. Over the last few weeks it has become apparent that these fields might not be optimal, so the proposal is to change them to dataset.type, dataset.name and dataset.namespace.
Note: This issue is filed in the package registry because the registry currently enforces these fields and is public, but many other places will need updating if we move forward with this.
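The proposed rename can be sketched as a simple key mapping. This is only an illustration: the helper name rename_stream_fields and the flat, dotted-key event layout are assumptions and do not come from any of the affected code bases.

```python
# Proposed rename of the indexing strategy fields, as listed in this issue.
FIELD_RENAMES = {
    "stream.type": "dataset.type",
    "stream.dataset": "dataset.name",
    "stream.namespace": "dataset.namespace",
}


def rename_stream_fields(event: dict) -> dict:
    """Return a copy of a flat, dotted-key event with stream.* keys renamed.

    Hypothetical helper for illustration only; keys that are not part of
    FIELD_RENAMES are copied through unchanged.
    """
    return {FIELD_RENAMES.get(key, key): value for key, value in event.items()}
```

Calling rename_stream_fields on an event that carries stream.type, stream.dataset and stream.namespace would yield the same values under dataset.type, dataset.name and dataset.namespace.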
What is the problem with stream.* fields?
stream is Agent specific: The name stream.* originally came out of building the Elastic Agent configuration, where we have inputs with streams and each stream goes to a single dataset. But anyone can use the new indexing strategy, so it should not be tied to a specific technology.
It is more than a stream: Proposed values for stream.type can also be content, which is not necessarily a stream. See also [Meta] Add ECS Dataset fields ecs#845
Talking about dataset as a whole: When talking about the indexing strategy, I realised I often refer to the dataset name and all of its parts as one dataset. A dataset is a set of data which belongs together. It is uniquely defined by the type, name and namespace. For example, logs-nginx.access-default and logs-nginx.access-prod are two different datasets.
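To make the "uniquely defined by type, name and namespace" point concrete, here is a minimal sketch. The Dataset class and its full_name method are hypothetical, and the type-name-namespace joining pattern is inferred from the two example names above rather than taken from a spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical model of a dataset: identity is (type, name, namespace)."""

    type: str
    name: str
    namespace: str

    def full_name(self) -> str:
        # Assumption: the parts are joined with "-" as in the examples
        # logs-nginx.access-default and logs-nginx.access-prod.
        return f"{self.type}-{self.name}-{self.namespace}"


default_ds = Dataset(type="logs", name="nginx.access", namespace="default")
prod_ds = Dataset(type="logs", name="nginx.access", namespace="prod")
```

Here default_ds and prod_ds share a type and name but differ in namespace, so they compare as unequal and produce different full names.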
Based on the above, I came to the conclusion that dataset should be an object and should be used for the indexing strategy fields.
One alternative that was discussed is using datastream instead, as each dataset is stored in a datastream. But not every datastream is a dataset per this definition, and it would again tie the fields to a specific technology implementation.
The other alternative discussed was using existing ECS fields like event.kind and event.dataset. This does not work because the types are different (constant_keyword), and we will be even stricter on names than these fields currently are. The idea, however, is that these fields will be closely linked in their possible values.
Benefits of dataset.*
Using dataset.* also solves some existing problems:
stream.* conflicts with an existing docker input field in Filebeat which is a keyword
Decoupling of input.type in the Elastic Agent config from dataset.type. Even if the input.type is log, the dataset.type could be metrics if the log file contains metrics.
Removes confusion inside the agent config between stream and streams.
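The decoupling of input.type from dataset.type can be sketched as follows. The function enrich_event, the flat dotted-key layout and the dataset name myapp.stats are all made up for illustration; this is not the Elastic Agent enrichment code.

```python
def enrich_event(event: dict, input_type: str, dataset_type: str,
                 dataset_name: str, namespace: str) -> dict:
    """Hypothetical enrichment step: input.type and dataset.type are set
    independently, so neither value is derived from the other."""
    enriched = dict(event)  # leave the caller's event untouched
    enriched["input.type"] = input_type
    enriched["dataset.type"] = dataset_type
    enriched["dataset.name"] = dataset_name
    enriched["dataset.namespace"] = namespace
    return enriched


# A log file whose lines contain metrics: read with a log input,
# but indexed as a metrics dataset.
metrics_from_log = enrich_event(
    {"message": "cpu=0.42"},
    input_type="log",
    dataset_type="metrics",
    dataset_name="myapp.stats",  # made-up dataset name
    namespace="default",
)
```

The resulting event has input.type set to log and dataset.type set to metrics, which is the case described in the list above.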
Changes needed
Places where the current stream.* implementation needs to change:
Elastic Agent field enrichment
Endpoint binary field enrichment
Package registry field validation
Package registry base package with templates
Integrations repository field addition
Integrations repository modules export script for dashboard filters
Indexing strategy docs
This change will likely have no impact on the UI side.