To support document-based routing in Fleet, integrations need to expose their routing rules as part of their data stream manifest files.
Fleet will translate these rules into the appropriate `reroute` processors during integration installation.
A routing rule is composed of a few pieces of data:
- "Source" dataset: in which dataset should Fleet place this rule?
- "Destination" dataset: where should documents be routed by this rule?
- Condition: what logic determines when a document is routed?
- Namespace: under what namespace should this document be written once routed?
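To make the four pieces concrete, here is a minimal sketch of a routing rule as a data structure, with a basic sanity check. The `RoutingRule` interface, its field names, and `validateRoutingRule` are hypothetical names chosen for illustration; they are not part of Fleet or the package spec.

```typescript
// Hypothetical shape of one routing rule. The fields mirror the four
// pieces of data listed above and are not a published schema.
interface RoutingRule {
  sourceDataset: string;      // dataset whose ingest pipeline receives the rule
  destinationDataset: string; // dataset that matching documents are routed to
  condition: string;          // expression evaluated against each document
  namespace: string[];        // candidate namespaces for the routed document
}

// Returns a list of problems; an empty list means the rule looks usable.
// The checks themselves are assumptions about what a sensible rule needs.
function validateRoutingRule(rule: RoutingRule): string[] {
  const problems: string[] = [];
  if (rule.sourceDataset.trim() === "") problems.push("source dataset is empty");
  if (rule.destinationDataset.trim() === "") problems.push("destination dataset is empty");
  if (rule.condition.trim() === "") problems.push("condition is empty");
  if (rule.namespace.length === 0) problems.push("namespace list is empty");
  if (rule.sourceDataset === rule.destinationDataset) {
    problems.push("rule routes a dataset to itself");
  }
  return problems;
}

// Example: route error logs from the `nginx` dataset to `nginx.error`.
const exampleRule: RoutingRule = {
  sourceDataset: "nginx",
  destinationDataset: "nginx.error",
  condition: "ctx?.file?.path?.contains('/var/log/nginx/error')",
  namespace: ["{{labels.data_stream.namespace}}", "default"],
};
```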
We'll need to support two types of routing rules defined by an integration:
- "Local" routing rules that route from a dataset on a given integration to other datasets on that same integration
- "Injected" routing rules that route from a dataset on an external integration back to the given integration
As far as integrations are concerned, though, there is no meaningful difference between writing a "local" routing rule and an "injected" routing rule in a data stream manifest. Fleet will be responsible for generating the appropriate processors in the appropriate ingest pipelines based on these rules. So, the implementation on the package-spec side will be a generic `routing_rules` object at the data stream manifest level.
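One way Fleet could tell the two kinds apart is sketched below: a source dataset counts as "local" if the integration itself ships it, and as "injected" otherwise. The `RoutingRules` type, the `classifySources` function, and the idea of passing in the integration's own dataset names are assumptions made for illustration, not Fleet's actual implementation.

```typescript
// Hypothetical in-memory form of a `routing_rules` object: keys are source
// datasets, values are the rules destined for that dataset's pipeline.
type RoutingRules = Record<string, Array<{ dataset: string; if: string; namespace: string[] }>>;

interface ClassifiedSources {
  local: string[];    // source datasets owned by this integration
  injected: string[]; // source datasets owned by some other integration
}

// Splits the source datasets of a `routing_rules` object by ownership.
// `ownDatasets` is assumed to hold every dataset the integration defines.
function classifySources(ownDatasets: Set<string>, rules: RoutingRules): ClassifiedSources {
  const result: ClassifiedSources = { local: [], injected: [] };
  for (const source of Object.keys(rules)) {
    if (ownDatasets.has(source)) {
      result.local.push(source);
    } else {
      result.injected.push(source);
    }
  }
  return result;
}
```

The manifest author never makes this distinction; in this sketch it falls out of comparing rule keys against the integration's own datasets.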
Typically, routing rules will be defined on a "catch-all" or "data sink" style dataset like `kubernetes.router` that is essentially a passthrough to more specific data streams.
For example, we might have an `nginx` catch-all dataset that routes Nginx logs to more specific datasets like `nginx.error` and `nginx.access` based on the logfile path reported in each document.
Here's a proposed example of the above in action. See the inline comments for more details:
# nginx/data_stream/nginx/manifest.yml
title: Nginx logs
type: logs

# This is a catch-all "sink" data stream that routes documents to
# other datasets based on conditions or variables
dataset: nginx

# Ensures agents have permissions to write data to `logs-nginx.*-*`
elasticsearch.dynamic_dataset: true
elasticsearch.dynamic_namespace: true

routing_rules:
  # "Local" routing rules are included under this current dataset, not a special case
  nginx:
    # Route error logs to `nginx.error` when they're sourced from an error logfile
    - dataset: nginx.error
      if: "ctx?.file?.path?.contains('/var/log/nginx/error')"
      namespace:
        - "{{labels.data_stream.namespace}}"
        - default

    # Route access logs to `nginx.access` when they're sourced from an access logfile
    - dataset: nginx.access
      if: "ctx?.file?.path?.contains('/var/log/nginx/access')"
      namespace:
        - "{{labels.data_stream.namespace}}"
        - default

  # Route Kubernetes container logs to this catch-all dataset for further routing
  k8s.router:
    - dataset: nginx
      if: "ctx?.container?.image?.name == 'nginx'"
      namespace:
        - "{{labels.data_stream.namespace}}"
        - default

  # Route syslog entries tagged with nginx to this catch-all dataset
  syslog:
    - dataset: nginx
      if: "ctx?.tags?.contains('nginx')"
      namespace:
        - "{{labels.data_stream.namespace}}"
        - default
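
As a rough illustration of the translation step, the sketch below turns a parsed `routing_rules` object into `reroute` processor definitions grouped by the source dataset whose ingest pipeline would receive them. The `ManifestRule` and `RerouteProcessor` types, the `buildRerouteProcessors` function, and the `tag` format are hypothetical; only the `dataset`, `namespace`, and `if` options are taken from the example above.

```typescript
// Hypothetical parsed form of one entry under `routing_rules`.
interface ManifestRule {
  dataset: string;
  if: string;
  namespace: string[];
}

type RoutingRules = Record<string, ManifestRule[]>;

// Assumed shape of a generated processor definition.
interface RerouteProcessor {
  reroute: { tag: string; dataset: string; namespace: string[]; if: string };
}

// Builds one `reroute` processor per rule, keyed by source dataset.
// The tag format is an arbitrary choice made for this sketch.
function buildRerouteProcessors(rules: RoutingRules): Record<string, RerouteProcessor[]> {
  const bySource: Record<string, RerouteProcessor[]> = {};
  for (const source of Object.keys(rules)) {
    bySource[source] = rules[source].map((rule) => ({
      reroute: {
        tag: `${source}->${rule.dataset}`,
        dataset: rule.dataset,
        namespace: rule.namespace,
        if: rule.if,
      },
    }));
  }
  return bySource;
}

// The `syslog` rule from the manifest above, as it might look once parsed.
const syslogProcessors = buildRerouteProcessors({
  syslog: [
    {
      dataset: "nginx",
      if: "ctx?.tags?.contains('nginx')",
      namespace: ["{{labels.data_stream.namespace}}", "default"],
    },
  ],
});
```

Here `syslogProcessors.syslog` holds a single processor that would belong in the `syslog` dataset's pipeline, which is what makes the rule "injected" from the Nginx integration's point of view.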
Fleet support will be implemented as follows: