[Fleet] Add support for routing rules in integrations #514

@kpollich

In order to support document-based routing in Fleet, integrations need to expose their routing rules as part of their data stream manifest files.

Fleet would translate these rules into the appropriate reroute processors during integration installation.

A routing rule is composed of a few pieces of data:

  • "Source" dataset: in which dataset should Fleet place this rule?
  • "Destination" dataset: where should documents be routed by this rule?
  • Condition: what logic determines when a document is routed?
  • Namespace: under what namespace should this document be written once routed?

We'll need to support two types of routing rules defined by an integration:

  • "Local" routing rules that route from a dataset on a given integration to other datasets on that same integration
  • "Injected" routing rules that route from a dataset on an external integration back to the given integration

As far as integrations are concerned, though, there is no meaningful difference between writing a "local" routing rule and an "injected" routing rule in a data stream manifest. Fleet will be responsible for generating the appropriate processors in the appropriate ingest pipelines based on these rules. So, the implementation on the package-spec side will be a generic routing_rules object at the data stream manifest level.

Typically, routing rules will be defined on a "catch-all" or "data sink" style dataset like kubernetes.router that is essentially a passthrough to more specific data streams.

For example, we might have an nginx catch-all dataset that routes Nginx logs to more specific datasets like nginx.error and nginx.access based on the logfile path reported in each document.

Here's a proposed example of the above in action. Please see the inline comments for more details:

# nginx/data_stream/nginx/manifest.yml
title: Nginx logs
type: logs

# This is a catch-all "sink" data stream that routes documents to 
# other datasets based on conditions or variables
dataset: nginx

# Ensures agents have permissions to write data to `logs-nginx.*-*`
elasticsearch.dynamic_dataset: true
elasticsearch.dynamic_namespace: true

routing_rules:
  # "Local" routing rules are listed under the current dataset's own name; they are not a special case
  nginx:
    # Route error logs to `nginx.error` when they're sourced from an error logfile
    - dataset: nginx.error
      if: "ctx?.file?.path?.contains('/var/log/nginx/error')"
      namespace:
        - "{{labels.data_stream.namespace}}"
        - default

    # Route access logs to `nginx.access` when they're sourced from an access logfile
    - dataset: nginx.access
      if: "ctx?.file?.path?.contains('/var/log/nginx/access')"
      namespace:
        - "{{labels.data_stream.namespace}}"
        - default

  # Route Kubernetes container logs to this catch-all dataset for further routing
  k8s.router:
    - dataset: nginx
      if: "ctx?.container?.image?.name == 'nginx'"
      namespace:
        - "{{labels.data_stream.namespace}}"
        - default

  # Route syslog entries tagged with nginx to this catch-all dataset
  syslog:
    - dataset: nginx
      if: "ctx?.tags?.contains('nginx')"
      namespace:
        - "{{labels.data_stream.namespace}}"
        - default
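
To make the translation step concrete, here is a rough sketch of the reroute processors Fleet might generate from the manifest above. It assumes each rule maps one-to-one onto an Elasticsearch `reroute` processor, with the rule's `if`, `dataset`, and `namespace` values copied across unchanged, and that each processor is placed in the ingest pipeline of the rule's source dataset. The exact output, processor ordering, and any extra tags Fleet adds are not specified in this issue, so treat this as illustrative only.

```yaml
# Sketch (assumption): processors added to the pipeline of the `nginx` dataset,
# generated from the "local" rules listed under the `nginx` key.
processors:
  - reroute:
      if: "ctx?.file?.path?.contains('/var/log/nginx/error')"
      dataset: nginx.error
      namespace:
        - "{{labels.data_stream.namespace}}"
        - default
  - reroute:
      if: "ctx?.file?.path?.contains('/var/log/nginx/access')"
      dataset: nginx.access
      namespace:
        - "{{labels.data_stream.namespace}}"
        - default
---
# Sketch (assumption): processor "injected" into the pipeline of the external
# `k8s.router` dataset, generated from the rule listed under the `k8s.router` key.
processors:
  - reroute:
      if: "ctx?.container?.image?.name == 'nginx'"
      dataset: nginx
      namespace:
        - "{{labels.data_stream.namespace}}"
        - default
```

In the `reroute` processor, a list under `namespace` acts as a fallback chain: the first entry that resolves to a usable value is taken, so `default` applies only when `labels.data_stream.namespace` is absent from the document.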

Fleet support will be implemented as follows:
