[currently open issues]
related issues from other projects:
There are many use-cases where it is important to enrich incoming data. This enrichment may be something simple like using a regular expression to extract metadata from an existing field, or something more advanced like a geoip lookup or language identification. The filter stage of the Logstash processing pipeline provides great examples of the ways in which data is often enriched. Node ingest implements a new type of ES node, which performs this enrichment prior to indexing.
Node ingest is a pure Java implementation of the filters in Logstash, integrated with Elasticsearch. It works by wrapping the bulk/index APIs and executing a pipeline that is composed of multiple processors to enrich the documents. A processor is simply a component that can modify incoming documents (the source before ES turns it into a document). A pipeline is a list of processors grouped under a unique id. If node ingest is enabled, the index and bulk APIs can route a request's documents through a pipeline.
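The pipeline/processor model described above can be sketched in plain Java. This is an illustrative sketch only: the `Processor` and `Pipeline` names mirror the concepts in this issue, but the method signatures are assumptions, not the actual interfaces in the feature branch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A processor mutates the raw document source before ES turns it into a document.
interface Processor {
    void execute(Map<String, Object> source);
}

// A pipeline is a list of processors grouped under a unique id.
class Pipeline {
    private final String id;
    private final List<Processor> processors;

    Pipeline(String id, List<Processor> processors) {
        this.id = id;
        this.processors = processors;
    }

    String getId() {
        return id;
    }

    // Run each processor in order over the incoming source map.
    void execute(Map<String, Object> source) {
        for (Processor p : processors) {
            p.execute(source);
        }
    }
}

class PipelineDemo {
    public static void main(String[] args) {
        // Two trivial processors: add a field, then uppercase another.
        Processor addTag = source -> source.put("tag", "enriched");
        Processor upper = source -> source.computeIfPresent("message",
                (k, v) -> v.toString().toUpperCase());

        Pipeline pipeline = new Pipeline("my-pipeline", Arrays.asList(addTag, upper));

        Map<String, Object> doc = new HashMap<>();
        doc.put("message", "hello");
        pipeline.execute(doc);
        System.out.println(doc.get("message")); // prints "HELLO"
    }
}
```

The key property is that the source map goes through the processors sequentially, so later processors see the fields earlier ones added.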
The ingest plugin runs on dedicated client nodes; after bulk and index requests have been enriched, these requests continue on into the cluster.
Node ingest will be a plugin in the Elasticsearch project, implementing two main aspects:
The first is a pure Java implementation of Pipeline and Processor, as well as initial processor implementations of grok, geoip, kv/mutate, and date. This Java implementation can then be reused in other places, such as Logstash itself, the reindex API, and so on. In the first version of the ingest plugin the processor implementations can reside in the ingest plugin, but the framework and processor implementations shouldn’t rely on any ES-specific code, so that later on they can be moved to an isolated library.
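To illustrate the "no ES-specific code" constraint, processors such as date and mutate can be written against nothing but the JDK. The classes below are simplified stand-ins for illustration, not the actual date or mutate implementations from the feature branch:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a "date" processor: parses a text field and
// normalizes it to ISO-8601 (yyyy-MM-dd). Depends only on java.time.
class DateProcessor {
    private final String field;
    private final DateTimeFormatter inputFormat;

    DateProcessor(String field, String pattern) {
        this.field = field;
        this.inputFormat = DateTimeFormatter.ofPattern(pattern);
    }

    void execute(Map<String, Object> source) {
        Object value = source.get(field);
        if (value != null) {
            LocalDate parsed = LocalDate.parse(value.toString(), inputFormat);
            source.put(field, parsed.toString()); // normalized form
        }
    }
}

// Simplified stand-in for a "mutate"-style rename operation.
class RenameProcessor {
    private final String from;
    private final String to;

    RenameProcessor(String from, String to) {
        this.from = from;
        this.to = to;
    }

    void execute(Map<String, Object> source) {
        if (source.containsKey(from)) {
            source.put(to, source.remove(from));
        }
    }
}
```

Because neither class imports anything from Elasticsearch, moving them into an isolated library later is a straightforward extraction.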
The second part is the integration with Elasticsearch. This includes interception of the bulk/index APIs, management APIs (stats and so on, in a future phase), storage and live reload of the configuration, support for multiple "live" pipelines, and simulation of pipeline execution.
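Simulation of pipeline execution means running sample documents through a pipeline and returning the enriched results without indexing anything, so a pipeline can be verified before it is wired into the bulk/index APIs. A minimal sketch, assuming a simple processor interface (the real simulate API may differ):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical processor interface for the sketch below.
interface SimProcessor {
    void execute(Map<String, Object> source);
}

class PipelineSimulator {
    // Run every sample document through the processor chain and collect
    // the enriched copies. Nothing is indexed, and the original samples
    // are left untouched so before/after can be compared.
    static List<Map<String, Object>> simulate(List<SimProcessor> processors,
                                              List<Map<String, Object>> samples) {
        List<Map<String, Object>> results = new ArrayList<>();
        for (Map<String, Object> sample : samples) {
            Map<String, Object> copy = new HashMap<>(sample);
            for (SimProcessor p : processors) {
                p.execute(copy);
            }
            results.add(copy);
        }
        return results;
    }
}
```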
The goal of the ingest plugin is to make data enrichment easier; it will not replace Logstash. The ingest plugin should make data enrichment easier in most cases where events are only stored in Elasticsearch. For example, when only Filebeat is used to ship logs, a Logstash instance will no longer be required. In cases where events are sent to multiple outputs, a Logstash installation is still required. At some point Logstash will also reuse the pipeline/processor framework, so the end goal is that both Elasticsearch and Logstash benefit from the ingest initiative.
Development happens in a feature branch: https://github.com/elastic/elasticsearch/tree/feature/ingest
Current node ingest tasks:
- `pipeline_id` parameter is available to select what pipeline should be used to preprocess the documents before the index/bulk APIs get executed. First cleanup of the WIP commit #13941
- A node with `node.ingest` set to false that receives an ingest request should explicitly fail. (mvg) [Ingest] add `node.ingest` setting that controls whether ingest is active #15610
- Make pipeline changes visible on all ingest nodes once the `.ingest` index has been started. (mvg) Add an internal refresh api that makes pipeline changes visible on all ingest nodes #15203
- Rename the `pipeline_id` param name to `pipeline`. rename "pipeline_id" param to "pipeline" #15618
- Add `tags` to `on_failure` metadata. [Ingest] Update on_failure processor metadata to include a processor's tag #16202

possible v2 tasks:
- `on_failure` processors to receive a document with a pre-failed-processor state (ref: [Ingest Plugin] continue_on_failure parameter #14548 (comment))