Add pipeline execution service by martijnvg · Pull Request #13990 · elastic/elasticsearch

martijnvg · 2015-10-07T11:07:19Z

Added pipeline execution service that deals with munging index & bulk requests as they come in using a dedicated thread pool.

Also changed how bulk requests are handled, because the previous approach didn't work. Added a todo there because it can potentially be handled differently.
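To make the idea concrete, here is a minimal sketch of running a pipeline on a dedicated thread pool and handing the result back asynchronously. It is not the code in this PR: the class name `PipelineExecutionSketch`, the `execute` method, and the modelling of a document as a `Map` and a pipeline as a `Consumer` are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical sketch; names and types are illustrative, not the PR's classes.
final class PipelineExecutionSketch {

    // Dedicated pool, so pipeline work does not run on the calling thread.
    private final ExecutorService ingestPool;

    PipelineExecutionSketch(ExecutorService ingestPool) {
        this.ingestPool = ingestPool;
    }

    // Applies the pipeline to the document on the dedicated pool. The returned
    // future completes with the modified document, or exceptionally if the
    // pipeline throws.
    CompletableFuture<Map<String, Object>> execute(Map<String, Object> doc,
                                                   Consumer<Map<String, Object>> pipeline) {
        return CompletableFuture.supplyAsync(() -> {
            pipeline.accept(doc);
            return doc;
        }, ingestPool);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            PipelineExecutionSketch service = new PipelineExecutionSketch(pool);
            Map<String, Object> doc = new HashMap<>();
            doc.put("message", "hello");
            // A toy "pipeline" that adds the length of the message field.
            Map<String, Object> result = service
                .execute(doc, d -> d.put("length", ((String) d.get("message")).length()))
                .join();
            System.out.println(result);
        } finally {
            pool.shutdown();
        }
    }
}
```

The caller decides what to do once the future completes, for example letting the index request continue down the normal indexing path.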

uboness · 2015-10-07T18:49:52Z

plugins/ingest/src/main/java/org/elasticsearch/plugin/ingest/transport/IngestActionFilter.java

I think the parallelization should be at the per-doc level (not per bulk)... this approach will apply to all scenarios (regardless of the number of concurrent bulk requests you have). Naturally, doing so means we'll need to block the bulk until all its docs are processed, but that's doable.

Okay, that is what happens here. Each index request is processed one at a time, and when there are no more index requests to process, the bulk request continues as it normally would.

Actually, I thought more about it. The above assumes something that I'm not sure we should assume: that all pipelines are created (and therefore treated) equally. Perhaps we should look at it differently:
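One way to read the per-doc suggestion is sketched below: every document of a bulk is submitted to the pool on its own, and the bulk is held back until the last one finishes. This is an illustration under assumptions, not the `IngestActionFilter` code; `PerDocBulkSketch` and `processAll` are hypothetical names, and documents are modelled as plain maps.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;

// Hypothetical sketch of per-document parallelism inside one bulk request.
final class PerDocBulkSketch {

    // Runs the pipeline for each document on the pool and blocks the caller
    // until every document has been processed (or has failed).
    static void processAll(List<Map<String, Object>> docs,
                           Consumer<Map<String, Object>> pipeline,
                           ExecutorService ingestPool) {
        CountDownLatch pending = new CountDownLatch(docs.size());
        List<RuntimeException> failures = Collections.synchronizedList(new ArrayList<>());
        for (Map<String, Object> doc : docs) {
            ingestPool.execute(() -> {
                try {
                    pipeline.accept(doc);
                } catch (RuntimeException e) {
                    failures.add(e);
                } finally {
                    // Count down even on failure so the bulk is never stuck.
                    pending.countDown();
                }
            });
        }
        try {
            pending.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for documents", e);
        }
        if (!failures.isEmpty()) {
            throw new IllegalStateException(failures.size() + " document(s) failed", failures.get(0));
        }
        // Only at this point would the bulk request continue as it normally does.
    }
}
```

Note that `processAll` must not be called from a thread of the same pool: with a bounded pool, the waiting callers could occupy every thread and starve the per-document tasks.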

conceptually, it's much easier to think of things when assigning a thread per pipeline execution (so it's not on a request level but on a pipeline level)

I imagine scenarios where there are multiple pipelines, some more complex than others. One pipeline does very simple and fast processing, another is more complex and involves "heavier" computations (perhaps connecting to a 3rd party system on cache expiry of some sort).

If the above case applies, we should enable assigning a threadpool per pipeline. So for example, we define a default thread pool for all pipelines... but in its simplest form, perhaps a pipeline can define the thread pool it should be executed with. This way users can create dedicated TP for specific pipeline executions.

would love to get feedback from the LS team here... perhaps I'm completely off with my thinking here

Good point. If this is the direction we should go in, then the execution service can pick the right thread pool based on the pipeline's configuration. I think in that case the ingest thread pool should be the default, in case no thread pool has been configured on a pipeline.

I think in that case the ingest thread pool should be the default, in case no thread pool has been configured on a pipeline.

yes.. that was my intention with the "default TP"
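The "default TP" idea could look roughly like the sketch below: a pipeline may name a dedicated thread pool, and anything without one falls back to the shared ingest pool. `PipelineThreadPools`, `register`, and `resolve` are hypothetical names chosen for this illustration, and rejecting unknown pool names is an assumption rather than something decided in this thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;

// Hypothetical sketch of choosing a thread pool per pipeline.
final class PipelineThreadPools {

    // Shared pool used by every pipeline that does not configure its own.
    private final ExecutorService defaultPool;
    // Dedicated pools, keyed by the name a pipeline would put in its config.
    private final Map<String, ExecutorService> dedicatedPools = new ConcurrentHashMap<>();

    PipelineThreadPools(ExecutorService defaultPool) {
        this.defaultPool = defaultPool;
    }

    void register(String poolName, ExecutorService pool) {
        dedicatedPools.put(poolName, pool);
    }

    // configuredPoolName is null when the pipeline did not configure a pool.
    ExecutorService resolve(String configuredPoolName) {
        if (configuredPoolName == null) {
            return defaultPool;
        }
        ExecutorService pool = dedicatedPools.get(configuredPoolName);
        if (pool == null) {
            // Assumption: fail loudly instead of silently using the default.
            throw new IllegalArgumentException("unknown thread pool [" + configuredPoolName + "]");
        }
        return pool;
    }
}
```

With this shape, a "heavy" pipeline that calls out to an external system can be isolated in its own pool so that it cannot starve the light pipelines sharing the default one.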

This sounds reasonable to me. It would probably be nice to compare the performance characteristics of the two strategies when dealing with "light" pipelines.

Agreed. I think we should open an issue to benchmark the ingest plugin, and at the same time we can experiment with the two strategies.

martijnvg · 2015-10-09T08:31:08Z

@uboness @talevy Fixed the typo and removed the todo.

martijnvg added the review and :Distributed/Ingest Node labels Oct 7, 2015

clintongormley added the >feature label Oct 7, 2015

uboness reviewed Oct 7, 2015

martijnvg mentioned this pull request Oct 9, 2015

Ingest Node - enrich data as it gets in via pipelines #14049

Closed

43 tasks

martijnvg added 3 commits October 9, 2015 18:16

Added pipeline execution service that deals with updating data as it …

82a9ba3

…comes in using a dedicated thread pool. Also changed how bulk requests are handled, because before it just didn't work, but added a todo there because it can potentially be handled differently.

prevent IndexRequest from being processed multiple times

b3ad3f3

rename constant name and removed the todo

11f17c0

martijnvg force-pushed the ingest/add_execution_service branch from 80666a3 to 11f17c0 October 9, 2015 16:16

martijnvg merged commit 11f17c0 into elastic:feature/ingest Oct 9, 2015
