Skip to content

Add pipeline execution service#13990

Merged
martijnvg merged 3 commits intoelastic:feature/ingestfrom
martijnvg:ingest/add_execution_service
Oct 9, 2015
Merged

Add pipeline execution service#13990
martijnvg merged 3 commits intoelastic:feature/ingestfrom
martijnvg:ingest/add_execution_service

Conversation

@martijnvg
Copy link
Copy Markdown
Member

Added pipeline execution service that deals with munging index & bulk requests as they come in using a dedicated thread pool.

Also changed how bulk requests are handled, because before it just didn't work, but added a todo there because it can potentially be handled differently.

@martijnvg martijnvg added review :Distributed/Ingest Node Execution or management of Ingest Pipelines labels Oct 7, 2015
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the parallelization should be on per doc level (not per bulk)... this approach will apply to all scenarios (regardless of the number of concurrent bulk requests you have). naturally, doing so means we'll need to block the bulk until all its docs were processes, but that's doable.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, that is what happens here. Each index request is processed at a time and when there are no more index requests to process then the bulk request continues as it would normally.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually.. thought more about it. The above assumes something that I'm not sure we should assume - all pipelines created (and therefore treated) equally. Perhaps we should look at it differently:

  • conceptually, it's much easier to think of things when assigning a thread per pipeline execution (so it's not on a request level but on a pipeline level)
  • I imagine scenarios where there are multiple pipelines, some are more complex than others. One pipeline does very simple and fast processing, another is more complex and involved "heavier" computations (perhaps connect to 3rd party system on cache expiry of some sort).
  • If the above case applies, we should enable assigning a threadpool per pipeline. So for example, we define a default thread pool for all pipelines... but in its simplest form, perhaps a pipeline can define the thread pool it should be executed with. This way users can create dedicated TP for specific pipeline executions.

would love to get feedback from the LS team here... perhaps I'm completely off with my thinking here

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, if this is the direction we should go into then the execution service can pick the right pipeline based on its configuration. I think in that case the ingest thread pool should be the default, in case no thread pool has been configured on a pipeline.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in that case the ingest thread pool should be the default, in case no thread pool has been configured on a pipeline.

yes.. that was my intention with the "default TP"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds reasonable to me. It would probably be nice to see performance characteristics between the two strategies when dealing with "light" pipelines

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, I think we should open an issue where we are going to benchmark the ingest plugin and at the same time we can experiment with the two strategies.

@martijnvg
Copy link
Copy Markdown
Member Author

@uboness @talevy Fixed the typo and removed the todo.

…comes in using a dedicated thread pool.

Also changed how bulk requests are handled, because before it just didn't work, but added a todo there because it can potentially be handled differently.
@martijnvg martijnvg force-pushed the ingest/add_execution_service branch from 80666a3 to 11f17c0 Compare October 9, 2015 16:16
@martijnvg martijnvg merged commit 11f17c0 into elastic:feature/ingest Oct 9, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Ingest Node Execution or management of Ingest Pipelines >feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants