Add pipeline execution service#13990
Conversation
There was a problem hiding this comment.
I think the parallelization should be on per doc level (not per bulk)... this approach will apply to all scenarios (regardless of the number of concurrent bulk requests you have). naturally, doing so means we'll need to block the bulk until all its docs were processes, but that's doable.
There was a problem hiding this comment.
okay, that is what happens here. Each index request is processed at a time and when there are no more index requests to process then the bulk request continues as it would normally.
There was a problem hiding this comment.
actually.. thought more about it. The above assumes something that I'm not sure we should assume - all pipelines created (and therefore treated) equally. Perhaps we should look at it differently:
- conceptually, it's much easier to think of things when assigning a thread per pipeline execution (so it's not on a request level but on a pipeline level)
- I imagine scenarios where there are multiple pipelines, some are more complex than others. One pipeline does very simple and fast processing, another is more complex and involved "heavier" computations (perhaps connect to 3rd party system on cache expiry of some sort).
- If the above case applies, we should enable assigning a threadpool per pipeline. So for example, we define a default thread pool for all pipelines... but in its simplest form, perhaps a pipeline can define the thread pool it should be executed with. This way users can create dedicated TP for specific pipeline executions.
would love to get feedback from the LS team here... perhaps I'm completely off with my thinking here
There was a problem hiding this comment.
Good point, if this is the direction we should go into then the execution service can pick the right pipeline based on its configuration. I think in that case the ingest thread pool should be the default, in case no thread pool has been configured on a pipeline.
There was a problem hiding this comment.
I think in that case the
ingestthread pool should be the default, in case no thread pool has been configured on a pipeline.
yes.. that was my intention with the "default TP"
There was a problem hiding this comment.
This sounds reasonable to me. It would probably be nice to see performance characteristics between the two strategies when dealing with "light" pipelines
There was a problem hiding this comment.
agreed, I think we should open an issue where we are going to benchmark the ingest plugin and at the same time we can experiment with the two strategies.
…comes in using a dedicated thread pool. Also changed how bulk requests are handled, because before it just didn't work, but added a todo there because it can potentially be handled differently.
80666a3 to
11f17c0
Compare
Added pipeline execution service that deals with munging index & bulk requests as they come in using a dedicated thread pool.
Also changed how bulk requests are handled, because before it just didn't work, but added a todo there because it can potentially be handled differently.