Add thread pool for write coordination#129450
Conversation
This change adds a thread pool for write coordination to ensure that bulk coordination does not get stuck on an overloaded primary node.
|
|
|
Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing) |
henningandersen
left a comment
There was a problem hiding this comment.
Thanks, this makes sense to me, left a comment, but otherwise think we can move forward with tests etc.
| }, | ||
| executor | ||
| // Use the appropriate write executor for actual ingest processing | ||
| isOnlySystem ? systemWriteExecutor : writeExecutor |
There was a problem hiding this comment.
I am ok with this, but it seems like for any case where we have ingest processing, we would then still have the coordination happen behind any local write work. At least the PR then avoids some of the wait roundtrips.
We could also decide to have both a system-write-coordination pool and a write-coordination pool and use those here? We can look at this in follow-ups ofc.
There was a problem hiding this comment.
We could also decide to have both a system-write-coordination pool and a write-coordination pool and use those here? We can look at this in follow-ups ofc.
Yes I would like to move ingest work to a non-WRITE thread pool in a follow-up as I think there might be a few things to discuss.
|
🔍 Preview links for changed docs: 🔔 The preview site may take up to 3 minutes to finish building. These links will become live once it completes. |
henningandersen
left a comment
There was a problem hiding this comment.
LGTM.
This seems intuitive and simple enough that we can merge. But it seems worth keeping an eye on nightly benchmarks the following day or two, just to double check.
| safeAwait(startBarrier); | ||
| } | ||
|
|
||
| private static void fillWriteQueue(ThreadPool threadPool) { |
There was a problem hiding this comment.
Should we rename to:
| private static void fillWriteQueue(ThreadPool threadPool) { | |
| private static void fillWriteCoordinationQueue(ThreadPool threadPool) { |
| @@ -532,7 +532,7 @@ public void testShortCircuitShardLevelFailureWithIngestNodeHop() throws Exceptio | |||
| } | |||
|
|
|||
| private static void blockWritePool(ThreadPool threadPool, CountDownLatch finishLatch) { | |||
There was a problem hiding this comment.
Rename to:
| private static void blockWritePool(ThreadPool threadPool, CountDownLatch finishLatch) { | |
| private static void blockWriteCoordinationPool(ThreadPool threadPool, CountDownLatch finishLatch) { |
This change adds a thread pool for write coordination to ensure that bulk coordination does not get stuck on an overloaded primary node.
…c#4494) This PR includes ingestion loads from both `write_coordination` and `system_write_coordination` threadpools for autoscaling purpose. It also introduces a new setting that can be used to exclude the coordination thread pools in case BWC behaviour is needed. These thread pools are now always sampled for metrics purpose regardless whether they are included in the reports for autoscaling. Resolves: Relates: elastic#129450
…c#4494) This PR includes ingestion loads from both `write_coordination` and `system_write_coordination` threadpools for autoscaling purpose. It also introduces a new setting that can be used to exclude the coordination thread pools in case BWC behaviour is needed. These thread pools are now always sampled for metrics purpose regardless whether they are included in the reports for autoscaling. Resolves: Relates: elastic#129450
This change adds a thread pool for write coordination to ensure that
bulk coordination does not get stuck on an overloaded primary node.