
Add thread pool for write coordination #129450

Merged
Tim-Brooks merged 19 commits into elastic:main from Tim-Brooks:write_coordination_pool on Jun 18, 2025

Conversation

@Tim-Brooks (Contributor)

This change adds a thread pool for write coordination to ensure that
bulk coordination does not get stuck on an overloaded primary node.

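The motivation can be sketched with plain `java.util.concurrent` executors (the pool sizes and names here are illustrative, not the actual Elasticsearch `ThreadPool` internals): coordination work runs on its own small pool, so a saturated write pool cannot delay the routing of bulk requests.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class WriteCoordinationSketch {
    // Hypothetical pools; Elasticsearch manages its pools via ThreadPool,
    // plain executors are used here only to illustrate the separation.
    static String coordinateUnderLoad() throws Exception {
        ExecutorService writePool = Executors.newFixedThreadPool(2);
        ExecutorService coordinationPool = Executors.newFixedThreadPool(1);
        try {
            // Saturate the write pool with long-running indexing work.
            for (int i = 0; i < 4; i++) {
                writePool.submit(() -> {
                    try { Thread.sleep(5_000); } catch (InterruptedException ignored) { }
                });
            }
            // Coordination still completes promptly on its dedicated pool.
            Future<String> routed = coordinationPool.submit(() -> "routed bulk request");
            return routed.get(1, TimeUnit.SECONDS);
        } finally {
            writePool.shutdownNow();
            coordinationPool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(coordinateUnderLoad());
    }
}
```

With a single shared pool, the `routed.get(1, TimeUnit.SECONDS)` call above would time out, since the coordination task would be queued behind the sleeping write tasks.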
@Tim-Brooks Tim-Brooks requested a review from a team as a code owner June 14, 2025 21:16
@Tim-Brooks added the >non-issue, :Distributed/CRUD (a catch-all label for issues around indexing, updating and getting a doc by id; not search), and v9.1.0 labels on Jun 14, 2025
@Tim-Brooks (Contributor, Author) commented Jun 14, 2025

WIP / opened for CI.

@elasticsearchmachine (Collaborator)

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

@elasticsearchmachine elasticsearchmachine added the Team:Distributed Indexing (obsolete) Meta label for Distributed Indexing team. Obsolete. Please do not use. label Jun 14, 2025
@henningandersen (Contributor) left a comment

Thanks, this makes sense to me. I left a comment, but otherwise I think we can move forward with tests etc.

},
-    executor
+    // Use the appropriate write executor for actual ingest processing
+    isOnlySystem ? systemWriteExecutor : writeExecutor
@henningandersen (Contributor)

I am ok with this, but it seems like for any case where we have ingest processing, we would then still have the coordination happen behind any local write work. At least the PR then avoids some of the wait roundtrips.

We could also decide to have both a system-write-coordination pool and a write-coordination pool and use those here? We can look at this in follow-ups ofc.

@Tim-Brooks (Contributor, Author)

> We could also decide to have both a system-write-coordination pool and a write-coordination pool and use those here? We can look at this in follow-ups ofc.

Yes, I would like to move ingest work to a non-WRITE thread pool in a follow-up, as I think there might be a few things to discuss.
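The executor choice in the diff above can be sketched as follows (the surrounding class and method names are hypothetical; only the ternary mirrors the PR): bulks touching only system indices are processed on the system write pool, everything else on the regular write pool.

```java
import java.util.concurrent.Executor;

class BulkExecutorSelection {
    final Executor writeExecutor;
    final Executor systemWriteExecutor;

    BulkExecutorSelection(Executor writeExecutor, Executor systemWriteExecutor) {
        this.writeExecutor = writeExecutor;
        this.systemWriteExecutor = systemWriteExecutor;
    }

    // Use the appropriate write executor for actual ingest processing:
    // bulks touching only system indices go to the system write pool.
    Executor executorFor(boolean isOnlySystem) {
        return isOnlySystem ? systemWriteExecutor : writeExecutor;
    }
}
```

The point of the comment above is that coordination work dispatched this way still queues behind local ingest processing on the write pools; dedicated coordination pools for both cases are a possible follow-up.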

@github-actions (bot) commented Jun 17, 2025

🔍 Preview links for changed docs:

🔔 The preview site may take up to 3 minutes to finish building. These links will become live once it completes.

@henningandersen (Contributor) left a comment

LGTM.

This seems intuitive and simple enough that we can merge. But it seems worth keeping an eye on the nightly benchmarks for the following day or two, just to double-check.

safeAwait(startBarrier);
}

private static void fillWriteQueue(ThreadPool threadPool) {
@henningandersen (Contributor)

Should we rename to:

Suggested change:
-private static void fillWriteQueue(ThreadPool threadPool) {
+private static void fillWriteCoordinationQueue(ThreadPool threadPool) {

@@ -532,7 +532,7 @@ public void testShortCircuitShardLevelFailureWithIngestNodeHop() throws Exceptio
}

private static void blockWritePool(ThreadPool threadPool, CountDownLatch finishLatch) {
@henningandersen (Contributor)

Rename to:

Suggested change:
-private static void blockWritePool(ThreadPool threadPool, CountDownLatch finishLatch) {
+private static void blockWriteCoordinationPool(ThreadPool threadPool, CountDownLatch finishLatch) {

@Tim-Brooks Tim-Brooks merged commit 9ac6576 into elastic:main Jun 18, 2025
28 checks passed
kderusso pushed a commit to kderusso/elasticsearch that referenced this pull request Jun 23, 2025
This change adds a thread pool for write coordination to ensure that
bulk coordination does not get stuck on an overloaded primary node.
breskeby pushed a commit to breskeby/elasticsearch that referenced this pull request Feb 11, 2026
…c#4494)

This PR includes ingestion loads from both the `write_coordination` and `system_write_coordination` thread pools for autoscaling purposes. It also introduces a new setting that can be used to exclude the coordination thread pools in case BWC behaviour is needed. These thread pools are now always sampled for metrics purposes, regardless of whether they are included in the reports for autoscaling.

Resolves: 
Relates: elastic#129450
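The sampled-always / reported-conditionally behaviour described in that follow-up can be sketched like this (the method, map keys matching the pool names, and the boolean flag standing in for the new setting are all illustrative, not the actual implementation):

```java
import java.util.HashMap;
import java.util.Map;

class AutoscalingLoadSketch {
    // Hypothetical toggle mirroring the new setting: pools are always sampled,
    // but the coordination pools may be excluded from the autoscaling report
    // when BWC behaviour is needed.
    static Map<String, Double> reportedLoads(Map<String, Double> sampled, boolean includeCoordination) {
        Map<String, Double> report = new HashMap<>(sampled);
        if (!includeCoordination) {
            report.remove("write_coordination");
            report.remove("system_write_coordination");
        }
        return report;
    }
}
```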