Threadpool merge executor does not block aborted merges #129613
Hi @albertzaharovits, I've created a changelog YAML for you.
    // updates the budget of enqueued elements (and possibly reorders the priority queue)
    updateBudgetOfEnqueuedElementsAndReorderQueue();
    // update the budget of dequeued, but still in-use elements (these are the elements that are consuming budget)
    unreleasedBudgetPerElement.replaceAll((e, v) -> budgetFunction.applyAsLong(e.element()));
This change will also adjust the budget of running merges that have been aborted to 0. That's a bit optimistic, but I find the alternative implementation convoluted, and it's probably counter-intuitive to estimate 0 for to-be-run merges but not for already-running ones.
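To make the effect of that `replaceAll` call concrete, here is a minimal sketch of re-estimating the budget held by in-use elements. The class, the `Task` type, and `BUDGET_FUNCTION` are hypothetical stand-ins, not the actual Elasticsearch code; only the shape of the `replaceAll` call comes from the diff above.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

final class UnreleasedBudgetSketch {
    /** Hypothetical stand-in for a running merge task. */
    static final class Task {
        final long estimatedBytes;
        volatile boolean aborted;

        Task(long estimatedBytes) {
            this.estimatedBytes = estimatedBytes;
        }
    }

    /** Assumed budget function: an aborted task is estimated to need 0 bytes. */
    static final ToLongFunction<Task> BUDGET_FUNCTION = t -> t.aborted ? 0L : t.estimatedBytes;

    /** Tracks a task as dequeued but still in use, at its current estimate. */
    static void track(Map<Task, Long> unreleased, Task task) {
        unreleased.put(task, BUDGET_FUNCTION.applyAsLong(task));
    }

    /** Re-evaluates the budget of every in-use task, ignoring the stale value. */
    static void refresh(Map<Task, Long> unreleased) {
        unreleased.replaceAll((task, staleBudget) -> BUDGET_FUNCTION.applyAsLong(task));
    }

    /** Sum of the budget currently accounted to in-use tasks. */
    static long totalInUse(Map<Task, Long> unreleased) {
        return unreleased.values().stream().mapToLong(Long::longValue).sum();
    }

    /** Identity-keyed map, so distinct tasks are never conflated by equals(). */
    static Map<Task, Long> newMap() {
        return new IdentityHashMap<>();
    }
}
```

In this sketch, a running task that gets aborted stops counting against the in-use total on the next `refresh`, which is the "a bit optimistic" behaviour the comment describes.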
henningandersen left a comment
LGTM.
We should preferably add specific testing, either before or after merging.
        .build();
    }

    public void testShardCloseWhenDiskSpaceInsufficient() {
It is not clear to me what this verifies? AFAICS, there is no merge at the end of the test and thus it may not verify anything?
Yeah, the test was not ready when you looked at it; it was still WIP. Sorry for not being clear.
It is now ready and it tests that we can close a shard (an index) with enqueued merges that are blocked due to insufficient disk space. The merges will be aborted, which should unblock and prioritize them in the queue.
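The scenario this test exercises can be illustrated with a small, self-contained sketch: a consumer blocks on a task that does not fit the available budget, and is released once the task is aborted. Everything here (`BlockingBudgetSketch`, `Task`, the `recheckMillis` parameter) is hypothetical and much simpler than the real executor; it only assumes that aborted tasks estimate `0` and that the queue re-checks budgets periodically.

```java
import java.util.ArrayList;
import java.util.List;

final class BlockingBudgetSketch {
    /** Hypothetical stand-in for an enqueued merge task. */
    static final class Task {
        final long estimatedBytes;
        volatile boolean aborted;

        Task(long estimatedBytes) {
            this.estimatedBytes = estimatedBytes;
        }

        /** Aborted tasks are assumed to need no disk space. */
        long budget() {
            return aborted ? 0L : estimatedBytes;
        }
    }

    private final List<Task> enqueued = new ArrayList<>();
    private final long availableBytes;

    BlockingBudgetSketch(long availableBytes) {
        this.availableBytes = availableBytes;
    }

    synchronized void enqueue(Task task) {
        enqueued.add(task);
        notifyAll();
    }

    /**
     * Blocks until the cheapest enqueued task fits the available budget.
     * Budgets are re-read on every pass, so an abort is noticed after at
     * most recheckMillis (which must be positive) without any notification.
     */
    synchronized Task take(long recheckMillis) {
        while (true) {
            Task cheapest = null;
            for (Task t : enqueued) {
                if (cheapest == null || t.budget() < cheapest.budget()) {
                    cheapest = t;
                }
            }
            if (cheapest != null && cheapest.budget() <= availableBytes) {
                enqueued.remove(cheapest);
                return cheapest;
            }
            try {
                wait(recheckMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for budget", e);
            }
        }
    }
}
```

Without the periodic re-check, the consumer in this sketch would wait indefinitely, which mirrors how a shard close could hang behind a merge blocked on disk space.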
Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)
In the end, I've added 2 tests here:
There's decent coverage, I think.
(labeling as a |
💔 Backport failed
You can use sqren/backport to manually backport by running |
This PR addresses a bug where aborted merges are blocked if there's insufficient disk space. Previously, the merge disk space estimation did not consider whether the operation had been aborted while it was enqueued for execution. Consequently, aborted merges (for example, when closing a shard) were blocked if their disk space estimate exceeded the available disk space threshold. In this case, the shard close operation would itself block. This fix estimates a disk space budget of `0` for aborted merges, and it periodically checks whether any enqueued merge tasks have been aborted (more generally, it checks whether the budget estimate for any merge task has changed, and reorders the queue if so). This way, aborted merges are prioritized and are never blocked. Closes elastic#129335
…29727) This PR addresses a bug where aborted merges are blocked if there's insufficient disk space. Previously, the merge disk space estimation did not consider whether the operation had been aborted while it was enqueued for execution. Consequently, aborted merges (for example, when closing a shard) were blocked if their disk space estimate exceeded the available disk space threshold. In this case, the shard close operation would itself block. This fix estimates a disk space budget of `0` for aborted merges, and it periodically checks whether any enqueued merge tasks have been aborted (more generally, it checks whether the budget estimate for any merge task has changed, and reorders the queue if so). This way, aborted merges are prioritized and are never blocked. Closes #129335
…) (#129728)
* Threadpool merge executor does not block aborted merges (#129613)
This PR addresses a bug where aborted merges are blocked if there's insufficient disk space. Previously, the merge disk space estimation did not consider whether the operation had been aborted while it was enqueued for execution. Consequently, aborted merges (for example, when closing a shard) were blocked if their disk space estimate exceeded the available disk space threshold. In this case, the shard close operation would itself block. This fix estimates a disk space budget of `0` for aborted merges, and it periodically checks whether any enqueued merge tasks have been aborted (more generally, it checks whether the budget estimate for any merge task has changed, and reorders the queue if so). This way, aborted merges are prioritized and are never blocked. Closes #129335
* ClusterDisruptionIT.java
* [CI] Auto commit changes from spotless
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
This PR addresses a bug where aborted merges are blocked if there's insufficient disk space.
Previously, the merge disk space estimation did not consider whether the operation had been aborted while it was enqueued for execution. Consequently, aborted merges (for example, when closing a shard) were blocked if their disk space estimate exceeded the available disk space threshold. In this case, the shard close operation would itself block.
This fix estimates a disk space budget of `0` for aborted merges, and it periodically checks whether any enqueued merge tasks have been aborted (more generally, it checks whether the budget estimate for any merge task has changed, and reorders the queue if so). This way, aborted merges are prioritized and are never blocked.
Closes #129335
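The mechanism described above can be sketched as a priority queue ordered by budget, where budgets are re-evaluated before each poll. This is an illustrative, non-blocking simplification under stated assumptions: `BudgetQueueSketch`, `MergeTask`, and `pollIfAffordable` are hypothetical names, and the real executor blocks instead of returning `null`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class BudgetQueueSketch {
    /** Hypothetical stand-in for a merge task. */
    static final class MergeTask {
        final String name;
        final long estimatedBytes;
        boolean aborted;

        MergeTask(String name, long estimatedBytes) {
            this.name = name;
            this.estimatedBytes = estimatedBytes;
        }

        /** Aborted merges are estimated to need 0 bytes of disk space. */
        long budget() {
            return aborted ? 0L : estimatedBytes;
        }
    }

    /** Queue entry caching the budget the task had when last (re)ordered. */
    private static final class Entry {
        final MergeTask task;
        long budget;

        Entry(MergeTask task, long budget) {
            this.task = task;
            this.budget = budget;
        }
    }

    private final PriorityQueue<Entry> queue =
        new PriorityQueue<>(Comparator.comparingLong((Entry e) -> e.budget));
    private long availableBytes;

    BudgetQueueSketch(long availableBytes) {
        this.availableBytes = availableBytes;
    }

    void enqueue(MergeTask task) {
        queue.add(new Entry(task, task.budget()));
    }

    /** Re-reads every budget and rebuilds the heap so the order reflects changes. */
    void refreshBudgets() {
        List<Entry> entries = new ArrayList<>(queue);
        queue.clear();
        for (Entry e : entries) {
            e.budget = e.task.budget();
            queue.add(e);
        }
    }

    /** Returns the cheapest task if it fits the available budget, else null. */
    MergeTask pollIfAffordable() {
        refreshBudgets();
        Entry head = queue.peek();
        if (head == null || head.budget > availableBytes) {
            return null;
        }
        queue.poll();
        availableBytes -= head.budget;
        return head.task;
    }
}
```

Because an aborted task's budget drops to `0`, it moves to the head of the queue on the next refresh and always fits, regardless of how little disk space is available.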