Do not force refresh when write indexing buffer #50769
Merged
dnhatn merged 6 commits into elastic:master on Jan 9, 2020
Collaborator
Pinging @elastic/es-distributed (:Distributed/Engine)
dnhatn
commented
Jan 8, 2020
    // TODO: would be cleaner if I could pass this 1kb setting to the single node this test created....
    IndexingMemoryController imc = new IndexingMemoryController(settings, null, null) {
    public void testSkipRefreshIfShardIsRefreshingAlready() throws Exception {
Member
Author
I only added this new test. Other tests are unchanged.
henningandersen
approved these changes
Jan 9, 2020
Contributor
henningandersen
left a comment
LGTM.
I left a few minor comments. Also, it would be good to have a few extra test runs of the server module to ensure this change does not introduce spurious test failures.
server/src/test/java/org/elasticsearch/indices/IndexingMemoryControllerTests.java
server/src/test/java/org/elasticsearch/indices/IndexingMemoryControllerIT.java
Member
Author
Yep, I am running it on my CI now.
Member
Author
@ywelsch @henningandersen Thanks for reviewing.
henningandersen
added a commit
to henningandersen/elasticsearch
that referenced
this pull request
Jan 10, 2020
The test checked queue size and active count. However, ThreadPoolExecutor pulls the request from the queue before marking the worker active, so the test could conclude that all tasks are done when they are not. The test now checks the completed-tasks metric instead, which is guaranteed to be monotonic. Relates elastic#50769
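The race described in this commit message can be avoided by waiting on the executor's completed-task counter. The following is only a minimal sketch of that idea, not the code from the referenced commit: the class `CompletedTaskWait`, the method `awaitCompleted`, and the 10 ms polling interval are hypothetical names and values chosen for illustration. The only JDK call it relies on is `ThreadPoolExecutor.getCompletedTaskCount()`, whose value never decreases across successive calls.

```java
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration; not part of Elasticsearch. */
final class CompletedTaskWait {

    /**
     * Polls until the executor reports at least {@code expected} completed tasks,
     * or until the timeout elapses. Returns true if the count was reached.
     *
     * An empty queue combined with an active count of zero can be observed in the
     * window after a worker has taken a task off the queue but before it is counted
     * as active, so this sketch checks the completed-task counter instead.
     */
    static boolean awaitCompleted(ThreadPoolExecutor executor, long expected, long timeoutMillis) {
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (executor.getCompletedTaskCount() < expected) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out before enough tasks completed
            }
            try {
                Thread.sleep(10); // assumed polling interval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }
}
```

A test would record how many tasks it submitted and then call `awaitCompleted(executor, submitted, timeout)` before asserting on the results.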
henningandersen
added a commit
that referenced
this pull request
Jan 10, 2020
dnhatn
added a commit
that referenced
this pull request
Jan 11, 2020
Today we periodically check the indexing buffer memory every 5 seconds or after we have used 1/30 of the configured memory. If the total used memory is over the threshold, then we refresh the "largest" shards. If refreshing takes longer than these intervals (i.e., 5s or 1/30 buffer), then we continue to enqueue refreshes to these shards. This leads to two issues:
- The refresh thread pool can be exhausted and other shards can't refresh
- We execute too many refreshes for the "largest" shards
With this change, we only refresh the largest shards if they are not refreshing. Here we rely on the periodic check to trigger another refresh if needed. We can harden this by making the ongoing refresh trigger the memory check when it's completed. I opted out of this option in this PR for simplicity.
See: https://discuss.elastic.co/t/write-queue-continue-to-rise/213652/
dnhatn
pushed a commit
that referenced
this pull request
Jan 11, 2020
SivagurunathanV
pushed a commit
to SivagurunathanV/elasticsearch
that referenced
this pull request
Jan 23, 2020
SivagurunathanV
pushed a commit
to SivagurunathanV/elasticsearch
that referenced
this pull request
Jan 23, 2020
This was referenced Feb 3, 2020
Today we periodically check the indexing buffer memory every 5 seconds or after we have used 1/30 of the configured memory. If the total used memory is over the threshold, then we refresh the "largest" shards. If refreshing takes longer than these intervals (i.e., 5s or 1/30 buffer), then we continue to enqueue refreshes to these shards. This leads to two issues:
- The refresh thread pool can be exhausted and other shards can't refresh
- We execute too many refreshes for the "largest" shards
With this change, we only refresh the largest shards if they are not refreshing. Here we rely on the periodic check to trigger another refresh if needed. We can harden this by making the ongoing refresh trigger the memory check when it's completed. I opted out of this option in this PR for simplicity.
See: https://discuss.elastic.co/t/write-queue-continue-to-rise/213652/
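The behaviour described above can be illustrated with a small, self-contained sketch. This is not the Elasticsearch implementation: `SketchShard`, `SketchBufferCheck`, `tryStartRefresh`, and `selectShardsToRefresh` are hypothetical names, and the use of an `AtomicBoolean` flag to detect an in-flight refresh is an assumption made for illustration. The sketch also assumes that the bytes of a shard that is already refreshing count as soon to be freed, so the pass moves on without enqueueing a second refresh for it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a shard; not the Elasticsearch IndexShard API. */
final class SketchShard {
    final String name;
    final long bufferBytes;
    private final AtomicBoolean refreshing = new AtomicBoolean(false);

    SketchShard(String name, long bufferBytes) {
        this.name = name;
        this.bufferBytes = bufferBytes;
    }

    /** Claims the refresh slot; returns false if a refresh is already in flight. */
    boolean tryStartRefresh() {
        return refreshing.compareAndSet(false, true);
    }

    /** Called when the refresh finishes, allowing a later check to refresh again. */
    void finishRefresh() {
        refreshing.set(false);
    }
}

/** Hypothetical sketch of one pass of the periodic indexing-buffer check. */
final class SketchBufferCheck {

    /**
     * While the total buffer usage exceeds the budget, walks the shards from
     * largest to smallest and claims a refresh only for shards that are not
     * already refreshing. Returns the shards for which a new refresh was claimed.
     */
    static List<SketchShard> selectShardsToRefresh(List<SketchShard> shards, long budgetBytes) {
        long used = 0;
        for (SketchShard shard : shards) {
            used += shard.bufferBytes;
        }
        List<SketchShard> largestFirst = new ArrayList<>(shards);
        largestFirst.sort((a, b) -> Long.compare(b.bufferBytes, a.bufferBytes));

        List<SketchShard> claimed = new ArrayList<>();
        for (SketchShard shard : largestFirst) {
            if (used <= budgetBytes) {
                break; // back under budget, nothing more to do in this pass
            }
            if (shard.tryStartRefresh()) {
                claimed.add(shard); // no refresh in flight: enqueue one
            }
            // Assumption: an in-flight refresh will free these bytes as well,
            // so they are subtracted whether or not a new refresh was claimed.
            used -= shard.bufferBytes;
        }
        return claimed;
    }
}
```

In this sketch, a second pass that runs while the largest shard is still refreshing returns an empty list rather than queueing another refresh for the same shard. Once `finishRefresh()` has been called, the next periodic pass can claim that shard again.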