Adding a refresh listener to a recovering shard should be a noop#26055
Merged
bleskes merged 6 commits intoelastic:masterfrom Aug 4, 2017
Merged
Adding a refresh listener to a recovering shard should be a noop#26055bleskes merged 6 commits intoelastic:masterfrom
bleskes merged 6 commits intoelastic:masterfrom
Conversation
…ds (during recovery)
jasontedor
approved these changes
Aug 4, 2017
| @@ -848,7 +848,7 @@ public long getWritingBytes() { | |||
|
|
|||
| public RefreshStats refreshStats() { | |||
| // Null refreshListeners means this shard doesn't support them so there can't be any. | |||
| .settings(settings) | ||
| .primaryTerm(0, 1).build(); | ||
| IndexShard primary = newShard(new ShardId(metaData.getIndex(), 0), true, "n1", metaData, null); | ||
| recoveryShardFromStore(primary); |
Member
There was a problem hiding this comment.
While we're here, can you fix the name of this method to be recoverShardFromStore?
Contributor
Author
|
@jasontedor thanks. Pushed another commit with a solution for the visibility issue. Can you take another look? |
jasontedor
approved these changes
Aug 4, 2017
Member
jasontedor
left a comment
There was a problem hiding this comment.
Visibility change looks good, so still LGTM.
b32f9ef to
fc6e558
Compare
bleskes
added a commit
that referenced
this pull request
Aug 4, 2017
) When `refresh=wait_for` is set on an indexing request, we register a listener on the shards that are call during the next refresh. During the recover translog phase, when the engine is open, we have a window of time when indexing operations succeed and they can add their listeners. Those listeners will only be called when the recovery finishes as we do not refresh during recoveries (unless the indexing buffer is full). Next to being a bad user experience, it can also cause deadlocks with an ongoing peer recovery that may wait for those operations to mark the replica in sync (details below). To fix this, this PR changes refresh listeners to be a noop when the shard is not yet serving reads (implicitly covering the recovery period). It doesn't matter anyway. Deadlock with recovery: When finalizing a peer recovery we mark the peer as "in sync". To do so we wait until the peer's local checkpoint is at least as high as the global checkpoint. If an operation with `refresh=wait_for` is added as a listener on that peer during recovery, it is not completed from the perspective of the primary. The primary than may wait for it to complete before advancing the local checkpoint for that peer. Since that peer is not considered in sync, the global checkpoint on the primary can be higher, causing a deadlock. Operation waits for recovery to finish and a refresh to happen. Recovery waits on the operation.
bleskes
added a commit
that referenced
this pull request
Aug 4, 2017
) When `refresh=wait_for` is set on an indexing request, we register a listener on the shards that are call during the next refresh. During the recover translog phase, when the engine is open, we have a window of time when indexing operations succeed and they can add their listeners. Those listeners will only be called when the recovery finishes as we do not refresh during recoveries (unless the indexing buffer is full). Next to being a bad user experience, it can also cause deadlocks with an ongoing peer recovery that may wait for those operations to mark the replica in sync (details below). To fix this, this PR changes refresh listeners to be a noop when the shard is not yet serving reads (implicitly covering the recovery period). It doesn't matter anyway. Deadlock with recovery: When finalizing a peer recovery we mark the peer as "in sync". To do so we wait until the peer's local checkpoint is at least as high as the global checkpoint. If an operation with `refresh=wait_for` is added as a listener on that peer during recovery, it is not completed from the perspective of the primary. The primary than may wait for it to complete before advancing the local checkpoint for that peer. Since that peer is not considered in sync, the global checkpoint on the primary can be higher, causing a deadlock. Operation waits for recovery to finish and a refresh to happen. Recovery waits on the operation.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When
refresh=wait_foris set on an indexing request, we register a listener on the shards that are call during the next refresh. During the recover translog phase, when the engine is open, we have a window of time when indexing operations succeed and they can add their listeners. Those listeners will only be called when the recovery finishes as we do not refresh during recoveries (unless the indexing buffer is full). Next to being a bad user experience, it can also cause deadlocks with an ongoing peer recovery that may wait for those operations to mark the replica in sync (details below).To fix this, this PR changes refresh listeners to be a noop when the shard is not yet serving reads (implicitly covering the recovery period). It doesn't matter anyway.
There is a still a small problem I want to think how to solve - an indexing operation that came in with wait_for_refresh after the finalize recovery and before markAsDone is called will not be immediately visible when moving to
POST_RECOVERY. I'm going to give it some more thought (I hope a simple refresh will do) but I think we can start reviewing on the main issue.Deadlock with recovery:
When finalizing a peer recovery we mark the peer as "in sync". To do so we wait until the peer's local checkpoint is at least as high as the global checkpoint. If an operation with
refresh=wait_foris added as a listener on that peer during recovery, it is not completed from the perspective of the primary. The primary than may wait for it to complete before advancing the local checkpoint for that peer. Since that peer is not considered in sync, the global checkpoint on the primary can be higher, causing a deadlock. Operation waits for recovery to finish and a refresh to happen. Recovery waits on the operation.