Exposed engine must include all operations below global checkpoint during rollback#36159
Merged
dnhatn merged 4 commits into elastic:master on Dec 9, 2018
Today we expose a new engine immediately during a Lucene rollback. The new engine is started from a safe commit, which might not include all acknowledged operations. With this change, we won't expose the new engine until it has recovered from the local translog. Note that this solution is not complete, since it can preserve only the acknowledged operations at or below the global checkpoint. This is because we replay the translog only up to the global checkpoint during rollback. A per-doc Lucene rollback would solve this issue entirely.
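The ordering described above can be sketched with a toy model. This is only an illustration, not the code in this PR: `RollbackSketch`, `ToyEngine`, `ToyShard`, and their methods are hypothetical names, and the engine is reduced to a list of sequence numbers. The point it shows is that the new engine is built from the safe commit, replays the translog up to the global checkpoint, and only then becomes visible to readers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical toy model; none of these types are Elasticsearch classes.
final class RollbackSketch {

    /** A toy engine that only remembers which sequence numbers it contains. */
    static final class ToyEngine {
        private final List<Long> seqNos = new ArrayList<>();

        ToyEngine(List<Long> safeCommitSeqNos) {
            seqNos.addAll(safeCommitSeqNos);
        }

        void apply(long seqNo) {
            if (!seqNos.contains(seqNo)) {
                seqNos.add(seqNo);
            }
        }

        boolean contains(long seqNo) {
            return seqNos.contains(seqNo);
        }
    }

    /** A toy shard that holds the currently exposed engine. */
    static final class ToyShard {
        private final Object mutex = new Object();
        private final AtomicReference<ToyEngine> currentEngine = new AtomicReference<>();

        ToyEngine exposedEngine() {
            return currentEngine.get();
        }

        void rollback(List<Long> safeCommitSeqNos, List<Long> translogSeqNos, long globalCheckpoint) {
            final ToyEngine newEngine;
            synchronized (mutex) {
                // Step 1: create the engine from the safe commit, but do not publish it yet.
                newEngine = new ToyEngine(safeCommitSeqNos);
            }
            // Step 2: replay the local translog up to (and including) the global checkpoint.
            for (long seqNo : translogSeqNos) {
                if (seqNo <= globalCheckpoint) {
                    newEngine.apply(seqNo);
                }
            }
            synchronized (mutex) {
                // Step 3: only now expose the recovered engine to readers.
                currentEngine.set(newEngine);
            }
        }
    }

    public static void main(String[] args) {
        ToyShard shard = new ToyShard();
        // Safe commit holds ops 0..1, translog holds ops 0..4, global checkpoint is 3.
        shard.rollback(List.of(0L, 1L), List.of(0L, 1L, 2L, 3L, 4L), 3L);
        ToyEngine engine = shard.exposedEngine();
        System.out.println(engine.contains(3L)); // op at the global checkpoint was replayed
        System.out.println(engine.contains(4L)); // op above the global checkpoint was not
    }
}
```

In this sketch, operation 4 sits above the global checkpoint and is absent from the exposed engine, which mirrors the limitation noted in the description: only operations at or below the global checkpoint are preserved.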
Collaborator
Pinging @elastic/es-distributed
ywelsch
suggested changes
Dec 6, 2018
Contributor
ywelsch
left a comment
I've left 2 smaller comments around the order of things and mutexes. Looking good otherwise
server/src/main/java/org/elasticsearch/index/shard/IndexShard.java
Member
Author
@ywelsch Two very good catches. It's ready for another round. Would you please have another look? Thank you!
s1monw
approved these changes
Dec 7, 2018
Contributor
s1monw
left a comment
left 2 comments. LGTM otherwise
server/src/main/java/org/elasticsearch/index/shard/IndexShard.java
    synchronized (mutex) {
        // we must create a new engine under mutex (see IndexShard#snapshotStoreMetadata).
        newEngine = engineFactory.newReadWriteEngine(newEngineConfig());
        onNewEngine(newEngine);
Contributor
onNewEngine doesn't publish the new engine, right?
Member
Author
onNewEngine does not expose the engine itself but exposes only the last refresh translog location to RefreshListeners.
bleskes
reviewed
Dec 7, 2018
Member
Author
Thanks everyone :)
dnhatn
added a commit
that referenced
this pull request
Dec 9, 2018
Today we expose a new engine immediately during a Lucene rollback. The new engine is started from a safe commit, which might not include all acknowledged operations. With this change, we won't expose the new engine until it has recovered from the local translog. Note that this solution is not complete, since it can preserve only the acknowledged operations at or below the global checkpoint. This is because we replay the translog only up to the global checkpoint during rollback. A per-doc Lucene rollback would solve this issue entirely. Relates #32867
jasontedor
added a commit
to liketic/elasticsearch
that referenced
this pull request
Dec 9, 2018
* elastic/6.x: (37 commits)
  [HLRC] Added support for Follow Stats API (elastic#36253)
  Exposed engine must have all ops below gcp during rollback (elastic#36159)
  TEST: Always enable soft-deletes in ShardChangesTests
  Use delCount of SegmentInfos to calculate numDocs (elastic#36323)
  Add soft-deletes upgrade tests (elastic#36286)
  Remove LocalCheckpointTracker#resetCheckpoint (elastic#34667)
  Option to use endpoints starting with _security (elastic#36379)
  [CCR] Restructured QA modules (elastic#36404)
  RestClient: on retry timeout add root exception (elastic#25576)
  [HLRC] Add support for put privileges API (elastic#35679)
  HLRC: Add rollup search (elastic#36334)
  Explicitly recommend to forceMerge before freezing (elastic#36376)
  Rename internal repository actions to be internal (elastic#36377)
  Core: Remove parseDefaulting from DateFormatter (elastic#36386)
  [ML] Prevent stack overflow while copying ML jobs and datafeeds (elastic#36370)
  Docs: Fix Jackson reference (elastic#36366)
  [ILM] Fix issue where index may not yet be in 'hot' phase (elastic#35716)
  Undeprecate /_watcher endpoints (elastic#36269)
  Docs: Fix typo in bool query (elastic#36350)
  HLRC: Add delete template API (elastic#36320)
  ...