Skip to content

Generating and committing a history_uuid on existing old indices destroys translog recovery info#26734

Merged
bleskes merged 5 commits intoelastic:6.xfrom
bleskes:history_uuid_commit_6x
Sep 21, 2017
Merged

Generating and committing a history_uuid on existing old indices destroys translog recovery info#26734
bleskes merged 5 commits intoelastic:6.xfrom
bleskes:history_uuid_commit_6x

Conversation

@bleskes
Copy link
Copy Markdown
Contributor

@bleskes bleskes commented Sep 21, 2017

This is a bug introduced in #26694 . The issue comes from the attempt to share code that commits the new history uuid and/or a new translog uuid. This goes wrong an existing 5.6 index that is recovered from store:

  1. A new history uuid is generated as it doesn't exist in the index
  2. The translog is opened and it's uuid doesn't change.
  3. The committing the new history uuid used the standard commitIndexWriter method.
  4. The latter asks the translog for the oldest file generation that is required to recover from the local checkpoint + 1
  5. The local checkpoint on old indices is -1 (until something is indexed) and the translog is asked to recover from position 0. That excludes any operations the translog that do not have a seq no assigned, causing the FullClusterRestart bwc tests to fail.

To bypass this PR moves away from the attempt to share the committing code between a new translog uuid and a new history uuid. Instead we do what we did before and open the translog with a potential commit. Afterwards we commit the history uuid if needed. This comes with the expense of opening up the method to commit an index writer in the engine.

This PR is opened against 6.x . We have the option to leave the code on master as is. Let me know which you prefer for the long term. I can go either way.

Copy link
Copy Markdown
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

This PR is opened against 6.x . We have the option to leave the code on master as is. Let me know which you prefer for the long term. I can go either way.

I have a weak preference to keep master as is and have this only go into 6.x. I'll leave it up to you.

// assert we don't loose key entries
assert commitDataAsMap(writer).containsKey(Translog.TRANSLOG_UUID_KEY) : "commit misses translog uuid";
assert commitDataAsMap(writer).containsKey(Translog.TRANSLOG_GENERATION_KEY) : "commit misses translog generation";
assert commitDataAsMap(writer).containsKey(MAX_UNSAFE_AUTO_ID_TIMESTAMP_COMMIT_ID) : "commit misses max unsafe times stamps";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timestamp

@bleskes bleskes merged commit 07b0c26 into elastic:6.x Sep 21, 2017
@bleskes bleskes deleted the history_uuid_commit_6x branch September 21, 2017 14:27
@bleskes
Copy link
Copy Markdown
Contributor Author

bleskes commented Sep 21, 2017

Thx @ywelsch . I will leave master as is. Having two commitIndexWriter methods doesn't have a good smell (even with the assertions I added to protect against abuse).

bleskes added a commit that referenced this pull request Sep 21, 2017
…roys translog recovery info (#26734)

This is a bug introduced in #26694 . The issue comes from the attempt to share code that commits the new history uuid and/or a new translog uuid. This goes wrong an existing 5.6 index that is recovered from store:

1) A new history uuid is generated as it doesn't exist in the index
2) The translog is opened and it's uuid doesn't change.
3) The committing the new history uuid used the standard commitIndexWriter method.
4) The latter asks the translog for the oldest file generation that is required to recover from the local checkpoint + 1
5) The local checkpoint on old indices is -1 (until something is indexed) and the translog is asked to recover from position 0. That excludes any operations the translog that do not have a seq no assigned, causing the FullClusterRestart bwc tests to fail.

To bypass this commit moves away from the attempt to share the committing code between a new translog uuid and a new history uuid. Instead we do what we did before and open the translog with a potential commit. Afterwards we commit the history uuid if needed. This comes with the expense of opening up the method to commit an index writer in the engine.
@lcawl lcawl removed the v6.1.0 label Dec 12, 2017
@clintongormley clintongormley added :Distributed/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Engine :Distributed/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >non-issue v6.0.0-rc1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants