upgrades: add checkpointing for `raftAppliedIndexTermMigration` #85074
Merged: craig[bot] merged 1 commit into cockroachdb:master on Jul 27, 2022
The `raftAppliedIndexTermMigration` upgrade migration could be unreliable. It iterates over all ranges and runs a `Migrate` request which must be applied on all replicas. However, if any ranges merge or replicas are unavailable, the migration fails and starts over from the beginning. In large clusters with many ranges, this meant that it might never complete.

This patch makes the upgrade more robust, by retrying each `Migrate` request 5 times, and checkpointing the progress after every fifth batch (1000 ranges), allowing resumption on failure. At some point this should be made part of the migration infrastructure.

NB: This fix was initially submitted for 22.1, and even though the migration will be removed for 22.2, it is forward-ported for posterity.

Release note: None
irfansharif (Member) approved these changes on Jul 26, 2022
Author (Contributor): bors r=irfansharif
Contributor: Build failed (retrying...)
Contributor: Build failed (retrying...)
Contributor: Build failed (retrying...)
Contributor: Build succeeded
Forward-port of #84909, for posterity.