Skip to content

storage: preserve consistency when applying widening preemptive snapshots#29677

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
benesch:merge-inconsistency
Sep 7, 2018
Merged

storage: preserve consistency when applying widening preemptive snapshots#29677
craig[bot] merged 1 commit intocockroachdb:masterfrom
benesch:merge-inconsistency

Conversation

@benesch
Copy link
Copy Markdown
Contributor

@benesch benesch commented Sep 6, 2018

Merges can cause preemptive snapshots that widen existing replicas. For
example, consider the following sequence of events:

  1. A replica of range A is removed from store S, but is not garbage
    collected.
  2. Range A subsumes its right neighbor B.
  3. Range A is re-added to store S.

In step 3, S will receive a preemptive snapshot for A that requires
widening its existing replica, thanks to the intervening merge.

Problematically, the code to check whether this widening was possible,
in Store.canApplySnapshotLocked, was incorrectly mutating the range
descriptor in the snapshot header! Applying the snapshot would then fail
to clear all of the data from the old incarnation of the replica, since
the bounds on the range deletion tombstone were wrong. This often
resulted in replica inconsistency. Plus, the in-memory copy of the range
descriptor would be incorrect until the next descriptor update--though
this usually happened quickly, as the replica would apply the change
replicas command, which updates the descriptor, soon after applying the
preemptive snapshot.

To fix the problem, teach Store.canApplySnapshotLocked to make a copy of
the range descriptor before it mutates it.

To prevent regressions, add an assertion that a range's start key is
never changed to the descriptor update path. With this assertion in
place, but without the fix itself,
TestStoreRangeMergeReadoptedLHSFollower reliably fails.

Fixes #29252.

Release note: None

…hots

Merges can cause preemptive snapshots that widen existing replicas. For
example, consider the following sequence of events:

    1. A replica of range A is removed from store S, but is not garbage
       collected.
    2. Range A subsumes its right neighbor B.
    3. Range A is re-added to store S.

In step 3, S will receive a preemptive snapshot for A that requires
widening its existing replica, thanks to the intervening merge.

Problematically, the code to check whether this widening was possible,
in Store.canApplySnapshotLocked, was incorrectly mutating the range
descriptor in the snapshot header! Applying the snapshot would then fail
to clear all of the data from the old incarnation of the replica, since
the bounds on the range deletion tombstone were wrong. This often
resulted in replica inconsistency. Plus, the in-memory copy of the range
descriptor would be incorrect until the next descriptor update--though
this usually happened quickly, as the replica would apply the change
replicas command, which updates the descriptor, soon after applying the
preemptive snapshot.

To fix the problem, teach Store.canApplySnapshotLocked to make a copy of
the range descriptor before it mutates it.

To prevent regressions, add an assertion that a range's start key is
never changed to the descriptor update path. With this assertion in
place, but without the fix itself,
TestStoreRangeMergeReadoptedLHSFollower reliably fails.

Fixes cockroachdb#29252.

Release note: None
@benesch benesch requested review from a team, nvb and tbg September 6, 2018 21:15
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To prevent regressions, add an assertion that a range's start key is
never changed to the descriptor update path. With this assertion in
place, but without the fix itself,
TestStoreRangeMergeReadoptedLHSFollower reliably fails.

Nice!

:lgtm:

Reviewed 3 of 3 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)

@benesch
Copy link
Copy Markdown
Contributor Author

benesch commented Sep 7, 2018

Woohoo, thanks!

bors r+

@benesch
Copy link
Copy Markdown
Contributor Author

benesch commented Sep 7, 2018

Bors is busted.

bors r-

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Sep 7, 2018

Canceled

@benesch
Copy link
Copy Markdown
Contributor Author

benesch commented Sep 7, 2018

bors r+

craig bot pushed a commit that referenced this pull request Sep 7, 2018
29646: roachtest: better merge testing in clearrange r=benesch a=tschottdorf

Unfortunately, the method to determine the range count is quite slow since
crdb_internal.ranges internally sends an RPC for each range to determine
the leaseholder.

Anecdotally, I've seen ~25% of the merges completed after less than 15
minutes. I know that it's slowing down over time, but @benesch will fix
that.

Also throws in aggressive consistency checks so that when something goes
out of sync, we find out right there.

Release note: None

29677: storage: preserve consistency when applying widening preemptive snapshots r=benesch a=benesch

Merges can cause preemptive snapshots that widen existing replicas. For
example, consider the following sequence of events:

1. A replica of range A is removed from store S, but is not garbage
   collected.
2. Range A subsumes its right neighbor B.
3. Range A is re-added to store S.

In step 3, S will receive a preemptive snapshot for A that requires
widening its existing replica, thanks to the intervening merge.

Problematically, the code to check whether this widening was possible,
in Store.canApplySnapshotLocked, was incorrectly mutating the range
descriptor in the snapshot header! Applying the snapshot would then fail
to clear all of the data from the old incarnation of the replica, since
the bounds on the range deletion tombstone were wrong. This often
resulted in replica inconsistency. Plus, the in-memory copy of the range
descriptor would be incorrect until the next descriptor update--though
this usually happened quickly, as the replica would apply the change
replicas command, which updates the descriptor, soon after applying the
preemptive snapshot.

To fix the problem, teach Store.canApplySnapshotLocked to make a copy of
the range descriptor before it mutates it.

To prevent regressions, add an assertion that a range's start key is
never changed to the descriptor update path. With this assertion in
place, but without the fix itself,
TestStoreRangeMergeReadoptedLHSFollower reliably fails.

Fixes #29252.

Release note: None

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Sep 7, 2018

Build succeeded

@craig craig bot merged commit 3ced421 into cockroachdb:master Sep 7, 2018
@benesch benesch deleted the merge-inconsistency branch September 16, 2018 20:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

storage: consistency check failed on cyan

3 participants