storage: avoid (one) fatal error from splitPostApply#39796
storage: avoid (one) fatal error from splitPostApply#39796craig[bot] merged 2 commits intocockroachdb:masterfrom
Conversation
I wouldn't count on me getting a chance to update the test anytime soon, unfortunately. I can't say I particularly understand the subtitles involved in this bit of code yet, so if there's no hurry, I'm inclined to skip the sciencedog review and wait for someone that does understand them to look at it. |
|
Slightly different crash seen by @knz in #40470 |
28d5d6e to
704162a
Compare
|
I added a test that reproduces this. @ajwerner, mind reviewing this now that you're dealing with similar problems? |
daafca6 to
68d6486
Compare
Without the preceding commit, this fatals as seen in cockroachdb#39796: > F190821 15:14:28.241623 312191 storage/store.go:2172 > [n2,s2,r21/3:/{Table/54-Max}] replica descriptor of local store not > found in right hand side of split Note that cockroachdb#40459 will break these tests since it will not allow the replicaID to change without an intermittent GC cycle, but we now rely on catching up via the raft log across a removal, split, and then re-add. At that point, the only way in which a split trigger could execute while in non-member status is if a replica got removed but would still receive a later split trigger. This could happen as follows: The log is `[..., remove, split]` a) the follower receives `split` b) it learns that `split` is committed so that c) it will apply both `remove` and then `split`. a) seems possible; if the removed follower is trailing a bit, the leader may append both entries at the same time before it (the leader) applies the replication change that removes the follower from the leader's config. b) seems possible like above, at least once we do async command application, if said append carries a commit index encompassing the append itself. c) will then naturally happen in today's code. There's no way we can reasonably enact this scenario in our testing at this point, though. The suggestion for cockroachdb#40459 is to essentially revert this commit, so that we at least retain the coverage related to the RHS of this test. Release note (bug fix): Avoid crashes with message "replica descriptor of local store not found".
ajwerner
left a comment
There was a problem hiding this comment.
Reviewed 1 of 1 files at r1, 5 of 5 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @danhhz and @tbg)
pkg/storage/store.go, line 2171 at r2 (raw file):
// not in the exact scenario above) but it sounds like it could be // in general. So if it's zero, we leave it at zero and if it's not // zero we wind it back to one. The code below achieves this.
Can you note that this replica id of 1 is just used for in-memory Replica.mu.replicaID field and won't actually make it to raft? I was puzzled for a second by how it could be safe to just assume that you have replica id 1.
It seems from the test that 1 is a problematic value as it may overlap with an existing replica (see your test failure). What about using something else inside like math.MaxInt32?
ajwerner
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @danhhz and @tbg)
pkg/storage/store.go, line 2171 at r2 (raw file):
Previously, ajwerner wrote…
Can you note that this replica id of 1 is just used for in-memory
Replica.mu.replicaIDfield and won't actually make it to raft? I was puzzled for a second by how it could be safe to just assume that you have replica id 1.It seems from the test that 1 is a problematic value as it may overlap with an existing replica (see your test failure). What about using something else inside like math.MaxInt32?
Actually -1 seems better than MaxInt32
This is the next band-aid on top of cockroachdb#39658 and cockroachdb#39571. The descriptor lookup I added sometimes fails because replicas can process a split trigger in which they're not a member of the range: > F190821 15:14:28.241623 312191 storage/store.go:2172 > [n2,s2,r21/3:/{Table/54-Max}] replica descriptor of local store not > found in right hand side of split I saw this randomly in `make test PKG=./pkg/ccl/partitionccl`. There is one more bug here that is not addressed by this change, namely the fact that there might be a tombstone for the right hand side. This continues to be tracked in cockroachdb#40470. See also cockroachdb#40367 (comment) for more work we need to do to establish sanity. Release note: None
Without the preceding commit, this fatals as seen in cockroachdb#39796: > F190821 15:14:28.241623 312191 storage/store.go:2172 > [n2,s2,r21/3:/{Table/54-Max}] replica descriptor of local store not > found in right hand side of split Note that cockroachdb#40459 will break these tests since it will not allow the replicaID to change without an intermittent GC cycle, but we now rely on catching up via the raft log across a removal, split, and then re-add. At that point, the only way in which a split trigger could execute while in non-member status is if a replica got removed but would still receive a later split trigger. This could happen as follows: The log is `[..., remove, split]` a) the follower receives `split` b) it learns that `split` is committed so that c) it will apply both `remove` and then `split`. a) seems possible; if the removed follower is trailing a bit, the leader may append both entries at the same time before it (the leader) applies the replication change that removes the follower from the leader's config. b) seems possible like above, at least once we do async command application, if said append carries a commit index encompassing the append itself. c) will then naturally happen in today's code. There's no way we can reasonably enact this scenario in our testing at this point, though. The suggestion for cockroachdb#40459 is to essentially revert this commit, so that we at least retain the coverage related to the RHS of this test. Release note (bug fix): Avoid crashes with message "replica descriptor of local store not found".
68d6486 to
4cdf0f9
Compare
ajwerner
left a comment
There was a problem hiding this comment.
that explanation is clearer. I look forward to adding the assertion back.
bors r+
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @danhhz, and @tbg)
Build failed |
|
flaked on #39321 bors r+ |
39796: storage: avoid (one) fatal error from splitPostApply r=ajwerner a=tbg This is the next band-aid on top of #39658 and #39571. The descriptor lookup I added sometimes fails because replicas can process a split trigger in which they're not a member of the range: > F190821 15:14:28.241623 312191 storage/store.go:2172 > [n2,s2,r21/3:/{Table/54-Max}] replica descriptor of local store not > found in right hand side of split I saw this randomly in `make test PKG=./pkg/ccl/partitionccl`. @danhhz could you take a stab at replicating this in the test you added recently? It's probably too ambitious but it'd be nice to verify this is actually fixed (I didn't). I haven't seen this fail on master (though I also didn't look very hard) so maybe this is rare (i.e. we have time), but then I just got it doing nothing related in particular. Release note: None Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Build succeeded |
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.
Here's a high level overview of the change:
* Once a Replica object has a non-zero Replica.mu.replicaID it will not
change.
* In this commit however, if a node crashes it may forget that it learned
about a replica ID.
* If a raft message or snapshot addressed to a higher replica ID is received
the current replica will be removed completely.
* If a replica sees a ChangeReplicasTrigger which removes it then it
completely removes itself while applying that command.
* Replica.mu.destroyStatus is used to meaningfully signify the removal state
of a Replica. Replicas about to be synchronously removed are in
destroyReasonRemovalPending.
This hopefully gives us some new invariants:
* There is only ever at most 1 Replica which IsAlive() for a range on a Store
at a time.
* Once a Replica has a non-zero ReplicaID is never changes.
* This applies only to the in-memory object, not the store itself.
* Once a Replica applies a command as a part of the range descriptor it will
never apply another command as a different Replica ID or outside of the
Range.
* Corrolary: a Replica created as a learner will only ever apply commands
while that replica is in the range.
The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.
Fixes cockroachdb#40367.
Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.
Fixes cockroachdb#40257.
Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.
Fixes cockroachdb#40470.
Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.
Release justification: This commit is safe for 19.2 because it fixes release
blockers.
Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:
cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
cockroachdb#40470 "split trigger found right-hand side with tombstone"
cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.
Here's a high level overview of the change:
* Once a Replica object has a non-zero Replica.mu.replicaID it will not
change.
* In this commit however, if a node crashes it may forget that it learned
about a replica ID.
* If a raft message or snapshot addressed to a higher replica ID is received
the current replica will be removed completely.
* If a replica sees a ChangeReplicasTrigger which removes it then it
completely removes itself while applying that command.
* Replica.mu.destroyStatus is used to meaningfully signify the removal state
of a Replica. Replicas about to be synchronously removed are in
destroyReasonRemovalPending.
This hopefully gives us some new invariants:
* There is only ever at most 1 Replica which IsAlive() for a range on a Store
at a time.
* Once a Replica has a non-zero ReplicaID is never changes.
* This applies only to the in-memory object, not the store itself.
* Once a Replica applies a command as a part of the range descriptor it will
never apply another command as a different Replica ID or outside of the
Range.
* Corrolary: a Replica created as a learner will only ever apply commands
while that replica is in the range.
The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.
Fixes cockroachdb#40367.
Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.
Fixes cockroachdb#40257.
Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.
Fixes cockroachdb#40470.
Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.
Release justification: This commit is safe for 19.2 because it fixes release
blockers.
Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:
cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
cockroachdb#40470 "split trigger found right-hand side with tombstone"
cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.
Here's a high level overview of the change:
* Once a Replica object has a non-zero Replica.mu.replicaID it will not
change.
* In this commit however, if a node crashes it may forget that it learned
about a replica ID.
* If a raft message or snapshot addressed to a higher replica ID is received
the current replica will be removed completely.
* If a replica sees a ChangeReplicasTrigger which removes it then it
completely removes itself while applying that command.
* Replica.mu.destroyStatus is used to meaningfully signify the removal state
of a Replica. Replicas about to be synchronously removed are in
destroyReasonRemovalPending.
This hopefully gives us some new invariants:
* There is only ever at most 1 Replica which IsAlive() for a range on a Store
at a time.
* Once a Replica has a non-zero ReplicaID is never changes.
* This applies only to the in-memory object, not the store itself.
* Once a Replica applies a command as a part of the range descriptor it will
never apply another command as a different Replica ID or outside of the
Range.
* Corrolary: a Replica created as a learner will only ever apply commands
while that replica is in the range.
The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.
Fixes cockroachdb#40367.
Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.
Fixes cockroachdb#40257.
Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.
Fixes cockroachdb#40470.
Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.
Release justification: This commit is safe for 19.2 because it fixes release
blockers.
Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:
cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
cockroachdb#40470 "split trigger found right-hand side with tombstone"
cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.
Here's a high level overview of the change:
* Once a Replica object has a non-zero Replica.mu.replicaID it will not
change.
* In this commit however, if a node crashes it may forget that it learned
about a replica ID.
* If a raft message or snapshot addressed to a higher replica ID is received
the current replica will be removed completely.
* If a replica sees a ChangeReplicasTrigger which removes it then it
completely removes itself while applying that command.
* Replica.mu.destroyStatus is used to meaningfully signify the removal state
of a Replica. Replicas about to be synchronously removed are in
destroyReasonRemovalPending.
This hopefully gives us some new invariants:
* There is only ever at most 1 Replica which IsAlive() for a range on a Store
at a time.
* Once a Replica has a non-zero ReplicaID is never changes.
* This applies only to the in-memory object, not the store itself.
* Once a Replica applies a command as a part of the range descriptor it will
never apply another command as a different Replica ID or outside of the
Range.
* Corrolary: a Replica created as a learner will only ever apply commands
while that replica is in the range.
The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.
Fixes cockroachdb#40367.
Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.
Fixes cockroachdb#40257.
Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.
Fixes cockroachdb#40470.
Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.
Release justification: This commit is safe for 19.2 because it fixes release
blockers.
Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:
cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
cockroachdb#40470 "split trigger found right-hand side with tombstone"
cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.
Here's a high level overview of the change:
* Once a Replica object has a non-zero Replica.mu.replicaID it will not
change.
* In this commit however, if a node crashes it may forget that it learned
about a replica ID.
* If a raft message or snapshot addressed to a higher replica ID is received
the current replica will be removed completely.
* If a replica sees a ChangeReplicasTrigger which removes it then it
completely removes itself while applying that command.
* Replica.mu.destroyStatus is used to meaningfully signify the removal state
of a Replica. Replicas about to be synchronously removed are in
destroyReasonRemovalPending.
This hopefully gives us some new invariants:
* There is only ever at most 1 Replica which IsAlive() for a range on a Store
at a time.
* Once a Replica has a non-zero ReplicaID is never changes.
* This applies only to the in-memory object, not the store itself.
* Once a Replica applies a command as a part of the range descriptor it will
never apply another command as a different Replica ID or outside of the
Range.
* Corrolary: a Replica created as a learner will only ever apply commands
while that replica is in the range.
The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.
Fixes cockroachdb#40367.
Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.
Fixes cockroachdb#40257.
Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.
Fixes cockroachdb#40470.
Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.
Release justification: This commit is safe for 19.2 because it fixes release
blockers.
Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:
cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
cockroachdb#40470 "split trigger found right-hand side with tombstone"
cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.
Here's a high level overview of the change:
* Once a Replica object has a non-zero Replica.mu.replicaID it will not
change.
* In this commit however, if a node crashes it may forget that it learned
about a replica ID.
* If a raft message or snapshot addressed to a higher replica ID is received
the current replica will be removed completely.
* If a replica sees a ChangeReplicasTrigger which removes it then it
completely removes itself while applying that command.
* Replica.mu.destroyStatus is used to meaningfully signify the removal state
of a Replica. Replicas about to be synchronously removed are in
destroyReasonRemovalPending.
This hopefully gives us some new invariants:
* There is only ever at most 1 Replica which IsAlive() for a range on a Store
at a time.
* Once a Replica has a non-zero ReplicaID is never changes.
* This applies only to the in-memory object, not the store itself.
* Once a Replica applies a command as a part of the range descriptor it will
never apply another command as a different Replica ID or outside of the
Range.
* Corrolary: a Replica created as a learner will only ever apply commands
while that replica is in the range.
The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.
Fixes cockroachdb#40367.
Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.
Fixes cockroachdb#40257.
Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.
Fixes cockroachdb#40470.
Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.
Release justification: This commit is safe for 19.2 because it fixes release
blockers.
Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:
cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
cockroachdb#40470 "split trigger found right-hand side with tombstone"
cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.
Here's a high level overview of the change:
* Once a Replica object has a non-zero Replica.mu.replicaID it will not
change.
* In this commit however, if a node crashes it may forget that it learned
about a replica ID.
* If a raft message or snapshot addressed to a higher replica ID is received
the current replica will be removed completely.
* If a replica sees a ChangeReplicasTrigger which removes it then it
completely removes itself while applying that command.
* Replica.mu.destroyStatus is used to meaningfully signify the removal state
of a Replica. Replicas about to be synchronously removed are in
destroyReasonRemovalPending.
This hopefully gives us some new invariants:
* There is only ever at most 1 Replica which IsAlive() for a range on a Store
at a time.
* Once a Replica has a non-zero ReplicaID is never changes.
* This applies only to the in-memory object, not the store itself.
* Once a Replica applies a command as a part of the range descriptor it will
never apply another command as a different Replica ID or outside of the
Range.
* Corrolary: a Replica created as a learner will only ever apply commands
while that replica is in the range.
The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.
Fixes cockroachdb#40367.
Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.
Fixes cockroachdb#40257.
Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.
Fixes cockroachdb#40470.
Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.
Release justification: This commit is safe for 19.2 because it fixes release
blockers.
Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:
cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
cockroachdb#40470 "split trigger found right-hand side with tombstone"
cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.
Here's a high level overview of the change:
* Once a Replica object has a non-zero Replica.mu.replicaID it will not
change.
* In this commit however, if a node crashes it may forget that it learned
about a replica ID.
* If a raft message or snapshot addressed to a higher replica ID is received
the current replica will be removed completely.
* If a replica sees a ChangeReplicasTrigger which removes it then it
completely removes itself while applying that command.
* Replica.mu.destroyStatus is used to meaningfully signify the removal state
of a Replica. Replicas about to be synchronously removed are in
destroyReasonRemovalPending.
This hopefully gives us some new invariants:
* There is only ever at most 1 Replica which IsAlive() for a range on a Store
at a time.
* Once a Replica has a non-zero ReplicaID is never changes.
* This applies only to the in-memory object, not the store itself.
* Once a Replica applies a command as a part of the range descriptor it will
never apply another command as a different Replica ID or outside of the
Range.
* Corrolary: a Replica created as a learner will only ever apply commands
while that replica is in the range.
The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.
Fixes cockroachdb#40367.
Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.
Fixes cockroachdb#40257.
Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.
Fixes cockroachdb#40470.
Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.
Release justification: This commit is safe for 19.2 because it fixes release
blockers.
Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:
cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
cockroachdb#40470 "split trigger found right-hand side with tombstone"
cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.
Here's a high level overview of the change:
* Once a Replica object has a non-zero Replica.mu.replicaID it will not
change.
* In this commit however, if a node crashes it may forget that it learned
about a replica ID.
* If a raft message or snapshot addressed to a higher replica ID is received
the current replica will be removed completely.
* If a replica sees a ChangeReplicasTrigger which removes it then it
completely removes itself while applying that command.
* Replica.mu.destroyStatus is used to meaningfully signify the removal state
of a Replica. Replicas about to be synchronously removed are in
destroyReasonRemovalPending.
This hopefully gives us some new invariants:
* There is only ever at most 1 Replica which IsAlive() for a range on a Store
at a time.
* Once a Replica has a non-zero ReplicaID is never changes.
* This applies only to the in-memory object, not the store itself.
* Once a Replica applies a command as a part of the range descriptor it will
never apply another command as a different Replica ID or outside of the
Range.
* Corrolary: a Replica created as a learner will only ever apply commands
while that replica is in the range.
The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.
Fixes cockroachdb#40367.
Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.
Fixes cockroachdb#40257.
Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.
Fixes cockroachdb#40470.
Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.
Release justification: This commit is safe for 19.2 because it fixes release
blockers.
Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:
cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
cockroachdb#40470 "split trigger found right-hand side with tombstone"
cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.
Here's a high level overview of the change:
* Once a Replica object has a non-zero Replica.mu.replicaID it will not
change.
* In this commit however, if a node crashes it may forget that it learned
about a replica ID.
* If a raft message or snapshot addressed to a higher replica ID is received
the current replica will be removed completely.
* If a replica sees a ChangeReplicasTrigger which removes it then it
completely removes itself while applying that command.
* Replica.mu.destroyStatus is used to meaningfully signify the removal state
of a Replica. Replicas about to be synchronously removed are in
destroyReasonRemovalPending.
This hopefully gives us some new invariants:
* There is only ever at most 1 Replica which IsAlive() for a range on a Store
at a time.
* Once a Replica has a non-zero ReplicaID is never changes.
* This applies only to the in-memory object, not the store itself.
* Once a Replica applies a command as a part of the range descriptor it will
never apply another command as a different Replica ID or outside of the
Range.
* Corrolary: a Replica created as a learner will only ever apply commands
while that replica is in the range.
The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.
Fixes cockroachdb#40367.
Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.
Fixes cockroachdb#40257.
Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.
Fixes cockroachdb#40470.
Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.
Release justification: This commit is safe for 19.2 because it fixes release
blockers.
Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:
cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
cockroachdb#40470 "split trigger found right-hand side with tombstone"
cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.
Here's a high level overview of the change:
* Once a Replica object has a non-zero Replica.mu.replicaID it will not
change.
* In this commit however, if a node crashes it may forget that it learned
about a replica ID.
* If a raft message or snapshot addressed to a higher replica ID is received
the current replica will be removed completely.
* If a replica sees a ChangeReplicasTrigger which removes it then it
completely removes itself while applying that command.
* Replica.mu.destroyStatus is used to meaningfully signify the removal state
of a Replica. Replicas about to be synchronously removed are in
destroyReasonRemovalPending.
This hopefully gives us some new invariants:
* There is only ever at most 1 Replica which IsAlive() for a range on a Store
at a time.
* Once a Replica has a non-zero ReplicaID is never changes.
* This applies only to the in-memory object, not the store itself.
* Once a Replica applies a command as a part of the range descriptor it will
never apply another command as a different Replica ID or outside of the
Range.
* Corrolary: a Replica created as a learner will only ever apply commands
while that replica is in the range.
The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.
Fixes cockroachdb#40367.
Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.
Fixes cockroachdb#40257.
Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.
Fixes cockroachdb#40470.
Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.
Release justification: This commit is safe for 19.2 because it fixes release
blockers.
Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:
cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
cockroachdb#40470 "split trigger found right-hand side with tombstone"
cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
This is the next band-aid on top of #39658 and #39571.
The descriptor lookup I added sometimes fails because replicas can
process a split trigger in which they're not a member of the range:
I saw this randomly in
make test PKG=./pkg/ccl/partitionccl.@danhhz could you take a stab at replicating this in the test you added
recently? It's probably too ambitious but it'd be nice to verify this is
actually fixed (I didn't). I haven't seen this fail on master (though I also
didn't look very hard) so maybe this is rare (i.e. we have time), but then I
just got it doing nothing related in particular.
Release note: None