storage: kv.atomic_replication_changes=true #40464
craig[bot] merged 2 commits into cockroachdb:master from
Conversation
This is exciting!
Hitting "descriptor changed" errors again in mixed-headroom, which means another migration concern. I found two callers that didn't check the cluster version; it seems likely that they're used during IMPORT and caused this problem. Added a commit to address this. I saw this in 2 out of 10 runs, so it's going to take a bit of time to confirm it's gone, but I'm pretty sure this is it.
force-pushed from 901e38d to b6612c7
5/5 headrooms and 4/5 mixed-headrooms passed. The one failure is the above and is not related to this PR. Going to kick off another 10 mixed-headrooms with the presumed fix for the split error.
@nvanbenschoten could you give me a review here? This PR was trivial, but now it has a fix for the botched sticky bit migration I discovered while testing, and that patch won't apply cleanly on master.
PS: I don't want to withhold the satisfying mixed-headroom results:
nvb
left a comment
Reviewed 2 of 2 files at r1, 2 of 3 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)
pkg/storage/replica_command.go, line 470 at r2 (raw file):
newDesc := *desc
newDesc.StickyBit = nil // can use &zero in 20.1
Is it possible to get here before all nodes update to 19.2, given the `if (desc.GetStickyBit() == hlc.Timestamp{}) {` check?
pkg/storage/bulk/sst_batcher.go, line 250 at r2 (raw file):
log.Warning(ctx, err)
} else {
	// NB: Passing 'hour' here is technically illegal until 19.2 is
Does all of the other complexity in this commit go away if we actually do this migration correctly and keep the use of `AdminSplitRequest.ExpirationTime` as the single migration boundary? It seems pretty easy to plumb a cluster setting into `SSTBatcher` and add logic like we have in `restore.go -> splitAndScatter`.
tbg
left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/storage/replica_command.go, line 470 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is it possible to get here before all nodes update to 19.2, given the `if (desc.GetStickyBit() == hlc.Timestamp{}) {` check?
You're right, the early return prevents this. I added a comment instead of using &zero here because both are equivalent and nil is more obviously safe.
pkg/storage/bulk/sst_batcher.go, line 250 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Does all of the other complexity in this commit go away if we actually do this migration correctly and keep the use of
AdminSplitRequest.ExpirationTimeas the single migration boundary? It seems pretty easy to plumb a cluster setting intoSSTBatcherand add logic like we have inrestore.go -> splitAndScatter.
ExpirationTime is not encapsulated at all. We have 10 calls to settings.Version.IsActive(cluster.VersionStickyBit) before this PR, all at various callers to AdminSplit. The offending calls were added after the sticky bit was introduced (and don't handle the case in which the sticky bit is not available), simply because it's so hard to tell that there is a migration to worry about.
So yes, if nobody ever issued an incorrect AdminSplit, there wouldn't be a problem (except that I think something was still sometimes using &zero here instead of nil).
I can add the cluster version check here if it tickles your OCD that I omitted it, but I'd rather not make any changes on the storage end of things, because a) I don't want to rely on the callers, as outlined before, and b) the moment I make a substantial change here I'll have to spend hours re-validating the results, and I don't think that's worth it.
force-pushed from b6612c7 to 78e4c88
tbg
left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/storage/replica_command.go, line 74 at r4 (raw file):
} else if err := detail.ActualValue.GetProto(&actualDesc); err == nil &&
	desc.RangeID == actualDesc.RangeID && !desc.Equal(actualDesc) {
	return fmt.Sprintf("descriptor changed: [expected] %+#v != [actual] %+#v", desc, &actualDesc), true
This is unintentional debugging detritus. Reminder to self to remove it.
nvb
left a comment
if you don't think it's worth cleaning up the migration further.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)
pkg/storage/replica_command.go, line 246 at r4 (raw file):
}

func splitTxnStickyUpdateAttempt(
I think this was assuming that it could always set StickyBit to a non-nil value because it was only called in the `if desc.GetStickyBit().Less(args.ExpirationTime) {` case, which was only possible if the correct cluster version was enabled. That should have been documented, but it should still be the case.
pkg/storage/bulk/sst_batcher.go, line 250 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
`ExpirationTime` is not encapsulated at all. We have 10 calls to `settings.Version.IsActive(cluster.VersionStickyBit)` before this PR, all at various callers to `AdminSplit`. The offending calls were added after the sticky bit was introduced (and don't handle the case in which the sticky bit is not available), simply because it's so hard to tell that there is a migration to worry about.
So yes, if nobody ever issued an incorrect AdminSplit, there wouldn't be a problem (except that I think something was still sometimes using &zero here instead of nil).
I can add the cluster version check here if it tickles your OCD that I omitted it, but I'd rather not make any changes on the storage end of things, because a) I don't want to rely on the callers, as outlined before, and b) the moment I make a substantial change here I'll have to spend hours re-validating the results, and I don't think that's worth it.
If you feel comfortable with the degree of validation that this setup for the migration has gotten, then so am I. It just seemed easier to fix the botched migration with SplitAndScatter than to force the handling of AdminSplit to worry about non-empty ExpirationTime values.
force-pushed from 78e4c88 to 8c910ec
tbg
left a comment
bors r=nvanbenschoten
TFTR!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
Flake was a DML txn retry in a logic test: #40542 (comment).
bors r=nvanbenschoten
Build failed (retrying...)
Bors, you liar.
bors r=nvanbenschoten
I wonder if this PR crashes bors every time?
bors r=nvanbenschoten
bors r+
Let's try this again.
bors r=nvanbenschoten
40464: storage: kv.atomic_replication_changes=true r=nvanbenschoten a=tbg

I ran the experiments in #40370 (comment) on (essentially) this branch and everything passed. Going to run another five instances of mixed-headroom and headroom with this change to shake out anything else that I might've missed.

Release note (general change): atomic replication changes are now enabled by default.

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Before 19.2, callers to AdminSplit are only ever supposed to pass
`Expiration==hlc.Timestamp{}` as this triggers the legacy code path
necessary while nodes in the cluster may not know desc.StickyBit yet. To
avoid CPut problems when those nodes update the range descriptor, we
must not persist non-nil values on it in that case. It looked like we
would sometimes still persist a &zero, which could cause problems.
The bigger problem though was that there were also two callers that
straight-up didn't check the cluster version and passed nonzero values
into AdminSplit. These callers were added recently and I can't blame
anyone there; it is impossible to know that one argument needs to be
zero before 19.2.
Instead of trying to fix this invariant (which wasn't trivial in this
case) just ignore expiration times when the coordinator doesn't think
19.2 is already active. This could lead to sticky bits being ignored
right around the cluster transition, but that seems much better than
risking old nodes not being able to carry out any changes to the
descriptors any more (which is the consequence of writing non-nil
sticky bits before this is safe).
Release note (bug fix): Fix a cluster migration bug that could occur
while running in a mixed 19.1/19.2 cluster. The symptom would be
messages of the form:
```
X at key /Table/54 failed: unexpected value: ...
```
Affected clusters should be updated to 19.2 or, if 19.1 is desired,
recreated from a backup.
Release note (general change): atomic replication changes are now enabled by default.
force-pushed from 8c910ec to b93a4ee
Rebased on master and pushed again. Let's see if that helps.
bors r=nvanbenschoten
Build succeeded