kvserver: rationalize timestamps in proposals #63045
craig[bot] merged 7 commits into cockroachdb:master
d28ce5a to aae1695
@nvanbenschoten I'm curious if you would have done something differently in the last commit other than reading the clock for
bdarnell
left a comment
Reviewed 9 of 9 files at r2, 2 of 2 files at r3, 5 of 5 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/replica_application_state_machine.go, line 464 at r4 (raw file):
wts := cmd.raftCmd.ReplicatedEvalResult.WriteTimestamp
if !wts.IsEmpty() && wts.LessEq(b.state.RaftClosedTimestamp) {
The preceding comments are now out of date.
We no longer have any kind of assertion on un-migrated ranges; is that a problem for the transition?
Using the existence of RaftClosedTimestamp as a proxy for "migrated to 21.1" is subtle; can we add a method with a name that indicates this more clearly?
pkg/kv/kvserver/replica_application_state_machine.go, line 498 at r4 (raw file):
	b.maxTS.Forward(clockTS)
}
clockTS := cmd.replicatedResult().ClockSignal.UnsafeToClockTimestamp()
This method is documented as test-only. Shouldn't this use the TryTo version as above? (I'm not familiar with these changes to the hlc package)
pkg/kv/kvserver/replica_proposal.go, line 811 at r4 (raw file):
// side-effects that are to be replicated to all replicas.
res.Replicated.IsLeaseRequest = ba.IsLeaseRequest()
// Set the timestamp fields. If this range has not "migrated" to 21.1 (as
This means that you won't be able to do rolling upgrades from the previous 21.1 beta to this one. That's not great and it's pretty late in the cycle to be introducing a version barrier like this, although it's hard enough to do range-level migrations that we don't have another practical option if we want to get this in to 21.1.
pkg/kv/kvserver/replica_proposal.go, line 821 at r4 (raw file):
	res.Replicated.WriteTimestamp = ba.WriteTimestamp()
} else {
	res.Replicated.ClockSignal = r.store.Clock().Now()
What takes the else branch here? For lease transfers, should we be using the new lease's start time instead of the current time? For things other than lease transfers, do we need to include a clock signal at all?
andreimatei
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/replica_proposal.go, line 811 at r4 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
This means that you won't be able to do rolling upgrades from the previous 21.1 beta to this one. That's not great and it's pretty late in the cycle to be introducing a version barrier like this, although it's hard enough to do range-level migrations that we don't have another practical option if we want to get this in to 21.1.
Yeah... I think requiring stop-the-world for upgrades away from a beta is not the worst thing in the world, as long as the consequences of not doing that are not the worst. And in this case I think they, unfortunately, would be the worst - if a new node proposes a command without a WriteTimestamp to a combination of new and old nodes, the old ones will reject it and the new ones will apply it -> divergence.
I'll think about what to do here.
aae1695 to f4c6328
andreimatei
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/replica_application_state_machine.go, line 464 at r4 (raw file):
The preceding comments are now out of date.
done
We no longer have any kind of assertion on un-migrated ranges; is that a problem for the transition?
We never really had the assertion on un-migrated ranges (it was never firing) because RaftClosedTimestamp is zero.
Using the existence of RaftClosedTimestamp as a proxy for "migrated to 21.1" is subtle; can we add a method with a name that indicates this more clearly?
done
pkg/kv/kvserver/replica_application_state_machine.go, line 498 at r4 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
This method is documented as test-only. Shouldn't this use the TryTo version as above? (I'm not familiar with these changes to the hlc package)
gone
pkg/kv/kvserver/replica_proposal.go, line 821 at r4 (raw file):
What takes the else branch here?
Proposals corresponding to single EndTxns, inline DeleteRangeRequest, ClearRange, RevertRange, HeartbeatTxn, GC, PushTxn, RecoverTxn, RecomputeStats, Migrate.
For lease transfers, should we be using the new lease's start time instead of the current time?
Would that be better? From a clock sync perspective, it would be slightly worse because we're missing the opportunity to sync the clocks better. Are you thinking of code clarity or of the cost of reading the clock?
For things other than lease transfers, do we need to include a clock signal at all?
Propagating a clock signal never hurts... Other than the cost of reading the clock?
andreimatei
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/replica_proposal.go, line 811 at r4 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Yeah... I think requiring stop-the-world for upgrades away from a beta is not the worst thing in the world, as long as the consequences of not doing that are not the worst. And in this case I think they, unfortunately, would be the worst - if a new node proposes a command without a WriteTimestamp to a combination of new and old nodes, the old ones will reject it and the new ones will apply it -> divergence.
I'll think about what to do here.
I've gone back to a single WriteTimestamp field, and used another field to discriminate "regular writes" from other misc requests.
f4c6328 to 6dc97cb
bdarnell
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/replica_proposal.go, line 821 at r4 (raw file):
Would that be better? From a clock sync perspective, it would be slightly worse because we're missing the opportunity to sync the clocks better. Are you thinking of code clarity or of the cost of reading the clock?
I'm thinking of code clarity - sending Now() is slightly better for clock sync purposes, but then why not send the ClockSignal=Now() field unconditionally? If it's dependent on the proposal type then it seems like it should be based on the semantics of the proposal instead of just a conditional inclusion of a current-time clock signal.
Anyway, I don't feel strongly about this. I'm fine with including the current time signal (even for proposals that write an intent). I don't think it's worth the brainpower to try to limit this to proposal types where it's strictly necessary or to provide the minimum timestamp required for correctness.
a743f34 to 4ac5c39
nvanbenschoten
left a comment
Reviewed 9 of 9 files at r5, 9 of 9 files at r6, 2 of 2 files at r7, 9 of 9 files at r8, 10 of 10 files at r9, 2 of 2 files at r10, 2 of 2 files at r11.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, and @tbg)
pkg/kv/kvserver/replica.go, line 579 at r11 (raw file):
tenantID roachpb.TenantID // Set when first initialized, not modified after
// Historical information about the command that set the closed timestamp.
Is this last commit just for debugging? Or were you intending on landing it?
pkg/kv/kvserver/replica_application_state_machine.go, line 332 at r9 (raw file):
isIntentWrite := raftCmd.ReplicatedEvalResult.IsIntentWrite()
wts := raftCmd.ReplicatedEvalResult.WriteTimestamp
if isIntentWrite && !wts.IsEmpty() && wts.LessEq(*replicaState.GCThreshold) {
Are the changes here backward compatible? What would happen in a mixed v20.2/v21.1 cluster if a request has a timestamp before the GC threshold but is not an intent write? Wouldn't a v20.2 replica reject the request but the v21.1 node would apply it?
pkg/kv/kvserver/replica_application_state_machine.go, line 459 at r10 (raw file):
assertNoCmdClosedTimestampRegression
Did you mean to call this twice? Once here, and once in stageTrivialReplicatedEvalResult?
pkg/kv/kvserver/replica_application_state_machine.go, line 1022 at r10 (raw file):
// down assertions.
func raftClosedTimestampAssertionsEnabled() bool {
	return envutil.EnvOrDefaultBool("COCKROACH_RAFT_CLOSEDTS_ASSERTIONS_ENABLED", true)
Pull this into a static variable so that it is only computed once. There's no need for the function.
pkg/kv/kvserver/replica_application_state_machine.go, line 1042 at r10 (raw file):
return errors.AssertionFailedf(
	"raft closed timestamp regression in cmd: %x; batch state: %s, command: %s, lease: %s, req: %s",
	cmd.idKey, existingClosed.String(), newClosed.String(), b.state.Lease, req)
Have you confirmed that these two assertion errors aren't redacting all of the interesting information? That would be a shame.
pkg/kv/kvserver/replica_application_state_machine.go, line 412 at r11 (raw file):
type closedTimestampSetterInfo struct {
	cmd *replicatedCmd
Will this play well with the memory recycling we perform in replicatedCmdBuf?
pkg/kv/kvserver/kvserverpb/proposer_kv.proto, line 113 at r8 (raw file):
// EvalResultFlags represent the bits in ReplicatedEvalResult.flags.
enum EvalResultFlags {
ReplicatedEvalResultFlags?
pkg/kv/kvserver/kvserverpb/proposer_kv.proto, line 137 at r8 (raw file):
// flags indicates various attributes of the command. The interpretation of
// the bits is given by EvalResultFlags.
int64 flags = 6;
Should this be typed with EvalResultFlags?
pkg/kv/kvserver/kvserverpb/proposer_kv.proto, line 121 at r9 (raw file):
// intents (unless done outside of a txn or the command corresponds to a 1PC
// batch). See ba.IsIntentWrite().
IntentWrite = 2;
nit: Do you mind adding the following to ensure that whoever adds the next flag recognizes that these variants are used in a bitfield?
// NextFlag = 4;
pkg/kv/kvserver/kvserverpb/proposer_kv.proto, line 146 at r9 (raw file):
// the command against the GC threshold. For all other commands, the timestamp
// is to be interpreted simply as a clock signal from the proposer - the field is
// also used to bump the followers' clocks.
also
4ac5c39 to 6972805
andreimatei
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/replica.go, line 579 at r11 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is this last commit just for debugging? Or were you intending on landing it?
Intending to land it... At least until we figure out the failures.
pkg/kv/kvserver/replica_application_state_machine.go, line 332 at r9 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Are the changes here backward compatible? What would happen in a mixed v20.2/v21.1 cluster if a request has a timestamp before the GC threshold but is not an intent write? Wouldn't a v20.2 replica reject the request but the v21.1 node would apply it?
Like we discussed, I've inverted the flag so that all proposals from 20.2 pass the check here and deterministic behavior is preserved in mixed 20.2/21.1 clusters.
pkg/kv/kvserver/replica_application_state_machine.go, line 459 at r10 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
assertNoCmdClosedTimestampRegression
Did you mean to call this twice? Once here, and once in stageTrivialReplicatedEvalResult?
It was intentional, but you're right - it's silly. I've removed the call in stageTrivialReplicatedEvalResult.
pkg/kv/kvserver/replica_application_state_machine.go, line 1022 at r10 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Pull this into a static variable so that it is only computed once. There's no need for the function.
done
pkg/kv/kvserver/replica_application_state_machine.go, line 1042 at r10 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Have you confirmed that these two assertion errors aren't redacting all of the interesting information? That would be a shame.
I've appended a commit to make the types involved not redact.
pkg/kv/kvserver/replica_application_state_machine.go, line 412 at r11 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Will this play well with the memory recycling we perform in
replicatedCmdBuf?
ah, I didn't know about that. Good catch. See now.
pkg/kv/kvserver/replica_proposal.go, line 821 at r4 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Would that be better? From a clock sync perspective, it would be slightly worse because we're missing the opportunity to sync the clocks better. Are you thinking of code clarity or of the cost of reading the clock?
I'm thinking of code clarity - sending Now() is slightly better for clock sync purposes, but then why not send the ClockSignal=Now() field unconditionally? If it's dependent on the proposal type then it seems like it should be based on the semantics of the proposal instead of just a conditional inclusion of a current-time clock signal.
Anyway, I don't feel strongly about this. I'm fine with including the current time signal (even for proposals that write an intent). I don't think it's worth the brainpower to try to limit this to proposal types where it's strictly necessary or to provide the minimum timestamp required for correctness.
In the meantime, I've moved back to a single timestamp field. It has different meanings for different requests... But I don't think it's worth introducing a new field.
pkg/kv/kvserver/kvserverpb/proposer_kv.proto, line 113 at r8 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
ReplicatedEvalResultFlags?
done
pkg/kv/kvserver/kvserverpb/proposer_kv.proto, line 137 at r8 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should this be typed with EvalResultFlags?
I don't think so... Cause this is a bit map. Should it?
pkg/kv/kvserver/kvserverpb/proposer_kv.proto, line 121 at r9 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: Do you mind adding the following to ensure that whoever adds the next flag recognizes that these variants are used in a bitfield?
// NextFlag = 4;
done
pkg/kv/kvserver/kvserverpb/proposer_kv.proto, line 146 at r9 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
also
done
6972805 to 49ab267
49ab267 to 8dda3ee
andreimatei
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/replica_proposal.go, line 849 at r16 (raw file):
// its lease.
res.Replicated.WriteTimestamp = r.store.Clock().Now()
res.Replicated.SetNonMVCC()
There is a problem here, caught by the version-upgrade test. We assign or don't assign the new flag based on r.mu.state.MigratedToRaftClosedTimestamp() here; it's possible that "the migration" is in flight at this point - the range is not migrated yet, but the command will be applied after a preceding command does the migration. In that case, we might not be setting the flag when the application-side would like it to be set, and application fails because it appears to it that an "intent write" command is writing below the closed timestamp. I'm not sure what to do about it yet; I'm thinking we should push the setting of this flag inside the proposal buffer, and do it after the request has been sequenced with preceding requests that might be doing the migration.
461d2ac to 8282ce8
andreimatei
left a comment
I've backtracked on adding a field to proposals to discriminate IsIntentWrite commands from non-mvcc ones; it seems too hard to make below-Raft code use it correctly. It seems that we need two flags in order to be properly backwards-compatible - one telling you about the character of a proposal and one telling you whether you can use the first flag.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @nvanbenschoten, and @tbg)
nvb
left a comment
Reviewed 16 of 16 files at r17, 9 of 9 files at r18, 1 of 1 files at r19, 4 of 4 files at r20, 8 of 8 files at r21, 8 of 8 files at r22, 6 of 6 files at r23.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, and @tbg)
pkg/kv/kvserver/replica_application_state_machine.go, line 1041 at r21 (raw file):
}
// Assert that the current command is not writing under the closed timestamp.
nit: consider moving this definition and call to this method up above the corresponding ones for assertNoCmdClosedTimestampRegression, as an additional subtle hint that this check does not take into consideration the new closed timestamp, just the previous one.
pkg/kv/kvserver/replica_application_state_machine.go, line 384 at r22 (raw file):
// stageTrivialReplicatedEvalResult.
state kvserverpb.ReplicaState
closedTimestampSetter closedTimestampSetterInfo
Give this guy a comment as well.
pkg/kv/kvserver/replica_application_state_machine.go, line 410 at r22 (raw file):
}
// closedTimestampSetterInfo contains information about the command that last
Let's move this down to the bottom of this file.
pkg/kv/kvserver/replica_application_state_machine.go, line 417 at r22 (raw file):
// NOTE(andrei): We're going to hold on to the request after its application
// is complete, which is illegal. But cloning it would be too expensive. We're
// onlt using the save requests in case of assertions firing, so let's take
"only" "saved"
pkg/kv/kvserver/replica_application_state_machine.go, line 419 at r22 (raw file):
// onlt using the save requests in case of assertions firing, so let's take
// the risk.
req *roachpb.BatchRequest
I'm a little concerned that we're now going to leak the last BatchRequest run on each Range. On a node with 10k ranges and batches of 10 10KB writes, that's 1GB of leaked memory.
How much do we get out of recording the entire BatchRequest? Maybe we just record the lease, the lease index, and whether the batch was a lease request or not? And if it was a lease request, maybe the previous lease?
pkg/kv/kvserver/replica_application_state_machine.go, line 845 at r22 (raw file):
if cts := cmd.raftCmd.ClosedTimestamp; cts != nil && !cts.IsEmpty() {
	b.state.RaftClosedTimestamp = *cts
	b.closedTimestampSetter.leaseIdx = ctpb.LAI(cmd.leaseIndex)
Let's encapsulate this in a method on closedTimestampSetterInfo.
pkg/kv/kvserver/kvserverpb/proposer_kv.proto, line 130 at r20 (raw file):
// The timestamp at which this command is writing. Used to verify the validity
// of the command against the GC threshold and to update the followers'
// clocks. If the request that produced this command is not an IntentWrite
If we're using this field for a clock signal again, does the rename to write_timestamp make sense? Should we revert that and keep it as timestamp?
pkg/roachpb/data.go, line 1857 at r23 (raw file):
// String implements the fmt.Stringer interface.
func (s LeaseSequence) String() string {
	return strconv.FormatInt(int64(s), 10)
Was this just useless?
8282ce8 to b9500dc
c1dd16e to 07fd1ca
andreimatei
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/replica_application_state_machine.go, line 1041 at r21 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: consider moving this definition and call to this method up above the corresponding ones for assertNoCmdClosedTimestampRegression, as an additional subtle hint that this check does not take into consideration the new closed timestamp, just the previous one.
I've moved up the definition, but the call order was intentional. If a command is both a regression and a write below the previous closed ts, I want the regression to take priority (since that probably explains the low write too).
pkg/kv/kvserver/replica_application_state_machine.go, line 384 at r22 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Give this guy a comment as well.
done
pkg/kv/kvserver/replica_application_state_machine.go, line 410 at r22 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Let's move this down to the bottom of this file.
done
pkg/kv/kvserver/replica_application_state_machine.go, line 417 at r22 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"only" "saved"
done
pkg/kv/kvserver/replica_application_state_machine.go, line 419 at r22 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I'm a little concerned that we're now going to leak the last BatchRequest run on each Range. On a node with 10k ranges and batches of 10 10KB writes, that's 1GB of leaked memory.
How much do we get out of recording the entire BatchRequest? Maybe we just record the lease, the lease index, and whether the batch was a lease request or not? And if it was a lease request, maybe the previous lease?
see now; I'm only saving lease requests, plus a flag for splits and merges. Will separately look into including the raw raft log in these assertion messages.
pkg/kv/kvserver/replica_application_state_machine.go, line 845 at r22 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Let's encapsulate this in a method on closedTimestampSetterInfo.
done
pkg/kv/kvserver/kvserverpb/proposer_kv.proto, line 130 at r20 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
If we're using this field for a clock signal again, does the rename to write_timestamp make sense? Should we revert that and keep it as timestamp?
I prefer write_timestamp; it's better for "usual" requests, and just as good IMO for the others.
pkg/roachpb/data.go, line 1857 at r23 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Was this just useless?
mostly useless. It made the LeaseSequence format properly with "%s" (now a %s would result in an ugly message suggesting that you're using the wrong format for an int). Notice the diff below where a %s is changed to %d
nvanbenschoten
left a comment
Reviewed 4 of 9 files at r25, 3 of 4 files at r27, 1 of 2 files at r28, 1 of 2 files at r29, 7 of 7 files at r30.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, and @tbg)
pkg/kv/kvserver/replica_application_state_machine.go, line 1358 at r30 (raw file):
// requests would be too expensive: cloning the request is expensive and also
// requests can be large in memory.
leaseReq *roachpb.RequestLeaseRequest
Do we need to clear this when the previous request was not a lease request?
pkg/kv/kvserver/replica_application_state_machine.go, line 1361 at r30 (raw file):
// split and merge are set if the request was an EndTxn with the respective
// commit trigger set.
split, merge bool
Same thing here. Should we be clearing these fields at some point?
It was lying about helping with lease protection.

Release note: None
The ReplicatedEvalResult.Timestamp used to carry the read timestamp of the request that proposed the respective command. This patch renames it to WriteTimestamp and sets it accordingly.

The field is used for two purposes: for bumping the clock on followers such that they don't have values above their clock, and for checking that we're not writing below the GC threshold. Both of these use cases actually want the write timestamps, so they were broken. The second use seems dubious to me (I don't think it's actually needed) but this patch does not take a position on it beyond adding a TODO.

Beyond fixing the existing uses of this field, putting the write timestamp in every Raft command has the added benefit that we can use it to assert below Raft that nobody tries to write below the closed timestamp. Before this patch, only the leaseholder was able to do this assertion (because it was the only replica that had access to the write timestamp) and so the leaseholder had to do it as a non-deterministic error (and non-deterministic errors are nasty, even when they're assertion failures). This patch will help turn the assertion into a deterministic one in the future.

In addition, by clarifying the meaning of this field, this patch opens the door for cockroachdb#62569 to muddy the field back and give it a special meaning for lease transfers - the lease start time. Also mentioning cockroachdb#55293 because this patch lightly touches on the interaction between requests and the GC threshold.

Release note: None
This patch makes below-Raft code not freak out if the field is not set (i.e. it appears to be an empty timestamp). This is in order to leave the door open for future versions to not send the field any more, or at least send it only on some commands.

Concretely, this patch allows the proposer to declare that a particular request does not need to be checked against the GC threshold below Raft (whether *any* request actually needs to be checked below Raft is a question that we leave open).

Release note: None
This patch reworks the ReplicatedEvalResult.WriteTimestamp field (*). Before this patch, WriteTimestamp was always coming from ba.WriteTimestamp(), which is either a transaction's write timestamp or, for non-txn requests, the batch's read timestamp or, for non-MVCC requests, some random clock value. Below Raft, the field is used for updating the followers' clocks, and also to check the request against the GC threshold.

This patch sets the WriteTimestamp differently for IntentWrite requests than other requests:
- for regular writes, the field remains ba.WriteTimestamp()
- for other proposals, the field is a clock reading on the proposer

Some requests (e.g. LeaseTransfers) need a clock signal to travel with their proposal, and now they get it (see cockroachdb#62569).

[*] An alternative of splitting the field into two was considered, but it's hard to do now because of backwards compatibility. It can be done in the next release, though, because now all the uses of the WriteTimestamp field tolerate it being empty.

Fixes cockroachdb#62569

Release note: None
In cockroachdb#62655 we see that there appears to be something wrong with the Raft closed timestamp. That issue shows an assertion failure about a command trying to write below a timestamp that was promised to have been closed by a previous command.

This patch includes a little bit more info in that assertion (the current lease) and adds another two assertions:
- an assertion that the closed timestamp itself does not regress. This assertion already existed in stageTrivialReplicatedEvalResult(), but that comes after the failure we've seen. The point of copying this assertion up is to ascertain whether we're screwing up by regressing the closed timestamp, or whether a particular request/command is at fault for not obeying the closed timestamp.
- an assertion against closed ts regression when copying the replica state from a staging batch to the replica.

All these assertions can be disabled with an env var if things go bad.

Fixes cockroachdb#62765

Release note: None
This patch adds historical information to the assertion against closed timestamp regressions. We've seen that assertion fire in cockroachdb#61981. The replica now maintains info about what command last bumped the ClosedTimestamp.

Release note: None
This patch makes some types survive log redacting. In particular, it ensures that the assertions added in previous patches don't get too redacted.

Release note: None
07fd1ca to 3f38eb2
andreimatei
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, and @tbg)
pkg/kv/kvserver/replica_application_state_machine.go, line 1358 at r30 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do we need to clear this when the previous request was not a lease request?
done, good catch. I've added a wipe of the whole struct in record().
pkg/kv/kvserver/replica_application_state_machine.go, line 1361 at r30 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Same thing here. Should we be clearing these fields at some point?
done
nvanbenschoten
left a comment
Reviewed 4 of 9 files at r32, 3 of 4 files at r34, 1 of 2 files at r35, 1 of 2 files at r36, 7 of 7 files at r37.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, and @tbg)
andreimatei
left a comment
bors r+
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, and @tbg)
Build failed:
andreimatei
left a comment
curl: (7) Failed to connect to register.cockroachdb.com port 443: Connection timed out
bors r+
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, and @tbg)
Build succeeded:
see individual commits