roachpb,kv,storage: unify OrigTimestamp and RefreshedTimestamp#42236
roachpb,kv,storage: unify OrigTimestamp and RefreshedTimestamp#42236craig[bot] merged 25 commits intocockroachdb:masterfrom
Conversation
895bc4d to
631f19e
Compare
0a28c77 to
343a2c9
Compare
andreimatei
left a comment
There was a problem hiding this comment.
All the renames are now done.
Reviewable seems to refuse to show commits in isolation (cause there's too many of them?). If you prefer, I can split it into multiple PRs. But the only large commit that's not mechanic is unify OrigTimestamp and RefreshedTimestamp.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
343a2c9 to
9ab31ec
Compare
nvb
left a comment
There was a problem hiding this comment.
💯 this is great! Thanks for doing the cleanup.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)
pkg/internal/client/sender.go, line 344 at r1 (raw file):
// For backwards compatibility with 19.2, set the DeprecatedOrigTimestamp too (although // not really needed by this Mock sender).
Add a TODO.
pkg/kv/txn_interceptor_span_refresher.go, line 279 at r1 (raw file):
// timestamp. Returns whether the refresh was successful or not. func (sr *txnSpanRefresher) tryUpdatingTxnSpans( ctx context.Context, refreshTxn *roachpb.Transaction, refreshFrom hlc.Timestamp,
It's too bad that we lost the comment on this when we made it a param. I'm also not sure how I feel about getting this from the proto instead of sr.refreshedTimestamp. Why did you make that change.
Also, I think we should have a test for the behavior specified in:
// More importantly, reads that have happened since we've previously
// refreshed don't need to be checked below below the timestamp at which
// they've been read (which is the timestamp to which we've previously
// refreshed). Checking below that timestamp (like we would, for example,
// if we simply used txn.OrigTimestamp here), could cause false-positives
// that would fail the refresh.
pkg/roachpb/api.proto, line 1510 at r1 (raw file):
// RefreshRequest is arguments to the Refresh() method, which verifies // that no write has occurred since the a provided timestamp
"since the a provided"
pkg/roachpb/api.proto, line 1514 at r1 (raw file):
// updated according to the write parameter. A transaction must be // supplied with this request. If the key has been written more // recently than the provided txn timestamp, an error is returned
s/provided txn timestamp/provided "refresh from" timestamp/?
pkg/roachpb/api.proto, line 1517 at r1 (raw file):
// and the timestamp cache is not updated. // // The timestamp cache is updated to txn.read_timestamp, like it is for all
nit: wrap to same width as rest of comment.
pkg/roachpb/batch.go, line 41 at r1 (raw file):
// commit timestamp has been forwarded, so that all reads within a txn // observe the same snapshot of the database regardless of how the // provisional commit timestamp evolves..
Double period.
pkg/roachpb/data.go, line 1261 at r1 (raw file):
// specified in the supplied error can be retried at a refreshed timestamp to // avoid a client-side transaction restart. If true, returns a cloned, updated // Transaction object with the commit timestamp and refreshed timestamp set
"provisional commit timestamp"
pkg/roachpb/data.proto, line 334 at r1 (raw file):
// retry. bool commit_timestamp_fixed = 16; // The refreshed timestamp is the transaction's read timestamp. All reads are
This comment needs some love now that "refreshed timestamp" is no longer a thing.
pkg/storage/gc_queue_test.go, line 450 at r1 (raw file):
txn.ReadTimestamp = datum.ts txn.WriteTimestamp = datum.ts txn.DeprecatedOrigTimestamp = datum.ts
Why are we still providing this in tests? We shouldn't be requiring it to be set anywhere on the server or we won't be able to get rid of it in 20.2.
pkg/storage/batcheval/cmd_end_transaction.go, line 241 at r1 (raw file):
"programming error: epoch regression: %d", h.Txn.Epoch, )) } else if h.Txn.Epoch == reply.Txn.Epoch && reply.Txn.Timestamp.Less(h.Txn.OrigTimestamp) {
You could compare against h.Txn.MinTimestamp. I'm not sure that that's worth it though.
pkg/storage/batcheval/cmd_refresh.go, line 50 at r1 (raw file):
refreshFrom := args.RefreshFrom if refreshFrom.IsEmpty() { // Compatibility with 19.2 nodes, which didn't set the args.RefreshFrom field.
Leave a TODO to remove this with a description of when that is possible.
pkg/storage/batcheval/cmd_refresh.go, line 66 at r1 (raw file):
return result.Result{}, err } else if val != nil { if ts := val.Timestamp; !ts.Less(refreshFrom) {
No need for the batcheval: clarify comment about pessimization commit since it's completely removed by the end of this PR.
pkg/storage/batcheval/cmd_refresh_range.go, line 50 at r1 (raw file):
refreshFrom := args.RefreshFrom if refreshFrom.IsEmpty() { // Compatibility with 19.2 nodes, which didn't set the args.RefreshFrom field.
Leave a TODO to remove this with a description of when that is possible.
9ab31ec to
488dc28
Compare
andreimatei
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/internal/client/sender.go, line 344 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Add a TODO.
it's clear...
pkg/kv/txn_interceptor_span_refresher.go, line 279 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
It's too bad that we lost the comment on this when we made it a param. I'm also not sure how I feel about getting this from the proto instead of
sr.refreshedTimestamp. Why did you make that change.Also, I think we should have a test for the behavior specified in:
// More importantly, reads that have happened since we've previously // refreshed don't need to be checked below below the timestamp at which // they've been read (which is the timestamp to which we've previously // refreshed). Checking below that timestamp (like we would, for example, // if we simply used txn.OrigTimestamp here), could cause false-positives // that would fail the refresh.
Well, I had made the change because OrigTimestamp going away made things awkward for keeping track of what the "refresh from" should be. But you're right, it is better to use sr.refreshedTimestamp. I've now switched to using sr.refreshedTimestamp, but for that I had set that at the beginning of the txn (i.e. on the first batch that flows through) too. PTAL.
Spurred by this, I've also added a commit at the end cleaning up the protection we had against concurrent txn use. PTAL at that too.
The comment is now back. I will write the respective test tomorrow.
pkg/roachpb/api.proto, line 1510 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"since the a provided"
done
pkg/roachpb/api.proto, line 1514 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/provided txn timestamp/provided "refresh from" timestamp/?
done
pkg/roachpb/api.proto, line 1517 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: wrap to same width as rest of comment.
done
pkg/roachpb/batch.go, line 41 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Double period.
done
pkg/roachpb/data.go, line 1261 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"provisional commit timestamp"
done
pkg/roachpb/data.proto, line 334 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This comment needs some love now that "refreshed timestamp" is no longer a thing.
done
pkg/storage/gc_queue_test.go, line 450 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Why are we still providing this in tests? We shouldn't be requiring it to be set anywhere on the server or we won't be able to get rid of it in 20.2.
Because we need to override what newTransaction() sets (through the regular txn ctor). In the ctor, setting the OrigTS is not gated by a cluster version cause I didn't take the trouble.
But this can all go in 20.2.
pkg/storage/batcheval/cmd_end_transaction.go, line 241 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
You could compare against
h.Txn.MinTimestamp. I'm not sure that that's worth it though.
yeah... I'd leave it alone.
pkg/storage/batcheval/cmd_refresh.go, line 50 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Leave a TODO to remove this with a description of when that is possible.
it clear...
pkg/storage/batcheval/cmd_refresh.go, line 66 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
No need for the
batcheval: clarify comment about pessimizationcommit since it's completely removed by the end of this PR.
I'll leave it for the history
pkg/storage/batcheval/cmd_refresh_range.go, line 50 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Leave a TODO to remove this with a description of when that is possible.
but it's clear...
andreimatei
left a comment
There was a problem hiding this comment.
And thanks for the read!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
9cc1115 to
6d9aecb
Compare
andreimatei
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/txn_interceptor_span_refresher.go, line 279 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Well, I had made the change because OrigTimestamp going away made things awkward for keeping track of what the "refresh from" should be. But you're right, it is better to use
sr.refreshedTimestamp. I've now switched to usingsr.refreshedTimestamp, but for that I had set that at the beginning of the txn (i.e. on the first batch that flows through) too. PTAL.Spurred by this, I've also added a commit at the end cleaning up the protection we had against concurrent txn use. PTAL at that too.
The comment is now back. I will write the respective test tomorrow.
Added the test in the commit about "kv: optimize refresh timestamp ranges" commit.
nvb
left a comment
There was a problem hiding this comment.
Reviewed 2 of 86 files at r1, 15 of 15 files at r2.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)
pkg/internal/client/sender.go, line 344 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
it's clear...
The point isn't that this isn't clear, it's that the TODO allows you/me/someone to search for TODO.*20.2 and find all the code we can cleanup right after the next release. I guess as long as you have once of these TODOs on the DeprecatedOrigTimestamp field then its fine.
pkg/kv/dist_sender_server_test.go, line 2890 at r2 (raw file):
// - someone else writes "a" @ 200 // - txn attempts to write "a" and is pushed to 200. The refresh succeeds. // - txn reads something that has a value in [100,200]. For example, "a", which
I think 200 is exclusive here, so below "as it would find b@200" also needs to change.
pkg/kv/txn_interceptor_span_refresher.go, line 169 at r1 (raw file):
ba.Txn.ReadTimestamp.Forward(largestRefreshTS) if !sr.appendRefreshSpans(ctx, ba, br) { // The refresh spans are out of date, return a generic client-side retry error.
What allowed us to get rid of this? The concurrent txn detection? Is that enough to detect concurrent use with dist sql flows?
pkg/kv/txn_interceptor_span_refresher.go, line 130 at r2 (raw file):
autoRetryCounter *metric.Counter // requestInFlight is set while a request is being processed by the next
Why is this detection in the span refresher? It seems like a good idea, but I don't see why it belongs here. I'd move it to the TxnCoordSender itself.
pkg/kv/txn_interceptor_span_refresher.go, line 162 at r2 (raw file):
} else if sr.refreshedTimestamp != batchReadTimestamp { // Sanity check: we're supposed to control the read timestamp. What we're // tracking in sr. refreshedTimestamp is not supposed to get out of sync
sr.refreshedTimestamp
pkg/storage/gc_queue_test.go, line 450 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Because we need to override what
newTransaction()sets (through the regular txn ctor). In the ctor, setting theOrigTSis not gated by a cluster version cause I didn't take the trouble.
But this can all go in 20.2.
Ack. Thanks for the explanation.
6d9aecb to
7ffd8f1
Compare
andreimatei
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
pkg/internal/client/sender.go, line 344 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
The point isn't that this isn't clear, it's that the TODO allows you/me/someone to search for
TODO.*20.2and find all the code we can cleanup right after the next release. I guess as long as you have once of these TODOs on theDeprecatedOrigTimestampfield then its fine.
right - one TODO on DeprecatedOrigTimestamp to rule them all
pkg/kv/dist_sender_server_test.go, line 2890 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I think 200 is exclusive here, so below "as it would find b@200" also needs to change.
I clarified that the txn is pushed to (200,1). And below it should have been a@200.
See now.
pkg/kv/txn_interceptor_span_refresher.go, line 169 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
What allowed us to get rid of this? The concurrent txn detection? Is that enough to detect concurrent use with dist sql flows?
Yes to all the questions.
But as discussed, I've added yet another assertion error back here.
pkg/kv/txn_interceptor_span_refresher.go, line 130 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Why is this detection in the span refresher? It seems like a good idea, but I don't see why it belongs here. I'd move it to the TxnCoordSender itself.
moved to the gatekeeper
pkg/kv/txn_interceptor_span_refresher.go, line 162 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
sr.refreshedTimestamp
done
nvb
left a comment
There was a problem hiding this comment.
Reviewed 2 of 86 files at r1, 6 of 6 files at r3.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
d577744 to
baecca3
Compare
|
bors r+ |
|
Bors r+ |
Merge conflict |
The field is no longer necessary in 20.1. Release note: None
The OrigTimestamp, as it stands, is not a concept that anybody outside KV should understand. What people can understand is the timestamp at which the txn is reading (and they need to also understand that it can change). This is in preparation of the orig_timestamp proto field going away. Release note: None
Release note: None
... for programming errors. Release note: None
Release note: None
This test expected some retries and when it didn't get them it'd just get stuck. Now it'll fail better. Release note: None
Make it use subtests. Even though the subtests are not completely isolated from each other (they don't recreate all the state), it's still useful to run them in isolation. Release note: None
EndTransaction used to assert that the OrigTimestamp on the txn in the request was below the Timestamp from the existing txn record. I intend to remove OrigTimestamp (well, deprecate it), and I don't see how I can keep a similar check. Release note: None
An error message was seemingly not updated when we introduced RefreshedTimestamp. Release note: None
This patch deprecates the txn.OrigTimestamp field and elevates txn.RefreshedTimestamp to be the one and only field that dictates a transaction's read timestamp (modulo backwards compatibility). Before this change, the main field used to dictate the read timestamp was OrigTimestamp. And then code was ratcheting that up to RefreshedTimestamp, if set (although I think not very consistently, albeit I fixed some inconsistencies in prior commits). This is one too many timestamps. The point of having two of them was that it used to be necessary to maintain the timestamp that an epoch began with (hence the "orig" name), in addition to the latest refresh. The reasons for keeping that original timestamp around have gone away by now. So, this patch makes all the code use RefreshedTimestamp everywhere, with fallbacks to OrigTimestamp for compatibility. This will make it easy to delete OrigTimestamp in 20.2. RefreshedTimestamp is now also always set. The next commit will rename RefreshedTimestamp->ReadTimestamp, OrigTimestamp->DeprecatedOrigTimestamp and Meta.Timestamp->WriteTimestamp. The rename is kept separated to help the code review. Release note: None
Release note: None
Release note: None
Release note: None
Performed mechanically. Release note: None
Update all the tests to use ReadTimestamp, to match production. Release note: None
Release note: None
Since we now have a ReadTimestamp, let's also be more descriptive with the write timestamp. I've also considered CommitTimestamp, but, the field in question being fairly mutable, it seemed wrong (and there's also a txn.CommitTimestamp() method which fixes the commit timestamp). Release note: None
Release note: None
This patch detects concurrent use of a txn (i.e. sending a request while another one is in flight) and errors out. It's illegal to use a txn concurrently because of cockroachdb#25329. The txnSpanRefresher already had some detection for this, but it only applied after a refresh. This protection was quite subtle, awkward and misguided (it came from a time when we could use the txn concurrently for RETURNING NOTHING statements). In addition, a previous commit in this PR added an assertion that the read timestamp tracked by the refresher interceptor is in sync with the one that batches flowing through are asking for. That assertion could also be triggered by concurrent batches. This patch adds concurrency detection to the gatekeeper, with a clear error message. Release note: None
baecca3 to
6e1d241
Compare
|
bors r+ |
42236: roachpb,kv,storage: unify OrigTimestamp and RefreshedTimestamp r=andreimatei a=andreimatei This patch deprecates the txn.OrigTimestamp field and elevates txn.RefreshedTimestamp to be the one and only field that dictates a transaction's read timestamp (modulo backwards compatibility). Before this change, the main field used to dictate the read timestamp was OrigTimestamp. And then code was ratcheting that up to RefreshedTimestamp, if set (although I think not very consistently, albeit I fixed some inconsistencies in prior commits). This is one too many timestamps. The point of having two of them was that it used to be necessary to maintain the timestamp that an epoch began with (hence the "orig" name), in addition to the latest refresh. The reasons for keeping that original timestamp around have gone away by now. So, this patch makes all the code use RefreshedTimestamp everywhere, with fallbacks to OrigTimestamp for compatibility. This will make it easy to delete OrigTimestamp in 20.2. RefreshedTimestamp is now also always set. The next commit will rename RefreshedTimestamp->ReadTimestamp and OrigTimestamp->DeprecatedOrigTimestamp. The rename is kept separated to help the code review. Release note: None Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Build succeeded |
This field was left over in 20.1 only for compatibility with 19.2. History is in cockroachdb#42236 - the PR that deprecated the field. Release note: None
This field was left over in 20.1 only for compatibility with 19.2. History is in cockroachdb#42236 - the PR that deprecated the field. Release note: None
51077: importccl: add userfile import tests and import benchmark r=adityamaru a=adityamaru This change adds simple unit tests to ensure we can import from the new userfile ExternalStorage. It also adds a benchmark to compare the performance of importing from nodelocal against importing from userfile storage. Informs: #51222 51440: roachpb: remove txn.DeprecatedOrigTimestamp r=andreimatei a=andreimatei This field was left over in 20.1 only for compatibility with 19.2. History is in #42236 - the PR that deprecated the field. Release note: None 51481: sql: fix a bug with unset type metadata on some join processors r=yuzefovich a=rohany Fixes #51474. This commit fixes a bug where some joiner processors would not hydrate their input type metadata in certain cases, leading to panics at execution. Release note: None 51493: sql: remove filter from scanNode r=RaduBerinde a=RaduBerinde Remove the filter from scanNode. It doesn't provide any benefit anymore and it just causes confusion (e.g. soft limit handling). In addition, we want EXPLAIN to map to exec.Factory calls as much as possible; in this case we want the filter to be shown as a separate node. Release note (sql change): EXPLAIN (PLAN) now shows any filter on a scan as a separate `filter` node. 51513: backupccl: add a basic test with incremental backup on types r=dt a=rohany Work for #50321. As of now, only the `ALTER TYPE ... RENAME` command has been implemented, so add a basic test here for incremental backup picking up changes to the types name. Release note: None 51515: builtins: add num_nulls, num_nonnulls builtins r=jordanlewis a=jordanlewis Fixes #51507. Release note (sql change): added the num_nulls and num_nonnulls builtin functions, which count the number of arguments that are passed to it which are NULL or non-NULL respectively. Co-authored-by: Aditya Maru <adityamaru@gmail.com> Co-authored-by: Andrei Matei <andrei@cockroachlabs.com> Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu> Co-authored-by: Radu Berinde <radu@cockroachlabs.com> Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
This patch deprecates the txn.OrigTimestamp field and elevates
txn.RefreshedTimestamp to be the one and only field that dictates a
transaction's read timestamp (modulo backwards compatibility).
Before this change, the main field used to dictate the read timestamp
was OrigTimestamp. And then code was ratcheting that up to
RefreshedTimestamp, if set (although I think not very consistently,
albeit I fixed some inconsistencies in prior commits). This is one too
many timestamps. The point of having two of them was that it used to be
necessary to maintain the timestamp that an epoch began with (hence the
"orig" name), in addition to the latest refresh. The reasons for keeping
that original timestamp around have gone away by now.
So, this patch makes all the code use RefreshedTimestamp everywhere,
with fallbacks to OrigTimestamp for compatibility. This will make it
easy to delete OrigTimestamp in 20.2. RefreshedTimestamp is now also
always set.
The next commit will rename RefreshedTimestamp->ReadTimestamp and
OrigTimestamp->DeprecatedOrigTimestamp. The rename is kept separated to
help the code review.
Release note: None