Skip to content

re-wire the response cache#3077

Merged
tbg merged 14 commits intocockroachdb:masterfrom
tbg:respcache
Nov 11, 2015
Merged

re-wire the response cache#3077
tbg merged 14 commits intocockroachdb:masterfrom
tbg:respcache

Conversation

@tbg
Copy link
Copy Markdown
Member

@tbg tbg commented Nov 10, 2015

removes roachpb.ClientCmdID and replay protection for non-transactional commands.
Instead, transactional commands are now cached on a per-ID response cache key. The cached value is the newly introduced sequence number which is increased whenever a new request is sent as part of the transaction (in particular, in client, DistSender's chunking sender and before Store-level retries).
Whenever a request encounters a cached sequence number greater than or equal to the transaction's, a TransactionRetryError is the result; the client will back of and come again.

In particular this fixes #2297 (since we don't return results, but simply an error) and #626 (since the response cache entries are now in 1:1 correspondence to transaction table entries, so #2062 is now the relevant issue covering this).

Commit-by-commit reviewing is suggested, though a casual look through the commit history probably makes sense to avoid comments invalidated by later changes.

Review on Reviewable

@tbg tbg changed the title Respcache re-wire the response cache Nov 10, 2015
@tamird
Copy link
Copy Markdown
Contributor

tamird commented Nov 10, 2015

Reviewed 3 of 3 files at r1, 7 of 7 files at r2, 1 of 1 files at r3, 1 of 1 files at r4, 5 of 5 files at r5, 6 of 6 files at r6, 1 of 1 files at r7, 8 of 8 files at r8, 12 of 12 files at r9, 21 of 21 files at r10.
Review status: all files reviewed at latest revision, 12 unresolved discussions.


client/db_internal_test.go, line 37 [r10] (raw file):
include the numbers in the error


kv/txn_coord_sender.go, line 304 [r10] (raw file):
has a sequence number?


roachpb/batch.go, line 262 [r10] (raw file):
Hm, alright then


storage/client_split_test.go, line 304 [r5] (raw file):
include err?


storage/client_split_test.go, line 315 [r5] (raw file):
include err?


storage/engine/rocksdb/db.cc, line 186 [r2] (raw file):
This local can be removed now


storage/replica.go, line 1108 [r5] (raw file):
dont see the value in this rename


storage/replica.go, line 143 [r10] (raw file):
why not make this a struct instead of encoding all the stufff below?


storage/replica_test.go, line 1513 [r9] (raw file):
can't you do equality or type assertion here?


storage/response_cache.go, line 174 [r5] (raw file):
The timestamp on the value is just so it matches the other argument? this function signature is pretty strange.


storage/response_cache.go, line 77 [r9] (raw file):
this comment is way stale


storage/response_cache.go, line 91 [r9] (raw file):
it should be mentioned somewhere that txn sequence are 1-indexed


Comments from the review on Reviewable.io

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that cmdIDKeys are not used as rocksdb keys, I don't think we need to have a time prefix. They can just be random.

@bdarnell
Copy link
Copy Markdown
Contributor

LGTM

@petermattis
Copy link
Copy Markdown
Collaborator

So much code deletion!


Review status: all files reviewed at latest revision, 5 unresolved discussions.


storage/replica.go, line 907 [r10] (raw file):
Why even random? Seems like you could have a sequence number inside Replica. That seems preferable to a random number for debugging purposes.


Comments from the review on Reviewable.io

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Nov 10, 2015

Review status: all files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


kv/txn_coord_sender.go, line 304 [r10] (raw file):
nope, tracing uses only the transaction ID.


storage/replica.go, line 1108 [r5] (raw file):
if err := func(err) looks mildly confusing.


storage/replica.go, line 143 [r10] (raw file):
MultiRaft uses a string internally.


storage/replica_test.go, line 1513 [r9] (raw file):
no, the replica corruption error is package-internal, and this is in storage_test.


Comments from the review on Reviewable.io

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Nov 10, 2015

added some commits addressing comments, PTAL.

@tamird
Copy link
Copy Markdown
Contributor

tamird commented Nov 10, 2015

Reviewed 12 of 12 files at r11, 3 of 3 files at r12, 3 of 3 files at r13, 1 of 1 files at r14.
Review status: all files reviewed at latest revision, 3 unresolved discussions.


roachpb/api.go, line 54 [r13] (raw file):
instead of a map, how about [...]string{? You'll need to put the sentinel in the array.


roachpb/api.go, line 506 [r13] (raw file):
use a buffer?


Comments from the review on Reviewable.io

@bdarnell
Copy link
Copy Markdown
Contributor

Review status: all files reviewed at latest revision, 3 unresolved discussions.


storage/replica.go, line 907 [r10] (raw file):
Export multiraft.commandIDLen instead of hard-coding 16 here. We can also change that constant to 8 now that our command IDs are smaller.

I'm not sure if it's safe to make this a per-Replica sequence. In multiraft the ids need to be unique per multiraft.group, which mostly correspond to Replicas but have slightly different lifetimes. I'd rather keep it random.


Comments from the review on Reviewable.io

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Nov 10, 2015

Review status: all files reviewed at latest revision, 2 unresolved discussions.


roachpb/api.go, line 54 [r13] (raw file):
that would work for flagToStr, but it would make for a rather awkward definition (the flags are 2^i).


Comments from the review on Reviewable.io

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Nov 10, 2015

PTAL baby one more time

@bdarnell
Copy link
Copy Markdown
Contributor

LGTM

@tamird
Copy link
Copy Markdown
Contributor

tamird commented Nov 10, 2015

Reviewed 4 of 4 files at r15.
Review status: all files reviewed at latest revision, 1 unresolved discussion.


Comments from the review on Reviewable.io

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Nov 10, 2015

Review status: all files reviewed at latest revision, 1 unresolved discussion.


storage/replica.go, line 907 [r10] (raw file):
Done.


Comments from the review on Reviewable.io

tbg added 13 commits November 10, 2015 18:48
in the near future, command IDs will be linked to transaction IDs,
so the transaction GC can take care of them.
due to the cached error, that's an infinite loop.
see cockroachdb#3067
* the response cache now contains only a sentinel, no actual data
* on a cache hit, returns a TransactionRetryError to the client

closes cockroachdb#2297 (since we simply have the transaction retry, we don't
have to worry about assembling the correct response).
old key schema: /prefix/<cmd_id> -> <response>
new key schema: /prefix/<txn_id> -> <sequence_id>

on a cache hit, if stored sequence_id >= txn.Sequence,
return TransactionRetryError.

txn.Sequence increases when retrying at the Store level,
and when sending a new chunk of batch.
tbg added a commit that referenced this pull request Nov 11, 2015
@tbg tbg merged commit 06ada33 into cockroachdb:master Nov 11, 2015
@tbg tbg deleted the respcache branch November 11, 2015 00:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CmdIDs and cross-range requests

4 participants