Move read queue to a more general "command" queue. #81
Conversation
Over the course of testing distributed transactions, I noticed many unnecessary restarts of SERIALIZABLE transactions because we were previously updating the range's timestamp cache as soon as a read or write command arrived. For reads or writes which conflict with an ongoing transaction, the update to the timestamp cache happened regardless of whether the read or write succeeded. In many cases the read or write would fail on the conflict, so the restart caused by advancing the timestamp cache was unnecessary.

This change blends both read/write and read-only commands into a single command queue, which operates similarly to the old read queue, except that commands now always wait on preceding commands which overlap the same key(s). Read-only commands don't need to wait on other read-only commands, so we exempt this case for better read concurrency. This lets us wait until after the command is executed to update the timestamp cache, so we can avoid updating the cache on failures.
@cockroachdb/developers
There are far fewer lines actually changed in this PR than GitHub is showing. I did a "git mv" on these two files from read_queue.go and read_queue_test.go, but the diff isn't properly showing the rename. I'll highlight the actual non-naming and comment changes below for easier reviewing.
Another problem I discovered with SERIALIZABLE transactions was unnecessary
restarts with a pattern like: read("a"), write("a"). The issue was that
the read would put a timestamp in the timestamp cache for "a". Then the
write would arrive for the same transaction and be forced to increment
the timestamp it found there (you're not supposed to change past history).
However, there's no reason to do the increment if the most recent entry
in the timestamp cache was the result of the current transaction. The
addition of the txn ID (well, the MD5 of it, anyway) to the timestamp
cache lets us ignore max entries which our own transaction was responsible for.
Also, changed "high water timestamp" in the timestamp cache to the more
accurate "low water timestamp".
proto/data.go
Outdated
LGTM
Why not just use the ID as-is? What happens if there is a collision?
The ID as-is is largish, since we suffix a UUID to the first key affected by the transaction. It's probably going to be about 100 bytes, vs. the 16 bytes this adds. That means saving a multiple (maybe 2-3x) on the size of this cache, which I expect to be relatively large when summed over all ranges: perhaps 120M with the full txn ID (assuming the system is using transactions heavily) vs. 40-50M with the MD5 of the txn ID.
Collisions would be fairly bad. A collision could mean writing to an earlier timestamp after a reader had read the previous value at a later timestamp. Rewriting history, in other words. But the odds of a collision with a 16-byte MD5 digest are infinitesimal: something like 1 in a billion trillion, given the size of the timestamp cache.
The odds of a random md5 collision are infinitesimal; I'm worried about the ability of a client to construct deliberate collisions. If the UUID is generated server-side then we should be fine, but if it's client-generated then it's possible for the client to select two keys with colliding md5s (and a common prefix so they end up in the same range) and then append the same UUID.
The UUID is generated server-side, but the client could ignore it and supply its own txn ID. There are lots of other ways an adversarial client could muck things up in the keyspace it can write to. The exact same broken guarantees resulting from a duplicate txn ID could be more simply achieved by not advancing the timestamp between requests, etc. Badly behaved clients won't munge the system; they'll just munge their own guarantees.
storage/timestamp_cache.go
Outdated
If memory is a concern it would be a bit better to leave this as the original [md5.Size]byte instead of converting to a string (which takes up a little extra space and a separate allocation for the GC to consider)
Cool. That's what I'll do.
Move read queue to a more general "command" queue.
We have been seeing long startup times which disappear spontaneously. During a
restart of the beta cluster, the following goroutine was observed, which suggests
that we were spending a lot of time GCing replicas on startup.
engine ??:0 _Cfunc_DBIterNext(#324, #323, 0, 0, 0, 0, 0, 0, 0)
engine rocksdb.go:1135 (*rocksDBIterator).Next(#235)
storage replica_data_iter.go:104 (*ReplicaDataIterator).Next(#316)
storage store.go:1748 (*Store).destroyReplicaData(#109, #317, 0, 0)
storage store.go:841 (*Store).Start.func2(0x101b, #300, 0x36, 0x40, #301, 0x36, 0x40, #315, 0x3, 0x4, ...)
storage store.go:734 IterateRangeDescriptors.func1(#306, 0x40, 0x41, #307, 0x92, 0x92, #341, 0, 0x186c, 0x4000, ...)
engine mvcc.go:1593 MVCCIterate(#329, #68, #47, #81, #232, 0x9, 0x10, #233, 0xb, 0x10, ...)
storage store.go:738 IterateRangeDescriptors(#330, #196, #47, #81, #195, #179, #110)
storage store.go:867 (*Store).Start(#109, #330, #196, #179, #185, 0x1)
server node.go:405 (*Node).initStores(#78, #330, #196, #98, 0x1, 0x1, #179, 0, #55)
server node.go:330 (*Node).start(#78, #330, #196, #42, #129, #98, 0x1, 0x1, 0, 0, ...)
server server.go:431 (*Server).Start(#5, #330, #196, #95, 0x1)
cli start.go:368 runStart(#34, #178, 0, 0x9, 0, 0)
cobra command.go:599 (*Command).execute(#34, #177, 0x9, 0x9, #34, #177)
cobra command.go:689 (*Command).ExecuteC(#33, #70, #343, #72)
cobra command.go:648 (*Command).Execute(#33, #71, #343)
cli cli.go:96 Run(#64, 0xa, 0xa, 0, 0)
main main.go:37 main()
…tion_change logging location change