Push Txns & Resolve Intents on Conflicts#72
Conversation
Automatically push transactions on write intent failures and resolve conflicting write intents. This is done at the level of Store.ExecuteCmd on command failure. Write/write conflicts are pushed and resolved, but must still return the write intent error to the client so the client can retry with a new client command ID, ensuring idempotence. This way, the original request (the one which hit the conflict) will always return a write intent error via the response cache, while follow-on requests with a new ClientCmdID can succeed and will always succeed on retries. Read/write conflicts are automatically retried instead of returning the write intent error, because they don't have the same idempotency issues with the client command ID.

Reorganized a bit of code between store and range to move range-specific checks into range (addition of a Range.AddCmd method). Also added the ability to submit commands to Raft without always waiting for completion; this is used to "fire and forget" resolve intent commands. Making sure we clean up entries pending in the "read queue" is slightly tricky in Range.addReadWriteCmd.

Moved writeIntentError and writeTooOldError from mvcc.go to the proto/ subdirectory, as they're communicated across RPCs from server to clients. Added the conflicting Key and a Resolved flag to WriteIntentError, which lets the client decide whether to back off (if the resolve failed) or retry immediately.
@bdarnell @tschottdorf @cockroachdb/developers
storage/range.go
Outdated
A log.V(1) is a pretty weak form of "surfaced somewhere"; I'd make this at least a log.Info. I might also log at V(1) even if wait is true.
I changed this to a warning and added a V(1) info in Range.executeCmd to always log execution of commands.
LGTM
storage/store.go
Outdated
we may add a log here to see how frequently it may happen.
We may potentially resubmit the command here immediately without a new ClientCmdID to optimize the latency. (need to work around the respCache)
I added a V(1) log for the method as a whole. I figured we can get a bit fancier with optimizations in future changes. Right now, wanted to make sure we keep our idempotency guarantees. We'll need to start optimizing all of this at some point, but let's do it right, using appropriate benchmarks, and avoid premature optimization.
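The idempotency guarantee referenced here, that a retry under the same ClientCmdID replays the original outcome from the response cache while only a fresh ID executes anew, can be sketched as follows. This is an illustrative toy, not the actual response cache implementation; respCache and exec are hypothetical names.

```go
package main

import "fmt"

// respCache is a toy model of the response cache: the first outcome
// recorded under a ClientCmdID is replayed for every retry with that
// same ID, which is why the original conflicting request keeps
// returning the write intent error.
type respCache struct {
	replies map[string]string // ClientCmdID -> cached reply/error
}

// exec runs the command only if this ClientCmdID has not been seen;
// otherwise it replays the cached outcome.
func (c *respCache) exec(cmdID string, run func() string) string {
	if r, ok := c.replies[cmdID]; ok {
		return r // replayed retry: return the original outcome
	}
	r := run()
	c.replies[cmdID] = r
	return r
}

func main() {
	c := &respCache{replies: map[string]string{}}
	fmt.Println(c.exec("cmd-1", func() string { return "write intent error" }))
	fmt.Println(c.exec("cmd-1", func() string { return "ok" })) // same ID: replays the error
	fmt.Println(c.exec("cmd-2", func() string { return "ok" })) // new ID: executes fresh
}
```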
LGTM
…intents Push Txns & Resolve Intents on Conflicts
We have been seeing long startup times which disappear spontaneously. During a
restart of the beta cluster, the following goroutine was observed, which suggests
that we were spending a lot of time GCing replicas on startup.
engine ??:0 _Cfunc_DBIterNext(#324, #323, 0, 0, 0, 0, 0, 0, 0)
engine rocksdb.go:1135 (*rocksDBIterator).Next(#235)
storage replica_data_iter.go:104 (*ReplicaDataIterator).Next(#316)
storage store.go:1748 (*Store).destroyReplicaData(#109, #317, 0, 0)
storage store.go:841 (*Store).Start.func2(0x101b, #300, 0x36, 0x40, #301, 0x36, 0x40, #315, 0x3, 0x4, ...)
storage store.go:734 IterateRangeDescriptors.func1(#306, 0x40, 0x41, #307, 0x92, 0x92, #341, 0, 0x186c, 0x4000, ...)
engine mvcc.go:1593 MVCCIterate(#329, #68, #47, #81, #232, 0x9, 0x10, #233, 0xb, 0x10, ...)
storage store.go:738 IterateRangeDescriptors(#330, #196, #47, #81, #195, #179, #110)
storage store.go:867 (*Store).Start(#109, #330, #196, #179, #185, 0x1)
server node.go:405 (*Node).initStores(#78, #330, #196, #98, 0x1, 0x1, #179, 0, #55)
server node.go:330 (*Node).start(#78, #330, #196, #42, #129, #98, 0x1, 0x1, 0, 0, ...)
server server.go:431 (*Server).Start(#5, #330, #196, #95, 0x1)
cli start.go:368 runStart(#34, #178, 0, 0x9, 0, 0)
cobra command.go:599 (*Command).execute(#34, #177, 0x9, 0x9, #34, #177)
cobra command.go:689 (*Command).ExecuteC(#33, #70, #343, #72)
cobra command.go:648 (*Command).Execute(#33, #71, #343)
cli cli.go:96 Run(#64, 0xa, 0xa, 0, 0)
main main.go:37 main()
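The stack above shows destroyReplicaData iterating replica data inline under Store.Start, which blocks startup. One way such work could be moved off the blocking startup path is to queue it and drain it from a background goroutine, as in this hypothetical sketch (gcReplicasAsync and the destroy callback are illustrative names, not the actual fix):

```go
package main

import (
	"fmt"
	"sync"
)

// gcReplicasAsync (hypothetical) enqueues replica-destruction work and
// drains it in a background goroutine, so the caller (e.g. startup)
// can proceed without waiting on slow storage iteration.
func gcReplicasAsync(rangeIDs []int64, destroy func(int64)) *sync.WaitGroup {
	var wg sync.WaitGroup
	work := make(chan int64, len(rangeIDs))
	for _, id := range rangeIDs {
		work <- id
	}
	close(work)
	wg.Add(1)
	go func() {
		defer wg.Done()
		for id := range work {
			destroy(id) // slow iteration happens off the startup path
		}
	}()
	return &wg
}

func main() {
	var destroyed []int64
	wg := gcReplicasAsync([]int64{7, 9}, func(id int64) {
		destroyed = append(destroyed, id)
	})
	wg.Wait() // a real caller would not wait; shown here for the demo
	fmt.Println(destroyed)
}
```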
…onversion Changes to get the correct start time and end time for random wait ca…