RFC: range-local intent resolution#1873
Conversation
docs/RPCS/local_intent_resolution.md
Outdated
There was a problem hiding this comment.
We can disallow cross-coordinator transactions without adding a new field. The first request in a transaction is already special since it does not include a transaction ID. We could move the creation of the heartbeat goroutine from sendOne to maybeBeginTxn and return an error if we receive a txn ID that is not in our txns map.
If the sequence number is useful for other purposes then it may be worth including, but I'd like to avoid it if we can because it removes opportunities for parallelism (with a sequence number the only way to parallelize operations is to include them in the same batch; without a sequence number requests may be sent separately in parallel as long as the client is smart enough to merge all the transaction objects before calling EndTransaction).
There was a problem hiding this comment.
Good point. Happy to ditch the sequence number.
|
LGTM |
9e812b9 to
c5a5936
Compare
|
LGTM. This should reduce the mental complexity of our transaction model. |
|
Actually, I did have a question: won't this make it difficult to use transactions when consuming Cockroach over HTTP, in the case where external traffic goes through a load balancer? If your transaction spans multiple round-trips, I wouldn't think there is a guarantee that a VIP will send the individual requests to the same machine. I'm hardly an expert on the subject, but that seems like a relatively common configuration. |
|
Yes, this means that you won't be able to use load balancers that don't enforce stickiness. This may be a good thing, since that configuration would be wasteful anyway (since multiple coordinators would start heartbeat loops). If we wanted to, we could encode the coordinator's node ID in the |
ad1b029 to
d50a6ad
Compare
[ci skip]
|
@mrtracy yeah. Whenever that becomes an issue, there are ways out with some added complexity. For example, the client collecting the intents, or what Ben suggested. Heartbeating may need to be revisited when we make those changes, but it'll be doable. Added that to the comment. |
|
I finished disallowing coordinator-spanning transactions, but since conflict resolution in its current form actually starts coordinator-spanning Txns all the time, I have to untangle that first before I can post the progress for review. |
RFC: range-local intent resolution
a subgoal of cockroachdb#1873 is making sure that a transaction can only write on one coordinator (actually, the RFC proposes limiting reads to a single coordinator as well, but this has proven unnecessary). this change is carried out here. as a slight change from the RFC, we do add an additional flag to the proto.Transaction struct, which contains the information of whether previous requests in the Txn have written data (i.e. laid down intents). not doing that would have mandated giving up on the optimizations for read-only transactions since we would have had to rely on special- casing the first transactional request.
this passes all tests except for one (which I've disabled; maybe @bdarnell has an idea?). Basically carried out a lot of the work announced in RFC cockroachdb#1873: - coordinator sends intents along with EndTransaction - range resolves all it possibly can in the EndTransaction batch, and hands everything else off to an async resolver. - "skipped intents" passed up by the range_command functions are now executed even if the command returns an error. This is required for EndTransaction which finds out that it's been aborted. There are a lot of TODOs documented in the code, and I intend to address all of those that are relevant for correctness. Some parts of the RFC are also not yet implemented (gc'ing Txn entries), but the road should be very clear and I'll add those before merging. I did want to get this out early because it's a fairly complex refactoring and will only get bigger from here.
No description provided.