kv,storage: disable pipelining in merge transaction#27829
kv,storage: disable pipelining in merge transaction#27829craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
|
This approach seems fine, although it is unfortunate how much plumbing is required just to pass the disable flag all the way down. When I was thinking about this before I also considered a "meta request" that the
Is this true? Wouldn't the same reasoning that allows for lease transfers to avoid waiting out the max offset also apply here? |
It seems that on lease transfers the old leaseholder promises not to serve any more reads, even if those reads are beneath the timestamp of the new lease: cockroach/pkg/storage/replica_range_lease.go Lines 542 to 547 in 19ed9a3 @bdarnell @tschottdorf could one of you weigh in here? The "comments on the stasis period in IsLeaseValid" that supposedly explain all this don't exist, or at least I can't find them. |
bdarnell
left a comment
There was a problem hiding this comment.
I think those comments have migrated to (*Replica).leaseStatus. They don't address transfers directly but there's an explanation of the whole "lease stasis" concept. In effect, a lease transfer retroactively creates a stasis period for the timestamps prior to the transfer. However, we haven't implemented this exactly; we just use the blunter solution of disallowing all reads from a leaseholder that has transferred its lease away.
The same would apply to merges: the RHS could continue to serve reads with timestamps less than mergeTimestamp - maxOffset, but not at mergeTimestamp itself.
This all due to clock uncertainty and the fact that a read right before the end of a lease might need to return a ReadWithinUncertaintyIntervalError. There might be a subtle/clever solution involving communication between the LHS and RHS leaseholders that gives us certainty about the clock. Or maybe QueryIntent can be a special case that is allowed during stasis because it uses a timestamp at which a previous write was known to have occurred.
Reviewable status:
complete! 0 of 0 LGTMs obtained
|
Per discussion in #core, the disable-pipelining knob is the way to go. This is ready for review! |
nvb
left a comment
There was a problem hiding this comment.
Reviewed 8 of 8 files at r1.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/kv/txn_coord_sender_test.go, line 1729 at r1 (raw file):
} func TestTxnCoordSenderPipelining(t *testing.T) {
Add a comment here explaining what this is testing.
pkg/storage/replica_command.go, line 479 at r1 (raw file):
} // Intents have been placed, so the merge is now in its critical phase. Get
This can be reworked now! The comment is a bit stale, the TODO can be removed, and there's no need to even send the GetSnapshotForMergeRequest through the txn (if you want to switch that back).
| tc.mu.onFinishFn = onFinishFn | ||
| } | ||
|
|
||
| // DisablePipelining is part of the client.TxnSender interface. |
There was a problem hiding this comment.
Are there concerns with data races here when the txn is used concurrently? Maybe just something to comment on.
There was a problem hiding this comment.
Oh, looks like this is protected by a Mutex in the parent. Nevermind, I think.
There was a problem hiding this comment.
And it'll error out when not the first thing in the txn. Ok, we're probably good... :-)
It turns out that pipelining is fundamentally incompatible with merge
transactions. The extra QueryIntent requests generated by pipelining can
get routed to a RHS that is currently blocked for merging and wedge the
merge. Consider the following sequence of events:
1. The merge transaction begins and lays down intents on the meta2 and
local copy of the RHS range descriptor using asynchronous commits.
2. The RHS loses its lease.
3. The merge transaction sends a GetSnapshotForMergeRequest. The
pipeliner injects two QueryIntent requests, one to the meta2 range
descriptor and one to the local range descriptor on the RHS, as
non-transactional requests always flush the pipeline.
4. The QueryIntent request hits the RHS and triggers a lease
acquisition. The lease acquisition notices that the local range
descriptor has a deletion intent, infers that a merge is in
progress, and blocks all traffic, except for GetSnapshotForMerge
requests.
GetSnapshotForMerge requests are allowed to proceed even when the range
is blocked for merging but QueryIntent requests are not. We can't, in
general, allow QueryIntent requests through, as we might return stale
data if the merge has already committed and the LHS is processing
new writes. We could, I think, allow QueryIntents for the local range
descriptor to proceed, but that's a very-specific corner case and hard
to reason about.
Just disable pipelining in the merge transaction to sidestep the issue.
One other option: we could allow the RHS to serve reads up until the
commit timestamp of the merge transaction. That would allow the
GetSnapshotForMerge and QueryIntent requests from the merge transaction
through, as well as any other read traffic executing beneath the merge
transaction's timestamp. The downside is that the new leaseholder would
have to wait out the full clock offset after the merge txn committed to
serve any writes. (Just like the lease stasis period.) It was decided
that this option was too disruptive and complex to pursue for the time
being.
Fix cockroachdb#27820.
benesch
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/kv/txn_coord_sender.go, line 499 at r1 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
And it'll error out when not the first thing in the txn. Ok, we're probably good... :-)
Ack.
pkg/kv/txn_coord_sender_test.go, line 1729 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Add a comment here explaining what this is testing.
Done.
pkg/storage/replica_command.go, line 479 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This can be reworked now! The comment is a bit stale, the TODO can be removed, and there's no need to even send the
GetSnapshotForMergeRequestthrough thetxn(if you want to switch that back).
Done.
|
bors r=nvanbenschoten |
Build failed |
|
Flake. bors r=nvanbenschoten |
27829: kv,storage: disable pipelining in merge transaction r=nvanbenschoten a=benesch
It turns out that pipelining is fundamentally incompatible with merge
transactions. The extra QueryIntent requests generated by pipelining can
get routed to a RHS that is currently blocked for merging and wedge the
merge. Consider the following sequence of events:
1. The merge transaction begins and lays down intents on the meta2 and
local copy of the RHS range descriptor using asynchronous commits.
2. The RHS loses its lease.
3. The merge transaction sends a GetSnapshotForMergeRequest. The
pipeliner injects two QueryIntent requests, one to the meta2 range
descriptor and one to the local range descriptor on the RHS, as
non-transactional requests always flush the pipeline.
4. The QueryIntent request hits the RHS and triggers a lease
acquisition. The lease acquisition notices that the local range
descriptor has a deletion intent, infers that a merge is in
progress, and blocks all traffic, except for GetSnapshotForMerge
requests.
GetSnapshotForMerge requests are allowed to proceed even when the range
is blocked for merging but QueryIntent requests are not. We can't, in
general, allow QueryIntent requests through, as we might return stale
data if the merge has already committed and the LHS is processing
new writes. We could, I think, allow QueryIntents for the local range
descriptor to proceed, but that's a very-specific corner case and hard
to reason about.
Just disable pipelining in the merge transaction to sidestep the issue.
One other option: we could allow the RHS to serve reads up until the
commit timestamp of the merge transaction. That would allow the
GetSnapshotForMerge and QueryIntent requests from the merge transaction
through, as well as any other read traffic executing beneath the merge
transaction's timestamp. The downside is that the new leaseholder would
have to wait out the full clock offset after the merge txn committed to
serve any writes. (Just like the lease stasis period.) It was decided
that this option was too disruptive and complex to pursue for the time
being.
Fix #27820.
27920: stopper: Ensure that Closers and Ctxs are always cleaned up r=benesch a=nvanbenschoten
Before this change, the Stopper made no guarantee that `Closers`
registered by the `AddCloser` method or `Contexts` tied to
the `Stopper` with `WithCancelOn{Quiesce,Stop}` were ever properly
cleaned up. In cases where these methods were called concurrently
with the Stopper stopping, it was possible for the resources to
leak. This change fixes this by handling cases where these methods
are called concurrently with or after the Stopper has stopped. This
allows them to provide stronger external guarantees.
The PR also correctly synchronizes access to the `sCancels` map to
fix #27908.
Release note: None
Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Build succeeded |
It turns out that pipelining is fundamentally incompatible with merge
transactions. The extra QueryIntent requests generated by pipelining can
get routed to a RHS that is currently blocked for merging and wedge the
merge. Consider the following sequence of events:
The merge transaction begins and lays down intents on the meta2 and
local copy of the RHS range descriptor using asynchronous commits.
The RHS loses its lease.
The merge transaction sends a GetSnapshotForMergeRequest. The
pipeliner injects two QueryIntent requests, one to the meta2 range
descriptor and one to the local range descriptor on the RHS, as
non-transactional requests always flush the pipeline.
The QueryIntent request hits the RHS and triggers a lease
acquisition. The lease acquisition notices that the local range
descriptor has a deletion intent, infers that a merge is in
progress, and blocks all traffic, except for GetSnapshotForMerge
requests.
GetSnapshotForMerge requests are allowed to proceed even when the range
is blocked for merging but QueryIntent requests are not. We can't, in
general, allow QueryIntent requests through, as we might return stale
data if the merge has already committed and the LHS is processing
new writes. We could, I think, allow QueryIntents for the local range
descriptor to proceed, but that's a very-specific corner case and hard
to reason about.
Just disable pipelining in the merge transaction to sidestep the issue.
One other option: we could allow the RHS to serve reads up until the
commit timestamp of the merge transaction. That would allow the
GetSnapshotForMerge and QueryIntent requests from the merge transaction
through, as well as any other read traffic executing beneath the merge
transaction's timestamp. The downside is that the new leaseholder would
have to wait out the full clock offset after the merge txn committed to
serve any writes. (Just like the lease stasis period.) It was decided
that this option was too disruptive and complex to pursue for the time
being.
Fix #27820.