kv: use correct sequence number when scanning for conflicting intents by arulajmani · Pull Request #93175 · cockroachdb/cockroach

arulajmani · 2022-12-06T23:05:55Z

A read-only request scans the lock table before it can proceed with dropping latches. It can evaluate only if no conflicting intents are found. While doing so, it also determines whether the MVCC scan evaluation needs to consult intent history (by using the intent interleaving iterator).

The MVCC scan evaluation needs to consult intent history if we discover an intent by the transaction performing the read operation at a higher sequence number or a higher timestamp. The correct sequence numbers to compare here are those on the BatchRequest, and not on the transaction. Before this patch, we were using the sequence number on the transaction, which could lead us to wrongly conclude that the use of an intent interleaving iterator wasn't required.

Specifically, if a batch constructed as follows was retried on the server:

b.Scan(a, e)
b.Put(b, "value")

The scan would end up (erroneously) reading "value" at key b.
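To make the failure mode concrete, here is a minimal, self-contained sketch of the check described above. The `intent` type and the `needsIntentHistory` function are hypothetical simplifications, not the actual `pkg/storage` API, and timestamps are plain integers. For illustration it assumes the Scan carries sequence number 0 and the Put sequence number 1, so the transaction's sequence number is 1 once the Put has evaluated.

```go
package main

import "fmt"

// intent is a hypothetical, simplified view of a write intent in the lock table.
type intent struct {
	txnID string
	seq   int
	ts    int
}

// needsIntentHistory reports whether a read by txnID at (readSeq, readTS)
// must consult intent history: the intent belongs to the reading transaction
// and was written at a higher sequence number or a higher timestamp.
func needsIntentHistory(in intent, txnID string, readSeq, readTS int) bool {
	if in.txnID != txnID {
		return false // not the reader's own intent, so intent history is not relevant
	}
	return in.seq > readSeq || in.ts > readTS
}

func main() {
	// Intent left at key b by the Put before the batch is retried.
	putIntent := intent{txnID: "txn1", seq: 1, ts: 10}

	const scanSeq = 0 // sequence number carried by the Scan request (assumed)
	const txnSeq = 1  // sequence number on the transaction after the Put (assumed)

	// Comparing against the transaction's sequence number misses the intent.
	fmt.Println("using txn sequence:", needsIntentHistory(putIntent, "txn1", txnSeq, 10))
	// Comparing against the request's sequence number detects it.
	fmt.Println("using request sequence:", needsIntentHistory(putIntent, "txn1", scanSeq, 10))
}
```

With the transaction's sequence number the check returns false, so the retried Scan would read the Put's intent directly; with the request's sequence number it returns true and the scan falls back to intent history.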

As part of this patch, I've also renamed ScanConflictingIntents to ScanConflictingIntentsForDroppingLatchesEarly -- the function isn't as generalized as the old name would suggest.

Closes #92217
Closes #92189

Release note: None

arulajmani · 2022-12-06T23:11:54Z

@nvanbenschoten turns out this only happens when we're doing server side retries -- so the hypothesis in #92189 (comment) about why the transaction's sequence number is 2 doesn't quite hold. I'll run this through the debugger and try and figure out what's going on there.

nvb

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani)

pkg/storage/engine.go line 1687 at r1 (raw file):

// `txn` at the supplied `ts` are ignored.
//
// The caller must supply the sequence number of the batch request on behalf of

"the request", not "the batch request", that's what was getting us in trouble.

pkg/storage/engine.go line 1698 at r1 (raw file):

	ctx context.Context,
	reader Reader,
	txn *roachpb.Transaction,

To make this kind of mistake harder to make, should we switch to passing the txnID uuid.UUID? We can compare against uuid.Nil for the non-txn case.

arulajmani

Like we spoke about offline, I think I was too quick to dismiss the hypothesis in the comment linked above; it turns out that was indeed why we were seeing sequence number 2 the second time around, when the batch was retried. I mistakenly convinced myself that this was happening on a server-side retry, because that's what I expected to happen given our construction, but I was missing the splitting that happens in the DistSender to decompose the batch into 2. Given this was the issue, I added the lines to reset the Sequence on the transaction before returning it in the batch response, like we spoke about earlier today.

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)

pkg/storage/engine.go line 1698 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

To make this kind of mistake harder to make, should we switch to passing the txnID uuid.UUID? We can compare against uuid.Nil for the non-txn case.

Done.

tbg · 2022-12-08T08:32:04Z

Please make sure to remove this in this PR:

cockroach/pkg/kv/kvnemesis/kvnemesis_test.go

Lines 184 to 187 in ec095bc

    // TODO(arul): remove this line when #92189 is addressed.
    //
    // See: https://github.com/cockroachdb/cockroach/issues/93164.
    sqlutils.MakeSQLRunner(sqlDBs[0]).Exec(t, `SET CLUSTER SETTING kv.transaction.dropping_latches_before_eval.enabled = false`)

arulajmani · 2022-12-08T15:16:56Z

Added a commit to remove the lines that disabled dropping latches early in kvnemesis.

I also had to revert the reset Sequence on the transaction change -- turns out, we actually rely on this behaviour to ascertain if the transaction has performed any writes or not in the TxnCoordSender.

cockroach/pkg/kv/kvclient/kvcoord/txn_coord_sender.go

Lines 1404 to 1406 in a9fcbd1

    func (tc *TxnCoordSender) hasPerformedWritesLocked() bool {
    	return tc.mu.txn.Sequence != 0
    }

Will bors on green.

nvb

Reviewed 3 of 4 files at r2, 1 of 1 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani)

nvb · 2022-12-08T15:26:41Z

> I also had to revert the reset Sequence on the transaction change -- turns out, we actually rely on this behaviour to ascertain if the transaction has performed any writes or not in the TxnCoordSender.

This feels potentially buggy to me. There's no guarantee that the coordinator received a response from a write request. I think we'd want to replace the tc.mu.txn.Sequence != 0 with tc.interceptorAlloc.txnPipeliner.hasAcquiredLocks() like we have in TxnCoordSender.Send.
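The concern can be illustrated with a toy model. Everything below is hypothetical: `coordinator`, its fields, and `sendWrite` are not the real `TxnCoordSender` or `txnPipeliner`, and the sketch assumes, for illustration only, that lock tracking is updated whenever a write is sent, even if no response comes back, while the transaction's sequence number is updated only from a response.

```go
package main

import "fmt"

// coordinator is a hypothetical, heavily simplified stand-in for a
// transaction coordinator.
type coordinator struct {
	txnSequence  int  // refreshed only from responses the coordinator receives
	trackedLocks bool // recorded for every write that was sent (assumption)
}

// sendWrite models sending a write at sequence seq. responseReceived says
// whether the response made it back to the coordinator.
func (c *coordinator) sendWrite(seq int, responseReceived bool) {
	c.trackedLocks = true
	if responseReceived {
		c.txnSequence = seq
	}
}

// hasPerformedWritesBySequence mirrors the `Sequence != 0` style of check.
func (c *coordinator) hasPerformedWritesBySequence() bool { return c.txnSequence != 0 }

// hasPerformedWritesByLocks mirrors a check based on tracked locks.
func (c *coordinator) hasPerformedWritesByLocks() bool { return c.trackedLocks }

func main() {
	c := &coordinator{}
	// The write may have taken effect on the server, but its response is lost.
	c.sendWrite(1, false)

	fmt.Println("by sequence:", c.hasPerformedWritesBySequence())
	fmt.Println("by locks:", c.hasPerformedWritesByLocks())
}
```

Under these assumptions the sequence-based check reports no writes after a lost response, while the lock-based check still reports that a write was attempted.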

arulajmani · 2022-12-08T16:11:31Z

> This feels potentially buggy to me. There's no guarantee that the coordinator received a response from a write request. I think we'd want to replace the tc.mu.txn.Sequence != 0 with tc.interceptorAlloc.txnPipeliner.hasAcquiredLocks() like we have in TxnCoordSender.Send.

Yeah, this didn't seem great, given how we're relying on command evaluation to have written to this field and then making inferences up here. I'll file an issue about this and address it in a quick patch.

bors r=nvanbenschoten

craig · 2022-12-08T16:53:55Z

Build failed (retrying...):

Bazel Essential CI (Cockroach)

craig · 2022-12-08T19:27:49Z

Build succeeded:

Bazel Essential CI (Cockroach)

arulajmani requested a review from nvb December 6, 2022 23:05

arulajmani requested review from a team as code owners December 6, 2022 23:05

nvb approved these changes Dec 7, 2022

arulajmani force-pushed the kvnemesis-92189 branch 2 times, most recently from c8837be to 6bbcc3a December 8, 2022 04:53

arulajmani force-pushed the kvnemesis-92189 branch from 6bbcc3a to bbecbf4 December 8, 2022 15:09

kvnemesis: enable dropping latches early again

7a8de90

We'd disabled dropping latches early to stabilize kvnemesis; this patch undoes that.

Closes cockroachdb#93164

Release note: None

nvb approved these changes Dec 8, 2022

craig bot merged commit eb02796 into cockroachdb:master Dec 8, 2022

arulajmani mentioned this pull request Dec 15, 2022

kvcoord: reconsider how the TxnCoordSender determines hasPerformedWrites #93738

Open
