kv/concurrency: permit lock timestamp regression across durabilities #47101
Conversation
sumeerbhola
left a comment
CDC no longer combines with long running
transactions to trigger an assertion with the text "lock timestamp
regression".
Just for my understanding -- does CDC push long-running transactions, and if yes can you point me to where that logic exists in the code?
Reviewed 5 of 5 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/concurrency/lock_table.go, line 1182 at r1 (raw file):
// - txn A re-acquires lock at sequence 2, ts 15
//
// A lock's timestamp within a given durability level generally should
Isn't this a stronger statement:
A lock's timestamp at a given durability level is not allowed to regress, so ...
138a562 to 096252b
nvb
left a comment
Just for my understanding -- does CDC push long-running transactions, and if yes can you point me to where that logic exists in the code?
Yes, this is performed by a rangefeed processor's TxnPusher, which is used by a txnPushAttempt task and implemented by rangefeedTxnPusher.
TFTR!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)
pkg/kv/kvserver/concurrency/lock_table.go, line 1182 at r1 (raw file):
Previously, sumeerbhola wrote…
Isn't this a stronger statement:
A lock's timestamp at a given durability level is not allowed to regress, so ...
Done.
Build failed (retrying...)
Build succeeded
Fixes #46526.
Fixes #46779.
Follow up to #46391.
This change adjusts the lockTable to allow lock timestamp regressions
when necessary. Specifically, it allows a lock's timestamp as reported
by getLockerInfo to regress if it is acquired at a lower timestamp and a
different durability than it was previously held with. This is
necessary to support because of the hard constraint that we must uphold
here: the lockHolderInfo for a replicated lock cannot diverge from the
replicated state machine in such a way that its timestamp in the
lockTable exceeds that in the replicated keyspace. If this invariant
were to be violated, we'd risk infinite lock-discovery loops for
requests that conflict with the lock as is written in the replicated
state machine but not as is reflected in the lockTable.
Lock timestamp regressions are safe from the perspective of other
transactions because the request which re-acquired the lock at the lower
timestamp must have been holding a write latch at or below the new
lock's timestamp. This means that no conflicting requests could be
evaluating concurrently. Instead, all will need to re-scan the lockTable
once they acquire latches and will notice the reduced timestamp at that
point, which may cause them to conflict with the lock even if they had
not conflicted before. In a sense, it is no different than the first
time a lock is added to the lockTable.
I considered special-casing this logic to look at the new lock's
durability and only allow the regression in the case that the new lock
was replicated and instead forwarding the acquisition timestamp in the
case that the new lock was unreplicated, but doing so seemed complex and
not clearly worth it. The rest of the lock-table supports these lock
timestamp regressions, so adding complexity to conditionally avoid the
case for certain state transitions, based on the lock durabilities,
didn't seem worthwhile. I'm happy to reconsider this decision.
Release note (bug fix): CDC no longer combines with long running
transactions to trigger an assertion with the text "lock timestamp
regression".
Release justification: fixes a high-priority bug in existing
functionality. The bug could crash a server if the right sequence of
events occurred. This was typically rare, but was much more common when
CDC was in use.