kv/concurrency: never regress timestamp in lockTable on lock acquisition#46391
Conversation
sumeerbhola
left a comment
Reviewed 1 of 1 files at r2, 2 of 2 files at r3, 5 of 5 files at r4, 5 of 5 files at r5.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/concurrency/lock_table.go, line 1167 at r5 (raw file):
```go
}
l.holder.holder[durability].txn = txn
l.holder.holder[durability].ts.Forward(ts)
```
can you add a code comment similar to the one in the PR description that justifies this ratcheting?
Also, could the scenario described in #46019 also have resulted in timestamp regression -- say sequence 2 was at an earlier timestamp than sequence 3 and both are being replayed?
pkg/kv/kvserver/concurrency/testdata/lock_table/lock_changes, line 36 at r5 (raw file):
```
# Lock is reacquired at same epoch with lower timestamp. This is allowed,
# see TestMVCCHistories/put_out_of_order. The new sequence number is added
# but the timestamp is not regressed.
```
this probably answers my question about #46019
pkg/storage/testdata/mvcc_histories/put_new_epoch_lower_sequence, line 50 at r2 (raw file):
data: "k"/0.000000001,0 -> /BYTES/v3 # We're expecting v2 here.
s/v2/v3/
These still aren't great, but now they're at least all unique.
Fixes cockroachdb#42310. Now the test actually does what it says it will do.

Release justification: testing only

Tests a scenario where a put operation at an older timestamp but a higher epoch comes after a put operation at a newer timestamp but an older epoch. The timestamp of the resulting intent remains equal to the higher timestamp - it does not regress. Transaction coordinators shouldn't issue this series of operations directly, but it is possible to create such a situation if the transaction is pushed.

Release justification: testing only

This commit updates the TestConcurrencyManagerBasic framework to bind all `on-lock-acquired` and `on-lock-updated` commands to specific requests. This prevents misuse of these hooks. For instance, it ensures that latches are properly held before allowing the hooks to be executed (by nature of the guard being in guardsByReqName) and it ensures that the keys provided to the hooks agree with those in the request definition.

Release justification: testing only
Fixes cockroachdb#46290. Fixes cockroachdb#43735. Fixes cockroachdb#41425.

This change adjusts the lockTable to be more lenient to the timestamp of new lock acquisitions. Specifically, it lifts the restriction that all calls to AcquireLock use monotonically increasing timestamps. Instead, it properly handles apparent timestamp regressions by ratcheting lock timestamps instead of replacing them directly. This matches the corresponding behavior in MVCC: https://github.com/cockroachdb/cockroach/blob/92107b551bbafe54fddb496442c590cb6feb5d65/pkg/storage/mvcc.go#L1631

This leniency is needed for sequences of events like the following:
- txn A acquires lock at epoch 1, ts 10
- txn B pushes txn A to ts 20
- txn B updates lock to ts 20
- txn A restarts at ts 15 without noticing that it has been pushed
- txn A re-acquires lock at epoch 2, ts 15
- we hit the lock timestamp regression assertion

We see this frequently in CDC roachtests because the rangefeed processor performs high-priority timestamp pushes on long-running transactions. Outside of CDC, this is rare in our system.

Release note (bug fix): CDC no longer combines with long running transactions to trigger an assertion.

Release justification: fixes a high-priority bug in existing functionality. The bug could crash a server if the right sequence of events occurred. This was typically rare, but was much more common when CDC was in use.
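The failure sequence in the commit message can be sketched with a hypothetical, heavily simplified model of the lockTable's per-lock holder state (`lockState`, integer timestamps, and the `acquire`/`update` methods below are illustrative names, not the real API):

```go
package main

import "fmt"

// lockState is a hypothetical, simplified model of the lockTable's
// per-lock holder state; only the fields needed here are shown.
type lockState struct {
	epoch int
	ts    int64 // simplified timestamp
}

// acquire models AcquireLock after the fix: instead of asserting that
// timestamps increase monotonically, it ratchets (forwards) the lock's
// timestamp, mirroring MVCC's behavior.
func (l *lockState) acquire(epoch int, ts int64) {
	l.epoch = epoch
	if ts > l.ts {
		l.ts = ts // forward only; never regress
	}
}

// update models a pusher moving the lock's timestamp forward.
func (l *lockState) update(ts int64) {
	if ts > l.ts {
		l.ts = ts
	}
}

func main() {
	var l lockState
	l.acquire(1, 10) // txn A acquires lock at epoch 1, ts 10
	l.update(20)     // txn B pushes txn A to ts 20
	l.acquire(2, 15) // txn A re-acquires at epoch 2, ts 15; this is the
	// step that previously tripped the regression assertion
	fmt.Println(l.epoch, l.ts) // epoch advances to 2, ts stays at 20
}
```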
bd331fd to 3069773
nvb
left a comment
TFTR!
bors r=sumeerbhola
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)
pkg/kv/kvserver/concurrency/lock_table.go, line 1167 at r5 (raw file):
can you add a code comment similar to the one in the PR description that justifies this ratcheting?
Done.
Also, could the scenario described in #46019 also have resulted in timestamp regression -- say sequence 2 was at an earlier timestamp than sequence 3 and both are being replayed?
Discussed below.
pkg/kv/kvserver/concurrency/testdata/lock_table/lock_changes, line 36 at r5 (raw file):
Previously, sumeerbhola wrote…
this probably answers my question about #46019
Yes, within an epoch, all writes should be written with increasing timestamps, but one of the writes could land after a lock is pushed to a higher timestamp. This test exercises this.
pkg/storage/testdata/mvcc_histories/put_new_epoch_lower_sequence, line 50 at r2 (raw file):
Previously, sumeerbhola wrote…
s/v2/v3/
Done.
Build succeeded
Fixes cockroachdb#46526. Fixes cockroachdb#46779. Follow up to cockroachdb#46391.

This change adjusts the lockTable to allow lock timestamp regressions when necessary. Specifically, it allows a lock's timestamp as reported by getLockerInfo to regress if it is acquired at a lower timestamp and a different durability than it was previously held with. This is necessary to support because of the hard constraint we must uphold here: the lockHolderInfo for a replicated lock cannot diverge from the replicated state machine in such a way that its timestamp in the lockTable exceeds that in the replicated keyspace. If this invariant were to be violated, we'd risk infinite lock-discovery loops for requests that conflict with the lock as it is written in the replicated state machine but not as it is reflected in the lockTable.

Lock timestamp regressions are safe from the perspective of other transactions because the request which re-acquired the lock at the lower timestamp must have been holding a write latch at or below the new lock's timestamp. This means that no conflicting requests could be evaluating concurrently. Instead, all will need to re-scan the lockTable once they acquire latches and will notice the reduced timestamp at that point, which may cause them to conflict with the lock even if they had not conflicted before. In a sense, it is no different than the first time a lock is added to the lockTable.

I considered special-casing this logic to look at the new lock's durability and only allowing the regression in the case that the new lock was replicated, forwarding the acquisition timestamp instead in the case that the new lock was unreplicated, but doing so seemed complex and not clearly worth it. The rest of the lock-table supports these lock timestamp regressions, so adding complexity to conditionally avoid the case for certain state transitions, based on the lock durabilities, didn't seem worthwhile. I'm happy to reconsider this decision.

Release note (bug fix): CDC no longer combines with long running transactions to trigger an assertion with the text "lock timestamp regression".

Release justification: fixes a high-priority bug in existing functionality. The bug could crash a server if the right sequence of events occurred. This was typically rare, but was much more common when CDC was in use.
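The cross-durability rule this follow-up describes can be sketched as follows. `heldLock`, `reacquire`, and the `durability` constants below are hypothetical simplifications for illustration, not the real lockTable types:

```go
package main

import "fmt"

type durability int

const (
	unreplicated durability = iota
	replicated
)

// heldLock is a hypothetical, simplified model of a lock's holder info.
type heldLock struct {
	dur durability
	ts  int64 // simplified timestamp
}

// reacquire sketches the rule from the commit message: when a lock is
// re-acquired with a different durability, the (possibly lower)
// acquisition timestamp is taken as-is, so the lockTable never reports
// a higher timestamp than the replicated state machine holds. Within
// the same durability, the timestamp is still only ratcheted forward.
func (l *heldLock) reacquire(dur durability, ts int64) {
	if dur != l.dur {
		l.dur, l.ts = dur, ts // regression permitted across durabilities
		return
	}
	if ts > l.ts {
		l.ts = ts // same durability: forward only
	}
}

func main() {
	l := heldLock{dur: unreplicated, ts: 20} // pushed to ts 20
	l.reacquire(replicated, 15)              // replicated write at ts 15
	fmt.Println(l.ts)                        // regresses to 15
	l.reacquire(replicated, 12)              // same durability: no regression
	fmt.Println(l.ts)                        // stays at 15
}
```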
47092: backupccl: stop including system db as complete in full cluster backup r=pbardea a=pbardea

When we perform a full-cluster backup, we were including the system database as a "complete database". However, full cluster backup only backs up a subset of the tables in the system database. This led to bugs around taking several backups with revision history, since we would look for changes in all table descriptors for system tables, but we should have only been looking for differences in system tables that were backed up.

Fixes #47050.

Release justification: fixes release blocker.

Release note (bug fix): in some cases where system tables have changed, incremental, full-cluster BACKUPs with revision history were sometimes incorrectly disallowed.

47101: kv/concurrency: permit lock timestamp regression across durabilities r=nvanbenschoten a=nvanbenschoten

Fixes #46526. Fixes #46779. Follow up to #46391. (Full description above.)

Co-authored-by: Paul Bardea <paul@pbardea.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Fixes #46290.
Fixes #43735.
Fixes #41425.
Fixes #43809.
This change adjusts the lockTable to be more lenient to the timestamp of new lock acquisitions. Specifically, it lifts the restriction that all calls to AcquireLock use monotonically increasing timestamps. Instead, it properly handles apparent timestamp regressions by ratcheting lock timestamps instead of replacing them directly.
This matches the corresponding behavior in MVCC: pkg/storage/mvcc.go, line 1631 at 92107b5.
This leniency is needed for sequences of events like the following:
- txn A acquires lock at epoch 1, ts 10
- txn B pushes txn A to ts 20
- txn B updates lock to ts 20
- txn A restarts at ts 15 without noticing that it has been pushed
- txn A re-acquires lock at epoch 2, ts 15
- we hit the lock timestamp regression assertion
We see this frequently in CDC roachtests because the rangefeed processor performs high-priority timestamp pushes on long-running transactions. Outside of CDC, this is rare in our system.
Release note (bug fix): CDC no longer combines with long running transactions to trigger an assertion.
Release justification: fixes a high-priority bug in existing functionality. The bug could crash a server if the right sequence of events occurred. This was typically rare, but was much more common when CDC was in use. This is a fairly large PR, but there's only a single line of non-testing code touched.