
kv/concurrency: never regress timestamp in lockTable on lock acquisition#46391

Merged
craig[bot] merged 5 commits into cockroachdb:master from nvb:nvanbenschoten/lockTableAcquireUpgrade on Mar 23, 2020

Conversation

@nvb (Contributor) commented Mar 21, 2020

Fixes #46290.
Fixes #43735.
Fixes #41425.
Fixes #43809.

This change adjusts the lockTable to be more lenient to the timestamp of new lock acquisitions. Specifically, it lifts the restriction that all calls to AcquireLock use monotonically increasing timestamps. Instead, it properly handles apparent timestamp regressions by ratcheting lock timestamps instead of replacing them directly.

This matches the corresponding behavior in MVCC:

  https://github.com/cockroachdb/cockroach/blob/92107b551bbafe54fddb496442c590cb6feb5d65/pkg/storage/mvcc.go#L1631
  // This case occurs when we're writing a key twice within a
This leniency is needed for sequences of events like the following:

  • txn A acquires lock at epoch 1, ts 10
  • txn B pushes txn A to ts 20
  • txn B updates lock to ts 20
  • txn A restarts at ts 15 without noticing that it has been pushed
  • txn A re-acquires lock at epoch 2, ts 15
  • we hit the lock timestamp regression assertion

We see this frequently in CDC roachtests because the rangefeed processor performs high-priority timestamp pushes on long-running transactions. Outside of CDC, this is rare in our system.
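The ratcheting behavior can be sketched in isolation. This is a minimal illustration, assuming a simplified `Timestamp` type; the real change uses `hlc.Timestamp` and its `Forward` method in `lock_table.go`:

```go
package main

import "fmt"

// Timestamp is a simplified stand-in for hlc.Timestamp.
type Timestamp struct{ WallTime int64 }

// Less reports whether t precedes o.
func (t Timestamp) Less(o Timestamp) bool { return t.WallTime < o.WallTime }

// Forward ratchets t up to o only if o is newer, and reports whether
// it advanced. An older o is a no-op, so the timestamp never regresses.
func (t *Timestamp) Forward(o Timestamp) bool {
	if t.Less(o) {
		*t = o
		return true
	}
	return false
}

func main() {
	// Lock held at ts 20 after a push; txn re-acquires at ts 15.
	held := Timestamp{WallTime: 20}
	held.Forward(Timestamp{WallTime: 15}) // no-op: never regress
	fmt.Println(held.WallTime)            // 20
	held.Forward(Timestamp{WallTime: 25}) // genuinely newer: advances
	fmt.Println(held.WallTime)            // 25
}
```

The one-line fix in the PR amounts to using this forward-only update in place of a direct assignment when recording a re-acquisition.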

Release note (bug fix): CDC no longer combines with long running transactions to trigger an assertion.

Release justification: fixes a high-priority bug in existing functionality. The bug could crash a server if the right sequence of events occurred. This was typically rare, but was much more common when CDC was in use. This is a fairly large PR, but there's only a single line of non-testing code touched.

@nvb nvb requested a review from sumeerbhola March 21, 2020 03:22
@cockroach-teamcity (Member) commented

This change is Reviewable

@nvb nvb mentioned this pull request Mar 21, 2020
@sumeerbhola (Collaborator) left a comment

:lgtm:

Reviewed 1 of 1 files at r2, 2 of 2 files at r3, 5 of 5 files at r4, 5 of 5 files at r5.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/concurrency/lock_table.go, line 1167 at r5 (raw file):

		}
		l.holder.holder[durability].txn = txn
		l.holder.holder[durability].ts.Forward(ts)

can you add a code comment similar to the one in the PR description that justifies this ratcheting?
Also, could the scenario described in #46019 also have resulted in timestamp regression -- say sequence 2 was at an earlier timestamp than sequence 3 and both are being replayed.


pkg/kv/kvserver/concurrency/testdata/lock_table/lock_changes, line 36 at r5 (raw file):

# Lock is reacquired at same epoch with lower timestamp. This is allowed,
# see TestMVCCHistories/put_out_of_order. The new sequence number is added
# but the timestamp is not regressed.

this probably answers my question about #46019


pkg/storage/testdata/mvcc_histories/put_new_epoch_lower_sequence, line 50 at r2 (raw file):

data: "k"/0.000000001,0 -> /BYTES/v3

# We're expecting v2 here.

s/v2/v3/

nvb added 4 commits March 23, 2020 11:25
These still aren't great, but now they're at least all unique.
Fixes cockroachdb#42310.

Now the test actually does what it says it will do.

Release justification: testing only
Tests a scenario where a put operation of an older timestamp but a
higher epoch comes after a put operation of a newer timestamp but an
older epoch. The timestamp of the resulting intent remains equal to
the higher timestamp - it does not regress.

Transaction coordinators shouldn't issue this series of operations
directly, but it is possible to create such a situation if the
transaction is pushed.

Release justification: testing only
This commit updates the TestConcurrencyManagerBasic framework to bind
all `on-lock-acquired` and `on-lock-updated` commands to specific
requests. This prevents misuse of these hooks. For instance, it ensures
that latches are properly held before allowing the hooks to be executed
(by nature of the guard being in guardsByReqName) and it ensures that
the keys provided to the hooks agree with those in the request
definition.

Release justification: testing only
Fixes cockroachdb#46290.
Fixes cockroachdb#43735.
Fixes cockroachdb#41425.

This change adjusts the lockTable to be more lenient to the timestamp
of new lock acquisitions. Specifically, it lifts the restriction that
all calls to AcquireLock use monotonically increasing timestamps.
Instead, it properly handles apparent timestamp regressions by
ratcheting lock timestamps instead of replacing them directly.

This matches the corresponding behavior in MVCC:
  https://github.com/cockroachdb/cockroach/blob/92107b551bbafe54fddb496442c590cb6feb5d65/pkg/storage/mvcc.go#L1631

This leniency is needed for sequences of events like the following:
- txn A acquires lock at epoch 1, ts 10
- txn B pushes txn A to ts 20
- txn B updates lock to ts 20
- txn A restarts at ts 15 without noticing that it has been pushed
- txn A re-acquires lock at epoch 2, ts 15
- we hit the lock timestamp regression assertion

We see this frequently in CDC roachtests because the rangefeed
processor performs high-priority timestamp pushes on long-running
transactions. Outside of CDC, this is rare in our system.

Release note (bug fix): CDC no longer combines with long
running transactions to trigger an assertion.

Release justification: fixes a high-priority bug in existing
functionality. The bug could crash a server if the right sequence
of events occurred. This was typically rare, but was much more
common when CDC was in use.
@nvb nvb force-pushed the nvanbenschoten/lockTableAcquireUpgrade branch from bd331fd to 3069773 Compare March 23, 2020 15:55
@nvb (Contributor, Author) left a comment

TFTR!

bors r=sumeerbhola

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)


pkg/kv/kvserver/concurrency/lock_table.go, line 1167 at r5 (raw file):

can you add a code comment similar to the one in the PR description that justifies this ratcheting?

Done.

Also, could the scenario described in #46019 also have resulted in timestamp regression -- say sequence 2 was at an earlier timestamp than sequence 3 and both are being replayed.

Discussed below.


pkg/kv/kvserver/concurrency/testdata/lock_table/lock_changes, line 36 at r5 (raw file):

Previously, sumeerbhola wrote…

this probably answers my question about #46019

Yes, within an epoch, all writes should be written with increasing timestamps, but one of the writes could land after a lock is pushed to a higher timestamp. This test exercises this.
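The scenario described above can be replayed as a small sketch, using illustrative names only (this is not the real lockTable API): within an epoch writes carry increasing timestamps, but a push can move the lock above a later re-acquisition, and the ratchet absorbs that without regressing.

```go
package main

import "fmt"

// lockHolder is an illustrative stand-in for a lockTable entry.
type lockHolder struct {
	epoch int
	ts    int64 // simplified timestamp
}

// acquire re-acquires the lock, ratcheting the timestamp forward
// rather than asserting monotonicity across acquisitions.
func (l *lockHolder) acquire(epoch int, ts int64) {
	l.epoch = epoch
	if ts > l.ts {
		l.ts = ts
	}
}

// push simulates another txn pushing the lock to a higher timestamp.
func (l *lockHolder) push(ts int64) {
	if ts > l.ts {
		l.ts = ts
	}
}

func main() {
	l := &lockHolder{}
	l.acquire(1, 10) // txn A acquires at epoch 1, ts 10
	l.push(20)       // txn B pushes the lock to ts 20
	l.acquire(2, 15) // txn A restarts, re-acquires at epoch 2, ts 15
	// The sequence number/epoch advances but the timestamp holds at 20.
	fmt.Println(l.epoch, l.ts) // 2 20
}
```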


pkg/storage/testdata/mvcc_histories/put_new_epoch_lower_sequence, line 50 at r2 (raw file):

Previously, sumeerbhola wrote…

s/v2/v3/

Done.

@craig

craig bot commented Mar 23, 2020

Build succeeded

@craig craig bot merged commit 504b999 into cockroachdb:master Mar 23, 2020
@nvb nvb deleted the nvanbenschoten/lockTableAcquireUpgrade branch March 23, 2020 19:54
nvb added a commit to nvb/cockroach that referenced this pull request Apr 6, 2020
Fixes cockroachdb#46526.
Fixes cockroachdb#46779.
Follow up to cockroachdb#46391.

This change adjusts the lockTable to allow lock timestamp regressions
when necessary. Specifically, it allows a lock's timestamp as reported
by getLockerInfo to regress if the lock is acquired at a lower timestamp
and a different durability than it was previously held with. This is
necessary because of the hard constraint we must uphold here: the
lockHolderInfo for a replicated lock cannot diverge from the replicated
state machine in such a way that its timestamp in the lockTable exceeds
that in the replicated keyspace. If this invariant were to be violated,
we'd risk infinite lock-discovery loops for requests that conflict with
the lock as it is written in the replicated state machine but not as it
is reflected in the lockTable.

Lock timestamp regressions are safe from the perspective of other
transactions because the request which re-acquired the lock at the lower
timestamp must have been holding a write latch at or below the new
lock's timestamp. This means that no conflicting requests could be
evaluating concurrently. Instead, all will need to re-scan the lockTable
once they acquire latches and will notice the reduced timestamp at that
point, which may cause them to conflict with the lock even if they had
not conflicted before. In a sense, it is no different than the first
time a lock is added to the lockTable.

I considered special-casing this logic to look at the new lock's
durability and only allow the regression in the case that the new lock
was replicated and instead forwarding the acquisition timestamp in the
case that the new lock was unreplicated, but doing so seemed complex and
not clearly worth it. The rest of the lock-table supports these lock
timestamp regressions, so adding complexity to conditionally avoid the
case for certain state transitions, based on the lock durabilities,
didn't seem worthwhile. I'm happy to reconsider this decision.

Release note (bug fix): CDC no longer combines with long running
transactions to trigger an assertion with the text "lock timestamp
regression".

Release justification: fixes a high-priority bug in existing
functionality. The bug could crash a server if the right sequence of
events occurred. This was typically rare, but was much more common when
CDC was in use.
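The cross-durability invariant in the follow-up can be sketched under a simplifying assumption: the reported timestamp is the minimum across held durabilities, so a replicated acquisition at a lower timestamp visibly regresses it. All names here are illustrative, not the real lockTable API.

```go
package main

import "fmt"

type durability int

const (
	unreplicated durability = iota
	replicated
)

// lockState is an illustrative stand-in for the lockTable's per-key state.
type lockState struct {
	holder [2]struct {
		held bool
		ts   int64
	}
}

// acquire records an acquisition at the given durability. Within one
// durability the timestamp only ratchets forward, but a first
// acquisition at a different durability installs its timestamp as-is,
// so the reported timestamp can regress.
func (l *lockState) acquire(d durability, ts int64) {
	h := &l.holder[d]
	if !h.held {
		h.held = true
		h.ts = ts
		return
	}
	if ts > h.ts {
		h.ts = ts // ratchet within a durability
	}
}

// reportedTS is the timestamp other requests see: the minimum across
// held durabilities, so it never exceeds the timestamp recorded in the
// replicated keyspace.
func (l *lockState) reportedTS() int64 {
	min := int64(-1)
	for d := range l.holder {
		h := l.holder[d]
		if h.held && (min == -1 || h.ts < min) {
			min = h.ts
		}
	}
	return min
}

func main() {
	var l lockState
	l.acquire(unreplicated, 20)
	fmt.Println(l.reportedTS()) // 20
	// Re-acquisition at a lower timestamp with replicated durability:
	// the reported timestamp regresses to match the replicated state.
	l.acquire(replicated, 15)
	fmt.Println(l.reportedTS()) // 15
}
```

Per the commit message, this regression is safe because the re-acquiring request holds a write latch at or below the new timestamp, so no conflicting request can be evaluating concurrently.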
nvb added a commit to nvb/cockroach that referenced this pull request Apr 7, 2020
craig bot pushed a commit that referenced this pull request Apr 7, 2020
47092: backupccl: stop including system db as complete in full cluster backup r=pbardea a=pbardea

When we perform a full-cluster backup we were including the system
database as a "complete database", however full cluster backup only
backs up a subset of the tables in the system table.

This led to bugs around taking several backups with revision history
since we would look for changes in all table descriptors for system
tables, but we should have only been looking for differences in system
tables that were backed up.

Fixes #47050.

Release justification: fixes release blocker.
Release note (bug fix): in some cases where system tables have changed,
incremental, full-cluster BACKUPs with revision history were sometimes
incorrectly disallowed.

47101: kv/concurrency: permit lock timestamp regression across durabilities r=nvanbenschoten a=nvanbenschoten

Fixes #46526.
Fixes #46779.
Follow up to #46391.


Co-authored-by: Paul Bardea <paul@pbardea.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
nvb added a commit to nvb/cockroach that referenced this pull request Apr 7, 2020