kv/concurrency: never regress timestamp in lockTable on lock acquisition#46391
Conversation
sumeerbhola
left a comment
Reviewed 1 of 1 files at r2, 2 of 2 files at r3, 5 of 5 files at r4, 5 of 5 files at r5.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/concurrency/lock_table.go, line 1167 at r5 (raw file):
```go
}
l.holder.holder[durability].txn = txn
l.holder.holder[durability].ts.Forward(ts)
```
can you add a code comment similar to the one in the PR description that justifies this ratcheting?
Also, could the scenario described in #46019 also have resulted in timestamp regression -- say sequence 2 was at an earlier timestamp than sequence 3 and both are being replayed?
pkg/kv/kvserver/concurrency/testdata/lock_table/lock_changes, line 36 at r5 (raw file):
```
# Lock is reacquired at same epoch with lower timestamp. This is allowed,
# see TestMVCCHistories/put_out_of_order. The new sequence number is added
# but the timestamp is not regressed.
```
this probably answers my question about #46019
pkg/storage/testdata/mvcc_histories/put_new_epoch_lower_sequence, line 50 at r2 (raw file):
data: "k"/0.000000001,0 -> /BYTES/v3 # We're expecting v2 here.
s/v2/v3/
These still aren't great, but now they're at least all unique.
Fixes cockroachdb#42310. Now the test actually does what it says it will do.

Release justification: testing only

Tests a scenario where a put operation at an older timestamp but a higher epoch comes after a put operation at a newer timestamp but an older epoch. The timestamp of the resulting intent remains equal to the higher timestamp - it does not regress. Transaction coordinators shouldn't issue this series of operations directly, but it is possible to create such a situation if the transaction is pushed.

Release justification: testing only

This commit updates the TestConcurrencyManagerBasic framework to bind all `on-lock-acquired` and `on-lock-updated` commands to specific requests. This prevents misuse of these hooks. For instance, it ensures that latches are properly held before allowing the hooks to be executed (by nature of the guard being in guardsByReqName) and it ensures that the keys provided to the hooks agree with those in the request definition.

Release justification: testing only
Fixes cockroachdb#46290. Fixes cockroachdb#43735. Fixes cockroachdb#41425.

This change adjusts the lockTable to be more lenient to the timestamp of new lock acquisitions. Specifically, it lifts the restriction that all calls to AcquireLock use monotonically increasing timestamps. Instead, it properly handles apparent timestamp regressions by ratcheting lock timestamps instead of replacing them directly. This matches the corresponding behavior in MVCC: https://github.com/cockroachdb/cockroach/blob/92107b551bbafe54fddb496442c590cb6feb5d65/pkg/storage/mvcc.go#L1631

This leniency is needed for sequences of events like the following:
- txn A acquires lock at epoch 1, ts 10
- txn B pushes txn A to ts 20
- txn B updates lock to ts 20
- txn A restarts at ts 15 without noticing that it has been pushed
- txn A re-acquires lock at epoch 2, ts 15
- we hit the lock timestamp regression assertion

We see this frequently in CDC roachtests because the rangefeed processor performs high-priority timestamp pushes on long-running transactions. Outside of CDC, this is rare in our system.

Release note (bug fix): CDC no longer combines with long running transactions to trigger an assertion.

Release justification: fixes a high-priority bug in existing functionality. The bug could crash a server if the right sequence of events occurred. This was typically rare, but was much more common when CDC was in use.
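The failure sequence in the commit message can be sketched with a hypothetical, heavily simplified model of the lockTable's per-lock holder state (`lockState`, integer timestamps, and the `acquire`/`update` methods below are illustrative names, not the real API):

```go
package main

import "fmt"

// lockState is a hypothetical, simplified model of the lockTable's
// per-lock holder state; only the fields needed here are shown.
type lockState struct {
	epoch int
	ts    int64 // simplified timestamp
}

// acquire models AcquireLock after the fix: instead of asserting that
// timestamps increase monotonically, it ratchets (forwards) the lock's
// timestamp, mirroring MVCC's behavior.
func (l *lockState) acquire(epoch int, ts int64) {
	l.epoch = epoch
	if ts > l.ts {
		l.ts = ts // forward only; never regress
	}
}

// update models a pusher moving the lock's timestamp forward.
func (l *lockState) update(ts int64) {
	if ts > l.ts {
		l.ts = ts
	}
}

func main() {
	var l lockState
	l.acquire(1, 10) // txn A acquires lock at epoch 1, ts 10
	l.update(20)     // txn B pushes txn A to ts 20
	l.acquire(2, 15) // txn A re-acquires at epoch 2, ts 15; this is the
	// step that previously tripped the regression assertion
	fmt.Println(l.epoch, l.ts) // epoch advances to 2, ts stays at 20
}
```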
bd331fd to 3069773
nvb
left a comment
TFTR!
bors r=sumeerbhola
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)
pkg/kv/kvserver/concurrency/lock_table.go, line 1167 at r5 (raw file):
can you add a code comment similar to the one in the PR description that justifies this ratcheting?
Done.
Also, could the scenario described in #46019 also have resulted in timestamp regression -- say sequence 2 was at an earlier timestamp than sequence 3 and both are being replayed?
Discussed below.
pkg/kv/kvserver/concurrency/testdata/lock_table/lock_changes, line 36 at r5 (raw file):
Previously, sumeerbhola wrote…
this probably answers my question about #46019
Yes, within an epoch, all writes should be written with increasing timestamps, but one of the writes could land after a lock is pushed to a higher timestamp. This test exercises this.
pkg/storage/testdata/mvcc_histories/put_new_epoch_lower_sequence, line 50 at r2 (raw file):
Previously, sumeerbhola wrote…
s/v2/v3/
Done.
Build succeeded
Fixes cockroachdb#46526. Fixes cockroachdb#46779. Follow up to cockroachdb#46391.

This change adjusts the lockTable to allow lock timestamp regressions when necessary. Specifically, it allows a lock's timestamp as reported by getLockerInfo to regress if it is acquired at a lower timestamp and a different durability than it was previously held with. This is necessary to support because of the hard constraint we must uphold here: the lockHolderInfo for a replicated lock cannot diverge from the replicated state machine in such a way that its timestamp in the lockTable exceeds that in the replicated keyspace. If this invariant were to be violated, we'd risk infinite lock-discovery loops for requests that conflict with the lock as it is written in the replicated state machine but not as it is reflected in the lockTable.

Lock timestamp regressions are safe from the perspective of other transactions because the request which re-acquired the lock at the lower timestamp must have been holding a write latch at or below the new lock's timestamp. This means that no conflicting requests could be evaluating concurrently. Instead, all will need to re-scan the lockTable once they acquire latches and will notice the reduced timestamp at that point, which may cause them to conflict with the lock even if they had not conflicted before. In a sense, it is no different than the first time a lock is added to the lockTable.

I considered special-casing this logic to look at the new lock's durability and only allowing the regression in the case that the new lock was replicated, forwarding the acquisition timestamp instead in the case that the new lock was unreplicated, but doing so seemed complex and not clearly worth it. The rest of the lock-table supports these lock timestamp regressions, so adding complexity to conditionally avoid the case for certain state transitions, based on the lock durabilities, didn't seem worthwhile. I'm happy to reconsider this decision.

Release note (bug fix): CDC no longer combines with long running transactions to trigger an assertion with the text "lock timestamp regression".

Release justification: fixes a high-priority bug in existing functionality. The bug could crash a server if the right sequence of events occurred. This was typically rare, but was much more common when CDC was in use.
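The cross-durability rule this follow-up describes can be sketched as follows. `heldLock`, `reacquire`, and the `durability` constants below are hypothetical simplifications for illustration, not the real lockTable types:

```go
package main

import "fmt"

type durability int

const (
	unreplicated durability = iota
	replicated
)

// heldLock is a hypothetical, simplified model of a lock's holder info.
type heldLock struct {
	dur durability
	ts  int64 // simplified timestamp
}

// reacquire sketches the rule from the commit message: when a lock is
// re-acquired with a different durability, the (possibly lower)
// acquisition timestamp is taken as-is, so the lockTable never reports
// a higher timestamp than the replicated state machine holds. Within
// the same durability, the timestamp is still only ratcheted forward.
func (l *heldLock) reacquire(dur durability, ts int64) {
	if dur != l.dur {
		l.dur, l.ts = dur, ts // regression permitted across durabilities
		return
	}
	if ts > l.ts {
		l.ts = ts // same durability: forward only
	}
}

func main() {
	l := heldLock{dur: unreplicated, ts: 20} // pushed to ts 20
	l.reacquire(replicated, 15)              // replicated write at ts 15
	fmt.Println(l.ts)                        // regresses to 15
	l.reacquire(replicated, 12)              // same durability: no regression
	fmt.Println(l.ts)                        // stays at 15
}
```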
47092: backupccl: stop including system db as complete in full cluster backup r=pbardea a=pbardea

When we perform a full-cluster backup, we were including the system database as a "complete database". However, full cluster backup only backs up a subset of the tables in the system database. This led to bugs around taking several backups with revision history, since we would look for changes in all table descriptors for system tables, but we should have only been looking for differences in system tables that were backed up.

Fixes #47050.

Release justification: fixes release blocker.

Release note (bug fix): in some cases where system tables have changed, incremental, full-cluster BACKUPs with revision history were sometimes incorrectly disallowed.

47101: kv/concurrency: permit lock timestamp regression across durabilities r=nvanbenschoten a=nvanbenschoten

Fixes #46526. Fixes #46779. Follow up to #46391. (Full description above.)

Co-authored-by: Paul Bardea <paul@pbardea.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Fixes #46290.
Fixes #43735.
Fixes #41425.
Fixes #43809.
This change adjusts the lockTable to be more lenient to the timestamp of new lock acquisitions. Specifically, it lifts the restriction that all calls to AcquireLock use monotonically increasing timestamps. Instead, it properly handles apparent timestamp regressions by ratcheting lock timestamps instead of replacing them directly.
This matches the corresponding behavior in MVCC: pkg/storage/mvcc.go, line 1631 at 92107b5.
This leniency is needed for sequences of events like the following:
- txn A acquires lock at epoch 1, ts 10
- txn B pushes txn A to ts 20
- txn B updates lock to ts 20
- txn A restarts at ts 15 without noticing that it has been pushed
- txn A re-acquires lock at epoch 2, ts 15
- we hit the lock timestamp regression assertion
We see this frequently in CDC roachtests because the rangefeed processor performs high-priority timestamp pushes on long-running transactions. Outside of CDC, this is rare in our system.
Release note (bug fix): CDC no longer combines with long running transactions to trigger an assertion.
Release justification: fixes a high-priority bug in existing functionality. The bug could crash a server if the right sequence of events occurred. This was typically rare, but was much more common when CDC was in use. This is a fairly large PR, but there's only a single line of non-testing code touched.