kv/tscache: implement timestamp cache serialization#118299
Merged
craig[bot] merged 3 commits intocockroachdb:masterfrom Jan 31, 2024
Merged
kv/tscache: implement timestamp cache serialization#118299craig[bot] merged 3 commits intocockroachdb:masterfrom
craig[bot] merged 3 commits intocockroachdb:masterfrom
Conversation
This was the only caller and the separation of concerns did not seem meaningful. Release note: None
This commit refactors sklPage.lookupTimestampRange to scan using a visitor function. This will be used in a subsequent commit to implement serialization of an sklPage. Release note: None
Informs cockroachdb#61986. This commit adds a new `Serialize` function to `tscache.Cache` implementations. This serialization uses the `readsummary/rspb.Segment` representation added in 0950a1e. Serialization of the production `sklImpl` uses the Segment merging logic added in e4fc6f1 in order to merge together a partial serializations of each individual `sklPage` in the data structure. Timestamp cache serialization will be used to address cockroachdb#61986. Release note: None
Member
nvb
added a commit
to nvb/cockroach
that referenced
this pull request
Jan 29, 2024
… range merges Fixes cockroachdb#61986. Fixes cockroachdb#117486. Unblocks cockroachdb#118000. This commit uses the new `tscache.Cache.Serialize` method introduced in cockroachdb#118299 to ship high-resolution summaries of the timestamp cache during lease transfers and ranges merges. In doing so, it eliminates the loss of precision that occurs in an incoming leaseholder's timestamp cache when it receives a lease transfer or range merge. This loss of precision was a source of transaction retries for three reasons: 1. txn tombstone marker keys would have their timestamp advanced, leading to TransactionAbortedError with the `ABORT_REASON_NEW_LEASE_PREVENTS_TXN` reason. 2. txn push marker keys would have their timestamp advanced, leading to transactions having their commit timestamp pushed, which could lead to TransactionRetryError with the `RETRY_SERIALIZABLE` reason. 3. global keys would have their timestamp advanced as if they had been read, also leading to transactions having their commit timestamp pushed if they wrote to those keys, which could also lead to TransactionRetryError with the `RETRY_SERIALIZABLE` reason. The first issue here is the most severe, both because it can not be refreshed away and because it affects transactions of all isolation levels. This commit introduces two new cluster settings to control the maximum size of these timestamp cache read summaries: - `kv.lease_transfer_read_summary.local_budget` - `kv.lease_transfer_read_summary.global_budget` It configures the local keyspace budget to 4MB and the global keyspace budget to 0B. This default configuration should be sufficient to eliminate the first two sources of retries described above. The third has not been observed as a serious issue in practice, so we configure the global budget to 0 so that we can hit a serialization fast-path. Release note (ops change): Transaction replay protection state is now passed between the outgoing and incoming leaseholder for a range during a lease transfer. This avoids cases where lease transfers can cause transactions to throw TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN) errors.
arulajmani
approved these changes
Jan 31, 2024
Collaborator
arulajmani
left a comment
There was a problem hiding this comment.
Sorry for the long turnaround.
Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 7 of 7 files at r3, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/tscache/cache_test.go line 898 at r3 (raw file):
}) // Clear the cache and re-populate it from the serialized spans.
Nice test!
Contributor
Author
|
TFTR! bors r+ |
Contributor
|
Build succeeded: |
nvb
added a commit
to nvb/cockroach
that referenced
this pull request
Jan 31, 2024
… range merges Fixes cockroachdb#61986. Fixes cockroachdb#117486. Unblocks cockroachdb#118000. This commit uses the new `tscache.Cache.Serialize` method introduced in cockroachdb#118299 to ship high-resolution summaries of the timestamp cache during lease transfers and ranges merges. In doing so, it eliminates the loss of precision that occurs in an incoming leaseholder's timestamp cache when it receives a lease transfer or range merge. This loss of precision was a source of transaction retries for three reasons: 1. txn tombstone marker keys would have their timestamp advanced, leading to TransactionAbortedError with the `ABORT_REASON_NEW_LEASE_PREVENTS_TXN` reason. 2. txn push marker keys would have their timestamp advanced, leading to transactions having their commit timestamp pushed, which could lead to TransactionRetryError with the `RETRY_SERIALIZABLE` reason. 3. global keys would have their timestamp advanced as if they had been read, also leading to transactions having their commit timestamp pushed if they wrote to those keys, which could also lead to TransactionRetryError with the `RETRY_SERIALIZABLE` reason. The first issue here is the most severe, both because it can not be refreshed away and because it affects transactions of all isolation levels. This commit introduces two new cluster settings to control the maximum size of these timestamp cache read summaries: - `kv.lease_transfer_read_summary.local_budget` - `kv.lease_transfer_read_summary.global_budget` It configures the local keyspace budget to 4MB and the global keyspace budget to 0B. This default configuration should be sufficient to eliminate the first two sources of retries described above. The third has not been observed as a serious issue in practice, so we configure the global budget to 0 so that we can hit a serialization fast-path. Release note (ops change): Transaction replay protection state is now passed between the outgoing and incoming leaseholder for a range during a lease transfer. This avoids cases where lease transfers can cause transactions to throw TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN) errors.
nvb
added a commit
to nvb/cockroach
that referenced
this pull request
Feb 1, 2024
… range merges Fixes cockroachdb#61986. Fixes cockroachdb#117486. Unblocks cockroachdb#118000. This commit uses the new `tscache.Cache.Serialize` method introduced in cockroachdb#118299 to ship high-resolution summaries of the timestamp cache during lease transfers and ranges merges. In doing so, it eliminates the loss of precision that occurs in an incoming leaseholder's timestamp cache when it receives a lease transfer or range merge. This loss of precision was a source of transaction retries for three reasons: 1. txn tombstone marker keys would have their timestamp advanced, leading to TransactionAbortedError with the `ABORT_REASON_NEW_LEASE_PREVENTS_TXN` reason. 2. txn push marker keys would have their timestamp advanced, leading to transactions having their commit timestamp pushed, which could lead to TransactionRetryError with the `RETRY_SERIALIZABLE` reason. 3. global keys would have their timestamp advanced as if they had been read, also leading to transactions having their commit timestamp pushed if they wrote to those keys, which could also lead to TransactionRetryError with the `RETRY_SERIALIZABLE` reason. The first issue here is the most severe, both because it can not be refreshed away and because it affects transactions of all isolation levels. This commit introduces two new cluster settings to control the maximum size of these timestamp cache read summaries: - `kv.lease_transfer_read_summary.local_budget` - `kv.lease_transfer_read_summary.global_budget` It configures the local keyspace budget to 4MB and the global keyspace budget to 0B. This default configuration should be sufficient to eliminate the first two sources of retries described above. The third has not been observed as a serious issue in practice, so we configure the global budget to 0 so that we can hit a serialization fast-path. Release note (ops change): Transaction replay protection state is now passed between the outgoing and incoming leaseholder for a range during a lease transfer. This avoids cases where lease transfers can cause transactions to throw TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN) errors.
craig bot
pushed a commit
that referenced
this pull request
Feb 1, 2024
115746: kv: log on excessive latch hold duration r=lyang24 a=lyang24 This commit aims to help observability by logging request holding latches over threshold. long_latch_hold_duration is a new cluster setting that is introduced to set the latch holding time threshold, latches held over the threshold will be logged at most every second. To achieve logging spanlatch.manager now contains a pointer to cluster setting. Fixes: #114609 Release note: None 118300: kv: ship high-resolution tscache summaries during lease transfers and range merges r=nvanbenschoten a=nvanbenschoten Fixes #61986. Fixes #117486. Unblocks #118000. This commit uses the new `tscache.Cache.Serialize` method introduced in #118299 to ship high-resolution summaries of the timestamp cache during lease transfers and ranges merges. In doing so, it eliminates the loss of precision that occurs in an incoming leaseholder's timestamp cache when it receives a lease transfer or range merge. This loss of precision was a source of transaction retries for three reasons: 1. txn tombstone marker keys would have their timestamp advanced, leading to TransactionAbortedError with the `ABORT_REASON_NEW_LEASE_PREVENTS_TXN` reason. 2. txn push marker keys would have their timestamp advanced, leading to transactions having their commit timestamp pushed, which could lead to TransactionRetryError with the `RETRY_SERIALIZABLE` reason. 3. global keys would have their timestamp advanced as if they had been read, also leading to transactions having their commit timestamp pushed if they wrote to those keys, which could also lead to TransactionRetryError with the `RETRY_SERIALIZABLE` reason. The first issue here is the most severe, both because it can not be refreshed away and because it affects transactions of all isolation levels. This commit introduces two new cluster settings to control the maximum size of these timestamp cache read summaries: - `kv.lease_transfer_read_summary.local_budget` - `kv.lease_transfer_read_summary.global_budget` It configures the local keyspace budget to 4MB and the global keyspace budget to 0B. This default configuration should be sufficient to eliminate the first two sources of retries described above. The third has not been observed as a serious issue in practice, so we configure the global budget to 0 so that we can hit a serialization fast-path. Release note (ops change): Transaction replay protection state is now passed between the outgoing and incoming leaseholder for a range during a lease transfer. This avoids cases where lease transfers can cause transactions to throw TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN) errors. Co-authored-by: lyang24 <lanqingy@usc.edu> Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Informs #61986.
This commit adds a new
Serializefunction totscache.Cacheimplementations. This serialization uses thereadsummary/rspb.Segmentrepresentation added in 0950a1e. Serialization of the productionsklImpluses the Segment merging logic added in e4fc6f1 in order to merge together a partial serializations of each individualsklPagein the data structure.Timestamp cache serialization will be used to address #61986.
Release note: None