[DNM] sql/kv: introduce dynamic staleness reads with staleness bounds #62239
Closed
nvb wants to merge 1 commit into cockroachdb:master from
This commit introduces a form of single-row dynamic staleness reads with user-provided staleness bounds. It does so through the introduction of a new `with_max_staleness` function that can be used in an `AS OF SYSTEM TIME` clause to configure a single-statement, read-only transaction to use a dynamic staleness timestamp.

When this function is used, Cockroach chooses the newest timestamp within the staleness bound that allows execution of the reads at the closest available replica without blocking. If such a read is not satisfiable from the closest available replica without exceeding the staleness bound, the query is rejected, though we could also consider redirecting to other replicas.

This is just a prototype to get the end-to-end functionality hooked up. The biggest piece missing here is any query plan validation to actually enforce the single-row limitation we desire here. Without enforcing a single-row limitation, this form of read can serve an inconsistent snapshot across different rows.
nvb added a commit to nvb/cockroach that referenced this pull request on Jun 3, 2021
Bounded staleness reads are a form of historical read-only queries that use a dynamic, system-determined timestamp, subject to a user-provided staleness bound, to read from nearby replicas while minimizing data staleness. They provide a new way to perform follower reads off local replicas to minimize query latency in multi-region clusters. Bounded staleness reads complement CockroachDB's existing mechanism for performing follower reads, which was originally proposed in [this RFC](20180603_follower_reads.md) and later adopted in [this RFC](20181227_follower_reads_implementation.md). This original form of follower reads is more precisely classified as an exact staleness read, meaning that the read occurs at a statically chosen timestamp, regardless of the state of the system.

Exact staleness and bounded staleness reads can exist side-by-side, as there are trade-offs between the two in terms of cost, staleness, and applicability. In general, bounded staleness reads are more powerful because they minimize staleness while being tolerant to variable replication lag, but they come at the expense of being more costly and usable in fewer places. Bounded staleness queries are limited in use to single-statement read-only queries, and only a subset of read-only queries at that. They will be accessed in the same way as exact staleness reads, through a pair of new functions that can be passed to an `AS OF SYSTEM TIME` clause:

- `SELECT ... FROM ... AS OF SYSTEM TIME with_min_timestamp(TIMESTAMP)`
- `SELECT ... FROM ... AS OF SYSTEM TIME with_max_staleness(INTERVAL)`

The approach discussed in this RFC has a prototype in cockroachdb#62239 which, while not identical to what is proposed here, is similar and demonstrates the high-level changes that are needed to support bounded staleness reads.
nvb added a commit to nvb/cockroach that referenced this pull request on Jun 14, 2021
nvb added a commit to nvb/cockroach that referenced this pull request on Jun 22, 2021
nvb added a commit to nvb/cockroach that referenced this pull request on Jun 30, 2021
craig bot pushed a commit that referenced this pull request on Jun 30, 2021
66020: rfc: Bounded Staleness Reads r=nvanbenschoten a=nvanbenschoten

Bounded staleness reads are a form of historical read-only queries that use a dynamic, system-determined timestamp, subject to a user-provided staleness bound, to read from nearby replicas while minimizing data staleness. They provide a new way to perform follower reads off local replicas to minimize query latency in multi-region clusters. Bounded staleness reads complement CockroachDB's existing mechanism for performing follower reads, which was originally proposed in [this RFC](20180603_follower_reads.md) and later adopted in [this RFC](20181227_follower_reads_implementation.md). This original form of follower reads is more precisely classified as an exact staleness read, meaning that the read occurs at a statically chosen timestamp, regardless of the state of the system.

Exact staleness and bounded staleness reads can exist side-by-side, as there are trade-offs between the two in terms of cost, staleness, and applicability. In general, bounded staleness reads are more powerful because they minimize staleness while being tolerant to variable replication lag, but they come at the expense of being more costly and usable in fewer places. Bounded staleness queries are limited in use to single-statement read-only queries, and only a subset of read-only queries at that. They will be accessed in the same way as exact staleness reads, through a pair of new functions that can be passed to an `AS OF SYSTEM TIME` clause:

- `SELECT ... FROM ... AS OF SYSTEM TIME with_min_timestamp(TIMESTAMP)`
- `SELECT ... FROM ... AS OF SYSTEM TIME with_max_staleness(INTERVAL)`

The approach discussed in this RFC has a prototype in #62239 which, while not identical to what is proposed here, is similar and demonstrates the high-level changes that are needed to support bounded staleness reads.

The RFC lays out a three-stage progression towards the generalized implementation of bounded staleness reads. We only intend to implement the first step in the v21.2 release.

67055: roachtest: add error tracing around gorm r=rafiss a=otan

Release note: None

Refs: #66825

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
This was referenced Jul 13, 2021
bounded.staleness.demo.mov