
[DNM] sql/kv: introduce dynamic staleness reads with staleness bounds #62239

Closed
nvb wants to merge 1 commit into cockroachdb:master from nvb:nvanbenschoten/boundedStalenessReads

@nvb nvb commented Mar 19, 2021

This commit introduces a form of single-row dynamic staleness reads with
user-provided staleness bounds. It does so through the introduction of a
new `with_max_staleness` function that can be used in an
`AS OF SYSTEM TIME` clause to configure a single-statement, read-only
transaction to use a dynamic staleness timestamp.

When this function is used, Cockroach chooses the newest timestamp
within the staleness bound that allows execution of the reads at the
closest available replica without blocking. If such a read is not
satisfiable from the closest available replica without exceeding the
staleness bound, the query is rejected, though we could also consider
redirecting to other replicas.

This is just a prototype to get the end-to-end functionality hooked up.
The biggest piece missing here is any query plan validation to actually
enforce the single-row limitation we desire here. Without enforcing a
single-row limitation, this form of read can serve an inconsistent
snapshot across different rows.

bounded.staleness.demo.mov

@nvb nvb requested a review from a team March 19, 2021 07:28
@nvb nvb requested a review from a team as a code owner March 19, 2021 07:28
@nvb nvb requested review from miretskiy and removed request for a team March 19, 2021 07:28
@cockroach-teamcity

This change is Reviewable

@nvb nvb removed request for a team and miretskiy March 19, 2021 07:29
nvb added a commit to nvb/cockroach that referenced this pull request Jun 3, 2021
Bounded staleness reads are a form of historical read-only queries that use a
dynamic, system-determined timestamp, subject to a user-provided staleness
bound, to read from nearby replicas while minimizing data staleness. They
provide a new way to perform follower reads off local replicas to minimize query
latency in multi-region clusters.

Bounded staleness reads complement CockroachDB's existing mechanism for
performing follower reads, which was originally proposed in [this RFC](20180603_follower_reads.md)
and later adopted in [this RFC](20181227_follower_reads_implementation.md).
This original form of follower reads is more precisely classified as an
exact staleness read, meaning that the read occurs at a statically chosen
timestamp, regardless of the state of the system.

Exact staleness and bounded staleness reads can exist side-by-side, as there are
trade-offs between the two in terms of cost, staleness, and applicability. In
general, bounded staleness reads are more powerful because they minimize
staleness while being tolerant to variable replication lag, but they come at the
expense of being more costly and usable in fewer places.

Bounded staleness queries are limited in use to single-statement read-only
queries, and only a subset of read-only queries at that. They will be accessed
in the same way as exact staleness reads, through the `AS OF SYSTEM TIME`
clause, using a pair of new functions:
- `SELECT ... FROM ... AS OF SYSTEM TIME with_min_timestamp(TIMESTAMP)`
- `SELECT ... FROM ... AS OF SYSTEM TIME with_max_staleness(INTERVAL)`

The approach discussed in this RFC has a prototype in
cockroachdb#62239 which, while not identical
to what is proposed here, is similar and demonstrates the high-level changes
that are needed to support bounded staleness reads.
nvb added a commit to nvb/cockroach that referenced this pull request Jun 14, 2021
nvb added a commit to nvb/cockroach that referenced this pull request Jun 22, 2021
nvb added a commit to nvb/cockroach that referenced this pull request Jun 30, 2021
craig bot pushed a commit that referenced this pull request Jun 30, 2021
66020: rfc: Bounded Staleness Reads r=nvanbenschoten a=nvanbenschoten

The RFC lays out a three-stage progression towards the generalized implementation of bounded staleness reads. We only intend to implement the first step in the v21.2 release.

67055: roachtest: add error tracing around gorm r=rafiss a=otan

Release note: None

Refs: #66825

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
@nvb nvb closed this Sep 10, 2021
@nvb nvb deleted the nvanbenschoten/boundedStalenessReads branch September 14, 2021 03:27