Skip to content

kv: implement server-side negotiation fast-path and Txn.NegotiateAndSend#68194

Merged
craig[bot] merged 3 commits intocockroachdb:masterfrom
nvb:nvanbenschoten/negotiateAndSend
Aug 17, 2021
Merged

kv: implement server-side negotiation fast-path and Txn.NegotiateAndSend#68194
craig[bot] merged 3 commits intocockroachdb:masterfrom
nvb:nvanbenschoten/negotiateAndSend

Conversation

@nvb
Copy link
Copy Markdown
Contributor

@nvb nvb commented Jul 28, 2021

Fixes #67551.
Fixes #67552.
Fixes #67553.
Touches #67562.

Bounded-staleness read orchestration consists of two phases - negotiation and execution. Negotiation determines the timestamp to run the query at in order to ensure that the read will not block on replication or on conflicting transactions. Execution then uses this timestamp to run the read request.

This commit implements the bounded staleness server-side negotiation fast-path. This fast-path allows a bounded staleness read request that lands on a single range to perform its negotiation phase and execution phase in a single RPC.

The server-side negotiation fast-path provides two benefits:

  1. it avoids two network hops in the common-case where a bounded staleness read is targeting a single range. This in an important performance optimization for single-row point lookups.
  2. it provides stronger guarantees around minimizing staleness during bounded staleness reads. Bounded staleness reads that hit the server-side fast-path use their target replica's most up-to-date resolved timestamp, so they are as fresh as possible. Bounded staleness reads that miss the fast-path and perform explicit negotiation (see below) consult a cache, so they may use an out-of-date, suboptimal resolved timestamp, as long as it is fresh enough to satisfy the staleness bound of the request.

The commit then uses this new functionality to implement the (*Txn).NegotiateAndSend method detailed in the bounded staleness RFC. NegotiateAndSend is a specialized version of Send that is capable of orchestrating a bounded-staleness read through a transaction, given a read-only BatchRequest with a min_timestamp_bound set in its Header. If the method returns successfully, the transaction will have been given a fixed timestamp equal to the timestamp that the read-only request was evaluated at.

@nvb nvb requested review from a team, aliher1911, otan and tbg July 28, 2021 19:00
@nvb nvb requested a review from a team as a code owner July 28, 2021 19:00
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@nvb nvb force-pushed the nvanbenschoten/negotiateAndSend branch 3 times, most recently from 7ac1c02 to 8b290a8 Compare July 29, 2021 03:56
@tbg tbg removed their request for review July 29, 2021 15:54
Copy link
Copy Markdown
Contributor

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me (can build on top of this), but can't verify the KV stuff as i'm not familiar enough with it all

}

func (e *MinTimestampBoundUnsatisfiableError) Error() string {
return e.message(nil)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add a pgcode to this? or is there a place up the stack we usually add a pgcode to this kind of thing?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a way that I'm aware of to attach pgcodes to KV-layer errors. Instead, we have a translation layer in pkg/sql/row. See ConvertBatchError (which is growing a bit in #68042).

@nvb nvb force-pushed the nvanbenschoten/negotiateAndSend branch from 8b290a8 to 9b4e491 Compare August 2, 2021 02:54
@blathers-crl blathers-crl bot requested a review from otan August 2, 2021 02:54
nvb added a commit to nvb/cockroach that referenced this pull request Aug 2, 2021
This commit strengthens the guarantees of the sidetransportAccess, which
now promises that the returned timestamp from its get method will never
regress across sequential calls. This is made possible by retaining two
closed timestamps in the access - a `cur` closed timestamp that is known
to be applied and an optional `next` closed timestamp that is not yet
applied. Using two timestamps ensures that we never lose information
about an applied closed timestamp, so we can make the stronger guarantee
to users of the abstraction that returned timestamp will always increase
monotonically across consecutive calls.

This was causing flakiness in tests added in cockroachdb#68194. The property-based
tests in that PR were asserting (in one form or another) that after a
`QueryResolvedTimestamp` request returns a resolved timestamp, a
subsequent read-only request at that timestamp will never block on
replication (i.e. the closed timestamp) or on conflicting transactions
when executed. These tests were very rarely finding this not to be the
case. The reason for this was that it was possible for the
`sidetransportAccess` to regress across two sequential calls.
nvb added a commit to nvb/cockroach that referenced this pull request Aug 2, 2021
This commit strengthens the guarantees of the sidetransportAccess, which
now promises that the returned timestamp from its get method will never
regress across sequential calls. This is made possible by retaining two
closed timestamps in the access - a `cur` closed timestamp that is known
to be applied and an optional `next` closed timestamp that is not yet
applied. Using two timestamps ensures that we never lose information
about an applied closed timestamp, so we can make the stronger guarantee
to users of the abstraction that returned timestamp will always increase
monotonically across consecutive calls.

This was causing flakiness in tests added in cockroachdb#68194. The property-based
tests in that PR were asserting (in one form or another) that after a
`QueryResolvedTimestamp` request returns a resolved timestamp, a
subsequent read-only request at that timestamp will never block on
replication (i.e. the closed timestamp) or on conflicting transactions
when executed. These tests were very rarely finding this not to be the
case. The reason for this was that it was possible for the
`sidetransportAccess` to regress across two sequential calls.
@nvb nvb force-pushed the nvanbenschoten/negotiateAndSend branch 2 times, most recently from 5c86cac to f2d0f09 Compare August 2, 2021 17:36
@nvb
Copy link
Copy Markdown
Contributor Author

nvb commented Aug 2, 2021

@otan I appended a third commit here which adds a new min_timestamp_bound_strict flag to the BatchRequest.Header struct. This flag specifies whether a bounded staleness read whose min_timestamp_bound cannot be satisfied by the first replica it visits without blocking should be rejected with a MinTimestampBoundUnsatisfiableError or will be redirected to the leaseholder and permitted to block on conflicting transactions. You'll need this to implement the nearest_only option as well as to implement the descriptor iteration logic that steps back across table descriptor versions.

@aliher1911 I assigned you as a reviewer on this (just the last 3 commits). Let me know if you'd like to walk through this together at some point. I'm happy to explain the code or the context surrounding this change.

@nvb nvb force-pushed the nvanbenschoten/negotiateAndSend branch from f2d0f09 to b1ffa97 Compare August 2, 2021 18:15
@nvb
Copy link
Copy Markdown
Contributor Author

nvb commented Aug 3, 2021

@otan I added one more commit here with the max_timestamp_bound field we had talked about.

nvb added a commit to nvb/cockroach that referenced this pull request Aug 4, 2021
This commit asserts that a transaction has not been used to read or to
write by the time that `SetFixedTimestamp` is called on it.

This was extracted from cockroachdb#68194 and modified to return an error from
`SetFixedTimestamp` on misuse instead of fatal-ing. This provides a
sufficient, temporary backstop for cockroachdb#68216 until the conn executor logic
is fixed:

```
root@127.0.0.1:26257/movr> create table t (x int);
CREATE TABLE

root@127.0.0.1:26257/movr> insert into t values (1);
INSERT 1

root@127.0.0.1:26257/movr> select crdb_internal_mvcc_timestamp, * from t;
   crdb_internal_mvcc_timestamp  | x
---------------------------------+----
  1628094563935439000.0000000000 | 1
(1 row)

root@127.0.0.1:26257/movr> begin as of system time (1628094563935439000.0000000000-1)::string;
BEGIN

root@127.0.0.1:26257/movr  OPEN> select * from t;
  x
-----
(0 rows)

root@127.0.0.1:26257/movr  OPEN> prepare y as select * from t as of system time 1628094563935439000.0000000000;
ERROR: internal error: cannot set fixed timestamp, txn "sql txn" meta={id=e5e81c19 pri=0.01517572 epo=0 ts=1628094563.935438999,0 min=1628094563.935438999,0 seq=0} lock=false stat=PENDING rts=1628094563.935438999,0 wto=false gul=1628094563.935438999,0 already performed reads
SQLSTATE: XX000
DETAIL: stack trace:
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_coord_sender.go:1016: SetFixedTimestamp()
github.com/cockroachdb/cockroach/pkg/kv/txn.go:1200: SetFixedTimestamp()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:278: populatePrepared()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:220: func1()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:226: prepare()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:112: addPreparedStmt()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:570: execStmtInOpenState()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:126: execStmt()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1626: func1()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1628: execCmd()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1550: run()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:627: ServeConn()
github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:645: func1()
runtime/asm_amd64.s:1371: goexit()

HINT: You have encountered an unexpected error.

Please check the public issue tracker to check whether this problem is
already tracked. If you cannot find it there, please report the error
with details by creating a new issue.

If you would rather not post publicly, please contact us directly
using the support form.

We appreciate your feedback.

root@127.0.0.1:26257/? ERROR>
```
nvb added a commit to nvb/cockroach that referenced this pull request Aug 4, 2021
This commit asserts that a transaction has not been used to read or to
write by the time that `SetFixedTimestamp` is called on it.

This was extracted from cockroachdb#68194 and modified to return an error from
`SetFixedTimestamp` on misuse instead of fatal-ing. This provides a
sufficient, temporary backstop for cockroachdb#68216 until the conn executor logic
is fixed:

```
root@127.0.0.1:26257/movr> create table t (x int);
CREATE TABLE

root@127.0.0.1:26257/movr> insert into t values (1);
INSERT 1

root@127.0.0.1:26257/movr> select crdb_internal_mvcc_timestamp, * from t;
   crdb_internal_mvcc_timestamp  | x
---------------------------------+----
  1628094563935439000.0000000000 | 1
(1 row)

root@127.0.0.1:26257/movr> begin as of system time (1628094563935439000.0000000000-1)::string;
BEGIN

root@127.0.0.1:26257/movr  OPEN> select * from t;
  x
-----
(0 rows)

root@127.0.0.1:26257/movr  OPEN> prepare y as select * from t as of system time 1628094563935439000.0000000000;
ERROR: internal error: cannot set fixed timestamp, txn "sql txn" meta={id=e5e81c19 pri=0.01517572 epo=0 ts=1628094563.935438999,0 min=1628094563.935438999,0 seq=0} lock=false stat=PENDING rts=1628094563.935438999,0 wto=false gul=1628094563.935438999,0 already performed reads
SQLSTATE: XX000
DETAIL: stack trace:
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_coord_sender.go:1016: SetFixedTimestamp()
github.com/cockroachdb/cockroach/pkg/kv/txn.go:1200: SetFixedTimestamp()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:278: populatePrepared()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:220: func1()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:226: prepare()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:112: addPreparedStmt()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:570: execStmtInOpenState()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:126: execStmt()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1626: func1()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1628: execCmd()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1550: run()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:627: ServeConn()
github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:645: func1()
runtime/asm_amd64.s:1371: goexit()

HINT: You have encountered an unexpected error.

Please check the public issue tracker to check whether this problem is
already tracked. If you cannot find it there, please report the error
with details by creating a new issue.

If you would rather not post publicly, please contact us directly
using the support form.

We appreciate your feedback.

root@127.0.0.1:26257/? ERROR>
```
craig bot pushed a commit that referenced this pull request Aug 5, 2021
67866: sql: implement SQL Stats flush logic r=Azhng a=Azhng

Previous PR: #67805
Next Chained PR: #67090

## First Commit

sql: remove `count` from stmt/txn stats system table

Previously, system.statement_statistics and
system.transaction_statistics table includes a `count` column
that corresponds to `roachpb.StatementStatistics.Count` and
`roachpb.TransactionStatistics.Count` fields respectively.
The objective for that column is to make
`INSERT ON CONFLICT DO UPDATE` style query easy. However,
since early prototyping have shown that
`INSERT ON CONFLICT DO UPDATE` style statement is quite inefficient,
the SQL Stats flush mechanism will be implemented using
separate queries INSERT and UPDATE statements.
This column is no longer userful and it would require special handling.
Removing this column simplifies the flush logic and removes the
need for special handlings.

Release note (sql change): count column is removed from
 system.statement_statistics and system.transaction_statistics
 tables.

## Second Commit

sql: implement persistedsqlstats flush logic

This commit implements the initial flush logic of the
persisted sql stats subsystem.

Release note: None

68426: kv: assert txn unused in SetFixedTimestamp r=nvanbenschoten a=nvanbenschoten

This commit asserts that a transaction has not been used to read or to
write by the time that `SetFixedTimestamp` is called on it.

This was extracted from #68194 and modified to return an error from
`SetFixedTimestamp` on misuse instead of fatal-ing. This provides a
sufficient, temporary backstop for #68216 until the conn executor logic
is fixed:

```
root@127.0.0.1:26257/movr> create table t (x int);
CREATE TABLE

root@127.0.0.1:26257/movr> insert into t values (1);
INSERT 1

root@127.0.0.1:26257/movr> select crdb_internal_mvcc_timestamp, * from t;
   crdb_internal_mvcc_timestamp  | x
---------------------------------+----
  1628094563935439000.0000000000 | 1
(1 row)

root@127.0.0.1:26257/movr> begin as of system time (1628094563935439000.0000000000-1)::string;
BEGIN

root@127.0.0.1:26257/movr  OPEN> select * from t;
  x
-----
(0 rows)

root@127.0.0.1:26257/movr  OPEN> prepare y as select * from t as of system time 1628094563935439000.0000000000;
ERROR: internal error: cannot set fixed timestamp, txn "sql txn" meta={id=e5e81c19 pri=0.01517572 epo=0 ts=1628094563.935438999,0 min=1628094563.935438999,0 seq=0} lock=false stat=PENDING rts=1628094563.935438999,0 wto=false gul=1628094563.935438999,0 already performed reads
SQLSTATE: XX000
DETAIL: stack trace:
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_coord_sender.go:1016: SetFixedTimestamp()
github.com/cockroachdb/cockroach/pkg/kv/txn.go:1200: SetFixedTimestamp()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:278: populatePrepared()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:220: func1()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:226: prepare()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:112: addPreparedStmt()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:570: execStmtInOpenState()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:126: execStmt()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1626: func1()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1628: execCmd()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1550: run()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:627: ServeConn()
github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:645: func1()
runtime/asm_amd64.s:1371: goexit()

HINT: You have encountered an unexpected error.

Please check the public issue tracker to check whether this problem is
already tracked. If you cannot find it there, please report the error
with details by creating a new issue.

If you would rather not post publicly, please contact us directly
using the support form.

We appreciate your feedback.

root@127.0.0.1:26257/? ERROR>
```

68442: kv: include RangeID in rangefeed goroutine stacks r=nvanbenschoten a=nvanbenschoten

This commit includes the RangeID in each of a rangefeed processor and
its registations' associated goroutine stacks. This is a cheap and easy
way to get better observability into the ranges that have active
rangefeeds. It also tells us where those goroutines are spending their
time.

This will also become easier to use in Go 1.17, which improved the
format of stack traces.

68443: parser: add VIRTUAL syntax to help r=RaduBerinde a=rafiss

Release note: None

Co-authored-by: Azhng <archer.xn@gmail.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
@nvb nvb force-pushed the nvanbenschoten/negotiateAndSend branch from 7fc008e to b8fa893 Compare August 5, 2021 20:53
sajjadrizvi pushed a commit to sajjadrizvi/cockroach that referenced this pull request Aug 10, 2021
This commit asserts that a transaction has not been used to read or to
write by the time that `SetFixedTimestamp` is called on it.

This was extracted from cockroachdb#68194 and modified to return an error from
`SetFixedTimestamp` on misuse instead of fatal-ing. This provides a
sufficient, temporary backstop for cockroachdb#68216 until the conn executor logic
is fixed:

```
root@127.0.0.1:26257/movr> create table t (x int);
CREATE TABLE

root@127.0.0.1:26257/movr> insert into t values (1);
INSERT 1

root@127.0.0.1:26257/movr> select crdb_internal_mvcc_timestamp, * from t;
   crdb_internal_mvcc_timestamp  | x
---------------------------------+----
  1628094563935439000.0000000000 | 1
(1 row)

root@127.0.0.1:26257/movr> begin as of system time (1628094563935439000.0000000000-1)::string;
BEGIN

root@127.0.0.1:26257/movr  OPEN> select * from t;
  x
-----
(0 rows)

root@127.0.0.1:26257/movr  OPEN> prepare y as select * from t as of system time 1628094563935439000.0000000000;
ERROR: internal error: cannot set fixed timestamp, txn "sql txn" meta={id=e5e81c19 pri=0.01517572 epo=0 ts=1628094563.935438999,0 min=1628094563.935438999,0 seq=0} lock=false stat=PENDING rts=1628094563.935438999,0 wto=false gul=1628094563.935438999,0 already performed reads
SQLSTATE: XX000
DETAIL: stack trace:
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_coord_sender.go:1016: SetFixedTimestamp()
github.com/cockroachdb/cockroach/pkg/kv/txn.go:1200: SetFixedTimestamp()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:278: populatePrepared()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:220: func1()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:226: prepare()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:112: addPreparedStmt()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:570: execStmtInOpenState()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:126: execStmt()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1626: func1()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1628: execCmd()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1550: run()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:627: ServeConn()
github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:645: func1()
runtime/asm_amd64.s:1371: goexit()

HINT: You have encountered an unexpected error.

Please check the public issue tracker to check whether this problem is
already tracked. If you cannot find it there, please report the error
with details by creating a new issue.

If you would rather not post publicly, please contact us directly
using the support form.

We appreciate your feedback.

root@127.0.0.1:26257/? ERROR>
```
craig bot pushed a commit that referenced this pull request Aug 10, 2021
68042: sql: support lock_timeout session variable r=nvanbenschoten a=nvanbenschoten

Fixes #67513.

Please ignore the first two commits, which are from #68026.

In #68026, we added support for a new `lock_timeout` field on the `BatchRequest.Header` struct. 

A batch's lock_timeout specifies the maximum amount of time that it will wait while attempting to acquire a lock on a key or while blocking on an existing lock in order to perform a non-locking read on a key. The time limit applies separately to each lock acquisition attempt. If the timeout elapses when waiting for a lock, a WriteIntentError is returned.

This extension to the KV API allows us to properly support the `lock_timeout` session variable in SQL, which matches these semantics closely. When configured, this session variable aborts any statement that waits longer than the specified amount of time while attempting to acquire a single lock.

The only deviation from Postgres' behavior is that the lock_timeout configuration applies to non-locking reads in read-write and read-only transactions. This is because unlike in Postgres, where where non-locking reads do not wait on conflicting locks, in CockroachDB, non-locking reads do wait for conflicting locks to be released.

Release note (sql change): The lock_timeout session variable is now supported. The configuration can be used to abort a query with an error if it waits longer than the configured duration while blocking on a single row-level lock acquisition.

@jordanlewis I'm adding you for the review, but feel free to re-assign.

68313: kv: avoid regressions in closed timestamp sidetransportAccess r=nvanbenschoten a=nvanbenschoten

This commit strengthens the guarantees of the `sidetransportAccess`, which now promises that the returned timestamp from its get method will never regress across sequential calls. This is made possible by retaining two closed timestamps in the access - a `cur` closed timestamp that is known to be applied and an optional `next` closed timestamp that is not yet applied. Using two timestamps ensures that we never lose information about an applied closed timestamp, so we can make the stronger guarantee to users of the abstraction that returned timestamp will always increase monotonically across consecutive calls.

This was causing flakiness in tests added in #68194. The property-based tests in that PR were asserting (in one form or another) that after a `QueryResolvedTimestamp` request returns a resolved timestamp, a subsequent read-only request at that timestamp will never block on replication (i.e. the closed timestamp) or on conflicting transactions when executed. These tests were very rarely finding this not to be the case. The reason for this was that it was possible for the `sidetransportAccess` to regress across two sequential calls.

cc. @andreimatei, but not assigning because you're away.

68553: util/log: start garbage collection routines for old log files r=knz a=rauchenstein

This fixes a regression.  The call the routine that deletes old log
files when max-group-size is configured was incorrectly removed in a
refactor, so that functionality was disabled.  This change reinstates
it.

Release note (bug fix): File logs respect max-group-size configuration.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Jay Rauchenstein <rauchenstein@cockroachlabs.com>
@nvb nvb force-pushed the nvanbenschoten/negotiateAndSend branch 2 times, most recently from 77e11c7 to 6c63a6a Compare August 10, 2021 23:09
@nvb nvb requested a review from aayushshah15 August 11, 2021 02:59
@nvb
Copy link
Copy Markdown
Contributor Author

nvb commented Aug 11, 2021

@aayushshah15 thanks for offering to give this a review. It's now rebased on master, so it no longer has any dependencies.

Copy link
Copy Markdown
Contributor

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: mod that one question.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aliher1911, @nvanbenschoten, and @otan)


pkg/kv/txn.go, line 1129 at r8 (raw file):

	assert(!ba.IsLocking(), "batch must not be locking")
	assert(txn.typ == RootTxn, "txn must be root")
	assert(!txn.CommitTimestampFixed(), "txn commit timestamp must not be fixed")

Would it ever be possible for the SQL layer to set the max_timestamp_bound lower than a given min_timestamp_bound? is it worth asserting against that here?
I see that executeServerSideBoundedStalenessNegotiation could return a timestamp lower than the min_timestamp_bound in such a case.


pkg/server/node.go, line 1037 at r7 (raw file):

		ctx, newSpan = tracing.EnsureChildSpan(ctx, tr, opName)
		// Set the same span.kind tag as the gRPC interceptor.
		newSpan.SetTag(ext.SpanKindRPCServer.Key, ext.SpanKindRPCServer.Value)

OOC, was this needed to get the OnlyFollowerReads helper to work?

nvb added 2 commits August 16, 2021 17:57
Fixes cockroachdb#67551.
Fixes cockroachdb#67552.
Fixes cockroachdb#67553.
Touches cockroachdb#67562.

Bounded-staleness read orchestration consists of two phases -
negotiation and execution. Negotiation determines the timestamp to run
the query at in order to ensure that the read will not block on
replication or on conflicting transactions. Execution then uses this
timestamp to run the read request.

This commit implements the bounded staleness server-side negotiation
fast-path. This fast-path allows a bounded staleness read request that
lands on a single range to perform its negotiation phase and execution
phase in a single RPC.

The server-side negotiation fast-path provides two benefits:
1. it avoids two network hops in the common-case where a bounded
   staleness read is targeting a single range. This in an important
   performance optimization for single-row point lookups.
2. it provides stronger guarantees around minimizing staleness during
   bounded staleness reads. Bounded staleness reads that hit the
   server-side fast-path use their target replica's most up-to-date
   resolved timestamp, so they are as fresh as possible. Bounded
   staleness reads that miss the fast-path and perform explicit
   negotiation (see below) consult a cache, so they may use an
   out-of-date, suboptimal resolved timestamp, as long as it is fresh
   enough to satisfy the staleness bound of the request.

The commit then uses this new functionality to implement the
`(*Txn).NegotiateAndSend` method detailed in the bounded staleness RFC.
`NegotiateAndSend` is a specialized version of `Send` that is capable of
orchestrating a bounded-staleness read through a transaction, given a
read-only BatchRequest with a `min_timestamp_bound` set in its Header.
If the method returns successfully, the transaction will have been given
a fixed timestamp equal to the timestamp that the read-only request was
evaluated at.
This commit adds a `min_timestamp_bound_strict` flag to the BatchRequest
header. This flag specifies whether a bounded staleness read whose
min_timestamp_bound cannot be satisfied by the first replica it visits
(subject to routing_policy) without blocking should be rejected with a
MinTimestampBoundUnsatisfiableError or will be redirected to the
leaseholder and permitted to block on conflicting transactions. If the
flag is true, blocking is never permitted and users should be prepared
to handle MinTimestampBoundUnsatisfiableErrors. If the flag is false,
blocking is permitted and MinTimestampBoundUnsatisfiableErrors will
never be returned.

This will be used by SQL to implement the `nearest_only` bounded
staleness option described in the RFC. It will also be used to implement
the descriptor iteration logic that steps back across table descriptor
versions and only ever blocks on the earliest descriptor.
@nvb nvb force-pushed the nvanbenschoten/negotiateAndSend branch from 6c63a6a to 04eb552 Compare August 16, 2021 22:16
@nvb nvb requested a review from a team as a code owner August 16, 2021 22:16
Copy link
Copy Markdown
Contributor Author

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15, @aliher1911, and @otan)


pkg/kv/txn.go, line 1129 at r8 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

Would it ever be possible for the SQL layer to set the max_timestamp_bound lower than a given min_timestamp_bound? is it worth asserting against that here?
I see that executeServerSideBoundedStalenessNegotiation could return a timestamp lower than the min_timestamp_bound in such a case.

Good point. I added a check for this.


pkg/server/node.go, line 1037 at r7 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

OOC, was this needed to get the OnlyFollowerReads helper to work?

Yes, exactly. OnlyFollowerReads matches on sp.Tags["span.kind"] == "server", which we weren't setting on the local RPC-bypass.

This commit adds a `max_timestamp_bound` field to the BatchRequest
header. This field places an upper bound on the dynamically chosen
timestamp during a bounded staleness read. If the field is set and a
resolved timestamp over the request's key span is computed to be greater
than the maximum timestamp bound, the batch timestamp will be set to the
maximum timestamp bound instead, then the batch will be evaluated at this
timestamp, and it will be recorded in the BatchResponse Header.

This will be used by SQL to ensure that once a schema version is chosen
for a given table, bounded staleness reads on that table do not observe
data past its expiration time. Without this, it would be possible for
these reads to observe key-value state that is incoherent with the table
descriptor in use.
@nvb nvb force-pushed the nvanbenschoten/negotiateAndSend branch from 04eb552 to 2556ac6 Compare August 17, 2021 01:40
@nvb
Copy link
Copy Markdown
Contributor Author

nvb commented Aug 17, 2021

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 17, 2021

Build succeeded:

@craig craig bot merged commit dd82053 into cockroachdb:master Aug 17, 2021
@nvb nvb deleted the nvanbenschoten/negotiateAndSend branch August 28, 2021 16:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

4 participants