Skip to content

kv: implement commit-wait for future time transaction commits#61110

Merged
craig[bot] merged 2 commits intocockroachdb:masterfrom
nvb:nvanbenschoten/commitWait
Feb 27, 2021
Merged

kv: implement commit-wait for future time transaction commits#61110
craig[bot] merged 2 commits intocockroachdb:masterfrom
nvb:nvanbenschoten/commitWait

Conversation

@nvb
Copy link
Copy Markdown
Contributor

@nvb nvb commented Feb 25, 2021

Fixes #57687.
Related to #52745.

This PR introduces a "commit-wait" sleep stage after a transaction commits, which is entered if doing so is deemed necessary for consistency.

By default, commit-wait is only necessary for transactions that commit with a future-time timestamp that leads the local HLC clock. This is because CockroachDB's consistency model depends on all transactions waiting until their commit timestamp is below their gateway clock. In doing so, transactions ensure that at the time that they complete, all other clocks in the system (i.e. on all possible gateways) will be no more than the max_offset below the transaction's commit timestamp. This property ensures that all causally dependent transactions will have an uncertainty interval (see GlobalUncertaintyLimit) that exceeds the original transaction's commit timestamp, preventing stale reads. Without the wait, it would be possible for a read-write transaction to write a future-time value and then for a causally dependent transaction to read below that future-time value, violating "read your writes".

The property must also hold for read-only transactions, which may have a commit timestamp in the future due to an uncertainty restart after observing a future-time value in their uncertainty interval. In such cases, the property that the transaction must wait for the local HLC clock to exceed its commit timestamp is not necessary to prevent stale reads, but it is necessary to ensure monotonic reads. Without the wait, it would be possible for a read-only transaction coordinated on a gateway with a fast clock to return a future-time value and then for a causally dependent read-only transaction coordinated on a gateway with a slow clock to read below that future-time value, violating "monotonic reads".

In practice, most transactions do not need to wait at all, because their commit timestamps were pulled from an HLC clock (i.e. are not synthetic) and so they will be guaranteed to lead the local HLC's clock, assuming proper HLC time propagation. Only transactions whose commit timestamps were pushed into the future will need to wait, like those who wrote to a global_read range and got bumped by the closed timestamp or those who conflicted (write-read or write-write) with an existing future-time value.

However, CockroachDB also supports a stricter model of consistency through its "linearizable" flag. When in linearizable mode (also known as "strict serializable" mode), all writing transactions (but not read-only transactions) must wait an additional max_offset after committing to ensure that their commit timestamp is below the current HLC clock time of any other node in the system. In doing so, all causally dependent transactions are guaranteed to start with higher timestamps, regardless of the gateway they use. This ensures that all causally dependent transactions commit with higher timestamps, even if their read and writes sets do not conflict with the original transaction's. This obviates the need for uncertainty intervals and prevents the "causal reverse" anamoly which can be observed by a third, concurrent transaction.

For more, see https://www.cockroachlabs.com/blog/consistency-model/ and docs/RFCS/20200811_non_blocking_txns.md.


The PR also fixes a bug by properly marking read-only txn as aborted on rollback, which was missed in a85115a.

We were assuming that all calls to commitReadOnlyTxnLocked were for EndTxn requests with the Commit flag set to true, but this is not the case. This was not only confusing, but it was also leading to the txn.commit metric being incremented on rollback of a read-only transaction, instead of the txn.aborts metric.

Release justification: New functionality.

@nvb nvb requested review from a team and andreimatei February 25, 2021 05:54
@nvb nvb requested a review from a team as a code owner February 25, 2021 05:54
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@nvb nvb force-pushed the nvanbenschoten/commitWait branch from a722798 to 33ec88b Compare February 25, 2021 06:56
Copy link
Copy Markdown
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: very satisfying

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvclient/kvcoord/txn_coord_sender_test.go, line 1388 at r2 (raw file):

				require.Empty(t, errC)

				adv := futureOffset / 5

I don't really understand what's going on here
Generally I think some words about how the test is going to test the sleep duration would help


pkg/kv/kvclient/kvcoord/txn_coord_sender_test.go, line 1402 at r2 (raw file):

			}
			require.True(t, minClockTS.Less(s.Clock.Now()))
		} else {

nit: put the else branch first and return early


pkg/kv/kvclient/kvcoord/txn_metrics.go, line 26 at r2 (raw file):

	Commits1PC      *metric.Counter // Commits which finished in a single phase
	ParallelCommits *metric.Counter // Commits which entered the STAGING state
	CommitWaits     *metric.Counter // Commits that waited for linearizability

"single-key linearizability" :)


pkg/kv/kvclient/kvcoord/txn_metrics.go, line 77 at r2 (raw file):

	metaCommitWaitRates = metric.Metadata{
		Name:        "txn.commit_waits",
		Help:        "Number of KV transactions that had to commit-wait on commit",

perhaps "Had to sleep on commit in order to ensure serializability. This generally happens to transactions writing to global ranges.".


pkg/util/hlc/hlc.go, line 490 at r2 (raw file):

}

// SleepUntil sleeps until the given time.

consider adding more words to the comment emphasizing the all the caclulations are done in HLC time, not physical time

nvb added 2 commits February 26, 2021 23:53
This was missed in a85115a.

We were assuming that all calls to `commitReadOnlyTxnLocked` were for
`EndTxn` requests with the Commit flag set to true, but this is not the
case. This was not only confusing, but it was also leading to the
`txn.commit` metric being incremented on rollback of a read-only
transaction, instead of the `txn.aborts` metric.

Release justification: bug fix
Fixes cockroachdb#57687.
Related to cockroachdb#52745.

This commit introduces a "commit-wait" sleep stage after a transaction
commits, which is entered if doing so is deemed
necessary for consistency.

By default, commit-wait is only necessary for transactions that commit
with a future-time timestamp that leads the local HLC clock. This is
because CockroachDB's consistency model depends on all transactions
waiting until their commit timestamp is below their gateway clock. In
doing so, transactions ensure that at the time that they complete, all
other clocks in the system (i.e. on all possible gateways) will be no
more than the max_offset below the transaction's commit timestamp. This
property ensures that all causally dependent transactions will have an
uncertainty interval (see GlobalUncertaintyLimit) that exceeds the
original transaction's commit timestamp, preventing stale reads. Without
the wait, it would be possible for a read-write transaction to write a
future-time value and then for a causally dependent transaction to read
below that future-time value, violating "read your writes".

The property must also hold for read-only transactions, which may have a
commit timestamp in the future due to an uncertainty restart after
observing a future-time value in their uncertainty interval. In such
cases, the property that the transaction must wait for the local HLC
clock to exceed its commit timestamp is not necessary to prevent stale
reads, but it is necessary to ensure monotonic reads. Without the wait,
it would be possible for a read-only transaction coordinated on a
gateway with a fast clock to return a future-time value and then for a
causally dependent read-only transaction coordinated on a gateway with a
slow clock to read below that future-time value, violating "monotonic
reads".

In practice, most transactions do not need to wait at all, because their
commit timestamps were pulled from an HLC clock (i.e. are not synthetic)
and so they will be guaranteed to lead the local HLC's clock, assuming
proper HLC time propagation. Only transactions whose commit timestamps
were pushed into the future will need to wait, like those who wrote to a
global_read range and got bumped by the closed timestamp or those who
conflicted (write-read or write-write) with an existing future-time
value.

However, CockroachDB also supports a stricter model of consistency
through its "linearizable" flag. When in linearizable mode (also known
as "strict serializable" mode), all writing transactions (but not
read-only transactions) must wait an additional max_offset after
committing to ensure that their commit timestamp is below the current
HLC clock time of any other node in the system. In doing so, all
causally dependent transactions are guaranteed to start with higher
timestamps, regardless of the gateway they use. This ensures that all
causally dependent transactions commit with higher timestamps, even if
their read and writes sets do not conflict with the original
transaction's. This obviates the need for uncertainty intervals and
prevents the "causal reverse" anamoly which can be observed by a third,
concurrent transaction.

For more, see https://www.cockroachlabs.com/blog/consistency-model/ and
docs/RFCS/20200811_non_blocking_txns.md.

Release notes: None

Release justification: New functionality.
@nvb nvb force-pushed the nvanbenschoten/commitWait branch from 33ec88b to 7ec4212 Compare February 27, 2021 05:08
Copy link
Copy Markdown
Contributor Author

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei)


pkg/kv/kvclient/kvcoord/txn_coord_sender_test.go, line 1388 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I don't really understand what's going on here
Generally I think some words about how the test is going to test the sleep duration would help

Done.


pkg/kv/kvclient/kvcoord/txn_coord_sender_test.go, line 1402 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: put the else branch first and return early

Done.


pkg/kv/kvclient/kvcoord/txn_metrics.go, line 26 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

"single-key linearizability" :)

Not if the --linearizability flag is set!


pkg/kv/kvclient/kvcoord/txn_metrics.go, line 77 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

perhaps "Had to sleep on commit in order to ensure serializability. This generally happens to transactions writing to global ranges.".

Done.


pkg/util/hlc/hlc.go, line 490 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

consider adding more words to the comment emphasizing the all the caclulations are done in HLC time, not physical time

Done.

@nvb
Copy link
Copy Markdown
Contributor Author

nvb commented Feb 27, 2021

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Feb 27, 2021

Build succeeded:

@craig craig bot merged commit dcc8ab0 into cockroachdb:master Feb 27, 2021
@nvb nvb deleted the nvanbenschoten/commitWait branch March 1, 2021 04:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

kv: implement commit-wait for future time transaction commits

3 participants