mvcc: don't write below provisional commit timestamp on WriteTooOld errors by nvb · Pull Request #35086 · cockroachdb/cockroach

nvb · 2019-02-20T06:12:28Z

This change fixes #34853, which appears to have been broken by
57d0201#diff-41e9e1b7829f8e25daeb564f42c26017L1514.
That change made it so that intents are now written at a transaction's
provisional commit timestamp instead of its original timestamp (which is
fantastic!). In doing so, it removed special-purpose code to forward the
MVCCLogicalOpDetails timestamp to the provisional commit timestamp of the
transaction because it assumed that the new writeTimestamp variable that the
change introduced already assumed this role.

Unfortunately, the change allowed this writeTimestamp variable to regress
below the transaction's provisional commit timestamp in subtle cases where a
committed value ended up between the transaction's original timestamp and its
provisional commit timestamp (a case that generates WriteTooOldErrors). In
this case, we were replacing the writeTimestamp with the committed value's
timestamp + 1 logical tick. This regression allowed MVCCWriteIntentOps to look
like they were ignoring closed timestamps.

The fix was to either introduce the special purpose code again or to avoid
letting writeTimestamp regress in this WriteTooOldError code path. This PR
chose the latter solution.

In doing so, this PR also improves upon the benefit already provided by #32487 -
it reduces the number of intents that need to be moved on transaction commit.
This is because it ensures that if a transaction has an original (read)
timestamp below a committed value and a provisional commit (write) timestamp
above a committed value, an intent left by that transaction will be in a
"committable" state without a need to rewrite it during intent resolution.

Release note: None

cockroach-teamcity · 2019-02-20T06:12:35Z

This change is

tbg

Great that you got to the bottom of this so fast! Man, that method is hard to follow. I gave this what I hope is a thorough review but will give it another one when you've figured out the lost update.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @danhhz, @nvanbenschoten, and @tbg)

pkg/storage/engine/mvcc.go, line 1466 at r1 (raw file):

			}
		} else if !metaTimestamp.Less(readTimestamp) {
			// This is the case where we're trying to write under a committed

Not necessarily, right? So far we only know that there's a committed write above the read timestamp (but the write timestamp could be above the committed value). Nvm, you mull on that below.

pkg/storage/engine/mvcc.go, line 1483 at r1 (raw file):

			// we'll need to refresh anyway, right? It might also be interacting
			// poorly with our logic in `evaluateBatch`. Any idea @bdarnell or
			// @tbg?

For snapshot transactions, this is just lost update -- two txn start at origts=100, their commit ts advances to 150 and 200 respectively, and then txn1 writes key=value1 (and commits) and txn2 cputs key from nil to value2, in effect wiping out value1 without ever having to observe it. I also don't see how this applies to serializable txns, though. As you mention, if we lay down an intent we're going to have to refresh anyway, and that should see the update that would be lost and force a restart. Worth getting to the bottom of even though I'm not sure avoiding the WTO is a good idea here. After all, what are the odds the write will survive the refresh? At that point we might as well check right here whether it would survive a refresh, though that seems to be optimizing for something we don't know is worth the added complexity.

PS this method probably has a cyclomatic complexity of MaxUint64... not that I see any easy way of changing that.

pkg/storage/engine/mvcc.go, line 1484 at r1 (raw file):

			// poorly with our logic in `evaluateBatch`. Any idea @bdarnell or
			// @tbg?
			writeTimestamp.Forward(metaTimestamp.Next())

Does this writeTimestamp ever make it back to the txn in the response?

pkg/storage/engine/mvcc.go, line 1497 at r1 (raw file):

			} else {
				// Outside of a transaction, read the latest value and advance
				// the write timestamp to the latest value's timestamp + 1. The

This part of the comment seems out of place because this was already done (also before this PR).

pkg/storage/engine/mvcc.go, line 1498 at r1 (raw file):

				// Outside of a transaction, read the latest value and advance
				// the write timestamp to the latest value's timestamp + 1. The
				// new timestamp is returned to the caller in maybeTooOldErr.

Confused by this. So we do actually write but then also return maybeTooOldErr? I doubt that.

nvb

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @danhhz and @tbg)

pkg/storage/engine/mvcc.go, line 1466 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Not necessarily, right? So far we only know that there's a committed write above the read timestamp (but the write timestamp could be above the committed value). Nvm, you mull on that below.

Ack.

pkg/storage/engine/mvcc.go, line 1483 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

For snapshot transactions, this is just lost update -- two txn start at origts=100, their commit ts advances to 150 and 200 respectively, and then txn1 writes key=value1 (and commits) and txn2 cputs key from nil to value2, in effect wiping out value1 without ever having to observe it. I also don't see how this applies to serializable txns, though. As you mention, if we lay down an intent we're going to have to refresh anyway, and that should see the update that would be lost and force a restart. Worth getting to the bottom of even though I'm not sure avoiding the WTO is a good idea here. After all, what are the odds the write will survive the refresh? At that point we might as well check right here whether it would survive a refresh, though that seems to be optimizing for something we don't know is worth the added complexity.

PS this method probably has a cyclomatic complexity of MaxUint64... not that I see any easy way of changing that.

Mystery solved!

pkg/storage/engine/mvcc.go, line 1484 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Does this writeTimestamp ever make it back to the txn in the response?

Yes, the writeTimestamp makes it back into the txn response here:

cockroach/pkg/storage/replica_evaluate.go

Line 288 in df358ac

ba.Txn.Timestamp.Forward(tErr.ActualTimestamp)

pkg/storage/engine/mvcc.go, line 1497 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

This part of the comment seems out of place because this was already done (also before this PR).

Updated.

pkg/storage/engine/mvcc.go, line 1498 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Confused by this. So we do actually write but then also return maybeTooOldErr? I doubt that.

It's surprising, but yes we do. Dating all the way back to 4220d4a!

tbg

Reviewed 1 of 2 files at r1, 1 of 1 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @danhhz and @nvanbenschoten)

pkg/storage/engine/mvcc.go, line 1483 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Mystery solved!

How does this mechanism work exactly? If a CPut runs into a WTOErr, it "locally" (store?) retries the batch at the suggested timestamp and then returns the txn retry error? How does your attempted change break that? Shouldn't we be removing the WTO, but still returning a txn that needs to refresh if it wants to have any chance of committing? Why does it get to commit anyway?

pkg/storage/engine/mvcc.go, line 1498 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

It's surprising, but yes we do. Dating all the way back to 4220d4a!

But we're not in a transaction in this branch? The write must not be committed, so why do it?
This is probably not the right PR to be pulling on this thread, but add a TODO or some explanation, if applicable.

…rrors Fixes cockroachdb#34853. This change fixes cockroachdb#34853, which appears to have been broken by cockroachdb@57d0201#diff-41e9e1b7829f8e25daeb564f42c26017L1514. That change made it so that intents are now written at a transaction's provisional commit timestamp instead of its original timestamp (which is fantastic!). In doing so, it removed special purpose code to forward the `MVCCLogicalOpDetails` timestamp to the provisional commit timestamp of the transaction because it assumed that the new `writeTimestamp` variable that the change introduced already assumed this role. Unfortunately, the change allowed this `writeTimestamp` variable to regress below the transaction's provisional commit timestamp in subtle cases where a committed value ended up between the transaction's original timestamp and its provisional commit timestamp (a case that generates `WriteTooOldError`s). In this case, we were replacing the `writeTimestamp` with the committed value's timestamp + 1 logical tick. This regression allowed `MVCCWriteIntentOp`s to look like they were ignoring closed timestamps. The fix was to either introduce the special purpose code again or to avoid letting `writeTimestamp` regress in this `WriteTooOldError` code path. This PR chose the latter solution. In doing so, this PR also improves upon the benefit already provided by cockroachdb#32487 - it reduces the number of intents that need to be moved on transaction commit. This is because it ensures that if a transaction has an original (read) timestamp below a committed value and a provisional commit (write) timestamp above a committed value, an intent left by that transaction will be in a "committable" state without a need to rewrite it during intent resolution. Release note: None

nvb

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @danhhz and @tbg)

pkg/storage/engine/mvcc.go, line 1483 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

How does this mechanism work exactly? If a CPut runs into a WTOErr, it "locally" (store?) retries the batch at the suggested timestamp and then returns the txn retry error? How does your attempted change break that? Shouldn't we be removing the WTO, but still returning a txn that needs to refresh if it wants to have any chance of committing? Why does it get to commit anyway?

The mechanism is all in evaluateBatch. The idea isn't to locally retry anything. Instead, that function's purpose is to run all commands in a batch even if they are throwing WTO errors to determine the highest timestamp needed for later retries.

The function then makes a few decision. First, for non-transactional requests, it always returns the largest WriteTooOldError. So to answer the question below, we will still return the error instead of committing, but we will wait until the entire non-transactional batch has finished evaluating so that we can return the "best" error.

For transactional requests, things are more interesting. For suitable request types like blind Puts, it sets the batch responses timestamp and its WriteTooOld flag. It can then go ahead and writes these intents. However, for others that are not blind like CPut, it returns the WriteTooOld error immediately instead of writing the intents. This allows these request types to avoid ever needing to be refreshed. This is also what broke when I changed this to rely on CPuts refreshing.

I added some more commentary in the code.

pkg/storage/engine/mvcc.go, line 1498 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

But we're not in a transaction in this branch? The write must not be committed, so why do it?
This is probably not the right PR to be pulling on this thread, but add a TODO or some explanation, if applicable.

Answered above.

tbg

Reviewed 1 of 1 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @danhhz)

nvb · 2019-02-21T16:27:49Z

bors r=tbg

35081: sql: Fix test flake r=andy-kimball a=andy-kimball Filter out PushTxn/ResolveIntent trace messages, since they represent intent cleanup from earlier operations that interferes with subsequent tests. This was already being done in other variations, so this commit extends that to all the variations that are doing filtering of messages. Fixes #34911 Release note: None 35086: mvcc: don't write below provisional commit timestamp on WriteTooOld errors r=tbg a=nvanbenschoten Fixes #34853. This change fixes #34853, which appears to have been broken by 57d0201#diff-41e9e1b7829f8e25daeb564f42c26017L1514. That change made it so that intents are now written at a transaction's provisional commit timestamp instead of its original timestamp (which is fantastic!). In doing so, it removed special-purpose code to forward the `MVCCLogicalOpDetails` timestamp to the provisional commit timestamp of the transaction because it assumed that the new `writeTimestamp` variable that the change introduced already assumed this role. Unfortunately, the change allowed this `writeTimestamp` variable to regress below the transaction's provisional commit timestamp in subtle cases where a committed value ended up between the transaction's original timestamp and its provisional commit timestamp (a case that generates `WriteTooOldError`s). In this case, we were replacing the `writeTimestamp` with the committed value's timestamp + 1 logical tick. This regression allowed `MVCCWriteIntentOp`s to look like they were ignoring closed timestamps. The fix was to either introduce the special purpose code again or to avoid letting `writeTimestamp` regress in this `WriteTooOldError` code path. This PR chose the latter solution. In doing so, this PR also improves upon the benefit already provided by #32487 - it reduces the number of intents that need to be moved on transaction commit. This is because it ensures that if a transaction has an original (read) timestamp below a committed value and a provisional commit (write) timestamp above a committed value, an intent left by that transaction will be in a "committable" state without a need to rewrite it during intent resolution. Release note: None 35112: kv: assert an absence of retries in TestTxnCoordSenderRetries r=tbg,andreimatei a=nvanbenschoten The test was only asserting that cases that should observe retries did, not the cases that should not observe retries did not. The PR also fixes `TestPropagateTxnOnError`, which wasn't really testing anything. We now set txn.Writing on the client, so it wasn't verifying what it thought it was. We switch it to assert that observed timestamps are propagated on error instead. 35118: storage: Downgrade a spammy log message r=tbg a=bdarnell This log message only occurs on clusters with multiple stores per node, but on those clusters it fires multiple times per second. Release note: None Co-authored-by: Andrew Kimball <andyk@cockroachlabs.com> Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com> Co-authored-by: Ben Darnell <ben@bendarnell.com>

craig · 2019-02-21T17:04:05Z

Build succeeded

GitHub CI (Cockroach)

nvb requested review from a team, danhhz and tbg February 20, 2019 06:12

tbg reviewed Feb 20, 2019

View reviewed changes

nvb force-pushed the nvanbenschoten/writeTooOldTs branch from ab8ccbf to 5e2f362 Compare February 20, 2019 23:42

nvb commented Feb 20, 2019

View reviewed changes

tbg reviewed Feb 21, 2019

View reviewed changes

nvb force-pushed the nvanbenschoten/writeTooOldTs branch from 5e2f362 to 6b49279 Compare February 21, 2019 15:27

nvb commented Feb 21, 2019

View reviewed changes

tbg approved these changes Feb 21, 2019

View reviewed changes

craig bot merged commit 6b49279 into cockroachdb:master Feb 21, 2019

nvb deleted the nvanbenschoten/writeTooOldTs branch February 21, 2019 18:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mvcc: don't write below provisional commit timestamp on WriteTooOld errors#35086

mvcc: don't write below provisional commit timestamp on WriteTooOld errors#35086
craig[bot] merged 1 commit intocockroachdb:masterfrom
nvb:nvanbenschoten/writeTooOldTs

nvb commented Feb 20, 2019

Uh oh!

cockroach-teamcity commented Feb 20, 2019

Uh oh!

tbg left a comment

Uh oh!

nvb left a comment

Uh oh!

tbg left a comment

Uh oh!

nvb left a comment

Uh oh!

tbg left a comment

Uh oh!

nvb commented Feb 21, 2019

Uh oh!

craig bot commented Feb 21, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

nvb commented Feb 20, 2019

Uh oh!

cockroach-teamcity commented Feb 20, 2019

Uh oh!

tbg left a comment

Choose a reason for hiding this comment

Uh oh!

nvb left a comment

Choose a reason for hiding this comment

Uh oh!

tbg left a comment

Choose a reason for hiding this comment

Uh oh!

nvb left a comment

Choose a reason for hiding this comment

Uh oh!

tbg left a comment

Choose a reason for hiding this comment

Uh oh!

nvb commented Feb 21, 2019

Uh oh!

craig bot commented Feb 21, 2019

Build succeeded

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants