rangefeed: fix handling of aborted transactions found by txnPushAttempt by nvb · Pull Request #35889 · cockroachdb/cockroach

nvb · 2019-03-18T18:51:40Z

The fix in #35777 inadvertently broke txnPushAttempt and its interaction with
transactions that were found to be aborted. This commit restores this handling
by reverting part of the assertion added in #35772 and splitting up the handling
of "aborted intents" from the handling of "aborted transactions".

Aborted transactions were always one of the most tricky situations to handle in
rangefeed. The challenge is that rangefeed may be tracking a transaction that
has been abandoned and may want to push it to unblock its resolved timestamp,
but it doesn't have a way to know where the aborted transaction's intents live
on its range. The aborted transaction record can't tell us this either. So the
only thing rangefeed can do is push the transaction and convince itself that the
aborted intents on its range can be safely ignored. This poses a problem for
accurate reference counting of intents though. We need to be able to discard a
transaction and still handle the eventual resolution of its aborted intents.

To handle this properly, the change splits up MVCCAbortIntentOp into
MVCCAbortIntentOp and MVCCAbortTxnOp. Before this change, the single
operation type was attempting to fill both roles, but they mean fundamentally
different things. MVCCAbortTxnOp is then given back its ability to delete
transactions from the queue directly without worrying about their reference
count. Next, the unresolvedIntentQueue is extended to be a little more
flexible. Instead of asserting that reference counts never drop below zero, it just
ignores changes that would make reference counts negative. This has the potential
to "leak" reference counts that saturate at zero and are then incremented again,
but we know that this will only ever happen for aborted transactions, which can
always be pushed again to rediscover that they have been aborted.

Release note: None

cockroach-teamcity · 2019-03-18T18:51:48Z

This change is

danhhz

@tbg and/or @bdarnell can I get your eyes on this, too? I'm not quite the careful reviewer that the two of you are and it seems like this is all more complicated than anyone initially thought

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @danhhz and @nvanbenschoten)

pkg/storage/engine/enginepb/mvcc3.proto, line 247 at r1 (raw file):

  MVCCCommitIntentOp commit_intent = 4;
  MVCCAbortIntentOp  abort_intent  = 5;
  MVCCAbortTxnOp     abort_txn     = 6;

What happens if one of these gets sent during a rolling upgrade to a node that doesn't have this branch in the if? Looks like a panic?

nvb

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @danhhz, and @tbg)

pkg/storage/engine/enginepb/mvcc3.proto, line 247 at r1 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

What happens if one of these gets sent during a rolling upgrade to a node that doesn't have this branch in the if? Looks like a panic?

We only ever send them to our local rangefeed from txnPushAttempt tasks, so that's not a concern.

nvb · 2019-03-18T23:25:55Z

pkg/storage/rangefeed/resolved_timestamp.go

 	return false
 }

+// Del removes the transaction from the queue. It returns whether the update had


This method is exactly what was removed in https://github.com/cockroachdb/cockroach/pull/35777/files#diff-425bf0ccd6243bd7bb88910fc308bdf4L424.

tbg

What's the reasoning behind clamping the refcounts to zero? If we let them go negative, wouldn't they either go back to zero or the txn would be deleted wholesale thanks to having been aborted? I'm asking because a) that seems conceptually simpler and b) perhaps we could retain some assertions? For example, if the queue is init'ed and we're seeing a negative refcount, then either it goes away organically or we eventually find that the txn is aborted. In particular, a "stuck" negative refcount for which we discover a committed txn would trigger an error. Maybe this is is also possible with the clamping, not sure. I don't understand the significance of the value zero. I thought once a refcount of zero was reached, the txn would be deleted from the queue, but I don't think that's happening here.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @nvanbenschoten, and @tbg)

pkg/storage/engine/enginepb/mvcc3.proto, line 219 at r1 (raw file):

instance

pkg/storage/rangefeed/processor_test.go, line 667 at r1 (raw file):

	p.ConsumeLogicalOps(
		writeIntentOpWithKey(txn1Meta.ID, txn1Meta.Key, txn1Meta.Timestamp),
		writeIntentOpWithKey(txn2Meta.ID, txn2Meta.Key, txn2Meta.Timestamp),

What's this change about?

nvb

What's the reasoning behind clamping the refcounts to zero?
I thought once a refcount of zero was reached, the txn would be deleted from the queue

That is what's happening here. We never store ref counts of zero in the queue. We always remove it in that case. You can think of a MVCCAbortIntentOp as causing a -1 refcount delta and a MVCCAbortTxnOp as a causing a -count refcount delta. But notice that we're now overcounting intents, so we need to clamp at zero. This is ok though, because we know that we'll only ever be saturating at zero for aborted transactions.

I think you might be suggesting that we let ref counts go negative when we overcount and then rely on a second MVCCAbortTxnOp to bring the count back to zero. That wouldn't work because it could result in a txn being added back into the queue with a count of -1 after a MVCCAbortIntentOp and causing a resolved timestamp regression.

Did that answer your question?

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell and @tbg)

pkg/storage/engine/enginepb/mvcc3.proto, line 219 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

instance

Done.

pkg/storage/rangefeed/processor_test.go, line 667 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

What's this change about?

This test was erroneously passing before this change because it only had a single intent for the transaction that was eventually discovered as aborted. This allowed the single ref count decrement to function as a full deletion. With this change, the test now only passes when an aborted txn results in a MVCCAbortTxnOp instead of a MVCCAbortIntentOp.

nvb · 2019-03-19T12:00:14Z

Also for what it's worth, this is how this all worked up until #35777. The logic here isn't really new, it's just being adapted to work with that PR, which realized that aborted intents don't always indicate aborted transactions.

tbg

Yep, thanks

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @nvanbenschoten, and @tbg)

pkg/storage/rangefeed/resolved_timestamp.go, line 454 at r1 (raw file):

	} else if txn.refCount < 0 && !uiq.allowNegRefCount {
		// Ignore txn.
		return false

Already taking into account that the callers only use deltas of 0, 1, -1 (I had to go look that up because it's not obvious, so perhaps the code should be written more robustly, with larger negative deltas it seems that more oddities come up incl negative final refcounts) so it looks like would only be able to hit the branch if txn.refCount before the delta was 0 but then how is it in the queue? The way I picture this is

txn is aborted and deleted from queue
txn writes intent a, gets a refcount of one
txn intent gets aborted, refcount -1 hits the deletion branch

or

txn aborted and deleted
txn intent gets aborted (hits the !ok branch above)

so what example hits this code? (might want to add it as a comment while you're at it)

tbg

Reviewed 9 of 11 files at r1, 2 of 2 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell and @nvanbenschoten)

nvb

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell and @tbg)

pkg/storage/rangefeed/resolved_timestamp.go, line 454 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Already taking into account that the callers only use deltas of 0, 1, -1 (I had to go look that up because it's not obvious, so perhaps the code should be written more robustly, with larger negative deltas it seems that more oddities come up incl negative final refcounts) so it looks like would only be able to hit the branch if txn.refCount before the delta was 0 but then how is it in the queue? The way I picture this is

txn is aborted and deleted from queue

txn writes intent a, gets a refcount of one

txn intent gets aborted, refcount -1 hits the deletion branch

or

txn aborted and deleted

txn intent gets aborted (hits the !ok branch above)

so what example hits this code? (might want to add it as a comment while you're at it)

Yeah this case wasn't being exercised by the external interface of the type, but I included it for robustness. You're completely right though that it's not actually what we want. Instead, we want it to be collapsed in with the previous case like:

if txn.refCount == 0 || (txn.refCount < 0 && !uiq.allowNegRefCount) {

pkg/storage/rangefeed/resolved_timestamp.go, line 467 at r2 (raw file):

// Del removes the transaction from the queue. It returns whether the update had
// an effect on the oldest transaction in the queue.
func (uiq *unresolvedIntentQueue) Del(txnID uuid.UUID) bool {

@tbg I asked on slack, but I'll keep the conversation on here. Would you find the following more clear?

func (uiq *unresolvedIntentQueue) Del(txnID uuid.UUID) bool {
	if uiq.allowNegRefCount {
		panic("cannot Del while allowNegRefCount")
	}
	return uiq.updateTxn(txnID, nil, hlc.Timestamp{}, math.MinInt64)
}

To avoid underflow, we'd need the allowNegRefCount check, which is kind of gross.

Fixes cockroachdb#35816. The fix in cockroachdb#35777 inadvertently broke `txnPushAttempt` and its interaction with transactions that were found to be aborted. This commit restores this handling by reverting part of the assertion added in cockroachdb#35772 and splitting up the handling of "aborted intents" from the handling of "aborted transactions". Aborted transactions were always one of the most tricky situations to handle in rangefeed. The challenge is that rangefeed may be tracking a transaction that has been abandoned and may want to push it to unblock its resolved timestamp, but it doesn't have a way to know where the aborted transaction's intents live on its range. The aborted transaction record can't tell us this either. So the only thing rangefeed can do is push the transaction and convince itself that the aborted intents on its range can be safely ignored. This poses a problem for accurate reference counting of intents though. We need to be able to discard a transaction and still handle the eventual resolution of its aborted intents. To handle this properly, the change splits up `MVCCAbortIntentOp` into `MVCCAbortIntentOp` and `MVCCAbortTxnOp`. Before this change, the single operation type was attempting to fill both roles, but they mean fundamentally different things. `MVCCAbortTxnOp` is then given back its ability to delete transactions from the queue directly without worrying about their reference count. Next, the `unresolvedIntentQueue` is extended to be a little more flexible. Instead of asserting that reference counts never drop below zero, it just ignores changes that would make reference counts negative. This has the potential to "leak" reference counts that saturate at zero and are then incremented again, but we know that this will only ever happen for aborted transactions, which can always be pushed again to rediscover that they have been aborted. Release note: None

nvb

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell and @tbg)

pkg/storage/rangefeed/resolved_timestamp.go, line 454 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Yeah this case wasn't being exercised by the external interface of the type, but I included it for robustness. You're completely right though that it's not actually what we want. Instead, we want it to be collapsed in with the previous case like:
if txn.refCount == 0 || (txn.refCount < 0 && !uiq.allowNegRefCount) {

Done.

pkg/storage/rangefeed/resolved_timestamp.go, line 467 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

@tbg I asked on slack, but I'll keep the conversation on here. Would you find the following more clear?
func (uiq *unresolvedIntentQueue) Del(txnID uuid.UUID) bool {
	if uiq.allowNegRefCount {
		panic("cannot Del while allowNegRefCount")
	}
	return uiq.updateTxn(txnID, nil, hlc.Timestamp{}, math.MinInt64)
}
To avoid underflow, we'd need the allowNegRefCount check, which is kind of gross.

Ok, I added this as a comment instead.

nvb · 2019-03-19T20:14:55Z

@tbg I'm not sure whether you were planning on giving this another look or not. I'm hoping to get it in for tonight's roachtests.

bdarnell

LGTM, although I'm not very familiar with this code.

Reviewed 8 of 11 files at r1, 2 of 2 files at r2, 1 of 1 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @tbg)

nvb · 2019-03-19T20:58:54Z

TFTRs!

bors r+

tbg · 2019-03-19T21:05:47Z

Merge it, I'll take another pass tomorrow but have no reason to assume there's anything I'd want to have changed

35889: rangefeed: fix handling of aborted transactions found by txnPushAttempt r=nvanbenschoten a=nvanbenschoten Fixes #35816. The fix in #35777 inadvertently broke `txnPushAttempt` and its interaction with transactions that were found to be aborted. This commit restores this handling by reverting part of the assertion added in #35772 and splitting up the handling of "aborted intents" from the handling of "aborted transactions". Aborted transactions were always one of the most tricky situations to handle in rangefeed. The challenge is that rangefeed may be tracking a transaction that has been abandoned and may want to push it to unblock its resolved timestamp, but it doesn't have a way to know where the aborted transaction's intents live on its range. The aborted transaction record can't tell us this either. So the only thing rangefeed can do is push the transaction and convince itself that the aborted intents on its range can be safely ignored. This poses a problem for accurate reference counting of intents though. We need to be able to discard a transaction and still handle the eventual resolution of its aborted intents. To handle this properly, the change splits up `MVCCAbortIntentOp` into `MVCCAbortIntentOp` and `MVCCAbortTxnOp`. Before this change, the single operation type was attempting to fill both roles, but they mean fundamentally different things. `MVCCAbortTxnOp` is then given back its ability to delete transactions from the queue directly without worrying about their reference count. Next, the `unresolvedIntentQueue` is extended to be a little more flexible. Instead of asserting that reference counts never drop below zero, it just ignores changes that would make reference counts negative. This has the potential to "leak" reference counts that saturate at zero and are then incremented again, but we know that this will only ever happen for aborted transactions, which can always be pushed again to rediscover that they have been aborted. Release note: None Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>

tbg

Reviewed 1 of 1 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained

craig · 2019-03-19T21:51:04Z

Build succeeded

GitHub CI (Cockroach)

nvb requested review from a team and danhhz March 18, 2019 18:51

danhhz reviewed Mar 18, 2019

View reviewed changes

danhhz requested review from bdarnell and tbg March 18, 2019 18:59

nvb commented Mar 18, 2019

View reviewed changes

danhhz mentioned this pull request Mar 18, 2019

roachtest: cdc/ledger/rangefeed=true failed #35114

Closed

nvb commented Mar 18, 2019

View reviewed changes

tbg reviewed Mar 19, 2019

View reviewed changes

nvb force-pushed the nvanbenschoten/fixRefCount branch from 9628a32 to 145d156 Compare March 19, 2019 11:49

nvb commented Mar 19, 2019

View reviewed changes

tbg reviewed Mar 19, 2019

View reviewed changes

nvb commented Mar 19, 2019

View reviewed changes

nvb force-pushed the nvanbenschoten/fixRefCount branch from 145d156 to aa73a72 Compare March 19, 2019 17:00

nvb commented Mar 19, 2019

View reviewed changes

bdarnell approved these changes Mar 19, 2019

View reviewed changes

tbg approved these changes Mar 19, 2019

View reviewed changes

craig bot merged commit aa73a72 into cockroachdb:master Mar 19, 2019

nvb mentioned this pull request Mar 21, 2019

release-19.1: rangefeed: fix handling of aborted transactions found by txnPushAttempt #36014

Merged

nvb deleted the nvanbenschoten/fixRefCount branch March 25, 2019 23:09

aliher1911 mentioned this pull request Sep 26, 2023

rangefeed: don't advance resolved ts on txn abort #111218

Closed

Conversation

nvb commented Mar 18, 2019

Uh oh!

cockroach-teamcity commented Mar 18, 2019

Uh oh!

danhhz left a comment

Choose a reason for hiding this comment

Uh oh!

nvb left a comment

Choose a reason for hiding this comment

Uh oh!

nvb Mar 18, 2019

Choose a reason for hiding this comment

Uh oh!

tbg left a comment

Choose a reason for hiding this comment

Uh oh!

nvb left a comment

Choose a reason for hiding this comment

Uh oh!

nvb commented Mar 19, 2019

Uh oh!

tbg left a comment

Choose a reason for hiding this comment

Uh oh!

tbg left a comment

Choose a reason for hiding this comment

Uh oh!

nvb left a comment

Choose a reason for hiding this comment

Uh oh!

nvb left a comment

Choose a reason for hiding this comment

Uh oh!

nvb commented Mar 19, 2019

Uh oh!

bdarnell left a comment

Choose a reason for hiding this comment

Uh oh!

nvb commented Mar 19, 2019

Uh oh!

tbg commented Mar 19, 2019 via email

Uh oh!

tbg left a comment

Choose a reason for hiding this comment

Uh oh!

craig bot commented Mar 19, 2019

Build succeeded

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants