Skip to content

kv: prioritize severe errors when merging partial batches in DistSender#38579

Merged
craig[bot] merged 2 commits intocockroachdb:masterfrom
nvb:nvanbenschoten/fixTxnAbortPanic
Jul 1, 2019
Merged

kv: prioritize severe errors when merging partial batches in DistSender#38579
craig[bot] merged 2 commits intocockroachdb:masterfrom
nvb:nvanbenschoten/fixTxnAbortPanic

Conversation

@nvb
Copy link
Copy Markdown
Contributor

@nvb nvb commented Jun 30, 2019

Fixes #36024.
Fixes #36094.

8b5bafb ensured that all transaction state was propagated by DistSender on errors. In doing so, it touched that fact that DistSender drops all but the first error that it sees. It ensured that even though this was the case, the error metadata from these dropped errors would still be propagated (see pErr.UpdateTxn(resp.pErr.GetTxn())).

This has an unintended consequence where it was now possible for a non-aborting transaction retry error to be updated with an ABORTED transaction proto. This caused confusion in the TxnCoordSender, triggering panics like the ones we see in #36024 and #36094.

This change fixes this by being smarter about which errors get dropped when concurrent partial batches each hit an error in DistSender. It does this by prioritizing the most severe errors and merging transaction state into those. In a lot of ways, this is the DistSender equivalent of 574e805, which is why they now share code.

@nvb nvb requested review from a team and andreimatei June 30, 2019 22:12
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@nvb nvb mentioned this pull request Jun 30, 2019
14 tasks
Copy link
Copy Markdown
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)


pkg/roachpb/errors.go, line 42 at r2 (raw file):

var _ error = &UnhandledRetryableError{}

// transactionRestartError is an interface implemented by errors that cause

nit: move this where it was to avoid the diff unless you had a reason to move it


pkg/roachpb/errors.go, line 54 at r2 (raw file):

}

// ErrorPriority is used to rank errors such that the "best" one is chosen to be

say that highest score beats lower

nvb added 2 commits July 1, 2019 16:58
This was accidentally lost in 1ff3556.

Release note: None
Fixes cockroachdb#36024.
Fixes cockroachdb#36094.

8b5bafb ensured that all transaction state was propagated by DistSender on
errors. In doing so, it touched that fact that DistSender drops all but the
first error that it sees. It ensured that even though this was the case, the
error metadata from these dropped errors would still be propagated (see
`pErr.UpdateTxn(resp.pErr.GetTxn())`).

This has an unintended consequence where it was now possible for a non-aborting
transaction retry error to be updated with an ABORTED transaction proto. This
caused confusion in the TxnCoordSender, triggering panics like we see in cockroachdb#36024
and cockroachdb#36094.

This change fixes this by being smarter about which errors get dropped when
concurrent partial batches each hit an error in DistSender. It does this by
prioritizing the most severe errors and merging transaction state into those.
In a lot of ways, this is the DistSender equivalent of 574e805, which is why
they now share code.

Release note: None
@nvb nvb force-pushed the nvanbenschoten/fixTxnAbortPanic branch from 5532def to 48bb3b7 Compare July 1, 2019 21:01
Copy link
Copy Markdown
Contributor Author

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r=andreimatei

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei)


pkg/roachpb/errors.go, line 42 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: move this where it was to avoid the diff unless you had a reason to move it

It seems to belong next to UnhandledRetryableError.


pkg/roachpb/errors.go, line 54 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

say that highest score beats lower

Done.

craig bot pushed a commit that referenced this pull request Jul 1, 2019
38579: kv: prioritize severe errors when merging partial batches in DistSender r=andreimatei a=nvanbenschoten

Fixes #36024.
Fixes #36094.

8b5bafb ensured that all transaction state was propagated by `DistSender` on errors. In doing so, it touched that fact that `DistSender` drops all but the first error that it sees. It ensured that even though this was the case, the error metadata from these dropped errors would still be propagated (see `pErr.UpdateTxn(resp.pErr.GetTxn())`).

This has an unintended consequence where it was now possible for a non-aborting transaction retry error to be updated with an ABORTED transaction proto. This caused confusion in the `TxnCoordSender`, triggering panics like the ones we see in #36024 and #36094.

This change fixes this by being smarter about which errors get dropped when concurrent partial batches each hit an error in `DistSender`. It does this by prioritizing the most severe errors and merging transaction state into those. In a lot of ways, this is the `DistSender` equivalent of 574e805, which is why they now share code.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 1, 2019

Build succeeded

@craig craig bot merged commit 48bb3b7 into cockroachdb:master Jul 1, 2019
@nvb nvb deleted the nvanbenschoten/fixTxnAbortPanic branch July 3, 2019 16:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

roachtest: schemachange/index/tpcc/w=1000 failed roachtest: schemachange/index/tpcc/w=100 failed

3 participants