Skip to content

kv: stop the heartbeat loop on rollback errors#26479

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
andreimatei:txn-rollback
Jun 7, 2018
Merged

kv: stop the heartbeat loop on rollback errors#26479
craig[bot] merged 1 commit intocockroachdb:masterfrom
andreimatei:txn-rollback

Conversation

@andreimatei
Copy link
Copy Markdown
Contributor

A txn's heartbeat loop is generally stopped when, upon a successful
request, the response's txn is no longer PENDING. This was insufficient;
the loop should always be closed after an EndTransaction(commit=false),
regardless of whether it results in success or error.
The heartbeat loop happens to be currently closed by the context
cancelation that the db.Txn() API performs, but that is going away.

This fixes #26434 - TestTxnCoordSenderRetries had become flaky after #25541
because #25441 caused EndTransactions to be sent in some situations
where they weren't before. What's going on in that test is that a
subtest was leaking a heartbeat loop that was stopped after the subtest
finished; the EndTxn sent when the loop finally was being stopped was
interfering with a CommandFilter installed by a different subtest.
Before #25441, the first subtest was waiting for the heartbeat loop to
be done because of its own CommandFilter. However, with #25441, the
first subtest's CommandFilter was being satisfied by a different, newly
introduced EndTransaction.

Release note: None

@andreimatei andreimatei requested a review from a team June 6, 2018 19:48
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@bdarnell
Copy link
Copy Markdown
Contributor

bdarnell commented Jun 6, 2018

:lgtm:


Review status: 0 of 3 files reviewed at latest revision, all discussions resolved, all commit checks successful.


Comments from Reviewable

@nvb
Copy link
Copy Markdown
Contributor

nvb commented Jun 7, 2018

Review status: 0 of 3 files reviewed at latest revision, all discussions resolved, all commit checks successful.


pkg/kv/txn_coord_sender.go, line 534 at r1 (raw file):

			log.VEventf(ctx, 2, "error: %s", pErr)

			// On rollback error, stop the heartbeat loop. No more requests can come

Why not put this in updateState? This seems like a random place for this logic.


Comments from Reviewable

A txn's heartbeat loop is generally stopped when, upon a successful
request, the response's txn is no longer PENDING. This was insufficient;
the loop should always be closed after an EndTransaction(commit=false),
regardless of whether it results in success or error.
The heartbeat loop happens to be currently closed by the context
cancelation that the db.Txn() API performs, but that is going away.

This fixes cockroachdb#26434 - TestTxnCoordSenderRetries had become flaky after cockroachdb#25541
because cockroachdb#25441 caused EndTransactions to be sent in some situations
where they weren't before. What's going on in that test is that a
subtest was leaking a heartbeat loop that was stopped after the subtest
finished; the EndTxn sent when the loop finally was being stopped was
interfering with a CommandFilter installed by a different subtest.
Before cockroachdb#25441, the first subtest was waiting for the heartbeat loop to
be done because of its own CommandFilter. However, with cockroachdb#25441, the
first subtest's CommandFilter was being satisfied by a different, newly
introduced EndTransaction.

Release note: None
@andreimatei
Copy link
Copy Markdown
Contributor Author

Review status: 0 of 3 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


pkg/kv/txn_coord_sender.go, line 534 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Why not put this in updateState? This seems like a random place for this logic.

Done.


Comments from Reviewable

@nvb
Copy link
Copy Markdown
Contributor

nvb commented Jun 7, 2018

:lgtm:


Review status: 0 of 3 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending.


Comments from Reviewable

@andreimatei
Copy link
Copy Markdown
Contributor Author

bors r+


Review status: 0 of 3 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending.


Comments from Reviewable

craig bot pushed a commit that referenced this pull request Jun 7, 2018
26479: kv: stop the heartbeat loop on rollback errors r=andreimatei a=andreimatei

A txn's heartbeat loop is generally stopped when, upon a successful
request, the response's txn is no longer PENDING. This was insufficient;
the loop should always be closed after an EndTransaction(commit=false),
regardless of whether it results in success or error.
The heartbeat loop happens to be currently closed by the context
cancelation that the db.Txn() API performs, but that is going away.

This fixes #26434 - TestTxnCoordSenderRetries had become flaky after #25541
because #25441 caused EndTransactions to be sent in some situations
where they weren't before. What's going on in that test is that a
subtest was leaking a heartbeat loop that was stopped after the subtest
finished; the EndTxn sent when the loop finally was being stopped was
interfering with a CommandFilter installed by a different subtest.
Before #25441, the first subtest was waiting for the heartbeat loop to
be done because of its own CommandFilter. However, with #25441, the
first subtest's CommandFilter was being satisfied by a different, newly
introduced EndTransaction.

Release note: None

26516: sql: disable optimizer if force-lookup-joins flag set r=RaduBerinde a=RaduBerinde

The experimental flag to force lookup joins doesn't work with the
optimizer (it is disabled on opt-generated plans). To allow the flag
to still function for now, we disable the optimizer if the flag is
set.

Release note: None

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jun 7, 2018

Build succeeded

@craig craig bot merged commit 7f39653 into cockroachdb:master Jun 7, 2018
@andreimatei andreimatei deleted the txn-rollback branch June 7, 2018 19:09
andreimatei added a commit to andreimatei/cockroach that referenced this pull request Jun 7, 2018
Copying from cockroachdb#26524
What's going on here is that this test relies on a rollback (kv-level)
working with a canceled context. It never really worked (the first
rollback attempt failed), however certain canceled contexts were also
stopping the heartbeat loop as a byproduct, and that was doing its own
cleanup. The recent cockroachdb#26479 changed things, however - there would be no
more stopping of the heartbeat loop after a failed rollback attempt.
This is a legit enough regression from cockroachdb#26479, but it's being fixed in
the imminent cockroachdb#23055 which makes the rollbacks with canceled contexts
work properly.

Touches cockroachdb#26524

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

kv: TestTxnCoordSenderRetries is failing under stress

4 participants