sql: rollback transaction eagerly upon error#140160
sql: rollback transaction eagerly upon error#140160craig[bot] merged 3 commits intocockroachdb:masterfrom
Conversation
|
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
michae2
left a comment
There was a problem hiding this comment.
Looks pretty good!
Some test ideas:
- something with cockroach_restart
- something with pausable portals
- something with implicit txn retry
Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rafiss)
pkg/sql/conn_executor_savepoints.go line 34 at r1 (raw file):
ctx context.Context, s *tree.Savepoint, res RestrictedCommandResult, ) (fsm.Event, fsm.EventPayload, error) { ex.state.mu.hasSavepoints = true
Would we want to set hasSavepoints to false somewhere in execRelease to allow the eager rollback of a transaction that fails after all created savepoints have been released?
rafiss
left a comment
There was a problem hiding this comment.
something with cockroach_restart
I added a test with savepoints. I don't think cockroach_restart needs any other special testing, but I could be missing something, so just let me know what else you had in mind.
something with pausable portals
I was thinking this would be covered by our metamorphic testing. But I can add a test for a specific scenario if you can think of one.
something with implicit txn retry
Added this test.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @michae2)
pkg/sql/conn_executor_savepoints.go line 34 at r1 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Would we want to set
hasSavepointsto false somewhere in execRelease to allow the eager rollback of a transaction that fails after all created savepoints have been released?
done. good point.
f6b1e73 to
5d4bcf2
Compare
michae2
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner)
pkg/sql/conn_fsm.go line 418 at r2 (raw file):
ts.mu.Lock() defer ts.mu.Unlock() if !ts.mu.hasSavepoints {
Something occurred to me: if this is a nested connExecutor created by InternalExecutor.initConnEx, sharing a transaction with its parent connExecutor, then will it know about the savepoints in its parent connExecutor?
Maybe we also need to check that this is a top-level connExecutor? Or we need to add savepoint info to InternalExecutor.extraTxnState?
rafiss
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @michae2)
pkg/sql/conn_fsm.go line 418 at r2 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Something occurred to me: if this is a nested connExecutor created by
InternalExecutor.initConnEx, sharing a transaction with its parent connExecutor, then will it know about the savepoints in its parent connExecutor?Maybe we also need to check that this is a top-level connExecutor? Or we need to add savepoint info to
InternalExecutor.extraTxnState?
When we're running an internal executor delegated from a parent transaction, the connExecutor uses a different state machine. It's further down below:
Line 704 in 5a8f03f
The transaction is never rolled back in those state transitions.
michae2
left a comment
There was a problem hiding this comment.
Reviewed 7 of 7 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @mgartner)
pkg/sql/conn_fsm.go line 418 at r2 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
When we're running an internal executor delegated from a parent transaction, the connExecutor uses a different state machine. It's further down below:
Line 704 in 5a8f03f
The transaction is never rolled back in those state transitions.
Great, thank you!
pkg/sql/conn_executor_test.go line 1905 at r2 (raw file):
err = conn1.QueryRowContext(ctx, `SELECT v FROM t WHERE k = 4`).Scan(&v) require.NoError(t, err) require.Equal(t, 400, v)
one idea for a cockroach_restart test is something like:
-- conn 1
INSERT INTO t VALUES (5, 5), (6, 6);
BEGIN;
SAVEPOINT cockroach_restart;
SELECT * FROM t WHERE k = 5;
-- conn 2 (to break serializability)
UPDATE t SET v = 50 WHERE k = 5;
SELECT * FROM t WHERE k = 6;
-- conn 1
UPDATE t SET v = 60 WHERE k = 6;
-- conn 2 (blocks, to prove locks are still held)
UPDATE t SET v = v + 600 WHERE k = 6;
-- conn 1 (should fail with RETRY_SERIALIZABLE)
RELEASE SAVEPOINT cockroach_restart;
-- simulate the advanced retry
ROLLBACK TO SAVEPOINT cockroach_restart;
SAVEPOINT cockroach_restart;
SELECT * FROM t WHERE k = 5;
UPDATE t SET v = 61 WHERE k = 6;
RELEASE SAVEPOINT cockroach_restart;
-- conn 2 (to prove that ordering was correct and locks are now released)
SELECT * FROM t WHERE k = 6;
-- conn 1
COMMIT;pkg/sql/conn_executor_test.go line 1916 at r4 (raw file):
// layer, but for now this is the expected behavior. // See https://github.com/cockroachdb/cockroach/issues/117020. func TestRetriableErrorDuringTransactionHoldsLocks(t *testing.T) {
Thanks for adding this!
pkg/sql/conn_executor_exec.go line 186 at r3 (raw file):
ex.startIdleInSessionTimeout() if ex.sessionData().IdleInTransactionSessionTimeout > 0 {
If both IdleInTransactionSessionTimeout and txnTimeoutRemaining are > 0, do we need to set both timers? Seems like we only need to set the one that will fire first?
rafiss
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @michae2)
pkg/sql/conn_executor_exec.go line 186 at r3 (raw file):
Previously, michae2 (Michael Erickson) wrote…
If both IdleInTransactionSessionTimeout and txnTimeoutRemaining are > 0, do we need to set both timers? Seems like we only need to set the one that will fire first?
the two timers have slightly different behaviors. as per the commend i added below:
// The transaction_timeout setting should move the transaction to the
// aborted state.
// NOTE: In Postgres, the transaction_timeout causes the entire session
// to be terminated. We intentionally diverge from that behavior.
however, the IdleInTransactionSessionTimeout timer causes the session to be terminated.
pkg/sql/conn_executor_test.go line 1905 at r2 (raw file):
Previously, michae2 (Michael Erickson) wrote…
one idea for a cockroach_restart test is something like:
-- conn 1 INSERT INTO t VALUES (5, 5), (6, 6); BEGIN; SAVEPOINT cockroach_restart; SELECT * FROM t WHERE k = 5; -- conn 2 (to break serializability) UPDATE t SET v = 50 WHERE k = 5; SELECT * FROM t WHERE k = 6; -- conn 1 UPDATE t SET v = 60 WHERE k = 6; -- conn 2 (blocks, to prove locks are still held) UPDATE t SET v = v + 600 WHERE k = 6; -- conn 1 (should fail with RETRY_SERIALIZABLE) RELEASE SAVEPOINT cockroach_restart; -- simulate the advanced retry ROLLBACK TO SAVEPOINT cockroach_restart; SAVEPOINT cockroach_restart; SELECT * FROM t WHERE k = 5; UPDATE t SET v = 61 WHERE k = 6; RELEASE SAVEPOINT cockroach_restart; -- conn 2 (to prove that ordering was correct and locks are now released) SELECT * FROM t WHERE k = 6; -- conn 1 COMMIT;
nice! let me give this a try
rafiss
left a comment
There was a problem hiding this comment.
thanks for the reviews! this is ready for another look.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @michae2)
michae2
left a comment
There was a problem hiding this comment.
Just one suggestion. This is great!
Reviewed 4 of 4 files at r5, 3 of 3 files at r6, 1 of 1 files at r7, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @mgartner)
pkg/sql/conn_executor_exec.go line 186 at r3 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
the two timers have slightly different behaviors. as per the commend i added below:
// The transaction_timeout setting should move the transaction to the // aborted state. // NOTE: In Postgres, the transaction_timeout causes the entire session // to be terminated. We intentionally diverge from that behavior.however, the IdleInTransactionSessionTimeout timer causes the session to be terminated.
Ah, I missed that. Thanks!
pkg/sql/conn_executor_test.go line 1936 at r5 (raw file):
_, err = conn1.ExecContext(ctx, `RELEASE SAVEPOINT cockroach_restart`) require.ErrorContains(t, err, "failed preemptive refresh due to encountered recently written committed value")
nit: Because we're not letting the conn2 update block the entire time, I think it would be a slightly better test to try the conn2 update right after releasing the cockroach_restart savepoint. This would validate that locks are still held after releasing the cockroach_restart savepoint.
If we have not used any savepoints, it's safe to rollback the transaction immediately when a non-retryable error is encountered. This releases locks sooner, which is useful so we avoid needing to wait until the client sends a ROLLBACK statement. Release note (bug fix): Transactions that enter the aborted state will now release locks they are holding immediately, as long as there is no SAVEPOINT active in the transaction.
Previously, the transaction_timeout setting would only cause the transaction to abort if the timeout had already expired by the time the next statement is received by CRDB. Now we create a timer that fires if the timeout expires while we are waiting for the next statement. There's no release note, since up until the previous commit, which changed the behavior so transactions eagerly release locks, there was no functional difference caused by lazily timing out the transaction. Release note: None
Release note: None
rafiss
left a comment
There was a problem hiding this comment.
tftr!
bors r+
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @michae2)
pkg/sql/conn_executor_test.go line 1936 at r5 (raw file):
Previously, michae2 (Michael Erickson) wrote…
nit: Because we're not letting the conn2 update block the entire time, I think it would be a slightly better test to try the conn2 update right after releasing the cockroach_restart savepoint. This would validate that locks are still held after releasing the cockroach_restart savepoint.
good point! added the assertion after too
mgartner
left a comment
There was a problem hiding this comment.
Reviewed 3 of 7 files at r2, 4 of 4 files at r8, 3 of 3 files at r9, 1 of 1 files at r10, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (and 1 stale)
A comment has been added to explain logic added in cockroachdb#140160. Release note: None
142062: test: add test filter environment variable to cloud unit tests r=jeffswenson a=kev-cao Epic: none Release note: None 142079: sql: add comment for auto-rollback r=mgartner a=mgartner A comment has been added to explain logic added in #140160. Epic: None Release note: None Co-authored-by: Kevin Cao <kevin.cao@cockroachlabs.com> Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
sql: rollback transaction eagerly upon error
If we have not used any savepoints, it's safe to rollback the
transaction immediately when a non-retryable error is encountered.
This releases locks sooner, which is useful so we avoid needing to wait
until the client sends a ROLLBACK statement.
Release note (bug fix): Transactions that enter the aborted state will
now release locks they are holding immediately, as long as there is no
SAVEPOINT active in the transaction.
sql: make transaction_timeout abort transaction eagerly
Previously, the transaction_timeout setting would only cause the
transaction to abort if the timeout had already expired by the time the
next statement is received by CRDB. Now we create a timer that fires if
the timeout expires while we are waiting for the next statement.
There's no release note, since up until the previous commit, which
changed the behavior so transactions eagerly release locks, there was no
functional difference caused by lazily timing out the transaction.
Release note: None
sql: add test for implicit txn holding locks across retries
Release note: None
fixes #140234