mvcc: ensure serializability by fixing dirty reads and secondary index duplicates#11662
Conversation
7679026 to
bb463b1
Compare
bb463b1 to
173e092
Compare
8c205f9 to
90c529c
Compare
|
Note There is one possible optimization: during rollback, we could track all chains affected by rolled-back transactions. Then, after all necessary rollbacks are complete, we could:
This would reduce false positives. Currently:
The proposed approach would:
I'm uncertain how critical it is for us to reduce these false-positive aborts, but it's worth noting this optimization opportunity exists. |
63e7234 to
9af9a1a
Compare
c3a1487 to
327939d
Compare
changelogs/unreleased/gh-11686-mvcc-insert-after-delete-may-cause-duplicates.md
Outdated
Show resolved
Hide resolved
changelogs/unreleased/gh-11660-mvcc-rollback-may-cause-duplicates.md
Outdated
Show resolved
Hide resolved
Please open a ticket for it. |
69f0a9a to
8a53894
Compare
|
@Astronomax Please, rebase to the newest master. It seems the workflow fails because of it. |
drewdzzz
left a comment
There was a problem hiding this comment.
Thank you for your fixes. Overall, it looks good to me - only a few small things left!
changelogs/unreleased/gh-11660-mvcc-rollback-may-cause-duplicates.md
Outdated
Show resolved
Hide resolved
changelogs/unreleased/gh-11802-mvcc-rollback-may-cause-dirty-gap-read.md
Outdated
Show resolved
Hide resolved
drewdzzz
left a comment
There was a problem hiding this comment.
Thank you for your tremendous work on this patch! I believe, this PR is one of the major points in memtx MVCC development.
|
I suspect that Doc Team review is outdated (AFAIR, they approved this PR before the last two commits). If so, please, re-request their review. |
changelogs/unreleased/gh-11660-mvcc-rollback-may-cause-duplicates.md
Outdated
Show resolved
Hide resolved
Introduces the `abort_on_rollback` flag for inplace gap items to control gap-reader transaction abortion on rollback. This change is a prerequisite for future fixes related to per-index `is_own_change` flags, it will help prevent new false-positive cases from occurring. After you look at one of the subsequent commits — "memtx: introduce `is_own_change` flag for each secondary index", which fixes issue tarantool#11686, it will become clear that some gap-reader transactions must be rolled back only while preparing a replace/insert statement, but not during a rollback. ``` space:replace{21, 30} -- committed tuple `{21, 30}` TXN1 begin() TXN1 space:replace{21, 31}') TXN1 space.index.sk:select{30} -> nil TXN2 begin() TXN2 space:delete{21} TXN2 commit() -> rollback due to WAL IO error ``` - The operation `space.index.sk:select{30}` in `TXN1` will now correctly track a gap, because `box.space.test:replace{21, 31}` in `TXN1` does not guarantee that some tuple `{x, 30}`, `x != 21` from another concurrent transaction won't be inserted and committed before `space.sk:select{30}` from `TXN1`. (see tarantool#11686, tarantool#11687) - Transaction `TXN2` executes `box.space.test:delete{21}` - Transaction `TXN2` is rolled back. Since this `space:delete{21}` deleted the tuple `{21, 30}`, it will reappear after the rollback It turns out that some gap-reader transactions for key `30` in the secondary index should not be rolled back. Specifically, the gap-reader `TXN1` can be skipped. All inplace gap items corresponding to such transactions can be marked with the `abort_on_rollback = false` flag at the moment of their creation. Part of tarantool#11686, tarantool#11687 NO_DOC=bugfix NO_TEST=will be added later NO_CHANGELOG=will be added later
Removed the `is_own_change` output parameter from `check_dup()` in `memtx_tx.c` - now it directly modifies `txn_stmt::is_own_change`. This change prepares for further modifications where the ownership check logic will be extended (to be implemented in the next commit). Part of tarantool#11686, tarantool#11687 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
This commit fixes two Memtx-MVCC related issues:
- A bug when a transaction performing insert-after-delete with the same
primary key (e.g., delete{4} followed by insert{4, 3}) could create
secondary key duplicates.
- A bug when a transaction performing get-after-replace could dirty-read
nothing.
Both problems was connected with the `is_own_change` flag in the
transactional statement. Its truth or falsity did not allow us to say
anything about secondary indices.
Similar flags were introduced separately for each index.
The statement-level flag remains, but it now has a different name
(`is_own_delete`) and semantics. This flag is only used for DELETE
statements; for INSERT/REPLACE, it is always `false`.
`delete_stmt->is_own_delete` means the statement will either delete some
tuple from the same transaction or won't delete anything because the same
transaction previously deleted this key.
For INSERT/REPLACE statements, `stmt->is_own_change` has been replaced by
`stmt->add_story->link[0].is_own_change`, which is equivalent to the
`stmt->is_own_change` that existed before this commit.
Closes tarantool#11686, tarantool#11687
NO_DOC=bugfix
Duplicates need to be processed (aborted) not only during preparing, but also during rollback (see the next commit). Factor this processing out into a separate function. Part of tarantool#11660 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring
This commit fixes a Memtx-MVCC related bug that could lead to duplicates in secondary indexes after rollback. To guarantee the absence of duplicates in secondary indexes, MVCC maintains the following invariant for all in-progress transactions: ``` If an in-progress story `x` conflicts with a story `y` in some secondary index (i.e., they have the same key in that index), then `x` must also conflict with `y` in the primary key. ``` `x` and `y` may belong to the same transaction or different ones. The case where `x` and `y` belong to the same transaction is trivial. If `x` and `y` belong to different transactions, then `y` must be the last prepared story in the chain corresponding to that index. This implies that the invariant may break for some transactions when the last prepared story in a chain changes (either when another story becomes last or when the last story's `del_psn` becomes 0). All such cases must be handled - any transactions that violate the invariant (i.e., start duplicating prepared tuples) must be aborted. Rollback often leads to changes in the last prepared story within chains. However, this case was previously overlooked, which could result in duplicates after rollback. This commit adds the missing handling for rollback scenarios. Closes tarantool#11660 NO_DOC=bugfix
This commit fixes a Memtx-MVCC related bug that could lead to dirty gap read in secondary indexes after rollback. One REPLACE transaction (prepared but not yet committed) creates a temporary situation where another concurrent transaction cannot see a specific key in a secondary index (a "read gap"). When the first transaction then rolls back, the previous (replaced) tuple becomes visible again, and the "read gap" becomes irrelevant (the key becomes visible once more). In this case, the second transaction, which read the gap, should be aborted. However, it successfully commits, leading to a non-serializable schedule. This commit fixes the issue by adding the necessary handling during rollback. Now, all such irrelevant gaps are aborted. Closes tarantool#11802 NO_DOC=bugfix
|
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin release/3.2
git worktree add -d .worktree/backport/release/3.2/11662 origin/release/3.2
cd .worktree/backport/release/3.2/11662
git switch --create backport/release/3.2/11662
git cherry-pick -x 9a000e50dfb2693f817bf899b1b02473b76c7a36 78fc900ec841ec6cfe14b167ab0884b5a542609d 76ba7ee9eacb0815126b9cbc61fb92b3b543d2f8 d4f9f9b9570201b06a649d87b3ff23d4053c7d44 bd6e12b3ca90bc9b698ad15a0e41583baae821b9 2a9a463e5c303dca9c877f6e0e89d4d6665898de |
|
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin release/3.3
git worktree add -d .worktree/backport/release/3.3/11662 origin/release/3.3
cd .worktree/backport/release/3.3/11662
git switch --create backport/release/3.3/11662
git cherry-pick -x 9a000e50dfb2693f817bf899b1b02473b76c7a36 78fc900ec841ec6cfe14b167ab0884b5a542609d 76ba7ee9eacb0815126b9cbc61fb92b3b543d2f8 d4f9f9b9570201b06a649d87b3ff23d4053c7d44 bd6e12b3ca90bc9b698ad15a0e41583baae821b9 2a9a463e5c303dca9c877f6e0e89d4d6665898de |
|
Successfully created backport PR for |
|
Successfully created backport PR for |
Backport summary
|
|
@Astronomax, please, prepare backports for 3.2 and 3.3 |
The patchset enforces serializable isolation by addressing the following MVCC bugs:
Rollback may cause duplicates in secondary indexes.
To guarantee the absence of duplicates in secondary indexes, MVCC maintains the following invariant for all in-progress transactions:
xandymay belong to the same transaction or different ones. The case wherexandybelong to the same transaction is trivial. Ifxandybelong to different transactions, thenymust be the last prepared story in the chain corresponding to that index. This implies that the invariant may break for some transactions when the last prepared story in a chain changes (either when another story becomes last or when the last story'sdel_psnbecomes 0). All such cases must be handled - any transactions that violate the invariant (i.e., start duplicating prepared tuples) must be aborted.Rollback often leads to changes in the last prepared story within chains. However, this case was previously overlooked, which could result in duplicates after rollback. This patch adds the missing handling for rollback scenarios.
Closes mvcc: rollback may cause duplicates in secondary indexes #11660
Insert-after-delete in single transaction may cause duplicates in secondary indexes.
The issue was related to MVCC assuming that if it inserts a tuple
x={key, ...}that doesn't conflict with any other tuple on the primary keykey(because it previously executeddelete(key)removing some tupley={key, ...}with the same primary key), thenxcouldn't possibly be a duplicate in any secondary index. However, this is obviously false. To make such a conclusion for a specific secondary index, the deleted tupleymust have the same key value in that index as tuplex.The problem was connected with the
is_own_changeflag in the transactional statement. Its truth or falsity did not allow us to say anything about secondary indices.Similar flags were introduced separately for each index. The statement-level flag remains, but it now has a different name (
is_own_delete) and semantics. This flag is only used for DELETE statements; for INSERT/REPLACE, it is alwaysfalse.delete_stmt->is_own_deletemeans the statement will either delete some tuple from the same transaction or won't delete anything because the same transaction previously deleted this key.For INSERT/REPLACE statements,
stmt->is_own_changehas been replaced bystmt->add_story->link[0].is_own_change, which is equivalent to thestmt->is_own_changethat existed before this commit.Closes mvcc: insert-after-delete in single transaction may cause duplicates in secondary indexes #11686
Get-after-replace in single transaction may cause dirty read in secondary index.
The issue was related to MVCC assuming that if it performed a
replacewith a new tuplex={key, ...}that also deleted some tupley={key, key2}with the same primary key, then a subsequentindex.sk:get(key2)in the same transaction wouldn't return anything. This would be true if the transaction had deletedyusingdeleterather thanreplace, becausedeleteguarantees it will definitely remove exactly what was returned to the user (it tracks reads ifywas inserted by another transaction, orywas inserted by the same transaction, in which case this guarantee arises automatically).This issue was automatically fixed by the same
is_own_changeflag-related fix described above. Now when MVCC performsgeton a secondary key, it checks theis_own_changeflag for that secondary index and determines whether it can automatically guarantee it won't read anything, or whether it needs to create agaptracker to enforce this guarantee.Closes mvcc: get-after-replace in single transaction may cause dirty read in secondary index #11687
Rollback may cause dirty gap read in secondary index.
The essence of the problem: One REPLACE transaction (prepared but not yet committed) creates a temporary situation where another concurrent transaction cannot see a specific key in a secondary index (a "read gap"). When the first transaction then rolls back, the previous (replaced) tuple becomes visible again, and the "read gap" becomes irrelevant (the key becomes visible once more). In this case, the second transaction, which read the gap, should be aborted. However, it successfully commits, leading to a non-serializable schedule.
This patch fixes the issue by adding the necessary handling during rollback. Now, all such irrelevant gaps are aborted.
Closes mvcc: rollback may cause dirty gap read #11802