[backport 3.4] mvcc: ensure serializability by fixing dirty reads and secondary index duplicates #11985
Merged
sergepetrenko merged 6 commits into `release/3.4` on Oct 31, 2025
Introduces the `abort_on_rollback` flag for inplace gap items to control whether gap-reader transactions are aborted on rollback.

This change is a prerequisite for future fixes related to per-index `is_own_change` flags; it will help prevent new false-positive cases from occurring. One of the subsequent commits, "memtx: introduce `is_own_change` flag for each secondary index", which fixes issue #11686, makes it clear that some gap-reader transactions must be rolled back only while preparing a replace/insert statement, but not during a rollback.

```
space:replace{21, 30} -- committed tuple `{21, 30}`
TXN1 begin()
TXN1 space:replace{21, 31}
TXN1 space.index.sk:select{30} -> nil
TXN2 begin()
TXN2 space:delete{21}
TXN2 commit() -> rollback due to WAL IO error
```

- The operation `space.index.sk:select{30}` in `TXN1` now correctly tracks a gap, because `space:replace{21, 31}` in `TXN1` does not guarantee that some tuple `{x, 30}`, `x != 21`, from another concurrent transaction won't be inserted and committed before `space.index.sk:select{30}` in `TXN1` (see #11686, #11687).
- Transaction `TXN2` executes `space:delete{21}`.
- Transaction `TXN2` is rolled back. Since its `space:delete{21}` deleted the tuple `{21, 30}`, that tuple reappears after the rollback.

It turns out that some gap-reader transactions for key `30` in the secondary index should not be rolled back. Specifically, the gap-reader `TXN1` can be skipped. All inplace gap items corresponding to such transactions can be marked with the `abort_on_rollback = false` flag at the moment of their creation.

Part of #11686, #11687
NO_DOC=bugfix
NO_TEST=will be added later
NO_CHANGELOG=will be added later
(cherry picked from commit 9a000e5)
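The role of the flag can be illustrated with a toy model. This is only a sketch under stated assumptions: `toy_txn`, `toy_gap_item`, `toy_handle_rollback` and `toy_handle_prepare` are hypothetical names invented for illustration and do not mirror the real structures or functions in `memtx_tx.c`. The only idea taken from the commit message is that a gap item carries an `abort_on_rollback` flag that is consulted on rollback but not while preparing a replace/insert statement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; the real memtx structures are different. */
struct toy_txn {
	bool is_aborted;
};

struct toy_gap_item {
	struct toy_txn *reader;  /* transaction that read the gap */
	bool abort_on_rollback;  /* false: leave the reader alone on rollback */
};

/*
 * A rolled-back statement made a tuple reappear in the gap: abort only
 * the readers whose items were created with abort_on_rollback = true.
 * Returns the number of transactions aborted by this call.
 */
int
toy_handle_rollback(struct toy_gap_item *items, size_t count)
{
	assert(items != NULL || count == 0);
	int aborted = 0;
	for (size_t i = 0; i < count; i++) {
		struct toy_txn *reader = items[i].reader;
		if (!items[i].abort_on_rollback || reader->is_aborted)
			continue;
		reader->is_aborted = true;
		aborted++;
	}
	return aborted;
}

/*
 * A replace/insert statement of `writer` is being prepared into the gap:
 * the flag is ignored, and every other reader that is still alive is
 * aborted. Returns the number of transactions aborted by this call.
 */
int
toy_handle_prepare(struct toy_gap_item *items, size_t count,
		   const struct toy_txn *writer)
{
	assert(items != NULL || count == 0);
	int aborted = 0;
	for (size_t i = 0; i < count; i++) {
		struct toy_txn *reader = items[i].reader;
		if (reader == writer || reader->is_aborted)
			continue;
		reader->is_aborted = true;
		aborted++;
	}
	return aborted;
}
```

In terms of the schedule above, the gap item of `TXN1` would be created with `abort_on_rollback = false`, so the rollback of `TXN2` leaves `TXN1` untouched, while a concurrent insert prepared into the same gap would still abort it.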
Removed the `is_own_change` output parameter from `check_dup()` in `memtx_tx.c`; the function now modifies `txn_stmt::is_own_change` directly. This change prepares for further modifications where the ownership check logic will be extended (to be implemented in the next commit).

Part of #11686, #11687
NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring
(cherry picked from commit 78fc900)
This commit fixes two Memtx-MVCC related issues:
- A bug when a transaction performing insert-after-delete with the same
primary key (e.g., delete{4} followed by insert{4, 3}) could create
secondary key duplicates.
- A bug when a transaction performing get-after-replace could dirty-read
nothing.
Both problems were connected with the `is_own_change` flag in the
transactional statement: its value said nothing about the secondary
indexes.
To fix this, similar flags were introduced separately for each index.
The statement-level flag remains, but it now has a different name
(`is_own_delete`) and semantics. This flag is only used for DELETE
statements; for INSERT/REPLACE, it is always `false`.
`delete_stmt->is_own_delete` means the statement will either delete some
tuple from the same transaction or won't delete anything because the same
transaction previously deleted this key.
For INSERT/REPLACE statements, `stmt->is_own_change` has been replaced by
`stmt->add_story->link[0].is_own_change`, which is equivalent to the
`stmt->is_own_change` that existed before this commit.
Closes #11686, #11687
NO_DOC=bugfix
(cherry picked from commit 76ba7ee)
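Why a single statement-level flag is not enough can be shown with a small model. This is a sketch, not the code from `memtx_tx.c`: `toy_tuple`, `TOY_INDEX_COUNT` and `toy_fill_own_change` are hypothetical names, and the rule used here (the flag for an index is set only when the tuple previously deleted by the same transaction has the same key in that index) is an assumption derived from the description of the bug, not a copy of the real logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Index 0 plays the role of the primary key, index 1 of a secondary key. */
enum { TOY_INDEX_COUNT = 2 };

/* Hypothetical toy tuple: one integer key per index. */
struct toy_tuple {
	int key[TOY_INDEX_COUNT];
};

/*
 * Compute one "own change" flag per index for an inserted tuple.
 * `own_deleted` is the tuple with the same primary key that the same
 * transaction deleted earlier, or NULL if there is none.
 * The flag for index i is set only if the deleted tuple occupied the
 * same key in index i; otherwise a duplicate check in that index is
 * still required.
 */
void
toy_fill_own_change(const struct toy_tuple *own_deleted,
		    const struct toy_tuple *inserted,
		    bool is_own_change[TOY_INDEX_COUNT])
{
	assert(inserted != NULL);
	for (int i = 0; i < TOY_INDEX_COUNT; i++) {
		is_own_change[i] = own_deleted != NULL &&
			own_deleted->key[i] == inserted->key[i];
	}
}
```

For `delete{4}` of a tuple `{4, 5}` followed by `insert{4, 3}`, this model sets the flag for the primary index but not for the secondary one, so a committed tuple such as `{7, 3}` would still be found by the duplicate check in the secondary index.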
This commit fixes a Memtx-MVCC related bug that could lead to duplicates in secondary indexes after rollback.

To guarantee the absence of duplicates in secondary indexes, MVCC maintains the following invariant for all in-progress transactions:

```
If an in-progress story `x` conflicts with a story `y` in some secondary
index (i.e., they have the same key in that index), then `x` must also
conflict with `y` in the primary key.
```

`x` and `y` may belong to the same transaction or different ones. The case where `x` and `y` belong to the same transaction is trivial. If `x` and `y` belong to different transactions, then `y` must be the last prepared story in the chain corresponding to that index.

This implies that the invariant may break for some transactions when the last prepared story in a chain changes (either when another story becomes last or when the last story's `del_psn` becomes 0). All such cases must be handled: any transactions that violate the invariant (i.e., start duplicating prepared tuples) must be aborted.

Rollback often leads to changes in the last prepared story within chains. However, this case was previously overlooked, which could result in duplicates after rollback. This commit adds the missing handling for rollback scenarios.

Closes #11660
NO_DOC=bugfix
(cherry picked from commit bd6e12b)
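The invariant and the abort rule can be restated as a toy check. This is a simplified sketch: `toy_txn`, `toy_story`, `toy_violates_invariant` and `toy_abort_violators` are hypothetical names, each story has exactly one primary and one secondary key, and the way the real code walks story chains is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; the real memtx stories are far richer. */
struct toy_txn {
	bool is_aborted;
};

struct toy_story {
	int pk;                 /* key in the primary index */
	int sk;                 /* key in the secondary index */
	struct toy_txn *owner;  /* transaction that added the story */
};

/*
 * The invariant is violated when x has the same secondary key as y
 * but does not conflict with y in the primary key.
 */
bool
toy_violates_invariant(const struct toy_story *x, const struct toy_story *y)
{
	return x->sk == y->sk && x->pk != y->pk;
}

/*
 * Called when `last_prepared` becomes the last prepared story of a
 * secondary chain (for example, after a rollback). Aborts the owners of
 * all in-progress stories that now violate the invariant and returns
 * the number of transactions aborted by this call.
 */
int
toy_abort_violators(struct toy_story *in_progress, size_t count,
		    const struct toy_story *last_prepared)
{
	assert(last_prepared != NULL);
	assert(in_progress != NULL || count == 0);
	int aborted = 0;
	for (size_t i = 0; i < count; i++) {
		struct toy_story *x = &in_progress[i];
		/* Stories of the same transaction are the trivial case. */
		if (x->owner == last_prepared->owner)
			continue;
		if (!toy_violates_invariant(x, last_prepared))
			continue;
		if (!x->owner->is_aborted) {
			x->owner->is_aborted = true;
			aborted++;
		}
	}
	return aborted;
}
```

The point of the fix, in these terms, is that a check of this kind has to run whenever the last prepared story of a chain changes, including when the change is caused by a rollback.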
This commit fixes a Memtx-MVCC related bug that could lead to a dirty gap read in secondary indexes after rollback.

One REPLACE transaction (prepared but not yet committed) creates a temporary situation where another concurrent transaction cannot see a specific key in a secondary index (a "read gap"). When the first transaction then rolls back, the previous (replaced) tuple becomes visible again, and the "read gap" becomes irrelevant (the key becomes visible once more). In this case, the second transaction, which read the gap, should be aborted. However, it successfully commits, leading to a non-serializable schedule.

This commit fixes the issue by adding the necessary handling during rollback. Now, all transactions that read such irrelevant gaps are aborted.

Closes #11802
NO_DOC=bugfix
(cherry picked from commit 2a9a463)
lenkis approved these changes on Oct 29, 2025
(This PR is a backport of #11662 to `release/3.4` to a future `3.4.2` release.)

The patchset enforces serializable isolation by addressing the following MVCC bugs:
Rollback may cause duplicates in secondary indexes.
To guarantee the absence of duplicates in secondary indexes, MVCC maintains the following invariant for all in-progress transactions:

> If an in-progress story `x` conflicts with a story `y` in some secondary index (i.e., they have the same key in that index), then `x` must also conflict with `y` in the primary key.

`x` and `y` may belong to the same transaction or different ones. The case where `x` and `y` belong to the same transaction is trivial. If `x` and `y` belong to different transactions, then `y` must be the last prepared story in the chain corresponding to that index. This implies that the invariant may break for some transactions when the last prepared story in a chain changes (either when another story becomes last or when the last story's `del_psn` becomes 0). All such cases must be handled: any transactions that violate the invariant (i.e., start duplicating prepared tuples) must be aborted.

Rollback often leads to changes in the last prepared story within chains. However, this case was previously overlooked, which could result in duplicates after rollback. This patch adds the missing handling for rollback scenarios.

Closes #11660 (mvcc: rollback may cause duplicates in secondary indexes)
Insert-after-delete in single transaction may cause duplicates in secondary indexes.
The issue was related to MVCC assuming that if it inserts a tuple `x={key, ...}` that doesn't conflict with any other tuple on the primary key `key` (because it previously executed `delete(key)`, removing some tuple `y={key, ...}` with the same primary key), then `x` couldn't possibly be a duplicate in any secondary index. However, this assumption is false. To make such a conclusion for a specific secondary index, the deleted tuple `y` must have the same key value in that index as tuple `x`.

The problem was connected with the `is_own_change` flag in the transactional statement. Its value said nothing about secondary indexes.

Similar flags were introduced separately for each index. The statement-level flag remains, but it now has a different name (`is_own_delete`) and semantics. This flag is only used for DELETE statements; for INSERT/REPLACE, it is always `false`. `delete_stmt->is_own_delete` means the statement will either delete some tuple from the same transaction or won't delete anything because the same transaction previously deleted this key.

For INSERT/REPLACE statements, `stmt->is_own_change` has been replaced by `stmt->add_story->link[0].is_own_change`, which is equivalent to the `stmt->is_own_change` that existed before this commit.

Closes #11686 (mvcc: insert-after-delete in single transaction may cause duplicates in secondary indexes)
Get-after-replace in single transaction may cause dirty read in secondary index.
The issue was related to MVCC assuming that if it performed a `replace` with a new tuple `x={key, ...}` that also deleted some tuple `y={key, key2}` with the same primary key, then a subsequent `index.sk:get(key2)` in the same transaction wouldn't return anything. This would be true if the transaction had deleted `y` using `delete` rather than `replace`, because `delete` guarantees that it removes exactly what was returned to the user: either `y` was inserted by another transaction and the read is tracked, or `y` was inserted by the same transaction, in which case the guarantee arises automatically.

This issue was automatically fixed by the same `is_own_change` flag-related fix described above. Now when MVCC performs `get` on a secondary key, it checks the `is_own_change` flag for that secondary index and determines whether it can automatically guarantee it won't read anything, or whether it needs to create a `gap` tracker to enforce this guarantee.

Closes #11687 (mvcc: get-after-replace in single transaction may cause dirty read in secondary index)
Rollback may cause dirty gap read in secondary index.
The essence of the problem: One REPLACE transaction (prepared but not yet committed) creates a temporary situation where another concurrent transaction cannot see a specific key in a secondary index (a "read gap"). When the first transaction then rolls back, the previous (replaced) tuple becomes visible again, and the "read gap" becomes irrelevant (the key becomes visible once more). In this case, the second transaction, which read the gap, should be aborted. However, it successfully commits, leading to a non-serializable schedule.
This patch fixes the issue by adding the necessary handling during rollback. Now, all transactions that read such irrelevant gaps are aborted.

Closes #11802 (mvcc: rollback may cause dirty gap read)