
[backport 3.4] mvcc: ensure serializability by fixing dirty reads and secondary index duplicates #11985

Merged
sergepetrenko merged 6 commits into release/3.4 from backport/release/3.4/11662 on Oct 31, 2025



@TarantoolBot commented Oct 27, 2025

(This PR is a backport of #11662 to release/3.4 for the future 3.4.2 release.)


The patchset enforces serializable isolation by addressing the following MVCC bugs:

  • Rollback may cause duplicates in secondary indexes.
    To guarantee the absence of duplicates in secondary indexes, MVCC maintains the following invariant for all in-progress transactions:

    If an in-progress story `x` conflicts with a story `y` in some secondary index
    (i.e., they have the same key in that index), then `x` must also conflict with `y` in the primary key.
    

    `x` and `y` may belong to the same transaction or to different ones. The case where they belong to the same transaction is trivial. If they belong to different transactions, then `y` must be the last prepared story in the chain corresponding to that index. Hence the invariant may break for some transactions whenever the last prepared story in a chain changes (either because another story becomes last or because the last story's `del_psn` becomes 0). All such cases must be handled: any transaction that violates the invariant (i.e., starts duplicating prepared tuples) must be aborted.

    Rollback often leads to changes in the last prepared story within chains. However, this case was previously overlooked, which could result in duplicates after rollback. This patch adds the missing handling for rollback scenarios.

    Closes #11660 (mvcc: rollback may cause duplicates in secondary indexes)

  • Insert-after-delete in single transaction may cause duplicates in secondary indexes.
    The issue stemmed from MVCC assuming that if a transaction inserts a tuple `x={key, ...}` that does not conflict with any other tuple on the primary key `key` (because it previously executed `delete(key)`, removing some tuple `y={key, ...}` with the same primary key), then `x` cannot be a duplicate in any secondary index. This assumption is false: to draw that conclusion for a specific secondary index, the deleted tuple `y` must have the same key in that index as `x`.

    The problem lay in the `is_own_change` flag of the transactional statement: its value said nothing about secondary indexes.

    Now a separate flag of this kind is kept for each index. The statement-level flag remains, but it has a new name (`is_own_delete`) and new semantics. It is only used for DELETE statements; for INSERT/REPLACE it is always false. `delete_stmt->is_own_delete` means the statement will either delete a tuple inserted by the same transaction or delete nothing because the same transaction has already deleted this key.

    For INSERT/REPLACE statements, `stmt->is_own_change` has been replaced by `stmt->add_story->link[0].is_own_change`, which is equivalent to the `stmt->is_own_change` that existed before this change.

    Closes #11686 (mvcc: insert-after-delete in single transaction may cause duplicates in secondary indexes)

  • Get-after-replace in single transaction may cause dirty read in secondary index.
    The issue stemmed from MVCC assuming that if a transaction performed a replace with a new tuple `x={key, ...}` that also removed some tuple `y={key, key2}` with the same primary key, then a subsequent `index.sk:get(key2)` in the same transaction could not return anything. That would be true if the transaction had removed `y` with delete rather than replace, because delete guarantees that it removes exactly the tuple returned to the user: if `y` was inserted by another transaction, delete tracks the read; if `y` was inserted by the same transaction, the guarantee holds automatically. Replace gives no such guarantee.

    This issue is fixed by the same per-index `is_own_change` change described above. Now, when MVCC performs a get on a secondary key, it checks the `is_own_change` flag of that secondary index to decide whether the empty result is already guaranteed or whether a gap tracker must be created to enforce it.

    Closes #11687 (mvcc: get-after-replace in single transaction may cause dirty read in secondary index)

  • Rollback may cause dirty gap read in secondary index.
    The essence of the problem: a REPLACE transaction (prepared but not yet committed) creates a temporary situation where another concurrent transaction cannot see a specific key in a secondary index (a "read gap"). When the first transaction then rolls back, the previous (replaced) tuple becomes visible again, and the "read gap" becomes irrelevant (the key becomes visible once more). In this case, the second transaction, which read the gap, should be aborted. However, it commits successfully, leading to a non-serializable schedule.

    This patch fixes the issue by adding the necessary handling during rollback: all transactions that read such irrelevant gaps are now aborted.

    Closes #11802 (mvcc: rollback may cause dirty gap read)

Introduces the `abort_on_rollback` flag for inplace gap items to control
whether gap-reader transactions are aborted on rollback. This change is a
prerequisite for the upcoming fixes related to per-index `is_own_change`
flags; it helps prevent new false-positive aborts.

The subsequent commit "memtx: introduce `is_own_change` flag for each
secondary index", which fixes issue #11686, shows that some gap-reader
transactions must be aborted only while a replace/insert statement is
being prepared, but not during a rollback.

```
space:replace{21, 30} -- committed tuple `{21, 30}`
TXN1 begin()
TXN1 space:replace{21, 31}
TXN1 space.index.sk:select{30} -> {}
TXN2 begin()
TXN2 space:delete{21}
TXN2 commit() -> rollback due to WAL IO error
```

- The operation `space.index.sk:select{30}` in `TXN1` now correctly
tracks a gap, because `space:replace{21, 31}` in `TXN1` does not
guarantee that some tuple `{x, 30}`, `x != 21`, from another concurrent
transaction won't be inserted and committed before
`space.index.sk:select{30}` in `TXN1` (see #11686, #11687).
- Transaction `TXN2` executes `space:delete{21}`.
- Transaction `TXN2` is rolled back. Since this `space:delete{21}` deleted
the tuple `{21, 30}`, that tuple reappears after the rollback.

It turns out that not every gap-reader transaction for key `30` in the
secondary index needs to be aborted. Specifically, the gap reader `TXN1`
can be skipped: `{21, 30}` is the tuple that `TXN1` itself replaced, so it
stays invisible to `TXN1` even after it reappears. All inplace gap items
corresponding to such transactions can be marked with
`abort_on_rollback = false` at the moment of their creation.
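A minimal C sketch of the prepare-versus-rollback distinction described above, assuming a simplified model: `struct gap_item`, `gap_on_prepare()` and `gap_on_rollback()` are hypothetical names, not the memtx API, and each item records only its reader and the `abort_on_rollback` value chosen when the item was created. In this model, preparing an insert/replace into the gap aborts every reader, while a rollback that makes a tuple reappear aborts only readers whose items carry `abort_on_rollback = true`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inplace gap item: transaction `reader_id` observed
 * "no tuple" at some secondary-index key. */
struct gap_item {
	int reader_id;
	bool abort_on_rollback; /* decided once, when the item is created */
	bool reader_aborted;    /* set when the reader must be aborted */
};

/* Preparing an INSERT/REPLACE that puts a tuple into the gap
 * invalidates every reader of that gap. */
void
gap_on_prepare(struct gap_item *items, size_t count)
{
	for (size_t i = 0; i < count; i++)
		items[i].reader_aborted = true;
}

/* A rollback that makes a tuple reappear in the gap aborts only the
 * readers whose items were created with abort_on_rollback = true. */
void
gap_on_rollback(struct gap_item *items, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (items[i].abort_on_rollback)
			items[i].reader_aborted = true;
	}
}
```

In the `TXN1`/`TXN2` scenario, `TXN1`'s gap item for key `30` would be created with `abort_on_rollback = false`, so the rollback of `TXN2` leaves `TXN1` running.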

Part of #11686, #11687

NO_DOC=bugfix
NO_TEST=will be added later
NO_CHANGELOG=will be added later

(cherry picked from commit 9a000e5)

Removed the `is_own_change` output parameter from `check_dup()` in
`memtx_tx.c`; the function now directly modifies `txn_stmt::is_own_change`.
This change prepares for further modifications where the ownership
check logic will be extended (implemented in the next commit).

Part of #11686, #11687

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

(cherry picked from commit 78fc900)

This commit fixes two Memtx-MVCC related issues:

- A bug where a transaction performing insert-after-delete with the same
  primary key (e.g., delete{4} followed by insert{4, 3}) could create
  secondary key duplicates.

- A bug where a transaction performing get-after-replace could dirty-read
  an empty result in a secondary index.

Both problems were caused by the `is_own_change` flag of the
transactional statement: its value said nothing about secondary
indexes.

Now a similar flag is kept separately for each index.
The statement-level flag remains, but it has a new name
(`is_own_delete`) and new semantics. This flag is only used for DELETE
statements; for INSERT/REPLACE, it is always `false`.
`delete_stmt->is_own_delete` means the statement will either delete a
tuple inserted by the same transaction or delete nothing because the same
transaction has already deleted this key.

For INSERT/REPLACE statements, `stmt->is_own_change` has been replaced by
`stmt->add_story->link[0].is_own_change`, which is equivalent to the
`stmt->is_own_change` that existed before this commit.
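The difference between the old statement-level flag and the per-index flags can be sketched in C. This is a simplified model, not the memtx code: tuples have exactly two key fields (index 0 is the primary key, index 1 a secondary key), and `compute_is_own_change()` and `stmt_is_own_change()` are hypothetical helpers. For `delete{4}` followed by `insert{4, 3}`, assuming the deleted tuple was `{4, 5}`, the statement-level flag is true, but only the primary-key flag is set, so the secondary index still needs a duplicate check (or, for a `get`, a gap tracker).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout: index 0 is the primary key, index 1 a secondary key. */
#define INDEX_COUNT 2

struct tuple {
	int field[INDEX_COUNT]; /* field[i] is the key of index i */
};

/* Old reasoning: a single flag for the whole statement, true whenever
 * the same transaction already removed a tuple with this primary key. */
bool
stmt_is_own_change(const struct tuple *deleted_by_self)
{
	return deleted_by_self != NULL;
}

/* Per-index reasoning: the transaction's own delete "covers" index i
 * only if the deleted tuple had the same key as the new one in index i. */
void
compute_is_own_change(const struct tuple *new_tuple,
		      const struct tuple *deleted_by_self,
		      bool is_own_change[INDEX_COUNT])
{
	for (int i = 0; i < INDEX_COUNT; i++) {
		is_own_change[i] = deleted_by_self != NULL &&
				   deleted_by_self->field[i] ==
				   new_tuple->field[i];
	}
}
```

If, for example, a committed tuple `{7, 3}` existed, the cleared secondary-index flag is what tells MVCC that `{4, 3}` must still be checked against it.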

Closes #11686, #11687

NO_DOC=bugfix

(cherry picked from commit 76ba7ee)

Transactions that create duplicates must be handled (aborted) not only
during prepare, but also during rollback (see the next commit). Factor
this handling out into a separate function.

Part of #11660

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

(cherry picked from commit d4f9f9b)

This commit fixes a Memtx-MVCC related bug that could lead to duplicates
in secondary indexes after rollback.

To guarantee the absence of duplicates in secondary indexes, MVCC
maintains the following invariant for all in-progress transactions:
```
If an in-progress story `x` conflicts with a story `y` in some secondary
index (i.e., they have the same key in that index), then `x` must also
conflict with `y` in the primary key.
```
`x` and `y` may belong to the same transaction or different ones. The case
where `x` and `y` belong to the same transaction is trivial. If `x` and `y`
belong to different transactions, then `y` must be the last prepared story
in the chain corresponding to that index. This implies that the invariant
may break for some transactions when the last prepared story in a chain
changes (either when another story becomes last or when the last story's
`del_psn` becomes 0). All such cases must be handled: any transaction
that violates the invariant (i.e., starts duplicating prepared tuples)
must be aborted.

Rollback often leads to changes in the last prepared story within chains.
However, this case was previously overlooked, which could result in
duplicates after rollback. This commit adds the missing handling for
rollback scenarios.
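A C sketch of the check that has to run whenever the last prepared story of a secondary-key chain changes, including on rollback. Assumptions: the chain for one secondary key is an array ordered oldest first, committed stories are modeled as prepared, `del_prepared` stands in for `del_psn != 0`, and `abort_secondary_duplicates()` is a hypothetical name for a helper shared by the prepare and rollback paths. The test scenario matches the `del_psn` case: an in-progress insert is allowed while a prepared delete hides the old tuple, and becomes a duplicate once that delete is rolled back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum story_state { STORY_IN_PROGRESS, STORY_PREPARED, STORY_ROLLED_BACK };

/* Hypothetical story in the chain of one secondary-index key. */
struct story {
	int txn_id;
	int pk;                 /* primary key of the story's tuple */
	enum story_state state; /* committed stories are modeled as prepared */
	bool del_prepared;      /* models del_psn != 0 */
	bool txn_aborted;       /* set when the owning transaction is aborted */
};

/* Newest prepared story in the chain (stored oldest first), or NULL.
 * Rolled-back stories are skipped, so an older story can become last. */
struct story *
last_prepared(struct story *chain, size_t count)
{
	for (size_t i = count; i > 0; i--) {
		if (chain[i - 1].state == STORY_PREPARED)
			return &chain[i - 1];
	}
	return NULL;
}

/* Enforce the invariant: an in-progress story of another transaction may
 * share this secondary key with the last live prepared story only if it
 * also shares its primary key. Called after prepare and after rollback. */
void
abort_secondary_duplicates(struct story *chain, size_t count)
{
	struct story *y = last_prepared(chain, count);
	if (y == NULL || y->del_prepared)
		return; /* no live prepared tuple at this key */
	for (size_t i = 0; i < count; i++) {
		struct story *x = &chain[i];
		if (x->state == STORY_IN_PROGRESS && x->txn_id != y->txn_id &&
		    x->pk != y->pk)
			x->txn_aborted = true;
	}
}
```

Sharing one helper between the two paths is what keeps the rollback case from being overlooked again: any code path that changes the last prepared story simply calls it afterwards.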

Closes #11660

NO_DOC=bugfix

(cherry picked from commit bd6e12b)

This commit fixes a Memtx-MVCC related bug that could lead to dirty gap
read in secondary indexes after rollback.

A REPLACE transaction (prepared but not yet committed) creates a
temporary situation where another concurrent transaction cannot see a
specific key in a secondary index (a "read gap"). When the first
transaction then rolls back, the previous (replaced) tuple becomes visible
again, and the "read gap" becomes irrelevant (the key becomes visible once
more). In this case, the second transaction, which read the gap, should be
aborted. However, it commits successfully, leading to a non-serializable
schedule.

This commit fixes the issue by adding the necessary handling during
rollback. Now all transactions that read such irrelevant gaps are aborted.
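A minimal C model of the anomaly and its fix, assuming a single secondary-index key slot holding at most one committed tuple; `struct sk_slot`, `sk_get()` and `sk_rollback_replace()` are hypothetical names, not the memtx API. `T1` prepares a replace of `{1, 30}` by `{1, 31}`, `T2` reads key `30` and sees a gap, and `T1` then rolls back. The `abort_on_rollback` refinement from the first commit is omitted here, so every recorded gap reader is aborted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_GAP_READERS 4

/* Hypothetical state of one secondary-index key. */
struct sk_slot {
	bool committed_tuple;     /* a committed tuple has this key */
	bool deleted_by_prepared; /* a prepared REPLACE removed it from the key */
	int gap_readers[MAX_GAP_READERS]; /* txns that saw nothing here */
	size_t gap_reader_count;
};

bool
sk_visible(const struct sk_slot *s)
{
	return s->committed_tuple && !s->deleted_by_prepared;
}

/* A read that finds nothing records the reader as a gap reader
 * (this toy model silently drops readers beyond MAX_GAP_READERS). */
bool
sk_get(struct sk_slot *s, int txn_id)
{
	if (sk_visible(s))
		return true;
	if (s->gap_reader_count < MAX_GAP_READERS)
		s->gap_readers[s->gap_reader_count++] = txn_id;
	return false;
}

/* Rolling back the prepared REPLACE makes the old tuple visible again,
 * so every recorded gap reader observed a state that will never commit.
 * Writes the ids of readers to abort to `aborted_out` and returns how
 * many there are. */
size_t
sk_rollback_replace(struct sk_slot *s, int *aborted_out)
{
	size_t n = 0;
	s->deleted_by_prepared = false;
	if (sk_visible(s)) {
		for (size_t i = 0; i < s->gap_reader_count; i++)
			aborted_out[n++] = s->gap_readers[i];
		s->gap_reader_count = 0;
	}
	return n;
}
```

Before the fix, the equivalent of `sk_rollback_replace()` only restored visibility and left `T2` free to commit, which is exactly the non-serializable schedule described above.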

Closes #11802

NO_DOC=bugfix

(cherry picked from commit 2a9a463)

@TarantoolBot requested a review from a team as a code owner October 27, 2025 09:48
@TarantoolBot changed the title from "[Backport release/3.4] mvcc: ensure serializability by fixing dirty reads and secondary index duplicates" to "[backport 3.4] mvcc: ensure serializability by fixing dirty reads and secondary index duplicates" Oct 27, 2025
@coveralls

Coverage Status: coverage 87.573% (+0.008%) from 87.565% when pulling 0b418e2 on backport/release/3.4/11662 into 27b8df7 on release/3.4.

@sergepetrenko merged commit 74ed973 into release/3.4 Oct 31, 2025
25 checks passed
@sergepetrenko deleted the backport/release/3.4/11662 branch October 31, 2025 14:35

5 participants