vinyl: fix deferred delete reader not aborted on WAL error #11990

Merged
locker merged 1 commit into tarantool:master from locker:gh-11969-vy-deferred-delete-reader-abort-on-wal-error-fix on Oct 29, 2025

Conversation

@locker locker commented Oct 27, 2025

With deferred DELETEs enabled, space.delete() writes the DELETE statement only to the primary index write set, while statements for secondary indexes are generated either when the transaction is prepared (if the deleted tuple is found in the memory layer or cache) or on compaction.

Now imagine the following scenario:

  1. There are tuples {1,1} and {2,2} in the memory layer of a space with deferred DELETEs enabled, a primary index over field 1, and a unique secondary index over field 2.
  2. A transaction deletes {2,2} by the primary key and blocks on WAL. When the statement is executed, the DELETE statement is written only to the primary index write set because deferred DELETEs are enabled, but when the transaction is prepared (vy_tx_prepare()), the corresponding statement for the secondary index is generated because the deleted tuple is found in the memory layer, see vy_tx_handle_deferred_delete().
  3. In the meantime, another transaction replaces {1,1} with {1,2} and yields execution. It doesn't violate the unique constraint of the secondary index because {2,2} has been deleted, although the deletion is not yet confirmed.
  4. Imagine a transient WAL error happens and the first transaction is rolled back. The second transaction must be aborted too in this case because it implicitly read the DELETE statement prepared but not confirmed by the first transaction. Instead, it is successfully committed, resulting in a unique constraint violation!
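
The scenario above can be replayed in a toy model. This is only a sketch: the `toy_*` names and the single `deleted` flag are assumptions made for illustration and do not correspond to real vinyl structures. It shows that the secondary key 2 ends up duplicated unless the reader of the prepared DELETE is aborted on rollback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: one tuple {pk, sk} plus a flag set by a DELETE. */
struct toy_tuple {
	int pk;
	int sk;
	bool deleted;
};

/* Count visible tuples whose secondary key equals sk. */
int
toy_count_sk(const struct toy_tuple *space, size_t n, int sk)
{
	int count = 0;
	for (size_t i = 0; i < n; i++) {
		if (!space[i].deleted && space[i].sk == sk)
			count++;
	}
	return count;
}

/*
 * Replay steps 1-4. Returns how many visible tuples end up with
 * secondary key 2; anything above 1 is a unique constraint violation.
 */
int
toy_run_scenario(bool abort_readers_on_rollback)
{
	/* Step 1: {1,1} and {2,2} are in the space. */
	struct toy_tuple space[] = {{1, 1, false}, {2, 2, false}};
	size_t n = sizeof(space) / sizeof(space[0]);

	/* Step 2: tx1 prepares the DELETE of {2,2} and blocks on WAL. */
	space[1].deleted = true;

	/* Step 3: tx2 checks uniqueness of sk = 2 and sees no duplicate. */
	bool tx2_read_prepared = toy_count_sk(space, n, 2) == 0;
	assert(tx2_read_prepared);
	bool tx2_aborted = false;

	/* Step 4: WAL error, tx1 is rolled back and {2,2} reappears. */
	space[1].deleted = false;
	if (abort_readers_on_rollback && tx2_read_prepared)
		tx2_aborted = true;

	/* tx2 commits REPLACE {1,1} -> {1,2} unless it was aborted. */
	if (!tx2_aborted)
		space[0].sk = 2;

	return toy_count_sk(space, n, 2);
}
```

In this model, `toy_run_scenario(false)` leaves two tuples with secondary key 2, while `toy_run_scenario(true)` leaves one.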

When a transaction is rolled back due to a WAL write error, we do abort all transactions that read statements prepared by it, see vy_tx_rollback_after_prepare(), but to do that, we iterate over statements that were inserted into the transaction write set. The problem is that the deferred DELETE statements generated by vy_tx_handle_deferred_delete() when a transaction is prepared are not inserted into the write set because we don't need them for lookups at that point. As a result, we don't abort transactions that read deferred DELETE statements from secondary indexes.

Let's fix this issue by aborting all dependent transactions for each statement inserted into or deleted from the LSM tree. We already loop over all transaction statements in vy_tx_rollback_after_prepare() and vy_tx_prepare(), so there's no need to iterate over the write set entries separately.
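
The difference between the two loops can be sketched as follows. This is a simplified illustration, not the actual patch: `toy_stmt`, `toy_abort_readers()` and the single `reader` field per statement are hypothetical. With `write_set_only` set, the loop mimics the old behavior and never reaches the deferred DELETE; without it, every statement is visited.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TOY_NO_READER = -1, TOY_TX_MAX = 4 };

/* Toy statement prepared by the transaction being rolled back. */
struct toy_stmt {
	bool in_write_set; /* false for a deferred DELETE made at prepare */
	int reader;        /* id of a tx that read it, or TOY_NO_READER */
};

/*
 * Abort readers of the rolled back statements. With write_set_only the
 * loop skips statements that are not in the write set; otherwise it
 * visits every statement. Returns the number of readers aborted.
 */
int
toy_abort_readers(const struct toy_stmt *stmts, size_t n,
		  bool write_set_only, bool aborted[TOY_TX_MAX])
{
	int count = 0;
	for (size_t i = 0; i < n; i++) {
		int reader = stmts[i].reader;
		if (write_set_only && !stmts[i].in_write_set)
			continue;
		if (reader == TOY_NO_READER)
			continue;
		assert(reader >= 0 && reader < TOY_TX_MAX);
		if (!aborted[reader]) {
			aborted[reader] = true;
			count++;
		}
	}
	return count;
}

/* tx 0 deletes {2,2}; tx 1 read the deferred DELETE in the secondary index. */
int
toy_rollback_demo(bool write_set_only)
{
	const struct toy_stmt stmts[] = {
		{true, TOY_NO_READER}, /* primary index DELETE */
		{false, 1},            /* deferred DELETE, secondary index */
	};
	bool aborted[TOY_TX_MAX] = {false};
	return toy_abort_readers(stmts, 2, write_set_only, aborted);
}
```

Here `toy_rollback_demo(true)` aborts nobody, while `toy_rollback_demo(false)` aborts the one reader of the deferred DELETE.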

There's another related issue that may result in the same anomaly. The problem is that we don't track the non-matching primary index statement in the transaction write set when we look up the full tuple by a key found in a secondary index, see vy_get_by_secondary_tuple(). Normally, this is fine because we don't care if the primary index statement is overwritten: we never returned it to the user. However, if the statement hasn't been confirmed, it may still be rolled back due to a WAL error, which reinstates the tuple matching the secondary index key and makes the lookup invalid. Note that this may happen only if generation of the deferred DELETE is postponed to compaction. Let's fix this issue by tracking unconfirmed primary index statements in vy_get_by_secondary_tuple() even if they don't match the secondary index key.
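
A rough sketch of the second fix, again under loud assumptions: `toy_pk_stmt` and its flags are invented for illustration, and a real lookup deals with full statements rather than booleans. The point is only that a non-matching primary index statement is worth tracking while it is unconfirmed, because its rollback can revive the tuple the reader skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy view of the newest primary index statement for one key. */
struct toy_pk_stmt {
	bool is_delete;    /* a DELETE hides the tuple the secondary key points at */
	bool confirmed;    /* false while the writer still waits for WAL */
	bool read_tracked; /* set when a reader records a dependency on it */
};

/*
 * Resolve a secondary index hit against the primary index.
 * Returns true if the full tuple is visible. With track_unconfirmed
 * set, a non-matching statement is tracked while it is unconfirmed.
 */
bool
toy_get_by_secondary(struct toy_pk_stmt *pk, bool track_unconfirmed)
{
	assert(pk != NULL);
	if (!pk->is_delete) {
		pk->read_tracked = true; /* matching tuple is returned */
		return true;
	}
	if (track_unconfirmed && !pk->confirmed)
		pk->read_tracked = true; /* rollback could revive the tuple */
	return false;
}

/* Returns true if the reader is aborted when the DELETE is rolled back. */
bool
toy_reader_aborted_on_rollback(bool track_unconfirmed)
{
	struct toy_pk_stmt pk = {true, false, false};
	bool found = toy_get_by_secondary(&pk, track_unconfirmed);
	assert(!found); /* the reader skipped the secondary index entry */
	(void)found;
	/* WAL error: only tracked readers learn that their read is stale. */
	return pk.read_tracked;
}
```

Without tracking, `toy_reader_aborted_on_rollback(false)` reports that the reader survives the rollback; with tracking it is aborted.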

Closes #11969

@locker locker requested a review from a team as a code owner October 27, 2025 13:18
coveralls commented Oct 27, 2025

Coverage Status

coverage: 87.665% (remained the same) when pulling 3904a04 on locker:gh-11969-vy-deferred-delete-reader-abort-on-wal-error-fix into 9941bd5 on tarantool:master.

@drewdzzz drewdzzz assigned locker and unassigned drewdzzz Oct 28, 2025
@locker locker force-pushed the gh-11969-vy-deferred-delete-reader-abort-on-wal-error-fix branch from 47b9f14 to b55bd8c Compare October 28, 2025 12:06
@locker locker assigned drewdzzz and unassigned locker Oct 28, 2025
@locker locker requested a review from drewdzzz October 28, 2025 12:18
@drewdzzz drewdzzz left a comment

Looks good to me!

@drewdzzz drewdzzz assigned locker and unassigned drewdzzz Oct 29, 2025
NO_DOC=bug fix
@locker locker force-pushed the gh-11969-vy-deferred-delete-reader-abort-on-wal-error-fix branch from b55bd8c to 3904a04 Compare October 29, 2025 08:26
@locker locker added the full-ci Enables all tests for a pull request label Oct 29, 2025
@locker locker merged commit b81f858 into tarantool:master Oct 29, 2025
56 of 59 checks passed
@locker locker deleted the gh-11969-vy-deferred-delete-reader-abort-on-wal-error-fix branch October 29, 2025 10:33

locker commented Oct 29, 2025

Cherry-picked to 3.2, 3.3, 3.4, 3.5.

Labels

full-ci Enables all tests for a pull request

Development

Successfully merging this pull request may close these issues.

vinyl: rollback may cause duplicate in secondary index with deferred deletes

4 participants