vinyl: fix deferred delete reader not aborted on WAL error #11990
Merged
locker merged 1 commit into tarantool:master on Oct 29, 2025
drewdzzz
reviewed
Oct 28, 2025
With deferred DELETEs enabled, `space.delete()` writes the DELETE
statement to the primary index write set only, while statements for
secondary indexes are generated either when the transaction is prepared
(if the deleted tuple is found in the memory layer or cache) or on
compaction.
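The split described above can be sketched with a toy model. This is not Tarantool code: `ToyTx`, `toy_delete()` and `toy_prepare()` are hypothetical names, and the memory layer is assumed to be a plain dict from primary key to tuple.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTx:
    """Hypothetical stand-in for a vinyl transaction (illustration only)."""
    primary_write_set: list = field(default_factory=list)
    secondary_prepared: list = field(default_factory=list)


def toy_delete(tx, pk):
    # With deferred DELETEs, executing the request touches only
    # the primary index write set.
    tx.primary_write_set.append(("DELETE", pk))


def toy_prepare(tx, memory_layer):
    # memory_layer: primary key -> full tuple found in memory (or cache).
    # Returns the primary keys whose secondary DELETE is left to compaction.
    left_for_compaction = []
    for _op, pk in tx.primary_write_set:
        old = memory_layer.get(pk)
        if old is not None:
            # The old tuple is known, so the secondary index DELETE
            # (keyed by field 2) can be generated right away.
            tx.secondary_prepared.append(("DELETE", old[1]))
        else:
            left_for_compaction.append(pk)
    return left_for_compaction
```

In this sketch the secondary index statement appears only in `secondary_prepared`, never in the write set, which is the property the rest of the description relies on.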
Now imagine the following scenario:
1. There are tuples `{1,1}` and `{2,2}` in the memory layer of a space
with enabled deferred DELETEs, the primary index over field 1, and
a unique secondary index over field 2.
2. A transaction deletes `{2,2}` by the primary key and blocks on WAL.
When the statement is executed, the DELETE statement is written only
to the primary index write set because of enabled deferred DELETEs,
but when the transaction is prepared (`vy_tx_prepare()`),
the corresponding statement for the secondary index is generated
because the deleted tuple is found in the in-memory layer, see
`vy_tx_handle_deferred_delete()`.
3. In the meantime, another transaction replaces `{1,1}` with `{1,2}`
and yields execution. It doesn't violate the unique constraint of
the secondary index because `{2,2}` has been deleted, although the
deletion is not yet confirmed.
4. Imagine a transient WAL error happens and the first transaction is
rolled back. The second transaction must be aborted too in this case
because it implicitly read the DELETE statement prepared but not
confirmed by the first transaction, but it is successfully committed,
resulting in the unique constraint violation!
When a transaction is rolled back due to a WAL write error, we do
abort all transactions that read statements prepared by it, see
`vy_tx_rollback_after_prepare()`, but to do that, we iterate over
statements that were inserted into the transaction write set.
The problem is, we don't insert deferred DELETE statements generated
when a transaction is prepared in `vy_tx_handle_deferred_delete()` into
the write set because we don't need them for lookups at that point.
As a result, we don't abort transactions that read deferred DELETE
statements from secondary indexes.
Let's fix this issue by aborting all dependent transactions for each
statement inserted into or deleted from the LSM tree. We already loop
over all transaction statements in `vy_tx_rollback_after_prepare()` and
`vy_tx_prepare()`, so there's no need to iterate over the write set
entries separately.
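To make the difference concrete, here is a minimal simulation of the scenario above. It is a sketch under stated assumptions, not the vinyl implementation: `ToyStmt`, `ToyTx`, the `applied` list and both rollback helpers are hypothetical, and "reading a prepared statement" is modelled as appending the reader to the statement's `readers` list.

```python
class ToyStmt:
    """A statement applied to the in-memory LSM level (hypothetical)."""

    def __init__(self, index, op, key):
        self.index, self.op, self.key = index, op, key
        self.readers = []  # transactions that read this prepared statement


class ToyTx:
    def __init__(self, name):
        self.name = name
        self.aborted = False
        self.write_set = []  # statements added when the request is executed
        self.applied = []    # every statement applied to the LSM tree on prepare


def rollback_via_write_set(tx):
    # Behaviour described as broken: only write set entries are visited.
    for stmt in tx.write_set:
        for reader in stmt.readers:
            reader.aborted = True


def rollback_via_applied(tx):
    # Behaviour described as the fix: visit every applied statement.
    for stmt in tx.applied:
        for reader in stmt.readers:
            reader.aborted = True


def run_scenario(rollback):
    tx1, tx2 = ToyTx("tx1"), ToyTx("tx2")
    # tx1 deletes {2,2} by primary key: only the primary DELETE
    # goes to the write set.
    pk_delete = ToyStmt("primary", "DELETE", 2)
    tx1.write_set.append(pk_delete)
    # On prepare, the deferred DELETE for the secondary index is
    # generated and applied, but it is not added to the write set.
    sk_delete = ToyStmt("secondary", "DELETE", 2)
    tx1.applied.extend([pk_delete, sk_delete])
    # tx2 replaces {1,1} with {1,2}; its uniqueness check reads
    # the unconfirmed secondary DELETE.
    sk_delete.readers.append(tx2)
    # A WAL error rolls tx1 back.
    rollback(tx1)
    return tx2.aborted
```

In this model `run_scenario(rollback_via_write_set)` leaves the second transaction alive, while `run_scenario(rollback_via_applied)` aborts it.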
There's another related issue that may result in the same anomaly.
The problem is that we don't track the non-matching primary index
statement in the transaction write set when we look up the full tuple
by a key found in a secondary index, see `vy_get_by_secondary_tuple()`.
Normally, this is fine because we don't care if the primary index
statement is overwritten - we never returned it to the user. However,
if the statement hasn't been confirmed, it may still be rolled back
due to a WAL error, reinstating the tuple matching the secondary
index key and making the lookup invalid. Note that this may happen
only if generation of the deferred DELETE is postponed to compaction.
Let's fix this issue by tracking unconfirmed primary index statements
in `vy_get_by_secondary_tuple()` even if they don't match the secondary
index key.
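The second fix can be illustrated the same way. The sketch below is a simplified model, not the real `vy_get_by_secondary_tuple()`: the class and function names, the `confirmed` flag and the `track_unconfirmed` switch are assumptions made for illustration.

```python
class ToyStmt:
    """Newest primary index statement for a key (hypothetical model)."""

    def __init__(self, tuple_, confirmed):
        self.tuple = tuple_         # None models a DELETE
        self.confirmed = confirmed  # False while the writer waits on WAL
        self.readers = []


class ToyReader:
    def __init__(self):
        self.aborted = False


def toy_get_by_secondary(reader, sk, pk, primary_index, track_unconfirmed):
    # Look up the full tuple for an entry found in the secondary index.
    stmt = primary_index[pk]
    matches = stmt.tuple is not None and stmt.tuple[1] == sk
    # A matching statement is always tracked. A non-matching one is
    # tracked only if it is unconfirmed and the fix is enabled.
    if matches or (track_unconfirmed and not stmt.confirmed):
        stmt.readers.append(reader)
    return stmt.tuple if matches else None


def toy_rollback(stmt):
    # A WAL error rolls the statement back and aborts its readers.
    for reader in stmt.readers:
        reader.aborted = True


def run_lookup(track_unconfirmed):
    reader = ToyReader()
    # Key 2 holds an unconfirmed DELETE. The stale secondary entry
    # (sk=2 -> pk=2) is still visible because the deferred DELETE
    # is postponed to compaction.
    primary_index = {2: ToyStmt(None, confirmed=False)}
    found = toy_get_by_secondary(reader, 2, 2, primary_index, track_unconfirmed)
    toy_rollback(primary_index[2])
    return found, reader.aborted
```

Without tracking, the reader concludes that the key is absent and survives the rollback that brings the tuple back; with tracking, the same rollback aborts it.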
Closes tarantool#11969
NO_DOC=bug fix
lenkis
approved these changes
Oct 29, 2025
Cherry-picked to 3.2, 3.3, 3.4, 3.5.