Skip to content

vinyl: fix abort of writers on DDL#10916

Merged
locker merged 1 commit intotarantool:masterfrom
locker:vy-abort-writers-on-ddl-fix
Dec 12, 2024
Merged

vinyl: fix abort of writers on DDL#10916
locker merged 1 commit intotarantool:masterfrom
locker:vy-abort-writers-on-ddl-fix

Conversation

@locker
Copy link
Member

@locker locker commented Dec 10, 2024

When a DDL statement is committed, all transactions writing to the space must be aborted so that they don't access the stale txn_stmt::space pointer on commit. In Vinyl we abort all writers from the special space_invalidate engine callback, where we iterate over all writing transactions and abort those of them that have a write set entry pointing to the primary key of the invalidated space, see commit d3e1236 ("vinyl: abort affected transactions when space is removed from cache"). We also abort transactions that haven't yet created a write set entry because they need to read the disk first, for example, to perform a tuple update, see commit 4f62ec2 ("vinyl: make tx_manager_abort_writers_for_ddl more thorough").

However, as spotted by @drewdzzz, this code doesn't always work as expected. The problem is, unless dropped, the primary index is moved to the new space container so using the primary index stored in the old space container to abort transactions is wrong: there can't possibly be any transactions associated with it. We don't get a crash because we have an extra check in vy_lsm_set that aborts the transaction if the space format was updated (it is updated on every space change), see commit e3c5fde ("vinyl: fix DDL with active transactions").

Another problem with space_invalidate is that it doesn't abort transactions that turned out to be no-op, such as an update of a non-existing key because such transactions don't add any entries to the write set. Still, they must be aborted because they have a txn_stmt pointing to the old space. Letting them proceed to commit would result in a use-after-free bug, see #10707.

Let's fix space_invalidate by making it iterate over all statements of writing transactions instead of checking their Vinyl write sets. To achieve that, we store a pointer to struct txn in struct vy_tx. This lets us drop vy_tx::last_stmt_space and vy_tx::isolation because they can be accessed via txn. We also replace the format check in vy_lsm_set mentioned above with an assertion because such transactions should be aborted by space_invalidate now.

Closes #10707

When a DDL statement is committed, all transactions writing to the space
must be aborted so that they don't access the stale `txn_stmt::space`
pointer on commit. In Vinyl we abort all writers from the special
`space_invalidate` engine callback, where we iterate over all writing
transactions and abort those of them that have a write set entry
pointing to the primary key of the invalidated space, see commit
d3e1236 ("vinyl: abort affected transactions when space is removed
from cache"). We also abort transactions that haven't yet created
a write set entry because they need to read the disk first, for example,
to perform a tuple update, see commit 4f62ec2 ("vinyl: make
tx_manager_abort_writers_for_ddl more thorough").

However, as spotted by @drewdzzz, this code doesn't always work as
expected. The problem is, unless dropped, the primary index is moved to
the new space container so using the primary index stored in the old
space container to abort transactions is wrong: there can't possibly be
any transactions associated with it. We don't get a crash because we
have an extra check in `vy_lsm_set` that aborts the transaction if
the space format was updated (it is updated on every space change),
see commit e3c5fde ("vinyl: fix DDL with active transactions").

Another problem with `space_invalidate` is that it doesn't abort
transactions that turned out to be no-op, such as an update of
a non-existing key because such transactions don't add any entries
to the write set. Still, they must be aborted because they have
a `txn_stmt` pointing to the old space. Letting them proceed to commit
would result in a use-after-free bug, see tarantool#10707.

Let's fix `space_invalidate` by making it iterate over all statements
of writing transactions instead of checking their Vinyl write sets.
To achieve that, we store a pointer to `struct txn` in `struct vy_tx`.
This lets us drop `vy_tx::last_stmt_space` and `vy_tx::isolation`
because they can be accessed via `txn`. We also replace the format
check in `vy_lsm_set` mentioned above with an assertion because such
transactions should be aborted by `space_invalidate` now.

Closes tarantool#10707

NO_DOC=bug fix
@locker locker requested a review from a team as a code owner December 10, 2024 11:12
@locker locker requested a review from drewdzzz December 10, 2024 11:33
@coveralls
Copy link

Coverage Status

coverage: 87.349% (-0.002%) from 87.351%
when pulling ca5f165 on locker:vy-abort-writers-on-ddl-fix
into d64e29d
on tarantool:master
.

@drewdzzz drewdzzz assigned locker and unassigned drewdzzz Dec 11, 2024
@drewdzzz drewdzzz self-requested a review December 11, 2024 09:30
Copy link
Contributor

@drewdzzz drewdzzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, it's nice that we don't need to handle writes that haven't written something yet anymore!

@locker locker added the full-ci Enables all tests for a pull request label Dec 12, 2024
@locker locker merged commit d1fbf91 into tarantool:master Dec 12, 2024
@locker locker deleted the vy-abort-writers-on-ddl-fix branch December 12, 2024 09:42
@locker
Copy link
Member Author

locker commented Dec 12, 2024

Cherry-picked to 3.2 and 3.3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

full-ci Enables all tests for a pull request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

AddressSanitizer: heap-use-after-free on address 0x5140002f77ec at pc 0x62b92f82fb5b bp 0x73673f27d7c0 sp 0x73673f27d7b8

4 participants