Skip to content

box: handle consequent DDL operations#10559

Merged
locker merged 2 commits intotarantool:masterfrom
drewdzzz:ddl_crashes
Oct 18, 2024
Merged

box: handle consequent DDL operations#10559
locker merged 2 commits intotarantool:masterfrom
drewdzzz:ddl_crashes

Conversation

@drewdzzz
Copy link
Contributor

@drewdzzz drewdzzz commented Sep 11, 2024

The patch fixes a crash with consequent DDL operations when MVCC is disabled and an unexplainable error when MVCC is enabled.

Closes #10235
Closes #10262
Closes tarantool/security#131

@coveralls
Copy link

coveralls commented Sep 11, 2024

Coverage Status

coverage: 87.331% (+0.008%) from 87.323%
when pulling 2e7fe5c on drewdzzz:ddl_crashes
into bf09135
on tarantool:master
.

@drewdzzz drewdzzz changed the title Fix a crash in DDL without MVCC box: handle consequent DDL operations Sep 12, 2024
@drewdzzz drewdzzz marked this pull request as ready for review September 13, 2024 10:08
@drewdzzz drewdzzz requested review from a team and ligurio as code owners September 13, 2024 10:08
@locker locker assigned drewdzzz and unassigned locker Sep 13, 2024
@drewdzzz drewdzzz marked this pull request as draft September 17, 2024 14:00
@drewdzzz drewdzzz marked this pull request as ready for review September 19, 2024 13:17
@drewdzzz drewdzzz requested a review from locker September 19, 2024 13:26
@drewdzzz drewdzzz assigned locker and sergepetrenko and unassigned drewdzzz Sep 19, 2024
@locker locker assigned drewdzzz and unassigned locker Sep 24, 2024
@drewdzzz drewdzzz requested a review from locker September 24, 2024 15:12
@drewdzzz drewdzzz assigned locker and unassigned drewdzzz Sep 24, 2024
@locker locker assigned drewdzzz and unassigned locker Sep 25, 2024
@drewdzzz drewdzzz assigned locker and unassigned drewdzzz Oct 2, 2024
@drewdzzz drewdzzz requested review from Serpentian and removed request for sergepetrenko October 16, 2024 10:42
@locker locker assigned drewdzzz and unassigned locker Oct 16, 2024
@drewdzzz drewdzzz added asan-ci Enables asan build tests for a pull request and removed asan-ci Enables asan build tests for a pull request labels Oct 17, 2024
@drewdzzz drewdzzz requested a review from locker October 17, 2024 09:43
@drewdzzz drewdzzz assigned locker and unassigned drewdzzz Oct 17, 2024
@locker locker assigned drewdzzz and unassigned locker Oct 17, 2024
Yielding DDL operations acquire DDL lock so that the space cannot be
modified under its feet. However, there is a case when it actually can:
if a yielding DDL has started when there is another DDL is being
committed and it gets rolled back due to WAL error, `struct space`
created by rolled back DDL is deleted - and it's the space being altered
by the yielding DDL. In order to fix this problem, let's simply wait for
all previous alters to be committed.

We could use `wal_sync` to wait for all previous transactions to be
committed, but it is more complicated - we need to use `wal_sync` for
single instance and `txn_limbo_wait_last_txn` when the limbo queue has
an owner. Such approach has more pitfalls and requires more tests to
cover all cases. When relying on `struct alter_space` directly, all
situations are handled with the same logic.

Alternative solutions that we have tried:
1. Throw an error in the case when user tries to alter space when
   there is another non-committed alter. Such approach breaks applier
   since it applies rows asynchronously. Trying applier to execute
   operations synchronously breaks it even harder.
2. Do not use space in `build_index` and `check_format` methods. In this
   case, there is another problem: rollback order. We have to rollback
   previous alters firstly, and the in-progress one can be rolled back
   only after it's over. It breaks fundamental memtx invariant: rollback
   order must be reverse of replace order. We could try to use
   `before_replace` triggers for alter, but the patch would be bulky.

Closes #10235

NO_DOC=bugfix
Since we often search spaces, users, funcs and so on in internal caches
that have `read-committed` isolation level (prepared tuples are seen),
let's always allow to read prepared tuples of system spaces.

Another advantage of such approach is that we never handle MVCC when
working with system spaces, so after the commit they will behave in the
same way - prepared tuples will be seen. The only difference is that
readers of prepared rows will be aborted if the row will be rolled back.

By the way, the inconsistency between internal caches and system spaces
could lead to crash in some sophisticated scenarios - the commit fixes
this problem as well because now system spaces and internal caches are
synchronized.

Closes #10262
Closes tarantool/security#131

NO_DOC=bugfix
@drewdzzz drewdzzz removed their assignment Oct 17, 2024
Copy link
Contributor

@Serpentian Serpentian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the patch! No objections

@Serpentian Serpentian assigned locker and unassigned Serpentian Oct 17, 2024
@locker locker assigned drewdzzz and unassigned locker Oct 17, 2024
@drewdzzz drewdzzz added full-ci Enables all tests for a pull request and removed do not merge Not ready to be merged labels Oct 17, 2024
@drewdzzz drewdzzz assigned locker and unassigned drewdzzz Oct 17, 2024
@locker locker merged commit b33f17b into tarantool:master Oct 18, 2024
@locker
Copy link
Member

locker commented Oct 18, 2024

Cherry-picked to 2.11 and 3.2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

full-ci Enables all tests for a pull request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

builtin/box/schema.lua:1814: attempt to index local 'tuple' (a nil value) Segmentation fault in space_fill_index_map()

7 participants