Skip to content

box: handle cancelling fiber waiting in wal queue#11273

Merged
sergepetrenko merged 1 commit intotarantool:masterfrom
nshy:gh-11078-fix-wal-queue-fiber-cancel
Mar 21, 2025
Merged

box: handle cancelling fiber waiting in wal queue#11273
sergepetrenko merged 1 commit intotarantool:masterfrom
nshy:gh-11078-fix-wal-queue-fiber-cancel

Conversation

@nshy
Copy link
Contributor

@nshy nshy commented Mar 19, 2025

Currently if fiber waiting in wal queue is cancelled it will go on writing to all breaking queue order. So wal is corrupted so that even next Tarantool restart may fail. Unfortunately we cancel fibers on Tarantool shutdown so the issue may arise just buy terminating Tarantool instance.

Let's instead fail the cancelled request and all the newer request in the queue.

Work around is to disable wal queue by box.cfg{wal_queue_max_size = 0} so that only one write request can be in the queue.

Closes #11078

@nshy nshy requested a review from a team as a code owner March 19, 2025 11:54
@coveralls
Copy link

coveralls commented Mar 19, 2025

Coverage Status

coverage: 87.474% (-0.002%) from 87.476%
when pulling 6dbbd02 on nshy:gh-11078-fix-wal-queue-fiber-cancel
into 413e4e2
on tarantool:master
.

@nshy nshy requested a review from locker March 19, 2025 12:28
@locker locker requested a review from sergepetrenko March 19, 2025 13:39
@locker locker assigned nshy and unassigned locker Mar 19, 2025
@sergepetrenko sergepetrenko self-assigned this Mar 19, 2025
@nshy nshy assigned locker and sergepetrenko and unassigned nshy and sergepetrenko Mar 20, 2025
Copy link
Collaborator

@sergepetrenko sergepetrenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch!

Just one question.

@locker locker assigned nshy and unassigned locker and sergepetrenko Mar 20, 2025
@nshy nshy assigned locker and sergepetrenko and unassigned nshy Mar 20, 2025
@sergepetrenko sergepetrenko removed their assignment Mar 20, 2025
@locker locker assigned nshy and unassigned locker Mar 20, 2025
@nshy
Copy link
Contributor Author

nshy commented Mar 20, 2025

Fixed a few nits.

diff
diff --git a/src/box/journal.c b/src/box/journal.c
index b1064c9c93..fcfb492117 100644
--- a/src/box/journal.c
+++ b/src/box/journal.c
@@ -123,7 +123,7 @@ journal_queue_wait(struct journal_entry *entry)
                struct stailq rollback;
                stailq_cut_tail(&journal_queue.requests, &prev_entry->fifo,
                                &rollback);
-               /* Pop this entry. */
+               /* Pop this request. */
                VERIFY(stailq_shift_entry(&rollback,
                                          typeof(*entry), fifo) == entry);
                stailq_reverse(&rollback);
diff --git a/src/box/journal.h b/src/box/journal.h
index d77434266d..d70b861151 100644
--- a/src/box/journal.h
+++ b/src/box/journal.h
@@ -202,7 +202,7 @@ journal_queue_is_full(void)
 int
 journal_queue_wait(struct journal_entry *entry);

-/** Empty the queue by waking everyone in it up and put self to queue tail. */
+/** Flush journal queue. Next wal_sync() will sync flushed requests. */
 void
 journal_queue_flush(void);

diff --git a/test/box-luatest/gh_11078_wal_queue_fiber_cancel_test.lua b/test/box-lua>
index 85b1225f06..12daacc644 100644
--- a/test/box-luatest/gh_11078_wal_queue_fiber_cancel_test.lua
+++ b/test/box-luatest/gh_11078_wal_queue_fiber_cancel_test.lua
@@ -18,8 +18,8 @@ g.after_all(function(cg)
 end)

 g.after_each(function(cg)
-    box.error.injection.set('ERRINJ_WAL_DELAY', false)
     cg.server:exec(function()
+        box.error.injection.set('ERRINJ_WAL_DELAY', false)
         box.space.test:drop()
     end)
 end)

@nshy nshy added backport/2.11 Automatically create a 2.11 backport PR backport/3.2 Automatically create a 3.2 backport PR backport/3.3 Automatically create a 3.3 backport PR labels Mar 20, 2025
@nshy nshy added the full-ci Enables all tests for a pull request label Mar 20, 2025
Currently if fiber waiting in WAL queue is cancelled it will go on
writing request to WAL breaking queue order. So WAL is corrupted so that
even next Tarantool restart may fail. Unfortunately we cancel fibers on
Tarantool shutdown so the issue may arise just buy terminating Tarantool
instance.

Let's instead fail the cancelled request and all the newer request in
the queue.

Work around is to disable WAL queue by `box.cfg{wal_queue_max_size = 0}`
so that only one write request can be in the queue.

Closes tarantool#11078

NO_DOC=bugfix
@nshy nshy force-pushed the gh-11078-fix-wal-queue-fiber-cancel branch from 195565b to 6dbbd02 Compare March 20, 2025 15:26
@nshy nshy assigned sergepetrenko and unassigned nshy Mar 21, 2025
@sergepetrenko sergepetrenko merged commit def4456 into tarantool:master Mar 21, 2025
44 checks passed
@TarantoolBot
Copy link
Collaborator

Backport failed for release/2.11, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin release/2.11
git worktree add -d .worktree/backport/release/2.11/11273 origin/release/2.11
cd .worktree/backport/release/2.11/11273
git switch --create backport/release/2.11/11273
git cherry-pick -x def445684df0d4b847292767b92c02adba51074b

@TarantoolBot
Copy link
Collaborator

Successfully created backport PR for release/3.2:

@TarantoolBot
Copy link
Collaborator

Successfully created backport PR for release/3.3:

@TarantoolBot
Copy link
Collaborator

Backport summary

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport/2.11 Automatically create a 2.11 backport PR backport/3.2 Automatically create a 3.2 backport PR backport/3.3 Automatically create a 3.3 backport PR full-ci Enables all tests for a pull request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

WAL rows order is messed up if fiber waiting for journal size threshold is cancelled

6 participants