Tuple missing in secondary index after WAL rollback with enabled deferred DELETEs #11140

@locker

Bug description

If a statement that deletes a tuple from a space with a non-unique secondary index and deferred DELETEs enabled is rolled back due to a WAL error, and another fiber reads a key interval spanning the deleted key from the secondary index while the statement is still unconfirmed, then the tuple is no longer reachable via the secondary index after the rollback.

Tarantool version:

Tarantool 3.4.0-entrypoint-119-gb9c5a1d96410
Target: Linux-x86_64-Debug
Build options: cmake . -DCMAKE_INSTALL_PREFIX=/home/vlad/src/tarantool/tarantool/build/debug/install -DENABLE_BACKTRACE=TRUE
Compiler: GNU-13.2.0
C_FLAGS: -fexceptions -funwind-tables -fasynchronous-unwind-tables -fno-common -msse2 -Wformat -Wformat-security -Werror=format-security -fstack-protector-strong -fPIC -fmacro-prefix-map=/home/vlad/src/tarantool/tarantool=. -std=c11 -Wall -Wextra -Wno-gnu-alignof-expression -fno-gnu89-inline -Wno-cast-function-type -Werror -g -ggdb -O0
CXX_FLAGS: -fexceptions -funwind-tables -fasynchronous-unwind-tables -fno-common -msse2 -Wformat -Wformat-security -Werror=format-security -fstack-protector-strong -fPIC -fmacro-prefix-map=/home/vlad/src/tarantool/tarantool=. -std=c++11 -Wall -Wextra -Wno-invalid-offsetof -Wno-gnu-alignof-expression -Wno-cast-function-type -Werror -g -ggdb -O0

Steps to reproduce

Run the following script with a debug Tarantool build (needed for error injections to work):

local fiber = require('fiber')
local json = require('json')

os.execute('rm -rf [0-9]* tarantool.log')
box.cfg{
    log = 'tarantool.log',
    vinyl_defer_deletes = true,
}

local s = box.schema.space.create('test', {engine = 'vinyl'})
s:create_index('pk')
s:create_index('sk', {unique = false, parts = {2, 'unsigned'}})

s:insert{1, 10}
s:insert{2, 20}
s:insert{3, 30}
box.snapshot()

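-- Block WAL writes so that the DELETE below stays unconfirmed.
box.error.injection.set('ERRINJ_WAL_DELAY', true)

-- Start the DELETE in a separate fiber and yield to let it run until it
-- blocks waiting for the WAL.
local f = fiber.new(s.delete, s, {2})
f:set_joinable(true)
fiber.yield()

-- Read the secondary index while the DELETE is still unconfirmed.
box.begin{txn_isolation = 'read-committed'}
s.index.sk:select()
box.commit()

-- Make the delayed WAL write fail, then unblock it and wait for the rollback.
box.error.injection.set('ERRINJ_WAL_WRITE', true)
box.error.injection.set('ERRINJ_WAL_DELAY', false)
f:join()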

box.begin{txn_isolation = 'read-committed'}
print('primary index   :', json.encode(s.index.pk:select()))
print('secondary index :', json.encode(s.index.sk:select()))
box.commit()

os.exit(0)

Actual behavior

The script prints:

primary index   :       [[1,10],[2,20],[3,30]]
secondary index :       [[1,10],[3,30]]

Expected behavior

The script is expected to print:

primary index   :       [[1,10],[2,20],[3,30]]
secondary index :       [[1,10],[2,20],[3,30]]
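To turn the expected/actual comparison into a programmatic check, a small helper along these lines could be used. This is only a sketch: `same_rows` is a hypothetical name, and the rows are plain Lua arrays standing in for the output of `s.index.pk:select()` and `s.index.sk:select()`.

```lua
-- Returns true when both lists hold the same rows, ignoring order.
-- Rows are matched by their first field, which is assumed to be a
-- unique primary key (as in the 'test' space above).
local function same_rows(a, b)
    if #a ~= #b then return false end
    local by_key = {}
    for _, row in ipairs(a) do by_key[row[1]] = row end
    for _, row in ipairs(b) do
        local other = by_key[row[1]]
        if other == nil or #other ~= #row then return false end
        for i = 1, #row do
            if other[i] ~= row[i] then return false end
        end
    end
    return true
end

local pk_rows = {{1, 10}, {2, 20}, {3, 30}}
print(same_rows(pk_rows, {{1, 10}, {2, 20}, {3, 30}})) -- expected output: true
print(same_rows(pk_rows, {{1, 10}, {3, 30}}))          -- actual output: false
```

In the reproducer, the final `print` calls could be followed by an `assert` on such a check so that the script fails loudly when the indexes disagree.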

Notes

Turning off the tuple cache eliminates the issue:

--- test.lua	2025-02-12 18:53:39.514804925 +0300
+++ test-no-tuple-cache.lua	2025-02-12 18:54:30.835693238 +0300
@@ -4,6 +4,7 @@
 os.execute('rm -rf [0-9]* tarantool.log')
 box.cfg{
     log = 'tarantool.log',
+    vinyl_cache = 0,
     vinyl_defer_deletes = true,
 }
 

Disabling deferred DELETEs helps as well:

--- test.lua	2025-02-12 18:53:39.514804925 +0300
+++ test-no-deferred-deletes.lua	2025-02-12 18:54:13.535393154 +0300
@@ -4,7 +4,6 @@
 os.execute('rm -rf [0-9]* tarantool.log')
 box.cfg{
     log = 'tarantool.log',
-    vinyl_defer_deletes = true,
 }
 
 local s = box.schema.space.create('test', {engine = 'vinyl'})

The cause of the issue is similar to #10879: the tuple cache of a secondary index isn't invalidated on WAL rollback. The problem is that with deferred DELETEs enabled, nothing is written to the secondary index on commit, so there is nothing we can use to invalidate the cache on WAL rollback.
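The mechanism can be illustrated with a toy model in plain Lua. It is not vinyl's code: `new_model`, `scenario`, and every name inside them are hypothetical, and the cache is reduced to a single remembered scan result. The sketch only shows why a rollback that has no secondary-index statement to hook into leaves a stale cached read behind.

```lua
-- Toy model (an assumption-laden sketch, not vinyl internals).
local function new_model()
    local primary = {}         -- pk -> {pk, sk}
    local unconfirmed_del = {} -- pk -> true while a DELETE awaits the WAL
    local secondary = {}       -- {sk, pk} entries; deferred DELETEs never touch it
    local sk_cache = nil       -- remembered result of a secondary scan
    local m = {}

    function m.insert(pk, sk)
        primary[pk] = {pk, sk}
        table.insert(secondary, {sk, pk})
        table.sort(secondary, function(a, b) return a[1] < b[1] end)
        sk_cache = nil -- a write to the secondary index resets the cache
    end

    -- Deferred DELETE: only the primary index learns about it.
    function m.delete_unconfirmed(pk)
        unconfirmed_del[pk] = true
    end

    -- WAL failure: undo the DELETE. The cache is reset only if asked to,
    -- because nothing was written to the secondary index to trigger it.
    function m.rollback(pk, invalidate_cache)
        unconfirmed_del[pk] = nil
        if invalidate_cache then sk_cache = nil end
    end

    -- Scan that sees unconfirmed DELETEs and caches what it saw.
    function m.select_sk()
        if sk_cache then return sk_cache end
        local result = {}
        for _, entry in ipairs(secondary) do
            local pk = entry[2]
            if primary[pk] and not unconfirmed_del[pk] then
                table.insert(result, primary[pk])
            end
        end
        sk_cache = result
        return result
    end

    return m
end

-- Replays the steps of the reproducer and returns the number of rows
-- visible through the secondary index after the rollback.
local function scenario(invalidate_on_rollback)
    local m = new_model()
    m.insert(1, 10); m.insert(2, 20); m.insert(3, 30)
    m.delete_unconfirmed(2)
    m.select_sk() -- concurrent reader caches {1,10},{3,30}
    m.rollback(2, invalidate_on_rollback)
    return #m.select_sk()
end

print(scenario(false)) -- 2: the stale cache hides {2, 20}
print(scenario(true))  -- 3: resetting the cache on rollback restores it
```

In this model, dropping the cache entirely (the `vinyl_cache = 0` workaround) or writing the DELETE to the secondary index (disabling deferred DELETEs) would both avoid the stale result, which matches the two workarounds above.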

Labels

2.11 (Target is 2.11 and all newer release/master branches)
3.2 (Target is 3.2 and all newer release/master branches)
3.3 (Target is 3.3 and all newer release/master branches)
bug (Something isn't working)
vinyl
