Bug description
MVCC read inconsistency: rolled-back prepared transaction creates read gap, but dependent transaction commits successfully.
Suppose transaction TXN0 has executed insert{1, 1} and committed. Now consider two concurrent transactions: TXN1 (in fiber f1) and TXN2 (in fiber f2).
TXN1 executes replace{1, 2} and becomes prepared but does not commit (it gets stuck during the WAL write). TXN2 then performs get{1} on the secondary index and receives nil (a "gap" in memtx MVCC terms), because it sees TXN1's prepared replace, which changed the tuple's secondary key from 1 to 2.
Then TXN1 rolls back (for example, due to a WAL I/O error). Because TXN2's read depended on TXN1's prepared state, which no longer exists, TXN2 must roll back as well. Instead, it commits successfully.
The replace{1, 3} operation in TXN2 is not essential; it is included merely for clarity to demonstrate that TXN2 comes after the transaction TXN0 (which performed the insert{1, 1}) in the serialization order.
The txn_isolation = 'read-committed' setting is crucial here: it is what makes TXN2 read nil instead of {1, 1}. An alternative setup is txn_isolation = 'best-effort', but in that case a write statement must be added before the get{1}, since a best-effort transaction sees prepared changes only once it has become read-write.
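The isolation level can also be requested per transaction rather than globally. The sketch below is a drop-in replacement for the f2 fiber of the reproducer. It assumes that box.begin({txn_isolation = ...}) behaves the same as the global box.cfg option in this scenario, which is an untested assumption, and that the "test" space from the reproducer already exists.

```lua
-- Sketch only: assumes the "test" space from reproducer.lua already exists
-- and that box.cfg{memtx_use_mvcc_engine = true} has been applied.
local fiber = require('fiber')
local cond = fiber.cond()

f2 = fiber.create(function()
    -- Per-transaction isolation level instead of the global box.cfg option.
    box.begin({txn_isolation = 'read-committed'})
    local res = box.space.test.index.sk:get{1}
    print(res)
    box.space.test:replace{1, 3}
    cond:wait()
    box.commit()
end)
f2:set_joinable(true)
```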
Steps to reproduce
Run the following reproducer.lua script:
fiber = require('fiber')
box.cfg{
    memtx_use_mvcc_engine = true,
    txn_isolation = 'read-committed',
}
box.schema.space.create("test")
box.space.test:format{{'a', type='unsigned'}, {'b', type='unsigned'}}
box.space.test:create_index("pk", {parts={{'a'}}})
box.space.test:create_index("sk", {parts={{'b'}}, unique=true})
box.space.test:truncate()
-- TXN0: committed before the concurrent transactions start
box.space.test:insert{1, 1}
-- block WAL queue
box.cfg{wal_queue_max_size=1}
box.error.injection.set('ERRINJ_WAL_DELAY', true)
box.begin()
box.space.test:insert{10000, 10000}
box.commit({wait='none'})
-- TXN1: becomes prepared, then waits on the blocked WAL
f1 = fiber.create(function()
    box.space.test:replace{1, 2}
end)
f1:set_joinable(true)
local cond = fiber.cond()
-- TXN2: reads the gap left by the prepared TXN1, then waits for the signal
f2 = fiber.create(function()
    box.begin()
    local res = box.space.test.index.sk:get{1}
    print(res)
    box.space.test:replace{1, 3}
    cond:wait()
    box.commit()
end)
f2:set_joinable(true)
-- make the pending WAL write fail so that TXN1 rolls back
box.error.injection.set('ERRINJ_WAL_IO', true)
box.error.injection.set('ERRINJ_WAL_DELAY', false)
local ok, err = f1:join()
print(ok, err)
box.error.injection.set('ERRINJ_WAL_IO', false)
-- let TXN2 proceed to commit
cond:signal()
ok, err = f2:join()
print(ok, err)
print(require('yaml').encode(box.space.test:select{}))
box.space.test:drop()
os.exit(0)
How to run:
$ tarantool -i reproducer.lua
Actual output:
nil
2025-08-28 12:30:21.189 [1266505] main/104/init.lua error.cc:389 I> ERRINJ_WAL_IO = true
2025-08-28 12:30:21.189 [1266505] main/104/init.lua error.cc:389 I> ERRINJ_WAL_DELAY = false
2025-08-28 12:30:21.189 [1266505] main/117/lua wal.c:1368 E> Failed to write to disk {"type":"ClientError","code":40,"name":"WAL_IO","trace":[{"file":"./src/box/wal.c","line":1368}]}
false Failed to write to disk
2025-08-28 12:30:21.189 [1266505] main/104/init.lua error.cc:389 I> ERRINJ_WAL_IO = false
true nil
---
- [1, 3]
- [10000, 10000]
...
Actual behavior
The second transaction (f2 fiber) was not rolled back, resulting in a non-serializable execution.
Expected behavior
The second transaction is rolled back.
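To turn the reproducer into a pass/fail check, the result of joining f2 could be asserted rather than printed. The helper below is a minimal sketch: expect_rolled_back is a hypothetical name, not a Tarantool API, and it only relies on the (ok, err) pair returned by f2:join(). It makes no assumption about which error box.commit() should raise.

```lua
-- Hypothetical helper, not part of Tarantool: raises an error if a fiber
-- that was expected to fail (roll back) finished successfully.
local function expect_rolled_back(name, ok, err)
    if ok then
        error(name .. ' committed, but a rollback was expected', 2)
    end
    return err
end

-- Intended use at the end of reproducer.lua, in place of the second
-- join/print pair (f2 is the joinable fiber running TXN2):
--   local err = expect_rolled_back('TXN2', f2:join())
--   print(err)
```

With the current behavior, f2:join() returns true, so the helper would raise and the script would fail, which is the signal the check is meant to give.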