mvcc: rollback may cause dirty gap read #11802

@Astronomax

Bug description

MVCC read inconsistency: a rolled-back prepared transaction leaves a read gap behind, but the transaction that depends on that gap still commits successfully.

Imagine a transaction TXN0 successfully executed insert{1, 1} and committed. Now consider two concurrent transactions, TXN1 (in f1 fiber) and TXN2 (in f2 fiber).

TXN1 executes replace{1, 2} and becomes prepared but does not commit (it gets stuck during the WAL write operation). Subsequently, TXN2 performs a get{1} on the secondary index and receives nil (a "gap" in memtx MVCC terms), because the prepared replace has moved the tuple from secondary key 1 to secondary key 2.

Then TXN1 rolls back (for example, due to a WAL I/O error). At this point TXN2 must roll back as well, because the gap it read never became committed state, but instead it commits successfully.

The replace{1, 3} operation in TXN2 is not essential; it is included merely for clarity to demonstrate that TXN2 comes after the transaction TXN0 (which performed the insert{1, 1}) in the serialization order.

The txn_isolation = 'read-committed' setting is crucial here. It is what forces TXN2 to read nil instead of {1, 1}. An alternative setup would be txn_isolation = 'best-effort', but in that case, any read statement must be added before the get{1}.
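
A further variation, not part of the original report, would be to request the isolation level per transaction rather than instance-wide. The fragment below is only a sketch: it assumes a Tarantool version in which box.begin() accepts a txn_isolation option, it is meant to replace the body of the f2 fiber in the reproducer, and whether it triggers the bug in exactly the same way has not been verified here.

```lua
-- Sketch only: body of the f2 fiber (TXN2) with a per-transaction isolation level.
-- Relies on the `test` space and the `cond` variable defined in the reproducer.
box.begin({txn_isolation = 'read-committed'})
	-- Expected to observe the gap (nil) left by the prepared TXN1.
	local res = box.space.test.index.sk:get{1}
	print(res)
	box.space.test:replace{1, 3}
	cond:wait()
box.commit()
```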

Steps to reproduce

Run the following reproducer.lua script:

fiber = require('fiber')

box.cfg{
	memtx_use_mvcc_engine=true,
	txn_isolation = 'read-committed',
}

box.schema.space.create("test")
box.space.test:format{{'a', type='unsigned'}, {'b', type='unsigned'}}
box.space.test:create_index("pk", {parts={{'a'}}})
box.space.test:create_index("sk", {parts={{'b'}}, unique=true})

box.space.test:truncate()

box.space.test:insert{1, 1}

-- block the WAL queue so that the transactions started below get stuck on their WAL write
box.cfg{wal_queue_max_size=1}
box.error.injection.set('ERRINJ_WAL_DELAY', true)
box.begin()
	box.space.test:insert{10000, 10000}
box.commit({wait='none'})

-- TXN1: autocommit replace that becomes prepared and then waits for the WAL
f1 = fiber.create(function()
	box.space.test:replace{1, 2}
end)
f1:set_joinable(true)

local cond = fiber.cond()
-- TXN2: reads the gap at sk = 1 left by the prepared TXN1, then waits before committing
f2 = fiber.create(function()
	box.begin()
		local res = box.space.test.index.sk:get{1}
		print(res)
		box.space.test:replace{1, 3}
		cond:wait()
	box.commit()
end)
f2:set_joinable(true)

-- make the WAL write of TXN1 fail once the WAL is unblocked, so that TXN1 rolls back
box.error.injection.set('ERRINJ_WAL_IO', true)
box.error.injection.set('ERRINJ_WAL_DELAY', false)

local ok, err = f1:join()
print(ok, err)

box.error.injection.set('ERRINJ_WAL_IO', false)

cond:signal()
ok, err = f2:join()
print(ok, err)

print(require('yaml').encode(box.space.test:select{}))

box.space.test:drop()
os.exit(0)

How to run:

$ tarantool -i reproducer.lua

Actual output:

nil
2025-08-28 12:30:21.189 [1266505] main/104/init.lua error.cc:389 I> ERRINJ_WAL_IO = true
2025-08-28 12:30:21.189 [1266505] main/104/init.lua error.cc:389 I> ERRINJ_WAL_DELAY = false
2025-08-28 12:30:21.189 [1266505] main/117/lua wal.c:1368 E> Failed to write to disk {"type":"ClientError","code":40,"name":"WAL_IO","trace":[{"file":"./src/box/wal.c","line":1368}]}
false	Failed to write to disk
2025-08-28 12:30:21.189 [1266505] main/104/init.lua error.cc:389 I> ERRINJ_WAL_IO = false
true	nil
---
- [1, 3]
- [10000, 10000]
...

Actual behavior

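The second transaction (TXN2, in the f2 fiber) is not rolled back: f2:join() returns true and the space ends up containing [1, 3], even though TXN2 observed a gap that existed only because of the rolled-back TXN1. The resulting execution is non-serializable.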

Expected behavior

The second transaction is rolled back.
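
The expected outcome can be expressed as a check on the values returned by f2:join(). The helper below is a hypothetical sketch, not part of the reproducer: the name check_txn2_aborted is made up for illustration, and it assumes that an MVCC abort surfaces as an error object whose code field equals box.error.TRANSACTION_CONFLICT, which should be confirmed against the Tarantool version under test.

```lua
-- Hypothetical helper (not part of the original reproducer).
-- ok, err        values returned by f2:join()
-- conflict_code  error code expected for an MVCC abort; assumed to be
--                box.error.TRANSACTION_CONFLICT when run inside Tarantool
local function check_txn2_aborted(ok, err, conflict_code)
	if ok then
		return false, 'TXN2 committed although TXN1 was rolled back'
	end
	-- err may be nil, a string, or an error object; only objects carry a code.
	local code = err and err.code
	if code ~= conflict_code then
		return false, 'TXN2 failed, but not with the expected conflict error'
	end
	return true
end

-- Possible use at the end of the reproducer (inside Tarantool):
--   ok, err = f2:join()
--   assert(check_txn2_aborted(ok, err, box.error.TRANSACTION_CONFLICT))
```

With the behavior reported above, f2:join() returns true, so the first branch is taken and the assertion fails.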

Labels

3.2 (target is 3.2 and all newer release/master branches), bug (something isn't working), mvcc
