Description
When a rollback happens, all transactions must be rolled back in reverse order of their preparation. The WAL handles this for its own queue: on rollback, it sends the queue to TX in reverse order. However, a problem arises when the WAL queue is full and some transactions are waiting for space to join it. These waiting transactions were prepared later, so they must be rolled back before the ones in the WAL queue (and strictly in reverse order), but they are not.
That's a very serious problem that has caused several fuzzing crashes.
The issue definitely causes #10802 and #10082 (the second one can also be triggered by other problems, though).
Potential cause of https://github.com/tarantool/tarantool-ee/issues/999 and #10283.
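The ordering requirement can be illustrated with a toy model in plain Lua. This is only a sketch and not Tarantool code: the names `QUEUE_CAPACITY`, `prepare_all`, `rollback`, `run_correct` and `run_queue_first` are hypothetical, and the "queue first" order is an assumed illustration of the failure, not the exact order Tarantool uses. Each transaction overwrites the same key and remembers the value it replaced, so an undo is valid only while the key still holds the value that transaction wrote.

```lua
-- Toy model, not Tarantool code: every name here is hypothetical.
local QUEUE_CAPACITY = 3  -- stand-in for a WAL queue limited by wal_queue_max_size
local TXN_COUNT = 5

-- Prepare transactions in order. Each one overwrites the same key and
-- remembers the value it replaced. The first QUEUE_CAPACITY transactions
-- enter the queue, the rest wait for space.
local function prepare_all()
    local state = {value = nil}
    local queue, waiters = {}, {}
    for id = 1, TXN_COUNT do
        local txn = {id = id, old = state.value, new = id}
        state.value = id
        if #queue < QUEUE_CAPACITY then
            table.insert(queue, txn)
        else
            table.insert(waiters, txn)
        end
    end
    return state, queue, waiters
end

-- Return a reversed copy of a list.
local function reversed(list)
    local out = {}
    for i = #list, 1, -1 do
        out[#out + 1] = list[i]
    end
    return out
end

-- Return a new list holding the items of a followed by the items of b.
local function concat(a, b)
    local out = {}
    for _, v in ipairs(a) do out[#out + 1] = v end
    for _, v in ipairs(b) do out[#out + 1] = v end
    return out
end

-- Undo transactions in the given order. Returns true on success, or
-- false and the id of the first transaction whose undo is invalid.
local function rollback(state, order)
    for _, txn in ipairs(order) do
        if state.value ~= txn.new then
            return false, txn.id
        end
        state.value = txn.old
    end
    return true
end

-- Reverse preparation order: waiters first, then the queue.
local function run_correct()
    local state, queue, waiters = prepare_all()
    return rollback(state, concat(reversed(waiters), reversed(queue)))
end

-- Assumed faulty order: the queue is rolled back before the waiters.
local function run_queue_first()
    local state, queue, waiters = prepare_all()
    return rollback(state, concat(reversed(queue), reversed(waiters)))
end

print(run_correct())      -- true
print(run_queue_first())  -- false 3
```

In the second run, transaction 3 is undone while the key still holds the value written by transaction 5, which is a simplified analogue of the inconsistency the assertions in the reproducer output catch.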
Reproducer
local fiber = require('fiber')
local os = require('os')
-- Cleanup
os.execute('rm 000*')
-- Start Tarantool with limited WAL queue.
box.cfg{wal_queue_max_size = 1024}
s = box.schema.space.create('test')
s:create_index('pk')
-- Set WAL delay and do a bunch of transactions replacing the same key.
-- Note that some transactions won't enter the WAL queue because of the size limit.
box.error.injection.set('ERRINJ_WAL_DELAY', true)
for i = 1, 1500 do
    fiber.create(function()
        s:replace{1, i}
        -- s:truncate()
    end)
end
-- Inject an error and wait for crash.
box.error.injection.set('ERRINJ_WAL_DELAY', false)
box.error.injection.set('ERRINJ_WAL_WRITE', true)
fiber.sleep(1)
Output:
index.cc:232 E> ER_TUPLE_FOUND: Duplicate key exists in unique index "pk" in space "test" with old tuple - [1, 1500] and new tuple - [1, 16]
Assertion failed: (0), function memtx_engine_rollback_statement, file memtx_engine.cc, line 696.
If you comment out s:replace{1, i} and uncomment s:truncate(), you get a different crash:
Assertion failed: (old_space_by_id == old_space), function space_cache_replace, file space_cache.c, line 149.