DELETE statement may be lost on compaction because of deferred DELETE #10895

@locker

Bug description

Under certain circumstances, a DELETE statement may be erroneously optimized out during compaction. The issue cannot cause a crash or an invalid query result; the only consequence is that a garbage statement is not collected from a run file in time (it can still be collected later if the deleted tuple is overwritten again).

Tarantool version:

Tarantool 3.4.0-entrypoint-5-gb6d4983311f7
Target: Linux-x86_64-Debug
Build options: cmake . -DCMAKE_INSTALL_PREFIX=/home/vlad/src/tarantool/tarantool/build/debug/install -DENABLE_BACKTRACE=TRUE
Compiler: GNU-13.2.0
C_FLAGS: -fexceptions -funwind-tables -fasynchronous-unwind-tables -fno-common -msse2 -Wformat -Wformat-security -Werror=format-security -fstack-protector-strong -fPIC -fmacro-prefix-map=/home/vlad/src/tarantool/tarantool=. -std=c11 -Wall -Wextra -Wno-gnu-alignof-expression -fno-gnu89-inline -Wno-cast-function-type -Werror -g -ggdb -O0
CXX_FLAGS: -fexceptions -funwind-tables -fasynchronous-unwind-tables -fno-common -msse2 -Wformat -Wformat-security -Werror=format-security -fstack-protector-strong -fPIC -fmacro-prefix-map=/home/vlad/src/tarantool/tarantool=. -std=c++11 -Wall -Wextra -Wno-invalid-offsetof -Wno-gnu-alignof-expression -Wno-cast-function-type -Werror -g -ggdb -O0

Steps to reproduce

Run the following script with a debug binary:

os.execute('rm -rf [0-9]*')

local fiber = require('fiber')

box.cfg{log_level = 'warn'}

local s = box.schema.space.create('test', {
    engine = 'vinyl',
    defer_deletes = true,
})
s:create_index('primary')
s:create_index('secondary', {
    unique = false,
    parts = {2, 'unsigned'},
})

-- Temporarily block compaction.
box.error.injection.set('ERRINJ_VY_COMPACTION_DELAY', true)

-- Create primary and secondary index run files with INSERT{1, 10}.
s:insert({1, 10})
box.snapshot()

-- Create the primary index run file with DELETE{1}.
--
-- Generation of DELETE{1, 10} for the secondary index is deferred
-- until the primary index compaction.
s:delete({1})
box.snapshot()

-- Trigger primary index compaction (it's still blocked though).
s.index.primary:compact()

-- Create primary and secondary index run files with INSERT{1, 10}.
s:upsert({1, 10}, {})
box.snapshot()

-- Create a read view referring to the last INSERT{1, 10}.
local f = fiber.create(function()
    box.begin()
    s:select()
    fiber.sleep(9000)
end)
fiber.sleep(0.1)

-- Unblock compaction and wait for it to complete.
--
-- Compaction of the primary index generates DELETE{1, 10} for
-- the older INSERT{1, 10} stored in the secondary index.
box.error.injection.set('ERRINJ_VY_COMPACTION_DELAY', false)
while box.stat.vinyl().scheduler.tasks_inprogress > 0 do
    fiber.sleep(0.1)
end

-- Write DELETE{1, 10} + INSERT{1, 20} to the secondary index.
--
-- Now, the memory level of the secondary index contains two DELETE{1, 10}
-- statements: the first one is generated by the primary index compaction for
-- the older INSERT; the second one is generated by the UPSERT for the newer
-- INSERT. The newer DELETE should overwrite the older one, but due to a bug
-- in the Vinyl write iterator, the older DELETE overwrites the newer one;
-- as a result, the newer INSERT is never purged.
s:upsert({1, 20}, {{'=', 2, 20}})
box.snapshot()

f:cancel()

-- Delete INSERT{1, 20}, trigger primary index compaction to generate a DELETE
-- for the secondary index, then dump and compact the secondary index.
s:delete({1})
box.snapshot()
s.index.primary:compact()
while box.stat.vinyl().scheduler.tasks_inprogress > 0 do
    fiber.sleep(0.1)
end

box.snapshot()
s.index.secondary:compact()
while box.stat.vinyl().scheduler.tasks_inprogress > 0 do
    fiber.sleep(0.1)
end

-- No statements should be left in either of the indexes, but due to
-- the aforementioned bug, the secondary index run file still contains
-- the newer INSERT{1, 10}.
print('primary index rows:', s.index.primary:stat().rows)
print('secondary index rows:', s.index.secondary:stat().rows)

os.exit(0)

Actual behavior

primary index rows:     0
secondary index rows:   1

Expected behavior

primary index rows:     0
secondary index rows:   0

Notes

Here's the content of the secondary index run file left after the test:

$ tt cat 512/1/00000000000000000030.run
   • Running cat with files: [512/1/00000000000000000030.run]

• Result of cat: the file "512/1/00000000000000000030.run" is processed below •
---
HEADER:
  lsn: 6
  type: REPLACE
BODY:
  tuple: [10, 1]
---
HEADER:
  type: ROWINDEX
BODY:
  data: !!binary AAAAAA==
...

Here's the optimization in the Vinyl write iterator that causes the bug:

/*
 * Optimization 4: discard a DELETE statement referenced
 * by a read view if it is preceded by another DELETE for
 * the same key.
 */
if (prev.stmt != NULL &&
    vy_stmt_type(prev.stmt) == IPROTO_DELETE &&
    vy_stmt_type(h->entry.stmt) == IPROTO_DELETE) {
        vy_write_history_destroy(h);
        rv->history = NULL;
        return 0;
}

Labels

2.11 (target is 2.11 and all newer release/master branches), 3.2 (target is 3.2 and all newer release/master branches), bug (something isn't working), vinyl
