
Altering the PK while having non-unique indexes doesn't work properly #10951

@drewdzzz

Description

Sometimes a single alter statement performs several yielding DDL operations. For example, if we alter the primary index of a space that has non-unique secondary indexes, we have to rebuild the primary index and all non-unique secondary indexes, one after another (the keys of non-unique secondary indexes include the primary key parts). The problem is that the index build process forwards concurrent writes only to the index that is currently being built: once the first index is built, it stops receiving concurrent writes while the remaining ones are being built.

Example: start NEW_PK build -> s:insert{1, 1} -> NEW_PK is built -> start NEW_SK build -> s:delete{1, 1} -> NEW_SK is built -> alter is over, the new indexes are visible to the user. In this scenario, the concurrent write of [1, 1] was handled and inserted into NEW_PK. However, once NEW_PK was built and the next yielding DDL started, concurrent writes were no longer forwarded to NEW_PK. So, after the alter, NEW_PK still contains tuple [1, 1] even though it was deleted.
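A toy model in plain Lua can make this failure mode concrete. It is a hypothetical sketch, not Tarantool's implementation: the space and each index are just sets of keys, a build snapshots the space and then sees only the writes forwarded to it, and `alter`, `apply_write` and the `keep_forwarding` flag are names invented for illustration. With `keep_forwarding = false` it replays the scenario above; with `true` it also forwards writes to the indexes that were already built, shown only for contrast.

```lua
-- Hypothetical toy model of the alter described above; this is NOT Tarantool's
-- code. The "space" and each "index" are plain sets of keys (key -> true).

-- Apply a concurrent write to the space and forward it to the given indexes.
local function apply_write(space, forward_to, op, key)
    if op == 'insert' then
        space[key] = true
    elseif op == 'delete' then
        space[key] = nil
    else
        error('unknown op: ' .. tostring(op))
    end
    for _, idx in ipairs(forward_to) do
        idx[key] = space[key] -- true after insert, nil after delete
    end
end

-- Build the indexes in `names` one after another. `writes[i]` lists the
-- writes that happen while the i-th index is being built. With
-- keep_forwarding == false only the index currently being built sees them,
-- which is the behaviour described in the issue.
local function alter(space, names, writes, keep_forwarding)
    local built, done = {}, {}
    for i, name in ipairs(names) do
        -- Initial pass: copy the current space contents into the new index.
        local idx = {}
        for key in pairs(space) do
            idx[key] = true
        end
        -- Decide which indexes receive writes during this build.
        local forward_to = {idx}
        if keep_forwarding then
            for _, prev in ipairs(done) do
                table.insert(forward_to, prev)
            end
        end
        for _, w in ipairs(writes[i] or {}) do
            apply_write(space, forward_to, w[1], w[2])
        end
        built[name] = idx
        table.insert(done, idx)
    end
    return built
end

-- Scenario from the issue: insert{1, 1} while NEW_PK is built,
-- delete{1, 1} while NEW_SK is built.
local names = {'NEW_PK', 'NEW_SK'}
local writes = {{{'insert', 1}}, {{'delete', 1}}}

-- In the first run NEW_PK still contains key 1; in the second, neither does.
local buggy = alter({}, names, writes, false)
print('buggy:', buggy.NEW_PK[1], buggy.NEW_SK[1])

local fixed = alter({}, names, writes, true)
print('forwarding to built indexes:', fixed.NEW_PK[1], fixed.NEW_SK[1])
```

The model deliberately ignores yields, transactions and tuple reference counting; it only shows why an index that has finished building diverges once writes stop being forwarded to it.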

The problem can even lead to a crash: the deleted tuple is unreferenced and may be freed while NEW_PK still points to it.

Debug reproducer (assertion failure)
-- Cleanup directory
os.execute('rm 000*')

local fiber = require('fiber')
local log = require('log')
local UNIQUE_SECONDARY = false

box.cfg{}

-- Create space and populate it with data
local s = box.schema.space.create('test')
s:create_index('pk')
s:create_index('sk1', {parts = {2}, unique = UNIQUE_SECONDARY})
s:create_index('sk2', {parts = {2}, unique = UNIQUE_SECONDARY})
s:create_index('sk3', {parts = {2}, unique = UNIQUE_SECONDARY})
box.begin()
-- A fairly large space, so that each index build takes some time
for i = 1, 1e4 do
    s:replace{i, i}
end
box.commit()

local ddl = fiber.create(function()
    s.index.pk:alter({parts = {2}})
end)
ddl:set_joinable(true)

for _ = 1, 1e4 do
    local v = math.random(1, 1e4)
    s:delete{v}
    v = math.random(1e5, 1e6)
    s:replace{v, v}
    fiber.yield()
    collectgarbage('collect')
end

log.info({ddl:join()})
Assertion failed: (has_optional_parts || (field_a != NULL && field_b != NULL)), function tuple_compare_slowpath, file tuple_compare.cc, line 751.
Release reproducer (invalid indexes)
-- Cleanup directory
os.execute('rm 000*')

local fiber = require('fiber')
local log = require('log')
local UNIQUE_SECONDARY = false
fiber.set_max_slice(30)

box.cfg{}

local s = box.schema.space.create('test')
s:create_index('pk')
s:create_index('sk1', {parts = {2}, unique = UNIQUE_SECONDARY})
s:create_index('sk2', {parts = {2}, unique = UNIQUE_SECONDARY})
s:create_index('sk3', {parts = {2}, unique = UNIQUE_SECONDARY})
box.begin()
for i = 1, 1e6 do
    s:replace{i, i}
end
box.commit()

local ddl = fiber.create(function()
    s.index.pk:alter({parts = {2}})
    log.info("PK is altered!")
end)
ddl:set_joinable(true)

for i = 1, 4e3 do
    box.begin()
    s:delete{i}
    box.commit()
    collectgarbage('collect')
end
log.info("Deletes are done!")

log.info({ddl:join()})

log.info(s.index.pk:select(nil, {limit = 10}))
log.info(s.index.sk1:select(nil, {limit = 10}))
log.info(s.index.sk2:select(nil, {limit = 10}))
log.info(s.index.sk3:select(nil, {limit = 10}))

Although all the indexes are built on the same column, they return different tuples.

I> [[1001,1001],[1002,1002],[1003,1003],[1004,1004],[1005,1005],[1006,1006],[1007,1007],[1008,1008],[1009,1009],[1010,1010]]
I> [[2000,2000],[2001,2001],[2002,2002],[2003,2003],[2004,2004],[2005,2005],[2006,2006],[2007,2007],[2008,2008],[2009,2009]]
I> [[2998,2998],[2999,2999],[3000,3000],[3001,3001],[3002,3002],[3003,3003],[3004,3004],[3005,3005],[3006,3006],[3007,3007]]
I> [[4001,4001],[4002,4002],[4003,4003],[4004,4004],[4005,4005],[4006,4006],[4007,4007],[4008,4008],[4009,4009],[4010,4010]]
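The divergence can also be checked mechanically instead of by eye. The sketch below is a hypothetical helper (`find_inconsistent`, `keys_of` and `same_keys` are illustrative names, not part of Tarantool): it takes the `select()` results of several indexes that cover the same field and reports which ones list a different key sequence than the first. It assumes every result was fetched with the same `limit` and that fields can be read as `t[fieldno]`, which holds for both `box.tuple` objects and plain Lua tables.

```lua
-- Hypothetical helper, not part of Tarantool: compare the key sequences that
-- several indexes over the same field return from select().

-- Extract field `fieldno` from every tuple (box.tuple or plain table).
local function keys_of(tuples, fieldno)
    local keys = {}
    for i, t in ipairs(tuples) do
        keys[i] = t[fieldno]
    end
    return keys
end

-- True if both key arrays have the same length and elements in the same order.
local function same_keys(a, b)
    if #a ~= #b then
        return false
    end
    for i = 1, #a do
        if a[i] ~= b[i] then
            return false
        end
    end
    return true
end

-- `results` is an array of {name = <index name>, tuples = <select() result>}.
-- Returns the names of indexes whose keys differ from the first entry's.
local function find_inconsistent(results, fieldno)
    local ref = keys_of(results[1].tuples, fieldno)
    local bad = {}
    for i = 2, #results do
        if not same_keys(ref, keys_of(results[i].tuples, fieldno)) then
            table.insert(bad, results[i].name)
        end
    end
    return bad
end

-- Illustrative stand-in data (plain tables, not the actual log).
local bad = find_inconsistent({
    {name = 'pk',  tuples = {{1001, 1001}, {1002, 1002}}},
    {name = 'sk1', tuples = {{2000, 2000}, {2001, 2001}}},
    {name = 'sk2', tuples = {{1001, 1001}, {1002, 1002}}},
}, 2)
print(table.concat(bad, ', ')) -- prints: sk1
```

In the release reproducer it could be called after `ddl:join()` as `find_inconsistent({{name = 'pk', tuples = s.index.pk:select(nil, {limit = 10})}, {name = 'sk1', tuples = s.index.sk1:select(nil, {limit = 10})}}, 2)`, where a non-empty result flags the inconsistency.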

Found while investigating TNT-1247.

Labels: 3.2 (target is 3.2 and all newer release/master branches), bug (something isn't working), crash