Description
Sometimes a single alter statement results in several yielding DDL operations. For example, if we alter the primary index while the space has non-unique secondary indexes, we have to rebuild the primary index and all non-unique secondary indexes, one after another. The problem is that the index build process handles concurrent writes only while that particular index is being built, so once the first index is built, it no longer receives concurrent writes while the remaining ones are being built.
Example: start NEW_PK build -> insert{1, 1} -> NEW_PK is built -> start NEW_SK build -> s:delete{1, 1} -> NEW_SK is built -> the alter is over, the new indexes are visible to the user. In this scenario, we handled the concurrent [1, 1] write and inserted it into NEW_PK. However, after NEW_PK was built and we started another yielding DDL, we stopped forwarding concurrent writes to it. So, after the build, NEW_PK will contain the tuple [1, 1] even though it was deleted.
The problem can even lead to a crash, since the deleted tuple will be unreferenced.
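The divergence mechanism can be modeled outside Tarantool. The following Python sketch is a hypothetical model, not Tarantool code: it treats each index build as a snapshot scan that applies concurrent writes only to the index currently being built, which mirrors the scenario above.

```python
def build_index(snapshot, concurrent_ops):
    """Build a new index from a snapshot of the space, applying the
    writes that arrive during this build to this index only."""
    index = set(snapshot)
    for op, key in concurrent_ops:
        # The concurrent write is forwarded to the index being built...
        if op == "insert":
            index.add(key)
        else:
            index.discard(key)
        # ...but NOT to indexes whose build has already finished
        # (this is the bug being modeled).
    return index

space = {10, 20}  # pre-existing data in the space

# Phase 1: build NEW_PK; a concurrent insert of key 1 arrives and is handled.
new_pk = build_index(space, [("insert", 1)])
space.add(1)

# Phase 2: build NEW_SK; a concurrent delete of key 1 arrives. It is applied
# to NEW_SK but never reaches the already-built NEW_PK.
new_sk = build_index(space, [("delete", 1)])
space.discard(1)

print(sorted(new_pk))  # [1, 10, 20] -- stale: still contains the deleted tuple
print(sorted(new_sk))  # [10, 20]
```

The two indexes end up disagreeing about the same data, exactly like the select outputs in the release reproducer below.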
Debug reproducer (assertion failure)
-- Cleanup directory
os.execute('rm 000*')
local fiber = require('fiber')
local log = require('log')
local UNIQUE_SECONDARY = false
box.cfg{}
-- Create space and populate it with data
local s = box.schema.space.create('test')
s:create_index('pk')
s:create_index('sk1', {parts = {2}, unique = UNIQUE_SECONDARY})
s:create_index('sk2', {parts = {2}, unique = UNIQUE_SECONDARY})
s:create_index('sk3', {parts = {2}, unique = UNIQUE_SECONDARY})
box.begin()
-- Quite large space so that index build will take some time
for i = 1, 1e4 do
    s:replace{i, i}
end
box.commit()
local ddl = fiber.create(function()
    s.index.pk:alter({parts = {2}})
end)
ddl:set_joinable(true)
for _ = 1, 1e4 do
    local v = math.random(1, 1e4)
    s:delete{v}
    v = math.random(1e5, 1e6)
    s:replace{v, v}
    fiber.yield()
    collectgarbage('collect')
end
log.info({ddl:join()})
Assertion failed: (has_optional_parts || (field_a != NULL && field_b != NULL)), function tuple_compare_slowpath, file tuple_compare.cc, line 751.
Release reproducer (invalid indexes)
-- Cleanup directory
os.execute('rm 000*')
local fiber = require('fiber')
local log = require('log')
local UNIQUE_SECONDARY = false
fiber.set_max_slice(30)
box.cfg{}
local s = box.schema.space.create('test')
s:create_index('pk')
s:create_index('sk1', {parts = {2}, unique = UNIQUE_SECONDARY})
s:create_index('sk2', {parts = {2}, unique = UNIQUE_SECONDARY})
s:create_index('sk3', {parts = {2}, unique = UNIQUE_SECONDARY})
box.begin()
for i = 1, 1e6 do
    s:replace{i, i}
end
box.commit()
local ddl = fiber.create(function()
    s.index.pk:alter({parts = {2}})
    log.info("PK is altered!")
end)
ddl:set_joinable(true)
for i = 1, 4e3 do
    box.begin()
    s:delete{i}
    box.commit()
    collectgarbage('collect')
end
log.info("Replaces are done!")
log.info({ddl:join()})
log.info(s.index.pk:select(nil, {limit = 10}))
log.info(s.index.sk1:select(nil, {limit = 10}))
log.info(s.index.sk2:select(nil, {limit = 10}))
log.info(s.index.sk3:select(nil, {limit = 10}))
Although all the indexes are built on the same column, they contain different tuples.
I> [[1001,1001],[1002,1002],[1003,1003],[1004,1004],[1005,1005],[1006,1006],[1007,1007],[1008,1008],[1009,1009],[1010,1010]]
I> [[2000,2000],[2001,2001],[2002,2002],[2003,2003],[2004,2004],[2005,2005],[2006,2006],[2007,2007],[2008,2008],[2009,2009]]
I> [[2998,2998],[2999,2999],[3000,3000],[3001,3001],[3002,3002],[3003,3003],[3004,3004],[3005,3005],[3006,3006],[3007,3007]]
I> [[4001,4001],[4002,4002],[4003,4003],[4004,4004],[4005,4005],[4006,4006],[4007,4007],[4008,4008],[4009,4009],[4010,4010]]
Found during the investigation of TNT-1247.