replication: applier times out when wal_queue_max_size is reached #11837

@Gerold103

Similar to #11836, but here the replica only times out instead of getting stuck forever.

Reproducer:

--
-- Instance 1
--
-- Step 1
--
fiber = require('fiber')
log = require('log')
data = string.rep('a', 1000)
box.cfg{
    listen = 3313,
    replication = {3313, 3314},
}
box.schema.user.grant('guest', 'super')
s = box.schema.create_space('test')
_ = s:create_index('pk')
--
-- Step 3
--
function make_txn_fiber(id, on_commit)
    return _G.fiber.create(function()
        _G.fiber.self():name(('worker-%d'):format(id))
        _G.fiber.self():set_joinable(true)
        box.begin()
        box.on_commit(function()
            log.info(('Committing %d'):format(id))
            if on_commit then
                on_commit()
            end
        end)
        log.info(('Start %d'):format(id))
        s:insert{id, data}
        box.commit()
    end)
end

f1 = make_txn_fiber(1)
f2 = make_txn_fiber(2)
f3 = make_txn_fiber(3)
--
-- Step 4
--
-- Observe that the replica gets disconnected because the master receives no
-- acks from it within the replication timeout.
--


--
-- Instance 2
--
-- Step 2
--
fiber = require('fiber')
log = require('log')
json = require('json')
box.cfg{
    listen = 3314,
    replication = {3313, 3314},
    wal_queue_max_size = 1000,
    read_only = true,
}
box.error.injection.set('ERRINJ_WAL_DELAY', true)
function make_on_replace(space_name)
    return function(old, new)
        log.info(('%s: %s -> %s'):format(space_name, json.encode(old), json.encode(new)))
    end
end
s = box.space.test
_ = s:on_replace(make_on_replace(s.name))

I would expect the applier to keep sending acks while the WAL queue is full, just with the same older vclock.
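To make Step 4 easier to observe, here is a minimal sketch of a helper that could be run on instance 1 to print the state of each downstream relay. `summarize_downstreams` is a hypothetical name, not part of the reproducer. It assumes `box.info.replication` can be iterated with `pairs` and that each `downstream` entry exposes `status` and `idle`, which may differ between Tarantool versions.

```lua
-- Hypothetical helper for inspecting relay state on the master.
-- `replication` is expected to look like box.info.replication:
-- a table keyed by replica id, where entries may have a `downstream` field.
function summarize_downstreams(replication)
    local lines = {}
    for id, peer in pairs(replication) do
        local ds = peer.downstream
        -- The instance's own entry has no downstream, so skip it.
        if ds ~= nil then
            lines[#lines + 1] = ('replica %d: status=%s idle=%s'):format(
                id, tostring(ds.status), tostring(ds.idle))
        end
    end
    -- pairs() order is unspecified, so sort for stable output.
    table.sort(lines)
    return lines
end
```

On instance 1 it could be called as `summarize_downstreams(box.info.replication)` after starting the worker fibers. With the behavior described in this issue, the downstream for instance 2 would presumably leave the `follow` status once the timeout expires. With the expected behavior, it would stay in `follow` with an unchanged vclock.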
