Skip to content

vinyl: replication slows down master #10846

@sergos

Description

@sergos

Replication slows down significantly during the initial join, hijacking the master's TX for the vinyl data fetch.
After master made a big enough vinyl space I joined a replica. It started rather quick and no issues with master speed.

Replica log shows speed about 100Krps:

2024-11-21 17:50:13.853 [72025] main/111/applier/3303 I> 1.0M rows received                                                                                                                                                                              
2024-11-21 17:50:14.957 [72025] main/111/applier/3303 I> 1.1M rows received                                                                                                                                                                              

Later on replication slowed down with master's TX is flexing:

68194 s.ostan+  20   0 2438696   1.3g  19128 R 99.9  0.3 145:00.61 tarantool                                                                                                                                                                             
68195 s.ostan+  20   0 2438696   1.3g  19128 S  0.0  0.3   0:00.00  `- applier_1                                                                                                                                                                         
68196 s.ostan+  20   0 2438696   1.3g  19128 S  0.0  0.3   0:01.39  `- iproto                                                                                                                                                                            
68197 s.ostan+  20   0 2438696   1.3g  19128 S  0.3  0.3  12:29.63  `- wal                                                                                                                                                                               
68199 s.ostan+  20   0 2438696   1.3g  19128 S  0.0  0.3   0:02.19  `- coio                                                                                                                                                                              
68200 s.ostan+  20   0 2438696   1.3g  19128 S  2.7  0.3  58:21.94  `- vinyl.reader.0                                                                                                                                                                    
68228 s.ostan+  20   0 2438696   1.3g  19128 S  0.0  0.3  12:41.18  `- vinyl.dump.0                                                                                                                                                                      
68252 s.ostan+  20   0 2438696   1.3g  19128 S  0.0  0.3  21:14.92  `- vinyl.compactio                                                                                                                                                                   
68253 s.ostan+  20   0 2438696   1.3g  19128 S  0.0  0.3  21:47.87  `- vinyl.compactio                                                                                                                                                                   
68254 s.ostan+  20   0 2438696   1.3g  19128 S  0.0  0.3  21:54.09  `- vinyl.compactio                                                                                                                                                                   
70997 s.ostan+  20   0 2438696   1.3g  19128 S  0.0  0.3   0:00.87  `- coio                                                                                                                                                                              
75305 s.ostan+  20   0 2438696   1.3g  19128 S  0.0  0.3   0:00.48  `- coio                                                                                                                                                                              
75306 s.ostan+  20   0 2438696   1.3g  19128 S  0.0  0.3   0:00.51  `- coio                                                                                                                                                                              

And replica is relaxed:

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                                                                                                                               
72025 s.ostan+  20   0 1743972 871644  18108 S  3.0  0.2   4:14.43 tarantool                                                                                                                                                                             
72027 s.ostan+  20   0 1743972 871644  18108 S  0.0  0.2   0:00.00  `- applier_1                                                                                                                                                                         
72028 s.ostan+  20   0 1743972 871644  18108 S  0.0  0.2   0:00.00  `- iproto                                                                                                                                                                            
72029 s.ostan+  20   0 1743972 871644  18108 S  0.0  0.2   0:00.00  `- wal                                                                                                                                                                               
72030 s.ostan+  20   0 1743972 871644  18108 S  0.0  0.2   0:00.01  `- coio                                                                                                                                                                              
72031 s.ostan+  20   0 1743972 871644  18108 S  0.0  0.2   0:00.01  `- coio                                                                                                                                                                              
72052 s.ostan+  20   0 1743972 871644  18108 S  0.0  0.2   0:41.29  `- vinyl.dump.0                                                                                                                                                                      
72230 s.ostan+  20   0 1743972 871644  18108 S  0.0  0.2   0:49.79  `- vinyl.compactio                                                                                                                                                                   
72231 s.ostan+  20   0 1743972 871644  18108 S  0.0  0.2   0:53.71  `- vinyl.compactio                                                                                                                                                                   
72232 s.ostan+  20   0 1743972 871644  18108 S  0.0  0.2   0:40.07  `- vinyl.compactio                                                                                                                                                                   

While making a very modest progress of 3Krps:

2024-11-21 19:22:15.850 [72025] main/111/applier/3303 I> 33.4M rows received
2024-11-21 19:22:44.573 [72025] main/111/applier/3303 I> 33.5M rows received

The backtrace on master is following:

Thread 1 (Thread 0x7f4be7c70c00 (LWP 68194)):
#0  0x000055d52a4fafc1 in vy_mem_iterator_step (itr=0x55d52cd9fa08) at /home/s.ostanevich/workspaces/t.sergos2/src/box/vy_mem.c:294
#1  0x000055d52a4fb2f8 in vy_mem_iterator_find_lsn (itr=0x55d52cd9fa08) at /home/s.ostanevich/workspaces/t.sergos2/src/box/vy_mem.c:334
#2  0x000055d52a4fc4be in vy_mem_iterator_seek (itr=<optimized out>, last=...) at /home/s.ostanevich/workspaces/t.sergos2/src/box/vy_mem.c:450
#3  0x000055d52a506959 in vy_mem_iterator_restore (itr=itr@entry=0x55d52cd9fa08, last=..., history=history@entry=0x55d52cd9fa88) at /home/s.ostanevich/workspaces/t.sergos2/src/box/vy_mem.c:615
#4  0x000055d52a5238f1 in vy_read_iterator_restore_mem (next=0x7f4b93f80980, itr=0x55d52d065920) at /home/s.ostanevich/workspaces/t.sergos2/src/box/vy_read_iterator.c:454
#5  vy_read_iterator_advance (itr=0x55d52d065920) at /home/s.ostanevich/workspaces/t.sergos2/src/box/vy_read_iterator.c:605
#6  vy_read_iterator_next (itr=itr@entry=0x55d52d065920, result=result@entry=0x7f4b93f80a10) at /home/s.ostanevich/workspaces/t.sergos2/src/box/vy_read_iterator.c:949
#7  0x000055d52a4e7e8e in vinyl_engine_join (engine=<optimized out>, arg=<optimized out>, stream=0x55d52d051060) at /home/s.ostanevich/workspaces/t.sergos2/src/box/vinyl.c:3138
#8  0x000055d52a4d6c3c in engine_join (ctx=ctx@entry=0x7f4b93f80b10, stream=stream@entry=0x55d52d051060) at /home/s.ostanevich/workspaces/t.sergos2/src/box/engine.c:230
#9  0x000055d52a5b91ee in engine_join_xc (stream=0x55d52d051060, ctx=0x7f4b93f80b10) at /home/s.ostanevich/workspaces/t.sergos2/src/box/engine.h:662
#10 relay_initial_join (io=io@entry=0x7f4ba9840e08, sync=<optimized out>, vclock=vclock@entry=0x7f4b93f80c80, replica_version_id=197121, cursor=cursor@entry=0x0) at /home/s.ostanevich/workspaces/t.sergos2/src/box/relay.cc:493
#11 0x000055d52a594fc3 in box_process_join (io=io@entry=0x7f4ba9840e08, header=header@entry=0x7f4ba9810120) at /home/s.ostanevich/workspaces/t.sergos2/src/box/box.cc:4662
#12 0x000055d52a4997b5 in tx_process_replication (m=0x7f4ba9810100) at /home/s.ostanevich/workspaces/t.sergos2/src/box/iproto.cc:2838
#13 0x000055d52a69c553 in cmsg_deliver (msg=msg@entry=0x7f4ba9810100) at /home/s.ostanevich/workspaces/t.sergos2/src/lib/core/cbus.c:350
#14 0x000055d52a69d22f in fiber_pool_f (ap=<error reading variable: value has been optimized out>) at /home/s.ostanevich/workspaces/t.sergos2/src/lib/core/fiber_pool.c:65
#15 0x000055d52a4939dc in fiber_cxx_invoke(fiber_func, typedef __va_list_tag __va_list_tag *) (f=<optimized out>, ap=<optimized out>) at /home/s.ostanevich/workspaces/t.sergos2/src/lib/core/fiber.h:1324
#16 0x000055d52a699a96 in fiber_loop (data=<optimized out>) at /home/s.ostanevich/workspaces/t.sergos2/src/lib/core/fiber.c:1162
#17 0x000055d52a91f108 in coro_init () at /home/s.ostanevich/workspaces/t.sergos2/third_party/coro/coro.c:108
--master.lua
local fiber = require('fiber')

box.cfg {
    vinyl_memory = 512*1024*1024,
    listen = 3303,
}

box.once('init',
         function()
             box.schema.create_space('p', {engine='vinyl'})
             box.space.p:create_index('p')
             box.space.p:create_index('s', {parts={2}, unique=false})
             box.space.p:create_index('t', {parts={3}, unique=false})
             box.schema.user.grant('guest','replication')
         end
        )

local pause_load = 0
local space_size = 1e9
local fibers_number = 50

load_fibers = {}

function pause(seconds)
    if seconds == nil or type(seconds) ~= 'number' then
        return "Use number of seconds to pause the load, 0 to run"
    end
    pause_load = seconds
end

function load()
    for j = 1, fibers_number do
        load_fibers[j] = fiber.new(function()
            for i = 1, space_size / fibers_number do
                pcall(box.space.p.replace, box.space.p, {i * j, i, i})
                if i % 1000 == 0 then
                    if pause_load and type(pause_load) == 'number' then
                        fiber.sleep(pause_load)
                    else
                        fiber.yield()
                    end
                end
            end
        end)
        fiber.set_joinable(load_fibers[j], true)
        fiber.name(string.format('load-%d', j))
    end
end

function wait_load()
    for j = 1, fibers_number do
        fiber.join(load_fibers[j])
    end
end

require'console'.start()

Run load() for about 10 mins, then run replica along:

box.cfg {
    vinyl_memory = 512*1024*1024,
    listen=3304,
    replication={
        3303,
        3304,
    },
    replication_connect_timeout=200,
}        
require'console'.start()

Metadata

Metadata

Assignees

Labels

2.11Target is 2.11 and all newer release/master branches3.2Target is 3.2 and all newer release/master branchesbugSomething isn't workingvinyl

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions