-
Notifications
You must be signed in to change notification settings - Fork 403
Closed
Labels
2.11Target is 2.11 and all newer release/master branchesTarget is 2.11 and all newer release/master branches3.2Target is 3.2 and all newer release/master branchesTarget is 3.2 and all newer release/master branchesbugSomething isn't workingSomething isn't workingvinyl
Description
Replication slows down significantly during the initial join, hijacking the master's TX for the vinyl data fetch.
After master made a big enough vinyl space I joined a replica. It started rather quick and no issues with master speed.
Replica log shows speed about 100Krps:
2024-11-21 17:50:13.853 [72025] main/111/applier/3303 I> 1.0M rows received
2024-11-21 17:50:14.957 [72025] main/111/applier/3303 I> 1.1M rows received
Later on replication slowed down with master's TX is flexing:
68194 s.ostan+ 20 0 2438696 1.3g 19128 R 99.9 0.3 145:00.61 tarantool
68195 s.ostan+ 20 0 2438696 1.3g 19128 S 0.0 0.3 0:00.00 `- applier_1
68196 s.ostan+ 20 0 2438696 1.3g 19128 S 0.0 0.3 0:01.39 `- iproto
68197 s.ostan+ 20 0 2438696 1.3g 19128 S 0.3 0.3 12:29.63 `- wal
68199 s.ostan+ 20 0 2438696 1.3g 19128 S 0.0 0.3 0:02.19 `- coio
68200 s.ostan+ 20 0 2438696 1.3g 19128 S 2.7 0.3 58:21.94 `- vinyl.reader.0
68228 s.ostan+ 20 0 2438696 1.3g 19128 S 0.0 0.3 12:41.18 `- vinyl.dump.0
68252 s.ostan+ 20 0 2438696 1.3g 19128 S 0.0 0.3 21:14.92 `- vinyl.compactio
68253 s.ostan+ 20 0 2438696 1.3g 19128 S 0.0 0.3 21:47.87 `- vinyl.compactio
68254 s.ostan+ 20 0 2438696 1.3g 19128 S 0.0 0.3 21:54.09 `- vinyl.compactio
70997 s.ostan+ 20 0 2438696 1.3g 19128 S 0.0 0.3 0:00.87 `- coio
75305 s.ostan+ 20 0 2438696 1.3g 19128 S 0.0 0.3 0:00.48 `- coio
75306 s.ostan+ 20 0 2438696 1.3g 19128 S 0.0 0.3 0:00.51 `- coio
And replica is relaxed:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
72025 s.ostan+ 20 0 1743972 871644 18108 S 3.0 0.2 4:14.43 tarantool
72027 s.ostan+ 20 0 1743972 871644 18108 S 0.0 0.2 0:00.00 `- applier_1
72028 s.ostan+ 20 0 1743972 871644 18108 S 0.0 0.2 0:00.00 `- iproto
72029 s.ostan+ 20 0 1743972 871644 18108 S 0.0 0.2 0:00.00 `- wal
72030 s.ostan+ 20 0 1743972 871644 18108 S 0.0 0.2 0:00.01 `- coio
72031 s.ostan+ 20 0 1743972 871644 18108 S 0.0 0.2 0:00.01 `- coio
72052 s.ostan+ 20 0 1743972 871644 18108 S 0.0 0.2 0:41.29 `- vinyl.dump.0
72230 s.ostan+ 20 0 1743972 871644 18108 S 0.0 0.2 0:49.79 `- vinyl.compactio
72231 s.ostan+ 20 0 1743972 871644 18108 S 0.0 0.2 0:53.71 `- vinyl.compactio
72232 s.ostan+ 20 0 1743972 871644 18108 S 0.0 0.2 0:40.07 `- vinyl.compactio
While making a very modest progress of 3Krps:
2024-11-21 19:22:15.850 [72025] main/111/applier/3303 I> 33.4M rows received
2024-11-21 19:22:44.573 [72025] main/111/applier/3303 I> 33.5M rows received
The backtrace on master is following:
Thread 1 (Thread 0x7f4be7c70c00 (LWP 68194)):
#0 0x000055d52a4fafc1 in vy_mem_iterator_step (itr=0x55d52cd9fa08) at /home/s.ostanevich/workspaces/t.sergos2/src/box/vy_mem.c:294
#1 0x000055d52a4fb2f8 in vy_mem_iterator_find_lsn (itr=0x55d52cd9fa08) at /home/s.ostanevich/workspaces/t.sergos2/src/box/vy_mem.c:334
#2 0x000055d52a4fc4be in vy_mem_iterator_seek (itr=<optimized out>, last=...) at /home/s.ostanevich/workspaces/t.sergos2/src/box/vy_mem.c:450
#3 0x000055d52a506959 in vy_mem_iterator_restore (itr=itr@entry=0x55d52cd9fa08, last=..., history=history@entry=0x55d52cd9fa88) at /home/s.ostanevich/workspaces/t.sergos2/src/box/vy_mem.c:615
#4 0x000055d52a5238f1 in vy_read_iterator_restore_mem (next=0x7f4b93f80980, itr=0x55d52d065920) at /home/s.ostanevich/workspaces/t.sergos2/src/box/vy_read_iterator.c:454
#5 vy_read_iterator_advance (itr=0x55d52d065920) at /home/s.ostanevich/workspaces/t.sergos2/src/box/vy_read_iterator.c:605
#6 vy_read_iterator_next (itr=itr@entry=0x55d52d065920, result=result@entry=0x7f4b93f80a10) at /home/s.ostanevich/workspaces/t.sergos2/src/box/vy_read_iterator.c:949
#7 0x000055d52a4e7e8e in vinyl_engine_join (engine=<optimized out>, arg=<optimized out>, stream=0x55d52d051060) at /home/s.ostanevich/workspaces/t.sergos2/src/box/vinyl.c:3138
#8 0x000055d52a4d6c3c in engine_join (ctx=ctx@entry=0x7f4b93f80b10, stream=stream@entry=0x55d52d051060) at /home/s.ostanevich/workspaces/t.sergos2/src/box/engine.c:230
#9 0x000055d52a5b91ee in engine_join_xc (stream=0x55d52d051060, ctx=0x7f4b93f80b10) at /home/s.ostanevich/workspaces/t.sergos2/src/box/engine.h:662
#10 relay_initial_join (io=io@entry=0x7f4ba9840e08, sync=<optimized out>, vclock=vclock@entry=0x7f4b93f80c80, replica_version_id=197121, cursor=cursor@entry=0x0) at /home/s.ostanevich/workspaces/t.sergos2/src/box/relay.cc:493
#11 0x000055d52a594fc3 in box_process_join (io=io@entry=0x7f4ba9840e08, header=header@entry=0x7f4ba9810120) at /home/s.ostanevich/workspaces/t.sergos2/src/box/box.cc:4662
#12 0x000055d52a4997b5 in tx_process_replication (m=0x7f4ba9810100) at /home/s.ostanevich/workspaces/t.sergos2/src/box/iproto.cc:2838
#13 0x000055d52a69c553 in cmsg_deliver (msg=msg@entry=0x7f4ba9810100) at /home/s.ostanevich/workspaces/t.sergos2/src/lib/core/cbus.c:350
#14 0x000055d52a69d22f in fiber_pool_f (ap=<error reading variable: value has been optimized out>) at /home/s.ostanevich/workspaces/t.sergos2/src/lib/core/fiber_pool.c:65
#15 0x000055d52a4939dc in fiber_cxx_invoke(fiber_func, typedef __va_list_tag __va_list_tag *) (f=<optimized out>, ap=<optimized out>) at /home/s.ostanevich/workspaces/t.sergos2/src/lib/core/fiber.h:1324
#16 0x000055d52a699a96 in fiber_loop (data=<optimized out>) at /home/s.ostanevich/workspaces/t.sergos2/src/lib/core/fiber.c:1162
#17 0x000055d52a91f108 in coro_init () at /home/s.ostanevich/workspaces/t.sergos2/third_party/coro/coro.c:108
--master.lua
local fiber = require('fiber')
box.cfg {
vinyl_memory = 512*1024*1024,
listen = 3303,
}
box.once('init',
function()
box.schema.create_space('p', {engine='vinyl'})
box.space.p:create_index('p')
box.space.p:create_index('s', {parts={2}, unique=false})
box.space.p:create_index('t', {parts={3}, unique=false})
box.schema.user.grant('guest','replication')
end
)
local pause_load = 0
local space_size = 1e9
local fibers_number = 50
load_fibers = {}
function pause(seconds)
if seconds == nil or type(seconds) ~= 'number' then
return "Use number of seconds to pause the load, 0 to run"
end
pause_load = seconds
end
function load()
for j = 1, fibers_number do
load_fibers[j] = fiber.new(function()
for i = 1, space_size / fibers_number do
pcall(box.space.p.replace, box.space.p, {i * j, i, i})
if i % 1000 == 0 then
if pause_load and type(pause_load) == 'number' then
fiber.sleep(pause_load)
else
fiber.yield()
end
end
end
end)
fiber.set_joinable(load_fibers[j], true)
fiber.name(string.format('load-%d', j))
end
end
function wait_load()
for j = 1, fibers_number do
fiber.join(load_fibers[j])
end
end
require'console'.start()Run load() for about 10 mins, then run replica along:
box.cfg {
vinyl_memory = 512*1024*1024,
listen=3304,
replication={
3303,
3304,
},
replication_connect_timeout=200,
}
require'console'.start()Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
2.11Target is 2.11 and all newer release/master branchesTarget is 2.11 and all newer release/master branches3.2Target is 3.2 and all newer release/master branchesTarget is 3.2 and all newer release/master branchesbugSomething isn't workingSomething isn't workingvinyl