Scattering a large number of empty regions is very slow #17412

@mayjiang0203

Bug Report

What version of TiKV are you using?

What operating system and CPU are you using?

Steps to reproduce

Import one large table. It is split into more than 6K regions within one minute, and all of them need to be scattered.

What did you expect?

Scatter succeeded.

What happened?

Scatter operations pile up at the add-learner step.

Related TiKV logs:

[2024/08/20 06:13:07.783 +00:00] [INFO] [peer.rs:316] ["replicate peer"] [create_by_peer_store_id=65] [create_by_peer_id=17107450] [store_id=46] [peer_id=17107724] [region_id=17107444]
[2024/08/20 06:13:07.783 +00:00] [INFO] [raft.rs:2660] ["switched to configuration"] [config="Configuration { voters: Configuration { incoming: Configuration { voters: {} }, outgoing: Configuration { voters: {} } }, learners: {}, learners_next: {}, auto_leave: false }"] [raft_id=17107724] [region_id=17107444]
[2024/08/20 06:13:07.783 +00:00] [INFO] [raft.rs:1127] ["became follower at term 0"] [term=0] [raft_id=17107724] [region_id=17107444]
[2024/08/20 06:13:07.783 +00:00] [INFO] [raft.rs:388] [newRaft] [peers="Configuration { incoming: Configuration { voters: {} }, outgoing: Configuration { voters: {} } }"] ["last term"=0] ["last index"=0] [applied=0] [commit=0] [term=0] [raft_id=17107724] [region_id=17107444]
[2024/08/20 06:13:07.783 +00:00] [INFO] [raw_node.rs:315] ["RawNode created with id 17107724."] [id=17107724] [raft_id=17107724] [region_id=17107444]
[2024/08/20 06:13:07.783 +00:00] [INFO] [raft.rs:1362] ["received a message with higher term from 17107450"] ["msg type"=MsgHeartbeat] [message_term=6] [term=0] [from=17107450] [raft_id=17107724] [region_id=17107444]
[2024/08/20 06:13:07.783 +00:00] [INFO] [raft.rs:1127] ["became follower at term 6"] [term=6] [raft_id=17107724] [region_id=17107444]
[2024/08/20 06:24:33.820 +00:00] [INFO] [snap.rs:263] ["saving snapshot file"] [file=/home/ec2-user/deploy/tikv-20160/data/snap/rev_17107444_6_7_(default|lock|write).sst] [snap_key=17107444_6_7]
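
The wait between peer creation and snapshot arrival can be read directly from the timestamps. Below is a minimal Python sketch that does this for region 17107444. The two sample lines are shortened copies of the "replicate peer" and "saving snapshot file" entries above. The helper names `parse_ts` and `snapshot_delay_seconds` are made up for illustration and are not part of any TiKV tooling.

```python
from datetime import datetime

# Shortened copies of two entries from the log excerpt above (region 17107444).
LOG_LINES = [
    '[2024/08/20 06:13:07.783 +00:00] [INFO] [peer.rs:316] ["replicate peer"] [region_id=17107444]',
    '[2024/08/20 06:24:33.820 +00:00] [INFO] [snap.rs:263] ["saving snapshot file"] [snap_key=17107444_6_7]',
]


def parse_ts(line: str) -> datetime:
    """Parse the leading '[YYYY/MM/DD HH:MM:SS.mmm +00:00]' field of a log line."""
    raw = line[1:line.index("]")]
    return datetime.strptime(raw, "%Y/%m/%d %H:%M:%S.%f %z")


def snapshot_delay_seconds(lines) -> float:
    """Seconds between the first 'replicate peer' and the first 'saving snapshot file' entry."""
    created = next(parse_ts(l) for l in lines if '"replicate peer"' in l)
    saved = next(parse_ts(l) for l in lines if '"saving snapshot file"' in l)
    return (saved - created).total_seconds()


delay = snapshot_delay_seconds(LOG_LINES)
print(f"{delay:.3f} s ({delay / 60:.1f} min)")
```

For this region the gap is about 686 seconds, a little over 11 minutes.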

New peers receive the snapshot more than 10 minutes after becoming followers (06:13:07 to 06:24:33 in the log above).
This is likely caused by too many snapshots triggering the throttling. The default concurrent-send-snap-limit and concurrent-recv-snap-limit are both 32.
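
To give a rough sense of why a limit of 32 can turn thousands of pending snapshots into a long queue, here is a back-of-the-envelope Python sketch. It is a deliberately simplified model, not TiKV's actual scheduling. It assumes every pending snapshot goes through a single store's limit and that snapshots are processed in full "waves" of at most `concurrency_limit`. The figure of 6000 is only the approximate region count from the report, and `seconds_per_snapshot` is a hypothetical parameter, not a measured value.

```python
import math


def snapshot_waves(pending_snapshots: int, concurrency_limit: int) -> int:
    """Number of sequential batches needed when at most `concurrency_limit` snapshots run at once."""
    return math.ceil(pending_snapshots / concurrency_limit)


def estimated_drain_seconds(pending_snapshots: int, concurrency_limit: int,
                            seconds_per_snapshot: float) -> float:
    """Idealized time to drain the queue; `seconds_per_snapshot` is an assumed, uniform cost."""
    return snapshot_waves(pending_snapshots, concurrency_limit) * seconds_per_snapshot


# ~6K regions (from the report) against the default limit of 32.
waves = snapshot_waves(6000, 32)
print(waves)  # 188
```

In a real cluster the load is spread across several stores and snapshot cost varies, so this only indicates the order of magnitude of the queueing, not the observed delay.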

Metadata

    Labels

    affects-6.5: This bug affects the 6.5.x (LTS) versions.
    affects-7.5: This bug affects the 7.5.x (LTS) versions.
    affects-8.1: This bug affects the 8.1.x (LTS) versions.
    report/customer: Customers have encountered this bug.
    type/enhancement: The issue or PR belongs to an enhancement.