Describe the bug
During a re-shard operation, if the key being migrated is a hash, then depending on its size and number of fields, the MIGRATE operation can time out and cause Redis to fail over.
In our case, we have a hash of 300 MB with 3 million fields.
The main problem is that MIGRATE is a blocking command: it blocks the primary for the duration of the transfer, so Redis considers the primary down and triggers a failover.
StackExchange.Redis error logs:
TimeStamp: 2024-03-08T16:53:32.534223Z
Timeout awaiting response (outbound=0KiB, inbound=0KiB, 4100ms elapsed, timeout is 4000ms), command=MIGRATE, next: some_random_key, inst: 0, qu: 0, qs: 0, aw: False, bw: SpinningDown, rs: ReadAsync, ws: Idle, in: 0, last-in: 2, cur-in: 0, sync-ops: 0, async-ops: 2193844, serverEndpoint: 172.20.0.6:6380, conn-sec: 670.01, aoc: 0, mc: 1/1/0, mgr: 10 of 10 available, clientName: mtcache000002(SE.Redis-v2.6.116.40240), PerfCounterHelperkeyHashSlot: 9271, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=1,Free=32766,Min=8,Max=32767), POOL: (Threads=6,QueuedItems=0,CompletedItems=6635139), v: 2.6.116.40240 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts),
We get two such errors before Redis fails over. In Redis, node_timeout is set to 5 seconds.
Redis Logs:
16:53:35.041 * FAIL message received from 1a52537ed371931ec4436e02afdaae61fd061c17 about 42b37d2039622543514545a6cba3807e4db0b776
16:53:35.133 # Start of election delayed for 805 milliseconds (rank #0, offset 1269560724769).
16:53:36.736 # Configuration change detected. Reconfiguring myself as a replica of 42ac8718f2670e442fe598923f6069a66148e717
16:53:36.739 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
SLOWLOG output, showing that migrate took 9.3 seconds
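To put the numbers above side by side, here is a rough back-of-the-envelope sketch in Python. It assumes the 9.3-second SLOWLOG entry corresponds to the 300 MB hash, and it assumes migration time scales roughly linearly with key size, which may not hold in practice. The function names are illustrative only.

```python
# Figures reported in this issue.
KEY_SIZE_MB = 300.0           # size of the hash being migrated
MIGRATE_SECONDS = 9.3         # MIGRATE duration reported by SLOWLOG
NODE_TIMEOUT_SECONDS = 5.0    # node_timeout configured on the cluster
CLIENT_TIMEOUT_SECONDS = 4.0  # client timeout from the StackExchange.Redis error


def effective_rate_mb_per_s(size_mb: float, seconds: float) -> float:
    """Average throughput of the blocking MIGRATE call."""
    return size_mb / seconds


def max_size_within(budget_seconds: float, rate_mb_per_s: float) -> float:
    """Largest key size that would finish inside the time budget.

    ASSUMPTION: migration time grows linearly with key size.
    """
    return budget_seconds * rate_mb_per_s


rate = effective_rate_mb_per_s(KEY_SIZE_MB, MIGRATE_SECONDS)
fits = max_size_within(NODE_TIMEOUT_SECONDS, rate)

print(f"effective rate: {rate:.1f} MB/s")
print(f"MIGRATE exceeds client timeout: {MIGRATE_SECONDS > CLIENT_TIMEOUT_SECONDS}")
print(f"MIGRATE exceeds node timeout: {MIGRATE_SECONDS > NODE_TIMEOUT_SECONDS}")
print(f"approx. size that fits within node timeout: {fits:.0f} MB")
```

Under these assumptions the primary is unresponsive for almost twice the node timeout, which is consistent with the FAIL message and election seen in the Redis logs.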
We are looking for:
- How to migrate large hashes.
- How to prevent a MIGRATE timeout from causing a failover of the Redis primary.
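As a diagnostic aid while these questions are open, the sketch below lists hashes that look large enough to block the primary, so they can be identified before a re-shard. It is not a fix for the blocking behaviour. It assumes a redis-py style client exposing `scan_iter`, `hlen`, and `memory_usage`; the function name `find_large_hashes` and the thresholds are hypothetical, and `MEMORY USAGE` returns a sampled estimate for hashes rather than an exact size.

```python
from typing import Iterator, Tuple


def find_large_hashes(client, max_bytes: int, max_fields: int) -> Iterator[Tuple[object, int, int]]:
    """Yield (key, approx_bytes, field_count) for hashes over either threshold.

    `client` is assumed to behave like a redis-py client connected to a
    single node. Thresholds are hypothetical and should be tuned.
    """
    # SCAN with a TYPE filter so only hash keys are visited.
    for key in client.scan_iter(_type="HASH"):
        fields = client.hlen(key)
        # MEMORY USAGE may return None if the key disappeared mid-scan.
        size = client.memory_usage(key) or 0
        if size > max_bytes or fields > max_fields:
            yield key, size, fields


# Example usage (requires a reachable Redis node, so it is left commented out):
#
#   import redis
#   node = redis.Redis(host="172.20.0.6", port=6380)
#   for key, size, fields in find_large_hashes(node, 100 * 1024 * 1024, 1_000_000):
#       print(key, size, fields)
```

In a cluster, a scan like this would need to be run against each primary, since `SCAN` only covers the keys held by the node it is sent to.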