Skip to content

Update clusterMoveNodeSlots to also move importing slots and migrating slots#2370

Merged
enjoy-binbin merged 5 commits into
valkey-io:unstablefrom
enjoy-binbin:move_slots
Jul 28, 2025
Merged

Update clusterMoveNodeSlots to also move importing slots and migrating slots#2370
enjoy-binbin merged 5 commits into
valkey-io:unstablefrom
enjoy-binbin:move_slots

Conversation

@enjoy-binbin

@enjoy-binbin enjoy-binbin commented Jul 21, 2025

Copy link
Copy Markdown
Member

In #2301, we added clusterMoveNodeSlots to implement the logic of
moving slots from old primary to new primary, when myself receives
the replica (old primary) message first and the new primary message
later in a shard failover.

However due to this, when myself receives the new primary message
later next time, there is no way to call clusterUpdateSlotsConfigWith,
because we have already updated the slots of the new primary before.
This result in, for example, importing slots and migrating slots
not being updated, see #445.

In this commit, we also make clusterMoveNodeSlots to move importing
slots and migrating slots.

Fixes #2363.

…g slots

Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin

Copy link
Copy Markdown
Member Author

@PingXie Please take a look before we moving forward.

@codecov

codecov Bot commented Jul 21, 2025

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.44%. Comparing base (663dac9) to head (baaa0ec).
⚠️ Report is 134 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2370      +/-   ##
============================================
+ Coverage     71.30%   71.44%   +0.13%     
============================================
  Files           123      123              
  Lines         67122    67156      +34     
============================================
+ Hits          47859    47977     +118     
+ Misses        19263    19179      -84     
Files with missing lines Coverage Δ
src/cluster_legacy.c 86.66% <100.00%> (-0.21%) ⬇️

... and 15 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@enjoy-binbin enjoy-binbin added the run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) label Jul 22, 2025
Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin

Copy link
Copy Markdown
Member Author

A test log example:

37491:M 22 Jul 2025 11:23:37.405 # DEBUG LOG: ========== I am primary 1 ==========
37491:M 22 Jul 2025 11:23:38.716 * Cluster state changed: ok
### Starting test The role change and the slot ownership change should be an atomic operation in tests/unit/cluster/manual-failover.tcl

# cluster setslot
# Move slot 0 from R0 to R1. Move slot 5462 from R1 to R0.
37491:M 22 Jul 2025 11:23:47.123 * Importing slot 0 from node 2792b5e80e85df6d5ced4ef2ccbaab3f80c73cf4 ()
37491:M 22 Jul 2025 11:23:47.124 * Migrating slot 5462 to node 2792b5e80e85df6d5ced4ef2ccbaab3f80c73cf4 ()

# Failover occurred in R0/R3 shard.
# Move importing-source and migrating-target to R3.
37491:M 22 Jul 2025 11:23:47.160 * Failover occurred in migration source. Update importing source for slot 0 to node b4bc7817919ab80ce412d34ed7ab5f2f72944117 () in shard 3637c7bb6b2cd9f5018a27b8639346ac881d31c0.
37491:M 22 Jul 2025 11:23:47.161 * Failover occurred in migration target. Slot 5462 is now being migrated to node b4bc7817919ab80ce412d34ed7ab5f2f72944117 () in shard 3637c7bb6b2cd9f5018a27b8639346ac881d31c0.

# Move importing-source and migrating-target to R3.
37491:M 22 Jul 2025 11:23:47.161 * A failover occurred in shard 3637c7bb6b2cd9f5018a27b8639346ac881d31c0; node 2792b5e80e85df6d5ced4ef2ccbaab3f80c73cf4 () lost 5462 slot(s) and failed over to node b4bc7817919ab80ce412d34ed7ab5f2f72944117 () with a config epoch of 4
37491:M 22 Jul 2025 11:23:47.161 * A failover occurred in migration source. Update importing source of 1 slot(s) to node b4bc7817919ab80ce412d34ed7ab5f2f72944117 () in shard 3637c7bb6b2cd9f5018a27b8639346ac881d31c0.
37491:M 22 Jul 2025 11:23:47.161 * A failover occurred in migration target. Update migrating target of 1 slot(s) to node b4bc7817919ab80ce412d34ed7ab5f2f72944117 () in shard 3637c7bb6b2cd9f5018a27b8639346ac881d31c0.

37491:M 22 Jul 2025 11:23:47.161 * Node 2792b5e80e85df6d5ced4ef2ccbaab3f80c73cf4 () is now a replica of node b4bc7817919ab80ce412d34ed7ab5f2f72944117 () in shard 3637c7bb6b2cd9f5018a27b8639346ac881d31c0

@enjoy-binbin enjoy-binbin requested a review from PingXie July 22, 2025 03:27
Signed-off-by: Binbin <binloveplay1314@qq.com>
Comment thread src/cluster_legacy.c Outdated
Comment thread src/cluster_legacy.c

@PingXie PingXie left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @enjoy-binbin

Comment thread src/cluster_legacy.c Outdated
Signed-off-by: Binbin <binloveplay1314@qq.com>
@PingXie

PingXie commented Jul 27, 2025

Copy link
Copy Markdown
Member

Hmm, after reading through #2363, I realize that we now have duplicated logic in two places clusterUpdateSlotsConfigWith and clusterMoveNodeSlots. need to think about this some more...

@PingXie

PingXie commented Jul 28, 2025

Copy link
Copy Markdown
Member

I realize that we now have duplicated logic in two places clusterUpdateSlotsConfigWith and clusterMoveNodeSlots. need to think about this some more...

ok clusterMoveNodeSlots is a special case of clusterUpdateSlotsConfigWith. I played with the code a bit but didn't see a clean way to single source the two. I think your implementation is good.

the only minor thing is that the same logging statements in clusterUpdateSlotsConfigWith are at NOTICE so I'd suggest lowering them to VERBOSE as well to be consistent, for the same-shard path only. we can keep the logging statements for the different-shard path at NOTICE.

Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin enjoy-binbin requested a review from PingXie July 28, 2025 03:05

@PingXie PingXie left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@enjoy-binbin enjoy-binbin merged commit a481fe2 into valkey-io:unstable Jul 28, 2025
57 of 61 checks passed
@enjoy-binbin enjoy-binbin moved this to To be backported in Valkey 8.0 Jul 28, 2025
@enjoy-binbin enjoy-binbin moved this to To be backported in Valkey 8.1 Jul 28, 2025
@zuiderkwast zuiderkwast moved this from To be backported to Don't backport yet in Valkey 8.0 Sep 30, 2025
@ranshid ranshid moved this from To be backported to Don't backport yet in Valkey 8.1 Sep 30, 2025
@enjoy-binbin enjoy-binbin deleted the move_slots branch February 9, 2026 04:43
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request Apr 24, 2026
…g slots (valkey-io#2370)

In valkey-io#2301, we added clusterMoveNodeSlots to implement the logic of
moving slots from old primary to new primary, when myself receives
the replica (old primary) message first and the new primary message
later in a shard failover.

However due to this, when myself receives the new primary message
later next time, there is no way to call clusterUpdateSlotsConfigWith,
because we have already updated the slots of the new primary before.
This result in, for example, importing slots and migrating slots
not being updated, see valkey-io#445.

In this commit, we also make clusterMoveNodeSlots to move importing
slots and migrating slots.

Fixes valkey-io#2363.

Signed-off-by: Binbin <binloveplay1314@qq.com>
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request Apr 27, 2026
…g slots (valkey-io#2370)

In valkey-io#2301, we added clusterMoveNodeSlots to implement the logic of
moving slots from old primary to new primary, when myself receives
the replica (old primary) message first and the new primary message
later in a shard failover.

However due to this, when myself receives the new primary message
later next time, there is no way to call clusterUpdateSlotsConfigWith,
because we have already updated the slots of the new primary before.
This result in, for example, importing slots and migrating slots
not being updated, see valkey-io#445.

In this commit, we also make clusterMoveNodeSlots to move importing
slots and migrating slots.

Fixes valkey-io#2363.

Signed-off-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit a3907ad)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cluster run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP)

Projects

Status: Don't backport yet
Status: Don't backport yet
Status: Done

Development

Successfully merging this pull request may close these issues.

[test-failure] Slot-migration related

5 participants