Skip to content

Slot migration improvement#445

Merged
PingXie merged 2 commits into
valkey-io:unstablefrom
PingXie:slot_migration2
May 7, 2024
Merged

Slot migration improvement#445
PingXie merged 2 commits into
valkey-io:unstablefrom
PingXie:slot_migration2

Conversation

@PingXie

@PingXie PingXie commented May 7, 2024

Copy link
Copy Markdown
Member

See #245 for the review discussion.

Overview

This PR significantly enhances the reliability and automation of
the Valkey cluster re-sharding process, specifically during slot
migrations in the face of primary failures. These updates address
critical failure issues that previously required extensive manual
intervention and could lead to data loss or inconsistent cluster states.

Enhancements

Automatic Failover Support in Empty Shards

The cluster now supports automatic failover in shards that do not
own any slots, which is common during scaling operations. This
improvement ensures high availability and resilience from the
outset of shard expansion.

Replication of Slot Migration States

All CLUSTER SETSLOT commands are now initially executed on replica
nodes before the primary. This ensures that the slot migration state
is consistent within the shard, preventing state loss in the event of
primary failure. A new timeout parameter has been introduced, allowing
users to specify the duration in milliseconds to wait for replication
to complete, with a default set at 2 seconds.

CLUSTER SETSLOT slot { IMPORTING node-id | MIGRATING node-id | NODE node-id | STABLE } [ TIMEOUT timeout ]

Recovery of Logical Migration Links

The update automatically repairs the logical links between source
and target nodes during failovers. This ensures that requests are
correctly redirected to the new primary in the target shard after
a primary failure, maintaining cluster integrity.

Enhanced Support for New Replicas

New replicas added to shards involved in slot migrations will now
automatically inherit the slot's migration state as part of their
initialization. This ensures that new replicas are immediately
consistent with the rest of the shard.

Improved Logging for Slot Migrations

Additional logging has been implemented to provide operators with
clearer insights into the slot migration processes and automatic
recovery actions, aiding in monitoring and troubleshooting.

Additional Changes

cluster-allow-replica-migration

When cluster-allow-replica-migration is disabled, primary nodes that
lose their last slot to another shard will no longer automatically become
replicas of the receiving shard. Instead, they will remain in their own
shards, which will now be empty, having no slots assigned to them.

Fixes #21.

…ding process,

specifically during slot migrations in the face of primary failures.

Signed-off-by: Ping Xie <pingxie@google.com>
Signed-off-by: Ping Xie <pingxie@google.com>
@codecov

codecov Bot commented May 7, 2024

Copy link
Copy Markdown

Codecov Report

Attention: Patch coverage is 82.82443% with 45 lines in your changes are missing coverage. Please review.

Project coverage is 68.89%. Comparing base (93f8a19) to head (fa0285c).

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #445      +/-   ##
============================================
+ Coverage     68.43%   68.89%   +0.45%     
============================================
  Files           109      109              
  Lines         61681    61785     +104     
============================================
+ Hits          42214    42565     +351     
+ Misses        19467    19220     -247     
Files Coverage Δ
src/commands.def 100.00% <ø> (ø)
src/debug.c 53.47% <100.00%> (+0.65%) ⬆️
src/networking.c 85.08% <100.00%> (ø)
src/rdb.c 76.22% <100.00%> (-0.05%) ⬇️
src/replication.c 86.01% <100.00%> (-0.29%) ⬇️
src/server.c 88.60% <100.00%> (+0.47%) ⬆️
src/blocked.c 91.80% <91.66%> (-0.06%) ⬇️
src/cluster_legacy.c 83.16% <81.11%> (+8.46%) ⬆️

... and 12 files with indirect coverage changes

@PingXie PingXie merged commit 6e7af94 into valkey-io:unstable May 7, 2024
@enjoy-binbin

Copy link
Copy Markdown
Member

i see the (squash merge) commit message is missing (Signed-off-by footer is also missing)

@srgsanky

srgsanky commented May 8, 2024

Copy link
Copy Markdown
Contributor

Does any of the CI runs do make test-cluster? We might want to enable cluster tests in at least one run @madolson

I am able to consistently get a test failure after this commit. @PingXie can you please take a look?

./runtest-cluster --single tests/12-replica-migration-2

@madolson

madolson commented May 8, 2024

Copy link
Copy Markdown
Member

@srgsanky We're running most of the cluster tests as part of #442.

zuiderkwast added a commit that referenced this pull request May 24, 2024
After READONLY, make a cluster replica behave as its primary regarding
returning ASK redirects and TRYAGAIN.

Without this patch, a client reading from a replica cannot tell if a key
doesn't exist or if it has already been migrated to another shard as
part of an ongoing slot migration. Therefore, without an ASK redirect in
this situation, offloading reads to cluster replicas wasn't reliable.

Note: The target of a redirect is always a primary. If a client wants to
continue reading from a replica after following a redirect, it needs to
figure out the replicas of that new primary using CLUSTER SHARDS or
similar.

This is related to #21 and has been made possible by the introduction of
Replication of Slot Migration States in #445.

----

Release notes:

During cluster slot migration, replicas are able to return -ASK
redirects and -TRYAGAIN.

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
PingXie pushed a commit that referenced this pull request Jul 16, 2024
Fixes a regression introduced in PR #445, which allowed a message from a
replica
to update the slot ownership of its primary. The regression results in a
`replicaof` cycle, causing server crashes due to the cycle detection
assert. The
fix restores the previous behavior where only primary senders can
trigger
`clusterUpdateSlotsConfigWith`.

Additional changes:

* Handling of primaries without slots is obsoleted by new handling of
when a
  sender that was a replica announces that it is now a primary.
* Replication loop detection code is unchanged but shifted downwards.
* Some variables are renamed for better readability and some are
introduced to
  avoid repeated memcmp() calls.

Fixes #753.

---------

Signed-off-by: Ping Xie <pingxie@google.com>
enjoy-binbin added a commit that referenced this pull request Jul 11, 2025
… failover (#2301)

After a failover occurs, when the PING-PONG of the replica, that is,
the old primary, reaches other shard nodes faster, the corresponding
code path will be reached. We will adjust the sender to be a replica
and sender_claimed_primary to be the primary node, but in myself view,
slots still belong to the sender, which is a replica.

This actually restores part of the code in 28976a9.
It was lost in the changes to #445 and #754.

These 3 changes are about avoiding the replicaof cycle but ended up creating
a "cycle" of assumptions themselves. when #445 removed this logic, it was
assumed that a replica could also drive clusterUpdateSlotsConfigWith
but this was actually a regression that resulted in a replicaof cycle (#753).
#754 fixed the replicaof cycle by disallowing a replica to drive
clusterUpdateSlotsConfigWith but missed restoring the original logic.

Added a new `DEBUG DISABLE-CLUSTER-RECONNECTION <0|1>` this will prevent
cluster nodes from reconnecting so that the issue can be reproduce in test.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ping Xie <pingxie@outlook.com>
enjoy-binbin added a commit that referenced this pull request Jul 28, 2025
…g slots (#2370)

In #2301, we added clusterMoveNodeSlots to implement the logic of
moving slots from old primary to new primary, when myself receives
the replica (old primary) message first and the new primary message
later in a shard failover.

However due to this, when myself receives the new primary message
later next time, there is no way to call clusterUpdateSlotsConfigWith,
because we have already updated the slots of the new primary before.
This result in, for example, importing slots and migrating slots
not being updated, see #445.

In this commit, we also make clusterMoveNodeSlots to move importing
slots and migrating slots.

Fixes #2363.

Signed-off-by: Binbin <binloveplay1314@qq.com>
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request Apr 24, 2026
…g slots (valkey-io#2370)

In valkey-io#2301, we added clusterMoveNodeSlots to implement the logic of
moving slots from old primary to new primary, when myself receives
the replica (old primary) message first and the new primary message
later in a shard failover.

However due to this, when myself receives the new primary message
later next time, there is no way to call clusterUpdateSlotsConfigWith,
because we have already updated the slots of the new primary before.
This result in, for example, importing slots and migrating slots
not being updated, see valkey-io#445.

In this commit, we also make clusterMoveNodeSlots to move importing
slots and migrating slots.

Fixes valkey-io#2363.

Signed-off-by: Binbin <binloveplay1314@qq.com>
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request Apr 27, 2026
…g slots (valkey-io#2370)

In valkey-io#2301, we added clusterMoveNodeSlots to implement the logic of
moving slots from old primary to new primary, when myself receives
the replica (old primary) message first and the new primary message
later in a shard failover.

However due to this, when myself receives the new primary message
later next time, there is no way to call clusterUpdateSlotsConfigWith,
because we have already updated the slots of the new primary before.
This result in, for example, importing slots and migrating slots
not being updated, see valkey-io#445.

In this commit, we also make clusterMoveNodeSlots to move importing
slots and migrating slots.

Fixes valkey-io#2363.

Signed-off-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit a3907ad)
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request May 25, 2026
In this case, in R2'view, we will create a replication chain and
not able to recover. Chain: R2->primary->R1
```
T0    primary              R1                     R2
T1                failover done, broadcase P1
T2   receive P1
T3   send P2
T3                                          receive P2
T4                                          receive P1
```

Pick some code from valkey-io/valkey#445
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Improve slot migration reliability

4 participants