Fix empty shard reconfiguration after CLUSTER RESET SOFT by enjoy-binbin · Pull Request #2989 · valkey-io/valkey

enjoy-binbin · 2025-12-29T11:51:39Z

This behavior was introduced in #445, which means it has been present since Valkey 8.0.

Consider an empty shard consisting of a primary and a replica. When the replica
performs a CLUSTER RESET SOFT, it becomes a new primary and the old primary
becomes its replica, which is wrong in this case. The configuration is incorrectly
inverted: R4 becomes the new primary and R0 becomes a replica of R4.

# R0 is an empty shard, the slots are distributed evenly among R1/R2/R3.
proc my_slot_allocation2 {masters replicas} {
    R 1 cluster ADDSLOTSRANGE 0 5460
    R 2 cluster ADDSLOTSRANGE 5461 10922
    R 3 cluster ADDSLOTSRANGE 10923 16383
}

start_cluster 4 1 {tags {external:skip cluster} overrides {cluster-node-timeout 1000 cluster-migration-barrier 999}} {
    test "Empty shard will not be reconfigured after the cluster soft reset" {
        R 4 cluster reset soft

        # R0 will become a replica of R4.
    }
} my_slot_allocation2 cluster_allocate_replicas ;# start_cluster

The reason is that in clusterUpdateSlotsConfigWith, we have this logic, to
handle failover within empty shards:

    /* Handle a special case where new_primary is not set but both sender
     * and myself own no slots and in the same shard. Set the sender as
     * the new primary if my current config epoch is lower than the
     * sender's. Make sure the empty shard can be reconfigured later
     * after a failover. */
    if (!new_primary && myself->replicaof != sender && sender_slots == 0 && myself->numslots == 0 &&
        nodeEpoch(myself) < senderConfigEpoch && are_in_same_shard) {
        new_primary = sender;
    }

But at that point we do not yet have the sender's up-to-date shard_id, so the
CLUSTER RESET SOFT case is mistaken for a FAILOVER case, which causes the
problem. See #2586 for more details.
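To make the failure mode concrete, here is a minimal, self-contained model of the condition quoted above. It is only a sketch: `model_node`, `fill_shard_id`, `adopts_sender_as_primary` and `soft_reset_flips_roles` are hypothetical names, not Valkey's data structures or functions. The config epochs 1 (R0) and 5 (R4) are taken from the `configEpoch set to ...` lines in the logs later in this thread; the shard IDs are placeholder fill characters.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHARD_ID_LEN 40

/* Hypothetical, simplified stand-in for a cluster node (not clusterNode). */
typedef struct model_node {
    char shard_id[SHARD_ID_LEN + 1];
    int numslots;
    uint64_t config_epoch;
    const struct model_node *replicaof;
} model_node;

/* Fill the shard ID with one repeated character; a stand-in for a real ID. */
static void fill_shard_id(model_node *n, char c) {
    memset(n->shard_id, c, SHARD_ID_LEN);
    n->shard_id[SHARD_ID_LEN] = '\0';
}

/* Same shape as the quoted condition: the receiver adopts the sender as its
 * primary when both are empty, share a shard, and the sender's epoch is higher. */
static bool adopts_sender_as_primary(bool new_primary_set, const model_node *myself,
                                     const model_node *sender, uint64_t sender_config_epoch) {
    bool in_same_shard = memcmp(myself->shard_id, sender->shard_id, SHARD_ID_LEN) == 0;
    return !new_primary_set && myself->replicaof != sender && sender->numslots == 0 &&
           myself->numslots == 0 && myself->config_epoch < sender_config_epoch && in_same_shard;
}

/* R0's decision about R4 after R4 ran CLUSTER RESET SOFT.
 * sender_shard_id_refreshed == false models R0 still holding R4's old shard ID. */
int soft_reset_flips_roles(bool sender_shard_id_refreshed) {
    model_node r0 = {.numslots = 0, .config_epoch = 1, .replicaof = NULL};
    model_node r4 = {.numslots = 0, .config_epoch = 5, .replicaof = NULL};
    fill_shard_id(&r0, 'a');
    fill_shard_id(&r4, sender_shard_id_refreshed ? 'b' : 'a');
    return adopts_sender_as_primary(false, &r0, &r4, r4.config_epoch) ? 1 : 0;
}
```

With the stale shard ID the predicate holds and R0 would follow R4; once R0's view of R4 carries the new shard ID, `in_same_shard` is false and nothing changes.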

Extract the shard_id extension processing into a separate lightweight
function (clusterProcessShardIdExtension) and call it before
clusterUpdateSlotsConfigWith. This ensures the sender's shard_id is
up-to-date when making shard membership decisions, without triggering
side effects from other extensions like FORGOTTEN_NODE. The subsequent
updateShardId call in clusterProcessPingExtensions becomes a no-op
since the shard_id is already current.
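The ordering idea can be sketched with a toy model. Everything here is an assumption made for illustration: the extension tags, `model_ext`, `model_state`, `process_shard_id_only` and `process_all_extensions` are hypothetical and do not reproduce the layout of Valkey's ping extensions or the body of clusterProcessShardIdExtension. The sketch only shows that an early pass restricted to the shard_id extension leaves other extensions untouched, and that the later full pass finds nothing left to change for shard_id.

```c
#include <stddef.h>
#include <string.h>

#define SHARD_ID_LEN 40

/* Hypothetical extension tags for this model only. */
typedef enum { EXT_SHARD_ID, EXT_FORGOTTEN_NODE } ext_type;

typedef struct {
    ext_type type;
    char payload[SHARD_ID_LEN + 1];
} model_ext;

typedef struct {
    char shard_id[SHARD_ID_LEN + 1];
    int forgotten_count; /* counts the side effects of the full pass */
} model_state;

/* Lightweight pass: applies only the shard_id extension.
 * Returns 1 if the stored shard ID changed, 0 otherwise. */
static int process_shard_id_only(model_state *st, const model_ext *exts, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (exts[i].type != EXT_SHARD_ID) continue;
        if (memcmp(st->shard_id, exts[i].payload, SHARD_ID_LEN) == 0) return 0;
        memcpy(st->shard_id, exts[i].payload, SHARD_ID_LEN);
        st->shard_id[SHARD_ID_LEN] = '\0';
        return 1;
    }
    return 0;
}

/* Full pass: applies every extension, including those with side effects. */
static int process_all_extensions(model_state *st, const model_ext *exts, size_t n) {
    int changed = process_shard_id_only(st, exts, n);
    for (size_t i = 0; i < n; i++) {
        if (exts[i].type == EXT_FORGOTTEN_NODE) st->forgotten_count++;
    }
    return changed;
}

/* Returns 1 when the early pass updates the shard ID without side effects
 * and the later full pass is a no-op for the shard ID. */
int demo_shard_id_first(void) {
    model_state st = {.forgotten_count = 0};
    memset(st.shard_id, 'a', SHARD_ID_LEN);
    model_ext exts[2] = {{.type = EXT_FORGOTTEN_NODE}, {.type = EXT_SHARD_ID}};
    memset(exts[1].payload, 'b', SHARD_ID_LEN);

    int early_changed = process_shard_id_only(&st, exts, 2);
    int side_effects_after_early = st.forgotten_count;
    int late_changed = process_all_extensions(&st, exts, 2);
    return early_changed == 1 && side_effects_after_early == 0 && late_changed == 0 &&
           st.forgotten_count == 1;
}
```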

Signed-off-by: Binbin <binloveplay1314@qq.com>

enjoy-binbin · 2025-12-29T12:07:49Z

I'm not sure if we can do this, also see #2586 for more details. It's affecting our control panel.

Consider an empty shard consisting of a primary and a replica. When the replica performs a CLUSTER RESET SOFT,
it becomes a new primary and the old primary becomes its replica, which is wrong in this case.

# R0 is an empty shard, the slots are distributed evenly among R1/R2/R3.
proc my_slot_allocation2 {masters replicas} {
    R 1 cluster ADDSLOTSRANGE 0 5460
    R 2 cluster ADDSLOTSRANGE 5461 10922
    R 3 cluster ADDSLOTSRANGE 10923 16383
}

start_cluster 4 1 {tags {external:skip cluster} overrides {cluster-node-timeout 1000 cluster-migration-barrier 999}} {
    test "Empty shard will not be reconfigured after the cluster soft reset" {
        R 4 cluster reset soft

        # R0 will become a replica of R4.
    }
} my_slot_allocation2 cluster_allocate_replicas ;# start_cluster

The reason is that in clusterUpdateSlotsConfigWith, we have this logic:

    /* Handle a special case where new_primary is not set but both sender
     * and myself own no slots and in the same shard. Set the sender as
     * the new primary if my current config epoch is lower than the
     * sender's. Make sure the empty shard can be reconfigured later
     * after a failover. */
    if (!new_primary && myself->replicaof != sender && sender_slots == 0 && myself->numslots == 0 &&
        nodeEpoch(myself) < senderConfigEpoch && are_in_same_shard) {
        new_primary = sender;
    }

But at that point we do not yet have the sender's up-to-date shard_id, so the CLUSTER RESET SOFT case is mistaken for a FAILOVER case, which causes the problem.

@PingXie @madolson @hpatro Do you guys have any good ideas?

codecov · 2025-12-29T12:25:53Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.56%. Comparing base (1db8bab) to head (f22d4c7).

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #2989      +/-   ##
============================================
+ Coverage     76.53%   76.56%   +0.02%     
============================================
  Files           157      157              
  Lines         79025    79039      +14     
============================================
+ Hits          60481    60514      +33     
+ Misses        18544    18525      -19

Files with missing lines	Coverage Δ
src/cluster_legacy.c	`88.27% <100.00%> (+0.25%)`	⬆️

... and 18 files with indirect coverage changes

PingXie · 2026-01-04T21:09:09Z


+            /* We try to process extensions before the clusterUpdateSlotsConfigWith,
+             * because it relies on extensions such as shard_id. */
+            clusterProcessPingExtensions(hdr, link);


I think we discussed this solution in #2586 and concluded that the behavior is hard to reason about. Also, there is code consuming shard_id before this line in this function, so it is not clear to me how to justify the change. Lastly, I am generally against calling a state-mutating method like clusterProcessPingExtensions in the middle of a workflow. For instance, this method can delete forgotten nodes reported in CLUSTERMSG_EXT_TYPE_FORGOTTEN_NODE.

I think another option could be processing just the shard_id PING extension. BTW, we could still do it at the beginning of clusterProcessPacket, but I vaguely remember @hpatro had a concern about the cyclic replicaOf detection?

I wonder if we should "promote" the CLUSTER_LOCAL_NODE_SHARD_ID_UNINITIALIZED flag to a "global" flag so that we could reuse the #2586 solution (skipping clusterUpdateSlotsConfigWith until the shard id stabilizes).

Cc @deepakrn

CLUSTER_LOCAL_NODE_SHARD_ID_UNINITIALIZED can't fix the reset soft case, if I remember correctly. I did try it on that branch and the old primary node still became a replica, although I did not take a deep look to figure out why.

BTW, still do it at the beginning of clusterProcessPacket but I vaguely remember @hpatro had a concern about the cyclic replicaOf detection?

Yes, I remember some of the problems here. As I remember, this cannot fix the reset soft case. We may need to wait for the role change before updating the shard_id. The new shard_id was rejected here:

    static void updateShardId(clusterNode *node, const char *shard_id) {
        /* Ensure replica shard IDs match their primary's to maintain cluster consistency.
         *
         * Shard ID updates must prioritize the primary, then propagate to replicas.
         * This is critical due to the eventual consistency of shard IDs during cluster
         * expansion. New replicas might replicate from a primary before fully
         * synchronizing shard IDs with the rest of the cluster.
         *
         * Without this enforcement, a temporary inconsistency can arise where a
         * replica's shard ID diverges from its primary's. This inconsistency is
         * persisted in the primary's nodes.conf file. While this divergence will
         * eventually resolve, if the primary crashes beforehand, it will enter a
         * crash-restart loop due to the mismatch in its nodes.conf. */
        if (shard_id && nodeIsReplica(node) &&
            memcmp(clusterNodeGetPrimary(node)->shard_id, shard_id, CLUSTER_NAMELEN) != 0) {
            serverLog(
                LL_NOTICE,
                "Shard id %.40s update request for node id %.40s diverges from existing primary shard id %.40s, rejecting!",
                shard_id, node->name, clusterNodeGetPrimary(node)->shard_id);
            return;
        }
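A small model of that guard shows why the new shard_id is dropped while the receiver still records the sender as a replica. This is a sketch only: `model_node`, `try_update_shard_id` and `demo_shard_id_update` are hypothetical names, and what happens after the guard (a plain copy of the new ID) is an assumption, since the quoted excerpt stops at the rejection branch.

```c
#include <stddef.h>
#include <string.h>

#define SHARD_ID_LEN 40

/* Hypothetical, simplified node: primary == NULL means the node is a primary. */
typedef struct model_node {
    char shard_id[SHARD_ID_LEN + 1];
    struct model_node *primary;
} model_node;

/* Returns 1 if the update is accepted, 0 if it is rejected because the node
 * is recorded as a replica and the new ID differs from its primary's ID. */
static int try_update_shard_id(model_node *node, const char *shard_id) {
    if (shard_id && node->primary &&
        memcmp(node->primary->shard_id, shard_id, SHARD_ID_LEN) != 0) {
        return 0;
    }
    if (shard_id) {
        /* Assumption: accepting an update simply stores the new ID. */
        memcpy(node->shard_id, shard_id, SHARD_ID_LEN);
        node->shard_id[SHARD_ID_LEN] = '\0';
    }
    return 1;
}

/* R0's view of R4 after R4 announces a new shard ID.
 * role_change_seen == 0: R0 still records R4 as its replica.
 * role_change_seen == 1: R0 already records R4 as a primary. */
int demo_shard_id_update(int role_change_seen) {
    model_node r0 = {.primary = NULL};
    model_node r4 = {.primary = &r0};
    char announced[SHARD_ID_LEN + 1];

    memset(r0.shard_id, 'a', SHARD_ID_LEN);
    memset(r4.shard_id, 'a', SHARD_ID_LEN);
    memset(announced, 'b', SHARD_ID_LEN);
    announced[SHARD_ID_LEN] = '\0';

    if (role_change_seen) r4.primary = NULL;
    return try_update_shard_id(&r4, announced);
}
```

In this model the update is rejected until the role change is recorded, which matches the observation above that the shard_id may have to wait for the role change.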

I think another option could be processing just the shard_id PING extension

I can give it a try, if we ultimately decide to.

@enjoy-binbin - I would like to understand in what scenario would the solution in #2586 not work. How does cluster reset soft cause a failover? Is there a way to guard the thing causing failover using the SHARD_ID_UNINITIALIZED flag?

Marking of shard_id as uninitialized for a particular node until it receives a direct ping will let us potentially ignore any type of updates from that node.
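The gating proposal can be sketched as follows. This is an illustration of the idea as described in this comment, not the #2586 implementation: `model_peer`, `on_direct_ping_with_shard_id`, `maybe_apply_config_update` and `demo_gated_updates` are hypothetical names, and the flag is modeled as a plain boolean per peer. Note that the thread also reports this approach did not fix the reset soft case when it was tried.

```c
#include <stdbool.h>

/* Hypothetical per-peer state kept by the receiving node. */
typedef struct {
    bool shard_id_uninitialized; /* true until a direct ping carries the shard ID */
    int applied_updates;         /* number of config updates accepted from the peer */
} model_peer;

/* A direct ping that carries the peer's shard ID clears the flag. */
static void on_direct_ping_with_shard_id(model_peer *p) {
    p->shard_id_uninitialized = false;
}

/* Config updates from the peer are ignored while its shard ID is unknown. */
static bool maybe_apply_config_update(model_peer *p) {
    if (p->shard_id_uninitialized) return false;
    p->applied_updates++;
    return true;
}

/* One update arrives before the direct ping (skipped), one after (applied). */
int demo_gated_updates(void) {
    model_peer peer = {.shard_id_uninitialized = true, .applied_updates = 0};
    maybe_apply_config_update(&peer);
    on_direct_ping_with_shard_id(&peer);
    maybe_apply_config_update(&peer);
    return peer.applied_updates;
}
```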

You can run this test on your branch. I tried it once and it failed on your branch, though I don't have the details right now.

# R0 is an empty shard, the slots are distributed evenly among R1/R2/R3.
proc my_slot_allocation2 {masters replicas} {
    R 1 cluster ADDSLOTSRANGE 0 5460
    R 2 cluster ADDSLOTSRANGE 5461 10922
    R 3 cluster ADDSLOTSRANGE 10923 16383
}

start_cluster 4 1 {tags {external:skip cluster} overrides {cluster-node-timeout 1000 cluster-migration-barrier 999}} {
    test "Empty shard will not be reconfigured after the cluster soft reset" {
        R 4 cluster reset soft

        # R0 will become a replica of R4.
    }
} my_slot_allocation2 cluster_allocate_replicas ;# start_cluster

The git sha in the logs is from your branch.

R 0 logs:

### Starting server for test
68135:M 09 Jan 2026 14:31:21.695 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo
68135:M 09 Jan 2026 14:31:21.695 * Valkey version=255.255.255, bits=64, commit=07c84f5e, modified=0, pid=68135, just started
68135:M 09 Jan 2026 14:31:21.695 * Configuration loaded
68135:M 09 Jan 2026 14:31:21.696 * monotonic clock: POSIX clock_gettime
68135:M 09 Jan 2026 14:31:21.696 # Failed to write PID file: Permission denied
[ASCII-art startup banner: Valkey 255.255.255 (07c84f5e/0) 64 bit, Running in cluster mode, Port: 21115, PID: 68135, https://valkey.io]
68135:M 09 Jan 2026 14:31:21.697 # WARNING: The TCP backlog setting of 511 cannot be enforced because kern.ipc.somaxconn is set to the lower value of 128.
68135:M 09 Jan 2026 14:31:21.697 * No cluster configuration found, I'm 147548acc7ff529db0b822f84c0b4f2bf1c4a009
68135:M 09 Jan 2026 14:31:21.711 * Server initialized
68135:M 09 Jan 2026 14:31:21.711 * Ready to accept connections tcp
68135:M 09 Jan 2026 14:31:21.711 * Ready to accept connections unix
68135:M 09 Jan 2026 14:31:21.823 - Accepted 127.0.0.1:59019
68135:M 09 Jan 2026 14:31:21.823 - Client closed connection id=2 addr=127.0.0.1:59019 laddr=127.0.0.1:21115 fd=14 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=16890 argv-mem=0 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=33856 events=r cmd=ping user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=7 tot-net-out=7 tot-cmds=1
68135:M 09 Jan 2026 14:31:21.830 - Accepted 127.0.0.1:59020
68135:M 09 Jan 2026 14:31:21.831 * configEpoch set to 1 via CLUSTER SET-CONFIG-EPOCH
68135:M 09 Jan 2026 14:31:21.838 * Cluster meet 127.0.0.1:21114 (user request from 'id=3 addr=127.0.0.1:59020 laddr=127.0.0.1:21115 fd=14 name= user=default lib-name= lib-ver=').
68135:M 09 Jan 2026 14:31:21.838 * Cluster meet 127.0.0.1:21113 (user request from 'id=3 addr=127.0.0.1:59020 laddr=127.0.0.1:21115 fd=14 name= user=default lib-name= lib-ver=').
68135:M 09 Jan 2026 14:31:21.838 * Cluster meet 127.0.0.1:21112 (user request from 'id=3 addr=127.0.0.1:59020 laddr=127.0.0.1:21115 fd=14 name= user=default lib-name= lib-ver=').
68135:M 09 Jan 2026 14:31:21.838 * Cluster meet 127.0.0.1:21111 (user request from 'id=3 addr=127.0.0.1:59020 laddr=127.0.0.1:21115 fd=14 name= user=default lib-name= lib-ver=').
68135:M 09 Jan 2026 14:31:21.843 # Missing implement of connection type tls
68135:M 09 Jan 2026 14:31:21.936 - Accepting cluster node connection from 127.0.0.1:59025
68135:M 09 Jan 2026 14:31:21.936 * IP address for this node updated to 127.0.0.1
68135:M 09 Jan 2026 14:31:21.936 * Successfully completed handshake with 4cf8e6e4ee002bbb059f58cd7715df616e2322df ()
68135:M 09 Jan 2026 14:31:21.950 - Accepting cluster node connection from 127.0.0.1:59026
68135:M 09 Jan 2026 14:31:21.950 * Successfully completed handshake with a43e1c5ff1dd22da904cdab0e7ea99d3b8e07d3d ()
68135:M 09 Jan 2026 14:31:21.950 * Mismatch in topology information for sender node a43e1c5ff1dd22da904cdab0e7ea99d3b8e07d3d () in shard 933ced2c2666ea7bc70aad26df33b1dc7b41e62a
68135:M 09 Jan 2026 14:31:21.969 - Accepting cluster node connection from 127.0.0.1:59027
68135:M 09 Jan 2026 14:31:21.969 * Successfully completed handshake with c1f3aa72acdb4331c5d282ba6bfee27445dcd96a ()
68135:M 09 Jan 2026 14:31:21.969 * Mismatch in topology information for sender node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a () in shard 95343bcf8cb28e42821c3e311ac311f6a9a1a1d5
68135:M 09 Jan 2026 14:31:22.006 - Accepting cluster node connection from 127.0.0.1:59028
68135:M 09 Jan 2026 14:31:22.006 * Successfully completed handshake with 7872677129aedb6bcc9ff2ceb05b23df272c1c67 ()
68135:M 09 Jan 2026 14:31:22.006 * Mismatch in topology information for sender node 7872677129aedb6bcc9ff2ceb05b23df272c1c67 () in shard 5b45a73ed27120220e92fb78f2d21b6e6d3050f7
68135:M 09 Jan 2026 14:31:22.235 - Accepted 127.0.0.1:59041
68135:M 09 Jan 2026 14:31:22.236 * Node 4cf8e6e4ee002bbb059f58cd7715df616e2322df () is no longer primary of shard 502a2cf81c5905cc12979f271291e73e6cc2b0bf; removed all 0 slot(s) it used to own
68135:M 09 Jan 2026 14:31:22.236 * Node 4cf8e6e4ee002bbb059f58cd7715df616e2322df () is now part of shard adfeced14b0029ddb7708b168a20d24d33bec43b
68135:M 09 Jan 2026 14:31:22.236 * Node 4cf8e6e4ee002bbb059f58cd7715df616e2322df () is now a replica of node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 () in shard adfeced14b0029ddb7708b168a20d24d33bec43b
68135:M 09 Jan 2026 14:31:22.258 # DEBUG LOG: ========== I am primary 0 ==========
68135:M 09 Jan 2026 14:31:22.263 * Replica 127.0.0.1:21111 asks for synchronization
68135:M 09 Jan 2026 14:31:22.263 * Full resync requested by replica 127.0.0.1:21111
68135:M 09 Jan 2026 14:31:22.263 * Replication backlog created, my new replication IDs are 'a54e9798a181182f85e4df5fc9ed0d1503eedac1' and '0000000000000000000000000000000000000000'
68135:M 09 Jan 2026 14:31:22.263 * Starting BGSAVE for SYNC with target: replicas sockets using: normal sync
68135:M 09 Jan 2026 14:31:22.263 * Background RDB transfer started by pid 68172 to pipe through parent process
68172:C 09 Jan 2026 14:31:22.264 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
68135:M 09 Jan 2026 14:31:22.267 * Diskless rdb transfer, done reading from pipe, 1 replicas still up.
68135:M 09 Jan 2026 14:31:22.321 * Background RDB transfer terminated with success
68135:M 09 Jan 2026 14:31:22.321 * Streamed RDB transfer with replica 127.0.0.1:21111 succeeded (socket). Waiting for REPLCONF ACK from replica to enable streaming
68135:M 09 Jan 2026 14:31:22.321 * Synchronization with replica 127.0.0.1:21111 succeeded
68135:M 09 Jan 2026 14:31:23.750 * Cluster state changed: ok
### Starting test Empty shard will not be reconfigured after the cluster soft reset in tests/unit/cluster/replica-migration.tcl
68135:M 09 Jan 2026 14:31:31.876 - Client closed connection id=11 addr=127.0.0.1:59041 laddr=127.0.0.1:21115 fd=23 name= age=9 idle=0 flags=S capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=16890 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=1 omem=16920 tot-mem=35416 events=r cmd=replconf user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=599 tot-net-out=41 tot-cmds=15
68135:M 09 Jan 2026 14:31:31.876 * Connection with replica 127.0.0.1:21111 lost.
68135:M 09 Jan 2026 14:31:31.888 - Client closed connection id=3 addr=127.0.0.1:59020 laddr=127.0.0.1:21115 fd=14 name= age=10 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=16890 argv-mem=0 multi-mem=0 rbs=1024 rbp=624 obl=0 oll=0 omem=0 tot-mem=18496 events=r cmd=cluster|info user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=6028 tot-net-out=227007 tot-cmds=203
68135:M 09 Jan 2026 14:31:31.921 * Reconfiguring node 4cf8e6e4ee002bbb059f58cd7715df616e2322df () as primary for shard adfeced14b0029ddb7708b168a20d24d33bec43b
68135:M 09 Jan 2026 14:31:31.921 * Mismatch in topology information for sender node 4cf8e6e4ee002bbb059f58cd7715df616e2322df () in shard adfeced14b0029ddb7708b168a20d24d33bec43b
68135:M 09 Jan 2026 14:31:31.921 * Configuration change detected. Reconfiguring myself as a replica of node 4cf8e6e4ee002bbb059f58cd7715df616e2322df () in shard adfeced14b0029ddb7708b168a20d24d33bec43b
68135:S 09 Jan 2026 14:31:31.921 * Before turning into a replica, using my own primary parameters to synthesize a cached primary: I may be able to synchronize with the new primary with just a partial transfer.
68135:S 09 Jan 2026 14:31:31.921 * Connecting to PRIMARY 127.0.0.1:21111
68135:S 09 Jan 2026 14:31:31.921 * PRIMARY <-> REPLICA sync started
68135:S 09 Jan 2026 14:31:31.940 * Non blocking connect for SYNC fired the event.
68135:S 09 Jan 2026 14:31:31.940 * Primary replied to PING, replication can continue...
68135:S 09 Jan 2026 14:31:31.940 * (Non critical) Primary does not understand REPLCONF SET-CLUSTER-NODE-ID: -ERR Unknown node 147548acc7ff529db0b822f84c0b4f2bf1c4a009
68135:S 09 Jan 2026 14:31:31.940 * Trying a partial resynchronization (request a54e9798a181182f85e4df5fc9ed0d1503eedac1:15).
68135:S 09 Jan 2026 14:31:31.940 * Successful partial resynchronization with primary.
68135:S 09 Jan 2026 14:31:31.940 * Primary replication ID changed to 8ac13007fa8077f42ebd10ce787f9ff7cb0c2f2e
68135:S 09 Jan 2026 14:31:31.940 * PRIMARY <-> REPLICA sync: Primary accepted a Partial Resynchronization.
68135:signal-handler (1767940292) Received SIGTERM scheduling shutdown...
68135:S 09 Jan 2026 14:31:32.614 - Accepting cluster node connection from 127.0.0.1:59053
68135:S 09 Jan 2026 14:31:32.614 * User requested shutdown...
68135:S 09 Jan 2026 14:31:32.614 * Removing the pid file.
68135:S 09 Jan 2026 14:31:32.614 * Saving the cluster configuration file before exiting.
68135:S 09 Jan 2026 14:31:32.635 * Removing the unix socket file.
68135:S 09 Jan 2026 14:31:32.636 # Valkey is now ready to exit, bye bye...

R 4 logs:

### Starting server for test
67965:M 09 Jan 2026 14:31:21.117 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo
67965:M 09 Jan 2026 14:31:21.117 * Valkey version=255.255.255, bits=64, commit=07c84f5e, modified=0, pid=67965, just started
67965:M 09 Jan 2026 14:31:21.117 * Configuration loaded
67965:M 09 Jan 2026 14:31:21.117 * monotonic clock: POSIX clock_gettime
67965:M 09 Jan 2026 14:31:21.118 # Failed to write PID file: Permission denied
[ASCII-art startup banner: Valkey 255.255.255 (07c84f5e/0) 64 bit, Running in cluster mode, Port: 21111, PID: 67965, https://valkey.io]
67965:M 09 Jan 2026 14:31:21.118 # WARNING: The TCP backlog setting of 511 cannot be enforced because kern.ipc.somaxconn is set to the lower value of 128.
67965:M 09 Jan 2026 14:31:21.118 * No cluster configuration found, I'm 4cf8e6e4ee002bbb059f58cd7715df616e2322df
67965:M 09 Jan 2026 14:31:21.127 * Server initialized
67965:M 09 Jan 2026 14:31:21.127 * Ready to accept connections tcp
67965:M 09 Jan 2026 14:31:21.127 * Ready to accept connections unix
67965:M 09 Jan 2026 14:31:21.213 - Accepted 127.0.0.1:59011
67965:M 09 Jan 2026 14:31:21.214 - Client closed connection id=2 addr=127.0.0.1:59011 laddr=127.0.0.1:21111 fd=14 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=16890 argv-mem=0 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=33856 events=r cmd=ping user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=7 tot-net-out=7 tot-cmds=1
67965:M 09 Jan 2026 14:31:21.220 - Accepted 127.0.0.1:59012
67965:M 09 Jan 2026 14:31:21.836 * configEpoch set to 5 via CLUSTER SET-CONFIG-EPOCH
67965:M 09 Jan 2026 14:31:21.915 - Accepting cluster node connection from 127.0.0.1:59024
67965:M 09 Jan 2026 14:31:21.915 * IP address for this node updated to 127.0.0.1
67965:M 09 Jan 2026 14:31:22.142 * Mismatch in topology information for sender node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a () in shard 851fa9cd0d51c55a6354e49a14f03e65f2001364
67965:M 09 Jan 2026 14:31:22.142 * Mismatch in topology information for sender node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a () in shard a64c0eb757942b1524adcf67a8366ce0165cc3cb
67965:M 09 Jan 2026 14:31:22.143 * Mismatch in topology information for sender node 7872677129aedb6bcc9ff2ceb05b23df272c1c67 () in shard c39496129f886c8d26a09d3592055a0b84710864
67965:M 09 Jan 2026 14:31:22.143 * Mismatch in topology information for sender node 7872677129aedb6bcc9ff2ceb05b23df272c1c67 () in shard 20e3596ffbe2293fe219127b9bbaedb706a6fc17
67965:M 09 Jan 2026 14:31:22.162 * Mismatch in topology information for sender node a43e1c5ff1dd22da904cdab0e7ea99d3b8e07d3d () in shard e8f0a8d091f60e24770322126f56b3cf7d155b59
67965:M 09 Jan 2026 14:31:22.162 * Mismatch in topology information for sender node a43e1c5ff1dd22da904cdab0e7ea99d3b8e07d3d () in shard 3deb306961e98805f228273e54e0f3c6cd7bfee3
67965:M 09 Jan 2026 14:31:22.162 - Accepting cluster node connection from 127.0.0.1:59034
67965:M 09 Jan 2026 14:31:22.182 - Accepting cluster node connection from 127.0.0.1:59037
67965:M 09 Jan 2026 14:31:22.207 - Accepting cluster node connection from 127.0.0.1:59038
67965:M 09 Jan 2026 14:31:22.234 # Missing implement of connection type tls
67965:S 09 Jan 2026 14:31:22.235 * Connecting to PRIMARY 127.0.0.1:21115
67965:S 09 Jan 2026 14:31:22.235 * PRIMARY <-> REPLICA sync started
67965:S 09 Jan 2026 14:31:22.235 * Cluster state changed: ok
67965:S 09 Jan 2026 14:31:22.236 * Non blocking connect for SYNC fired the event.
67965:S 09 Jan 2026 14:31:22.236 * Primary replied to PING, replication can continue...
67965:S 09 Jan 2026 14:31:22.258 * Partial resynchronization not possible (no cached primary)
67965:S 09 Jan 2026 14:31:22.263 * Full resync from primary: a54e9798a181182f85e4df5fc9ed0d1503eedac1:0
67965:S 09 Jan 2026 14:31:22.267 * Replica main thread creating Bio thread to save RDB to disk
67965:S 09 Jan 2026 14:31:22.267 * Replica bio thread: PRIMARY <-> REPLICA sync: receiving streamed RDB from primary with EOF to disk
67965:S 09 Jan 2026 14:31:22.268 * Replica bio thread: Done downloading RDB
67965:S 09 Jan 2026 14:31:22.268 # DEBUG LOG: ========== I am replica 4 ==========
67965:S 09 Jan 2026 14:31:23.144 * Replica main thread detected RDB download completion in Bio thread
67965:S 09 Jan 2026 14:31:23.144 * Loading the RDB and finalizing primary-replica sync...
67965:S 09 Jan 2026 14:31:23.150 * PRIMARY <-> REPLICA sync: Flushing old data
67965:S 09 Jan 2026 14:31:23.150 * PRIMARY <-> REPLICA sync: Loading DB in memory
67965:S 09 Jan 2026 14:31:23.150 * Loading RDB produced by Valkey version 255.255.255
67965:S 09 Jan 2026 14:31:23.150 * RDB age 1 seconds
67965:S 09 Jan 2026 14:31:23.150 * RDB memory usage when created 2.94 Mb
67965:S 09 Jan 2026 14:31:23.150 * Done loading RDB, keys loaded: 0, keys expired: 0.
67965:S 09 Jan 2026 14:31:23.150 * PRIMARY <-> REPLICA sync: Finished with success
### Starting test Empty shard will not be reconfigured after the cluster soft reset in tests/unit/cluster/replica-migration.tcl
67965:S 09 Jan 2026 14:31:31.876 * Cluster reset (user request from 'id=3 addr=127.0.0.1:59012 laddr=127.0.0.1:21111 fd=14 name= user=default lib-name= lib-ver=').
67965:S 09 Jan 2026 14:31:31.876 * Reconfiguring node 4cf8e6e4ee002bbb059f58cd7715df616e2322df () as primary for shard adfeced14b0029ddb7708b168a20d24d33bec43b
67965:M 09 Jan 2026 14:31:31.876 * Connection with primary lost.
67965:M 09 Jan 2026 14:31:31.876 * Caching the disconnected primary state.
67965:M 09 Jan 2026 14:31:31.876 * Discarding previously cached primary state.
67965:M 09 Jan 2026 14:31:31.876 * Setting secondary replication ID to a54e9798a181182f85e4df5fc9ed0d1503eedac1, valid up to offset: 15. New replication ID is 8ac13007fa8077f42ebd10ce787f9ff7cb0c2f2e
67965:M 09 Jan 2026 14:31:31.877 # Cluster state changed: fail
67965:M 09 Jan 2026 14:31:31.877 # Cluster is currently down: I am part of a minority partition.
### Starting test Check for memory leaks (pid 68135) in tests/unit/cluster/replica-migration.tcl
67965:M 09 Jan 2026 14:31:31.907 - Accepting cluster node connection from 127.0.0.1:59048
67965:M 09 Jan 2026 14:31:31.921 - Accepting cluster node connection from 127.0.0.1:59049
67965:M 09 Jan 2026 14:31:31.921 - Accepted 127.0.0.1:59050
67965:M 09 Jan 2026 14:31:31.940 * Replica 127.0.0.1:21115 asks for synchronization
67965:M 09 Jan 2026 14:31:31.940 * Partial resynchronization request from 127.0.0.1:21115 accepted. Sending 0 bytes of backlog starting from offset 15.
67965:M 09 Jan 2026 14:31:31.959 - Accepting cluster node connection from 127.0.0.1:59051
67965:M 09 Jan 2026 14:31:31.962 - Accepting cluster node connection from 127.0.0.1:59052
67965:M 09 Jan 2026 14:31:32.636 - Client closed connection id=7 addr=127.0.0.1:59050 laddr=127.0.0.1:21111 fd=17 name= age=1 idle=1 flags=S capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=16890 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=1 omem=16920 tot-mem=35416 events=r cmd=psync user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=333 tot-net-out=82 tot-cmds=6
67965:M 09 Jan 2026 14:31:32.636 * Connection with replica 127.0.0.1:21115 lost.
### Starting test Check for memory leaks (pid 68110) in tests/unit/cluster/replica-migration.tcl
67965:M 09 Jan 2026 14:31:32.931 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:32.933 * Mismatch in topology information for sender node a43e1c5ff1dd22da904cdab0e7ea99d3b8e07d3d () in shard 5223ecc8e829765d5214f99ba5453da981d4c391
67965:M 09 Jan 2026 14:31:32.933 * Mismatch in topology information for sender node a43e1c5ff1dd22da904cdab0e7ea99d3b8e07d3d () in shard 3deb306961e98805f228273e54e0f3c6cd7bfee3
67965:M 09 Jan 2026 14:31:32.933 # Cluster is currently down: At least one hash slot is not served by any available node. Please check the 'cluster-require-full-coverage' configuration.
67965:M 09 Jan 2026 14:31:32.944 * Mismatch in topology information for sender node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a () in shard e736912dd440bb6b0dc0c0f0162c7657b261bb2e
67965:M 09 Jan 2026 14:31:32.944 * Mismatch in topology information for sender node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a () in shard a64c0eb757942b1524adcf67a8366ce0165cc3cb
67965:M 09 Jan 2026 14:31:33.032 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
### Starting test Check for memory leaks (pid 68084) in tests/unit/cluster/replica-migration.tcl
67965:M 09 Jan 2026 14:31:33.133 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.133 - Connection with Node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a at 127.0.0.1:31114 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.234 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.234 - Connection with Node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a at 127.0.0.1:31114 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.335 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.335 - Connection with Node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a at 127.0.0.1:31114 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.343 * Mismatch in topology information for sender node 7872677129aedb6bcc9ff2ceb05b23df272c1c67 () in shard 7cf2b27ddcb5ef258caf9b4a1f109f1824d65311
67965:M 09 Jan 2026 14:31:33.436 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.436 - Connection with Node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a at 127.0.0.1:31114 failed: Connection refused
### Starting test Check for memory leaks (pid 68056) in tests/unit/cluster/replica-migration.tcl
67965:M 09 Jan 2026 14:31:33.537 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.537 - Connection with Node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a at 127.0.0.1:31114 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.537 - Connection with Node 7872677129aedb6bcc9ff2ceb05b23df272c1c67 at 127.0.0.1:31113 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.638 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.638 - Connection with Node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a at 127.0.0.1:31114 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.638 - Connection with Node 7872677129aedb6bcc9ff2ceb05b23df272c1c67 at 127.0.0.1:31113 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.738 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.738 - Connection with Node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a at 127.0.0.1:31114 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.738 - Connection with Node 7872677129aedb6bcc9ff2ceb05b23df272c1c67 at 127.0.0.1:31113 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.839 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.839 - Connection with Node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a at 127.0.0.1:31114 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.839 - Connection with Node 7872677129aedb6bcc9ff2ceb05b23df272c1c67 at 127.0.0.1:31113 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.884 - Client closed connection id=3 addr=127.0.0.1:59012 laddr=127.0.0.1:21111 fd=14 name= age=12 idle=2 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=16890 argv-mem=0 multi-mem=0 rbs=1024 rbp=628 obl=0 oll=0 omem=0 tot-mem=18496 events=r cmd=cluster|reset user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=400 tot-net-out=4216 tot-cmds=10
67965:M 09 Jan 2026 14:31:34.165 * NODE 147548acc7ff529db0b822f84c0b4f2bf1c4a009 () possibly failing.
67965:M 09 Jan 2026 14:31:34.165 * NODE c1f3aa72acdb4331c5d282ba6bfee27445dcd96a () possibly failing.
67965:M 09 Jan 2026 14:31:34.166 * Cluster state changed: ok
67965:M 09 Jan 2026 14:31:34.166 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:34.166 - Connection with Node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a at 127.0.0.1:31114 failed: Connection refused
67965:M 09 Jan 2026 14:31:34.166 - Connection with Node a43e1c5ff1dd22da904cdab0e7ea99d3b8e07d3d at 127.0.0.1:31112 failed: Connection refused
67965:M 09 Jan 2026 14:31:34.166 - Connection with Node 7872677129aedb6bcc9ff2ceb05b23df272c1c67 at 127.0.0.1:31113 failed: Connection refused
67965:signal-handler (1767940294) Received SIGTERM scheduling shutdown...
67965:M 09 Jan 2026 14:31:34.266 * User requested shutdown...
67965:M 09 Jan 2026 14:31:34.266 * Removing the pid file.
67965:M 09 Jan 2026 14:31:34.266 * Saving the cluster configuration file before exiting.
67965:M 09 Jan 2026 14:31:34.280 * Removing the unix socket file.
67965:M 09 Jan 2026 14:31:34.280 # Valkey is now ready to exit, bye bye...

Signed-off-by: Binbin <binloveplay1314@qq.com>

When a cluster reset is performed on a replica node, a new shard ID is generated because the node is about to become an empty primary node, see valkey-io#2283. However, the log added in valkey-io#2510 caused some confusion. In clusterSetNodeAsPrimary we print:

```
serverLog(LL_NOTICE, "Reconfiguring node %.40s (%s) as primary for shard %.40s", n->name, humanNodename(n), n->shard_id);
```

In clusterReset, we first call clusterSetNodeAsPrimary and then generate a new shard ID, which causes us to log the wrong (old) shard ID first. For example, when a replica node performs a cluster reset, we print:

```
xxx * Cluster reset (user request from 'xxx').
xxx * Reconfiguring node af76a3e0ffcd77bd14fa47ce4d07ab2bdc78702f (xxx) as primary for shard ea528667634af8beed83adac2b9af8360769a1b4
```

But the node's shard ID is actually:

```
xxx> cluster myshardid
"52ede26d1554dd203161ba09011af14574b2cc84"
```

Now we print a log after the new shard ID is generated, and we also move the call to clusterSetNodeAsPrimary after the new shard ID generation, so that it logs the right one. After this PR:

```
xxx * Cluster reset (user request from 'xxx').
xxx * Moving myself to a new shard bd31870ce73f5977084e6a46e337a4a1ad38fc66.
xxx * Reconfiguring node 1d54b904efd30cd9d7d1abbfd63c8fafbb62e1c8 (xxx) as primary for shard bd31870ce73f5977084e6a46e337a4a1ad38fc66
```

This is part of valkey-io#2989, but I guess we won't merge the extension fix any time soon, so I am extracting it separately as a log fix (or improvement).

Signed-off-by: Binbin <binloveplay1314@qq.com>
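The ordering problem described in this commit message can be illustrated with a small toy model. This is only a sketch: `toy_node`, `toy_set_node_as_primary`, `toy_new_shard_id` and the two `toy_reset_*` functions are hypothetical stand-ins, not Valkey functions, and the "log" is captured into a buffer instead of being printed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_ID_LEN 40

/* Hypothetical stand-in for a cluster node: only the shard id matters here. */
typedef struct {
    char shard_id[TOY_ID_LEN + 1];
} toy_node;

/* Stand-in for the "Reconfiguring node ... for shard ..." log line: it
 * records the shard id visible at the time of the call. */
static void toy_set_node_as_primary(const toy_node *n, char *logged) {
    assert(n != NULL && logged != NULL);
    snprintf(logged, TOY_ID_LEN + 1, "%s", n->shard_id);
}

/* Stand-in for moving the node to a fresh shard id (caller supplies it). */
static void toy_new_shard_id(toy_node *n, const char *fresh) {
    assert(n != NULL && fresh != NULL);
    snprintf(n->shard_id, sizeof(n->shard_id), "%s", fresh);
}

/* Old order: log first, then move to the new shard. The log shows the
 * shard id the node is about to leave. */
static void toy_reset_old_order(toy_node *n, const char *fresh, char *logged) {
    toy_set_node_as_primary(n, logged);
    toy_new_shard_id(n, fresh);
}

/* New order: move to the new shard first, then log. The log matches the
 * shard id the node ends up with. */
static void toy_reset_new_order(toy_node *n, const char *fresh, char *logged) {
    toy_new_shard_id(n, fresh);
    toy_set_node_as_primary(n, logged);
}
```

With the old order, the captured log holds the previous shard id while the node already carries the new one; with the new order, both agree.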

Signed-off-by: Binbin <binloveplay1314@qq.com>

PingXie · 2026-02-24T05:36:03Z

@enjoy-binbin, I don't think we should take this change. Please see my concerns at #2989 (comment).

enjoy-binbin · 2026-02-24T09:57:55Z

@PingXie Yes, I understand the concerns. I was just merging the unstable branch into this PR. Since we are all here, I looked at it again.

I wonder if we should "promote" the CLUSTER_LOCAL_NODE_SHARD_ID_UNINITIALIZED flag to a "global" flag so that we could reuse the #2586 solution (skipping clusterUpdateSlotsConfigWith until the shard id stabilizes).

As I mentioned in the comments, CLUSTER_LOCAL_NODE_SHARD_ID_UNINITIALIZED does not work here and can't fix the issue.

I think another option could be processing just the shard_id PING extension. BTW, still do it at the beginning of clusterProcessPacket but I vaguely remember @hpatro had a concern about the cyclic replicaOf detection?

Sadly, this does not work either. I had forgotten the details at first, but now I remember: updateShardId relies on the node flags to update the shard_id, see #573. This means we need to first update the node flags and then process the shard id extension. The check added in #573 will prevent us from updating the shard id here, unless we find a way to handle these attributes in order.

static void updateShardId(clusterNode *node, const char *shard_id) {
    /* Ensure replica shard IDs match their primary's to maintain cluster consistency.
     *
     * Shard ID updates must prioritize the primary, then propagate to replicas.
     * This is critical due to the eventual consistency of shard IDs during cluster
     * expansion. New replicas might replicate from a primary before fully
     * synchronizing shard IDs with the rest of the cluster.
     *
     * Without this enforcement, a temporary inconsistency can arise where a
     * replica's shard ID diverges from its primary's. This inconsistency is
     * persisted in the primary's nodes.conf file. While this divergence will
     * eventually resolve, if the primary crashes beforehand, it will enter a
     * crash-restart loop due to the mismatch in its nodes.conf. */
    if (shard_id && nodeIsReplica(node) &&
        memcmp(clusterNodeGetPrimary(node)->shard_id, shard_id, CLUSTER_NAMELEN) != 0) {
        serverLog(
            LL_NOTICE,
            "Shard id %.40s update request for node id %.40s diverges from existing primary shard id %.40s, rejecting!",
            shard_id, node->name, clusterNodeGetPrimary(node)->shard_id);
        return;
    }

Do you have other ideas? It's getting really tricky. @hpatro feel free to jump in.
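To make the ordering dependency concrete, here is a minimal toy model of the guard quoted above. It is a sketch under stated assumptions: `toy_node`, `toy_update_shard_id`, `toy_mark_as_primary` and `toy_fill_id` are hypothetical names, the replica flag is reduced to a boolean, and the real updateShardId does more than copy the id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_NAMELEN 40

/* Hypothetical, simplified node: a replica flag, a primary pointer and a
 * fixed-width shard id (not NUL-terminated, compared with memcmp). */
typedef struct toy_node {
    bool is_replica;
    struct toy_node *primary;
    char shard_id[TOY_NAMELEN];
} toy_node;

/* Fill a fixed-width id with one repeated character (test helper). */
static void toy_fill_id(char *dst, char c) {
    memset(dst, c, TOY_NAMELEN);
}

/* Simplified model of the guard: a replica may not take a shard id that
 * differs from its primary's. Returns true if the update was applied. */
static bool toy_update_shard_id(toy_node *node, const char *shard_id) {
    assert(node != NULL && shard_id != NULL);
    if (node->is_replica && node->primary != NULL &&
        memcmp(node->primary->shard_id, shard_id, TOY_NAMELEN) != 0) {
        return false; /* rejected: diverges from the primary's shard id */
    }
    memcpy(node->shard_id, shard_id, TOY_NAMELEN);
    return true;
}

/* Stand-in for the flag update that happens when the sender announces
 * itself as a primary. */
static void toy_mark_as_primary(toy_node *node) {
    assert(node != NULL);
    node->is_replica = false;
    node->primary = NULL;
}
```

In this model, if the receiver still views the sender as a replica of the old primary, the new shard id is rejected. Only after the flag update (`toy_mark_as_primary`) does the same call succeed, which is why processing the shard id extension before the flags does not help.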

Signed-off-by: Binbin <binloveplay1314@qq.com>

Signed-off-by: Binbin <binloveplay1314@qq.com>

zuiderkwast · 2026-04-10T08:44:23Z

@enjoy-binbin I didn't follow the discussion. Did you change the implementation after Ping's concern (#2989 (comment), #2989 (comment))?

enjoy-binbin · 2026-04-10T08:58:36Z

Sorry, it is a tricky issue and the fix is not perfect. I guess we need more people to chime in. Let me try to summarize the approaches tried so far:

1. Move the call to clusterProcessPingExtensions to the very beginning. Ping raised concerns regarding this approach here: #2989 (comment)
2. Use CLUSTERMSG_EXT_TYPE_FORGOTTEN_NODE, the solution mentioned in #2586 (Fix two primaries scenario due to unknown shard_id). However, this approach fails to resolve the issue; see: #2989 (comment)
3. Introduce a new clusterProcessShardIdExtension and move only the shard_id processing to the very beginning. Moving it to the front also appears insufficient to resolve the issue (see the updateShardId function); see: #2989 (comment)

This PR adopts a compromise: it introduces clusterProcessShardIdExtension, but instead of moving it to the absolute beginning, it simply invokes it before clusterUpdateSlotsConfigWith. This is not a perfect solution either, as the shard_id may vary during the course of packet processing.
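Why the order matters can be sketched with a toy version of the empty-shard special case quoted in the PR description. Everything here is an assumption made for illustration: `toy_node`, `toy_would_follow_sender` and `toy_apply_shard_id` are hypothetical names, the sender's slot count and config epoch are folded into the node struct rather than read from the packet, and the `!new_primary` part of the real condition is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_NAMELEN 40

/* Hypothetical, simplified view of a node as seen by the receiver. */
typedef struct toy_node {
    struct toy_node *replicaof;
    int numslots;
    uint64_t config_epoch;
    char shard_id[TOY_NAMELEN];
} toy_node;

static bool toy_same_shard(const toy_node *a, const toy_node *b) {
    return memcmp(a->shard_id, b->shard_id, TOY_NAMELEN) == 0;
}

/* Toy mirror of the empty-shard special case: returns true when `myself`
 * would adopt `sender` as its new primary. */
static bool toy_would_follow_sender(const toy_node *myself, const toy_node *sender) {
    assert(myself != NULL && sender != NULL);
    return myself->replicaof != sender && sender->numslots == 0 &&
           myself->numslots == 0 &&
           myself->config_epoch < sender->config_epoch &&
           toy_same_shard(myself, sender);
}

/* Stand-in for applying the shard id carried by the packet to the
 * receiver's view of the sender. */
static void toy_apply_shard_id(toy_node *sender, const char *shard_id) {
    assert(sender != NULL && shard_id != NULL);
    memcpy(sender->shard_id, shard_id, TOY_NAMELEN);
}
```

If the check runs while the receiver still holds the sender's old shard id, both nodes look like members of the same empty shard and the receiver follows the sender. If the new shard id is applied first, the same-shard test fails and no reconfiguration is triggered.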

Try handling the extension before calling clusterUpdateSlotsConfigWith

11b23ff

Signed-off-by: Binbin <binloveplay1314@qq.com>

github-actions Bot assigned enjoy-binbin Dec 29, 2025

Update test

94114b0

Signed-off-by: Binbin <binloveplay1314@qq.com>

enjoy-binbin commented Dec 29, 2025

Comment thread src/cluster_legacy.c Outdated

PingXie reviewed Jan 4, 2026

some code review from Ping

49f86b4

Signed-off-by: Binbin <binloveplay1314@qq.com>

enjoy-binbin added this to Valkey 9.1 Feb 9, 2026

enjoy-binbin moved this to Needs Review in Valkey 9.1 Feb 9, 2026

enjoy-binbin mentioned this pull request Feb 12, 2026

Logging fix or improvement around new shard ID generation #3192

Merged

madolson requested a review from PingXie February 23, 2026 17:09

Merge remote-tracking branch 'upstream/unstable' into extension_fix

fb482c0

Signed-off-by: Binbin <binloveplay1314@qq.com>

enjoy-binbin commented Feb 24, 2026

Comment thread src/cluster_legacy.c Outdated

Apply suggestion from @enjoy-binbin

ad4bf5d

Signed-off-by: Binbin <binloveplay1314@qq.com>

tmp commit

1589f66

Signed-off-by: Binbin <binloveplay1314@qq.com>

enjoy-binbin added 2 commits April 3, 2026 18:29

Merge remote-tracking branch 'upstream/unstable' into extension_fix

947be62

Signed-off-by: Binbin <binloveplay1314@qq.com>

only process shard_id

f22d4c7

Signed-off-by: Binbin <binloveplay1314@qq.com>

enjoy-binbin changed the title ~~Try handling the extension before calling clusterUpdateSlotsConfigWith~~ Fix empty shard reconfiguration after CLUSTER RESET SOFT Apr 3, 2026

enjoy-binbin requested a review from zuiderkwast April 3, 2026 10:41

madolson added this to Valkey 10 May 18, 2026

madolson moved this to Needs Review in Valkey 10 May 18, 2026

madolson removed this from Valkey 9.1 May 18, 2026
