
Fix empty shard reconfiguration after CLUSTER RESET SOFT #2989

Open: enjoy-binbin wants to merge 8 commits into valkey-io:unstable from enjoy-binbin:extension_fix

enjoy-binbin (Member) commented Dec 29, 2025

This behavior dates back to #445, so it has been present since Valkey 8.0.

Consider an empty shard with a primary (R0) and a replica (R4). When the
replica runs CLUSTER RESET SOFT, it becomes a new primary and the original
primary becomes its replica, which is wrong. The configuration is incorrectly
inverted: R4 becomes the new primary and R0 becomes a replica of R4.

# R0 is an empty shard, the slots are distributed evenly among R1/R2/R3.
proc my_slot_allocation2 {masters replicas} {
    R 1 cluster ADDSLOTSRANGE 0 5460
    R 2 cluster ADDSLOTSRANGE 5461 10922
    R 3 cluster ADDSLOTSRANGE 10923 16383
}

start_cluster 4 1 {tags {external:skip cluster} overrides {cluster-node-timeout 1000 cluster-migration-barrier 999}} {
    test "Empty shard will not be reconfigured after the cluster soft reset" {
        R 4 cluster reset soft

        # R0 will become a replica of R4.
    }
} my_slot_allocation2 cluster_allocate_replicas ;# start_cluster

The reason is that in clusterUpdateSlotsConfigWith, we have this logic, to
handle failover within empty shards:

    /* Handle a special case where new_primary is not set but both sender
     * and myself own no slots and in the same shard. Set the sender as
     * the new primary if my current config epoch is lower than the
     * sender's. Make sure the empty shard can be reconfigured later
     * after a failover. */
    if (!new_primary && myself->replicaof != sender && sender_slots == 0 && myself->numslots == 0 &&
        nodeEpoch(myself) < senderConfigEpoch && are_in_same_shard) {
        new_primary = sender;
    }

But at that point we do not yet have the sender's current shard_id, so the
CLUSTER RESET SOFT case is mistaken for a failover, which causes the problem.
See #2586 for more details.
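
To make the failure mode concrete, here is a minimal sketch of the empty-shard condition as a standalone toy model. It is not the Valkey code: `es_member`, `es_adopts_sender` and `es_reset_scenario` are hypothetical names, the `!new_primary` part of the real condition is left out, and the epochs (1 for R0, 5 for R4) are taken from the logs later in this thread on the assumption that a soft reset keeps the config epoch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ES_ID_LEN 40 /* stands in for CLUSTER_NAMELEN */

/* Hypothetical, stripped-down view of a node; not the real clusterNode. */
typedef struct es_member {
    char shard_id[ES_ID_LEN];
    uint64_t config_epoch;
    int numslots;
    const struct es_member *replicaof;
} es_member;

static bool es_same_shard(const es_member *a, const es_member *b) {
    return memcmp(a->shard_id, b->shard_id, ES_ID_LEN) == 0;
}

/* Same shape as the quoted condition: true when `myself` would adopt
 * `sender` as its new primary. */
static bool es_adopts_sender(const es_member *myself, const es_member *sender) {
    return myself->replicaof != sender && sender->numslots == 0 && myself->numslots == 0 &&
           myself->config_epoch < sender->config_epoch && es_same_shard(myself, sender);
}

/* R0 (epoch 1) is the empty primary, R4 (epoch 5) has just been soft reset.
 * `r0_view_is_stale` selects whether R0 still records R4 under the old
 * shard id ('a') or already knows the new one ('b'). */
bool es_reset_scenario(bool r0_view_is_stale) {
    es_member r0 = {.config_epoch = 1};
    es_member r4_as_seen_by_r0 = {.config_epoch = 5};
    memset(r0.shard_id, 'a', ES_ID_LEN);
    memset(r4_as_seen_by_r0.shard_id, r0_view_is_stale ? 'a' : 'b', ES_ID_LEN);
    return es_adopts_sender(&r0, &r4_as_seen_by_r0);
}
```

With a stale view, `es_reset_scenario(true)` reports that R0 would follow R4; once the shard ids differ, `es_reset_scenario(false)` reports that R0 keeps its role.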

Extract the shard_id extension processing into a separate lightweight
function (clusterProcessShardIdExtension) and call it before
clusterUpdateSlotsConfigWith. This ensures the sender's shard_id is
up-to-date when making shard membership decisions, without triggering
side effects from other extensions like FORGOTTEN_NODE. The subsequent
updateShardId call in clusterProcessPingExtensions becomes a no-op
since the shard_id is already current.
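
The ordering described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the `ext_*` types and functions are hypothetical, the real extension wire format is not modeled, and "forgetting a node" is reduced to a counter so the side effect is observable.

```c
#include <stddef.h>
#include <string.h>

#define EXT_ID_LEN 40 /* stands in for CLUSTER_NAMELEN */

/* Hypothetical extension kinds; the real message layout is not modeled. */
typedef enum { EXT_KIND_SHARD_ID, EXT_KIND_FORGOTTEN_NODE } ext_kind;

typedef struct {
    ext_kind kind;
    char payload[EXT_ID_LEN];
} ext_item;

typedef struct {
    char shard_id[EXT_ID_LEN];
    int forgotten_count; /* counts side effects from FORGOTTEN_NODE items */
} ext_sender_state;

/* Lightweight pass: apply only the shard id and ignore everything else.
 * Returns 1 if a shard id item was found, 0 otherwise. */
int ext_process_shard_id_only(ext_sender_state *s, const ext_item *items, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (items[i].kind != EXT_KIND_SHARD_ID) continue;
        memcpy(s->shard_id, items[i].payload, EXT_ID_LEN);
        return 1;
    }
    return 0;
}

/* Full pass: applies every item, including the one with a side effect. */
void ext_process_all(ext_sender_state *s, const ext_item *items, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (items[i].kind == EXT_KIND_SHARD_ID) {
            memcpy(s->shard_id, items[i].payload, EXT_ID_LEN);
        } else if (items[i].kind == EXT_KIND_FORGOTTEN_NODE) {
            s->forgotten_count++;
        }
    }
}

/* Returns 1 when the early pass refreshed the shard id without touching the
 * side-effect counter, and the later full pass left the shard id unchanged. */
int ext_demo_ordering(void) {
    ext_item items[2] = {{.kind = EXT_KIND_FORGOTTEN_NODE}, {.kind = EXT_KIND_SHARD_ID}};
    ext_sender_state s = {.forgotten_count = 0};
    memset(items[1].payload, 'b', EXT_ID_LEN);
    memset(s.shard_id, 'a', EXT_ID_LEN);

    int found = ext_process_shard_id_only(&s, items, 2);
    int early_ok = found && s.shard_id[0] == 'b' && s.forgotten_count == 0;

    ext_process_all(&s, items, 2);
    int late_ok = s.shard_id[0] == 'b' && s.forgotten_count == 1;
    return early_ok && late_ok;
}
```

The point of the split is that the slot-config decision can run between the two passes with a fresh shard id, while the mutating extensions still run exactly once, at their usual place.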

Signed-off-by: Binbin <binloveplay1314@qq.com>

@enjoy-binbin (Member Author)

I'm not sure whether we can do this; see also #2586 for more details. This issue is affecting our control panel.

Consider an empty shard with a primary and a replica. When the replica runs CLUSTER RESET SOFT,
it becomes a new primary and the original primary becomes its replica, which is wrong in this case.

# R0 is an empty shard, the slots are distributed evenly among R1/R2/R3.
proc my_slot_allocation2 {masters replicas} {
    R 1 cluster ADDSLOTSRANGE 0 5460
    R 2 cluster ADDSLOTSRANGE 5461 10922
    R 3 cluster ADDSLOTSRANGE 10923 16383
}

start_cluster 4 1 {tags {external:skip cluster} overrides {cluster-node-timeout 1000 cluster-migration-barrier 999}} {
    test "Empty shard will not be reconfigured after the cluster soft reset" {
        R 4 cluster reset soft

        # R0 will become a replica of R4.
    }
} my_slot_allocation2 cluster_allocate_replicas ;# start_cluster

The reason is that in clusterUpdateSlotsConfigWith, we have this logic:

    /* Handle a special case where new_primary is not set but both sender
     * and myself own no slots and in the same shard. Set the sender as
     * the new primary if my current config epoch is lower than the
     * sender's. Make sure the empty shard can be reconfigured later
     * after a failover. */
    if (!new_primary && myself->replicaof != sender && sender_slots == 0 && myself->numslots == 0 &&
        nodeEpoch(myself) < senderConfigEpoch && are_in_same_shard) {
        new_primary = sender;
    }

But at that point we do not yet have the sender's current shard_id, so the CLUSTER RESET SOFT case is mistaken for a failover, which causes the problem.

@PingXie @madolson @hpatro Do you guys have any good ideas?

Comment thread src/cluster_legacy.c Outdated
codecov Bot commented Dec 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.56%. Comparing base (1db8bab) to head (f22d4c7).

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2989      +/-   ##
============================================
+ Coverage     76.53%   76.56%   +0.02%     
============================================
  Files           157      157              
  Lines         79025    79039      +14     
============================================
+ Hits          60481    60514      +33     
+ Misses        18544    18525      -19     
Files with missing lines Coverage Δ
src/cluster_legacy.c 88.27% <100.00%> (+0.25%) ⬆️

... and 18 files with indirect coverage changes

Comment thread src/cluster_legacy.c Outdated

/* We try to process extensions before clusterUpdateSlotsConfigWith,
 * because it relies on extensions such as shard_id. */
clusterProcessPingExtensions(hdr, link);

Member

I think we discussed this solution in #2586 and concluded that the behavior is hard to reason about. Also, there is code consuming shard_id before this line in this function, so it is not clear to me how to justify the change. Lastly, I am generally against calling a state-mutating method like clusterProcessPingExtensions in the middle of a workflow. For instance, this method can delete forgotten nodes reported in CLUSTERMSG_EXT_TYPE_FORGOTTEN_NODE. I think another option could be processing just the shard_id PING extension. BTW, still do it at the beginning of clusterProcessPacket but I vaguely remember @hpatro had a concern about the cyclic replicaOf detection?

I wonder if we should "promote" the CLUSTER_LOCAL_NODE_SHARD_ID_UNINITIALIZED flag to a "global" flag so that we could reuse the #2586 solution (skipping clusterUpdateSlotsConfigWith until the shard id stabilizes).

Cc @deepakrn

Member Author

CLUSTER_LOCAL_NODE_SHARD_ID_UNINITIALIZED can't fix the reset soft case, if I remember correctly. I did try it on that branch, and the old primary node still became a replica, although I did not take a deep look to figure out why.

> BTW, still do it at the beginning of clusterProcessPacket but I vaguely remember @hpatro had a concern about the cyclic replicaOf detection?

Yes, I remember some of the problems here. As I recall, this cannot fix the reset soft case either. We may need to wait for the role change before updating the shard_id. The new shard_id was rejected here:

static void updateShardId(clusterNode *node, const char *shard_id) {
    /* Ensure replica shard IDs match their primary's to maintain cluster consistency.
     *
     * Shard ID updates must prioritize the primary, then propagate to replicas.
     * This is critical due to the eventual consistency of shard IDs during cluster
     * expansion. New replicas might replicate from a primary before fully
     * synchronizing shard IDs with the rest of the cluster.
     *
     * Without this enforcement, a temporary inconsistency can arise where a
     * replica's shard ID diverges from its primary's. This inconsistency is
     * persisted in the primary's nodes.conf file. While this divergence will
     * eventually resolve, if the primary crashes beforehand, it will enter a
     * crash-restart loop due to the mismatch in its nodes.conf. */
    if (shard_id && nodeIsReplica(node) &&
        memcmp(clusterNodeGetPrimary(node)->shard_id, shard_id, CLUSTER_NAMELEN) != 0) {
        serverLog(
            LL_NOTICE,
            "Shard id %.40s update request for node id %.40s diverges from existing primary shard id %.40s, rejecting!",
            shard_id, node->name, clusterNodeGetPrimary(node)->shard_id);
        return;
    }

> I think another option could be processing just the shard_id PING extension

I can give it a try, if we ultimately decide to.
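
For readers following along, the rejection path in the updateShardId excerpt above can be modeled in isolation. This is a hedged sketch, not the real function: `sid_node`, `sid_update` and `sid_reset_then_role_change` are hypothetical names, the excerpt does not show what happens after the check so the toy simply copies the id, and a missing id is treated as a no-op.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SID_LEN 40 /* stands in for CLUSTER_NAMELEN */

/* Hypothetical, stripped-down node: only what the check needs. */
typedef struct sid_node {
    char shard_id[SID_LEN];
    const struct sid_node *primary; /* NULL when the node is seen as a primary */
} sid_node;

/* Returns false when the update is rejected because the node is still seen
 * as a replica and the proposed id differs from its primary's id. */
bool sid_update(sid_node *node, const char *shard_id) {
    if (shard_id == NULL) return false;
    if (node->primary != NULL && memcmp(node->primary->shard_id, shard_id, SID_LEN) != 0) {
        return false;
    }
    memcpy(node->shard_id, shard_id, SID_LEN);
    return true;
}

/* Walks the soft reset sequence as R0 might observe it. Returns 1 when the
 * new id is rejected before the role change and accepted after it. */
int sid_reset_then_role_change(void) {
    sid_node r0 = {.primary = NULL};
    sid_node r4 = {.primary = &r0};
    char new_id[SID_LEN];
    memset(r0.shard_id, 'a', SID_LEN);
    memset(r4.shard_id, 'a', SID_LEN);
    memset(new_id, 'b', SID_LEN);

    bool before = sid_update(&r4, new_id); /* R4 still looks like R0's replica */
    r4.primary = NULL;                     /* R0 learns that R4 is now a primary */
    bool after = sid_update(&r4, new_id);
    return (!before && after && r4.shard_id[0] == 'b') ? 1 : 0;
}
```

This matches the ordering concern raised here: as long as R0 still records R4 as its replica, the new shard id cannot be applied, so the role change has to be observed first.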

Contributor

@enjoy-binbin - I would like to understand in what scenario the solution in #2586 would not work. How does CLUSTER RESET SOFT cause a failover? Is there a way to guard the thing causing the failover using the SHARD_ID_UNINITIALIZED flag?

Marking the shard_id as uninitialized for a particular node until a direct ping is received from it would let us potentially ignore any type of update from that node.
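
The guard idea can be sketched as a tiny state machine. Everything here is hypothetical (`guard_peer` and the `guard_*` functions are illustration only, not the #2586 implementation), and the sketch shows only the gating itself; the author reports elsewhere in this thread that this approach did not resolve the soft reset test on that branch.

```c
#include <stdbool.h>

/* Hypothetical per-peer bookkeeping; not the real clusterNode flags. */
typedef struct {
    bool shard_id_uninitialized; /* true until a direct ping supplies the id */
    int applied_updates;         /* how many config updates were let through */
} guard_peer;

/* A peer is first treated as having an unknown shard id. */
void guard_peer_init(guard_peer *p) {
    p->shard_id_uninitialized = true;
    p->applied_updates = 0;
}

/* A direct ping carrying the shard id clears the flag. */
void guard_on_direct_ping_with_shard_id(guard_peer *p) {
    p->shard_id_uninitialized = false;
}

/* Returns true if the update was applied, false if it was skipped because
 * the peer's shard id is not trusted yet. */
bool guard_maybe_apply_config_update(guard_peer *p) {
    if (p->shard_id_uninitialized) return false;
    p->applied_updates++;
    return true;
}

/* Returns 1 when an update before the direct ping is skipped and an update
 * after it is applied exactly once. */
int guard_demo(void) {
    guard_peer p;
    guard_peer_init(&p);
    bool before = guard_maybe_apply_config_update(&p);
    guard_on_direct_ping_with_shard_id(&p);
    bool after = guard_maybe_apply_config_update(&p);
    return (!before && after && p.applied_updates == 1) ? 1 : 0;
}
```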

Member Author

You can run this test on your branch. I tried it once and it failed there, though I don't have the details right now.

# R0 is an empty shard, the slots are distributed evenly among R1/R2/R3.
proc my_slot_allocation2 {masters replicas} {
    R 1 cluster ADDSLOTSRANGE 0 5460
    R 2 cluster ADDSLOTSRANGE 5461 10922
    R 3 cluster ADDSLOTSRANGE 10923 16383
}

start_cluster 4 1 {tags {external:skip cluster} overrides {cluster-node-timeout 1000 cluster-migration-barrier 999}} {
    test "Empty shard will not be reconfigured after the cluster soft reset" {
        R 4 cluster reset soft

        # R0 will become a replica of R4.
    }
} my_slot_allocation2 cluster_allocate_replicas ;# start_cluster

Member Author

The git SHA in the logs below is from your branch.

R 0 logs:

### Starting server for test 
68135:M 09 Jan 2026 14:31:21.695 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo
68135:M 09 Jan 2026 14:31:21.695 * Valkey version=255.255.255, bits=64, commit=07c84f5e, modified=0, pid=68135, just started
68135:M 09 Jan 2026 14:31:21.695 * Configuration loaded
68135:M 09 Jan 2026 14:31:21.696 * monotonic clock: POSIX clock_gettime
68135:M 09 Jan 2026 14:31:21.696 # Failed to write PID file: Permission denied
                .+^+.                                                
            .+#########+.                                            
        .+########+########+.           Valkey 255.255.255 (07c84f5e/0) 64 bit
    .+########+'     '+########+.                                    
 .########+'     .+.     '+########.    Running in cluster mode
 |####+'     .+#######+.     '+####|    Port: 21115
 |###|   .+###############+.   |###|    PID: 68135                     
 |###|   |#####*'' ''*#####|   |###|                                 
 |###|   |####'  .-.  '####|   |###|                                 
 |###|   |###(  (@@@)  )###|   |###|          https://valkey.io      
 |###|   |####.  '-'  .####|   |###|                                 
 |###|   |#####*.   .*#####|   |###|                                 
 |###|   '+#####|   |#####+'   |###|                                 
 |####+.     +##|   |#+'     .+####|                                 
 '#######+   |##|        .+########'                                 
    '+###|   |##|    .+########+'                                    
        '|   |####+########+'                                        
             +#########+'                                            
                '+v+'                                                

68135:M 09 Jan 2026 14:31:21.697 # WARNING: The TCP backlog setting of 511 cannot be enforced because kern.ipc.somaxconn is set to the lower value of 128.
68135:M 09 Jan 2026 14:31:21.697 * No cluster configuration found, I'm 147548acc7ff529db0b822f84c0b4f2bf1c4a009
68135:M 09 Jan 2026 14:31:21.711 * Server initialized
68135:M 09 Jan 2026 14:31:21.711 * Ready to accept connections tcp
68135:M 09 Jan 2026 14:31:21.711 * Ready to accept connections unix
68135:M 09 Jan 2026 14:31:21.823 - Accepted 127.0.0.1:59019
68135:M 09 Jan 2026 14:31:21.823 - Client closed connection id=2 addr=127.0.0.1:59019 laddr=127.0.0.1:21115 fd=14 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=16890 argv-mem=0 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=33856 events=r cmd=ping user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=7 tot-net-out=7 tot-cmds=1
68135:M 09 Jan 2026 14:31:21.830 - Accepted 127.0.0.1:59020
68135:M 09 Jan 2026 14:31:21.831 * configEpoch set to 1 via CLUSTER SET-CONFIG-EPOCH
68135:M 09 Jan 2026 14:31:21.838 * Cluster meet 127.0.0.1:21114 (user request from 'id=3 addr=127.0.0.1:59020 laddr=127.0.0.1:21115 fd=14 name= user=default lib-name= lib-ver=').
68135:M 09 Jan 2026 14:31:21.838 * Cluster meet 127.0.0.1:21113 (user request from 'id=3 addr=127.0.0.1:59020 laddr=127.0.0.1:21115 fd=14 name= user=default lib-name= lib-ver=').
68135:M 09 Jan 2026 14:31:21.838 * Cluster meet 127.0.0.1:21112 (user request from 'id=3 addr=127.0.0.1:59020 laddr=127.0.0.1:21115 fd=14 name= user=default lib-name= lib-ver=').
68135:M 09 Jan 2026 14:31:21.838 * Cluster meet 127.0.0.1:21111 (user request from 'id=3 addr=127.0.0.1:59020 laddr=127.0.0.1:21115 fd=14 name= user=default lib-name= lib-ver=').
68135:M 09 Jan 2026 14:31:21.843 # Missing implement of connection type tls
68135:M 09 Jan 2026 14:31:21.936 - Accepting cluster node connection from 127.0.0.1:59025
68135:M 09 Jan 2026 14:31:21.936 * IP address for this node updated to 127.0.0.1
68135:M 09 Jan 2026 14:31:21.936 * Successfully completed handshake with 4cf8e6e4ee002bbb059f58cd7715df616e2322df ()
68135:M 09 Jan 2026 14:31:21.950 - Accepting cluster node connection from 127.0.0.1:59026
68135:M 09 Jan 2026 14:31:21.950 * Successfully completed handshake with a43e1c5ff1dd22da904cdab0e7ea99d3b8e07d3d ()
68135:M 09 Jan 2026 14:31:21.950 * Mismatch in topology information for sender node a43e1c5ff1dd22da904cdab0e7ea99d3b8e07d3d () in shard 933ced2c2666ea7bc70aad26df33b1dc7b41e62a
68135:M 09 Jan 2026 14:31:21.969 - Accepting cluster node connection from 127.0.0.1:59027
68135:M 09 Jan 2026 14:31:21.969 * Successfully completed handshake with c1f3aa72acdb4331c5d282ba6bfee27445dcd96a ()
68135:M 09 Jan 2026 14:31:21.969 * Mismatch in topology information for sender node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a () in shard 95343bcf8cb28e42821c3e311ac311f6a9a1a1d5
68135:M 09 Jan 2026 14:31:22.006 - Accepting cluster node connection from 127.0.0.1:59028
68135:M 09 Jan 2026 14:31:22.006 * Successfully completed handshake with 7872677129aedb6bcc9ff2ceb05b23df272c1c67 ()
68135:M 09 Jan 2026 14:31:22.006 * Mismatch in topology information for sender node 7872677129aedb6bcc9ff2ceb05b23df272c1c67 () in shard 5b45a73ed27120220e92fb78f2d21b6e6d3050f7
68135:M 09 Jan 2026 14:31:22.235 - Accepted 127.0.0.1:59041
68135:M 09 Jan 2026 14:31:22.236 * Node 4cf8e6e4ee002bbb059f58cd7715df616e2322df () is no longer primary of shard 502a2cf81c5905cc12979f271291e73e6cc2b0bf; removed all 0 slot(s) it used to own
68135:M 09 Jan 2026 14:31:22.236 * Node 4cf8e6e4ee002bbb059f58cd7715df616e2322df () is now part of shard adfeced14b0029ddb7708b168a20d24d33bec43b
68135:M 09 Jan 2026 14:31:22.236 * Node 4cf8e6e4ee002bbb059f58cd7715df616e2322df () is now a replica of node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 () in shard adfeced14b0029ddb7708b168a20d24d33bec43b
68135:M 09 Jan 2026 14:31:22.258 # DEBUG LOG: ========== I am primary 0 ==========
68135:M 09 Jan 2026 14:31:22.263 * Replica 127.0.0.1:21111 asks for synchronization
68135:M 09 Jan 2026 14:31:22.263 * Full resync requested by replica 127.0.0.1:21111
68135:M 09 Jan 2026 14:31:22.263 * Replication backlog created, my new replication IDs are 'a54e9798a181182f85e4df5fc9ed0d1503eedac1' and '0000000000000000000000000000000000000000'
68135:M 09 Jan 2026 14:31:22.263 * Starting BGSAVE for SYNC with target: replicas sockets using: normal sync
68135:M 09 Jan 2026 14:31:22.263 * Background RDB transfer started by pid 68172 to pipe through parent process
68172:C 09 Jan 2026 14:31:22.264 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
68135:M 09 Jan 2026 14:31:22.267 * Diskless rdb transfer, done reading from pipe, 1 replicas still up.
68135:M 09 Jan 2026 14:31:22.321 * Background RDB transfer terminated with success
68135:M 09 Jan 2026 14:31:22.321 * Streamed RDB transfer with replica 127.0.0.1:21111 succeeded (socket). Waiting for REPLCONF ACK from replica to enable streaming
68135:M 09 Jan 2026 14:31:22.321 * Synchronization with replica 127.0.0.1:21111 succeeded
68135:M 09 Jan 2026 14:31:23.750 * Cluster state changed: ok
### Starting test Empty shard will not be reconfigured after the cluster soft reset in tests/unit/cluster/replica-migration.tcl
68135:M 09 Jan 2026 14:31:31.876 - Client closed connection id=11 addr=127.0.0.1:59041 laddr=127.0.0.1:21115 fd=23 name= age=9 idle=0 flags=S capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=16890 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=1 omem=16920 tot-mem=35416 events=r cmd=replconf user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=599 tot-net-out=41 tot-cmds=15
68135:M 09 Jan 2026 14:31:31.876 * Connection with replica 127.0.0.1:21111 lost.
68135:M 09 Jan 2026 14:31:31.888 - Client closed connection id=3 addr=127.0.0.1:59020 laddr=127.0.0.1:21115 fd=14 name= age=10 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=16890 argv-mem=0 multi-mem=0 rbs=1024 rbp=624 obl=0 oll=0 omem=0 tot-mem=18496 events=r cmd=cluster|info user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=6028 tot-net-out=227007 tot-cmds=203
68135:M 09 Jan 2026 14:31:31.921 * Reconfiguring node 4cf8e6e4ee002bbb059f58cd7715df616e2322df () as primary for shard adfeced14b0029ddb7708b168a20d24d33bec43b
68135:M 09 Jan 2026 14:31:31.921 * Mismatch in topology information for sender node 4cf8e6e4ee002bbb059f58cd7715df616e2322df () in shard adfeced14b0029ddb7708b168a20d24d33bec43b
68135:M 09 Jan 2026 14:31:31.921 * Configuration change detected. Reconfiguring myself as a replica of node 4cf8e6e4ee002bbb059f58cd7715df616e2322df () in shard adfeced14b0029ddb7708b168a20d24d33bec43b
68135:S 09 Jan 2026 14:31:31.921 * Before turning into a replica, using my own primary parameters to synthesize a cached primary: I may be able to synchronize with the new primary with just a partial transfer.
68135:S 09 Jan 2026 14:31:31.921 * Connecting to PRIMARY 127.0.0.1:21111
68135:S 09 Jan 2026 14:31:31.921 * PRIMARY <-> REPLICA sync started
68135:S 09 Jan 2026 14:31:31.940 * Non blocking connect for SYNC fired the event.
68135:S 09 Jan 2026 14:31:31.940 * Primary replied to PING, replication can continue...
68135:S 09 Jan 2026 14:31:31.940 * (Non critical) Primary does not understand REPLCONF SET-CLUSTER-NODE-ID: -ERR Unknown node 147548acc7ff529db0b822f84c0b4f2bf1c4a009
68135:S 09 Jan 2026 14:31:31.940 * Trying a partial resynchronization (request a54e9798a181182f85e4df5fc9ed0d1503eedac1:15).
68135:S 09 Jan 2026 14:31:31.940 * Successful partial resynchronization with primary.
68135:S 09 Jan 2026 14:31:31.940 * Primary replication ID changed to 8ac13007fa8077f42ebd10ce787f9ff7cb0c2f2e
68135:S 09 Jan 2026 14:31:31.940 * PRIMARY <-> REPLICA sync: Primary accepted a Partial Resynchronization.
68135:signal-handler (1767940292) Received SIGTERM scheduling shutdown...
68135:S 09 Jan 2026 14:31:32.614 - Accepting cluster node connection from 127.0.0.1:59053
68135:S 09 Jan 2026 14:31:32.614 * User requested shutdown...
68135:S 09 Jan 2026 14:31:32.614 * Removing the pid file.
68135:S 09 Jan 2026 14:31:32.614 * Saving the cluster configuration file before exiting.
68135:S 09 Jan 2026 14:31:32.635 * Removing the unix socket file.
68135:S 09 Jan 2026 14:31:32.636 # Valkey is now ready to exit, bye bye...

R 4 logs:

### Starting server for test 
67965:M 09 Jan 2026 14:31:21.117 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo
67965:M 09 Jan 2026 14:31:21.117 * Valkey version=255.255.255, bits=64, commit=07c84f5e, modified=0, pid=67965, just started
67965:M 09 Jan 2026 14:31:21.117 * Configuration loaded
67965:M 09 Jan 2026 14:31:21.117 * monotonic clock: POSIX clock_gettime
67965:M 09 Jan 2026 14:31:21.118 # Failed to write PID file: Permission denied
                .+^+.                                                
            .+#########+.                                            
        .+########+########+.           Valkey 255.255.255 (07c84f5e/0) 64 bit
    .+########+'     '+########+.                                    
 .########+'     .+.     '+########.    Running in cluster mode
 |####+'     .+#######+.     '+####|    Port: 21111
 |###|   .+###############+.   |###|    PID: 67965                     
 |###|   |#####*'' ''*#####|   |###|                                 
 |###|   |####'  .-.  '####|   |###|                                 
 |###|   |###(  (@@@)  )###|   |###|          https://valkey.io      
 |###|   |####.  '-'  .####|   |###|                                 
 |###|   |#####*.   .*#####|   |###|                                 
 |###|   '+#####|   |#####+'   |###|                                 
 |####+.     +##|   |#+'     .+####|                                 
 '#######+   |##|        .+########'                                 
    '+###|   |##|    .+########+'                                    
        '|   |####+########+'                                        
             +#########+'                                            
                '+v+'                                                

67965:M 09 Jan 2026 14:31:21.118 # WARNING: The TCP backlog setting of 511 cannot be enforced because kern.ipc.somaxconn is set to the lower value of 128.
67965:M 09 Jan 2026 14:31:21.118 * No cluster configuration found, I'm 4cf8e6e4ee002bbb059f58cd7715df616e2322df
67965:M 09 Jan 2026 14:31:21.127 * Server initialized
67965:M 09 Jan 2026 14:31:21.127 * Ready to accept connections tcp
67965:M 09 Jan 2026 14:31:21.127 * Ready to accept connections unix
67965:M 09 Jan 2026 14:31:21.213 - Accepted 127.0.0.1:59011
67965:M 09 Jan 2026 14:31:21.214 - Client closed connection id=2 addr=127.0.0.1:59011 laddr=127.0.0.1:21111 fd=14 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=16890 argv-mem=0 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=33856 events=r cmd=ping user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=7 tot-net-out=7 tot-cmds=1
67965:M 09 Jan 2026 14:31:21.220 - Accepted 127.0.0.1:59012
67965:M 09 Jan 2026 14:31:21.836 * configEpoch set to 5 via CLUSTER SET-CONFIG-EPOCH
67965:M 09 Jan 2026 14:31:21.915 - Accepting cluster node connection from 127.0.0.1:59024
67965:M 09 Jan 2026 14:31:21.915 * IP address for this node updated to 127.0.0.1
67965:M 09 Jan 2026 14:31:22.142 * Mismatch in topology information for sender node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a () in shard 851fa9cd0d51c55a6354e49a14f03e65f2001364
67965:M 09 Jan 2026 14:31:22.142 * Mismatch in topology information for sender node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a () in shard a64c0eb757942b1524adcf67a8366ce0165cc3cb
67965:M 09 Jan 2026 14:31:22.143 * Mismatch in topology information for sender node 7872677129aedb6bcc9ff2ceb05b23df272c1c67 () in shard c39496129f886c8d26a09d3592055a0b84710864
67965:M 09 Jan 2026 14:31:22.143 * Mismatch in topology information for sender node 7872677129aedb6bcc9ff2ceb05b23df272c1c67 () in shard 20e3596ffbe2293fe219127b9bbaedb706a6fc17
67965:M 09 Jan 2026 14:31:22.162 * Mismatch in topology information for sender node a43e1c5ff1dd22da904cdab0e7ea99d3b8e07d3d () in shard e8f0a8d091f60e24770322126f56b3cf7d155b59
67965:M 09 Jan 2026 14:31:22.162 * Mismatch in topology information for sender node a43e1c5ff1dd22da904cdab0e7ea99d3b8e07d3d () in shard 3deb306961e98805f228273e54e0f3c6cd7bfee3
67965:M 09 Jan 2026 14:31:22.162 - Accepting cluster node connection from 127.0.0.1:59034
67965:M 09 Jan 2026 14:31:22.182 - Accepting cluster node connection from 127.0.0.1:59037
67965:M 09 Jan 2026 14:31:22.207 - Accepting cluster node connection from 127.0.0.1:59038
67965:M 09 Jan 2026 14:31:22.234 # Missing implement of connection type tls
67965:S 09 Jan 2026 14:31:22.235 * Connecting to PRIMARY 127.0.0.1:21115
67965:S 09 Jan 2026 14:31:22.235 * PRIMARY <-> REPLICA sync started
67965:S 09 Jan 2026 14:31:22.235 * Cluster state changed: ok
67965:S 09 Jan 2026 14:31:22.236 * Non blocking connect for SYNC fired the event.
67965:S 09 Jan 2026 14:31:22.236 * Primary replied to PING, replication can continue...
67965:S 09 Jan 2026 14:31:22.258 * Partial resynchronization not possible (no cached primary)
67965:S 09 Jan 2026 14:31:22.263 * Full resync from primary: a54e9798a181182f85e4df5fc9ed0d1503eedac1:0
67965:S 09 Jan 2026 14:31:22.267 * Replica main thread creating Bio thread to save RDB to disk
67965:S 09 Jan 2026 14:31:22.267 * Replica bio thread: PRIMARY <-> REPLICA sync: receiving streamed RDB from primary with EOF to disk
67965:S 09 Jan 2026 14:31:22.268 * Replica bio thread: Done downloading RDB
67965:S 09 Jan 2026 14:31:22.268 # DEBUG LOG: ========== I am replica 4 ==========
67965:S 09 Jan 2026 14:31:23.144 * Replica main thread detected RDB download completion in Bio thread
67965:S 09 Jan 2026 14:31:23.144 * Loading the RDB and finalizing primary-replica sync...
67965:S 09 Jan 2026 14:31:23.150 * PRIMARY <-> REPLICA sync: Flushing old data
67965:S 09 Jan 2026 14:31:23.150 * PRIMARY <-> REPLICA sync: Loading DB in memory
67965:S 09 Jan 2026 14:31:23.150 * Loading RDB produced by Valkey version 255.255.255
67965:S 09 Jan 2026 14:31:23.150 * RDB age 1 seconds
67965:S 09 Jan 2026 14:31:23.150 * RDB memory usage when created 2.94 Mb
67965:S 09 Jan 2026 14:31:23.150 * Done loading RDB, keys loaded: 0, keys expired: 0.
67965:S 09 Jan 2026 14:31:23.150 * PRIMARY <-> REPLICA sync: Finished with success
### Starting test Empty shard will not be reconfigured after the cluster soft reset in tests/unit/cluster/replica-migration.tcl
67965:S 09 Jan 2026 14:31:31.876 * Cluster reset (user request from 'id=3 addr=127.0.0.1:59012 laddr=127.0.0.1:21111 fd=14 name= user=default lib-name= lib-ver=').
67965:S 09 Jan 2026 14:31:31.876 * Reconfiguring node 4cf8e6e4ee002bbb059f58cd7715df616e2322df () as primary for shard adfeced14b0029ddb7708b168a20d24d33bec43b
67965:M 09 Jan 2026 14:31:31.876 * Connection with primary lost.
67965:M 09 Jan 2026 14:31:31.876 * Caching the disconnected primary state.
67965:M 09 Jan 2026 14:31:31.876 * Discarding previously cached primary state.
67965:M 09 Jan 2026 14:31:31.876 * Setting secondary replication ID to a54e9798a181182f85e4df5fc9ed0d1503eedac1, valid up to offset: 15. New replication ID is 8ac13007fa8077f42ebd10ce787f9ff7cb0c2f2e
67965:M 09 Jan 2026 14:31:31.877 # Cluster state changed: fail
67965:M 09 Jan 2026 14:31:31.877 # Cluster is currently down: I am part of a minority partition.
### Starting test Check for memory leaks (pid 68135) in tests/unit/cluster/replica-migration.tcl
67965:M 09 Jan 2026 14:31:31.907 - Accepting cluster node connection from 127.0.0.1:59048
67965:M 09 Jan 2026 14:31:31.921 - Accepting cluster node connection from 127.0.0.1:59049
67965:M 09 Jan 2026 14:31:31.921 - Accepted 127.0.0.1:59050
67965:M 09 Jan 2026 14:31:31.940 * Replica 127.0.0.1:21115 asks for synchronization
67965:M 09 Jan 2026 14:31:31.940 * Partial resynchronization request from 127.0.0.1:21115 accepted. Sending 0 bytes of backlog starting from offset 15.
67965:M 09 Jan 2026 14:31:31.959 - Accepting cluster node connection from 127.0.0.1:59051
67965:M 09 Jan 2026 14:31:31.962 - Accepting cluster node connection from 127.0.0.1:59052
67965:M 09 Jan 2026 14:31:32.636 - Client closed connection id=7 addr=127.0.0.1:59050 laddr=127.0.0.1:21111 fd=17 name= age=1 idle=1 flags=S capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=16890 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=1 omem=16920 tot-mem=35416 events=r cmd=psync user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=333 tot-net-out=82 tot-cmds=6
67965:M 09 Jan 2026 14:31:32.636 * Connection with replica 127.0.0.1:21115 lost.
### Starting test Check for memory leaks (pid 68110) in tests/unit/cluster/replica-migration.tcl
67965:M 09 Jan 2026 14:31:32.931 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:32.933 * Mismatch in topology information for sender node a43e1c5ff1dd22da904cdab0e7ea99d3b8e07d3d () in shard 5223ecc8e829765d5214f99ba5453da981d4c391
67965:M 09 Jan 2026 14:31:32.933 * Mismatch in topology information for sender node a43e1c5ff1dd22da904cdab0e7ea99d3b8e07d3d () in shard 3deb306961e98805f228273e54e0f3c6cd7bfee3
67965:M 09 Jan 2026 14:31:32.933 # Cluster is currently down: At least one hash slot is not served by any available node. Please check the 'cluster-require-full-coverage' configuration.
67965:M 09 Jan 2026 14:31:32.944 * Mismatch in topology information for sender node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a () in shard e736912dd440bb6b0dc0c0f0162c7657b261bb2e
67965:M 09 Jan 2026 14:31:32.944 * Mismatch in topology information for sender node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a () in shard a64c0eb757942b1524adcf67a8366ce0165cc3cb
67965:M 09 Jan 2026 14:31:33.032 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
### Starting test Check for memory leaks (pid 68084) in tests/unit/cluster/replica-migration.tcl
67965:M 09 Jan 2026 14:31:33.133 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.133 - Connection with Node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a at 127.0.0.1:31114 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.234 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.234 - Connection with Node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a at 127.0.0.1:31114 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.335 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.335 - Connection with Node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a at 127.0.0.1:31114 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.343 * Mismatch in topology information for sender node 7872677129aedb6bcc9ff2ceb05b23df272c1c67 () in shard 7cf2b27ddcb5ef258caf9b4a1f109f1824d65311
67965:M 09 Jan 2026 14:31:33.436 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.436 - Connection with Node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a at 127.0.0.1:31114 failed: Connection refused
### Starting test Check for memory leaks (pid 68056) in tests/unit/cluster/replica-migration.tcl
67965:M 09 Jan 2026 14:31:33.537 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.537 - Connection with Node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a at 127.0.0.1:31114 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.537 - Connection with Node 7872677129aedb6bcc9ff2ceb05b23df272c1c67 at 127.0.0.1:31113 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.638 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.638 - Connection with Node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a at 127.0.0.1:31114 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.638 - Connection with Node 7872677129aedb6bcc9ff2ceb05b23df272c1c67 at 127.0.0.1:31113 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.738 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.738 - Connection with Node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a at 127.0.0.1:31114 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.738 - Connection with Node 7872677129aedb6bcc9ff2ceb05b23df272c1c67 at 127.0.0.1:31113 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.839 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.839 - Connection with Node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a at 127.0.0.1:31114 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.839 - Connection with Node 7872677129aedb6bcc9ff2ceb05b23df272c1c67 at 127.0.0.1:31113 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.884 - Client closed connection id=3 addr=127.0.0.1:59012 laddr=127.0.0.1:21111 fd=14 name= age=12 idle=2 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=16890 argv-mem=0 multi-mem=0 rbs=1024 rbp=628 obl=0 oll=0 omem=0 tot-mem=18496 events=r cmd=cluster|reset user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=400 tot-net-out=4216 tot-cmds=10
67965:M 09 Jan 2026 14:31:34.165 * NODE 147548acc7ff529db0b822f84c0b4f2bf1c4a009 () possibly failing.
67965:M 09 Jan 2026 14:31:34.165 * NODE c1f3aa72acdb4331c5d282ba6bfee27445dcd96a () possibly failing.
67965:M 09 Jan 2026 14:31:34.166 * Cluster state changed: ok
67965:M 09 Jan 2026 14:31:34.166 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:34.166 - Connection with Node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a at 127.0.0.1:31114 failed: Connection refused
67965:M 09 Jan 2026 14:31:34.166 - Connection with Node a43e1c5ff1dd22da904cdab0e7ea99d3b8e07d3d at 127.0.0.1:31112 failed: Connection refused
67965:M 09 Jan 2026 14:31:34.166 - Connection with Node 7872677129aedb6bcc9ff2ceb05b23df272c1c67 at 127.0.0.1:31113 failed: Connection refused
67965:signal-handler (1767940294) Received SIGTERM scheduling shutdown...
67965:M 09 Jan 2026 14:31:34.266 * User requested shutdown...
67965:M 09 Jan 2026 14:31:34.266 * Removing the pid file.
67965:M 09 Jan 2026 14:31:34.266 * Saving the cluster configuration file before exiting.
67965:M 09 Jan 2026 14:31:34.280 * Removing the unix socket file.
67965:M 09 Jan 2026 14:31:34.280 # Valkey is now ready to exit, bye bye...

Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin enjoy-binbin moved this to Needs Review in Valkey 9.1 Feb 9, 2026
enjoy-binbin added a commit to enjoy-binbin/valkey that referenced this pull request Feb 12, 2026
When a cluster reset is performed on a replica node, a new shard ID is generated
because the node is about to become an empty primary node, see valkey-io#2283.

However, the log added in valkey-io#2510 caused some confusion. In clusterSetNodeAsPrimary
we print:
```
serverLog(LL_NOTICE, "Reconfiguring node %.40s (%s) as primary for shard %.40s", n->name, humanNodename(n), n->shard_id);
```

In clusterReset, we first call clusterSetNodeAsPrimary and then generate a new
shard ID, so the log line above is printed with the old (wrong) shard ID.

For example, when a replica node performs a cluster reset, we print:
```
xxx * Cluster reset (user request from 'xxx').
xxx * Reconfiguring node af76a3e0ffcd77bd14fa47ce4d07ab2bdc78702f (xxx) as primary for shard ea528667634af8beed83adac2b9af8360769a1b4
```

But the node's shard ID is actually:
```
xxx> cluster myshardid
"52ede26d1554dd203161ba09011af14574b2cc84"
```

Now we print a log line after the new shard ID is generated, and we also move the
call to clusterSetNodeAsPrimary after the shard ID generation, so that it logs the
right one. After this PR:
```
xxx * Cluster reset (user request from 'xxx').
xxx * Moving myself to a new shard bd31870ce73f5977084e6a46e337a4a1ad38fc66.
xxx * Reconfiguring node 1d54b904efd30cd9d7d1abbfd63c8fafbb62e1c8 (xxx) as primary for shard bd31870ce73f5977084e6a46e337a4a1ad38fc66
```
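
To make the ordering issue concrete, here is a minimal sketch. It does not use the real Valkey code: `toy_node`, `toy_set_as_primary`, `toy_assign_shard_id` and `toy_reset` are hypothetical stand-ins, and the "log" is modeled as a field that records which shard ID would have been printed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a cluster node; not the real clusterNode. */
typedef struct {
    char shard_id[41];
    char logged_shard_id[41]; /* shard ID the "Reconfiguring ..." line would report */
    int is_primary;
} toy_node;

/* Promote the node and remember the shard ID visible at "log" time. */
void toy_set_as_primary(toy_node *n) {
    assert(n != NULL);
    n->is_primary = 1;
    snprintf(n->logged_shard_id, sizeof(n->logged_shard_id), "%s", n->shard_id);
}

/* Move the node to a new shard. */
void toy_assign_shard_id(toy_node *n, const char *new_id) {
    assert(n != NULL && new_id != NULL);
    snprintf(n->shard_id, sizeof(n->shard_id), "%s", new_id);
}

/* Runs the reset in either order. Returns 1 when the logged shard ID
 * matches the node's final shard ID, 0 when the log is stale. */
int toy_reset(toy_node *n, const char *new_id, int shard_id_first) {
    if (shard_id_first) {
        toy_assign_shard_id(n, new_id); /* order after the fix */
        toy_set_as_primary(n);
    } else {
        toy_set_as_primary(n); /* order before the fix */
        toy_assign_shard_id(n, new_id);
    }
    return strcmp(n->logged_shard_id, n->shard_id) == 0;
}
```

Calling `toy_reset` with `shard_id_first` set to 0 leaves the logged ID pointing at the old shard, which is the mismatch shown in the example output above; with `shard_id_first` set to 1 the two agree.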

This is part of valkey-io#2989, but I guess we won't merge the extension fix any
time soon, so I am extracting it separately as a log fix (or improvement).

Signed-off-by: Binbin <binloveplay1314@qq.com>
@madolson madolson requested a review from PingXie February 23, 2026 17:09
enjoy-binbin added a commit that referenced this pull request Feb 24, 2026
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Comment thread src/cluster_legacy.c Outdated
Signed-off-by: Binbin <binloveplay1314@qq.com>
@PingXie

PingXie commented Feb 24, 2026

Member

@enjoy-binbin, I don't think we should take this change. Please see my concerns at #2989 (comment).

@enjoy-binbin

Member Author

@PingXie Yes, I understand the concerns. I am just trying to merge the unstable code. Since we are all here, I looked at it again.

> I wonder if we should "promote" the CLUSTER_LOCAL_NODE_SHARD_ID_UNINITIALIZED flag to a "global" flag so that we could reuse the #2586 solution (skipping clusterUpdateSlotsConfigWith until the shard id stabilizes).

As I mentioned in the comments, CLUSTER_LOCAL_NODE_SHARD_ID_UNINITIALIZED does not work here and can't fix the issue.

> I think another option could be processing just the shard_id PING extension. BTW, still do it at the beginning of clusterProcessPacket but I vaguely remember @hpatro had a concern about the cyclic replicaOf detection?

Sadly, this does not work either. I had forgotten the details from the very beginning, but now I remember them. updateShardId relies on the node flags to update the shard_id, see #573. This means we need to first update the node flags and then process the shard ID extension; otherwise the check added in #573 prevents us from updating the shard ID here, unless we find a way to handle these attributes in order.

static void updateShardId(clusterNode *node, const char *shard_id) {
    /* Ensure replica shard IDs match their primary's to maintain cluster consistency.
     *
     * Shard ID updates must prioritize the primary, then propagate to replicas.
     * This is critical due to the eventual consistency of shard IDs during cluster
     * expansion. New replicas might replicate from a primary before fully
     * synchronizing shard IDs with the rest of the cluster.
     *
     * Without this enforcement, a temporary inconsistency can arise where a
     * replica's shard ID diverges from its primary's. This inconsistency is
     * persisted in the primary's nodes.conf file. While this divergence will
     * eventually resolve, if the primary crashes beforehand, it will enter a
     * crash-restart loop due to the mismatch in its nodes.conf. */
    if (shard_id && nodeIsReplica(node) &&
        memcmp(clusterNodeGetPrimary(node)->shard_id, shard_id, CLUSTER_NAMELEN) != 0) {
        serverLog(
            LL_NOTICE,
            "Shard id %.40s update request for node id %.40s diverges from existing primary shard id %.40s, rejecting!",
            shard_id, node->name, clusterNodeGetPrimary(node)->shard_id);
        return;
    }
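
To illustrate why the order matters, here is a minimal sketch of the guard quoted above. It is not the real implementation: `toy_node`, `toy_update_shard_id` and `toy_mark_as_primary` are hypothetical names, and "replica" is modeled simply as having a non-NULL `primary` pointer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_ID_LEN 40

/* Hypothetical view of a peer node as seen by the receiver. */
typedef struct toy_node {
    char shard_id[TOY_ID_LEN + 1];
    struct toy_node *primary; /* NULL when the node is viewed as a primary */
} toy_node;

/* Models only the guard quoted above: a node still viewed as a replica may
 * not take a shard ID that differs from its primary's.
 * Returns 1 if the update was applied, 0 if it was rejected. */
int toy_update_shard_id(toy_node *node, const char *shard_id) {
    assert(node != NULL && shard_id != NULL);
    if (node->primary != NULL &&
        strncmp(node->primary->shard_id, shard_id, TOY_ID_LEN) != 0) {
        return 0; /* diverges from the primary's shard ID, rejected */
    }
    snprintf(node->shard_id, sizeof(node->shard_id), "%s", shard_id);
    return 1;
}

/* Hypothetical: apply the role change carried by the sender's flags. */
void toy_mark_as_primary(toy_node *node) {
    assert(node != NULL);
    node->primary = NULL;
}
```

In this model, a receiver that still views the sender as a replica rejects the sender's new shard ID; only after `toy_mark_as_primary` has been applied does the same update go through. That is the dependency on flag processing described above.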

Do you have other ideas? It's getting really tricky. @hpatro feel free to jump in.

Signed-off-by: Binbin <binloveplay1314@qq.com>
hpatro pushed a commit to hpatro/valkey that referenced this pull request Mar 5, 2026
…3192)

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin enjoy-binbin changed the title from "Try handling the extension before calling clusterUpdateSlotsConfigWith" to "Fix empty shard reconfiguration after CLUSTER RESET SOFT" Apr 3, 2026
@enjoy-binbin enjoy-binbin requested a review from zuiderkwast April 3, 2026 10:41
@zuiderkwast

Contributor

@enjoy-binbin I didn't follow the discussion. Did you change the implementation after Ping's concern (#2989 (comment), #2989 (comment))?

@enjoy-binbin

Member Author

Sorry, this is a tricky issue and the fix is not perfect. I think we need more people to weigh in, so let me summarize the approaches tried so far:

  1. Move the call to clusterProcessPingExtensions to the very beginning. Ping raised concerns about this approach in #2989 (comment).
  2. Use CLUSTERMSG_EXT_TYPE_FORGOTTEN_NODE, the solution mentioned in #2586 (Fix two primaries scenario due to unknown shard_id). However, this approach fails to resolve the issue; see #2989 (comment).
  3. Introduce a new clusterProcessShardIdExtension and move only the shard_id processing to the very beginning. Moving it to the front also appears insufficient to resolve the issue (see the updateShardId function); see #2989 (comment).

This PR adopts a compromise: it introduces clusterProcessShardIdExtension, but instead of moving it to the absolute beginning, it simply invokes it before clusterUpdateSlotsConfigWith. This is not a perfect solution either, as the shard_id may vary during the course of packet processing.
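
As a rough illustration of why the shard ID has to be refreshed before clusterUpdateSlotsConfigWith runs, here is a simplified model of the empty-shard check from the PR description. It is a sketch under assumptions: `toy_view` and `toy_should_follow_sender` are hypothetical names, the `new_primary` and `replicaof` terms of the real condition are omitted, and the sender is assumed to advertise a higher config epoch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, reduced view of a node: only the fields the check needs. */
typedef struct {
    const char *shard_id;
    int numslots;
    unsigned long long config_epoch;
} toy_view;

/* Simplified empty-shard check: both nodes own no slots, they appear to be
 * in the same shard, and the sender has the higher config epoch.
 * Returns 1 when myself would be reconfigured to follow the sender. */
int toy_should_follow_sender(const toy_view *myself, const toy_view *sender) {
    assert(myself != NULL && sender != NULL);
    int same_shard = strcmp(myself->shard_id, sender->shard_id) == 0;
    return sender->numslots == 0 && myself->numslots == 0 &&
           myself->config_epoch < sender->config_epoch && same_shard;
}
```

With a stale view in which the reset node still carries the old shard ID, the check fires and the empty primary would follow the sender. Once the view holds the sender's new shard ID, `same_shard` is false and the check no longer fires.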

@madolson madolson moved this to Needs Review in Valkey 10 May 18, 2026
@madolson madolson removed this from Valkey 9.1 May 18, 2026
