Fix empty shard reconfiguration after CLUSTER RESET SOFT #2989
enjoy-binbin wants to merge 8 commits into
Signed-off-by: Binbin <binloveplay1314@qq.com>
I'm not sure if we can do this; see also #2586 for more details. It's affecting our control panel.

In this case, with an empty shard consisting of a primary and a replica, when the replica does a CLUSTER RESET SOFT, the replica becomes a new primary and the old primary becomes its replica, which is wrong.

The reason is that clusterUpdateSlotsConfigWith contains logic to handle failover within empty shards. But we don't have the right shard_id at that point, so the CLUSTER RESET SOFT case is treated as a FAILOVER case, which causes the trouble.
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## unstable #2989 +/- ##
============================================
+ Coverage 76.53% 76.56% +0.02%
============================================
Files 157 157
Lines 79025 79039 +14
============================================
+ Hits 60481 60514 +33
+ Misses 18544 18525 -19
/* We try to process extensions before the clusterUpdateSlotsConfigWith,
 * because it relies on extensions such as shard_id. */
clusterProcessPingExtensions(hdr, link);
I think we discussed this solution in #2586 and concluded that the behavior is hard to reason about. Also, there is code consuming shard_id before this line in this function, so it is not clear to me how to justify the change. Lastly, I am generally against calling a state-mutating method like clusterProcessPingExtensions in the middle of a workflow. For instance, this method can delete forgotten nodes reported in CLUSTERMSG_EXT_TYPE_FORGOTTEN_NODE.
I think another option could be processing just the shard_id PING extension. BTW, still do it at the beginning of clusterProcessPacket but I vaguely remember @hpatro had a concern about the cyclic replicaOf detection?
I wonder if we should "promote" the CLUSTER_LOCAL_NODE_SHARD_ID_UNINITIALIZED flag to a "global" flag so that we could reuse the #2586 solution (skipping clusterUpdateSlotsConfigWith until the shard id stabilizes).
Cc @deepakrn
CLUSTER_LOCAL_NODE_SHARD_ID_UNINITIALIZED can't fix the reset soft case, if I remember correctly. I did try it on that branch, and the old primary node still became a replica, although I did not take a deep look to figure out why.
> BTW, still do it at the beginning of clusterProcessPacket but I vaguely remember @hpatro had a concern about the cyclic replicaOf detection?

Yes, I remember some of the problems here, and as I recall this cannot fix the reset soft case either. We may need to wait for the role change before updating the shard_id. The new shard_id was rejected here:
static void updateShardId(clusterNode *node, const char *shard_id) {
    /* Ensure replica shard IDs match their primary's to maintain cluster consistency.
     *
     * Shard ID updates must prioritize the primary, then propagate to replicas.
     * This is critical due to the eventual consistency of shard IDs during cluster
     * expansion. New replicas might replicate from a primary before fully
     * synchronizing shard IDs with the rest of the cluster.
     *
     * Without this enforcement, a temporary inconsistency can arise where a
     * replica's shard ID diverges from its primary's. This inconsistency is
     * persisted in the primary's nodes.conf file. While this divergence will
     * eventually resolve, if the primary crashes beforehand, it will enter a
     * crash-restart loop due to the mismatch in its nodes.conf. */
    if (shard_id && nodeIsReplica(node) &&
        memcmp(clusterNodeGetPrimary(node)->shard_id, shard_id, CLUSTER_NAMELEN) != 0) {
        serverLog(
            LL_NOTICE,
            "Shard id %.40s update request for node id %.40s diverges from existing primary shard id %.40s, rejecting!",
            shard_id, node->name, clusterNodeGetPrimary(node)->shard_id);
        return;
    }
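To illustrate why the order of "role change" and "shard_id update" matters for the guard quoted above, here is a minimal toy model. It is only a sketch: `toy_node`, `toy_update_shard_id`, and `toy_mark_as_primary` are hypothetical names invented for this example and are not part of Valkey. The model assumes the receiver still views the sender as a replica when the new shard_id arrives.

```c
#include <string.h>

#define TOY_NAMELEN 40 /* stands in for CLUSTER_NAMELEN (assumption for this toy) */

/* Toy node record; NOT Valkey's clusterNode. */
typedef struct toy_node {
    char shard_id[TOY_NAMELEN + 1];
    struct toy_node *primary; /* NULL when the node is viewed as a primary */
} toy_node;

/* Returns 1 when the new shard id is applied, 0 when it is rejected. */
int toy_update_shard_id(toy_node *n, const char *shard_id) {
    /* Same shape as the quoted guard: a replica may not diverge from its
     * primary. strncmp is used instead of memcmp so short test ids are safe. */
    if (n->primary && strncmp(n->primary->shard_id, shard_id, TOY_NAMELEN) != 0) return 0;
    strncpy(n->shard_id, shard_id, TOY_NAMELEN);
    n->shard_id[TOY_NAMELEN] = '\0';
    return 1;
}

/* The receiver learns that the node is now a primary (the role change). */
void toy_mark_as_primary(toy_node *n) {
    n->primary = NULL;
}
```

In this model, an update that arrives while the node is still recorded as a replica is rejected, and the same update succeeds once the role change has been applied first.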
> I think another option could be processing just the shard_id PING extension
I can give it a try, if we ultimately decide to.
@enjoy-binbin - I would like to understand in what scenario the solution in #2586 would not work. How does CLUSTER RESET SOFT cause a failover? Is there a way to guard whatever causes the failover using the SHARD_ID_UNINITIALIZED flag?
Marking the shard_id of a particular node as uninitialized until we receive a direct ping from it would let us potentially ignore any type of update from that node.
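The guard idea in this comment can be sketched as a per-peer flag that gates configuration updates. This is a hypothetical model, not Valkey code: `TOY_FLAG_SHARD_ID_UNINITIALIZED`, `toy_peer`, and the three helpers are made-up names, and elsewhere in this thread the approach is reported as not fixing the reset soft case.

```c
#include <stdbool.h>

#define TOY_FLAG_SHARD_ID_UNINITIALIZED (1 << 0) /* hypothetical flag bit */

/* Toy per-peer state; NOT Valkey's clusterNode. */
typedef struct {
    int flags;
} toy_peer;

/* Call when the peer's shard_id can no longer be trusted. */
void toy_mark_shard_id_stale(toy_peer *p) {
    p->flags |= TOY_FLAG_SHARD_ID_UNINITIALIZED;
}

/* Call when a direct ping from the peer carried its shard_id. */
void toy_on_direct_shard_id(toy_peer *p) {
    p->flags &= ~TOY_FLAG_SHARD_ID_UNINITIALIZED;
}

/* Gate for the slot/role reconfiguration path: skip updates from a peer
 * whose shard_id has not stabilized yet. */
bool toy_may_apply_config_update(const toy_peer *p) {
    return !(p->flags & TOY_FLAG_SHARD_ID_UNINITIALIZED);
}
```

The open question in the discussion is when the flag would be set for a peer that has just been reset, since the receiver has no signal that the old shard_id is stale until a new one arrives.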
You can run this test on your branch. I tried it once and it failed on your branch; I don't have the details right now, though.
# R0 is an empty shard, the slots are distributed evenly among R1/R2/R3.
proc my_slot_allocation2 {masters replicas} {
    R 1 cluster ADDSLOTSRANGE 0 5460
    R 2 cluster ADDSLOTSRANGE 5461 10922
    R 3 cluster ADDSLOTSRANGE 10923 16383
}

start_cluster 4 1 {tags {external:skip cluster} overrides {cluster-node-timeout 1000 cluster-migration-barrier 999}} {
    test "Empty shard will not be reconfigured after the cluster soft reset" {
        R 4 cluster reset soft

        # R0 will become a replica of R4.
    }
} my_slot_allocation2 cluster_allocate_replicas ;# start_cluster
The git SHA is from your branch.
R 0 logs:
### Starting server for test
68135:M 09 Jan 2026 14:31:21.695 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo
68135:M 09 Jan 2026 14:31:21.695 * Valkey version=255.255.255, bits=64, commit=07c84f5e, modified=0, pid=68135, just started
68135:M 09 Jan 2026 14:31:21.695 * Configuration loaded
68135:M 09 Jan 2026 14:31:21.696 * monotonic clock: POSIX clock_gettime
68135:M 09 Jan 2026 14:31:21.696 # Failed to write PID file: Permission denied
.+^+.
.+#########+.
.+########+########+. Valkey 255.255.255 (07c84f5e/0) 64 bit
.+########+' '+########+.
.########+' .+. '+########. Running in cluster mode
|####+' .+#######+. '+####| Port: 21115
|###| .+###############+. |###| PID: 68135
|###| |#####*'' ''*#####| |###|
|###| |####' .-. '####| |###|
|###| |###( (@@@) )###| |###| https://valkey.io
|###| |####. '-' .####| |###|
|###| |#####*. .*#####| |###|
|###| '+#####| |#####+' |###|
|####+. +##| |#+' .+####|
'#######+ |##| .+########'
'+###| |##| .+########+'
'| |####+########+'
+#########+'
'+v+'
68135:M 09 Jan 2026 14:31:21.697 # WARNING: The TCP backlog setting of 511 cannot be enforced because kern.ipc.somaxconn is set to the lower value of 128.
68135:M 09 Jan 2026 14:31:21.697 * No cluster configuration found, I'm 147548acc7ff529db0b822f84c0b4f2bf1c4a009
68135:M 09 Jan 2026 14:31:21.711 * Server initialized
68135:M 09 Jan 2026 14:31:21.711 * Ready to accept connections tcp
68135:M 09 Jan 2026 14:31:21.711 * Ready to accept connections unix
68135:M 09 Jan 2026 14:31:21.823 - Accepted 127.0.0.1:59019
68135:M 09 Jan 2026 14:31:21.823 - Client closed connection id=2 addr=127.0.0.1:59019 laddr=127.0.0.1:21115 fd=14 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=16890 argv-mem=0 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=33856 events=r cmd=ping user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=7 tot-net-out=7 tot-cmds=1
68135:M 09 Jan 2026 14:31:21.830 - Accepted 127.0.0.1:59020
68135:M 09 Jan 2026 14:31:21.831 * configEpoch set to 1 via CLUSTER SET-CONFIG-EPOCH
68135:M 09 Jan 2026 14:31:21.838 * Cluster meet 127.0.0.1:21114 (user request from 'id=3 addr=127.0.0.1:59020 laddr=127.0.0.1:21115 fd=14 name= user=default lib-name= lib-ver=').
68135:M 09 Jan 2026 14:31:21.838 * Cluster meet 127.0.0.1:21113 (user request from 'id=3 addr=127.0.0.1:59020 laddr=127.0.0.1:21115 fd=14 name= user=default lib-name= lib-ver=').
68135:M 09 Jan 2026 14:31:21.838 * Cluster meet 127.0.0.1:21112 (user request from 'id=3 addr=127.0.0.1:59020 laddr=127.0.0.1:21115 fd=14 name= user=default lib-name= lib-ver=').
68135:M 09 Jan 2026 14:31:21.838 * Cluster meet 127.0.0.1:21111 (user request from 'id=3 addr=127.0.0.1:59020 laddr=127.0.0.1:21115 fd=14 name= user=default lib-name= lib-ver=').
68135:M 09 Jan 2026 14:31:21.843 # Missing implement of connection type tls
68135:M 09 Jan 2026 14:31:21.936 - Accepting cluster node connection from 127.0.0.1:59025
68135:M 09 Jan 2026 14:31:21.936 * IP address for this node updated to 127.0.0.1
68135:M 09 Jan 2026 14:31:21.936 * Successfully completed handshake with 4cf8e6e4ee002bbb059f58cd7715df616e2322df ()
68135:M 09 Jan 2026 14:31:21.950 - Accepting cluster node connection from 127.0.0.1:59026
68135:M 09 Jan 2026 14:31:21.950 * Successfully completed handshake with a43e1c5ff1dd22da904cdab0e7ea99d3b8e07d3d ()
68135:M 09 Jan 2026 14:31:21.950 * Mismatch in topology information for sender node a43e1c5ff1dd22da904cdab0e7ea99d3b8e07d3d () in shard 933ced2c2666ea7bc70aad26df33b1dc7b41e62a
68135:M 09 Jan 2026 14:31:21.969 - Accepting cluster node connection from 127.0.0.1:59027
68135:M 09 Jan 2026 14:31:21.969 * Successfully completed handshake with c1f3aa72acdb4331c5d282ba6bfee27445dcd96a ()
68135:M 09 Jan 2026 14:31:21.969 * Mismatch in topology information for sender node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a () in shard 95343bcf8cb28e42821c3e311ac311f6a9a1a1d5
68135:M 09 Jan 2026 14:31:22.006 - Accepting cluster node connection from 127.0.0.1:59028
68135:M 09 Jan 2026 14:31:22.006 * Successfully completed handshake with 7872677129aedb6bcc9ff2ceb05b23df272c1c67 ()
68135:M 09 Jan 2026 14:31:22.006 * Mismatch in topology information for sender node 7872677129aedb6bcc9ff2ceb05b23df272c1c67 () in shard 5b45a73ed27120220e92fb78f2d21b6e6d3050f7
68135:M 09 Jan 2026 14:31:22.235 - Accepted 127.0.0.1:59041
68135:M 09 Jan 2026 14:31:22.236 * Node 4cf8e6e4ee002bbb059f58cd7715df616e2322df () is no longer primary of shard 502a2cf81c5905cc12979f271291e73e6cc2b0bf; removed all 0 slot(s) it used to own
68135:M 09 Jan 2026 14:31:22.236 * Node 4cf8e6e4ee002bbb059f58cd7715df616e2322df () is now part of shard adfeced14b0029ddb7708b168a20d24d33bec43b
68135:M 09 Jan 2026 14:31:22.236 * Node 4cf8e6e4ee002bbb059f58cd7715df616e2322df () is now a replica of node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 () in shard adfeced14b0029ddb7708b168a20d24d33bec43b
68135:M 09 Jan 2026 14:31:22.258 # DEBUG LOG: ========== I am primary 0 ==========
68135:M 09 Jan 2026 14:31:22.263 * Replica 127.0.0.1:21111 asks for synchronization
68135:M 09 Jan 2026 14:31:22.263 * Full resync requested by replica 127.0.0.1:21111
68135:M 09 Jan 2026 14:31:22.263 * Replication backlog created, my new replication IDs are 'a54e9798a181182f85e4df5fc9ed0d1503eedac1' and '0000000000000000000000000000000000000000'
68135:M 09 Jan 2026 14:31:22.263 * Starting BGSAVE for SYNC with target: replicas sockets using: normal sync
68135:M 09 Jan 2026 14:31:22.263 * Background RDB transfer started by pid 68172 to pipe through parent process
68172:C 09 Jan 2026 14:31:22.264 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
68135:M 09 Jan 2026 14:31:22.267 * Diskless rdb transfer, done reading from pipe, 1 replicas still up.
68135:M 09 Jan 2026 14:31:22.321 * Background RDB transfer terminated with success
68135:M 09 Jan 2026 14:31:22.321 * Streamed RDB transfer with replica 127.0.0.1:21111 succeeded (socket). Waiting for REPLCONF ACK from replica to enable streaming
68135:M 09 Jan 2026 14:31:22.321 * Synchronization with replica 127.0.0.1:21111 succeeded
68135:M 09 Jan 2026 14:31:23.750 * Cluster state changed: ok
### Starting test Empty shard will not be reconfigured after the cluster soft reset in tests/unit/cluster/replica-migration.tcl
68135:M 09 Jan 2026 14:31:31.876 - Client closed connection id=11 addr=127.0.0.1:59041 laddr=127.0.0.1:21115 fd=23 name= age=9 idle=0 flags=S capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=16890 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=1 omem=16920 tot-mem=35416 events=r cmd=replconf user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=599 tot-net-out=41 tot-cmds=15
68135:M 09 Jan 2026 14:31:31.876 * Connection with replica 127.0.0.1:21111 lost.
68135:M 09 Jan 2026 14:31:31.888 - Client closed connection id=3 addr=127.0.0.1:59020 laddr=127.0.0.1:21115 fd=14 name= age=10 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=16890 argv-mem=0 multi-mem=0 rbs=1024 rbp=624 obl=0 oll=0 omem=0 tot-mem=18496 events=r cmd=cluster|info user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=6028 tot-net-out=227007 tot-cmds=203
68135:M 09 Jan 2026 14:31:31.921 * Reconfiguring node 4cf8e6e4ee002bbb059f58cd7715df616e2322df () as primary for shard adfeced14b0029ddb7708b168a20d24d33bec43b
68135:M 09 Jan 2026 14:31:31.921 * Mismatch in topology information for sender node 4cf8e6e4ee002bbb059f58cd7715df616e2322df () in shard adfeced14b0029ddb7708b168a20d24d33bec43b
68135:M 09 Jan 2026 14:31:31.921 * Configuration change detected. Reconfiguring myself as a replica of node 4cf8e6e4ee002bbb059f58cd7715df616e2322df () in shard adfeced14b0029ddb7708b168a20d24d33bec43b
68135:S 09 Jan 2026 14:31:31.921 * Before turning into a replica, using my own primary parameters to synthesize a cached primary: I may be able to synchronize with the new primary with just a partial transfer.
68135:S 09 Jan 2026 14:31:31.921 * Connecting to PRIMARY 127.0.0.1:21111
68135:S 09 Jan 2026 14:31:31.921 * PRIMARY <-> REPLICA sync started
68135:S 09 Jan 2026 14:31:31.940 * Non blocking connect for SYNC fired the event.
68135:S 09 Jan 2026 14:31:31.940 * Primary replied to PING, replication can continue...
68135:S 09 Jan 2026 14:31:31.940 * (Non critical) Primary does not understand REPLCONF SET-CLUSTER-NODE-ID: -ERR Unknown node 147548acc7ff529db0b822f84c0b4f2bf1c4a009
68135:S 09 Jan 2026 14:31:31.940 * Trying a partial resynchronization (request a54e9798a181182f85e4df5fc9ed0d1503eedac1:15).
68135:S 09 Jan 2026 14:31:31.940 * Successful partial resynchronization with primary.
68135:S 09 Jan 2026 14:31:31.940 * Primary replication ID changed to 8ac13007fa8077f42ebd10ce787f9ff7cb0c2f2e
68135:S 09 Jan 2026 14:31:31.940 * PRIMARY <-> REPLICA sync: Primary accepted a Partial Resynchronization.
68135:signal-handler (1767940292) Received SIGTERM scheduling shutdown...
68135:S 09 Jan 2026 14:31:32.614 - Accepting cluster node connection from 127.0.0.1:59053
68135:S 09 Jan 2026 14:31:32.614 * User requested shutdown...
68135:S 09 Jan 2026 14:31:32.614 * Removing the pid file.
68135:S 09 Jan 2026 14:31:32.614 * Saving the cluster configuration file before exiting.
68135:S 09 Jan 2026 14:31:32.635 * Removing the unix socket file.
68135:S 09 Jan 2026 14:31:32.636 # Valkey is now ready to exit, bye bye...
R 4 logs:
### Starting server for test
67965:M 09 Jan 2026 14:31:21.117 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo
67965:M 09 Jan 2026 14:31:21.117 * Valkey version=255.255.255, bits=64, commit=07c84f5e, modified=0, pid=67965, just started
67965:M 09 Jan 2026 14:31:21.117 * Configuration loaded
67965:M 09 Jan 2026 14:31:21.117 * monotonic clock: POSIX clock_gettime
67965:M 09 Jan 2026 14:31:21.118 # Failed to write PID file: Permission denied
.+^+.
.+#########+.
.+########+########+. Valkey 255.255.255 (07c84f5e/0) 64 bit
.+########+' '+########+.
.########+' .+. '+########. Running in cluster mode
|####+' .+#######+. '+####| Port: 21111
|###| .+###############+. |###| PID: 67965
|###| |#####*'' ''*#####| |###|
|###| |####' .-. '####| |###|
|###| |###( (@@@) )###| |###| https://valkey.io
|###| |####. '-' .####| |###|
|###| |#####*. .*#####| |###|
|###| '+#####| |#####+' |###|
|####+. +##| |#+' .+####|
'#######+ |##| .+########'
'+###| |##| .+########+'
'| |####+########+'
+#########+'
'+v+'
67965:M 09 Jan 2026 14:31:21.118 # WARNING: The TCP backlog setting of 511 cannot be enforced because kern.ipc.somaxconn is set to the lower value of 128.
67965:M 09 Jan 2026 14:31:21.118 * No cluster configuration found, I'm 4cf8e6e4ee002bbb059f58cd7715df616e2322df
67965:M 09 Jan 2026 14:31:21.127 * Server initialized
67965:M 09 Jan 2026 14:31:21.127 * Ready to accept connections tcp
67965:M 09 Jan 2026 14:31:21.127 * Ready to accept connections unix
67965:M 09 Jan 2026 14:31:21.213 - Accepted 127.0.0.1:59011
67965:M 09 Jan 2026 14:31:21.214 - Client closed connection id=2 addr=127.0.0.1:59011 laddr=127.0.0.1:21111 fd=14 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=16890 argv-mem=0 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=33856 events=r cmd=ping user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=7 tot-net-out=7 tot-cmds=1
67965:M 09 Jan 2026 14:31:21.220 - Accepted 127.0.0.1:59012
67965:M 09 Jan 2026 14:31:21.836 * configEpoch set to 5 via CLUSTER SET-CONFIG-EPOCH
67965:M 09 Jan 2026 14:31:21.915 - Accepting cluster node connection from 127.0.0.1:59024
67965:M 09 Jan 2026 14:31:21.915 * IP address for this node updated to 127.0.0.1
67965:M 09 Jan 2026 14:31:22.142 * Mismatch in topology information for sender node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a () in shard 851fa9cd0d51c55a6354e49a14f03e65f2001364
67965:M 09 Jan 2026 14:31:22.142 * Mismatch in topology information for sender node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a () in shard a64c0eb757942b1524adcf67a8366ce0165cc3cb
67965:M 09 Jan 2026 14:31:22.143 * Mismatch in topology information for sender node 7872677129aedb6bcc9ff2ceb05b23df272c1c67 () in shard c39496129f886c8d26a09d3592055a0b84710864
67965:M 09 Jan 2026 14:31:22.143 * Mismatch in topology information for sender node 7872677129aedb6bcc9ff2ceb05b23df272c1c67 () in shard 20e3596ffbe2293fe219127b9bbaedb706a6fc17
67965:M 09 Jan 2026 14:31:22.162 * Mismatch in topology information for sender node a43e1c5ff1dd22da904cdab0e7ea99d3b8e07d3d () in shard e8f0a8d091f60e24770322126f56b3cf7d155b59
67965:M 09 Jan 2026 14:31:22.162 * Mismatch in topology information for sender node a43e1c5ff1dd22da904cdab0e7ea99d3b8e07d3d () in shard 3deb306961e98805f228273e54e0f3c6cd7bfee3
67965:M 09 Jan 2026 14:31:22.162 - Accepting cluster node connection from 127.0.0.1:59034
67965:M 09 Jan 2026 14:31:22.182 - Accepting cluster node connection from 127.0.0.1:59037
67965:M 09 Jan 2026 14:31:22.207 - Accepting cluster node connection from 127.0.0.1:59038
67965:M 09 Jan 2026 14:31:22.234 # Missing implement of connection type tls
67965:S 09 Jan 2026 14:31:22.235 * Connecting to PRIMARY 127.0.0.1:21115
67965:S 09 Jan 2026 14:31:22.235 * PRIMARY <-> REPLICA sync started
67965:S 09 Jan 2026 14:31:22.235 * Cluster state changed: ok
67965:S 09 Jan 2026 14:31:22.236 * Non blocking connect for SYNC fired the event.
67965:S 09 Jan 2026 14:31:22.236 * Primary replied to PING, replication can continue...
67965:S 09 Jan 2026 14:31:22.258 * Partial resynchronization not possible (no cached primary)
67965:S 09 Jan 2026 14:31:22.263 * Full resync from primary: a54e9798a181182f85e4df5fc9ed0d1503eedac1:0
67965:S 09 Jan 2026 14:31:22.267 * Replica main thread creating Bio thread to save RDB to disk
67965:S 09 Jan 2026 14:31:22.267 * Replica bio thread: PRIMARY <-> REPLICA sync: receiving streamed RDB from primary with EOF to disk
67965:S 09 Jan 2026 14:31:22.268 * Replica bio thread: Done downloading RDB
67965:S 09 Jan 2026 14:31:22.268 # DEBUG LOG: ========== I am replica 4 ==========
67965:S 09 Jan 2026 14:31:23.144 * Replica main thread detected RDB download completion in Bio thread
67965:S 09 Jan 2026 14:31:23.144 * Loading the RDB and finalizing primary-replica sync...
67965:S 09 Jan 2026 14:31:23.150 * PRIMARY <-> REPLICA sync: Flushing old data
67965:S 09 Jan 2026 14:31:23.150 * PRIMARY <-> REPLICA sync: Loading DB in memory
67965:S 09 Jan 2026 14:31:23.150 * Loading RDB produced by Valkey version 255.255.255
67965:S 09 Jan 2026 14:31:23.150 * RDB age 1 seconds
67965:S 09 Jan 2026 14:31:23.150 * RDB memory usage when created 2.94 Mb
67965:S 09 Jan 2026 14:31:23.150 * Done loading RDB, keys loaded: 0, keys expired: 0.
67965:S 09 Jan 2026 14:31:23.150 * PRIMARY <-> REPLICA sync: Finished with success
### Starting test Empty shard will not be reconfigured after the cluster soft reset in tests/unit/cluster/replica-migration.tcl
67965:S 09 Jan 2026 14:31:31.876 * Cluster reset (user request from 'id=3 addr=127.0.0.1:59012 laddr=127.0.0.1:21111 fd=14 name= user=default lib-name= lib-ver=').
67965:S 09 Jan 2026 14:31:31.876 * Reconfiguring node 4cf8e6e4ee002bbb059f58cd7715df616e2322df () as primary for shard adfeced14b0029ddb7708b168a20d24d33bec43b
67965:M 09 Jan 2026 14:31:31.876 * Connection with primary lost.
67965:M 09 Jan 2026 14:31:31.876 * Caching the disconnected primary state.
67965:M 09 Jan 2026 14:31:31.876 * Discarding previously cached primary state.
67965:M 09 Jan 2026 14:31:31.876 * Setting secondary replication ID to a54e9798a181182f85e4df5fc9ed0d1503eedac1, valid up to offset: 15. New replication ID is 8ac13007fa8077f42ebd10ce787f9ff7cb0c2f2e
67965:M 09 Jan 2026 14:31:31.877 # Cluster state changed: fail
67965:M 09 Jan 2026 14:31:31.877 # Cluster is currently down: I am part of a minority partition.
### Starting test Check for memory leaks (pid 68135) in tests/unit/cluster/replica-migration.tcl
67965:M 09 Jan 2026 14:31:31.907 - Accepting cluster node connection from 127.0.0.1:59048
67965:M 09 Jan 2026 14:31:31.921 - Accepting cluster node connection from 127.0.0.1:59049
67965:M 09 Jan 2026 14:31:31.921 - Accepted 127.0.0.1:59050
67965:M 09 Jan 2026 14:31:31.940 * Replica 127.0.0.1:21115 asks for synchronization
67965:M 09 Jan 2026 14:31:31.940 * Partial resynchronization request from 127.0.0.1:21115 accepted. Sending 0 bytes of backlog starting from offset 15.
67965:M 09 Jan 2026 14:31:31.959 - Accepting cluster node connection from 127.0.0.1:59051
67965:M 09 Jan 2026 14:31:31.962 - Accepting cluster node connection from 127.0.0.1:59052
67965:M 09 Jan 2026 14:31:32.636 - Client closed connection id=7 addr=127.0.0.1:59050 laddr=127.0.0.1:21111 fd=17 name= age=1 idle=1 flags=S capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=16890 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=1 omem=16920 tot-mem=35416 events=r cmd=psync user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=333 tot-net-out=82 tot-cmds=6
67965:M 09 Jan 2026 14:31:32.636 * Connection with replica 127.0.0.1:21115 lost.
### Starting test Check for memory leaks (pid 68110) in tests/unit/cluster/replica-migration.tcl
67965:M 09 Jan 2026 14:31:32.931 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:32.933 * Mismatch in topology information for sender node a43e1c5ff1dd22da904cdab0e7ea99d3b8e07d3d () in shard 5223ecc8e829765d5214f99ba5453da981d4c391
67965:M 09 Jan 2026 14:31:32.933 * Mismatch in topology information for sender node a43e1c5ff1dd22da904cdab0e7ea99d3b8e07d3d () in shard 3deb306961e98805f228273e54e0f3c6cd7bfee3
67965:M 09 Jan 2026 14:31:32.933 # Cluster is currently down: At least one hash slot is not served by any available node. Please check the 'cluster-require-full-coverage' configuration.
67965:M 09 Jan 2026 14:31:32.944 * Mismatch in topology information for sender node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a () in shard e736912dd440bb6b0dc0c0f0162c7657b261bb2e
67965:M 09 Jan 2026 14:31:32.944 * Mismatch in topology information for sender node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a () in shard a64c0eb757942b1524adcf67a8366ce0165cc3cb
67965:M 09 Jan 2026 14:31:33.032 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
### Starting test Check for memory leaks (pid 68084) in tests/unit/cluster/replica-migration.tcl
67965:M 09 Jan 2026 14:31:33.133 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.133 - Connection with Node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a at 127.0.0.1:31114 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.234 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.234 - Connection with Node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a at 127.0.0.1:31114 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.335 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.335 - Connection with Node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a at 127.0.0.1:31114 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.343 * Mismatch in topology information for sender node 7872677129aedb6bcc9ff2ceb05b23df272c1c67 () in shard 7cf2b27ddcb5ef258caf9b4a1f109f1824d65311
67965:M 09 Jan 2026 14:31:33.436 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.436 - Connection with Node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a at 127.0.0.1:31114 failed: Connection refused
### Starting test Check for memory leaks (pid 68056) in tests/unit/cluster/replica-migration.tcl
67965:M 09 Jan 2026 14:31:33.537 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.537 - Connection with Node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a at 127.0.0.1:31114 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.537 - Connection with Node 7872677129aedb6bcc9ff2ceb05b23df272c1c67 at 127.0.0.1:31113 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.638 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.638 - Connection with Node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a at 127.0.0.1:31114 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.638 - Connection with Node 7872677129aedb6bcc9ff2ceb05b23df272c1c67 at 127.0.0.1:31113 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.738 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.738 - Connection with Node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a at 127.0.0.1:31114 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.738 - Connection with Node 7872677129aedb6bcc9ff2ceb05b23df272c1c67 at 127.0.0.1:31113 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.839 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.839 - Connection with Node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a at 127.0.0.1:31114 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.839 - Connection with Node 7872677129aedb6bcc9ff2ceb05b23df272c1c67 at 127.0.0.1:31113 failed: Connection refused
67965:M 09 Jan 2026 14:31:33.884 - Client closed connection id=3 addr=127.0.0.1:59012 laddr=127.0.0.1:21111 fd=14 name= age=12 idle=2 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=16890 argv-mem=0 multi-mem=0 rbs=1024 rbp=628 obl=0 oll=0 omem=0 tot-mem=18496 events=r cmd=cluster|reset user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=400 tot-net-out=4216 tot-cmds=10
67965:M 09 Jan 2026 14:31:34.165 * NODE 147548acc7ff529db0b822f84c0b4f2bf1c4a009 () possibly failing.
67965:M 09 Jan 2026 14:31:34.165 * NODE c1f3aa72acdb4331c5d282ba6bfee27445dcd96a () possibly failing.
67965:M 09 Jan 2026 14:31:34.166 * Cluster state changed: ok
67965:M 09 Jan 2026 14:31:34.166 - Connection with Node 147548acc7ff529db0b822f84c0b4f2bf1c4a009 at 127.0.0.1:31115 failed: Connection refused
67965:M 09 Jan 2026 14:31:34.166 - Connection with Node c1f3aa72acdb4331c5d282ba6bfee27445dcd96a at 127.0.0.1:31114 failed: Connection refused
67965:M 09 Jan 2026 14:31:34.166 - Connection with Node a43e1c5ff1dd22da904cdab0e7ea99d3b8e07d3d at 127.0.0.1:31112 failed: Connection refused
67965:M 09 Jan 2026 14:31:34.166 - Connection with Node 7872677129aedb6bcc9ff2ceb05b23df272c1c67 at 127.0.0.1:31113 failed: Connection refused
67965:signal-handler (1767940294) Received SIGTERM scheduling shutdown...
67965:M 09 Jan 2026 14:31:34.266 * User requested shutdown...
67965:M 09 Jan 2026 14:31:34.266 * Removing the pid file.
67965:M 09 Jan 2026 14:31:34.266 * Saving the cluster configuration file before exiting.
67965:M 09 Jan 2026 14:31:34.280 * Removing the unix socket file.
67965:M 09 Jan 2026 14:31:34.280 # Valkey is now ready to exit, bye bye...
When a cluster reset is performed on a replica node, a new shard ID is generated because the node is about to become an empty primary node, see valkey-io#2283. However, the log added in valkey-io#2510 caused some confusion. In clusterSetNodeAsPrimary we print:
```
serverLog(LL_NOTICE, "Reconfiguring node %.40s (%s) as primary for shard %.40s", n->name, humanNodename(n), n->shard_id);
```
In clusterReset, we first call clusterSetNodeAsPrimary and then generate a new shard ID, so the log is printed with the old, wrong shard ID. For example, when a replica node performs a cluster reset, we print:
```
xxx * Cluster reset (user request from 'xxx').
xxx * Reconfiguring node af76a3e0ffcd77bd14fa47ce4d07ab2bdc78702f (xxx) as primary for shard ea528667634af8beed83adac2b9af8360769a1b4
```
But the node's shard ID is actually:
```
xxx> cluster myshardid
"52ede26d1554dd203161ba09011af14574b2cc84"
```
Now we print a log after the new shard ID is generated, and we also move the call to clusterSetNodeAsPrimary after the new shard ID is generated, so that it logs the right one. After this PR:
```
xxx * Cluster reset (user request from 'xxx').
xxx * Moving myself to a new shard bd31870ce73f5977084e6a46e337a4a1ad38fc66.
xxx * Reconfiguring node 1d54b904efd30cd9d7d1abbfd63c8fafbb62e1c8 (xxx) as primary for shard bd31870ce73f5977084e6a46e337a4a1ad38fc66
```
This is part of valkey-io#2989, but I guess we won't merge the extension fix any time soon, so I am extracting it separately as a log fix (or improvement).

Signed-off-by: Binbin <binloveplay1314@qq.com>
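The ordering problem described in this commit message can be reproduced with a small toy model. This is a sketch under stated assumptions: `toy_self`, `toy_set_as_primary`, `toy_new_shard`, `toy_reset_old`, and `toy_reset_new` are hypothetical stand-ins, log lines are collected in a buffer instead of going through serverLog, and the extra "Moving myself to a new shard" line is left out for brevity.

```c
#include <stdio.h>
#include <string.h>

/* Toy local-node state; NOT Valkey's clusterNode. */
typedef struct {
    char shard_id[41];
    char log[256]; /* accumulated log lines */
} toy_self;

/* Logs the promotion using whatever shard id is current at call time. */
void toy_set_as_primary(toy_self *s) {
    char line[96];
    snprintf(line, sizeof line, "primary for shard %s\n", s->shard_id);
    strncat(s->log, line, sizeof s->log - strlen(s->log) - 1);
}

/* Assigns a new shard id to the local node. */
void toy_new_shard(toy_self *s, const char *id) {
    snprintf(s->shard_id, sizeof s->shard_id, "%s", id);
}

/* Old order: the promotion is logged before the shard id changes. */
void toy_reset_old(toy_self *s, const char *new_id) {
    toy_set_as_primary(s);
    toy_new_shard(s, new_id);
}

/* New order: the shard id changes first, so the log shows the right id. */
void toy_reset_new(toy_self *s, const char *new_id) {
    toy_new_shard(s, new_id);
    toy_set_as_primary(s);
}
```

With the old order the logged shard id differs from the one the node ends up with; with the new order the two agree.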
@enjoy-binbin, I don't think we should take this change. Please see my concerns at #2989 (comment).

@PingXie Yes, I understand the concerns. I am just trying to merge the unstable code. Since we are all here, I looked at it again.
So in the comments I mention
And sadly this is not working either. I forgot the details at the very beginning, but now I have them. Do you have other ideas? It's getting really tricky. @hpatro feel free to jump in.
@enjoy-binbin I didn't follow the discussion. Did you change the implementation after Ping's concern (#2989 (comment), #2989 (comment))?

Sorry, it is a tricky issue and the fix is not perfect. I guess we need more people to chime in, so let me try to summarize the situation.
This PR adopts a compromise: it introduces
This change started with #445, which means it has been present since Valkey 8.0.
In this case, with an empty shard consisting of a primary and a replica, when
the replica does a CLUSTER RESET SOFT, the replica becomes a new primary and
the primary becomes a replica, which is wrong. The configuration is incorrectly
inverted: R4 becomes the new primary and R0 becomes a replica of R4.
The reason is that clusterUpdateSlotsConfigWith contains logic to handle
failover within empty shards.
But we don't have the right shard_id at that point, so the CLUSTER RESET SOFT
case is treated as a FAILOVER case, which causes the trouble. See #2586 for
more details.
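As a rough illustration of the misclassification, the following toy predicate models how a stale shard_id can make a reset node look like a failover winner. It is not the actual clusterUpdateSlotsConfigWith logic: `toy_view` and `toy_looks_like_empty_shard_failover` are hypothetical, and config epoch comparisons are deliberately omitted.

```c
#include <stdbool.h>
#include <string.h>

/* How the receiving node currently sees a node; NOT Valkey's clusterNode. */
typedef struct {
    char shard_id[41]; /* shard id as known to the receiver */
    bool is_primary;
    int numslots;
} toy_view;

/* Hypothetical check: does a ping from `sender` look like a failover that
 * happened inside my own, slot-less shard? */
bool toy_looks_like_empty_shard_failover(const toy_view *myself, const toy_view *sender) {
    bool same_shard = strcmp(myself->shard_id, sender->shard_id) == 0;
    return same_shard && sender->is_primary && sender->numslots == 0 && myself->numslots == 0;
}
```

If the receiver still holds the sender's old shard_id, the predicate is true and the receiver demotes itself. Once the sender's new shard_id is known, the two nodes are in different shards and the predicate is false.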
Extract the shard_id extension processing into a separate lightweight
function (clusterProcessShardIdExtension) and call it before
clusterUpdateSlotsConfigWith. This ensures the sender's shard_id is
up-to-date when making shard membership decisions, without triggering
side effects from other extensions like FORGOTTEN_NODE. The subsequent
updateShardId call in clusterProcessPingExtensions becomes a no-op
since the shard_id is already current.
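
The split described above can be sketched as two passes over a list of extensions. This is a simplified model and not the real wire format or the real clusterProcessShardIdExtension: `toy_ext`, `toy_sender`, `toy_process_shard_id_ext`, and `toy_process_all_exts` are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <string.h>

enum toy_ext_type { TOY_EXT_HOSTNAME, TOY_EXT_FORGOTTEN_NODE, TOY_EXT_SHARD_ID };

/* One extension carried by a ping; toy layout, not the real wire format. */
typedef struct {
    enum toy_ext_type type;
    const char *payload;
} toy_ext;

/* Receiver-side record of the sender. */
typedef struct {
    char shard_id[41];
    int forgotten; /* counts side effects from FORGOTTEN_NODE extensions */
} toy_sender;

static void toy_set_shard_id(toy_sender *s, const char *id) {
    strncpy(s->shard_id, id, sizeof s->shard_id - 1);
    s->shard_id[sizeof s->shard_id - 1] = '\0';
}

/* Lightweight pass: apply only the shard id, ignore every other extension.
 * Returns 1 if a shard id extension was found. */
int toy_process_shard_id_ext(toy_sender *s, const toy_ext *exts, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (exts[i].type != TOY_EXT_SHARD_ID) continue;
        toy_set_shard_id(s, exts[i].payload);
        return 1;
    }
    return 0;
}

/* Full pass: handles every extension, including side-effecting ones. */
void toy_process_all_exts(toy_sender *s, const toy_ext *exts, size_t n) {
    for (size_t i = 0; i < n; i++) {
        switch (exts[i].type) {
        case TOY_EXT_FORGOTTEN_NODE: s->forgotten++; break;
        case TOY_EXT_SHARD_ID: toy_set_shard_id(s, exts[i].payload); break;
        default: break;
        }
    }
}
```

Running the lightweight pass first updates the shard id without touching anything else, and the later full pass then rewrites the same shard id value, which matches the "no-op" remark above.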