The failover test seems to be flakily failing over the past few days.
Sharing part of the stack trace where the test failed, otherwise it's too huge.
===== Start of server log (pid 41461) =====
### Starting server for test
41461:M 03 Oct 2025 00:17:42.401 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
41461:M 03 Oct 2025 00:17:42.401 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo
41461:M 03 Oct 2025 00:17:42.401 * Valkey version=255.255.255, bits=64, commit=d5bb986f, modified=0, pid=41461, just started
41461:M 03 Oct 2025 00:17:42.401 * Configuration loaded
41461:M 03 Oct 2025 00:17:42.420 * monotonic clock: POSIX clock_gettime
41461:M 03 Oct 2025 00:17:42.422 # Failed to write PID file: Permission denied
.+^+.
.+#########+.
.+########+########+. Valkey 255.255.255 (d5bb986f/0) 64 bit
.+########+' '+########+.
.########+' .+. '+########. Running in cluster mode
|####+' .+#######+. '+####| Port: 22677
|###| .+###############+. |###| PID: 41461
|###| |#####*'' ''*#####| |###|
|###| |####' .-. '####| |###|
|###| |###( (@@@) )###| |###| https://valkey.io/
|###| |####. '-' .####| |###|
|###| |#####*. .*#####| |###|
|###| '+#####| |#####+' |###|
|####+. +##| |#+' .+####|
'#######+ |##| .+########'
'+###| |##| .+########+'
'| |####+########+'
+#########+'
'+v+'
41461:M 03 Oct 2025 00:17:42.422 * No cluster configuration found, I'm 6a5f20df0b200d1b58ef5889449de3f17ba50c1d
41461:M 03 Oct 2025 00:17:42.428 * Server initialized
41461:M 03 Oct 2025 00:17:42.429 * Ready to accept connections tcp
41461:M 03 Oct 2025 00:17:42.429 * Ready to accept connections unix
41461:M 03 Oct 2025 00:17:42.486 - Accepted 127.0.0.1:37181
41461:M 03 Oct 2025 00:17:42.489 - Client closed connection id=2 addr=127.0.0.1:37181 laddr=127.0.0.1:22677 fd=12 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=16384 argv-mem=0 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=33350 events=r cmd=ping user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=7 tot-net-out=7 tot-cmds=1
41461:M 03 Oct 2025 00:17:42.502 - Accepted 127.0.0.1:35635
41461:M 03 Oct 2025 00:17:44.004 * configEpoch set to 9 via CLUSTER SET-CONFIG-EPOCH
41461:M 03 Oct 2025 00:17:44.109 - Accepting cluster node connection from 127.0.0.1:43284
41461:M 03 Oct 2025 00:17:44.109 * IP address for this node updated to 127.0.0.1
41461:M 03 Oct 2025 00:17:44.153 * Mismatch in topology information for sender node 67ad6ccba4ffa61b55df01fd0d1355861a410752 () in shard 31af5c4799d9e9c2e5c7138a5cdb0c21401ce18f
41461:M 03 Oct 2025 00:17:44.229 - Accepting cluster node connection from 127.0.0.1:43292
41461:M 03 Oct 2025 00:17:44.251 * Successfully completed handshake with 35062b4ed705e54f43812eee9bd41c6e0c496627 ()
41461:M 03 Oct 2025 00:17:44.285 - Accepting cluster node connection from 127.0.0.1:43296
41461:M 03 Oct 2025 00:17:44.314 - Accepting cluster node connection from 127.0.0.1:43308
41461:M 03 Oct 2025 00:17:44.315 * Mismatch in topology information for sender node a2e2120e124856107e40731842c94842658192a4 () in shard cefe678411f184b2aa0e13adca3caed509b7905a
41461:M 03 Oct 2025 00:17:44.355 - Accepting cluster node connection from 127.0.0.1:43322
41461:M 03 Oct 2025 00:17:44.362 * Mismatch in topology information for sender node e53c62e73a1912b3a926e600a08bd50139ecba85 () in shard 21f69416a73a3b436e24af8d27f4a1c6b9c2804c
41461:M 03 Oct 2025 00:17:44.383 - Accepting cluster node connection from 127.0.0.1:43336
41461:M 03 Oct 2025 00:17:44.400 - Accepting cluster node connection from 127.0.0.1:43346
41461:M 03 Oct 2025 00:17:44.415 - Accepting cluster node connection from 127.0.0.1:43358
41461:M 03 Oct 2025 00:17:44.423 - Accepting cluster node connection from 127.0.0.1:43370
41461:M 03 Oct 2025 00:17:44.452 * Cluster state changed: ok
41461:M 03 Oct 2025 00:17:44.664 # Missing implement of connection type tls
41461:M 03 Oct 2025 00:17:44.692 * Node 7259d703afbedd95ac3e7061030e6ca18539ac51 () is no longer primary of shard 3fcfcd71f93a304db35b490f8b7dcaa16e380bc1; removed all 0 slot(s) it used to own
41461:M 03 Oct 2025 00:17:44.692 * Node 7259d703afbedd95ac3e7061030e6ca18539ac51 () is now part of shard c8987b2a50fbe392e5a95167e6e6c609a200206f
41461:M 03 Oct 2025 00:17:44.692 * Node 7259d703afbedd95ac3e7061030e6ca18539ac51 () is now a replica of node a2e2120e124856107e40731842c94842658192a4 () in shard c8987b2a50fbe392e5a95167e6e6c609a200206f
41461:M 03 Oct 2025 00:17:44.702 * Node dc713e69651e957f21e3e44af1c33cd06439f81f () is no longer primary of shard b04575e6a5090f971a99bed17ab8d8febda4323f; removed all 0 slot(s) it used to own
41461:M 03 Oct 2025 00:17:44.704 * Node dc713e69651e957f21e3e44af1c33cd06439f81f () is now part of shard c8987b2a50fbe392e5a95167e6e6c609a200206f
41461:M 03 Oct 2025 00:17:44.704 * Node dc713e69651e957f21e3e44af1c33cd06439f81f () is now a replica of node a2e2120e124856107e40731842c94842658192a4 () in shard c8987b2a50fbe392e5a95167e6e6c609a200206f
41461:M 03 Oct 2025 00:17:44.717 * Node c4d9ce4d28e32e5f7f74219c2cddc7a122c26295 () is no longer primary of shard 90b9dc5cf7567309984c2b554ba771bec762e104; removed all 0 slot(s) it used to own
41461:M 03 Oct 2025 00:17:44.717 * Node c4d9ce4d28e32e5f7f74219c2cddc7a122c26295 () is now part of shard dabdff46766cdd52ecc930db51940ca650998a10
41461:M 03 Oct 2025 00:17:44.717 * Node c4d9ce4d28e32e5f7f74219c2cddc7a122c26295 () is now a replica of node 67ad6ccba4ffa61b55df01fd0d1355861a410752 () in shard dabdff46766cdd52ecc930db51940ca650998a10
41461:M 03 Oct 2025 00:17:44.741 * Node 35062b4ed705e54f43812eee9bd41c6e0c496627 () is no longer primary of shard 17d4e90c0a0160a0aaea886cd03047591ca88497; removed all 0 slot(s) it used to own
41461:M 03 Oct 2025 00:17:44.741 * Node 35062b4ed705e54f43812eee9bd41c6e0c496627 () is now part of shard dabdff46766cdd52ecc930db51940ca650998a10
41461:M 03 Oct 2025 00:17:44.741 * Node 35062b4ed705e54f43812eee9bd41c6e0c496627 () is now a replica of node 67ad6ccba4ffa61b55df01fd0d1355861a410752 () in shard dabdff46766cdd52ecc930db51940ca650998a10
41461:M 03 Oct 2025 00:17:44.776 * Node c45d56ab32fa1e5489b5acfb5a34e83182de37ec () is no longer primary of shard bd50c82fce837dbf5dd9d3515ad26b3215991477; removed all 0 slot(s) it used to own
41461:M 03 Oct 2025 00:17:44.776 * Node c45d56ab32fa1e5489b5acfb5a34e83182de37ec () is now part of shard 0d7c55a266935f7d40a0dd9fea945dc3b7dd684a
41461:M 03 Oct 2025 00:17:44.776 * Node c45d56ab32fa1e5489b5acfb5a34e83182de37ec () is now a replica of node e53c62e73a1912b3a926e600a08bd50139ecba85 () in shard 0d7c55a266935f7d40a0dd9fea945dc3b7dd684a
41461:S 03 Oct 2025 00:17:44.797 * Connecting to PRIMARY 127.0.0.1:22683
41461:S 03 Oct 2025 00:17:44.797 * PRIMARY <-> REPLICA sync started
41461:S 03 Oct 2025 00:17:44.803 * Non blocking connect for SYNC fired the event.
41461:S 03 Oct 2025 00:17:44.818 * Primary replied to PING, replication can continue...
41461:S 03 Oct 2025 00:17:44.826 # DEBUG LOG: ========== I am replica 8 ==========
41461:S 03 Oct 2025 00:17:44.841 * Partial resynchronization not possible (no cached primary)
41461:S 03 Oct 2025 00:17:44.877 * Full resync from primary: a4895e2ff557b00203d7548d7cd3595e64458b22:0
41461:S 03 Oct 2025 00:17:44.881 * Replica main thread creating Bio thread to save RDB to disk
41461:S 03 Oct 2025 00:17:44.882 * Replica bio thread: PRIMARY <-> REPLICA sync: receiving streamed RDB from primary with EOF to disk
41461:S 03 Oct 2025 00:17:44.882 * Replica bio thread: Done downloading RDB
41461:S 03 Oct 2025 00:17:45.480 * Replica main thread detected RDB download completion in Bio thread
41461:S 03 Oct 2025 00:17:45.480 * Loading the RDB and finalizing primary-replica sync...
41461:S 03 Oct 2025 00:17:45.483 * PRIMARY <-> REPLICA sync: Flushing old data
41461:S 03 Oct 2025 00:17:45.483 * PRIMARY <-> REPLICA sync: Loading DB in memory
41461:S 03 Oct 2025 00:17:45.483 * Loading RDB produced by Valkey version 255.255.255
41461:S 03 Oct 2025 00:17:45.483 * RDB age 1 seconds
41461:S 03 Oct 2025 00:17:45.483 * RDB memory usage when created 2.61 Mb
41461:S 03 Oct 2025 00:17:45.483 * Done loading RDB, keys loaded: 0, keys expired: 0.
41461:S 03 Oct 2025 00:17:45.483 * PRIMARY <-> REPLICA sync: Finished with success
### Starting test Cluster is up in tests/unit/cluster/failover.tcl
### Starting test Cluster is writable in tests/unit/cluster/failover.tcl
41461:S 03 Oct 2025 00:17:55.743 - Accepted 127.0.0.1:40449
41461:S 03 Oct 2025 00:17:56.089 - Client closed connection id=7 addr=127.0.0.1:40449 laddr=127.0.0.1:22677 fd=30 name= age=1 idle=1 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=16384 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=17990 events=r cmd=NULL user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=0 tot-net-out=0 tot-cmds=0
### Starting test Killing the first primary node in tests/unit/cluster/failover.tcl
### Starting test Wait for failover in tests/unit/cluster/failover.tcl
41461:S 03 Oct 2025 00:17:57.768 - DB 0: 30 keys (0 volatile) in 210 slots HT.
41461:S 03 Oct 2025 00:17:59.413 * NODE a2e2120e124856107e40731842c94842658192a4 () possibly failing.
41461:S 03 Oct 2025 00:17:59.517 - Node e53c62e73a1912b3a926e600a08bd50139ecba85 () reported node a2e2120e124856107e40731842c94842658192a4 () as not reachable.
41461:S 03 Oct 2025 00:17:59.517 - Node 67ad6ccba4ffa61b55df01fd0d1355861a410752 () reported node a2e2120e124856107e40731842c94842658192a4 () as not reachable.
41461:S 03 Oct 2025 00:17:59.517 * Marking node a2e2120e124856107e40731842c94842658192a4 () as failing (quorum reached).
41461:S 03 Oct 2025 00:17:59.517 # Cluster state changed: fail
41461:S 03 Oct 2025 00:17:59.517 # Cluster is currently down: At least one hash slot is not served by any available node. Please check the 'cluster-require-full-coverage' configuration.
### Starting test Cluster should eventually be up again in tests/unit/cluster/failover.tcl
41461:S 03 Oct 2025 00:17:59.857 * Reconfiguring node 7259d703afbedd95ac3e7061030e6ca18539ac51 () as primary for shard c8987b2a50fbe392e5a95167e6e6c609a200206f
41461:S 03 Oct 2025 00:17:59.857 * Mismatch in topology information for sender node 7259d703afbedd95ac3e7061030e6ca18539ac51 () in shard c8987b2a50fbe392e5a95167e6e6c609a200206f
41461:S 03 Oct 2025 00:17:59.858 * Cluster state changed: ok
41461:S 03 Oct 2025 00:17:59.887 * Reconfiguring node dc713e69651e957f21e3e44af1c33cd06439f81f () as primary for shard c8987b2a50fbe392e5a95167e6e6c609a200206f
41461:S 03 Oct 2025 00:17:59.887 * Mismatch in topology information for sender node dc713e69651e957f21e3e44af1c33cd06439f81f () in shard c8987b2a50fbe392e5a95167e6e6c609a200206f
41461:S 03 Oct 2025 00:17:59.906 * Node 7259d703afbedd95ac3e7061030e6ca18539ac51 () is now a replica of node dc713e69651e957f21e3e44af1c33cd06439f81f () in shard c8987b2a50fbe392e5a95167e6e6c609a200206f
### Starting test Restarting the previously killed primary node in tests/unit/cluster/failover.tcl
### Starting test Instance #0 gets converted into a replica in tests/unit/cluster/failover.tcl
41461:S 03 Oct 2025 00:18:00.701 * Mismatch in topology information for sender node a2e2120e124856107e40731842c94842658192a4 () in shard c8987b2a50fbe392e5a95167e6e6c609a200206f
41461:S 03 Oct 2025 00:18:00.701 - Node a2e2120e124856107e40731842c94842658192a4 () has old slots configuration, sending an UPDATE message about dc713e69651e957f21e3e44af1c33cd06439f81f ()
41461:S 03 Oct 2025 00:18:00.722 * Clear FAIL state for node a2e2120e124856107e40731842c94842658192a4 (): primary without slots is reachable again.
41461:S 03 Oct 2025 00:18:00.722 * Node a2e2120e124856107e40731842c94842658192a4 () is now a replica of node dc713e69651e957f21e3e44af1c33cd06439f81f () in shard c8987b2a50fbe392e5a95167e6e6c609a200206f
41461:S 03 Oct 2025 00:18:00.738 - Node e53c62e73a1912b3a926e600a08bd50139ecba85 () reported node a2e2120e124856107e40731842c94842658192a4 () is back online.
### Starting test Make sure the replicas always get the different ranks in tests/unit/cluster/failover.tcl
41461:S 03 Oct 2025 00:18:01.145 - Connection with Node a2e2120e124856107e40731842c94842658192a4 at 127.0.0.1:32685 failed: Connection refused
41461:S 03 Oct 2025 00:18:01.355 - Connection with Node a2e2120e124856107e40731842c94842658192a4 at 127.0.0.1:32685 failed: Connection refused
41461:S 03 Oct 2025 00:18:01.355 - Connection with Node 67ad6ccba4ffa61b55df01fd0d1355861a410752 at 127.0.0.1:32684 failed: Connection refused
41461:S 03 Oct 2025 00:18:01.450 - Client closed connection id=5 addr=127.0.0.1:22683 laddr=127.0.0.1:58238 fd=29 name= age=16 idle=5 flags=M capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=32768 argv-mem=0 multi-mem=0 rbs=1024 rbp=37 obl=0 oll=0 omem=0 tot-mem=34374 events=r cmd=set user=(superuser) redir=-1 resp=2 lib-name= lib-ver= tot-net-in=1683 tot-net-out=561 tot-cmds=32
41461:S 03 Oct 2025 00:18:01.450 * Connection with primary lost.
41461:S 03 Oct 2025 00:18:01.450 * Caching the disconnected primary state.
41461:S 03 Oct 2025 00:18:01.450 * Reconnecting to PRIMARY 127.0.0.1:22683
41461:S 03 Oct 2025 00:18:01.451 * PRIMARY <-> REPLICA sync started
41461:S 03 Oct 2025 00:18:01.451 # Error condition on socket for SYNC: Connection refused
41461:S 03 Oct 2025 00:18:01.458 - Connection with Node e53c62e73a1912b3a926e600a08bd50139ecba85 at 127.0.0.1:32683 failed: Connection refused
41461:S 03 Oct 2025 00:18:01.562 - Connection with Node a2e2120e124856107e40731842c94842658192a4 at 127.0.0.1:32685 failed: Connection refused
41461:S 03 Oct 2025 00:18:01.562 - Connection with Node 67ad6ccba4ffa61b55df01fd0d1355861a410752 at 127.0.0.1:32684 failed: Connection refused
41461:S 03 Oct 2025 00:18:01.562 - Connection with Node e53c62e73a1912b3a926e600a08bd50139ecba85 at 127.0.0.1:32683 failed: Connection refused
41461:S 03 Oct 2025 00:18:01.663 - Connection with Node 7259d703afbedd95ac3e7061030e6ca18539ac51 at 127.0.0.1:32682 failed: Connection refused
41461:S 03 Oct 2025 00:18:01.663 - Connection with Node 67ad6ccba4ffa61b55df01fd0d1355861a410752 at 127.0.0.1:32684 failed: Connection refused
41461:S 03 Oct 2025 00:18:01.663 - Connection with Node e53c62e73a1912b3a926e600a08bd50139ecba85 at 127.0.0.1:32683 failed: Connection refused
41461:S 03 Oct 2025 00:18:01.765 - Connection with Node a2e2120e124856107e40731842c94842658192a4 at 127.0.0.1:32685 failed: Connection refused
41461:S 03 Oct 2025 00:18:01.765 - Connection with Node 7259d703afbedd95ac3e7061030e6ca18539ac51 at 127.0.0.1:32682 failed: Connection refused
41461:S 03 Oct 2025 00:18:01.765 - Connection with Node 67ad6ccba4ffa61b55df01fd0d1355861a410752 at 127.0.0.1:32684 failed: Connection refused
41461:S 03 Oct 2025 00:18:01.765 - Connection with Node e53c62e73a1912b3a926e600a08bd50139ecba85 at 127.0.0.1:32683 failed: Connection refused
41461:S 03 Oct 2025 00:18:01.866 * Connecting to PRIMARY 127.0.0.1:22683
41461:S 03 Oct 2025 00:18:01.871 * PRIMARY <-> REPLICA sync started
41461:S 03 Oct 2025 00:18:01.871 # Error condition on socket for SYNC: Connection refused
41461:S 03 Oct 2025 00:18:01.973 - Connection with Node a2e2120e124856107e40731842c94842658192a4 at 127.0.0.1:32685 failed: Connection refused
41461:S 03 Oct 2025 00:18:01.973 - Connection with Node c4d9ce4d28e32e5f7f74219c2cddc7a122c26295 at 127.0.0.1:32681 failed: Connection refused
41461:S 03 Oct 2025 00:18:01.973 - Connection with Node 7259d703afbedd95ac3e7061030e6ca18539ac51 at 127.0.0.1:32682 failed: Connection refused
41461:S 03 Oct 2025 00:18:01.973 - Connection with Node 67ad6ccba4ffa61b55df01fd0d1355861a410752 at 127.0.0.1:32684 failed: Connection refused
41461:S 03 Oct 2025 00:18:01.973 - Connection with Node e53c62e73a1912b3a926e600a08bd50139ecba85 at 127.0.0.1:32683 failed: Connection refused
41461:S 03 Oct 2025 00:18:02.178 - Connection with Node c45d56ab32fa1e5489b5acfb5a34e83182de37ec at 127.0.0.1:32680 failed: Connection refused
41461:S 03 Oct 2025 00:18:02.178 - Connection with Node a2e2120e124856107e40731842c94842658192a4 at 127.0.0.1:32685 failed: Connection refused
41461:S 03 Oct 2025 00:18:02.178 - Connection with Node c4d9ce4d28e32e5f7f74219c2cddc7a122c26295 at 127.0.0.1:32681 failed: Connection refused
41461:S 03 Oct 2025 00:18:02.178 - Connection with Node 7259d703afbedd95ac3e7061030e6ca18539ac51 at 127.0.0.1:32682 failed: Connection refused
41461:S 03 Oct 2025 00:18:02.178 - Connection with Node 67ad6ccba4ffa61b55df01fd0d1355861a410752 at 127.0.0.1:32684 failed: Connection refused
41461:S 03 Oct 2025 00:18:02.178 - Connection with Node e53c62e73a1912b3a926e600a08bd50139ecba85 at 127.0.0.1:32683 failed: Connection refused
41461:S 03 Oct 2025 00:18:02.380 - Connection with Node c45d56ab32fa1e5489b5acfb5a34e83182de37ec at 127.0.0.1:32680 failed: Connection refused
41461:S 03 Oct 2025 00:18:02.380 - Connection with Node a2e2120e124856107e40731842c94842658192a4 at 127.0.0.1:32685 failed: Connection refused
41461:S 03 Oct 2025 00:18:02.380 - Connection with Node dc713e69651e957f21e3e44af1c33cd06439f81f at 127.0.0.1:32679 failed: Connection refused
41461:S 03 Oct 2025 00:18:02.380 - Connection with Node c4d9ce4d28e32e5f7f74219c2cddc7a122c26295 at 127.0.0.1:32681 failed: Connection refused
41461:S 03 Oct 2025 00:18:02.380 - Connection with Node 7259d703afbedd95ac3e7061030e6ca18539ac51 at 127.0.0.1:32682 failed: Connection refused
41461:S 03 Oct 2025 00:18:02.380 - Connection with Node 67ad6ccba4ffa61b55df01fd0d1355861a410752 at 127.0.0.1:32684 failed: Connection refused
41461:S 03 Oct 2025 00:18:02.380 - Connection with Node e53c62e73a1912b3a926e600a08bd50139ecba85 at 127.0.0.1:32683 failed: Connection refused
===== End of server log (pid 41461) =====
Summary
The failover test seems to be flakily failing over the past few days.
Failing test(s)
Make sure the replicas always get the different ranksintests/unit/cluster/failover.tclError stack trace
Sharing part of the stack trace where the test failed, otherwise it's too huge.