Fix chained replica crash when doing dual channel replication #2983

Merged: enjoy-binbin merged 3 commits into valkey-io:unstable from enjoy-binbin:dual_channel_fix on Dec 29, 2025

@enjoy-binbin

Member

There is a crash in freeReplicationBacklog:
```
Discarding previously cached primary state.
ASSERTION FAILED
'listLength(server.replicas) == 0' is not true
freeReplicationBacklog
```

The cause is that during dual channel replication, the RDB channel client is
protected from being freed. In the chained replica case, `disconnectReplicas`
is called to disconnect all replica clients, but because the RDB channel is
protected, `freeClient` does not actually free that replica client. As a
result, `server.replicas` is still non-empty when `freeReplicationBacklog`
runs, and its `listLength(server.replicas) == 0` assertion fails.
```
void replicationAttachToNewPrimary(void) {
    /* Replica starts to apply data from new primary, we must discard the cached
     * primary structure. */
    serverAssert(server.primary == NULL);
    replicationDiscardCachedPrimary();

    /* Cancel any in progress imports (we will now use the primary's) */
    clusterCleanSlotImportsOnFullSync();

    disconnectReplicas();     /* Force our replicas to resync with us as well. */
    freeReplicationBacklog(); /* Don't allow our chained replicas to PSYNC. */
}
```
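
The failure mode described above can be reproduced in a small toy model. This is only a sketch: every type and function name below (`toy_client`, `toy_server`, `toy_free_client`, and so on) is hypothetical and stands in for the real Valkey structures, and it does not show the change made in this PR. It only illustrates how a free routine that skips protected clients leaves the replica list non-empty after a "disconnect all" pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not Valkey source: all names here are hypothetical. */
#define MAX_REPLICAS 8

typedef struct {
    int id;
    bool is_protected; /* stands in for the protected RDB channel client */
} toy_client;

typedef struct {
    toy_client *replicas[MAX_REPLICAS];
    size_t num_replicas;
} toy_server;

/* Append a client to the replica list. */
void toy_add_replica(toy_server *s, toy_client *c) {
    assert(s->num_replicas < MAX_REPLICAS);
    s->replicas[s->num_replicas++] = c;
}

/* Models a free routine that declines to free protected clients.
 * Returns true only if the client was removed from the replica list. */
bool toy_free_client(toy_server *s, toy_client *c) {
    if (c->is_protected) return false; /* free is postponed, client stays listed */
    for (size_t i = 0; i < s->num_replicas; i++) {
        if (s->replicas[i] == c) {
            /* Swap-remove: move the last entry into the freed slot. */
            s->replicas[i] = s->replicas[--s->num_replicas];
            return true;
        }
    }
    return false;
}

/* Models "disconnect all replicas": tries to free each client once. */
void toy_disconnect_replicas(toy_server *s) {
    size_t i = 0;
    while (i < s->num_replicas) {
        /* On success the slot now holds a different client, so re-check it;
         * on failure the client survived, so move past it. */
        if (!toy_free_client(s, s->replicas[i])) i++;
    }
}

/* The condition the backlog-free step asserts on in the report above. */
bool toy_backlog_free_precondition(const toy_server *s) {
    return s->num_replicas == 0;
}
```

In this model, a server holding one ordinary client and one protected client still has one entry in its list after `toy_disconnect_replicas`, so `toy_backlog_free_precondition` returns false. That is the state in which the real assertion fires.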

Dual channel replication was introduced in valkey-io#60.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Comment thread src/networking.c
@enjoy-binbin

Member Author

The crash report:

### Starting server for test Chained replicas does not assert when using dual channel replication in tests/integration/dual-channel-replication.tcl
12392:M 26 Dec 2025 19:02:57.215 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo
12392:M 26 Dec 2025 19:02:57.215 * Valkey version=255.255.255, bits=64, commit=3d7f4c82, modified=1, pid=12392, just started
12392:M 26 Dec 2025 19:02:57.215 * Configuration loaded
12392:M 26 Dec 2025 19:02:57.216 * monotonic clock: POSIX clock_gettime
12392:M 26 Dec 2025 19:02:57.217 # Failed to write PID file: Permission denied
                .+^+.                                                
            .+#########+.                                            
        .+########+########+.           Valkey 255.255.255 (3d7f4c82/1) 64 bit
    .+########+'     '+########+.                                    
 .########+'     .+.     '+########.    Running in standalone mode
 |####+'     .+#######+.     '+####|    Port: 21111
 |###|   .+###############+.   |###|    PID: 12392                     
 |###|   |#####*'' ''*#####|   |###|                                 
 |###|   |####'  .-.  '####|   |###|                                 
 |###|   |###(  (@@@)  )###|   |###|          https://valkey.io      
 |###|   |####.  '-'  .####|   |###|                                 
 |###|   |#####*.   .*#####|   |###|                                 
 |###|   '+#####|   |#####+'   |###|                                 
 |####+.     +##|   |#+'     .+####|                                 
 '#######+   |##|        .+########'                                 
    '+###|   |##|    .+########+'                                    
        '|   |####+########+'                                        
             +#########+'                                            
                '+v+'                                                

12392:M 26 Dec 2025 19:02:57.217 # WARNING: The TCP backlog setting of 511 cannot be enforced because kern.ipc.somaxconn is set to the lower value of 128.
12392:M 26 Dec 2025 19:02:57.223 * Module 'lua' loaded from libvalkeylua.so
12392:M 26 Dec 2025 19:02:57.223 * Server initialized
12392:M 26 Dec 2025 19:02:57.223 * Ready to accept connections tcp
12392:M 26 Dec 2025 19:02:57.223 * Ready to accept connections unix
12392:M 26 Dec 2025 19:02:57.284 - Accepted 127.0.0.1:57191
12392:M 26 Dec 2025 19:02:57.284 - Client closed connection id=3 addr=127.0.0.1:57191 laddr=127.0.0.1:21111 fd=14 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=16890 argv-mem=0 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=33856 events=r cmd=ping user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=7 tot-net-out=7 tot-cmds=1
12392:M 26 Dec 2025 19:02:57.302 - Accepted 127.0.0.1:57192
### Starting test Psync established after rdb load - within grace period in tests/integration/dual-channel-replication.tcl
12392:M 26 Dec 2025 19:02:57.503 - Accepted 127.0.0.1:57195
12392:M 26 Dec 2025 19:02:57.503 * Replica 127.0.0.1:21112 asks for synchronization
12392:M 26 Dec 2025 19:02:57.503 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'b40e376f30a78a2f349b444a28d6aa16a9556124', my replication IDs are '61d4bf934b6aad3aa12823527d5fbbaaf2866b60' and '0000000000000000000000000000000000000000')
12392:M 26 Dec 2025 19:02:57.503 * Dual channel replication: Replica 127.0.0.1:21112 is capable of dual channel synchronization, and partial sync isn't possible. Full sync will continue with dedicated RDB channel.
12392:M 26 Dec 2025 19:02:57.504 - Accepted 127.0.0.1:57196
12392:M 26 Dec 2025 19:02:57.505 * Replica 127.0.0.1:21112 asks for synchronization
12392:M 26 Dec 2025 19:02:57.505 * Replication backlog created, my new replication IDs are '460b2a5549de639e8b2cd4284d6e29fb5ffce744' and '0000000000000000000000000000000000000000'
12392:M 26 Dec 2025 19:02:57.505 * Starting BGSAVE for SYNC with target: replicas sockets using: dual-channel
12392:M 26 Dec 2025 19:02:57.505 * Dual channel replication: Sending to replica 127.0.0.1:21112 RDB end offset 0 and client-id 6
12392:M 26 Dec 2025 19:02:57.506 * Background RDB transfer started by pid 12490 to direct socket to replica
12392:M 26 Dec 2025 19:02:57.506 * Process is about to stop.
12490:C 26 Dec 2025 19:02:57.507 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
12490:C 26 Dec 2025 19:02:57.507 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
12392:M 26 Dec 2025 19:02:58.465 * Process has been continued.
12392:M 26 Dec 2025 19:02:58.465 * Background RDB transfer terminated with success
12392:M 26 Dec 2025 19:02:58.465 * Streamed RDB transfer with replica 127.0.0.1:21112 succeeded (socket). Waiting for REPLCONF ACK from replica to enable streaming
12392:M 26 Dec 2025 19:02:58.465 * RDB transfer completed, rdb only replica (127.0.0.1:21112) should be disconnected asap
12392:M 26 Dec 2025 19:02:58.465 - Dual channel replication: Postpone RDB client id=6 (127.0.0.1:21112) free for 60 seconds
12392:S 26 Dec 2025 19:02:58.654 * Before turning into a replica, using my own primary parameters to synthesize a cached primary: I may be able to synchronize with the new primary with just a partial transfer.
12392:S 26 Dec 2025 19:02:58.654 * Connecting to PRIMARY 127.0.0.1:21113
12392:S 26 Dec 2025 19:02:58.654 * PRIMARY <-> REPLICA sync started
12392:S 26 Dec 2025 19:02:58.654 * REPLICAOF 127.0.0.1:21113 enabled (user request from 'id=4 addr=127.0.0.1:57192 laddr=127.0.0.1:21111 fd=14 name= user=default lib-name= lib-ver=')
12392:S 26 Dec 2025 19:02:58.655 * Non blocking connect for SYNC fired the event.
12392:S 26 Dec 2025 19:02:58.655 * Primary replied to PING, replication can continue...
12392:S 26 Dec 2025 19:02:58.655 * Trying a partial resynchronization (request 460b2a5549de639e8b2cd4284d6e29fb5ffce744:1).
12392:S 26 Dec 2025 19:02:58.655 * Full resync from primary: addef143959ae47e324a2b71198ddadfe68581d5:0
12392:S 26 Dec 2025 19:02:58.656 * Replica main thread creating Bio thread to save RDB to disk
12392:S 26 Dec 2025 19:02:58.657 * Replica bio thread: PRIMARY <-> REPLICA sync: receiving streamed RDB from primary with EOF to disk
12392:S 26 Dec 2025 19:02:58.657 * Replica bio thread: Done downloading RDB
12392:S 26 Dec 2025 19:02:58.669 - Client closed connection id=5 addr=127.0.0.1:57195 laddr=127.0.0.1:21111 fd=15 name= age=1 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=16890 argv-mem=0 multi-mem=0 rbs=1024 rbp=61 obl=0 oll=0 omem=0 tot-mem=18496 events=r cmd=psync user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=388 tot-net-out=88 tot-cmds=7
12392:S 26 Dec 2025 19:02:58.669 - Accepted 127.0.0.1:57204
12392:S 26 Dec 2025 19:02:58.670 - Client closed connection id=8 addr=127.0.0.1:57204 laddr=127.0.0.1:21111 fd=15 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=16890 argv-mem=0 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=33856 events=r cmd=psync user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=231 tot-net-out=83 tot-cmds=5
12392:S 26 Dec 2025 19:02:59.170 * Replica main thread detected RDB download completion in Bio thread
12392:S 26 Dec 2025 19:02:59.170 * Loading the RDB and finalizing primary-replica sync...
12392:S 26 Dec 2025 19:02:59.178 * Discarding previously cached primary state.


=== VALKEY BUG REPORT START: Cut & paste starting from here ===
12392:S 26 Dec 2025 19:02:59.178 # === ASSERTION FAILED ===
12392:S 26 Dec 2025 19:02:59.178 # ==> replication.c:160 'listLength(server.replicas) == 0' is not true

------ STACK TRACE ------

Backtrace:
0   valkey-server                       0x000000010281f70c freeReplicationBacklog + 444
1   valkey-server                       0x00000001027d456c serverCron + 6256
2   valkey-server                       0x00000001027c46d0 aeProcessEvents + 688
3   valkey-server                       0x00000001027ee264 main + 19252
4   dyld                                0x00000001898bf154 start + 2476

@codecov

codecov Bot commented Dec 26, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.91%. Comparing base (7385586) to head (b5b6b55).
⚠️ Report is 6 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2983      +/-   ##
============================================
+ Coverage     73.79%   73.91%   +0.12%     
============================================
  Files           125      125              
  Lines         69345    69349       +4     
============================================
+ Hits          51173    51260      +87     
+ Misses        18172    18089      -83     
Files with missing lines Coverage Δ
src/networking.c 88.34% <100.00%> (+<0.01%) ⬆️

... and 19 files with indirect coverage changes

@zuiderkwast zuiderkwast left a comment

Contributor

The fix looks safe to me.

Comment thread tests/integration/dual-channel-replication.tcl Outdated
Comment thread tests/integration/dual-channel-replication.tcl Outdated
Signed-off-by: Binbin <binloveplay1314@qq.com>

@naglera naglera left a comment

Contributor

LGTM thanks for the fix!

Comment thread tests/integration/dual-channel-replication.tcl Outdated
Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin enjoy-binbin added the run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) label Dec 28, 2025
@github-actions github-actions Bot removed the run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) label Dec 28, 2025
@enjoy-binbin enjoy-binbin merged commit 9c5d004 into valkey-io:unstable Dec 29, 2025
24 checks passed
@github-project-automation github-project-automation Bot moved this to To be backported in Valkey 8.0 Dec 29, 2025
@github-project-automation github-project-automation Bot moved this to To be backported in Valkey 8.1 Dec 29, 2025
@github-project-automation github-project-automation Bot moved this to To be backported in Valkey 9.0 Dec 29, 2025
@enjoy-binbin enjoy-binbin deleted the dual_channel_fix branch December 29, 2025 02:20
@enjoy-binbin enjoy-binbin added bug Something isn't working release-notes This issue should get a line item in the release notes labels Dec 29, 2025
jdheyburn pushed a commit to jdheyburn/valkey that referenced this pull request Jan 8, 2026
zuiderkwast pushed a commit to zuiderkwast/valkey that referenced this pull request Jan 29, 2026
roshkhatri pushed a commit to roshkhatri/valkey that referenced this pull request Jan 29, 2026
roshkhatri pushed a commit to roshkhatri/valkey that referenced this pull request Jan 29, 2026
@roshkhatri roshkhatri moved this from To be backported to 8.1.6 in Valkey 8.1 Jan 29, 2026
roshkhatri pushed a commit to roshkhatri/valkey that referenced this pull request Jan 29, 2026
roshkhatri pushed a commit to roshkhatri/valkey that referenced this pull request Jan 29, 2026
roshkhatri pushed a commit to roshkhatri/valkey that referenced this pull request Jan 29, 2026
roshkhatri pushed a commit to roshkhatri/valkey that referenced this pull request Jan 30, 2026
@roshkhatri roshkhatri moved this from To be backported to 8.0.7 in Valkey 8.0 Jan 30, 2026
zuiderkwast pushed a commit to zuiderkwast/valkey that referenced this pull request Jan 30, 2026
@zuiderkwast zuiderkwast moved this from To be backported to 9.0.2 WIP in Valkey 9.0 Jan 30, 2026
zuiderkwast pushed a commit that referenced this pull request Feb 3, 2026
roshkhatri pushed a commit to roshkhatri/valkey that referenced this pull request Feb 4, 2026
roshkhatri pushed a commit to roshkhatri/valkey that referenced this pull request Feb 18, 2026
roshkhatri pushed a commit to roshkhatri/valkey that referenced this pull request Feb 20, 2026
madolson pushed a commit that referenced this pull request Feb 24, 2026
madolson pushed a commit that referenced this pull request Feb 24, 2026
hpatro pushed a commit to hpatro/valkey that referenced this pull request Mar 5, 2026
…-io#2983)
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
lmagomes pushed a commit to lmagomes/home-services that referenced this pull request May 12, 2026
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [docker.io/valkey/valkey](https://github.com/valkey-io/valkey) | image | patch | `9.0.1` → `9.0.4` |

---

### Release Notes

<details>
<summary>valkey-io/valkey (docker.io/valkey/valkey)</summary>

### [`v9.0.4`](https://github.com/valkey-io/valkey/releases/tag/9.0.4)

[Compare Source](valkey-io/valkey@9.0.3...9.0.4)

Upgrade urgency SECURITY: This release includes security fixes we recommend you
apply as soon as possible.

##### Security fixes

- (CVE-2026-23479) Use-After-Free in unblock client flow
- (CVE-2026-25243) Invalid Memory Access in RESTORE command
- (CVE-2026-23631) Use-after-free when full sync occurs during a yielding Lua/function execution

### [`v9.0.3`](https://github.com/valkey-io/valkey/releases/tag/9.0.3)

[Compare Source](valkey-io/valkey@9.0.2...9.0.3)

##### Valkey 9.0.3

Upgrade urgency SECURITY: This release includes security fixes we recommend you
apply as soon as possible.

##### Security fixes

- (CVE-2025-67733) RESP Protocol Injection via Lua error\_reply
- (CVE-2026-21863) Remote DoS with malformed Valkey Cluster bus message
- (CVE-2026-27623) Reset request type after handling empty requests

##### Bug fixes

- Avoids crash during MODULE UNLOAD when ACL rules reference a module command and subcommand ([#3160](valkey-io/valkey#3160))
- Fix server assert on ACL LOAD when current user loses permission to channels ([#3182](valkey-io/valkey#3182))
- Fix bug causing no response flush sometimes when IO threads are busy ([#3205](valkey-io/valkey#3205))

### [`v9.0.2`](https://github.com/valkey-io/valkey/releases/tag/9.0.2)

[Compare Source](valkey-io/valkey@9.0.1...9.0.2)

Upgrade urgency HIGH: There are critical bugs that may affect a subset of users.

#### Bug fixes

- Avoid memory leak of new argv when HEXPIRE commands target only non-existing fields ([#2973](valkey-io/valkey#2973))
- Fix HINCRBY and HINCRBYFLOAT to update volatile key tracking ([#2974](valkey-io/valkey#2974))
- Avoid empty hash object when HSETEX added no fields ([#2998](valkey-io/valkey#2998))
- Fix case-sensitive check for the FNX and FXX arguments in HSETEX ([#3000](valkey-io/valkey#3000))
- Prevent assertion in active expiration job after a hash with volatile fields is overwritten ([#3003](valkey-io/valkey#3003), [#3007](valkey-io/valkey#3007))
- Fix HRANDFIELD to return null response when no field could be found ([#3022](valkey-io/valkey#3022))
- Fix HEXPIRE to not delete items when validation rules fail and expiration is in the past ([#3023](valkey-io/valkey#3023), [#3048](valkey-io/valkey#3048))
- Fix how hash handles overwriting of expired fields ([#3060](valkey-io/valkey#3060))
- HSETEX - Always issue keyspace notifications after validation ([#3001](valkey-io/valkey#3001))
- Make zero a valid TTL for hash fields during import mode and data loading ([#3006](valkey-io/valkey#3006))
- Trigger prepareCommand on argc change in module command filters ([#2945](valkey-io/valkey#2945))
- Restrict TTL from being negative and avoid crash in import-mode ([#2944](valkey-io/valkey#2944))
- Fix chained replica crash when doing dual channel replication ([#2983](valkey-io/valkey#2983))
- Skip slot cache optimization for AOF client to prevent key duplication and data corruption ([#3004](valkey-io/valkey#3004))
- Fix used\_memory\_dataset underflow due to miscalculated used\_memory\_overhead ([#3005](valkey-io/valkey#3005))
- Avoid duplicate calculations of network-bytes-out in slot stats with copy-avoidance ([#3046](valkey-io/valkey#3046))
- Fix XREAD returning error on empty stream with + ID ([#2742](valkey-io/valkey#2742))

#### Performance/Efficiency Improvements

- Track reply bytes in I/O threads if commandlog-reply-larger-than is -1 ([#3086](valkey-io/valkey#3086), [#3126](valkey-io/valkey#3126)).
  This makes it possible to mitigate a performance regression in 9.0.1 caused by the bug fix [#2652](valkey-io/valkey#2652).

**Full Changelog**: <valkey-io/valkey@9.0.1...9.0.2>

</details>

---

This PR has been generated by [Mend Renovate](https://github.com/renovatebot/renovate).

Labels

- bug: Something isn't working
- release-notes: This issue should get a line item in the release notes

Projects

Status: 8.0.7 (WIP)
Status: 8.1.6
Status: 9.0.2
Status: Done

4 participants