Stabilize dual replication test to avoid getting LOADING error#1288
Conversation
When doing `$replica replicaof no one`, we may get a LOADING error, this is because during the test execution, the replica may reconnect very quickly, and the full sync is initiated, and the replica has entered the LOADING state. In this commit, we make sure the primary is pasued after the fork, so the replica won't enter the LOADING state, and with this fix, this test seems more natural and predictable. Signed-off-by: Binbin <binloveplay1314@qq.com>
|
Failed CI link: https://github.com/valkey-io/valkey/actions/runs/11756610760/job/32752932086?pr=1009#step:5:5820. Mentioned in here: #1009 (comment) |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## unstable #1288 +/- ##
============================================
+ Coverage 70.56% 70.70% +0.13%
============================================
Files 114 114
Lines 63147 63154 +7
============================================
+ Hits 44560 44653 +93
+ Misses 18587 18501 -86 |
|
LGTM. |
zuiderkwast
left a comment
There was a problem hiding this comment.
Makes sense. It's a complicated test case, but it seems we need this pause to stabilize it.
Signed-off-by: Binbin <binloveplay1314@qq.com>
|
@naglera https://github.com/valkey-io/valkey/actions/runs/11788187416/job/32834861441#step:6:6097 Today we have a CI failure, it look like i introduced it. The reason is that, since i added a pause, the replica will reconnect, and the rdb channel is connected. And then the rdb child is drop, and we will postpone the rdb client. And then in here, we will wait a client_output_buffer_limit_disconnections, and we got an unexpected rdb client, the previous one, so the test fail. i was wondering in here, in this case, should we still want to postpone the rdb client? In this case, it seems like we no longer need the rdb client since the rdb child is dead. void updateReplicasWaitingBgsave(int bgsaveerr, int type) {
listNode *ln;
listIter li;
/* Note: there's a chance we got here from within the REPLCONF ACK command
* so we must avoid using freeClient, otherwise we'll crash on our way up. */
listRewind(server.replicas, &li);
while ((ln = listNext(&li))) {
client *replica = ln->value;
if (replica->repl_state == REPLICA_STATE_WAIT_BGSAVE_END) {
int repldbfd;
struct valkey_stat buf;
if (bgsaveerr != C_OK) {
+ if (replica->flag.protected_rdb_channel) {
+ replica->flag.protected_rdb_channel = 0;
+ // removeReplicaFromPsyncWait(replica);
+ serverLog(xxx)
+ }
freeClientAsync(replica);
serverLog(LL_WARNING, "SYNC failed. BGSAVE child returned an error");
continue;
}i ran a daily test with --single and --loops 10, the test passed, so the fix seems solid. daily test: https://github.com/enjoy-binbin/valkey/actions/runs/11810068690 |
…y-io#1288) When doing `$replica replicaof no one`, we may get a LOADING error, this is because during the test execution, the replica may reconnect very quickly, and the full sync is initiated, and the replica has entered the LOADING state. In this commit, we make sure the primary is pasued after the fork, so the replica won't enter the LOADING state, and with this fix, this test seems more natural and predictable. Signed-off-by: Binbin <binloveplay1314@qq.com>
|
@enjoy-binbin This problem occurs again, maybe you can take a look. https://github.com/artikell/valkey/actions/runs/14535230702/job/4078222933 |
Supplement the repair of #1288, need to wait for the Loading to complete before executing `$replica replicaof no one`. Signed-off-by: artikell <739609084@qq.com>
…y-io#1979) Supplement the repair of valkey-io#1288, need to wait for the Loading to complete before executing `$replica replicaof no one`. Signed-off-by: artikell <739609084@qq.com> Signed-off-by: Nitai Caro <caronita@amazon.com>
…y-io#1979) Supplement the repair of valkey-io#1288, need to wait for the Loading to complete before executing `$replica replicaof no one`. Signed-off-by: artikell <739609084@qq.com>
…y-io#1979) Supplement the repair of valkey-io#1288, need to wait for the Loading to complete before executing `$replica replicaof no one`. Signed-off-by: artikell <739609084@qq.com>
…y-io#1979) Supplement the repair of valkey-io#1288, need to wait for the Loading to complete before executing `$replica replicaof no one`. Signed-off-by: artikell <739609084@qq.com> Signed-off-by: hwware <wen.hui.ware@gmail.com>
…y-io#1288) When doing `$replica replicaof no one`, we may get a LOADING error, this is because during the test execution, the replica may reconnect very quickly, and the full sync is initiated, and the replica has entered the LOADING state. In this commit, we make sure the primary is pasued after the fork, so the replica won't enter the LOADING state, and with this fix, this test seems more natural and predictable. Signed-off-by: Binbin <binloveplay1314@qq.com>
…y-io#1288) When doing `$replica replicaof no one`, we may get a LOADING error, this is because during the test execution, the replica may reconnect very quickly, and the full sync is initiated, and the replica has entered the LOADING state. In this commit, we make sure the primary is pasued after the fork, so the replica won't enter the LOADING state, and with this fix, this test seems more natural and predictable. Signed-off-by: Binbin <binloveplay1314@qq.com> (cherry picked from commit 82d551d)
…y-io#1979) Supplement the repair of valkey-io#1288, need to wait for the Loading to complete before executing `$replica replicaof no one`. Signed-off-by: artikell <739609084@qq.com> (cherry picked from commit 0b94ca6)
…y-io#1288) When doing `$replica replicaof no one`, we may get a LOADING error, this is because during the test execution, the replica may reconnect very quickly, and the full sync is initiated, and the replica has entered the LOADING state. In this commit, we make sure the primary is pasued after the fork, so the replica won't enter the LOADING state, and with this fix, this test seems more natural and predictable. Signed-off-by: Binbin <binloveplay1314@qq.com> (cherry picked from commit 82d551d)
When doing `$replica replicaof no one`, we may get a LOADING error, this is because during the test execution, the replica may reconnect very quickly, and the full sync is initiated, and the replica has entered the LOADING state. In this commit, we make sure the primary is pasued after the fork, so the replica won't enter the LOADING state, and with this fix, this test seems more natural and predictable. Signed-off-by: Binbin <binloveplay1314@qq.com> (cherry picked from commit 82d551d)
…#1979) Supplement the repair of valkey-io/valkey#1288, need to wait for the Loading to complete before executing `$replica replicaof no one`. Signed-off-by: artikell <739609084@qq.com>
When doing
$replica replicaof no one, we may get a LOADINGerror, this is because during the test execution, the replica
may reconnect very quickly, and the full sync is initiated,
and the replica has entered the LOADING state.
In this commit, we make sure the primary is pasued after the
fork, so the replica won't enter the LOADING state, and with
this fix, this test seems more natural and predictable.