Skip to content

Stabilize dual replication test to avoid getting LOADING error#1288

Merged
enjoy-binbin merged 2 commits into
valkey-io:unstablefrom
enjoy-binbin:fix_test
Nov 11, 2024
Merged

Stabilize dual replication test to avoid getting LOADING error#1288
enjoy-binbin merged 2 commits into
valkey-io:unstablefrom
enjoy-binbin:fix_test

Conversation

@enjoy-binbin

Copy link
Copy Markdown
Member

When doing $replica replicaof no one, we may get a LOADING
error, this is because during the test execution, the replica
may reconnect very quickly, and the full sync is initiated,
and the replica has entered the LOADING state.

In this commit, we make sure the primary is pasued after the
fork, so the replica won't enter the LOADING state, and with
this fix, this test seems more natural and predictable.

When doing `$replica replicaof no one`, we may get a LOADING
error, this is because during the test execution, the replica
may reconnect very quickly, and the full sync is initiated,
and the replica has entered the LOADING state.

In this commit, we make sure the primary is pasued after the
fork, so the replica won't enter the LOADING state, and with
this fix, this test seems more natural and predictable.

Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin

enjoy-binbin commented Nov 11, 2024

Copy link
Copy Markdown
Member Author

@codecov

codecov Bot commented Nov 11, 2024

Copy link
Copy Markdown

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.70%. Comparing base (45d596e) to head (5afd936).
Report is 3 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1288      +/-   ##
============================================
+ Coverage     70.56%   70.70%   +0.13%     
============================================
  Files           114      114              
  Lines         63147    63154       +7     
============================================
+ Hits          44560    44653      +93     
+ Misses        18587    18501      -86     

see 17 files with indirect coverage changes

@naglera

naglera commented Nov 11, 2024

Copy link
Copy Markdown
Contributor

LGTM.
I previously believed that the replicaof command would be processed even during RDB loading. However, upon examining the code more closely, I see that it is actually disallowed when the loading process is asynchronous, such as in the case of dual-channel synchronization.

@zuiderkwast zuiderkwast left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. It's a complicated test case, but it seems we need this pause to stabilize it.

Comment thread tests/integration/dual-channel-replication.tcl Outdated
@zuiderkwast zuiderkwast changed the title Stable dual replication test to avoid getting LOADING error Stabilize dual replication test to avoid getting LOADING error Nov 11, 2024
Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin enjoy-binbin merged commit 4aacffa into valkey-io:unstable Nov 11, 2024
@enjoy-binbin enjoy-binbin deleted the fix_test branch November 11, 2024 13:42
@enjoy-binbin

enjoy-binbin commented Nov 12, 2024

Copy link
Copy Markdown
Member Author

@naglera https://github.com/valkey-io/valkey/actions/runs/11788187416/job/32834861441#step:6:6097

Today we have a CI failure, it look like i introduced it.

The reason is that, since i added a pause, the replica will reconnect, and the rdb channel is connected. And then the rdb child is drop, and we will postpone the rdb client.

And then in here, we will wait a client_output_buffer_limit_disconnections, and we got an unexpected rdb client, the previous one, so the test fail.

            wait_for_condition 500 1000 {
                [s -1 client_output_buffer_limit_disconnections] > $cur_client_closed_count
            } else {
                fail "Primary should disconnect replica due to COB overrun"
            }

i was wondering in here, in this case, should we still want to postpone the rdb client? In this case, it seems like we no longer need the rdb client since the rdb child is dead.

void updateReplicasWaitingBgsave(int bgsaveerr, int type) {
    listNode *ln;
    listIter li;

    /* Note: there's a chance we got here from within the REPLCONF ACK command
     * so we must avoid using freeClient, otherwise we'll crash on our way up. */

    listRewind(server.replicas, &li);
    while ((ln = listNext(&li))) {
        client *replica = ln->value;

        if (replica->repl_state == REPLICA_STATE_WAIT_BGSAVE_END) {
            int repldbfd;
            struct valkey_stat buf;

            if (bgsaveerr != C_OK) {
+                if (replica->flag.protected_rdb_channel) {
+                    replica->flag.protected_rdb_channel = 0;
+                    // removeReplicaFromPsyncWait(replica);
+                    serverLog(xxx)
+                }
                freeClientAsync(replica);
                serverLog(LL_WARNING, "SYNC failed. BGSAVE child returned an error");
                continue;
            }

i ran a daily test with --single and --loops 10, the test passed, so the fix seems solid. daily test: https://github.com/enjoy-binbin/valkey/actions/runs/11810068690

zvi-code pushed a commit to zvi-code/valkey-z that referenced this pull request Nov 13, 2024
…y-io#1288)

When doing `$replica replicaof no one`, we may get a LOADING
error, this is because during the test execution, the replica
may reconnect very quickly, and the full sync is initiated,
and the replica has entered the LOADING state.

In this commit, we make sure the primary is pasued after the
fork, so the replica won't enter the LOADING state, and with
this fix, this test seems more natural and predictable.

Signed-off-by: Binbin <binloveplay1314@qq.com>
@artikell

Copy link
Copy Markdown
Contributor

@enjoy-binbin This problem occurs again, maybe you can take a look.

https://github.com/artikell/valkey/actions/runs/14535230702/job/4078222933

zuiderkwast pushed a commit that referenced this pull request Apr 20, 2025
Supplement the repair of #1288,
need to wait for the Loading to complete before executing
`$replica replicaof no one`.

Signed-off-by: artikell <739609084@qq.com>
nitaicaro pushed a commit to nitaicaro/valkey that referenced this pull request Apr 22, 2025
…y-io#1979)

Supplement the repair of valkey-io#1288,
need to wait for the Loading to complete before executing
`$replica replicaof no one`.

Signed-off-by: artikell <739609084@qq.com>
Signed-off-by: Nitai Caro <caronita@amazon.com>
nitaicaro pushed a commit to nitaicaro/valkey that referenced this pull request Apr 22, 2025
…y-io#1979)

Supplement the repair of valkey-io#1288,
need to wait for the Loading to complete before executing
`$replica replicaof no one`.

Signed-off-by: artikell <739609084@qq.com>
nitaicaro pushed a commit to nitaicaro/valkey that referenced this pull request Apr 22, 2025
…y-io#1979)

Supplement the repair of valkey-io#1288,
need to wait for the Loading to complete before executing
`$replica replicaof no one`.

Signed-off-by: artikell <739609084@qq.com>
hwware pushed a commit to wuranxx/valkey that referenced this pull request Apr 24, 2025
…y-io#1979)

Supplement the repair of valkey-io#1288,
need to wait for the Loading to complete before executing
`$replica replicaof no one`.

Signed-off-by: artikell <739609084@qq.com>
Signed-off-by: hwware <wen.hui.ware@gmail.com>
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request Apr 24, 2026
…y-io#1288)

When doing `$replica replicaof no one`, we may get a LOADING
error, this is because during the test execution, the replica
may reconnect very quickly, and the full sync is initiated,
and the replica has entered the LOADING state.

In this commit, we make sure the primary is pasued after the
fork, so the replica won't enter the LOADING state, and with
this fix, this test seems more natural and predictable.

Signed-off-by: Binbin <binloveplay1314@qq.com>
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request Apr 27, 2026
…y-io#1288)

When doing `$replica replicaof no one`, we may get a LOADING
error, this is because during the test execution, the replica
may reconnect very quickly, and the full sync is initiated,
and the replica has entered the LOADING state.

In this commit, we make sure the primary is pasued after the
fork, so the replica won't enter the LOADING state, and with
this fix, this test seems more natural and predictable.

Signed-off-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit 82d551d)
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request Apr 29, 2026
…y-io#1979)

Supplement the repair of valkey-io#1288,
need to wait for the Loading to complete before executing
`$replica replicaof no one`.

Signed-off-by: artikell <739609084@qq.com>
(cherry picked from commit 0b94ca6)
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request May 3, 2026
…y-io#1288)

When doing `$replica replicaof no one`, we may get a LOADING
error, this is because during the test execution, the replica
may reconnect very quickly, and the full sync is initiated,
and the replica has entered the LOADING state.

In this commit, we make sure the primary is pasued after the
fork, so the replica won't enter the LOADING state, and with
this fix, this test seems more natural and predictable.

Signed-off-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit 82d551d)
madolson pushed a commit that referenced this pull request May 6, 2026
When doing `$replica replicaof no one`, we may get a LOADING
error, this is because during the test execution, the replica
may reconnect very quickly, and the full sync is initiated,
and the replica has entered the LOADING state.

In this commit, we make sure the primary is pasued after the
fork, so the replica won't enter the LOADING state, and with
this fix, this test seems more natural and predictable.

Signed-off-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit 82d551d)
enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this pull request May 25, 2026
…#1979)

Supplement the repair of valkey-io/valkey#1288,
need to wait for the Loading to complete before executing
`$replica replicaof no one`.

Signed-off-by: artikell <739609084@qq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants