Fix timing issue in cluster manual-takeover test by enjoy-binbin · Pull Request #3826 · valkey-io/valkey

enjoy-binbin · 2026-05-25T09:49:05Z

Tests sometimes fail here. If we add after 1000 inside the foreach
loop, it becomes very easy to reproduce.

test "Use takeover to bring slaves back" {
    foreach id $replica_ids {
        R $id cluster failover takeover
    }
}

[exception]: Executing test client: ERR You should send CLUSTER FAILOVER to a replica.

The reason is that the transition from the PFAIL state to the FAIL
state which requires a quorum, may take a little extra time. From a
node's own perspective, it detects PFAIL and consequently views the
cluster as being in a FAIL state; and then, once the replica-no-failover
setting is disabled, auto failover may kick in, thereby causing the
test to fail.

Since we are verifying manual failover takeover, it is best to keep
cluster-replica-no-failover disabled at all times.

Tests sometimes fail here. If we add `after 1000` inside the foreach loop, it becomes very easy to reproduce. ``` test "Use takeover to bring slaves back" { foreach id $replica_ids { R $id cluster failover takeover } } [exception]: Executing test client: ERR You should send CLUSTER FAILOVER to a replica. ``` The reason is that the transition from the PFAIL state to the FAIL state which requires a quorum, may take a little extra time. From a node's own perspective, it detects PFAIL and consequently views the cluster as being in a FAIL state; and then, once the replica-no-failover setting is disabled, auto failover may kick in, thereby causing the test to fail. Since we are verifying manual failover takeover, it is best to keep `cluster-replica-no-failover` disabled at all times. Signed-off-by: Binbin <binloveplay1314@qq.com>

coderabbitai · 2026-05-25T09:49:21Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: c2bea79c-8d40-4a96-8edc-346c789b5212

📥 Commits

Reviewing files that changed from the base of the PR and between 42765b2 and 031414d.

📒 Files selected for processing (1)

tests/unit/cluster/manual-takeover.tcl

💤 Files with no reviewable changes (1)

tests/unit/cluster/manual-takeover.tcl

📝 Walkthrough

Walkthrough

The PR removes replica failover reset logic from the manual takeover cluster test. Lines 29–30, which re-enabled replica failover on replicas 5, 6, 7 after pausing/terminating the majority of masters, are deleted. The test now proceeds from the kill phase directly to cluster-down assertions with replica failover disabled throughout.

Changes

Test Flow Modification

Layer / File(s)	Summary
Replica failover reset removal `tests/unit/cluster/manual-takeover.tcl`	Removed the loop resetting `cluster-replica-no-failover` back to `no` for replicas 5, 6, 7. The cluster-down phase now runs with replica failover still disabled after the majority-kill step.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Suggested reviewers

zuiderkwast
murphyjacob4

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Fix timing issue in cluster manual-takeover test' accurately and concisely describes the primary change: addressing a timing issue in a test file.
Description check	✅ Passed	The description is directly related to the changeset, clearly explaining the timing issue, root cause, and the reasoning for removing the replica failover re-enablement logic.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-05-25T10:12:18Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.71%. Comparing base (42765b2) to head (031414d).

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #3826      +/-   ##
============================================
- Coverage     76.94%   76.71%   -0.24%     
============================================
  Files           162      162              
  Lines         80704    80704              
============================================
- Hits          62100    61911     -189     
- Misses        18604    18793     +189

see 23 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

zuiderkwast · 2026-05-25T10:31:31Z

Thanks!

enjoy-binbin requested a review from zuiderkwast May 25, 2026 09:49

github-actions Bot assigned enjoy-binbin May 25, 2026

This was referenced May 25, 2026

Trigger failover check ASAP when cluster-replica-no-failover is disabled #3827

Merged

Enable multi-command parsing for replicated clients #3597

Merged

zuiderkwast approved these changes May 25, 2026

View reviewed changes

enjoy-binbin merged commit 68ed258 into valkey-io:unstable May 25, 2026
64 checks passed

enjoy-binbin deleted the fix_test branch May 25, 2026 11:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix timing issue in cluster manual-takeover test#3826

Fix timing issue in cluster manual-takeover test#3826
enjoy-binbin merged 1 commit into
valkey-io:unstablefrom
enjoy-binbin:fix_test

enjoy-binbin commented May 25, 2026

Uh oh!

coderabbitai Bot commented May 25, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Uh oh!

codecov Bot commented May 25, 2026

Uh oh!

zuiderkwast commented May 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

enjoy-binbin commented May 25, 2026

Uh oh!

coderabbitai Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Uh oh!

codecov Bot commented May 25, 2026

Codecov Report

Uh oh!

zuiderkwast commented May 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot commented May 25, 2026 •

edited

Loading