Skip to content

Fix timing issue in cluster manual-takeover test#3826

Merged
enjoy-binbin merged 1 commit into
valkey-io:unstablefrom
enjoy-binbin:fix_test
May 25, 2026
Merged

Fix timing issue in cluster manual-takeover test#3826
enjoy-binbin merged 1 commit into
valkey-io:unstablefrom
enjoy-binbin:fix_test

Conversation

@enjoy-binbin

Copy link
Copy Markdown
Member

Tests sometimes fail here. If we add after 1000 inside the foreach
loop, it becomes very easy to reproduce.

test "Use takeover to bring slaves back" {
    foreach id $replica_ids {
        R $id cluster failover takeover
    }
}

[exception]: Executing test client: ERR You should send CLUSTER FAILOVER to a replica.

The reason is that the transition from the PFAIL state to the FAIL
state which requires a quorum, may take a little extra time. From a
node's own perspective, it detects PFAIL and consequently views the
cluster as being in a FAIL state; and then, once the replica-no-failover
setting is disabled, auto failover may kick in, thereby causing the
test to fail.

Since we are verifying manual failover takeover, it is best to keep
cluster-replica-no-failover disabled at all times.

Tests sometimes fail here. If we add `after 1000` inside the foreach
loop, it becomes very easy to reproduce.
```
test "Use takeover to bring slaves back" {
    foreach id $replica_ids {
        R $id cluster failover takeover
    }
}

[exception]: Executing test client: ERR You should send CLUSTER FAILOVER to a replica.
```

The reason is that the transition from the PFAIL state to the FAIL
state which requires a quorum, may take a little extra time. From a
node's own perspective, it detects PFAIL and consequently views the
cluster as being in a FAIL state; and then, once the replica-no-failover
setting is disabled, auto failover may kick in, thereby causing the
test to fail.

Since we are verifying manual failover takeover, it is best to keep
`cluster-replica-no-failover` disabled at all times.

Signed-off-by: Binbin <binloveplay1314@qq.com>
@coderabbitai

coderabbitai Bot commented May 25, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: c2bea79c-8d40-4a96-8edc-346c789b5212

📥 Commits

Reviewing files that changed from the base of the PR and between 42765b2 and 031414d.

📒 Files selected for processing (1)
  • tests/unit/cluster/manual-takeover.tcl
💤 Files with no reviewable changes (1)
  • tests/unit/cluster/manual-takeover.tcl

📝 Walkthrough

Walkthrough

The PR removes replica failover reset logic from the manual takeover cluster test. Lines 29–30, which re-enabled replica failover on replicas 5, 6, 7 after pausing/terminating the majority of masters, are deleted. The test now proceeds from the kill phase directly to cluster-down assertions with replica failover disabled throughout.

Changes

Test Flow Modification

Layer / File(s) Summary
Replica failover reset removal
tests/unit/cluster/manual-takeover.tcl
Removed the loop resetting cluster-replica-no-failover back to no for replicas 5, 6, 7. The cluster-down phase now runs with replica failover still disabled after the majority-kill step.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Suggested reviewers

  • zuiderkwast
  • murphyjacob4
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Fix timing issue in cluster manual-takeover test' accurately and concisely describes the primary change: addressing a timing issue in a test file.
Description check ✅ Passed The description is directly related to the changeset, clearly explaining the timing issue, root cause, and the reasoning for removing the replica failover re-enablement logic.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov Bot commented May 25, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.71%. Comparing base (42765b2) to head (031414d).

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3826      +/-   ##
============================================
- Coverage     76.94%   76.71%   -0.24%     
============================================
  Files           162      162              
  Lines         80704    80704              
============================================
- Hits          62100    61911     -189     
- Misses        18604    18793     +189     

see 23 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@zuiderkwast

Copy link
Copy Markdown
Contributor

Thanks!

@enjoy-binbin enjoy-binbin merged commit 68ed258 into valkey-io:unstable May 25, 2026
64 checks passed
@enjoy-binbin enjoy-binbin deleted the fix_test branch May 25, 2026 11:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants