
[DEFLAKE] Deflake replica selection test by relaxing cluster configurations #3261

Merged
hpatro merged 2 commits into valkey-io:unstable from rainsupreme:deflake-replica-selection
Mar 12, 2026

@rainsupreme rainsupreme commented Feb 26, 2026

Contributor

The "New Master down consecutively" test was sometimes failing under Valgrind by timing out. The new overrides match those used for the cluster in the first part of the file - see #2672

Under Valgrind's 10-20x slowdown, a single failover requiring ~15 seconds of server time can exceed the test's 100-second wall-clock wait.
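
The arithmetic behind that estimate can be sketched as follows. This is a rough illustration, not part of the change: it assumes the slowdown scales server time linearly, and the constant and function names are hypothetical. Only the figures (about 15 seconds, 10-20x, 100 seconds) come from the description above.

```python
# Rough wall-clock estimate for a single failover under Valgrind.
# Figures are taken from the PR description; the linear scaling model
# and all names here are illustrative assumptions.
SERVER_SECONDS_PER_FAILOVER = 15  # "~15 seconds of server time"
TEST_WAIT_SECONDS = 100           # the test's wall-clock wait
SLOWDOWN_RANGE = (10, 20)         # Valgrind's quoted 10-20x slowdown


def wall_clock_seconds(server_seconds: float, slowdown: float) -> float:
    """Scale server-side time by a slowdown factor (simple linear model)."""
    return server_seconds * slowdown


def exceeds_wait(server_seconds: float, slowdown: float,
                 wait_seconds: float = TEST_WAIT_SECONDS) -> bool:
    """Return True if the estimated wall-clock time is longer than the wait."""
    return wall_clock_seconds(server_seconds, slowdown) > wait_seconds


if __name__ == "__main__":
    for slowdown in SLOWDOWN_RANGE:
        estimate = wall_clock_seconds(SERVER_SECONDS_PER_FAILOVER, slowdown)
        verdict = "exceeds" if exceeds_wait(SERVER_SECONDS_PER_FAILOVER, slowdown) else "fits within"
        print(f"{slowdown}x slowdown: ~{estimate:.0f}s, {verdict} the {TEST_WAIT_SECONDS}s wait")
```

Under this model both ends of the quoted range land above the 100-second wait. Real slowdown varies by code path, which would be consistent with the test failing only some of the time.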

Error text:

*** [err]: New Master down consecutively in tests/unit/cluster/slave-selection.tcl
No failover detected when master 12 fails

Daily run failure: https://github.com/valkey-io/valkey/actions/runs/22421982161/job/64921545936#logs

Signed-off-by: Rain Valentine <rsg000@gmail.com>
Comment thread tests/unit/cluster/slave-selection.tcl
Signed-off-by: Rain Valentine <rainval@amazon.com>

@hpatro hpatro left a comment

Contributor


LGTM.

@sarthakaggarwal97 Do we have a mechanism to discover when a test started being flaky? A performance regression is easier to spot with the perf dashboard; however, a behavior regression (failover time) is difficult to keep track of. Good to think a bit more on this.

@hpatro hpatro merged commit 5133023 into valkey-io:unstable Mar 12, 2026
56 checks passed

hpatro commented Mar 12, 2026

Contributor

Thank you @rainsupreme!


codecov Bot commented Mar 12, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 0.00%. Comparing base (c471364) to head (d6d8d72).
⚠️ Report is 50 commits behind head on unstable.


JimB123 pushed a commit that referenced this pull request Mar 19, 2026
…ations (#3261)

The "New Master down consecutively" test was sometimes failing under
Valgrind by timing out. The new overrides match those used for the
cluster in the first part of the file - see #2672

Under Valgrind's 10-20x slowdown, a single failover requiring ~15
seconds of server time can exceed the test's 100-second wall-clock wait.

Error text:
```
*** [err]: New Master down consecutively in tests/unit/cluster/slave-selection.tcl
No failover detected when master 12 fails
```

Daily run failure:
https://github.com/valkey-io/valkey/actions/runs/22421982161/job/64921545936#logs

---------

Signed-off-by: Rain Valentine <rsg000@gmail.com>
Signed-off-by: Rain Valentine <rainval@amazon.com>
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request Apr 23, 2026
…ations (valkey-io#3261)

sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request Apr 27, 2026
…ations (valkey-io#3261)

(cherry picked from commit 09a13ec)
3 participants