Skip to content

Increase the cluster-node-timeout to have longer delay between failover of each shard#3946

Merged
hpatro merged 1 commit into
valkey-io:unstablefrom
enjoy-binbin:fix_test
Jun 10, 2026
Merged

Increase the cluster-node-timeout to have longer delay between failover of each shard#3946
hpatro merged 1 commit into
valkey-io:unstablefrom
enjoy-binbin:fix_test

Conversation

@enjoy-binbin

Copy link
Copy Markdown
Member

We previously attempted to set cluster-node-timeout to 15000 in #2793
but failed. This was because we did not explicitly specify it and relied
on the default value, but start_cluster internally sets it to 3000.

Closes #3932.

…er of each shard

We previously attempted to set `cluster-node-timeout` to 15000 in valkey-io#2793
but failed. This was because we did not explicitly specify it and relied
on the default value, but `start_cluster` internally sets it to 3000.

Closes valkey-io#3932.

Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin enjoy-binbin requested a review from hpatro June 9, 2026 16:06
@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 37e51acb-48df-4e9b-a843-23328894bed6

📥 Commits

Reviewing files that changed from the base of the PR and between 69004ca and 919201d.

📒 Files selected for processing (1)
  • tests/unit/cluster/failover2.tcl

📝 Walkthrough

Walkthrough

The cluster failover test suite configuration was updated to extend the cluster-node-timeout to 15000 milliseconds, alongside the existing cluster-ping-interval 1000 override, to provide additional time for failover operations in the test environment.

Changes

Cluster failover test timeout configuration

Layer / File(s) Summary
Cluster failover test timeout configuration
tests/unit/cluster/failover2.tcl
The start_cluster 7 3 invocation now includes cluster-node-timeout 15000 to extend the node timeout, keeping the existing cluster-ping-interval 1000 and external skip tag.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Suggested reviewers

  • zuiderkwast
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Linked Issues check ❓ Inconclusive The PR partially addresses #3932 by implementing the timeout configuration fix, but the issue also requires investigation and comprehensive testing. Verify that setting cluster-node-timeout to 15000 resolves the test failure and that no additional logging, timing adjustments, or code fixes are required per #3932.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: increasing cluster-node-timeout to create longer delays between failover attempts.
Description check ✅ Passed The description explains the context and reason for the change, referencing previous attempt #2793 and linking to issue #3932.
Out of Scope Changes check ✅ Passed The single line change directly addresses the issue by explicitly setting cluster-node-timeout to 15000 in the test configuration.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov Bot commented Jun 9, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.69%. Comparing base (d9a80af) to head (919201d).
⚠️ Report is 2 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3946      +/-   ##
============================================
+ Coverage     76.64%   76.69%   +0.04%     
============================================
  Files           162      162              
  Lines         80733    80733              
============================================
+ Hits          61880    61920      +40     
+ Misses        18853    18813      -40     

see 20 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@sarthakaggarwal97 sarthakaggarwal97 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this would work. I was about to raise a PR to fix this today. I tested the fix I was working on, 200-300 times and it passes. I am confident your change would pass too.

@enjoy-binbin

Copy link
Copy Markdown
Member Author

@sarthakaggarwal97 thanks for the verify, 5000 was actually the value we used at the first time, #2793 changed it to 3000 by accident, i was surprised that it managed to pass so many times.

@hpatro hpatro merged commit f769037 into valkey-io:unstable Jun 10, 2026
64 checks passed
@enjoy-binbin enjoy-binbin deleted the fix_test branch June 10, 2026 11:44
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request Jun 22, 2026
…er of each shard (valkey-io#3946)

We previously attempted to set `cluster-node-timeout` to 15000 in valkey-io#2793
but failed. This was because we did not explicitly specify it and relied
on the default value, but `start_cluster` internally sets it to 3000.

Closes valkey-io#3932.

Signed-off-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit f769037)
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request Jun 22, 2026
sarthakaggarwal97 added a commit to sarthakaggarwal97/valkey that referenced this pull request Jun 22, 2026
…er of each shard (valkey-io#3946) (#314)

We previously attempted to set `cluster-node-timeout` to 15000 in valkey-io#2793
but failed. This was because we did not explicitly specify it and relied
on the default value, but `start_cluster` internally sets it to 3000.

Closes valkey-io#3932.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
sarthakaggarwal97 added a commit to sarthakaggarwal97/valkey that referenced this pull request Jun 23, 2026
…er of each shard (valkey-io#3946) (#319)

We previously attempted to set `cluster-node-timeout` to 15000 in valkey-io#2793
but failed. This was because we did not explicitly specify it and relied
on the default value, but `start_cluster` internally sets it to 3000.

Closes valkey-io#3932.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
sarthakaggarwal97 added a commit to sarthakaggarwal97/valkey that referenced this pull request Jun 23, 2026
…er of each shard (valkey-io#3946) (#323)

We previously attempted to set `cluster-node-timeout` to 15000 in valkey-io#2793
but failed. This was because we did not explicitly specify it and relied
on the default value, but `start_cluster` internally sets it to 3000.

Closes valkey-io#3932.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
sarthakaggarwal97 added a commit to sarthakaggarwal97/valkey that referenced this pull request Jun 23, 2026
…er of each shard (valkey-io#3946) (#327)

We previously attempted to set `cluster-node-timeout` to 15000 in valkey-io#2793
but failed. This was because we did not explicitly specify it and relied
on the default value, but `start_cluster` internally sets it to 3000.

Closes valkey-io#3932.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request Jun 23, 2026
…er of each shard (valkey-io#3946)

We previously attempted to set `cluster-node-timeout` to 15000 in valkey-io#2793
but failed. This was because we did not explicitly specify it and relied
on the default value, but `start_cluster` internally sets it to 3000.

Closes valkey-io#3932.

Signed-off-by: Binbin <binloveplay1314@qq.com>
sarthakaggarwal97 added a commit to sarthakaggarwal97/valkey that referenced this pull request Jun 23, 2026
…er of each shard (valkey-io#3946) (#334)

We previously attempted to set `cluster-node-timeout` to 15000 in valkey-io#2793
but failed. This was because we did not explicitly specify it and relied
on the default value, but `start_cluster` internally sets it to 3000.

Closes valkey-io#3932.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
sarthakaggarwal97 added a commit to sarthakaggarwal97/valkey that referenced this pull request Jun 23, 2026
…er of each shard (valkey-io#3946) (#338)

We previously attempted to set `cluster-node-timeout` to 15000 in valkey-io#2793
but failed. This was because we did not explicitly specify it and relied
on the default value, but `start_cluster` internally sets it to 3000.

Closes valkey-io#3932.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
sarthakaggarwal97 added a commit to sarthakaggarwal97/valkey that referenced this pull request Jun 23, 2026
…er of each shard (valkey-io#3946) (#343)

We previously attempted to set `cluster-node-timeout` to 15000 in valkey-io#2793
but failed. This was because we did not explicitly specify it and relied
on the default value, but `start_cluster` internally sets it to 3000.

Closes valkey-io#3932.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
@github-project-automation github-project-automation Bot moved this to To be backported in Valkey 9.1 Jul 2, 2026
valkeyrie-ops Bot pushed a commit that referenced this pull request Jul 3, 2026
…er of each shard (#3946)

We previously attempted to set `cluster-node-timeout` to 15000 in #2793
but failed. This was because we did not explicitly specify it and relied
on the default value, but `start_cluster` internally sets it to 3000.

Closes #3932.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: To be backported

Development

Successfully merging this pull request may close these issues.

[TEST-FAILURE] Primaries will not time out then they are elected in the same epoch in tests/unit/cluster/failover2.tcl

3 participants