[flaky-failure-fix] Increase the cluster-node-timeout to have longer delay between failover of each shard #2793

Merged
hpatro merged 3 commits into valkey-io:unstable from hpatro:failover2_flaky on Nov 7, 2025

@hpatro hpatro commented Oct 31, 2025

Contributor

Fixes: #2699

This removes the `cluster-node-timeout` override so that the failovers of different shards are spaced further apart and do not collide.

More explanation: #2699 (comment)

@enjoy-binbin enjoy-binbin left a comment

Member

Thanks for the explanation, I will take a look next week.

@sarthakaggarwal97 sarthakaggarwal97 left a comment

Contributor

Thank you @hpatro. The explanation makes sense.

Comment thread tests/unit/cluster/failover2.tcl Outdated

sarthakaggarwal97 commented Nov 5, 2025

Contributor

@hpatro I am seeing this flaky test in 8.0 and 8.1 as well.
Link for 8.0: https://github.com/sarthakaggarwal97/valkey/actions/runs/19117539761/job/54630320795#step:7:7896

I can help backport once this PR is merged.

codecov Bot commented Nov 5, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.40%. Comparing base (f54818c) to head (d448ddd).
⚠️ Report is 13 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2793      +/-   ##
============================================
+ Coverage     72.30%   72.40%   +0.10%     
============================================
  Files           128      128              
  Lines         70211    70270      +59     
============================================
+ Hits          50763    50877     +114     
+ Misses        19448    19393      -55     

see 24 files with indirect coverage changes

hpatro and others added 3 commits November 6, 2025 13:14
…er of each shard

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
@hpatro hpatro merged commit 7f8c5b6 into valkey-io:unstable Nov 7, 2025
55 checks passed
zhijun42 pushed a commit to zhijun42/valkey that referenced this pull request Nov 15, 2025
zhijun42 pushed a commit to zhijun42/valkey that referenced this pull request Nov 15, 2025
zhijun42 pushed a commit to zhijun42/valkey that referenced this pull request Nov 28, 2025
zhijun42 pushed a commit to zhijun42/valkey that referenced this pull request Nov 28, 2025
zhijun42 pushed a commit to zhijun42/valkey that referenced this pull request Nov 28, 2025
zuiderkwast pushed a commit to zuiderkwast/valkey that referenced this pull request Dec 3, 2025
@zuiderkwast zuiderkwast moved this from To be backported to 8.1.5 (not yet released) in Valkey 8.1 Dec 3, 2025
zuiderkwast pushed a commit that referenced this pull request Dec 4, 2025
@zuiderkwast zuiderkwast moved this from To be backported to 9.0.1 (WIP) in Valkey 9.0 Dec 4, 2025
zuiderkwast pushed a commit to zuiderkwast/valkey that referenced this pull request Dec 4, 2025
zuiderkwast pushed a commit that referenced this pull request Dec 9, 2025
roshkhatri pushed a commit to roshkhatri/valkey that referenced this pull request Jan 29, 2026
roshkhatri pushed a commit to roshkhatri/valkey that referenced this pull request Jan 29, 2026
…delay between failover of each shard (valkey-io#2793)

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
roshkhatri pushed a commit to roshkhatri/valkey that referenced this pull request Jan 30, 2026
…delay between failover of each shard (valkey-io#2793)

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
@roshkhatri roshkhatri moved this from To be backported to 8.0.7 in Valkey 8.0 Jan 30, 2026
roshkhatri pushed a commit to roshkhatri/valkey that referenced this pull request Jan 30, 2026
…delay between failover of each shard (valkey-io#2793)

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
roshkhatri pushed a commit to roshkhatri/valkey that referenced this pull request Feb 4, 2026
…delay between failover of each shard (valkey-io#2793)

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
roshkhatri pushed a commit to roshkhatri/valkey that referenced this pull request Feb 18, 2026
…delay between failover of each shard (valkey-io#2793)

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
roshkhatri pushed a commit to roshkhatri/valkey that referenced this pull request Feb 20, 2026
…delay between failover of each shard (valkey-io#2793)

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
madolson pushed a commit that referenced this pull request Feb 24, 2026
…delay between failover of each shard (#2793)

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
hpatro added a commit to hpatro/valkey that referenced this pull request Mar 5, 2026
…delay between failover of each shard (valkey-io#2793)

Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
hpatro pushed a commit that referenced this pull request Jun 10, 2026
…er of each shard (#3946)

We previously attempted to set `cluster-node-timeout` to 15000 in #2793
but failed. This was because we did not explicitly specify it and relied
on the default value, but `start_cluster` internally sets it to 3000.

Closes #3932.

Signed-off-by: Binbin <binloveplay1314@qq.com>
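
The commit message above describes a layering problem: dropping the test's own override did not fall back to the server default, because `start_cluster` supplies its own value first. A minimal sketch of that reasoning follows. It is a hypothetical model, not the Valkey test framework's code: the dictionary names, the `effective_config` helper, and the merge order are assumptions for illustration, and only the two timeout values (15000 and 3000) come from the commit message.

```python
# Hypothetical model of how a cluster test's effective settings are resolved.
# Layer names and merge order are assumptions made for illustration only.

SERVER_DEFAULTS = {"cluster-node-timeout": 15000}        # value the test meant to rely on (ms)
START_CLUSTER_DEFAULTS = {"cluster-node-timeout": 3000}  # value start_cluster sets, per the commit message


def effective_config(test_overrides=None):
    """Merge the layers in order; later layers win over earlier ones."""
    config = dict(SERVER_DEFAULTS)
    config.update(START_CLUSTER_DEFAULTS)
    config.update(test_overrides or {})
    return config


# Removing the test's override leaves the start_cluster value in place.
without_override = effective_config()
# Passing the value explicitly is what actually raises the timeout.
with_override = effective_config({"cluster-node-timeout": 15000})

print(without_override["cluster-node-timeout"])  # 3000
print(with_override["cluster-node-timeout"])     # 15000
```

Under this model, only the explicit override reaches 15000, which matches the follow-up fix of specifying the value in the test rather than relying on a default.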
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request Jun 22, 2026
…er of each shard (valkey-io#3946)

Signed-off-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit f769037)
sarthakaggarwal97 added a commit to sarthakaggarwal97/valkey that referenced this pull request Jun 22, 2026
…er of each shard (valkey-io#3946) (#314)

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
sarthakaggarwal97 added a commit to sarthakaggarwal97/valkey that referenced this pull request Jun 23, 2026
…er of each shard (valkey-io#3946) (#319)

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
sarthakaggarwal97 added a commit to sarthakaggarwal97/valkey that referenced this pull request Jun 23, 2026
…er of each shard (valkey-io#3946) (#323)

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
sarthakaggarwal97 added a commit to sarthakaggarwal97/valkey that referenced this pull request Jun 23, 2026
…er of each shard (valkey-io#3946) (#327)

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request Jun 23, 2026
…er of each shard (valkey-io#3946)

Signed-off-by: Binbin <binloveplay1314@qq.com>
sarthakaggarwal97 added a commit to sarthakaggarwal97/valkey that referenced this pull request Jun 23, 2026
…er of each shard (valkey-io#3946) (#334)

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
sarthakaggarwal97 added a commit to sarthakaggarwal97/valkey that referenced this pull request Jun 23, 2026
…er of each shard (valkey-io#3946) (#338)

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
sarthakaggarwal97 added a commit to sarthakaggarwal97/valkey that referenced this pull request Jun 23, 2026
…er of each shard (valkey-io#3946) (#343)

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
valkeyrie-ops Bot pushed a commit that referenced this pull request Jul 3, 2026
…er of each shard (#3946)

Signed-off-by: Binbin <binloveplay1314@qq.com>

Projects

Status: 8.0.7 (WIP)
Status: 8.1.5
Status: 9.0.1

Development

Successfully merging this pull request may close these issues.

[TEST-FAILURE] Primaries will not time out then they are elected in the same epoch

5 participants