Skip to content

Reduce execution time for replica migration tests#4006

Merged
enjoy-binbin merged 1 commit into
valkey-io:unstablefrom
sushilpaneru1:simplify_replica_migration
Jun 22, 2026
Merged

Reduce execution time for replica migration tests#4006
enjoy-binbin merged 1 commit into
valkey-io:unstablefrom
sushilpaneru1:simplify_replica_migration

Conversation

@sushilpaneru1

@sushilpaneru1 sushilpaneru1 commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Description

The test "Migrated replica reports zero repl offset and rank, and fails to win election - shutdown" takes upto 25s. This PR reduces it to ~4s while maintaining the same coverage.

This might end up closing #3992 as well

From https://github.com/valkey-io/valkey/actions/runs/27586402593/job/81557596685

[ok]: Migrated replica reports zero repl offset and rank, and fails to win election - shutdown (26545 ms)

After the fix

[ok]: Migrated replica reports zero repl offset and rank, and fails to win election - shutdown (4847 ms)

Update - Applied similar fix to other tests as well resulting

Test Before After
Migrated replica - shutdown ~25s ~4s
Migrated replica - sigstop ~4s ~4s
Non-empty replica - shutdown ~12.5s ~1.5s
Non-empty replica - sigstop ~3.5s ~2.5s
Sub-replica - shutdown ~13.5s ~2s
Sub-replica - sigstop ~3.5s ~2s

Fix

  • Replace ~10k individual INCR round-trips with single SET commands using large values.
  • Set shutdown-timeout 0 before shutting down primary 0.

@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Three cluster replica-migration test procedures (test_migrated_replica, test_nonempty_replica, test_sub_replica) are updated to seed data using fixed-size string repeat values (1 KB and 10 KB) instead of looped incr counters, validate post-write get results immediately, compare post-failover consistency checks against the captured string values, and add shutdown-timeout 0 to cluster startup configurations.

Changes

Replica migration test updates

Layer / File(s) Summary
test_migrated_replica updates
tests/unit/cluster/replica-migration.tcl
Direct writes of 1 KB and 10 KB strings to primary nodes replace looped increments; immediate get assertions validate seeded values. Post-failover consistency check on nodes 3/4/7 now compares reads against $small_value/$large_value. Shutdown-timeout 0 added to startup config for shutdown and sigstop variants.
test_nonempty_replica updates
tests/unit/cluster/replica-migration.tcl
Direct writes of 1 KB and 10 KB strings to primary nodes replace looped increments; immediate get assertions validate seeded values. Consistency check on nodes 4/7 now compares reads against $small_value/$large_value. Shutdown-timeout 0 added to startup config for shutdown and sigstop variants.
test_sub_replica updates
tests/unit/cluster/replica-migration.tcl
Direct writes of 1 KB and 10 KB strings to primary nodes replace looped increments; immediate get assertions validate seeded values. Consistency check on nodes 3/4/7 now compares reads against $small_value/$large_value. Shutdown-timeout 0 added to startup config for shutdown and sigstop variants.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • zuiderkwast
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title 'Reduce execution time for replica migration tests' directly reflects the main objective of the changeset, which is to optimize test performance by replacing repetitive operations with efficient SET commands and adjusting shutdown configuration.
Description check ✅ Passed The PR description clearly explains the performance optimization, showing specific timing improvements and listing the changes made to address test execution time.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@sushilpaneru1

Copy link
Copy Markdown
Contributor Author

Looks @enjoy-binbin is the author of this test. Can you take a look please? Thanks.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/unit/cluster/replica-migration.tcl`:
- Around line 30-34: The test writes a 10KB large value to key_977613 in slot 0
on primary 3, and when slot 0 is migrated to primary 0, this large payload
inflates server 4's replication offset, which weakens the regression check. To
fix this, keep the large padding write to generate the initial large
repl_offset, but then overwrite key_977613 with a small final value before the
slot migration/rebalancing occurs. This ensures the offset padding is
established without the migrated value itself being large. Apply the same fix to
the similar code at lines 110-112 where the large value write pattern is
repeated.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8777d27d-edaf-4ff5-9847-727d4c35521f

📥 Commits

Reviewing files that changed from the base of the PR and between 436dcae and 5273e4e.

📒 Files selected for processing (1)
  • tests/unit/cluster/replica-migration.tcl

Comment thread tests/unit/cluster/replica-migration.tcl
Comment thread tests/unit/cluster/replica-migration.tcl
Comment thread tests/unit/cluster/replica-migration.tcl Outdated
Comment thread tests/unit/cluster/replica-migration.tcl Outdated
@enjoy-binbin

Copy link
Copy Markdown
Member

This might end up closing #3992 as well

I see the test fail in "failover not happen", did you know the reason?

@codecov

codecov Bot commented Jun 19, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.77%. Comparing base (436dcae) to head (75857af).

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #4006      +/-   ##
============================================
+ Coverage     76.75%   76.77%   +0.02%     
============================================
  Files           162      162              
  Lines         81017    81017              
============================================
+ Hits          62187    62204      +17     
+ Misses        18830    18813      -17     

see 16 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@sushilpaneru1

Copy link
Copy Markdown
Contributor Author

I see the test fail in "failover not happen", did you know the reason?

No, my hunch is something to do with long running time. I have done 20+ runs and have not encountered any failure. (finger crossed) this fixes the flakiness as well.

Signed-off-by: Sushil Paneru <sushil.paneru1@gmail.com>
@sushilpaneru1 sushilpaneru1 force-pushed the simplify_replica_migration branch from 5273e4e to 75857af Compare June 19, 2026 04:43
@sushilpaneru1 sushilpaneru1 changed the title Reduce test time for - Migrated replica reports zero repl offset and rank, and fails to win election Reduce execution time for replica migration tests Jun 19, 2026

@enjoy-binbin enjoy-binbin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks.

@enjoy-binbin enjoy-binbin merged commit fa62dce into valkey-io:unstable Jun 22, 2026
64 checks passed
@github-project-automation github-project-automation Bot moved this to To be backported in Valkey 9.1 Jul 2, 2026
valkeyrie-ops Bot pushed a commit that referenced this pull request Jul 3, 2026
Replace ~10k individual INCR round-trips with single SET commands
using large values. And set shutdown-timeout 0 before shutting down
the node.

Signed-off-by: Sushil Paneru <sushil.paneru1@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: To be backported

Development

Successfully merging this pull request may close these issues.

3 participants