
raftstore: Fix flaky test_split_region_with_no_valid_split_keys #17953

Merged
ti-chi-bot[bot] merged 3 commits into tikv:master from hbisheng:fix-flaky-split on Dec 10, 2024

@hbisheng
Member

@hbisheng hbisheng commented Dec 9, 2024

What is changed and how it works?

Issue Number: Close #17557

What's Changed:

The test expects a split to occur but sometimes failed because the 
split check was triggered too early (before the DB had enough keys) 
and found no split key. Once the first split check finds nothing, 
subsequent checks are delayed until `size_diff_hint` (an approximate 
measure of region size change) reaches the `region_split_check_diff` 
threshold. By lowering this threshold, the second split check is 
triggered in time to meet the test's expectation.

In production, `region_split_check_diff` is typically set to 1/16 of 
`region_split_size`. For this test, setting it equal to 
`region_split_size` was sufficient to fix the flakiness.
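
The gating described above can be sketched with a toy model. This is a simplified illustration, not TiKV's implementation: the function `split_happens`, its parameters, and the numbers in the usage note are all hypothetical. It assumes that one check fires early regardless of the threshold, that every check resets `size_diff_hint` to zero, and that a check finds a split key only once the region has reached `region_split_size`.

```rust
/// Toy model of the split-check gating (hypothetical, not TiKV code).
///
/// `check_diff` stands in for `region_split_check_diff` and `split_size`
/// for `region_split_size`. `writes` holds the size of each write, and
/// `early_check_after` is the number of writes after which one check is
/// forced, modelling the check that ran before the DB had enough keys.
///
/// Returns true if any check finds the region large enough to split.
pub fn split_happens(
    check_diff: u64,
    split_size: u64,
    writes: &[u64],
    early_check_after: usize,
) -> bool {
    let mut total_size = 0u64;
    let mut size_diff_hint = 0u64;

    for (i, w) in writes.iter().enumerate() {
        total_size += w;
        size_diff_hint += w;

        // A check runs either because it is the forced early check or
        // because the accumulated size change reached the threshold.
        let forced = i + 1 == early_check_after;
        if forced || size_diff_hint >= check_diff {
            // Assumption: every check resets the hint, whether or not
            // it finds a split key.
            size_diff_hint = 0;
            if total_size >= split_size {
                return true;
            }
        }
    }
    false
}
```

With fifteen writes of 100 bytes, a split size of 1,000, and an early check after the third write, `split_happens(10_000, 1_000, &[100; 15], 3)` returns `false` because the hint never reaches the threshold again, while `split_happens(1_000, 1_000, &[100; 15], 3)` returns `true` because a second check fires once another 1,000 bytes have accumulated.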

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Before the fix, the failure could be reproduced consistently within 100 runs.
  • After the fix, no failure occurred in 500 runs.
for i in {1..500}; do
  echo "Run #$i"
  cargo test --package tests --test failpoints --features failpoints --features testexport -- cases::test_split_region::test_split_region_with_no_valid_split_keys --exact --show-output || break
done
  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Release note

None

The test expects a split to occur but sometimes failed because the split check was triggered too early (before the DB had enough keys) and found no split key. Once the first split check finds nothing, subsequent checks are delayed until size_diff_hint (an approximate measure of region size change) reaches the region_split_check_diff threshold. By lowering this threshold (default: 10,000, defined in components/test_raftstore/src/common-test.toml), the second split check is triggered in time to meet the test's expectations.

Before the fix, the failure can be consistently reproduced within 100 runs. After the fix, there is no failure after 500 runs.
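
As a rough way to read the 500-run result, the sketch below computes how likely a clean streak would be if the test were still flaky. The per-run failure rate is not reported in this PR; the 3% used in the usage note is purely an assumption chosen because it would make a failure within 100 runs very likely. The function name `prob_all_pass` is hypothetical, and runs are assumed to be independent.

```rust
/// Probability that `runs` independent runs all pass when each run
/// fails with probability `p_fail` (hypothetical helper).
pub fn prob_all_pass(p_fail: f64, runs: u32) -> f64 {
    (1.0 - p_fail).powi(runs as i32)
}

/// Probability of seeing at least one failure in `runs` runs.
pub fn prob_any_failure(p_fail: f64, runs: u32) -> f64 {
    1.0 - prob_all_pass(p_fail, runs)
}
```

Under the assumed 3% rate, `prob_any_failure(0.03, 100)` is above 0.9, which matches "consistently reproduced within 100 runs", and `prob_all_pass(0.03, 500)` is below one in a million, so 500 clean runs would be very unlikely if the flakiness were unchanged.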

Signed-off-by: Bisheng Huang <hbisheng@gmail.com>
@ti-chi-bot ti-chi-bot bot added the dco-signoff: yes, do-not-merge/needs-triage-completed, do-not-merge/release-note-label-needed, size/XS, and release-note-none labels and removed the do-not-merge/release-note-label-needed label Dec 9, 2024
@glorv
Contributor

glorv commented Dec 9, 2024

/check-issue-triage-complete

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm and approved labels Dec 9, 2024
@ti-chi-bot ti-chi-bot bot added the lgtm label Dec 10, 2024
@ti-chi-bot
Contributor

ti-chi-bot bot commented Dec 10, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: glorv, LykxSassinator

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [LykxSassinator,glorv]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm label Dec 10, 2024
@ti-chi-bot
Contributor

ti-chi-bot bot commented Dec 10, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-12-09 11:10:26 UTC: ☑️ agreed by glorv.
  • 2024-12-10 09:06:52 UTC: ☑️ agreed by LykxSassinator.

@glorv
Contributor

glorv commented Dec 10, 2024

/retest

@ti-chi-bot ti-chi-bot bot merged commit 4c68641 into tikv:master Dec 10, 2024
@ti-chi-bot ti-chi-bot bot added this to the Pool milestone Dec 10, 2024
@hbisheng hbisheng deleted the fix-flaky-split branch July 14, 2025 07:07

Labels

approved, dco-signoff: yes, lgtm, release-note-none, size/XS

Development

Successfully merging this pull request may close these issues.

flaky test: test_split_region_with_no_valid_split_keys

3 participants