Skip to content

Fix flaky test_jbod_balancer test#88104

Merged
azat merged 1 commit intoClickHouse:masterfrom
amosbird:try-fix-flaky-test-of-jbod-balancer
Oct 5, 2025
Merged

Fix flaky test_jbod_balancer test#88104
azat merged 1 commit intoClickHouse:masterfrom
amosbird:try-fix-flaky-test-of-jbod-balancer

Conversation

@amosbird
Copy link
Copy Markdown
Collaborator

@amosbird amosbird commented Oct 5, 2025

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

The test_jbod_balancer test occasionally fails with "There are still merges on-going after 20 assignments". It's likely due to recent changes in merge strategy which may result in more merge iterations needed to fully merge parts. Since the test performs 200 inserts, increase the merge attempt limit from 20 to 200 should be enough. The test still exits early when optimize_throw_if_noop confirms no more merges are possible, so this doesn't affect normal test time.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Oct 5, 2025

Workflow [PR], commit [0b01387]

Summary:

@clickhouse-gh clickhouse-gh bot added the pr-not-for-changelog This PR should not be mentioned in the changelog label Oct 5, 2025
@azat azat self-assigned this Oct 5, 2025
@azat azat added the 🍃 green ci 🌿 Fixing flaky tests in CI label Oct 5, 2025

def wait_until_fully_merged(node, table):
for i in range(20):
for i in range(200):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting that 20 is not enough, since there is assert_eq_with_retry(node, merges_count_query, "0\n", retry_count=20) below, which will wait 10 seconds each, so it means that before the maximum waiting period was 200, and now it is 2000.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it looks safe, since likely there will be zero merges usually.

Though maybe a better fix will be to tune merge selector algorithm, @amosbird WDYT?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, assert_eq_with_retry here is not waiting for OPTIMIZE. It's synchronous, so merges are finished when it returns.
This check was only ensuring there are no background merges running before the test. It can probably be removed. It was likely added to reduce flakiness earlier.

Though maybe a better fix will be to tune merge selector algorithm

That might be too complicated. For this test we just need to wait until background merges are done and no new ones are triggered.

@azat azat added this pull request to the merge queue Oct 5, 2025
Merged via the queue into ClickHouse:master with commit 24074bc Oct 5, 2025
123 checks passed
@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Oct 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🍃 green ci 🌿 Fixing flaky tests in CI pr-not-for-changelog This PR should not be mentioned in the changelog pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants