Skip to content

tests: fix flakiness of tests for system.filesystem_cache#79633

Merged
alexey-milovidov merged 5 commits intoClickHouse:masterfrom
azat:tests/fix-filesystem-cache-tests
May 19, 2025
Merged

tests: fix flakiness of tests for system.filesystem_cache#79633
alexey-milovidov merged 5 commits intoClickHouse:masterfrom
azat:tests/fix-filesystem-cache-tests

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Apr 26, 2025

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

The problem is that some entries can be locked, i.e. due to merges, so let's:

  • disable merges for the tables that can cause this
  • add retries for SYSTEM DROP FILESYSTEM CACHE into all tests that query system.filesystem_cache w/o extra filters (i.e. by key)

Fixes: #75876

azat added 2 commits April 26, 2025 13:12
… segments

Merges in those tables can take significant amount of time (more them 10
minutes on some debug builds) and for the whole time they can lock some
cache segments and SYSTEM DROP FILESYSTEM CACHE cannot prune them.
It is ignored due to build folder can contain them, but we use them in
tests to store some additional helpers
@azat azat requested a review from kssenii April 26, 2025 11:27
@azat azat added the 🍃 green ci 🌿 Fixing flaky tests in CI label Apr 26, 2025
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Apr 26, 2025

Workflow [PR], commit [2aeb3c5]

@clickhouse-gh clickhouse-gh bot added the pr-not-for-changelog This PR should not be mentioned in the changelog label Apr 26, 2025
@azat azat force-pushed the tests/fix-filesystem-cache-tests branch from f0184a5 to 856d0cd Compare April 26, 2025 15:01
@kssenii kssenii self-assigned this Apr 26, 2025
The problem is that for flaky check the thread fuzzer is enabled, and
access to system.remote_data_paths is too slow with it, i.e. 2.5 sec vs
7.7 sec, but on CI it is even bigger.
SAMPLE BY intHash32(UserID) SETTINGS index_granularity = 8192, storage_policy='s3_cache'"

# Avoid locking filesystem cache
clickhouse-client --query "SYSTEM STOP MERGES test.hits"
Copy link
Copy Markdown
Member

@kssenii kssenii Apr 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not stop merges, but use a its own cache for tests which fail if it actually does not matter for the test (there are tests which on purpose use a common cache to generate more load/concurrency with it). In this exact case for tests which were modified in this PR it does not matter.
Reason - I think I remember cases when caching on background merges helped to find some bug.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how separate cache will help.

Let me try to explain the problem, 02286_drop_filesystem_cache assumes that there will be no cache entries after SYSTEM DROP FILESYSTEM CACHE, but, due to very slow merges for test.hits some entries has not been removed since they are still in use.

Reason - I think I remember cases when caching on background merges helped to find some bug.

I think merges for other tables in tests should be enough to cover this.

Copy link
Copy Markdown
Member

@kssenii kssenii May 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

02286_drop_filesystem_cache assumes that there will be no cache entries after SYSTEM DROP FILESYSTEM CACHE, but, due to very slow merges for test.hits some entries has not been removed since they are still in use.

yes, but there is no need to use the same cache object in test 02286_drop_filesystem_cache as for test.hits, so with independent caches we do not care about slow merges of test.hits. system.filesystem_cache has a name column, which is a name of the cache, so we can check only a specific cache in 02286_drop_filesystem_cache, so test.hits will not have any impact on it.

@alexey-milovidov alexey-milovidov self-assigned this May 19, 2025
@alexey-milovidov alexey-milovidov merged commit 1b7dc9c into ClickHouse:master May 19, 2025
117 of 120 checks passed
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-synced-to-cloud The PR is synced to the cloud repo label May 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🍃 green ci 🌿 Fixing flaky tests in CI pr-not-for-changelog This PR should not be mentioned in the changelog pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

filesystem cache tests are flaky

4 participants