
ci: limit stateless tests to 5G of RAM and fix tests that exceed this limit #82361

Merged
azat merged 31 commits into ClickHouse:master from azat:tests-memory-limit
Jan 7, 2026

Conversation

@azat
Member

@azat azat commented Jun 22, 2025

Jobs with sanitizers show lots of OOMs. Last time (#76143) it happened because some test spawned tons of clients, and a sanitizer binary takes 650MiB of RAM each. But sanitizers were not the only problem.

After a proper limit was applied, multiple tests were fixed:

  • some run clickhouse-client in parallel - converted to use curl
  • some run clickhouse-server - converted to integration tests
  • some require more memory - added the ability to override the limit on a per-test basis (via a Memory limits: 10 GiB comment)
    • 02481_parquet_list_monotonically_increasing_offsets - this one may eat up to 16GiB with different randomizations because of a larger block size, so some settings are now pinned to avoid excessive memory usage
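
A minimal sketch of how such a per-test override could be read from a test file. The function name, the regex, and the default constant are hypothetical. Only the "Memory limits: 10 GiB" wording and the 5G default come from this PR; the exact comment syntax accepted by the test runner may differ.

```python
import re

# Assumption: 5 GiB default, matching the limit named in the PR title.
DEFAULT_LIMIT_BYTES = 5 * 1024**3

_UNITS = {"KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}

# Assumption: the override appears somewhere in the test file as
# "Memory limits: <number> <unit>"; the real syntax may differ.
_LIMIT_RE = re.compile(r"Memory limits:\s*(\d+(?:\.\d+)?)\s*(KiB|MiB|GiB)\b")


def parse_memory_limit(test_source: str) -> int:
    """Return the per-test limit in bytes, or the default if no override is found."""
    match = _LIMIT_RE.search(test_source)
    if match is None:
        return DEFAULT_LIMIT_BYTES
    value, unit = match.groups()
    return int(float(value) * _UNITS[unit])
```

For example, `parse_memory_limit("-- Memory limits: 10 GiB\nSELECT 1")` yields 10 GiB in bytes, while a file without the comment falls back to the default.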

Overall I think it should help with OOMs (maybe not entirely, but at least now we have memory usage under control), and we can continue investigating if it is still an issue.

This is a resubmit of #76388, which did not work because it used an address space limit, which is a very different thing, especially for binaries built with sanitizers.
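
To illustrate why an address space limit behaves differently from a limit on memory actually used, here is a small sketch assuming Linux and a 64-bit Python. The child process caps `RLIMIT_AS` at 1 GiB and then tries to reserve 2 GiB that it never touches. The helper name and the sizes are illustrative and are not taken from #76388.

```python
import subprocess
import sys

# Child script: cap the address space at 1 GiB, then reserve 2 GiB
# of anonymous memory without ever writing to it.
CHILD = r"""
import errno, mmap, resource
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
resource.setrlimit(resource.RLIMIT_AS, (1 << 30, hard))
try:
    mmap.mmap(-1, 2 << 30)  # pages are never written, so they use no RAM
    print("mapped")
except OSError as e:
    print(errno.errorcode.get(e.errno, "error"))
"""


def reserve_under_address_space_limit() -> str:
    """Run the child in a fresh interpreter and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

The reservation is rejected even though no page is ever written. Sanitizer runtimes reserve very large virtual regions for shadow memory, so a limit of this kind can trip long before resident memory gets anywhere near the cap.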

Fixes: #86244
Refs: #82036

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

@azat azat changed the title from "tests: add 5GB per-test memory limit" to "tests: add per-test memory limit" Jun 22, 2025
@clickhouse-gh
Contributor

clickhouse-gh bot commented Jun 22, 2025

Workflow [PR], commit [333b9dc]

Summary:

| job_name | test_name | status | info | comment |
|---|---|---|---|---|
| Integration tests (amd_binary, 2/5) | | failure | | |
| | test_backup_restore_on_cluster/test_concurrency.py::test_create_or_drop_tables_during_backup[Replicated-ReplicatedMergeTree] | FAIL | cidb, issue | |
| | test_backup_restore_on_cluster/test_concurrency.py::test_create_or_drop_tables_during_backup[Memory-MergeTree] | FAIL | cidb, issue | |
| | test_backup_restore_on_cluster/test_concurrency.py::test_create_or_drop_tables_during_backup[Lazy-Log] | FAIL | cidb, issue | |
| | test_backup_restore_on_cluster/test_concurrency.py::test_kill_mutation_during_backup | FAIL | cidb, issue | |
| Integration tests (arm_binary, distributed plan, 1/4) | | failure | | |
| | test_backup_restore_on_cluster/test_disallow_concurrency.py::test_concurrent_restores_on_different_node | FAIL | cidb | |
| Integration tests (arm_binary, distributed plan, 3/4) | | failure | | |
| | test_userspace_page_cache/test.py::test_size_adjustment | FAIL | cidb, issue | |
| Integration tests (arm_binary, distributed plan, 4/4) | | failure | | |
| | test_peak_memory_usage/test.py::test_clickhouse_client_max_peak_memory_usage_distributed | FAIL | cidb, issue | |
| BuzzHouse (amd_tsan) | | failure | | |
| | Logical error: Invalid number of columns in chunk pushed to OutputPort. Expected A, found B (STID: 2270-3184) | FAIL | cidb | |

@clickhouse-gh clickhouse-gh bot added the pr-not-for-changelog label (This PR should not be mentioned in the changelog) Jun 22, 2025
@azat azat mentioned this pull request Jun 21, 2025
1 task
@azat azat force-pushed the tests-memory-limit branch from 73c5683 to 66b34aa Compare June 22, 2025 16:35
@azat azat marked this pull request as draft June 22, 2025 20:25
@azat azat force-pushed the tests-memory-limit branch from 25ed136 to 97ba40b Compare June 22, 2025 20:42
@azat azat force-pushed the tests-memory-limit branch from 97ba40b to d3769a9 Compare June 23, 2025 13:25

@alexey-milovidov
Member

Looks outdated, better to open a new one.

@alexey-milovidov
Member

Although I really want it to be done.

@azat azat reopened this Jan 1, 2026
@azat azat force-pushed the tests-memory-limit branch from d3769a9 to 578292f Compare January 1, 2026 17:56
@azat
Member Author

azat commented Jan 1, 2026

After playing with this on and on, it looks like it finally works - https://pastila.nl/?048262d2/25f2b6db5891bc511142691744c99b29#qW7nSg2x1zmwYp/tsic5ww==GCM

Though it will not work outside of docker (systemd does not allow cgroup.subtree_control to be propagated properly, and you cannot adjust it once the cgroup has processes)
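
A rough sketch of the cgroup v2 steps this relies on, not the code from this PR. The function name and parameters are hypothetical. `cgroup.subtree_control`, `memory.max`, and `cgroup.procs` are the standard cgroup v2 interface files. The parent path is a parameter so the same code can be pointed at a scratch directory as well as a real cgroup mount.

```python
from pathlib import Path


def configure_cgroup(parent: Path, name: str, limit_bytes: int, pid: int) -> Path:
    """Create a child cgroup under `parent`, cap its memory, and move `pid` into it."""
    # Enable the memory controller for children of `parent`. On a real
    # cgroup v2 hierarchy this write is rejected when a non-root `parent`
    # already holds processes of its own.
    (parent / "cgroup.subtree_control").write_text("+memory")

    child = parent / name
    child.mkdir(exist_ok=True)

    # memory.max is the hard memory limit for the cgroup, in bytes.
    (child / "memory.max").write_text(str(limit_bytes))
    # Writing a PID to cgroup.procs migrates that process into the cgroup.
    (child / "cgroup.procs").write_text(str(pid))
    return child
```

The first write is the step that breaks outside docker: if the controller cannot be enabled in the parent's `cgroup.subtree_control`, the child cgroup has no `memory.max` to set.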

@azat azat marked this pull request as ready for review January 1, 2026 17:58
@azat azat force-pushed the tests-memory-limit branch from 578292f to 073d4db Compare January 1, 2026 18:11
Jobs with sanitizers show lots of OOMs. Last time (ClickHouse#76143) it happened
because some test spawned tons of clients, and a sanitizer binary takes
650MiB of RAM each

This is a resubmit of ClickHouse#76388, which did not work because it used an
address space limit, which is a very different thing, especially for
binaries built with sanitizers

Refs: ClickHouse#82036

v2: use cgroup v2
@azat azat force-pushed the tests-memory-limit branch from 073d4db to 93536c2 Compare January 1, 2026 18:12
@azat azat marked this pull request as draft January 1, 2026 19:46
@azat
Member Author

azat commented Jan 2, 2026

And we already have tests that do not fit into 5G (at least with sanitizers)

@azat azat marked this pull request as ready for review January 2, 2026 16:09
@azat azat force-pushed the tests-memory-limit branch from b72522d to dcbf59c Compare January 2, 2026 16:37
@azat
Member Author

azat commented Jan 3, 2026

Split out the test changes into #93362 (to avoid conflicts in case this takes some time to merge, see #82361 (comment))

@maxknv maxknv self-assigned this Jan 7, 2026
@azat azat enabled auto-merge January 7, 2026 22:22
@azat azat added this pull request to the merge queue Jan 7, 2026
Merged via the queue into ClickHouse:master with commit 339c408 Jan 7, 2026
125 of 131 checks passed
@azat azat deleted the tests-memory-limit branch January 7, 2026 22:35
@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-synced-to-cloud label (The PR is synced to the cloud repo) Jan 7, 2026
azat added a commit to azat/ClickHouse that referenced this pull request Jan 13, 2026
Now it always fails with

    [2026-01-13 20:29:23] 2026-01-13 15:29:23 Running about 1 stateless tests (Process-3).
    [2026-01-13 20:29:23] Failed to configure cgroup clickhouse-test-5281: [Errno 30] Read-only file system: '/sys/fs/cgroup/memory/clickhouse-test-5281'
    [2026-01-13 20:29:23] 2026-01-13 15:29:23 03790_uniqTheta_error:                                                  [ UNKNOWN ] 0.14 sec.
    [2026-01-13 20:29:23] 2026-01-13 15:29:23 Reason: Test internal error:
    [2026-01-13 20:29:23] 2026-01-13 15:29:23 FileNotFoundError
    [2026-01-13 20:29:23] 2026-01-13 15:29:23 [Errno 2] No such file or directory: '/home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/tests/queries/0_stateless/03790_uniqTheta_error.stdout'

Introduced in: ClickHouse#82361

Labels

🍃 green ci 🌿 Fixing flaky tests in CI pr-not-for-changelog This PR should not be mentioned in the changelog pr-synced-to-cloud The PR is synced to the cloud repo


Development

Successfully merging this pull request may close these issues.

Stateless tests: OOM (TSan, UBsan)

4 participants