Skip test_overcommit_tracker/test_user_overcommit under MSan by alexey-milovidov · Pull Request #100621 · ClickHouse/ClickHouse

alexey-milovidov · 2026-03-24T15:37:59Z

Skip the test under MSan instead of adding retries — simpler fix.

The test relies on memory pressure to selectively kill queries based on memory_overcommit_ratio_denominator. Under MSan (~3x memory overhead), the test is too unreliable.

Supersedes #100567

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100404&sha=4e239671fdc3523725369f5ad6028b4a52c43b45&name_0=PR&name_1=Integration%20tests%20%28amd_msan%2C%204%2F6%29

Related issue: #45173

Changelog category (leave one):

CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

The test relies on memory pressure to selectively kill queries based on `memory_overcommit_ratio_denominator`. Under MSan (~3x memory overhead), the test is too unreliable. CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100404&sha=4e239671fdc3523725369f5ad6028b4a52c43b45&name_0=PR&name_1=Integration%20tests%20%28amd_msan%2C%204%2F6%29 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

clickhouse-gh · 2026-03-24T15:38:46Z

Workflow [PR], commit [9da63c5]

Summary: ✅

AI Review

Summary

This PR updates tests/integration/test_overcommit_tracker/test.py to reduce flakiness by skipping test_user_overcommit on sanitizer builds with high memory overhead and by adding retry logic for nondeterministic memory-pressure behavior. After reviewing the final diff and existing inline discussions, I did not find additional correctness, safety, or performance issues that warrant new inline comments.

ClickHouse Rules

Item	Status	Notes
Deletion logging	➖
Serialization versioning	➖
Core-area scrutiny	➖
No test removal	✅
Experimental gate	➖
No magic constants	✅
Backward compatibility	✅
`SettingsChangesHistory.cpp`	➖
PR metadata quality	✅
Safe rollout	✅
Compilation time	✅

Final Verdict

Status: ✅ Approve

The test failed under ASan with "No route to host" because the server was killed by OOM during the memory pressure phase (ASan has ~2x memory overhead). This matches `test_global_overcommit_tracker` which already skips under TSan, ASan, MSan, and LLVM coverage. CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100621&sha=edaffdae42cd6e7dfcefec21e766f15f479e24ff&name_0=PR&name_1=Integration%20tests%20%28amd_asan_ubsan%2C%20flaky%29 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

clickhouse-gh · 2026-03-24T21:02:14Z

tests/integration/test_overcommit_tracker/test.py


 def test_user_overcommit():
+    if (
+        node.is_built_with_thread_sanitizer()


The PR title/description says this should be skipped under MSan, but this condition now skips under TSan, ASan, and LLVM coverage as well. That broadens the reduction in coverage and can hide regressions outside the reported failure mode. Please either narrow this to is_built_with_memory_sanitizer() only, or add evidence that each additional mode is also consistently OOM/flaky for this test.

arsenmuk · 2026-03-25T07:19:14Z

tests/integration/test_overcommit_tracker/test.py

@alexey-milovidov not related to the change, but am I correct to suspect that we never check overcommited_killed? The logic should be

assert A_is_killed and not B_is_killed

isn't it?

tests/integration/test_overcommit_tracker/test.py

Co-authored-by: Arsen Muk <arsen.mukuchyan@gmail.com>

clickhouse-gh · 2026-03-25T12:32:02Z

tests/integration/test_overcommit_tracker/test.py

        if err == "":
            finished = True

+    assert overcommited_killed, "overcommited is not killed"


Typo on a changed line: overcommited → overcommitted in the assertion message.

Suggested message: assert overcommited_killed, "overcommitted query is not killed".

…mmit-tracker

Address review feedback: - Narrow the sanitizer skip to MSan only (where ~3x memory overhead makes the test consistently unreliable), removing unnecessary skips for ASan, TSan, and coverage builds. - Add retry logic (up to 3 attempts) for the nondeterministic memory pressure behavior that can cause sporadic failures on regular builds. - Fix typo in assertion message: "overcommited" -> "overcommitted". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

clickhouse-gh · 2026-03-28T16:39:16Z

tests/integration/test_overcommit_tracker/test.py

+            break
+
+        last_error = (
+            f"attempt {attempt + 1}: overcommited_killed={overcommited_killed}, finished={finished}"


Minor typo in changed user-visible diagnostics: overcommited_killed should be overcommitted_killed in this last_error string for consistency/readability.

Suggested change:

last_error = ( f"attempt {attempt + 1}: overcommitted_killed={overcommited_killed}, finished={finished}" )

…mmit-tracker

Address review feedback: fix the spelling of "overcommitted" in variable names and diagnostic messages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

CI shows the test also times out under ASan/UBSan (600s timeout). The memory overhead from sanitizers makes memory pressure tests unreliable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…rcommit-tracker Skip test_overcommit_tracker/test_user_overcommit under MSan

azat · 2026-03-30T20:16:46Z

tests/integration/test_overcommit_tracker/test.py

+        )

-    assert finished, "all tasks are killed"
+    assert overcommitted_killed, f"overcommitted query is not killed ({last_error})"


The test verifies that the memory overcommit tracker selectively kills queries with low memory_overcommit_ratio_denominator under user memory pressure. It became flaky after PR ClickHouse#100621 added the assert overcommitted_killed check, because slow CI builds (coverage, ARM, sanitizers) run fewer concurrent queries, so peak user memory rarely reached the old 2 GB limit. Root cause: 100 queries × ~20 MB each need 100+ concurrent overlap to hit 2 GB; slow builds achieve maybe 10-20 concurrent, well below the threshold. Changes: - Lower max_memory_usage_for_user from 2 GB to 300 MB — even 8 concurrent queries at 40 MB each now exceed the limit - Increase per-query memory from ~20 MB to ~40 MB (numbers(5000000)) - Reduce total queries from 100 to 40 (still plenty for the test) - Skip under ThreadSanitizer (joins MSan/ASan/coverage skip) — TSAN 5x memory overhead causes all queries to be killed including the "protected" ones, producing the opposite failure mode - Increase retry attempts from 3 to 5 for probabilistic robustness Fixes: ClickHouse#101318 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

clickhouse-gh bot added the pr-ci label Mar 24, 2026

clickhouse-gh bot reviewed Mar 24, 2026

View reviewed changes

arsenmuk self-assigned this Mar 25, 2026

arsenmuk reviewed Mar 25, 2026

View reviewed changes

arsenmuk approved these changes Mar 25, 2026

View reviewed changes

arsenmuk reviewed Mar 25, 2026

View reviewed changes

tests/integration/test_overcommit_tracker/test.py Outdated Show resolved Hide resolved

Update tests/integration/test_overcommit_tracker/test.py

74128e3

Co-authored-by: Arsen Muk <arsen.mukuchyan@gmail.com>

clickhouse-gh bot reviewed Mar 25, 2026

View reviewed changes

alexey-milovidov and others added 2 commits March 28, 2026 16:31

Merge remote-tracking branch 'origin/master' into no-msan-test-overco…

c1149ae

…mmit-tracker

clickhouse-gh bot reviewed Mar 28, 2026

View reviewed changes

alexey-milovidov and others added 3 commits March 28, 2026 23:50

Merge remote-tracking branch 'origin/master' into no-msan-test-overco…

71f582a

…mmit-tracker

Fix typo: overcommited → overcommitted

f7cc0af

Address review feedback: fix the spelling of "overcommitted" in variable names and diagnostic messages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Also skip test_user_overcommit under ASan

9da63c5

CI shows the test also times out under ASan/UBSan (600s timeout). The memory overhead from sanitizers makes memory pressure tests unreliable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

alexey-milovidov merged commit 1433e40 into master Mar 30, 2026
152 of 153 checks passed

alexey-milovidov deleted the no-msan-test-overcommit-tracker branch March 30, 2026 03:26

robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Mar 30, 2026

Desel72 pushed a commit to Desel72/ClickHouse that referenced this pull request Mar 30, 2026

Merge pull request ClickHouse#100621 from ClickHouse/no-msan-test-ove…

20d8e0b

…rcommit-tracker Skip test_overcommit_tracker/test_user_overcommit under MSan

azat mentioned this pull request Mar 30, 2026

tests: attempt to fix test_overcommit_tracker flakiness under amd_llvm_coverage #101270

Merged

azat reviewed Mar 30, 2026

View reviewed changes

groeneai mentioned this pull request Apr 2, 2026

Fix flaky test_overcommit_tracker/test_user_overcommit #101574

Open

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Skip test_overcommit_tracker/test_user_overcommit under MSan#100621

Skip test_overcommit_tracker/test_user_overcommit under MSan#100621
alexey-milovidov merged 8 commits intomasterfrom
no-msan-test-overcommit-tracker

alexey-milovidov commented Mar 24, 2026

Uh oh!

clickhouse-gh bot commented Mar 24, 2026 •

edited

Loading

Uh oh!

clickhouse-gh bot Mar 24, 2026

Uh oh!

arsenmuk Mar 25, 2026

Uh oh!

Uh oh!

clickhouse-gh bot Mar 25, 2026

Uh oh!

clickhouse-gh bot Mar 28, 2026

Uh oh!

Uh oh!

azat Mar 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

alexey-milovidov commented Mar 24, 2026

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Documentation entry for user-facing changes

Uh oh!

clickhouse-gh bot commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AI Review

Summary

ClickHouse Rules

Final Verdict

Uh oh!

clickhouse-gh bot Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

arsenmuk Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

clickhouse-gh bot Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

clickhouse-gh bot Mar 28, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

azat Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

clickhouse-gh bot commented Mar 24, 2026 •

edited

Loading