Skip to content

Fix MSan false positive in fiber stack memory#100138

Merged
alexey-milovidov merged 5 commits intomasterfrom
fix-msan-fiber-stack
Mar 22, 2026
Merged

Fix MSan false positive in fiber stack memory#100138
alexey-milovidov merged 5 commits intomasterfrom
fix-msan-fiber-stack

Conversation

@alexey-milovidov
Copy link
Copy Markdown
Member

Fiber stacks are heap-allocated via aligned_alloc, so MSan considers them entirely uninitialized. Unlike the program's main stack (which the OS zero-initializes), fiber stack bytes that are never explicitly written to remain "uninitialized" in MSan's shadow. This causes false positives when stack slots are reused across function calls within the fiber — MSan traces the origin back to an earlier allocation in a completely unrelated function.

The fix calls __msan_unpoison on the fiber stack after allocation. This is safe because MSan's per-variable lifetime tracking (via __lifetime.start / __lifetime.end compiler intrinsics) still properly resets shadow state for each local variable, so real uninitialized value bugs are still detected.

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=98677&sha=8536dc82c7ed23a6a5ebf073f36235f5be6ea9d7&name_0=PR&name_1=Stress%20test%20%28arm_msan%29

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Fiber stacks are heap-allocated via `aligned_alloc`, so MSan considers
them entirely uninitialized. Unlike the program's main stack (which
the OS zero-initializes), fiber stack bytes that are never explicitly
written to remain "uninitialized" in MSan's shadow. This causes false
positives when stack slots are reused across function calls within the
fiber — MSan traces the origin back to an earlier allocation in a
completely unrelated function.

The fix calls `__msan_unpoison` on the fiber stack after allocation.
This is safe because MSan's per-variable lifetime tracking (via
`__lifetime.start` / `__lifetime.end` compiler intrinsics) still
properly resets shadow state for each local variable, so real
uninitialized value bugs are still detected.

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=98677&sha=8536dc82c7ed23a6a5ebf073f36235f5be6ea9d7&name_0=PR&name_1=Stress%20test%20%28arm_msan%29

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 19, 2026

Workflow [PR], commit [296cd06]

Summary:


AI Review

Summary

This PR fixes an MSan false positive by unpoisoning memory allocated for FiberStack right after allocation. The change is minimal, targeted, and consistent with existing sanitizer abstractions (base/MemorySanitizer.h). I did not find correctness, safety, performance, or compatibility issues in the submitted diff.

ClickHouse Rules

Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time

Final Verdict

  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh bot added the pr-ci label Mar 19, 2026
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 22, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 83.80% 83.90% +0.10%
Functions 24.90% 24.90% +0.00%
Branches 76.40% 76.50% +0.10%

PR changed lines: PR changed-lines coverage: 100.00% (3/3, 0 noise lines excluded)
Diff coverage report
Uncovered code

@alexey-milovidov alexey-milovidov self-assigned this Mar 22, 2026
@alexey-milovidov alexey-milovidov merged commit 4a2a70d into master Mar 22, 2026
151 of 152 checks passed
@alexey-milovidov alexey-milovidov deleted the fix-msan-fiber-stack branch March 22, 2026 17:39
@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Mar 22, 2026

/// Mark the fiber stack memory as initialized for MSan.
/// Unlike the program's main stack (which the OS zero-initializes), fiber stacks are
/// heap-allocated via aligned_alloc, so MSan considers them uninitialized.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexey-milovidov can you explain why? MSan should handle aligned_alloc

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this will help - #101621

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-ci pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants