Skip to content

Fix heap-use-after-free in MergeTreeReadTask::createReaders#99483

Merged
alexey-milovidov merged 21 commits into
masterfrom
fix-heap-use-after-free-merge-tree-read-task
Mar 26, 2026
Merged

Fix heap-use-after-free in MergeTreeReadTask::createReaders#99483
alexey-milovidov merged 21 commits into
masterfrom
fix-heap-use-after-free-merge-tree-read-task

Conversation

@alexey-milovidov

@alexey-milovidov alexey-milovidov commented Mar 14, 2026

Copy link
Copy Markdown
Member

Summary

Fix heap-use-after-free where MergeTreeReadPoolBase accesses storage through data_part->storage (a bare const MergeTreeData & reference) after the storage has been destroyed.

Querying checks.test_context_raw on ClickHouse Playground reveals 19 occurrences of the same root cause in the last 90 days across two distinct crash patterns:

Pattern 1 — LoadedMergeTreeDataPartInfoForReader constructor (3 occurrences): During query execution, createReaders constructs LoadedMergeTreeDataPartInfoForReader which calls data_part->storage.getContext(). If the storage is already freed, this is a use-after-free.

Pattern 2 — IMergeTreeDataPart::clearCaches during part destruction (16 occurrences): When the read pool is destroyed, data parts are released. Part destructors call removeIfNeeded()clearCaches()storage.getContext() to clear mark/uncompressed caches. If the storage is freed before parts, this crashes.

Root cause: The query pipeline normally holds a StoragePtr via QueryPlanResourceHolder, but the read pool itself does not hold a storage reference. If any code path fails to set up addStorageHolder at the pipeline level (or if the pipeline resources are released before the pool is destroyed), the storage can be freed while parts still reference it.

Fix: Store a ConstStoragePtr (via shared_from_this) in MergeTreeReadPoolBase. It is declared as the first data member so it is destroyed last, guaranteeing the storage outlives all data parts held by the pool (parts_ranges, per_part_infos). This fixes both crash patterns.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=022e467a014b9b3f1fc993735bc4f6b3974e59a0&name_0=MasterCI&name_1=Stress%20test%20%28arm_asan%2C%20s3%29

Test plan

  • CI stress tests with ASan/TSan should be clean (covers both patterns)

Changelog category (leave one):

  • Critical Bug Fix (crash, data loss, RBAC)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix heap-use-after-free when a table is dropped concurrently with a running read query (19 occurrences in CI over the last 90 days).

🤖 Generated with Claude Code

`MergeTreeReadTask::createReaders` accessed `data_part->storage.getContext()`
(via `LoadedMergeTreeDataPartInfoForReader` constructor) and
`data_part->storage.getSettings()` through a potentially dangling reference.

When a table is dropped by `DatabaseCatalog::dropTablesParallel` while a
query is still reading from it, the `StorageMergeTree` object is destroyed,
but the data parts held by the query still reference it via a bare
`const MergeTreeData & storage`. Accessing this dangling reference causes
a heap-use-after-free.

The fix captures the `ContextPtr` and `MergeTreeSettingsPtr` during read
pool construction (when the storage is guaranteed to be alive) and passes
them via `MergeTreeReadTask::Extras` to avoid accessing `data_part->storage`
during query execution.

https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=022e467a014b9b3f1fc993735bc4f6b3974e59a0&name_0=MasterCI&name_1=Stress%20test%20%28arm_asan%2C%20s3%29

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clickhouse-gh

clickhouse-gh Bot commented Mar 14, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [6cf20e8]

Summary:


AI Review

Summary

This PR fixes a heap-use-after-free in MergeTree read teardown when enable_shared_storage_snapshot_in_query=0 by preserving storage lifetime in stripped snapshots instead of dropping all snapshot data. It also adds targeted regression coverage, including the projection-index path that dereferences data_part->storage during reader creation. Based on the current diff, I did not find new high-confidence correctness, safety, or performance issues to block merge.

ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
Final Verdict
  • Status: ✅ Approve

Comment thread src/Storages/MergeTree/MergeTreeReadPoolBase.cpp Outdated
alexey-milovidov and others added 2 commits March 13, 2026 20:17
…uction

The `MergeTreeReadPoolBase` accesses storage through `data_part->storage`
(a bare reference) during query execution. If the storage is destroyed
concurrently (e.g. by `DatabaseCatalog::dropTablesParallel`), this
becomes a dangling reference, causing a heap-use-after-free.

The query pipeline normally holds a `StoragePtr` via
`QueryPlanResourceHolder`, but certain code paths (e.g. backup restore)
may not set this up correctly, allowing the storage to be freed.

The fix stores a `ConstStoragePtr` (via `shared_from_this`) in
`MergeTreeReadPoolBase` during construction, guaranteeing the storage
outlives the read pool regardless of the pipeline's resource management.

https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=022e467a014b9b3f1fc993735bc4f6b3974e59a0&name_0=MasterCI&name_1=Stress%20test%20%28arm_asan%2C%20s3%29

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move owned_data_storage before parts_ranges and per_part_infos to
ensure it is destroyed after them. Part destructors call
clearCaches() -> storage.getContext(), which requires the storage
to still be alive. With the previous ordering, owned_data_storage
could be destroyed before parts_ranges, causing use-after-free in
part destructors.

This addresses both crash patterns found in CI:
- Pattern 1 (3 occurrences): LoadedMergeTreeDataPartInfoForReader
  constructor accessing storage.getContext() during query execution
- Pattern 2 (16 occurrences): IMergeTreeDataPart::clearCaches()
  accessing storage.getContext() during part destruction in
  the read pool destructor

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member Author

Root cause analysis

Querying checks.test_context_raw on ClickHouse Playground reveals 19 occurrences of the same root cause in the last 90 days across two distinct crash patterns:

Pattern 1 — LoadedMergeTreeDataPartInfoForReader constructor (3 occurrences): During query execution, createReaders constructs LoadedMergeTreeDataPartInfoForReader which calls data_part->storage.getContext() on a freed storage.

Pattern 2 — IMergeTreeDataPart::clearCaches during part destruction (16 occurrences): When parts are destroyed, their destructors call clearCaches()storage.getContext() on a freed storage. This is the dominant pattern.

Destruction chain (from CI ASan traces)

Both the "freed by" and the crash happen on the same thread — this is not a concurrency race, it's a sequential destruction order issue.

The ASan traces from stateless tests (distributed plan) show the crash happens inside SnapshotData::~SnapshotData, where SnapshotData::parts (RangesInDataPartsPtr, a shared_ptr) releases the last reference to parts:

StorageSnapshot::~StorageSnapshot
  → SnapshotData::~SnapshotData
    → ~shared_ptr<RangesInDataParts>     ← parts destroyed HERE
      → ~RangesInDataPart → ~MergeTreeDataPartCompact
        → clearCaches() → storage.getContext()   ← CRASH

And the "freed by" trace (same thread, earlier in the chain) shows the storage was freed by a different SnapshotData:

SnapshotData::~SnapshotData
  → ~shared_ptr<IStorage const>          ← storage freed

Root cause

There are two StorageSnapshot objects with different SnapshotData instances. One has SnapshotData::storage set (holds ConstStoragePtr), the other has it null (created via getStorageSnapshotWithoutData at MergeTreeData.cpp:10483-10484, which default-initializes SnapshotData without calling shared_from_this()). But both can share the same RangesInDataPartsPtr parts (since it's a shared_ptr).

During destruction on the same thread:

  1. SnapshotData A destroyed: parts released (shared, refcount still > 0), then storage released → storage freed (was the last ConstStoragePtr)
  2. SnapshotData B destroyed: parts released (last ref) → parts destroyed → clearCaches()storage.getContext()CRASH (storage already freed)

Fix in this PR

The fix stores a ConstStoragePtr (via shared_from_this) as the first member of MergeTreeReadPoolBase, ensuring it is destroyed last — after all data parts held by the pool. This guarantees the storage outlives all parts regardless of the snapshot lifecycle or pipeline destruction order.

A deeper fix would be to always set SnapshotData::storage in getStorageSnapshotWithoutData as well, but that has a broader scope.

@alexey-milovidov alexey-milovidov requested a review from azat March 14, 2026 04:39
Preserve  when  removes parts from the , and make  retain the actual data storage instead of blindly retaining .

This covers wrapper storages such as  and prevents  from observing dangling  during read-pool teardown.

CI report:
https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=022e467a014b9b3f1fc993735bc4f6b3974e59a0&name_0=MasterCI&name_1=Stress%20test%20%28arm_asan%2C%20s3%29
@alexey-milovidov

Copy link
Copy Markdown
Member Author

Further investigation by Codex GPT 5.4

I found the missing explanation for Mechanism B.

ReadFromMergeTree::initializePipeline can execute merge_tree_enable_remove_parts_from_snapshot_optimization and replace storage_snapshot->data with a new MergeTreeData::SnapshotData. That destroys the original SnapshotData::storage before the read pools are torn down. So the pool can still hold a live StorageSnapshotPtr, but its data payload no longer keeps the underlying MergeTreeData alive, and later part teardown reaches clearCaches -> data_part->storage.getContext() with a dangling storage.

I pushed commit b14fa7f4a10 with two hardenings:

  • preserve SnapshotData::storage when stripping SnapshotData::parts in ReadFromMergeTree;
  • make MergeTreeReadPoolBase retain MergeTreeData::SnapshotData::storage or the parts' own storage, not blindly storage_snapshot->storage.

The second point matters for wrapper storages such as StorageFromMergeTreeProjection, where storage_snapshot->storage is not the same object as data_part->storage.

I also ran direct -fsyntax-only compiles for ReadFromMergeTree.cpp and MergeTreeReadPoolBase.cpp; both passed.

@Algunenano Algunenano added pr-must-backport Pull request should be backported intentionally. Use this label with great care! pr-critical-bugfix labels Mar 15, 2026
alexey-milovidov and others added 2 commits March 16, 2026 02:43
The test was waiting for `MergeTreeReadersChain` to appear in the
`--server_logs_file`, but `WriteBufferFromFile` uses a 1 MB buffer
(`DBMS_DEFAULT_BUFFER_SIZE`) that is only flushed when the query
finishes or the buffer fills up. So `grep` on the file while the
query is running never sees the text, and the test times out.

Replace the log-file-based synchronization with a check on
`system.processes.read_rows > 0`, which reliably detects that the
query has started reading (and therefore has created readers that
hold data parts).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
--enable_shared_storage_snapshot_in_query 0 \
--merge_tree_read_split_ranges_into_intersecting_and_non_intersecting_injection_probability 0 \
--query "
SELECT sleepEachRow(0.01)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ This regression test validates the regular read-pool teardown path, but it does not force the projection-index read-pool path where MergeTreeReadTask::createReaders still dereferences data_part->storage.getSettings(). Please add/adjust coverage to exercise projection-index reads under teardown race as well.

alexey-milovidov and others added 2 commits March 16, 2026 03:40
Exercise `MergeTreeReadPoolProjectionIndex` during teardown to cover
the `data_part->storage.getSettings()` dereference in `createReaders`
on projection parts.

#99483 (review)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…teardown_race`

The test checked for unexpected exceptions by grepping for "Exception" in
stderr, then excluding lines containing "QUERY_WAS_CANCELLED". However,
the `KILL QUERY` stack trace includes lines like `DB::Exception::Exception`
that do not contain the string "QUERY_WAS_CANCELLED", causing false positives.

Use `grep -F "Code:"` instead, which matches actual error message lines
(e.g. `Code: 394. DB::Exception: ...`) rather than stack trace class names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread .claude/learnings.md Outdated
@@ -0,0 +1,2 @@
- `ReadFromMergeTree::initializePipeline` can replace `storage_snapshot->data` during `merge_tree_enable_remove_parts_from_snapshot_optimization`; when stripping `SnapshotData::parts`, preserve `SnapshotData::storage` or `IMergeTreeDataPart::clearCaches` can observe a dangling `data_part->storage`.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ This file looks like local reviewer/assistant state rather than product code. Please remove .claude/learnings.md from the PR before merge to avoid committing tool-specific internal notes into the repository history.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks related

@azat azat self-assigned this Mar 16, 2026

@azat azat left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The part in ReadFromMergeTree looks good
Until #95612 will be resolved

Comment thread .claude/learnings.md Outdated
@@ -0,0 +1,2 @@
- `ReadFromMergeTree::initializePipeline` can replace `storage_snapshot->data` during `merge_tree_enable_remove_parts_from_snapshot_optimization`; when stripping `SnapshotData::parts`, preserve `SnapshotData::storage` or `IMergeTreeDataPart::clearCaches` can observe a dangling `data_part->storage`.
- `MergeTreeReadPoolBase` must retain the storage referenced by `MergeTreeData::SnapshotData::storage` or the parts themselves, not blindly `storage_snapshot->storage`, because wrappers such as `StorageFromMergeTreeProjection` are not the same object as `data_part->storage`.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StorageFromMergeTreeProjection has parent_storage which points to the MergeTreeData

namespace
{

ConstStoragePtr getOwnedDataStorage(const StorageSnapshotPtr & storage_snapshot, const RangesInDataParts * parts_ranges = nullptr)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks hacky, are you sure that we need this, after the fix for enable_shared_storage_snapshot_in_query == false?

Comment thread .claude/learnings.md Outdated
@@ -0,0 +1,2 @@
- `ReadFromMergeTree::initializePipeline` can replace `storage_snapshot->data` during `merge_tree_enable_remove_parts_from_snapshot_optimization`; when stripping `SnapshotData::parts`, preserve `SnapshotData::storage` or `IMergeTreeDataPart::clearCaches` can observe a dangling `data_part->storage`.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks related

The `ReadFromMergeTree.cpp` fix already preserves `SnapshotData::storage` when
stripping the snapshot, so the storage is kept alive via `storage_snapshot` which
outlives `parts_ranges` and `per_part_infos` due to member declaration order.
The extra `owned_data_storage` member is redundant.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member Author

Ok. Now the change is minimal. And it will be amazing if we finally nail it!

alexey-milovidov and others added 2 commits March 17, 2026 00:19
The test used `today()` for the non-expired row, but with randomized
`session_timezone` settings (e.g. UTC-7), the Date value stored can be
yesterday in UTC terms, making the TTL `age + 1 day` expire when the
server evaluates it in UTC. Use a far-future date instead.

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=99483&sha=bc5fcd5658a549c3eb59b2d9c27fb4b9b2fc966c&name_0=PR

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clickhouse-gh

clickhouse-gh Bot commented Mar 24, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.10% 84.10% +0.00%
Functions 24.50% 24.50% +0.00%
Branches 76.60% 76.70% +0.10%

PR changed lines: PR changed-lines coverage: 100.00% (8/8, 0 noise lines excluded)
Diff coverage report
Uncovered code

@alexey-milovidov alexey-milovidov merged commit 9f827a6 into master Mar 26, 2026
151 of 152 checks passed
@alexey-milovidov alexey-milovidov deleted the fix-heap-use-after-free-merge-tree-read-task branch March 26, 2026 22:55
@robot-clickhouse robot-clickhouse added pr-synced-to-cloud The PR is synced to the cloud repo pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR labels Mar 26, 2026
robot-clickhouse added a commit that referenced this pull request Mar 26, 2026
robot-clickhouse added a commit that referenced this pull request Mar 26, 2026
robot-clickhouse added a commit that referenced this pull request Mar 26, 2026
robot-clickhouse added a commit that referenced this pull request Mar 26, 2026
@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Mar 27, 2026
clickhouse-gh Bot added a commit that referenced this pull request Mar 27, 2026
Backport #99483 to 26.3: Fix heap-use-after-free in MergeTreeReadTask::createReaders
nikitamikhaylov added a commit that referenced this pull request Mar 31, 2026
Backport #99483 to 26.1: Fix heap-use-after-free in MergeTreeReadTask::createReaders
nikitamikhaylov added a commit that referenced this pull request Mar 31, 2026
Backport #99483 to 26.2: Fix heap-use-after-free in MergeTreeReadTask::createReaders
ilejn pushed a commit to Altinity/ClickHouse that referenced this pull request May 27, 2026
…r-free-merge-tree-read-task

Fix heap-use-after-free in MergeTreeReadTask::createReaders

Signed-off-by: Ilya Golshtein <igolshtein@altinity.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-critical-bugfix pr-must-backport Pull request should be backported intentionally. Use this label with great care! pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants