Skip to content

Disable enable_shared_storage_snapshot_in_query (leads to memory leaks)#90770

Merged
azat merged 1 commit into
ClickHouse:masterfrom
azat:disable-enable_shared_storage_snapshot_in_query
Nov 25, 2025
Merged

Disable enable_shared_storage_snapshot_in_query (leads to memory leaks)#90770
azat merged 1 commit into
ClickHouse:masterfrom
azat:disable-enable_shared_storage_snapshot_in_query

Conversation

@azat

@azat azat commented Nov 24, 2025

Copy link
Copy Markdown
Member

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Disable enable_shared_storage_snapshot_in_query (leads to memory leaks)

Some functions stores a copy of ContextPtr (not a weak_ptr as it should be, but shared_ptr), i.e. modulo, but this IFunction itself can be stored in KeyDescription, which is part of StorageInMemoryMetadata.

And with metadata cache (enable_shared_storage_snapshot_in_query=1) this will create memory leaks (since this Context will never be destroyed)

As @Michicosun has been told, this cache should never be part of Context (but we abuse it often).

(Another issue is IFunction holds a ContextPtr not a ContextWeakPtr, but this is another story)

Anyway, let's disable it for now (backported to 25.11 is a must).

Repro
drop table if exists test;
create table test (id Int, key String) engine=MergeTree order by id % 10;
insert into test select number, number::String from numbers(3);
alter table test modify comment 'new description';
-- to gracefully terminate the server
quit

And terminate the server - it will trigger same ASan report

Reverts: #82634
Refs: #79471 (cc @amosbird )

Some functions stores a copy of `ContextPtr` (not a `weak_ptr` as it
should be, but `shared_ptr`), i.e. [`modulo`](https://github.com/ClickHouse/ClickHouse/blob/407e33a47cf707155d1c4f3f12d2bb2291c80009/src/Functions/FunctionBinaryArithmetic.h#L827),
but this `IFunction` itself can be stored in `KeyDescription`, which is
part of `StorageInMemoryMetadata`.

And with metadata cache (`enable_shared_storage_snapshot_in_query=1`)
this will create memory leaks (since this `Context` will never be
destroyed)

As @Michicosun has been told, this cache should never be part of
`Context` (but we abuse it often).

(Another issue is `IFunction` holds a `ContextPtr` not a
`ContextWeakPtr`, but this is another story)

Anyway, let's disable it for now (backported to 25.11 is a must).
enable_scalar_subquery_optimization 1
enable_scopes_for_with_statement 1
enable_shared_storage_snapshot_in_query 1
enable_shared_storage_snapshot_in_query 0

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Algunenano hope this is OK

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, rewriting the history for critical bugs is ok. The other option would be to set it to false, false in 25.11 and again in 25.12. But since 25.11 is unreleased it's better to just disable it

@clickhouse-gh

clickhouse-gh Bot commented Nov 24, 2025

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [6c3ae9b]

Summary:

job_name test_name status info comment
Bugfix validation (functional tests) failure
Tests failure
Integration tests (arm_binary, distributed plan, 2/4) failure
test_refreshable_mat_view/test.py::test_long_query_cancel FAIL cidb
BuzzHouse (amd_debug) failure
Logical error: 'Inconsistent AST formatting: the query: FAIL cidb
BuzzHouse (amd_ubsan) failure
/home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/src/IO/VarInt.h:32:5: runtime error: store to null pointer of type 'char' FAIL cidb

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Nov 24, 2025
@alexey-milovidov

Copy link
Copy Markdown
Member

Let's ask @amosbird.

@alexey-milovidov

Copy link
Copy Markdown
Member

Interesting, what is the metadata cache, and how does the Context from queries appear to be shared there?

@azat

azat commented Nov 24, 2025

Copy link
Copy Markdown
Member Author

Interesting, what is the metadata cache

Maybe we don't need it at all, but StorageSnapshot (second cache) also has StorageMetadataPtr

and how does the Context from queries appear to be shared there?

This is also a bug in functions -

It should use WithContext, which is internally ContextWeakPtr

But this is not the only example (UDF AFAICS also has some issues)

So no need to hurry here, let's disable it by default, and fix all the problems

@Algunenano Algunenano self-assigned this Nov 24, 2025
@azat azat enabled auto-merge November 25, 2025 06:56
@azat azat added this pull request to the merge queue Nov 25, 2025
Merged via the queue into ClickHouse:master with commit dd868b0 Nov 25, 2025
125 of 130 checks passed
@azat azat deleted the disable-enable_shared_storage_snapshot_in_query branch November 25, 2025 07:19
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Nov 25, 2025
@amosbird

Copy link
Copy Markdown
Collaborator

Interesting, what is the metadata cache

Some parts of the analyzer use storage_snapshot->metadata, while others use getInMemoryMetadataPtr(). We also pass getInMemoryMetadataPtr() as an argument to getStorageSnapshot(). To make getInMemoryMetadataPtr() consistent across query, storage_metadata_cache was introduced, serving the same role as storage_snapshot_cache.

Maybe we don't need it at all, but StorageSnapshot (second cache) also has StorageMetadataPtr

It's interesting that without storage_metadata_cache, I cannot reproduce the issue.

@azat

azat commented Nov 25, 2025

Copy link
Copy Markdown
Member Author

It's interesting that without storage_metadata_cache, I cannot reproduce the issue.

+1, but maybe it is just pure luck, or maybe it uses correct context, but anyway, it is pure luck

@amosbird

Copy link
Copy Markdown
Collaborator

it is pure luck

Indeed. Fortunately, the storage_metadata_cache was added; otherwise, this issue could have gone unnoticed for a very long time.

robot-clickhouse added a commit that referenced this pull request Nov 25, 2025
@robot-clickhouse robot-clickhouse added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Nov 26, 2025
clickhouse-gh Bot added a commit that referenced this pull request Nov 26, 2025
Backport #90770 to 25.11: Disable enable_shared_storage_snapshot_in_query (leads to memory leaks)
@rschu1ze

rschu1ze commented Dec 1, 2025

Copy link
Copy Markdown
Member

@azat I think an issue to re-enable the setting will be good to have (just to not forget).

@azat

azat commented Dec 1, 2025

Copy link
Copy Markdown
Member Author

There is a private issue (see referenced issues in this PR)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-bugfix Pull request with bugfix, not backported by default pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants