Skip to content

Fix possible overestimate memory tracking#73081

Merged
Algunenano merged 2 commits intoClickHouse:masterfrom
azat:mm-fix-accounting
Dec 11, 2024
Merged

Fix possible overestimate memory tracking#73081
Algunenano merged 2 commits intoClickHouse:masterfrom
azat:mm-fix-accounting

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Dec 10, 2024

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix possible overestimate memory tracking (difference between MemoryTracking and MemoryResident kept growing)

Before this patch the untracked_memory was dumped before reseting the current_thread which could lead to leaking some of untracked_memory in ThreadStatus::detachFromGroup 1.

In particular I found that the local_data (which is 136 bytes) can be "leaked" that way.

This pops up on one of clusters where I noticed that the difference between MemoryTracking and MemoryResident keeps growing over the time. the difference grows ~200KiB per second - 16.5GiB per day.

That cluster has tons MVs triggered from Buffer/Distributed (i.e. no query context), and this is indeed is the reason for leaking local_data, since in this case there is no thread group in the query, and it is created explicitly in InterpreterInsertQuery::buildChain 2.

Fixes: #71511
Fixes: #71906
Example of stacktrace: https://gist.github.com/azat/96a015046ebe72ddcb59bc89ca91a20e

@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-bugfix Pull request with bugfix, not backported by default label Dec 10, 2024
@azat azat force-pushed the mm-fix-accounting branch from 88bf437 to f7bd625 Compare December 10, 2024 16:51
@robot-clickhouse-ci-1
Copy link
Copy Markdown
Contributor

robot-clickhouse-ci-1 commented Dec 10, 2024

This is an automated comment for commit fb7cd9f with description of existing statuses. It's updated for the latest CI running

✅ Click here to open a full report in a separate page

Successful checks
Check nameDescriptionStatus
Docs checkBuilds and tests the documentation✅ success

Before this patch the untracked_memory was dumped before reseting the
current_thread which could lead to leaking some of untracked_memory in
`ThreadStatus::detachFromGroup` [1].

In particular I found that the local_data (which is 136 bytes) can be
"leaked" that way.

  [1]: https://github.com/ClickHouse/ClickHouse/blob/8999fdd95ea98fa083133d5cdf64ee149cd7f864/src/Interpreters/ThreadStatusExt.cpp#L292-L306

This pops up on one of clusters where I noticed that the difference
between MemoryTracking and MemoryResident keeps growing over the time.
the difference grows ~200KiB per second - 16.5GiB per day.

That cluster has tons MVs triggered from Buffer/Distributed (i.e. now
query context), and this is indeed is the reason for leaking local_data,
since in this case there is no thread group in the query, and it is
created explicitly in `InterpreterInsertQuery::buildChain` [1].

  [1]: https://github.com/ClickHouse/ClickHouse/blob/20ccb638ba00103f116855afbc473950ef2b867b/src/Interpreters/InterpreterInsertQuery.cpp#L321

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
@Algunenano Algunenano self-assigned this Dec 11, 2024
@Algunenano Algunenano added the pr-must-backport Pull request should be backported intentionally. Use this label with great care! label Dec 11, 2024
@Algunenano Algunenano enabled auto-merge December 11, 2024 13:10
@Algunenano Algunenano added this pull request to the merge queue Dec 11, 2024
Merged via the queue into ClickHouse:master with commit 8e4bd54 Dec 11, 2024
robot-clickhouse added a commit that referenced this pull request Dec 11, 2024
Cherry pick #73081 to 24.3: Fix possible overestimate memory tracking
robot-clickhouse added a commit that referenced this pull request Dec 11, 2024
Cherry pick #73081 to 24.8: Fix possible overestimate memory tracking
robot-clickhouse added a commit that referenced this pull request Dec 11, 2024
Cherry pick #73081 to 24.9: Fix possible overestimate memory tracking
robot-clickhouse added a commit that referenced this pull request Dec 11, 2024
Cherry pick #73081 to 24.10: Fix possible overestimate memory tracking
robot-clickhouse added a commit that referenced this pull request Dec 11, 2024
Cherry pick #73081 to 24.11: Fix possible overestimate memory tracking
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Dec 11, 2024
@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Dec 11, 2024
@robot-ch-test-poll robot-ch-test-poll added the pr-backports-created-cloud deprecated label, NOOP label Dec 11, 2024
robot-clickhouse-ci-1 added a commit that referenced this pull request Dec 11, 2024
Backport #73081 to 24.9: Fix possible overestimate memory tracking
robot-ch-test-poll1 added a commit that referenced this pull request Dec 11, 2024
Backport #73081 to 24.11: Fix possible overestimate memory tracking
Algunenano added a commit that referenced this pull request Dec 12, 2024
Backport #73081 to 24.8: Fix possible overestimate memory tracking
Algunenano added a commit that referenced this pull request Dec 12, 2024
Backport #73081 to 24.10: Fix possible overestimate memory tracking
Algunenano added a commit that referenced this pull request Dec 12, 2024
Backport #73081 to 24.3: Fix possible overestimate memory tracking
@azat azat deleted the mm-fix-accounting branch December 12, 2024 15:32
@azat azat mentioned this pull request Jun 21, 2025
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-backports-created-cloud deprecated label, NOOP pr-bugfix Pull request with bugfix, not backported by default pr-must-backport Pull request should be backported intentionally. Use this label with great care! pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Server memory usage increases gradually after upgrading from 24.7 to 24.10. Wrong MemoryTracking values prevents merges

7 participants