Use information from cgroup (if applicable) to adjust memory tracker#83981

Merged
azat merged 1 commit into ClickHouse:master from azat:memory-worker-rss
Jul 19, 2025

@azat azat commented Jul 18, 2025

Note: although this is marked as an improvement, it can be considered a bug fix.

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Use information from cgroup (if applicable, i.e. `memory_worker_use_cgroup` is enabled and cgroups are available) to adjust the memory tracker (`memory_worker_correct_memory_tracker`).

Right now `memory_worker_correct_memory_tracker` always uses information from jemalloc to update `MemoryTracking`, but this may not be good enough in some cases (e.g. when the server has requested more memory than it uses) and may lead to `MEMORY_LIMIT_EXCEEDED` errors with `MemoryTracking` > `RSS`.

So use information from cgroup if applicable, and if not, use information from `jemalloc.allocated`.

Refs: #82036
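
The selection rule described above can be sketched as follows. This is only an illustration of the fallback order, not the code from this PR: `MemorySources`, `pickCorrectionValue`, and the `use_cgroup` flag are hypothetical names standing in for the `memory_worker_use_cgroup` setting and the two readings.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical snapshot of the two candidate sources (names are illustrative).
struct MemorySources
{
    std::optional<uint64_t> cgroup_bytes;   // empty when cgroups are unavailable
    uint64_t jemalloc_allocated_bytes = 0;  // value of jemalloc's "allocated" stat
};

// Choose the value used to correct the memory tracker:
// prefer the cgroup reading when it is enabled and available,
// otherwise fall back to jemalloc.allocated.
uint64_t pickCorrectionValue(const MemorySources & sources, bool use_cgroup)
{
    if (use_cgroup && sources.cgroup_bytes.has_value())
        return *sources.cgroup_bytes;
    return sources.jemalloc_allocated_bytes;
}

// Usage example with made-up byte counts.
void exampleUsage()
{
    MemorySources with_cgroup;
    with_cgroup.cgroup_bytes = 800;
    with_cgroup.jemalloc_allocated_bytes = 1000;
    assert(pickCorrectionValue(with_cgroup, true) == 800);    // cgroup reading wins
    assert(pickCorrectionValue(with_cgroup, false) == 1000);  // setting disabled

    MemorySources without_cgroup;
    without_cgroup.jemalloc_allocated_bytes = 1000;
    assert(pickCorrectionValue(without_cgroup, true) == 1000);  // no cgroup data
}
```

Modelling the cgroup reading as an `std::optional` keeps the "if applicable" condition explicit: the fallback to `jemalloc.allocated` happens both when the setting is off and when no cgroup data exists.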

@azat azat requested a review from antonio2368 July 18, 2025 13:46
clickhouse-gh bot commented Jul 18, 2025

Workflow [PR], commit [80888be]

Summary:

| job_name | test_name | status | info | comment |
|---|---|---|---|---|
| Stateless tests (amd_binary, old analyzer, s3 storage, DatabaseReplicated, 2/2) | | failure | | |
| | Start ClickHouse Server | failure | | |
| Stateless tests (amd_tsan, 3/3) | | failure | | |
| | 00900_long_parquet | FAIL | | |
| Stress test (amd_tsan) | | failure | | |
| | Server died | FAIL | | |
| | Hung check failed, possible deadlock found (see hung_check.log) | FAIL | | |
| | Cannot start clickhouse-server | FAIL | | |
| | Server failed to start (see application_errors.txt and clickhouse-server.clean.log) | FAIL | | |
| | Killed by signal (in clickhouse-server.log) | FAIL | | |
| | Fatal message in clickhouse-server.log (see fatal_messages.txt) | FAIL | | |
| | Killed by signal (output files) | FAIL | | |
| | Found signal in gdb.log | FAIL | | |

@clickhouse-gh clickhouse-gh bot added the pr-improvement Pull request with some product improvements label Jul 18, 2025

azat commented Jul 19, 2025

@azat azat enabled auto-merge July 19, 2025 07:08
@azat azat added this pull request to the merge queue Jul 19, 2025
@azat azat added the pr-must-backport Pull request should be backported intentionally. Use this label with great care! label Jul 19, 2025
Merged via the queue into ClickHouse:master with commit 350c75b Jul 19, 2025
118 of 125 checks passed
@azat azat deleted the memory-worker-rss branch July 19, 2025 07:25
@robot-ch-test-poll1 robot-ch-test-poll1 added pr-synced-to-cloud The PR is synced to the cloud repo pr-backports-created-cloud deprecated label, NOOP pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR labels Jul 19, 2025
robot-ch-test-poll added a commit that referenced this pull request Jul 19, 2025
Cherry pick #83981 to 25.3: Use information from cgroup (if applicable) to adjust memory tracker
robot-clickhouse added a commit that referenced this pull request Jul 19, 2025
robot-ch-test-poll added a commit that referenced this pull request Jul 19, 2025
Cherry pick #83981 to 25.4: Use information from cgroup (if applicable) to adjust memory tracker
robot-clickhouse added a commit that referenced this pull request Jul 19, 2025
robot-ch-test-poll added a commit that referenced this pull request Jul 19, 2025
Cherry pick #83981 to 25.5: Use information from cgroup (if applicable) to adjust memory tracker
robot-clickhouse added a commit that referenced this pull request Jul 19, 2025
robot-ch-test-poll added a commit that referenced this pull request Jul 19, 2025
Cherry pick #83981 to 25.6: Use information from cgroup (if applicable) to adjust memory tracker
robot-clickhouse added a commit that referenced this pull request Jul 19, 2025
robot-ch-test-poll added a commit that referenced this pull request Jul 19, 2025
Cherry pick #83981 to 25.7: Use information from cgroup (if applicable) to adjust memory tracker
robot-clickhouse added a commit that referenced this pull request Jul 19, 2025
clickhouse-gh bot added a commit that referenced this pull request Jul 19, 2025
Backport #83981 to 25.7: Use information from cgroup (if applicable) to adjust memory tracker
clickhouse-gh bot added a commit that referenced this pull request Jul 19, 2025
Backport #83981 to 25.6: Use information from cgroup (if applicable) to adjust memory tracker
azat added a commit that referenced this pull request Jul 19, 2025
Backport #83981 to 25.5: Use information from cgroup (if applicable) to adjust memory tracker
azat added a commit that referenced this pull request Jul 19, 2025
Backport #83981 to 25.4: Use information from cgroup (if applicable) to adjust memory tracker
azat added a commit that referenced this pull request Jul 19, 2025
Backport #83981 to 25.3: Use information from cgroup (if applicable) to adjust memory tracker
@robot-ch-test-poll robot-ch-test-poll added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Jul 19, 2025
azat added a commit to azat/ClickHouse that referenced this pull request Mar 27, 2026
The `CgroupsV2Reader` sums `anon + sock + kernel` from `memory.stat`
to feed `MemoryTracking`. But `kernel` includes `slab_reclaimable`
(dentry/inode cache), a filesystem metadata cache that the kernel drops
under memory pressure, functionally equivalent to page cache.

On one instance, it accounts for a significant portion of memory:

  Process RSS:      8.47 GiB
  anon:             7.75 GiB
  kernel:           4.22 GiB  (of which slab_reclaimable = 3.85 GiB)
  MemoryTracking:  11.99 GiB  (inflated ~42% vs real RSS)

The 3.85 GiB of reclaimable slab was dominated by `ext4_inode_cache`
(only 56% utilized).

This inflated tracker causes premature `MEMORY_LIMIT_EXCEEDED` errors.

Cgroups v1 is not affected — its `rss` field excludes kernel memory.

Refs: ClickHouse#82036, ClickHouse#83981
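
To make the arithmetic concrete, here is a minimal sketch that parses text in the `memory.stat` "key value" format and computes the `anon + sock + kernel` sum with and without `slab_reclaimable`. It assumes the adjustment is to exclude `slab_reclaimable` from `kernel`; the commit message describes the problem but does not spell out the exact change. All function names are hypothetical and this is not the actual `CgroupsV2Reader` code. The sample values are small made-up numbers, not the readings from the instance above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

using Stats = std::map<std::string, uint64_t>;

// Parse "key value" lines (the cgroup v2 memory.stat layout) into a map.
Stats parseMemoryStat(const std::string & text)
{
    Stats stats;
    std::istringstream in(text);
    std::string key;
    uint64_t value = 0;
    while (in >> key >> value)
        stats[key] = value;
    return stats;
}

// Return the value for a key, or 0 when the kernel does not report it.
uint64_t getOrZero(const Stats & stats, const std::string & key)
{
    auto it = stats.find(key);
    return it == stats.end() ? 0 : it->second;
}

// anon + sock + kernel, the sum described in the commit message.
uint64_t sumIncludingReclaimableSlab(const Stats & stats)
{
    return getOrZero(stats, "anon") + getOrZero(stats, "sock") + getOrZero(stats, "kernel");
}

// Same sum with slab_reclaimable taken out of kernel (assumed adjustment).
uint64_t sumExcludingReclaimableSlab(const Stats & stats)
{
    uint64_t kernel = getOrZero(stats, "kernel");
    uint64_t slab = getOrZero(stats, "slab_reclaimable");
    uint64_t non_reclaimable_kernel = kernel > slab ? kernel - slab : 0;  // avoid underflow
    assert(non_reclaimable_kernel <= kernel);
    return getOrZero(stats, "anon") + getOrZero(stats, "sock") + non_reclaimable_kernel;
}
```

With the input `"anon 775\nsock 2\nkernel 422\nslab_reclaimable 385\n"`, the first sum is 1199 and the second is 814. Applied to the figures reported above, 11.99 GiB / 8.47 GiB ≈ 1.42 gives the ~42% inflation, and 11.99 GiB − 3.85 GiB = 8.14 GiB would bring the tracked value back near the process RSS.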