Skip to content

Subtract slab_reclaimable from kernel memory in cgroupv2 reader#100901

Merged
antonio2368 merged 3 commits intomasterfrom
cgroups-subtract-slab-reclaimable
Apr 2, 2026
Merged

Subtract slab_reclaimable from kernel memory in cgroupv2 reader#100901
antonio2368 merged 3 commits intomasterfrom
cgroups-subtract-slab-reclaimable

Conversation

@antonio2368
Copy link
Copy Markdown
Member

@antonio2368 antonio2368 commented Mar 27, 2026

The cgroupv2 memory usage calculation now uses anon + sock + (kernel - slab_reclaimable) instead of anon + sock + kernel. The slab_reclaimable portion is reclaimed synchronously by the kernel under memory pressure before invoking the OOM killer, so it should not count against the application's memory budget. This gives a more accurate picture of actual memory pressure.

On one of instances @azat saw that it can take significant portion of memory:

Process RSS: 8.47 GiB
anon: 7.75 GiB
kernel: 4.22 GiB (of which slab_reclaimable = 3.85 GiB)
MemoryTracking: 11.99 GiB (inflated ~42% vs real RSS)

The 3.85 GiB of reclaimable slab was dominated by ext4_inode_cache (with 56% usage only).

This inflated tracker causes premature MEMORY_LIMIT_EXCEEDED errors.

Cgroups v1 is not affected — its rss field excludes kernel memory.

Refs: #82036, #83981

Also refactors readMetricsFromStatFile to populate a caller-owned map of individual metric values instead of returning a sum, allowing callers to apply their own aggregation logic.

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Cgroupv2 memory tracking now excludes slab_reclaimable from kernel memory, giving a more accurate measure of non-reclaimable memory usage.

cc @al13n321 as you added kernel for page cache.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

The cgroupv2 memory usage calculation now uses `anon + sock + (kernel - slab_reclaimable)`
instead of `anon + sock + kernel`. The `slab_reclaimable` portion is reclaimed synchronously
by the kernel under memory pressure before invoking the OOM killer, so it should not count
against the application's memory budget.

Also refactors `readMetricsFromStatFile` to populate a caller-owned map of individual metric
values instead of returning a sum, allowing callers to apply their own aggregation logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 27, 2026

Workflow [PR], commit [ea7a8e3]

Summary:

job_name test_name status info comment
Stress test (arm_ubsan) failure
UndefinedBehaviorSanitizer: undefined behavior (STID: 7218-6809) FAIL cidb
Unit tests (asan_ubsan) error

AI Review

Summary

This PR updates cgroup v2 memory accounting to compute usage as anon + sock + max(kernel - slab_reclaimable, 0), and refactors stat parsing to reuse a caller-owned metrics map instead of summing directly. The implementation is consistent, thread-safe under existing locking, and covered by tests for both normal v2 data and older kernels without kernel/slab_reclaimable. I did not find correctness, safety, or performance issues that should block merge.

Missing context
  • ⚠️ Full CI results are not available yet (Fast test and the aggregate PR check are still pending), so this review is based on code inspection and currently available checks.
ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
Final Verdict
  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh bot added the pr-improvement Pull request with some product improvements label Mar 27, 2026
Use `const auto *` instead of `auto` for the iterator over
`std::initializer_list<std::string_view>`, matching the existing
style on line 88.

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100901&sha=b4ef919a6193782e42a3c463a9bc0a43c632ebb0&name_0=PR&name_1=Build%20%28arm_tidy%29

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@nikitamikhaylov
Copy link
Copy Markdown
Member

@azat was first #100898

@antonio2368
Copy link
Copy Markdown
Member Author

@azat my comment on your PR is this PR, see if some of the changes make more sense. I can close this PR.

Copy link
Copy Markdown
Member

@azat azat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with this version

@azat azat self-assigned this Mar 27, 2026
stat_buf.rewind();
return readMetricsFromStatFile(stat_buf, {"anon", "sock", "kernel"}, {"kernel"}, &warnings_printed);
readMetricsFromStatFile(
stat_buf, metrics, {"anon", "sock", "kernel", "slab_reclaimable"}, {"kernel", "slab_reclaimable"}, &warnings_printed);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICS slab_reclaimable was there from the beginning

Copy link
Copy Markdown
Member

@azat azat Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And TBH I don't think we need this separation for optional keys and mandatory keys, we already have flag to print warnings only once, so we can simply print it for any keys, that way we will also know that the kernel was old (or we have a bug)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are right

All keys (`kernel`, `slab_reclaimable`) have been in cgroupv2
`memory.stat` from the beginning, so there is no need for an
optional/mandatory distinction. Warn once on any missing key —
this also surfaces old-kernel or buggy environments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 30, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.20% 84.20% +0.00%
Functions 90.90% 90.90% +0.00%
Branches 76.80% 76.70% -0.10%

Changed lines: 92.63% (88/95) · Uncovered code

Full report · Diff report

@azat azat added the memory When memory usage is higher than expected label Mar 30, 2026
@azat
Copy link
Copy Markdown
Member

azat commented Mar 30, 2026

@azat
Copy link
Copy Markdown
Member

azat commented Mar 30, 2026

@antonio2368 let's also include details of how it can affect (you can get them from my PR)

@antonio2368
Copy link
Copy Markdown
Member Author

@azat do you mean to update PR description or the comment in code?

@azat
Copy link
Copy Markdown
Member

azat commented Mar 31, 2026

PR description

@antonio2368 antonio2368 added this pull request to the merge queue Apr 2, 2026
@antonio2368 antonio2368 added the pr-must-backport Pull request should be backported intentionally. Use this label with great care! label Apr 2, 2026
Merged via the queue into master with commit 4319487 Apr 2, 2026
161 of 164 checks passed
@antonio2368 antonio2368 deleted the cgroups-subtract-slab-reclaimable branch April 2, 2026 07:48
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Apr 2, 2026
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR label Apr 2, 2026
robot-clickhouse-ci-1 added a commit that referenced this pull request Apr 2, 2026
Cherry pick #100901 to 25.8: Subtract slab_reclaimable from kernel memory in cgroupv2 reader
robot-clickhouse added a commit that referenced this pull request Apr 2, 2026
robot-clickhouse-ci-1 added a commit that referenced this pull request Apr 2, 2026
Cherry pick #100901 to 26.1: Subtract slab_reclaimable from kernel memory in cgroupv2 reader
robot-clickhouse added a commit that referenced this pull request Apr 2, 2026
robot-clickhouse-ci-1 added a commit that referenced this pull request Apr 2, 2026
Cherry pick #100901 to 26.2: Subtract slab_reclaimable from kernel memory in cgroupv2 reader
robot-clickhouse added a commit that referenced this pull request Apr 2, 2026
robot-clickhouse-ci-1 added a commit that referenced this pull request Apr 2, 2026
Cherry pick #100901 to 26.3: Subtract slab_reclaimable from kernel memory in cgroupv2 reader
robot-clickhouse added a commit that referenced this pull request Apr 2, 2026
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Apr 2, 2026
clickhouse-gh bot added a commit that referenced this pull request Apr 2, 2026
Backport #100901 to 26.3: Subtract slab_reclaimable from kernel memory in cgroupv2 reader
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

memory When memory usage is higher than expected pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-improvement Pull request with some product improvements pr-must-backport Pull request should be backported intentionally. Use this label with great care! pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants