Skip to content

Extend zmalloc_purge to also trim the libc main arena#3640

Merged
enjoy-binbin merged 6 commits into
valkey-io:unstablefrom
webbsssss:fix-issue-3517-libc-heap-trim
Jun 22, 2026
Merged

Extend zmalloc_purge to also trim the libc main arena#3640
enjoy-binbin merged 6 commits into
valkey-io:unstablefrom
webbsssss:fix-issue-3517-libc-heap-trim

Conversation

@webbsssss

Copy link
Copy Markdown
Contributor

Summary

This PR makes MEMORY PURGE also release free pages from the libc main arena
(the process [heap] segment) back to the OS, in addition to purging
jemalloc dirty pages. It also extends the memory doctor diagnostic for high
process-RSS overhead and updates the MEMORY HELP text.

Fixes #3517.

Why

When jemalloc is built with a prefix (the default for the bundled copy
shipped under deps/jemalloc, configured with
--with-jemalloc-prefix=je_ in deps/Makefile), only Valkey's own
zmalloc/zfree calls reach jemalloc. Any allocation that goes through
plain malloc(3) — most notably libc-internal allocations made by glibc
itself, e.g. from getaddrinfo(3), NSS lookups, pthread internals or
stdio — is served by libc malloc and lives in the main arena, growing the
program break (the [heap] segment) via sbrk(2).

That memory is invisible to jemalloc, and glibc rarely returns it to the
OS automatically: sbrk(2) can only shrink the heap from the top, so a
single live allocation near the top of the heap is enough to pin a multi-
gigabyte segment. Over long uptimes the [heap] segment can grow to many
gigabytes while jemalloc itself reports a healthy footprint, which shows
up as a large rss_overhead_bytes and a high mem_fragmentation_ratio in
INFO MEMORY.

This is exactly what is reported in #3517 on a Valkey 8.1.6 cluster master
with ~41 days of uptime:

Metric Value
used_memory 2.59 GiB
used_memory_rss 9.43 GiB
mem_fragmentation_ratio 3.64
allocator_frag_ratio 1.00 (jemalloc is healthy)
allocator_resident 2.64 GiB
rss_overhead_bytes 6.79 GiB

The submitted /proc/<pid>/smaps shows a single [heap] mapping of
Size: 7110580 kB (~6.78 GiB), which exactly accounts for the
rss_overhead.

The codebase already had a zlibc_trim() helper that wraps
malloc_trim(3) — the standard glibc primitive for releasing free pages
of the main arena back to the OS — but the helper was only invoked from
the Lua engine after lua_close (see src/modules/lua/engine_lua.c).
MEMORY PURGE only called jemalloc_purge(), leaving operators on glibc
with no built-in way to reclaim the libc heap and confused by the fact
that PURGE did nothing for the symptom they were observing.

What the change does

  1. MEMORY PURGE now also calls zlibc_trim() after jemalloc_purge().
    The two operations are independent, so the result code reported to the
    client is still driven by jemalloc_purge().
  2. The MEMORY HELP text for PURGE is updated to mention the libc main
    arena behavior on glibc.
  3. The memory doctor high_proc_rss diagnostic, which currently only
    blames Lua scripts or modules, now also mentions libc-internal
    allocations as a possible cause on glibc and suggests MEMORY PURGE.

The helper is already a no-op on non-glibc builds (gated on __GLIBC__
and !USE_LIBC inside zlibc_trim), so this change has no effect on
macOS, FreeBSD or musl-based Linux.

Verification

Built on Ubuntu 22.04 / glibc 2.35 / gcc 11.4. Ran the resulting
valkey-server under strace -f -e trace=brk,madvise to confirm that
MEMORY PURGE now drives malloc_trim(3):

[heap] size before activity: 135168 bytes
[heap] size after libc-heap-touching activity: 135168 bytes
--- MEMORY PURGE ---
OK
[heap] size after PURGE: 114688 bytes

10287 brk(0x5bbba6d19000)               = 0x5bbba6d19000
10287 madvise(0x750f8085f000, 4096,  MADV_DONTNEED) = 0
10287 madvise(0x750f80888000, 12288, MADV_DONTNEED) = 0
10287 madvise(0x750f8094b000, 12288, MADV_DONTNEED) = 0
10287 madvise(0x750f8081d000, 8192,  MADV_DONTNEED) = 0
10287 madvise(0x750f80865000, 16384, MADV_DONTNEED) = 0
10287 madvise(0x750f80828000, 24576, MADV_DONTNEED) = 0
10287 madvise(0x750f80810000, 49152, MADV_DONTNEED) = 0
10287 brk(0x5bbba6d14000)               = 0x5bbba6d14000

The madvise(MADV_DONTNEED) calls release physical pages of free
chunks; the final brk(<lower>) lowers the program break — exactly
what the reporter needs to reclaim those 6.8 GiB on demand. Activity in
this reproducer was generated by issuing many CLUSTER MEET commands
against bogus hostnames, which calls getaddrinfo(3) and grows the
libc heap.

The MEMORY HELP and MEMORY DOCTOR outputs were also exercised
manually and look correct.

Notes for reviewers

  • This intentionally keeps the change minimal and on-demand. A periodic
    invocation from serverCron was considered, but malloc_trim(0)
    walks all glibc arenas and locks them briefly, so I'd rather not
    change default cron behavior in the same patch. Happy to add it as a
    follow-up (gated by a config option) if maintainers prefer.
  • This should be safe to backport to 8.1.x and 8.0.x: it only adds a
    call that is a no-op on non-glibc and only does work when there are
    free pages to release.
  • No new tests are added because the existing tests/ suite does not
    cover MEMORY PURGE and a deterministic test for malloc_trim
    side effects would depend on libc-internal state. I am happy to add a
    smoke test that just exercises the new code path if requested.

@rainsupreme rainsupreme left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty safe and reasonable. Thanks for the improvement!

edit: do add DCO signoff though - take a look at the automated checks

When jemalloc is built with a prefix (the default for the bundled copy
shipped under deps/jemalloc), libc-internal allocations made by glibc
itself - for example from getaddrinfo(3), NSS lookups, pthread internals
or stdio - are served by libc malloc rather than jemalloc and live in
the main arena, growing the program break (the [heap] segment) via
sbrk(2). That memory is invisible to jemalloc and is rarely returned to
the OS automatically: a single live allocation near the top of the heap
prevents glibc from shrinking the program break, so over long uptimes
the [heap] segment can grow to many gigabytes while jemalloc itself
reports a healthy footprint.

This is what is reported in valkey-io#3517: a Valkey 8.1.6 cluster master with
~41 days of uptime where used_memory is ~2.6 GiB but used_memory_rss is
~9.4 GiB. jemalloc shows allocator_frag_ratio = 1.00 and is fully
healthy; /proc/<pid>/smaps shows the [heap] segment grown to ~6.8 GiB,
which exactly accounts for the rss_overhead.

The codebase already had a zlibc_trim() helper that wraps malloc_trim(3)
- the standard glibc primitive for releasing free pages of the main
arena back to the OS - but the helper was only invoked from the Lua
engine after lua_close. MEMORY PURGE only called jemalloc_purge(),
leaving operators on glibc with no built-in way to reclaim the libc
heap and confused by the fact that PURGE did nothing for the symptom
they were observing.

Wire zlibc_trim() into MEMORY PURGE so operators can reclaim the libc
main arena on demand alongside jemalloc dirty pages, update the help
text, and extend the memory doctor diagnostic for high process-RSS
overhead to mention libc-internal allocations as a possible cause on
glibc systems and to point at MEMORY PURGE.

The helper is already a no-op on non-glibc builds, so this change has
no effect on macOS, FreeBSD or musl-based Linux. Verified locally by
running valkey-server under strace -e brk,madvise: MEMORY PURGE now
triggers malloc_trim, which issues madvise(MADV_DONTNEED) on free
chunks and calls brk() to lower the program break, shrinking the
[heap] segment as expected.

Fixes valkey-io#3517

Signed-off-by: Vaibhav Gupta <wweebbssss@gmail.com>
@webbsssss webbsssss force-pushed the fix-issue-3517-libc-heap-trim branch from b081f09 to 444c348 Compare May 8, 2026 09:37

@enjoy-binbin enjoy-binbin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense to me, thanks!
@bpint would you be able to take a look?

@bpint

bpint commented May 11, 2026

Copy link
Copy Markdown
Contributor

Make sense to me, thanks! @bpint would you be able to take a look?

thank you guys very much for digging into this. I agree with the PR since I cannot see any harm.

However, i must apologize for forgetting to report our progress with strace in our environment. we found abnormal brk() calls that are not caused by valkey itself, but by our modifications. So it was very likely a memory leak bug in our own codes instead of valkey oss.

Since I do not consider it as a valkey bug, i will close issue #3517 to avoid any feature misleading. Sorry for the trouble.

@codecov

codecov Bot commented May 11, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.73%. Comparing base (418c98b) to head (6f44d5f).
⚠️ Report is 76 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3640      +/-   ##
============================================
+ Coverage     76.66%   76.73%   +0.07%     
============================================
  Files           162      162              
  Lines         80647    81021     +374     
============================================
+ Hits          61825    62175     +350     
- Misses        18822    18846      +24     
Files with missing lines Coverage Δ
src/db.c 94.99% <ø> (+0.42%) ⬆️
src/object.c 91.38% <100.00%> (-1.26%) ⬇️
src/zmalloc.c 85.60% <100.00%> (+0.99%) ⬆️

... and 39 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@enjoy-binbin enjoy-binbin requested a review from zuiderkwast May 11, 2026 04:43

@zuiderkwast zuiderkwast left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but I wonder why zlibc_trim() is defined in zmalloc.c and declared in zmalloc.h but we don't call it anywhere in the repo. Maybe it was used in the past but there was a problem with it? 🤔

(...)

I see it was added by @enjoy-binbin in a7abc2f and it was called from freeLuaScriptsSync() but this seems to have become unused when in #2858 when Lua was moved to a separate module. It now calls malloc_trim() directly instead.

Comment thread src/object.c Outdated
Address review feedback from @zuiderkwast on valkey-io#3640: instead of calling
zlibc_trim() explicitly from the MEMORY PURGE handler, fold it into
jemalloc_purge() itself. That way the libc main arena is also trimmed
automatically when jemalloc_purge() is invoked from other code paths,
which today means the synchronous variants of FLUSHDB and FLUSHALL in
src/db.c.

This also fixes the historical oddity that @zuiderkwast pointed out:
zlibc_trim() was added in a7abc2f to be called from freeLuaScriptsSync(),
but became dead code when valkey-io#2858 moved Lua to a separate module that calls
malloc_trim(3) directly. zlibc_trim() is now reachable again through the
single jemalloc_purge() entry point.

The async paths (lazyfree-lazy-user-flush=yes, EMPTYDB_ASYNC) intentionally
do not call jemalloc_purge() so they continue to skip the libc trim too -
that is correct: malloc_trim(3) briefly locks every glibc arena and we
don't want to do that on the main thread when the user explicitly asked
for an async flush.

Verified locally on Ubuntu 22.04 / glibc 2.35:

  MEMORY PURGE    : [heap] 135168 -> 114688 bytes (-20480)
  FLUSHALL SYNC   : [heap] 135168 -> 114688 bytes (-20480)
  FLUSHDB SYNC    : [heap] 135168 -> 114688 bytes (-20480)
  FLUSHDB (async) : [heap] 135168 -> 135168 bytes (unchanged, as intended)

The behaviour described by the MEMORY PURGE help text and by the memory
doctor's high_proc_rss diagnostic is unchanged; both still mention the
libc main arena and direct users at MEMORY PURGE.

Signed-off-by: Vaibhav Gupta <wweebbssss@gmail.com>

@zuiderkwast zuiderkwast left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's better in zmalloc, and now I have another idea about the function name. See the comment below.

I also complained a little about too long code comments. I think you should trust the AI agent less and remove some of the text it is writing. :)

Comment thread src/zmalloc.c Outdated
Comment thread src/object.c Outdated
Comment thread src/zmalloc.c Outdated
Address the rest of @zuiderkwast's review on valkey-io#3640:

  - Rename jemalloc_purge() to zmallocTrim(). The name "jemalloc_purge"
    no longer fits now that the function also trims the libc main arena,
    and the new name matches the rest of the zmalloc.{c,h} abstraction.
    Updates the three call sites in src/db.c (x2) and src/object.c.

  - Consolidate the two #ifdef branches of the function into a single
    definition with #if defined(USE_JEMALLOC) around just the jemalloc-
    specific code, as sketched in the review. zlibc_trim() runs after
    the ifdef on every build.

  - Trim the comment above zlibc_trim(): keep the part explaining what
    libc-internal allocations are (getaddrinfo, NSS, pthread, stdio,
    modules) and why they end up in the [heap] segment, drop the part
    enumerating which commands call this code.

  - Drop the now-redundant call-site comment in the MEMORY PURGE handler.

Behaviour is unchanged. Locally re-verified MEMORY PURGE, FLUSHALL SYNC
and FLUSHDB SYNC each shrink the [heap] segment by the same amount as
before the rename.

Signed-off-by: Vaibhav Gupta <wweebbssss@gmail.com>
Comment thread src/zmalloc.h Outdated

@zuiderkwast zuiderkwast left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will not block this PR. Thanks for this work!

@enjoy-binbin Will you merge it when you have time?

@zuiderkwast zuiderkwast moved this from Todo to Needs Review in Valkey 10 Jun 9, 2026
@github-project-automation github-project-automation Bot moved this to Todo in Valkey 10 Jun 9, 2026
Comment thread src/zmalloc.c Outdated
Rename zmallocTrim() to zmalloc_purge() (snake_case, matches the MEMORY PURGE command) across its declaration, definition and the call sites in db.c and object.c. Also shorten the zlibc_trim() comment.

Signed-off-by: Vaibhav Gupta <wweebbssss@gmail.com>
@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

jemalloc_purge() is renamed to zmalloc_purge() in zmalloc.h and zmalloc.c. The new implementation unconditionally calls zlibc_trim() (via malloc_trim(3)) after the optional jemalloc arena purge. All call sites in db.c and object.c are updated, along with related help strings and the memory-doctor diagnostic message.

Changes

zmalloc_purge Refactor

Layer / File(s) Summary
zmalloc_purge() implementation and public header
src/zmalloc.h, src/zmalloc.c
Removes jemalloc_purge() from the USE_JEMALLOC block and introduces zmalloc_purge(), which runs jemalloc arena purging via je_mallctl when available and then unconditionally calls zlibc_trim(). Updates the comment on zlibc_trim() to clarify it applies even in jemalloc builds. Renames the public header declaration from jemalloc_purge to zmalloc_purge.
Call site updates and documentation
src/db.c, src/object.c
Switches the purge calls in flushAllDataAndResetRDB() and flushdbCommand() from jemalloc_purge() to zmalloc_purge(). Updates the MEMORY PURGE command handler and its help text to reference the new function and glibc malloc_trim(3) behavior. Revises the memory-doctor "High process RSS overhead" message to list glibc-specific causes (libc main arena retention via getaddrinfo(3), pthread/NSS internals).

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description check ✅ Passed The description is comprehensive and directly related to the changeset, explaining the motivation, implementation details, verification, and implications of the changes across all modified files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title accurately describes the main change: extending zmalloc_purge to also trim the libc main arena, which is the core purpose of this PR.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/db.c (1)

820-825: 🧹 Nitpick | 🔵 Trivial

The USE_JEMALLOC guards are overly conservative and may warrant reconsideration.

The zmalloc_purge() function is now defined unconditionally and always calls zlibc_trim() at the end, which trims the libc arena on glibc builds regardless of the jemalloc guard. The guards in db.c (lines 824, 849) prevent the function call entirely on non-jemalloc builds, despite zlibc_trim() providing value on glibc-only systems. Meanwhile, the unconditional call in object.c:1927 demonstrates the function is safe to call without guards. Consider removing the guards to allow libc arena trimming on all glibc builds during flush operations, or document why the conservative approach is intentional.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/db.c` around lines 820 - 825, The `USE_JEMALLOC` preprocessor guards
around the `zmalloc_purge()` function call in the database flush operation are
overly conservative. Since `zmalloc_purge()` is now defined unconditionally and
safely calls `zlibc_trim()` for glibc arena trimming on all builds (as evidenced
by its unconditional use in object.c), remove the `#if defined(USE_JEMALLOC)`
and corresponding `#endif` directives that wrap the zmalloc_purge() call to
allow the function to execute on all platforms. This enables libc arena trimming
during flush operations on glibc-only systems where it provides value, not just
on jemalloc builds.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/db.c`:
- Around line 820-825: The `USE_JEMALLOC` preprocessor guards around the
`zmalloc_purge()` function call in the database flush operation are overly
conservative. Since `zmalloc_purge()` is now defined unconditionally and safely
calls `zlibc_trim()` for glibc arena trimming on all builds (as evidenced by its
unconditional use in object.c), remove the `#if defined(USE_JEMALLOC)` and
corresponding `#endif` directives that wrap the zmalloc_purge() call to allow
the function to execute on all platforms. This enables libc arena trimming
during flush operations on glibc-only systems where it provides value, not just
on jemalloc builds.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: c73eb716-1c81-453c-b727-f5f6dfd6a608

📥 Commits

Reviewing files that changed from the base of the PR and between 418c98b and 6f44d5f.

📒 Files selected for processing (4)
  • src/db.c
  • src/object.c
  • src/zmalloc.c
  • src/zmalloc.h

@enjoy-binbin enjoy-binbin changed the title Fix #3517: trim libc main arena from MEMORY PURGE Extend zmalloc_purge to also trim the libc main arena Jun 22, 2026
@enjoy-binbin enjoy-binbin merged commit bee30d4 into valkey-io:unstable Jun 22, 2026
64 checks passed
@zuiderkwast zuiderkwast moved this from Needs Review to Merged in Valkey 10 Jun 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Merged

Development

Successfully merging this pull request may close these issues.

[BUG] High used_memory_rss and mem_fragmentation_ratio

5 participants