Extend zmalloc_purge to also trim the libc main arena#3640
Conversation
When jemalloc is built with a prefix (the default for the bundled copy shipped under deps/jemalloc), libc-internal allocations made by glibc itself - for example from getaddrinfo(3), NSS lookups, pthread internals or stdio - are served by libc malloc rather than jemalloc and live in the main arena, growing the program break (the [heap] segment) via sbrk(2). That memory is invisible to jemalloc and is rarely returned to the OS automatically: a single live allocation near the top of the heap prevents glibc from shrinking the program break, so over long uptimes the [heap] segment can grow to many gigabytes while jemalloc itself reports a healthy footprint. This is what is reported in valkey-io#3517: a Valkey 8.1.6 cluster master with ~41 days of uptime where used_memory is ~2.6 GiB but used_memory_rss is ~9.4 GiB. jemalloc shows allocator_frag_ratio = 1.00 and is fully healthy; /proc/<pid>/smaps shows the [heap] segment grown to ~6.8 GiB, which exactly accounts for the rss_overhead. The codebase already had a zlibc_trim() helper that wraps malloc_trim(3) - the standard glibc primitive for releasing free pages of the main arena back to the OS - but the helper was only invoked from the Lua engine after lua_close. MEMORY PURGE only called jemalloc_purge(), leaving operators on glibc with no built-in way to reclaim the libc heap and confused by the fact that PURGE did nothing for the symptom they were observing. Wire zlibc_trim() into MEMORY PURGE so operators can reclaim the libc main arena on demand alongside jemalloc dirty pages, update the help text, and extend the memory doctor diagnostic for high process-RSS overhead to mention libc-internal allocations as a possible cause on glibc systems and to point at MEMORY PURGE. The helper is already a no-op on non-glibc builds, so this change has no effect on macOS, FreeBSD or musl-based Linux. Verified locally by running valkey-server under strace -e brk,madvise: MEMORY PURGE now triggers malloc_trim, which issues madvise(MADV_DONTNEED) on free chunks and calls brk() to lower the program break, shrinking the [heap] segment as expected. Fixes valkey-io#3517 Signed-off-by: Vaibhav Gupta <wweebbssss@gmail.com>
b081f09 to
444c348
Compare
enjoy-binbin
left a comment
There was a problem hiding this comment.
Make sense to me, thanks!
@bpint would you be able to take a look?
thank you guys very much for digging into this. I agree with the PR since I cannot see any harm. However, i must apologize for forgetting to report our progress with strace in our environment. we found abnormal brk() calls that are not caused by valkey itself, but by our modifications. So it was very likely a memory leak bug in our own codes instead of valkey oss. Since I do not consider it as a valkey bug, i will close issue #3517 to avoid any feature misleading. Sorry for the trouble. |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## unstable #3640 +/- ##
============================================
+ Coverage 76.66% 76.73% +0.07%
============================================
Files 162 162
Lines 80647 81021 +374
============================================
+ Hits 61825 62175 +350
- Misses 18822 18846 +24
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Looks good, but I wonder why zlibc_trim() is defined in zmalloc.c and declared in zmalloc.h but we don't call it anywhere in the repo. Maybe it was used in the past but there was a problem with it? 🤔
(...)
I see it was added by @enjoy-binbin in a7abc2f and it was called from freeLuaScriptsSync() but this seems to have become unused when in #2858 when Lua was moved to a separate module. It now calls malloc_trim() directly instead.
Address review feedback from @zuiderkwast on valkey-io#3640: instead of calling zlibc_trim() explicitly from the MEMORY PURGE handler, fold it into jemalloc_purge() itself. That way the libc main arena is also trimmed automatically when jemalloc_purge() is invoked from other code paths, which today means the synchronous variants of FLUSHDB and FLUSHALL in src/db.c. This also fixes the historical oddity that @zuiderkwast pointed out: zlibc_trim() was added in a7abc2f to be called from freeLuaScriptsSync(), but became dead code when valkey-io#2858 moved Lua to a separate module that calls malloc_trim(3) directly. zlibc_trim() is now reachable again through the single jemalloc_purge() entry point. The async paths (lazyfree-lazy-user-flush=yes, EMPTYDB_ASYNC) intentionally do not call jemalloc_purge() so they continue to skip the libc trim too - that is correct: malloc_trim(3) briefly locks every glibc arena and we don't want to do that on the main thread when the user explicitly asked for an async flush. Verified locally on Ubuntu 22.04 / glibc 2.35: MEMORY PURGE : [heap] 135168 -> 114688 bytes (-20480) FLUSHALL SYNC : [heap] 135168 -> 114688 bytes (-20480) FLUSHDB SYNC : [heap] 135168 -> 114688 bytes (-20480) FLUSHDB (async) : [heap] 135168 -> 135168 bytes (unchanged, as intended) The behaviour described by the MEMORY PURGE help text and by the memory doctor's high_proc_rss diagnostic is unchanged; both still mention the libc main arena and direct users at MEMORY PURGE. Signed-off-by: Vaibhav Gupta <wweebbssss@gmail.com>
zuiderkwast
left a comment
There was a problem hiding this comment.
Yeah, it's better in zmalloc, and now I have another idea about the function name. See the comment below.
I also complained a little about too long code comments. I think you should trust the AI agent less and remove some of the text it is writing. :)
Address the rest of @zuiderkwast's review on valkey-io#3640: - Rename jemalloc_purge() to zmallocTrim(). The name "jemalloc_purge" no longer fits now that the function also trims the libc main arena, and the new name matches the rest of the zmalloc.{c,h} abstraction. Updates the three call sites in src/db.c (x2) and src/object.c. - Consolidate the two #ifdef branches of the function into a single definition with #if defined(USE_JEMALLOC) around just the jemalloc- specific code, as sketched in the review. zlibc_trim() runs after the ifdef on every build. - Trim the comment above zlibc_trim(): keep the part explaining what libc-internal allocations are (getaddrinfo, NSS, pthread, stdio, modules) and why they end up in the [heap] segment, drop the part enumerating which commands call this code. - Drop the now-redundant call-site comment in the MEMORY PURGE handler. Behaviour is unchanged. Locally re-verified MEMORY PURGE, FLUSHALL SYNC and FLUSHDB SYNC each shrink the [heap] segment by the same amount as before the rename. Signed-off-by: Vaibhav Gupta <wweebbssss@gmail.com>
zuiderkwast
left a comment
There was a problem hiding this comment.
I will not block this PR. Thanks for this work!
@enjoy-binbin Will you merge it when you have time?
Rename zmallocTrim() to zmalloc_purge() (snake_case, matches the MEMORY PURGE command) across its declaration, definition and the call sites in db.c and object.c. Also shorten the zlibc_trim() comment. Signed-off-by: Vaibhav Gupta <wweebbssss@gmail.com>
📝 WalkthroughWalkthrough
Changeszmalloc_purge Refactor
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/db.c (1)
820-825: 🧹 Nitpick | 🔵 TrivialThe
USE_JEMALLOCguards are overly conservative and may warrant reconsideration.The
zmalloc_purge()function is now defined unconditionally and always callszlibc_trim()at the end, which trims the libc arena on glibc builds regardless of the jemalloc guard. The guards indb.c(lines 824, 849) prevent the function call entirely on non-jemalloc builds, despitezlibc_trim()providing value on glibc-only systems. Meanwhile, the unconditional call inobject.c:1927demonstrates the function is safe to call without guards. Consider removing the guards to allow libc arena trimming on all glibc builds during flush operations, or document why the conservative approach is intentional.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/db.c` around lines 820 - 825, The `USE_JEMALLOC` preprocessor guards around the `zmalloc_purge()` function call in the database flush operation are overly conservative. Since `zmalloc_purge()` is now defined unconditionally and safely calls `zlibc_trim()` for glibc arena trimming on all builds (as evidenced by its unconditional use in object.c), remove the `#if defined(USE_JEMALLOC)` and corresponding `#endif` directives that wrap the zmalloc_purge() call to allow the function to execute on all platforms. This enables libc arena trimming during flush operations on glibc-only systems where it provides value, not just on jemalloc builds.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/db.c`:
- Around line 820-825: The `USE_JEMALLOC` preprocessor guards around the
`zmalloc_purge()` function call in the database flush operation are overly
conservative. Since `zmalloc_purge()` is now defined unconditionally and safely
calls `zlibc_trim()` for glibc arena trimming on all builds (as evidenced by its
unconditional use in object.c), remove the `#if defined(USE_JEMALLOC)` and
corresponding `#endif` directives that wrap the zmalloc_purge() call to allow
the function to execute on all platforms. This enables libc arena trimming
during flush operations on glibc-only systems where it provides value, not just
on jemalloc builds.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro Plus
Run ID: c73eb716-1c81-453c-b727-f5f6dfd6a608
📒 Files selected for processing (4)
src/db.csrc/object.csrc/zmalloc.csrc/zmalloc.h
Summary
This PR makes
MEMORY PURGEalso release free pages from the libc main arena(the process
[heap]segment) back to the OS, in addition to purgingjemalloc dirty pages. It also extends the memory doctor diagnostic for high
process-RSS overhead and updates the
MEMORY HELPtext.Fixes #3517.
Why
When jemalloc is built with a prefix (the default for the bundled copy
shipped under
deps/jemalloc, configured with--with-jemalloc-prefix=je_indeps/Makefile), only Valkey's ownzmalloc/zfreecalls reach jemalloc. Any allocation that goes throughplain
malloc(3)— most notably libc-internal allocations made by glibcitself, e.g. from
getaddrinfo(3), NSS lookups, pthread internals orstdio — is served by libc malloc and lives in the main arena, growing the
program break (the
[heap]segment) viasbrk(2).That memory is invisible to jemalloc, and glibc rarely returns it to the
OS automatically:
sbrk(2)can only shrink the heap from the top, so asingle live allocation near the top of the heap is enough to pin a multi-
gigabyte segment. Over long uptimes the
[heap]segment can grow to manygigabytes while jemalloc itself reports a healthy footprint, which shows
up as a large
rss_overhead_bytesand a highmem_fragmentation_ratioinINFO MEMORY.This is exactly what is reported in #3517 on a Valkey 8.1.6 cluster master
with ~41 days of uptime:
used_memoryused_memory_rssmem_fragmentation_ratioallocator_frag_ratioallocator_residentrss_overhead_bytesThe submitted
/proc/<pid>/smapsshows a single[heap]mapping ofSize: 7110580 kB(~6.78 GiB), which exactly accounts for therss_overhead.The codebase already had a
zlibc_trim()helper that wrapsmalloc_trim(3)— the standard glibc primitive for releasing free pagesof the main arena back to the OS — but the helper was only invoked from
the Lua engine after
lua_close(seesrc/modules/lua/engine_lua.c).MEMORY PURGEonly calledjemalloc_purge(), leaving operators on glibcwith no built-in way to reclaim the libc heap and confused by the fact
that PURGE did nothing for the symptom they were observing.
What the change does
MEMORY PURGEnow also callszlibc_trim()afterjemalloc_purge().The two operations are independent, so the result code reported to the
client is still driven by
jemalloc_purge().MEMORY HELPtext forPURGEis updated to mention the libc mainarena behavior on glibc.
high_proc_rssdiagnostic, which currently onlyblames Lua scripts or modules, now also mentions libc-internal
allocations as a possible cause on glibc and suggests
MEMORY PURGE.The helper is already a no-op on non-glibc builds (gated on
__GLIBC__and
!USE_LIBCinsidezlibc_trim), so this change has no effect onmacOS, FreeBSD or musl-based Linux.
Verification
Built on Ubuntu 22.04 / glibc 2.35 / gcc 11.4. Ran the resulting
valkey-serverunderstrace -f -e trace=brk,madviseto confirm thatMEMORY PURGEnow drivesmalloc_trim(3):The
madvise(MADV_DONTNEED)calls release physical pages of freechunks; the final
brk(<lower>)lowers the program break — exactlywhat the reporter needs to reclaim those 6.8 GiB on demand. Activity in
this reproducer was generated by issuing many
CLUSTER MEETcommandsagainst bogus hostnames, which calls
getaddrinfo(3)and grows thelibc heap.
The
MEMORY HELPandMEMORY DOCTORoutputs were also exercisedmanually and look correct.
Notes for reviewers
invocation from
serverCronwas considered, butmalloc_trim(0)walks all glibc arenas and locks them briefly, so I'd rather not
change default cron behavior in the same patch. Happy to add it as a
follow-up (gated by a config option) if maintainers prefer.
call that is a no-op on non-glibc and only does work when there are
free pages to release.
tests/suite does notcover
MEMORY PURGEand a deterministic test formalloc_trimside effects would depend on libc-internal state. I am happy to add a
smoke test that just exercises the new code path if requested.