TUI Native Memory Leak - RSS grows to 13+ GB after ~40 min active usage
Summary
Since the Apr 23–24 update (commits bd929ea5 and 67bfd4b8), the TUI frontend (Node.js Ink renderer) leaks native memory quickly: the diagnostics file written at the critical threshold reports a growth rate of ~17.7 GB/hour. The JS heap plateaus at ~2.7 GB while RSS climbs to 13+ GB; the process then freezes and is killed via SIGTERM within ~1 hour of active streaming.
The gateway backend remains healthy (~98 MB) throughout. The automatic heap-dump triggers are keyed to JS heapUsed, which is only a fraction of RSS here, so they do not track the actual leak.
Environment
- Hermes version: v0.11.0 / commit 34c3e671 (Apr 24 hotfix)
- Base commit: bf196a3f (v0.11.0 tag)
- OS: Fedora Linux 41 (Wayland, KDE)
- Node.js: v22.22.0
- Display: TUI (hermes --tui)
- Config defaults: thinking: expanded, tools: expanded, activity: hidden (from 67bfd4b8)
Reproduction Steps
- Start Hermes TUI (hermes --tui, as listed under Environment)
- Engage in a normal streaming conversation with tool calls and reasoning blocks
- Leave the thinking and tools sections expanded (default since Apr 24)
- Observe RSS (the command below refreshes every 5 s):
watch -n 5 'ps -o pid,rss,vsz,comm -p $(pgrep -f "ui-tui/dist/entry.js")'
Expected Behavior
RSS should stay under ~1 GB for indefinite usage. Occasional bumps during large streaming payloads are acceptable, but RSS should be stable between turns.
Actual Behavior
| Phase | Time | RSS (MB) | Notes |
|---|---|---|---|
| Start | t+0 | ~157 | Baseline |
| Idle/light | ~10 min | ~247 | Slow growth |
| Active streaming | ~20–40 min | ~525 → 6,066 | Accelerating |
| Peak | ~52 min | 13,978 | Process unresponsive |
| Crash | ~53 min | — | SIGTERM, auto-restart with new PID |
Two confirmed crash cycles (same day)
| PID | Start | Peak RSS | Duration before crash |
|---|---|---|---|
| 58836 (morning) | — | ~9.4 GB | ~20 min |
| 76262 (afternoon) | 14:03 | 13.978 GB | ~53 min |
Diagnostic Evidence
Heap dump .diagnostics.json at peak (auto-critical)
{
  "memoryUsage": {
    "arrayBuffers": 768803,
    "external": 21238571,
    "heapTotal": 2798071808,
    "heapUsed": 2728422704,
    "rss": 9572970496
  },
  "memoryGrowthRate": {
    "mbPerHour": 17684.5
  }
}
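RSS (~9.6 GB) far exceeds heapUsed (~2.7 GB), and external (~21 MB) and arrayBuffers (~0.8 MB) are negligible. Roughly 6.8 GB of resident memory (rss minus heapUsed) is therefore outside anything V8 reports: the growth is native, not JS heap.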
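The arithmetic behind that comparison can be checked with a few lines of TypeScript. This is only a sketch: unaccountedBytes is a hypothetical helper, not part of Hermes, and subtracting heapTotal and external from rss is a rough approximation, since rss also includes code pages, thread stacks and allocator overhead.

```typescript
// Shape of the "memoryUsage" object in the diagnostics file (values in bytes).
interface MemoryUsageSnapshot {
  rss: number;
  heapTotal: number;
  heapUsed: number;
  external: number;
  arrayBuffers: number;
}

// Resident bytes not explained by the V8 heap or by V8-tracked external
// allocations. Hypothetical helper for illustration only.
function unaccountedBytes(m: MemoryUsageSnapshot): number {
  return m.rss - m.heapTotal - m.external;
}

// Values copied from the .diagnostics.json in this report.
const snapshot: MemoryUsageSnapshot = {
  arrayBuffers: 768803,
  external: 21238571,
  heapTotal: 2798071808,
  heapUsed: 2728422704,
  rss: 9572970496,
};

const gapBytes = unaccountedBytes(snapshot);
const gapShare = gapBytes / snapshot.rss;
console.log(
  `unaccounted: ${(gapBytes / 1e9).toFixed(2)} GB ` +
    `(${(gapShare * 100).toFixed(0)}% of RSS)`
);
```

Even with the whole reserved heap (heapTotal) subtracted rather than just heapUsed, well over half of the resident set is unexplained by V8's own counters.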
Process comparison at peak
PID RSS COMMAND
76262 13,978 MB node /.../ui-tui/dist/entry.js ← leaking
58744 98 MB python -m hermes_cli.main gateway run ← stable
Suspected Root Cause
Primary suspect: bd929ea5 — Ink text measurement cache
perf(ink): cache text measurements across yoga flex re-passes
File: ui-tui/packages/hermes-ink/src/ink/dom.ts
The commit added _textMeasureCache to ink-text DOM elements, keyed by ${width}|${widthMode}. While bounded to 16 entries per node (FIFO eviction), the underlying Yoga layout system is backed by C++ WASM state. When the Ink reconciler tears down a subtree via freeRecursive() / clearYogaNodeReferences(), it nulls JS references but may leave:
- WASM text measurement buffers
- Yoga layout node C++ instances
- Cache generation counter objects that hold references
Each streaming update triggers markDirty() on expanded sections (default since 67bfd4b8), causing Yoga to re-measure. With continuous thinking + tools streaming, this becomes a fast leak.
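For readers unfamiliar with the pattern, a per-node cache of the kind described above might look roughly like the following. This is a minimal sketch, not the code from bd929ea5: the class name, the Measurement shape and the numeric WidthMode are assumptions; only the key format and the 16-entry FIFO bound come from this report.

```typescript
// Hypothetical stand-ins; the real types in hermes-ink may differ.
type WidthMode = number;
interface Measurement {
  width: number;
  height: number;
}

const MAX_ENTRIES = 16; // bound stated in this report

class TextMeasureCache {
  // A Map iterates in insertion order, which makes FIFO eviction simple.
  private entries = new Map<string, Measurement>();

  private static key(width: number, mode: WidthMode): string {
    return `${width}|${mode}`;
  }

  get(width: number, mode: WidthMode): Measurement | undefined {
    return this.entries.get(TextMeasureCache.key(width, mode));
  }

  set(width: number, mode: WidthMode, value: Measurement): void {
    const k = TextMeasureCache.key(width, mode);
    if (!this.entries.has(k) && this.entries.size >= MAX_ENTRIES) {
      // Evict the key that was inserted first.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(k, value);
  }

  get size(): number {
    return this.entries.size;
  }
}
```

A cache like this holds only plain JS values, so its own footprint is bounded. That fits the evidence that the JS heap is not where the growth is, and it is why the suspicion falls on native state reachable from the node rather than on the Map itself.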
Amplifier: 67bfd4b8 - expanded sections by default
Since Apr 24, the thinking: expanded and tools: expanded defaults greatly increase the number of Yoga measure/re-layout cycles per frame compared to the previous collapsed-by-default UI.
Additional Context
Heap dump misfire
memoryMonitor.ts triggers on JS heapUsed (high = 1.5 GB, critical = 2.5 GB). Because this leak is native, a process can reach 13+ GB RSS while the JS heap sits at ~2.7 GB. The monitor repeatedly writes ~2.5 GB .heapsnapshot files that have no diagnostic value for this bug, and disk usage in ~/.hermes/heapdumps/ grows to 24+ GB.
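One way to make a monitor notice this class of leak is to evaluate rss next to heapUsed and to skip heap snapshots when most of the memory is native. The sketch below is a hypothetical illustration, not the actual memoryMonitor.ts: the heap thresholds are the ones quoted in this report (interpreted here as decimal GB), while RSS_HIGH, RSS_CRITICAL, the 0.5 ratio and all names are assumptions chosen for the example.

```typescript
// Subset of what Node's process.memoryUsage() returns (bytes).
interface MemoryUsageLike {
  rss: number;
  heapUsed: number;
}

type Level = "ok" | "high" | "critical";

interface Verdict {
  level: Level;
  // True when the JS heap is less than half of RSS, i.e. a heap
  // snapshot is unlikely to explain the memory use.
  mostlyNative: boolean;
}

const GB = 1e9;
const HEAP_HIGH = 1.5 * GB; // from this report
const HEAP_CRITICAL = 2.5 * GB; // from this report
const RSS_HIGH = 2 * GB; // assumption, pick per deployment
const RSS_CRITICAL = 4 * GB; // assumption, pick per deployment

function levelFor(value: number, high: number, critical: number): Level {
  if (value >= critical) return "critical";
  if (value >= high) return "high";
  return "ok";
}

function assess(m: MemoryUsageLike): Verdict {
  const rank: Record<Level, number> = { ok: 0, high: 1, critical: 2 };
  const heapLevel = levelFor(m.heapUsed, HEAP_HIGH, HEAP_CRITICAL);
  const rssLevel = levelFor(m.rss, RSS_HIGH, RSS_CRITICAL);
  // Report the worse of the two signals.
  const level = rank[rssLevel] > rank[heapLevel] ? rssLevel : heapLevel;
  return { level, mostlyNative: m.heapUsed / m.rss < 0.5 };
}

// In a real monitor the input would come from process.memoryUsage().
```

With the values from the diagnostics file (rss 9572970496, heapUsed 2728422704), assess returns a critical level with mostlyNative set, which a monitor could use to log RSS and skip the multi-gigabyte snapshot.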
Gateway unaffected, crash log confirms TUI death
~/.hermes/logs/tui_gateway_crash.log shows the Python bridge still alive in its sys.stdin loop when SIGTERM is delivered. The Node parent dies first, leaving the Python subprocess orphaned.
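Related PRs checked
- 904f20d6 ("idle queue OOM fix") is already present. It fixed a JS-heap idle-time loop, not this native leak. The two bugs have different signatures and triggers.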
Possible Fixes (for discussion)
- Investigate clearYogaNodeReferences: ensure all WASM nodes are explicitly freed before their references are nulled. Check whether the yoga-layout WASM bindings need explicit free() calls.
- Invalidate _textMeasureCache before clearYogaNodeReferences: the cache is already cleared in clearYogaNodeReferences via _textMeasureCache = undefined, but if the WASM side still references the Map's entries, this does not help.
- Cap _textMeasureCache.entries growth: the cache is already limited to 16 entries, but it is keyed by ${width}|${widthMode}. If width probes are sparse, the cache may churn without ever hitting. Consider a global/shared cache with a TTL.
- Monitor RSS in memoryMonitor.ts: add rss alongside heapUsed to detect native leaks earlier.
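To make the first two items concrete, the teardown order under discussion could be sketched as follows. Everything here is an assumption for illustration: YogaNodeLike is a minimal interface modelled on the unsetMeasureFunc() and free() methods of the yoga-layout JavaScript bindings, TextElementLike and releaseTextElement are hypothetical, and the report does not establish whether hermes-ink already follows this order.

```typescript
// Minimal subset of a Yoga node, assumed for this sketch.
interface YogaNodeLike {
  unsetMeasureFunc(): void;
  free(): void;
}

// Hypothetical shape of an ink-text DOM element.
interface TextElementLike {
  yogaNode: YogaNodeLike | undefined;
  _textMeasureCache: Map<string, unknown> | undefined;
}

function releaseTextElement(el: TextElementLike): void {
  // 1. Empty the cache before dropping it, so nothing that still holds
  //    the Map can keep its entries alive.
  if (el._textMeasureCache) el._textMeasureCache.clear();
  el._textMeasureCache = undefined;

  const node = el.yogaNode;
  if (node) {
    // 2. Detach the JS measure callback from the native node.
    node.unsetMeasureFunc();
    // 3. Release the native node itself.
    node.free();
  }

  // 4. Only now drop the JS reference.
  el.yogaNode = undefined;
}
```

The point of the ordering is that native resources are released while a JS handle to them still exists; once the reference is nulled there is no way left to call free() on that node.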
Data Available
Full monitor log at ~/.hermes/logs/tui-rss-monitor.log:
- 30-second RSS samples across two crash cycles
- Format: pid,time,rss_kb,rss_mb,vsz_kb,command
- Captures the transition from PID 76262 (peak 13,978 MB) to new PID 81703
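A log in this format can be turned into a growth rate with a small script. The sketch below assumes the column order listed above and the fixed 30-second sampling interval; the function names are hypothetical, and the exact format of the time column is not relied on because it is not specified here.

```typescript
interface RssSample {
  pid: number;
  time: string;
  rssKb: number;
}

const SAMPLE_INTERVAL_S = 30; // sampling interval stated in this report

// Parse one "pid,time,rss_kb,rss_mb,vsz_kb,command" line.
// Returns undefined for headers, blank lines and malformed rows.
function parseSample(line: string): RssSample | undefined {
  const parts = line.trim().split(",");
  if (parts.length < 6) return undefined;
  const [pidStr = "", time = "", rssKbStr = ""] = parts;
  if (pidStr === "" || rssKbStr === "") return undefined;
  const pid = Number(pidStr);
  const rssKb = Number(rssKbStr);
  if (!Number.isFinite(pid) || !Number.isFinite(rssKb)) return undefined;
  return { pid, time, rssKb };
}

// Average RSS growth in MB/hour for one PID, from first to last sample.
function growthMbPerHour(lines: string[], pid: number): number | undefined {
  const samples: RssSample[] = [];
  for (const line of lines) {
    const s = parseSample(line);
    if (s && s.pid === pid) samples.push(s);
  }
  const first = samples[0];
  const last = samples[samples.length - 1];
  if (samples.length < 2 || !first || !last) return undefined;
  const deltaMb = (last.rssKb - first.rssKb) / 1024;
  const hours = ((samples.length - 1) * SAMPLE_INTERVAL_S) / 3600;
  return deltaMb / hours;
}
```

Filtering by PID keeps the two crash cycles separate, so the restart does not show up as a large negative step in the computed rate.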