Port from cline/cline#10343: periodic gateway memory logging #17667
Closed
teknium1 wants to merge 1 commit into
Emit a grep-friendly '[MEMORY] rss=...MB ...' line in agent.log / gateway.log every N minutes (default 5) so slow leaks in the long-lived gateway process show up as a time series.

Based on cline/cline#10343 (src/standalone/memory-monitor.ts).

- gateway/memory_monitor.py: new module. Daemon thread, baseline on start, final snapshot on stop. Uses resource.getrusage() (stdlib) first, falls back to psutil, and disables itself with one WARNING if neither is available.
- gateway/run.py: start the monitor right after setup_logging() in start_gateway(); stop it in the shutdown block next to MCP teardown.
- hermes_cli/config.py: logging.memory_monitor { enabled, interval_seconds } defaults under the existing logging section.
- tests/gateway/test_memory_monitor.py: 10 unit tests covering format, baseline/shutdown snapshots, double-start no-op, periodic timer, daemon-thread invariant, and the unavailable-RSS warn-and-skip path.

Adapted from TypeScript/Node to Python (a threading.Event-based daemon thread instead of setInterval/unref), added Python-specific gc and thread counts to the log line (handier than external/arrayBuffers for diagnosing Python gateway leaks), and gated it behind a config.yaml toggle so users can silence the periodic line if they want. There is no heap-snapshot-on-OOM equivalent: CPython doesn't have V8's --heapsnapshot-near-heap-limit, and tracemalloc, the closest Python equivalent, adds non-trivial overhead, so it is left out.
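The commit message describes an RSS probe (getrusage first, psutil fallback, one WARNING and then skip) and a log line that adds gc and thread counts. Below is a minimal sketch of those two pieces, not the PR's actual code: the names `read_rss_mb`, `format_snapshot`, `log_snapshot` and the logger name are hypothetical, and the exact line layout is assumed from the `[MEMORY] rss=...MB gc=... threads=... uptime=...s` shape given in the summary.

```python
import gc
import logging
import sys
import threading
import time

# Hypothetical logger name; the real module's logger is not shown in the PR text.
logger = logging.getLogger("gateway.memory_monitor")


def read_rss_mb():
    """Return an RSS figure in MB, or None when no source is available.

    Order follows the commit message: resource.getrusage() first, psutil second.
    Caveat: ru_maxrss is the process's *peak* RSS (a high-water mark), in
    kilobytes on Linux and bytes on macOS, so it can rise but never fall.
    """
    try:
        import resource  # stdlib, POSIX only; ImportError on Windows
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        return peak / (1024 * 1024 if sys.platform == "darwin" else 1024)
    except (ImportError, OSError):
        pass
    try:
        import psutil  # optional third-party dependency; reports *current* RSS
        return psutil.Process().memory_info().rss / (1024 * 1024)
    except Exception:
        return None


def format_snapshot(rss_mb, started_at, label=""):
    """Build one grep-friendly line; label is e.g. "shutdown" for the final one."""
    prefix = f"{label} " if label else ""
    # gc.get_count() returns the per-generation collection counters.
    gc_counts = ",".join(str(n) for n in gc.get_count())
    uptime_s = time.monotonic() - started_at
    return (
        f"[MEMORY] {prefix}rss={rss_mb:.1f}MB gc=({gc_counts}) "
        f"threads={threading.active_count()} uptime={uptime_s:.0f}s"
    )


_warned = False  # ensures the "unavailable" WARNING is emitted only once


def log_snapshot(started_at, label=""):
    """Log one snapshot and return it; warn once and return None if RSS is unreadable."""
    global _warned
    rss_mb = read_rss_mb()
    if rss_mb is None:
        if not _warned:
            # Hypothetical message text.
            logger.warning("[MEMORY] RSS unavailable; memory monitor disabled")
            _warned = True
        return None
    line = format_snapshot(rss_mb, started_at, label)
    logger.info(line)
    return line
```

Because `ru_maxrss` never decreases, a variant that wants current RSS could try `psutil` first when it is installed; the order here simply mirrors the commit message.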
Closing in favor of #27102, which salvages this PR onto current main. Clean cherry-pick (all new files). 10 new tests pass, 5506 gateway regression tests pass, and an E2E smoke run confirms the baseline, periodic, and shutdown lines.
Summary
The gateway logs `[MEMORY] rss=...MB gc=... threads=... uptime=...s` to `agent.log`/`gateway.log` every 5 minutes, so slow leaks in the long-lived process show up as a time series.

Ported from cline/cline#10343 (`src/standalone/memory-monitor.ts`). Their cline-core Node process and our gateway have the same shape of problem: a long-running autonomous-agent backend where a leak in any of caching, sessions, MCP, or the memory provider is invisible until you watch RSS climb for hours.

Changes
- `gateway/memory_monitor.py` (new): daemon thread that logs a baseline on start, periodic snapshots at `interval_seconds`, and a final `[MEMORY] shutdown ...` line on stop. Uses `resource.getrusage()` (stdlib, Linux/macOS) first, falls back to `psutil` (already an optional dep via `mcp_tool.py`), and disables itself with one WARNING if neither works.
- `gateway/run.py` (~12010, ~12200): start right after `setup_logging()`, stop next to `shutdown_mcp_servers()`. Gated on `logging.memory_monitor.enabled` (default true) and wrapped in a best-effort try/except so a monitor failure can never break gateway startup.
- `hermes_cli/config.py`: new `logging.memory_monitor` block with `enabled: true` and `interval_seconds: 300`.
- `tests/gateway/test_memory_monitor.py`: 10 targeted unit tests.

Adaptation notes (vs. the upstream TS port)
- `setInterval` + `.unref()` → Python `threading.Thread(daemon=True)` driven by `threading.Event.wait()`, so shutdown is immediate instead of waiting for the next tick.
- `gc=(gen0,gen1,gen2)` and `threads=N` instead of V8's `external`/`arrayBuffers`; these are more useful for Python leaks, since thread leaks and GC pressure are the common gateway failure modes.
- No `--heapsnapshot-near-heap-limit` equivalent. CPython's closest analogue is `tracemalloc`, which has non-trivial steady-state overhead; deferring that to a separate PR if someone asks for it.
- `logging.memory_monitor.enabled: false` silences the line entirely, matching other diagnostic toggles under `logging:`.

Validation
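The first adaptation note (an `Event.wait()`-driven daemon thread, so shutdown is immediate) is a property worth checking directly. A minimal sketch, assuming a hypothetical `PeriodicSampler` class that stands in for the monitor's thread loop rather than reproducing the PR's `gateway/memory_monitor.py` API; `sample` is any callable that receives a label:

```python
import threading
import time


class PeriodicSampler:
    """Hypothetical stand-in for the monitor's thread loop.

    A daemon thread waits on a threading.Event with a timeout instead of
    sleeping, so stop() wakes it immediately rather than after the next tick.
    """

    def __init__(self, interval_s, sample):
        self.interval_s = interval_s
        self.sample = sample  # called as sample(label)
        self._stop_event = threading.Event()
        self._thread = None

    def start(self):
        if self._thread is not None:  # double start is a no-op
            return
        self._stop_event.clear()
        self.sample("baseline")
        self._thread = threading.Thread(
            target=self._run, name="memory-monitor", daemon=True
        )
        self._thread.start()

    def _run(self):
        # Event.wait() returns False on timeout (take a sample) and True once
        # stop() sets the event (leave the loop).
        while not self._stop_event.wait(self.interval_s):
            self.sample("periodic")

    def stop(self):
        if self._thread is None:
            return
        self._stop_event.set()
        self._thread.join()
        self._thread = None
        self.sample("shutdown")


if __name__ == "__main__":
    labels = []
    sampler = PeriodicSampler(300, labels.append)  # 300 s, the PR's default
    sampler.start()
    t0 = time.monotonic()
    sampler.stop()  # returns at once instead of waiting out the interval
    print(labels, f"stop took {time.monotonic() - t0:.3f}s")
```

With `time.sleep(interval)` in the loop instead, `stop()` could block for up to a full interval (five minutes at the default), which is the behavior the adaptation note avoids.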
Sample log output (interval 0.3s for the smoke run; the interval rounds to 0s in the "started" line, which is cosmetic at sub-second intervals and not an issue at the 300s default):
Grep-friendly: `grep '\[MEMORY\] rss=' ~/.hermes/logs/gateway.log | awk '{print $1,$2,$4}'` gives a quick "RSS over time" view.

Context
Hermes has a `memory-leak-audit` skill, and the gateway is a known long-running process that caches agent instances, session transcripts, MCP connections, tool schemas, and memory providers. This adds the basic instrumentation a leak audit starts from; without it, every audit has to recommend that the user add temporary `ps` logging first.
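As a first step of such an audit, the periodic `[MEMORY]` lines can be reduced to a crude growth signal, in the same spirit as the grep/awk one-liner. A minimal sketch with hypothetical function names (`rss_series`, `growth_summary`); the regex assumes the `rss=<number>MB` token from the summary, and like the grep it only matches lines where `rss=` directly follows `[MEMORY]`, so a `[MEMORY] shutdown ...` line is skipped. If the value comes from `getrusage()`'s `ru_maxrss` (a peak), the series can only stay flat or rise, so sustained growth under steady load is the thing to look for.

```python
import re
from pathlib import Path

# Matches the periodic line's rss token, e.g. "[MEMORY] rss=412.5MB".
RSS_RE = re.compile(r"\[MEMORY\] rss=(?P<mb>\d+(?:\.\d+)?)MB")


def rss_series(lines):
    """Return RSS values (MB), in log order, from periodic [MEMORY] lines."""
    series = []
    for line in lines:
        match = RSS_RE.search(line)
        if match:
            series.append(float(match.group("mb")))
    return series


def growth_summary(series):
    """Summarize first/last RSS, total growth, and how many intervals rose."""
    if len(series) < 2:
        return None
    rising = sum(1 for prev, cur in zip(series, series[1:]) if cur > prev)
    return {
        "first_mb": series[0],
        "last_mb": series[-1],
        "growth_mb": series[-1] - series[0],
        "rising_steps": rising,
        "steps": len(series) - 1,
    }


if __name__ == "__main__":
    # Same log path as the grep example.
    log_path = Path.home() / ".hermes" / "logs" / "gateway.log"
    if log_path.exists():
        lines = log_path.read_text(errors="replace").splitlines()
        print(growth_summary(rss_series(lines)))
```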