Port from cline/cline#10343: periodic gateway memory logging by teknium1 · Pull Request #17667 · NousResearch/hermes-agent

teknium1 · 2026-04-30T00:08:25Z

Summary

Gateway logs [MEMORY] rss=...MB gc=... threads=... uptime=...s to agent.log / gateway.log every 5 minutes so slow leaks in the long-lived process show up as a time series.

Ported from cline/cline#10343 (src/standalone/memory-monitor.ts). Their cline-core Node process and our gateway are the same shape of problem — a long-running autonomous-agent backend where a leak in any of caching / sessions / MCP / memory provider is invisible until you watch RSS climb for hours.

Changes

gateway/memory_monitor.py (new): daemon thread that logs a baseline on start, periodic snapshots at interval_seconds, and a final [MEMORY] shutdown ... line on stop. Uses resource.getrusage() (stdlib, Linux/macOS) first, falls back to psutil (already an optional dep via mcp_tool.py), disables itself with one WARNING if neither works.
gateway/run.py (~12010, ~12200): start right after setup_logging(), stop next to shutdown_mcp_servers(). Gated on logging.memory_monitor.enabled (default true) and wrapped in best-effort try/except so a monitor failure can never break gateway startup.
hermes_cli/config.py: new logging.memory_monitor block — enabled: true, interval_seconds: 300.
tests/gateway/test_memory_monitor.py: 10 targeted unit tests.
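The monitor described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of gateway/memory_monitor.py: the names `read_rss_mb`, `format_snapshot`, and `MemoryMonitor` are hypothetical, and uptime here is measured from monitor start rather than process start. Note that `resource.getrusage()` reports `ru_maxrss`, a peak (high-water-mark) figure, in kilobytes on Linux and bytes on macOS; the psutil fallback reports current RSS in bytes.

```python
import gc
import logging
import sys
import threading
import time

logger = logging.getLogger("gateway.memory_monitor")  # hypothetical logger name


def read_rss_mb():
    """Return RSS in MB, or None if neither backend works."""
    try:
        import resource  # stdlib, Unix only

        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        # ru_maxrss is bytes on macOS and kilobytes on Linux.
        divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
        return peak / divisor
    except Exception:
        pass
    try:
        import psutil  # optional dependency

        return psutil.Process().memory_info().rss / (1024 * 1024)
    except Exception:
        return None


def format_snapshot(rss_mb, uptime_s, label=""):
    """Build one '[MEMORY] ...' line; label is '', 'baseline' or 'shutdown'."""
    prefix = f"[MEMORY] {label} " if label else "[MEMORY] "
    return (
        f"{prefix}rss={rss_mb:.0f}MB gc={gc.get_count()} "
        f"threads={threading.active_count()} uptime={uptime_s:.0f}s"
    )


class MemoryMonitor:
    def __init__(self, interval_seconds=300.0):
        self.interval_seconds = interval_seconds
        self._stop = threading.Event()
        self._thread = None
        self._started_at = 0.0

    def _log(self, label=""):
        rss = read_rss_mb()
        if rss is None:
            return False
        uptime = time.monotonic() - self._started_at
        logger.info(format_snapshot(rss, uptime, label))
        return True

    def start(self):
        if self._thread is not None:
            return  # double start is a no-op
        self._started_at = time.monotonic()
        if not self._log("baseline"):
            logger.warning("[MEMORY] RSS unavailable; memory monitoring disabled")
            return
        self._stop.clear()
        self._thread = threading.Thread(
            target=self._run, name="memory-monitor", daemon=True
        )
        self._thread.start()

    def _run(self):
        # Event.wait() returns True as soon as stop() sets the event,
        # so the loop exits without waiting out the rest of the interval.
        while not self._stop.wait(self.interval_seconds):
            self._log()

    def stop(self):
        if self._thread is None:
            return
        self._stop.set()
        self._thread.join(timeout=5)
        self._thread = None
        self._log("shutdown")
```

Because the thread is a daemon and sleeps in `Event.wait()`, it neither blocks interpreter exit nor delays shutdown by up to one interval, which is the property the `setInterval` + `.unref()` pairing provides on the Node side.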

Adaptation notes (vs. the upstream TS port)

Node setInterval + .unref() → Python threading.Thread(daemon=True) driven by a threading.Event.wait() so shutdown is immediate instead of waiting for the next tick.
Log line includes gc=(gen0,gen1,gen2) and threads=N instead of V8's external/arrayBuffers — more useful for Python leaks (thread leaks + GC pressure are the common gateway failure modes).
No Node --heapsnapshot-near-heap-limit equivalent. CPython's closest analogue is tracemalloc, which has non-trivial steady-state overhead; deferring that to a separate PR if someone asks for it.
Config-gated: logging.memory_monitor.enabled: false silences the line entirely, matching other diagnostic toggles under logging:.
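The config gating and the best-effort start could look roughly like the sketch below. It assumes the config is already parsed into a plain dict; `memory_monitor_settings`, `start_monitor_best_effort`, and the `factory` callable are hypothetical names for illustration and are not taken from gateway/run.py or hermes_cli/config.py. Only the key names and defaults (enabled: true, interval_seconds: 300) come from the PR.

```python
import logging

logger = logging.getLogger("gateway.run")  # hypothetical logger name

# Defaults as listed in the PR description.
DEFAULT_MEMORY_MONITOR = {"enabled": True, "interval_seconds": 300}


def memory_monitor_settings(config):
    """Merge logging.memory_monitor from a parsed config dict over the defaults."""
    section = (config.get("logging") or {}).get("memory_monitor") or {}
    return {**DEFAULT_MEMORY_MONITOR, **section}


def start_monitor_best_effort(config, factory):
    """Start a monitor if enabled; never let a failure reach gateway startup.

    `factory` is any callable that takes interval_seconds and returns an
    object with a start() method (hypothetical interface).
    """
    settings = memory_monitor_settings(config)
    if not settings["enabled"]:
        return None
    try:
        monitor = factory(settings["interval_seconds"])
        monitor.start()
        return monitor
    except Exception:
        # Diagnostics must not break startup, so swallow and move on.
        logger.debug("memory monitor failed to start", exc_info=True)
        return None
```

Returning `None` in both the disabled and the failed case lets the shutdown path use a single `if monitor is not None: monitor.stop()` check.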

Validation

$ bash scripts/run_tests.sh tests/gateway/test_memory_monitor.py -v
============================== 10 passed in 1.27s ==============================

Sample log output. The smoke run used a 0.3s interval, which the "started" line rounds to 0s; this is cosmetic at sub-second intervals and not an issue at the 300s default:

[MEMORY] baseline rss=28MB gc=(549, 2, 3) threads=1 uptime=0s
[MEMORY] Periodic memory monitoring started (interval: 300s)
[MEMORY] rss=28MB gc=(590, 2, 3) threads=2 uptime=300s
[MEMORY] rss=29MB gc=(591, 2, 3) threads=2 uptime=600s
[MEMORY] shutdown rss=29MB gc=(594, 2, 3) threads=2 uptime=903s
[MEMORY] Periodic memory monitoring stopped

Grep-friendly: grep '\[MEMORY\] rss=' ~/.hermes/logs/gateway.log | awk '{print $1,$2,$4}' gives a quick "RSS over time" view.
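For anything beyond a quick look, the same lines can be parsed into a time series in Python. The sketch below assumes the line format matches the sample output above; the regex and the `parse_memory_lines` helper are illustrative and not part of the PR. It searches within each line, so any timestamp or level prefix the logger adds is ignored.

```python
import re

# Matches '[MEMORY] rss=...' lines, with an optional baseline/shutdown label.
MEMORY_LINE = re.compile(
    r"\[MEMORY\] (?:(?P<label>baseline|shutdown) )?"
    r"rss=(?P<rss>\d+)MB gc=\((?P<gc>[^)]*)\) "
    r"threads=(?P<threads>\d+) uptime=(?P<uptime>\d+)s"
)


def parse_memory_lines(lines):
    """Return (uptime_s, rss_mb, threads) tuples for every snapshot line."""
    samples = []
    for line in lines:
        m = MEMORY_LINE.search(line)
        if m:
            samples.append((int(m["uptime"]), int(m["rss"]), int(m["threads"])))
    return samples


if __name__ == "__main__":
    sample = [
        "[MEMORY] baseline rss=28MB gc=(549, 2, 3) threads=1 uptime=0s",
        "[MEMORY] Periodic memory monitoring started (interval: 300s)",
        "[MEMORY] rss=28MB gc=(590, 2, 3) threads=2 uptime=300s",
        "[MEMORY] shutdown rss=29MB gc=(594, 2, 3) threads=2 uptime=903s",
    ]
    series = parse_memory_lines(sample)
    growth = series[-1][1] - series[0][1]
    print(f"{len(series)} samples, RSS growth {growth}MB")
```

The "started" and "stopped" lines carry no `rss=` field and are skipped by the pattern.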

Context

Hermes has a memory-leak-audit skill and the gateway is a known long-running process that caches agent instances, session transcripts, MCP connections, tool schemas, and memory providers. This adds the basic instrumentation a leak audit starts from — without it, every audit has to recommend the user add temporary ps logging first.

Emit a grep-friendly '[MEMORY] rss=...MB ...' line in agent.log / gateway.log every N minutes (default 5) so slow leaks in the long-lived gateway process show up as a time series. Based on cline/cline#10343 (src/standalone/memory-monitor.ts).

- gateway/memory_monitor.py: new module. Daemon thread, baseline on start, final snapshot on stop. Uses resource.getrusage() (stdlib) first, falls back to psutil, disables itself with one WARNING if neither is available.
- gateway/run.py: start monitor right after setup_logging() in start_gateway(); stop it in the shutdown block next to MCP teardown.
- hermes_cli/config.py: logging.memory_monitor { enabled, interval_seconds } defaults under the existing logging section.
- tests/gateway/test_memory_monitor.py: 10 unit tests covering format, baseline/shutdown snapshots, double-start noop, periodic timer, daemon thread invariant, and unavailable-RSS warn-and-skip path.

Adapted from TypeScript/Node to Python (threading.Event-based daemon thread instead of setInterval/unref), added Python-specific gc + thread counts to the log line (handier than ext/arrayBuffers for diagnosing Python gateway leaks), and gated behind a config.yaml toggle so users can silence the periodic line if they want.

No heap-snapshot-on-OOM equivalent: CPython doesn't have V8's --heapsnapshot-near-heap-limit; tracemalloc would be the Python equivalent but adds non-trivial overhead, so leaving that out.

teknium1 · 2026-05-16T19:55:30Z

Closing in favor of #27102, which salvages this PR onto current main. Clean cherry-pick — all new files. 10 new tests pass, 5506 gateway regression tests pass, E2E smoke run confirms baseline + periodic + shutdown [MEMORY] lines all emit with RSS, GC, threads, and uptime populated.

alt-glitch added labels · Apr 30, 2026: type/feature (New feature or request), comp/gateway (Gateway runner, session dispatch, delivery), area/config (Config system, migrations, profiles), P3 (Low: cosmetic, nice to have)

teknium1 mentioned this pull request May 16, 2026

feat(gateway): periodic memory logging for leak detection (salvage of #17667) #27102

Merged

teknium1 closed this May 16, 2026

BrewTestBot mentioned this pull request May 28, 2026

hermes-agent 2026.5.28 Homebrew/homebrew-core#285115

Merged


teknium1 wants to merge 1 commit into main from cline-port/gateway-memory-monitor
