Bug Report
Description
The Hermes Gateway accumulates approximately 8 GB of RAM within 1 hour when 2 Discord sessions are active simultaneously. This leads to OOM kills even on systems with 32 GB RAM.
Environment
- Version: v2026.4.30-19-gbbbce9265
- OS: Ubuntu 24.04, 32 GB RAM, 8 GB swap
- Platform: Discord + Telegram + API server
- MCPs: 3 (context7, sequential-thinking, qmd)
Reproduction
- Start gateway with Discord platform enabled
- Have 2 active Discord threads/sessions running agent loops
- Sessions do intensive work (30-60 api_calls per response, responses take 300-2400 seconds)
- Observe memory growing from a ~700 MB baseline to 8 GB in ~1 hour
- The gateway is then killed by the OOM killer
Evidence (4 crashes in 4 hours)
May 1 06:09 - OOM kill, memory peak: 27.5 GB + 4.6 GB swap (no limit)
May 1 07:17 - OOM kill, memory peak: 4.0 GB + 5.9 GB swap (MemoryMax=4G)
May 1 08:17 - OOM kill, memory peak: 8.0 GB + 0B swap (MemoryMax=8G, MemorySwapMax=0)
May 1 09:58 - OOM kill, memory peak: 8.0 GB + 0B swap (MemoryMax=8G, MemorySwapMax=0)
Gateway log shows sessions with high api_call counts:
response ready: chat=1494413622397501610 time=2484.2s api_calls=52
response ready: chat=1497022197854634156 time=477.4s api_calls=43
Session compression fires (compressed 419 -> 11 msgs, ~238265 -> ~2162 tokens), but RSS does not drop afterwards.
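To correlate api_calls and response time with memory growth, the `response ready` lines can be pulled out of the gateway log. A minimal sketch, assuming the line format is exactly as quoted above; `ResponseStat` and `parse_response_line` are hypothetical names, not part of Hermes:

```python
import re
from typing import NamedTuple, Optional


class ResponseStat(NamedTuple):
    chat: str        # chat/thread id, kept as a string
    seconds: float   # wall-clock time of the response
    api_calls: int   # api_calls reported for the response


# Pattern assumed from the two log lines quoted in this report.
_RESPONSE_READY = re.compile(
    r"response ready: chat=(\d+) time=([\d.]+)s api_calls=(\d+)"
)


def parse_response_line(line: str) -> Optional[ResponseStat]:
    """Return the parsed fields, or None if the line is not a 'response ready' line."""
    match = _RESPONSE_READY.search(line)
    if match is None:
        return None
    return ResponseStat(match.group(1), float(match.group(2)), int(match.group(3)))
```

Pairing each parsed record with an RSS reading taken at the same time should show whether growth tracks api_calls, session duration, or both.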
Observations
- Chrome headless processes accumulate under gateway cgroup (5+ instances, ~500 MB total)
- Idle-TTL eviction works for inactive sessions but active sessions keep growing
- Growth is proportional to api_calls and session duration
- Python GC doesn't seem to reclaim the memory
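To tell whether the growth sits in Python objects or in native allocations (which could explain why compression and GC leave RSS untouched), one option is to log RSS next to the Python-traced heap. A minimal sketch, assuming Linux (`/proc/self/status`); `read_rss_kib` and `memory_snapshot` are hypothetical helpers, not part of Hermes:

```python
import gc
import resource
import tracemalloc


def read_rss_kib() -> int:
    """Return the current resident set size in KiB on Linux."""
    try:
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])  # reported in kB
    except OSError:
        pass
    # Fallback: ru_maxrss is the *peak* RSS (KiB on Linux), not the current value.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def memory_snapshot() -> dict:
    """Collect RSS, Python-traced heap size, and the number of GC-tracked objects."""
    if not tracemalloc.is_tracing():
        # Only allocations made after this point are traced.
        tracemalloc.start()
    traced_current, _traced_peak = tracemalloc.get_traced_memory()
    return {
        "rss_kib": read_rss_kib(),
        "py_traced_kib": traced_current // 1024,
        "gc_objects": len(gc.get_objects()),
    }
```

If `rss_kib` keeps climbing while `py_traced_kib` and `gc_objects` stay flat, the growth is probably outside the Python heap (native buffers, allocator fragmentation, or child processes). Note that tracemalloc adds overhead, so this is better suited to a reproduction run than to production.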
Workaround
Running without MemoryMax, relying on 32 GB RAM + swappiness=10 + Restart=always.
Possible Causes
- Conversation history growing in memory even after token compression
- Tool execution outputs accumulating without GC
- Chrome instances not cleaned up between tool calls
- Python reference cycles in the agent loop
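The reference-cycle hypothesis can be checked in isolation: `gc.collect()` returns the number of unreachable objects it found, so a non-zero count after one step of the agent loop means that step leaves cycles behind. A minimal sketch; `cyclic_garbage_after`, `Node`, and `make_cycle` are hypothetical names used only for illustration:

```python
import gc
from typing import Callable


def cyclic_garbage_after(step: Callable[[], None]) -> int:
    """Run `step` with automatic GC paused and return how many unreachable
    objects a full collection finds afterwards."""
    gc.collect()  # clear garbage that predates the step
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        step()
        return gc.collect()
    finally:
        if was_enabled:
            gc.enable()


class Node:
    def __init__(self) -> None:
        self.ref = None


def make_cycle() -> None:
    """Create two objects that reference each other, then drop them."""
    a, b = Node(), Node()
    a.ref, b.ref = b, a
```

Calling `cyclic_garbage_after(make_cycle)` reports the objects kept alive only by the cycle. Cycles alone would normally be reclaimed by the collector eventually, so a large count here points at delayed reclamation rather than a true leak.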