Memory leak: Gateway accumulates ~8GB RAM in 1h with 2 active Discord sessions #18438

@n30j0su3

Bug Report

Description

The Hermes Gateway accumulates approximately 8 GB of RAM within 1 hour when 2 Discord sessions are active simultaneously. This leads to OOM kills even on a system with 32 GB RAM.

Environment

  • Version: v2026.4.30-19-gbbbce9265
  • OS: Ubuntu 24.04, 32 GB RAM, 8 GB swap
  • Platform: Discord + Telegram + API server
  • MCPs: 3 (context7, sequential-thinking, qmd)

Reproduction

  1. Start gateway with Discord platform enabled
  2. Have 2 active Discord threads/sessions running agent loops
  3. Sessions do intensive work (30-60 api_calls per response, responses take 300-2400 seconds)
  4. Memory grows from ~700 MB base to 8 GB in ~1 hour
  5. Gateway killed by OOM

Evidence (4 crashes in 4 hours)

May 1 06:09 - OOM kill, memory peak: 27.5 GB + 4.6 GB swap (no limit)
May 1 07:17 - OOM kill, memory peak: 4.0 GB + 5.9 GB swap (MemoryMax=4G)
May 1 08:17 - OOM kill, memory peak: 8.0 GB + 0B swap (MemoryMax=8G, MemorySwapMax=0)
May 1 09:58 - OOM kill, memory peak: 8.0 GB + 0B swap (MemoryMax=8G, MemorySwapMax=0)
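
The spacing between kills can be worked out from the timestamps above. This is a minimal sketch, assuming all four entries are from the same day and timezone; `KILL_TIMES` and `minutes_between` are illustrative names, not part of the gateway.

```python
from datetime import datetime

# Kill timestamps copied from the evidence above (assumed same day and timezone).
KILL_TIMES = ["06:09", "07:17", "08:17", "09:58"]


def minutes_between(times):
    """Return the gap in whole minutes between consecutive HH:MM timestamps."""
    parsed = [datetime.strptime(t, "%H:%M") for t in times]
    return [int((b - a).total_seconds() // 60) for a, b in zip(parsed, parsed[1:])]


print(minutes_between(KILL_TIMES))  # [68, 60, 101]
```

If the gateway restarted shortly after each kill (an assumption, the exact restart times are not in the log), these gaps approximate the time from start to OOM.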

Gateway log shows sessions with high api_call counts:

  • response ready: chat=1494413622397501610 time=2484.2s api_calls=52
  • response ready: chat=1497022197854634156 time=477.4s api_calls=43

Session compression fires (compressed 419 -> 11 msgs, ~238265 -> ~2162 tokens), but RSS does not drop afterward.
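
One way to narrow this down is to sample the Python-level heap and the process RSS before and after compression. The sketch below is not gateway code: `history` and the slice that stands in for compression are hypothetical, and the RSS reader assumes Linux (`/proc/self/status`). If the Python heap shrinks while RSS stays flat, the memory is more likely held by the allocator or by native code than by live Python objects.

```python
import gc
import tracemalloc


def read_rss_bytes(status_text):
    """Parse VmRSS (reported in kB) from /proc/<pid>/status text; None if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1]) * 1024
    return None


def current_rss_bytes():
    """Current RSS of this process on Linux; None where /proc is unavailable."""
    try:
        with open("/proc/self/status") as fh:
            return read_rss_bytes(fh.read())
    except OSError:
        return None


def measure(label):
    """Sample Python-traced memory and process RSS at one point in time."""
    traced, _peak = tracemalloc.get_traced_memory()
    return {"label": label, "python_heap": traced, "rss": current_rss_bytes()}


if __name__ == "__main__":
    tracemalloc.start()
    # Hypothetical stand-in for a long conversation history.
    history = [("msg %d " % i) * 50 for i in range(50_000)]
    before = measure("before compression")
    del history[11:]  # hypothetical "compression": keep 11 messages
    gc.collect()
    after = measure("after compression")
    for sample in (before, after):
        print(sample)
    tracemalloc.stop()
```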

Observations

  • Chrome headless processes accumulate under gateway cgroup (5+ instances, ~500 MB total)
  • Idle-TTL eviction works for inactive sessions, but active sessions keep growing
  • Growth is proportional to api_calls and session duration
  • Python GC doesn't seem to reclaim the memory
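
Since growth tracks api_calls, comparing `tracemalloc` snapshots taken a fixed number of calls apart might show which allocation sites are growing. The sketch below is illustrative only: `fake_api_call` and `retained_outputs` are hypothetical stand-ins for a leaking code path, not names from the gateway.

```python
import tracemalloc

retained_outputs = []  # hypothetical state that outlives each call


def fake_api_call():
    """Hypothetical tool call whose ~100 kB output is never released."""
    retained_outputs.append(bytearray(100_000))


def top_growth(before, after, limit=5):
    """Return the source lines whose allocated size grew the most between snapshots."""
    stats = after.compare_to(before, "lineno")
    growing = [s for s in stats if s.size_diff > 0]
    growing.sort(key=lambda s: s.size_diff, reverse=True)
    return growing[:limit]


if __name__ == "__main__":
    tracemalloc.start()
    baseline = tracemalloc.take_snapshot()
    for _ in range(50):
        fake_api_call()
    current = tracemalloc.take_snapshot()
    for stat in top_growth(baseline, current):
        print(stat)  # file:line, size delta, count delta
    tracemalloc.stop()
```

Note that `tracemalloc` only sees allocations made through Python's allocators, so memory held by child processes such as headless Chrome would not appear here.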

Workaround

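Currently running without MemoryMax and relying on 32 GB RAM, swappiness=10, and Restart=always. This does not stop the growth; it only delays the OOM kill and restarts the gateway afterward.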

Possible Causes

  1. Conversation history growing in memory even after token compression
  2. Tool execution outputs accumulating without GC
  3. Chrome instances not cleaned between tool calls
  4. Python reference cycles in the agent loop
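
Cause 4 could be checked by counting how many objects only the cyclic collector can free after a unit of work. This is a rough sketch, not the gateway's code: `Node` and `make_cycles` are hypothetical stand-ins for one agent-loop iteration.

```python
import gc


class Node:
    """Hypothetical object that can take part in a reference cycle."""

    def __init__(self):
        self.ref = None


def make_cycles(pairs=100):
    """Hypothetical workload: create pairs of objects that reference each other."""
    for _ in range(pairs):
        a, b = Node(), Node()
        a.ref, b.ref = b, a


def unreachable_after(work):
    """Run work() with automatic GC off, then count what a full collection finds."""
    was_enabled = gc.isenabled()
    gc.collect()  # clear pre-existing garbage so it is not counted
    gc.disable()
    try:
        work()
        return gc.collect()  # number of unreachable objects found
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print("unreachable objects:", unreachable_after(make_cycles))
```

A consistently high count per iteration would point toward reference cycles; a count near zero while memory still grows would suggest the objects are still reachable (causes 1 and 2) or live outside the Python heap (cause 3).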

Labels

  • P1: High (major feature broken, no workaround)
  • comp/gateway: Gateway runner, session dispatch, delivery
  • platform/discord: Discord bot adapter
  • type/bug: Something isn't working
