macOS gateway eventually hits [Errno 24] Too many open files and needs restart (redacted) #14210

@quqi1599

Summary

The Hermes gateway on macOS hit OSError: [Errno 24] Too many open files and eventually could not process Telegram messages, run cron jobs, load .env, perform dynamic imports, or make outbound LLM/API requests. Restarting the launch agent temporarily restores service, but the failure suggests a file-descriptor leak, or descriptors being retained longer than needed, under normal runtime load.

Environment

  • OS: macOS (Apple Silicon)
  • Runtime: launchd LaunchAgent
  • Hermes command:
    • <venv>/bin/python -m hermes_cli.main gateway run --replace
  • Hermes home:
    • ~/.hermes
  • Repo:
    • NousResearch/hermes-agent

Symptoms

After running for a while, Hermes starts failing broadly with [Errno 24] Too many open files, including:

  • Telegram handling failures for inbound DM sessions
  • Cron scheduler failures opening temp files and .env
  • gh CLI helper/tool invocations failing with the same error
  • OpenAI/httpx connection errors caused by FD exhaustion
  • Python import machinery failing to scan the agent/ package directory

Representative failing paths observed:

  • ~/.hermes/.env
  • ~/.hermes/cron/*.tmp
  • ~/.hermes/.channel_directory_*.tmp
  • <hermes-agent>/agent

Representative stack traces

Gateway / import failure

OSError: [Errno 24] Too many open files: '<hermes-agent>/agent'
  File ".../gateway/run.py", line 2920, in _handle_message_with_agent
  File ".../gateway/run.py", line 7179, in _run_agent
  File ".../gateway/run.py", line 6718, in run_sync
  File ".../run_agent.py", line 757, in __init__
  File "<frozen importlib._bootstrap_external>", line 1662, in _fill_cache

Cron / dotenv failure

OSError: [Errno 24] Too many open files: '~/.hermes/.env'
  File ".../cron/scheduler.py", line 559, in run_job
  File ".../site-packages/dotenv/main.py", line 63, in _get_stream

OpenAI/httpx failure under FD exhaustion

openai.APIConnectionError: Connection error.
httpx.ConnectError: [Errno 24] Too many open files
  File ".../tools/session_search_tool.py", line 155, in _summarize_session
  File ".../agent/auxiliary_client.py", line 2289, in async_call_llm

Additional observations

At the time of failure, the Hermes process had a high FD count and many repeated opens around SQLite-related files:

  • ~/.hermes/state.db
  • ~/.hermes/state.db-wal
  • ~/.hermes/response_store.db
  • ~/.hermes/response_store.db-wal

There were also socket entries such as:

  • 127.0.0.1:<ephemeral> -> 127.0.0.1:7897 (CLOSE_WAIT)

This may indicate one or both of:

  1. Repeated DB handle creation without timely close/reuse
  2. Network/client/socket leakage (e.g. lingering CLOSE_WAIT connections)
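
To illustrate hypothesis 1, here is a minimal sketch of a short-lived SQLite access pattern that does not accumulate descriptors. It is not taken from the Hermes codebase: `read_user_version` and `count_open_fds` are hypothetical helpers, and the temporary `state.db` only borrows the file name. The point it relies on is that `with sqlite3.connect(...)` on its own only commits or rolls back the transaction; the connection has to be closed explicitly (here via `contextlib.closing`) for the descriptor to be released promptly.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def count_open_fds() -> int:
    """Count this process's open descriptors via /dev/fd (macOS and Linux)."""
    # The directory listing itself briefly holds one extra descriptor.
    return len(os.listdir("/dev/fd"))


def read_user_version(db_path: str) -> int:
    """Hypothetical helper: open, query, and always close the connection."""
    # closing() guarantees conn.close() runs even if the query raises.
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute("PRAGMA user_version").fetchone()[0]


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "state.db")  # stand-in for ~/.hermes/state.db
    before = count_open_fds()
    for _ in range(200):
        read_user_version(db_path)
    after = count_open_fds()
    print(f"fds before={before} after={after}")
```

If the real code paths open connections per request without an explicit close, the same loop would be expected to show `after` growing with the iteration count until the garbage collector happens to reclaim the connections.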

Recovery

A full restart of the launch agent recovers Hermes immediately.
After restart, the new Hermes process came up healthy with a low FD count (~42 open files), which supports the theory that the process accumulates descriptors over time rather than starting high.
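
For comparing such a count against the limit that triggers Errno 24, a small sketch along these lines could be run inside the process. `fd_usage` is a hypothetical helper, not an existing Hermes function; it assumes `/dev/fd` is available, which holds on macOS and Linux. The soft limit can be fairly low for launchd-managed processes, so it is worth measuring rather than assuming.

```python
import os
import resource


def fd_usage() -> tuple[int, int]:
    """Return (open descriptor count, soft RLIMIT_NOFILE) for this process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # /dev/fd lists this process's descriptors; the listing itself holds
    # one descriptor open while it runs, so the count is high by one.
    open_count = len(os.listdir("/dev/fd"))
    return open_count, soft


open_count, soft_limit = fd_usage()
print(f"open fds: {open_count} / soft limit: {soft_limit}")
```

Logging this pair at a fixed interval would show how quickly the count climbs toward the soft limit between restarts.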

Why this matters

Once this state is reached, Hermes effectively degrades across multiple subsystems at once:

  • messaging
  • cron jobs
  • session summarization
  • tool execution
  • import/loading logic

So the impact is broad, not isolated.

Request

Please help investigate potential file descriptor leaks in the gateway runtime, especially around:

  • agent/auxiliary_client.py
  • tools/session_search_tool.py
  • cron dotenv loading
  • repeated SQLite handle creation without close/reuse (response_store.db, state.db)
  • lingering network connections / CLOSE_WAIT

If useful, I can provide more logs or test a diagnostic patch.
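
As a starting point for such a diagnostic, the sketch below tallies the process's open descriptors by kind, which would help separate the SQLite hypothesis (regular files growing) from the network hypothesis (sockets growing). `classify_fds` is a hypothetical helper; where it would be called from inside the gateway is an open question, and in a multi-threaded process the tally is only approximate because descriptors can open or close during the scan.

```python
import os
import stat
from collections import Counter


def classify_fds() -> Counter:
    """Tally this process's open descriptors by kind (diagnostic sketch)."""
    kinds: Counter = Counter()
    for name in os.listdir("/dev/fd"):
        try:
            mode = os.fstat(int(name)).st_mode
        except (OSError, ValueError):
            # Typically the descriptor used for the listing, already closed.
            continue
        if stat.S_ISSOCK(mode):
            kinds["socket"] += 1
        elif stat.S_ISREG(mode):
            kinds["file"] += 1
        elif stat.S_ISFIFO(mode):
            kinds["pipe"] += 1
        else:
            kinds["other"] += 1
    return kinds


print(dict(classify_fds()))
```

Emitting this tally periodically, or when the first Errno 24 is caught, would indicate which category is accumulating without needing external tools at the moment of failure.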

Metadata

Labels

  • P1: High (major feature broken, no workaround)
  • comp/cron: Cron scheduler and job management
  • comp/gateway: Gateway runner, session dispatch, delivery
  • type/bug: Something isn't working
