Summary
The Hermes gateway on macOS hit OSError: [Errno 24] Too many open files and eventually became unable to process Telegram messages, cron jobs, .env loads, dynamic imports, and outbound LLM/API requests. Restarting the launch agent recovers the service temporarily, but the failure points to a file-descriptor leak or repeated resource retention under normal runtime load.
Environment
- OS: macOS (Apple Silicon)
- Runtime: launchd LaunchAgent
- Hermes command: <venv>/bin/python -m hermes_cli.main gateway run --replace
- Hermes home: ~/.hermes
- Repo: NousResearch/hermes-agent
Symptoms
After running for a while, Hermes starts failing broadly with [Errno 24] Too many open files, including:
- Telegram handling failures for inbound DM sessions
- Cron scheduler failures opening temp files and .env
- gh CLI helper/tool invocations failing with the same error
- OpenAI/httpx connection errors caused by FD exhaustion
- Python import machinery failing to scan the agent/ package directory
Representative failing paths observed:
~/.hermes/.env
~/.hermes/cron/*.tmp
~/.hermes/.channel_directory_*.tmp
<hermes-agent>/agent
Representative stack traces
Gateway / import failure
OSError: [Errno 24] Too many open files: '<hermes-agent>/agent'
File ".../gateway/run.py", line 2920, in _handle_message_with_agent
File ".../gateway/run.py", line 7179, in _run_agent
File ".../gateway/run.py", line 6718, in run_sync
File ".../run_agent.py", line 757, in __init__
File "<frozen importlib._bootstrap_external>", line 1662, in _fill_cache
Cron / dotenv failure
OSError: [Errno 24] Too many open files: '~/.hermes/.env'
File ".../cron/scheduler.py", line 559, in run_job
File ".../site-packages/dotenv/main.py", line 63, in _get_stream
OpenAI/httpx failure under FD exhaustion
openai.APIConnectionError: Connection error.
httpx.ConnectError: [Errno 24] Too many open files
File ".../tools/session_search_tool.py", line 155, in _summarize_session
File ".../agent/auxiliary_client.py", line 2289, in async_call_llm
Additional observations
At the time of failure, the Hermes process had a high FD count and many repeated opens around SQLite-related files:
~/.hermes/state.db
~/.hermes/state.db-wal
~/.hermes/response_store.db
~/.hermes/response_store.db-wal
There were also socket entries such as:
127.0.0.1:<ephemeral> -> 127.0.0.1:7897 (CLOSE_WAIT)
This may indicate one or both of:
- Repeated DB handle creation without timely close/reuse
- Network/client/socket leakage (e.g. lingering CLOSE_WAIT connections)
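To tell these two hypotheses apart, it may help to tally what the process actually holds open. The following is a minimal sketch, not part of Hermes: it parses the field output of `lsof -p <pid> -F n` and groups descriptors by name, collapsing sockets onto their remote endpoint so that differing ephemeral ports do not hide repeats. The function names `tally_open_names` and `snapshot` are hypothetical.

```python
import subprocess
from collections import Counter


def tally_open_names(lsof_field_output: str) -> Counter:
    """Count open descriptors per name from `lsof -F n` field output."""
    counts = Counter()
    for line in lsof_field_output.splitlines():
        if not line.startswith("n"):
            continue  # skip 'p' (process) and 'f' (descriptor) records
        name = line[1:]
        if "->" in name:
            # Group sockets by remote endpoint; local ephemeral ports vary.
            name = "-> " + name.split("->", 1)[1]
        counts[name] += 1
    return counts


def snapshot(pid: int, top: int = 10):
    """Run lsof against a live process and return its most repeated names."""
    result = subprocess.run(
        ["lsof", "-p", str(pid), "-F", "n"],
        capture_output=True,
        text=True,
        check=False,
    )
    return tally_open_names(result.stdout).most_common(top)
```

If state.db or response_store.db dominates the tally, that points toward DB handles; if a single remote endpoint such as 127.0.0.1:7897 dominates, that points toward sockets.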
Recovery
A full restart of the launch agent recovers Hermes immediately.
After restart, the new Hermes process came up healthy with a low FD count (~42 open files), which supports the theory that the process accumulates descriptors over time rather than starting high.
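To confirm that descriptors accumulate over time, the count could be sampled from inside the process and compared with the soft limit. This is a sketch under assumptions: it relies on /dev/fd being available (it is on macOS and Linux), and the helper names are hypothetical rather than anything in the Hermes codebase.

```python
import os
import resource


def open_fd_count() -> int:
    """Approximate number of descriptors open in the current process."""
    # /dev/fd lists this process's descriptors; the listing itself may
    # briefly add one entry, so treat the value as a trend, not an exact count.
    return len(os.listdir("/dev/fd"))


def fd_headroom() -> tuple[int, int]:
    """Return (currently open, soft limit) for RLIMIT_NOFILE."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return open_fd_count(), soft
```

Logging `fd_headroom()` periodically, for example once per cron tick, would show whether the count climbs steadily from the post-restart baseline toward the limit.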
Why this matters
Once this state is reached, Hermes effectively degrades across multiple subsystems at once:
- messaging
- cron jobs
- session summarization
- tool execution
- import/loading logic
So the impact is broad, not isolated.
Request
Please help investigate potential file descriptor leaks in the gateway runtime, especially around:
- agent/auxiliary_client.py
- tools/session_search_tool.py
- cron dotenv loading
- repeated SQLite handle creation without close/reuse (response_store.db, state.db)
- lingering network connections / CLOSE_WAIT
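As an illustration of the SQLite item above, here is one pattern that is easy to get wrong in Python. I have not confirmed that Hermes does this. Using a `sqlite3` connection as a context manager only commits or rolls back the transaction; it does not close the connection, so the descriptor stays open until the object is garbage collected. The sketch below wraps the connection in `contextlib.closing`; the `kv` table and the `read_value` helper are hypothetical.

```python
import sqlite3
from contextlib import closing


def read_value(db_path: str, key: str):
    """Read one value and release the database handle before returning."""
    # closing() guarantees conn.close(); the inner `with conn:` only
    # manages the transaction (commit on success, rollback on error).
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:
            row = conn.execute(
                "SELECT value FROM kv WHERE key = ?", (key,)
            ).fetchone()
    return row[0] if row else None
```

A long-lived shared connection would avoid the repeated opens entirely; which approach fits depends on how the gateway threads access state.db and response_store.db.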
If useful, I can provide more logs or test a diagnostic patch.