Problem
Commit 970042de introduced a hard 10-minute timeout (HERMES_AGENT_TIMEOUT, default 600s) on gateway agent execution via asyncio.wait_for(). This was designed to prevent stuck sessions, but it also kills legitimate long-running tasks — particularly subagent/delegate work with reasoning-heavy models.
When a user sets the agent to work autonomously (e.g., multi-step research, code implementation via delegate_task, or reasoning models with long chain-of-thought), hitting the 10-minute wall results in:
- Agent is force-interrupted mid-work (agent.interrupt())
- All intermediate progress is lost
- User gets a generic error: "Request timed out after 10 minutes. Try again, or use /reset to start fresh."
- The session transcript is marked as failed: true, breaking conversation continuity
This significantly reduces the appeal of autonomous/unsupervised agent workflows, which is a core value proposition.
Environment
- OS: Oracle Linux 9 (aarch64)
- Python: 3.11.15
- Hermes Agent: v0.7.0 (v2026.4.3)
- gateway/run.py, lines 6042-6070: asyncio.wait_for(loop.run_in_executor(None, run_sync), timeout=_agent_timeout)
- cron/scheduler.py: similar HERMES_CRON_TIMEOUT for cron jobs
- Timeout is env-var only (HERMES_AGENT_TIMEOUT=600), not exposed in config.yaml or DEFAULT_CONFIG
Steps to Reproduce
- Start the Hermes gateway: hermes gateway
- Send a task that requires subagent delegation with a reasoning model (e.g., via Telegram):
- "Analyze the Hermes codebase for all timeout-related code"
- This triggers delegate_task, which spawns a subagent doing multiple tool calls
- Wait for the agent to work for 10+ minutes
- Observe: the agent is interrupted and the user receives:
⏱️ Request timed out after 10 minutes. The agent may have been stuck on a tool or API call.
Try again, or use /reset to start fresh.
Suggested Improvements
1. Activity-based timeout instead of wall-clock timeout
Instead of a fixed wall-clock limit, track the last "active" timestamp (updated on each tool_call completion / API response). Only trigger timeout if there's been no activity for N seconds. This distinguishes "working hard on a complex task" from "hung on a dead API call."
# Pseudocode: reset on each successful tool/API round-trip
self._last_activity = time.monotonic()  # monotonic clock ignores wall-clock adjustments
# Timeout check: time.monotonic() - self._last_activity > INACTIVITY_TIMEOUT
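To make the idea concrete, here is a minimal runnable sketch of an inactivity watchdog. ActivityTracker and wait_with_inactivity_timeout are hypothetical names, not existing Hermes code, and the sketch assumes the agent can call touch() after each tool call or API response.

```python
import asyncio
import time


class ActivityTracker:
    """Records when the agent last showed progress (hypothetical helper)."""

    def __init__(self) -> None:
        self._last_activity = time.monotonic()

    def touch(self) -> None:
        # Call after every completed tool call or API response.
        self._last_activity = time.monotonic()

    def idle_seconds(self) -> float:
        return time.monotonic() - self._last_activity


async def wait_with_inactivity_timeout(task, tracker, inactivity_timeout,
                                       poll_interval=1.0):
    """Await `task`; time out only after a quiet period, not on total runtime.

    `task` must be an asyncio Task or Future.
    """
    while True:
        done, _ = await asyncio.wait({task}, timeout=poll_interval)
        if done:
            return task.result()
        if tracker.idle_seconds() > inactivity_timeout:
            # Cancelling a run_in_executor future does not stop the worker
            # thread, so the gateway would still have to interrupt the agent.
            task.cancel()
            raise asyncio.TimeoutError(
                f"no activity for {inactivity_timeout} seconds"
            )
```

With this shape, a task that keeps reporting progress can run for longer than the limit, while a task that goes silent is cut off after inactivity_timeout seconds.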
2. Expose timeout in config.yaml
HERMES_AGENT_TIMEOUT is env-var only. Users shouldn't need to edit .env for a behavioral setting. Add to DEFAULT_CONFIG:
agent:
  gateway_timeout: 600  # seconds, 0 = unlimited
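One possible way to resolve the value is sketched below. resolve_gateway_timeout is a hypothetical helper, and letting the env var override config.yaml is my assumption, not current behavior. Returning None for 0 works because asyncio.wait_for() treats timeout=None as "wait indefinitely".

```python
import os

# Mirrors the proposed config.yaml entry; the key name is part of the proposal.
DEFAULT_CONFIG = {"agent": {"gateway_timeout": 600}}


def resolve_gateway_timeout(config, environ=os.environ):
    """Return the timeout in seconds, or None for unlimited.

    Assumed precedence: HERMES_AGENT_TIMEOUT env var, then config.yaml,
    then the built-in default.
    """
    raw = environ.get("HERMES_AGENT_TIMEOUT")
    if raw is None or raw == "":
        default = DEFAULT_CONFIG["agent"]["gateway_timeout"]
        raw = config.get("agent", {}).get("gateway_timeout", default)
    seconds = float(raw)
    # 0 (or a negative value) means "no limit".
    return None if seconds <= 0 else seconds
```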
3. Timeout extension prompt
When approaching the timeout (e.g., at 80% elapsed), send a non-blocking notification to the user with an option to extend. On platforms that support it (Telegram inline keyboard), offer a "Continue" button that resets the timer.
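The timer logic behind such a prompt could look roughly like this. ExtendableDeadline is a hypothetical class, and the platform-specific part (sending the notification and wiring the "Continue" button to extend()) is left out.

```python
import time


class ExtendableDeadline:
    """Deadline that warns once near expiry and can be reset by the user."""

    def __init__(self, timeout, warn_fraction=0.8, clock=time.monotonic):
        self._timeout = timeout
        self._warn_fraction = warn_fraction
        self._clock = clock
        self._started = clock()
        self._warned = False

    def elapsed(self):
        return self._clock() - self._started

    def should_warn(self):
        """True exactly once per timer period, when the threshold is crossed."""
        threshold = self._warn_fraction * self._timeout
        if not self._warned and self.elapsed() >= threshold:
            self._warned = True
            return True
        return False

    def expired(self):
        return self.elapsed() >= self._timeout

    def extend(self):
        """User chose "Continue": restart the timer and re-arm the warning."""
        self._started = self._clock()
        self._warned = False
```

The gateway would poll should_warn() and expired() from its wait loop, so the notification stays non-blocking and the agent keeps working while the user decides.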
4. Graceful degradation over hard kill
On timeout, instead of agent.interrupt() (which kills everything), consider:
- Saving the current conversation state for resumption
- Returning a partial result with what was completed so far
- Offering /resume or auto-retry with a fresh context window
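A rough sketch of the first two options follows. The function names, the checkpoint file layout, and the role/content message shape are all assumptions for illustration; I have not checked how Hermes stores transcripts.

```python
import json
import time
from pathlib import Path


def save_checkpoint(path, session_id, messages, reason="timeout"):
    """Write the conversation so far to disk so it can be resumed later."""
    payload = {
        "session_id": session_id,
        "reason": reason,
        "saved_at": time.time(),
        "messages": messages,
    }
    Path(path).write_text(json.dumps(payload), encoding="utf-8")


def load_checkpoint(path):
    """Return the saved messages, or None if there is nothing to resume."""
    p = Path(path)
    if not p.exists():
        return None
    return json.loads(p.read_text(encoding="utf-8"))["messages"]


def partial_reply(messages):
    """Build a user-facing summary from the assistant turns finished so far."""
    done = [m["content"] for m in messages if m.get("role") == "assistant"]
    if not done:
        return "Timed out before any step completed. Use /resume to continue."
    return ("Timed out. Completed so far:\n" + "\n".join(done)
            + "\nUse /resume to continue.")
```

On timeout the gateway would call save_checkpoint() and send partial_reply() instead of the generic error, and a /resume handler would feed load_checkpoint() back into a new run.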
5. Subagent-aware timeout accounting
When the main agent delegates to a subagent (delegate_task), the subagent runs in a separate process/thread. The main agent is effectively "waiting" (idle). This idle-wait time should not count against the timeout, since the agent isn't stuck — it's waiting for a child to finish.
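One way to express this is a stopwatch that stops charging time while a delegation is in flight. PausableStopwatch is a hypothetical helper; the sketch is not thread-safe and assumes the delegate_task call site can be wrapped in a context manager.

```python
import time
from contextlib import contextmanager


class PausableStopwatch:
    """Measures elapsed time, excluding periods spent waiting on a subagent."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._started = clock()
        self._paused_total = 0.0
        self._pause_depth = 0  # supports nested delegations
        self._pause_started = 0.0

    @contextmanager
    def paused(self):
        """Wrap a delegate_task call; time inside the block is not charged."""
        if self._pause_depth == 0:
            self._pause_started = self._clock()
        self._pause_depth += 1
        try:
            yield
        finally:
            self._pause_depth -= 1
            if self._pause_depth == 0:
                self._paused_total += self._clock() - self._pause_started

    def charged_seconds(self):
        """Elapsed time that counts against the parent agent's timeout."""
        now = self._clock()
        paused = self._paused_total
        if self._pause_depth > 0:
            paused += now - self._pause_started
        return now - self._started - paused
```

The subagent would still need its own limit (for example an inactivity timeout), otherwise a hung child would keep the parent paused forever.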
Workaround
Set a larger timeout via environment variable:
# In ~/.hermes/.env
HERMES_AGENT_TIMEOUT=1800 # 30 minutes
HERMES_CRON_TIMEOUT=1800 # also for cron jobs