fix: replace wall-clock agent timeout with inactivity-based timeout#4864
Closed
Mibayy wants to merge 1 commit into
Closed
fix: replace wall-clock agent timeout with inactivity-based timeout#4864Mibayy wants to merge 1 commit into
Mibayy wants to merge 1 commit into
Conversation
…ousResearch#4815) The hard 10-minute wall-clock timeout killed legitimate long-running tasks (subagent delegation, reasoning models, multi-step research). Replace with inactivity-based timeout: the timer resets on every tool call, step completion, and streaming token. Only fires when the agent has been completely idle (no callbacks) for the configured duration, which correctly detects hung API calls without killing active work. Changes: - Poll loop (5s interval) checks last_activity timestamp vs timeout - Activity tracked via tool_progress, step, and stream_delta callbacks - Stream delta tracking throttled to every 2s to avoid per-token overhead - Support 0 = unlimited timeout - Expose via config.yaml: agent.gateway_timeout (bridges to env var) - HERMES_AGENT_TIMEOUT env var still works, config.yaml doesn't override it Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
teknium1
added a commit
that referenced
this pull request
Apr 6, 2026
…timeout The gateway previously used a hard wall-clock asyncio.wait_for timeout that killed agents after a fixed duration regardless of activity. This punished legitimate long-running tasks (subagent delegation, reasoning models, multi-step research). Now uses an inactivity-based polling loop that checks the agent's built-in activity tracker (get_activity_summary) every 5 seconds. The agent can run indefinitely as long as it's actively calling tools or receiving API responses. Only fires when the agent has been completely idle for the configured duration. Changes: - Replace asyncio.wait_for with asyncio.wait poll loop checking agent idle time via get_activity_summary() - Add agent.gateway_timeout config.yaml key (default 1800s, 0=unlimited) - Update stale session eviction to use agent idle time instead of pure wall-clock (prevents evicting active long-running tasks) - Preserve all existing diagnostic logging and user-facing context Inspired by PR #4864 (Mibayy) and issue #4815 (BongSuCHOI). Reimplemented on current main using existing _touch_activity() infrastructure rather than a parallel tracker.
teknium1
added a commit
that referenced
this pull request
Apr 6, 2026
…timeout (#5389) The gateway previously used a hard wall-clock asyncio.wait_for timeout that killed agents after a fixed duration regardless of activity. This punished legitimate long-running tasks (subagent delegation, reasoning models, multi-step research). Now uses an inactivity-based polling loop that checks the agent's built-in activity tracker (get_activity_summary) every 5 seconds. The agent can run indefinitely as long as it's actively calling tools or receiving API responses. Only fires when the agent has been completely idle for the configured duration. Changes: - Replace asyncio.wait_for with asyncio.wait poll loop checking agent idle time via get_activity_summary() - Add agent.gateway_timeout config.yaml key (default 1800s, 0=unlimited) - Update stale session eviction to use agent idle time instead of pure wall-clock (prevents evicting active long-running tasks) - Preserve all existing diagnostic logging and user-facing context Inspired by PR #4864 (Mibayy) and issue #4815 (BongSuCHOI). Reimplemented on current main using existing _touch_activity() infrastructure rather than a parallel tracker.
Contributor
|
Merged via #5389. Your design — inactivity-based timeout instead of wall-clock, and the |
jinzheming
pushed a commit
to jinzheming/hermes-agent
that referenced
this pull request
Apr 6, 2026
…timeout (NousResearch#5389) The gateway previously used a hard wall-clock asyncio.wait_for timeout that killed agents after a fixed duration regardless of activity. This punished legitimate long-running tasks (subagent delegation, reasoning models, multi-step research). Now uses an inactivity-based polling loop that checks the agent's built-in activity tracker (get_activity_summary) every 5 seconds. The agent can run indefinitely as long as it's actively calling tools or receiving API responses. Only fires when the agent has been completely idle for the configured duration. Changes: - Replace asyncio.wait_for with asyncio.wait poll loop checking agent idle time via get_activity_summary() - Add agent.gateway_timeout config.yaml key (default 1800s, 0=unlimited) - Update stale session eviction to use agent idle time instead of pure wall-clock (prevents evicting active long-running tasks) - Preserve all existing diagnostic logging and user-facing context Inspired by PR NousResearch#4864 (Mibayy) and issue NousResearch#4815 (BongSuCHOI). Reimplemented on current main using existing _touch_activity() infrastructure rather than a parallel tracker.
saxster
pushed a commit
to saxster/hermes-agent
that referenced
this pull request
Apr 8, 2026
…timeout (NousResearch#5389) The gateway previously used a hard wall-clock asyncio.wait_for timeout that killed agents after a fixed duration regardless of activity. This punished legitimate long-running tasks (subagent delegation, reasoning models, multi-step research). Now uses an inactivity-based polling loop that checks the agent's built-in activity tracker (get_activity_summary) every 5 seconds. The agent can run indefinitely as long as it's actively calling tools or receiving API responses. Only fires when the agent has been completely idle for the configured duration. Changes: - Replace asyncio.wait_for with asyncio.wait poll loop checking agent idle time via get_activity_summary() - Add agent.gateway_timeout config.yaml key (default 1800s, 0=unlimited) - Update stale session eviction to use agent idle time instead of pure wall-clock (prevents evicting active long-running tasks) - Preserve all existing diagnostic logging and user-facing context Inspired by PR NousResearch#4864 (Mibayy) and issue NousResearch#4815 (BongSuCHOI). Reimplemented on current main using existing _touch_activity() infrastructure rather than a parallel tracker.
Tommyeds
pushed a commit
to Tommyeds/hermes-agent
that referenced
this pull request
Apr 12, 2026
…timeout (NousResearch#5389) The gateway previously used a hard wall-clock asyncio.wait_for timeout that killed agents after a fixed duration regardless of activity. This punished legitimate long-running tasks (subagent delegation, reasoning models, multi-step research). Now uses an inactivity-based polling loop that checks the agent's built-in activity tracker (get_activity_summary) every 5 seconds. The agent can run indefinitely as long as it's actively calling tools or receiving API responses. Only fires when the agent has been completely idle for the configured duration. Changes: - Replace asyncio.wait_for with asyncio.wait poll loop checking agent idle time via get_activity_summary() - Add agent.gateway_timeout config.yaml key (default 1800s, 0=unlimited) - Update stale session eviction to use agent idle time instead of pure wall-clock (prevents evicting active long-running tasks) - Preserve all existing diagnostic logging and user-facing context Inspired by PR NousResearch#4864 (Mibayy) and issue NousResearch#4815 (BongSuCHOI). Reimplemented on current main using existing _touch_activity() infrastructure rather than a parallel tracker.
angelburgosrosado
pushed a commit
to angelburgosrosado/hermes-agent
that referenced
this pull request
Apr 27, 2026
…timeout (NousResearch#5389) The gateway previously used a hard wall-clock asyncio.wait_for timeout that killed agents after a fixed duration regardless of activity. This punished legitimate long-running tasks (subagent delegation, reasoning models, multi-step research). Now uses an inactivity-based polling loop that checks the agent's built-in activity tracker (get_activity_summary) every 5 seconds. The agent can run indefinitely as long as it's actively calling tools or receiving API responses. Only fires when the agent has been completely idle for the configured duration. Changes: - Replace asyncio.wait_for with asyncio.wait poll loop checking agent idle time via get_activity_summary() - Add agent.gateway_timeout config.yaml key (default 1800s, 0=unlimited) - Update stale session eviction to use agent idle time instead of pure wall-clock (prevents evicting active long-running tasks) - Preserve all existing diagnostic logging and user-facing context Inspired by PR NousResearch#4864 (Mibayy) and issue NousResearch#4815 (BongSuCHOI). Reimplemented on current main using existing _touch_activity() infrastructure rather than a parallel tracker.
02356abc
pushed a commit
to 02356abc/hermes-agent
that referenced
this pull request
May 14, 2026
…timeout (NousResearch#5389) The gateway previously used a hard wall-clock asyncio.wait_for timeout that killed agents after a fixed duration regardless of activity. This punished legitimate long-running tasks (subagent delegation, reasoning models, multi-step research). Now uses an inactivity-based polling loop that checks the agent's built-in activity tracker (get_activity_summary) every 5 seconds. The agent can run indefinitely as long as it's actively calling tools or receiving API responses. Only fires when the agent has been completely idle for the configured duration. Changes: - Replace asyncio.wait_for with asyncio.wait poll loop checking agent idle time via get_activity_summary() - Add agent.gateway_timeout config.yaml key (default 1800s, 0=unlimited) - Update stale session eviction to use agent idle time instead of pure wall-clock (prevents evicting active long-running tasks) - Preserve all existing diagnostic logging and user-facing context Inspired by PR NousResearch#4864 (Mibayy) and issue NousResearch#4815 (BongSuCHOI). Reimplemented on current main using existing _touch_activity() infrastructure rather than a parallel tracker.
olympus-terminal
pushed a commit
to olympus-terminal/hermes-agent
that referenced
this pull request
May 16, 2026
…timeout (NousResearch#5389) The gateway previously used a hard wall-clock asyncio.wait_for timeout that killed agents after a fixed duration regardless of activity. This punished legitimate long-running tasks (subagent delegation, reasoning models, multi-step research). Now uses an inactivity-based polling loop that checks the agent's built-in activity tracker (get_activity_summary) every 5 seconds. The agent can run indefinitely as long as it's actively calling tools or receiving API responses. Only fires when the agent has been completely idle for the configured duration. Changes: - Replace asyncio.wait_for with asyncio.wait poll loop checking agent idle time via get_activity_summary() - Add agent.gateway_timeout config.yaml key (default 1800s, 0=unlimited) - Update stale session eviction to use agent idle time instead of pure wall-clock (prevents evicting active long-running tasks) - Preserve all existing diagnostic logging and user-facing context Inspired by PR NousResearch#4864 (Mibayy) and issue NousResearch#4815 (BongSuCHOI). Reimplemented on current main using existing _touch_activity() infrastructure rather than a parallel tracker.
7 tasks
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
…timeout (NousResearch#5389) The gateway previously used a hard wall-clock asyncio.wait_for timeout that killed agents after a fixed duration regardless of activity. This punished legitimate long-running tasks (subagent delegation, reasoning models, multi-step research). Now uses an inactivity-based polling loop that checks the agent's built-in activity tracker (get_activity_summary) every 5 seconds. The agent can run indefinitely as long as it's actively calling tools or receiving API responses. Only fires when the agent has been completely idle for the configured duration. Changes: - Replace asyncio.wait_for with asyncio.wait poll loop checking agent idle time via get_activity_summary() - Add agent.gateway_timeout config.yaml key (default 1800s, 0=unlimited) - Update stale session eviction to use agent idle time instead of pure wall-clock (prevents evicting active long-running tasks) - Preserve all existing diagnostic logging and user-facing context Inspired by PR NousResearch#4864 (Mibayy) and issue NousResearch#4815 (BongSuCHOI). Reimplemented on current main using existing _touch_activity() infrastructure rather than a parallel tracker.
Egavasyug
pushed a commit
to Egavasyug/hermes-agent
that referenced
this pull request
Jun 10, 2026
…timeout (NousResearch#5389) The gateway previously used a hard wall-clock asyncio.wait_for timeout that killed agents after a fixed duration regardless of activity. This punished legitimate long-running tasks (subagent delegation, reasoning models, multi-step research). Now uses an inactivity-based polling loop that checks the agent's built-in activity tracker (get_activity_summary) every 5 seconds. The agent can run indefinitely as long as it's actively calling tools or receiving API responses. Only fires when the agent has been completely idle for the configured duration. Changes: - Replace asyncio.wait_for with asyncio.wait poll loop checking agent idle time via get_activity_summary() - Add agent.gateway_timeout config.yaml key (default 1800s, 0=unlimited) - Update stale session eviction to use agent idle time instead of pure wall-clock (prevents evicting active long-running tasks) - Preserve all existing diagnostic logging and user-facing context Inspired by PR NousResearch#4864 (Mibayy) and issue NousResearch#4815 (BongSuCHOI). Reimplemented on current main using existing _touch_activity() infrastructure rather than a parallel tracker.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #4815 — The hard 10-minute
HERMES_AGENT_TIMEOUTwall-clock timeout kills legitimate long-running tasks (subagent delegation with reasoning models, multi-step research, autonomous code implementation).This PR replaces it with an inactivity-based timeout: the timer resets on every tool call, step completion, and streaming API response. It only fires when the agent has been completely idle for the configured duration — correctly detecting hung API calls without killing active work.
How it works
Before:
asyncio.wait_for(executor, timeout=600)— fixed wall-clock, kills at 10 min regardless of activity.After: Poll loop (5s interval) checks
time.time() - last_activityagainst the timeout threshold:tool_progress_callback→ resets activity timer (tool execution)step_callback→ resets activity timer (iteration completion)stream_delta_callback→ resets activity timer (API streaming tokens, throttled to every 2s)A 30-minute subagent task with continuous tool calls will never timeout. A stuck API call with no activity for 10 minutes will.
Config
Two ways to configure (existing env var still works):
# .env (takes precedence over config.yaml) HERMES_AGENT_TIMEOUT=1800Changes
gateway/run.py:agent.gateway_timeout→HERMES_AGENT_TIMEOUTenv var_last_activityupdated by 3 callbacksasyncio.wait_for()with inactivity polling loop0for unlimited timeoutTest plan
HERMES_AGENT_TIMEOUT=0allows unlimited executionagent.gateway_timeoutin config.yaml is respectedHERMES_AGENT_TIMEOUTtakes precedence over config.yaml🤖 Generated with Claude Code