Bug Description
Describe the bug
When using Hermes Agent with a local LLM backend (especially a heavy one), the agent falls into an infinite loop of timeouts and immediate retries if the local model's prompt processing (prefill) time exceeds the default stream timeout (e.g., 180s).
The agent aborts the request ([stream_generate] Aborting request) before the first token is generated, causing the backend to interrupt the prefill (prefill interrupted). The agent then immediately retries the same heavy request, leading to a permanent deadlock where the model never gets enough time to finish the prefill phase.
Additional context
Increasing the stream timeout in the configuration serves as a temporary workaround, but the core issue lies in the retry logic aggressively looping without allowing the local backend sufficient time to complete prompt ingestion.
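To illustrate the kind of retry behavior that would avoid the loop, here is a minimal sketch of a bounded retry with exponential backoff. It is not Hermes Agent's actual retry code: `retry_with_backoff`, `send_request`, and every parameter name and default below are hypothetical and chosen only for illustration.

```python
import time


def retry_with_backoff(send_request, max_retries=3, base_delay=5.0,
                       max_delay=120.0, sleep=time.sleep):
    """Call send_request(); on TimeoutError, wait and retry a bounded number of times.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    for attempt in range(max_retries + 1):
        try:
            return send_request()
        except TimeoutError:
            if attempt == max_retries:
                # Give up and surface the error instead of looping forever.
                raise
            # Delay doubles per attempt (base, 2*base, 4*base, ...), capped at max_delay.
            delay = min(base_delay * (2 ** attempt), max_delay)
            sleep(delay)
```

With these defaults, a request that keeps timing out is attempted four times in total, with waits of 5s, 10s, and 20s in between, and then the error is raised to the caller. Backoff alone does not help if every attempt is still cut off before prefill finishes, so it would need to be combined with a longer first-token timeout.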
Steps to Reproduce
- Set up Hermes Agent with an MLX-based local LLM backend on an Apple Silicon machine (e.g., Mac M3 Ultra with 512GB Unified RAM).
- Load a massive local model, such as baa-ai/GLM-5.1-RAM-420GB-MLX.
- Send a complex prompt to the agent. Due to the sheer size of the model (420GB), the initial prompt processing (prefill) naturally takes longer than 180 seconds.
- Observe the logs: the agent times out after 180s, the backend throws a prefill interrupted error, and the agent immediately retries, starting the infinite loop.
Expected Behavior
- The agent should ideally have a separate, longer timeout configuration for the "Time To First Token" (TTFT) or prefill phase, distinct from the inter-token stream timeout.
- When a timeout occurs, the retry mechanism should implement an exponential backoff or limit the maximum number of retries, rather than immediately and endlessly spamming the backend with the same heavy context.
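The first expectation above could be sketched roughly as follows: a wrapper around a token stream that applies a long deadline until the first token arrives and a shorter one between later tokens. This is an assumption-laden illustration, not Hermes Agent's implementation; `stream_with_phase_timeouts`, `first_token_timeout`, and `inter_token_timeout` are hypothetical names, and the default values are placeholders.

```python
import queue
import threading


def stream_with_phase_timeouts(token_iter, first_token_timeout=900.0,
                               inter_token_timeout=180.0):
    """Yield tokens from token_iter, allowing more time for the first token.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    q = queue.Queue()

    def pump():
        # Drain the (blocking) token iterator on a background thread.
        try:
            for tok in token_iter:
                q.put(("tok", tok))
            q.put(("done", None))
        except Exception as exc:
            q.put(("err", exc))

    threading.Thread(target=pump, daemon=True).start()

    got_first = False
    while True:
        # Prefill phase uses the long deadline; decoding uses the short one.
        timeout = inter_token_timeout if got_first else first_token_timeout
        try:
            kind, value = q.get(timeout=timeout)
        except queue.Empty:
            phase = "next token" if got_first else "first token"
            raise TimeoutError(f"no {phase} within {timeout}s") from None
        if kind == "done":
            return
        if kind == "err":
            raise value
        got_first = True
        yield value
```

The sketch only decides when to give up waiting; it does not cancel the underlying backend request, which a real client would still have to do when the deadline is hit.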
Actual Behavior
The agent forcibly aborts the connection after the default 180-second timeout ([stream_generate] Aborting request). This causes the local backend to halt its prompt evaluation (prefill interrupted). Immediately after the timeout, the agent retries the exact same heavy request without any delay or backoff. This results in an infinite retry loop, completely preventing the model from ever finishing the prefill phase and returning a response.
Affected Component
Other
Messaging Platform (if gateway-related)
Telegram
Operating System
macOS Tahoe
Python Version
3.11
Hermes Version
0.8.0
Relevant Logs / Traceback
Root Cause Analysis (optional)
No response
Proposed Fix (optional)
No response
Are you willing to submit a PR for this?