fix(run_agent): disable stale timeout for local providers (#5889) by Archerouyang · Pull Request #6123 · NousResearch/hermes-agent

Archerouyang · 2026-04-08T11:07:58Z

Fix #5889: Local Provider Timeout Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Fix the 180s stale-stream timeout for local providers (oMLX, Ollama, etc.) so that long-running local inference isn't killed prematurely.

Architecture: Detect local providers by URL pattern (localhost/127.0.0.1) and disable/extend stale stream timeout for them. This is a minimal, non-breaking change that respects the existing timeout mechanism for cloud providers while allowing local models to run uninterrupted.

Tech Stack: Python, existing Hermes agent streaming infrastructure

Problem Analysis

Current behavior in run_agent.py:4705:

HERMES_STREAM_STALE_TIMEOUT defaults to 180s
Dynamic scaling is based only on token count (50k/100k thresholds)
No distinction between cloud API and local inference

For local providers (oMLX, Ollama, llama-cpp):

Prefill can legitimately take 300s+ for large contexts
180s timeout causes false-positive "stale stream" detection
Results in abandoned requests and wasted compute
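
The false positive described above can be illustrated with a minimal sketch. The function name `is_stream_stale` and its parameters are hypothetical and are not taken from run_agent.py; only the 180s default and the 300s prefill figure come from this plan.

```python
import math


def is_stream_stale(seconds_since_last_chunk: float, stale_timeout: float) -> bool:
    """Return True when the stream has been silent longer than the timeout.

    Hypothetical helper for illustration only; the real detector in
    run_agent.py may be structured differently.
    """
    # An infinite timeout means stale detection is effectively disabled.
    if math.isinf(stale_timeout):
        return False
    return seconds_since_last_chunk > stale_timeout


# A local model still in prefill after 300 seconds has produced no chunks yet.
prefill_seconds = 300.0

killed_by_default = is_stream_stale(prefill_seconds, 180.0)        # True
killed_when_disabled = is_stream_stale(prefill_seconds, math.inf)  # False
```

With the 180s default the healthy request is flagged as stale, while an infinite timeout lets the prefill finish.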

Solution Design

Detect local providers and adjust stale timeout:

Local provider detection: Check if base_url contains localhost, 127.0.0.1, or is empty (default local)
Timeout adjustment: Set _stream_stale_timeout = float('inf') or a very large value for local providers
Configurability: Respect HERMES_STREAM_STALE_TIMEOUT if explicitly set
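
One way the three design points could fit together is sketched below. The function `resolve_stale_timeout` and its `env` parameter are assumptions made for illustration; the actual change in run_agent.py may order or express these checks differently.

```python
import math
import os
from typing import Mapping, Optional

DEFAULT_STALE_TIMEOUT = 180.0  # default stated in this plan


def resolve_stale_timeout(
    is_local: bool, env: Optional[Mapping[str, str]] = None
) -> float:
    """Pick the base stale timeout (hypothetical helper, not from run_agent.py)."""
    env = os.environ if env is None else env
    raw = env.get("HERMES_STREAM_STALE_TIMEOUT")
    # An explicit setting always wins, for local and cloud providers alike.
    if raw is not None and raw.strip():
        return float(raw)
    # No explicit setting: local providers get no stale timeout at all.
    if is_local:
        return math.inf
    return DEFAULT_STALE_TIMEOUT
```

For example, `resolve_stale_timeout(True, {})` returns infinity, while `resolve_stale_timeout(True, {"HERMES_STREAM_STALE_TIMEOUT": "600"})` returns 600.0 because the explicit setting is respected.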

Task 1: Create Helper Function for Local Provider Detection

Files:

Modify: run_agent.py (find _stream_stale_timeout calculation section)
Step 1: Add local provider detection function

def _is_local_provider(self) -> bool:
    """Detect if provider is local (oMLX, Ollama, etc.) vs cloud API.
    
    Local providers may have long prefill times that shouldn't trigger
    stale stream detection.
    """
    base_url = str(self.base_url or "").lower()
    # Local providers typically use localhost/127.0.0.1 or no URL
    local_patterns = [
        "localhost",
        "127.0.0.1",
        "0.0.0.0",
        "/tmp/",  # Unix sockets
        "ollama",  # Common local setups
    ]
    return any(p in base_url for p in local_patterns) or not base_url
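
To see how this substring check behaves on a few inputs, here is a standalone sketch of the same logic as a free function. The function name and the sample URLs (including the `example.com` hosts and the LAN address) are made up for illustration.

```python
from typing import Optional

# Same patterns as the helper above.
LOCAL_PATTERNS = ("localhost", "127.0.0.1", "0.0.0.0", "/tmp/", "ollama")


def is_local_by_substring(base_url: Optional[str]) -> bool:
    """Standalone copy of the substring check, for experimentation only."""
    url = str(base_url or "").lower()
    return any(p in url for p in LOCAL_PATTERNS) or not url


samples = [
    "http://localhost:11434/v1",      # local server
    "http://127.0.0.1:8000/v1",       # loopback address
    "",                               # empty URL is treated as local
    "https://api.example.com/v1",     # remote API
    "https://ollama.example.com/v1",  # remote host whose name contains "ollama"
    "http://192.168.1.50:11434/v1",   # server on the local network
]
results = {url: is_local_by_substring(url) for url in samples}
```

Here the first three samples are classified as local and `https://api.example.com/v1` as remote. The remote `ollama.example.com` host is also classified as local because its name contains the substring, and the LAN address is not, since none of the patterns match it.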

Step 2: Commit the helper function

git add run_agent.py
git commit -m "feat(run_agent): add _is_local_provider() helper function

Add method to detect local inference providers (oMLX, Ollama, etc.)
for special timeout handling."

Task 2: Modify Stale Timeout Logic for Local Providers

Files:

Modify: run_agent.py:4705-4718 (stale timeout calculation)
Step 3: Add local provider timeout override

Find this section (around line 4705):

_stream_stale_timeout_base = float(os.getenv("HERMES_STREAM_STALE_TIMEOUT", 180.0))
# Scale the stale timeout for large contexts: slow models (like Opus)
# can legitimately think for minutes before producing the first token
# when the context is large.  Without this, the stale detector kills
# healthy connections during the model's thinking phase, producing
# spurious RemoteProtocolError ("peer closed connection").

Replace with:

_stream_stale_timeout_base = float(os.getenv("HERMES_STREAM_STALE_TIMEOUT", 180.0))
# Scale the stale timeout for large contexts: slow models (like Opus)
# can legitimately think for minutes before producing the first token
# when the context is large.  Without this, the stale detector kills
# healthy connections during the model's thinking phase, producing
# spurious RemoteProtocolError ("peer closed connection").

# Local providers (oMLX, Ollama, etc.) may take much longer for prefill
# without being "stale". Disable timeout for local providers unless
# explicitly configured via HERMES_STREAM_STALE_TIMEOUT.

…ct CAMOFOX_PROFILE_DIR docs

- Add missing import for get_hermes_home in hindsight plugin
- Remove incorrect CAMOFOX_PROFILE_DIR documentation (not a real Camofox env var)

Fixes NousResearch#6098, NousResearch#6087

Remove skill file uploads from Daytona and Modal environments. Skills are loaded on the host side via skill_view(), build_skills_system_prompt(), and _load_skill_payload(); the synced files were never read inside sandboxes.

Impact:

- Daytona: saves ~275 seconds per session start (445 files × 2 SDK calls)
- Modal: reduces sandbox creation overhead significantly

Fixes NousResearch#6035

Replace Python 3.10+ union syntax (X | Y) with Optional[X] for core module that may be imported in various environments.

…ch#5889)

Local providers like oMLX and Ollama may have legitimately long prefill times (300s+ for large contexts). Disable the 180s stale stream timeout for detected local providers.

- Add _is_local_provider() to detect localhost/127.0.0.1/ollama URLs
- Skip stale detection when timeout is infinity
- Respect HERMES_STREAM_STALE_TIMEOUT if explicitly set

Fixes NousResearch#5889

teknium1 · 2026-04-09T02:53:47Z

Closed in favor of PR #6368, which fixes the same issue (#5889) using the existing is_local_endpoint() from agent/model_metadata.py — proper URL parsing with RFC-1918/localhost/WSL detection, no false positives from substring matching. Thanks for identifying the problem, @Archerouyang!
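
For readers unfamiliar with the distinction, a rough sketch of hostname-based detection follows. It is not the implementation of is_local_endpoint() in agent/model_metadata.py, which is not shown on this page. The function name `looks_local` is hypothetical, WSL handling is omitted, and Python's `is_private` covers more ranges than RFC 1918 alone.

```python
import ipaddress
from urllib.parse import urlparse


def looks_local(base_url: str) -> bool:
    """Classify a URL by its parsed hostname (illustrative sketch only)."""
    host = urlparse(base_url).hostname
    if not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name other than localhost: treated as remote in this sketch.
        return False
    # Loopback (127.0.0.0/8, ::1), private ranges, and 0.0.0.0 count as local.
    return addr.is_loopback or addr.is_private or addr.is_unspecified
```

Under this approach a made-up remote host such as `https://ollama.example.com/v1` is classified as remote, because only the parsed hostname is inspected, while `http://192.168.1.50:11434/v1` is classified as local.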

欧阳 added 4 commits April 8, 2026 17:40

fix(hermes_constants): make type annotations Python 3.9+ compatible

e2a449e


teknium1 mentioned this pull request Apr 9, 2026

fix(agent): disable stale stream timeout for local providers #6368

Merged

teknium1 closed this Apr 9, 2026

Archerouyang wants to merge 4 commits into NousResearch:main from Archerouyang:fix/5889-local-provider-timeout
